Hi! I am a computer science student learning Python for machine learning and data science. One of the most interesting topics I have come across during my studies is computer vision. I was always curious to know whether machines can see like humans, so I started my own research on the topic. I am here to give you a progress report on the latest advances in computer vision.
A camera takes pictures by converting light into a 2-dimensional array of numbers known as pixels. But these pixels are just lifeless numbers; they do not carry any meaning within themselves. Vision begins with the eyes but truly takes place in the brain. So ultimately we want to teach a machine to see as we see ourselves: naming objects, identifying people, understanding relations, emotions, actions, and intentions. Just like this picture:
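To make this concrete, here is a minimal sketch (using NumPy and a made-up 4×4 "image") of how a grayscale picture is nothing more than a 2-dimensional array of pixel intensities:

```python
import numpy as np

# A tiny 4x4 grayscale "image": each number is a pixel intensity
# (0 = black, 255 = white). A real photo is the same thing, just bigger.
image = np.array([
    [  0,  50, 100, 150],
    [ 50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

print(image.shape)   # (4, 4) -> height x width
print(image[0, 3])   # 150   -> the pixel in row 0, column 3
```

Nothing in this array says "cat" or "sky"; it is the algorithm's job to turn these raw numbers into meaning.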
The first step towards this goal is to teach a computer to see objects, the building blocks of the visual world. In the simplest terms, imagine this teaching process as showing a computer some training images of a particular object, e.g., a cat.
How hard can it be? Not very, surely: a cat is just a collection of shapes and colors, like this:
We tell the computer algorithm, in a mathematical language, that a cat has a round face, a chubby body, two pointy ears, and a long tail. That all looks fine, but what about this silly cat:
Now you have to add another shape and viewpoint to the object model. And what if the cat is partially hidden? By now you get my point: even a single cat can present an infinite number of variations to the object model, and that's just one object.
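To see why hand-written rules break down, here is a toy sketch of such an object model. The features (face shape, ear count, tail) are hypothetical stand-ins for whatever a rule-based system would measure; the point is that fixed rules fail as soon as the cat changes pose:

```python
# A toy, hand-coded "cat detector" built from fixed shape rules.
# The features below are made-up placeholders, not a real vision pipeline.

def looks_like_cat(shape_features):
    """shape_features: a dict of hand-measured properties of an image region."""
    return (
        shape_features.get("face") == "round"
        and shape_features.get("ears") == 2
        and shape_features.get("tail") == "long"
    )

upright_cat = {"face": "round", "ears": 2, "tail": "long"}
curled_up_cat = {"face": "hidden", "ears": 0, "tail": "hidden"}  # same cat, new pose

print(looks_like_cat(upright_cat))    # True
print(looks_like_cat(curled_up_cat))  # False -- the rules break on a new viewpoint
```

Every new pose, occlusion, or breed would need another rule, and the rulebook never ends. This is exactly why the field moved from hand-written models to learning from data.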
After some research, I found that no one tells a child how to see, especially in the early years. Children learn this from real-world experience. If you analyse the visual memory of a child, their eyes take roughly one picture every 200 milliseconds! So by the age of three, a child has seen and stored 100 million pictures of the real world in their mind.

To equip a computer with a similar visual library, we need to collect datasets with more images than we have ever had before. For this, the biggest treasure is the internet: we can download billions of images and use crowdsourcing platforms like Amazon Mechanical Turk to help us label them. Once the labelled images are ready, it is time to deploy an algorithm. The algorithm most used in computer vision is called the Convolutional Neural Network (CNN). We will discuss this algorithm in some other blog and discover how it became the winning architecture for generating exciting new results in object recognition.
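As a small preview (a sketch, not the full story), the core operation a Convolutional Neural Network repeats over and over is the convolution: sliding a small filter across the image so that each output value measures how strongly a local pattern, such as an edge, appears at that spot. Here is a minimal NumPy version with a hand-picked vertical-edge filter (in a real CNN, the filter values are learned from data, not written by hand):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter (kernel) over the image; each output value is
    the weighted sum of the patch under the filter (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An image with a vertical edge: dark on the left, bright on the right.
image = np.array([
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
], dtype=float)

# A classic vertical-edge filter: it responds where intensity jumps left-to-right.
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

response = convolve2d(image, kernel)
print(response)  # the middle column lights up: that is where the edge sits
```

A CNN stacks many such filters in layers, so early layers detect edges, and deeper layers combine them into ears, faces, and eventually whole cats.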
Once this algorithm is trained, the computer is able to recognize the picture:
Using CNNs, we just taught the computer to see objects. The next step is to teach the computer to look at a picture and generate sentences. For the marriage between big data and machine learning algorithms to produce real-world results usable by a lay-person, the computer has to learn from both pictures and natural-language sentences, just like the brain integrates vision and language. This is a multi-step journey. In my next blog I will tell you about Neural Networks and Convolutional Neural Networks.
Bye for now; see you in my next blog. Happy Learning!