It might feel like déjà vu out of a Google Glass dream, but there’s now a camera you can install in your house that will capture the most important moments of your family life. Whether or not you agree with the idea, this is certainly a piece of tech that deserves attention. Here is why:
A typical trying-to-catch-the-moment shot, in my book, as my son plays with his new Chinese friend.
With Clips, every time something important happens, the camera switches on automatically. It records for as long as the scene lasts, then switches off at just the right time to capture the entire moment. So how does it know how to do that? First, let’s break the software down into its basic technical elements and see how it works.
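The switch-on, keep-recording, switch-off behaviour can be sketched as a simple loop over per-frame “interestingness” scores. This is a toy illustration under invented scores and thresholds, not Clips’s actual logic:

```python
# Toy capture loop: start recording when frames score as "interesting",
# stop a short while after the interest fades. All numbers are invented.

def capture(frame_scores, threshold=0.5, patience=2):
    """Return (start, end) index pairs of recorded clips."""
    clips = []
    recording = False
    start = 0
    quiet = 0  # consecutive uninteresting frames seen while recording
    for i, score in enumerate(frame_scores):
        if score >= threshold:
            if not recording:
                recording, start = True, i  # switch on
            quiet = 0
        elif recording:
            quiet += 1
            if quiet > patience:  # the scene has ended: switch off
                clips.append((start, i - patience))
                recording, quiet = False, 0
    if recording:  # scene still running at the end of the stream
        clips.append((start, len(frame_scores) - quiet))
    return clips

scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.2, 0.1, 0.1, 0.6, 0.7, 0.1]
print(capture(scores))  # → [(2, 5), (8, 10)]
```

The `patience` buffer is why the camera doesn’t cut off a moment the instant the action dips: it waits a few quiet frames before deciding the scene is really over.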
According to product lead Juston Payne, Clips’s AI is built on a convolutional neural network (CNN) and semantic analysis. These terms describe a system that works similarly to the human brain and the way we learn to make sense of things, which is also what helps us decide what to film and what not to film in our daily lives.
The brain’s outer layer, the part that handles most of our sensory processing, is called the cerebral cortex in posh language. This is the part that makes sure you feel, smell and see things as they happen. It’s also responsible for notifying the rest of the brain about when and how to respond to a situation.
It makes sense to organise such a complicated system efficiently. Instead of notifying everything at all times, the brain targets specific parts of the chain, triggering neurons in a specific area: when someone touches your hand, only the neurons linked to your hand fire, and any action the brain takes from then on affects the hand alone.
A convolutional neural network allows the camera to “filter out” important information from things that can be ignored. Essentially, only when something hits a particular part of the camera’s sensor does it start analysing and acting on it. When Clips sees movement or light, it kicks off the same way your brain does when you see movement and light, basically saying “pay attention, something important might be happening”. Machines have been able to do this for a long time, but only recently has it become possible in a device as small as Clips.
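To make the “filter” idea concrete, here is a hand-rolled 2D convolution in plain Python. The tiny image and the edge-detecting kernel are invented for illustration; a real CNN learns many such kernels from data rather than using a hand-written one:

```python
# A toy 2D convolution: slide a small filter (kernel) over an image and
# sum the element-wise products at each position. CNNs stack many such
# learned filters; this hand-written edge kernel just illustrates the idea.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# A tiny "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A vertical-edge kernel: responds where brightness changes left-to-right.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(convolve2d(image, kernel))  # → [[0, 18, 0], [0, 18, 0]]
```

The filter responds strongly (18) exactly where the dark half meets the bright half, and not at all in the flat regions — which is how a convolutional layer flags “something is here” while ignoring everything else.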
So, once the camera has understood that there is something worth seeing, it needs context to establish whether or not it’s important. For this, Clips’s AI uses semantic analysis, a much less technical-sounding term for having the camera interpret the situation based on prior knowledge.
Of course, to do that, there needs to be some basic understanding of what the world around us holds. Just like humans, machines get this by picking up as much information as possible and storing it. We know that a table is a table and a vase is a vase. When the cat jumps on that table and knocks down the vase, we know this is a situation worth remembering (or not). At the very least, we know it’s (hopefully) unusual.
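A toy illustration of the cat-and-vase idea: a scene could be scored by how many unfamiliar object pairings it contains. All labels and “ordinary” pairings here are invented, and this is nothing like Clips’s real model — it just shows how prior knowledge turns into an importance judgement:

```python
# Toy "semantic" scoring: object pairings we've rarely seen together score
# as more interesting. Labels and the known-pairs list are invented.

ORDINARY_PAIRS = {("table", "vase"), ("cat", "sofa")}  # sorted tuples

def interest_score(objects):
    """Count unfamiliar object pairings in a scene: more = more noteworthy."""
    score = 0
    objs = sorted(objects)
    for i in range(len(objs)):
        for j in range(i + 1, len(objs)):
            if (objs[i], objs[j]) not in ORDINARY_PAIRS:
                score += 1  # unusual pairing -> worth remembering
    return score

print(interest_score(["cat", "sofa"]))           # familiar scene → 0
print(interest_score(["cat", "table", "vase"]))  # cat near the vase → 2
```

Cat-on-sofa is business as usual and scores zero; cat-next-to-the-vase produces two unfamiliar pairings, so the scene stands out.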
The unique feature of Clips is that it knows when to record. Much like a human, a machine needs a lot of knowledge to perform this kind of analysis, and that knowledge takes a lot of processing power. It is built on previous experience and assumptions, which is something we learn through feedback about different life situations.
Passing human perception on to machines is, in technical terms, called “training the model”, and it does exactly what it says on the tin. Normally, you give the computer a specific type of data (a comedy script, for example) and train it to be funny. In this case, Google hired teams of video editors and “picture monitors” (I would love to see a job description for that). These teams spend their days giving feedback about what looks good, making the process much like one of us picking up a camera to capture the best moments of our cats and babies.
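“Training the model” on editors’ keep/discard feedback can be sketched with the simplest possible learner, a perceptron. The features, data and labels below are all invented for illustration — Clips’s real network is vastly more complex — but the feedback loop is the same idea:

```python
# Toy "training the model": a single perceptron adjusts its weights
# whenever its guess disagrees with an editor's keep/discard feedback.

def train(samples, epochs=20, lr=0.1):
    """samples: list of (features, label), label 1 = keep, 0 = discard."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # editor's feedback vs model's guess
            bias += lr * error
            weights = [w + lr * error * x for w, x in zip(weights, features)]
    return weights, bias

def predict(model, features):
    weights, bias = model
    return 1 if bias + sum(w * x for w, x in zip(weights, features)) > 0 else 0

# Hypothetical features per shot: [faces_in_frame, blur_amount]
feedback = [
    ([1.0, 0.1], 1),  # sharp shot with a face: keep
    ([0.0, 0.9], 0),  # blurry, empty frame: discard
    ([1.0, 0.2], 1),
    ([0.0, 0.8], 0),
]
model = train(feedback)
print(predict(model, [1.0, 0.1]))  # → 1 (keep)
```

After a few passes over the feedback, the model’s guesses agree with the editors’ taste on this data — the same loop, scaled up enormously, is how human judgement about “good moments” ends up inside the camera.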