Video calling used to be just one mode of communication. Then the pandemic happened and turned it into our default mode of interaction. People who had never made a video call were now making half a dozen a day.
With headlines like “work from home is here to stay” floating around the internet every day, video-conferencing technology quickly came to be seen as one of the best technologies to invest in.
An army of engineers around the world is now working aggressively on improving every imaginable aspect of video calling. If we look at the inventions related to video calling filed with the United States patent office, we see that the number, which was sluggishly increasing up until 2018, suddenly rose by 25% in 2019.
The 2020 data is still pouring in at the time of writing, but it’s clear that the jump is even higher. On average, a new invention on video calls was made every single day in 2020.
Some of these inventions are quite interesting. In this post, we will look at a few:
Many of the enhancements introduce more sophisticated hardware. For example, Bose is exploring a video conferencing system to improve the voice pickup for microphones that are built into the speakers. It uses the video calling camera to track a user’s head in 3D space and then passes on this information to the microphone.
The microphone is a special one in this case. That’s because instead of using a typical single sensor, it employs an array of sensors. This array has the unique ability to tune itself to listen to sounds coming from any particular direction, using a technique called beamforming.
This means that even if you are moving around while taking the call, you would still sound perfectly clear because the microphone can now actively “follow you around”. This technique makes me think of how some animals move their ears to capture sounds better.
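If you are curious what “tuning” an array to a direction looks like, here is a minimal sketch of the simplest form of beamforming, usually called delay-and-sum. It is not taken from the Bose patent. The microphone spacing, sample rate, test tone, and angles are all assumptions I picked for illustration. The idea: sound from a given direction reaches each sensor at a slightly different time, so if you undo those delays and average the channels, sound from that direction adds up while sound from elsewhere partly cancels.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in room-temperature air
SAMPLE_RATE = 48_000    # Hz (assumed)
TONE_HZ = 1_000         # pure tone standing in for a voice (assumed)
MIC_X = [0.00, 0.05, 0.10, 0.15]  # hypothetical mic positions along a line, in meters


def arrival_delays(angle_deg):
    """Per-microphone arrival delay (seconds) of a far-field wave from angle_deg."""
    s = math.sin(math.radians(angle_deg))
    return [x * s / SPEED_OF_SOUND for x in MIC_X]


def simulate_mics(source_angle_deg, n_samples):
    """What each microphone records for a tone arriving from source_angle_deg."""
    delays = arrival_delays(source_angle_deg)
    return [
        [math.sin(2 * math.pi * TONE_HZ * (n / SAMPLE_RATE - d)) for n in range(n_samples)]
        for d in delays
    ]


def delay_and_sum(mic_signals, steer_angle_deg):
    """Undo the delays expected for steer_angle_deg, then average the channels."""
    # Delays are rounded to whole samples to keep the sketch simple.
    shifts = [round(d * SAMPLE_RATE) for d in arrival_delays(steer_angle_deg)]
    margin = max(abs(s) for s in shifts)
    n_samples = len(mic_signals[0])
    out = []
    for n in range(margin, n_samples - margin):
        aligned = [sig[n + s] for sig, s in zip(mic_signals, shifts)]
        out.append(sum(aligned) / len(aligned))
    return out


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in signal) / len(signal)


# The talker is at 30 degrees; compare listening toward them versus away from them.
recordings = simulate_mics(source_angle_deg=30, n_samples=4800)
on_target = power(delay_and_sum(recordings, steer_angle_deg=30))
off_target = power(delay_and_sum(recordings, steer_angle_deg=-60))
print(f"steered at the talker: {on_target:.3f}, steered away: {off_target:.3f}")
```

In a system like the one described above, the steering angle would presumably come from the camera's head tracking rather than being hard-coded, which is what lets the microphone “follow” the speaker.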
Another set of inventions revolves around doing some additional chore during an ongoing call. Google, for instance, seems to be working on bringing its Google Assistant into video calls. One of their recent patents seems to suggest that the assistant might act pretty much like another participant in the call. You could get it to do a number of things while the call is going on.
You would be able to ask it, for instance, to bring up a list of restaurants near a particular place. The assistant would show a list that is visible to all participants. To appreciate how this is different from screen sharing, consider that you and your friend might each want to scroll this list independently. The image below shows how it might look:
On similar lines, Microsoft seems to be developing an AI agent for creating automatic notes from a call. Now, this is different from, and more sophisticated than, simple transcription bots, which give you a text dump of everything that was said in the call.
In short, it seems that the agent would record the call like a movie script. It would capture everything relevant, including copies of slides and whiteboard drawings, a list of scheduled and actual attendees, as well as more subtle details like people laughing, clapping, voting, etc.
Another related patent, filed by Konica Minolta, which does not make video conferencing equipment (at least not yet) but does make security cameras, also deserves a mention. This one goes one step further and also takes note of the emotional reactions of people during a video call. When I first came across this patent, I found it funny as well as scary, but was relieved to see that the company has already abandoned it.
Next, let us look at Nvidia’s work on supplementing video calls with AI. Nvidia already has a tool that does things like adjusting lighting, cleaning up audio, and adding backgrounds. In one of their recent patents, though, they have described the much-anticipated video transmission technology that they demoed last year.
This is a completely new way of transmitting a video call. It transmits your video not in terms of what it sees, but in terms of what you do. Sounds confusing, I know. Let me explain with this simple example:
In your mind, pick one of your friends. Think about their face as if they are right in front of you. Now imagine them smiling. Done? Now imagine them surprised. Did it? Good. Now imagine them holding their palm on their forehead. You can do it, right?
When you are imagining your friend doing these things, you are actually “composing” an image in your mind. You know (a) what your friend’s face looks like and (b) how faces change when they smile. So you can put these two pieces of knowledge together to mentally come up with the smiling face of your friend.
Nvidia’s technology does the exact same thing. Instead of transmitting a continuous stream of pixels, its AI will “understand” what you are doing and transmit just that information. At the receiving end, the AI will then apply those actions to a static image.
What’s the upside of doing this? Well, for one, it requires much less data to be exchanged, which can let you do video calls even with poor reception. Also, say you are taking a work-related call and you have your comfy clothes on. In that case, the AI could just as well animate a picture of you wearing a suit on the other end. Isn’t that handy?
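To get a rough feel for the data savings, here is a toy sketch. It is not Nvidia's method. It only packs a handful of made-up face keypoints into bytes, unpacks them on the "receiving end", and compares the size with one uncompressed frame. The resolution, the number of keypoints, and the float encoding are all assumptions of mine.

```python
import struct

# All numbers below are illustrative assumptions, not figures from the patent.
FRAME_WIDTH, FRAME_HEIGHT = 1280, 720   # assumed call resolution
BYTES_PER_PIXEL = 3                      # uncompressed 8-bit RGB
NUM_KEYPOINTS = 10                       # hypothetical number of facial keypoints


def encode_keypoints(points):
    """Sender side: pack (x, y) keypoints as little-endian 32-bit floats."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def decode_keypoints(payload):
    """Receiver side: recover the (x, y) pairs that would drive the static image."""
    count = len(payload) // 4  # 4 bytes per 32-bit float
    flat = struct.unpack(f"<{count}f", payload)
    return list(zip(flat[0::2], flat[1::2]))


# Made-up keypoints in normalized image coordinates (0.0 to 1.0).
keypoints = [(i / 16, (i + 1) / 16) for i in range(NUM_KEYPOINTS)]

payload = encode_keypoints(keypoints)
raw_frame_bytes = FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL

print(f"one uncompressed frame: {raw_frame_bytes} bytes")
print(f"one keypoint update:    {len(payload)} bytes")
```

Keep in mind that real video calls never send uncompressed frames, so the gap in practice is smaller than this comparison suggests. The hard part, the AI that turns those few numbers back into a believable face, is not shown here at all.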
For fun, you would also be able to replace yourself with an avatar of your choosing. (I call dibs on Rick from Rick and Morty.) The avatar will copy all your actions in real time. Wouldn’t that be cool? I think it certainly would be.
The list of these inventions just goes on and on, but I think it’s enough material for this post. I’ll definitely cover a few more in a future post. Before you go, though, can you do me a favor?
Subscribe to our newsletter (below). This way I can sleep better, knowing that my weekly post can never be skipped by my best reader!