Over the past three decades, digitalization has transformed photography and filming. Digital cameras, which capture reality with light-sensitive electronic sensors, entered the mass market in the late 1990s. These devices rapidly displaced analog cameras and, by 2010, sold up to 120 million units per year worldwide. Over the following decade, however, sales of digital cameras fell to around 20 million as small, high-quality camera sensors became an integral part of smartphones.
The first camera phones were released in 2000 and had only enough memory to store around 20 photos. At that time, each image had a resolution between 0.11 and 0.35 megapixels. In the earliest models, users had to physically connect their phone to a computer to view their snapshots.
Over the years, mobile photography technology progressed to include flash, self-timers, zoom functionality, and filters (e.g. black and white, sepia). By 2010, mobile technology had advanced significantly: smartphones were capable of wireless image transfer, video capture, touchscreen control, panoramic photos, and built-in image editing, filtering, and retouching. The advantage mobile phones hold is a stable operating system with the versatility to run new software, combined with stronger computing power in each new generation.
Mega software for mega pictures
Mobile cameras continuously improve thanks to higher-resolution sensors, optical zoom capabilities, and computational photography, and new benchmarks are set every year. In 2018, for example, a 40MP camera was an outlier; by 2019, 48MP and 64MP sensors had become quite normal.
In recent years, phones have begun combining the data from one camera with data from another. Advances in algorithms and in the computing power of phones make dual- and triple-camera setups possible, giving consumers wide-angle and telephoto shooting, low-light functionality, super-fast autofocus, and optical image stabilization for steady capture.
For quite a while, there wasn’t much of interest in the digital processing of images, partly from a lack of processing power. Arguably the first real computational photography features were object identification and tracking for autofocus, which are CPU intensive. Face and eye tracking made it easier to capture people in complex lighting or poses, and object tracking made sports and action photography easier as the system adjusted its autofocus point to follow a moving target. These were early examples of deriving metadata from an image and using it proactively, either to improve that image or to feed forward into the next.
“AI” photography becomes smarter
AI cameras, or computational photography, are a staple of the modern smartphone market. Nearly every smartphone uses some form of AI to make pictures look better, and computational photography techniques have quickly grown into essential camera features.
The Google Pixel’s astrophotography mode makes for a good example when talking about computational photography. It showcases an AI processing technique called semantic segmentation: Google uses it to pick out the sky in a night scene and then apply multiple-exposure processing to bring out the stars without overexposing the rest of the image. Semantic segmentation can also differentiate between other parts of the scene, like skin and clothes, and apply different processing to each. The potential improvements range from color enhancement and sharpening to exposure adjustments and filters that make a picture look its best.
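The idea of per-segment processing can be illustrated with a toy sketch. Real systems use trained neural networks to produce the segment mask; here a simple brightness threshold stands in for the model, and the gain values are arbitrary illustrative choices, not Google's actual pipeline:

```python
import numpy as np

def segment_sky(image, threshold=0.6):
    """Toy stand-in for a learned segmentation model: label a pixel
    as 'sky' when its average brightness exceeds a threshold."""
    brightness = image.mean(axis=-1)      # (H, W) mean over RGB channels
    return brightness > threshold         # boolean sky mask

def process_regions(image, sky_gain=0.8, ground_gain=1.3):
    """Apply different processing per segment: tone the sky down a
    little, brighten the foreground."""
    mask = segment_sky(image)
    out = image.copy()
    out[mask] *= sky_gain                 # sky pixels
    out[~mask] *= ground_gain             # everything else
    return np.clip(out, 0.0, 1.0)

# A 2x2 'scene': bright sky pixels on top, dark ground pixels below.
scene = np.array([[[0.9, 0.9, 0.9], [0.8, 0.8, 0.9]],
                  [[0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]])
result = process_regions(scene)
```

Swapping the thresholding function for a segmentation network yields the same structure: one mask, different processing branches per region.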
Enhanced AI processing is not only a consumer-facing feature, though. Machine-learning-based denoising algorithms are increasingly used to clean up images in low light without sacrificing too much detail. Likewise, super-resolution zoom based on neural-network detail reconstruction is enabling digital zoom at 2x and beyond to appear virtually indistinguishable from its optical equivalent.
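The classical baseline these learned denoisers compete with is frame stacking: averaging several aligned captures cancels zero-mean sensor noise, cutting the noise standard deviation by roughly the square root of the frame count. A minimal simulation (synthetic noise, not a learned model):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_by_stacking(frames):
    """Average aligned frames: zero-mean noise cancels while scene
    detail survives. Noise std drops roughly by sqrt(len(frames))."""
    return np.mean(frames, axis=0)

# Simulate 16 captures of the same flat gray patch with sensor noise.
clean = np.full((64, 64), 0.5)
frames = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(16)]
stacked = denoise_by_stacking(frames)

noise_single = np.std(frames[0] - clean)   # ~0.1
noise_stacked = np.std(stacked - clean)    # ~0.1 / sqrt(16)
```

ML-based denoisers go further by learning what image detail looks like, so they can suppress noise from a single frame instead of requiring a burst.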
Background blur works similarly: the phone determines which parts of the image constitute a particular physical object and the exact contours of that object. This can be derived from motion in the video stream, from stereo separation between multiple cameras, and from machine learning models trained to identify shapes, extending to full semantic segmentation. Companies have developed highly efficient algorithms to perform these calculations, trained on enormous data sets with immense amounts of computation time.
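Once the subject mask exists, the compositing step itself is simple: blur the whole frame, then paste the sharp subject pixels back on top. A grayscale sketch with a naive box blur (production systems use depth-aware, lens-shaped blur kernels):

```python
import numpy as np

def box_blur(img, k=5):
    """Naive box blur: average each pixel over its k x k neighborhood
    (edges handled by repeating the border value)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def portrait_mode(img, subject_mask, k=5):
    """Composite: sharp subject pixels over a blurred background."""
    blurred = box_blur(img, k)
    return np.where(subject_mask, img, blurred)

# Demo: an 8x8 frame where a single bright 'subject' pixel stays sharp.
img = np.zeros((8, 8))
img[4, 4] = 1.0
subject = np.zeros((8, 8), dtype=bool)
subject[4, 4] = True
result = portrait_mode(img, subject)
```

The quality of the effect lives almost entirely in the mask: a contour that is off by a few pixels produces the telltale blurred hair edges of early portrait modes.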
Early examples of computational photography, such as multi-frame HDR, night modes, and software background blur for portraits, are now considered standard in modern smartphones. These features were cutting edge just a couple of years ago, and we will soon be talking about semantic segmentation, object detection, and machine-learning-based imaging in the same manner.
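Multi-frame HDR can itself be sketched in a few lines as exposure fusion: each pixel of the output is a weighted average over a bracketed burst, with weights favoring whichever frame exposed that pixel closest to mid-gray. This is a simplified illustration, not any particular vendor's pipeline (real implementations also align frames and work on tiles):

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    """Weight pixels by closeness to mid-gray (0.5): near-black or
    near-white pixels contribute little."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_exposures(exposures):
    """Naive multi-frame HDR: per-pixel weighted average of a bracketed
    burst, favoring whichever frame exposed each pixel best."""
    stack = np.stack(exposures)                      # (N, H, W)
    weights = well_exposedness(stack)
    weights /= weights.sum(axis=0, keepdims=True)    # normalize per pixel
    return (weights * stack).sum(axis=0)

# Two frames: one underexposed (dark) and one overexposed (bright).
dark = np.array([[0.05, 0.45]])
bright = np.array([[0.55, 0.95]])
fused = fuse_exposures([dark, bright])
```

The fused image ends up near the well-exposed value at each pixel: shadows are lifted from the bright frame, highlights rescued from the dark one.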
Perfect photos, perfect videos. Wait, there is more!
About five years ago, tech companies like Facebook bet on virtual reality, with Facebook buying pioneer Oculus for $2 billion. The industry has not yet lived up to expectations: even the slightest lag between head movement and the displayed image is enough to make people nauseous, and wearing the goggles is uncomfortable. The path around these problems was Augmented Reality (AR), an interactive experience in which the objects of a real-world environment are enhanced with computer-generated information. When Niantic Inc. launched its smash hit Pokemon Go, it demonstrated the appeal of Augmented Reality over Virtual Reality.
Smart glasses, such as Google Glass, have not yet delivered on expectations either, but the concept is not forgotten. Customized, optimal combinations of hardware and AI algorithms hold the promise of delivering augmented reality. Executing environment perception efficiently on low-power hardware is challenging; nevertheless, it will be a key factor in implementing augmented reality.
New robotic vacuum cleaners generate a map of the rooms they clean, and consumers can tell the vacuum which areas to focus on and which to avoid. The maps that smartphones build of their surroundings are likewise growing richer, a prerequisite for propelling Augmented Reality forward, and the multiple cameras in a smartphone can be used for better depth perception.
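The depth perception that a dual-camera phone gains follows from stereo geometry: a point's shift (disparity) between the two views converts to distance via depth = focal_length x baseline / disparity. A minimal sketch with hypothetical module parameters (the 1000 px focal length and 12 mm baseline are illustrative, not any specific phone's specs):

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Pinhole stereo: depth = f * B / d. A wider baseline between the
    two cameras, or a longer focal length in pixels, gives finer depth
    resolution at the same disparity precision."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    return focal_px * baseline_m / disparity_px

# Hypothetical dual-camera module: 1000 px focal length, 12 mm baseline.
depths = depth_from_disparity([40.0, 20.0, 10.0],
                              focal_px=1000.0, baseline_m=0.012)
# 40 px disparity -> 0.3 m, 20 px -> 0.6 m, 10 px -> 1.2 m
```

The inverse relationship also shows the limitation: distant objects produce tiny disparities, which is why phone stereo depth degrades quickly beyond a few meters.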
The smartphone attempts to understand what it is seeing, which creates the need for advanced AI techniques that help it do so. The volume of data produced in the process is enormous. Teraki’s technology assists this process with data enrichment to improve AI model accuracy, and with real-time data reduction and identification of objects. This enables efficient sensor fusion techniques that merge inputs from multiple sensors to improve the dimensions of perception.
These advancements in augmented reality and 3D mapping on smartphones are highly relevant to autonomous driving functionality, including the HD mapping that will find a place in the car of the future. In current sensor setups, a camera captures 2D input and a LIDAR captures 3D input, and a computer must bridge the two before a car can perceive its environment. Advancements in visual sensors and visual AI techniques can also enable depth estimation from images alone. One such approach related to autonomous driving is highlighted in the paper on “Pseudo-LiDAR from Visual Depth Estimation”. This would enable autonomous cars - like smartphones - to do object detection via visual sensors only.
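The core geometric step behind the pseudo-LiDAR idea is back-projecting an image-based depth map into a 3D point cloud, after which LiDAR-style detectors can run on it. A sketch using standard pinhole-camera equations with made-up intrinsics (the actual paper pipeline adds a learned depth-estimation network in front):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a per-pixel depth map into 3D points: for pixel
    (u, v) with depth z, X = (u - cx) * z / fx, Y = (v - cy) * z / fy,
    Z = z (standard pinhole camera model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]               # pixel coordinate grids
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Tiny 2x2 estimated-depth map and hypothetical intrinsics.
depth = np.array([[2.0, 2.0],
                  [4.0, 4.0]])
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

With an accurate depth network in front of this transform, a camera-only stack can feed the same 3D object detectors that normally consume LiDAR points.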
The Future of Light
The future of photography will be more computational than optical in nature. The application of photography differs between consumption by computers and consumption by the human eye. This is a paradigm shift, and one that companies associated with cameras are grappling with. There will be repercussions for traditional cameras, with Single Lens Reflex cameras (SLRs) giving way to mirrorless systems, in phones and embedded devices, and everywhere light is captured and turned into images.
This means that camera modules will saturate their megapixel counts, ISO ranges, f-numbers, and so on, until the human eye can no longer perceive the improvements. That’s okay. We can reasonably expect that glass isn’t getting any clearer and that human vision isn’t getting any more accurate. What computers can do with that light, however, is changing at an incredible rate.
We have experimented with the physical parts of the camera for the last century and brought them to varying levels of perfection. Now we move to the “virtual part”: AI and machine-learning-based algorithms will play the biggest role in innovation in the coming years.