The need for Region- and Time-of-Interest in edge video applications
The major challenges in developing and using automotive AI applications are the latency and data volumes that need to be processed at the edge, and, moreover, access to datasets for validating the performance of automotive-grade sensor systems. This includes fusing data from various sensors, such as camera, radar, LIDAR and IMU. Capturing rare events - i.e. events characterized by rarely seen objects, situations or incidents that do not occur in most standard training datasets - is particularly challenging. This requires a new approach: Region-of-Interest (ROI) and Time-of-Interest (TOI) models. These are models that operate at the edge and pre-filter relevant objects (ROI) or events (TOI). Such pre-selection enables sensors to capture these rare objects and situations already at the SoC level, so that the sensor itself operates as an edge detector: it decides whether or not to trigger the sensor fusion process. When triggered by an object or event, it runs at the highest resolution to increase detection accuracy, and captures similar events at the highest possible resolution whenever such a situation is encountered again.
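The trigger logic described above can be sketched in a few lines. This is a minimal illustration, not the actual Teraki implementation: the frame format, the `detect_score` pre-filter and the threshold are all hypothetical placeholders for whatever lightweight ROI/TOI model runs on the SoC.

```python
def edge_trigger(frames, detect_score, threshold=0.8):
    """Yield only frames whose lightweight ROI/TOI score exceeds the
    threshold; only these would trigger full-resolution capture and
    the downstream sensor fusion process."""
    for frame in frames:
        score = detect_score(frame)   # cheap on-SoC pre-filter
        if score >= threshold:        # rare object or event detected
            yield frame               # trigger the fusion pipeline

# Toy usage: frames carry a precomputed score purely for illustration.
frames = [{"id": i, "score": s}
          for i, s in enumerate([0.1, 0.95, 0.3, 0.85])]
triggered = list(edge_trigger(frames, lambda f: f["score"]))
# frames 1 and 3 pass the 0.8 threshold
```

In a real deployment, the score function would be a small neural network or classical detector running on the sensor SoC, and "yield" would correspond to switching the sensor into its high-resolution mode.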
This blog post explains how the new ROI / TOI software stack by Teraki meets these new edge-AI requirements.
Target use cases are, for example, running applications on remotely controlled vehicles, (partly) autonomous cars (L2+), autonomous robots, drones and transmitting recorded data to a server for inspection and refining the application offline. Latency is crucial here, to enable the application (e.g. object detection or semantic segmentation) to run in real-time in a restricted hardware environment, i.e. low-power, low-memory and limited computational resources. Teraki software helps to reduce the latency, i.e. the time it takes for the application to run on the - typically constrained - production hardware.
Despite the maturity of many models, they still require substantial computational resources. Furthermore, new AI applications need to go through training and testing phases. For these phases, data must be recorded in relevant environments to capture situations of interest required to refine the application.
This data can easily grow from gigabytes to terabytes, incurring unnecessarily high energy and storage costs for transmission and processing.
The Teraki software stack helps to solve these challenges by pre-filtering and optimizing the amount of data to be transmitted and processed later, using regions-of-interest (objects within an image), times-of-interest (events within a sequence of images) and other techniques described below.
Once the data has been transmitted, one typically wants to analyze the data and refine & develop an application with it, for instance, by focusing on situations in which the application did not work as intended and therefore retrain the underlying model.
Teraki not only allows for focusing on these relevant TOIs, but also for adapting the ROI and TOI models for continuous refinement of the application, without the need to go through the data-filtering and data-labeling loop all over again for each model version. Moreover, in each iteration loop Teraki preserves important features, allowing for up to 20% better machine learning (ML) model accuracy compared to standard compression techniques like JPEG or H.264.
Furthermore, with Teraki’s ability to fuse sensor data from other sensors (LIDAR, radar, telematics), applications will reach higher accuracies while maintaining low latency and low data transmission volumes. In all these cases the accuracy of a ML model, compared to standard techniques, is up to 20% better before being ingested into the target sensor fusion model.
Teraki’s software can run on a multitude of hardware platforms, including (but not limited to):
NVIDIA GPUs: Jetson Nano, Jetson Xavier, and others
SoCs with e.g. ARM, NXP, Infineon and Qualcomm processors
AI accelerator chips such as Kneron
Example use case
One use-case scenario is the following: A camera-equipped vehicle records footage to train an AI application. Relevant for the application is a region-of-interest (ROI) defined by the presence of certain objects, for instance, an image area that contains cyclists or pedestrians. The recorded data can be uploaded to the Teraki Platform, where these ROIs are automatically extracted and can be pre-labeled for an initial model training. Subsequently, the model is deployed locally in the vehicle, so that data costs and latency are significantly lowered. With subsequent iterations, one can refine the ROI to become more specific, for example to cyclists riding in the same direction, in order to develop a path-planning and collision-avoidance model.
Naturally, the initial training does not yet deliver the desired performance. The model needs to see more data, and especially more diverse data: data that covers different rare cases in various environments with diverse conditions (such as light, weather, etc.). This capturing of more data leads to storage, transmission and processing bottlenecks.
As a first result, Teraki’s video pre-processing methods deliver higher data reduction than standard MJPEG or H.264 video compression. This reduces file sizes, which leads to faster and less expensive transmission, while keeping the AI-model accuracy high. In other words, Teraki’s pre-processing is optimized for deep learning applications. Unlike metrics tailored to human visual perception - such as VMAF - with Teraki, the data volume can be reduced without impeding AI-model performance, instead enhancing it.
Teraki offers a software stack to train and refine both the model for pre-labeling based on ROI/TOI models and the optimized pre-processing of input data, using data from the vehicle. We do this at latencies that enable operation at the SoC level of neuromorphic cameras, such as Prophesee, and equivalently for radar and LIDAR systems (described in upcoming blog posts).
The optimized pre-processing is, however, optional, and it is important to note that the compression is not optimized for human perception, but for high AI-model performance (e.g. an advanced ML application in the cloud). Hence, images compressed with this technique may seem a little odd to the human eye, but not to a machine learning algorithm like a neural network. For training an advanced neural network model, optimizing the image representation for AI (instead for humans) without loss of model accuracy is the required approach delivered by the Teraki software stack.
The next natural step in the model training pipeline is to filter the recorded data in the vehicle through the deployment of a TOI model. A TOI model is defined by a certain event, e.g. the vehicle performs a lane change/departure, approaches a traffic light, or a robot encounters a human. Using a TOI model allows for extracting only scenes that are of special relevance, be it for the remote operation of a vehicle or for refining the model further, leading to better models and to significant data savings. For example, to optimize the fuel consumption of an L2+ car when approaching crossings and traffic lights (e.g. learning when to let the car roll), TOI pre-filtering is crucial to keep the data amounts feasible. Moreover, TOI detection reduces the processing requirements for sensor fusion across a multitude of camera streams read out at 30-60 fps.
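Extracting only scenes around detected events can be sketched as follows. This is a simplified illustration of the idea, not Teraki's TOI implementation; the frame indices, pre/post windows and merge rule are assumptions for the example.

```python
def extract_toi_clips(num_frames, event_frames, pre=30, post=60):
    """Return (start, end) frame ranges around each detected event,
    merging overlapping windows, so that only the times-of-interest
    are kept for transmission instead of the full recording."""
    clips = []
    for e in sorted(event_frames):
        start = max(0, e - pre)            # keep context before the event
        end = min(num_frames, e + post)    # and after it
        if clips and start <= clips[-1][1]:
            clips[-1] = (clips[-1][0], end)  # overlap: merge into one clip
        else:
            clips.append((start, end))
    return clips

# At 30 fps, keep 1 s before and 2 s after each detected lane change.
clips = extract_toi_clips(3000, [100, 130, 900], pre=30, post=60)
# the two nearby events merge into one clip; the third stays separate
```

Transmitting only these ranges instead of the full 3000-frame recording is where the data savings come from.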
With the ROI model also deployed in the vehicle, Teraki’s video encoder uses the ROI information to reduce the sizes for videos even further by strongly compressing regions that are not of interest (RONI = regions-of-no-interest) and maintaining high resolution for the ROI.
ROI models are already available in the Teraki Platform. The first TOI models (e.g. lane changes) are available in the Teraki Platform (Q3 2020). The feature of training a new TOI model will be available to Teraki subscription customers by the end of 2020. Please click here to request access to the Platform.