Best-in-class AI-chip meets best-in-class edge-AI

Introduction

The emergence of edge computing and intelligent, vision-based driver assistance systems is key for the development of Autonomous Vehicles and Robots. The strong computation capability in the cloud, in combination with energy-efficient devices at the edge, opens a myriad of opportunities for real-time, cost-efficient applications running at the edge.

The challenge

The processing and transmission of large amounts of raw video data at the edge causes delays and demands power-hungry edge processing. In addition, simply ingesting raw data slows down the training of AI models and makes it unnecessarily expensive.

Operating at the sensor level (edge) in the automotive or robotics industry means working with safety-related use cases and processors. These processors need to operate deterministically with limited resources at the edge: CPU, RAM and power consumption. The challenge here is twofold: 1) how to arrive at cost-effective and power-efficient hardware; and 2) how to deploy algorithms that can run on such limited resources and still produce accurate results in real time?

The opportunity

The emergence of edge computing and intelligent, vision-based driver assistance systems is of great significance for the development of Autonomous Vehicles, Driver Monitoring, Remote Operations, etc. The growing computation capability in the cloud, in combination with new energy-efficient edge devices and software, opens a myriad of new opportunities for real-time, cost-efficient applications running at the edge.

Introducing Teraki ROI: less than 0.5 MB of RAM/ROM for 5x less bandwidth

In the previous blog we introduced Region of Interest (ROI) models: an object detection algorithm based on ROI pre-selection. The ROI approach significantly improves latency without compromising detection accuracy. Pre-defined objects of interest are captured in every frame of the video. The remaining part of the frame, called the Region of Non-Interest (RONI), is reduced to achieve a small data size while the relevant information remains accurately present. The feature-based classification algorithm can detect cars, pedestrians, traffic lights, stop signs, buses, trucks, bicycles and trains on the car’s hardware itself. This object detection has many use cases and is implemented e.g. in Remote Operation and AI decision-making.
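To make the ROI/RONI split concrete, the following minimal sketch (Python with OpenCV) keeps detected boxes at full resolution and coarsens the rest of the frame. It is not the Teraki SDK API; the (x1, y1, x2, y2) box format and the 4x RONI downscale factor are illustrative assumptions.

    # Minimal sketch of ROI-based frame reduction (illustrative, not the Teraki SDK).
    # Assumption: detections arrive as (x1, y1, x2, y2) pixel boxes; RONI is downscaled 4x.
    import cv2
    import numpy as np

    RONI_SCALE = 0.25  # assumed downscale factor for the Region of Non-Interest

    def reduce_frame(frame, roi_boxes):
        """Keep ROI boxes at full resolution, coarsen everything else."""
        h, w = frame.shape[:2]
        # Downscale and re-upscale the whole frame to emulate a low-resolution RONI.
        small = cv2.resize(frame, (int(w * RONI_SCALE), int(h * RONI_SCALE)))
        reduced = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
        # Paste the original pixels back inside every region of interest.
        for (x1, y1, x2, y2) in roi_boxes:
            reduced[y1:y2, x1:x2] = frame[y1:y2, x1:x2]
        return reduced

    # Example: one detected car roughly in the centre of a 720p frame.
    frame = np.zeros((720, 1280, 3), dtype=np.uint8)
    print(reduce_frame(frame, [(500, 300, 780, 460)]).shape)  # (720, 1280, 3)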

Teraki’s embedded ROI SDK achieves this efficiency while requiring very low computational resources for processing-intensive tasks such as video processing.

Introducing Kneron KL520: 0.56 TOPS/W for a cost-effective solution

The KL520 Neural Processing Unit (NPU) provides efficient AI computing performance of 0.56 TOPS per Watt at an average power consumption of 0.5 W. This is ideal for all kinds of edge AI deployments, such as remote, mobile and unmanned applications that need to run at a low Bill of Materials (BoM). The strong price-performance of the Kneron KL520 enables new products and business cases that were previously limited by hardware cost and/or processing performance. For systems that already use an application processor, the Kneron KL520 allows AI processing to be securely offloaded, freeing up the application processor to perform other tasks.
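The offloading pattern can look roughly like the sketch below; npu_detect is a hypothetical placeholder for whatever vendor call dispatches a frame to the KL520, not a real Kneron or Teraki API.

    # Hedged sketch of offloading detection to the NPU so the application processor
    # stays free for other work. `npu_detect` is a hypothetical placeholder, not a
    # real Kneron API call.
    from concurrent.futures import ThreadPoolExecutor

    def npu_detect(frame):
        # Placeholder: a real system would block here on the NPU, not on the CPU.
        return [("car", (500, 300, 780, 460))]

    def do_other_application_work():
        pass  # e.g. telemetry, UI, vehicle-bus handling

    executor = ThreadPoolExecutor(max_workers=1)

    def process(frame):
        future = executor.submit(npu_detect, frame)  # hand the frame to the NPU
        do_other_application_work()                  # the CPU keeps running meanwhile
        return future.result()                       # collect detections when ready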

Teraki and Kneron: a powerful combination

To demonstrate the combined strength of leading edge hardware and leading edge software, we implemented the Teraki ROI SDK on the power-efficient Kneron KL520 chipset. We loaded the KL520 chipset with the Teraki library to execute real-time ROI-based pre-processing.

Teraki performance on KL520 NPU

We evaluated the Teraki edge-AI model detecting cars and pedestrians in real time. The model detects and bounds cars and pedestrians as ROI, for which the highest resolution is maintained. The detected region of interest is displayed in white in the image. The remaining portions are segmented as RONI and are processed and reduced to a lower resolution (blurred).
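Purely as an illustration of this rendering (sharp ROI with a white outline, blurred RONI), a small OpenCV sketch is shown below; the box coordinates are made up and this is not the Teraki pipeline itself.

    # Illustrative rendering of the demo output: sharp ROI with a white outline,
    # blurred RONI. Box coordinates are made up; this is not the Teraki pipeline.
    import cv2

    def render_demo(frame, roi_boxes):
        out = cv2.GaussianBlur(frame, (31, 31), 0)  # blur everything (RONI appearance)
        for (x1, y1, x2, y2) in roi_boxes:
            out[y1:y2, x1:x2] = frame[y1:y2, x1:x2]  # restore sharp ROI pixels
            cv2.rectangle(out, (x1, y1), (x2, y2), (255, 255, 255), 2)  # white ROI box
        return out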

Specs: Teraki computed the region of interest for this test run with a memory requirement of 227 KB. Running at 25 FPS, the required footprint was 345 KB of RAM and 92 KB of ROM (437 KB combined). Overall, for video pre-processing on such a chipset, Teraki requires less than 0.5 MB of RAM/ROM to detect objects and pre-process the data in real time at 25 FPS.

Deliverables: Teraki embedded software delivers a 4x - 5x bandwidth saving through size reduction while preserving the important information containing the selected objects in the image. Typically, Teraki’s edge AI improves our customers’ AI-model accuracy by 10% - 30%.
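As a rough, back-of-the-envelope illustration of where a 4x - 5x factor can come from, consider the sketch below; the ROI share and RONI downscale are assumed values, not measured Teraki figures.

    # Back-of-the-envelope check of a 4x-5x saving, under assumed numbers:
    # the ROI covers ~15% of the frame at full resolution and the RONI is
    # downscaled 4x per dimension (1/16 of its original pixels).
    roi_fraction = 0.15        # assumed share of the frame kept at full resolution
    roni_pixel_ratio = 1 / 16  # assumed pixel budget of the downscaled RONI

    relative_size = roi_fraction + (1 - roi_fraction) * roni_pixel_ratio
    print(f"relative size: {relative_size:.3f} -> savings: {1 / relative_size:.1f}x")
    # relative size: 0.203 -> savings: 4.9x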

Combining best-in-class edge soft- and hardware

Teraki provides proven software to enable various real-time applications driven by AI models on low-powered NPUs such as the Kneron KL520. Its certified models can be deployed across various edge devices on almost any operating system, and can also be integrated at SoC level. With a memory footprint of less than 0.5 MB, Teraki delivers 25 FPS video pre-processing with object detection for ROI and a 5x reduction in bandwidth. In combination with the highly cost-effective KL520 (for a quotation, please reach out directly to Kneron), which comes with an impressive 0.56 TOPS/W performance, this opens many new use cases that previously were not possible, either financially or technically.

To find out how we can support your use cases and to get a quick overview of Teraki’s services, check out the Teraki Platform. We offer a free evaluation license.
