Autonomous driving algorithms detect and label objects in streams of complementary types of data, each collected with a different type of sensor: telematics, images and point clouds. Every data type and sensor has strengths and weaknesses. Combining the information from two or more different sensors improves the accuracy of object detection and labelling. This process is called sensor fusion.
A telematics data stream presents one variable as a function of time. It is sometimes called a one-dimensional, or 1D, data stream. Several such variables are typically collected in a car: velocity, acceleration, GPS coordinates, the number of rotations of motors or wheels, etc. Telematics data is typically small, and in today’s advanced driver assistance systems (ADAS) it is processed efficiently in real time by microcontrollers on automotive-grade, low-power, embedded hardware.
An image is represented by a color value for each pixel of a two-dimensional grid, whose axes correspond to the horizontal and the vertical directions. It is sometimes called a two-dimensional, or 2D, data format. Typically the intensities of three base colors are stored (red, green, blue, or RGB). Grey-scale intensities or infrared values (for night vision) can also be stored. A stream of images is also called a video and is recorded with a video camera, typically at 30 or 60 frames per second.
A point cloud is a set of points, each described by its three spatial coordinates (for example x, y, z). For this reason, it is sometimes called a 3D data format. Several other variables (called attributes) can also be added to each point, for example the intensity. Point clouds are obtained with sensors that emit narrow laser beams of infrared light of fixed intensity in various directions. When a beam hits an object, a fraction of the beam is reflected back and reaches the sensor. The time of flight between the emission and the detection determines the distance to the point. The direction of the beam corresponds to the angular coordinates theta and phi. The natural coordinate system for such a sensor is therefore spherical. For user convenience, the coordinates are converted to the Cartesian system of x, y, z directly at the sensor. The fraction of the emitted light that returns to the sensor is called intensity. It takes values between 0.0 and 1.0. A typical point cloud stores for each point the values x, y, z and intensity. A stream of point clouds produces a movie of point clouds, typically at 10 or 20 frames per second. The laser-based sensors described here are called LiDARs; radars, which emit radio waves instead of laser light, also produce point clouds.
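The time-of-flight and coordinate conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the firmware of any particular LiDAR: the function names are hypothetical, and the angle convention (theta as the polar angle measured from the z axis, phi as the azimuth in the x-y plane) is an assumption, since sensors differ in how they define their angles.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_time_of_flight(round_trip_s):
    """Distance to the reflecting point, in meters.

    The beam travels to the object and back, so the one-way
    distance is half of the total path length.
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def spherical_to_cartesian(r, theta, phi):
    """Convert spherical coordinates to Cartesian x, y, z.

    Assumed convention (hypothetical, sensors differ):
    theta is the polar angle from the +z axis, phi is the
    azimuth from the +x axis in the x-y plane, both in radians.
    """
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


# A beam detected 200 nanoseconds after emission, fired horizontally
# (theta = 90 degrees) and straight along the x axis (phi = 0).
r = distance_from_time_of_flight(200e-9)
point = spherical_to_cartesian(r, math.pi / 2, 0.0)
```

With these numbers the point lies roughly 30 m away along the x axis, which shows why nanosecond timing resolution is needed to measure distances at the scale of a street.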
Complementarity of images and point clouds
To transition successfully from ADAS to autonomous driving (AD), the other two data types need to be used: images and point clouds. These types of data are much larger than telematics and require large computing resources: more RAM to load the entire image into memory, and more powerful CPUs because of the large number of operations needed for each pixel or each point. For images, state-of-the-art processing involves convolutional neural networks that run on GPUs and consume a lot of power, which is drawn from the car’s battery and thus reduces its range. Point clouds are also large, but need fewer computing resources than images.
Images and point clouds, and their sensors, are complementary in their properties, and therefore in the accuracy with which they detect and label objects. Images have more pixels in the transverse plane (horizontal and vertical), leading to better resolution. Color makes it easier to detect and label various objects. Camera sensors are relatively cheap. LiDAR sensors are relatively expensive, and for this reason they have a smaller number of points in the horizontal and vertical directions, so the detected objects look more “blurry”. An image can show that there is a human and, from the face, identify that person. A point cloud shows that there is a human but, with current technology, cannot identify the person. LiDAR is therefore compliant with privacy protection regulation (such as GDPR in the EU) in the context of counting people anonymously, for example at airports.
Another difference is that in an image the relative spatial position of objects is hard to estimate, whereas in a point cloud it stands out clearly.
Also, cameras do not emit their own light, whereas LiDARs do. This makes the illumination of images dependent on the ambient light. Images are clear in good light, either natural or artificial. Objects are poorly detected in shadow and not detected at all at night or in darkness. LiDARs, on the other hand, detect objects well at night and in other poor lighting conditions.
Illustration in four KITTI scenes
This complementarity is illustrated below in four different scenes extracted from the open source KITTI dataset. Several cameras are mounted on the car. Images are taken from the front right camera. One Velodyne LiDAR with 64 vertical lasers, or lines, is mounted on top of the car. It rotates 360 degrees, creating an all-around point cloud. Only a subset of the point cloud is shown, chosen to match as closely as possible the angular field seen in the image. The color in the point cloud picture represents the intensity, after it has been scaled so that the maximum intensity in the point cloud has the value 1.0 and thus the color red. The closer to blue, the lower the intensity. For objects of the same material, the further away the object is, the smaller the intensity, so the closer to blue. For objects at the same distance, the more reflective the object is (the closer to white), the larger its intensity (the closer to red in the point cloud). Black objects reflect the least and appear close to blue in the point cloud. An interesting note is that road markings in white paint reflect differently from the road surface, allowing lane demarcation in point clouds as well, not only in images. Let's go through four examples.
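The intensity scaling and coloring described above can be sketched as follows. The text does not say which color map was used for the figures, so the blue-green-red ramp below is an assumption, and the function names are hypothetical.

```python
def normalize_intensities(intensities):
    """Scale intensities so that the largest value becomes 1.0."""
    if not intensities:
        return []
    peak = max(intensities)
    if peak <= 0.0:
        # Nothing was reflected back; keep everything at zero.
        return [0.0 for _ in intensities]
    return [value / peak for value in intensities]


def intensity_to_rgb(value):
    """Map a normalized intensity to an (r, g, b) color.

    Assumed ramp: 0.0 is blue, 0.5 is green, 1.0 is red,
    with linear blending in between.
    """
    value = min(max(value, 0.0), 1.0)
    if value < 0.5:
        s = value / 0.5
        return (0.0, s, 1.0 - s)
    s = (value - 0.5) / 0.5
    return (s, 1.0 - s, 0.0)


# Raw intensities of four points from one point cloud.
raw = [0.05, 0.10, 0.20, 0.40]
colors = [intensity_to_rgb(v) for v in normalize_intensities(raw)]
```

Because the scaling is relative to the maximum of each point cloud, the same object can change color from one frame to the next if a more reflective object enters the scene.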
Due to the shadow, the pedestrian on the right is not seen very clearly in the image, but is identified without a doubt in the point cloud. Did you notice in the image that the traffic light in the middle sits on an elevated structure? It is also hardly noticeable due to the shadow, but it is clearly seen in the point cloud, thanks to the two circular lines. Did you also notice in the image that there is a second circular traffic sign, on the other side of the pedestrian crossing? It is seen with difficulty in the image, but it is seen clearly, with its geometric shape, in the point cloud. The separation between the pavement on the right and the street, as well as the two longitudinal street markings, are clearly seen in the point cloud.
On the car on the left, the white license plate and the headlights reflect more than the black body of the car, thus appearing in red and blue, respectively. The second car from the left is also clearly visible and appears in red, as its real color is white and it reflects more than the first car on the left, which is black. The rightmost car is also white but, being further away, returns less light than the second one, and thus appears in green and blue rather than red.
An interesting thing happens with the third car from the left. In both the image and the point cloud, its middle part is covered by the traffic light. In the three-dimensional point cloud these middle points are missing completely, as they are in the “shadow” of two traffic signs: the laser beam could not hit the car at all. There is a traffic sign on the right, on the same pole as the traffic light. While the pole is seen in the point cloud, the traffic sign on its side is not visible at all. Probably due to its height, the reflected light travelled away from the car and could not reach the LiDAR sensor to be detected. Finally, in the forward left corner there are four traffic signs on poles. Their relative spatial position is hardly distinguishable in the image, but clearly marked in the point cloud.
The residential area
Stationary cars and a biker are clearly visible on a quiet street among residential buildings. In the image the biker is visible thanks to a small patch of sun, but would be less clear in the nearby shadow, whereas in the LiDAR point cloud the biker would always be clear.
The open road
The point cloud clearly shows a car, a tram, a pavement, the road markings, the tram track, a pedestrian on the pavement and a wall on the side of the pavement. Note that the pedestrian is not clearly seen in the image, due to the low contrast between the colors of their clothes and the fence.
The city center
The point cloud shows the van on the left, the tram tracks, the biker crossing the tracks, the flower pots, the people sitting on a bench, the people walking and those talking in front of the shop, and the poles with traffic signs.
We note that, in general, it is easier to detect objects in point clouds than to identify their type correctly. In an image it is easy to identify what an object is, but more complicated to build a bounding box around it.
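One common way to combine these two strengths is to detect a cluster of points in the point cloud and project it into the image, where the resulting box can then be classified. The sketch below uses a simple pinhole camera model. It assumes the points have already been transformed from the LiDAR frame into the camera frame (x to the right, y down, z forward), and the calibration values fx, fy, cx, cy are hypothetical placeholders, not real KITTI calibration.

```python
import math


def project_point(x, y, z, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel (u, v).

    Returns None for points behind the camera.
    """
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def image_box_from_points(points, fx, fy, cx, cy, width, height):
    """2D box (left, top, right, bottom) enclosing the projected points.

    The box is clipped to the image; returns None if no point
    projects inside the image.
    """
    pixels = [project_point(x, y, z, fx, fy, cx, cy) for x, y, z in points]
    pixels = [p for p in pixels if p is not None]
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    left = max(0, math.floor(min(us)))
    top = max(0, math.floor(min(vs)))
    right = min(width - 1, math.ceil(max(us)))
    bottom = min(height - 1, math.ceil(max(vs)))
    if left > right or top > bottom:
        return None
    return (left, top, right, bottom)


# Two corner points of a pedestrian-sized cluster, 8 m ahead.
cluster = [(-0.25, -1.0, 8.0), (0.25, 1.0, 8.0)]
box = image_box_from_points(
    cluster, fx=512.0, fy=512.0, cx=640.0, cy=192.0, width=1280, height=384
)
```

The point cloud supplies the position and extent of the object, and the image region inside the box supplies the color and texture needed to decide what the object is.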
Teraki has built such a sensor fusion, running live on automotive-grade hardware (NXP Bluebox). Furthermore, both point clouds and images are also compressed in real time. For images, the regions of interest marked by the bounding boxes arriving from the point cloud are kept at very good resolution, while the non-interesting regions are compressed more heavily. Overall, the size of the image is reduced. Teraki thus demonstrates real-time data reduction and processing of images and point clouds at the same time (sensor fusion).
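The idea of keeping regions of interest sharp while reducing detail elsewhere can be illustrated with a toy example. This is not Teraki's method, which the text does not describe. As a stand-in, the sketch replaces every pixel outside the region of interest by the average of its block, which removes detail that a subsequent encoder would otherwise have to store. The function name, the grey-scale list-of-rows image format and the block size are all assumptions made for illustration.

```python
def reduce_detail_outside_roi(image, roi, block=4):
    """Return a copy of a grey-scale image with detail removed outside roi.

    image: list of rows, each a list of integer pixel values.
    roi:   (left, top, right, bottom), inclusive pixel bounds.
    Pixels inside roi are kept unchanged; pixels outside are
    replaced by the mean of the block x block tile they fall in.
    """
    height, width = len(image), len(image[0])
    left, top, right, bottom = roi
    out = [row[:] for row in image]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            ys = range(by, min(by + block, height))
            xs = range(bx, min(bx + block, width))
            total = sum(image[y][x] for y in ys for x in xs)
            mean = total // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    inside = left <= x <= right and top <= y <= bottom
                    if not inside:
                        out[y][x] = mean
    return out


# An 8 x 8 test image with a region of interest in its center.
image = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(8)]
out = reduce_detail_outside_roi(image, (2, 2, 5, 5), block=4)
```

The output has the same dimensions as the input; the saving comes later, when the flattened background is encoded far more compactly than the untouched region of interest.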