In-Cabin Voice Interaction in Autonomous Driving

Feb 25, 2023

In autonomous driving scenarios, in-cabin voice interaction technology lets drivers and passengers communicate with smart devices by voice without barriers, making for a safer driving experience.

In the smart cockpit, the sound field is relatively complex. It may contain noise such as background voices, mobile phone sounds, wind noise, tire noise, and air conditioning. On the other hand, the interior space is fixed at design time, which makes acoustic localization and voice separation easier. This is the advantage of smart cockpit voice interaction over scenarios where the space is not known in advance. In terms of interaction, the most common use cases in the smart cockpit are device control, navigation, and media entertainment.

The voice interaction technology of the smart cockpit can be divided into two parts: the voice front-end and the voice back-end. The front-end includes VAD (Voice Activity Detection), echo cancellation, noise suppression, sound source localization, gain control, etc.; the back-end includes speech recognition, semantic comprehension, dialog management, speech synthesis, and more.
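As an illustration of one front-end component listed above, here is a minimal sketch of an energy-based VAD. It is only one simple way to detect voice activity, not necessarily what a production cockpit system uses. The function names, the 160-sample frame length, and the -30 dB threshold are assumptions chosen for the example.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB, relative to a full-scale amplitude of 1.0."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small constant avoids log(0)

def energy_vad(samples, frame_len=160, threshold_db=-30.0):
    """Return one True/False speech flag per non-overlapping frame.

    frame_len and threshold_db are illustrative values, not tuned settings.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags

# Synthetic input at an assumed 16 kHz sample rate:
# two frames of a very quiet tone followed by two frames of a loud tone.
quiet = [0.001 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
flags = energy_vad(quiet + loud)
print(flags)
```

A fixed energy threshold is easy to reason about but is likely too fragile for a noisy cabin on its own; it is shown here only to make the idea of frame-level speech detection concrete.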

The first part is front-end signal processing. Because the in-car environment is usually complex, with noise sources such as music and conversation, the signal is first preprocessed to remove its DC component. Echo cancellation then removes the sound played back by the device itself.
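The two preprocessing steps above can be sketched as follows. This is a minimal example that assumes mean subtraction for DC removal and a normalized LMS (NLMS) adaptive filter for echo cancellation; the article does not say which algorithms are actually used. The function names, the filter length, the step size, and the synthetic echo path are all hypothetical.

```python
import random

def remove_dc(samples):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`.

    mic: microphone samples, ref: samples played by the loudspeaker.
    taps and mu are illustrative values, not tuned settings.
    """
    w = [0.0] * taps    # adaptive weights (estimate of the echo path)
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for m, r in zip(mic, ref):
        buf = [r] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        err = m - echo_est  # residual after removing the estimated echo
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        out.append(err)
    return out

def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

# Synthetic test signal: the microphone hears only an echo of the
# loudspeaker signal plus an artificial DC offset of 0.2.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, 0.1]  # hypothetical cabin echo path
echo = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(ref))]
mic = remove_dc([e + 0.2 for e in echo])
residual = nlms_echo_cancel(mic, ref)

print("echo power (last 500 samples):    ", power(mic[-500:]))
print("residual power (last 500 samples):", power(residual[-500:]))
```

Once the filter has adapted, the residual power should be far below the echo power. A real system also has to handle near-end speech while the device is playing audio, which this sketch does not attempt.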

Mixed voices need to be separated. Since each seat position in the car is fixed, beamforming can be used to directionally enhance the sound source at each position.
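To make the beamforming idea concrete, here is a minimal delay-and-sum sketch. It assumes the per-microphone delays for a given seat are known integer sample counts, which is plausible when seat positions are fixed but is a simplification. The four-microphone layout, the delay values, and the helper names are hypothetical.

```python
import math
import random

def delayed(sig, d):
    """Delay a signal by d samples, padding the start with zeros."""
    return [0.0] * d + sig[:len(sig) - d]

def delay_and_sum(channels, delays):
    """Align each microphone channel by its integer delay, then average.

    Sound from the steered position adds coherently; sound arriving
    with other delays is averaged down.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]

# Synthetic scene: a 300 Hz tone from the target seat and white noise
# from another seat, reaching four microphones with different delays.
rng = random.Random(1)
n_samples = 2000
target = [math.sin(2 * math.pi * 300 * t / 16000) for t in range(n_samples)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
target_delays = [0, 2, 4, 6]  # hypothetical delays from the target seat
noise_delays = [6, 4, 2, 0]   # hypothetical delays from the other seat
mics = [[a + b for a, b in zip(delayed(target, td), delayed(noise, nd))]
        for td, nd in zip(target_delays, noise_delays)]

enhanced = delay_and_sum(mics, target_delays)

noise_power = sum(v * v for v in noise) / len(noise)
residual_power = sum((e - s) ** 2 for e, s in zip(enhanced, target)) / len(enhanced)
print("interference power at one mic:", noise_power)
print("interference power after beam:", residual_power)
```

Steering toward the other seat only requires passing `noise_delays` instead of `target_delays`, which is why a fixed cabin geometry is convenient: the delay set for each seat can be worked out once in advance.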

The separated sound may still contain other noise, which is removed by a noise suppression algorithm. Finally, a gain control algorithm adjusts the signal energy to a volume suitable for speech recognition.
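The final gain adjustment step can be sketched as a simple RMS normalization. This is a minimal example under the assumption that samples are floats in the range -1.0 to 1.0. The target level and the gain cap are illustrative values, and real automatic gain control usually adapts over time rather than scaling a whole clip at once.

```python
import math

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalize_gain(samples, target_rms=0.1, max_gain=20.0):
    """Scale the signal toward target_rms, limiting how much it is boosted.

    target_rms and max_gain are hypothetical settings. The gain cap keeps
    near-silent input from being amplified into loud noise.
    """
    level = rms(samples)
    if level == 0.0:
        return list(samples)
    gain = min(target_rms / level, max_gain)
    # Clip to the valid sample range after scaling.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

# A quiet 200 Hz tone at an assumed 16 kHz sample rate.
quiet_speech = [0.01 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
adjusted = normalize_gain(quiet_speech)

# A much quieter tone that would need more than max_gain to reach the target.
very_quiet = [0.001 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
capped = normalize_gain(very_quiet)

print("before:", rms(quiet_speech), "after:", rms(adjusted))
print("capped:", rms(capped))
```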

Datatang In-Cabin Speech Data Solution

ASR Data

Datatang has 200,000 hours of off-the-shelf, finished voice datasets. All the data has clear copyright and proper data collection authorization. The data quality has been tested by the world’s leading AI companies. Our data solution can help customers save 60% of data collection costs and 100% of data collection time.

TTS Data

Based on massive TTS project implementation experience and advanced TTS technology, Datatang provides high-quality, multi-scenario, multi-category TTS data solutions.

About

Founded in 2011, Nexdata is a professional artificial intelligence data service provider committed to providing high-quality training data and data services for global AI companies. Relying on its own data resources, technical advantages, and extensive data processing experience, Nexdata provides data services to 1000+ companies and institutions worldwide.

If you need data services, please feel free to contact us: info@nexdata.ai

Written by Nexdata
