Train Speech Enhancement Models with Noisy Speech Training Data

Apr 16


As more and more voice-interactive devices enter daily life, speech enhancement has attracted growing attention from researchers around the world. They have proposed a large number of speech enhancement algorithms, including signal-processing-based methods, model-based spectral estimation methods, and supervised learning methods.

Speech enhancement refers to techniques that extract the useful speech signal from background noise, suppressing and reducing noise interference when speech is corrupted or masked by various kinds of noise. However, because interference is random, it is almost impossible to recover perfectly clean speech from noisy speech.

Speech enhancement aims to improve the quality and intelligibility of speech using signal processing algorithms. It mainly covers three tasks:
1. Speech de-reverberation: reverberation is caused by reflections of the sound signal from the surrounding space.
2. Speech noise reduction: the interference mainly comes from various environmental and human-generated noises.
3. Speech separation: the interference mainly comes from the voices of other speakers.
Removing these noises and competing voices improves speech quality. Speech enhancement technology is already used in practice, for example in telephony, speech recognition, hearing aids, VoIP, and teleconferencing systems.
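The three tasks can be related through one simplified signal model. This is a common textbook-style illustration, not something stated in this article, and it ignores details such as reverberation of the interfering speakers. The microphone signal is written as

```latex
y(t) = (h \ast s)(t) + \sum_{k=1}^{K} s_k(t) + n(t)
```

where s(t) is the target speech, h(t) is the room impulse response and \ast denotes convolution, s_k(t) are the K interfering speakers, and n(t) is additive background noise. De-reverberation tries to undo h, speech separation tries to remove the s_k terms, and noise reduction tries to remove n; in every case the goal is an estimate \hat{s}(t) computed from y(t) alone.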

As the world’s leading data service provider, Datatang has developed noise datasets covering multiple application scenarios, such as smart homes, in-vehicle environments, and public places, to support the research and development of speech enhancement technology.
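A common way to use noise corpora like these for supervised training is to mix clean speech with noise recordings at a randomly chosen signal-to-noise ratio (SNR), producing (noisy, clean) training pairs. This describes typical practice in general, not Datatang's own pipeline. The sketch below assumes mono float signals at the same sampling rate held in plain Python lists; `power` and `mix_at_snr` are hypothetical helper names. It scales the noise by alpha = sqrt(P_s / (P_n * 10^(SNR/10))) so that the mixture reaches the target SNR.

```python
import math
import random


def power(signal):
    """Mean power (average squared amplitude) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Add a noise recording to clean speech at a target SNR in dB.

    Assumes both signals are mono floats at the same sampling rate.
    If the noise is longer than the speech, a random segment is cut out;
    if it is shorter, it is repeated (tiled) to cover the speech.
    Returns (noisy_mixture, scaled_noise).
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    rng = rng or random.Random()
    n = len(speech)
    if len(noise) >= n:
        # Pick a random window of the noise with the same length as the speech.
        start = rng.randrange(len(noise) - n + 1)
        segment = list(noise[start:start + n])
    else:
        # Tile short noise clips, then trim to the speech length.
        reps = math.ceil(n / len(noise))
        segment = (list(noise) * reps)[:n]
    p_speech = power(speech)
    p_noise = power(segment)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Choose alpha so that 10*log10(p_speech / (alpha**2 * p_noise)) == snr_db.
    alpha = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [alpha * v for v in segment]
    noisy = [s + v for s, v in zip(speech, scaled_noise)]
    return noisy, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000
    # Synthetic stand-ins for real recordings (hypothetical):
    # a 440 Hz tone plays the role of "speech", Gaussian noise the role of noise data.
    speech = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noise = [rng.gauss(0.0, 0.1) for _ in range(3 * sr)]
    snr = rng.uniform(-5, 20)  # sample a training SNR from an assumed range
    noisy, scaled_noise = mix_at_snr(speech, noise, snr, rng)
    achieved = 10 * math.log10(power(speech) / power(scaled_noise))
    print(f"target {snr:.2f} dB, achieved {achieved:.2f} dB")
```

A model would then be trained to map `noisy` back to `speech`. Drawing a fresh SNR and noise segment for each example is one simple way to diversify the training set; real pipelines would more likely use NumPy arrays and an audio I/O library instead of lists.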

1,297 Hours — Scene Noise Data by Voice Recorder

Scene noise data with a total duration of 1,297 hours. The data covers multiple scenarios, including subways, supermarkets, restaurants, and roads. Audio is recorded with professional recorders at a high sampling rate in dual-channel format, and the time and type of non-noise segments are annotated. This dataset can be used for noise modeling.

531 Hours — In-Car Noise Data by Microphone and Mobile Phone

531 hours of noise data recorded in in-car scenes. It covers various vehicle models, road types, vehicle speeds, and window open/closed conditions. Six recording points capture the noise at different positions in the vehicle, closely matching the requirements of vehicle noise modeling.

20 Hours Microphone Collecting Radio Frequency Noise Data

The data is collected in 66 rooms, with 2–4 recording points in each room. Depending on the relative position of the sound source and the recording point, 2–5 sets of data are collected at each point. The total valid duration is 20 hours. The data covers a wide range of recording conditions and can be used for developing smart home products.

10 Hours — Far-field Noise Speech Data in Home Environment by Mic-Array

The data is recorded with multiple products, each using a different type of microphone array. The noise data is collected in real home environments in ordinary residents' homes. The dataset can be used for tasks such as speech enhancement and automatic speech recognition in home scenarios.

About Datatang

Founded in 2011, Datatang is a professional artificial intelligence data service provider committed to providing high-quality training data and data services to AI companies worldwide. Drawing on its own data resources, technical strengths, and extensive data processing experience, Datatang provides data services to more than 1,000 companies and institutions worldwide. In 2014, Datatang was listed on China's National Equities Exchange and Quotations (NEEQ: 831428), becoming the first listed company in China’s artificial intelligence data service industry.

If you need data services, please feel free to contact us:



