ETRI WEBZINE



Special

Vol.72

ETRI releases multilingual speech recognition supporting 24 languages

- Supports the largest number of languages in Korea, with technology competitive against global companies
- To expand to 30 languages within this year, focusing mainly on low-resource languages

The Electronics and Telecommunications Research Institute (ETRI) has developed a multilingual speech recognition technology that understands 24 languages, including Korean, English, Chinese, Japanese, German, French, Spanish, Russian and South Asian languages. Its recognition performance is comparable to or better than that of global companies such as Google. In the era of digital transformation, speech technology plays a key role in artificial intelligence (AI) services such as AI assistants and AI tutors.

In general, speech recognition requires large-scale training data. To overcome the difficulty of extending to languages with scarce resources, ETRI has applied ▲ self-supervised learning ▲ pseudo-label-based semi-supervised learning ▲ large multilingual pre-trained models ▲ audio data generation using TTS, among other techniques. Self-supervised learning enables AI models to learn from unlabeled data, and pseudo-labeling creates virtual labels for unlabeled data so that it can be used for supervised learning. A pre-trained model is an AI model that has learned related knowledge in advance from large amounts of data.
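To make the pseudo-labeling idea concrete, the following is a minimal, generic sketch of the confidence-filtered pseudo-labeling step, not ETRI's actual implementation. It assumes a hypothetical "teacher" recognizer that returns a transcript and a confidence score for each unlabeled utterance; the `Example` record, the `teacher` callable and the 0.9 threshold are all illustrative assumptions. Only confident hypotheses are kept and merged with human-labeled data, and the subsequent retraining of the model on the combined set is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    audio_id: str    # hypothetical utterance identifier
    transcript: str  # human label or machine-generated pseudo-label


def pseudo_label(
    unlabeled_ids: List[str],
    teacher: Callable[[str], Tuple[str, float]],  # hypothetical: returns (hypothesis, confidence)
    threshold: float = 0.9,                       # illustrative value, not from the source
) -> List[Example]:
    """Run a teacher model over unlabeled audio and keep only confident hypotheses."""
    selected: List[Example] = []
    for audio_id in unlabeled_ids:
        hypothesis, confidence = teacher(audio_id)
        # Low-confidence hypotheses are more likely to be wrong, so they are dropped.
        if confidence >= threshold:
            selected.append(Example(audio_id, hypothesis))
    return selected


def build_training_set(labeled, unlabeled_ids, teacher, threshold=0.9):
    """Combine human-labeled data with confidently pseudo-labeled data."""
    return labeled + pseudo_label(unlabeled_ids, teacher, threshold)


# Usage with a toy teacher backed by a dictionary of (hypothesis, confidence) pairs.
toy_outputs = {
    "utt1": ("annyeonghaseyo", 0.97),
    "utt2": ("hello word", 0.55),
    "utt3": ("xin chao", 0.92),
}
labeled = [Example("utt0", "good morning")]
train = build_training_set(labeled, list(toy_outputs), toy_outputs.__getitem__)
print([e.audio_id for e in train])  # ['utt0', 'utt1', 'utt3']
```

The threshold trades quantity for quality: a lower value admits more pseudo-labeled speech but also more transcription errors into training.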

For the AI model, ETRI applied end-to-end speech recognition technology. Although this approach performs better than HMM-based speech recognition, it is difficult to run in real time and not easy to specialize for a specific domain. ETRI therefore developed streaming end-to-end speech recognition that infers hypotheses in real time, and newly implemented a hybrid engine that adapts easily to specific domains. So far, the technology has been provided to about 30 domestic and foreign companies for various AI services such as ▲ meetings ▲ subtitle generation ▲ kiosks ▲ medical and educational services ▲ AI contact centers.
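The difference between streaming and whole-utterance recognition can be illustrated with a toy sketch. This is a generic illustration under stated assumptions, not ETRI's engine: audio samples are grouped into fixed-size chunks as they arrive, and a hypothetical `ToyIncrementalDecoder` keeps state across chunks and returns a partial hypothesis after each one, so results are available before the speaker finishes. The chunk size and the "one token per loud chunk" rule are purely illustrative stand-ins for a real acoustic model.

```python
from typing import Iterable, Iterator, List


def chunk_stream(samples: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Group an incoming sample stream into fixed-size chunks (the last may be shorter)."""
    buffer: List[float] = []
    for s in samples:
        buffer.append(s)
        if len(buffer) == chunk_size:
            yield buffer
            buffer = []  # start a fresh list so the yielded chunk is not mutated
    if buffer:
        yield buffer


class ToyIncrementalDecoder:
    """Hypothetical stand-in for a streaming model that keeps state across chunks."""

    def __init__(self) -> None:
        self.tokens: List[str] = []

    def accept_chunk(self, chunk: List[float]) -> str:
        # A real model would encode the audio here; this toy emits one token
        # whenever the chunk's mean amplitude exceeds 0.5.
        if sum(chunk) / len(chunk) > 0.5:
            self.tokens.append(f"tok{len(self.tokens)}")
        return " ".join(self.tokens)  # hypothesis so far


def stream_recognize(samples: Iterable[float], chunk_size: int = 4) -> Iterator[str]:
    """Yield a partial hypothesis after every chunk instead of waiting for the end."""
    decoder = ToyIncrementalDecoder()
    for chunk in chunk_stream(samples, chunk_size):
        yield decoder.accept_chunk(chunk)


samples = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
print(list(stream_recognize(samples)))  # ['tok0', 'tok0', 'tok0 tok1']
```

Smaller chunks lower latency but give the model less context per step, which is one reason streaming models are harder to build than offline ones.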

ETRI also plans to expand support to about 30 languages within this year and to actively promote commercialization through domestic and international exhibitions and business meetings. This technology is the result of 20 years of effort in developing speech recognition technology. The research team previously provided the core technology for the official automatic interpretation service of the 2018 PyeongChang Winter Olympics.

Dr. Kim Sang-hun, project leader, said, “It is meaningful that this multilingual speech recognition technology has been released to many domestic companies so that they do not have to depend on foreign technology. We hope it will be of great help in enhancing the global competitiveness of Korea’s artificial intelligence field and securing technological sovereignty.”

Currently, ETRI provides speech recognition services in 11 languages through its open API (https://aiopen.etri.re.kr/). The speech recognition open API service will be expanded to 24 languages to provide business opportunities to various users such as small and medium-sized venture companies, schools, and individual developers.

This technology was developed as part of the Ministry of Science and ICT’s project “Self-Improving Integrated Artificial Intelligence System”. During the project period, the researchers published 17 papers in Korea and abroad, obtained 43 patents, completed 20 technology transfers, and earned 1.9 billion KRW in technology royalties.

Song Hwa Jeon, Director
Integrated Intelligence Research Section
(+82-42-860-5836, songhj@etri.re.kr)