Audio samples from “Towards Natural Bilingual and Code-Switching Speech Synthesis Based on Mix of Monolingual Data and Cross-Lingual Voice Conversion”

Authors: Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma


The English target speaker’s voice from professional recordings:

text: I think it could be something to do with the soil and the climate.

text: We are very concerned that it will not happen and we will be engaged.

text: They arrive at the door of the food bank from all over the city.

The Mandarin target speaker’s voice from professional recordings:

text: 五千年传统文化信手拈来,被涂脂抹粉,戏谑调侃。

text: 今天,和平广场里盛开着这样的黄玫瑰。

text: 他说,他会将车收藏一辈子,以后再传给儿女们。

Cross-lingual voice conversion:

Source speaker: Mandarin female; Target speaker: English female

text: 五千年传统文化信手拈来,被涂脂抹粉,戏谑调侃。

Source speaker Target Speaker Tacotron2-VC  

text: 他们家靠这个药,世代漂洗为生,日子过得艰难。

Source speaker Target Speaker Tacotron2-VC  

text: 众所周知,潍坊因风筝而在全国闻名遐迩。

Source speaker Target Speaker Tacotron2-VC  

Source speaker: English female; Target speaker: Mandarin female

text: Hazel would like to sell the business.

Source speaker Target Speaker Tacotron2-VC

text: Many people have lost their jobs altogether.

Source speaker Target Speaker Tacotron2-VC

text: This is a serious accident, and we will do our utmost to identify the cause.

Source speaker Target Speaker Tacotron2-VC

Bilingual and code-switching speech synthesis:

English input text (none of it seen in the training set)

text: A microscopic water creature could live until the end of the Earth.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

text: Christmas is widely celebrated and enjoyed across the United States and the world.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

text: Many lessons are boring, and he is very tired after doing gym.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

text: Besides carving pumpkins, some celebrate Halloween by putting decorations up.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Chinese input text (none of it seen in the training set)

Text: 儿子一气之下没有去,靠自学考上了函授大学。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 目前您的电话接入后可能存在声音不清晰的情况。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 就在她伸手想拿起那个红通通的果子试吃的时候,肩膀突然被人轻轻一拍。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 现在看天上的星星总是觉得没有小时候看到的多。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Code-switching input text (none of it seen in the training set)

Text: 我刚刚去 Starbucks 买了杯 Vanilla Latte 和两块 Oatmeal Raisin Cookie, 搭配起来还蛮不错的。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: Brunch 这个词是 breakfast 和 lunch 两个词的结合,意思是“早午餐。”


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 《life of Pi》的中文名字是《少年派的奇幻漂流》。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 这个源于拳击运动的表达 “to punch above your weight” 的本意是“能和高于自己重量级别的对手较量。”


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 一见钟情 is similar to 一见倾心 which means love at first sight.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: 你多吃一点 means “Have some more.” 而慢慢吃 expresses politeness to someone when eating.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: When you wish to raise your drink to someone, to drink with them or propose a toast, you can say我敬你一杯。


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer

Text: The “闻” in “百闻不如一见” does not refer to smelling, but rather means to hear of, such as news, or by word of mouth.


 Target Speaker: English

FastSpeech Tacotron2 Transformer

Target Speaker: Mandarin

FastSpeech Tacotron2 Transformer