
Vall-E & Kosmos

Internship experience at Microsoft during summer 2023

What is Vall-E:

Vall-E (x), a language-modeling approach to text-to-speech synthesis, is a cross-lingual neural codec language model that uses discrete codes derived from an off-the-shelf neural audio codec model. Given just one speech utterance in the source language as a prompt, Vall-E (x) can generate high-quality speech in the target language while preserving the unseen speaker’s voice, emotion, and acoustic environment. (For research details, please check https://www.microsoft.com/en-us/research/project/vall-e-x/overview/ )

My job:

During 2023, Microsoft Research Asia (MSRA) started productizing the Vall-E & Kosmos models as a new online TTS AI tool. My job involved shaping the product user experience and designing several iterations of the web app interfaces for both models. The work also included logo design, prototyping, and collaborating with engineering colleagues to deliver an internal test demo.

The project is currently in active development at Microsoft Research Asia. To maintain confidentiality, full details are not publicly available. If you're interested in a deeper dive into my contributions to this project, please request the page access passcode via the chat box at the bottom right.
