Text-to-speech conversion has become increasingly smart, but there is a problem: it can still take a lot of training resources and time to generate natural-sounding output. Microsoft and Chinese researchers may have a better way. They have crafted a text-to-speech AI that can generate realistic speech from only 200 voice samples (approximately 20 minutes’ worth) and their matching transcriptions.

The system is based in part on Transformers, deep neural networks that roughly emulate neurons in the brain. Transformers weigh each input and output on the fly, much like synaptic connections, which helps them process even long sequences (say, a complex sentence) very efficiently. Combine this with a noise-removing encoder component and the AI can do a lot with comparatively little.
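The on-the-fly weighing described above is usually called attention. The sketch below is a generic, minimal illustration of scaled dot-product attention in plain Python, not the researchers' actual model; the function names and the toy `tokens` input are assumptions made up for this example.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # For each query, score every key, turn the scores into weights
    # that sum to 1, and return the weighted average of the values.
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

# Hypothetical toy input: three "tokens", each a 2-dimensional vector.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: every token weighs every token, itself included.
mixed = attention(tokens, tokens, tokens)
```

Because the weights are recomputed for every input, each position can draw on any other position in the sequence, however far away it is, which is one reason such models cope well with long sentences.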

