This one truly depends on the target audience. I still believe that the first solely owned and operated female robotics company will make billions.
Beyond correct pronunciation, there is the even larger challenge of correctly placing human qualities like inflection and emotion into speech. Linguists call this “prosody,” the ability to add correct stress, intonation or sentiment to spoken language.
Today, even with all the progress, it is not possible to completely represent rich emotions in human speech via artificial intelligence. The first experimental research results, gained from employing machine-learning algorithms and huge databases of human emotions embedded in speech, are just becoming available to speech scientists.
Synthesised speech is created in a variety of ways. The highest-quality techniques for natural-sounding speech begin with a human voice that is used to generate a database of parts and even subparts of speech spoken in many different ways. A human voice actor may spend from 10 hours to hundreds of hours, if not more, recording for each database.
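
To make the database idea more concrete, here is a toy sketch of how a synthesiser might choose among several recordings of the same piece of speech. It is only an illustration and not a description of any particular product: the names `Unit`, `select_units`, `pitch_hz` and `clip_id` are hypothetical, and pitch is assumed to be the only property that matters, whereas a real system would weigh many more. For each piece it prefers a recording that is close to the desired pitch and that also joins smoothly with the recording chosen before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str       # which piece of speech this recording covers, e.g. "h-e"
    pitch_hz: float  # average pitch of this particular recording
    clip_id: str     # hypothetical identifier of the stored audio clip


def select_units(database, targets):
    """Pick one recorded variant per target piece of speech.

    database: dict mapping a label to the list of recorded Unit variants
    targets:  list of (label, desired_pitch_hz) pairs, in spoken order
    Returns the list of chosen Units with the lowest total cost.
    """
    if not targets:
        return []

    # best[i][k] = (lowest total cost ending in variant k, index of previous variant)
    best = []
    for i, (label, desired_pitch) in enumerate(targets):
        row = []
        for unit in database[label]:
            # Target cost: how far this recording is from the desired pitch.
            target_cost = abs(unit.pitch_hz - desired_pitch)
            if i == 0:
                row.append((target_cost, None))
                continue
            previous_variants = database[targets[i - 1][0]]
            # Join cost: pitch jump between neighbouring recordings.
            cost, back = min(
                (best[i - 1][j][0] + abs(prev.pitch_hz - unit.pitch_hz), j)
                for j, prev in enumerate(previous_variants)
            )
            row.append((cost + target_cost, back))
        best.append(row)

    # Walk backwards from the cheapest final variant to recover the sequence.
    k = min(range(len(best[-1])), key=lambda idx: best[-1][idx][0])
    chosen = []
    for i in range(len(targets) - 1, -1, -1):
        chosen.append(database[targets[i][0]][k])
        k = best[i][k][1]
    chosen.reverse()
    return chosen


# Two pieces of speech, each recorded once at a low pitch and once at a high pitch.
database = {
    "h-e": [Unit("h-e", 100.0, "a"), Unit("h-e", 200.0, "b")],
    "e-l": [Unit("e-l", 110.0, "c"), Unit("e-l", 190.0, "d")],
}
high = select_units(database, [("h-e", 200.0), ("e-l", 200.0)])
low = select_units(database, [("h-e", 100.0), ("e-l", 100.0)])
```

Asking for a high-pitched rendering selects the high recordings of both pieces, and asking for a low-pitched one selects the low recordings. The more ways each piece has been recorded, the closer the selection can get to the desired delivery, which is one reason the recording sessions run so long.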