Computer scientists at MIT have developed a machine-learning system that can identify objects in an image based on a spoken description of the image.
Typical speech recognition systems like Google Voice and Siri rely on transcriptions of thousands of hours of speech recordings, which are then used to map speech signals to specific words.
Still in its early stages, the MIT system learns words from recorded speech clips and objects from images, and then links the two. It can recognize several hundred different words and objects so far, with the expectation that future versions will scale to many more.