Alibaba’s new AI system ‘EMO’ creates realistic talking and singing videos from photos

Researchers at Alibaba's Institute for Intelligent Computing have developed a new artificial intelligence system called "EMO," short for Emote Portrait Alive, that can animate a single portrait photo and generate videos of the person talking or singing in a remarkably lifelike fashion.

The system, described in a research paper published on arXiv, can create fluid and expressive facial movements and head poses that closely match the nuances of a provided audio track. This represents a major advance in audio-driven talking head video generation, an area that has challenged AI researchers for years.

