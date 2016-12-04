DUBAI – Researchers from Google’s AI division DeepMind and the University of Oxford have collaborated and used artificial intelligence to make the most accurate lip-reading software to date.

According to a report by cbronline.com, the AI system was trained on almost 5,000 hours of TV footage from the BBC, a dataset containing a total of 118,000 sentences.

The key contributions include a “Watch, Listen, Attend and Spell” (WLAS) network, which learns to transcribe videos of mouth motion to characters.

According to the researchers' paper, published on arxiv.org, the goal of the work was to recognize phrases and sentences spoken by a talking face, with or without the audio.

DeepMind stated that, unlike previous work that focused on recognizing a limited number of words or phrases, the researchers tackle lip reading as an open-world problem: unconstrained natural language sentences and in-the-wild videos.

The same media report added that the AI was trained on shows that aired between January 2010 and December 2015, and that its performance was later tested on programs broadcast between March and September of this year.

A professional human lip reader was able to decipher less than one-quarter of the spoken words, whilst the WLAS model was able to decipher half of them.

The report, citing New Scientist, quoted Ziheng Zhou of the University of Oulu, Finland, as saying that it was a big step towards developing fully automatic lip-reading systems.

“Without the huge dataset, it’s very difficult for us to verify new technologies like deep learning,” he added.

In its summary, the paper states that the WLAS model surpassed the performance of all previous work on standard lip-reading benchmark datasets, and demonstrated that visual information helps to improve speech recognition performance even when audio is available.