Science fiction has long tantalized us with the prospect of a universal translator. The idea of a device that can scan your thoughts and render them in another language seems too good to be true, but it may not be. We are closer than ever to making this dream a reality now that artificial intelligence (AI) has matured over recent years to the point where it can begin to turn thoughts into words.
Researchers are now working on systems that automatically transcribe human speech in real time with very few errors. When presented with spoken words, these machine learning models perform remarkably well: they identify individual speakers 87 per cent of the time, and they match speakers to their corresponding video clips with 95 per cent accuracy. In other words, computers are said to have achieved superhuman performance at this task.
A team of researchers from the University of California, Berkeley, and the University of Edinburgh believes that these systems can accurately translate what we say into other languages.
According to their mathematical model, the best way to train a neural network for speech recognition is to optimize its parameters so as to minimize errors in translating words. The researchers found that specific units (neurons) are better than others at predicting whether a spoken word is an error. When presented with sound waves representing English speech, these units detect whether a given word is incorrect, but they are less good at predicting how well that word matches its counterparts in other languages.
What happens when the same kinds of neural networks are presented with the sound waves of speech in a new language? They can predict which words are correct, but they still cannot tell how well those words map onto other languages.
The researchers' model addresses these issues by adding another layer to the neural network: a "decoder" that predicts whether a given word is likely to translate well, as reported in a paper titled "Speech Recognition with Neural Attention." This extra layer can improve accuracy rates for translating speech into other languages by 12 per cent.
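To make "optimizing parameters to minimize errors" concrete, here is a minimal sketch of gradient descent on a toy problem. It is not the researchers' model: the two-parameter linear model, the data, the learning rate and the function names are all assumptions chosen for illustration. A real speech network applies the same idea to millions of parameters.

```python
# Toy illustration only: fit y = w*x + b by gradient descent.
# The model, data and hyperparameters are hypothetical, not from the research.

def mean_squared_error(w, b, xs, ys):
    """Average squared gap between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly nudge w and b in the direction that lowers the error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Targets generated from y = 2x + 1, so training should recover w near 2, b near 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ys)
print(w, b, mean_squared_error(w, b, xs, ys))
```

Each pass measures how wrong the current parameters are and adjusts them slightly to reduce that error, which is the sense in which a network's parameters are "optimized".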
At this point, though, the researchers face a significant challenge: creating a neural network that can accurately predict the words it is about to process. This is a new and unsolved problem in speech recognition.
The researchers are now working on new neural networks that can accurately predict the words they are about to process. The first step is to determine the weights for each neuron in the network's hidden layer, the units that perform better at detecting errors. These neurons will then be trained to output correct word predictions using data from another neural network.
"We make use of a separate, previously-trained neural network that predicts whether sentences are likely to have errors," said James Kirkpatrick, an associate professor of electrical engineering and computer sciences at Berkeley and a co-author on this research. "By blending these two networks together, we're able to predict words without looking up their meaning. We're using neural networks to learn from the data and improve their accuracy in a supervised way."
This system could help smartphones automatically predict words as you type them, saving us all a lot of time. Google's voice assistant is also likely to benefit from this development: the company is currently working on an offline version of the assistant, which today performs well when it has access to Google's cloud infrastructure. If the same neural network can be trained to recognize speech without cloud connectivity, it will be able to understand us even when we are not connected to the internet.
Google may soon put these systems into products like Google Home and Pixel Buds, making them more useful for everyone.
The researchers' model is similar to the approach used by current smartphone keyboards, which rely on a word dictionary to offer candidate words and calculate the probability of each one to help users decide which to select. It is not yet clear, however, how well this system works in other languages.
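The dictionary-and-probability approach described for smartphone keyboards can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular keyboard works: the tiny corpus, the use of raw word frequency as the probability, and the names `build_model` and `suggest` are all hypothetical.

```python
from collections import Counter

# Illustrative sketch only: real keyboards use far larger dictionaries
# and context-aware models. All names and data here are hypothetical.

def build_model(corpus_words):
    """Map each dictionary word to its relative frequency in the corpus."""
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def suggest(prefix, model, k=3):
    """Return up to k (word, probability) pairs matching the typed prefix."""
    candidates = [(w, p) for w, p in model.items() if w.startswith(prefix)]
    # Most probable first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:k]


corpus = "the cat sat on the mat then the theory was that the cat thought".split()
model = build_model(corpus)
suggestions = suggest("th", model)
for word, probability in suggestions:
    print(f"{word}: {probability:.2f}")
```

Because the probabilities come entirely from the words in the corpus, a model built this way only knows the language it was trained on, which is one reason its usefulness in other languages remains an open question.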