We heard so much about machine learning at Google I/O this year.
One of the most interesting demos at this week’s Google I/O keynote featured a new version of Google’s voice assistant that’s due out later this year. A Google employee asked the Google Assistant to bring up her photos and then show her photos with animals. She tapped one and said, “Send it to Justin.” The photo was dropped into the messaging app.
From there, things got more impressive.
“Hey Google, send an email to Jessica,” she said. “Hi Jessica, I just got back from Yellowstone and completely fell in love with it.” The phone transcribed her words, putting “Hi Jessica” on its own line.
“Set subject to Yellowstone adventures,” she said. The assistant understood that it should put “Yellowstone adventures” into the subject line, not the body of the message.
Then, without any explicit command, she went back to dictating the body of the message. Finally, she said, “send it,” and Google’s assistant did.
Google is also working to expand the assistant’s understanding of personal references, the company said. If a user says, “Hey Google, what’s the weather like at Mom’s house,” Google will be able to figure out that “Mom’s house” refers to the home of the user’s mother, look up her address, and provide a weather forecast for her city.
Google says that its next-generation assistant is coming to “new Pixel phones”—that is, the phones that come after the current Pixel 3 line—later this year.
Obviously, there’s a big difference between a canned demo and a shipping product. We’ll have to wait and see if typical interactions with the new assistant work this well. But Google seems to be making steady progress toward the dream of building a virtual assistant that can competently handle even complex tasks by voice.
A lot of the announcements at I/O were like this: not the announcement of major new products, but the use of machine learning techniques to gradually make a range of Google products more sophisticated and helpful. Google also touted a number of under-the-hood improvements to its machine learning software, which will allow both Google-created and third-party software to use more sophisticated machine learning techniques.
In particular, Google is making a big push to shift machine learning operations from the cloud onto people’s mobile devices. This should allow ML-powered applications to be faster, more private, and able to operate offline.
Google has led the charge on machine learning
If you ask machine learning experts when the current deep learning boom started, many will point to a 2012 paper known as “AlexNet” after lead author Alex Krizhevsky. The authors, a trio of researchers from the University of Toronto, entered the ImageNet competition to classify images into one of a thousand categories.
The ImageNet organizers supplied more than a million labeled example images to train the networks. AlexNet achieved unprecedented accuracy by using a deep neural network, with eight trainable layers and 650,000 neurons. They were able to train such a massive network on so much data because they figured out how to harness consumer-grade GPUs, which are designed for large-scale parallel processing.
AlexNet demonstrated the importance of what you might call the three-legged stool of deep learning: better algorithms, more training data, and more computing power. Over the last seven years, companies have been scrambling to beef up their capabilities on all three fronts, resulting in better and better performance.
Google has been leading this charge almost from the beginning. Two years after AlexNet’s victory, Google entered the ImageNet contest with an even deeper neural network and took top prize. The company has hired dozens of top-tier machine learning experts and acquired the deep learning startup DeepMind in 2014, keeping it at the forefront of neural network design.
The company also has unrivaled access to large data sets. A 2013 paper described how Google was using deep neural networks to recognize address numbers in tens of millions of images captured by Google Street View.
Google has been hard at work on the hardware front, too. In 2016, Google announced that it had created a custom chip called a Tensor Processing Unit specifically designed to accelerate the operations used by neural networks.
“Although Google considered building an Application-Specific Integrated Circuit (ASIC) for neural networks as early as 2006, the situation became urgent in 2013,” Google wrote in 2017. “That’s when we realized that the fast-growing computational demands of neural networks could require us to double the number of data centers we operate.”
This is why Google I/O has had such a focus on machine learning for the last three years. The company believes that these assets—a small army of machine learning experts, vast amounts of data, and its own custom silicon—make it ideally positioned to exploit the opportunities presented by machine learning.
This year’s Google I/O didn’t actually have a lot of major new ML-related product announcements because the company has already baked machine learning into many of its major products. Android has had voice recognition and the Google Assistant for years. Google Photos has long had an impressive ML-based search function. Last year, Google introduced Google Duplex, which makes a reservation on behalf of a user with an uncannily realistic human voice created by software.
Instead, I/O presentations on machine learning focused on two areas: shifting more machine learning activity onto smartphones and using machine learning to help disadvantaged people—including people who are deaf, illiterate, or suffering from cancer.
Squeezing machine learning onto smartphones
Past efforts to make neural networks more accurate have involved making them deeper and more complicated. This approach has produced impressive results, but it has a big downside: the networks often wind up being too complex to run on smartphones.
People have mostly dealt with this by offloading computation to the cloud. Early versions of Google’s and Apple’s voice assistants would record audio and upload it to the companies’ servers for processing. That worked all right, but it had three significant downsides: higher latency, weaker privacy protection, and no offline operation.
So Google has been working to shift more and more computation on-device. Current Android devices already have basic on-device voice recognition capabilities, but Google’s virtual assistant requires an Internet connection. Google says that situation will change later this year with a new offline mode for Google Assistant.
This new capability is a big reason for the lightning-fast response times demonstrated by this week’s demo. Google says the assistant will be “up to 10 times faster” for certain tasks.
Smaller networks, more hardware acceleration
The key to this switch was dramatically reducing the size of the neural networks used for speech recognition. Researchers—both inside and outside of Google—have been working on this problem for a while.
A 2016 paper, for example, described how a team of researchers slimmed down the classic AlexNet architecture. They found that certain elements in a convolutional neural network add a lot of parameters without increasing the network’s accuracy very much. By judiciously revamping the network’s structure, they were able to reduce the number of parameters in AlexNet by a factor of 50 without reducing its accuracy. Further compression techniques allowed them to squeeze the size of the model by a factor of 500.
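The arithmetic behind that kind of slimming is easy to sketch. A convolutional layer’s parameter count grows with the product of its input channels, output channels, and filter area, so replacing wide 3x3 layers with a narrow 1x1 “squeeze” layer followed by small “expand” layers cuts parameters dramatically. The layer sizes below are made up for illustration and are not taken from the paper:

```python
def conv_params(in_ch, out_ch, k):
    """Weights in a k x k convolutional layer (ignoring biases)."""
    return in_ch * out_ch * k * k

# A plain 3x3 convolution with hypothetical channel counts.
plain = conv_params(in_ch=256, out_ch=256, k=3)

# A squeeze-and-expand alternative: a 1x1 layer shrinks the channel
# count, then parallel 1x1 and 3x3 layers expand it back out.
squeeze = conv_params(256, 32, 1)
expand = conv_params(32, 128, 1) + conv_params(32, 128, 3)
fire = squeeze + expand

print(plain, fire, plain // fire)  # the restructured block is ~12x smaller
```

Apply this kind of restructuring across every layer of a network and the savings compound, which is how factor-of-50 reductions become possible before any compression is applied.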
Google says it has accomplished a similar feat with the more complex neural network it uses to understand Google Assistant commands, reducing the network’s size from 100GB to about 500MB.
Google has also been working to make Google Assistant respond more quickly. A clue about how Google did this comes from a 2018 paper by several Google researchers. Whereas other researchers have tuned the structure of neural networks by hand, Google researchers automated the process. They used software to experiment with different neural network configurations and measure the speed and accuracy of the resulting networks—taking into account the capabilities and limitations of real-world smartphones.
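In spirit, that kind of automated search can be sketched in a few lines: enumerate candidate network configurations, score each one for accuracy and on-device latency, and keep the most accurate candidate that fits the latency budget. Everything below—the search space, the scoring formula—is a toy stand-in, not Google’s actual method:

```python
# Hypothetical search space: each candidate network is a (depth, width) pair.
SEARCH_SPACE = [(d, w) for d in (2, 4, 8) for w in (64, 128, 256)]

def evaluate(depth, width):
    """Stand-in scoring function. A real system would train the candidate
    network and benchmark it on the target phone; here we fake both numbers
    with a formula where deeper/wider is more accurate but slower."""
    accuracy = 0.70 + 0.02 * depth + 0.0002 * width
    latency_ms = 0.05 * depth * width
    return accuracy, latency_ms

def search(budget_ms):
    """Return the most accurate configuration within the latency budget."""
    best, best_acc = None, 0.0
    for depth, width in SEARCH_SPACE:
        acc, lat = evaluate(depth, width)
        if lat <= budget_ms and acc > best_acc:
            best, best_acc = (depth, width), acc
    return best, best_acc

best, acc = search(budget_ms=30.0)
print(best, acc)
```

The interesting part in practice is the latency measurement: by scoring candidates against the capabilities of real phone hardware rather than a server GPU, the search is steered toward architectures that are fast where it actually matters.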
Earlier this year, Google announced another move to accelerate on-device machine learning capabilities. TensorFlow is a Google-created framework for machine learning applications. TensorFlow Lite is the mobile version of the software. In January, Google announced that TensorFlow Lite now supports GPU acceleration on certain smartphone models.
We’ve also seen Google and other companies start to develop AI-specific hardware for smartphones. The Pixel 2 introduced a new Google-designed chip for image processing. Apple’s newest chips include a “Neural Engine” optimized for machine learning applications. Qualcomm’s latest chips also come with specialized silicon for AI. It will be interesting to see if the next generation of Pixel phones comes with more powerful custom hardware to support Google’s on-device assistant and other machine learning applications.
Earlier this year, Google introduced a chip called Edge TPU—a small, low-power version of the machine learning chip the company has had in its data centers for the last few years. The company is currently marketing it as an “Internet of things” product, but it’s easy to imagine Google putting a version of the chip in the next Pixel phone and other future hardware products like smart speakers.
And Google isn’t just using more on-device machine learning for its own apps. The company also offers third-party developers a library called ML Kit, available for both iOS and Android. It offers developers off-the-shelf machine learning models for common tasks like recognizing text and objects in images, detecting faces, and translating text from one language to another.
This week, Google announced a new ML Kit API for on-device translation for 59 languages, offering private, fast translations that work with or without a network connection. ML Kit also now offers the ability to do on-device object detection and tracking.
Google wants to use machine learning to help disadvantaged people
One major focus of this week’s keynote was on the ways Google is using machine learning to help a wide range of disadvantaged groups—from people who are illiterate to cancer patients.
The Google Translate app already allows users to point their camera at a block of text in the real world and see an instant translation to another language. Now users will be able to request the software to read the text aloud—either in the original language or in a different language—highlighting the corresponding words as the text is read.
Google highlighted its recently launched Live Transcribe app, which provides people who are hard of hearing with subtitles for real-life conversations. A new feature called Live Caption will allow Android users to display real-time transcriptions for any audio being played by the phone. Another feature called Live Relay allows deaf people to treat a phone call as if it were a text chat: the caller’s words are transcribed as chat messages in real time. The recipient can type words back and have them read aloud to the caller.
Google says it’s working to gather voice recordings from people with speech impediments to help the company’s products better understand them.
Google is also trying to use machine learning to help people with degenerative conditions that prevent them from speaking altogether. Currently, these people often have to slowly type out messages—if not with their fingers, then with their eyes—and have them read by a synthetic voice. Google hopes that software based on machine learning can pick up more complex cues, allowing these people to “speak” quickly enough to participate in conversations in real time.
Google has a team working on applying machine learning techniques to diagnosing cancer from radiology scans.
Google sees machine learning as its future
Google’s mission is to organize the world’s information and make it accessible and useful. Google has made a lot of progress on the first half of that mission statement—it may have access to more data than any other company on the planet. But making the information more useful will require software that understands the information in a more sophisticated way—which is exactly what machine learning technology could do.
At I/O this week, Google sent a clear signal that its machine learning push is only getting started. The company is pouring resources into developing new chips, algorithms, and platforms because it believes these technologies have a lot of room to improve its existing products—or allow the creation of entirely new ones. And Google believes it has a unique combination of talent and resources to fully exploit those opportunities.