Every year, the Consumer Electronics Show crowns a stand-out gadget, something that wows the masses and stands as a kind of harbinger of our technological futures. In the year 2018, that headliner isn’t a tangible gizmo, but the AI that powers hundreds of thousands of them. And in this emerging market, a rivalry between two tech giants has taken shape. In one corner there’s Amazon’s Alexa, the voice assistant that first infiltrated people’s homes when the company debuted its smart speaker, Echo, in 2014. In the other, there’s Google Assistant, a more competent latecomer that’s integrated into the company’s growing list of hardware products, including the Google Home. The latter’s ubiquitous marketing presence on the Las Vegas Strip is proof enough that the competition between these two has become increasingly fierce.
But beyond the question of which monotone disembodied voice will reign supreme is a much simpler concern: How are humans reacting to their presence? To learn more about the AI takeover, I met up with Gummi Hafsteinsson, the product management director for the Google Assistant. Hafsteinsson was stationed in a small, windowless room beneath a tubular slide installed at Google’s playground-themed CES headquarters. As we listened to the sounds of anonymous bodies traveling above us, we chatted about the unanticipated ways people are conversing with their voice assistants, why giving AIs personality is as important as making them functional, and what the technology will look like in five years.
One of the first things I noticed when I tried out the Google Home is that people interacted with it in a way that dated them and showed that they’d had less-than-ideal experiences with voice-controlled systems. Maybe everyone’s reference point was a voice on the phone they spoke to when they called the cable company. So their interactions are very patient, very loud, very articulated. I’m curious whether you’ve noticed a change in the cadence with which people speak to Google’s devices as they’ve been entering people’s homes.
What you’ve just referred to are things like where you’re actually giving more commands. Especially in the early days, you’d say “Call somebody,” and it would call. It’s not a conversation, it’s just a command you give out. But we wanted the assistant to — and this is why we have this chat UI and everything — we wanted to emphasize it’s not just about issuing commands in the shortest amount of words. But actually think about it as you’re having a conversation with Google.
It’s interesting, I actually did see people react pretty quickly, pretty naturally to that. You initially saw a huge chunk of the queries were much more natural-speaking versus previously they’d be much more keyboardy. They’d come out the same way as type-search queries. With the system, they actually started to do much longer sentences that felt more natural. They felt more natural having a follow-up question. Even things like saying “please” and “thank you.”
The power of having that is, once you’re having an actual conversation, it is actually a very powerful medium in multiple ways. One is it offers a tremendous amount of disambiguation ability. As you have a conversation, you go back and forth — that’s a good way of saying: Look, I say something, you are not quite sure, you ask a question, I can clarify. In a few steps you can go from this to actually something really specific.
The other thing is conversation, being a natural medium that we all know how to do, is also easily adapted. We’re very used to it being adapted, we can have a conversation over a phone, we can have it face-to-face. We’re kind of used to the idea that if we have it face-to-face we can have all sorts of gestures and scribbling on paper and stuff like that. If it’s on the phone, we stick with voice only, and so the same thing applies to it. This is why we can take the assistant and go from a phone to a display to a speaker with no display, to a smart display, to a TV. It’s the same product, it’s the same modality. People don’t seem to have a problem with adapting to all different types of conversation. So we saw people just immediately latched onto it and seemed to get it.
There was a Billboard story last year about how record companies were suddenly fielding ambiguous requests from users — people would be like, “Play that song about how it never rains in California.” These fragments of memories were presented to speakers. Have you noticed behavior like that in other categories?
Speech is so interesting. You think you know all the phrases people might say, and then they say all kinds of different things. So we put a lot of work to ensure we can handle that.
The one category I would call out specifically is just the idea of it having a personality. There are basic questions, like, “What’s your favorite ice cream?,” and, “How are you doing?,” and all that. Basic chitchat in some sense. But it’s also an important part of: once you promise a conversation, you want to be able to have that conversation. A conversation without personality is something that we, as human beings, just don’t quite understand.
We have writers on staff who put a lot of thought into it. The trivia game [that you can play on the latest version of the Google Home] is made by the same team. Because they’re trying to think about how to make the assistant more engaging, in terms of having a stronger emotional connection with it. And it turns out that people actually really, really wanted to do that. A good chunk of the conversations we’re seeing are actually around that area. Not exclusively, but it’s just human nature. You can’t change human nature. It’s just the way it is.
How would you describe the Google Assistant’s personality?
[The writers] are the team that worked on Google Doodles [the occasional artwork and animations that appear on the Google homepage]. I think they’re expressing the same personality of that company through the assistant. It’s the same thing you’re seeing in the playground here. It feels very Google.
Like, playful and helpful?
Do you find that people act differently toward the assistant depending on the vessel it occupies? We’re here at CES, and these things are going into mirrors and fridges and things like that. Are people more connected with the voice if, for instance, they’re staring at themselves in the mirror?
I think the biggest difference would be the context you’re in, as opposed to what you’re talking to. So, what we see in a smart speaker is that it’s in the home. So, surprise surprise, people ask it to play a lot of music. It can do things like set timers and alarms. And then it can also control our devices. Those are the kinds of things that are very popular in that context. But when you get in the car you might be using your mobile phone to use the assistant, you’re seeing a lot of navigation, communication, calling people, texting, controlling a medium, playing a podcast on your way home or something like that. Even with the phone we notice that when people get home, their behavior becomes much more like they’re interacting with a smart speaker than when they’re in the car. So I think it’s much more about where you are and what you need to do than the device itself.
What do you see as the capabilities of the assistant in five years?
Wow. I’ll try to paint the picture without [laughs] —
— Without making it scary?
Well, without making it too committal in terms of whether that’s actually going to happen. The vision is to provide the ultimate natural interface to technology. You go from the keyboard, which is not natural at all, to the mouse, which is a proxy pointer, to the touch screen, which is slightly better but kind of awkward in some sense. And now to a conversation, to voice.
Everyone knows how to do that. I have my kids use the assistant, I have my parents use the assistant. Like, literally no manual, they both get it. And so you’re heading toward a picture of the world where now you have an interface to technology that is perfectly natural and super easy to use. And as I mentioned, it can adapt to different spaces and different devices. It becomes sort of like this helper that’s with you all the time. You get home, it’s there, you can ask it to play music and turn the lights on, do whatever you want. It’s not like you’re actually engaging with it all the time. It’s just there when you need it. So you can have these really quick, short interactions. And it’s not like using a computer, it’s just helping you along the way. You don’t have to think about it.
Ultimately we need to get to the point where it feels so natural that you almost forget about it being there, but you can always get to it. That’s where we’re headed with all these devices in your home and in your car, and on your phone. We’re sort of painting that. We’re not there yet, but you can imagine that’d be the case. In my case when I’m home I’ll stream all my media through the assistant. If it’s music, I’ll ask it to play music. But if I sit down in front of the TV and I want to watch Game of Thrones, I just sit down on my couch and say, “Hey, Google, continue playing Game of Thrones.” The TV turns on and starts playing Game of Thrones. I don’t even notice the devices. I don’t even think about it. I just say it and it happens. To have that expand across everything — that’s where we’re headed.