Joey Tribbiani may be the most dimwitted of the Friends characters, but he is still very nuanced. The man is all at once an ordained minister, a womanizer with a single pick-up line, and a lover of his stuffed “bedtime penguin pal,” Hugsy. It’s for this reason that he was the sole character to earn a spinoff (albeit an ill-fated one). He’s also the first one chosen for something else: to be resurrected in virtual form for a recent machine-learning project at the University of Leeds in England.
In a study titled “Virtual Immortality: Reanimating characters from TV shows,” researchers James Charles, Derek Magee, and David Hogg used software to dissect all 236 episodes of Friends, constructed specific language models based on that data, and created a video chat bot that generates new lines for Joey. Some examples of the more coherent material are: “Hey Ross do you want me to talk to some lady,” “I want to do something wrong,” and “I like pizza with cheese.” More nonsensical, drunkish virtual Joey communications include: “All right now well that was a soup,” “Oh ooh what do you,” and “Chandler I’m gonna get my Porsche.”
This is just the beginning of virtual Joey’s new, wondrous life — which may also include a virtual reality cast reunion in the near future. I caught up with Charles to learn more.
How did you land on Friends for your source material on this project?
Friends is a really popular TV show. People know it. Plus, there’s a lot of episodes of Friends. To train our avatar, we need a lot of data. We chose Joey because he has a very distinctive character. He’s well-recognized, he’s got a very specific style of talking and mannerisms, and this served well for our experiment.
Was there any concern that choosing someone like Ross or Rachel might make things more … complicated?
No. It’s easy to make these avatars with a neutral character, and without much intonation on their words. But we wanted somebody who was recognizable, and wasn’t hard to work with. The reason why we initially chose Joey over the other characters was just because he had a very distinctive way of talking and particular mannerisms that we could easily tell whether our algorithm had replicated in our experiments.
Joey is very expressive with his mouth. Like, when he’s attracted to a woman he does a kind of weird frown. Is that something you see reflected in the avatar?
I can see you’ve watched the show quite a bit.
Yes, I have.
We’re currently only concerned with the part of the show where there’s dialogue. And that’s what we’re analyzing at the moment. When it comes to interaction, then you’re quite right. These bits where there is no dialogue but facial expression, they would be very interesting to add in as future work.
Can you describe what it’s like to watch virtual Joey?
At the minute, you turn it on and it will just start generating sentences. Some of those sentences will come out as gibberish. Others will come out more constructed, and some will come out as more understandable.
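For readers curious how a system can produce lines that range from plausible to gibberish, here is a minimal sketch of one simple kind of language model, a bigram (word-pair) model. This is an illustration only: the interview does not say which model the Leeds team used, and the training lines, function names, and `<s>`/`</s>` markers below are hypothetical stand-ins rather than anything from the study.

```python
import random
from collections import defaultdict


def train_bigrams(lines):
    """Map each word to the list of words that followed it in the training lines."""
    followers = defaultdict(list)
    for line in lines:
        # <s> and </s> are made-up markers for the start and end of a line.
        words = ["<s>"] + line.lower().split() + ["</s>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, rng, max_words=12):
    """Walk the word-pair table, picking a random follower at each step."""
    word, out = "<s>", []
    while len(out) < max_words:
        word = rng.choice(followers[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)


# Hypothetical stand-in lines, not taken from the show's scripts.
toy_lines = [
    "i like pizza with cheese",
    "i want to do something",
    "do you want pizza",
]

model = train_bigrams(toy_lines)
rng = random.Random(0)  # fixed seed so repeated runs give the same sentence
sentence = generate(model, rng)
print(sentence)
```

Because each word is chosen by looking only at the word before it, the output is locally plausible but can drift into nonsense over a whole sentence, which is roughly the mix of results described above.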
Were there sitcom-specific hurdles you encountered when attempting to go through the entire Friends series?
The very first stage of the method is that we want to train our avatar so that it will act like a particular TV character. And in order to do that we have an automatic system that we’ve developed, that you can run on all episodes of any TV show — not just Friends. This will extract certain training data that you can use to train the avatar. The training data that you require is phoneme and word alignment: You need to know what the characters are saying and when. You also need to know where they are on the screen and who they are. So you need some kind of face-recognition system going on. And all of this is fully automatic.
One of the hurdles that we overcame when we started working with the data is there’s audience laughter in the background. This can cause a problem with audio alignment. What we found is that a quarter of each episode of Friends is actually laughter. So that had to be removed.
Another interesting aspect we found from the analysis is that each character gets a total talking time of just over a minute per episode.
Man, weren’t they each paid, like, a million dollars per episode during the final season?
Yeah, that’s 10 years’ worth of [the show]: four to five hours [of talking for each character].
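The figures quoted here can be checked with some quick arithmetic. The sketch below uses the 236 episodes mentioned earlier in the article; the per-episode talking times are assumed values chosen to bracket "just over a minute," not numbers reported by the researchers.

```python
EPISODES = 236  # episode count given in the article


def total_talk_hours(seconds_per_episode, episodes=EPISODES):
    """Convert a per-episode talking time in seconds to total hours across the series."""
    return seconds_per_episode * episodes / 3600


# Assumed per-episode talking times, in seconds.
for seconds in (60, 65, 70, 75):
    hours = total_talk_hours(seconds)
    print(f"{seconds} s per episode -> {hours:.2f} hours in total")
```

At exactly 60 seconds per episode the total comes to just under four hours, so a talking time slightly above a minute is consistent with the "four to five hours" estimate.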
Do you plan to create characters for the whole gang, so their virtual versions can drink coffee together until the end of time?
Yes. We’re going to add interaction to the system. It would be interesting to see what happens if we have character interaction.
Are you hoping that eventually this avatar would be able to interact with anybody? So if I were like, “Hey Joey,” he’d be able to creepily respond: “How you doin’?”
That’s right. Or whatever he feels like he should say. And although we’re using TV series to do our research, the main goal of the research is to try and capture the nuances of what makes a person who they are and how they move and how they talk. Is it actually possible to put that in a computational model so you can then replay new content of that person? You can imagine this not just within the domain of TV series. You could also use this for improving human-computer interaction, and having a more natural interface. Like Siri on the iPhone. It makes sense that a natural improvement of the technology would be to put a face on such a system, which is a much more natural form of interaction.
Do you think people would want an AI assistant based on a certain Friends character? I think I’d probably choose Phoebe.
A lot of personal assistants at the minute are quite neutral in personality. It is more natural to have a personality to an assistant. But rather than trying to manually construct a personality, is it not better to use those that have already been constructed by producers and directors and scriptwriters? That are designed so that we enjoy listening to them and find them approachable? Characters that we’re happy to talk to.
Have you seen Westworld?
I have not.
It’s kind of like what we’re discussing except way more fantastical and [with] way more murder.
What’s it called? I’m writing it down.
Westworld. Do you feel a fondness toward this virtual Joey you’ve created?
No. I don’t know what you mean by that sentence.
Because it generates its own content, was there any sort of connection you felt to it?
It’s still a very low-level system. It doesn’t have any high-level knowledge. So when it’s generating sentences, for instance, it doesn’t understand who the characters in Friends are. It doesn’t understand their relationships, or even objects that it’s talking about in its sentence. It’s low-level in that respect. And it requires a lot more work to get to a level where you’re understanding these high-level concepts.
Or, for you to actually … like it?
That’s another question. I can’t answer that.