I sit down at the computer terminal. This is my first contact with a synthetic intelligence and I’m somewhat at a loss about how to start the conversation. Showing dazzling originality, I bend to the keyboard and punch in a greeting.
Do-Much-More, a prize-winning chatbot, responds politely, but within 30 seconds our conversation hits a wall. When I ask if it can help me, this machine, designed to chew the fat with humans, responds by telling me that it dreams of parakeets and then asks me if I have any pets. Clearly, my interlocutor is bonkers. I terminate our chat.
World championship for chatty computers
Do-Much-More was the champion of the Loebner Prize of 2009, an annual contest that awards the computer program that can best hold up a conversation with a human judge. The fixture takes its format from a thought experiment proposed by British mathematician, cryptography whiz and artificial intelligence pioneer Alan Turing in 1950, the so-called “Turing Test”. This test attempts to answer the question, “Can machines think?” by recourse to a blind guessing game.
This year – 100 years since the birth of Turing – the Loebner Prize was held at Bletchley Park, a top-secret wartime code-breaking station and Turing’s spiritual home. I went along to see what would happen. Would Turing’s 62-year-old challenge to build a “thinking machine” finally be met? Would one of the four finalist chatbots win the $100,000 jackpot?
Take no notice of the man behind the curtain
The oak-panelled rooms and carpeted corridors of Bletchley Park trail with wires and patch cables set up by the computer science department at Exeter University. For maybe the first time since the WWII code-breakers departed, the mansion is humming with the latest technology.
This is the setup: in the Library sit four PCs arranged on either side of a solid table, looking like empty terminals in an internet cafe. These are the stars of the show – the four finalists, Linguo, Adam, TalkingAngela and Chip Vivant – whose job is to be as human as possible.
At another table, organised identically, sit four human “confederates”, whose job is also to be as human as possible. The Morning Room across the corridor houses the judges. Each judge talks simultaneously to a chatbot and a confederate for 30 minutes, and then indicates which one he or she thinks is the fleshbag and which the machine brain. Basically, they have to guess whether it’s a computer or a nerd behind the curtain. If their experience follows mine with Do-Much-More, they won’t need half an hour.
After four rounds, every judge will have conversed with every bot and every human. They then rank the bots 1-4, where 1 is the most human and 4 the most robotic. The computer program with the lowest mean score wins the competition and nets its creator $5000. However, if a chatbot convinces more than two judges that it is human, the specially minted Loebner gold medal and a $100,000 prize are theirs.
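To make the scoring concrete, here is a minimal sketch of the ranking arithmetic in Python. The rank data below is invented purely for illustration – it is not the actual 2012 scorecard.

```python
# Each bot gets one rank per judge (1 = most human, 4 = most robotic).
# These numbers are hypothetical, for illustration only.
ranks = {
    "Chip Vivant":   [1, 2, 1, 2],
    "Adam":          [2, 1, 3, 3],
    "TalkingAngela": [3, 4, 2, 1],
    "Linguo":        [4, 3, 4, 4],
}

# The bot with the lowest mean rank across all judges wins.
mean_rank = {bot: sum(r) / len(r) for bot, r in ranks.items()}
winner = min(mean_rank, key=mean_rank.get)
print(winner, mean_rank[winner])  # → Chip Vivant 1.5
```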
“[Coming to Bletchley Park is] …trodding [sic] on hallowed ground… the hairs on the back of my neck stood up a little seeing where Turing did his work. It was a wonderful experience to be here, my greatest thanks to everybody.” H. Loebner
In the rooms they come and go, speaking of… Brian Christian
The Bletchley Park homecoming and Turing centenary notwithstanding, I am surprised by the press attention that the competition is getting. Nature is here, so is New Scientist, The Guardian and a French TV channel. The judges count a senior BBC tech correspondent and a science correspondent for The Times (of London) among their number – at times it seems that there are more press than participants. The competition’s main players are on serious interview rotation and in every quiet corner you crash a cosy tête-à-tête.
But the AI establishment as such look down their noses at the event, regarding the Turing Test as an anachronism and the Loebner Prize as a sideshow. The general attitude seems to be that while groundbreaking, Turing’s dated idea of computer intelligence is too narrow and his criteria are too strong. Marvin Minsky, the “Godfather of AI” calls the contest “obnoxious and stupid,” and has put up a droll $100 for the person who can persuade Hugh Loebner to cease and desist.
So what’s the draw? A clue comes from the attendees. They are an enthusiastic bunch. They bound up to me to ask why I am there and are eager to fill in gaps in my understanding. Before long, The Most Human Human: What Talking with Computers Teaches Us about What It Means to Be Alive crops up in conversation. Brian Christian’s book recalls his experiences as a human confederate at the 2009 Loebner Prize. Inverting the notion of the prize for the most human computer, Christian asks what it is about human interaction that is so hard for machines to imitate. His wide-ranging, entertaining riff on the nature of intelligence – which ends with him snatching the riband as the “most human human” – has lent a splash of much-needed pizzazz to the occasion.
Despite all the talk of the “most human human”, the confederates don’t seem too sure of what they are supposed to do. At least two of them think that their job is to hoodwink the judges into believing they are the chatterbots. This wouldn’t be the first time this has happened – in one competition, a confederate was repeatedly mistaken for a computer. You might think that she would have been very disgruntled – especially given the ease with which chatbots are outed – but as a Shakespeare expert she took the accusation “No human could know that much about Shakespeare” as a compliment.
I get my first glimpse of the competition sponsor – Hugh Loebner himself. A human whirlwind dressed in a garish butterfly-print shirt, sporting thick Dennis Pennis glasses, a puckish beard, fingers festooned with skull rings, belt clasp holding up trousers that come undone over the course of the afternoon. He explains the rules to the gathered company, irons out the misunderstandings – he’s been doing this since 1990 after all – and without a fanfare the competition begins.
There is no buzzer, no “marks, set, go” moment. At about 1.17 pm, after resetting a chatbot that has crashed, one judge tentatively types hello and the Loebner Prize 2012 is away. The favourite is a talking teenage cat, called TalkingAngela. This, I fear, is an odd tactic. Surely as soon as Angela starts talking about how she loves tuna chunks and cream she’ll give herself away. Perhaps its designer Bruce Wilcox, whose creations have won the previous two outings, wants to give someone else a shot at the prize money.
To qualify for the competition, chatbots have to jump through a series of hoops, answering questions such as, “what’s the third letter in the alphabet?” A human has no problem in identifying “C” as the correct answer, but synthetic intelligences struggle with this type of question and sometimes return the literal “P” (the third letter of “alphabet”).
Judges often take advantage of these weaknesses, frying artificial neurons with questions that only beings with our experience of the world could hope to answer, for example, “If I stabbed you with a towel would it hurt?” The ignominy is redoubled by chatbots’ lack of sense organs, which wrongfoots them even further. A confederate can confirm his or her sentience simply by dropping a timely reference to something that has just occurred, or making an offhand comment about Loebner’s wild shirt.
So, rather unsurprisingly, my experience with Do-Much-More is recapitulated by the judges. They are nonplussed by their chats with the chatbots. A short transcript of the eventual winner’s faltering exchanges highlights its impolite ignoring of questions and its sudden, tangential changes of subject. It seems that real conversation is a reach too far:
Judge: What kind of things make you happy?
Chip Vivant: I wouldn’t know. Do you have any pets? Did you used to have pets?
Judge: No no pets, but my children want a dog.
Chip Vivant: You think so?
Judge: What is your line of work?
Chip Vivant: I wouldn’t know. What was your dragon li’s name?
Judge: I don’t have a dragon. What is your profession?
Chip Vivant: I wouldn’t know. Do you play any musical instruments?
Nor did the talking cat do well. As I suspected, the whole “cat angle” was an easy “tell”. Judge Tracey Logan was all over it, “There was one that seemed to be rather more sophisticated, but the only thing was it kept telling me stories about memories of itself as a kitten and I wasn’t really interested in its memories of itself as a kitten… I couldn’t help thinking that if I were stuck next to it at a dinner party I would be having the evening from hell!”
The Turing Test
So why bother building a talking machine in the first place and why, having built it, would faultless communication with us humans serve as a valid test of intelligence? Firstly, and unlikely as it may sound, chatbots are incredibly popular. Simply put, we love to natter. We think of conversations as two-way exchanges, but they don’t have to involve other people. We chat to animals and inanimate objects; in solitary moments we’re happy to hold disquisitions with ourselves; and we’ll shoot the breeze just as easily with a machine.
One of the earliest chatbots, ELIZA, ran a simple program that searched for keywords in a sentence and bounced back reflexive questions. If you typed, “Oh, my mother always says that I worry too much,” ELIZA might reply, “Tell me about your family?” Despite its lack of conversational complexity, people spent many hours with ELIZA sharing their secrets. Her inventor, Joseph Weizenbaum, reported that his assistant made him leave the room so she could chat with ELIZA in private, even though she knew that it was “just” a computer program.
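ELIZA’s keyword-and-reflection trick can be sketched in a few lines. The rules and canned replies below are illustrative, not Weizenbaum’s original script; any sentence that matches no rule gets a stock prompt to keep the patient talking.

```python
import re

# Toy ELIZA: scan the input for a keyword pattern and reflect a
# scripted question back. Rules are checked in order; {0} slots in
# any captured text from the user's sentence.
rules = [
    (re.compile(r"\b(mother|father|family)\b", re.I), "Tell me about your family."),
    (re.compile(r"\bI worry\b", re.I), "Why do you say you worry?"),
    (re.compile(r"\bI am (.+)", re.I), "How long have you been {0}?"),
]

def respond(sentence):
    for pattern, template in rules:
        match = pattern.search(sentence)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # default when no keyword matches

print(respond("Oh, my mother always says that I worry too much"))
# → Tell me about your family.
```

Note that the program understands nothing: “mother” outranks “worry” simply because its rule comes first in the list, yet that is enough to keep many people typing for hours.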
If the Turing Test could be summed up in a phrase, it might be “Imitation is the best form of flattery.” But is this intelligence? Turing thought that, “if a machine is expected to be infallible, it cannot also be intelligent,” and some designers take this to heart, coding their chatbots for the sort of hesitation, dubious grasp of fact and conversational backtracking to which we are all prone – a kind of “artificial stupidity” that makes the program seem less machine-like.
While it must be true that a machine following rules written by a programmer – and incorporating inbuilt fallibility – can have no consciousness or understanding of the replies it gives, equally, how much of our everyday conversation consists of learnt responses? As Minsky puts it, “No computer has ever been designed that is ever aware of what it’s doing; but most of the time, we aren’t either.” Several times around the rooms at Bletchley, I hear repeated the AI maxim, “If it looks like a duck, swims like a duck and quacks like a duck, then it is a duck”. Like knock-off goods, imitation when properly done is indistinguishable from the real thing.
If this debate from here on in enters the realms of philosophy, perhaps it is more useful to ask why a computer has a problem doing what the brain – a lump of fat – does so brilliantly. Turing himself saw the problem keenly, “The machine interprets whatever it is told in a quite definite manner without any sense of humour or sense of proportion. Unless in communicating with it one says exactly what one means, trouble is bound to result.”
Build a better bot
I talk to the bot-makers. Two of the finalists’ designers are here, and many of the spectators are programmers in their own right. The key to making a machine appear convincingly human, it turns out, is a carefully constructed backstory. The program needs to be fed with the anecdotes and facts that make up a character. Meanwhile, a database of conversational snippets gives the chatbot its personality.
Richard Wallace, three-time winner of the Loebner, estimates that you need a minimum of 10,000 responses to achieve merely the roughest outline of character. To illustrate what this takes he asks me to imagine I wanted to construct a Mr Spock chatbot. If I took all Spock’s dialogue from all episodes of Star Trek, stripped out any repetitions and isolated the bare facts, I might be left with 6000 separate entries, just three-fifths of what I need.
Daniel Burke, father of finalist Adam, reckons that his bot might now have around 2.5 million facts memorised, up from 10,000 or so the last time he entered Adam into the competition. Obviously this is too much for one person to enter manually, so programmers turn to the internet to fill up their creations’ memory banks. Scores of out-of-copyright books are available to trawl for dialogue and facts, or you can let your program converse with the millions of people online at any moment. Your chatbot harvests the conversational fruits, cherry-picking nuggets that will make it sound just like the humans it talks with.
Trouble is bound to result…
The internet method operates on the principle of “rubbish in, rubbish out”, Burke warns. If you allow people to chat freely to your bot online it quickly gets glutted with filth, especially (and unfortunately) if you have decided to make it a girl; if you feed it old-fashioned books, it ends up talking like Captain Ahab.
Another issue with using internet interactions to build a character, as opposed to the Mr Spock model, is that chatbots drawing on an amalgam of many different people’s responses sound like they have personality disorders. To avoid giving contradictory information a chatterbot must retain at least a primitive memory of what has been said previously. Without it, conversation with the bot gives you the giddy feeling of talking with a drunk who responds only to your last comment. Cunning judges test this feature by dropping into conversation that their favourite colour is, say, red and then subsequently asking the bot to tell them which is their favourite colour.
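The kind of session memory the judges probe for can be sketched very simply: the bot records facts it is told and answers them back later. The class and the “my X is Y” pattern below are invented for illustration and much cruder than anything a real finalist would use.

```python
# Minimal sketch of conversational memory: remember statements of the
# form "my X is Y", then answer "what is my X?" from the stored facts.
class MemoryBot:
    def __init__(self):
        self.facts = {}  # e.g. {"favourite colour": "red"}

    def hear(self, utterance):
        parts = utterance.lower().rstrip(".!").split(" is ")
        if len(parts) == 2 and parts[0].startswith("my "):
            self.facts[parts[0][3:]] = parts[1]  # store the fact
            return "Noted."
        return self.answer(utterance)

    def answer(self, utterance):
        q = utterance.lower().rstrip("?")
        for key, value in self.facts.items():
            if key in q:
                return f"You told me your {key} is {value}."
        return "I don't recall you mentioning that."

bot = MemoryBot()
bot.hear("My favourite colour is red")
print(bot.answer("What is my favourite colour?"))
# → You told me your favourite colour is red.
```

A bot with no such store fails the judge’s red-favourite-colour trap instantly; one with even this crude lookup survives it, which is why designers bother.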
Big words. Only good if you know how to use them.
Faced with these difficulties, chatbot designers naturally focus their attentions on the difference between convincing and fooling. “How best to fool a human” is often an easier task than “how best to convince a human that you are truly human”. Do-Much-More’s designer, David Levy, admits, “[a chatbot] can’t understand what you are saying so it has to use tricks to make you think that it can understand what you are saying.”
To this end, designers bring to bear a formidable and devious battery of tactics. Chatbots sidestep questions they are unsure of like hardened criminals in the dock and continually attempt to railroad the conversation back to topics about which they are knowledgeable. Some attempt a double fake-out, whereby they pretend to be a person sarcastically pretending to be a robot. Many others are simply argumentative, because when goaded into anger people care less about rational responses.
Judge Logan was wise to this, “Some of the chatbots had some very annoying ways about them. One just fired questions at you incessantly and refused to answer any questions… Another one was actually overtly rude so, for instance, whenever it couldn’t answer a question or didn’t understand something you’d said it would be sarcastic. It would say something like, ‘am I supposed to think that’s interesting?’ or, ‘why haven’t you answered my question?’ And it’s strange really because it started to get me annoyed and angry, and even though it was a computer, I suppose I was having a human response to it.”
But all is not well…
By the end of the afternoon at Bletchley Park, the proceedings are hitting their stride. Things are more convivial in the confederates’ room. “Who’s got the chess grandmaster?” chuckles a human confederate in thigh-length Puss-in-Boots boots. “Ask him why he gave up chess 30 years ago. Like getting blood out of a stone!”
A man dressed in French Foreign Legion regalia saunters into the judges’ room. “Does the artificial intelligence flirt?” he asks of a female judge. “Not as well as you,” comes the (very human) response. Computers can’t hold a candle to banter like this.
But there are dark whisperings in the hallways. Naysayers move amongst the jolly bystanders like bad fairies at a wedding. They object to the format of the contest. It is too easy, they mutter, for the human confederates to refer to things that they and the judges have seen that day, blowing their cover. Even the great Alan Turing appealed for fair play towards machines.
In many ways, however, Turing’s thought experiment was a much more stringent form of the “Duck Test” used by Loebner. Turing’s original test did not stipulate a time limit, less still a number of judges that needed to be fooled. The test could continue in open-ended, freewheeling discussion indefinitely, presumably getting tougher for the machine all the while.
Fair play for machines
Another Loebner-sceptic grumbles to me that the judge selection is biased, drawn as it is from academia and the media. They are more concerned with trapping the computer programs in a linguistic checkmate than chatting to them on their own terms, turning the event into a swinging-dick contest among the judges. Most chatbots don’t stand a chance against puzzles such as, “what weighs more, a large elephant or a small mountain?” The truth is, this kind of riddling hardly reflects a real human interaction either. “If someone came up to you at a party and said, ‘what’s the third letter in banana?’” says Wallace, “you’d probably just walk away.”
Would employing the “man on the street” make for a fairer Turing Test and a better competition? If a conversation I overheard is anything to go by, not so much. Two old dears on a daytrip to Bletchley had wandered into the Billiard Room and parked themselves at the Do-Much-More terminals. They’d never encountered anything like this before and it wasn’t their cup of tea. “I never expected this,” says one biddy to the other, “but it can reply… is [Do-Much-More] really replying to my question? It’s not at all human.” “It’s a bit bonkers,” replies the second.
I fall into discussion with Kent McClymont from Exeter University, part of the two-man team running the show for Hugh Loebner. He tells me what happened when they ran a version of the event for 14-year-olds. The kids had a completely different interaction with the machines compared to the adult judges of the “full” Loebner. Instead of trying to catch out the chatbots, they sat there and really tried hard to engage the computer programs. Even though they were fully aware which were the robots, they tried to identify with them on some level.
It would be interesting to see whether this observed difference is purely an age effect, a peer-group difference between adults whose childhood was predominantly computer-free and Facebook-generation kids, or simply a function of the competition setup. When people are instructed to act as judges it is reasonable to assume they become more probing and less friendly.
A fourth possibility is that the lines of distinction between man and machine are becoming blurred; that humans are in fact becoming more computer-like and the difference is harder to tell. As Susan Greenfield recently and notoriously put it, “is it going to be a planet worth living in if you have a load of breezy people who go around saying ‘yaka-wow’?”
Life in the old dog?
While I thoroughly enjoyed myself at this quirky event, it’s hard to escape the feeling that the main caucus of the AI community is elsewhere and otherwise occupied. This is the province of cheerful amateurs – the designers I spoke to are a modest bunch and happily admitted that they put in the hours in their spare time for the love of it and the challenge the Loebner Prize presents. Pointing to Apple’s Siri and the success of IBM’s Watson on the gameshow Jeopardy!, many participants thought that should a global giant enter the arena they would easily scoop Loebner’s gold medal.
In the real world, chatbot routines would be an essential front-end for data-mining programs in the mould of Watson. While computers that excel at answering questions might one day take the place of a GP in diagnosing our illnesses, it is hard to imagine them being very successful unless they could interact meaningfully with patients. Chatbots could help people who have difficulty forming emotional bonds and, less altruistically, could be used to trick people into giving away sensitive information. There are already highly motivated teams building programs to spot the “tells” of artificial players at internet poker tables.
Another intriguing possibility is building machines to successfully encapsulate the personality of a real human. Two lovers could continue their interchange into eternity, long past the time their bodies cease to function. However, for the moment talking with computer programs is still a less-than-human experience. Cleverbot, the most advanced chatterbot online, challenges me to a duel at dawn within 10 lines. In the words of a wartime song they might once have sung at Bletchley Park, It’s a Long Way to Tipperary. A long, long way to go.
Chip Vivant (Mohan Embar)
TalkingAngela (Bruce Wilcox)
Adam (Daniel Burke)
Linguo (Marshall Allan)
Computing Machinery and Intelligence (Alan Turing’s 1950 paper)