What Is a Mind? More Hype from Big Data

Editor’s note: ENV is delighted to welcome Dr. Larson as a new contributor, commenting on mind and technology. Founder and CEO of a software company in Austin, Texas, he has been a Research Scientist Associate at the IC² Institute, University of Texas at Austin, where in 2009 he also received his PhD focusing on computational linguistics, computer science, and analytic philosophy.

Robert Wilensky, who passed away last year, was professor emeritus in computer science at UC Berkeley and a pioneer in the field of Artificial Intelligence (AI). Having devoted a long career to studying deep problems in machine intelligence, he left us with a memorable quip skewering today’s digital culture. "We’ve all heard," said Wilensky, "that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true."

It’s a joke, of course, but like many good jokes it contains a large measure of truth. The butt of the good professor’s joke is so-called "Big Data," the latest in a seemingly never-ending string of fads that mark the strange history of Artificial Intelligence. AI launched officially in the 1950s, and after generations of disappointing results, its advocates hit upon the next big thing. That was the notion that machine intelligence might somehow emerge out of volumes of data, solving by sheer brute size what couldn’t be accomplished before with clever engineering.

How realistic was that vision? Well, to understand "Big Data," we have to look at AI. And to see AI for what it is, we have to start with some big-picture ideas, such as materialism and mechanism.

Strong AI, Materialism, and Mechanism

Artificial Intelligence — specifically, "Strong Artificial Intelligence" (Strong AI) — is a thesis about the capabilities of machines, and equally a commentary on the nature of mind. It holds, in short, that mind doesn’t really exist. Or to put it another way, mind is material. Or: simply material. Or: merely material.

The materialist underpinnings here are hardly accidental, or hidden. Wired Magazine, that glossy-paged rhapsody to all things tech, made the connection explicit back in August 2010:

Scientists have assumed that unless there’s some supernatural aspect to thought, we ought to be able to reverse-engineer this sucker. But so far, the mechanics of human cognition remain largely unknown.

The implication here is obvious: "Since there’s no supernatural aspect to thought (clearly!), the project of AI will eventually succeed."

There’s the rub. To speak of the supernatural presumes some definition of natural. Roughly, the notion of "matter" used here is taken from Descartes, who was influenced by Galileo, who was influenced by Democritus with his theory of atoms as "uncuttables" and primary qualities like quantity and extension. No one in the AI crowd pays much attention to debates over the philosophy of matter. Still, this is roughly their idea of matter, the Cartesian one, and mechanism is the idea that laws push this matter around. Mind is the ghostly stuff that the Enlightenment debunked.

Today in certain circles it’s almost fashionable to espouse reductionist views about the mind. In 1994, DNA co-elucidator Francis Crick called the notion that mind was nothing but matter and its operations "The Astonishing Hypothesis." Fast forward a couple of decades, and the adjective seems a bit out of place. There’s no longer anything astonishing about it. False ideas can become commonplace, after all. As the old saw reminds us, a lie goes halfway around the world before the truth gets out of bed.

But not everyone is sanguine about the Strong AI agenda. Jaron Lanier, a pioneer of Virtual Reality (VR) and author of the best-selling You Are Not a Gadget, worries publicly about the Strong AI view gaining momentum, and excoriates it and its deleterious cultural consequences in almost apocalyptic terms: "Spirituality is committing suicide. Consciousness is attempting to will itself out of existence."

I’m with Lanier. But the question is this: what, on the merits, does Big Data, those terabytes of web pages produced by the cacophony of a million banging typewriters, really have to tell us about the age-old question of whether we can reproduce a human mind in a computer program?

Everything Old Is New Again

It’s the same old story, with a few new bells and whistles, it seems. "Big Data" is a paradigm case of a fad, what University of Toronto computer scientist Hector Levesque has called a "cheap bag of tricks." Levesque, like many of the better AI researchers, is refreshingly frank about the scope of the challenges the field faces. I’m not being unfair here — Big Data is a fad, there’s no question about that — and it’s an overblown one too. The addition of ever larger, even gargantuan, datasets has left the so-called "hard" problems of AI largely untouched.

Consider the problem of machine translation (MT). That is the task of automatically translating expressions in one language (say, English) into another language (say, German). Purely statistical machine translation was de rigueur in early AI research, until failures mounted and more "knowledge-intensive" approaches replaced these early efforts. Big Data in effect revivifies the old techniques. This time around, we have all that data (from all those monkeys). And it’s true: machine translation has gotten a lot better.

But what’s missing from the hype these days is a frank discussion of the problems that remain. Once we have that discussion, the bubble pops. Here, to illustrate, is an example of a hard problem, using Google’s highly regarded statistical machine translation service, Google Translate.

First some background. In 1979, University of Pittsburgh philosopher John Haugeland wrote an interesting article in the Journal of Philosophy, "Understanding Natural Language," about Artificial Intelligence. At that time, philosophy and AI were still paired, if uncomfortably. Haugeland’s article is one of my all-time favorite expositions of the deep mystery of how we interpret language. He gave a number of examples of sentences and longer narratives that, because of ambiguities at the lexical (word) level, he said required "holistic interpretation." That is, the ambiguities weren’t resolvable except by taking a broader context into account. The words by themselves weren’t enough.

Well, I took the old 1979 examples Haugeland claimed were difficult for MT, and submitted them to Google Translate, as an informal "test" to see if his claims were still valid today. Here they are:

(1) When Daddy came home, the boys stopped their cowboy game. They put away their guns and ran out back to the car.

(2) When the police drove up, the boys called off their robbery attempt. They put away their guns and ran out back to the car.

Haugeland’s claim in 1979 was that the two passages would be difficult for MT, if not hopelessly so, because the translator has to choose between different expressions to convey the (presumed) intended meaning. But Big Data should solve all that, right? It turns out, no. Google’s system in fact demonstrates Haugeland’s point, 35 years later.

Here’s the translation for (1) in German (the translation language Haugeland mentioned, along with French):

(1)’ Wenn Papa nach Hause kam, hörte die Jungs ihre Cowboy-Spiel. Sie legten ihre Waffen nieder und lief zurück zum Auto.

And here’s (1)’ translated back into English:

When Dad came home, heard the boys with her cowboy game. They laid down their arms and ran back to the car.

Ignoring the grammatical errors introduced by the machine translation, notice the phrase "laid down their arms." If we translate (2), also, into German and then back to English again, we get the identical phrase: "laid down their arms."

What happened? Google Translate does worse on (1) than on (2), of course, since boys playing "cowboy games" until "Daddy gets home" hardly licenses the interpretation of "guns" as "arms," except in a humorous sense, which is certainly not intended by Google Translate here.

The humorous reading of the "laid down their arms" translation is excluded when you consider that (2) gets the same translation. In (2), roughly, the translation is accurate. The boys were attempting a robbery, so "arms" makes sense here. Translation must account for context, so the fact that Google Translate generates the same phrase in radically different contexts is simply Haugeland’s point about machine translation made afresh, in 2014.

Language Is (Bloody Well) About Minds

So all the Big Data available to Google — and who, these days, has more? — didn’t help much with a couple of measly examples from a philosopher in 1979. There are other issues with Big Data, but that is a discussion for another day.

Language, we see, really is about minds, as I reflected recently in another context, when I read Dubliners, by James Joyce. In the introduction to my Bantam Classics edition, the writer remarks that Joyce was once at loggerheads with an editor. (No! Really? Joyce?) This editor was stonewalling him on taking out a single phrase — "bloody well" — in "The Boarding House," one of the stories in the book.

"The word, the exact expression I have used," Joyce icily explained to his publisher, "is the one expression in the English language which can create on the reader the effect that I wish to create."

Indeed. There’s a little of Joyce’s pride and reverence for language in most of us. There is none of him — not one scintilla — in a computer program.

Photo credit: April Killingsworth/Flickr.