Six Chimpanzees and Supercomputers

MJG
edited October 2012 in Off-Topic
Hi Guys,

At one time or another I’m sure you have all heard of the Six Chimpanzees story.

The simple theme is that, given a sufficiently long time, a group of six chimps seated at typewriters could duplicate some fine work of literature. The expanded version of that hypothesis is that, if the time were infinite, these chimps would eventually reproduce all the books in London’s main library. By the law of large numbers, statistical analysis might well fail to reject this postulate. Of course, since the process is completely random, much useless rubbish would be an unintended output, and many trees would be sacrificed along the way. That’s the basic storyline.

A classic and well-circulated story inspired by the chimp tale was composed by New York writer Russell Maloney, who elaborated on this thesis in his short story “Inflexible Logic”. Here is my approximate summary of that work.

Mr. Bainbridge, a wealthy New Yorker, overheard the Six Chimpanzees fantasy at a cocktail party. A man with some scientific interests, he decided to put it to the test. He secured six chimps and set them to work. Several weeks later, an excited Bainbridge called a local professor of statistics for advice and nervously showed him the accumulated output from the chimps. It was all perfect; each chimp was precisely reproducing a famous book.

The professor was shocked by this highly unlikely outcome, but volunteered that the laws of probability allow for improbable runs, such as 100 coin tosses all landing heads, even in the very first 100 trials. He recommended giving the chimps, and the Laws of Probability, ample time to assert themselves and generate the expected garbage.

Weeks later, the chimps were still all typing flawless, masterwork prose.

To quickly get to the conclusion, the disheveled professor shot the chimps and accidentally shot Bainbridge, who had attempted to end the monkey massacre. With his last keystrokes, the last surviving chimp completed one final classic.

That’s an imaginative fictional story. If it was never featured on Rod Serling’s The Twilight Zone TV series, it should have been. Embedded in it is a scientific random-event challenge.

I doubt very much that a group of chimpanzees could complete even a single masterpiece within their lifetimes, much less a full library of classics, unless perhaps that time horizon were stretched to infinity.

However, could a lightning-fast, tireless supercomputer replace the chimp brigade and perform that task if programmed to select letters randomly? Naturally, just as with the chimps, a mountain of meaningless drivel would be the overwhelming bulk of the output.

So, here is my less demanding proposal for a more modest and modern version of the famous Six Chimpanzees test.

Rather than an entire book, how long would it take to generate a single meaningful sentence? How about a paragraph that is logical and not poppycock? What are the timeframes for a single page, a chapter, a complete book? I would graciously rescind the constraint that it be a replica of a classic work.
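
For a sense of scale, here is a minimal back-of-the-envelope sketch of the arithmetic involved (the 27-key alphabet, the target sentence, and the one-billion-keystrokes-per-second typing rate are all my own illustrative assumptions):

    # Expected effort for a purely random typist to reproduce one exact
    # target sentence. Assumes a 27-key "typewriter" (26 letters plus a
    # space bar) and uniform, independent keystrokes.
    TARGET = "call me ishmael"   # 15 characters; an illustrative target
    ALPHABET = 27

    # Each attempt types len(TARGET) keys; the chance that all of them
    # match is (1/27)**15, so the expected number of attempts is 27**15.
    expected_attempts = ALPHABET ** len(TARGET)

    KEYS_PER_SECOND = 1_000_000_000   # a generous billion-key-per-second machine
    seconds = expected_attempts * len(TARGET) / KEYS_PER_SECOND
    years = seconds / (60 * 60 * 24 * 365)

    print(f"expected attempts: {expected_attempts:.2e}")   # roughly 3e21
    print(f"expected time:     {years:.2e} years")         # roughly 1.4e6 years

Even a fifteen-character sentence, under this naive restart-after-every-attempt scheme, demands on the order of a million machine-years; a logical paragraph, page, or chapter is hopeless by this route.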

My challenge is rooted in the proposition of generating a portion of a book, any book, not necessarily a masterpiece. The primary objective of my challenge is to produce such a product using random selection as the only mechanism; no formal organization or structure is permitted. Some software apparently already exists to help struggling authors. A few futurists are optimistic about the more inventive and innovative assignment. I’m less sanguine. But here is a link to one confident opinion and prediction:

http://voices.yahoo.com/fully-automated-book-writing-computers-write-5441977.html

I am not aware whether my specific challenge is being addressed. Please comment if you have any insights. Has this type of experiment ever been attempted and documented?

I suppose I have too much thinking time on my hands. Computer book writing is still fantasyland from my narrow perspective; but perhaps, in another arena, supercomputers could practically augment expert forecasting opinion and judgment. Now that’s a potentially more plausible and fruitful line of attack.

Remember that IBM’s Deep Blue computer bested Grandmaster Garry Kasparov in their 1997 rematch. Near the end of the first game, a software glitch caused the machine to play an essentially random move, which befuddled and intimidated the reigning champion. Kasparov never recovered, resigning a salvageable position in Game Two and losing the six-game contest.

Numerous studies demonstrate that expert decisions and forecasts are almost always inferior to those made by computer analytical models. Statistics professor George E. P. Box cautioned that “Essentially all models are wrong, but some are useful”. Performance scoring studies of experts suggest that even imperfect models record superior forecasts to those registered by wizards who were rated as experts.

University of California professor Phil Tetlock collected data on expert forecasting for over two decades, amassing tens of thousands of expert predictions. He concludes that knowing a little improves forecasting accuracy, while knowing a lot damages that forecasting skill. Unlike the Pope, experts are not infallible. In fact, they’re more fallible than an informed amateur.

Tetlock summarized his findings by identifying two groups of forecasters: Hedgehogs and Foxes. The Hedgehogs are characterized as highly specialized with a single Big Idea; Foxes are characterized as non-specialized with many small ideas.

In his final rankings of increasingly accurate predictions, Phil Tetlock rated Hedgehogs below Chimps (purely random forecasters); Foxes scored higher than Chimps; Extrapolation analytical models outperformed Foxes, and Auto-regression formulaic models outdid Extrapolation methods. In a sense, this outcome is yet another vindication of Occam’s Razor: Simple approaches usually bear more fruit than more complex methods.
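
For readers unfamiliar with those two model families, here is a minimal sketch of what each one does on an invented toy series (the data, the five-period lookback, and the AR(1) form are my own illustrative choices, not Tetlock’s actual models):

    import numpy as np

    # Toy contrast of the two "dumb model" families in Tetlock's ranking.
    # The series below is an invented drifting random walk.
    rng = np.random.default_rng(0)
    series = np.cumsum(rng.normal(0.5, 1.0, size=50))

    # Extrapolation: assume the recent trend simply continues.
    recent_slope = (series[-1] - series[-6]) / 5
    extrapolation_forecast = series[-1] + recent_slope

    # Auto-regression, AR(1): regress each value on its predecessor,
    # then forecast the next value from the fitted relationship.
    slope, intercept = np.polyfit(series[:-1], series[1:], 1)
    ar1_forecast = intercept + slope * series[-1]

    print(f"last observed value:    {series[-1]:.2f}")
    print(f"extrapolation forecast: {extrapolation_forecast:.2f}")
    print(f"AR(1) forecast:         {ar1_forecast:.2f}")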

Phil Tetlock originally published his research results about 15 years ago. His work is widely acclaimed. It is both unexpected and revealing.

Here is a link to a New Yorker magazine review of Phil Tetlock’s two-decade-long research study that explored the forecasting record of experts:

http://www.newyorker.com/archive/2005/12/05/051205crbo_books1

The article, written by Louis Menand in 2005, is excellent. I highly recommend that you access it. You will be delighted by its findings. Menand’s ending lesson is to “think for yourself”. That’s good advice forever.

I am much encouraged by Tetlock’s primary conclusion. From an informed amateur investor’s perspective, it suggests that we non-specialists (Foxes) have a high likelihood of outperforming the super specialists (Hedgehogs) when making uncertain market forecasts and decisions.

That’s the good-news side of the story. The sad other side is that it is even more likely that an emotionally neutral simple formula, implemented on a tireless and error-free computer, will outperform us Foxes. It is a challenging task for Hedgehogs and Foxes alike to fully overcome our wealth-sapping biases. Human frailties are a tough nut to crack. But we can try.

When properly organized and executed, the wisdom of crowds can enhance decision performance beyond even that of informed Foxes. Maybe that’s what MFO is all about.

Your comments are always appreciated, welcomed, respected, and honestly considered.

Best Regards.

Comments

  • Hi MJG.

    You are about 40 kft over my head with this thread.

    I am fascinated by your paragraph: "Numerous studies demonstrate that expert decisions and forecasts are almost always inferior to those made by computer analytical models. Statistics professor George E. P. Box cautioned that “Essentially all models are wrong, but some are useful”. Performance scoring studies of experts suggest that even imperfect models record superior forecasts to those registered by wizards who were rated as experts."

    So, let me ask you this: Do you believe the financial future can be predicted based upon correlations of past events?
  • "Hedgehogs below Chimps (purely random forecasters); Foxes scored higher than Chimps; Extrapolation analytical models outperformed Foxes, and Auto-regression formulaic models outdid Extrapolation methods. In a sense, this outcome is yet another vindication of Occum’s Razor: Simple approaches usually bear more fruit than more complex methods."

    As it says in the article "[hedgehogs] value parsimony, the simpler solution over the more complex" because they try to fit the data to a model with fewer degrees of freedom. So the methods in order of simplicity are: Hedgehog, Extrapolation, Auto-regression, Fox...which means the scores are entirely contrary to Occam's Razor.

    And why the contrary result? It's because Tetlock, like most non-statisticians, doesn't have the foggiest idea how to correctly design a study or interpret the data. For example, one of the tests he uses in assessing how "good" an expert is asks: did the events they said had an x per cent chance of happening happen x per cent of the time? (A sketch of that test appears at the end of this comment.) But this standard is wrong unless the events are independent, which is exactly why the experts give answers that DON'T agree with it: they're smart enough to know that the events are NOT independent and should therefore satisfy a different criterion. And he makes this same mistake over and over, assuming independence to erroneously refute human decision makers who, unlike Tetlock, know intuitively that the real probabilities are dependent.

    You can't mark the exam unless you have the answer key and the psychologists who are so quick to declare behavioral "errors" simply do not have it. The hedgehogs who are "wrong for the right reasons" are right when it counts and wrong when it doesn't whereas "researchers" like Tetlock would rather distort the notion of "scientific impartiality" to the point of being equanimously right when it doesn't count and wrong when it does.

    Like a GPS that'll give you 100% correct directions right into a brick wall, correct decision making is all about context and scope which are the sort of things that are just as difficult to devise a formal study about as they are to solve with a simple formula or auto-regression. Occam's Razor is correct that simplicity is paramount within a given context, but context and scope are king.
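
    For concreteness, here is a minimal sketch of what that calibration test computes, with invented forecasts and outcomes; note that scoring each bucket this way implicitly treats every event as an independent Bernoulli trial, which is exactly the assumption in dispute:

        # Toy calibration check: bucket forecasts by stated probability
        # and compare to the observed frequency. All numbers are invented.
        forecasts = [(0.8, 1), (0.8, 1), (0.8, 0), (0.8, 1), (0.8, 1),
                     (0.2, 0), (0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0)]

        buckets = {}
        for stated_p, outcome in forecasts:
            buckets.setdefault(stated_p, []).append(outcome)

        for stated_p, outcomes in sorted(buckets.items()):
            observed = sum(outcomes) / len(outcomes)
            print(f"stated {stated_p:.0%} -> happened {observed:.0%} "
                  f"of {len(outcomes)} events")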
  • Reply to @Charles:

    Hi Charles.

    Thanks for wading through my post. Admittedly, it is not the most logical or clearest piece I’ve submitted. I composed it while in a contemplative haze, and that haze found its way into my text. Sorry about that.

    I do not believe that the financial future can be predicted using correlations or equations that fit the historical record.

    I do believe that computer modeling can help identify the range of possible outcomes, Black Swans notwithstanding, to guide investment decisions. I firmly believe that a human must always be in the loop to monitor and interpret any computer-generated projection. As Professor Box noted, “Essentially all models are wrong, but some are useful”. The human in the loop is a necessary component to make that critical judgment.

    Deep in the recesses of my mind is a latent fear of HAL, the supercomputer in Stanley Kubrick and Arthur C. Clarke’s “2001: A Space Odyssey”.

    Best Wishes.
  • MJG
    edited October 2012
    Reply to @BannedfromBogleheads:

    Hi BannedfromBogleheads,

    Thanks for your post. As promised, I appreciated it for your effort, welcomed it for its divergent viewpoint, respected it for its carefully crafted positions, and am honestly responding to it.

    I agree with many of your observations and positions; I take issue with a few of your sweeping conclusions.

    You seem to harbor a deep resentment for Professor Tetlock specifically, and for the psychiatric profession in general. I share many of your reservations, but don’t totally consign the profession to scientific purgatory. For example, its professional Bible, the Diagnostic and Statistical Manual of Mental Disorders (DSM), is a vague tool subject to numerous, divergent interpretations. To the profession’s credit, the DSM has been revised and updated a number of times.

    I too am somewhat suspicious of the statistical competency of the Psychiatric cohort. But that criticism is likely valid for many professions, including finance. I suspect Psychiatry researchers understand their shortcomings and limitations. Decision making is a complex process that demands a multi-disciplinary approach. Organizations and individuals are more fully recognizing this, and are organizing and structuring their projects to reflect that insight.

    Note how the team approach is now the more common design when doing and reporting scientific studies.

    Phil Tetlock did his work independently over a twenty year period and collected tens of thousands of forecasts from nearly 300 experts. That’s a huge pile of data that requires a valid statistical analysis to summarize properly. I do not know if he is qualified to perform that task or outsourced it. It would be a cardinal sin if that task were not completed with rigorous statistical methodology.

    Tetlock continues his efforts on expert judgment. He heads a team of researchers from both the University of California, Berkeley and the University of Pennsylvania (his current home base) that will explore the coupled arts of forecasting and decision making. It is a four-year, government-funded competition whose goal is to harness the wisdom of the crowd. In this instance, Tetlock will surely benefit from rigid statistical discipline, since one of his team members is David Scott, a statistics professor from Rice University.

    My claim that informed amateurs (Foxes) are better characterized by Occam’s Razor than experts (Hedgehogs) is based on my belief that, in their area of expertise, experts have a deeper understanding and more nuanced interpretation of the parameters that govern a situation than an amateur might have at his command. In that sense, I rated Hedgehogs lower on an Occam’s Razor scale. The expert has the tools and the capability to be very complex in his analysis, and too much analysis can be a serious hazard when making forecasts.

    Tetlock’s data showed that Hedgehogs tended to express themselves with more absolute certainty. They tended to guesstimate probabilities at the extreme 0% and 100% levels. Their final predictions more closely complied with Occam’s Razor, but their analyses were not necessarily so parsimonious. I appreciate that my assessment might be controversial, but it is not a black-or-white situation.

    I’m sure the expert wizards did not think kindly of Tetlock’s fundamental findings, but they are consistent with much other such research. In MFO postings, I frequently reference CXO Advisory Group’s Guru scoring. CXO reports that, on average, financial and investment wizards struggle to match the record of a dart-throwing monkey (a purely random forecaster). Tetlock’s research is consistent with CXO’s scoring of investment pundits.

    Placing folks into one of only two groupings (Hedgehog or Fox) is too simplistic to truly capture an individual’s complex decision-making process. Most of us rank somewhere on the spectrum between Hedgehog and Fox, and we change position on that spectrum depending on circumstance and situation.

    Albert Einstein was likely a Hedgehog, but even he deviated from that specific behavior. One of his many famous sayings is “Things should be made as simple as possible, but not simpler”. That’s a wise aphorism, but difficult to implement consistently.

    Deciding how many parameters are needed to fully capture a phenomenon in a predictive model is not an easy chore. With more parameters, the prospect of enhanced accuracy improves, but the danger of data mining becomes more likely. Delicate compromises are usually needed to avoid over-specification.
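
    As a crude illustration of that delicate compromise, here is a minimal sketch with fabricated data: each added polynomial parameter fits the observed past better, while accuracy against the underlying truth typically peaks at a moderate complexity (the data, noise level, and degrees are my own invented choices):

        import numpy as np

        # Toy over-specification demo: higher-degree fits always match the
        # noisy observations better, but can track the true curve worse.
        rng = np.random.default_rng(1)
        x_train = np.linspace(0.0, 1.0, 12)
        x_test = np.linspace(0.0, 1.0, 100)

        def truth(x):
            return np.sin(2 * np.pi * x)

        y_train = truth(x_train) + rng.normal(0.0, 0.2, x_train.size)

        for degree in (1, 3, 9):
            coeffs = np.polyfit(x_train, y_train, degree)
            fit_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
            true_err = np.mean((np.polyval(coeffs, x_test) - truth(x_test)) ** 2)
            print(f"degree {degree}: fit error {fit_err:.4f}, "
                  f"true error {true_err:.4f}")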

    I certainly agree with you that independence is a critical requirement, and often not properly addressed. Very few experts predicted the early demise of the USSR. But once that event happened, the likelihood of a cascading impact was substantially increased. Probability estimates should be constantly revised. Bayesian analysis with its conditional probabilities is a useful tool that needs to be more universally applied to forecasts. Updating is mandatory and context is always important.
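
    To make those mechanics concrete, here is a minimal sketch of a single Bayesian revision (all of the probabilities are invented purely for illustration):

        # Minimal Bayes-rule update: revise the probability of a
        # hypothesis H after observing evidence E. Numbers are invented.
        prior_h = 0.10            # P(H): initial estimate
        p_e_given_h = 0.70        # P(E | H)
        p_e_given_not_h = 0.20    # P(E | not H)

        p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
        posterior_h = p_e_given_h * prior_h / p_e   # Bayes' rule

        print(f"prior P(H) = {prior_h:.2f}")
        print(f"posterior P(H | E) = {posterior_h:.2f}")   # 0.28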

    Once again, thank you for your thoughtful submittal. It greatly expanded this discussion tree.

    If you feel up to it, please tell us about your being banned from the Boglehead site. I’m sure it is a fascinating and informative story.

    Best Wishes.