Just about every indicator fails some of the time. In fact, just about every indicator fails about half of the time. And to add insult to injury, an indicator’s optimum parameter value changes over time.

Because indicators are so unreliable, traders typically use two or more in a confirmation approach and hope together they overcome their individual unreliability. But if neither is valid in the first place, using two wrong indicators won’t make one right trading decision.

To overcome that problem, we often use fundamentals — but these can consist of stories we believe (or don’t) for all the wrong reasons.

On the first trading day of 2007, the euro gapped higher; the entire bar was above the most recent highest high in the downswing. According to market lore, we should always buy into an opening up gap, especially one that forms a breakout over a previous highest high in a corrective down move, because it usually means the correction is over.

The first week of the year has a pretty good track record for delivering a major trend change, too — or so traders were saying at year-end. The first trading days in 2005 brought a true trend reversal that ended a euro rally. The first trading days in 2006 reversed a trend, too — to a new euro uptrend.

As January 2007 was approaching, analysts and traders were making noise about a continuation of the “Thanksgiving rally” based on the outlook for European Central Bank rate hikes. Thus there were technical reasons (the gap and breakout) to think the correction was over, plus the fundamental story, topped off with a dollop of historical precedent.

All this “evidence” was wrong, though. Price collapsed the very next day, filling the up gap, and the push above the previous intermediate high turned out to be a false breakout.

An examination of the previous seven years of new-year price moves should have told us that a temporary spike was more likely than any other outcome. It shows that only two of the seven years (granted, the most recent two) produced true trend reversals. Why did we heed the story that 2007 would be a repeat of January 2005, when the euro started a new rally?

Because traders and analysts are addicted to stories. In fact, according to David Aronson in his splendid book Evidence-Based Technical Analysis (2006, John Wiley & Sons), nearly all of what we do in trading is based on stories. It’s certainly not based on science. Our indicators are not scientific in the sense of outperforming well over half the time when rigorously defined and fully back-tested. This means good performance is a function of luck rather than the use of a scientific tool.

Aronson demonstrates that when you remove directional bias from back-tests (e.g., the market is going up and you are testing an indicator that is good at identifying an up move), a whole slew of standard technical indicators simply do not work.
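The point about directional bias can be made concrete with a toy calculation. In the sketch below, the daily returns and the always-long "rule" are invented for illustration (they are not Aronson's data): detrending means subtracting the sample's average daily return, so a rule gets no credit for the market simply drifting in its favor.

```python
# Illustrative only: these daily returns are made up, not Aronson's data.
returns = [0.012, -0.004, 0.009, 0.003, -0.002, 0.011, 0.006, -0.001]

# A "rule" that is simply long every day earns the market's drift.
raw = sum(returns)

# Detrend: subtract the mean daily return so drift contributes nothing.
mean = sum(returns) / len(returns)
detrended = [r - mean for r in returns]
adjusted = sum(detrended)

print(f"raw rule return:       {raw:+.4f}")
print(f"detrended rule return: {adjusted:+.4f}")  # effectively zero for always-long
```

On detrended data, any apparent edge must come from the rule's timing, not from the underlying trend — which is why so many standard indicators stop working under this test.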

It’s even worse for “market-lore” indicators such as chart patterns, which Aronson calls “subjective.” They are so loosely defined that when they seem to work, it can be a function of seeing what we want to see; we tend to ignore the pattern’s instances that don’t work.

Aronson cites research on the head-and-shoulders pattern, which is a bust when applied to stocks (although modestly successful in currencies). If we were truly using the scientific method, we would either dump all the subjective market-lore tools or refine them in such a way that they could be scientifically tested. Even then, a hypothesis is only provisionally “proved”; future observation and tests can invalidate any theory, just as Newtonian physics gave way to Einstein. Besides, before a hypothesis can be considered proven, it has to pass tests of statistical inference so we know how much confidence to place in it.

Aronson is right to say most of what passes for established technical analysis is not founded on anything that can be proven using the scientific method. We need objective, replicable evidence that a technique captures non-random price movement, and does it reliably and for a return higher than the risk-free alternative. Aronson tested 400 phony rules of zero predictive power and, by running them through massive amounts of price data (the S&P 500 going back to 1928), still found a rule that returned 48 percent annually. He also tested 6,400 technical analysis rules on the S&P 500 and found that not a single one had statistical significance.
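Aronson's "phony rules" result is an instance of data-mining bias: test enough worthless rules and the best one will look impressive by luck alone. The simulation below is a minimal sketch of that idea, not a reconstruction of Aronson's actual tests — the price series is a synthetic random walk and each "rule" is a coin-flip signal with zero predictive power by construction.

```python
import random

random.seed(42)

# Synthetic random-walk "prices": daily returns with zero drift.
days = 2000
daily_returns = [random.gauss(0, 0.01) for _ in range(days)]

def random_rule_return(rets):
    """A worthless rule: go long on a coin flip, flat otherwise."""
    total = 0.0
    for r in rets:
        if random.random() < 0.5:  # pure-noise "signal"
            total += r
    return total

n_rules = 400
results = [random_rule_return(daily_returns) for _ in range(n_rules)]

best = max(results)
avg = sum(results) / n_rules
print(f"average rule return: {avg:+.2%}")
print(f"best rule return:    {best:+.2%}")
# The best of 400 worthless rules typically shows a sizable positive
# return purely by luck -- the data-mining bias Aronson warns about.
```

The average across all 400 rules hovers near zero, as it should; it is only the act of selecting the best in-sample performer that manufactures an apparent edge.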

That puts the euro gap and breakout into perspective. We do not actually know how many times an opening gap up fails or succeeds, or how many times a break over a previous high or low becomes a “false breakout.” Think of all the qualifying criteria you would have to define to test the ideas. To apply these two ideas willy-nilly with hard cash is not trading with a technical edge — it’s throwing dice.

Listening to chatter about the euro at the year-end changeover is a shaming experience in the context of real science, too. This is a simplified version of what actually happened, but we have no evidence that market players were using anything even close to a scientific approach. In fact, the baseless talk about a big trend change could have been concocted by one big player to lure the unwary into a long euro position, whereupon the player would hand the sucker his head.

Calling the story “historical evidence” is closer to superstition than anything else. With a sample size of seven, it’s certainly not “science.” You wouldn’t flip a coin seven times and upon getting heads two times, conclude that the probability of heads on the next flip is 28 percent. Similarly, you wouldn’t engage in the gambler’s fallacy of saying the coin is “due” for tails. The probability is 50 percent, although it could take thousands of actual flips for a 50-50 heads-tails ratio.
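The coin-flip analogy can be checked directly. The short sketch below (my illustration, not from the article) computes the exact binomial chance of an outcome like "2 heads in 7 flips" and then shows by simulation how slowly small samples approach the true 50-percent ratio.

```python
import random
from math import comb

# Exact binomial: chance of 2 or fewer heads in 7 fair flips.
p = sum(comb(7, k) for k in range(3)) * 0.5 ** 7
print(f"P(<=2 heads in 7 fair flips) = {p:.1%}")  # 29/128, about 22.7%

# Simulation: tiny samples stray far from 50-50; large ones converge.
random.seed(7)
for n in (7, 100, 10_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>6} flips: {heads / n:.1%} heads")
```

An outcome as lopsided as 2-of-7 happens more than a fifth of the time with a perfectly fair coin, which is exactly why seven years of euro data cannot support any probability estimate at all.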

Can we at least say that at the crossover from one year to the next we should expect a bigger-than-normal daily move? Probably not. Three of the seven years had an average daily move below normal, if we loosely (unscientifically) define normal as about 120-140 points. In fact, we should not draw any inference from the seven years of data. The sample is simply too small, and using too small a sample is the “crime of small numbers,” which leads to errors of judgment such as the gambler’s fallacy and another error Aronson describes as the “clustering illusion,” or the misperception of non-randomness in data that is, by objective standards, actually random.

We simply don’t know what we think we know. The study of how we process information and make decisions is cognitive psychology, and it’s full of landmines. A lot of what we think we know is simply wrong, and psychologists have spent much time figuring out how we come to believe things that are demonstrably invalid, or at least unproven. One cause is the overconfidence bias, where we think we are more knowledgeable than we really are, even when outcomes indicate we should be more modest. (Overconfidence does not seem to plague weather forecasters and horse-race handicappers, who get hard, unambiguous feedback every day.)

Another landmine is the confirmation bias, occurring when we accept evidence that confirms a previously held belief (“the euro will move up a lot on the first day of the trading year”) but ignore or underweight hard evidence that contradicts it (moves are not necessarily big and not necessarily upward).

“Anchoring” is another cognitive error. Anchoring is sticking to an early estimate even when later evidence indicates the estimate is wrong — that is, underweighting the new evidence. For example, if you ask, “How long is the Mississippi River?” and give respondents an anchor of 800 miles, they know it’s more than that, but slant their estimate off the 800 miles. If you provide a 5,000-mile anchor, they know it’s less but slant their estimate closer to 5,000.

This effect occurs all the time in currencies. Round numbers are psychologically important, as a Fed study in the late 1990s showed: round numbers, and clusters near round numbers, occur more often than chance alone would predict. This happens everywhere, but perhaps more in the yen than in most markets, where the anchors are not only round numbers but round numbers progressing by 10s (80, 90, 100, 110, 120, and so on).

Another instance is Fibonacci retracement levels. They may work only infrequently, but we still calculate them and pay attention when a price is nearing one of them. Price momentum often pauses near those levels — presumably because many people are waiting to see if the level is relevant. In mid-December, the euro stopped at the 38-percent retracement level and moved up, feeding the confirmation bias that the euro correction was ending and that the euro “should” rise over the previous highest high. The bounce off the Fibonacci level almost certainly had something to do with the apparent new-year breakout; it was consistent with the established anchor.
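For readers unfamiliar with the arithmetic, retracement levels are just fixed fractions of the prior swing measured back from its extreme. The sketch below uses hypothetical swing prices (not the actual 2006-2007 euro levels) to show the calculation.

```python
# Hypothetical swing values, NOT the actual euro prices from the article.
swing_low, swing_high = 1.2500, 1.3350  # an up-move now being corrected

span = swing_high - swing_low
ratios = (0.382, 0.500, 0.618)

# A correction of an up-move retraces downward from the swing high.
for r in ratios:
    level = swing_high - r * span
    print(f"{r:.1%} retracement: {level:.4f}")
```

Note that nothing in the calculation gives these fractions predictive force; they matter, to the extent they do, because enough traders watch the same levels.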

However, the euro failed to rally to the previous high, and since then it has retraced further to the 62-percent Fibonacci level. Even though we have no scientific evidence Fibonacci levels are a more likely stopping point than any other random level, we notice them all the same. We have to work hard to avoid this superstition.

Willingness to observe a silly factor based on an unproven story could be a form of herd behavior. If x number of forex commentators are talking about retracement levels, we might feel left out if we don’t observe them, too. This is similar to a physician treating a fever as the result of pneumonia, plus perhaps a little displeasure among the spirits: take penicillin and dance around a chicken bone three times.

Aronson points out that we need to change our way of thinking if we are to progress in technical analysis. Note that over the past 30 years, the application of technical analysis in the forex market has shifted from a procession of math-based indicators to subjective pattern-reading (as math-based indicators proved themselves not up to the job).

Subjective pattern-reading may allow for success based on talent, but it’s further removed from anything “scientific.” We need to become more ruthless in distinguishing when a story is appealing to our tendency toward cognitive error rather than to our genuine reasoning abilities.
