The Usefulness and Uselessness of Backtests

ASYMMETRY® Glossary

Backtesting — simulating how a trading strategy would have performed on historical data — is both the most important tool in systematic investment research and one of the most commonly misused. Used correctly, backtests provide essential evidence about the potential viability of a strategy and the range of historical outcomes an investor might experience. Used incorrectly — as a marketing tool, a data-mining exercise, or a false guarantee of future performance — backtests mislead investors and create dangerously misplaced confidence.

Where Backtests Are Useful

Backtests provide genuine value in several ways. They allow researchers to test whether a hypothesized return driver has historical support: does the signal actually precede favorable returns in the data, or not? They reveal the historical profile of outcomes: maximum drawdown, Sharpe ratio, win rate, and the typical period of underperformance before the strategy recovers. They provide a baseline against which live performance can be compared, making it possible to spot live results that diverge suspiciously from the historical distribution. And they allow researchers to check whether risk management rules would have meaningfully reduced drawdowns in historical adverse scenarios.
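As a rough illustration of the summary statistics named above, the sketch below computes them from a list of per-period returns. The function names, the 252-periods-per-year annualization, and the zero default risk-free rate are assumptions made for the example, not a prescribed methodology.

```python
import math
import statistics


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (positive fraction)."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def win_rate(returns):
    """Fraction of periods with a strictly positive return."""
    return sum(1 for r in returns if r > 0) / len(returns)


def longest_underwater(returns):
    """Longest run of consecutive periods spent below a prior equity peak."""
    equity = peak = 1.0
    run = longest = 0
    for r in returns:
        equity *= 1.0 + r
        if equity >= peak:
            peak = equity
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest
```

Computing the same statistics on live results gives a like-for-like comparison against the backtest baseline.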

Where Backtests Are Useless or Misleading

Backtests are useless or actively misleading when they are the product of data mining: testing hundreds of parameter combinations and presenting only the best-performing result without accounting for the multiple comparisons problem. Backtests conducted on the same data used to develop the strategy largely reflect in-sample overfitting rather than genuine predictive power. Backtests that ignore realistic transaction costs, slippage, and market impact overstate achievable returns. And backtests based on data from a single market regime may fail completely in a different regime.
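The data-mining problem can be made concrete with a small simulation. The sketch below generates many "strategy variants" whose returns are pure noise with zero expected return, picks the one with the best in-sample Sharpe ratio, and then scores that same variant on fresh noise. All names and parameter values (200 variants, 252 days, 1% daily volatility) are illustrative assumptions.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized mean return over sample standard deviation (risk-free rate taken as zero)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def best_of_n_noise(n_variants=200, n_days=252, daily_vol=0.01, seed=0):
    """Select the best in-sample Sharpe among zero-edge variants.

    Returns (in_sample_sharpe, out_of_sample_sharpe) for the selected variant.
    Every variant is Gaussian noise with mean zero, so none has any real edge.
    """
    rng = random.Random(seed)
    best_in = float("-inf")
    best_out = float("nan")
    for _ in range(n_variants):
        in_sample = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        score = annualized_sharpe(in_sample)
        if score > best_in:
            best_in = score
            best_out = annualized_sharpe(out_sample)
    return best_in, best_out


if __name__ == "__main__":
    in_sharpe, out_sharpe = best_of_n_noise()
    print(f"best in-sample Sharpe: {in_sharpe:.2f}")
    print(f"same variant out-of-sample: {out_sharpe:.2f}")
```

With hundreds of variants, the selected in-sample Sharpe ratio will typically look attractive even though no variant has an edge, which is why the number of trials needs to be reported alongside the winning result.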

The Gold Standard: Out-of-Sample Testing

The most meaningful form of backtesting uses true out-of-sample data — either genuine walk-forward analysis (testing each period on data not seen by the model during development) or evaluation on a completely separate market or dataset. Strategies that perform consistently in multiple out-of-sample tests across different markets and time periods have a significantly stronger claim to genuine predictive power than those whose results are based entirely on in-sample optimization.
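A walk-forward scheme can be sketched as a generator of index windows, where each test window lies strictly after the window used to fit the strategy. This is a minimal rolling-window version; the function name and the choice to advance by one test window per step are assumptions for the example, and expanding-window variants are equally common.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges over n_obs observations.

    Each test range starts immediately after its training range, and the
    window rolls forward by test_size so test ranges never overlap.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_obs=10, train_size=4, test_size=2):
        print(list(train), "->", list(test))
```

In use, parameters would be chosen on each training range only, the strategy evaluated on the matching test range, and the test-range results stitched together to form the out-of-sample record.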