Analysing a backtest - 6 critical things to look for

A backtest is only the beginning of strategy development. Backtest results must be carefully validated to make sure they are realistic. Not every big profit shown in a backtest is realistic. There is a standard approach we have developed that validates whether or not a backtest can be trusted.

Start by getting the expected PNL – not the traded one

This is a simple validation. It uses the approach described in our post on how to calculate the statistically expected PNL from a backtest result. The linked article has the theory behind it, but in the end we assume the backtest is only one possible result, and this is a very fast method to check what we can really expect. Big discrepancies between the traded and the expected PNL (mostly coming from a small number of losses, which indicates a too small sample size or curve fitting) are a clear warning sign.

This step happens automatically in our framework because this analysis is right there on the backtest results page. Most retail packages are severely limited in their in-depth analysis capabilities, so the only way is to export all trades into, for example, Excel and work with the exported data.

Make sure all trades shown in the backtest are valid

Just because a backtest shows a profit or a loss does not mean anything, unless the result comes from intended trades and not from a program or data error. Checking this requires a manual approach: take a significant sample or two (for example two three-month periods) and make sure every trade entry and exit makes logical sense.

The main reason for this is that there simply may be a programming error. Errors happen; every programmer makes one here and there. If a strategy has an error, then the backtest result is either random or the programmer accidentally found a nice trading approach. Either way, this should be investigated. Making sure the trades you see are the trades you wanted to take is a critical step. In general we do not care about numbers unless we know the trades are correct. Technical validity is critical.

If possible, such a test should also include the manipulation of orders. When were target orders entered and modified? Did the trailing stop get moved according to the rules? It is easy to have a bug here. Another good test is to visualize the number of trades per day or per week on a calendar and check whether trades really happen when they should (for example on all trading days). Any holes should be investigated. Yes, programming errors sometimes put a strategy into a state where no further trades happen for the rest of the week. We had that once: a strategy that was very low on trades. It made roughly 2-3 per week, and we then found out that we rarely had a trade after Monday. It turned out the strategy had bad code and got stuck.
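The calendar check described above can be sketched in a few lines of Python. This is a minimal illustration, not our framework's implementation: the function names and the sample dates are hypothetical, and it assumes the exported trade list has been reduced to a list of entry dates. Exchange holidays are not handled, so they will show up as holes and have to be ruled out by hand.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def trades_per_weekday(entry_dates):
    """Count trade entries per weekday (Monday to Friday)."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in entry_dates)
    return {name: counts.get(name, 0) for name in WEEKDAYS[:5]}

def weekdays_without_trades(entry_dates, start, end):
    """Return every Monday-Friday date in [start, end] that has no trade entry."""
    traded = set(entry_dates)
    holes = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in traded:
            holes.append(day)
        day += timedelta(days=1)
    return holes

# Hypothetical sample: a strategy that only ever trades on Mondays.
entries = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 8), date(2024, 1, 15)]
print(trades_per_weekday(entries))
print(weekdays_without_trades(entries, date(2024, 1, 1), date(2024, 1, 5)))
```

A per-weekday count that is heavily skewed towards one day, as in this sample, is exactly the kind of pattern that exposed the stuck strategy mentioned above.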

Make sure your backtest follows your risk parameters

Day traders generally do not want to stay in a position between market sessions, because this causes trouble with their broker and with their clearing partner. An automatic check whether a position is still open at the market open is critical. A big profit from a position not closed on a Friday (which happened to have reduced trading hours the strategy was not aware of) is not a big profit. It is a total failure of the strategy and of risk control because of bad data.

Once the trade data has been loaded into Excel, or exported to a file, it is quite easy to check whether trades span sessions, holidays or weekends. Because we only do day trading, and because our infrastructure demands that trades are closed over the weekend (otherwise we could not parallelize our backtests and optimizations), closing a week with an open position is considered an error. In our case the infrastructure will throw an error if a strategy fails to close.
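For trades exported to a file, such a check might look like the following sketch. It assumes each trade is an (entry, exit) pair of datetimes and, as a simplification, treats one calendar day as one session. Markets with overnight sessions need a real session calendar instead. The function names and sample trades are hypothetical.

```python
from datetime import datetime

def find_overnight_trades(trades):
    """Return trades whose exit is on a different calendar day than the entry."""
    return [(entry, exit_) for entry, exit_ in trades
            if entry.date() != exit_.date()]

def find_weekend_trades(trades):
    """Return trades held over a weekend.

    A trade is flagged when entry and exit fall in different ISO weeks,
    or when the exit itself is on a Saturday or Sunday.
    """
    flagged = []
    for entry, exit_ in trades:
        different_week = entry.isocalendar()[:2] != exit_.isocalendar()[:2]
        if different_week or exit_.weekday() >= 5:
            flagged.append((entry, exit_))
    return flagged

# Hypothetical sample trades.
trades = [
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 15, 30)),   # clean day trade
    (datetime(2024, 1, 3, 15, 0), datetime(2024, 1, 4, 10, 0)),    # held overnight
    (datetime(2024, 1, 5, 15, 50), datetime(2024, 1, 8, 9, 35)),   # held over the weekend
]
print(len(find_overnight_trades(trades)))  # overnight and weekend trades
print(len(find_weekend_trades(trades)))    # weekend trades only
```

For a pure day trading strategy both lists should be empty. Any entry in them is a risk control failure, no matter how profitable the trade was.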

One should never rely on automatic closing systems such as the one NinjaTrader provides, for three reasons. First, at least in the current version they are rather crude: they always close at the same time, while the markets occasionally have a shorter trading session, and the result is a trade that is not closed. Second, they do a market exit. A strategy can do better by making a limit exit. Third, only a calendar-aware strategy will avoid opening a position a few minutes before the market close.

Investigate all the biggest winners and losers in a backtest

The next test is to check the 5% biggest winners and the 5% biggest losers. Make sure they are legitimate trades and not artificial outliers. Put a special focus on losing trades that lose more than the allowed maximum risk. Yes, market jumps DO happen, but make sure one really did and that the loss was not just caused by a bad program, a failing stop and so on.
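Picking these trades out of an exported list is straightforward. The sketch below is only an illustration with hypothetical function names and sample numbers. It assumes the trades have been reduced to a list of net PNL values and that the maximum risk per trade is known as a positive amount.

```python
def extreme_trades(pnls, percent=5):
    """Return (biggest_winners, biggest_losers), each `percent` % of all trades.

    The count is rounded up and is never less than one trade.
    """
    n = max(1, (len(pnls) * percent + 99) // 100)  # integer ceiling
    ordered = sorted(pnls)
    losers = ordered[:n]            # most negative first
    winners = ordered[-n:][::-1]    # most positive first
    return winners, losers

def risk_breaches(pnls, max_risk):
    """Return all losses larger than the allowed maximum risk per trade."""
    return [p for p in pnls if p < -max_risk]

# Hypothetical sample: ten trades, allowed risk of 50 USD per trade.
pnls = [50, -20, 30, -150, 10, 400, -25, 15, -30, 20]
winners, losers = extreme_trades(pnls)
print(winners, losers)
print(risk_breaches(pnls, max_risk=50))
```

The code only selects the candidates. Each selected trade still has to be checked by hand against the chart and the order log.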

Think about the win to cost ratio of the backtest

The next thing is to look at the strategy efficiency. It is quite easy to make a profitable strategy – but you need a solid profit. If the net profit per trade is very low – especially compared to the costs – then you have a very unstable result. This means that the slightest variation in cost – or a small inaccuracy in the strategy – will be deadly and turn a profit into a loss.

As an example, a strategy that trades for a 1 tick profit in ES has a high cost ratio. You can get a round trip below 4 USD, so we assume 5 USD per trade (which also covers the occasional small slip). Sadly, a 12.5 USD profit with a 5 USD cost does not leave a lot of profit, and if you occasionally get out at a loss or a scratch (a trade with 0 gross profit, where exit and entry are at the same price) the costs are still there. This means that even if this strategy shows a profit, it is a VERY unstable result that is unlikely to be reproducible. Not only does it need a terrifically high success rate to make a total net profit, it also relies heavily on the cost not going up and on the simulation being extremely accurate.
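To put a number on "terrifically high", here is a small worked calculation. The tick value and the cost come from the example above. The 1 tick stop loss is an assumption made only for this illustration, and the function names are hypothetical.

```python
TICK_VALUE_USD = 12.50      # one ES tick, as in the example above
COST_PER_TRADE_USD = 5.00   # assumed round trip cost, as in the example above

def net_result(ticks, tick_value=TICK_VALUE_USD, cost=COST_PER_TRADE_USD):
    """Net PNL of one trade that gains (or loses) `ticks` ticks gross."""
    return ticks * tick_value - cost

def break_even_win_rate(win_ticks, loss_ticks):
    """Win rate at which the expected net PNL per trade is exactly zero."""
    win = net_result(win_ticks)        # net gain of a winning trade
    loss = -net_result(-loss_ticks)    # net loss of a losing trade, as a positive number
    return loss / (win + loss)

print(net_result(1))               # winner: 12.5 gross leaves 7.5 net
print(net_result(0))               # scratch: still costs 5.0
print(net_result(-1))              # loser (assumed 1 tick stop): -17.5
print(break_even_win_rate(1, 1))   # 0.7
```

Under these assumptions the strategy needs 70% winners just to break even, before any scratch trades or additional slippage are taken into account.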

Run the same backtest with 1 additional tick and then with 2 additional ticks of slippage and see how the numbers change. A solid strategy should survive this. A strategy that makes below 2 ticks net profit per trade may be suited for HFT, but it is extremely hard to backtest correctly.
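If rerunning the backtest is expensive, a first approximation can be made directly on the exported trades. The sketch below assumes a list of gross PNL values per trade and charges the extra slippage once per round trip. That is a simplification, since a real rerun may also change which orders get filled. The function name and the sample numbers are hypothetical.

```python
def slippage_stress_test(gross_pnls, cost_per_trade, tick_value, extra_ticks=(0, 1, 2)):
    """Return {extra ticks: total net PNL} with extra slippage charged on every trade."""
    results = {}
    for ticks in extra_ticks:
        per_trade_cost = cost_per_trade + ticks * tick_value
        results[ticks] = sum(pnl - per_trade_cost for pnl in gross_pnls)
    return results

# Hypothetical gross PNL per trade in USD, with ES tick value and 5 USD cost.
gross = [50.0, -25.0, 37.5, 12.5, -12.5, 62.5]
print(slippage_stress_test(gross, cost_per_trade=5.0, tick_value=12.5))
# {0: 95.0, 1: 20.0, 2: -55.0}
```

In this sample one extra tick removes most of the profit and two extra ticks turn it into a loss, which is the fragile behaviour the test is meant to expose.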

Consider the stability of the trades in the backtest

As a last fast check, let us have a look at the profit stability. A good strategy will be consistent. Real consistency is very hard to achieve and is mostly a characteristic of scalping or very short term trading, though you can also achieve it in stocks under certain conditions (which mostly come down to trading the strategy on many stocks, but not all of them at the same time). You want consistent profits. The Sharpe ratio is a good number to use here (put in 0 as the risk-free return, and a lot of the formula can be simplified). Alternatively, take the average profit per period (for example a week) and divide it by the standard deviation of the profit per period. This is an easier formula and gives a similar indication. Graphically, a good strategy will have a PNL curve in the backtest that runs from the lower left to the upper right, as straight as possible. Losses, which are an integral part of trading, will be controlled and evenly distributed, not out of hand with big outliers and not clustered together.
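The simpler ratio described above can be computed as in this sketch. It assumes trades are exported as (exit date, net PNL) pairs, groups them by ISO week and uses the sample standard deviation. Weeks without any trades are left out rather than counted as zero, which is a choice made for this illustration. The function names and sample trades are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

def weekly_pnl(trades):
    """Sum the PNL per ISO week; `trades` is an iterable of (exit_date, pnl)."""
    totals = defaultdict(float)
    for exit_date, pnl in trades:
        year, week, _ = exit_date.isocalendar()
        totals[(year, week)] += pnl
    return [totals[key] for key in sorted(totals)]

def stability_ratio(period_pnls):
    """Average profit per period divided by its sample standard deviation.

    Needs at least two periods with differing results.
    """
    return mean(period_pnls) / stdev(period_pnls)

# Hypothetical sample trades over three weeks.
trades = [
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 4), 50.0),
    (date(2024, 1, 9), 100.0),
    (date(2024, 1, 16), 200.0),
]
weeks = weekly_pnl(trades)
print(weeks)                   # [150.0, 100.0, 200.0]
print(stability_ratio(weeks))  # 3.0
```

The higher the ratio, the more consistent the weekly result. With the risk-free return set to 0, the Sharpe ratio has the same mean over standard deviation form, applied to returns instead of PNL.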

Only a validated backtest can be trusted

Strategy development is programming. A backtest is a trial run. Programming means a lot of testing. This is simply what programmers do when they work on complex decision making systems. An inexperienced strategy developer often focuses on the numbers in the backtest but totally ignores whether they are correct to start with, although checking that is the first step.

After that, once it is validated that the backtest actually trades what it should, the numbers must be looked at. Total profit is meaningless – unless it is a stable profit that also makes sense from the internal distribution of profits, losses and costs.

All this is not curve fitting or over optimization – it is analysing a backtest first for technical validity, then for trading common sense. Do not forget these steps – they can save you a lot of money.