Backtesting on a GPU - performance and a lot of problems

The GPU (Graphics Processing Unit) is, in terms of raw parallel throughput, without doubt the most powerful processing unit you can put into a computer. It would be wonderful to use it for backtesting, especially if you are willing to invest the funds in a high-end GPU, and in a computer that can house several of them. Putting four high-end GPUs into one case is no problem in a specialized (gaming-optimized) computer. Such setups are standard for a small number of enthusiast gamers.

In number of cores – the GPU wins.

The power of a GPU does not come from the speed of its individual processing cores. Far from it. The cores on a GPU are not particularly powerful. They can do many specialized operations, though, similar to the SIMD instructions on modern processors, such as multiplying a whole block of values in one operation. The main advantage of a GPU is that it has a very large number of cores. Think of the new Intel E5 Xeon processor that is coming out these days: 18 cores for around 4700 USD. With hyperthreading that is 36 hardware threads. It sounds impressive as a number, and it is, for a general-purpose processor. But a single modern high-end graphics card (like the AMD R9 290X, available for around 650 USD at the moment) has 2816 stream processors. Even if each of them delivered only 10% of the performance of a CPU core, this would destroy the CPU in terms of raw processing performance. It comes combined with 4 GB of high-end RAM that, while not as fast as the caches in the processor, offers far more bandwidth than the main memory of even a high-end server.
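To put rough numbers on this comparison, here is a small back-of-the-envelope sketch. It only uses the figures quoted above; the "10% of a CPU core" factor is the deliberately pessimistic guess from the text, not a measurement, and counting every hyperthread as a full core is generous to the CPU.

```python
# Figures quoted in the text (2014 hardware and prices).
CPU_THREADS = 36                 # 18-core Intel E5 Xeon with hyperthreading
CPU_PRICE_USD = 4700
GPU_UNITS = 2816                 # AMD R9 290X stream processors
GPU_PRICE_USD = 650
GPU_UNIT_RELATIVE_SPEED = 0.10   # assumption from the text: 10% of a CPU core


def cpu_core_equivalents(units, relative_speed):
    """Convert slow GPU units into an equivalent number of CPU cores."""
    return units * relative_speed


gpu_equiv = cpu_core_equivalents(GPU_UNITS, GPU_UNIT_RELATIVE_SPEED)

# Raw throughput ratio, treating each hyperthread as a full core.
ratio_vs_threads = gpu_equiv / CPU_THREADS

# The same comparison per dollar spent.
gpu_per_dollar = gpu_equiv / GPU_PRICE_USD
cpu_per_dollar = CPU_THREADS / CPU_PRICE_USD
per_dollar_ratio = gpu_per_dollar / cpu_per_dollar

print(f"GPU core equivalents: {gpu_equiv:.1f}")
print(f"GPU vs CPU throughput: {ratio_vs_threads:.1f}x")
print(f"GPU vs CPU throughput per USD: {per_dollar_ratio:.1f}x")
```

Under these assumptions the card is worth roughly 280 CPU cores, about eight times the Xeon, and the gap per dollar is far larger. This only holds for work that actually parallelizes across all units.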

In flexibility – the GPU is equal

Programming such a GPU for mathematical calculations is possible. For some time now GPUs have been evolving into quite versatile processing units, supported by standards like OpenCL (Open Computing Language). While it requires special programming, a GPU is quite capable of handling the core tasks of calculating indicators and running an exchange simulator. It may not have been designed for this, but a GPU can run a good backtest. This is 2014, and there are many simulations running on GPUs. I know quite a few traders who use a GPU-driven architecture to calculate risk management parameters, for example.
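To illustrate why indicator calculation fits this model, here is a minimal sketch of the data-parallel style that OpenCL expects. It is plain Python standing in for a kernel, not real OpenCL code, and the names `sma_kernel` and `run_kernel` are made up for this example. The point is that every output value depends only on the input array and its own index, so each index could be handed to a separate GPU unit.

```python
def sma_kernel(prices, window, i):
    """Body of one work-item: simple moving average ending at index i.

    It reads only the shared input array and its own index, and writes
    only its own result, so no work-item depends on another.
    """
    if i + 1 < window:
        return None  # not enough history yet
    total = 0.0
    for k in range(i - window + 1, i + 1):
        total += prices[k]
    return total / window


def run_kernel(prices, window):
    """Stand-in for the kernel launch.

    On a GPU all indices would run concurrently; here a sequential
    loop produces the same result.
    """
    return [sma_kernel(prices, window, i) for i in range(len(prices))]


closes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(run_kernel(closes, 3))
```

This version recomputes the whole window for every index, which is wasteful on a CPU but typical for a first GPU port, where independence between outputs matters more than avoiding repeated additions.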

The problem is – the architecture

The main problem when using a GPU for backtesting is architecture. To benefit from the GPU, all the code in the backtest loop must be written to run on the GPU. Moving information into and out of the graphics card has a high latency, so processing must happen in batches. This means that one has to write an exchange simulator that runs on the GPU, followed by the complete framework code around it, followed by the strategy itself in a compatible language. And then one has to write a second framework for trading real money, and write the strategy again, this time not for backtesting on the GPU but for live trading. As a result, the backtest runs on a different system than the actual trading. Writing the strategy twice may introduce bugs. The alternative is to define and write a meta-language that compiles to, for example, both OpenCL source code and .NET bytecode, and that compiler has to be free of bugs as well.
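The batching argument can be made concrete with a very simple cost model: every trip to the card pays a fixed transfer latency, and every tick pays a small compute cost. The function name and both timing values below are invented placeholders for illustration, not measurements of any real card.

```python
def total_time_us(n_ticks, batch_size, transfer_latency_us, compute_per_tick_us):
    """Total processing time in microseconds under a fixed-latency model."""
    batches = -(-n_ticks // batch_size)  # ceiling division
    return batches * transfer_latency_us + n_ticks * compute_per_tick_us


N_TICKS = 1_000_000
LATENCY_US = 20.0    # hypothetical round trip to the card and back
COMPUTE_US = 0.01    # hypothetical GPU compute time per tick

# One transfer per tick: the latency is paid a million times.
tick_by_tick = total_time_us(N_TICKS, 1, LATENCY_US, COMPUTE_US)

# Batches of 100,000 ticks: the latency is paid ten times.
batched = total_time_us(N_TICKS, 100_000, LATENCY_US, COMPUTE_US)

print(f"tick by tick: {tick_by_tick / 1e6:.2f} s")
print(f"batched:      {batched / 1e6:.4f} s")
```

With these placeholder values the tick-by-tick run is dominated almost entirely by transfer latency. That is why the simulator and the strategy both have to live on the card: if either one stays on the CPU, every tick crosses the bus.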

It is not feasible to run the trading itself on the GPU. In HFT, the delay of moving every tick to the GPU and retrieving the result is bad for performance, and outside HFT the cost of hosting a GPU is significant. Lower-end servers that are not custom built cannot handle a GPU, and so cannot use one for trading. They simply lack the slot and the power connections for a capable graphics card.

AMD is trying to solve this problem with its APUs. These special processors integrate some (not very many) GPU units directly into the CPU. This is a nice approach and may be feasible. One can then run the backtest on a GPU while the trading happens on the APU. An APU is optimized for access speed in exactly these scenarios. Right now APUs only offer low-powered graphics units (compared to a “real” GPU) in end-user chips, but these are good enough for trading. More information about the AMD APU approach from a programmer’s perspective can be found in this PDF from AMD.

Development Tools – the GPU loses

NetTecture is an IT consulting company (yes, you can hire us). We maintain our own framework, the Reflexo Trading Framework, written mostly in .NET. This means we use Visual Studio, which as a development environment is generations ahead of any GPU-based toolset in terms of functionality. The latest versions even integrate GPU profiling. But the GPU part is still a toy compared to the functionality available on the “more native” development side, whether C++ or .NET bytecode. The importance of professional development tools cannot be overstated, especially when you are talking not only about a strategy but about moving core code (the exchange simulator) onto the GPU. Time, as the old saying goes, is money. Running a backtest on the GPU will save a lot of time in backtesting and optimization, but it creates an additional burden for the developers, and time is lost there. Yes, this may change over time, and an AMD APU would work nicely for a combined approach once it is available in a more powerful version. But today, here on planet Earth, the tooling support for a GPU is not on the same level.

Simulation Packages do it – but are they proper backtests?

The famous MATLAB environment can run simulations on the GPU. Anyone interested can read up on how it integrates the GPU, for example at this page.

But there is a big difference between running a complete exchange simulation with a tick-by-tick, event-driven backtest and running a simpler, formula-based simulation, for example a bar simulator. MATLAB may be an alternative for those inclined to use it for their backtesting, but unless the trading is also done in MATLAB, this means implementing the strategy a second time on another platform for real trading. Which goes back to development costs.
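A small sketch shows one concrete case of that difference. The function names and prices are made up for illustration, and the position is assumed to be long with a stop and a profit target. Two tick sequences produce the identical bar (same open, high, low and close), yet exit the position differently. A simulator that only sees the bar cannot tell which one happened.

```python
def exit_from_ticks(ticks, stop, target):
    """Long position: walk the ticks in order; the first level touched wins."""
    for price in ticks:
        if price <= stop:
            return "stop"
        if price >= target:
            return "target"
    return "open"


def exit_from_bar(bar_low, bar_high, stop, target):
    """Same question, but answered from the bar's low and high only."""
    hit_stop = bar_low <= stop
    hit_target = bar_high >= target
    if hit_stop and hit_target:
        return "ambiguous"  # the order of the two events is lost in the bar
    if hit_stop:
        return "stop"
    if hit_target:
        return "target"
    return "open"


STOP, TARGET = 99.0, 102.0
ticks_a = [100.0, 98.0, 103.0, 101.0]   # dips first, then rallies
ticks_b = [100.0, 103.0, 98.0, 101.0]   # rallies first, then dips

print(exit_from_ticks(ticks_a, STOP, TARGET))
print(exit_from_ticks(ticks_b, STOP, TARGET))
print(exit_from_bar(min(ticks_a), max(ticks_a), STOP, TARGET))
```

A bar simulator has to resolve the ambiguous case with a rule of thumb, and whichever rule it picks will be wrong for one of the two sequences.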

At the end – the GPU loses to development costs

GPU programming is a rare skill. With the exception of the AMD APU approach, a GPU is not feasible for live trading, especially in latency-sensitive areas. What remains is a selection of bad choices. If I want to run a backtest on a GPU, I have to either create my own language (to generate OpenCL or native code), accept higher latencies, or limit my chip selection. I then have to program the whole backtest container and all indicator code on the GPU, with inferior development tools. This takes more time and is a lot harder to debug. The alternative is to program every strategy twice, with all the possible errors this brings, and both versions must be maintained, because strategies keep being developed while they are in use. Running a backtest on the GPU means making a lot of bad compromises. In the end, it is easier to just throw good old general-purpose processors at the backtest, program a grid approach, and have a lot fewer problems.
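As a rough illustration of the CPU alternative, here is a single-machine stand-in for a grid: one independent backtest per parameter combination, spread over worker processes. This is a sketch only. `backtest_one` is a hypothetical placeholder with a toy score so the example runs; a real version would replay ticks through the same strategy code that trades live, which is the whole point of staying on the CPU.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def backtest_one(params):
    """Hypothetical stand-in for one full backtest run.

    Returns (params, score). The toy score peaks at fast=10, slow=50.
    """
    fast, slow = params
    score = -abs(fast - 10) - abs(slow - 50)
    return params, score


def sweep(grid, workers=1):
    """Run every parameter set and return the best (params, score) pair."""
    if workers == 1:
        results = list(map(backtest_one, grid))
    else:
        # Each parameter set is independent, so the runs share nothing.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(backtest_one, grid))
    return max(results, key=lambda r: r[1])


# Hypothetical parameter grid: fast and slow moving average lengths.
grid = list(product(range(5, 20, 5), range(30, 80, 20)))
best = sweep(grid)
print(best)
```

Passing `workers=4` uses four processes instead of one; on platforms that spawn worker processes, that call should sit under an `if __name__ == "__main__":` guard. Scaling from one machine to several is then a question of distributing the grid, not of rewriting the strategy.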

And this is why very few people use a GPU to run a backtest.