Our Infrastructure in 2014-Q4 - Software
Our own journey into automated trading started some years ago, and at that time we did what many other people do – except that instead of being a lone developer, we started with a company and a team behind us. Trying to save costs, we decided to work with NinjaTrader. Our lost year in productivity, as we call it now, culminated in a month in which we finished only 3 optimizations – with 4 computers – thanks to a good amount of crappy code (from NinjaTrader) and really bad luck (system instability due to heat as well as repeated power outages).
In the end, we decided to build our own infrastructure, and after some not too productive prototyping we teamed up with other traders who already had a working prototype – one that still forms the foundation of our trading operation, even though most of the code has by now been replaced with better, more scalable code. This framework, called “Reflexo”, is not a product, is not available on the market, and likely never will be (it is a lot of work to turn something used in-house into a product). Still, we often get questions about our infrastructure, and there are a lot of discussions in which more or less informed people talk about what a good setup looks like, so a description of our own codebase and approach may be helpful to anyone considering developing their own infrastructure. It may well serve as a reference, or at least a starting point for planning.
Let us start by saying that our company is deeply invested in .NET. We develop commercial code in it for customers – not related to trading – and as such, C# was a natural first language choice for us. We add some C++ for time critical elements – mostly where we interface with third party code that does not provide a sensible .NET interface, like Nanex, our main data provider (which has a C based API that is a lot easier to work with from C++, hence the use of C++/CLI to expose the data to the .NET infrastructure).
We decided quite early on SQL Server based data storage and a distributed architecture, and we strictly split development and production onto separate servers. Using a proper SQL installation (instead of an embedded system) has maintenance requirements, but if you later look at the hardware and data sizes you will realize we blow any “I run it on my workstation” setup simply out of the water. Being SQL based means we can query our data from multiple computers, and that is a big advantage. Generally we keep all data in SQL, with the exception of historical price information, which we put into binary, delta coded files. Binary in the sense that they are not text, so not easily readable, and delta coded in that we often store only deltas, not full values – so they have to be read from beginning to end. Still, prices mostly do not change a lot, so delta coded we can say “new Ask, size 30, at 2 ticks higher than the last item” and use only 1 byte for the tick. And size IS critical here – we run a lot of files on a lot of computers, so we often deliver many gigabits of historical data to our back testing computers (which naturally implies a 10 gigabit backbone – yes, expensive, but not the most expensive part).
Most analysis and presentation happens web based. While we do have some visualization tools locally, the goal is that anything interesting outside of direct development is web based. Not web based as in “on the internet”, but web based as in “you open it in a browser”. The ability to send hyperlinks around between members of the team is a time saver. I often get links from one of our developers to look at a particular back test result – he can just copy/paste the URL from his web browser.
This is our job control, where the back testing and optimization jobs are scheduled. Yes, it is basic – and not even dynamic – but it lets us easily see how things are going. Every job is split into tasks (one week of testing, a maximum of 128 parameter pairs each), and as you can see there are quite some jobs in the pipeline and more than 450,000 tasks pending. I have cut off the finished tables because they contain confidential information – namely the names of the strategies. We currently work with 188 cores (of which 161 had work when the picture was taken – the others were likely fetching work) and complete 176 tasks per minute (the velocity). At the current speed there is work for 1 day and nearly 19 hours. There are 45 waiting jobs – typical for a weekend, when developers are just queueing work for the coming days into the system.
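The task split lends itself to a back-of-the-envelope formula. A hypothetical C++ sketch – the one-week slice and the 128-pair limit come from the description above, everything else (names, rounding choices) is assumption:

```cpp
#include <cassert>

// Hypothetical task-count estimate for a back testing job: each task covers
// at most one week of test data and at most 128 parameter pairs, so a job
// splits into (weeks, rounded up) x (parameter pairs / 128, rounded up) tasks.
constexpr int kMaxPairsPerTask = 128;

int taskCount(int testDays, int paramPairs) {
    int weekSlices = (testDays + 6) / 7;                         // whole weeks, rounded up
    int pairSlices = (paramPairs + kMaxPairsPerTask - 1) / kMaxPairsPerTask;
    return weekSlices * pairSlices;
}
```

Numbers like the 450,000 pending tasks accumulate quickly under this scheme: a single job covering a year (52 weeks) with 1,000 parameter pairs already comes to 52 × 8 = 416 tasks.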
Analysis is then done web based. Here is the current version of our results overview, originally inspired by NinjaTrader. We are constantly working on improving our analysis part – more on that will come in another post.
There are a number of similar pages that provide standardized analysis down to individual trade information and we are always adding new information when we come up with good ideas that can be standardized.
In contrast to end user tools like NinjaTrader, which aim to be as easy to learn as possible, we aim to be as productive as possible. Quite often this means saying “thanks, no thanks” to a nice user interface and using a PowerShell based command line interface instead. Not as nice? Well, it also means we can schedule a strategy for 20 instruments in one command (the list of instruments we use is hardcoded), we can filter results, and we can repeat back tests and optimizations without dealing with slow and clumsy windows.
For example, the following command lines are used to export data from our Nanex based historical archives into our own tick format:
$cluster = "grid-x"
$symbols = @( <# ..symbol list.. #> )
$source = "\\..server..\..fileshare..\NxCoreData.Archive"
$target = "\\..server..\..fileshare.."
Start-RxNanexTickImport -Cluster $cluster -FirstDate 2014-08-03 -LastDate 2014-08-30 -Tape 'MF' -Symbols $symbols -SourceArchive $source -TargetArchive $target -Confirm
I keep this command line ready (obviously with the real server names and file shares) in a text file on my desktop, and when I need more data or a re-export I just adjust it and copy/paste it into a Reflexo Shell. No need to search for all the options in a nice but clunky window. We do everything on the command line – including scheduling backtests. Which means that when we get new data, we just run one command line to add this data for all of the more than 50 strategies that we have trading or ready for trading. It is a LOT faster than working with a graphical user interface.
Our system is split into 2 different areas for the obvious licensing issues – strategy development is not production use and allows us to use developer licenses from Microsoft (as per their definition). These are seriously cheap at the highest end (around USD 50 for a SQL Server Developer Edition, per developer, which would normally be a more than USD 100,000 enterprise license), while actual trading is productive use and runs on the much less costly (free) Express edition – which is all that is needed, as actual trading deals with a lot less data than the development side. The same, by the way, applies to hardware – in the end, the requirements for running trading logic are magnitudes smaller than during development. A rented physical server of small to medium size (nothing too small) is totally suitable, preferably with an SSD (for the database). Our trading agent again is web based – we do automated trading only, and the web is suitable enough for controlling strategies (starting, stopping, watching positions).
All this naturally runs on significant hardware, which we will explain in the second part of this information update.