Trading algo: how to interpret results


Hi guys,

The "getting started with futures" lesson has a complete implementation of a pair trading algo. I'm attaching the algo here for convenience.

Let me give you some context in case you don't remember this specific example.

The idea is that some commodities' prices should be related. In this case they picked crude oil (CL) and gasoline (XB). If they are indeed related then we expect that a price difference should mean revert: if the short moving average is larger than the long moving average we should short the difference, and vice-versa.

To implement the above idea they define variables short_ma = 5 and long_ma = 65 and then compute a zscore. When the zscore is larger (or smaller) than 1.0 (or -1.0) that's a signal to short (or long) the difference between the CL and XB futures. They also define exit signals, which are triggered in the opposite direction when the zscore is 0.0.
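The entry and exit rules described above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the lesson's actual code: the function name `next_position`, the -1/0/+1 encoding of the spread position, and the sample z-scores are all made up for the example.

```python
def next_position(zscore, position, entry=1.0, exit_level=0.0):
    """Return the target spread position: -1 (short), +1 (long) or 0 (flat)."""
    # Exit: close a short once the z-score has fallen back to exit_level,
    # and close a long once it has risen back to exit_level.
    if position == -1 and zscore <= exit_level:
        position = 0
    elif position == 1 and zscore >= exit_level:
        position = 0
    # Entry: only open a new position when flat.
    if position == 0:
        if zscore > entry:
            position = -1   # spread looks rich: short it
        elif zscore < -entry:
            position = 1    # spread looks cheap: go long
    return position


# Made-up z-score path, purely to exercise the rules.
zscores = [0.2, 1.3, 0.6, -0.1, -1.4, -0.5, 0.1]
position = 0
path = []
for z in zscores:
    position = next_position(z, position)
    path.append(position)
print(path)  # [0, -1, -1, 0, 1, 1, 0]
```

Keeping the rules in one pure function makes it easy to vary `entry` or `exit_level` separately from `short_ma` when checking how sensitive the results are.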

Now let's discuss results and how to interpret them.

Let's pick a time period to discuss specific numbers: from 2016-01-01 to 2017-09-29.

If you run the algo on that time period as is, you get total return 34.27% and sharpe ratio 2.16.

Let's play a bit with the entry signal.

If you set short_ma = 2 you get total return -7.71% and sharpe -0.47.
If you set short_ma = 3 you get total return 6.02% and sharpe 0.46.
If you set short_ma = 4 you get total return 20.5% and sharpe 1.45.
If you set short_ma = 6 you get total return 7.44% and sharpe 0.56.
If you set short_ma = 7 you get total return 16.92% and sharpe 1.20.
If you set short_ma = 8 you get total return 2.52% and sharpe 0.22.

Now, I understand that there are other things we can play with besides short_ma. We could change long_ma, we could try a different signal than the zscore, or we could change the exit signal, etc. There are many different ways to implement a specific idea.

But how to optimise isn't the point of this post. What I'd like to discuss is: How much optimisation is too much optimisation? Here for example the idea is sound, but the results seem way too sensitive to tiny changes.

This is important if you want to go and find other viable pairs. I've tried dozens of other highly correlated pairs and couldn't get good results with short_ma = 5. Is it that for some pairs the idea doesn't work at all (even if the pair is correlated), or could it be that short_ma = 6 or 7 would yield better results? And if it does, how can you detect if a good result for a specific free parameter is a fluke? If you try enough signals with enough assets, it will happen sooner or later.
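To put a rough number on the "sooner or later" worry, here is a minimal simulation sketch. It assumes daily returns that are pure noise (zero mean, 1% daily volatility) over roughly 440 trading days, which is about the length of the period above; none of these numbers come from the actual algo, and `chance_sharpes` is a made-up helper name.

```python
import numpy as np


def chance_sharpes(n_trials, n_days=440, seed=0):
    """Annualized Sharpe ratios of n_trials strategies that have no real edge."""
    rng = np.random.default_rng(seed)
    # Zero-mean daily returns: any positive Sharpe here is luck by construction.
    returns = rng.normal(loc=0.0, scale=0.01, size=(n_trials, n_days))
    daily_mean = returns.mean(axis=1)
    daily_std = returns.std(axis=1, ddof=1)
    return np.sqrt(252) * daily_mean / daily_std


sharpes = chance_sharpes(n_trials=200)
print("first variant:", round(float(sharpes[0]), 2))
print("best of 200  :", round(float(sharpes.max()), 2))
```

With 200 pair/parameter combinations the best Sharpe should come out well above 1 even though every variant is noise, so a single good backtest among many tries is weak evidence on its own.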

14 responses

Ilija Ilievski

We have this problem in machine learning as well. I will share our solution for it:
You have 3 sets of data:

Data you fit/train your model on, called train data in ML,
Data you use to test and optimize your model free-parameters (also called hyperparameters), called validation data in ML,
Data you evaluate your final model on, called test data in ML.

If the parameters perform well on both the validation set and the test set, then they are good. If they perform well only on the validation set, then you have overfitted the validation set.
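For a backtest, the three sets would normally be consecutive time windows, oldest first. A minimal sketch of that workflow follows; the 60/20/20 proportions, the helper names and the stand-in `toy_score` function are all assumptions made so the example runs, and a real score would be something like the backtest Sharpe over the given window.

```python
def chronological_split(n_obs, train_frac=0.6, val_frac=0.2):
    """Split n_obs time-ordered observations into train/validation/test slices."""
    train_end = round(n_obs * train_frac)
    val_end = train_end + round(n_obs * val_frac)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_obs)


def pick_parameter(candidates, score, val_window, test_window):
    """Choose the best candidate on validation data, then score it once on test data."""
    best = max(candidates, key=lambda p: score(p, val_window))
    return best, score(best, val_window), score(best, test_window)


def toy_score(short_ma, window):
    # Stand-in only: a real version would backtest over `window`
    # and return a performance measure such as the Sharpe ratio.
    return -abs(short_ma - 5)


train, val, test = chronological_split(100)
best, val_score, test_score = pick_parameter(range(2, 9), toy_score, val, test)
print(train, val, test)
print(best, val_score, test_score)
```

The test window is touched exactly once, after the parameter has been chosen; if the test score is much worse than the validation score, the chosen value was probably fitted to noise.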

João Aparício

Hey @Ilija, thanks for coming back about this. You're right, of course, but in this particular case there was a bigger issue: that notebook had a critical bug. Have a look here.

TWI_proptrader

Sharpe of 3 using RBOB Gasoline and Natural Gas futures

João Aparício

Hey Frank, several observations.

First, you have a bug in line 71. It should be zscore = (np.mean(spreads[-context.short_ma:]) - np.mean(spreads)) / np.std(spreads, ddof=1)
Second, if you set the start date to 2014-01-01 the algorithm has a sharpe of 0.45, so clearly you just got good results on a lucky period of time. I wouldn't trade that into the future.
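The corrected line can be checked in isolation with a small standalone function. This is a sketch that assumes `spreads` holds the most recent long_ma spread values in time order; the function name and the sample numbers are made up for illustration.

```python
import numpy as np


def spread_zscore(spreads, short_ma):
    """Z-score of the short-window mean spread against the full window."""
    spreads = np.asarray(spreads, dtype=float)
    short_mean = np.mean(spreads[-short_ma:])   # mean of the last short_ma values
    long_mean = np.mean(spreads)                # mean of the whole window
    # ddof=1 gives the sample standard deviation, as in the corrected line.
    return (short_mean - long_mean) / np.std(spreads, ddof=1)


# Toy window: short mean 4.5, long mean 3.0, sample std sqrt(2.5).
z = spread_zscore([1.0, 2.0, 3.0, 4.0, 5.0], short_ma=2)
print(z)
```

Here the result equals 1.5 / sqrt(2.5), which is easy to verify by hand before dropping the formula back into the algo.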

Tony Morland

This is probably only a minor point in the big scheme of things, but might perhaps help to lead someone on to other bigger ideas.

As Joao writes: "The idea is that some commodities' prices should be related. In this case they picked crude oil (CL) and gasoline (XB)".

In this case the relationship is obvious, as one commodity is simply a refined product obtained from the other. Similarly for all the other petroleum refinery products, and similarly for Soybeans, Bean Meal and Bean Oil. However the relationship between CL & NatGas is not so straightforward. Even though they are both hydrocarbons and are often (but not always) produced together, there is a major difference in how they are transported and sold. As a liquid and relatively easy to transport, oil can be sold on the spot or forward market anywhere in the world at any time, and there is generally a very short delay between production & sale.

Gas on the other hand is difficult and expensive to transport because of its low density and therefore low value vs volume compared to oil. Gas is sold in one of 2 ways. Mostly locally via pipeline from the production site (wells) to the place of demand (cities). Usually these are in the same country and so, unlike oil, US Nat Gas is NOT an international commodity but strictly a local one, in the sense of "local" to the USA. This is also true of nat gas in most other countries, with the exception of parts of Europe, where a significant proportion of the gas used in some European countries comes by pipeline from Russia. So the price of pipeline gas in general is very much a "local" price only. The significant price differences of Nat Gas around the world generally do NOT provide arbitrage opportunities.

The other way that gas is sold is as Liquified Natural Gas (LNG). This is relatively easily transportable compared to gas in the gas phase but the ships required are special ones, not conventional tankers. Unlike oil, the infrastructure required for handling LNG is large & expensive and so, even though LNG can be transported internationally, it is usually sold on the basis of very long-term contracts rather than on the spot market.

So, the result of this is that, within the energy futures group, Nat Gas is very much the odd one out as it is a "local, mostly US only" commodity, whereas the others are truly international commodities.

Now, with the exception of NatGas, we also have another link between all the physically deliverable commodities, which is that their prices are denominated in US dollars. To those of you who live in the USA, this might seem like a "huh? of course, so what?" type of comment, but to anyone outside of the US there is a very obvious link between all physically deliverable commodities, and that link is the exchange rate of the USD vs their own local currency. This leads to some interesting (and possibly unexpected) relationships between the prices of commodities that are apparently completely unrelated, for example wheat and silver.

Most people who live in the US do not realize what a huge advantage they have with the USD being the base currency for most commodity transactions, especially oil. Various countries, both producers & consumers, have tried to move the pricing of oil away from the USD to a basket of other currencies. This has in fact been done to some extent with other commodities, but is met by a lot of resistance from the USA when anyone tries it with oil. To those in the oil & gas exploration & production (O&G E&P or "upstream" oil industry) it was always very obvious that the war with Iraq was ONLY about oil and an excuse to invade & control it, while the whole lot of stuff about (non-existent) WMD was just a trumped-up nonsense. Most people realize this now, but what is not so well known is that the decision to invade Iraq came only ONE WEEK after Iraq had decided to sell oil at prices denominated in a basket of other currencies and remove the dominance of the USD. Almost certainly other countries would have followed suit. The lesson was sent fairly clearly, wasn't it! And so it has unsurprisingly taken a long time for other countries to even start talking about this idea again.

I hope this bit of oilfield insight helps to explain a few things, the main one being the link between ALL commodities that are traded internationally at prices denominated in USD.

Good luck, happy trading, best wishes from Tony
(former project manager & Petroleum Reservoir Engineer, now retired).

João Aparício

"Usually these are in the same country and so, unlike oil, US Nat Gas is NOT an international commodity but strictly a local one, in the sense of "local" to the USA."

Amazing insight Tony, thank you.

Regarding the petrodollar you're right, but it's changing. Russia has decided to denominate its oil in yuan in sales to China, and Iran is accepting rupees in sales to India.

TWI_proptrader

Hi Joao,

When I use your code I get a syntax error. I am not sure why this is happening. Can you provide the code in a backtest, please? I am not a programmer, but I have an intraday strategy I want to test and I need to use minute data. Can we compute the spread for the continuous_future using the product prices rather than the MA? We would then add RSI and BB to trade the price spread range intraday over a set of correlated assets.

Frank
btw, I have worked with many traders that have traded interproduct energy futures. Alpha can be found!

João Aparício

I've just replaced your line 71 with my line. I'm attaching the backtest.

Tony Morland

Hi Joao,

"Regarding the petrodollar you're right, but it's changing. Russia has decided to denominate its oil in yuan in sales to China, and Iran is accepting rupees in sales to India."

Yes, correct, and in fact these are not the only examples of the beginning of moves away from purely USD-denominated commodity prices. Some other examples are Iron Ore and I think also Copper(?). A movement towards truly international (rather than just USD-centric) commodity pricing would break the current link that exists between all commodity prices with the exception of NatGas, but that link will probably remain for a long time and continue to provide exploitable opportunities in trading algos based on unexpected relationships between the futures of apparently unrelated commodities.

Hi Frank
"btw, I have worked with many traders that have traded interproduct energy futures. Alpha can be found!"

Yes, for sure. The so-called crack spread and variants of it are functions of the profit margin earned by petroleum refineries. I have never looked at it in much detail, but I would expect to find some interesting relationships between inter-product energy futures and the profit data of some integrated energy companies that have refining as a significant part of their operations. I think this sort of relationship may provide two different sources of alpha:

1) For anyone mainly interested in Futures strategies: Use of corporate Fundamentals data (Morningstar) of the producer companies as long-term background input factors for the relevant futures, and

2) For anyone mainly interested in Equities strategies: Use of commodity futures data as short-term input factors for any companies for whom the relevant commodities are either a cost or a profit center. (In fact I wanted to do this years ago but until now I never had the nice combination of platform and helpful community of people as here at Quantopian).

Could I suggest to the Q staff that it is worth kicking off a new thread (if one doesn't exist already) specifically on combining futures data PLUS equities data, and any pitfalls that may be associated with it?

Cheers, best wishes, Tony

TWI_proptrader

Hi Tony,

I recently wrote a case study on Easyjet plc. There seem to be arb opportunities in new energy futures contracts, especially Low Sulphur Gasoil (LSGO) futures (see page 14): https://www.theice.com/publicdocs/futures/Jet_Fuel_Hedging_and_Trading_at_ICE.pdf

I can't find Brent crude on the drop-down list. I would like to test the famous WTI-Brent spread with the continuous_future function.

Great idea, Tony, on combining futures and equities data.

Regards,
Frank

TWI_proptrader

Hi Joao,

Am I right in thinking this stays constant throughout the algorithm's duration? Surely this should be rolling to keep the hedging ratio fixed?

    # Adjust price of gasoline (42x) so that both futures have same scale.
    record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)

Thanks,

João Aparício

Frank,

Those prices are updated at each time step as you can see below (but this wasn't the point of this thread)

    # Get current price of primary crude oil and gasoline contracts.
    crude_oil_price = data.current(context.crude_oil, 'price')
    gasoline_price = data.current(context.gasoline, 'price')
    # Adjust price of gasoline (42x) so that both futures have same scale.
    record(Crude_Oil=crude_oil_price, Gasoline=gasoline_price*42)
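As a side note on the 42x factor: it is a fixed unit conversion, not a fitted hedge ratio. Gasoline futures are quoted in dollars per gallon and crude oil in dollars per barrel, and a barrel is 42 US gallons. A minimal sketch, with hypothetical helper names and made-up prices:

```python
GALLONS_PER_BARREL = 42  # one oil barrel is 42 US gallons


def gasoline_per_barrel(price_per_gallon):
    """Convert a gasoline quote from $/gallon to $/barrel."""
    return price_per_gallon * GALLONS_PER_BARREL


# Illustrative prices only, not market data.
crude_oil_price = 50.0   # $/barrel
gasoline_price = 1.5     # $/gallon

gasoline_bbl = gasoline_per_barrel(gasoline_price)
price_difference = gasoline_bbl - crude_oil_price
print(gasoline_bbl, price_difference)  # 63.0 13.0
```

The quoted prices change at every time step, while the conversion factor stays at 42 because it only puts both quotes on a per-barrel scale.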

Tony Morland

Hi Frank,

" ...case study on Easyjet plc" : Yes, I think all the airlines should make interesting "case studies" for algo development. My guess would be that the lower cost airlines probably have the smallest profit margins and therefore likely to be the most sensitive to fuel price changes.

Another aspect of this is fuel substitution as energy prices change. Hydrocarbons generally have four main uses: 1) as feedstock for plastics & chemicals, 2) as fuel for energy generation for industry, 3) for domestic use (heating & cooking), and 4) for transport.

Items 2) and to a lesser extent 3) are susceptible to change, for example substitution of gas for oil, and also coal (especially as technologies improve to clean up the environmental aspects of burning coal). Item 4) transport is the interesting one, especially air transport. Trains can run on electricity supplied via overhead wires and electricity can be generated in lots of different ways (oil, gas, coal, nuclear, solar, wind, tidal power, hydroelectric, etc). Cars & trucks will probably become increasingly electric in future as battery & other electricity storage technologies improve. But as for air transport, the one and only aviation fuel is derived from oil and will probably stay that way for a long time. I mean the idea of a coal-burning plane certainly never took off even if anyone was silly enough to think of it, and as for nuclear-powered planes or battery-powered planes, well I don't really think I would want to fly in them, would you? ;-))

"WTI-Brent spread": yes, worth looking at I would imagine, although I'm not quite sure how this would work exactly. Brent comes from the North Sea and is therefore most relevant to Europe. WTI (West Texas Intermediate) ... well it's obvious where that one comes from and so it's most relevant to North America.

Good luck & best wishes, Tony.

Na A

So we are not able to access Brent prices through Quantopian?
