Modelling Sports Data

Hi,

I need some advice on how to best model/represent in-running soccer data on Quantopian.

I have many independent timeseries (one per match). Each timeseries is two hours in length with second-by-second data. There are 5-10 fields in each record - home win/away win/draw prices, home/away goals, volume, red/yellow cards etc.

I want my algos to run on every timeseries in the dataset; I think of each timeseries as a new trial of the same experiment (since they are all two hours in length and prices decay towards the expiry time); I need to see how the algos perform in many different scenarios.

Obviously I need to use Fetcher to import the data. Should I model each timeseries as a security or as a signal? Although I want the algos to run on many timeseries, they don't have to run in parallel; sequentially would be fine.

Thanks!

7 responses

Hi Justin,

Could you explain a bit more what you want to do with the data? Can you share a snippet of a file with the column header row?
That'll make it easier to recommend an approach.

Thanks,
fawce

Justin -- I work in the sports industry and with sports data. I would love to speak with you more about this concept as I've had the same thoughts for soccer transfer market modelling.

After Twitter convo with @fawce (thanks John), seems like a bit of background info might be necessary:

Data comes from the UK-based Betfair sports exchange (http://www.betfair.com).

For those unfamiliar with sports betting (am looking at you, USA), Betfair effectively allows you to back (buy) and lay (sell) futures contracts on a team's performance during the course of a match. So you might have data that looks as follows (for a match involving Liverpool vs Everton):

timestamp,selection,bid,offer,bid_size,offer_size,score
10:42:01,Liverpool,42,44,100,200,0-1
10:42:01,Everton,25,27,50,150,0-1
10:42:01,Draw,30,32,500,100,0-1
10:42:02,Liverpool,44,46,10,250,0-1
10:42:02,Everton,23,25,100,150,0-1
10:42:02,Draw,30,32,100,100,0-1

Note that there are three possible outcomes in this match: a Liverpool win, an Everton win, or a draw.

Each timeseries is approx 2 hrs long - 45 minutes for each of 2 halves, plus 15 mins at half time

I would like to get this data into Quantopian and run backtests on it. For example, is it profitable to buy the home team as soon as they go behind, and then sell if they score?

Crucially, I need the algo to run on multiple timeseries. It's not enough for me just to look at the Liverpool vs Everton match; I need to know whether, across a large number of matches, it is profitable in general to pursue this strategy.
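To make the idea concrete, here is a rough sketch in plain pandas (not Quantopian or Zipline) of running the "buy the home team when they go behind, sell when they score" rule over several matches. Everything here is an assumption for illustration: the function names, the rule that the score string is "home-away", the limit of one round trip per match, closing any open position at the last bid, and the three tiny made-up matches.

```python
import pandas as pd

def parse_score(score):
    """Split a 'home-away' score string such as '0-1' (format assumed)."""
    home, away = score.split("-")
    return int(home), int(away)

def backtest_match(home_quotes):
    """Return P&L in price points for one match (at most one round trip).

    home_quotes holds the home selection's rows with columns
    bid, offer and score, in time order.
    """
    entry = None  # (price paid, home goals at entry) once we have bought
    for row in home_quotes.itertuples(index=False):
        home, away = parse_score(row.score)
        if entry is None:
            if home < away:
                # Home team is behind: buy at the offer.
                entry = (row.offer, home)
        elif home > entry[1]:
            # Home team has scored since we bought: sell at the bid.
            return float(row.bid - entry[0])
    if entry is None:
        return 0.0  # never went behind, so no trade
    # Still holding at the end: close at the last bid (an assumption).
    return float(home_quotes["bid"].iloc[-1] - entry[0])

def backtest_all(matches):
    """Run the rule on every match and collect one P&L per match."""
    return pd.Series({name: backtest_match(q) for name, q in matches.items()},
                     name="pnl")

# Made-up quotes for three hypothetical matches, for illustration only.
matches = {
    "match_a": pd.DataFrame({"bid": [50, 30, 48], "offer": [52, 32, 50],
                             "score": ["0-0", "0-1", "1-1"]}),
    "match_b": pd.DataFrame({"bid": [55, 35, 20], "offer": [57, 37, 22],
                             "score": ["0-0", "0-1", "0-1"]}),
    "match_c": pd.DataFrame({"bid": [60, 80], "offer": [62, 82],
                             "score": ["0-0", "1-0"]}),
}
pnl = backtest_all(matches)
print(pnl)
```

Each match is treated as an independent trial, so the result is one P&L number per match rather than a single equity curve.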

(I have a sinking feeling this functionality may not be available in Quantopian yet. What I'm really looking for as output is a histogram of the p&l distribution for this strategy, rather than the p&l growth chart shown in the sample algo.)
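Outside Quantopian, the histogram itself is straightforward once there is one p&l figure per match. A minimal sketch using NumPy follows; the helper names and the sample p&l values are hypothetical and only there to show the shape of the output.

```python
import numpy as np

def pnl_histogram(pnls, bins=4):
    """Bucket per-match P&L values into equal-width bins."""
    counts, edges = np.histogram(np.asarray(pnls, dtype=float), bins=bins)
    return counts, edges

def render(counts, edges):
    """Draw the histogram as text, one row of '#' marks per bin."""
    lines = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        lines.append(f"{lo:8.2f} to {hi:8.2f} | {'#' * int(count)}")
    return "\n".join(lines)

# Hypothetical per-match P&L values, for illustration only.
sample_pnls = [16.0, -17.0, 0.0, 4.0, -3.0, 9.0, -17.0, 2.0]
counts, edges = pnl_histogram(sample_pnls, bins=4)
print(render(counts, edges))
```

With real data, the same counts and edges could be passed to any plotting library instead of the text renderer.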

All help gratefully received; happy to answer more questions.

Luigi, my twitter handle is @juzbo

@baggio1510

Hi Justin,

I think it is clear to me now. You want to treat each team as a stock, and feed the quote data through as market data. In addition, you want to use fetcher to pull in additional signal data about the teams during the match. Then you want to simulate buying and selling bets on the teams.

While I don't think you will be able to pursue this on Quantopian, I think it may be possible for you to do this in your local environment with our open-source Zipline. In that environment, you'd feed the quote information through as "trades" and place orders to be simulated against that feed. Essentially you'll load the data into a dataframe with a datetime index for rows, and your teams as the columns. Zipline will then treat the teams as "stocks".
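As a rough sketch of the dataframe shape described here (datetime index for rows, one column per team), the sample rows from Justin's post can be pivoted with pandas. This only covers the reshaping step, not the call that hands the frame to Zipline. The function name, the use of the bid/offer midpoint as the "trade" price, and the match date are all assumptions, since the sample gives only a time of day.

```python
import io
import pandas as pd

# Sample rows copied from the post above.
RAW = """timestamp,selection,bid,offer,bid_size,offer_size,score
10:42:01,Liverpool,42,44,100,200,0-1
10:42:01,Everton,25,27,50,150,0-1
10:42:01,Draw,30,32,500,100,0-1
10:42:02,Liverpool,44,46,10,250,0-1
10:42:02,Everton,23,25,100,150,0-1
10:42:02,Draw,30,32,100,100,0-1
"""

def quotes_to_wide(csv_text, match_date="2013-01-01"):
    """Pivot long-format quotes into one price column per selection.

    match_date is a placeholder; the real date would come from the data.
    """
    quotes = pd.read_csv(io.StringIO(csv_text))
    # Build full datetimes from the assumed date plus the time of day.
    quotes["timestamp"] = pd.to_datetime(match_date + " " + quotes["timestamp"])
    # Use the bid/offer midpoint as a stand-in for a traded price (assumption).
    quotes["mid"] = (quotes["bid"] + quotes["offer"]) / 2.0
    return quotes.pivot(index="timestamp", columns="selection", values="mid")

wide = quotes_to_wide(RAW)
print(wide)
```

Each selection (Liverpool, Everton, Draw) ends up as its own column, which is the layout in which the teams can stand in for "stocks".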

You can connect with the Zipline community directly via the Google group and Zipline on GitHub, where you'll find people exploring and simulating with other exotic asset classes.

thanks,
fawce

Hi guys,

I know I am joining really late, but I would love to discuss the concept with you. I am doing some research on modelling sports data and automated betting.

I want to know if this concept became a real project or something.

Thanks,
Jordi