Notebook

The Bean Report

Approaching Soybean Futures on Quantopian


Introduction

Understanding a futures contract, in my opinion, requires considering the inherent economic differences between purchasing a physical good at the prevailing market rate and purchasing the same good with a promise from the seller to deliver it at a future date. These economic differences were best described to me by a former bean trader at the Chicago Board of Trade. My humble attempt to replicate his explanation is below:

Imagine that you are a beef farmer, and you feed your herd of cows using soybean meal. Maintaining soybean meal on inventory is an absolute necessity, as failure to feed your cows would certainly result in the failure of your business. In addition, imagine there is only one soybean farmer in your area. You depend on this farmer to supply this critical input to your business. One day, you notice that your inventory of beans is running low. As a result, you call up the soybean farmer and ask if you can purchase some more beans. You and the farmer agree that 100 dollars per bushel is a fair price, and the farmer says that as long as you pick up the beans within 24 hours, he will honor your agreement on price. The deal works out great for both you and the farmer. The next day, you decide that you do not want to risk having your bean inventory run low again, a prudent decision because the farmer may sell out of beans before the next time you run low. You call up the farmer again and request to buy more beans, but this time you would like him to deliver the beans to you, and you want this delivery to occur in 90 days with cash paid on delivery. The farmer says, "sure thing, I can do that for 120 dollars per bushel…"

What happened? Is your good friend the bean farmer trying to take advantage of you with this higher price? You ask him, and he responds with the following reasons to justify his price increase:

First, he has to keep the beans you ordered stored in his silo for the 90 days before delivery. This adds 5 dollars to his price quote, representing the money he loses by not being able to rent out that space in his silo to other people. Second, he has to deliver the beans, which will cost him 10 dollars in gas and truck depreciation, a cost he will pass on to you. Third, your request for beans in 90 days prevents him from selling those same beans tomorrow. He explains that he could sell them tomorrow and deposit the cash in a 90-day certificate of deposit at the bank, which would earn him 5 dollars in interest income by the end of the 90 days.

It turns out that the farmer is being reasonable. Your request to reserve the beans and receive delivery in 90 days generates additional costs, or economic differences. The farmer must pass these costs on to you, or it would not be rational for him to make the deal. His storage costs, delivery costs, and opportunity cost of capital together make up what is known as the “Cost of Carry”, which, along with the market price of the physical asset that underlies a futures contract, provides the basis for the valuation of the futures contract.
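
To make the arithmetic concrete, here is a minimal sketch of the farmer's 90-day quote (the numbers are the ones from the story above):

# A worked version of the farmer's quote: futures price = spot price + cost of carry
spot_price = 100         # agreed market price per bushel today
storage_cost = 5         # silo space tied up for 90 days
delivery_cost = 10       # gas and truck depreciation
opportunity_cost = 5     # forgone interest on a 90-day certificate of deposit

cost_of_carry = storage_cost + delivery_cost + opportunity_cost
futures_price = spot_price + cost_of_carry
print(futures_price)     # 120, the farmer's quote for delivery in 90 days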

Section 1 - Pulling in Data

The problem... wrangling the data


Pull in some equity pricing, including a soybean ETF

In [49]:
#### Using Quantopian's 'get_pricing()', we create a pandas data frame below
### The 'spot_prices' variable will store the data frame.
## The parameters necessary to return the frame are described below
# After creating the variable, check to ensure you received the data you requested.

spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], # A list of equity symbols to include
    fields='close_price',  # the 'field' we want to utilize
    start_date='2017-01-01', # Starting date of the data
    end_date = '2017-03-31', # Ending date of the data
    frequency='daily', # Frequency of the data
)    
spot_prices.head(3) # use the '.head()' method of pandas to view the first 3 rows of the data frame
Out[49]:
Equity(41912 [SOYB]) Equity(28320 [USO]) Equity(23921 [TLT]) Equity(33651 [BSV]) Equity(24962 [PSA]) Equity(8554 [SPY]) Equity(40516 [XIV]) Equity(3766 [IBM]) Equity(4707 [MCD]) Equity(4151 [JNJ]) Equity(8347 [XOM])
2017-01-03 00:00:00+00:00 18.93 11.450 119.153 79.248 221.489 224.195 50.24 165.873 118.741 115.034 90.056
2017-01-04 00:00:00+00:00 19.22 11.570 119.631 79.278 223.144 225.539 52.80 167.912 118.607 114.884 89.065
2017-01-05 00:00:00+00:00 19.24 11.695 121.483 79.397 225.581 225.380 52.99 167.366 118.805 116.107 87.737

Know your Pandas!

In [50]:
print "Number of Rows:",len(spot_prices)
Number of Rows: 62
In [51]:
#Tom is my favorite Panda. He gets it...
spot_prices.tail(3) # Check to ensure the last row matches the end_date requested
Out[51]:
Equity(41912 [SOYB]) Equity(28320 [USO]) Equity(23921 [TLT]) Equity(33651 [BSV]) Equity(24962 [PSA]) Equity(8554 [SPY]) Equity(40516 [XIV]) Equity(3766 [IBM]) Equity(4707 [MCD]) Equity(4151 [JNJ]) Equity(8347 [XOM])
2017-03-29 00:00:00+00:00 18.45 10.38 121.33 79.690 219.95 235.59 74.68 173.96 128.86 124.97 82.03
2017-03-30 00:00:00+00:00 18.33 10.57 120.37 79.675 219.38 236.27 74.42 173.87 129.33 124.65 83.68
2017-03-31 00:00:00+00:00 18.12 10.64 120.72 79.730 218.90 235.72 73.03 174.17 129.61 124.51 82.00

Visualize pricing between all equities in 1Q, 2017

In [52]:
spot_delta = spot_prices.pct_change(periods=1, axis=0) # Get the percent change between closings
spot_delta = spot_delta.dropna() # drop the NaN row (the first row has no previous close)
spot_delta.plot() # quick code for plotting a data frame  
Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f59b8cdd090>

Pull in pricing of May and July bean futures for the same time frame.

References To Determine Futures Data To Utilize:

Lifespan of Data Available - https://www.quantopian.com/posts/continuous-future-data-lifespans#590a60804de40b0a781ddbc0

Jamie's Notebook (w/ contract symbols and month symbols) - https://www.quantopian.com/posts/futures-data-now-available-in-research

CME's expiration calendar - http://www.cmegroup.com/tools-information/calendars/expiration-calendar/

In [53]:
from quantopian.research.experimental import history # Dependency needed to pull in futures data

# See References above to determine which futures you want to use
sy_contracts = symbols(['SYK17', 'SYN17'])

# Pandas frame for futures is below, same work as we did with equities. 
# (Excuse the lack of logic of the variable name, I was copying Jamie's code)
sy_consecutive_contract_price = history(
    sy_contracts, # Instead of defining the symbols within this argument, I just pass in the 'sy_contracts' list defined above
    fields='price', 
    frequency='daily', 
    start_date='2017-01-01', 
    end_date='2017-03-31'
)
sy_consecutive_contract_price.head()
Out[53]:
Future(1016201705 [SYK17]) Future(1016201707 [SYN17])
2017-01-03 00:00:00+00:00 10.045 10.110
2017-01-04 00:00:00+00:00 10.220 10.282
2017-01-05 00:00:00+00:00 10.210 10.278
2017-01-06 00:00:00+00:00 10.048 10.120
2017-01-09 00:00:00+00:00 10.138 10.210

View returns between both contracts.

In [54]:
bean_delta = sy_consecutive_contract_price.pct_change(periods=1, axis=0)
bean_delta = bean_delta.dropna()
bean_delta.plot(title='Soy Futures')
Out[54]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f59b7c13a10>

Combine the equities and futures into one frame.

In [55]:
# Merging pandas frames can be a challenge; the key with this method is to match up the starting and ending dates of both frames
data = spot_prices.join(sy_consecutive_contract_price) 
data.head(3)
Out[55]:
Equity(41912 [SOYB]) Equity(28320 [USO]) Equity(23921 [TLT]) Equity(33651 [BSV]) Equity(24962 [PSA]) Equity(8554 [SPY]) Equity(40516 [XIV]) Equity(3766 [IBM]) Equity(4707 [MCD]) Equity(4151 [JNJ]) Equity(8347 [XOM]) Future(1016201705 [SYK17]) Future(1016201707 [SYN17])
2017-01-03 00:00:00+00:00 18.93 11.450 119.153 79.248 221.489 224.195 50.24 165.873 118.741 115.034 90.056 10.045 10.110
2017-01-04 00:00:00+00:00 19.22 11.570 119.631 79.278 223.144 225.539 52.80 167.912 118.607 114.884 89.065 10.220 10.282
2017-01-05 00:00:00+00:00 19.24 11.695 121.483 79.397 225.581 225.380 52.99 167.366 118.805 116.107 87.737 10.210 10.278
REMINDER!!! - Futures closing prices (in these frames) are recorded later in the day than the equity closes
View column names
In [56]:
print data.columns
Index([      Equity(41912 [SOYB]),        Equity(28320 [USO]),
              Equity(23921 [TLT]),        Equity(33651 [BSV]),
              Equity(24962 [PSA]),         Equity(8554 [SPY]),
              Equity(40516 [XIV]),         Equity(3766 [IBM]),
               Equity(4707 [MCD]),         Equity(4151 [JNJ]),
               Equity(8347 [XOM]), Future(1016201705 [SYK17]),
       Future(1016201707 [SYN17])],
      dtype='object')
Alter column names
In [57]:
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']
In [58]:
print data.columns
Index([u'beans_spot', u'oil_spot', u'lt_rate_spot', u'st_rate_spot',
       u'storage_spot', u'SPY', u'XIV', u'IBM', u'MCD', u'JNJ', u'XOM',
       u'beans_may', u'beans_july'],
      dtype='object')

Section 2 - Data Insights and Feature Engineering

Make columns that show days to delivery for both contracts

In [59]:
import datetime
data['May_Delivery'] = datetime.datetime(2017,5,12) # May expiration date per CME 
data['July_Delivery'] = datetime.datetime(2017,7,14) # July expiration date per CME
data['may_dtd'] = data.May_Delivery - data.index # Subtract to get days to delivery
data['july_dtd'] = data.July_Delivery - data.index # Subtract to get days to delivery
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000 # Convert nanoseconds to days
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000 # Convert nanoseconds to days
del data['May_Delivery'] # delete 
del data['July_Delivery'] # delete
data.head() # know your pandas
Out[59]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd
2017-01-03 00:00:00+00:00 18.930 11.450 119.153 79.248 221.489 224.195 50.24 165.873 118.741 115.034 90.056 10.045 10.110 129.0 192.0
2017-01-04 00:00:00+00:00 19.220 11.570 119.631 79.278 223.144 225.539 52.80 167.912 118.607 114.884 89.065 10.220 10.282 128.0 191.0
2017-01-05 00:00:00+00:00 19.240 11.695 121.483 79.397 225.581 225.380 52.99 167.366 118.805 116.107 87.737 10.210 10.278 127.0 190.0
2017-01-06 00:00:00+00:00 18.940 11.680 120.338 79.297 226.275 226.246 53.68 168.239 119.897 115.535 87.718 10.048 10.120 126.0 189.0
2017-01-09 00:00:00+00:00 19.115 11.320 121.334 79.387 224.907 225.519 53.82 166.404 119.570 115.615 86.291 10.138 10.210 123.0 186.0
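
Aside: dividing by 86,400,000,000,000 converts nanoseconds to days. A hedged alternative, assuming the frame's index is a tz-aware DatetimeIndex as shown above, lets pandas do the conversion (the 'may_dtd_alt' column is only for illustration):

import pandas as pd

may_expiration = pd.Timestamp('2017-05-12', tz='UTC')            # approximate May expiration, as above
data['may_dtd_alt'] = (may_expiration - data.index).days.values  # TimedeltaIndex -> whole days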

Create The Cost Of Carry Feature

In [60]:
data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd']  
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd'] 

Chart Cost of Carry For Both Contracts in 1Q 2017

Earlier plots used the '.plot()' method on a pandas frame. Below is a five-step process covering the most important aspects of building a chart with matplotlib.

In [61]:
# Step 1 - import the library (always use plt as the variable because it is easiest to find examples on the internet)
import matplotlib.pyplot as plt
# Step #2 - Select your X and Y data
# For the X values, we use 'data.index' because that holds the dates in our pandas frame
# For the Y values, we use the cost-of-carry columns of 'data'
# Additionally, we add a label to each plot so we can map the colors to the data
plt.plot(data.index,data['dCoC_may'],label="Beans For May Delivery")
plt.plot(data.index,data['dCoC_july'],label="Beans For July Delivery")
# Step #3 - add the legend, this will display the labels we added as the third argument above
plt.legend(loc='upper left')
# Step #4 - add a title to the chart, an X label, and a Y label
plt.title("Comparing Implied Cost of Carry Between May and July Soybeans")
plt.xlabel("Date")
plt.ylabel("Average Daily Implied Cost of Carry")
# Step #5 - Tell the python gods to show the chart on the screen
plt.show()
We saw that May and July futures traded in line with each other, so can we profit from the widening spread between the cost of carry of May beans and the cost of carry of July beans? How about we sell May bean futures on Jan 3 and buy spot beans on Jan 3?
In [62]:
# Walk through the trade using the first and last rows of the frame (positions sized to roughly $10K notional)
print ("Price of Spot Beans on Jan 3:",data['beans_spot'].iloc[0])
print ("Price of May Beans on Jan 3:",data['beans_may'].iloc[0])
print ('Beans shares bought on Jan 3rd ($10K value):',int(10000/18.93))
print ('May beans contracts sold on Jan 3rd ($10K value):',int(10000/10.045))
print ("Price of Spot Beans on March 31:",data['beans_spot'].iloc[-1])
print ("Price of May Beans on March 31:",data['beans_may'].iloc[-1])
print ('Spot Bean Trade Profit:',(10000-18.12*528)*-1)
print ('Futures Trade Profit:',10000-(995*9.448))
print ("Trade Profit:",599.23999999-432.6399999)
('Price of Spot Beans on Jan 3:', 18.93)
('Price of May Beans on Jan 3:', 10.045)
('Beans shares bought on Jan 3rd ($10K value):', 528)
('May beans contracts sold on Jan 3rd ($10K value):', 995)
('Price of Spot Beans on March 31:', 18.120000000000001)
('Price of May Beans on March 31:', 9.4480000000000004)
('Spot Bean Trade Profit:', -432.6399999999994)
('Futures Trade Profit:', 599.2399999999998)
('Trade Profit:', 166.60000008999998)
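
The hard-coded arithmetic above can be wrapped in a small helper; a minimal sketch (hypothetical function name, ignoring contract multipliers and sizing exactly as the cell above does):

# Hypothetical helper: P&L of buying spot and shorting the May future, each at ~$10K notional
def spread_pnl(spot_entry, spot_exit, fut_entry, fut_exit, notional=10000):
    spot_units = int(notional / spot_entry)              # spot ETF shares bought
    fut_units = int(notional / fut_entry)                # futures units sold short
    long_spot_pnl = spot_units * (spot_exit - spot_entry)
    short_fut_pnl = fut_units * (fut_entry - fut_exit)   # a short profits when the price falls
    return long_spot_pnl + short_fut_pnl

print(spread_pnl(18.93, 18.12, 10.045, 9.448))
# Roughly the ~167 dollar combined profit printed above; the small difference comes from the
# cell above treating the entry cost as a flat 10,000 dollars.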

How often does the cost of carry diverge between the front-month contract and the next delivery month, as it did between the May and July beans in 1Q 2017? Chart the cost of carry for 1Q 2014-2016.

2014 Chart

In [63]:
spot_prices = get_pricing(
    ['SOYB'], 
    fields='close_price', 
    start_date='2014-01-01', 
    end_date = '2014-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK14', 'SYN14'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2014-01-01', 
    end_date='2014-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
data.columns = ['beans_spot','beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2014,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2014,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']




plt.plot(data.index,data['dCoC_may'],label="Beans For May Delivery")
plt.plot(data.index,data['dCoC_july'],label="Beans For July Delivery")
plt.legend(loc='upper left')
plt.title("Comparing Implied Cost of Carry Between May and July Soybeans")
plt.xlabel("Date")
plt.ylabel("Average Daily Implied Cost of Carry")
plt.show()
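
The 2015 and 2016 cells that follow repeat this block with only the dates and contract symbols changed. As a hedged sketch (hypothetical function name, reusing the approximate expirations above, and assuming get_pricing, history, symbols, and plt are available as in this notebook), the whole per-year chart could be wrapped in one function:

import pandas as pd

def plot_quarterly_carry(year, may_sym, july_sym):
    start, end = '%d-01-01' % year, '%d-03-31' % year
    spot = get_pricing(['SOYB'], fields='close_price',
                       start_date=start, end_date=end, frequency='daily')
    futs = history(symbols([may_sym, july_sym]), fields='price',
                   frequency='daily', start_date=start, end_date=end)
    df = spot.join(futs)
    df.columns = ['beans_spot', 'beans_may', 'beans_july']
    may_dtd = (pd.Timestamp('%d-05-12' % year, tz='UTC') - df.index).days.values   # approximate expirations
    july_dtd = (pd.Timestamp('%d-07-14' % year, tz='UTC') - df.index).days.values
    plt.plot(df.index, (df['beans_spot'] - df['beans_may']) / may_dtd, label="Beans For May Delivery")
    plt.plot(df.index, (df['beans_spot'] - df['beans_july']) / july_dtd, label="Beans For July Delivery")
    plt.legend(loc='upper left')
    plt.title("Comparing Implied Cost of Carry Between May and July Soybeans (%d)" % year)
    plt.xlabel("Date")
    plt.ylabel("Average Daily Implied Cost of Carry")
    plt.show()

# e.g. plot_quarterly_carry(2015, 'SYK15', 'SYN15')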

2015 Chart

In [64]:
spot_prices = get_pricing(
    ['SOYB'], 
    fields='close_price', 
    start_date='2015-01-01', 
    end_date = '2015-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK15', 'SYN15'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2015-01-01', 
    end_date='2015-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
data.columns = ['beans_spot','beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2015,5,12) #fix later
data['July_Delivery'] = datetime.datetime(2015,7,14) # fix later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']

plt.plot(data.index,data['dCoC_may'],label="Beans For May Delivery")
plt.plot(data.index,data['dCoC_july'],label="Beans For July Delivery")
plt.legend(loc='upper left')
plt.title("Comparing Implied Cost of Carry Between May and July Soybeans")
plt.xlabel("Date")
plt.ylabel("Average Daily Implied Cost of Carry")
plt.show()

2016 Chart

In [65]:
spot_prices = get_pricing(
    ['SOYB'], 
    fields='close_price', 
    start_date='2016-01-01', 
    end_date = '2016-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK16', 'SYN16'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2016-01-01', 
    end_date='2016-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
data.columns = ['beans_spot','beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2016,5,12) #fix later
data['July_Delivery'] = datetime.datetime(2016,7,14) # fix later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']

plt.plot(data.index,data['dCoC_may'],label="Beans For May Delivery")
plt.plot(data.index,data['dCoC_july'],label="Beans For July Delivery")
plt.legend(loc='upper left')
plt.title("Comparing Implied Cost of Carry Between May and July Soybeans")
plt.xlabel("Date")
plt.ylabel("Average Daily Implied Cost of Carry")
plt.show()

Chart cost of carry for the entire time frame

In [66]:
spot_prices = get_pricing(
    ['SOYB'], 
    fields='close_price', 
    start_date='2014-01-01', 
    end_date = '2017-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK17', 'SYN17','SYK16','SYN16',
                       'SYK15', 'SYN15','SYK14','SYN14'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2014-01-01', 
    end_date='2017-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)

data.columns = ['beans_spot','beans_may_17','beans_july_17','beans_may_16','beans_july_16',
               'beans_may_15','beans_july_15','beans_may_14','beans_july_14']

data['May_Delivery_17'] = datetime.datetime(2017,5,12)
data['July_Delivery_17'] = datetime.datetime(2017,7,14) 
data['may_dtd_17'] = data.May_Delivery_17 - data.index
data['july_dtd_17'] = data.July_Delivery_17 - data.index
data['may_dtd_17'] = data['may_dtd_17'].astype(int) / 86400000000000
data['july_dtd_17'] = data['july_dtd_17'].astype(int) / 86400000000000
del data['May_Delivery_17']
del data['July_Delivery_17']
data['dCoC_may_17'] = (data['beans_spot'] - data['beans_may_17']) / data['may_dtd_17'] 
data['dCoC_july_17'] = (data['beans_spot'] - data['beans_july_17']) / data['july_dtd_17']

data['May_Delivery_16'] = datetime.datetime(2016,5,12)
data['July_Delivery_16'] = datetime.datetime(2016,7,14) 
data['may_dtd_16'] = data.May_Delivery_16 - data.index
data['july_dtd_16'] = data.July_Delivery_16 - data.index
data['may_dtd_16'] = data['may_dtd_16'].astype(int) / 86400000000000
data['july_dtd_16'] = data['july_dtd_16'].astype(int) / 86400000000000
del data['May_Delivery_16']
del data['July_Delivery_16']
data['dCoC_may_16'] = (data['beans_spot'] - data['beans_may_16']) / data['may_dtd_16'] 
data['dCoC_july_16'] = (data['beans_spot'] - data['beans_july_16']) / data['july_dtd_16']

data['May_Delivery_15'] = datetime.datetime(2015,5,12)
data['July_Delivery_15'] = datetime.datetime(2015,7,14) 
data['may_dtd_15'] = data.May_Delivery_15 - data.index
data['july_dtd_15'] = data.July_Delivery_15 - data.index
data['may_dtd_15'] = data['may_dtd_15'].astype(int) / 86400000000000
data['july_dtd_15'] = data['july_dtd_15'].astype(int) / 86400000000000
del data['May_Delivery_15']
del data['July_Delivery_15']
data['dCoC_may_15'] = (data['beans_spot'] - data['beans_may_15']) / data['may_dtd_15'] 
data['dCoC_july_15'] = (data['beans_spot'] - data['beans_july_15']) / data['july_dtd_15']


data['May_Delivery_14'] = datetime.datetime(2014,5,12)
data['July_Delivery_14'] = datetime.datetime(2014,7,14) 
data['may_dtd_14'] = data.May_Delivery_14 - data.index
data['july_dtd_14'] = data.July_Delivery_14 - data.index
data['may_dtd_14'] = data['may_dtd_14'].astype(int) / 86400000000000
data['july_dtd_14'] = data['july_dtd_14'].astype(int) / 86400000000000
del data['May_Delivery_14']
del data['July_Delivery_14']
data['dCoC_may_14'] = (data['beans_spot'] - data['beans_may_14']) / data['may_dtd_14'] 
data['dCoC_july_14'] = (data['beans_spot'] - data['beans_july_14']) / data['july_dtd_14']


##### Charting Again.....

# Step 2
plt.plot(data.index,data['dCoC_may_17'],label="Beans For May 17 Delivery")
plt.plot(data.index,data['dCoC_july_17'],label="Beans For July 17 Delivery")
plt.plot(data.index,data['dCoC_may_16'],label="Beans For May 16 Delivery")
plt.plot(data.index,data['dCoC_july_16'],label="Beans For July 16 Delivery")
plt.plot(data.index,data['dCoC_may_15'],label="Beans For May 15 Delivery")
plt.plot(data.index,data['dCoC_july_15'],label="Beans For July 15 Delivery")
plt.plot(data.index,data['dCoC_may_14'],label="Beans For May 14 Delivery")
plt.plot(data.index,data['dCoC_july_14'],label="Beans For July 14 Delivery")
# Step 3
plt.legend(loc='upper left')
# Step 4
plt.title("Comparing Implied Cost of Carry Between May and July Soybeans")
plt.xlabel("Date")
plt.ylabel("Average Daily Implied Cost of Carry")
# Something new
plt.ylim(0,1) # Adjust y limits if the chart is not looking good on the default scale
# Step 5 
plt.show()

Futures markets as they should be....


Let's make the 2017 data our 'out of sample' data, and 2011-2016 our 'in sample' data. Models will be trained on 1Q data from 2011-2016 and evaluated on the 2017 data.

To do: get the actual delivery dates for all of these contracts in the future

Copy Paste Append, Repeat
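
The year blocks in the next cell differ only in their dates and contract symbols. As a hedged sketch (hypothetical function name, reusing the approximate May 12 / July 14 expirations used below), the whole copy-paste routine could collapse into one helper plus a concat:

# Hedged sketch: build one quarter's feature frame, assuming get_pricing, history, and symbols
# are available as above
import pandas as pd

EQUITIES = ['SOYB', 'USO', 'TLT', 'BSV', 'PSA', 'SPY', 'XIV', 'IBM', 'MCD', 'JNJ', 'XOM']
COL_NAMES = ['beans_spot', 'oil_spot', 'lt_rate_spot', 'st_rate_spot', 'storage_spot',
             'SPY', 'XIV', 'IBM', 'MCD', 'JNJ', 'XOM', 'beans_may', 'beans_july']

def build_quarter_frame(year, may_sym, july_sym):
    start, end = '%d-01-01' % year, '%d-03-31' % year
    spot = get_pricing(EQUITIES, fields='close_price',
                       start_date=start, end_date=end, frequency='daily')
    futs = history(symbols([may_sym, july_sym]), fields='price',
                   frequency='daily', start_date=start, end_date=end)
    df = spot.join(futs)
    df.columns = COL_NAMES
    df['may_dtd'] = (pd.Timestamp('%d-05-12' % year, tz='UTC') - df.index).days.values
    df['july_dtd'] = (pd.Timestamp('%d-07-14' % year, tz='UTC') - df.index).days.values
    df['dCoC_may'] = (df['beans_spot'] - df['beans_may']) / df['may_dtd']
    df['dCoC_july'] = (df['beans_spot'] - df['beans_july']) / df['july_dtd']
    return df

# e.g. all_data = pd.concat([build_quarter_frame(y, 'SYK%02d' % (y % 100), 'SYN%02d' % (y % 100))
#                            for y in range(2011, 2017)])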

In [67]:
# Build a frame with in sample data 
########## 2011 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2011-01-01', 
    end_date = '2011-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK11', 'SYN11'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2011-01-01', 
    end_date='2011-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2011,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2011,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

all_data = data # Make a different frame

########## 2012 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2012-01-01', 
    end_date = '2012-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK12', 'SYN12'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2012-01-01', 
    end_date='2012-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2012,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2012,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

#all_data = data # Make a different frame
all_data = all_data.append(data)

########## 2013 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2013-01-01', 
    end_date = '2013-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK13', 'SYN13'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2013-01-01', 
    end_date='2013-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2013,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2013,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

#all_data = data # Make a different frame
all_data = all_data.append(data)

########## 2014 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2014-01-01', 
    end_date = '2014-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK14', 'SYN14'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2014-01-01', 
    end_date='2014-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2014,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2014,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

#all_data = data # Make a different frame
all_data = all_data.append(data)

########## 2015 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2015-01-01', 
    end_date = '2015-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK15', 'SYN15'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2015-01-01', 
    end_date='2015-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2015,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2015,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

####### Append 2015 to 2014 ########################
all_data = all_data.append(data)
#####################################################

########## 2016 ###########################################################################
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2016-01-01', 
    end_date = '2016-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK16', 'SYN16'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2016-01-01', 
    end_date='2016-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']

data['May_Delivery'] = datetime.datetime(2016,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2016,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']
#####################################################################################################

####### Append 2016 to 2014 and 2015 ########################
all_data = all_data.append(data)
#####################################################

# play heads and tails to confirm
all_data.head(3)
Out[67]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july
2011-01-03 00:00:00+00:00 NaN 39.06 92.803 80.144 102.760 126.500 123.80 146.848 75.979 62.237 74.119 13.882 13.918 129.0 192.0 NaN NaN
2011-01-04 00:00:00+00:00 NaN 38.07 92.893 80.124 100.298 126.380 121.35 147.088 73.727 62.742 74.517 13.785 13.810 128.0 191.0 NaN NaN
2011-01-05 00:00:00+00:00 NaN 38.50 90.886 79.945 101.231 127.077 125.99 146.450 74.074 62.742 74.328 13.978 13.998 127.0 190.0 NaN NaN
In [68]:
all_data.tail(3)
Out[68]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july
2016-03-29 00:00:00+00:00 18.128 9.79 131.06 80.490 274.45 205.14 25.43 149.37 123.99 109.150 84.51 9.170 9.232 44.0 107.0 0.203591 0.083140
2016-03-30 00:00:00+00:00 17.970 9.73 129.72 80.545 273.42 206.07 26.07 148.40 125.83 109.030 84.52 9.082 9.152 43.0 106.0 0.206698 0.083189
2016-03-31 00:00:00+00:00 18.021 9.70 130.53 80.614 275.70 205.56 25.88 151.48 125.86 108.175 83.61 9.098 9.168 42.0 105.0 0.212452 0.084314

Dealing with NaNs

In [69]:
print "Some Nans in the ETF:",all_data.beans_spot.isnull().sum()
print ("Dropping First 70 Rows...")
all_data = all_data.dropna()
print "New Count of Nans in Data:",all_data.beans_spot.isnull().sum()
Some Nans in the ETF: 70
Dropping First 70 Rows...
New Count of Nans in Data: 0
In [22]:
all_data.head(2)
Out[22]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july
2012-01-03 00:00:00+00:00 22.29 39.69 118.884 80.620 134.002 127.033 6.86 185.538 98.121 65.244 85.472 12.37 12.460 130.0 193.0 0.076308 0.050933
2012-01-04 00:00:00+00:00 22.16 39.79 117.471 80.654 130.284 127.123 7.00 184.820 98.727 64.877 85.582 12.40 12.488 129.0 192.0 0.075659 0.050375
In [23]:
all_data.tail(2)
Out[23]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july
2016-03-30 00:00:00+00:00 17.970 9.73 129.72 80.545 273.42 206.07 26.07 148.40 125.83 109.030 84.52 9.082 9.152 43.0 106.0 0.206698 0.083189
2016-03-31 00:00:00+00:00 18.021 9.70 130.53 80.614 275.70 205.56 25.88 151.48 125.86 108.175 83.61 9.098 9.168 42.0 105.0 0.212452 0.084314

Let's make a target variable: the spot price of beans on the next day

In [70]:
all_data['beans_next_day'] = all_data['beans_spot'].shift(-1) # Shift the spot price up one row so each row holds the next day's price
all_data = all_data.dropna() # drop the last row, which has no next-day value
all_data.head(3) # Know your pandas
Out[70]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july beans_next_day
2012-01-03 00:00:00+00:00 22.29 39.69 118.884 80.620 134.002 127.033 6.86 185.538 98.121 65.244 85.472 12.370 12.460 130.0 193.0 0.076308 0.050933 22.160
2012-01-04 00:00:00+00:00 22.16 39.79 117.471 80.654 130.284 127.123 7.00 184.820 98.727 64.877 85.582 12.400 12.488 129.0 192.0 0.075659 0.050375 21.850
2012-01-05 00:00:00+00:00 21.85 39.18 117.262 80.610 131.811 127.501 7.15 183.934 99.134 64.817 85.263 12.183 12.272 128.0 191.0 0.075523 0.050147 21.502
In [71]:
all_data.tail(3)
Out[71]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ XOM beans_may beans_july may_dtd july_dtd dCoC_may dCoC_july beans_next_day
2016-03-28 00:00:00+00:00 18.020 10.01 129.69 80.270 269.70 203.25 24.03 148.39 123.19 108.23 84.22 9.093 9.162 45.0 108.0 0.198378 0.082019 18.128
2016-03-29 00:00:00+00:00 18.128 9.79 131.06 80.490 274.45 205.14 25.43 149.37 123.99 109.15 84.51 9.170 9.232 44.0 107.0 0.203591 0.083140 17.970
2016-03-30 00:00:00+00:00 17.970 9.73 129.72 80.545 273.42 206.07 26.07 148.40 125.83 109.03 84.52 9.082 9.152 43.0 106.0 0.206698 0.083189 18.021
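
For reference, here is what shift(-1) does on a toy series (made-up values, not notebook data): each row ends up holding the next row's value, and the final row becomes NaN, which is why the dropna() above removes it.

import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['day1', 'day2', 'day3'])
print(s.shift(-1))
# day1    2.0   <- next day's value pulled up into this row
# day2    3.0
# day3    NaN   <- no next day, so this row gets dropped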

Plot a heatmap

Reference Seong's Notebook - https://www.quantopian.com/posts/research-clonable-do-you-want-parameter-optimization-click-here-to-get-started-heat-maps-included

In [72]:
import numpy as np
import matplotlib.pyplot as pyplot
import pandas as pd
from collections import defaultdict

X_axis = [x for x in all_data.columns]
Y_axis = [x for x in all_data.columns]

Pearson = defaultdict(dict)

for feature_x in X_axis:
    for feature_y in Y_axis:    
        Pearson_R = np.corrcoef(all_data[feature_x],all_data[feature_y])[0,1]
        Pearson[feature_x][feature_y] = Pearson_R

Pearson = pd.DataFrame(Pearson)
Pearson.index.name = "Y Axis"
Pearson.columns.name = "X Axis"


def heat_map(df):
    fig = pyplot.figure()
    ax = fig.add_subplot(111)
    axim = ax.imshow(df.values,cmap = pyplot.get_cmap('RdYlGn'), interpolation = 'nearest')
    ax.set_xlabel(df.columns.name)
    ax.set_xticks(np.arange(len(df.columns)))
    ax.set_xticklabels(list(df.columns))
    ax.set_ylabel(df.index.name)
    ax.set_yticks(np.arange(len(df.index)))
    ax.set_yticklabels(list(df.index))
    ax.set_title("Pearson R Between All Data")
    plt.xticks(rotation=90)
    pyplot.colorbar(axim)
    
#: Plot our heatmap
heat_map(Pearson)
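
Design note: the nested loop above reproduces what pandas' own corr() computes in one call; a hedged alternative that yields the same Pearson matrix with less code:

# Same correlation matrix straight from pandas; reuse heat_map() from above to plot it
Pearson_alt = all_data.corr(method='pearson')
Pearson_alt.index.name = "Y Axis"
Pearson_alt.columns.name = "X Axis"
heat_map(Pearson_alt)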

Section 3 - Develop Machine Learning Models and Test Predictions

Rise of the Machines

In [73]:
print "Know your Pandas!"
print "Length of Data Frame:",len(all_data)
Know your Pandas!
Length of Data Frame: 296

Train a few ML models.

In [74]:
# Import candidate models; only the four regressors marked below are actually used
from sklearn.ensemble import ExtraTreesRegressor # Extra Trees Model #1
#from sklearn.neural_network import MLPClassifier # Not this one yet :(
from sklearn import linear_model # Lasso model #2
from sklearn.linear_model import SGDClassifier
from sklearn.neighbors.nearest_centroid import NearestCentroid
from sklearn.naive_bayes import GaussianNB
from sklearn import tree # Decision Tree Model #3
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.ensemble import GradientBoostingRegressor # Gradient Boosting Regressor #4
In [75]:
all_data = all_data.dropna()
cols_to_use = ['beans_spot','beans_may','beans_july','may_dtd','july_dtd','dCoC_may','dCoC_july',
              'oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
              'SPY','XIV','IBM','MCD','JNJ','XOM'] # These columns will be plugged into the model
In [76]:
print (len(all_data))
296
In [77]:
#1
#http://scikit-learn.org/stable/modules/generated/sklearn.tree.ExtraTreeRegressor.html
rfr = ExtraTreesRegressor(n_estimators=15, max_depth=4, n_jobs=-1, random_state=17, verbose=0)
model_1 = rfr.fit(all_data[cols_to_use], all_data['beans_next_day'].values)
all_data['trees_pred'] = model_1.predict(all_data[cols_to_use])

#2
# http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
reg = linear_model.Lasso(alpha = 0.1)
model_2 = reg.fit(all_data[cols_to_use], all_data['beans_next_day'].values)
all_data['lasso_pred'] = model_2.predict(all_data[cols_to_use])

#3 
# http://scikit-learn.org/stable/auto_examples/tree/plot_tree_regression.html
clf = tree.DecisionTreeRegressor()
model_3 = clf.fit(all_data[cols_to_use], all_data['beans_next_day'].values)
all_data['d_tree_pred'] = model_3.predict(all_data[cols_to_use])

#4 
# http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
gbr = GradientBoostingRegressor(n_estimators=25, max_depth=1)
model_4 = gbr.fit(all_data[cols_to_use], all_data['beans_next_day'].values)
all_data['gbr_pred'] = model_4.predict(all_data[cols_to_use])

all_data.head()
Out[77]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ ... beans_july may_dtd july_dtd dCoC_may dCoC_july beans_next_day trees_pred lasso_pred d_tree_pred gbr_pred
2012-01-03 00:00:00+00:00 22.290 39.69 118.884 80.620 134.002 127.033 6.86 185.538 98.121 65.244 ... 12.460 130.0 193.0 0.076308 0.050933 22.160 21.958738 22.335042 22.160 22.094363
2012-01-04 00:00:00+00:00 22.160 39.79 117.471 80.654 130.284 127.123 7.00 184.820 98.727 64.877 ... 12.488 129.0 192.0 0.075659 0.050375 21.850 22.037390 22.231920 21.850 22.094363
2012-01-05 00:00:00+00:00 21.850 39.18 117.262 80.610 131.811 127.501 7.15 183.934 99.134 64.817 ... 12.272 128.0 191.0 0.075523 0.050147 21.502 21.678980 21.994559 21.502 21.750433
2012-01-06 00:00:00+00:00 21.502 39.23 118.167 80.709 131.216 127.232 7.26 181.812 99.888 64.242 ... 12.120 127.0 190.0 0.074567 0.049379 22.140 21.720583 21.670597 22.140 21.750433
2012-01-09 00:00:00+00:00 22.140 39.06 117.968 80.749 131.037 127.441 7.36 180.886 98.925 64.351 ... 12.498 124.0 187.0 0.078403 0.051561 22.072 22.007340 22.128152 22.072 22.094363

5 rows × 22 columns

In [78]:
stack_cols = ['beans_spot','beans_may','beans_july','may_dtd','july_dtd','dCoC_may','dCoC_july',
             'trees_pred','lasso_pred','d_tree_pred','gbr_pred',
             'oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
             'SPY','XIV','IBM','MCD','JNJ','XOM']

stacked = ExtraTreesRegressor(n_estimators=15, max_depth=4, n_jobs=-1, random_state=17, verbose=0)
stacked_model = stacked.fit(all_data[stack_cols], all_data['beans_next_day'].values)
all_data['stack_pred'] = stacked_model.predict(all_data[stack_cols])
all_data.head(3)
Out[78]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ ... may_dtd july_dtd dCoC_may dCoC_july beans_next_day trees_pred lasso_pred d_tree_pred gbr_pred stack_pred
2012-01-03 00:00:00+00:00 22.29 39.69 118.884 80.620 134.002 127.033 6.86 185.538 98.121 65.244 ... 130.0 193.0 0.076308 0.050933 22.160 21.958738 22.335042 22.160 22.094363 22.133667
2012-01-04 00:00:00+00:00 22.16 39.79 117.471 80.654 130.284 127.123 7.00 184.820 98.727 64.877 ... 129.0 192.0 0.075659 0.050375 21.850 22.037390 22.231920 21.850 22.094363 21.902897
2012-01-05 00:00:00+00:00 21.85 39.18 117.262 80.610 131.811 127.501 7.15 183.934 99.134 64.817 ... 128.0 191.0 0.075523 0.050147 21.502 21.678980 21.994559 21.502 21.750433 21.530734

3 rows × 23 columns
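
One caveat worth flagging: each base model's prediction column was generated on the exact rows that model was trained on, so the stacked model is being fed in-sample predictions, a classic source of leakage in stacking. A minimal sketch of one common remedy, out-of-fold predictions, assuming scikit-learn's cross_val_predict is available (on older installs it lives in sklearn.cross_validation rather than sklearn.model_selection):

from sklearn.model_selection import cross_val_predict  # older scikit-learn: from sklearn.cross_validation import cross_val_predict

# Each row is predicted by a model that never saw that row during training
all_data['trees_pred_oof'] = cross_val_predict(
    ExtraTreesRegressor(n_estimators=15, max_depth=4, random_state=17),
    all_data[cols_to_use], all_data['beans_next_day'].values, cv=5)

Note that contiguous K-fold splits still let some future information influence predictions on earlier dates, so a time-aware split would be a further refinement.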

The Q platform exposes feature importance data for 2 of the 4 models used in the stacked model. Bar charts of these importances are below.

In [79]:
#print stacked_model.feature_importances_ # Not working for extratrees
#print model_1.feature_importances_ # Not working for extratrees
#print model_2.feature_importances_ # Not an attribute for lasso
print model_3.feature_importances_ # Decision Tree Regressor
print model_4.feature_importances_ # Gradient Boosting Regressor
print ("Use these lists for bar charts")
[  1.07775304e-01   6.78169431e-02   8.03035820e-01   5.52975805e-04
   1.47436138e-03   1.50767518e-03   1.06282179e-02   4.41908566e-04
   4.79576457e-04   5.17862089e-04   1.03019158e-03   3.48141522e-04
   4.41415938e-04   1.91422789e-03   1.26456928e-03   5.93656641e-05
   7.11443129e-04]
[ 0.28  0.4   0.04  0.    0.    0.    0.    0.08  0.    0.    0.08  0.    0.
  0.    0.12  0.    0.  ]
Use these lists for bar charts
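
Rather than copying the printed arrays into the cells below by hand (note the 'ALTER THIS' comments), the importances can be plotted straight from the fitted models; a minimal sketch using the Gradient Boosting Regressor already in scope:

# Bar chart of feature importances taken directly from the fitted model
y_pos = np.arange(len(cols_to_use))
plt.bar(y_pos, model_4.feature_importances_, align='center', alpha=0.5)
plt.xticks(y_pos, cols_to_use, rotation=90)
plt.ylabel('Numeric Importance To Model')
plt.title('Feature Importance - Gradient Boosting Regressor Model')
plt.show()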
In [34]:
### Reference - https://pythonspot.com/en/matplotlib-bar-chart/

 
objects = cols_to_use ##### ALTER THIS WITH YOUR COLUMNS
y_pos = np.arange(len(objects))
performance = [  1.08052637e-01,   6.75205609e-02,   8.02818292e-01,   6.20421619e-04,
   1.41083053e-03,   1.47561752e-03,   1.06339815e-02,   4.64172859e-04,
   5.71266355e-04,   3.62098658e-04,   1.02916181e-03,   3.59705139e-04,
   4.59235080e-04,   1.81326427e-03,   1.39243322e-03,   3.48512822e-04,
   6.67808510e-04] ##### ALTER THIS WITH YOUR DATA
 
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)

#### ALTER DESCRIPTIONS ############
plt.ylabel('Numeric Importance To Model')
plt.title('Feature Importance - Decision Tree Regressor Model')
plt.xticks(rotation=90) 
plt.show()
In [35]:
import matplotlib.pyplot as plt # plt.rcdefaults()
import numpy as np
#import matplotlib.pyplot as plt
 
objects =cols_to_use
y_pos = np.arange(len(objects))
#performance = [0.48,  0.04,  0.08,  0,    0,    0,    0,    0.12,  0,    0,    0.08,  0,    0,
#  0.08,  0.08,  0,    0.04]

performance = [ 0.28,  0.4,   0.04,  0.,    0.,    0.,    0.,    0.08,  0.,    0.,    0.04,  0.,    0.,
  0.,    0.16,  0.,    0.,  ]
 
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Numeric Importance To Model')
plt.title('Feature Importance - Gradient Boosting Regressor Model')
plt.xticks(rotation=90) 
plt.show()

Chart model predictions on "in sample" data

In [80]:
plt.scatter(all_data.index,all_data['beans_spot'],color='red',label="Beans Spot Price")
plt.scatter(all_data.index,all_data['stack_pred'],color='blue',label="Predicted Spot Price For Beans")
plt.legend(loc='upper left')
plt.title("Comparing Stacked Model Predictions Against Actual Bean Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()

Test the model on "out of sample" data

In [81]:
spot_prices = get_pricing(
    ['SOYB','USO','TLT','BSV','PSA',
    'SPY','XIV','IBM','MCD','JNJ','XOM'], 
    fields='close_price', 
    start_date='2017-01-01', 
    end_date = '2017-03-31', 
    frequency='daily',
)    
#spot_prices.head(3)

sy_contracts = symbols(['SYK17', 'SYN17'])

sy_consecutive_contract_price = history(
    sy_contracts, 
    fields='price', 
    frequency='daily', 
    start_date='2017-01-01', 
    end_date='2017-03-31'
)
#sy_consecutive_contract_price.head()

data = spot_prices.join(sy_consecutive_contract_price)
#print (data.columns)
#data.columns = ['beans_spot','beans_may','beans_july']
#data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot','beans_may','beans_july']
data.columns = ['beans_spot','oil_spot','lt_rate_spot','st_rate_spot','storage_spot',
                'SPY','XIV','IBM','MCD','JNJ','XOM',
                'beans_may','beans_july']


data['May_Delivery'] = datetime.datetime(2017,5,12) # fix with real date later
data['July_Delivery'] = datetime.datetime(2017,7,14) # fix with real date later
data['may_dtd'] = data.May_Delivery - data.index
data['july_dtd'] = data.July_Delivery - data.index
data['may_dtd'] = data['may_dtd'].astype(int) / 86400000000000
data['july_dtd'] = data['july_dtd'].astype(int) / 86400000000000
del data['May_Delivery']
del data['July_Delivery']

data['dCoC_may'] = (data['beans_spot'] - data['beans_may']) / data['may_dtd'] 
data['dCoC_july'] = (data['beans_spot'] - data['beans_july']) / data['july_dtd']

# Start Predictions
data['trees_pred'] = model_1.predict(data[cols_to_use])
data['lasso_pred'] = model_2.predict(data[cols_to_use])
data['d_tree_pred'] = model_3.predict(data[cols_to_use])
data['gbr_pred'] = model_4.predict(data[cols_to_use])

# stacked Prediction
data['stack_pred'] = stacked_model.predict(data[stack_cols])
In [82]:
test_score = data
test_score['beans_next_day'] = test_score['beans_spot'].shift(-1)
test_score = test_score.dropna()
print "R2 of Model:",stacked_model.score(test_score[stack_cols],test_score['beans_next_day'])

# .score() returns the coefficient of determination R^2 of the prediction.
# R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum()
# and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum().
# The best possible score is 1.0, and it can be negative (the model can be arbitrarily worse).
# A constant model that always predicts the mean of y, disregarding the input features, would score 0.0.
R2 of Model: 0.484957903699
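
R² on its own can be hard to interpret for a price series; as a hedged supplement, mean absolute error and root-mean-squared error from sklearn.metrics put the miss in price terms:

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

stack_preds = stacked_model.predict(test_score[stack_cols])
print("MAE: ", mean_absolute_error(test_score['beans_next_day'], stack_preds))
print("RMSE:", np.sqrt(mean_squared_error(test_score['beans_next_day'], stack_preds)))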
In [83]:
plt.scatter(data.index,data['beans_spot'],color='red',label="Beans Spot Price")
plt.scatter(data.index,data['stack_pred'],color='blue',label="Predicted Spot Price For Beans")
plt.legend(loc='upper left')
plt.title("Comparing Stacked Model Predictions Against Actual Bean Prices")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()

Get the day-over-day change between predictions

In [84]:
data['signal_delta'] = data['stack_pred'] - data['stack_pred'].shift(1)
data = data.dropna()
data.tail(2)
Out[84]:
beans_spot oil_spot lt_rate_spot st_rate_spot storage_spot SPY XIV IBM MCD JNJ ... july_dtd dCoC_may dCoC_july trees_pred lasso_pred d_tree_pred gbr_pred stack_pred beans_next_day signal_delta
2017-03-29 00:00:00+00:00 18.45 10.38 121.33 79.690 219.95 235.59 74.68 173.96 128.86 124.97 ... 107.0 0.198977 0.080841 18.615580 18.581606 18.01 18.633327 17.832561 18.33 0.000000
2017-03-30 00:00:00+00:00 18.33 10.57 120.37 79.675 219.38 236.27 74.42 173.87 129.33 124.65 ... 106.0 0.202558 0.081179 18.290208 18.502070 18.01 18.297616 17.981882 18.12 0.149321

2 rows × 24 columns

Chart the deltas between predictions and the corresponding return

In [86]:
plt.plot(data.index,data['signal_delta'],color='red',label="Signal Delta")
#plt.scatter(data.index,data['stack_pred'],color='blue',label="Predicted Spot Price For Beans")
plt.legend(loc='upper left')
plt.title("Examining Large Deltas in Model Predictions")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
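
The heading above mentions the corresponding return, which this cell does not actually plot; a minimal sketch overlaying the realized next-day change in the spot price (the 'realized_delta' column is an addition for illustration, not part of the original frame):

# Realized next-day change in the spot ETF, for visual comparison with the prediction deltas
data['realized_delta'] = data['beans_next_day'] - data['beans_spot']

plt.plot(data.index, data['signal_delta'], color='red', label="Signal Delta")
plt.plot(data.index, data['realized_delta'], color='blue', label="Realized Next-Day Change")
plt.legend(loc='upper left')
plt.title("Prediction Deltas vs. Realized Next-Day Changes")
plt.xlabel("Date")
plt.ylabel("Price Change")
plt.show()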