Why can't I access historical price values? And how do I take a natural log?

I thought that historical data obtained via data.history ... would automatically be a "list of lists", but it seems it is not.

def initialize(context):  
    context.securities = [sid(24),sid(114)]  
    for i in range(1,390,30):  
        schedule_function(open_positions,date_rules.every_day(),time_rules.market_open(minutes=i))  
def open_positions(context, data):  
    # obtaining historical data (previous 50 data points at 30 minutes frequency)  
    period = 50  
    timeframe = 30  
    timeframe_unit = 'T'  
    bars_1m = period*timeframe  
    timeframe_string = str(timeframe) + timeframe_unit  
    hist = data.history(context.securities, 'price', bars_1m, '1m').resample(timeframe_string).last().ffill().iloc[-1]  
    #-------------------------  

With this, print hist does print the historical close prices, and len(hist) returns the total number of securities defined. But len(hist[0]) or print hist[0][0] fails with "invalid index to a scalar variable".

That leads me to think it's really just a 1D array whose elements correspond to the security IDs.

How can I transform it into a 2D array (list of lists), say myMatrix, in which myMatrix[5][10] would represent the 10th oldest data point of the 5th security?

4 responses
def initialize(context):  
    context.securities = [sid(24),sid(114)]  
    for i in range(1,390,30):  
        schedule_function(open_positions,date_rules.every_day(),time_rules.market_open(minutes=i))  
def open_positions(context, data):  
    # obtaining historical data (previous 50 data points at 30 minutes frequency)  
    period = 50  
    timeframe = 30  
    timeframe_unit = 'T'  
    bars_1m = period*timeframe  
    timeframe_string = str(timeframe) + timeframe_unit  
    hist = data.history(context.securities, 'price', bars_1m, '1m').resample(timeframe_string).last().ffill().iloc[-1]  
    #-------------------------  
    data_arr = hist.as_matrix()  
    print data_arr[0][0] 

So I've tried as_matrix() to get a numpy matrix, but it still doesn't work: it says "invalid index to a scalar variable".

The 'data.history' method returns a pandas Series, DataFrame, or Panel, depending on the dimensionality of the 'assets' and 'fields' parameters. So, in the scenario above, with more than one asset and exactly one field (i.e. price), the call returns a 2D pandas DataFrame, indexed by date, with assets as columns. Note that the trailing '.iloc[-1]' in the code above then keeps only the last row, which is a 1D Series indexed by asset; that is why 'hist[0][0]' fails. Drop the '.iloc[-1]' and 'hist' stays a DataFrame. There is no need to turn it into a matrix or 2D array. Use the pandas methods to slice and dice and select. See the docs https://www.quantopian.com/help#api-data-history

Two operations to be careful with in pandas are 'len' and selecting data using 'data[0][0]'. They don't always work as you would expect: on a DataFrame, 'len' returns the number of rows, and 'data[0]' looks up a column labeled 0 rather than the first row. Use the equivalent pandas methods instead.
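To make the Series versus DataFrame distinction concrete, here is a minimal sketch that runs outside Quantopian. It assumes Python 3 and a recent pandas; the price frame is synthetic and the asset labels 'AAA' and 'BBB' are hypothetical stand-ins for what data.history would return. It shows what a trailing .iloc[-1] does to the shape, and one way to get an array indexed as [asset][bar] if that is really wanted.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for data.history(assets, 'price', n, '1m'):
# rows are minutes, columns are assets (labels are hypothetical).
index = pd.date_range("2020-01-02 09:30", periods=120, freq="min")
prices = pd.DataFrame(
    {"AAA": 100.0 + np.arange(120), "BBB": 50.0 + np.arange(120)},
    index=index,
)

# Resampling keeps the frame two-dimensional: one row per 30-minute bin.
bars = prices.resample("30min").last().ffill()

# A trailing .iloc[-1] keeps only the last row, which is a 1-D Series
# indexed by asset. Indexing it twice ends up indexing a scalar.
last_row = bars.iloc[-1]

# For a plain array indexed as [asset][bar], transpose the values.
matrix = bars.values.T

print(bars.shape, last_row.shape, matrix.shape)
```

With this layout, matrix[i][j] is the j-th oldest bar of the i-th asset, counting from zero.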

If one wants to get a single specific value and select by integer location, then use the '.iat' or '.iloc' indexers (maybe take a look here https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat ). Something like this will give the last price for the first asset:

latest_price = hist.iat[-1, 0]  
latest_price = hist.iloc[-1, 0]

The minute is the first index. The asset is the second. Probably a more definite way is to specify the asset explicitly. Unfortunately the iloc and iat indexers require integer positions for both axes. Something like this, however, works:

appl = sid(24)  
appl_col = hist.columns.get_loc(appl)  # integer position of the asset's column  
latest_price = hist.iloc[-1, appl_col]  # -1 selects the most recent row
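As a rough sketch of the alternatives, the same value can also be reached by selecting the column by label first. The frame below is synthetic and the labels 'AAA' and 'BBB' are hypothetical; on Quantopian the column labels would be the asset objects themselves, such as sid(24).

```python
import pandas as pd

# Hypothetical price frame: rows are timestamps, columns are asset labels.
index = pd.date_range("2020-01-02 09:30", periods=3, freq="30min")
hist = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [20.0, 21.0, 22.0]},
    index=index,
)

# Select the column by label, then the last row by position.
latest_by_label = hist["AAA"].iloc[-1]

# Same value using purely positional access.
col = hist.columns.get_loc("AAA")
latest_by_position = hist.iloc[-1, col]

# .at takes labels on both axes, so it needs the row's timestamp.
latest_by_at = hist.at[hist.index[-1], "AAA"]

print(latest_by_label, latest_by_position, latest_by_at)
```

All three expressions pick the most recent 'AAA' price; which one reads best is largely a matter of taste.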

Hope this helps.

Good luck.

Just noticed the post title asked about getting log returns. I assume this was the question?

There are several ways to get log returns. Pandas doesn't have a built-in log method, but numpy functions can be applied directly to pandas objects. Here's one way using the pandas pct_change method.

import numpy as np  
log_returns = np.log(1 + hist.pct_change())

log_returns will be a pandas DataFrame similar to hist. It is indexed by date, and has assets as columns. The first row is NaN because there is no earlier price to compare against.
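For illustration, here is a small self-contained check on made-up prices (the labels 'AAA' and 'BBB' are hypothetical). It assumes only numpy and pandas, and shows that the pct_change route gives the same numbers as differencing the log prices.

```python
import numpy as np
import pandas as pd

# Made-up prices: rows are consecutive bars, columns are assets.
prices = pd.DataFrame({"AAA": [100.0, 110.0, 121.0], "BBB": [50.0, 50.0, 25.0]})

# Log return from the simple return: log(1 + r).
log_returns = np.log(1 + prices.pct_change())

# Equivalent form: difference of log prices, log(p_t) - log(p_{t-1}).
log_returns_alt = np.log(prices).diff()

print(log_returns)
```

Both results keep the shape and labels of the input frame, with NaN in the first row.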

I guess I will need to try out some of these features. I've never really dealt with pandas.

As for natural log, I looked it up and it's part of the numpy library, so I deleted that portion of the question.

But thanks tho