How to Use data.history

Hi guys,

I am having trouble using data.history. The example that I found was:

def initialize(context):
    # AAPL, MSFT, and SPY
    context.assets = [sid(24), sid(5061), sid(8554)]

def handle_data(context, data):
    price_history = data.history(context.assets, fields="price", bar_count=20, frequency="1d")

However, I now have a DataFrame of stocks filtered by Pipeline. I tried to convert that DataFrame with tolist() and assign the result to context.assets, but it didn't seem to work when I passed that list to data.history. Is there any way to convert the DataFrame into a list of sids?

Thank you very much!

40 responses

Hi Thanh,

If you assign your pipeline output to a variable, you should be able to loop through each symbol the pipeline generated.

context.results = pipeline_output('factors')
for stock in context.results.index:
    var = 1
    sym = stock.symbol

Hope this helps
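
As a sketch of the loop above that runs outside Quantopian: the pipeline output is a DataFrame whose index holds the assets, so the index can be turned into a list directly. The Asset class and the sample frame are hypothetical stand-ins, not the Quantopian classes. The sketch also assumes data.history wants asset objects like the sid(...) values in the first example, rather than ticker strings.

```python
import pandas as pd


class Asset(object):
    """Hypothetical stand-in for a Quantopian asset (not the real class)."""

    def __init__(self, sid, symbol):
        self.sid = sid
        self.symbol = symbol

    def __repr__(self):
        return "Asset(%d [%s])" % (self.sid, self.symbol)


aapl, msft, spy = Asset(24, "AAPL"), Asset(5061, "MSFT"), Asset(8554, "SPY")

# Hypothetical pipeline output: one row per asset, with the assets as the index.
results = pd.DataFrame({"score": [10, 7, 10]}, index=[aapl, msft, spy])

# Filter the rows first, then take the index: this is the list of assets.
assets = results[results["score"] == 10].index.tolist()

# The ticker strings are a separate list, useful for logging.
symbols = [asset.symbol for asset in assets]
```

Inside an algorithm the same idea would read context.assets = context.results.index.tolist(), again assuming the index holds the asset objects.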

Hi JJ,

Thanks a lot for your help! I am very new to both programming and Quantopian. Would you kindly explain what the line var=1 does? From what I understand, the for loop goes through all values in context.results.index and stores each stock's stock.symbol value in the variable "sym". Is that correct?

Thanks so much!

Thanh Duong

var=1 doesn't do anything; it's just left over from old code.
And yes, that is correct.

def before_trading_start(context, data):  
...
    result10 = result[result['score'] == 10]  
    context.securities = result10.tail(10)  
    context.sym = []  
    for stock in context.securities.index:
        context.sym.append(stock.symbol)

def my_rebalance(context, data):
    prices = data.history(context.sym, 'price', bar_count=100, frequency='1d')
    returns = prices.pct_change().dropna()
    returns = returns.values
    print(type(prices))
    print(prices)

Hi JJ, I followed your instructions to loop through the DataFrame index and build a list of stock symbols. However, when I ran the algorithm and printed the prices, it showed me that "prices" is an empty DataFrame. I really wish there were a way to debug this.

Here is my full code:

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline.classifiers.fundamentals import Sector  
from quantopian.pipeline import Pipeline  
from quantopian.pipeline.data import morningstar  
from quantopian.pipeline.data import Fundamentals  
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.filters import QTradableStocksUS  
import numpy as np  
import cvxopt as opt  
from cvxopt import blas, solvers  
import pandas as pd


def initialize(context):  
    context.opt_frame = 180

    # Rebalance every trading day, 1.5 hours after the market opens (11 AM ET).
    schedule_function(my_rebalance,  
                      date_rules.every_day(),  
                      time_rules.market_open(hours=1, minutes=30))

    # Create and attach our pipeline (dynamic stock selector), defined below.  
    attach_pipeline(make_pipeline(context), 'pit')

def make_pipeline(context):  
  # Get Latest Fundamentals  
  OCF = Fundamentals.operating_cash_flow.latest  
  debt_to_asset = Fundamentals.debtto_assets.latest  
  quick_ratio = Fundamentals.quick_ratio.latest  
  outstanding_shares = Fundamentals.shares_outstanding.latest  
  gross_margin = Fundamentals.gross_margin.latest  
  assets_turnover = Fundamentals.assets_turnover.latest  
  NI = Fundamentals.net_income_from_continuing_operations.latest  
  cash_return = Fundamentals.cash_return.latest  
  enterprise_value = Fundamentals.enterprise_value.latest  
  total_revenue = Fundamentals.total_revenue.latest  
  fcf_yield = Fundamentals.fcf_yield.latest  
  book_value_yield = Fundamentals.book_value_yield.latest  


  #Get Last Year Fundamentals  
  class Previous(CustomFactor):
      # Returns value of input x trading days ago where x is the window_length
      # Both the inputs and window_length must be specified as there are no defaults
      def compute(self, today, assets, out, inputs):
          out[:] = inputs[0]
  window_length = 252  
  debt_to_asset2 = Previous(inputs = [Fundamentals.debtto_assets], window_length = window_length)  
  quick_ratio2 = Previous(inputs = [Fundamentals.quick_ratio], window_length = window_length)  
  outstanding_shares2 = Previous(inputs = [Fundamentals.shares_outstanding], window_length = window_length)  
  gross_margin2 = Previous(inputs = [Fundamentals.gross_margin], window_length = window_length)  
  assets_turnover2 = Previous(inputs = [Fundamentals.assets_turnover], window_length = window_length)  
  NI2 = Previous(inputs = [Fundamentals.net_income_from_continuing_operations], window_length = window_length)  
  total_revenue2 = Previous(inputs = [Fundamentals.total_revenue], window_length = window_length)  
  cash_return2 = Previous(inputs = [Fundamentals.cash_return], window_length = window_length)  
  enterprise_value2 = Previous(inputs = [Fundamentals.enterprise_value], window_length = window_length)

  result = Pipeline(  
    columns={  
        'OCF':OCF,  
        'debt_to_asset':debt_to_asset,  
        'debt_to_asset2':debt_to_asset2,  
        'quick_ratio':quick_ratio,  
        'quick_ratio2':quick_ratio2,  
        'outstanding_shares':outstanding_shares,  
        'outstanding_shares2':outstanding_shares2,  
        'gross_margin':gross_margin,  
        'gross_margin2':gross_margin2,  
        'assets_turnover':assets_turnover,  
        'assets_turnover2':assets_turnover2,  
        'NI': NI,  
        'NI2': NI2,  
        'total_revenue': total_revenue,  
        'total_revenue2': total_revenue2,  
        'cash_return': cash_return,  
        'cash_return2': cash_return2,  
        'enterprise_value': enterprise_value,  
        'enterprise_value2': enterprise_value2,  
        'fcf_yield': fcf_yield,  
        'book_value_yield': book_value_yield  
        }, screen = QTradableStocksUS()  
  )  
  return result

def before_trading_start(context, data):

    context.output = pipeline_output('pit')  
    result = context.output.dropna(axis=0)  
    result2 = context.output.dropna(axis=0)

    result.loc[:,('total_avg_assets')] = result.loc[:,('total_revenue')]/result.loc[:,('assets_turnover')]  
    result.loc[:,('ROA')] = result.loc[:,('NI')]/result.loc[:,('total_avg_assets')]  
    result.loc[:,('FCF')] = result.loc[:,('cash_return')]*result.loc[:,('enterprise_value')]  
    result.loc[:,('FCFTA')] = result.loc[:,('FCF')]/result.loc[:,('total_avg_assets')]

    result.loc[:,('total_avg_assets2')] = result.loc[:,('total_revenue2')]/result.loc[:,('assets_turnover2')]  
    result.loc[:,('ROA2')] = result.loc[:,('NI2')]/result.loc[:,('total_avg_assets2')]  
    result.loc[:,('FCF2')] = result.loc[:,('cash_return2')]*result.loc[:,('enterprise_value2')]  
    result.loc[:,('FCFTA2')] = result.loc[:,('FCF2')]/result.loc[:,('total_avg_assets2')]

#Current Profitability  
#ROA > 0  
    result.loc[:,('FS_ROA')] = result.loc[:,('ROA')] >0  
#FCFTA > 0  
    result.loc[:,('FS_FCFTA')] = result.loc[:,('FCFTA')] >0  
#Accrual  
    result.loc[:,('F_ACCRUAL')] = result.loc[:,('OCF')] > result.loc[:,('NI')]

#Stability  
#Lever  
    result.loc[:,('delta_lever')] = result.loc[:,('debt_to_asset')] - result.loc[:,('debt_to_asset2')] < 0  
#Liquidity  
    result.loc[:,('delta_quick_ratio')] = result.loc[:,('quick_ratio')] - result.loc[:,('quick_ratio2')] > 0  
#Equity Repurchase  
    result.loc[:,('delta_OS')] = result.loc[:,('outstanding_shares')] - result.loc[:,('outstanding_shares2')] <= 0

#Operational Improvement  
#Increasing ROA  
    result.loc[:,('delta_roa')] = result.loc[:,('ROA')] - result.loc[:,('ROA2')] > 0  
#Increasing FCFTA  
    result.loc[:,('delta_FCFTA')] = result.loc[:,('FCFTA')] - result.loc[:,('FCFTA2')] > 0  
#Increasing gross margin  
    result.loc[:,('delta_gross_margin')] = result.loc[:,('gross_margin')] - result.loc[:,('gross_margin2')] > 0  
#Increasing assets turnover  
    result.loc[:,('delta_assets_turnover')] = result.loc[:,('assets_turnover')] - result.loc[:,('assets_turnover2')] > 0

    result = result.drop(['OCF','assets_turnover','assets_turnover2','debt_to_asset',  
                      'debt_to_asset2','gross_margin','gross_margin2','outstanding_shares',  
                      'outstanding_shares2','quick_ratio','quick_ratio2','ROA','ROA2','NI','NI2','total_revenue','total_revenue2','cash_return', 'cash_return2', 'enterprise_value', 'enterprise_value2',  
                     'total_avg_assets','FCF','FCFTA','total_avg_assets2','FCF2','FCFTA2','fcf_yield'], axis=1)  
    result = result.astype(int)  
  #Sum row to get the score  
    result.loc[:,('score')] = result.sum(axis=1)  
    result.loc[:,('fcf_yield')] = result2.loc[:,('fcf_yield')]  
    result = result.sort_values(by=['fcf_yield'])  # sort_values returns a new frame, so reassign it
    result = result[result['book_value_yield'] > result['book_value_yield'].quantile(0.7)]  
    result10 = result[result['score'] == 10]  
    context.securities = result10.tail(10)  
    context.sym = []  
    for stock in context.securities.index:  
            context.sym.append(stock.symbol)  
def optimal_portfolio(returns):  
    """  
    Finds the Optimal Portfolio according to the Markowitz Mean-Variance Model  
    """  
    n = len(returns)  
    returns = np.asmatrix(returns)  

    print(type(returns))
    print(returns)
    N = 200  
    mus = [10**(5.0 * t/N - 1.0) for t in range(N)]  
    # Convert to cvxopt matrices  
    S = opt.matrix(np.cov(returns))  
    pbar = opt.matrix(np.mean(returns, axis=1))  
    print(pbar)

    # Create constraint matrices  
    G = -opt.matrix(np.eye(n))   # negative n x n identity matrix  
    h = opt.matrix(0.0, (n ,1))  
    A = opt.matrix(1.0, (1, n))  
    b = opt.matrix(1.0)  
    # Calculate efficient frontier weights using quadratic programming  
    portfolios = [solvers.qp(mu*S, -pbar, G, h, A, b)['x']  
                  for mu in mus]  
    ## CALCULATE RISKS AND RETURNS FOR FRONTIER  
    returns = [blas.dot(pbar, x) for x in portfolios]  
    risks = [np.sqrt(blas.dot(x, S*x)) for x in portfolios]  
    ## CALCULATE THE 2ND DEGREE POLYNOMIAL OF THE FRONTIER CURVE  
    m1 = np.polyfit(returns, risks, 2)  
    x1 = 10 #= np.sqrt(m1[2] / m1[0])  
    # CALCULATE THE OPTIMAL PORTFOLIO  
    wt = solvers.qp(opt.matrix(x1 * S), -pbar, G, h, A, b)['x']  
    return np.asarray(wt), returns, risks

def my_rebalance(context,data):  
    """  
     Called daily to rebalance the Portfolio according to the equity weights calculated by optimal_portfolio()
    """

    prices = data.history(context.sym, 'price', bar_count=100, frequency='1d')  
    returns = prices.pct_change().dropna()  
    returns = returns.values  
    print(type(prices))
    print(prices)

    try:  
        # Calculate weights by method of choice  
        weights, _, _ = optimal_portfolio(returns.T)  
        #weights = equal_portfolio(returns.T)  

        # Rebalance portfolio accordingly  
        for stock, weight in zip(prices.columns, weights):  
            order_target_percent(stock, weight)  
    except ValueError as e:  
        # Sometimes this error is thrown  
        # ValueError: Rank(A) < p or Rank([P; A; G]) < n  
        pass  
    pass  

Thanks for sharing, Thanh Duong.

Do you find the Markowitz Mean-Variance optimisation results stable?
Keen to see a backtest :)

Hi Thanh,

context.sym is being populated, but your issue is in this line:
prices = data.history(context.sym, 'price', bar_count=100, frequency='1d')
I'm not really sure what "invalid literal for int() with base 10: 'S'" means.

I found the issue! Turned out my initial DataFrame had too many filters that made it empty! I removed some of those filters and now the algorithm is up and running. I will post the result shortly. Thanks a lot, JJ.

Can't wait to see the results, Thanh. Glad I could help.

Here is a backtest from 2016 to 2018. I tried to backtest from 2004 to 2018 but the algorithm crashed sometime in 2009 or 2015. I am still trying to find out why. This algorithm is basically a combination of a couple of strategies that I learned on Quantopian. I selected 10 financially stable stocks with Piotroski Score + sorted dividend yield. Then, I constructed a portfolio with those ten stocks with optimized weights from Markowitz Mean-Variance Frontier.
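
The selection step described above can be sketched in plain Python. This is only an illustration with made-up tickers and three checks instead of the ten boolean signals in the posted code: count the checks each stock passes, keep the stocks with a perfect score, rank them by yield, and take the top few.

```python
# Hypothetical per-stock data: "flags" holds one True/False per check,
# "yield" is used only to rank the stocks that pass every check.
stocks = {
    "AAA": {"flags": [True, True, True], "yield": 0.04},
    "BBB": {"flags": [True, False, True], "yield": 0.09},
    "CCC": {"flags": [True, True, True], "yield": 0.07},
    "DDD": {"flags": [True, True, True], "yield": 0.02},
}


def select(stocks, top_n):
    """Return up to top_n tickers with a perfect score, highest yield first."""
    passed = []
    for name, info in stocks.items():
        score = sum(info["flags"])  # True counts as 1, False as 0
        if score == len(info["flags"]):  # perfect score only
            passed.append(name)
    passed.sort(key=lambda name: stocks[name]["yield"], reverse=True)
    return passed[:top_n]


picked = select(stocks, top_n=2)
```

BBB has the highest yield but fails one check, so it never reaches the ranking step.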

The result: TOO RISKY! The return just seemed like a magnified market return with 4.99 beta! Maximum drawdown is 52%. In one of my backtests, in 2008, the max drawdown was 2000%... There has to be something wrong with the algorithm as everything seemed too absurd to me, haha.

JJ and Karl, I hope that you guys can offer some insights as to why this happened.

Here is the original backtest with equally weighted Piotroski + sorted dividend yield stocks (No Markowitz involved)

Did you put a limit on your leverage??

I did not! I will now.

I summed all the weights up in variable "leverage" and changed
order_target_percent(stock, weight)

to

order_target_percent(stock, weight/leverage)

I also changed the rebalancing frequency to monthly and fixed minor errors. The result, however, is still a very high beta portfolio. The Sharpe ratio does look good. I am backtesting from 2004 - 2018 to see how it would perform during a recession.
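
A minimal sketch of the weight/leverage rescaling described above, with made-up weights. The normalize helper is hypothetical, and summing absolute values (so that shorts would count toward leverage too) is an assumption; the post simply sums the weights.

```python
def normalize(weights):
    """Scale weights so their absolute values sum to 1."""
    leverage = sum(abs(w) for w in weights.values())
    if leverage == 0:
        return dict(weights)  # nothing to scale
    return {asset: w / leverage for asset, w in weights.items()}


raw = {"AAA": 0.9, "BBB": 0.6, "CCC": 0.5}  # hypothetical weights, sum is 2.0
scaled = normalize(raw)
```

Each scaled value is what would be passed to order_target_percent in place of the raw weight.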

The real reason for the leverage problem is that your algorithm does not sell stocks that are no longer in context.securities.
These 4 lines in my rebalance function take care of that.

    for stock in context.portfolio.positions.keys():
        if stock not in context.securities.index:
            if data.can_trade(stock):
                order_target_percent(stock, 0)
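
To see why stale positions push leverage up, here is a small sketch in plain Python with made-up holdings. It models each target order as "set this asset's weight" and compares total exposure with and without closing the assets that dropped out of the selection. The function name and the data are hypothetical.

```python
def exposure_after_rebalance(old_weights, new_targets, close_stale):
    """Total portfolio weight after applying new targets to an existing book."""
    book = dict(old_weights)
    book.update(new_targets)  # each target overwrites that asset's weight
    if close_stale:
        for asset in old_weights:
            if asset not in new_targets:
                book[asset] = 0.0  # exit anything no longer selected
    return sum(book.values())


old_weights = {"AAA": 0.5, "BBB": 0.5}  # hypothetical current positions
new_targets = {"BBB": 0.5, "CCC": 0.5}  # hypothetical new selection

without_close = exposure_after_rebalance(old_weights, new_targets, False)
with_close = exposure_after_rebalance(old_weights, new_targets, True)
```

Without the exit step the book holds all three names at once, so exposure keeps growing each time the selection changes.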

Thanks, Vladimir. I can't believe I forgot about this... I'm rerunning everything now.

Fixed everything! Here is the final result. The return increased slightly compared with the algorithm without Markowitz (equally weighted). Alpha increased from 5% to 6%. Max drawdown increased as a result of the slightly higher beta causing a bigger downswing during the recession. However, the Sharpe ratio decreased, which is odd because I thought the whole point of Markowitz weighting is to maximize the Sharpe ratio. Do you guys have any suggestions on how to further improve the algorithm?
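
One possible explanation, offered as a sketch rather than a diagnosis: in the posted optimal_portfolio() the final weights come from solvers.qp(opt.matrix(x1 * S), -pbar, ...) with x1 fixed at 10, and cvxopt's qp minimizes (1/2) x'Px + q'x. That is a mean-variance trade-off at one fixed risk aversion, which is not the same as maximizing the Sharpe ratio. The two-asset numbers below are made up, and a grid search stands in for the solver.

```python
import math

# Hypothetical two-asset example with zero covariance.
mu = (0.10, 0.04)       # expected returns
var = (0.04, 0.0025)    # variances
risk_aversion = 10.0    # plays the role of x1 = 10 in optimal_portfolio()


def mean_and_variance(w):
    """Portfolio mean and variance with weight w in asset 0 and 1 - w in asset 1."""
    mean = w * mu[0] + (1.0 - w) * mu[1]
    variance = w * w * var[0] + (1.0 - w) * (1.0 - w) * var[1]
    return mean, variance


def qp_objective(w):
    # Same form as the posted call: (1/2) * x1 * variance - expected return.
    mean, variance = mean_and_variance(w)
    return 0.5 * risk_aversion * variance - mean


def sharpe(w):
    # Risk-free rate assumed to be zero.
    mean, variance = mean_and_variance(w)
    return mean / math.sqrt(variance)


grid = [i / 1000.0 for i in range(1001)]  # long-only weights from 0 to 1
w_qp = min(grid, key=qp_objective)
w_sharpe = max(grid, key=sharpe)
```

With these inputs the fixed-risk-aversion weight in asset 0 is 0.2, while the Sharpe-maximizing weight is about 0.135, so the two objectives pick different portfolios.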

It looks great. Did you submit it already?

Hi JJ, do you mean submitting for the competition? Unfortunately, this algorithm is long-only rather than long-short, and I didn't use the appropriate constraints and order_optimize to qualify for the competition. I do have a long-short version, but it is horrible. Here is the long-short version with all contest conditions met. Even though stocks with high Piotroski scores do seem to outperform the market, ones with low Piotroski scores do not. Hence, when I shorted the low Piotroski score stocks, the whole portfolio underperformed.

How did you get risk_loadings to work? Even the one from Quantopian has a runtime error.

Hey JJ,

I followed the contest instruction and read sample codes from Quantopian to construct this algo. I ran this backtest a couple months ago. Maybe the runtime error is a recent issue?

I found the issue: I wasn't creating a proper pipeline.
I'll show you the algo when it completes the backtest.

Here it is. I based this on some value investing ideas, but I couldn't get any info on bond ratings.

Hey JJ. Can you explain what this line does?

beta = 0.66*RollingLinearRegressionOfReturns(
    target=sid(8554),
    returns_length=5,
    regression_length=260,
    mask=combined_alpha.notnull() & Sector().notnull()
).beta + 0.33*1.0

I believe that is for the contest's constraint on beta to SPY:
https://www.quantopian.com/tutorials/contest#lesson8
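
Read purely as arithmetic, the expression takes the regression beta against SPY (sid 8554) and blends it with 1.0, which pulls extreme estimates toward 1. It resembles the common practice of shrinking raw betas toward the market beta, although that reading is an assumption and not something the thread confirms. A small worked example with a hypothetical helper:

```python
def shrunk_beta(raw_beta, weight=0.66, prior=1.0, prior_weight=0.33):
    """Blend a raw beta estimate with a prior of 1.0, as in the line quoted above."""
    return weight * raw_beta + prior_weight * prior


high = shrunk_beta(2.0)   # a raw beta of 2.0 is pulled down
low = shrunk_beta(0.0)    # a raw beta of 0.0 is pulled up
```

A raw beta of 2.0 becomes 1.65 and a raw beta of 0.0 becomes 0.33. The two coefficients sum to 0.99 rather than 1, so a raw beta of exactly 1.0 maps to 0.99.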

Here's a better one that meets the constraints, but only for this time period.
I'm still working on the alpha. The key is to get a positive return over the 2 years prior to the current date, so that I'm not
just improving the algo for that period only.

Thanh, would you happen to know how many stocks are in QTradableStocksUS?

Hey Thanh,

I was wondering if you can help me out with something.

Can you tell me how you manage your backtests in the research environment?
There doesn't seem to be a direct way to do this.
I'm trying to run a backtest on Fundamentals.

Hey JJ. Sorry I have been busy all week. I normally only create the pipeline in the research environment and move that right away to backtest. I will try to help you this weekend.

Thank you.
My goal is to loop through all combinations of fundamentals to find the combo with the highest returns,
but I can't seem to get the return comparison right.
Thank you in advance for your help.

Here's the updated notebook. It seems to have a problem appending run pipeline,
so I had to add a break in combo to stop the kernel from dying.

Hey JJ. Sorry for the delay in response. I think your idea is solid.

However, judging from this post, isn't it still not possible to run a sustainable backtest in research? I tried to run zipline locally as well, but we can't access Fundamentals or anything like QTradableStocksUS() from the Quantopian API.

I might try to scrape fundamental data from the SEC and test out your hypothesis (that there exists a combination of fundamental features that would yield the highest alpha). But it would be a very lengthy process and might not be applicable to Quantopian's order_optimize.

It's OK. I decided that instead of backtesting Fundamentals, I'm just going to determine which indicator is more prevalent among high returns.

Hi Thanh Duong,

One tweet by Ernest Chan on Mean-Variance optimisation may be of interest to you :)

P.S. For reference, the abstract of the paper Optimized Fundamental Portfolios describes using "a fundamentals-based returns model in conjunction with classic mean-variance portfolio optimization and find that portfolio optimization combined with fundamental analysis offers substantial improvements in portfolio performance over either fundamental analysis or portfolio optimization alone. Long-only mean-variance optimized fundamental portfolios produce CAPM alphas of over 3.2% per quarter and 5-factor alphas of over 2.2% per quarter, with high Sharpe and Information ratios."

Hey Karl,

This is fascinating. Thanks a lot for sharing! I will play around with it. Have you tested it out yet?

Here's some preliminary work, in which I made a data frame with the return, operating assets growth, financing growth, roe, and book to market value.

However, in the paper, their regression model is a five-year time series updated every month, not a cross-section. Right now, I only know how to query data from one previous period by using this class:

class Previous(CustomFactor):
    # Returns value of input x trading days ago where x is the window_length
    # Both the inputs and window_length must be specified as there are no defaults
    def compute(self, today, assets, out, inputs):
        out[:] = inputs[0]

Karl, do you know how to get historical data for different periods in the past?

Hi Thanh Duong,

Previous(CustomFactor) is versatile to customise for user-defined periods in Pipeline, for example:

period = 5 # say 5 years

for n in range(0, period):  
    wL = 252 * n + 1 # Note: wL = 1 at n = 0 is equivalent to .latest  
    net_income = Previous(inputs=[Fundamentals.net_income_income_statement], window_length=wL, mask=my_universe)  
    pipe.add(net_income, 'netIncome'+str(n))  
    return pipe

class Previous(CustomFactor):  
    window_safe = True  
    def compute(self, today, assets, out, inputs):  
        out[:] = inputs[0]  

As for mean-variance optimization, I have tried to make it work but found it tenuous and unpredictable - try Quantopian's multivariate Optimize API that is very effective and highly customisable :)

Hi Karl.

Thanks a lot for your help. I tested your code but somehow the data frame only showed netIncome0 :(. Sorry for being such a noob. I am still new to coding and would very much appreciate your help!

Thanks!
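
One likely cause, assuming the loop was pasted into a function exactly as posted: the return pipe line is indented inside the for loop, so the function returns during the first pass, when only netIncome0 has been added. The sketch below reproduces this in plain Python, with a dict standing in for the pipeline and the window length standing in for the factor. Both function names are hypothetical.

```python
def build_columns_as_posted(period):
    columns = {}
    for n in range(period):
        window_length = 252 * n + 1  # 1, 253, 505, ...
        columns["netIncome" + str(n)] = window_length
        return columns  # inside the loop: returns on the first pass


def build_columns_dedented(period):
    columns = {}
    for n in range(period):
        window_length = 252 * n + 1
        columns["netIncome" + str(n)] = window_length
    return columns  # outside the loop: returns after all columns are added


as_posted = build_columns_as_posted(5)
dedented = build_columns_dedented(5)
```

Moving the return one indentation level to the left lets all five columns through.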

No worries, Thanh. Perhaps see if my Notebook snippets are any good.

Hope this helps.

Hi Thanh Duong,

Just recalled.. there were recent posts on the subject that may be relevant to your quest.

Hope they are useful for reference.