Quantopian's community platform is shutting down. Please read this post for more information and download your code.
Back to Community
How to get the values from the columns of a Pipeline

I have an algorithm that I'm trying to work on coding for getting averages of the currently selected stock and updating them with the number of days they've been selected. I have a ExponentialWeightedMovingAverage and a MovingAverageConvergenceDivergenceSignal in the columns as well as the total rank. I use the total rank column to get the selected stocks. My questions is: how do I get the MACD and EWMA values from the pipeline based on the rank. (I'm not finished researching but I'm trying to implement the code in to be able to test run it and see if it's updating and doing the averaging right. Averages include close prices, EWMA, and MACD in which I will weight each one).

8 responses

Notebook

from quantopian.pipeline import Pipeline  
from quantopian.research import run_pipeline  
from quantopian.pipeline.factors import SimpleMovingAverage, MovingAverageConvergenceDivergenceSignal, EWMA  
from quantopian.pipeline.data.builtin import USEquityPricing  
def make_pipeline():  
    ewma = EWMA.from_span(inputs=[USEquityPricing.close], window_length=20, span=15,)  
    macd = MovingAverageConvergenceDivergenceSignal(inputs=[USEquityPricing.close])  
    close = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=20)  
    pipe = Pipeline(  
        columns = {  
            'close': close,  
            'macd' : macd,  
            'ewma' : ewma  
        }  
    )  
    return pipe

run_pipeline(make_pipeline(), start_date = '2015-01-01', end_date = '2015-03-01').sort_values('close', ascending=False).head(5)  

Algorithm

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline  
from quantopian.pipeline.factors import SimpleMovingAverage, MovingAverageConvergenceDivergenceSignal, EWMA  
from quantopian.pipeline.data.builtin import USEquityPricing  
def initialize(context):  
    pipe = make_pipeline()  
    attach_pipeline(pipe, 'make_pipeline')  
def make_pipeline():  
    ewma = EWMA.from_span(inputs=[USEquityPricing.close], window_length=20, span=15,)  
    macd = MovingAverageConvergenceDivergenceSignal(inputs=[USEquityPricing.close])  
    close = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=20)  
    pipe = Pipeline(  
        columns = {  
            'close': close,  
            'macd' : macd,  
            'ewma' : ewma  
        }  
    )  
    return pipe

def before_trading_start(context,data):  
    context.output = pipeline_output('make_pipeline')  
    print(context.output.sort_values('close', ascending=False).head(5))  

I have that much so far. How would I loop through the stocks and get their individual MACD and EWMA? after the pipeline has already ran? I don't want the entire universe, just the MACD and EWMA from the stocks the pipeline has generated. So say pipeline generates 3 stocks (X, Y, Z) and I want the individual stock MACD and EWMA (which I have stored the values in the Pipeline columns). How would I retrieve X, Y, Z individual EWMA and MACD from the Pipeline column (as it's been returned but I just need the code to get it out). Like on similarity to C# equivalent of these:

context_output[stock]['macd']
context_output[stock]['ewma']

These would select the MACD with the corresponding value associated with the stock involved in the results of the Pipeline.

Those are the C# equivalent. I'm looking for how to accomplish this in python

I have figured it out. To get the other columns, you have to do:

context.output[context.output["column you use"]]['ewma']  

Same goes for the MACD :)

The pipeline output is a pandas dataframe. The columns are whatever factors, filters, or classifiers were specified in the pipeline definition. The rows are indexed by the securities. To get specific column values or multiple values at a time, you can use any of the many slicing and dicing methods which pandas has to offer (see https://pandas.pydata.org/pandas-docs/stable/indexing.html also maybe take a look here https://pandas.pydata.org/pandas-docs/stable/10min.html)

Two methods I like are .get_value and the ubiquitous .loc . (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.get_value.html and https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html )

# assume that stock xyz is in context.output which is the dataframe returned by pipeline...  
# get_value is great when one wants to retrieve one specific value from the dataframe  
# enter the row index (ie xyz) and the column name (ie 'ewma') you want to retrieve  
xyz_ewma = context.output.get_value(xyz, 'ewma')

# The .loc method is more flexible and can retrieve multiple values and/or rows  
# To get a single value use the following. Note the brackets rather than parenthesis.  
xyz_ewma = context.output.loc[xyz, 'ewma']

How about in the research environment? How should I represent the name of the security? I tried

my_pipe.loc[("2019-01-07 00:00:00+00:00","Equity(24 [AAPL])"),"ewma"]  

It raised the "KeyError: 'Equity(24 [AAPL])'" exception

By the way here's a way to line up the column headers (from the original algorithm).
print('{}{}'.format('.\n', context.output.sort_values('close', ascending=False).head(5)))

    Original:  
    print(context.output.sort_values('close', ascending=False).head(5))  
2018-12-11 05:45 PRINT                              close           ewma         macd  
Equity(1091 [BRK_A])  320597.26820  317323.747381 -1016.181763  
Equity(6761 [SEB])      3652.98425    3703.206860    34.198243  
Equity(5513 [NVR])      2434.91750    2454.470675    45.890790  
Equity(19917 [BKNG])    1850.89225    1850.313016    -9.535392  
Equity(16841 [AMZN])    1621.92195    1643.671116     8.446272

    Replacement:  
    print('{}{}'.format('.\n', context.output.sort_values('close', ascending=False).head(5)))  
2018-12-11 05:45 PRINT .  
                             close           ewma         macd  
Equity(1091 [BRK_A])  320597.26820  317323.747381 -1016.181763  
Equity(6761 [SEB])      3652.98425    3703.206860    34.198243  
Equity(5513 [NVR])      2434.91750    2454.470675    45.890790  
Equity(19917 [BKNG])    1850.89225    1850.313016    -9.535392  
Equity(16841 [AMZN])    1621.92195    1643.671116     8.446272  

On the KeyError, consider attaching the Notebook.

This is some code providing an overview of pipeline output with examples of high and lows for each column.

Partial output here, and it exposes some NaN's. Full output is in the code.

2018-12-10 05:45 log_data:40 INFO $30,000    2018-12-10 to 2018-12-11  
2018-12-10 05:45 log_data:111 INFO Stocks: 8731  Columns: 3  
                        min                   mid        max  
 close     0.00114285714286              23.39825        nan     NaNs 69/8731  
  ewma     0.00157469165799          28.004032962        nan     NaNs 1608/8731  
  macd       -219.708487626     0.000653736733132        nan     NaNs 1762/8731  
2018-12-10 05:45 log_data:126 INFO _ _ _   close   _ _ _  
    ... close highs  
                             close           ewma        macd  
Equity(1091 [BRK_A])  322000.76820  319339.588164 -219.708488  
Equity(6761 [SEB])      3649.53425    3705.932057   27.664418  
Equity(5513 [NVR])      2430.06500    2462.412753   52.415213  
Equity(19917 [BKNG])    1858.44875    1855.240195  -11.512557  
    ... close lows  
                      close  ewma  macd  
Equity(52703 [PQIN])    NaN   NaN   NaN  
Equity(52704 [ACSG])    NaN   NaN   NaN  
Equity(52705 [ENOB])    NaN   NaN   NaN  
Equity(52707 [SUZ])     NaN   NaN   NaN  

This would screen out the NaN's:

    pipe = Pipeline(  
        screen  = ewma.notnull() & macd.notnull() & close.notnull(),  
        columns = {  
            'close': close,  
            'macd' : macd,  
            'ewma' : ewma  
        }  
    )