How to get access to data variable in custom factor class?

Hi,

I have some code that loads data from a CSV with fetch_csv, which I call in initialize(context). I also have a custom factor class whose compute method calculates a signal. However, I don't know how to access the data variable from inside that custom factor class.

The CSV data is accessible via netQty = data.current(data.fetcher_assets, 'netQty'). However, my custom factor class has no access to the variable "data". How do I get it into the class and its compute method?

My code is below:

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline  
from quantopian.pipeline.data.builtin import USEquityPricing  
from quantopian.pipeline.filters import Q500US, Q1500US  
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.data import morningstar

import numpy as np

def initialize(context):  
    # Rebalance at the beginning of the month, 1 hour after market open.  
    schedule_function(rebalance,date_rules.month_start(),time_rules.market_open(hours=1))  
    # get csv of insider holdings data (loaded from lh2456's Google Drive)  
    fetch_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQI-J2Ewg_XH6DD9GfTPfO7QQsCmHBujrKUKbdzek6jLU5uWHGmCRPPftvz8l8sM9ukemgUS6SRXYeu/pub?output=csv',  
              date_column='date',  
              date_format='%m/%d/%y')  
    # Create our dynamic stock selector.  
    attach_pipeline(make_pipeline(), 'pipeline')  
    # Trading cost module.  
    set_commission(commission.PerShare(cost=0.0, min_trade_cost=0))  
    set_slippage(slippage.VolumeShareSlippage(volume_limit=0.50, price_impact=0.0))  


class ComputeInsiderSignal(CustomFactor):  
    # how do I get the CSV into the compute custom factor class?  
    # Want access to netQty variable here.  
    inputs = [morningstar.valuation_ratios.pb_ratio]  
    window_length = 1  
    def compute(self, today, assets, out, pb_ratio):  
        out[:] = 1.0/pb_ratio[-1]  


def make_pipeline():  
    # Base universe set to the Q1500US.  
    base_universe = Q1500US()

    # Compute the factor.  
    InsiderSignal = ComputeInsiderSignal(mask=base_universe)

    # Take the top 50 and bottom 50 signals.  
    longs = InsiderSignal.top(50)  
    shorts = InsiderSignal.bottom(50)  
    securities_to_trade = (shorts | longs)

    return Pipeline(  
        columns={  
            'Longs': longs,  
            'Shorts': shorts,  
            'InsiderSignal': InsiderSignal,  
        },  
        screen=(securities_to_trade),  
    )  

def before_trading_start(context, data):  
    # Get our database of signals every day.  
    pipe_results = pipeline_output('pipeline')  
    # read the net quantity of insider trades (all) from the fetched data  
    netQty = data.current(data.fetcher_assets, 'netQty')  
    # print out data (can access the data here, since we have access to data variable)  
    print netQty  
    print type(netQty)  
    print pipe_results.sort_values('InsiderSignal',ascending = False).head()  
    print pipe_results.sort_values('InsiderSignal',ascending = False).tail()  
    # Get a list of securities that we can trade and that we want to buy.  
    context.longs = []  
    for sec in pipe_results[pipe_results['Longs']].index.tolist():  
        if data.can_trade(sec):  
            context.longs.append(sec)

    # Get a list of securities that we can trade and that we want to short.  
    context.shorts = []  
    for sec in pipe_results[pipe_results['Shorts']].index.tolist():  
        if data.can_trade(sec):  
            context.shorts.append(sec)    

def rebalance(context, data):  
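    # Guard against empty lists so we never divide by zero.  
    long_weight = 0.5 / len(context.longs) if context.longs else 0.0  
    short_weight = -0.5 / len(context.shorts) if context.shorts else 0.0  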
    for security in context.portfolio.positions:  
        if security not in context.longs and security not in context.shorts and data.can_trade(security):  
            order_target_percent(security, 0)

    for security in context.longs:  
        order_target_percent(security, long_weight)

    for security in context.shorts:  
        order_target_percent(security, short_weight)  
    record(NoLongs = len(context.longs), NoShorts = len(context.shorts))  
3 responses

Hi Lawrence. Were you able to find a solution? I'm trying to do the same.

Ricardo,

I couldn't get the pipeline working with the custom factor code. But I did get it working by manipulating a pandas DataFrame by hand. I've attached my code for your perusal.

Some background: make_pipeline() pulls shares_outstanding and industry_code from Morningstar (I apply some custom logic to industry_code afterwards). I found this to be one of the easiest ways to build a table from pre-existing data.

before_trading_start() runs once each trading day before the market opens, and this is where the custom factor is calculated. The variable pipe_results holds the pipeline output. I append columns to that table manually (e.g. netQty, netQtyForCEO, etc.).

Hope this helps.

from quantopian.algorithm import attach_pipeline, pipeline_output  
from quantopian.pipeline import Pipeline  
from quantopian.pipeline.data.builtin import USEquityPricing  
from quantopian.pipeline.filters import Q500US, Q1500US  
from quantopian.pipeline import CustomFactor  
from quantopian.pipeline.data import morningstar, Fundamentals

import pandas as pd  
import numpy as np

def initialize(context):  
    # Rebalance at the beginning of the month, 1 hour after market open.  
    schedule_function(rebalance,date_rules.month_start(),time_rules.market_open(hours=1))  
    # get csv of insider holdings data (loaded from lh2456's Google Drive)  
    # actual file  
    fetch_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQI-J2Ewg_XH6DD9GfTPfO7QQsCmHBujrKUKbdzek6jLU5uWHGmCRPPftvz8l8sM9ukemgUS6SRXYeu/pub?output=csv',  
              date_column='date',  
              date_format='%m/%d/%y')  
    # test file (only 2018)  
    #fetch_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vTtSDhw-kKaY8oUU5D4zxVa3N_pHMuk8V19qMDcPe8OnXgR_r7CR53T47SmpqU3o7NeaBXplhumx66y/pub?output=csv',  
    #           date_column='date',  
    #           date_format='%m/%d/%y')  
    # test file (2017 - current)  
    #fetch_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vQfTZlYDpfrvaGG2tD5v383iWpHbjPc26m5beY4Ll2OEfgIBZhb1eq6yVomZEkNqsiCc23njLi_4_De/pub?output=csv',  
    #          date_column='date',  
    #          date_format='%m/%d/%y')

    # Create our dynamic stock selector.  
    attach_pipeline(make_pipeline(), 'pipeline')  
    # Trading cost module.  
    #set_commission(commission.PerShare(cost=0.0, min_trade_cost=0))  
    #set_slippage(slippage.VolumeShareSlippage(volume_limit=0.50, price_impact=0.0))    

def make_pipeline():  
    # Base universe set to the Q500US.  
    base_universe = Q500US()

    return Pipeline(  
        columns={  
            'shares_outstanding': Fundamentals.shares_outstanding.latest,  
            'industry_code': morningstar.asset_classification.morningstar_industry_code.latest  
        },screen=(base_universe),  
    )  

def before_trading_start(context, data):  
    # Get our database of signals every day.  
    pipe_results = pipeline_output('pipeline')  
    # keep only the following sectors (the sector code is the leading  
    # three digits of the Morningstar industry code):  
    # ENERGY = 309  
    # FINANCIAL_SERVICES = 103  
    industry_codes = (pipe_results['industry_code'] - (pipe_results['industry_code'] % 100000)) / 100000  
    energy = industry_codes == 309  
    financial = industry_codes == 103  
    # keep the rows that fall in either sector  
    pipe_results = pipe_results[energy | financial]  
    # read the net quantity of insider trades (all) from the fetched data  
    netQty = data.current(data.fetcher_assets, 'netQty')  
    netQtyForCEO = data.current(data.fetcher_assets, 'netQtyForCEO')  
    netQtyForCFO = data.current(data.fetcher_assets, 'netQtyForCFO')  
    netQtyForOfficers = data.current(data.fetcher_assets, 'netQtyForOfficers')  
    netQtyForDirectors = data.current(data.fetcher_assets, 'netQtyForDirectors')  
    # set the netQty column to be netQty, and others as well  
    pipe_results['netQty'] = netQty  
    pipe_results['netQtyForCEO'] = netQtyForCEO  
    pipe_results['netQtyForCFO'] = netQtyForCFO  
    pipe_results['netQtyForOfficers'] = netQtyForOfficers  
    pipe_results['netQtyForDirectors'] = netQtyForDirectors  
    # drop any rows where any entries are null or NaN  
    # N.B.: We do this only once (before we calculate normalizedNetQty) because we assume that shares_outstanding cannot be 0. (If it were, normalizedNetQty would be NaN or infinite. However, a stock with 0 shares outstanding isn't tradeable, so this is a reasonable assumption.)  
    pipe_results = pipe_results.dropna(how='any')  
    # add normalized columns.  
    shares_outstanding = pipe_results['shares_outstanding']  
    # netQty  
    normalizedNetQty = pipe_results['netQty']/shares_outstanding  
    pipe_results['normalizedNetQty'] = normalizedNetQty  
    # netQtyForCEO  
    normalizedNetQtyForCEO = pipe_results['netQtyForCEO']/shares_outstanding  
    pipe_results['normalizedNetQtyForCEO'] = normalizedNetQtyForCEO  
    # netQtyForCFO  
    normalizedNetQtyForCFO = pipe_results['netQtyForCFO']/shares_outstanding  
    pipe_results['normalizedNetQtyForCFO'] = normalizedNetQtyForCFO  
    # netQtyForOfficers  
    normalizedNetQtyForOfficers = pipe_results['netQtyForOfficers']/shares_outstanding  
    pipe_results['normalizedNetQtyForOfficers'] = normalizedNetQtyForOfficers  
    # netQtyForDirectors  
    normalizedNetQtyForDirectors = pipe_results['netQtyForDirectors']/shares_outstanding  
    pipe_results['normalizedNetQtyForDirectors'] = normalizedNetQtyForDirectors

    # calculate the top 50 and bottom 50  
    # TODO: This is where we can change which normalized column we want to sort by.  
    # To "activate it", add a # in front of the column you don't want to sort by, and remove the "#" in front of the  
    # column that you do. Only one column can be uncommented at a time.  
    #columnToSortBy = 'normalizedNetQty'  
    #columnToSortBy = 'normalizedNetQtyForCEO'  
    columnToSortBy = 'normalizedNetQtyForCFO'  
    #columnToSortBy = 'normalizedNetQtyForOfficers'  
    #columnToSortBy = 'normalizedNetQtyForDirectors'  
    top50 = pipe_results.nlargest(50, columnToSortBy)  
    bottom50 = pipe_results.nsmallest(50, columnToSortBy)  
    # debug info  
    # print "largest 50"  
    # print top50  
    # print "smallest 50"  
    # print bottom50  
    # Get a list of securities that we can trade and that we want to buy.  
    context.longs = []  
    for sec in top50.index.tolist():  
        if data.can_trade(sec):  
            context.longs.append(sec)

    # Get a list of securities that we can trade and that we want to short.  
    context.shorts = []  
    for sec in bottom50.index.tolist():  
        if data.can_trade(sec):  
            context.shorts.append(sec)  

def rebalance(context, data):  
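    # Guard against empty lists so we never divide by zero.  
    long_weight = 0.5 / len(context.longs) if context.longs else 0.0  
    short_weight = -0.5 / len(context.shorts) if context.shorts else 0.0  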
    for security in context.portfolio.positions:  
        if security not in context.longs and security not in context.shorts and data.can_trade(security):  
            order_target_percent(security, 0)

    for security in context.longs:  
        order_target_percent(security, long_weight)

    for security in context.shorts:  
        order_target_percent(security, short_weight)  
    record(NoLongs = len(context.longs), NoShorts = len(context.shorts))
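
The core of this workaround (attach a fetched column to the pipeline table, drop incomplete rows, normalize, then rank) can be tried offline with plain pandas. The following is only a minimal sketch: the function name rank_by_normalized_signal, the tickers and the numbers are hypothetical stand-ins for pipeline_output() and data.current(), not part of the algorithm above.

```python
import pandas as pd

def rank_by_normalized_signal(pipe_results, fetched, column, n):
    """Join a fetched Series onto a pipeline-style table and rank it.

    Returns (longs, shorts): the n largest and n smallest assets by
    fetched value divided by shares_outstanding.
    """
    table = pipe_results.copy()
    # Assignment aligns on the index: assets missing from `fetched`
    # get NaN, and assets missing from the table are ignored.
    table[column] = fetched
    # Drop incomplete rows and rows that would divide by zero.
    table = table.dropna(how='any')
    table = table[table['shares_outstanding'] != 0].copy()
    table['normalized'] = table[column] / table['shares_outstanding']
    longs = table.nlargest(n, 'normalized').index.tolist()
    shorts = table.nsmallest(n, 'normalized').index.tolist()
    return longs, shorts

# Hypothetical data standing in for the pipeline output and the CSV.
pipe_results = pd.DataFrame(
    {'shares_outstanding': [100.0, 200.0, 50.0, 400.0]},
    index=['AAA', 'BBB', 'CCC', 'DDD'])
fetched = pd.Series({'AAA': 10.0, 'BBB': -40.0, 'CCC': 1.0, 'EEE': 5.0})

longs, shorts = rank_by_normalized_signal(pipe_results, fetched, 'netQty', 1)
```

Here DDD is dropped because it has no fetched value and EEE is ignored because it is not in the pipeline table, which mirrors what dropna(how='any') does in before_trading_start() above.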


Lawrence, unfortunately I cannot do as you suggest.

I uploaded some data (VIX term structure slopes) that I need to regress against stock returns to form the factor. I haven't been able to do it in the Pipeline.

Thank you anyway for your fast reply.
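
For reference, the regression step itself is small once the slope series and the returns are aligned by date. The following is a rough sketch on synthetic NumPy arrays, assuming one slope value per day and a (days, assets) returns matrix; the name slope_betas and all numbers are hypothetical, and it does not address how to load the external series into Pipeline.

```python
import numpy as np

def slope_betas(returns, slope):
    """OLS beta of each asset's returns on a single regressor.

    returns: 2-D array with shape (days, assets)
    slope:   1-D array with shape (days,)
    """
    x = slope - slope.mean()
    y = returns - returns.mean(axis=0)
    # beta = cov(x, y) / var(x), computed for every column at once.
    return x.dot(y) / x.dot(x)

# Synthetic example: two assets with known sensitivities 2.0 and -0.5.
slope = np.array([0.1, -0.2, 0.3, 0.0, -0.1])
returns = np.column_stack([2.0 * slope + 0.01, -0.5 * slope])

betas = slope_betas(returns, slope)
```

The resulting betas array has one entry per asset, so it could be attached to a pipeline output table as an extra column in the same way as netQty in Lawrence's code.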