One can easily add current data to the dataframe returned by pipeline. Since the pipeline output is a pandas dataframe and the data.current method returns a pandas series when given a list of assets and a single field, they play nicely together. Something like this should work in most cases:
# First get our data as of market open from pipeline
stock_data = pipeline_output('my_pipe')
# Next, get current prices for all the securities in the pipeline and add as a new column
stock_data['current_price'] = data.current(stock_data.index, 'price')
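The reason the assignment above lines up correctly is that pandas matches on index labels rather than on position. Here is a minimal sketch of that behavior outside of an algo. The ticker strings and prices are made up, and the two objects are only stand-ins for what pipeline_output and data.current would return (in a real algo the index holds asset objects, not strings).

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('my_pipe'): one row per asset.
stock_data = pd.DataFrame(
    {"yesterday_close": [10.0, 20.0, 30.0]},
    index=["AAA", "BBB", "CCC"],
)

# Hypothetical stand-in for data.current(stock_data.index, 'price'):
# a series keyed by the same assets, deliberately in a different order.
current_prices = pd.Series({"CCC": 31.0, "AAA": 9.5, "BBB": 20.0})

# Column assignment aligns on the index labels, not on row position.
stock_data["current_price"] = current_prices

print(stock_data)
```

Each price ends up on the row of its own asset even though the series was built in a different order.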
Once all the data is neatly in a single dataframe (both previous data fetched using pipeline and current data fetched using data.current), one can perform whatever logic one wishes. I like using the query method along with either nlargest or nsmallest to first select the pool of potential securities and then to take a set number of the 'best' to actually trade. There are a lot of ways to do the logic, but here is one approach:
# Create whatever logic to choose longs and shorts
# Note that anything currently held but NOT in either of these will be closed
longs = (
    stock_data
    .query('(current_price < yesterday_close) & (rsi < 30) & (yesterday_close < sma_10)')
    .nsmallest(context.TARGET_LONGS, 'current_to_sma25_price')
    .index.tolist()
)
shorts = (
    stock_data
    .query('(current_price > yesterday_close) & (rsi > 70) & (yesterday_close > sma_10)')
    .nlargest(context.TARGET_SHORTS, 'current_to_sma25_price')
    .index.tolist()
)
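The selection logic above can be tried on plain pandas without running a backtest. The sketch below uses made-up tickers and numbers, and TARGET_LONGS, TARGET_SHORTS and the currently_held list are hypothetical stand-ins for context.TARGET_LONGS, context.TARGET_SHORTS and the portfolio's open positions. It also shows one possible way to work out which holdings would be closed.

```python
import pandas as pd

# Made-up data shaped like the combined pipeline + current-price dataframe.
stock_data = pd.DataFrame(
    {
        "current_price":          [9.0, 19.0, 29.0, 41.0, 52.0],
        "yesterday_close":        [10.0, 20.0, 30.0, 40.0, 50.0],
        "rsi":                    [25.0, 28.0, 45.0, 75.0, 80.0],
        "sma_10":                 [11.0, 21.0, 31.0, 39.0, 49.0],
        "current_to_sma25_price": [0.90, 0.95, 0.97, 1.05, 1.10],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

TARGET_LONGS = 1   # hypothetical stand-in for context.TARGET_LONGS
TARGET_SHORTS = 1  # hypothetical stand-in for context.TARGET_SHORTS

# Filter to the candidate pool first, then keep the most extreme names.
longs = (
    stock_data
    .query("(current_price < yesterday_close) & (rsi < 30) & (yesterday_close < sma_10)")
    .nsmallest(TARGET_LONGS, "current_to_sma25_price")
    .index.tolist()
)
shorts = (
    stock_data
    .query("(current_price > yesterday_close) & (rsi > 70) & (yesterday_close > sma_10)")
    .nlargest(TARGET_SHORTS, "current_to_sma25_price")
    .index.tolist()
)

# Anything held but in neither list is a candidate to close.
currently_held = ["BBB", "EEE"]  # hypothetical open positions
to_close = [a for a in currently_held if a not in longs and a not in shorts]

print(longs, shorts, to_close)
```

Here AAA and BBB both pass the long filter, but only AAA is kept because it has the smallest ratio. BBB is held yet no longer selected, so it lands in to_close.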
A couple of things. First, if one uses current data then at least some of the trading logic needs to be outside of the pipeline definition. Pipeline definitions only apply to pipeline data and not current data. Moreover, one cannot really analyze such a 'factor' (or strategy) using Alphalens. Alphalens is premised on the assumption that all trades would be decided on before the market opens and then traded at market open. Using the current day's data doesn't fit this model. This approach is fine for backtesting, just not for Alphalens.
So, "is it possible? will the dates from get_pricing() and from pipeline match?". Yes, in an algo, the dataframe returned by pipeline will hold all the data one knows just before the market opens. One can add more recent data. This is also true in a notebook as long as one interprets that data as yesterday's data. It's often helpful to explicitly name pipeline prices as yesterday_close or previous_close. As a sidenote, get_pricing only works in notebooks. The similar method to use in an algo is data.history or data.current.
Also, "can the function handle_data() take assets from my pipeline?". Yes, the pipeline output dataframe is indexed with the assets. One can get a list of those assets as follows:
# Get our data from pipeline
stock_data = pipeline_output('my_pipe')
# Get the index and turn into a list. These will be the pipeline assets.
my_assets = stock_data.index.tolist()
Most methods will also work directly with the asset index so there may not be a need to always turn it into a list.
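As a rough illustration of working with the index directly, here is a small pandas-only sketch. The tickers are made-up strings (in an algo the index holds asset objects) and the held list is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pipeline_output('my_pipe').
stock_data = pd.DataFrame(
    {"rsi": [25.0, 55.0, 80.0]},
    index=["AAA", "BBB", "CCC"],
)

# The explicit list, as in the snippet above.
my_assets = stock_data.index.tolist()

# The index supports membership tests and iteration without conversion.
is_in_pipe = "BBB" in stock_data.index
labels_seen = [asset for asset in stock_data.index]

# It also supports set-like operations, e.g. holdings that are in the pipeline.
held = ["AAA", "ZZZ"]  # hypothetical open positions
held_in_pipe = stock_data.index.intersection(held)

print(my_assets, is_in_pipe, list(held_in_pipe))
```

A list is mostly useful when something specifically requires one. Otherwise the index can usually be passed along as is.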
The attached backtest shows this approach in action. It could work as a "skeleton" for a working algo. Good luck!