Need some help on custom filter

Hi, I just started studying stock behavior with Quantopian, and I have some questions I need help with.
The attached "Untitled1.ipynb" is my first custom filter, meant to select stocks whose price increased by 3% to 20%. (I know there is a built-in function for daily returns; I calculate it here just for practice.) The result doesn't seem correct. Here are my questions:

  1. I created columns to show today's and yesterday's prices. Yesterday's price is correct, but today's price is not (it appears to be the price from the day before yesterday).
  2. The price change range I want is 3% to 20%, but the table shows only negative price changes.

Is anything wrong with my code? I googled and studied for a couple of days, but I can't figure it out. Can anyone shed some light?

jason

8 responses

First off, welcome. Good that you're using the research environment (ie notebooks) to evaluate factors. It's much easier than the IDE backtesting environment for seeing what's going on.

The convention for passing input values into a custom factor is:

  • first value in the array (eg input[0]) is the earliest date
  • last value (eg input[-1]) will always be the latest date
  • the latest date is always the previous trading day before the pipeline run date

So, indeed yesterday's price is prices[-1]. However, prices[0] is the price 2 days ago (since the window_length=2). It's not today's price.
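To make the indexing convention concrete, here is a minimal sketch in plain Python. It does not use the Quantopian API; the price values and the variable names (closes, window) are made up for illustration, and the list only mimics the shape of what a custom factor receives when window_length=2.

```python
# Hypothetical closing prices for one asset, oldest first, one entry per trading day.
closes = [100.0, 103.0, 101.0, 106.0]

window_length = 2

# The window a custom factor would see on the morning after the last entry:
# the most recent `window_length` trading days, earliest first.
window = closes[-window_length:]

two_days_ago = window[0]    # earliest value in the window
yesterday = window[-1]      # latest value: the previous trading day

# The change from two days ago to yesterday. Today's price is not in the window.
price_change = (yesterday - two_days_ago) / two_days_ago
print(two_days_ago, yesterday, round(price_change, 4))
```

Note that the change is computed as (latest - earliest) / earliest. Swapping the two indices flips the sign of the result.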

It's not possible to get the current day's prices using pipeline. The latest data is always from the previous day. The paradigm is to run pipeline once each day before the markets open (ie in the before_trading_start method) to retrieve data which is then acted upon during the current day.

Thank you Dan, very helpful as always!

If one wanted to get today’s open price and the current high, low, and cumulative volume say 1 hour after the open, is there an easy and efficient way of doing this, other than having to collect this data every minute in handle_data (and thereby slow down the backtest quite a bit)?

Joakim, good question.

The current day's open, high, low, close, and cumulative volume can be fetched using data.history. The last values for these (eg high[-1]) are the day's values as of the minute the method was called. So what does this mean?

open: the day's open price, which stays constant throughout the day
high: the day's high so far
low: the day's low so far
close: the close price as of the last minute. This updates every minute and is the same as the data.current close price
volume: cumulative volume for the day so far

So, for example, to get the day's high and cumulative volume as of one hour after the open, one could do this:


import quantopian.algorithm as algo


def initialize(context):
    # Reference to the AAPL security.
    context.aapl = sid(24)

    # Get data every day, one hour after market open.
    algo.schedule_function(
        get_current_data,
        algo.date_rules.every_day(),
        algo.time_rules.market_open(hours=1, minutes=0)
    )


def get_current_data(context, data):
    fields = ['high', 'volume']

    # Fetch data using data.history.
    # bar_count=1 returns just the current day's bar.
    latest_data = data.history(context.aapl,
                               fields=fields,
                               bar_count=1,
                               frequency='1d')

It might be good to note that data.current fetches data only for the most recent minute bar. So, for example, the 'high' will be the high price during the last minute and not the high price for the day. By contrast, data.history with a '1d' frequency returns the current day's bar, updated as of the current minute.
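The difference between the last minute bar and the partial daily bar can be mimicked without the platform. The sketch below uses made-up minute bars and a hypothetical helper, daily_bar_so_far, to show how a day-so-far bar relates to the minute bars behind it. It is an illustration of the idea, not how Quantopian builds its bars internally.

```python
# Hypothetical minute bars for one asset: (open, high, low, close, volume).
minute_bars = [
    (10.0, 10.2, 9.9, 10.1, 500),
    (10.1, 10.6, 10.0, 10.5, 300),
    (10.5, 10.5, 10.3, 10.4, 200),
]


def daily_bar_so_far(bars):
    """Aggregate the minute bars seen so far into one partial daily bar."""
    return {
        "open": bars[0][0],                  # first minute's open, fixed all day
        "high": max(b[1] for b in bars),     # highest high so far
        "low": min(b[2] for b in bars),      # lowest low so far
        "close": bars[-1][3],                # latest minute's close
        "volume": sum(b[4] for b in bars),   # cumulative volume
    }


day = daily_bar_so_far(minute_bars)

# What a "current minute" lookup would report for 'high': the last minute only.
last_minute_high = minute_bars[-1][1]

print(day)
print(last_minute_high)
```

Here the day's high (10.6) differs from the last minute's high (10.5), which is the distinction between the two calls described above.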

Attached is an algo showing this behavior. Look at the logs. Not formatted real pretty but one can see the various fields change (or not as the case may be).

Great, thanks Dan!

I don't suppose one could use data.current and data.history from within Pipeline? I'm asking because one may want to create a Q contest algo that goes long and short the stocks (within the QTU) with the biggest (and possibly smoothest or choppiest) price movement in the first hour of trading, given a minimum dollar volume. How would one go about doing this?

Or one may want to look at the first half of yesterday's trading day and compare it to the second half (e.g. is it trending or mean reverting?). If today's first-half price movement is similar to yesterday's, is that predictive of today's second-half price movement? Getting and storing yesterday's intraday price movements might be difficult; I'm just brainstorming really. I'm guessing none of this is possible in Research?

Apologies to @Jason also, for somewhat hijacking his post. It's somewhat related I think, but I'm happy to start a new post if that's preferred.

Thanks, Dan. Really appreciate it!
That explains why the prices don't match. I finally figured out how the data is indexed. Is this information documented somewhere? If not, can I download the source code somewhere? I could work it out from the source.

Another question from my previous post is about the function "percentile_between". I expected to get all daily returns between 3% and 10%. Why does the result give me only negative returns?

Joakim
I don't mind at all. Actually, I learned a lot as Dan answers your questions.

@Jason, I did a cursory search for documentation on how the inputs are indexed and didn't immediately see anything, so the confusion isn't surprising. (I'll put in a request to update the documentation.) The zipline code which Quantopian is based upon is open source and can be found at https://github.com/quantopian/zipline and the documentation at http://www.zipline.io . I often do a google search like "site:zipline.io percentile_between" to find the actual source code and do just as you suggested to get the answer. You may need to do a bit of searching in the page, but that will be the guts of what is going on. In this case you will need to do another search for "PercentileFilter", which isn't in the documentation but is in the actual github code (https://github.com/quantopian/zipline/blob/master/zipline/pipeline/filters/filter.py ).

As for "why the result only give me negative return?": the percentile_between method doesn't return results with values between the given parameters. Rather, it returns results which fall within the given percentile rank, using the 'numpy.percentile' method. For example, "returns.percentile_between(0.0, 10.0)" will return stocks with returns in the bottom 10% and NOT stocks with returns between 0% and 10%. That said, I can understand the confusion, especially when using returns, which are themselves expressed as a percent.

So, the reason why price_change.percentile_between(3.00, 10.00) gives only negative results is that the returns ranked between the 3rd and 10th percentiles are probably all negative. Change it to price_change.percentile_between(3.00, 60.00) and one will see some positive returns in the results, as typically about 50% of the results will be positive and 50% negative.
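The distinction between a percentile-rank filter and a value-range filter can be seen with a small numpy sketch. The returns below are synthetic (evenly spaced from -10% to +10%), and the masks are built directly with numpy.percentile rather than with the Pipeline API, so treat this as an illustration of the idea only.

```python
import numpy as np

# Synthetic daily returns for 101 hypothetical stocks, from -10% to +10%.
returns = np.linspace(-0.10, 0.10, 101)

# Percentile-rank selection: keep returns that sit between the 3rd and
# 10th percentile of the sample (the analogue of percentile_between(3, 10)).
low_cut, high_cut = np.percentile(returns, [3.0, 10.0])
by_percentile = (returns >= low_cut) & (returns <= high_cut)

# Value-range selection: keep returns that are themselves between 3% and 20%.
by_value = (returns > 0.03) & (returns < 0.20)

print(returns[by_percentile].min(), returns[by_percentile].max())
print(returns[by_value].min(), returns[by_value].max())
```

The percentile mask selects only negative returns, while the value mask selects only returns above 3%. In Pipeline itself, comparing a factor against a number yields a filter, so a value-range filter could likely be written along the lines of (price_change > 0.03) & (price_change < 0.20), assuming price_change is expressed as a decimal fraction.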

@Joakim, the current day's OHLCV data isn't available within pipeline (sort of by definition). However, there's nothing stopping one from also using data.current and data.history. One could get the previous day's data using pipeline and then append the current day's data using one of those methods. (Don't ever store OHLCV data from one day to the next because it won't be split adjusted.) The contest doesn't require using only pipeline to fetch data. The workflow would be 1) get factors/data from pipeline 2) post-process those factors with current data 3) feed order_optimal_portfolio with the post-processed data.

This same workflow can be done inside a notebook (ie research) by adding to the pipeline results using get_pricing. Be VERY careful to align the dates properly. Pipeline dates are shifted off by one day. Also, Alphalens is really set up assuming trades are done ONLY with info captured before trading starts (ie no current day data) so it's a bit problematic to then use Alphalens to analyze results.
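As a rough illustration of the one-day offset, the pandas sketch below uses synthetic closes. The series named pricing_close stands in for what a pricing lookup would return (each row is that day's own close), and pipeline_close stands in for a pipeline column (each row holds the previous trading day's close). Both are built by hand here; no Quantopian calls are made, and the names are hypothetical.

```python
import pandas as pd

days = pd.bdate_range("2019-01-02", periods=4)

# Stand-in for pricing data: the row for day D holds day D's close.
pricing_close = pd.Series([10.0, 11.0, 12.0, 13.0], index=days)

# Stand-in for a pipeline column: the row for day D holds the close
# from the previous trading day, so it lags pricing data by one row.
pipeline_close = pricing_close.shift(1)

# Side by side, each row pairs "what pipeline knew before the open"
# with "what actually happened that day".
frame = pd.DataFrame({
    "yesterday_close": pipeline_close,
    "today_close": pricing_close,
})
frame["today_return"] = frame["today_close"] / frame["yesterday_close"] - 1.0

print(frame)
```

The first row has no pipeline value because there is no earlier day in the sample. Joining on the date index without accounting for this lag is the kind of misalignment to watch for.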

Hope that helps.

Dan
Thanks a lot. That's very helpful.

jason