I'm obviously doing something wrong here (or expecting the wrong thing), but I can't see what it is (and yes, I've done the searching!).
The code below (condensed into a single cell for brevity) creates a pipeline which filters to the bottom 10 by AverageDollarVolume. I call run_pipeline() with a date range that spans 17 calendar days, 11 trading days. The resulting dataframe has 11 values in index.levels[0], as expected. Each day has 10 values in index.levels[1], as expected. results.info() reports that the multi index has 110 entries, as expected.
I want to extract the list of assets, so I use results.index.levels[1].unique(), which is a technique used in the documentation. However, this returns an array of 8930 assets (in my case), and this number is the same however wide or narrow I make the filter. I expected it to be the list of assets referenced in the index, so between 10 and a maximum of 110 (but probably more like 15) for the run shown here.
On one hand this feels like a pandas problem, because run_pipeline() returns a pandas DataFrame and I am then only using pandas methods on it. On the other hand it feels like a Quantopian problem, because I have never seen this behaviour in a DataFrame produced by any other means. Help!
def make_pipeline(filterWidth=10):
    # Dollar volume factor
    dollar_volume = AverageDollarVolume(inputs=[USEquityPricing.close, USEquityPricing.volume],
                                        window_length=30)
    # 10-day close price average
    mean_10 = SimpleMovingAverage(inputs=[USEquityPricing.close], window_length=10)
    filter_dollar_volume = dollar_volume.bottom(filterWidth)
    return Pipeline(
        columns={
            'meanclose': mean_10,
            'dolvol': dollar_volume
        },
        screen=filter_dollar_volume
    )

filterWidth = 10
results = run_pipeline(make_pipeline(filterWidth), '2020-05-15', '2020-06-01')
dateCount = len(results.index.levels[0])
asset_list = list(results.index.levels[1].unique())
print('''Number of dates in index.levels[0]: {0}
Number of rows in dataframe: {1}
Product of dates and filter width: {2}
Length of results.index.levels[1].unique(): {3}
'''.format(dateCount,
           len(results),
           dateCount * filterWidth,
           len(asset_list)))
print('First 5 of asset list:\n', asset_list[:5])
print('')
results.info()
If you want to run it, my imports are these:
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import SimpleMovingAverage, AverageDollarVolume
import pandas as pd
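For anyone without access to the research environment, here is a pandas-only sketch that shows the same symptom. It rests on an assumption that is not confirmed for run_pipeline(): that the screened result is a subset of a larger frame and keeps that frame's level values. The dates, tickers and the single 'dolvol' column are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data: 3 dates x 5 made-up tickers (not Quantopian assets)
dates = pd.to_datetime(['2020-05-15', '2020-05-18', '2020-05-19'])
tickers = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']
index = pd.MultiIndex.from_product([dates, tickers], names=['date', 'asset'])
full = pd.DataFrame({'dolvol': range(len(index))}, index=index)

# Stand-in for a screen: keep only the rows for two of the tickers
mask = full.index.get_level_values('asset').isin(['AAA', 'BBB'])
screened = full[mask]

# .levels still holds every value the original index was built with
levels_count = len(screened.index.levels[1])

# get_level_values() reads the values actually present in the rows
used_assets = screened.index.get_level_values(1).unique()

# remove_unused_levels() (pandas 0.20+, which may be newer than the
# research environment's pandas) rebuilds the index with only used values
trimmed = screened.index.remove_unused_levels().levels[1]

print(len(screened), levels_count, list(used_assets), list(trimmed))
```

If the pipeline result behaves the same way, results.index.get_level_values(1).unique() would be the list to look at rather than results.index.levels[1], but I have not verified that against the real output.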
Thanks in anticipation...