Getting a random subset of a pipeline output in Research?

I'm new to Quantopian, Python, and Pandas, and have been having some trouble with this. I want to get a random subset of my pipeline output in the Research environment so I can spot check my algorithm manually just to make sure it's doing what I think it is. I'm looking for something like:

for x in range(10)  
print pipeline_output[ random.randint( 1, pipeline_output.shape[0] ) ]  

but that's obviously not working.

2 responses

One could create a custom filter to return a random sampling of assets. However, if you are working in the research environment, I would take the sample AFTER the pipeline is run. A pipeline returns a pandas DataFrame, and luckily pandas has a 'sample' method for taking random samples of data. See
https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.sample.html. The docs describe several ways to take a sample (e.g. a fixed number of rows or a fixed fraction of the data).
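As a rough sketch of those two options, here is 'sample' called with 'n' (fixed number of rows) and with 'frac' (fixed fraction). The 'results' frame and its 'close' and 'volume' columns are made up for illustration and are not real pipeline output; 'random_state' is only there to make the draw repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a pipeline output: 100 rows, two made-up columns.
results = pd.DataFrame({
    "close": np.linspace(10.0, 109.0, 100),
    "volume": np.arange(100) * 1000,
})

# Fixed number of rows: 10 rows drawn without replacement.
fixed_count = results.sample(n=10, random_state=42)

# Fixed fraction of rows: 5% of 100 rows.
fixed_fraction = results.sample(frac=0.05, random_state=42)

print(fixed_count)
print(len(fixed_fraction))
```

Dropping 'random_state' gives a different sample on each run, which is probably what you want for spot checks.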

There are probably other ways, but here is one, assuming that 'results' is the multi-indexed dataframe output of the pipeline.


# Use the 'sample' method on level 1 (i.e. the securities) to get a random sample of securities
random_securities = results.index.get_level_values(1).to_series().sample(n=10).values

# Then use the 'query' method to keep only rows whose level 1 index is in this sample
# ('ilevel_1' is how 'query' refers to an unnamed index level 1)
results.query('ilevel_1 in @random_securities')

See the last couple of cells in the attached notebook for this in action.
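If you don't have the notebook handy, here is a small self-contained sketch of the same idea on synthetic data. The dates, the 'SYMxx' labels, and the 'factor' column are all invented for illustration. It also differs from the snippet above in two ways: it de-duplicates the securities before sampling (each one appears once per date in a multi-day result), and it filters with a boolean 'isin' mask instead of 'query'.

```python
import numpy as np
import pandas as pd

# Hypothetical pipeline-like output: 3 dates x 20 made-up securities,
# indexed by (date, security) like a real pipeline result.
dates = pd.date_range("2020-01-01", periods=3)
symbols = ["SYM%02d" % i for i in range(20)]
index = pd.MultiIndex.from_product([dates, symbols])
results = pd.DataFrame({"factor": np.arange(len(index), dtype=float)}, index=index)

# Unique securities from level 1, so each can be drawn only once.
securities = results.index.get_level_values(1).unique()

# Randomly pick 5 of them.
random_securities = securities.to_series().sample(n=5, random_state=0).values

# Keep every row (all dates) belonging to the sampled securities.
mask = results.index.get_level_values(1).isin(random_securities)
subset = results[mask]

print(subset)
```

With 3 dates and 5 sampled securities, 'subset' has 15 rows, one per date for each security.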

Thanks! That worked perfectly! Seems heckin' complex though, to do thru 5 different methods/properties/whatever they're called, but if it works, it works.