Counting the contents of factors within a pipeline

posted Oct 26, 2018

Hello there

In my notebook I'm producing factors within a pipeline.

Is it possible for me to add a column to the output so that I can see the number of "longs" and "shorts" signals for each day?

    # Create pipeline  
    return Pipeline(  
        columns={  
            'number of shorts': len(shorts),  
            'number of longs': len(longs),  
        },  
        screen=long_short_screen  
    )  
#    return pipe


    result = run_pipeline(make_pipeline(), '2018-10-26', '2018-10-26')
    print len(result)

    result.head(len(result))

When I try the above code I get the following error:

TypeError: object of type 'NumExprFilter' has no len()

Thanks in advance :-)

4 responses

Ben

Oct 27, 2018

Could this be achieved using a custom factor and removing the parameter axis=0? I need some guidance, please! (I would normally write an SQL query using a "group by", so I'm sure it can be done.)

Ben

Oct 27, 2018

From this interesting thread:
https://www.quantopian.com/posts/analyzing-pipeline-data-from-research-notebook-how-to-get-data-using-equity-object

I was able to get the following code:

    result_today = datetime.strptime('2018-09-25', '%Y-%m-%d')
    result.ix[result_today].count()

The only problem is that it counts the union of the longs and shorts (it does not differentiate between True and False). Can somebody suggest how I can adjust the above code to count only the rows where "longs" or "shorts" is True?

Here is an attempt to count the "True":

print result_today.at[longs, 'True'].count()

I get an error message saying that "longs" is not recognised, even though the column is called "longs".

Hmmm...
@Dan Whitnable

Dan Whitnable

Oct 28, 2018

First off, it's probably clearer to do any overall pipeline calculations (e.g. counting the number of longs over the entire pipeline) once the pipeline dataframe is returned. I say this because the pipeline dataframe is structured as securities with values for each security. If one wanted to return the total longs (for example) inside the dataframe, then one would have a column with this same total value repeated for each security. That's not wrong, but it's not the original intention of the dataframe.

So, in both notebooks and the IDE, it's cleanest to do any overall pipeline calculations after the pipeline is returned. This is where you were heading in the above post.

Now, assuming we have a dataframe (called 'result') returned by the pipeline and it has columns called 'longs' and 'shorts', here is one simple method to get the totals by day:

totals_by_day = result.groupby(level=0).sum()

See the explanation in the attached notebook. There are also some other methods shown which one may have a preference for.
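
For readers without access to the notebook, here is a minimal sketch of the groupby approach in plain pandas. It assumes 'result' is indexed by (date, security) and that 'longs' and 'shorts' are boolean columns; the small dataframe and the ticker names below are made up purely for illustration and are not pipeline output.

```python
import pandas as pd

# Hypothetical stand-in for the pipeline output: a DataFrame indexed by
# (date, security) with boolean 'longs' and 'shorts' columns.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2018-10-25', '2018-10-26']), ['AAA', 'BBB', 'CCC']],
    names=['date', 'security'],
)
result = pd.DataFrame(
    {
        'longs': [True, False, False, True, True, False],
        'shorts': [False, True, False, False, False, True],
    },
    index=index,
)

# Cast the booleans to integers so that summing counts the True values,
# then group on the date level (level 0) to get one row of totals per day.
totals_by_day = result[['longs', 'shorts']].astype(int).groupby(level=0).sum()

# Counts for a single day: take that date's cross-section, then sum.
day = pd.Timestamp('2018-10-26')
one_day = result.xs(day, level='date')
longs_today = int(one_day['longs'].sum())
shorts_today = int(one_day['shorts'].sum())
```

Summing works here because True counts as 1 and False as 0, which is what count() was missing in the earlier attempt: count() tallies every non-null row regardless of its value.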

Hope that answers your question. Good luck.

[oops. Just noticed I forgot to clean up the notebook. There are a few extra cells in the middle. Start at the cell with the comment "Before doing any counting..."]

Ben

Oct 28, 2018

That's exactly what I need - thank you very much!
