my NB on fundamental factors

Hello,
I made this notebook (NB) to investigate the relationship between fundamental factors and returns.
In the first part I identify the 50 best and the 50 worst stocks by percent returns (thanks to Jamie McCorriston for the help).

In the second part I compare the fundamental factors of the stocks with the best returns against those of the stocks with the worst returns to see the difference.

Normally it turns out that stocks with better fundamentals have positive returns and vice versa, BUT in the 2008 recession things are different, as you can see in the attached example.

Besides that, I would appreciate some help in making my comparison more statistically robust, rather than just looking at the values in the last two cells to see how much they differ. Also, the values are just the mean of all the best stocks vs. the mean of all the worst stocks, so the comparison does take into account all the spikes that fundamental values sometimes have.
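One possible way to make the best-vs-worst comparison less sensitive to spikes is to compare medians instead of means and to attach a permutation p-value to the difference. The sketch below is only an illustration under stated assumptions: the function name `median_diff_permutation_test` and the ROE values are hypothetical and do not come from the notebook.

```python
import random
import statistics

def median_diff_permutation_test(best, worst, n_permutations=5000, seed=0):
    """Two-sided permutation test for the difference of medians.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = statistics.median(best) - statistics.median(worst)
    pooled = list(best) + list(worst)
    n_best = len(best)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = statistics.median(pooled[:n_best]) - statistics.median(pooled[n_best:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical ROE values; note the spike (9.50) in the "worst" group.
best_roe = [0.18, 0.22, 0.15, 0.30, 0.25, 0.19, 0.21, 0.27]
worst_roe = [0.05, 0.02, 0.08, 9.50, 0.04, 0.07, 0.01, 0.06]

observed, p_value = median_diff_permutation_test(best_roe, worst_roe)
print("median difference:", observed)
print("mean difference:", statistics.mean(best_roe) - statistics.mean(worst_roe))
print("permutation p-value:", p_value)
```

With these made-up numbers the single spike flips the sign of the mean difference, while the median difference is unaffected by it.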

Edit: in the notebook, uncomment the first line of In [16]:

fund_factors = unstacked.drop('returns', axis=1, level=0)
15 responses

Update with the final outcome for the significant factors.
P.S. Of course, all this is based on the assumption that the market, in the mid to long term, moves for fundamental reasons.

Update with the forward test:

Thank you Giuseppe. I was planning to create a NB like this, so you spared me some time. Just a few comments:

1 - Forward test: shouldn't the returns be calculated as:

ptf_long = long_df.pct_change()  
ptf_long[0:1] = 0  
ptf_long = (ptf_long + 1).cumprod().mean(axis=1)  
ptf_long = ptf_long - 1  

Instead of:

ptf_long = long_df.pct_change().cumsum().sum(axis=1)  

2 - Lots of CustomFactor subclasses: you might save lines of code with Latest or SimpleMovingAverage (in zipline I can now see a Returns factor too):

from quantopian.pipeline.factors import SimpleMovingAverage  
from quantopian.pipeline.factors import Latest  
from quantopian.pipeline.factors import Returns

factors = [
    Latest([morningstar.operation_ratios.revenue_growth]),
    Latest([morningstar.cash_flow_statement.operating_cash_flow]),
    Latest([morningstar.operation_ratios.gross_margin]),
    ...
]
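On point 1 above, a small worked example may help show why the two calculations differ. This is only a sketch on made-up prices (the tickers `AAA` and `BBB` are hypothetical): summing daily percent changes over time and across stocks is not the same as compounding each stock and averaging, which corresponds to an equal-weight buy-and-hold portfolio.

```python
import pandas as pd

# Hypothetical prices for two stocks over three days.
long_df = pd.DataFrame({"AAA": [100.0, 50.0, 100.0],
                        "BBB": [100.0, 110.0, 121.0]})

daily = long_df.pct_change()
daily.iloc[0] = 0.0  # no return on the first day

# Original approach: sum daily returns over time, then sum across stocks.
summed = daily.cumsum().sum(axis=1)

# Suggested approach: compound each stock, then average across stocks.
compounded = (daily + 1).cumprod().mean(axis=1) - 1

print(summed.iloc[-1])      # approx. 0.70
print(compounded.iloc[-1])  # approx. 0.105
```

Here 200 invested equally in the two stocks ends at 221, a gain of 10.5%, which matches the compounded figure and not the summed one.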

Thanks!

Edit: forward test calculation

Pipeline initialization can be reduced to:


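from quantopian.pipeline import CustomFactor, Pipeline
from quantopian.pipeline.data import morningstar
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import Latest

# Returns is copied from https://github.com/quantopian/zipline/blob/master/zipline/pipeline/factors/technical.py
# it hasn't been deployed to Quantopian yet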
class Returns(CustomFactor):  
    """  
    Calculates the percent change in close price over the given window_length.  
    **Default Inputs**: [USEquityPricing.close]  
    """  
    inputs = [USEquityPricing.close]

    def compute(self, today, assets, out, close):  
        out[:] = (close[-1] - close[0]) / close[0]

pipeline_columns = {
    'returns': Returns(window_length=2),
    'roa': Latest([morningstar.operation_ratios.roa]),
    'roe': Latest([morningstar.operation_ratios.roe]),
    'gross_margin': Latest([morningstar.operation_ratios.gross_margin]),
    'roic': Latest([morningstar.operation_ratios.roic]),
    'pe_ratio': Latest([morningstar.valuation_ratios.pe_ratio]),
    'pb_ratio': Latest([morningstar.valuation_ratios.pb_ratio]),
    'cash_flow': Latest([morningstar.cash_flow_statement.operating_cash_flow]),
    'growth': Latest([morningstar.operation_ratios.revenue_growth]),
    'ebitda': Latest([morningstar.operation_ratios.ebitda_margin]),
    'assets_turnover': Latest([morningstar.operation_ratios.assets_turnover]),
}
pipe = Pipeline(columns=pipeline_columns)

It might be useful so that we can easily add other fundamentals.

ciao Luca :)

Actually, point 1) is a gross mistake in my notebook; thank you for spotting it!

This new way of defining the pipeline is much cleaner and more readable, thanks!
Since I am no expert on fundamentals, I usually search example code to spot the most frequently used ones. However, I have always wanted code with a complete list! So maybe in the future, when this code is more robust, I will do a complete screening with it.

One of my next steps is to weight each stock using the 'normalized' dataframe, so that stocks with a higher rank value get more weight.
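A minimal sketch of rank-based weighting might look like the following. It assumes one score per stock for a single date; the function name `rank_weights` and the scores are hypothetical, and the contents of the notebook's 'normalized' dataframe are not known here.

```python
import pandas as pd

def rank_weights(scores):
    """Turn a Series of factor scores into long-only weights that sum to 1.

    Higher scores get higher ranks and therefore larger weights.
    """
    ranks = scores.rank()  # 1 = lowest score
    return ranks / ranks.sum()

# Hypothetical normalized scores for four stocks on one date.
scores = pd.Series({"AAA": 0.2, "BBB": 1.5, "CCC": -0.4, "DDD": 0.9})
weights = rank_weights(scores)
print(weights)
```

Using ranks rather than raw scores keeps a single extreme score from dominating the portfolio.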

Thank you!
-Giuseppe

Thanks to you, Giuseppe. I am still working on your NB, and if I find answers to your original questions I'll let you know. In the meantime, thank you; your NB is very useful to me.

No big changes here, just a few fixes and small improvements.

Thank you Luca, your work has saved me a lot of time. My goal now is to blend this logic with sector-based stock picking like the one in the attached notebook (early stage of development and not so efficient for now).

What exactly are you trying to achieve with the new NB?

By the way, I extended the functionality of your original NB. The main difference is that in your NB only one set of best/worst stocks is calculated: the stocks that perform best/worst over the whole pipeline timeframe. So any statistics derived from this set of best/worst stocks are not generic. Suppose you run the NB from 2013-1-1 to 2014-1-1: the statistics you derive apply only to trades opened ON 2013-1-1 and closed ON 2014-1-1.

My idea is that we have to select the best/worst stocks every day, looking at future stock performance. So if we run the pipeline over one year, we have 252 sets of good/bad performers, each set gathered from a different trading day of the pipeline timeframe. This should make any calculated statistic more interesting. Suppose again you run the NB from 2013-1-1 to 2014-1-1: the NB will select the best/worst performers of 2013-1-1, then the best/worst of 2013-1-2, then 2013-1-3, and so on for every day in the pipeline.
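The per-day selection described above could be sketched with plain pandas as follows. This is not the code from the shared NB; the function name `daily_best_worst`, the tickers, and the prices are assumptions made for illustration.

```python
import pandas as pd

def daily_best_worst(prices, horizon=1, n=2):
    """For each date, pick the n best and n worst assets by forward return.

    prices: DataFrame indexed by date, one column per asset.
    Returns a dict mapping date -> (best_assets, worst_assets).
    """
    # Return from each date to `horizon` rows later; the last rows are NaN.
    forward = prices.pct_change(horizon).shift(-horizon)
    picks = {}
    for date, row in forward.dropna(how="all").iterrows():
        row = row.dropna()
        picks[date] = (list(row.nlargest(n).index), list(row.nsmallest(n).index))
    return picks

# Hypothetical prices for four stocks over three days.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [20.0, 19.0, 19.0],
     "CCC": [5.0, 5.0, 6.0], "DDD": [8.0, 8.4, 8.0]},
    index=pd.date_range("2013-01-01", periods=3),
)
picks = daily_best_worst(prices, horizon=1, n=1)
for date, (best, worst) in picks.items():
    print(date.date(), "best:", best, "worst:", worst)
```

Because the selection looks at future returns, it is only usable for research of this kind, not as a trading signal.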

Sorry if that's not clear, have a look at the NB and let me know.

Note: the NB is "empty", that is, it has no graphs and no data. You have to run it yourself, as Quantopian has a limit (timeout or amount of bytes) when sharing NBs that doesn't allow me to upload a NB with data/graphs.

Hello Luca, I have cloned your notebook and I need some time to study it. However, my notebook came from the fact that I think sectors cannot be used just like any other fundamental factor. I don't think the question is whether being in a sector is significant or not: I think it is significant, but it cannot be treated just like any other factor. By the way, I'd also like to have sectors as a filter for this kind of fundamental factor study.

So maybe they can be used (naively, for now) to see, for example, how the best performing sectors are doing in the various market phases. Just by looking at the graph you can tell that an equity line of 1 dollar invested in the best monthly performing sector probably has a very high beta to the market.

However, after this sector-study digression I am now back to the original point, which is finding which factors are good signals.

I have updated the notebook (by rewriting it completely) following your suggestion of doing the calculation every day, but I think it will be useful to add another aspect: sectors.
By grouping stocks by sector we can give factors a more meaningful value because we compare "apples with apples". For example, it is not very meaningful to compare price to book between different sectors, but it is very meaningful inside the same sector.
So here is a first step in this direction: in the graph you can see, for example, that the market weights the pe_ratio very differently in sector 103 and in sector 101 (of course this is just a first step).
Note that the classes are modified too: performance is calculated on a 25-day basis, and fundamental factors are picked on the first (the oldest) of these 25 days.
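One simple way to compare "apples with apples" is to standardize each factor inside its own sector before comparing stocks. The sketch below assumes a single-date snapshot with one row per stock; the column names `sector`, `pb_ratio`, and `pb_z` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical snapshot: one row per stock, with a sector code and a P/B ratio.
snapshot = pd.DataFrame(
    {"sector": [101, 101, 101, 103, 103, 103],
     "pb_ratio": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]},
    index=["A", "B", "C", "D", "E", "F"],
)

grouped = snapshot.groupby("sector")["pb_ratio"]
# z-score of each stock relative to its own sector's mean and standard deviation.
snapshot["pb_z"] = (snapshot["pb_ratio"] - grouped.transform("mean")) / grouped.transform("std")
print(snapshot)
```

After this step, a stock with a P/B of 3 in sector 101 and one with a P/B of 30 in sector 103 get the same score, since each is equally far above its own sector's average.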

The next step is to choose significant factors by picking the ones with the highest correlation between the FACTOR best/worst difference and the SECTOR best/worst returns difference.
This last step will be done inside each sector, because that is where fundamental factors should be compared.
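The correlation step described above might be sketched like this for a single sector. The function name `spread_correlations`, the factor names, and the numbers are hypothetical; each row stands for one measurement period.

```python
import pandas as pd

def spread_correlations(factor_spreads, return_spread):
    """Correlate each factor's best-minus-worst spread with the return spread.

    factor_spreads: DataFrame, one column per factor, indexed by period.
    return_spread: Series on the same index.
    Returns Pearson correlations, ordered by absolute value (largest first).
    """
    corr = factor_spreads.corrwith(return_spread)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Hypothetical best-minus-worst spreads over four periods in one sector.
return_spread = pd.Series([0.1, 0.2, 0.3, 0.4])
factor_spreads = pd.DataFrame({"pe_ratio": [2.0, 1.0, 4.0, 3.0],
                               "roe": [1.0, 2.0, 3.0, 4.0]})

corr = spread_correlations(factor_spreads, return_spread)
print(corr)
```

With only a handful of periods a high correlation can easily appear by chance, so the ranking is best treated as a screening aid.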

update:
1) what are the best / worst performing stocks inside each sector.
2) what are the values of the fundamental factors of these best/worst stocks
3) do we get some information from the difference between fundamental values of best/worst performing stocks? for example:

  • is the difference a constant value?
  • is it above or below zero most of the time?
  • is it mean reverting?

By looking at the graphs, I would say that the few factors I have considered are not significant when it comes to separating the best performing stocks from the worst performing ones, because their difference varies over time in a mean-reverting way and, at first look, is not even correlated with price variations.
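The three questions in the list above can be given rough numeric answers rather than read off the graphs. The sketch below is a crude diagnostic, not a formal stationarity test; the function name `describe_spread` and the toy series are assumptions. A clearly negative lag-1 autocorrelation of the changes is consistent with mean reversion.

```python
import pandas as pd

def describe_spread(spread):
    """Summarize a best-minus-worst factor difference series."""
    changes = spread.diff().dropna()
    return {
        "mean": spread.mean(),                     # is it centered away from zero?
        "std": spread.std(),                       # is it roughly constant?
        "share_above_zero": (spread > 0).mean(),   # above zero most of the time?
        "lag1_autocorr_of_changes": changes.autocorr(lag=1),
    }

# Hypothetical spread that flips sign every period.
spread = pd.Series([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
stats = describe_spread(spread)
print(stats)
```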

Hello Giuseppe and Luca,

I am not sure if you both are still working on this project, but Jamie McCorriston pointed me to this discussion after I emailed him regarding being able to pull fundamentals data for all 600+ fundamentals. I have been trying to attack the very same problem you and Luca have been working on.

My idea was to try to look at, ideally, every fundamentals metric and then iteratively check for correlation between daily returns and a clustering of normalized fundamentals values (maybe from comparing the distribution profile of poor and good daily returns for a single fundamentals metric).
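A scan of this kind could start from a rank correlation between each fundamental and forward returns, computed across stocks on one date. The sketch below ranks both sides and takes the Pearson correlation of the ranks (a Spearman-style measure), so extreme fundamental values have limited influence. The function name `rank_correlations` and the data are hypothetical.

```python
import pandas as pd

def rank_correlations(factors, forward_returns):
    """Rank correlation of each factor column with forward returns.

    factors: DataFrame, one row per stock, one column per fundamental.
    forward_returns: Series indexed by the same stocks.
    """
    ranked_returns = forward_returns.rank()
    return factors.rank().corrwith(ranked_returns)

# Hypothetical cross-section of five stocks on one date.
stocks = ["A", "B", "C", "D", "E"]
forward_returns = pd.Series([0.01, 0.03, -0.02, 0.05, 0.00], index=stocks)
factors = pd.DataFrame(
    {"roe": [0.10, 0.20, 0.05, 5.00, 0.08],
     "pe_ratio": [30.0, 12.0, 25.0, 8.0, 40.0]},
    index=stocks,
)

ic = rank_correlations(factors, forward_returns)
print(ic)
```

Repeating this for every date and averaging the results per factor gives one number per fundamental that can be compared across the whole library.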

I am still learning some of the various ways Python can handle data and have not been programming for very long, so my code is sloppy and incomplete. I have also been experimenting with better ways of viewing and manipulating the data to accomplish what I want, so I have written many incomplete code snippets related to this project. There are a lot of random commented-out code sections too, since I'll try something, then abandon it and try something else, etc. I will share what I have been trying to do in case it helps.

I have found correlation before between fundamentals and returns, even as the values change over time. That is why I am trying to scan the entire fundamentals library to see how many do.

Brandon, have you seen Andrew's fantastic factor tear sheet? You can use that NB to evaluate the capacity of any pipeline factor (so any fundamental value too) to predict returns. That research area overlaps with this one in many aspects.

Wow, thank you Luca. That is quite a notebook.