Hi,
I'm writing a thesis in which I investigate the factors from the "Quality Minus Junk" paper by Asness et al. I want to cover international equities, and I'm running into trouble because the fundamental data I'm pulling from FactSet.Fundamentals has a lot of missing values for these stocks.
For instance, their profitability factor is the z-score of the sum of six individually z-scored measures:
Profitability = z(z_gpoa + z_roe + z_roa + z_cfoa + z_gmar + z_acc)
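To make the NaN propagation concrete, here is a minimal numpy sketch of the formula above. It assumes each sub-factor is a 1-D array with one value per asset, and the numbers are made up for illustration. Even with a NaN-aware z-score, the plain sum turns any asset with a missing sub-factor into NaN:

```python
import numpy as np

def zscore(x):
    """NaN-aware cross-sectional z-score of a 1-D array (one value per asset)."""
    return (x - np.nanmean(x)) / np.nanstd(x)

# Hypothetical raw sub-factor values for four assets (illustrative numbers only).
# Asset index 2 has no ROE reported.
raw = {
    "gpoa": np.array([0.30, 0.25, 0.40, 0.10]),
    "roe": np.array([0.12, 0.08, np.nan, 0.05]),
    "roa": np.array([0.06, 0.04, 0.09, 0.02]),
    "cfoa": np.array([0.08, 0.05, 0.11, 0.01]),
    "gmar": np.array([0.45, 0.35, 0.55, 0.20]),
    "acc": np.array([0.01, -0.02, 0.03, 0.00]),
}

# Profitability = z(z_gpoa + z_roe + z_roa + z_cfoa + z_gmar + z_acc)
total = sum(zscore(v) for v in raw.values())  # NaN for asset 2
profitability = zscore(total)
print(profitability)  # the entry for asset 2 is NaN, the others are finite
```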
The problem is that if any one of these sub-factors is NaN, the whole profitability factor becomes NaN, so I want to mitigate this by simply ignoring the missing sub-factors. I tried, without success, to do this inside the CustomFactor itself by computing the z-score of each sub-factor with:
from numpy import nanmean, nanstd
zscores = lambda x: (x - nanmean(x, axis=0)) / nanstd(x, axis=0)
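Two things may be worth checking here. Inside a CustomFactor's compute method, each input arrives as a 2-D array of shape (window_length, number of assets), so axis=0 in the lambda averages over time rather than across assets. Also, nanmean and nanstd only skip NaNs when computing the statistics; the NaN entry itself is still NaN afterwards, so the sum in the formula would still propagate it. A minimal sketch with made-up numbers, assuming the cross-sectional z-score should be taken across assets (axis=1) and the most recent row used:

```python
import numpy as np

def cross_sectional_zscore(x):
    """z-score each row across assets (axis=1), ignoring NaNs in mean and std."""
    mean = np.nanmean(x, axis=1, keepdims=True)
    std = np.nanstd(x, axis=1, keepdims=True)
    return (x - mean) / std

# Hypothetical input shaped like a CustomFactor input: (window_length=2, 4 assets).
roe = np.array([
    [0.10, 0.07, np.nan, 0.04],
    [0.12, 0.08, np.nan, 0.05],
])

z_latest = cross_sectional_zscore(roe)[-1]  # z-scores for the most recent row
print(z_latest)  # finite for assets 0, 1, 3; still NaN for asset 2
```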
But it fails to run. So instead I thought I would handle it in the Pipeline by applying .zscore() to each CustomFactor and finding a way to make the combination ignore the NaN entries, but so far I haven't managed it. Can anyone see a good solution?
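One possible workaround, sketched here under assumptions: return each .zscore()'d sub-factor as its own pipeline column, collect the output in a pandas DataFrame indexed by (date, asset), then average whichever sub-factors are available and re-standardize within each date. The column names, the data, and the "at least 2 available" threshold are all hypothetical, and only three of the six sub-factors are shown for brevity:

```python
import numpy as np
import pandas as pd

# Hypothetical pipeline output: one row per (date, asset), one column per
# individually z-scored sub-factor. Column names are illustrative only.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2020-01-02"]), ["A", "B", "C", "D"]],
    names=["date", "asset"],
)
df = pd.DataFrame(
    {
        "z_gpoa": [0.5, -0.2, 1.1, -1.4],
        "z_roe": [0.9, 0.1, np.nan, -1.0],
        "z_roa": [0.3, -0.6, 1.2, -0.9],
    },
    index=index,
)
sub_cols = ["z_gpoa", "z_roe", "z_roa"]

# Average the available sub-factors (NaNs are skipped by default), but only
# score assets with at least 2 of them; the threshold is an arbitrary choice.
n_available = df[sub_cols].notna().sum(axis=1)
combined = df[sub_cols].mean(axis=1).where(n_available >= 2)

# Re-standardize cross-sectionally within each date.
profitability = combined.groupby(level="date").transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)
print(profitability)
```

Averaging rather than summing keeps assets with fewer available sub-factors on a comparable scale before the final z-score; whether that is preferable to a plain NaN-skipping sum is a modelling choice.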
I also tried setting the NaN values to zero after first computing the z-score of each individual sub-factor, but perhaps backfilling or forward-filling would be more appropriate.
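For comparison: setting a sub-factor's z-score to zero before summing gives the same result as a NaN-skipping sum (np.nansum), i.e. it treats the missing value as exactly cross-sectionally average. Forward-filling instead works along time within each asset, so it would typically be applied to the raw fundamental before z-scoring. A minimal pandas sketch, assuming the data sits in a Series indexed by (date, asset) in date order; the values and the one-step limit are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical raw ROE history for two assets, indexed by (date, asset).
index = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", periods=4, freq="D"), ["A", "B"]],
    names=["date", "asset"],
)
roe = pd.Series([0.10, 0.05, np.nan, 0.06, np.nan, np.nan, 0.12, np.nan], index=index)

# Forward-fill each asset's own history by at most one step, so stale values
# are not carried indefinitely. The limit is an arbitrary illustrative choice.
roe_filled = roe.groupby(level="asset").ffill(limit=1)
print(roe_filled.unstack("asset"))
```

Any values still missing after the fill could then be handled by the zero-fill or averaging approaches once the data has been z-scored.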