Beneish Score (accounting manipulation) as Pipeline Factor: MemoryError

I'm trying to implement the Beneish Score (which flags whether a company may have manipulated its earnings) as a Pipeline Factor.

The score needs two years of data. I found that 65 trading days is a good approximation for a quarter, so the lookback window is about 8 x 65 = 520 days.
Unfortunately, I get a "MemoryError - Algorithm used too much memory. Need to optimize your code for better performance."

How can I optimize the code? Is there an alternative way to get historical fundamental data? Thanks!
The algorithm is attached. To reproduce the error, please uncomment line 155.
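
For a rough sense of scale, the sketch below estimates the size of the raw input windows for a lookback like this one. The universe size and the number of fundamental fields are illustrative assumptions, not figures from the attached algorithm, and the platform's real memory accounting may well differ.

```python
# Rough size estimate for the input windows of one CustomFactor.
# ASSUMPTIONS (hypothetical, not taken from the attached algorithm):
#   8000 assets in the universe, 10 fundamental fields as inputs,
#   and every window cell stored as a 64-bit float.
QUARTER_DAYS = 65          # trading days used to approximate one quarter
BYTES_PER_FLOAT64 = 8


def window_megabytes(window_length, n_assets, n_inputs):
    """MB needed for n_inputs dense float64 windows of shape (window_length, n_assets)."""
    return window_length * n_assets * n_inputs * BYTES_PER_FLOAT64 / 1e6


two_years = 8 * QUARTER_DAYS   # 520 rows
one_year = 4 * QUARTER_DAYS    # 260 rows

full = window_megabytes(two_years, n_assets=8000, n_inputs=10)
small = window_megabytes(one_year, n_assets=8000, n_inputs=1)
print("10 inputs, 2 years: %.2f MB" % full)
print(" 1 input,  1 year : %.2f MB" % small)
```

Under these assumptions, a single factor that pulls every field over the full two years holds far more data at once than a factor with one input and a one-year window.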

2 responses

Inspired by this post: https://www.quantopian.com/posts/piotroski-score-plus-aroon-indicator
I split each subcomponent of the Beneish score into its own CustomFactor and then computed only the 5-variable version of the model.

This solved my out-of-memory issue! Still, I hope the Quantopian team will give "more power" to the backtesting servers :-)

Here is an extract of the code:

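import numpy as np
from quantopian.algorithm import attach_pipeline
from quantopian.pipeline import Pipeline, CustomFactor
from quantopian.pipeline.data import morningstar

# Row offsets into the daily window; a quarter is approximated by 65 trading days.
quarter_lenght = 65
latest = -1
one_year_ago = -4*quarter_lenght
# Four quarterly samples for the trailing twelve months (ttm) and for the prior year (ttm_py).
ttm    = [               -1,   -quarter_lenght, -2*quarter_lenght, -3*quarter_lenght]
ttm_py = [-4*quarter_lenght, -5*quarter_lenght, -6*quarter_lenght, -7*quarter_lenght]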

class DSRI(CustomFactor):  
    """  
    1. DSRI = Days Sales in Receivables Index = (Receivables_t / Revenue_t)    /     (Receivables_t-1 / Revenue_t-1)  
    receivable_turnover = revenue / receivables  
    """  
    inputs = [morningstar.operation_ratios.receivable_turnover]  
    window_length = 4*quarter_lenght + 1  
    def compute(self, today, assets, out, receivable_turnover):  
        dsri = receivable_turnover[one_year_ago] / receivable_turnover[latest]  
        out[:] = dsri

class GMI(CustomFactor):  
    """  
    2. GMI = Gross Margin Index = GrossMargin_t-1    /     GrossMargin_t  
    """  
    inputs = [morningstar.operation_ratios.gross_margin]  
    window_length = 4*quarter_lenght + 1  
    def compute(self, today, assets, out, gross_margin):  
        gmi = gross_margin[one_year_ago] / gross_margin[latest]  
        out[:] = gmi

class AQI(CustomFactor):  
    """  
    3. AQI = Asset Quality Index  
    AQI = (1 - (CurrentAssets_t + PPE_t) / TotalAssets_t)    /     (1 - (CurrentAssets_t-1 + PPE_t-1) / TotalAssets_t-1)  
    """  
    inputs = [morningstar.balance_sheet.current_assets,  
              morningstar.balance_sheet.net_ppe,  
              morningstar.balance_sheet.total_assets]  
    window_length = 4*quarter_lenght + 1  
    def compute(self, today, assets, out, current_assets, net_ppe, total_assets):  
        aqi = (1 - (current_assets[latest] + net_ppe[latest]) / total_assets[latest])  
        aqi = aqi / (1 - (current_assets[one_year_ago] + net_ppe[one_year_ago]) / total_assets[one_year_ago])  
        out[:] = aqi

class SGI(CustomFactor):  
    """  
    4. SGI = Sales Growth Index = Sales_t    /     Sales_t-1  
    """  
    inputs = [morningstar.operation_ratios.revenue_growth]  
    window_length = 1  
    def compute(self, today, assets, out, revenue_growth):  
        sgi = revenue_growth[latest] + 1.0  
        out[:] = sgi

class DEPI(CustomFactor):  
    """  
    5. DEPI = Depreciation Index  
    DEPI    =    (Depreciation_t-1 / (Depreciation_t-1 + PPE_t-1))    /     (Depreciation_t / (Depreciation_t + PPE_t))  
    """  
    inputs = [morningstar.balance_sheet.net_ppe,  
              morningstar.income_statement.depreciation_amortization_depletion]  
    window_length = 7*quarter_lenght + 1  
    def compute(self, today, assets, out, net_ppe, depreciation):  
        dep_ttm = np.sum(depreciation[ttm], axis=0)  
        dep_ttm_py = np.sum(depreciation[ttm_py], axis=0)  
        depi = (dep_ttm_py / (dep_ttm_py + net_ppe[one_year_ago])) / (dep_ttm / (dep_ttm + net_ppe[latest]))  
        out[:] = depi  


def initialize(context):  
    context.count = 0  
    pipe = Pipeline()  
    attach_pipeline(pipe, 'my_pipeline')  
    # 5 Variable Version of the Beneish Model  
    dsri = DSRI()  
    gmi = GMI()  
    aqi = AQI()  
    sgi = SGI()  
    depi = DEPI()  
    mScore  = -6.065 + 0.823*dsri + 0.906*gmi + 0.593*aqi + 0.717*sgi + 0.107*depi  
    pipe.add(mScore, 'mScore')  
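
To check the row offsets outside the platform, here is a small NumPy sketch. It reuses the names and values from the extract (including the spelling of quarter_lenght); the toy window and n_assets are made up for illustration and stand in for the array that compute() receives.

```python
import numpy as np

# Same names and values as in the extract above.
quarter_lenght = 65
latest = -1
one_year_ago = -4 * quarter_lenght
ttm = [-1, -quarter_lenght, -2 * quarter_lenght, -3 * quarter_lenght]
ttm_py = [-4 * quarter_lenght, -5 * quarter_lenght, -6 * quarter_lenght, -7 * quarter_lenght]

window_length = 7 * quarter_lenght + 1   # 456 rows, as in DEPI
n_assets = 3                             # hypothetical universe size

# Toy window: every cell holds its own row number, so the selected values
# show directly which rows each offset picks.
row_ids = np.arange(window_length, dtype=float)
window = np.repeat(row_ids[:, None], n_assets, axis=1)   # shape (456, 3)

latest_row = int(window[latest, 0])
year_ago_row = int(window[one_year_ago, 0])
ttm_rows = [int(r) for r in window[ttm, 0]]
ttm_py_rows = [int(r) for r in window[ttm_py, 0]]

# Indexing with a list selects four rows; summing over axis 0 gives one
# trailing-twelve-month total per asset.
ttm_sum = np.sum(window[ttm], axis=0)         # shape (3,)
ttm_py_sum = np.sum(window[ttm_py], axis=0)   # shape (3,)

print(latest_row, year_ago_row)
print(ttm_rows, ttm_py_rows)
print(ttm_sum, ttm_py_sum)
```

With these offsets the oldest sample used is row 1 of the 456-row window, so the window is just long enough. The gap between the first two ttm samples is 64 rows while all later gaps are 65, which should not matter as long as fundamentals only change once per quarter.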

Has anyone tried to compute CDF(mScore)? I am having the hardest time trying to apply scipy's norm.cdf to a Factor.
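
One likely reason this is hard: a Pipeline Factor is a lazy expression, not an array, while norm.cdf expects numbers or NumPy arrays. A minimal sketch of the array-level step is below. The helper name mscore_to_probability and the sample values are hypothetical. Calling it from the compute() method of a CustomFactor with window_length = 1 is an untested assumption, and whether a derived factor such as mScore is accepted as a CustomFactor input may depend on the platform.

```python
import numpy as np
from scipy.stats import norm


def mscore_to_probability(m_scores):
    """Apply the standard normal CDF element-wise; NaN inputs stay NaN."""
    m = np.asarray(m_scores, dtype=float)
    return norm.cdf(m)


# Hypothetical M-scores for four assets; the last one has missing data.
example = np.array([-6.065, -1.78, 0.0, np.nan])
probs = mscore_to_probability(example)
print(probs)
```

Since the Beneish model is usually described as a probit model, the resulting values can be read as estimated manipulation probabilities. An alternative that avoids the CustomFactor question entirely is to apply the same function to the mScore column of the pipeline output in before_trading_start.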