Running out of research memory

Hello there,

I am completely new to Quantopian and I am thinking of doing some machine learning on the dataset.
Below are the custom factor and the pipeline I've created, but whenever I run the pipeline it maxes out the research memory.
Did I make a mistake somewhere? Or is the period I would like to query simply too long?

Thanks!

from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data.builtin import USEquityPricing
import numpy as np
import pandas as pd
import talib



# custom factor for calculating the % return over the factor's window
class pct_return(CustomFactor):
    # Default inputs
    inputs = [USEquityPricing.close]
    def compute(self, today, asset_ids, out, close):
        # column-wise % return from the first to the last close in the window
        out[:] = (close[-1] - close[0]) / close[0]


from quantopian.pipeline import Pipeline  
from quantopian.research import run_pipeline


# Pipeline instantiation & definition  
# takes in two parameters pct_return and timeframe that together specifies  
# the criteria needed to generate the target label  
def basedata_pipeline():

    # ---equity id---
    # (company_reference comes from the Morningstar pipeline data and needs its own import)
    symbol = company_reference.primary_symbol.latest
    # ---equity pricing factors---
    close  = USEquityPricing.close.latest
    volume = USEquityPricing.volume.latest
    # ---percentage return---
    # note: with window_length=1 the window holds a single bar, so close[0] == close[-1]
    # and the factor is always 0; a window of at least 2 days gives a real return
    pct_rt = pct_return(window_length=2)

    return Pipeline(  
       columns = {  
            # id  
            'symbol': symbol,  
            # equity pricing data fields  
            'close': close,  
            'volume': volume,  
            # percentage return  
            'pct_return': pct_rt  
        }  
    )  


result = run_pipeline(basedata_pipeline(), '2010-01-01', '2012-12-31')  
5 responses

Hi,

Quantopian has a built-in factor for calculating returns; you just have to specify the inputs and the window length. I cleaned up your code below. Please let me know if you have questions.

import numpy as np
import pandas as pd
import talib


from quantopian.pipeline import CustomFactor
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.data.builtin import USEquityPricing


def make_pipeline():  
    volume = USEquityPricing.volume.latest  
    close  = USEquityPricing.close.latest  
    # you can specify return window  
    my_returns=Returns(inputs=[USEquityPricing.close], window_length=10)  
    return Pipeline(columns={'volume': volume,  
                             'close': close,  
                             'my_returns': my_returns})


my_pipe = make_pipeline()
results = run_pipeline(my_pipe, '2017-01-01', '2017-01-01')
results.head()

Hi Has, thanks for your suggestion. It definitely makes the code cleaner.
That said, the amount of memory being used still seems quite significant.
Typically, what's the longest time period one can pull data for?

Do you have other notebooks running? If so, go to the notebook folder, select all the notebooks you are not using, and press the stop button (usually next to the notebook, or at the top of the page). Usually when I'm coding with one notebook in use, my memory consumption is around 20%; in some cases it goes to 40%, but never more.

Just one notebook is running!
Is it because I am trying to pull all stocks in the universe over multiple years?
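For a rough sense of scale, here's a back-of-the-envelope sketch (the ~8,000-equity and ~252-trading-days figures are assumptions, not exact counts):

# rough size estimate for a 2010-2012 daily pipeline output (assumed figures)
assets_per_day = 8000         # roughly all US equities, with no screen applied
trading_days   = 252 * 3      # roughly three years of trading days
columns        = 4            # symbol, close, volume, pct_return

rows = assets_per_day * trading_days
print("~{:,} rows x {} columns".format(rows, columns))   # ~6,048,000 rows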

Probably. Try adding some filters; most likely you don't need to work with all 8,000 stocks. I'm not an expert, though, so perhaps more experienced users can give better insights.
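For example, here is a minimal sketch of screening the pipeline down to a liquid subset (it uses the built-in QTradableStocksUS filter and an AverageDollarVolume cut; the 90th-percentile threshold is just a placeholder, not a recommendation):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import Returns, AverageDollarVolume
from quantopian.pipeline.filters import QTradableStocksUS
from quantopian.research import run_pipeline


def make_filtered_pipeline():
    # start from the curated tradeable-US-equities universe
    base_universe = QTradableStocksUS()
    # keep only the most liquid names within that universe
    dollar_volume = AverageDollarVolume(window_length=30, mask=base_universe)
    liquid = dollar_volume.percentile_between(90, 100)

    close = USEquityPricing.close.latest
    volume = USEquityPricing.volume.latest
    my_returns = Returns(inputs=[USEquityPricing.close], window_length=10)

    return Pipeline(
        columns={'close': close,
                 'volume': volume,
                 'my_returns': my_returns},
        # the screen drops every row that fails the filter, which cuts memory a lot
        screen=base_universe & liquid,
    )


results = run_pipeline(make_filtered_pipeline(), '2010-01-01', '2012-12-31')
results.head()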

good luck=)