I have a bunch of fundamental data (e.g., Morningstar fields such as total_assets.latest and common_stock_equity) in a learning dataset that incorporates various factors, and it looks promising in terms of train/test accuracy in the Research environment. This learning data was created using Pipeline, which in Research returns historical data as a multi-indexed pandas DataFrame.
Unfortunately, by design, a pipeline in the backtesting IDE only returns the current day's data. I can write custom factors that request trailing windows of data and compare historical values to produce an output, but I don't think I can pass an entire 500-day window back through the pipeline output: custom factors receive their inputs as 2D NumPy arrays (window_length x assets), yet must reduce each window to a single value per asset. See the sketch below.
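To make the limitation concrete, here's a minimal sketch of the workaround I mean (assuming Quantopian's Fundamentals dataset; AssetGrowth is just an illustrative name):

```python
from quantopian.pipeline import CustomFactor
from quantopian.pipeline.data import Fundamentals

class AssetGrowth(CustomFactor):
    # The factor sees a 500-day trailing window of total_assets...
    inputs = [Fundamentals.total_assets]
    window_length = 500

    def compute(self, today, assets, out, total_assets):
        # total_assets has shape (window_length, n_assets) -- the full
        # history is available in here, but `out` is 1D (one slot per
        # asset), so the window has to be collapsed to a single value.
        # There's no way to hand the whole window back to the algo.
        out[:] = total_assets[-1] / total_assets[0] - 1.0
```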
What would be the cleanest way to build a 500-day history of total_assets.latest and common_stock_equity for each company in my universe to use as a training dataset? In Research this is straightforward with run_pipeline (sketch below), but there's no equivalent in the IDE. If Q wants users to build fundamental factors and combine them with machine learning, it would really make sense to have an easier way to create large historical training datasets of fundamental data.
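For reference, this is roughly how I build the training set in Research today (a sketch; the date range is arbitrary):

```python
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.research import run_pipeline

pipe = Pipeline(columns={
    'total_assets': Fundamentals.total_assets.latest,
    'common_stock_equity': Fundamentals.common_stock_equity.latest,
})

# Returns a DataFrame multi-indexed by (date, asset) covering the whole
# range -- exactly the format the IDE's pipeline output doesn't provide.
training_data = run_pipeline(pipe, start_date='2015-01-01', end_date='2016-12-31')
```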
It would be great to have a special learning-data pipeline in the IDE that could return historical data in the same multi-indexed format as the Research environment! It would certainly speed up machine learning algo development... Something like the sketch below is what I'm imagining.
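Purely hypothetical, to be clear -- none of this exists today, and the history_window parameter is imaginary -- but the kind of IDE hook I'm wishing for would look something like:

```python
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals

def make_pipeline():
    return Pipeline(columns={
        'total_assets': Fundamentals.total_assets.latest,
        'common_stock_equity': Fundamentals.common_stock_equity.latest,
    })

def initialize(context):
    # `history_window` is imaginary; attach_pipeline has no such option today.
    attach_pipeline(make_pipeline(), 'learning_data', history_window=500)

def before_trading_start(context, data):
    # Imagined to return a (date, asset) multi-indexed DataFrame covering
    # the trailing 500 days, just like run_pipeline does in Research.
    context.training_data = pipeline_output('learning_data')
```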