On Thursday, July 7th, we'll be making some changes to how macroeconomic datasets are treated in pipeline.
Currently, pipelines that compute macroeconomic indicators, such as VIX, output the same value for each asset (sid) on each day. That means we're unnecessarily assigning a daily value to every asset instead of representing a dataset such as VIX as its own time series.
With these upcoming changes, data not associated with an asset (VIX, macroeconomic indicators, etc.) will be loaded as a single column of values. This means that when something like vix.close is used as an input to a custom factor, it is passed as a single-column array (dates x 1) rather than as a full 2D array of values (dates x assets).
This is best illustrated with an example:
class VIXFactor(CustomFactor):
    window_length = 3
    inputs = [vix.close]

    def compute(self, today, assets, out, vix):
        # Old vix: Each row contains `len(assets)` number of repeated values of
        # vix for that day. There are 3 rows because our window length is 3.
        # [[21, 21, 21, ..., 21, 21, 21],
        #  [20, 20, 20, ..., 20, 20, 20],
        #  [22, 22, 22, ..., 22, 22, 22]]

        # New vix: There are still 3 rows, but now there is always only 1
        # column, which is independent of the number of assets.
        # [[21],
        #  [20],
        #  [22]]

        # This will still work in the new case, as the singleton value [22]
        # will simply be broadcast into `out`.
        out[:] = vix[-1]
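If you'd like to convince yourself of the broadcasting behavior outside of pipeline, here is a minimal sketch in plain NumPy. The array values and the `n_assets` size are made up for illustration, and `out_old` / `out_new` simply stand in for the `out` buffer that pipeline would pass to `compute`.

```python
import numpy as np

n_assets = 5  # hypothetical universe size, for illustration only

# Old layout: the daily value repeated once per asset, shape (3, n_assets).
old_vix = np.repeat([[21.0], [20.0], [22.0]], n_assets, axis=1)

# New layout: a single column, shape (3, 1), independent of n_assets.
new_vix = np.array([[21.0], [20.0], [22.0]])

# Stand-ins for the `out` buffer, which has one slot per asset.
out_old = np.empty(n_assets)
out_new = np.empty(n_assets)

out_old[:] = old_vix[-1]  # last row has shape (n_assets,), copied as-is
out_new[:] = new_vix[-1]  # last row has shape (1,), broadcast to every slot

print(old_vix.shape, new_vix.shape)
print(np.array_equal(out_old, out_new))
```

Both assignments fill `out` with the most recent value, which is why a factor written like the one above should not need any changes.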
This new format is not only a more accurate representation of the data, but it also lends itself nicely to computing things such as correlations and regressions between single columns of data and the columns of other factors (examples to follow soon).
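As a rough illustration of why the single-column shape is convenient, the sketch below correlates one (dates x 1) column against every column of a (dates x assets) array using plain NumPy broadcasting. This is not a Pipeline API: the `column_correlations` helper, the random data, and the sizes are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
window_length, n_assets = 30, 4  # hypothetical sizes

# Stand-in for a macro series such as vix.close: shape (window_length, 1).
vix_col = rng.normal(20.0, 2.0, size=(window_length, 1))
# Stand-in for a per-asset input such as returns: shape (window_length, n_assets).
factor = rng.normal(0.0, 1.0, size=(window_length, n_assets))


def column_correlations(column, matrix):
    """Pearson correlation of a (T, 1) column with each column of a (T, N) array."""
    col_dev = column - column.mean(axis=0)  # (T, 1)
    mat_dev = matrix - matrix.mean(axis=0)  # (T, N)
    # (T, 1) * (T, N) broadcasts the single column across all N columns.
    cov = (col_dev * mat_dev).sum(axis=0)   # (N,)
    norm = np.sqrt((col_dev ** 2).sum(axis=0) * (mat_dev ** 2).sum(axis=0))
    return cov / norm


corr = column_correlations(vix_col, factor)  # one correlation per asset
print(corr.shape)
```

Because the macro series is a single column, no per-asset copy of it is needed; broadcasting pairs it with each asset's column.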
So what does this mean for adding macroeconomic terms as pipeline columns?
Unfortunately, it means you can no longer add them directly to a pipeline through code such as pipe.add(vix.close.latest, 'vix'). To replicate this behavior, you can create a CustomFactor that produces the same result:
class VIXFactor(CustomFactor):
    window_length = 1
    inputs = [vix.close]

    def compute(self, today, assets, out, vix):
        # Here vix might look like [[20]], but this will broadcast the same
        # value into `out` for every asset.
        out[:] = vix

pipe.add(VIXFactor(), 'vix')
This also means that any custom factor using VIX will need to be updated to reflect the new structure. If you're unsure how to do that, please post your questions in this thread and I will be happy to assist with the migration.
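One possible stopgap for factors that index the macro input by asset column (for example `vix[-1, 2]`) is to expand the new single column back to the old shape inside `compute`. The sketch below does this with `np.broadcast_to`; the `as_per_asset` helper is a hypothetical name used only for illustration, and the data is made up.

```python
import numpy as np


def as_per_asset(macro_window, n_assets):
    """Return a read-only (window_length, n_assets) view of a macro input.

    Accepts the new (window_length, 1) layout as well as the old
    (window_length, n_assets) layout.
    """
    window_length = macro_window.shape[0]
    return np.broadcast_to(macro_window, (window_length, n_assets))


# New-style input: one column, three days.
new_vix = np.array([[21.0], [20.0], [22.0]])
legacy_view = as_per_asset(new_vix, n_assets=4)

# Legacy-style indexing by asset column works against the view.
last_for_third_asset = legacy_view[-1, 2]
print(legacy_view.shape, last_for_third_asset)
```

Inside a custom factor this would be called as something like `as_per_asset(vix, len(assets))`. The result is a view rather than a copy, so it should only be read from, not written to.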
For a list of all datasets to be affected by this change, please visit the available datasets from Quandl.
We welcome any feedback you might have on this new update and hope it will simplify working with macroeconomic datasets.