I want to find a regression line and R^2 value for the given notebook but can not figure it out. Can someone please help me?
The method you should use is linear_regression (https://www.quantopian.com/docs/api-reference/pipeline-api-reference#zipline.pipeline.Factor.linear_regression). That method performs a linear least-squares regression between two sets of values. The target, or independent values, can be either a single 1D set of values (e.g. SPY returns) or a 2D set of values, which the method then pairs asset by asset. The method returns a factor with 5 outputs: alpha, beta, r_value, p_value, and stderr.
The method is equivalent to the scipy.stats.linregress method (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html#scipy.stats.linregress). The outputs are named a bit differently, though, to align with conventional investing terms.
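To make the naming difference concrete, here is a minimal sketch of the same regression done directly with scipy.stats.linregress outside of pipeline. The return series are synthetic and the 1.5 beta is an arbitrary assumption for illustration, not real market data. The intercept corresponds to alpha and the slope corresponds to beta.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic daily returns, purely illustrative (not real market data).
rng = np.random.RandomState(0)
spy_returns = rng.normal(0.0, 0.01, size=63)
# Hypothetical asset: a beta of 1.5 to SPY plus a small alpha and some noise.
asset_returns = 0.0002 + 1.5 * spy_returns + rng.normal(0.0, 0.002, size=63)

# x is the independent variable (SPY), y is the dependent variable (the asset).
result = linregress(spy_returns, asset_returns)

# Map the scipy names onto the investing-style names used by the pipeline factor.
regression = {
    'alpha': result.intercept,
    'beta': result.slope,
    'r_value': result.rvalue,
    'p_value': result.pvalue,
    'stderr': result.stderr,
    'r_squared': result.rvalue ** 2,
}
```

The recovered beta should land close to the 1.5 used to build the synthetic series, and r_squared is simply r_value squared.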
I couldn't discern what values you were trying to regress from the notebook above. So, let's use the example of regressing each asset's returns against the returns of SPY. This will return the alpha, beta, and other attributes of the regression. A pipeline like the one below would work for that:
# Imports assume the Quantopian research environment, where `symbols` is built in
from quantopian.pipeline import Pipeline
from quantopian.pipeline.factors import Returns
from quantopian.pipeline.filters import Q500US, StaticAssets

def make_spy_correlation_pipeline():
    # Universe we wish to trade
    my_universe = Q500US()
    spy = symbols('SPY')

    # Ensure we include SPY in our universe
    total_universe = my_universe | StaticAssets([spy])

    # Create any needed factors.
    # Get the daily returns for each asset (including SPY).
    # window_length=2 compares the latest price with the prior day's price.
    returns = Returns(window_length=2)

    # Now make a 'slice' of data representing just the returns of SPY
    spy_returns = returns[spy]

    # Use the 'linear_regression' method to get all the regression attributes
    # for each asset's returns vs SPY returns.
    # Check the regression over the past quarter (about 63 trading days)
    # We don't really need to use a mask but it sometimes speeds things up
    regression = returns.linear_regression(target=spy_returns, regression_length=63, mask=total_universe)

    # Create our pipeline
    # The regression factor has multiple outputs. Use dot notation to access each separately
    # Also, square r_value to get R squared
    pipe = Pipeline(
        columns={
            'returns': returns,
            'alpha': regression.alpha,
            'beta': regression.beta,
            'r_value': regression.r_value,
            'p_value': regression.p_value,
            'stderr': regression.stderr,
            'r_squared': regression.r_value ** 2,
        },
        screen=total_universe,
    )
    return pipe
See the attached notebook for this in action. Good luck.
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.
Dan,
Thank you so much for the timely response. I think I misspoke when I said linear regression. I have a dataframe of 165 stocks, and I would like to fit a line of best fit to each of them, with dates on the x-axis and price on the y-axis. From this, I would like to obtain an R^2 value for each stock to see how linear its trend is. Please let me know if this is possible and, if so, could you show me how? I have attached an updated notebook. Also, is it possible to create and save custom universes? Thank you so much, I appreciate your help.
The first place to check is always the built-in pipeline factors. While there is the linear_regression method noted above, it really expects two datasets to regress against each other. It isn't set up for 'best fit' line analysis. So, the next place to look is pandas (https://pandas.pydata.org/pandas-docs/version/0.18/api.html). The nice thing about pandas methods is that they often work automatically across multiple columns, or stocks in this case. Unfortunately, there isn't a built-in pandas linregress method. So, now check numpy, scipy, or statsmodels. Those are the 'go to' modules for statistical methods. Each of them has one or more ways to get an r squared value. I'll choose linregress from scipy.stats (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html) because I'm most familiar with it.
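As a rough illustration that these modules agree, here is a sketch that computes r squared three ways for one made-up price series: with scipy's linregress, with numpy's corrcoef, and with numpy's polyfit plus the 1 - SS_res / SS_tot formula. The prices are invented for the example, and statsmodels is left out only to keep the sketch short.

```python
import numpy as np
from scipy.stats import linregress

# Made-up price series, used only for illustration.
prices = np.array([10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.7, 13.1])
# Integer positions stand in for the dates on the x-axis.
x = np.arange(len(prices))

# scipy: square the correlation coefficient returned by linregress.
r2_scipy = linregress(x, prices).rvalue ** 2

# numpy, option 1: square the Pearson correlation coefficient.
r2_corrcoef = np.corrcoef(x, prices)[0, 1] ** 2

# numpy, option 2: fit a straight line, then compute 1 - SS_res / SS_tot.
slope, intercept = np.polyfit(x, prices, 1)
residuals = prices - (slope * x + intercept)
r2_polyfit = 1.0 - np.sum(residuals ** 2) / np.sum((prices - prices.mean()) ** 2)
```

For a simple one-variable least-squares line with an intercept, all three values match up to floating point error, so the choice mostly comes down to which API you find most convenient.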
Now that we have found a function which gives us r squared for a single line, the next step is to use the pandas apply method to apply this function to all columns in a dataframe (https://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.DataFrame.apply.html). The one issue here is that the inputs a function expects often aren't the same as what the apply method passes. The apply method passes a single pandas series, representing one column of the dataframe, to the function. The linregress function expects separate x values and y values. Moreover, the pandas series has a datetime index which linregress doesn't know how to handle. The solution? Wrap linregress with a custom function. Something like this:
import numpy as np
from scipy.stats import linregress

def get_r_squared(data_series):
    # Use the scipy linregress function. It expects x values and y values to be stated explicitly.
    # The x values also can't be timestamps, so use the integer positions 0..n-1 instead.
    # Building them with np.arange leaves the original series and its index untouched.
    x_values = np.arange(len(data_series))
    y_values = data_series.values
    r_squared = linregress(x_values, y_values).rvalue ** 2
    return r_squared
Now, apply that function to a dataframe of prices. Something like this.
stock_prices = get_pricing(['AAPL', 'CAT', 'IBM'], fields='price')
# Now simply use the `apply` method to get the r squared value for each stock
r_squared_values = stock_prices.apply(get_r_squared)
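Since get_pricing only exists in the Quantopian research environment, here is a self-contained sketch of the same idea that should run anywhere pandas and scipy are installed. The tickers and prices are synthetic and hypothetical. This variant of the wrapper also drops missing prices first, which is my own assumption about how to handle gaps; it treats the remaining points as evenly spaced.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def get_r_squared(data_series):
    # Drop missing prices, then regress price on integer positions 0..n-1
    # because linregress cannot take timestamps as x values.
    clean = data_series.dropna()
    x_values = np.arange(len(clean))
    return linregress(x_values, clean.values).rvalue ** 2


# Synthetic prices on business days; the tickers and values are made up.
dates = pd.date_range('2019-01-01', periods=60, freq='B')
steps = np.arange(60)
rng = np.random.RandomState(42)
stock_prices = pd.DataFrame({
    'TRENDING': 100.0 + 0.5 * steps + rng.normal(0.0, 0.5, size=60),
    'CHOPPY': 100.0 + rng.normal(0.0, 2.0, size=60),
}, index=dates)

# apply passes each column (one stock) to the function as a pandas series.
r_squared_values = stock_prices.apply(get_r_squared)

# Sort so the most linear price trends come first.
most_linear_first = r_squared_values.sort_values(ascending=False)
```

The result is a series indexed by ticker, so with 165 stocks you get 165 r squared values that can be sorted or filtered like any other pandas series.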
That's it. Check out the attached notebook for more explanation. There are also some cells at the end for how to turn this into a custom factor to get r squared using pipeline and/or use in an algo. Good luck.