
Thanks for joining us for the webinar! I'm so glad you found it useful - and awesome work on your contest submission -- it looks great!

I love this type of usage of Pyfolio and the forums. It allows you to share your results for feedback without disclosing the logic of your strategy at all.

When and where was the webinar announced? I must have missed it. Can it still be watched? Thanks.

Hi Tim,

We announced the Webinar in a few places including the forums and a marketing email. Sorry that we weren't able to notify you about it ahead of time. The good news is that it was recorded and you can see it here.

Loving this exercise! How are you thinking about approaching hold out data and overfitting guards?

@Olive Coyote,

Very impressive, especially the second one! Please stop now. ;)

Oh, now I see what you are trying to do. Basically you are going to let OOS dictate which factors survived and which need replacement, or which new factors need OOS validation. By repeatedly doing this you are hoping to have a conveyor belt of factors (those that survived by placing at the center of the Bayesian cone in the subsequent OOS), which you will eventually mold into the final algo.

That's a nice approach, though I guess it requires a lot of patience.

I am surveying what you and Joakim are doing to get a feel for what will work for me.

I am inclined towards choosing a holdout strategy that fits my personality. I have little patience and perhaps an overconfidence in my ability to just figure things out. I'll probably go with this twist on holdout data:

  • Use the research platform with data up to 2017-06-30 for alpha factor discovery (this should include Alphalens analysis)
  • Occasionally combine the alpha factors and test with data up to 2017-12-31 (roughly once a month)
  • At the end of testing, validate one last time with data up to 2018-06-30 (once every 6 months)
  • Data from 2018-07-01 onward: never used

That way I will have holdout data covering two different six-month periods, and by the time I am finished at the end of 2018 there will still be one six-month period that the algorithm has never seen.

This also addresses my concern that once I use data up to 2017-12-31, that data will lose its importance in subsequent testing; I will still have the final six-month period, up to 2018-06-30, that I plan to save for the finale!
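
To keep myself honest, here's roughly how I'd write that schedule down in a research notebook - just my own bookkeeping sketch; the discovery start date is arbitrary and nothing here is a platform feature:

import pandas as pd

# My personal holdout schedule (illustrative only):
# - discovery: factor research + Alphalens, used freely
# - monthly_test: combined-factor tests, touched roughly once a month
# - semiannual_validation: touched once at the end of the test period
# - final_holdout: never used until the very end
HOLDOUT_SCHEDULE = {
    'discovery':             ('2010-01-01', '2017-06-30'),
    'monthly_test':          ('2017-07-01', '2017-12-31'),
    'semiannual_validation': ('2018-01-01', '2018-06-30'),
    'final_holdout':         ('2018-07-01', None),  # None = open-ended, never touched
}

def allowed_end_date(stage):
    """Return the last date a backtest/analysis may use for a given stage."""
    start, end = HOLDOUT_SCHEDULE[stage]
    return pd.Timestamp(end) if end is not None else None

print(allowed_end_date('discovery'))  # 2017-06-30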

I like both approaches, but for now, I'm sticking to my much simpler (and possibly not as effective at minimizing overfitting) approach.

Antony,

I have 2 questions:

Do you consider it an achievement for an algo to have a 0.17% "out of sample" annual return in a 1.6%-2.0% CD rate environment, and where do you find its strength?

Do you think an algo with a 0.17% "out of sample" annual return in a 1.6%-2.0% CD rate environment should be in the top 5 of the Quantopian Open?

That is quite a compelling example of EXACTLY the overfitting issue we talked about in the webinar - thanks for sharing it!

Did you use a trailing, rolling window to determine your weights in this version? Or some other technique? Would you be willing to share a tearsheet of the version where you drop back to a flat 1/N weighting, with no other modification?

EDIT: Thought of another question - have you previously run backtests over prior periods? Is there another range of year(s) of past data you haven't tested yet that you might be able to preserve as holdout?

Vladimir - Unless I'm misunderstanding, the point of this post is to illustrate an overfitting fail - not to propose this algo would be a wise investment as is.

Jessica,

The algo finished in the Top 5 of the competition.

Can you answer my second question?

@Jess,

If I may try to rephrase Vladimir's queries as they relate to the above example that you find quite compelling as an example of overfitting: If this algo is a compelling case of overfitting, why did it place 5th in the contest? Is the contest 38 scoring system out of sync vis a vis the objective Q is trying to achieve?

"If this algo is a compelling case of overfitting, why did it place 5th in the contest? Is the contest 38 scoring system out of sync vis a vis the objective Q is trying to achieve?"

I think there are two elements that are unsatisfying in the case of this January submission Olive Coyote was generous enough to share:

(1) With the benefit of hindsight, it appears that some aspect of the algo development process was likely overfit to in-sample data. The in-sample Sharpe of over 3.0 drops to ~0.26 out of sample. There is no aspect of the contest scoring (for the 6-month contests or the current daily contests) that explicitly penalizes apparent overfitting based on a mismatch between in- and out-of-sample stats. We can certainly debate the merits of that design, and whether and how scoring could be modified to penalize overfitting - but for contest 38 we can only look to the out-of-sample results to sanity-check an algo's ranked position in the contest.

(2) Intuitively, we don't find the algo's out-of-sample Sharpe compelling enough to justify a top-5 finish. I took a quick look back at the top 10 algos from contest 38: the top 3 look pretty reasonable, but it falls off quite steeply after that -- here I think it highlights the improved SIMPLIFIED scoring mechanism for the daily contest where if you qualify you're just ranked on risk adjusted returns. I think that will avoid this effect going forward where we see counter-intuitive rankings from a Sharpe basis.

Contest 38 top 10 finishing Sharpe ratios:

  1. jade horse - 1.4
  2. salmon zebra - 0.9
  3. red gazelle - 1.5
  4. scarlet dove - 0.23
  5. olive coyote - 0.10
  6. pear panther - 0.27
  7. olive coyote - (-0.09)
  8. pink owl - (-0.4)
  9. violet pig - 0.50
  10. green ape - (-0.45)

I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?
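
To put a number on the in/out-of-sample mismatch described in (1), here's a minimal sketch using empyrical, which is available in the research environment; the return series are placeholders and this comparison is mine, not part of contest scoring.

import empyrical as ep

def sharpe_degradation(in_sample_returns, out_of_sample_returns):
    """Compare annualized Sharpe in-sample vs. out-of-sample (daily return series)."""
    sharpe_is = ep.sharpe_ratio(in_sample_returns)
    sharpe_oos = ep.sharpe_ratio(out_of_sample_returns)
    degradation = (1.0 - sharpe_oos / sharpe_is) if sharpe_is else float('nan')
    return sharpe_is, sharpe_oos, degradation

# A drop from ~3.0 in-sample to ~0.26 out-of-sample is a degradation of roughly 91%,
# a strong hint that part of the development process was fit to the in-sample data.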

@Jess,

Thanks for your response. I look at contest 38 as an experimental segue to the new daily contests; the design didn't completely capture the desired results. I'll leave it at that.

The new contest scoring system is definitely more robust and a step in the right direction. However, I think there is room for improvement in mitigating the luck factor and gaming, penalizing overfitting, including holdout validation data, and measuring metrics over the same duration. I think the scoring system will be a continuing work in progress, with a feedback loop, until all the nuances are factored in.

Hi Jess -

Regarding your question:

I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?

"The proof of the pudding is in the eating." Very soon, we'll be at the 6-month mark of the present contest, which is the minimum out-of-sample period for fund allocation decisions. It will be up to Q to decide if proof will be provided, by providing data showing the effectiveness of the present contest in producing fund-worthy algos (e.g. plot of dollars allocated versus contest score).

Presumably, Q could perform preliminary analysis now, and assign a "probability of funding" score for each contest algo, and then publish the aggregate statistics (dropping traceability to individual quants). Aside from the qualitative, self-reported Strategic Intent statement, you should have all the data per the Get Funded page to make a decent assessment, prior to completing the 6-month mark for the current contest. For example, what would a plot of probability of funding versus contest score for all current algos look like?

Personally, I'm generally happy with the direction things have headed for the contest/fund (although I'm skeptical, specifically, about the Strategic Intent requirement and the style risk factors).

"I'm curious if others here agree that the daily contest scoring ranking methodology is doing a better job of sorting the best algos to the top of the heap?"

A classification of "best" assumes outperformance with everything else held constant across algorithms - for instance, that all algorithms took the same risk, or benefited from the market regime to the same degree over the just-concluded 6-month period.

Risk in our contest is bounded, not constant. Market regimes are never constant either, so we can't say that today's best algorithms will continue to be the "best" (top of the heap) in the subsequent 6-month period, or after the next market regime switch.

At a minimum you should have one downturn, one bull market, and one consolidation. That means starting from 2007 or before. Going back further would be better, but isn't possible much beyond that on Quantopian.

Definitely agree that a meaningful backtest should encompass different market regimes. I would go even further and propose a 2-3 year holdback, say from 2016 to the present, as an OOS validation period that carries some scoring weight together with the 6-month live OOS, to account for consistency. As I have advocated in previous posts, accurate measurement of metrics requires the same time period - the same OOS start and end dates!

Per the requirements on Get Funded, the overall return just needs to be positive:

Positive returns

Strategies should have positive total returns. The return used for the
Positive Returns constraint is defined as the portfolio value at the
end of the backtest used to check criteria divided by the starting
capital ($10M).

There is no concept of "excess return" in evaluating algos (which might be o.k....see below). By the way, this requirement needs some minimum durations (e.g. a minimum 2.5-year backtest, with a minimum 0.5-year out-of-sample period).

Given that Q is cobbling together a "crowd-sourced" fund of 30 or more algos (one would hope hundreds), incremental alpha accretion may be more important than locking in a specific return minimum (e.g. the risk-free rate).

It is also worth noting that the Get Funded requirements don't incorporate a volatility metric. So, presumably it is not taken into account in judging algos for funding? This is a big disconnect between the contest ranking and the Get Funded requirements. It is implicit in the top-level goal "Create a trading strategy on our platform which will continue to make money in the future." but there's actually no explicit guidance on volatility relative to returns.

Hi Grant,

Interesting that you brought up the issue of a volatility metric. While not highlighted on the Get Funded page, it is very much reflected in the scoring system of the new daily contest, where the algo's 63-day rolling volatility, floored at 2%, is used to adjust cumulative returns. At the minimum, a contest score of 1 can be achieved with 6-month OOS live returns of 2% and a 63-day rolling volatility of 2% or less. On the other hand, a score of 1 can also be achieved with 6-month OOS live returns of 50% and a 63-day rolling volatility of 50%. Given these two equal contest scores with extreme volatilities, it is not clear how they will be ranked. Will they be ranked equally, or should the ranking favor the lower volatility, which seems more logical since at the fund execution level Q plans to apply leverage many times over? The current contest rules do not explicitly address this issue, but I suspect Q is biased toward low-volatility algos. For these reasons, like Olive Coyote, I have refocused on containing risk to achieve low volatility, after neutralizing beta, style, and sector risk factors, with the expectation of lower returns.
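
For what it's worth, here's a rough sketch of that volatility adjustment as I understand it - not Q's actual scoring code; the 2% floor and 63-day window are from the description above, everything else is my own simplification:

import numpy as np
import pandas as pd

VOL_FLOOR = 0.02   # 2% annualized volatility floor, as described above
VOL_WINDOW = 63    # trading days (~3 months)

def volatility_adjusted_score(daily_returns):
    """Cumulative return divided by floored trailing volatility (my simplification).

    Assumes at least VOL_WINDOW days of daily returns.
    """
    daily_returns = pd.Series(daily_returns)
    cumulative_return = (1.0 + daily_returns).prod() - 1.0
    rolling_vol = daily_returns.rolling(VOL_WINDOW).std() * np.sqrt(252)
    latest_vol = max(rolling_vol.iloc[-1], VOL_FLOOR)
    return cumulative_return / latest_vol

# Two algos with returns/volatility of 2%/2% and 50%/50% would both score 1.0 here,
# which is exactly the ranking ambiguity raised above.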

@ James Villa -

I have refocused on containing risk to achieve low volatility, after neutralizing beta, style, and sector risk factors, with the expectation of lower returns.

Same basic approach here, using multiple pipeline factors and combining them (sum of z-scores). Main take-aways are to get to ~100 stocks minimum and keep daily turnover to ~0.1. And then waiting 6 months+, which as Jess pointed out is not the ideal cycle time for development. Ugh...
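
In case it helps anyone, a bare-bones sketch of the "sum of z-scores" combination in a pipeline; the two fundamental factors are arbitrary placeholders, not the ones I actually use:

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import Fundamentals
from quantopian.pipeline.filters import QTradableStocksUS

def make_pipeline():
    universe = QTradableStocksUS()

    # Placeholder factors - swap in your own alpha factors here.
    ebit_to_ev = Fundamentals.ebit.latest / Fundamentals.enterprise_value.latest
    roe = Fundamentals.roe.latest

    # Combine by summing z-scores so each factor contributes on a comparable scale.
    combined_alpha = ebit_to_ev.zscore(mask=universe) + roe.zscore(mask=universe)

    return Pipeline(
        columns={'combined_alpha': combined_alpha},
        screen=universe,
    )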

OC,

In my personal opinion, the factors that drive out-of-sample vs. in-sample performance go beyond whether the algorithm itself is overfit.

It is possible the algorithm latched onto some property of the market that has been persistent in the short term and continues into the out-of-sample period (though favorable conditions may not continue forever). In that case out-of-sample performance could still be good, yet not predictable over the very long term.

It is also possible the six-month out-of-sample period is simply a different market regime, like the current 6-month period where there is only sideways movement. Algorithms that performed well in this period (like those in the new contest) could be the ones that thrive on volatility (say, mean-reversion algorithms). Whether mean reversion will continue to perform at the same level going forward is unknown, nor is out-of-sample underperformance in such a period indicative of overfitting in a strategy that was developed while markets were rising.

In my personal opinion it is important to get a handle on what drove the in-sample performance when we analyze whether a strategy is overfit out of sample. I also consider it important to compare performance against the same market regime going backward.

Hi Olive, I can relate well to your idea of limiting each model to a couple of primary factors:

4) Restrict models to having a couple of ideas / factors. Then have the discipline to say
that those ideas have now been ‘used up’. This also ensures a nice conveyor belt of fresh ideas.

I have hitherto been focused on primary+supporting factors in my contest and live algorithms - I treat them as OOS validation of the factors - and am still some distance from actually combining all the factors into one single algorithm.

ps: As for overfitting, the situation seems different if each trade is treated as a game in reinforcement learning, per Tom Starke.

@Leo,
I agree with your thoughts...the current use of the word "overfit" is overloaded and underdefined...in my opinion!
alan

Hi Antony,

In my experience, adding an additional factor, if it's different enough from your first factor, will usually bring average daily turnover above 5%, even if both are based on the fundamentals dataset.

Folks might find some of the discussion in this book relevant:

Systematic Trading: A unique new method for designing trading and investing systems
by Robert Carver
Link: http://a.co/1t6CNJn

The author also presented at QuantCon 2017:

"Trading Strategies that are Designed, Not Fitted" by Robert Carver from QuantCon NYC 2017 https://youtu.be/-aT55uRJI8Q

Hi @Jess,
Just a comment and quick question regarding your comments on the new scoring system, as per part 2) of your post from 4 days ago, in which you state: "... it highlights the improved SIMPLIFIED scoring mechanism for the daily contest where if you qualify you're just ranked on risk adjusted returns. I think that will avoid this effect going forward where we see counter-intuitive rankings from a Sharpe basis".

My thought is that ALL the risk-adjusted return metrics one could think of - Sharpe, Calmar, Sortino, or anything else - use SOME ratio of the form (function of Return) / (function of Risk), AND so does the new simplified scoring formula.

I certainly have no problem at all with that per se and if, in the opinion of Q, the new formula works "better", then that's absolutely fine by me. My question (in several parts) however is this: Suppose there are two different algos that score identically based on the new formula (or whatever other risk-adj return metric might be used in future), then how do you proceed from there in determining & ranking which algo is preferable in practice (e.g. for allocation)?

If 2 algos had the SAME risk-adj return score and all other things were equal, then surely the one with the higher absolute return would be preferable? Is that right? Please could you comment.

Next part of my question is then: if we have 2 algos that are NEARLY the same in terms of Risk-Adj Rtn, with algo "A" being just SLIGHTLY better than algo "B" in terms of Risk-Adj Rtn, then my understanding is that algo "A" comes out ahead of "B" in the daily contest, irrespective of the UN-adjusted returns of either algo. But now if algo "B" is WAY AHEAD of "A" in terms of cumulative return (not risk adj), then, in practice for allocation, do we have a situation where Q would prefer algo "B" (with the much better absolute return) over algo "A" (with an ALMOST identical Risk-Adj Rtn and essentially identical in other respects)?

Putting this another way, is there potential for a situation in which the rankings for allocation would NOT correspond to rankings in the contest? I assume that the answer is probably yes. Please could you comment.

Cheers, TonyM.

Hi @Tony,

I have raised this point in my above post:

At the minimum, a contest score of 1 can be achieved with 6-month OOS live returns of 2% and a 63-day rolling volatility of 2% or less. On the other hand, a score of 1 can also be achieved with 6-month OOS live returns of 50% and a 63-day rolling volatility of 50%. Given these two equal contest scores with extreme volatilities, it is not clear how they will be ranked. Will they be ranked equally, or should the ranking favor the lower volatility, which seems more logical since at the fund execution level Q plans to apply leverage many times over? The current contest rules do not explicitly address this issue, but I suspect Q is biased toward low-volatility algos.

At the execution level of the Q hedge fund, an algo chosen for allocation is subjected to a separate analysis by Q as to how many times it will be levered. The process is opaque, perhaps because it is proprietary, but it is consistent with Steve Cohen's Point72 approach of levering such strategies up to 8 times. Having said that, I believe Q would prefer the algo with lower volatility (risk) rather than higher unadjusted returns, assuming the scores are equal or nearly equal.

The new contest format and scoring system uses unit leverage; however, at the execution level an algo is levered many times over, depending on Q's analysis of it. I raised this question with Dr. Jess Stauth, and below is her answer:

@james you asked a question about why we want to evaluate strategies at unit (1.0) leverage. Specifically "This is what is throwing me off, is this the intended usage of Q's market neutral strategy? If so, then why not design the contest to reflect that (x times leverage)?"

The answer is that it makes our task of evaluating strategies at scale that much simpler if we can assume a fairly consistent leverage profile across all candidate strategies. In our investment process we apply leverage at the portfolio level, and we assign a weight in the portfolio to each individual algorithm. So we think about weights and leverage separately in our process. While it's certainly true that we could try to back out leverage applied at different levels by different users, that can get complicated if people use widely varying leverage over time in their strategies. There's nothing wrong with that approach in principle - but it not only makes our evaluation problem harder, it makes combining such a strategy into a portfolio of strategies more challenging as well. Under the current contest design the way I think about it is that we're creating a level playing field of max leverage = 1 and allowing people to compete to achieve the best results possible given that (and several other) constraint(s).

So, bottom line: even if your algo is chosen for allocation, the total amount of allocation, and thus your potential earnings, depends on how many times your algo will be levered, which is a totally separate analysis and solely a Q decision.

Hi @Antony, firstly my apologies to you, it was not my intention to sidetrack your discussion of your methodology.

Hi @James, nice to chat with you again and thanks for your comments. Although I have been away from Q for 6 months, evidently we are both still thinking about very similar issues. The combination of your and Jess's comments, as per your post above, clarifies the following points:

  • For evaluation & comparison purposes, Q wants a level playing field with leverage = 1, as per the contest. I think this is clear, as is Jess’ explanation of it.

  • For investment purposes, if Q selects an algo then Q will subsequently apply some leverage factor. It is not clear (to us) what that leverage factor will be, nor how exactly Q arrives at it. Personally I am quite OK with simply accepting that as being "Q-confidential" … unless Jess or someone else in Q would care to clarify further.

  • I believe, James, that you are correct in saying that "the total amount of allocation and thus your potential earnings is dependent on how many times your algo will be levered". From the previous point, we don't (currently) know what that leverage factor would be, but does it really matter in terms of the algo design process? Since both Q and the algo author share a vested interest in maximizing (risk-adjusted) returns, at least from my perspective I'm quite happy to leave it that whatever leverage Q finally chooses is not really a "need to know" from an algo author's perspective.

  • Conversely however, what I think is VERY relevant to serious algo developers is obtaining more clarity on Q's selection process, especially in cases where the risk-adjusted returns of several algos are almost the same (and therefore would score similarly in the daily contest). Evidently, achieving good daily contest results constitutes part of a "necessary but not sufficient set of conditions" for Q's selection of an algo for real live trading. It is the set of "additional" conditions, in the form of objectives or objective functions, that IS important to the algo designer.

In particular, James, you have highlighted the fact that currently there is very little if any transparency on this issue. For my part I assumed that, given EQUAL risk-adjusted returns, Q would probably put priority on higher absolute returns (which is what I do in algos for my own personal use, as a secondary objective after the primary one of maximizing Risk-Adj Returns). You have suggested quite the contrary, namely that you believe "... Q would prefer the algo with lower volatility (risk) rather than higher unadjusted returns, assuming the [Risk-Adjusted Return] scores are equal or nearly equal". And you may well be right, but the situation is that we really just don't know.

As part of a rational algo design process, it makes sense to have both Primary and Secondary Objectives. I think it has been made clear that the Primary one is the maximization of Risk-Adjusted Return in the well-defined form as stated by Jess and specified precisely in @Rene’s white paper on Q’s risk model. However what should we be using as Secondary Objective(s)?

Please @Jess, @Rene, @Delaney & others, can you give us some specific feedback from Q on the relative importance of multiple objectives beyond the primary one of maximizing Risk-Adjusted Return?

Regards, TonyM.

Hi Antony,
Thanks for your comments, and I think you have done a great job in your preceding posts, both in answering some questions and also in raising some interesting new ones! I will now try to link my interjection to your previous train of thought.

In your first notebooks you have certainly done exceptionally well in eliminating systematic risk (or the risk associated with "common returns" as it is called in Q's model). As you say, it was very well received by Q and this implicitly answers a question that I raised in a separate post asking if the goal is intended to be the maximization of "specific (i.e. non-systematic) returns" ? Although Q has not responded to my question (yet), apparently the answer seems to be yes, as you have effectively demonstrated, and I agree with your comment about systematic risk and its impact for a fund running multiple portfolios.

With regard to high Sharpe ratios (or any other quality metrics), I guess all we can say is that high values are "probably a necessary but almost certainly not a sufficient" condition for a good system. What we get matters less than how we got there!

As algo developers, Q has given us a precise set of constraints, a single objective function to maximize (namely one specific function of Risk Adj Rtn), some good general guidelines, but then Q seems consistently to avoid answering various other questions. Sometimes I find that Q's lack of direct answers to direct questions seems frustrating, but maybe there is an underlying reasoning behind it. Although there are other possible explanations, presumably Q's intention is simply to try to encourage as much algo diversity as possible, even though it can be frustrating to so frequently have to "tease out" information by inference.

For example, your comment: "What I have since learned ... is that they would like us to use as much data as possible". Just FYI, at QuantCon Singapore for the last 2 years, Delaney offered (approximately) the following comment to people who aspire to win an allocation, saying that he had THREE hints for them, namely:
- Don't only consider price data,
- Look at Alternative data, and
- Use non-pricing data.
;-))

With regard to data hold-back, splitting data sets, and avoiding over-fitting, there are lots of different ways to do it (e.g. your Bayesian approach, the webinar, and elsewhere), and also, in acknowledgement of @LeoM's excellent comment: "I wanted to elaborate on that because I think that goes to the core of strategy development. Are we accounting for risk exposures in a way that the strategy is balanced in all market regimes without knowing which market regime it is currently operating in".

Sometimes we don't even know what regime the market is currently in. And of course we don't know what the market will throw at us in future. Will it be anything like what we have already seen at some time in the past? Will it be nothing like what we have ever seen before in THIS market but nevertheless maybe something similar to what has been seen in some other completely different market? Is data from other unrelated markets just a useless diversion, or is it in fact a plausible analogue for what MIGHT happen in a possible future market regime in our market, even if never seen before in our specific data set(s)? And of course on the other hand we always have the question of how really can we avoid, or at least minimize, the adverse impact of over-fitting or data-mining bias, or whatever else we call it?

I have 2 ideas that I come back to in my own personal system development outside of the Q context, namely:

1) Consider possible use of ALL data, from all real markets, everywhere, over all timescales and all time periods. Why? Because this is the only way to get a look at the full spectrum of possible market regimes that our system might need to be prepared for in an always-unknown future.

2) Use NO actual historical data series at all. Why? Because this is the only way that our system can avoid all the various forms of over-fitting. Base the system entirely on logic only, and avoid anything that looks like data mining in any way.

Neither of these is a "conventional" approach, and perhaps you might be skeptical, but personally I have benefited from at least considering the key aspects of both of these rather extreme ideas.

Cheers, all the best from TonyM
Looking forward to more interesting & practical discussions with you.

Yes, good example, you got it.
Some great ROBUST systems can be designed based only on thinking carefully about different aspects of known archetypal market behaviors and what therefore "should" work, without using specific data at all!

I'll quickly step in here and say that I agree. The best approach is simply borrowing from what we've learned is the best approach in the natural sciences. Use your understanding of the world (markets) to come up with an idea for something that is true, or should be true and probably isn't due to inefficiency. Then create a model which predicts future states based on this hypothesis. Certainly you should let the data inform your world view as you constantly update your prior, but remember that it's prior -> hypothesis -> model -> test -> updated priors. I did a long-form webinar about this a while ago if anybody is interested.

@Antony Jackson,

Very impressive indeed! Out of curiosity, are these two alpha factors (or either one) derived from fundamental data supplied by Morningstar? The reason I ask is the active discussion of the shortcomings of Morningstar's fundamental data, and my post here.

Hi Anthony,

I have been testing combinations of 1 to 5 fundamental factors, scored across the QTU over 10 years, and sometimes the shortcomings of the fundamental factor(s) used manifest themselves in backtest performance as low turnover, inconsistent returns/Sharpe, etc.

Don't be afraid if you do due diligence on the fundamental factors you use: check and verify that the factors all have 4 quarters of data (if that is the reporting frequency), and throw away NaNs or stocks with insufficient data. My point is that input data should have veracity and consistency; otherwise, garbage in, garbage out. I would rather Q perform these data integrity checks and this standardization, much as they did when they came up with the QTU universe, filtered and processed to their specs.
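
A rough sketch of the kind of pre-screening I mean, operating on a factor DataFrame already pulled into research; the layout and the 4-quarter rule are illustrative, not a Q utility:

import pandas as pd

def screen_fundamental_factor(factor_panel, min_quarters=4):
    """Drop stocks whose fundamental factor history is too sparse to trust.

    factor_panel: DataFrame indexed by date with one column per stock,
                  containing quarterly fundamental values (illustrative layout).
    """
    # Count non-missing quarterly observations per stock.
    quarterly = factor_panel.resample('Q').last()
    valid_counts = quarterly.notna().sum()

    # Keep only stocks with at least `min_quarters` reported values;
    # everything else is dropped rather than forward-filled with garbage.
    keep = valid_counts[valid_counts >= min_quarters].index
    return factor_panel[keep].dropna(how='all')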

Hi Antony,

Great work! I'd be interested to know how correlated they are to each other. Have you tried running the combined factors through Delaney's alpha correlation check notebook? If they are not very correlated, why not combine the factors into a single algo?

Gotcha, thanks! It's an impressive one. Have you tested it OOS at all to see whether (and how much) it's overfit? Did you test the factors in AL?

I see. Have you checked the correlation between the factors' return streams? Are they equally weighted, or do you do something fancier? Mine are all equal-weighted currently. I may look at giving more weight to stronger factors in the future though.

Fair enough. Equal weighting of two highly correlated factors effectively means giving more weight to those factors than to other, less correlated factors though, right? Not an easy thing, not for me anyway, and I'm very worried about overfitting too.

Good call, I reckon. I try to do something similar. Don't think I ever got a mean IC of > 0.1 though. If I get a mean IC of around 0.01 and a risk-adjusted IC of around 0.1, I'm happy. :)

Possibly looking to assign more weight to factors with historically higher risk adjusted IC.
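
Something like the following is what I have in mind - an untested sketch; it assumes you already have a daily IC series per factor (e.g. from Alphalens), and the weighting rule is just one possibility:

import pandas as pd

def risk_adjusted_ic_weights(ic_by_factor):
    """Weight factors by historical risk-adjusted IC (mean IC / IC std).

    ic_by_factor: DataFrame of daily ICs, one column per factor
                  (e.g. built from Alphalens' information coefficient output).
    """
    risk_adj_ic = ic_by_factor.mean() / ic_by_factor.std()

    # Ignore factors with non-positive risk-adjusted IC, then normalize to sum to 1.
    risk_adj_ic = risk_adj_ic.clip(lower=0)
    if risk_adj_ic.sum() == 0:
        # Fall back to equal weights if nothing has a positive risk-adjusted IC.
        return pd.Series(1.0 / len(ic_by_factor.columns), index=ic_by_factor.columns)
    return risk_adj_ic / risk_adj_ic.sum()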

Hi Antony,

I've seen this issue come up a few times now, where the backtests run but the tearsheets hang. Did you know you can modify the notebook to run other subsets of analyses instead of the full tearsheet option? e.g. create_simple_tear_sheet will return just a summary stats table and a subset of the basic plots.

I'm ripping off a nice answer Cal gave someone in Helpscout below, in case it's useful in more detail.

Cheers, Jess

Cal's answer:
... all create_full_tear_sheet() does is make calls to other tear sheet generators, which you can run individually. Analyzing your backtest in this fashion will allow you to create tearsheets on larger backtests than would be possible otherwise.

Here are the functions that create_full_tear_sheet() calls:
create_returns_tear_sheet()
create_interesting_times_tear_sheet()
create_round_trip_tear_sheet()
create_capacity_tear_sheet()
create_risk_tear_sheet()
create_bayesian_tear_sheet()
create_perf_attrib_tear_sheet() (this one is usually the most memory hungry)
create_position_tear_sheet()
create_txn_tear_sheet()

Also, create_simple_tear_sheet() is a smaller version of create_full_tear_sheet(). You might be able to get away with running that one.

For example, try running the following code in a research notebook:

bt = get_backtest('backtest_id')  # replace 'backtest_id' with the ID of your backtest
bt.create_returns_tear_sheet()

Lastly, a few of the functions that create_full_tear_sheet() calls are hidden behind if statements. For example, create_txn_tear_sheet() will only run if your backtest passes the following test:

if transactions is not None:
    create_txn_tear_sheet()

You can view all of the functions, and the various attributes that an algorithm must have in order to create that particular tear sheet here:
https://github.com/quantopian/pyfolio/blob/master/pyfolio/tears.py#L67

Hi Anthony,

I'm glad it's working out on your holdout data.

There is some degradation of the Sharpe Ratio 'out-of-sample', but deep down I suspected that my Bayesian technique has been updating the 'prior' too rapidly with respect to the data.

What do you mean by "...updating the 'prior' too rapidly with respect to the data."? Can you please elaborate on that?

Hi @Anthony, regarding "This approach makes the memory limitation on tearsheets a plus for robust model development." I have had a similar experience. Sort of a blessing in disguise actually.

Hi Anthony,

Thanks for the reference. I'll have a look at it when I get a chance.

Not too shabby either I reckon. Well done!! I guess you’re not using FactSet for this one?

Is this OOS? If not, how does its OOS stats compare?

Cool! Very impressive strategies!

Killing all notebooks and starting with fresh memory (at around 5% usage) usually works for me for up to 7-8 year long backtests. You may be trading more and holding more positions though, so you might use more memory.