Master thesis ideas

I'm a CS major with my specialisation in AI, and I'm looking for topics for my thesis.

I am absolutely in love with automated trading and the mathematics/machine learning behind it. However, my knowledge of trading and finance is not sufficient to come up with 'novel' ideas. That's why I'm reaching out to this community; every type of suggestion is very much appreciated!

Do you have any interesting ideas? Anything you want to investigate, want to see proved/tested? It can be absolutely anything, as long as it can be used to trade/generate signals. I get to spend 720 hours on this next year.

How about trying to predict a firm's success by analysing social media/impact of change in staff/profiling CEOs?

5 responses

Hi Thomas,

Here are a few experiments that might help spark a new idea:
- Women-Led Fortune 1000 Companies
- The Rising Impact of Earnings on Stock Returns
- Social Media Trader Mood
- How Mass Shootings and Politics Boost Gun Shares

Perhaps you could combine some of these ideas into one?

I would also suggest taking a look through the list of third-party vendor datasets. It sounds like you may have already been looking through the list, but as a computer scientist with more of a background in informatics, I find these sets to be the most interesting. As an added bonus, in my (limited) experience, they're less explored so you might have more options in terms of novel ideas.

I also figured I'd chime in with some comments on what you should expect from running a big project like this on Q!

Benefits:
- Clean, uniform datasets. In my experience, in machine learning projects, data collection, cleaning, and organization can take a significant amount of time. Sometimes it's the hardest part. On top of being collected, cleaned, and organized, the data on Q is reproducible - something that is really nice for academic studies.
- Research. The research environment (IPython notebooks) provides you with an easy way to interact with the data. This includes pricing data, fundamentals data, and whichever sets you're working with from the third-party list. You'll be able to study the relationships in the data and formulate a hypothesis.
- Paper trading. At the end of your project, it might be a nice idea to test your idea on an out-of-sample test set. The best way to do this is to implement an algorithm based on your trained model and paper trade it (it trades live with fake money).
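
To make the out-of-sample idea concrete, here is a minimal sketch in plain Python of a chronological train/test split. It is not a Quantopian API; the function name, the (date, value) row layout, and the 70/30 default are assumptions chosen only for illustration. The point is that the test rows all come after the training rows in time, so the model never sees the future.

```python
def chronological_split(observations, train_fraction=0.7):
    """Split (date, value) rows into train and test sets by time.

    Hypothetical helper: rows are sorted by their first element (the date),
    the earliest `train_fraction` of rows become the training set, and the
    remaining, later rows become the out-of-sample test set.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Sort by date so that no later observation can leak into training.
    ordered = sorted(observations, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

Dates can be anything that sorts chronologically, for example ISO-formatted strings such as "2015-03-02". Paper trading then plays the role of a further test on data that did not exist when the model was built.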

Limitations:
- While machine learning is possible on Quantopian, there are some limitations. Namely, there are memory limits and a select list of whitelisted libraries (see "What Libraries Does Quantopian Support?"). Libraries such as sklearn, scipy, numpy, and pandas are whitelisted. Regarding the memory limits, you get about 4GB of RAM in research; test it out to see if it's enough!
- Transitioning from Research to the Interactive Development Environment (IDE) can be tough. This is something that we'd like to make easier, but right now, if you train a model in the research environment, the transition to implementing it in an IDE algorithm requires a bit of work.

Other notes:
- Data vendor sets, pricing data, and fundamentals data can only be used natively on the platform; they can't be exported. This is a deal we have with our data providers, though I wouldn't expect it to limit you. You're free to publish screenshots and results as you'd like.

My intention with this response was to give you a sense of how Quantopian can help you with your project. In my opinion, if you can work under the memory limit, the time saved on data collection/processing alone is worth it. You would likely be saving yourself hundreds of hours of work, and end up with a completely reproducible result.

Disclaimer

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by Quantopian. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. No information contained herein should be regarded as a suggestion to engage in or refrain from any investment-related course of action as none of Quantopian nor any of its affiliates is undertaking to provide investment advice, act as an adviser to any plan or entity subject to the Employee Retirement Income Security Act of 1974, as amended, individual retirement account or individual retirement annuity, or give advice in a fiduciary capacity with respect to the materials presented herein. If you are an individual retirement or other investor, contact your financial advisor or other fiduciary unrelated to Quantopian about whether any given investment idea, strategy, product or service described herein may be appropriate for your circumstances. All investments involve risk, including loss of principal. Quantopian makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances.

A while back, I found this paper:

Ming Li, Xin Chen, Xin Li, Bin Ma, and P.M.B. Vitanyi, "The similarity metric,"
IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004
http://homepages.cwi.nl/~paulv/papers/similarity.pdf

Quantopian supports zlib, so you can implement it (see https://www.quantopian.com/posts/pattern-recognition-based-on-zlib).
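
For a rough idea of what a zlib-based similarity measure can look like, here is a minimal sketch of the normalized compression distance (NCD), a common compression-based approximation of this kind of metric. This is not the code from the linked post. The function names and the up/down/flat encoding of price moves are hypothetical choices made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Lower values suggest more similarity.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def encode_moves(prices):
    """Hypothetical encoding: one byte per price change (U=up, D=down, F=flat)."""
    out = bytearray()
    for prev, cur in zip(prices, prices[1:]):
        out += b"U" if cur > prev else b"D" if cur < prev else b"F"
    return bytes(out)
```

One possible use is to compute ncd(encode_moves(prices_a), encode_moves(prices_b)) for pairs of securities and treat lower values as "more similar". How the price series is turned into bytes matters a lot, and very short inputs are dominated by zlib's fixed overhead, so treat the numbers as relative rather than absolute.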

There have also been some clustering examples posted (e.g. https://www.quantopian.com/posts/1st-attempt-finding-co-fluctuating-stocks).

Here's another reference:

http://icml.cc/2012/papers/168.pdf

The paper has been implemented on Quantopian.

I'm not sure if there's any ground-breaking math/CS, or if the author just borrowed ideas from others. I have all of the key references somewhere, if you want them.

Thanks for the suggestions! I will be looking into them.

Thomas,

A couple other thoughts:

  • Check out the topic of compressed sensing. I know almost nothing about it, but it has been a hot topic in recent years.
  • If you have an applied math department, talk with them about what might be cutting edge and could be applied to trading.

Overall, I would think beyond just getting your thesis done and aim to write something of interest to the field, so that you could submit it for publication or do a conference poster or talk. If you apply techniques that are sexy, you'll have a better shot (even if you don't produce an algorithm that performs exceptionally well).