I have tried running this same algorithm with set_universe, but I ran into some problems. (I only tried it when the set_universe feature had just come out, so some bugs may have been fixed by now.)
These are the issues that I run into when trying this on the set_universe feature:
1. As far as I know, you cannot set the exact number of stocks that set_universe returns. It returns every stock that falls within a certain range, and you can tweak that range but not fix the count.
2. set_universe sometimes misses price data and returns 0's for every price of a security. My algorithm then calculates daily_returns of 0 for that stock, so the covariance_matrix built from daily_returns has a row and column of 0's for it. A matrix with an all-zero row is singular, so I get a "Singular Matrix" error when I try to take its inverse. This prevents me from calculating the GMV portfolio unless I loop through beforehand and remove any securities whose returns are all 0.
3. set_universe also returns a much larger set of stocks, which I know it is supposed to do. However, for this algorithm and a few others, I would like a fixed number of securities in my portfolio. To get that, I iterate through every possible combination of, say, 10 securities out of the, say, 40 that set_universe returns, looking for the set of 10 with the least variance. The number of combinations is very high, so the backtester times out and says that too much time was spent in handle_data. Is there a way to increase the time allowed in handle_data before it times out?
4. In this example I reallocate my portfolio every 10 days using the past 40 days of data. A circuit breaker skips the entire handle_data function when the current day % 10 != 0, so the logic only runs every 10 days, when all the data refreshes. However, when I print both data.keys() and the keys in my batch transform, they do not always match: some of the stocks I get back from the batch transform are no longer in data.keys(). This is a problem when reallocating, because looping through data.keys() does not reach every security I need to deal with.
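For reference, the filtering workaround from item 2 looks roughly like the sketch below. It is plain NumPy rather than anything from the Quantopian API; `drop_flat_columns` and `gmv_weights` are hypothetical helper names, and `returns` is assumed to be a days-by-securities array of daily returns.

```python
import numpy as np


def drop_flat_columns(returns, symbols):
    """Drop securities whose returns are all zero or contain non-finite values.

    Hypothetical helper: `returns` is assumed to be shaped (days, securities)
    and `symbols` to list the securities in column order.
    """
    returns = np.asarray(returns, dtype=float)
    finite = np.all(np.isfinite(returns), axis=0)
    moved = np.any(returns != 0.0, axis=0)
    keep = finite & moved
    kept_symbols = [s for s, k in zip(symbols, keep) if k]
    return returns[:, keep], kept_symbols


def gmv_weights(returns):
    """Global minimum variance weights: w = C^-1 1 / (1' C^-1 1)."""
    cov = np.atleast_2d(np.cov(returns, rowvar=False))
    ones = np.ones(cov.shape[0])
    # Solving C x = 1 avoids forming the explicit inverse of C.
    x = np.linalg.solve(cov, ones)
    return x / x.sum()
```

Filtering first means the covariance matrix no longer has an all-zero row and column. It can still be singular for other reasons, for example two perfectly correlated securities, so this only covers the missing-price case.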
I'm not sure whether anyone else has come across these issues with set_universe, or whether some of them have since been fixed. If so, please let me know. It would be much appreciated.
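P.S. In case it helps with items 3 and 4, here is a small sketch in plain Python. Both function names are hypothetical, and `data` is assumed to behave like a dict keyed by security, which is how I have been using data.keys(). The first helper keeps only the batch-transform securities that are still present in `data`; the second counts how many subsets the brute-force search has to visit.

```python
from math import comb  # Python 3.8+


def tradable_securities(batch_symbols, data):
    """Keep only batch-transform securities that are still keys of `data`."""
    return [s for s in batch_symbols if s in data]


def subset_count(universe_size, portfolio_size):
    """Number of distinct portfolios of `portfolio_size` securities."""
    return comb(universe_size, portfolio_size)
```

With 40 securities and portfolios of 10, `subset_count(40, 10)` is 847,660,528, which is why looping over every combination inside handle_data times out.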