Quantopian research platform - comments & questions

Some comments & questions on the Quantopian research platform:

  1. Are the docstrings available, other than by doing a 'shift-tab' to
    see a pop up (e.g. on a web page that could be displayed in a
    separate browser tab)?
  2. What is the data source for the research platform? Is it the same
    as the backtester? Same as live trading? Other?
  3. Is there forward-filling of prices in the data set?
  4. Is the volume of an empty bar reported as 0 or 'nan'?
  5. There appears to be no way to preview a post to the forum,
    originated from the research platform with a notebook attached. It
    would be nice to see how the post will appear.
  6. It'd be nice if the Quantopian-supplied and supported notebooks
    (e.g. documentation and examples) were in a separate folder (or
    color code or add a special icon). Also, it's o.k. that they are
    editable/runnable, but it would be a good idea to add a 'refresh'
    button so that users could get the most current versions at any
    point.
  7. It appears that the single folder of notebooks is just sorted by
    title, alphanumerically. If you stick with the single folder
    concept, there needs to be a better way of organizing.
  8. Is there a way to get a list of all of the importable modules
    available? Are there ones that are imported by default?
  9. Is there a way to see intermediate output from a cell as it runs?
    For example, say I'm processing 10,000 securities, and I'd like to
    know which one I'm on. It would be nice to increment a counter and
    display its value.
  10. Any way you could enable notifications to a user's e-mail (e.g.
    send_email('Computation complete.'))?
  11. Reportedly, with a tweak, a more efficient version of zipline can
    be created (see https://www.quantopian.com/posts/zipline-question).
    Is it possible to use custom versions of zipline in the research
    platform?
  12. It appears that when I close a running notebook, it stops. Shouldn't it run in the background, as the backtester does?

1 - If you type ? followed by the function whose docstring you want to see, and execute the cell, the documentation will open up in a secondary window.
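For example, executing this in a cell brings up the documentation for get_pricing:

?get_pricing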

2 - It is the same as the backtester.

3 & 4 - Prices are not filled forward. If there is a minute or day where a stock wasn't traded, all data will show as 'nan' or an empty bar.

5 - We have a lot of work to do on sharing: adding a preview, the ability to clone a notebook, the ability to edit a notebook you have shared, and the ability to post a secondary notebook. It's relatively high on my priority list, but a couple of usability items come before it (like fixing the file upload experience... have you tried that yet?)

The one question I have on notebook sharing is how we should deal with secondary data sources. Sometimes you have data you can share with the notebook (like my female CEOs notebook) and sometimes you don't (like the Estimize notebook). Should this be at the discretion of the notebook creator, and an option when you choose to share a notebook? Or is there a different way to handle it that makes more sense?

6 - I agree with having the documentation notebooks grouped together. I'm a little worried if I bury them, newbies won't be able to find the help they need. But we can find a way around that.

We had actually been discussing making them executable, but not editable: you could run the code, but edits that "mess it up" wouldn't save, so you'd always revert to the original documentation. The notebooks are great for teaching, but documentation that users can break is a problem!

7 - We talked about switching it to be ordered from top to bottom by recency, so your most recent notebooks are at the top. Would that be more useful for you?

8 - No there isn't right now, but it's a fair request and something I need to figure out how to convey.

9 - Not right now. We definitely need a more visible indicator that your cell is still running, but you are asking for something even more specific. Have you tried the get_backtest function yet? That has a progress bar. Would that solve your need?

10 - I love this idea in both research and in backtesting. It's on my future feature list, though probably not the next thing, or even the thing after that, that we work on. It is there though.

11 - You cannot import your own github fork of Zipline at this point. Do you think this is important? Especially if we are planning to add parallel processing backtesting to research?

12 - Notebooks do keep running, and cells will finish executing when you close your browser window, however the results will not be displayed when you open the notebook. For example, if you execute a cell with a bunch of print statements and close your browser, when you come back the cell will have finished executing, but the print statements won't show. The results will be stored in the namespace of your notebook, and so if you print the results from another cell, you won't have to run the long running cell again.


Thanks for all your questions Grant. These are great!

Thanks Karen,

In response to your questions:

  • I have not yet tried to upload and use an external file in the research environment. Regarding memory management, supporting only CSV may not be the best practice; generally, it seems you should support arbitrarily large files that are compact on disk. In MATLAB, there is a native .mat format, which is handy. Also, I've dealt with large binary files that cannot be loaded into memory entirely, but are read in piecemeal with low-level I/O for analysis. I don't have a specific use case in mind here, but I would suggest a roadmap beyond CSV support.
  • Regarding secondary data sources and notebook sharing, it all depends on your vision for "sharing" notebooks. My sense is that for both algos and notebooks, rather than having a proliferation of hacks and versions spread across the forums, they ought to be hosted in a controlled fashion on something like github, with any data sources stored there, as well. The problem I see is that via the forum, you are enabling uncontrolled copying of notebooks, not sharing. I think you really want one user to run a version-controlled notebook, created by another user, with certain permissions. The secondary data would be handled in a similar fashion, and could be made read-only so that they are not corrupted.
  • Organizing notebooks by recency might be an improvement, but come on, guys & gals, we can do better than that! If you insist on a single folder, then there should be a way to add custom tags and then sort/filter on them (e.g. "Stock screening project").
  • There's a general problem of getting output from a cell while it is running, for monitoring progress and as a means of debugging. Is it possible to direct output immediately to a separate window, rather than buffering it? It must be there in memory, right?
  • I'm not sure importing customized zipline code is necessary, if you incorporate changes to the Quantopian code to make it amenable to mapping response surfaces and optimization. The link I provided suggests over an order of magnitude improvement in efficiency, which could turn an overnight computation into an hour-long one. Regarding parallel processing, my guess is that you'll still want the added efficiency: even for a relatively coarse 30x30-element heatmap, you'd need to support 900 backtests in parallel; multiply that by 35,000 users and that's 31,500,000 backtests, all running at once!

Grant

Another question -

Does each user have a personal copy of the stock database, or are all users pulling from a central database? Just curious, since I figure the load on a central database could get heavy and bog it down, particularly if users start doing a lot of stock screening. I'm playing around with developing a screen that sequentially pulls minute-level price data for 1867 stocks over a two-year window, so a lot of data is pouring out of the database.

Q, for organizing: on a page with a list of notebooks, it is easy (well, compared to writing stock market trading algorithms) to populate a table with title, date, size, and whatever else, and even easier to make those table columns sortable with some JS (examples here): just use <table class="sortable"> along with a header row (<thead>). That's all there is to it; the page makes it seem more difficult than it is.

Karen & Co.,

Mentioned in a talk at QuantCon:

https://elsen.co/

Might provide a benchmark for you, in terms of processing power.

Grant

Additional question(s)
Do we have fundamental data available in the research platform? If yes, is there any documentation anywhere?

@Saurabh, fundamentals are not currently integrated into the research platform, but they will be in the future. This is something we're working on right now.


Thanks for the reply, @Alisa. Also, per the following, should I be able to use the normalized_diluted_eps_as_of field?

"as of" Dates Each fundamental data field has a corresponding as_of field, e.g. basic_eps also has basic_eps_as_of. The as_of field contains the relevant time period of the metric, as a Python date object. Some of the data in Morningstar is quarterly (revenue, earnings, etc.), while other data is daily/weekly (market cap, P/E ratio, etc.). Each metric's as_of date field is set to the end date of the period to which the metric applies.

For example, if you use a quarterly earnings metric like basic_eps, the accompanying date field basic_eps_as_of will be set to the end date of the relevant quarter (for example, June 30, 2014). The as_of date indicates the date upon which the measured period ends.

Quantopian currently exposes the most recent period's data. If you need previous periods' metrics, you must manually save those metrics in your algorithm as your backtest progresses.
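A minimal backtester-side sketch of reading such a metric together with its as_of date (the exact fundamentals namespace path is an assumption based on the Morningstar layout; per Alisa's note above, fundamentals are not yet available in research):

def before_trading_start(context):
    # query the most recent basic EPS and the end date of its period
    context.fundamental_df = get_fundamentals(
        query(fundamentals.earnings_report.basic_eps,
              fundamentals.earnings_report.basic_eps_as_of)
    )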

Another question -

I'd like to do a screen over all of the available securities in the database. Is there any way with 'symbols()' to get the entire list? For example, is there a wild-card character, e.g. symbols('*'), or a flag, e.g. symbols(all=True)?

Screening over the entire database would seem to be a typical use case, so I'm wondering how it is supported.

Can security data be pulled from the database by security ID (the sid number)? If so, how? And presumably, the numbers range from 1 to N? Is there any way to determine the current value of N from the notebook?

I'm figuring this way I could loop through all of the securities in the database, by sid number.

Here's code for finding all of the sid id's:

n = 0
sids = []
for sid in range(100000):
    try:
        # get_pricing raises an exception for sids that don't exist
        get_pricing(sid, start_date='2013-03-25', end_date='2015-03-25',
                    fields='price', frequency='daily')
        sids.append(sid)
        n += 1
    except Exception:
        pass

The assumption is that integer sids are used, in the range 0 to 99,999. I have it running now; we'll see what happens (I ran it with range(10) and it worked just fine).

How can I tell how many kernels I have running? The concern is that if the notebooks run in the background, after closing them, then I could have numerous ones running and not know it, consuming resources. Also, is there a way to kill all of them (e.g. with a single click)? Or would I need to open every notebook and do it manually?

Does the import of modules eat into the RAM available to a user of the research platform?

In an attempt to keep things organized, from https://www.quantopian.com/posts/the-efficient-frontier-markowitz-portfolio-optimization-in-python-using-cvxopt, there was some relevant Q & A:

Karol:

Is there a way to read an external csv from the Research platform? fetch_csv is not there yet and pd.DataFrame.from_csv seems to be disabled.

Grant:

I think one should be able to drag a file into the notebooks/data area, and it'll be uploaded. However, for the life of me, I can't get it to work; the operation just ends up making a copy of the file on my local drive. I get the same behavior with both the Chrome and Firefox browsers.

Thomas:

As to the NB, there's a specific area where you have to drop the file I think, specifically the top line that says "Notebooks" with the "new notebook" button.

I still have not been able to get the file upload to work, so if anyone knows how, please advise.

Grant, for me, I need to drag a file to the area outlined in the image below:

http://cl.ly/image/1H2y1E2g0o3o

It's in the data subdirectory of quantopian.com/research

I'm on Mac/Chrome and I drag the file from the finder to the target area.


Thanks Josh,

I'm in https://www.quantopian.com/research/data, and I have a file (test.csv) on my local desktop. When I drag it over to the "Notebooks/data" area, as you show, I get a little box/flag to the right of the file, "+ Copy". Then, if I let go of the mouse, the file ends up in:

C:\Users\Grant\Downloads

It does not end up in Notebooks/data.

I'm on 32-bit Windows 8.1 Pro, running Chrome Version 41.0.2272.101 m.

If you need more info., just let me know.

Yeah, the trick is to find the spot where the "+ Copy" tooltip text disappears.

Eureka! The tooltip has to say "+Move" and then it works! Pretty subtle.

Also, say I have 100 files to upload. Would each one need to be uploaded individually? When I try dragging two at a time, they show up in Notebooks/data, but then I don't see a way to upload them. Could you give this a try on your end and see what happens?

And they'd all end up in the Notebooks/data folder. In MATLAB, I've found it handy to be able to dump a bunch of files into a folder, and then just read in all of the files from that folder to the MATLAB workspace (via the code, not manually). The problem I see with putting all files in a single folder is that then within the code to analyze those files, there has to be some way to identify them as a group (e.g. special prefix/suffix in the file name).

I'm glad you figured it out. Like I mentioned in my previous post, we know this experience needs a lot of work and it's getting revamped right now.

Before you upload 100 files, I want to let you know that data files aren't being persisted right now. This means if the server hosting your research environment gets restarted (which happens roughly once a week to deploy code) these files will have to be re-uploaded. This is definitely not ideal and something we need to fix.

Another thing that is coming soon is the ability to put datafiles and notebooks into directories. That should help with the organization. Right now you cannot do a bulk import of data files into the system. It's an interesting idea and good to know this is how you work in MATLAB.

Thanks Josh, Grant.
I confirm this works on Mac/Chrome but not on Mac/Safari (the Upload button disappears in Safari). In my case it only worked when I dragged the file to a spot without the "+" tooltip on it -- strange.
Thanks for the note @Karen, I'll take that into account.

Thanks for the note on persistence of data files. I'm assuming that if data from a file get loaded into a notebook, those data will persist (so long as the user does not overwrite them or restart the kernel).

Is there a disk-space limit on storage? And if so, how can one tell how much has been consumed? Maybe I'll just try uploading a humongous file, and I'll get an error message that says "The file is too big. You only have X GB of storage."

In Xubuntu, on Google Chrome, I can drag an empty file (0 bytes) to the data folder and its file name shows up in the list, but I can't upload it (no upload button available). If I try to drag a large file (1.4 GB), I get an unhelpful generic error message.

Also, in Xubuntu, using Firefox, I am bounced to:

file:///home/grant/Desktop/New%20File

when I try to upload a small file. It doesn't even appear in the list. Oddly, when I try to upload the large file, I just get a dialog box asking if I want to open the file or download it.

BTW, which function can we use to read the files in the data/ dir?

Oh, it must be local_csv()

Karol,

There should be an example notebook "Tutorial - Loading a CSV from Your Data Directory.ipynb" for you. The function is:

local_csv  
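A minimal usage sketch (assuming, per the tutorial notebook, that it takes a file name relative to the data directory and returns a pandas DataFrame):

df = local_csv('test.csv')  # reads data/test.csv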

Grant

Another problem:

I started a notebook cell running earlier this morning, and then closed the notebook. When I re-open, there is no longer a '*' in the cell; it's empty:

In[*] changed to In[]

Here's the code that may still be running:

In [ ]:

# get list of security ids, assuming a range of 0 to 99,999
import time

start_time = time.time()
n = 0
sids = []
for sid in range(100000):
    try:
        # get_pricing raises an exception for sids that don't exist
        get_pricing(sid, start_date='2000-01-01', end_date='2015-03-27',
                    fields='price', frequency='daily')
        sids.append(sid)
        n += 1
    except Exception:
        pass
print("--- %s minutes ---" % ((time.time() - start_time)/60))

Does the absence of a '*' mean that the cell is still running? Or is it dead?

In my call to get_pricing() in my post immediately above, do the data get transferred to a buffer, from the database (which presumably incurs a read from disk)? Yesterday, the code took about 3 hrs. to run, so I'm figuring that it must actually be pulling data from the database, even though I don't make an assignment in the notebook.

How do I set the benchmark for comparing algo results?
I know one can compare two algo results manually in Research, but I find it strange that set_benchmark() is not available in Research while it is on the Q platform; it should be easily doable, right?

Not sure. My understanding is that Q is working on making it so that backtests can be run in a kind of batch mode from Research using the IDE backtester back-end (for lack of a better term), and you'd have access to the results, relative to a benchmark set with set_benchmark(). I haven't ever used zipline; you could try posting to https://groups.google.com/forum/#!forum/zipline where I would expect someone can assist.

I tried uploading the 1.4 GB file on my Windows 8.1 machine, Chrome browser, and it fails, too. I'm wondering if something more robust, like an old-school ftp might be in order here. --Grant

Above, Karen confirmed that the bar data provided by Q in the research platform are the same as for the backtester, but I get this:

In [5]:  
symbols(24)  
Out[5]:  
Security(24, symbol=u'AAPL', security_name=u'APPLE INC', exchange=u'NASDAQ GLOBAL SELECT MARKET', start_date=u'Mon, 04 Jan 1993 00:00:00 GMT', end_date=u'Mon, 23 Mar 2015 00:00:00 GMT', first_traded=None)  

Shouldn't the end_date be Friday of the past week, not Monday?

When I run this code (within a notebook cell):

import time
import numpy as np

start_time = time.time()
sids = np.zeros(100000, dtype=int)
for s in range(1, 100000):
    try:
        symbols(s)          # raises for ids that don't exist
        sids[s-1] = s
    except Exception:
        pass
sids = sids[sids > 0]       # keep only the valid ids
print("--- %s minutes ---" % ((time.time() - start_time)/60))

I do not get consistent output:

print('Max security id: ' + str(np.amax(sids)))  
# Max security id: 48602  
# Max security id: 48441  

What's going on? I should get the same max security id every time, right?

This works wonderfully (and almost instantaneously!):

s = range(1,100000)  
stocks = symbols(s)  
sids = [stock.sid for stock in stocks]  

consistently yielding:

In [15]:  
max(sids)  
Out[15]:  
48796

In [16]:  
len(sids)  
Out[16]:  
19762  

So, this is a fix for the problem I reported above. Nevertheless, looping over all of the securities should yield the same result every time, and the inconsistency suggests there may be a robustness problem with filtering securities by looping over a list of them and pulling data sequentially from the database for analysis.

Another question:

I'm a bit perplexed about how to persist the results of analysis in the research environment. I gather that there is no way to write to a file on disk. I see that a given notebook can be saved (presumably to disk, not to RAM), but when I restart the kernel, everything gets wiped out. Is there any way to persist specified data generated within a notebook, while still being able to restart the kernel to wipe out everything else?

When doing a "Run All" the behavior is a bit mysterious. My notebook flies through a bunch of cells, and then, I see:

In [*]:  
len(sids)  
Out[5]:  
19762  

with the remaining cells below showing "In[*]" (one of which takes tens of minutes to execute). So, why would I get an "In[*]" for a trivial computation? Shouldn't all of the cells prior to the computationally intensive one show completion?

@Grant
I want to try and get at the confusion around the persistence of your data, the kernel and the running of cells.

First the (*)s and some details on knowing when a cell is executing.

  • When a cell is executing, or queued to execute, you will see a (*) next to it on the left-hand side.
  • In the Run menu, you can run all, run all above, or run all below, to mass-execute your notebook. When you do this, you will see stars in the cell that is currently executing and in those queued up to run.
  • You can also tell when your kernel is running because the circle on the right-hand side of the menu bar will be filled in (black) when the kernel is working, and empty when it is not.

With regards to closing a notebook while there is a process running.

  • Notebooks do keep running, and cells will finish executing when you close your browser window.
  • The results will not be displayed when you open the notebook.
  • For example, if you execute a cell with a bunch of print statements and close your browser, when you come back the cell will have finished executing, but the print statements won't show. The results will be stored in the namespace of your notebook, and so if you print the results from another cell, you won't have to run the long running cell again.
  • In your example above, if you print sids when returning to your notebook, you should have been able to print out the entire list.
  • In this case, the absence of a (*) indicates that the cell has completed running.

With regards to kernel management.

  • There is no limitation on the number of kernels you have running. I expect you will run into memory issues eventually if you have too many, but you aren't limited today in the total number you can start.
  • You can stop the kernel either from within the notebook or on the notebook list screen. If you mouse over the notebook title on the list page, you will see a stop/delete button and a duplicate button on the right-hand side. When the kernel is running, it is a stop button to stop the kernel; when the kernel is not running, it is a delete button to delete the notebook.
  • Stopping a kernel will always clear your namespace and remove all of the values stored in memory. There isn't currently a way to store a single large dataset so that you don't have to regenerate it every time you clear the kernel; this is something that is on the list.

Thanks Karen,

Well, I started a notebook running, then I navigated to 'Notebooks' with the list of all my notebooks. Then I stopped the notebook I'd just started, and instead of a square I got an 'x'--great! Then I opened the notebook again, and it was still running; the circle was filled, and when I hovered over it, it said 'Kernel Busy'. Eventually I got the empty circle back, but it took a bit. Is there latency?

Also, I'm confused about the difference between stopping, interrupting, and restarting a kernel. Are stopping and interrupting different names for the same operation? I tried the interrupt button; it breaks out of an executing cell, but it did not clear the namespace of the notebook. Is it only restarting the kernel that clears the namespace?

Also, if I'm understanding correctly, to be sure that I have all of my memory available to a single notebook, I need to go through and restart every one of my notebooks? Perhaps you could explain a bit of what goes on behind the scenes in terms of RAM versus disk storage, etc., since I have the impression that results just stay in RAM forever.

Dear Karen / everyone,

1) I'm hitting a lot of functions blocked by the security wall, e.g. numpy.full(), or scatter() from matplotlib.
2) "from matplotlib.finance import candlestick" would be rather nice.
3) Cells have no line numbers, so for an error like "Error on line 328"... which line is 328?
4) Notebook->Download as-> PDF causes Internal Server Error (500).
5) The need for Quantopian in Quantopian. Basically, what you need to really "sell" the research platform is an easy path between it and the backtester, i.e. research -> backtesting -> tuning -> backtesting -> paper trading -> profits. To do this, you need some (or all) of the functionality of the backtester in the research platform. One possibility is to let people write backtester programs and fire them off entirely from within the research platform, but I think that goes too far. All you need is the ability to fire off functions in the same form as schedule_function callbacks: you should be able to write a function of the form def my_backtester_function(context, data):, which would receive data in the normal way and would be called a number of times between a pair of datetimes (see the sketch after item 6). The aim is that you can copy and paste my_backtester_function from research straight into the backtester.
6) Some way of transmitting data between a backtest and the research platform. Perhaps this could be a string for each order(), or a log_to_research() function that records the datetime and an arbitrary string, with the data accessible from the backtest object. Recording objects rather than just strings would be infinitely useful.
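A sketch of what item 5 might look like (the run_function driver is hypothetical, not an existing API; only the (context, data) signature mirrors the backtester):

def my_backtester_function(context, data):
    # same (context, data) signature as a schedule_function callback,
    # so this body pastes into the IDE backtester unchanged
    context.last_price = data[symbol('AAPL')].price

# hypothetical research-side driver (does not exist today):
# run_function(my_backtester_function, start='2015-01-02', end='2015-06-30',
#              frequency='minute')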

Just some thoughts - apologies if they have been mentioned elsewhere.

@James,
I am 110% in agreement on 5 & 6. We are focusing now on getting the tool out to everyone, but these are the key things we need to work on once we have done that.

We are also going to definitely run into security issues. If there are libraries or functions you need, please let us know by filling out this form. We will be working to add libraries as we go along.

Thanks for the feedback!

How do I activate the Research platform?

Thanks,

TIL that matplotlib.finance exists...adding some portion of that module should be straightforward.

Notebook->Download as-> PDF causes Internal Server Error (500).

Download as PDF isn't especially useful if you can't supply your own LaTeX templates (IPython uses LaTeX and pandoc to convert to PDF). But we should probably remove that option from the dropdown.


@Nicolas we are slowly rolling the platform out. Make sure you have signed up for the beta, and we'll get it to you as soon as possible.

@Karen Thanks

I'm getting some new output (shaded in pink) when I run certain cells:

# get list of all valid security id's, assuming a range of 1 to 99,999  
s = range(1,100000)  
stocks = symbols(s)  
sids = [stock.sid for stock in stocks]

[2015-04-06 11:50:57.369680] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:50:58.963485] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/symbols HTTP/1.1" 200 6524420
# For each security's returns, compute the ratio of the mean return to the
# standard deviation of the return (a crude first attempt, just to test
# overall feasibility). Assumes numpy (np) and time were imported earlier.
start_time = time.time()
ratio = np.zeros(len(sids))
for k, s in enumerate(sids):
    p = get_pricing(s, start_date='2015-02-01', end_date='2015-02-28',
                    fields='price', frequency='minute')
    p = p.ffill()
    rt = p.pct_change().dropna()
    ratio[k] = rt.mean() / rt.std()
print("--- %s minutes ---" % ((time.time() - start_time)/60))

[2015-04-06 11:49:22.989503] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.279861] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158483
[2015-04-06 11:49:23.285878] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.300574] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158500
[2015-04-06 11:49:23.305723] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.318032] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158486
[2015-04-06 11:49:23.323096] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.347833] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158496
[2015-04-06 11:49:23.352970] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.654685] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158490
[2015-04-06 11:49:23.660541] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:23.681507] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158505
[2015-04-06 11:49:23.686651] INFO: requests.packages.urllib3.connectionpool: Starting new HTTP connection (1): localhost
[2015-04-06 11:49:24.117749] DEBUG: requests.packages.urllib3.connectionpool: "POST /api/pricing HTTP/1.1" 200 158491
Etc...  

Is there any way to suppress the output? It creates clutter, and in the case of looping over all securities, must create unnecessary overhead, since it appears that all of the messages are being printed out to my notebook dynamically.

EDIT - And eventually, the entire notebook locks up as the loop executes. And finally, I get an error from Chrome that the page is dead. :(
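One thing that may be worth trying in the meantime (assuming the logging module is importable under the security restrictions): raise the level of the logger named in those messages, so its INFO/DEBUG lines are dropped:

import logging
logging.getLogger('requests.packages.urllib3').setLevel(logging.WARNING)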

The issue seems to be that we are overwhelming the system with so many log lines. The new debug lines you noticed this weekend happen with every sid you send to get_pricing. There are so many in this particular example that the whole system gets overwhelmed and essentially crashes.

I've submitted two bugs to the team (one to remove these debug lines and the other to better throttle the standard out and standard errors) which should help. I don't have an eta on when they will be taken care of, but will let you know when I do.

Is the get_backtest() function fully enabled? I saw that Josh used it in an example (https://www.quantopian.com/posts/value-investing-in-quantopian-comparing-the-acquirers-multiple-to-the-magic-formula), but the docstring has not been fleshed out:

Type: function
String form:
File: /home/qexec/src/qexec_repo/qexec/research/api.py
Definition: get_backtest(backtest_id)
Docstring: Get a backtest

Seems like a nice feature, so I thought I'd give it a try.
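For reference, a minimal call looks like this (the id string here is one that appears later in this thread; it can be copied from the backtest's URL):

bt = get_backtest('5538b4e3e6e08b0d52a3fae6')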

Would it be possible to customize the Security object, in a notebook? For example, say I wanted to add an attribute, ETF, like this:

Security(24, symbol=u'AAPL', security_name=u'APPLE INC', exchange=u'NASDAQ GLOBAL SELECT MARKET', start_date=u'Mon, 04 Jan 1993 00:00:00 GMT', end_date=u'Mon, 06 Apr 2015 00:00:00 GMT', first_traded=None, ETF=False)

Security(8554, symbol=u'SPY', security_name=u'SPDR S&P 500 ETF TRUST', exchange=u'NYSE ARCA EXCHANGE', start_date=u'Fri, 29 Jan 1993 00:00:00 GMT', end_date=u'Mon, 06 Apr 2015 00:00:00 GMT', first_traded=None, ETF=True)  

Would this be possible? Advisable? Better approach?

Generally, would there be a way to capture a multitude of security attributes, across all approximately 20,000 securities, versus time? And then be able to do queries, comparisons, filtering, statistics, etc.? Setting up a custom database, I suppose. It seems like the kind of thing a researcher would want.

@Grant - get_backtest is functional. Thanks for mentioning that the docstring was not useful. We will take care of that.

You should now have a Tutorials folder, where we have combined all the latest versions of the documentation. In the API documentation is information on using get_backtest.

I'm not sure if you can modify a Security object. I would probably create a DataFrame with all of the ETF security objects and use that to add or remove them... but that might not be the best approach.
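A sketch of that DataFrame approach (the etf_sids list is hypothetical; there is no built-in ETF flag, so the classification has to come from you):

import pandas as pd

etf_sids = [8554]  # hypothetical: sids you have classified as ETFs (8554 = SPY)
stocks = symbols(range(1, 100000))
df = pd.DataFrame({'sid': [s.sid for s in stocks],
                   'symbol': [s.symbol for s in stocks]})
df['is_etf'] = df['sid'].isin(etf_sids)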

What are the terms of use of the research platform, with respect to commercial applications? Say someone wanted to use the platform as a consultant, charging their outside customers consulting fees, and a charge for reports?

The new area for file upload is nice--thanks! I tried my 1.4 GB nonsense file again, just for yucks, and still got an unhelpful error message. Would it be possible to report the max. acceptable file size, along with the error?

How should I interpret the difference between a minutely closing price and the opening price of the next minute? Would these be consecutive individual historical trades? How would they be interpreted in the context of high-frequency trading (or does that all take place in so-called 'dark pools')? Basically, when I look at OHLCV data, what is it? Where did it come from? What's missing that might be important?

They are almost certainly consecutive trades.

Simon,

What about the HFT stuff? Say there's a trade at 10:30 - dt and another one at 10:30 + dt, with dt being a tiny number (microseconds?). Does the Q data set include those individual high-frequency trades, or are they aggregated somehow? I'm just trying to get a feel to what extent the data represent the actual market. Maybe the swings are larger than what the OHLCV data would suggest?

I just find it hard to imagine that the market is being sampled at such a high frequency to generate the Q minute bar data set.

Grant

Well, I don't really know how Quantopian has implemented the aggregation, but usually, trades come from some consolidated trade feed with a total ordering. Whether this total ordering is "correct" is largely a philosophical question, given relativity. In any case, some trade will have a timestamp which, according to this feed, should be part of the 10:30 minute, and the next trade should not be, and thereby a new bar will be formed. If Quantopian is using an NxCore feed to build their realtime bars, then I would expect that each trade that happens is separately reported. Whether or not it makes it into one bar or the next depends which timestamp is being used and where it was applied (exchange, NxCore, Quantopian processor), so which bar it ends up in is uncertain, but between two bars, I think it's safe to say there were no trades, and that the closing price of one and the opening price of the next were made by consecutive trades, where "consecutive" is a bit fuzzy.

Interactive Brokers' trade feed, however, I believe does aggregate trades. Furthermore, many retail platforms aggregate trades. This is usually only a performance optimization for those ones that have stupid per-tick painting logic (I dealt with this in a previous life). If such a performance optimization resulted in different bars for different people, though, that would be a bug. Since Quantopian customers only get the bars once they are fully complete, this should be a non-issue.

There are a few methods to try and fit a volatility to a time-series using OHLC data, http://www.atmif.com/papers/range.pdf is one such, though it assumes daily data so you'd have to change a bunch of the parameters for "overnight" to use it with minute data.

More generally, if volatility estimation is frequency-dependent, then this is evidence that the underlying process is not Brownian motion, and you might be looking for evidence of mean-reversion or autocorrelation. This is why "variance ratio" tests for non-random-walks are used, I believe.
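A minimal sketch of such a variance-ratio check, assuming prices is a 1-D numpy array of prices at a fixed frequency; a ratio near 1 is consistent with a random walk, while values well below or above 1 hint at mean reversion or trending, respectively:

import numpy as np

def variance_ratio(prices, q):
    # variance of q-period log returns divided by q times the variance
    # of 1-period log returns
    logp = np.log(prices)
    r1 = np.diff(logp)
    rq = logp[q:] - logp[:-q]
    return rq.var() / (q * r1.var())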

Note also that the OHLC data up until recently may have only included round-lot trades. Odd-lot trades weren't disseminated on the SIP feed, and for this reason, a lot of HFT used to use odd-lots to avoid detection. This was changed in 2013, so that odd-lot trades started getting reported on the SIP-consolidated feed. There have been some studies of what impact this had on data quality, like http://www.efa2012.org/papers/s1f1.pdf (written prior to the rule change). I think there have been some since too.

@ Simon. Thanks for the paper on volatility estimation using OHLC data.

@ Q Team. I was able to download the PDF to my local desktop and then upload it to the Q research platform. Upon clicking on it in the research platform, it immediately pops up in a separate browser tab. Nice! Now it is in my 'notebooks' folder, but if I try to drag the file and drop it into my 'data' folder, it doesn't work. Should it? Is the idea that all of the normal copy/cut/paste/drag/drop operations should work once files have been uploaded? Also, can I create my own folders and directory structures?

I'm trying to post a notebook to the forum, but when I click the "Submit" button, I get:

There was a problem submitting your post, please try again or contact Quantopian support.

Here's the code I'm trying to share. Quantopian support is also welcome to grab it off of your server: https://www.quantopian.com/research/notebooks/heatmap%20example.ipynb

# coding: utf-8

# In[1]:

import pandas as pd  
from scipy import stats  
from pytz import timezone  
import matplotlib.pyplot as plt  
import seaborn as sns


# In[2]:

data_cached = get_pricing(['SPY','SH'],start_date='2014-03-27',end_date='2015-03-27',fields='volume',frequency='minute').dropna()


# In[3]:

data = data_cached.copy(deep=True)


# In[4]:

data['time'] = data.index.tz_convert(timezone('US/Eastern')).time  
data['date'] = data.index.date


# In[5]:

data['ii'] = range(len(data))


# In[6]:

def z_diff(ii, df):
    # z-score of the latest value relative to the window, for each of the
    # two columns; return the difference
    x_df = df.iloc[map(int, ii)]
    x = x_df.iloc[:, 0].values
    y = x_df.iloc[:, 1].values
    return stats.zmap(y[-1], y) - stats.zmap(x[-1], x)


# In[7]:

data['z_diff'] = pd.rolling_apply(data.ii,390,lambda x: z_diff(x, data)).dropna()


# In[8]:

ht_map = pd.pivot_table(data,'z_diff',index=data['date'], columns=data['time'],fill_value=0) 


# In[11]:

plt.pcolor(ht_map)  
plt.colorbar()  
plt.clim(ht_map.min().min(),ht_map.max().max())  
# sns.heatmap(ht_map)  

When I try to download a notebook as a PDF, I get:

500 : Internal Server Error
The error was:
nbconvert failed: [Errno 2] No such file or directory

Hi Grant,

Thank you for the heatmap example! We have a problem with sharing right now that we're working on, and we know about the PDF export issues. We're not planning on supporting PDF exports, since you can't supply LaTeX templates; we just haven't removed the option from the dropdown yet.


LaTeX templates? Sorry, I don't follow. I just looked at http://ipython.org/ipython-doc/1/interactive/nbconvert.html, and it sounds like it wouldn't require any invention on your part.

The --to latex mode requires that you pass a LaTeX template to use (the PDF rendering uses LaTeX as its actual rendering engine). The standard templates aren't super useful, and they require several large (on the order of gigabytes) packages of LaTeX extensions. In general, you're much better off with HTML export.

Just as an update, sharing is working now. Looking forward to seeing your heatmap example as a shared notebook.

@ Adam,

See https://www.quantopian.com/posts/research-platform-how-to-get-a-nice-heatmap.

It still appears that there is no way to attach a notebook in the forum, other than as a new post, correct?

Also, what's the thinking on supporting collaborative research, via github or a similar multi-user, version controlled environment? Are you thinking of doing notebooks the same as algos--cloning via the forum? Ughh!

It appears that get_backtest() does not capture the backtest code. I would suggest making it available, along with the simulation environment variables. That way, a user can grab all relevant information, and have access to it in IPython, versus needing to dig back into the backtesting platform.

Hey Grant,
I love the suggestion to make the code available via get_backtest. I'll add that to the list.

As for sharing. On my short list are adding cloning and the ability to attach a notebook to an existing forum thread. There are a few things in front of them, but they are pretty high up. Collaboration is on the list too, but the sharing things (I think) will be first.

Some form of github integration would be really nice, even if it were unidirectional. How hard would it be to allow users to pull in notebooks from github? I realize that there are security and IP concerns over allowing code to be pushed out, but pulling in notebooks would open up a lot of possibilities, including multi-user collaboration on github, with the convenience of pulling in new revisions for editing and testing. This way, Q research platform users could open-source selected notebooks and make it convenient for other users to grab the latest revision and give it a whirl.

Regarding get_backtest(), would there be a way to get all of the backtests of a given algo, and to capture the log output of each? I'm trying to figure out how to analyze results versus parameter variations (e.g. generate a 30x30 heatmap of returns, thus requiring 900 backtests). With a lot of clicking (which potentially could be automated), I think I know how to launch lots of backtests running in parallel, but I don't see how to post-process the results conveniently in the research environment.

Or perhaps the backtest IDs are incremented in a deterministic way (e.g. blah, blah+1, blah+2, etc.)? Then it would be a matter of determining the first one, and looping over the remaining ones.

Also, I've never seen a stated limit on the number of backtests that can be run in parallel. Would there be a problem with my pushing the limit? My goal is 900 (which should be feasible by writing a script that does the mouse clicks for me). I did about 10 in parallel this morning, so why not 900?

Grant,
There isn't an easy way yet to get all the backtests of a given algo. I think we need to do some work inside research to make this easier, because getting the backtest ID from the URL on the backtest results page is not a long-term workflow. It's good to know you would want the results of all of the backtests for a given algo. That will help us when we figure out how to build it.

There isn't a stated limit on the number of backtests you can run in parallel, but running 900 will cause us pain. I don't have guidance to give you on what you can reasonably do today, but we'll look into it.

Thanks Karen,

One thing to think about is a means to run lots of backtests in parallel, but either a few at a time or at a lower priority. If 900 backtests in parallel from a single user would be painful, then you'll need a strategy for enabling parallel computing via the research platform, since users will reason "10 works... let's try 900!" For example, if 10 is o.k., then if I can automatically queue up 10, followed by another 10, etc., I could get to 900 in 90 iterations. Say each backtest takes 20 minutes; that's 90*20 minutes / 60 minutes/hr. = 30 hrs. -- a little more than a day, not bad. But then I still have the problem of post-processing in the research platform: how do I get all of those backtest IDs into the research platform? Ah! I know -- use the same script that runs the backtests automatically to copy the IDs into a file. But then I still have the problem of knowing which parameters were used in each backtest, so I need a way to get at the log output. Could you make the backtest log output available in the research environment? A bit rambling, but I think I'm only missing one piece.

Grant

Note - to run a new backtest, just enter https://www.quantopian.com/algorithms/54fcc40aa2d6f1577800005c/new_backtest?s=1 in the browser, and off she'll go! The string '54fcc40aa2d6f1577800005c' identifies the algorithm.

Any chance you could add access to everything stored in context at the end of a backtest? For example, say I would like to analyze context.my_interesting_stuff. Or maybe you would need a special function, store_data(), that would work like record()? Or maybe record() could be expanded to store objects versus time, which would then be available to the research platform for post-processing?

Hi Karen,

Could there be an option to get the backtest of a particular algo, by backtest number? Perhaps you could just build on the present get_backtest(). For example, get_backtest(5,'54fcc40aa2d6f1577800005c') would get the 5th backtest of algo 54fcc40aa2d6f1577800005c.

What I'd like to try is using the record() function to save parameter values as I run multiple backtests in parallel (either manually varying the parameters or picking them randomly within given ranges). Then I would be able to easily load all of the backtests into the research platform by looping over their numbers.
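A minimal sketch of the record() half of that (the lookback parameter is hypothetical; record() takes scalar keyword arguments and stores them with the backtest's daily results):

def initialize(context):
    context.lookback = 20  # hypothetical parameter being swept

def handle_data(context, data):
    # stamp the parameter into the recorded variables so it can be
    # recovered when the backtest is loaded into research
    record(lookback=context.lookback)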

The remaining piece of the puzzle is how to automatically launch backtests (assuming I use the random parameter generator). I think I'll be able to figure this one out, with the trick posted above.

How many backtests can I run in parallel without causing a problem? If I get this scheme to work, I'll try to stay under that limit.

Grant

Grant,
Letting users run many backtests and optimize their algos is one of the key use cases we plan to support in research. Right now you are limited to zipline backtests, but we do plan to allow you to kick off backtests from within research, modifying the parameters, and getting the results of all the backtests easily. (Seong actually has a very simple version using zipline and attempting to do this which he will share in the next day or two.)

It turns out that using the Chrome browser, if I right-click on the page listing all of the backtests for a given algo and select 'View page source' I get all of the gobbledygook code behind the web page. Buried in there are all of the backtest IDs, in the form:

<tr data-backtest-id='5538b4e3e6e08b0d52a3fae6'>  

So, to get all of the backtest IDs for a given algo, it is a matter of parsing the page source and extracting the IDs. Then the IDs could be pasted into the research platform (or perhaps uploaded to the 'data' folder in a file).
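A sketch of that parsing step, run locally against a saved copy of the page source (the file name is hypothetical):

import re

with open('backtests_page.html') as f:  # hypothetical saved page source
    html = f.read()
# pull every id out of the data-backtest-id='...' attributes
backtest_ids = re.findall(r"data-backtest-id='([0-9a-f]+)'", html)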

In support of the get_backtest() API, could you add a helper tool to the backtester so that users could simply get a listing of all of the backtests for a given algo? My thinking is that one would put some/all of them into a list in a notebook and iterate over them, with parameters stored via record() in each backtest. Then one could actually do optimizations, response-surface explorations, heatmaps of backtest results, etc., without a ridiculous amount of clicking, copying, pasting, and page navigating.

I have restarted a notebook, which should kill it, right? It still indicates that it is running (the little notebook icon is green, with a green-lettered "Running" indicator off to the right). This is no big deal, except that there is no way to know how many resources the running notebook is consuming. Maybe I'm left with 1 MB of RAM to run another notebook.

Also:

  • It appears that users can now set up directory structures. Excellent! I gave it a try, but when I attempt to rename a folder, I get:

Rename Failed An error occurred while renaming "Untitled Folder" to
"test". No such entity: [Untitled Folder/test]

  • Once I create a new folder, how do I move/copy notebooks into it?
  • Regarding local_csv(), it appears that it will only pull files from /data or a /data sub-directory. Correct? Is this restriction necessary, or could any path be used in local_csv()? It would be nice to be able to set up a project in a single directory structure, with relevant data, notebooks, documents (e.g. PDFs), etc. all within that structure.
  • Presumably, you are doing disaster recovery back-ups of user data, but it would also be nice to be able to download everything into a compressed file, for personal back-up (basically zip the entire Research folder and all of its sub-directories, and then download it as a single file).
  • I uploaded a PDF, and when I click on it, it pops up in my browser tab--wonderful! However, when I click the browser back arrow, to navigate back to my Q research home folder, it doesn't work--the browser just puts me back into the PDF view (an error message flashes up, too fast for me to capture it). I can provide OS, browser, and its settings if it would help troubleshoot.

Hi Grant,
Thanks for playing with this. There are some kinks with the folder system. You picked up on two big ones, inability to rename the folders and not being able to move files into folders. We will work on getting these fixed.

local_csv is limited to the data folder for the time being, but I think what you are saying about organizing data and notebooks together is interesting. I believe there are a couple of system reasons why we are limited to the data folder, but I will bring it up.

I like the idea of a download-everything option. I can see that being valuable. I don't think it's at the top of the list, but I will keep it in mind.

KR

Well, imagine a user after 5 years looking into his or her data folder of hundreds of files, all linked to various notebooks. It just doesn't sound like the right way to go. --Grant

I couldn't agree more. My own list is atrocious and I've only been using it for 5 months!

Are there examples of how to export fetcher-compatible files from a Q Research notebook, either to a local PC folder or directly to one in the "cloud" that the backtester could access? Or would this require downloading the entire notebook and somehow finagling it to spit out a CSV?

Hi Grant,
You cannot download data into fetcher-compatible files at this point. We are going to have to solve the issue of moving data and code between the environments in the coming months.

KR

What's the maximum amount of time a cell can run for?

Is there any problem with having multiple (e.g. 8) notebooks running overnight?

I had some running last night, but they all got shut down by the morning.

Last night we pushed updated code to research, and it sounds like you may have been running a notebook at the time. If your notebook is running when we deploy, your notebook (and its cells) will stop. You can run your notebooks again; in general, there is no problem with having them run overnight. In the future we'd like to have a scheduled, regular deploy window so that everyone gets advance warning.

Thank you Alisa -- running again now

The example backtest notebook, Research / Tutorials and Documentation / Tutorial (Advanced) - Backtesting with Zipline, suggests that to run a backtest on the research platform, the entire history needs to be loaded into RAM:

data = get_pricing(
    ['AAPL'],
    start_date='2002-01-01',
    end_date='2015-02-15',
    frequency='daily'
)

If this approach is the only one, might it be problematic when scaling to thousands of securities over a decade or more of minute bars, due to memory limitations (with users unable to see how much RAM is available or how much has been consumed by loading the data)? Or is there another approach, to pull in data from the database on a rolling basis? Or does the available RAM expand to fit the data? Or is disk used instead of RAM?
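
One possible workaround, sketched under the assumption that get_pricing accepts arbitrary date ranges, is to pull the data in chunks, reduce each chunk, and discard it before loading the next:

import pandas as pd

summaries = []
years = range(2010, 2015)
for year in years:
    chunk = get_pricing(
        ['AAPL'],  # in practice, a large universe
        start_date='%d-01-01' % year,
        end_date='%d-12-31' % year,
        frequency='minute',
        fields='price',
    )
    summaries.append(chunk.dropna().mean())  # reduce each year to per-security means
    del chunk                                # free the minute data before the next pull

summary = pd.DataFrame(summaries, index=years)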

I'm finding that some methods/attributes of approved imports are blocked. For example, I am unable to use matplotlib.gridspec:

import matplotlib as mpl

mpl.gridspec.GridSpec(2, 1, height_ratios=[1, 1])

RestrictedAttributeError: Accessing mpl.gridspec raised an AttributeError. No attributes with a similar name were found.

Are there specific reasons why certain module attributes/methods are restricted, and if so, what are they? If there are none, how easily, and how soon, can these be made available?

UPDATE: I'm unable to use a lot of matplotlib's attributes within the research environment, including datetime formatting.

There are restrictions on modules and sub-modules for security reasons. We need to protect user code, IP, and our servers against malicious intent in the wild of the web. Sometimes modules are unavailable because of these potential vulnerabilities; other times, a module simply hasn't been requested by users. For example, "import os" will never be available, but if you send us a list of matplotlib attributes, my hunch is we can add the majority of them.

That said, which ones would you like to see added to the research environment?

GridSpec and Formatter, to start. Constructing intraday plots is a struggle because I can't add more timestamps to the various axes. Constructing multiple intraday plots without gaps requires multiple subplots, which gridspec makes simple.
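
For reference, here is the kind of standard matplotlib code (fine outside the sandbox) that the restriction currently blocks; the data are placeholders:

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

fig = plt.figure(figsize=(8, 6))
gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1], hspace=0.05)

ax_price = fig.add_subplot(gs[0])                    # price panel
ax_volume = fig.add_subplot(gs[1], sharex=ax_price)  # volume panel, shared x-axis

minutes = range(390)  # one trading day of minute bars
ax_price.plot(minutes, [100 + 0.01 * m for m in minutes])
ax_volume.bar(minutes, [1000] * len(minutes))
plt.setp(ax_price.get_xticklabels(), visible=False)  # avoid duplicate tick labels
plt.show()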

How does get_backtest work? In the research environment, I get:

Signature: get_backtest(backtest_id)
Docstring:
Get the results of a backtest that was run on Quantopian.

Parameters
----------
backtest_id : str
    The id of the backtest for which results should be returned.

Returns
-------
BacktestResult
    An object containing all the information stored by Quantopian about the
    performance of a given backtest run, as well as some additional metadata.

Notes
-----
You can find the ID of a backtest in the URL of its full results page,
which will be of the form:

    https://www.quantopian.com/algorithms/<algorithm_id>/<backtest_id>

File: /home/qexec/src/qexec_repo/qexec/research/api.py
Type: function

Similarly, in the help documentation:

get_backtest(backtest_id)

Get the results of a backtest that was run on Quantopian.
Parameters: backtest_id (str) -- The id of the backtest for which results should be returned.
Returns: BacktestResult -- An object containing all the information stored by Quantopian about the performance of a given backtest run, as well as some additional metadata.

Notes

You can find the ID of a backtest in the URL of its full results page, which will be of the form:

https://www.quantopian.com/algorithms/<algorithm_id>/<backtest_id>

When I look in API Reference.ipynb, there is a bit more information (I recall that more used to be available), but I still can't sort out what data from a backtest are available in the research environment.

The only way to sort it out, it seems, is to actually run get_backtest and then do:

In [4]: result.scalars
Out[4]: ['benchmark_security', 'capital_base', 'end_date', 'start_date']

In [5]: result.frames
Out[5]: ['cumulative_performance', 'daily_performance', 'orders', 'positions',
         'recorded_vars', 'risk', 'transactions']

In [6]: result.attrs
Out[6]: ['cumulative_performance', 'daily_performance', 'orders', 'positions',
         'recorded_vars', 'risk', 'transactions', 'benchmark_security',
         'capital_base', 'end_date', 'start_date']

Could you include the backtest code as an importable field? The values of context variables at the end of the backtest? Log output? Or maybe just dump everything in context at backtest completion?

At a minimum, there should be a header field for each backtest available in the research environment. Maybe a helper function could be created, e.g. algo_header('Algo based on the recent paper by Alfred E. Neuman').

These are great suggestions for how to make get_backtest even more valuable. The other one that I had on my list was the list of dividends paid.

There is a lot of data there, and sifting through it can be daunting. I'd actually suggest taking a look at the tearsheet that @Justin shared a few weeks ago. It really shows the power of get_backtest (and does a lot of the work for you).

Thanks Karen,

Yes, I saw the tearsheet post, but haven't gotten around to applying it. By the way, you could consider releasing it as a standard, revision-controlled analysis tool (if that is the intent) on GitHub (or equivalent), if this hasn't already been done. Otherwise, if I grab the notebook from the link you provide, how do I know it is the current revision?

Grant

https://github.com/quantopian/pyfolio/

Jik gave you the link!

We are working on a more seamless integration with research. Keep your eyes out for it in the coming days/weeks.

KR

In the research platform, is there a direct way to obtain a list of all of the days the market was open? Presently, I'm doing this:

prices_spy = get_pricing('SPY', start_date='2000-01-01', end_date='2015-09-29', fields='price')
trade_days = list(prices_spy.dropna().index.values)
trade_days[0:3]

[numpy.datetime64('2002-01-02T00:00:00.000000000+0000'),
 numpy.datetime64('2002-01-03T00:00:00.000000000+0000'),
 numpy.datetime64('2002-01-04T00:00:00.000000000+0000')]

It works, but only under the assumption that SPY traded on every day the market was open (which is probably valid, but I have no way of checking). Or maybe even if SPY doesn't trade on a given day, it will still have a closing price, forward-filled from the prior day?

Rather than using SPY, is there a better way to get the dates when the market was open?
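
One alternative I've been meaning to try, assuming zipline's calendar module is importable in the research environment (it is a public module in zipline itself):

import pandas as pd
from zipline.utils import tradingcalendar

# trading_days is a pandas DatetimeIndex of NYSE sessions
open_days = tradingcalendar.trading_days
start = pd.Timestamp('2015-01-01', tz='UTC')
end = pd.Timestamp('2015-01-09', tz='UTC')
print(open_days[(open_days >= start) & (open_days <= end)])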

When code is running in a cell ("In [*]:"), is there any way to get intermediate output from that cell? For example, when running a loop, I'd like to output a counter so that I can see the progress, and also know that the computation within the loop hasn't bogged down or stalled.
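
Concretely, I'd like to be able to do something like this inside a cell and watch the status line update as the loop runs (the universe here is a placeholder):

import sys

securities = range(10000)  # placeholder universe
for i, sid in enumerate(securities):
    # ... per-security work would go here ...
    if i % 1000 == 0:
        sys.stdout.write('\rprocessed %d of %d' % (i, len(securities)))
        sys.stdout.flush()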

Also, any progress toward providing a means to view resource availability and usage, particularly memory? Using MATLAB in a Windows environment, for example, one can inspect individual variables in the MATLAB workspace to see how much memory they consume. The Windows Task Manager is also handy for seeing whether system resources are being swamped. Are there any similar tools in the research environment?

I did not see any answer to the line-number display question. Does anyone know if there is a way to show line numbers in research now?