Discretizing a continuous variable to build a dataset for classification

From the papers I have read, it seems that the direction and strength of a price movement matter most, not necessarily the price itself.

My goal is to create a dataset that has features and a predictive (target) variable that has been bucketed, as in a histogram. Once the variable has been bucketed using some method, how do I get the buckets back and attach them to the dataset so that I can proceed with a classification method?
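A minimal sketch of the dataset described above, assuming a series of closing prices, lagged returns as features, and the next period's return as the target. The ±1% thresholds, the labels and the helper name `make_classification_dataset` are hypothetical; `pd.cut` assigns each forward return to a labelled bucket, and the edges could just as well come from Bayesian Blocks or any other binning method.

```python
import numpy as np
import pandas as pd


def make_classification_dataset(prices, edges, labels, n_lags=2):
    """Hypothetical helper: lagged returns as features, next-period return bucketed as the target."""
    prices = pd.Series(prices, dtype=float)
    # Simple one-period returns (same as prices.pct_change() on gap-free data).
    returns = prices / prices.shift(1) - 1
    # Features: the current return and the previous n_lags - 1 returns.
    data = {f"ret_lag{k}": returns.shift(k) for k in range(n_lags)}
    # Target: next period's return, bucketed into right-closed intervals (a, b].
    data["target"] = pd.cut(returns.shift(-1), bins=edges, labels=labels)
    # Drop rows where a lagged feature or the forward return is unavailable.
    return pd.DataFrame(data).dropna()


# Assumed thresholds (not from the thread): moves within +/-1% count as "flat".
edges = [-np.inf, -0.01, 0.01, np.inf]
labels = ["down", "flat", "up"]
prices = [100, 102, 100, 100, 103, 101, 101.5, 104]

ds = make_classification_dataset(prices, edges, labels)
print(ds)
```

`pd.cut` uses right-closed intervals by default, so a return of exactly 0.01 would be labelled "flat" here.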

I could discretize the predictive variable using the Bayesian Blocks algorithm, as described at the links below.

http://www.astroml.org/user_guide/density_estimation.html
or
https://github.com/pkgw/pwkit/blob/master/pwkit/bblocks.py
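For reference, the core of Bayesian Blocks is a short dynamic program over candidate change points. The sketch below is a plain-NumPy version using the "events" fitness function and change-point prior from Scargle et al. (2013); it is not the astroML or pwkit code, and the name `bayesian_blocks_edges` is made up. The point relevant to this thread is that the result is simply an array of bin edges. (Astropy's `astropy.stats.bayesian_blocks` likewise returns the edges.)

```python
import numpy as np


def bayesian_blocks_edges(x, p0=0.05):
    """Sketch of Bayesian Blocks with the 'events' fitness (Scargle et al. 2013).

    Returns the optimal bin edges for the 1-D sample x; p0 is the false-alarm
    probability used to calibrate the penalty on the number of change points.
    """
    values, counts = np.unique(np.asarray(x, dtype=float), return_counts=True)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two distinct values")
    # Candidate edges: the data extremes plus midpoints between neighbouring values.
    edges = np.concatenate([values[:1], 0.5 * (values[1:] + values[:-1]), values[-1:]])
    # Distance from each candidate edge to the right end of the data.
    block_length = values[-1] - edges
    # Penalty per change point (empirical calibration from Scargle et al. 2013).
    ncp_prior = 4 - np.log(73.53 * p0 * n ** -0.478)

    best = np.zeros(n)             # best total fitness for cells 0..k
    last = np.zeros(n, dtype=int)  # start cell of the final block in that optimum
    for k in range(n):
        # Width and event count of a final block spanning cells i..k, for each i <= k.
        width = block_length[: k + 1] - block_length[k + 1]
        count = np.cumsum(counts[: k + 1][::-1])[::-1]
        # Events fitness N * (log N - log T), minus the change-point penalty,
        # plus the best fitness of everything before the final block.
        fit = count * (np.log(count) - np.log(width)) - ncp_prior
        fit[1:] += best[:k]
        last[k] = np.argmax(fit)
        best[k] = fit[last[k]]

    # Walk back through the stored change points to recover the block edges.
    change_points = []
    ind = n
    while ind > 0:
        change_points.append(ind)
        ind = last[ind - 1]
    change_points.append(0)
    return edges[change_points[::-1]]


# Usage on simulated returns drawn from two regimes with different means.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 0.01, 300), rng.normal(0.03, 0.005, 100)])
print(bayesian_blocks_edges(sample))
```

The loop is O(n²) in the number of distinct values, which is fine for a few thousand returns.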

Many thanks in advance for the assistance.


Bayesian Blocks! Hadn't heard of that, cool.

I don't understand your question. How do you "get the buckets back"?

Are you talking about the programming environment, i.e. importing data, running scripts, and so on? My preference is to center things around an IPython notebook (for example). Quantopian's website is centered around its web code editor, but its tools can be used with IPython notebooks, and the beta "research" platform goes in that direction.

Cheers Casey!
That is a very informative notebook.
To clarify my question: as I understand it, the Bayesian Blocks algorithm automatically buckets a continuous variable by finding change points.
I assume this means that, once it has gone through the entire continuous variable, the buckets are stored somewhere as data.
I was wondering how to query these buckets and then attach them as a column to a dataset.
The notebook above says the bins should be plotted, but how do I print out the actual bin categories that make up the histogram?
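One way to answer this, assuming the binning routine hands back an array of bin edges (astropy's `bayesian_blocks` does, and matplotlib's `plt.hist` returns `(counts, bin_edges, patches)`): the "buckets" are just consecutive pairs of edges, so they can be printed directly, and each observation can be mapped to its bucket with `np.digitize` and attached as a column. The edges and returns below are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical bin edges, e.g. the array returned by a Bayesian Blocks routine.
edges = np.array([-0.05, -0.01, 0.01, 0.05])
returns = pd.Series([0.02, -0.03, 0.0, 0.04, -0.005, 0.015], name="ret")

# Human-readable bucket labels: one interval per consecutive pair of edges.
labels = [f"[{lo:+.3f}, {hi:+.3f})" for lo, hi in zip(edges[:-1], edges[1:])]

# Bucket index per observation (0 .. len(edges) - 2); comparing against the
# interior edges only means out-of-range values land in the end buckets.
bucket_id = np.digitize(returns.to_numpy(), edges[1:-1])

# Attach the buckets to the dataset as new columns.
df = returns.to_frame()
df["bucket_id"] = bucket_id
df["bucket"] = [labels[i] for i in bucket_id]

# Counts per bucket: the heights of the histogram bars.
counts = np.bincount(bucket_id, minlength=len(edges) - 1)
for label, n in zip(labels, counts):
    print(label, n)
print(df)
```

Unlike `np.histogram`, which silently drops values outside `[edges[0], edges[-1]]`, this approach keeps every row, which is usually what you want when the bucket becomes a classification target.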

Many thanks
Regards,
Andrew.