Pandas dataframes have some simple yet powerful plotting capabilities built on matplotlib. The key is to get the data into the format which the plot method expects. The plot method defaults to a line graph and expects a single index (which it will use as the x-axis) and columns of data. Each column is treated as a separate data set. By default all of these are plotted on a single plot.
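To make the expected shape concrete, here is a minimal sketch using made-up data. The column names AAA and BBB and all the numbers are hypothetical, chosen only to show a single date index with one column per data set.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data: one date index, one column per data set.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
prices = pd.DataFrame(
    {
        "AAA": [10.0, 10.5, 10.2, 10.8, 11.0],
        "BBB": [20.0, 19.5, 19.8, 20.4, 20.1],
    },
    index=dates,
)

# Default behaviour: a line graph with the index on the x-axis
# and one line per column, all drawn on the same axes.
ax = prices.plot()
line_count = len(ax.get_lines())
```

With two columns, the single set of axes ends up holding two lines.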
So, if one wants to plot data (e.g. volume) for a single security, select that security from the dataframe and then apply the plot method. One easy way to select a single security is to use the xs method, which returns a cross-section (a 'slice') for a single value of one index level. Here is an example of how to plot just the volume data for the security INTC. Remember that level 0 of the multi-index holds the dates and level 1 holds the securities.
df.xs(symbols('INTC'), level=1).volume.plot()
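For anyone who wants to try this outside the research environment, here is a rough stand-alone sketch. It assumes a small hand-built dataframe with plain ticker strings in level 1 instead of the security objects returned by symbols, and the volume numbers are invented.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Hypothetical stand-in for pipeline output:
# level 0 = dates, level 1 = securities (plain strings here).
dates = pd.date_range("2020-01-01", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["INTC", "MSFT"]], names=["date", "security"]
)
df = pd.DataFrame({"volume": [100, 200, 110, 210, 120, 220]}, index=index)

# Select every row whose level 1 value is INTC. The security level
# is dropped, leaving a frame indexed by date only.
intc = df.xs("INTC", level=1)

# Plot the single volume column against the dates.
ax = intc.volume.plot()
```

After the xs call only the date index remains, which is exactly the single index the plot method wants for its x-axis.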
That may be all one wants: a single plot of a single security. However, the plot method can do some pretty fancy things. More than likely one wants plots of all the securities in the dataframe. No problem. The plot method can plot individual columns of data on individual graphs, all at once. The key is to get the data into columns. How does one turn the securities in our multi-index dataframe into columns so the plot method can plot them? A second very handy method (second only to xs) is unstack. This takes the 'stacked' multi-index and turns the values of one level (i.e. the securities) into columns. The resulting dataframe will be shorter and wider, since what were rows are now columns. Because it's shorter, it's referred to as 'unstacked'. Here is an example of how to plot all the volume data, for all securities, in separate graphs with a single line of code.
df.volume.unstack(level=1).plot(subplots=True, figsize=(15, 60), layout=(10, 2));
The unstack method turns each security into its own column and removes the level 1 index, leaving a single index (level 0) of dates. The 'subplots' parameter tells the plot method to create a separate plot for each column, i.e. each security. The 'figsize' parameter sets the size of the whole figure as (width, height) in inches, and 'layout' sets the grid of subplots as (rows, columns), so here 10 tall and 2 wide. Finally, notice the semicolon at the end of the line. That just 'pretties up' the notebook output by suppressing the text representation of the returned axes, which would otherwise appear before the plots.
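Here is a small self-contained sketch of the same unstack-then-plot pattern. It assumes an invented two-security dataframe with plain ticker strings, and it uses a much smaller layout and figsize than the line above so it fits the toy data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

# Hypothetical stand-in for pipeline output:
# level 0 = dates, level 1 = securities (plain strings here).
dates = pd.date_range("2020-01-01", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["INTC", "MSFT"]], names=["date", "security"]
)
df = pd.DataFrame({"volume": [100, 200, 110, 210, 120, 220]}, index=index)

# Move the security level out of the index and into the columns:
# 6 stacked rows become 3 rows (dates) by 2 columns (securities).
wide = df.volume.unstack(level=1)

# One subplot per column, arranged as 2 rows by 1 column.
# figsize is the size of the whole figure in inches.
axes = wide.plot(subplots=True, layout=(2, 1), figsize=(8, 6))
n_axes = len(axes.ravel())
```

The six stacked rows become a 3 by 2 frame, and the plot call returns one set of axes per security.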
Take a look at the docs for many more features of pandas plotting: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
Those are some basics of plotting: enough to get started, and often enough to do some serious visualization. The key is to select the desired data (a good way is to use the xs method) and put separate data sets into separate columns using the unstack method.
One comment on the data. The data being plotted is the output from a pipeline. Pipeline may not be the best way to get data for plotting. The issue is that pipeline returns data which is 'split and dividend adjusted as of each day'. In other words, in the case of open, close, and volume data, these are the values that would actually have been seen on each day. They won't be 'normalized' or adjusted as of a single day, so stock splits will appear as big swings on the graph.
What one probably wants is data which is 'split and dividend adjusted as of a single ending day'. How to do this? Use the get_pricing function rather than pipeline. It returns a similar dataframe which can be plotted in a similar fashion. Either can be used, but they suit different use cases.
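To see why the adjustment date matters, here is a hand-rolled illustration with made-up closing prices and a hypothetical 2-for-1 split. It is not how get_pricing or pipeline work internally. It only mimics the difference between prices 'as seen on each day' and prices adjusted as of the final day.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import pandas as pd

dates = pd.date_range("2020-01-01", periods=6, freq="D")

# Hypothetical closes as seen on each day; an assumed 2-for-1 split
# takes effect on the fourth day, so the price halves overnight.
as_seen = pd.Series([100.0, 102.0, 104.0, 52.0, 53.0, 54.0], index=dates)
split_date = dates[3]
split_ratio = 2.0

# Adjust as of the last day: scale every pre-split price by the ratio
# so the whole series is expressed in post-split terms.
adjusted = as_seen.copy()
before_split = adjusted.index < split_date
adjusted.loc[before_split] = adjusted.loc[before_split] / split_ratio

# The unadjusted series shows a large one-day drop; the adjusted one does not.
largest_raw_move = as_seen.pct_change().abs().max()
largest_adj_move = adjusted.pct_change().abs().max()

# Plot both side by side, one column per series.
ax = pd.DataFrame({"as seen": as_seen, "adjusted": adjusted}).plot()
```

In this toy series the unadjusted line drops by half on the split day, while the adjusted line moves smoothly through it.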
See the attached notebook for step by step examples of all these methods.