First, I want to verify that you are using the EventVestor EarningsCalendar
dataset. This seems to be the case.
So, there are four typical fields used in this dataset:
next_asof_date - datetime64[ns]
previous_asof_date - datetime64[ns]
next_announcement - datetime64[ns]
previous_announcement - datetime64[ns]
There are really just two pieces of data: previous_announcement and next_announcement. Both are dates, and each has an associated asof_date. So, for example, every time a new 'earnings announcement' date is posted, it gets a new associated 'asof_date', which is the date it was posted. Below are the four fields for AAPL during 2018.
PIPELINE DATE next_ann next_asof_date prev_ann prev_asof_date
2018-01-02 00:00:00+00:00 NaT NaT 2017-11-02 2017-10-04
2018-01-04 00:00:00+00:00 2018-02-01 2018-01-03 2017-11-02 2017-10-04
2018-02-01 00:00:00+00:00 2018-02-01 2018-01-03 2018-02-01 2018-01-03
2018-02-02 00:00:00+00:00 NaT NaT 2018-02-01 2018-01-03
2018-04-04 00:00:00+00:00 2018-05-01 2018-04-03 2018-02-01 2018-01-03
2018-05-01 00:00:00+00:00 2018-05-01 2018-04-03 2018-05-01 2018-04-03
2018-05-02 00:00:00+00:00 NaT NaT 2018-05-01 2018-04-03
2018-07-05 00:00:00+00:00 2018-07-31 2018-07-03 2018-05-01 2018-04-03
2018-07-31 00:00:00+00:00 2018-07-31 2018-07-03 2018-07-31 2018-07-03
2018-08-01 00:00:00+00:00 NaT NaT 2018-07-31 2018-07-03
2018-10-04 00:00:00+00:00 2018-11-01 2018-10-03 2018-07-31 2018-07-03
2018-11-01 00:00:00+00:00 2018-11-01 2018-10-03 2018-11-01 2018-10-03
2018-11-02 00:00:00+00:00 NaT NaT 2018-11-01 2018-10-03
2018-11-06 00:00:00+00:00 2019-01-31 2018-11-02 2018-11-01 2018-10-03
Since you are seeing the five dates 1/3/2018, 4/3/2018, 7/3/2018, 10/3/2018, and 11/02/2018, you must be looking at the next_asof_date field. This is probably not what you want. I would think you want the actual announcement dates (not the dates on which the company said when it would make the announcement). In any case, the issue and the fix would be the same.
One issue is that a company can, several times a year, say it is going to release earnings on one date, then change its mind and release them on a different date. This will result in more than four dates a year. Additionally, there will typically be five or more next_announcement dates posted in a single year: four for the current year and then one for the following year. That's the situation with AAPL. On 11/02/2018 they stated they will announce their next earnings on 01/31/2019. That added a fifth post for an earnings announcement to 2018.
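To see the count of five directly, here is a small sketch that rebuilds a subset of the AAPL rows from the table above as a plain pandas DataFrame (the frame and column layout are just for illustration, not the actual pipeline output) and counts the distinct dates that fall in 2018:

```python
import pandas as pd

# A subset of the AAPL rows from the table above:
# (pipeline date, next_announcement, next_asof_date). None becomes NaT.
rows = [
    ("2018-01-02", None, None),
    ("2018-01-04", "2018-02-01", "2018-01-03"),
    ("2018-02-02", None, None),
    ("2018-04-04", "2018-05-01", "2018-04-03"),
    ("2018-05-02", None, None),
    ("2018-07-05", "2018-07-31", "2018-07-03"),
    ("2018-08-01", None, None),
    ("2018-10-04", "2018-11-01", "2018-10-03"),
    ("2018-11-02", None, None),
    ("2018-11-06", "2019-01-31", "2018-11-02"),
]
aapl = pd.DataFrame(rows, columns=["pipeline_date", "next_announcement", "next_asof_date"])
for col in aapl.columns:
    aapl[col] = pd.to_datetime(aapl[col])

# Distinct dates in 2018 on which a "next announcement" was posted.
posted_2018 = aapl.loc[aapl["next_asof_date"].dt.year == 2018, "next_asof_date"].drop_duplicates()

# Distinct announcement dates that actually fall in 2018.
announced_2018 = aapl.loc[aapl["next_announcement"].dt.year == 2018, "next_announcement"].drop_duplicates()

print(len(posted_2018), len(announced_2018))
```

The fifth posting date (11/02/2018) belongs to the 01/31/2019 announcement, which is why counting next_asof_date values gives one more than counting the announcements themselves.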
So, what to do? If one is looking for the most recent 4 earnings dates then something like this:
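# Copy the security level (position 1 of the pipeline MultiIndex) into a column
pipe_output['stock'] = pipe_output.index.get_level_values(1)
# Keep only the last row for each (stock, previous_announcement) pair
last_announcements = pipe_output.drop_duplicates(['stock','previous_announcement'], keep='last')
# Take the four most recent announcement dates per stock
last_4_announcements = last_announcements.groupby('stock').previous_announcement.nlargest(4)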
What this does is first add a new column to the data frame which is a duplicate of the 'security' index. This just makes it easier to use the drop_duplicates method. Then apply the drop_duplicates method to keep just the last row for each combination of security and previous_announcement date. Finally, group by security and take the largest 4 previous_announcement dates. Those will be the last four dates on which each company actually announced earnings.
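If you want to try those steps outside of a pipeline, here is a minimal sketch on a toy frame. It assumes the output is indexed by (date, security) with the security at level position 1, and it uses the string "AAPL" as a stand-in for the real security object. The helper name last_n_announcements is made up for this example; the dates come from the AAPL table above.

```python
import pandas as pd

def last_n_announcements(pipe_output, n=4):
    """Return the n most recent distinct previous_announcement dates per security.

    Hypothetical helper; assumes the security sits at index level position 1.
    """
    df = pipe_output.copy()
    df["stock"] = df.index.get_level_values(1)
    # Ignore rows with no previous announcement yet.
    df = df.dropna(subset=["previous_announcement"])
    last = df.drop_duplicates(["stock", "previous_announcement"], keep="last")
    return last.groupby("stock")["previous_announcement"].nlargest(n)

# (pipeline date, previous_announcement) pairs taken from the AAPL table.
pairs = [
    ("2018-01-02", "2017-11-02"),
    ("2018-01-04", "2017-11-02"),
    ("2018-02-01", "2018-02-01"),
    ("2018-02-02", "2018-02-01"),
    ("2018-05-01", "2018-05-01"),
    ("2018-07-31", "2018-07-31"),
    ("2018-11-01", "2018-11-01"),
    ("2018-11-06", "2018-11-01"),
]
index = pd.MultiIndex.from_arrays(
    [pd.to_datetime([p[0] for p in pairs]), ["AAPL"] * len(pairs)]
)
toy_output = pd.DataFrame(
    {"previous_announcement": pd.to_datetime([p[1] for p in pairs])}, index=index
)

result = last_n_announcements(toy_output)
print(result)
```

There are five distinct previous_announcement dates in the toy data, so the oldest one (2017-11-02) is the one dropped by nlargest(4).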
Hope that helps? See attached notebook.