Pandas Help

posted Jun 1, 2016

I am having some issues working with some data in a pandas DataFrame. My DataFrame has three columns: X, Y, and Z. I am trying to determine the number of instances where Y > 0 after a row where X > 0, while Z in the subsequent rows stays below a certain value. My pseudocode is below:

import datetime  
import time  
import pandas as pd

# Set number of rows to skip  
rows_to_skip = 0  
# Rows to use after skipped rows  
rows_to_use = 10000

# Read the file (adjust arguments accordingly)
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv('example.csv', skiprows=rows_to_skip, on_bad_lines='skip', nrows=rows_to_use, low_memory=False)

# Add headers when skipping rows  
data.columns = ["X","Y","Z"]

# For every row where X > 0, count the subsequent rows where:
#   condition 1) Y > 0
#   condition 2) Z <= (Z of the row where X > 0) + 0.5
# and store that count in a new column, in the same row where X > 0.
threshold = 0.5
data['count'] = float('nan')  # rows where X is not > 0 stay NaN
count_col = data.columns.get_loc('count')

for i in range(len(data)):
    if data['X'].iat[i] > 0:
        later = data.iloc[i + 1:]  # all subsequent rows
        hits = (later['Y'] > 0) & (later['Z'] <= data['Z'].iat[i] + threshold)
        data.iloc[i, count_col] = hits.sum()

Any suggestions from pandas pros out there would be awesome. I'll try to follow up with a notebook as well.

3 responses

toan tran

Jun 2, 2016

Frank, I don't understand condition 2, but make an example df and take it from there:

df = pd.DataFrame([[1,-1,2], [1, 1, 1], [2, 10, 20]])
df.columns = ['x', 'y', 'z']

Deleted User

Jun 2, 2016

Toan,

Good idea.

df = pd.DataFrame([[0,1,0,1,0,0,0,0,1],
                   [1,0,1,0,1,1,1,0,0],
                   [0,1,2,3,4,5,6,7,8]]).T  # each inner list is one column, hence the transpose

df.columns = ['x','y','z']

Find the first row where x == 1, then:
    Check the next x value. If it is not equal to 1, check whether the corresponding y value is equal to 1; if so, check whether the corresponding z value is greater than the z value from the row where x was last equal to 1. If all these conditions are met, increment the count and move on to the next y/z values to test the same conditions. Repeat this until the z value is a specific amount above the z value from the row where x was equal to 1.
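The steps above can be sketched as a plain Python loop over the toy DataFrame. This is a minimal illustration under stated assumptions, not Frank's code: `LIMIT = 3` is a hypothetical stand-in for the unspecified "specific level", `count_after_trades` is a hypothetical helper name, and rows where x == 1 are skipped rather than ending the scan, which is only one reading of "if it is not equal to 1". Because z increases row by row, the inner loop can stop as soon as z passes the limit.

```python
import pandas as pd

# Toy data from the thread: each inner list is one column, hence .T
df = pd.DataFrame([[0, 1, 0, 1, 0, 0, 0, 0, 1],
                   [1, 0, 1, 0, 1, 1, 1, 0, 0],
                   [0, 1, 2, 3, 4, 5, 6, 7, 8]]).T
df.columns = ['x', 'y', 'z']

LIMIT = 3  # hypothetical stand-in for the "specific level" above the trade's z


def count_after_trades(frame, limit):
    """For each x == 1 row, count later rows with x != 1, y == 1 and
    z_trade < z <= z_trade + limit. Rows that are not trades get NaN."""
    x = frame['x'].to_numpy()
    y = frame['y'].to_numpy()
    z = frame['z'].to_numpy()
    counts = pd.Series(float('nan'), index=frame.index)
    for i in range(len(frame)):
        if x[i] != 1:
            continue
        n = 0
        for j in range(i + 1, len(frame)):
            if z[j] > z[i] + limit:
                break  # assumes z only increases, so no later row can qualify
            if x[j] != 1 and y[j] == 1 and z[j] > z[i]:
                n += 1
        counts.iat[i] = n
    return counts


df['count'] = count_after_trades(df, LIMIT)
print(df)
```

On this toy data the `count` column comes out as 2 on row 1, 3 on row 3, 0 on row 8 (no rows follow it), and NaN on the non-trade rows.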

It might help to explain my application better. I am working with microsecond-level data from the CME for the ES. I am trying to measure the number of changes to the order book within a specified time frame after each trade. For instance, if a trade occurs at microsecond 112354, I want to count the number of changes to the order book within the next 10 microseconds.

In my example, x > 0 would be the occurrence of a trade (the data is NaN for non-trade events), "y" would be the changes to the order book (NaN if no change), and "z" would be the microsecond timestamp. So by counting the rows after each x > 0 row where y > 0 and z is less than or equal to the trade's z plus ~10 microseconds, I can count the number of changes to the order book within ~10 microseconds of each trade. Eventually I would like to plot this data to get a visual of possible increased algorithmic activity, or maybe even spot some spoofing.
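With microsecond data and many rows, a per-row loop gets slow. A minimal vectorized sketch, assuming `z` is sorted ascending with no missing timestamps and a window of exactly 10 microseconds: a running count of book changes (`y > 0`) combined with `numpy.searchsorted` on the timestamps gives, for each trade, the number of changes in the rows after it up to `z + window`. The function name, the `window` parameter, and the sample `book` data are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd


def count_changes_after_trades(df, window=10):
    """Sketch: for each trade row (x > 0), count the later rows with y > 0
    whose timestamp z is at most `window` after the trade's timestamp.
    Assumes z is sorted ascending; NaN in x or y means no trade / no change."""
    z = df['z'].to_numpy()
    is_change = (df['y'] > 0).to_numpy().astype(np.int64)  # NaN > 0 is False
    # prefix[k] = number of book changes in rows 0 .. k-1
    prefix = np.concatenate(([0], np.cumsum(is_change)))

    trade_pos = np.flatnonzero((df['x'] > 0).to_numpy())
    # first position whose timestamp lies beyond z_trade + window
    end = np.searchsorted(z, z[trade_pos] + window, side='right')
    # changes in rows trade_pos+1 .. end-1
    counts = prefix[end] - prefix[trade_pos + 1]

    out = pd.Series(np.nan, index=df.index)  # non-trade rows stay NaN
    out.iloc[trade_pos] = counts
    return out


nan = np.nan
book = pd.DataFrame({
    'x': [nan, 5, nan, nan, nan, 2, nan, nan],
    'y': [1, nan, 1, nan, 3, nan, 1, 1],
    'z': [112350, 112354, 112356, 112360, 112364, 112365, 112370, 112380],
})
book['changes_10us'] = count_changes_after_trades(book, window=10)
print(book)
```

Here the trade at 112354 picks up the changes at 112356 and 112364 (count 2), and the trade at 112365 picks up only the change at 112370 (count 1). Using `side='right'` makes the window inclusive of `z + window`, matching "z <= trade z + ~10 microseconds"; `side='left'` would give a strict bound instead.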

Thanks

toan tran

Jun 2, 2016

Frank, if I understand you correctly, this should work.

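import pandas as pd

df = pd.DataFrame([[0,1,0,1,0,0,0,0,1], [1,0,1,0,1,1,1,0,0],[0,1,2,3,4,5,6,7,8]]).T
df.columns = ['x', 'y', 'z']
X = df['x'].diff()  # -1 where x drops from 1 to 0, i.e. the row right after a trade
Y = df['y']
Z = df['z'].diff()  # change in z from the previous row
delta = Z[(X == -1) & (Y == 1)]  # z changes on those rows where y == 1
(delta > 0).sum()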
