How to Put Two Different Flags Based on Two Thresholds When Column Value Changes in a Pandas DataFrame
Introduction
This article is about finding the time difference between two consecutive work times for each distinct order in a pandas DataFrame and applying different flags based on certain thresholds.
In this example, we have a DataFrame with three columns: vehicle, order, and work. The work column contains values that represent the start and finish dates of specific tasks. We want to calculate the time difference between these two consecutive work times for each distinct order and apply different flags based on certain thresholds.
Problem Description
The problem is as follows:
- Given a DataFrame with three columns:
vehicle,order, andwork. - The
workcolumn contains values that represent the start and finish dates of specific tasks. - Calculate the time difference between these two consecutive work times for each distinct order.
- Apply different flags based on certain thresholds.
Step 1: Split the Work ID into Two Parts
To solve this problem, we first need to split the work id into two parts: work_prefix and work_suffix. This is because we want to apply different flags based on whether the work prefix is the same or different for each group of consecutive orders.
df[['work_prefix', 'work_suffix']] = df['work'].str.split('_', expand=True)
Step 2: Define Boolean Masks
Next, we define boolean masks corresponding to the conditions. These masks are set considering the group boundary of the same order using .groupby().
threshold_sameprefix = -1.0 # given threshold value
threshold_diffprefix = 0.8 # given threshold value
w_ne = df['work'] != df.groupby('order')['work'].shift() # work id changed
wp_eq = df['work_prefix'] == df.groupby('order')['work_prefix'].shift() # same work prefix
wp_ne = df['work_prefix'] != df.groupby('order')['work_prefix'].shift() # different work prefix
m1 = w_ne & wp_eq & (df['flag'] > threshold_sameprefix) # condition for 'F1'
m2 = w_ne & wp_ne & (df['flag'] > threshold_diffprefix) # condition for 'F2'
Step 3: Apply Flags Using Boolean Masks
Finally, we use .loc with the boolean masks to set up flag1 with values 'F1' and 'F2', as follows:
df['flag1'] = ' ' # init flag1 to blank
df.loc[m1, 'flag1'] = 'F1'
df.loc[m2, 'flag1'] = 'F2'
Bonus Codes: More Efficient Way to Calculate flag Column
To set up your first column flag more efficiently instead of using looping, you can use the following code:
data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
df = pd.DataFrame(data, columns=['vehicle', 'order', 'work', 'Start', 'Finish'])
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
And the main codes to replace your codes with looping are:
df['flag'] = ((df['Start'] - df.groupby('order')['Finish'].shift()) / pd.Timedelta(days=1)).fillna(0)
Result
The result will be a DataFrame with an additional column flag1 that contains different flags based on certain thresholds.
vehicle order work Start Finish flag flag1
0 abc abc1 1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 0.000000 0.000000
1 abc abc1 1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 -1.166666 F1
2 abc abc2 1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 0.000000 F1
3 abc abc2 1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 -0.624999 F1
4 abc abc3 1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 0.000000 F1
5 abc abc3 1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 -2.020833 F1
6 abc abc3 2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 1.000000 F2
7 abc abc3 2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 -0.052082 F1
Last modified on 2024-04-27