How to Apply Different Flags Based on Thresholds When Column Value Changes in a Pandas DataFrame with Time Calculations

How to Put Two Different Flags Based on Two Thresholds When Column Value Changes in a Pandas DataFrame

Introduction

This article is about finding the time difference between two consecutive work times for each distinct order in a pandas DataFrame and applying different flags based on certain thresholds.

In this example, we have a DataFrame with three columns: vehicle, order, and work. The work column contains values that represent the start and finish dates of specific tasks. We want to calculate the time difference between these two consecutive work times for each distinct order and apply different flags based on certain thresholds.

Problem Description

The problem is as follows:

  • Given a DataFrame with three columns: vehicle, order, and work.
  • The work column contains values that represent the start and finish dates of specific tasks.
  • Calculate the time difference between these two consecutive work times for each distinct order.
  • Apply different flags based on certain thresholds.

Step 1: Split the Work ID into Two Parts

To solve this problem, we first need to split the work id into two parts: work_prefix and work_suffix. This is because we want to apply different flags based on whether the work prefix is the same or different for each group of consecutive orders.

df[['work_prefix', 'work_suffix']] = df['work'].str.split('_', expand=True)

Step 2: Define Boolean Masks

Next, we define boolean masks corresponding to the conditions. These masks are set considering the group boundary of the same order using .groupby().

threshold_sameprefix = -1.0       # given threshold value
threshold_diffprefix = 0.8        # given threshold value

w_ne = df['work'] != df.groupby('order')['work'].shift()          # work id changed
wp_eq = df['work_prefix'] == df.groupby('order')['work_prefix'].shift()   # same work prefix
wp_ne = df['work_prefix'] != df.groupby('order')['work_prefix'].shift()   # different work prefix

m1 = w_ne & wp_eq & (df['flag'] > threshold_sameprefix)       # condition for 'F1'
m2 = w_ne & wp_ne & (df['flag'] > threshold_diffprefix)       # condition for 'F2'

Step 3: Apply Flags Using Boolean Masks

Finally, we use .loc with the boolean masks to set up flag1 with values 'F1' and 'F2', as follows:

df['flag1'] = ' '               # init flag1 to blank
df.loc[m1, 'flag1'] = 'F1'
df.loc[m2, 'flag1'] = 'F2'

Bonus Codes: More Efficient Way to Calculate flag Column

To set up your first column flag more efficiently instead of using looping, you can use the following code:

data = [['abc', 'abc1', '1_1',    '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
        ['abc',  'abc1',  '1_2',  '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
        ['abc',  'abc2',  '1_1',  '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
        ['abc',  'abc2',  '1_2',  '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
        ['abc',  'abc3',  '1_1',  '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
        ['abc',  'abc3',  '1_2',  '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
        ['abc',  'abc3',  '2_1',  '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
        ['abc',  'abc3',  '2_2',  '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
df = pd.DataFrame(data, columns=['vehicle', 'order', 'work', 'Start', 'Finish'])

df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])

And the main codes to replace your codes with looping are:

df['flag'] = ((df['Start'] - df.groupby('order')['Finish'].shift()) / pd.Timedelta(days=1)).fillna(0)

Result

The result will be a DataFrame with an additional column flag1 that contains different flags based on certain thresholds.

  vehicle order work                      Start                     Finish      flag     flag1
0     abc  abc1  1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000  0.000000  0.000000
1     abc  abc1  1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 -1.166666  F1
2     abc  abc2  1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001  0.000000  F1
3     abc  abc2  1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 -0.624999   F1
4     abc  abc3  1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000  0.000000  F1
5     abc  abc3  1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 -2.020833   F1
6     abc  abc3  2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999  1.000000   F2
7     abc  abc3  2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 -0.052082   F1

Last modified on 2024-04-27