Treating Timeseries Data without Weekends: A Matplotlib and Pandas Approach


The question posed by the original poster is a common one in data analysis, particularly when working with business-day time series. The goal is to create a plot where dates are treated as categories rather than continuous values, so that weekends take up no space and a steadily increasing series plots as a straight line instead of a wavy one. This can be achieved with the Matplotlib and Pandas libraries.

Introduction

Pandas is a powerful library for data manipulation and analysis, including time series data. However, when a series with a date index is plotted against datetime x-values, each point is placed at its calendar position, so the weekends missing from business-day data still take up space on the x-axis. In this article, we will explore how to convert the dataframe's dates into categories so that each observation occupies one evenly spaced slot, which lets us draw straight lines in our plots.

Setup

To replicate the issue described by the original poster, let’s first set up a simplified example:

import pandas as pd
import matplotlib.pyplot as plt

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1, LEN_SER + 1), index=dates)
ts = df.iloc[:, 0]

In this example, we create a Pandas dataframe df holding the values 1 to LEN_SER (23), indexed by 23 business days (freq='B', meaning Monday to Friday) that start on Friday 2015-07-03 and end on Tuesday 2015-08-04. We then extract its single column as a Series with ts = df.iloc[:, 0].
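The 'B' alias is easy to misread as "every second day". A quick check, sketched here by rebuilding the same index, shows that it yields consecutive weekdays with Saturdays and Sundays skipped:

```python
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')

# 'B' = business days: every weekday is kept, weekends are dropped
print(dates[0].day_name(), dates[0].date())    # Friday 2015-07-03
print(dates[-1].day_name(), dates[-1].date())  # Tuesday 2015-08-04
# Weekday numbers present in the index (Monday=0 ... Sunday=6)
print(sorted(dates.dayofweek.unique().tolist()))  # [0, 1, 2, 3, 4]
```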

The Issue

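When we plot this series on the same axes as a straight reference line drawn with Matplotlib, the series comes out wavy rather than straight. This is because the x-axis is a continuous calendar axis: the value rises by 1 per business day, so within a week each segment spans one day, while every Friday-to-Monday segment spans three days and is correspondingly flatter.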

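fig, ax1 = plt.subplots()
# Straight reference line between the 6th and 21st business days,
# using the real dates as x coordinates
ax1.plot([ts.index[5], ts.index[20]],
         [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
ts.plot(ax=ax1)  # the series itself, on the same calendar-date axis
plt.show()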

In this code snippet, we draw a straight reference line from ts.index[5] (value 6) to ts.index[20] (value 6 + 15 = 21, which is exactly ts.iloc[20]), so the line passes through two points of the series. When ts itself is added to the same axes, it is drawn against the same calendar dates, and the weekend gaps bend the series away from the straight line, producing the wavy shape.
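To see where the waviness comes from, the following sketch rebuilds the same series and measures the calendar distance between neighboring points and the resulting slope of each segment on a date axis (`gap_days` and `slope` are illustrative names, not part of the original code). Within a week each segment rises 1 per day; across each of the five weekends in this range it rises only 1 over 3 days.

```python
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
ts = pd.Series(range(1, LEN_SER + 1), index=dates)

# Calendar days between consecutive observations:
# 1 inside a week, 3 from a Friday to the following Monday
gap_days = ts.index.to_series().diff().dt.days

# Rise per calendar day for each segment, i.e. its slope on a date axis
slope = ts.diff() / gap_days

print(gap_days.value_counts())
print(slope.round(3).value_counts())
```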

Solving the Issue

To solve this issue, we need to treat the dates on the x-axis as categories rather than continuous values. We can do this by replacing the dates in the index with strings: each date becomes a plain label, and since weekend dates were never in the index, no weekend slots appear on the axis.

df.index = df.index.strftime('%Y-%m-%d')  # dates -> categories (string)
ts = df.iloc[:, 0]

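In this modified code snippet, strftime converts each date to its 'YYYY-MM-DD' string representation, and ts is again the dataframe's first column, now a Series indexed by strings. When pandas plots a Series with a string index, it places the points at integer positions 0, 1, 2, and so on, and uses the strings only as tick labels, so consecutive business days are evenly spaced.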
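One way to confirm where pandas puts the points is to inspect the plotted line's x data. The sketch below assumes a headless run (hence the 'Agg' backend) and reflects how current pandas versions plot a non-numeric index; reading `get_xdata()` is only a diagnostic, not part of the original solution.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1, LEN_SER + 1), index=dates)
df.index = df.index.strftime('%Y-%m-%d')  # dates -> string labels
ts = df.iloc[:, 0]

fig, ax = plt.subplots()
ts.plot(ax=ax)
# x coordinates pandas actually used for the series
xdata = np.asarray(ax.get_lines()[0].get_xdata())
plt.close(fig)

print(xdata[:5])                 # integer positions, not dates
print(ts.index[:3].tolist())     # the strings only serve as labels
```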

Plotting at Integer Positions

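Once the index holds strings, the x-axis is measured in integer positions rather than dates, so any extra line on the same axes must also use integer x coordinates. The same idea also works as an alternative to string conversion: plot against integer positions directly and supply the date labels yourself.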

fig, ax1 = plt.subplots()
ax1.plot([5, 20], [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')  # x = category positions 5 and 20
ts.plot(ax=ax1)
plt.show()

In this code snippet, the reference line runs from position 5 to position 20, the same positions pandas assigns to the 6th and 21st business days. Since the series also advances one position per business day, the series and the reference line now coincide as straight lines, and the weekends no longer occupy any space on the x-axis.
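The alternative mentioned above, plotting against integer positions without converting the index, might look like the following minimal sketch. It is not taken from the original answer: `pos` and `tick_pos` are illustrative names, and labeling every 5th business day is an arbitrary choice to keep the ticks readable.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
ts = pd.Series(range(1, LEN_SER + 1), index=dates)  # index stays a DatetimeIndex

pos = np.arange(len(ts))  # one x unit per observation, so weekends take no space

fig, ax = plt.subplots()
ax.plot(pos, ts.to_numpy())                          # the series itself
ax.plot([5, 20], [ts.iloc[5], ts.iloc[20]], 'o-')    # reference line on the same positions

tick_pos = pos[::5]  # label every 5th business day (arbitrary choice)
ax.set_xticks(tick_pos)
ax.set_xticklabels(ts.index[tick_pos].strftime('%Y-%m-%d').tolist(), rotation=45)
fig.tight_layout()
```

The dates survive only as tick labels here, so Matplotlib never sees calendar time and the series plots as a straight line.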

Conclusion

By treating dates as categories rather than continuous values, we can draw business-day time series as straight lines with Matplotlib and Pandas. This can be done either by converting the date index to strings, which makes pandas place the points at integer positions automatically, or by plotting against integer positions directly and labeling the ticks with the dates. In both cases, any additional lines on the same axes must use the same integer x coordinates.


Last modified on 2023-05-15