Selecting First n Columns and Last n Columns with Pandas
==============================================
Pandas is a powerful library used for data manipulation and analysis in Python. It provides an efficient way to handle structured data, including tabular data such as spreadsheets and SQL tables. In this article, we will explore how to select the first n columns and last n columns from a pandas DataFrame.
Introduction
When working with DataFrames, it is often necessary to extract a subset of columns based on their position within the table. The iloc indexer handles this by letting you specify row and column positions as integers.
Understanding iloc
The iloc indexer selects rows and columns purely by integer position; it also accepts a boolean array, but not labels. It is used with square brackets and follows the same conventions as list slicing: positions are 0-indexed, the end of a slice is excluded, and negative positions count from the end.
Example
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
print(df)
Output:
| A | B | C |
|---|---|---|
| 1 | 4 | 7 |
| 2 | 5 | 8 |
| 3 | 6 | 9 |
The following examples use iloc to select the first two columns, and then the last two columns, by position.
Example
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
print(df.iloc[:, [0, 1]]) # First two columns
Output:
| A | B |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
print(df.iloc[:, -2:]) # Last two columns
Output:
| B | C |
|---|---|
| 4 | 7 |
| 5 | 8 |
| 6 | 9 |
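The two examples above use a list of positions in one case and a slice in the other. A minimal sketch of one practical difference, assuming the same three-column frame (the variable names `wide_slice` and `out_of_bounds_raised` are illustrative only): a slice that runs past the last column is clipped, while a list must name positions that exist.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# A slice that runs past the last column is clipped, as with list slicing.
wide_slice = df.iloc[:, :10]
print(list(wide_slice.columns))

# An explicit list of positions must stay in bounds; position 5 does not exist here.
try:
    df.iloc[:, [0, 5]]
    out_of_bounds_raised = False
except IndexError:
    out_of_bounds_raised = True
print(out_of_bounds_raised)
```

For "first n" or "last n" selections, the slice form is therefore usually the more forgiving choice when n may exceed the number of columns.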
Selecting First n Columns and Last n Columns with Pandas
Now that we have covered the basics of iloc, let’s look at selecting the first n columns and the last n columns for an arbitrary n. We will explore two methods: slicing with iloc, and boolean indexing with loc.
Method 1: Using iloc
We can select the first n columns with the slice :n, which covers positions 0 through n-1. Similarly, the slice -n: selects the last n columns, from position df.shape[1] - n to the end of the DataFrame.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
print(df.iloc[:, :2]) # First two columns
Output:
| A | B |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
print(df.iloc[:, -2:]) # Last two columns
Output:
| B | C |
|---|---|
| 4 | 7 |
| 5 | 8 |
| 6 | 9 |
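The same slices can be wrapped for a variable n. The sketch below is one possible way to do it; `first_n_columns` and `last_n_columns` are hypothetical helper names, not part of pandas, and n is assumed to be a non-negative integer. The start position for the last n columns is computed explicitly because `-0` equals `0`, so `df.iloc[:, -0:]` would return every column rather than none.

```python
import pandas as pd


def first_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the first n columns by position (hypothetical helper)."""
    return df.iloc[:, :n]


def last_n_columns(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return the last n columns by position (hypothetical helper)."""
    # Avoid df.iloc[:, -n:] here: for n == 0 it is df.iloc[:, 0:], i.e. all columns.
    start = max(df.shape[1] - n, 0)
    return df.iloc[:, start:]


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

print(list(first_n_columns(df, 2).columns))  # first two column labels
print(list(last_n_columns(df, 2).columns))   # last two column labels
print(last_n_columns(df, 0).shape)           # no columns selected when n is 0
```

If n is larger than the number of columns, both helpers simply return the whole frame, because the slice is clipped.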
Method 2: Using Boolean Indexing
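Another way to select the first n columns and the last n columns is boolean indexing: build a list of True/False values, one per column, and pass it to loc, which keeps the columns marked True. Note that an integer slice such as df.loc[:, :2] does not work here, because loc interprets slices as labels and the column labels are strings.
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Boolean mask: True for the first two column positions
mask = [i < 2 for i in range(df.shape[1])]
print(df.loc[:, mask]) # First two columns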
Output:
| A | B |
|---|---|
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
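# Boolean mask: True for the last two column positions
mask = [i >= df.shape[1] - 2 for i in range(df.shape[1])]
print(df.loc[:, mask]) # Last two columns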
Output:
| B | C |
|---|---|
| 4 | 7 |
| 5 | 8 |
| 6 | 9 |
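Note that loc is label-based: it selects rows and columns by their labels, or by a boolean mask with one entry per row or column. iloc, on the other hand, is position-based and selects rows and columns by integer position. An integer slice such as :2 therefore works with iloc, but raises a TypeError with loc when the column labels are strings.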
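The label versus position distinction also changes how slice endpoints behave. A small sketch, assuming the same example frame (the variable names are illustrative): a label slice with loc includes its end label, a positional slice with iloc excludes its end position, and `df.columns[:n]` is one way to turn positions into labels for loc.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Label slice: both endpoints are included, so this keeps columns A and B.
by_label = df.loc[:, "A":"B"]

# Positional slice: the end position is excluded, so this keeps only column A.
by_position = df.iloc[:, 0:1]

# Convert "first two positions" into labels, then select by label.
first_two_by_label = df.loc[:, df.columns[:2]]

print(list(by_label.columns))
print(list(by_position.columns))
print(list(first_two_by_label.columns))
```

Selecting through `df.columns[:n]` behaves like the positional slice as long as the column labels are unique.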
Conclusion
In this article, we explored how to select the first n columns and the last n columns from a pandas DataFrame using iloc slicing and boolean indexing with loc. We also noted the key difference between the two indexers: iloc selects by integer position, while loc selects by label or boolean mask. With these techniques, you can pull out the leading or trailing columns of any DataFrame without knowing their names.
Last modified on 2024-06-07