Finding Overlapping Ranges in Biological Data Using R's data.table Package

Finding Overlapping Ranges in Data Tables

=====================================================

In this article, we will explore how to find overlapping ranges between two data tables. We will use the foverlaps function from the data.table package in R, which performs overlap joins between two tables of intervals.

Introduction


When working with biological data such as mass spectrometry or chromatography results, each row of a table often describes one detected feature. Its measurements come with uncertainty, so they are typically stored as ranges: mzmin and mzmax bound the mass-to-charge ratio (m/z), and rtmin and rtmax bound the retention time. Deciding whether a feature in one table corresponds to a feature in another then comes down to checking whether their ranges overlap. In this article, we'll show how to find such overlapping ranges between two data tables.
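Two closed ranges overlap when each one starts no later than the other ends; this is the condition foverlaps applies with type = "any". The sketch below writes that test out in base R. ranges_overlap() is a hypothetical helper for illustration only, checked here against two pairs of m/z windows taken from the example tables in the next section.

```r
# Hypothetical helper (not part of any package): closed intervals
# [start1, end1] and [start2, end2] overlap when each one starts no later
# than the other ends. Works element-wise on vectors of equal length.
ranges_overlap <- function(start1, end1, start2, end2) {
  start1 <= end2 & start2 <= end1
}

# m/z window of M3 in the first table vs. M2 in the second table
ranges_overlap(371.144256, 371.145224, 371.144000, 371.144999)  # TRUE

# m/z window of M1 in the first table vs. M1 in the second table
ranges_overlap(202.110859, 202.111285, 558.102886, 558.111497)  # FALSE
```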

Data Preparation


To solve this problem, we first need to prepare our data tables. We will use the data.table package in R to create two data tables from inline text read with read.table.

# Load data.table, which provides data.table(), setkey() and foverlaps()
library(data.table)

# Create data tables from inline text; read.table() splits columns on whitespace
table1 <- data.table(read.table(header = TRUE, text = "name  mzmed       mzmin       mzmax       rtmed   rtmin   rtmax
M1    202.1110    202.110859  202.111285  50.35   49.62   51.13
M2    373.144219  373.143792  373.154876  50.38   49.62   51.86
M3    371.14497   371.144256  371.145224  80.34   79.62   81.41
M4    372.147279  372.146992  372.147583  100.35  99.62   101.41
"))

table2 <- data.table(read.table(header = TRUE, text = "name  mzmed       mzmin       mzmax       rtmed   rtmin   rtmax
M1    558.109976  558.102886  558.111497  10.89   9.95    11.95
M2    371.144564  371.144000  371.144999  80.29   79.14   81.98
M3    498.091821  498.091632  498.092225  658.15  656.57  660.96
M4    284.098785  284.098429  284.099092  760.32  758.67  761.2
"))
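foverlaps() expects each interval's start to be no greater than its end, so it can help to check the windows before joining. The sketch below is one way to do that. valid_windows() is a hypothetical helper, not part of data.table, and it additionally assumes each median (mzmed, rtmed) should fall inside its own min/max window, which also implies min <= max.

```r
library(data.table)

# The example values from table1 and table2, entered directly so this
# sketch runs on its own
table1 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmed = c(202.1110, 373.144219, 371.14497, 372.147279),
  mzmin = c(202.110859, 373.143792, 371.144256, 372.146992),
  mzmax = c(202.111285, 373.154876, 371.145224, 372.147583),
  rtmed = c(50.35, 50.38, 80.34, 100.35),
  rtmin = c(49.62, 49.62, 79.62, 99.62),
  rtmax = c(51.13, 51.86, 81.41, 101.41)
)
table2 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmed = c(558.109976, 371.144564, 498.091821, 284.098785),
  mzmin = c(558.102886, 371.144000, 498.091632, 284.098429),
  mzmax = c(558.111497, 371.144999, 498.092225, 284.099092),
  rtmed = c(10.89, 80.29, 658.15, 760.32),
  rtmin = c(9.95, 79.14, 656.57, 758.67),
  rtmax = c(11.95, 81.98, 660.96, 761.2)
)

# Hypothetical helper: one logical per row, TRUE when the median lies
# inside its min/max window for both m/z and retention time
valid_windows <- function(dt) {
  dt[, mzmin <= mzmed & mzmed <= mzmax & rtmin <= rtmed & rtmed <= rtmax]
}

# Stop early with a clear message if any window is malformed
stopifnot(all(valid_windows(table1)), all(valid_windows(table2)))
```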

Finding Overlapping Ranges


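To find overlapping ranges between the two data tables, we will use the foverlaps function from the data.table package. foverlaps requires the second table (y) to be keyed: its last two key columns give the start and end of each interval, and by.x names the matching start and end columns in the first table (x). We join on the m/z ranges, because only ranges measured on the same scale can meaningfully overlap.

# Key table2 on its m/z range; foverlaps() treats the last two key
# columns of y as the interval start and end
setkey(table2, mzmin, mzmax)

# Find every pair of rows whose m/z ranges overlap; nomatch = 0L
# drops table1 rows that have no match in table2
out <- foverlaps(table1, table2, by.x = c("mzmin", "mzmax"),
                 type = "any", nomatch = 0L)

# Print the result
print(out)

In this code snippet, setkey sorts table2 by mzmin and mzmax and marks them as the interval columns. foverlaps then matches each row of table1 against those intervals; with type = "any", two ranges match whenever they share at least one point. With the example data, the only match is M3 from table1 and M2 from table2 (371.144256 to 371.145224 versus 371.144000 to 371.144999). In the result, columns from table2 keep their names, while columns from table1 that share a name are prefixed with i., for example i.name and i.mzmin.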
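To see what an overlap join on the m/z columns computes, the sketch below asks foverlaps for row numbers with which = TRUE (xid indexes table1, yid indexes the keyed table2) and compares them with a brute-force check of every row pair using the same closed-interval condition. The brute-force version is for illustration only: it compares every pair of rows, so it is practical only for small tables. Only the name and m/z columns are kept to keep the sketch short.

```r
library(data.table)

# m/z windows from the example tables (other columns omitted for brevity)
table1 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmin = c(202.110859, 373.143792, 371.144256, 372.146992),
  mzmax = c(202.111285, 373.154876, 371.145224, 372.147583)
)
table2 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmin = c(558.102886, 371.144000, 498.091632, 284.098429),
  mzmax = c(558.111497, 371.144999, 498.092225, 284.099092)
)
setkey(table2, mzmin, mzmax)  # note: this reorders the rows of table2

# which = TRUE returns row numbers instead of joined columns
idx <- foverlaps(table1, table2, by.x = c("mzmin", "mzmax"),
                 type = "any", nomatch = 0L, which = TRUE)

# Brute force: test every (table1 row, table2 row) pair with the
# condition that type = "any" uses for closed intervals
pairs <- CJ(xid = seq_len(nrow(table1)), yid = seq_len(nrow(table2)))
pairs <- pairs[table1$mzmin[xid] <= table2$mzmax[yid] &
               table2$mzmin[yid] <= table1$mzmax[xid]]

print(idx)
print(pairs)

# Both approaches should find the same pairs
stopifnot(nrow(idx) == nrow(pairs),
          all(idx$xid == pairs$xid), all(idx$yid == pairs$yid))
```

Because setkey reorders table2, yid refers to the sorted table; table2$name[idx$yid] recovers the matched feature names.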

Filtering Overlapping Ranges by Distance


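An m/z overlap alone can pair features that elute at very different times. To filter the matches by retention-time distance, we can use the following code:

# Keep matches whose median retention times (rtmed from table2,
# i.rtmed from table1) differ by less than 100, in the units of the rt columns
out_filtered <- out[abs(rtmed - i.rtmed) < 100]

print(out_filtered)

In this code snippet, we use the abs function to calculate the absolute difference between the two tables' median retention times for each match (foverlaps prefixes the table1 columns with i.), and drop matches where that difference is 100 or more. The threshold of 100 is an example value and should be chosen to suit the retention-time precision of the data.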
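foverlaps joins on a single pair of interval columns, so one call cannot require both the m/z and the retention-time ranges to overlap. As an alternative to a fixed distance threshold, a minimal sketch (assuming you want both ranges to overlap) is to join on m/z and then apply the same closed-interval test to the rt columns. Only the columns needed for the two checks are kept.

```r
library(data.table)

# m/z and retention-time windows from the example tables
table1 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmin = c(202.110859, 373.143792, 371.144256, 372.146992),
  mzmax = c(202.111285, 373.154876, 371.145224, 372.147583),
  rtmin = c(49.62, 49.62, 79.62, 99.62),
  rtmax = c(51.13, 51.86, 81.41, 101.41)
)
table2 <- data.table(
  name  = c("M1", "M2", "M3", "M4"),
  mzmin = c(558.102886, 371.144000, 498.091632, 284.098429),
  mzmax = c(558.111497, 371.144999, 498.092225, 284.099092),
  rtmin = c(9.95, 79.14, 656.57, 758.67),
  rtmax = c(11.95, 81.98, 660.96, 761.2)
)

# Step 1: overlap join on the m/z windows
setkey(table2, mzmin, mzmax)
out <- foverlaps(table1, table2, by.x = c("mzmin", "mzmax"),
                 type = "any", nomatch = 0L)

# Step 2: keep only pairs whose retention-time windows also overlap.
# Columns from table1 carry the "i." prefix in the joined result.
both <- out[rtmin <= i.rtmax & i.rtmin <= rtmax]

print(both[, .(name, i.name, rtmin, rtmax, i.rtmin, i.rtmax)])
```

Unlike a distance cutoff, this criterion needs no threshold, but it depends on how wide the rtmin/rtmax windows were drawn in the first place.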

Conclusion


In conclusion, finding overlapping ranges is a common step when matching features across biological data sets. The foverlaps function from the data.table package performs this overlap join directly on two data tables, and filtering the matches by a second measurement, such as retention time, narrows them to pairs that agree on both.

We hope this article has provided a useful guide to finding overlapping ranges in data tables using R and the data.table package.


Last modified on 2024-03-12