I have been struggling with this for some time now and couldn't find any way of doing it, so I would be incredibly grateful if you could help! I am a novice in programming.
You can use the sqldf package:
library(sqldf)
#dummy data
fixes <- read.table(text="
Order Participant Sentence Fixation StartPosition
1 1 1 1 -6.89
2 1 1 2 -5.88
3 1 1 3 -5.33
4 1 1 4 -4.09
5 1 1 5 -5.36
",header=TRUE)
zones <- read.table(text="
Sentence Zone ZoneStart ZoneEnd
1 1 -8.86 -7.49
1 2 -7.49 -5.89
1 3 -5.88 -4.51
1 4 -4.51 -2.90
",header=TRUE)
#output merged result
res <-
sqldf("SELECT [Order],Participant,f.Sentence,Fixation,StartPosition,Zone
FROM fixes f,zones z
WHERE f.Sentence=z.Sentence AND
f.StartPosition>=z.ZoneStart AND
f.StartPosition<z.ZoneEnd")
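For readers less familiar with SQL, the same logic can be sketched in base R: pair every fixation with every zone of its sentence, then keep only the pairs where the position falls inside the zone. This is only an illustration of what the query does, not a replacement for it; the names pairs, hit and res_base are made up for the example.

```r
# Same toy data as above.
fixes <- read.table(text = "
Order Participant Sentence Fixation StartPosition
1 1 1 1 -6.89
2 1 1 2 -5.88
3 1 1 3 -5.33
4 1 1 4 -4.09
5 1 1 5 -5.36
", header = TRUE)
zones <- read.table(text = "
Sentence Zone ZoneStart ZoneEnd
1 1 -8.86 -7.49
1 2 -7.49 -5.89
1 3 -5.88 -4.51
1 4 -4.51 -2.90
", header = TRUE)

# Equi-join on Sentence: each fixation is paired with every zone of its sentence.
pairs <- merge(fixes, zones, by = "Sentence")

# Keep the pairs where the fixation lies in the half-open range [ZoneStart, ZoneEnd).
hit <- pairs$StartPosition >= pairs$ZoneStart & pairs$StartPosition < pairs$ZoneEnd
res_base <- pairs[hit, c("Order", "Participant", "Sentence",
                         "Fixation", "StartPosition", "Zone")]

# Restore the original fixation order.
res_base <- res_base[order(res_base$Order), ]
```

As with the SQL version, a fixation that falls in no zone is silently dropped from the result, and the intermediate table has one row per fixation-zone pair, so it can get large for big data sets.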
There is a package in Bioconductor called IRanges that does what you want.
First, form an IRanges object for your zones:
library(IRanges)
zone.ranges <- with(zones, IRanges(ZoneStart, ZoneEnd))
Next, find the overlaps:
zone.ind <- findOverlaps(fixes$StartPosition, zone.ranges, select="arbitrary")
Now you have indices into the rows of the zones data frame, so you can merge:
fixes$Zone <- zones$Zone[zone.ind]
Edit: I just realized that you have floating-point values, while IRanges is integer-based. So you would need to multiply the coordinates by 100 (your data has two decimal places) and round them to integers first.
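The scaling step mentioned in the edit note could look roughly like the following base R sketch. It only prepares integer coordinates and does not call IRanges itself; to_centi is a hypothetical helper, and subtracting 1 from the scaled end is my assumption for emulating the half-open interval [ZoneStart, ZoneEnd) with ranges that include both endpoints.

```r
# Positions with two decimal places, as in the question.
start_pos  <- c(-6.89, -5.88, -5.33, -4.09, -5.36)
zone_start <- c(-8.86, -7.49, -5.88, -4.51)
zone_end   <- c(-7.49, -5.89, -4.51, -2.90)

# Hypothetical helper: scale to hundredths and round before converting,
# because as.integer() alone truncates and x * 100 is not always exact
# in floating-point arithmetic.
to_centi <- function(x) as.integer(round(x * 100))

start_int      <- to_centi(start_pos)
zone_start_int <- to_centi(zone_start)

# Assumption: the ranges are closed on both ends, so the end is shifted
# down by one unit to exclude ZoneEnd itself.
zone_end_int   <- to_centi(zone_end) - 1L
```

The integer vectors can then be used in place of the original columns when building the ranges and the query.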
With version v1.9.8 (on CRAN 25 Nov 2016), data.table has gained the ability to perform non-equi joins and range joins:
library(data.table)
setDT(fixes)[setDT(zones),
on = .(Sentence, StartPosition >= ZoneStart, StartPosition < ZoneEnd),
Zone := Zone][]
   Order Participant Sentence Fixation StartPosition Zone
1:     1           1        1        1         -6.89    2
2:     2           1        1        2         -5.88    3
3:     3           1        1        3         -5.33    3
4:     4           1        1        4         -4.09    4
5:     5           1        1        5         -5.36    3
fixes <- readr::read_table(
"Order Participant Sentence Fixation StartPosition
1 1 1 1 -6.89
2 1 1 2 -5.88
3 1 1 3 -5.33
4 1 1 4 -4.09
5 1 1 5 -5.36"
)
zones <- readr::read_table(
"Sentence Zone ZoneStart ZoneEnd
1 1 -8.86 -7.49
1 2 -7.49 -5.89
1 3 -5.88 -4.51
1 4 -4.51 -2.90"
)
I think the best approach is to change zones to a more friendly format for what you're doing:
ZoneLookUp = lapply(split(zones, zones$Sentence), function(x) c(x$ZoneStart, x$ZoneEnd[nrow(x)]))
#$`1`
#[1] -8.86 -7.49 -5.88 -4.51 -2.90
Then you can easily look up each zone:
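fixes$Zone = NA_integer_
for(i in seq_len(nrow(fixes)))
# look up the break points by sentence name rather than by position;
# right=FALSE makes each zone [start, end), so -5.88 lands in zone 3
fixes$Zone[i] = cut(fixes$StartPosition[i], ZoneLookUp[[as.character(fixes$Sentence[i])]], labels=FALSE, right=FALSE)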
If performance is an issue, you can take an only slightly less simple approach using by, or data.table with by.
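As a rough sketch of the grouped idea, the loop below calls cut() once per sentence instead of once per row. It uses a plain loop over sentence ids rather than by or data.table, and it assumes that zones within a sentence are numbered 1, 2, 3, ... in increasing order of ZoneStart, so that the interval index returned by cut() equals the zone number. The right = FALSE setting is also an assumption about the intended boundary rule.

```r
# Same toy data as above, built directly as data frames.
fixes <- data.frame(
  Order = 1:5, Participant = 1L, Sentence = 1L, Fixation = 1:5,
  StartPosition = c(-6.89, -5.88, -5.33, -4.09, -5.36)
)
zones <- data.frame(
  Sentence = 1L, Zone = 1:4,
  ZoneStart = c(-8.86, -7.49, -5.88, -4.51),
  ZoneEnd = c(-7.49, -5.89, -4.51, -2.90)
)

# Break points per sentence, keyed by sentence id.
ZoneLookUp <- lapply(split(zones, zones$Sentence),
                     function(x) c(x$ZoneStart, x$ZoneEnd[nrow(x)]))

# One vectorised cut() call per sentence.
# right = FALSE gives intervals of the form [start, end).
fixes$Zone <- NA_integer_
for (s in names(ZoneLookUp)) {
  rows <- as.character(fixes$Sentence) == s
  fixes$Zone[rows] <- cut(fixes$StartPosition[rows], ZoneLookUp[[s]],
                          labels = FALSE, right = FALSE)
}
```

Positions outside the outermost break points come back as NA, which makes unmatched fixations easy to spot afterwards.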