Plain old aggregate() would do:
newdataframe <- aggregate(. ~ gene_id, dataframe, sum)
The formula . ~ gene_id reads as "every other column, grouped by gene_id", and sum is the function applied to each group, so every remaining column is summed within each gene_id. You could use mean instead, for instance.
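As a minimal sketch of what that call returns, assuming a small made-up data frame (the column names col1 and col2 and all of the values are hypothetical, not taken from the question):

```r
# Hypothetical toy data: two genes, two numeric columns
dataframe <- data.frame(
  gene_id = c("g1", "g1", "g2", "g2", "g2"),
  col1    = c(1, 2, 3, 4, 5),
  col2    = c(10, 20, 30, 40, 50)
)

# Sum every non-grouping column within each gene_id
newdataframe <- aggregate(. ~ gene_id, dataframe, sum)
print(newdataframe)
#   gene_id col1 col2
# 1      g1    3   30
# 2      g2   12  120

# Same grouping, but averaging instead of summing
# (gene_means is a hypothetical name used only for this sketch)
gene_means <- aggregate(. ~ gene_id, dataframe, mean)
```

Note that the formula interface drops rows containing NA by default (na.action = na.omit), so check for missing values first if that matters for your data.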
If you only want some of the other columns, combine them with cbind() on the left-hand side of the formula:
newdataframe <- aggregate(cbind(col1, col2) ~ gene_id, dataframe, sum)
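One case where this form is useful is when the data frame also holds a non-numeric column, since sum cannot be applied to character values. A rough sketch, assuming hypothetical toy data with an extra text column named description:

```r
# Hypothetical toy data: description is a text column we do not want to sum
dataframe <- data.frame(
  gene_id     = c("g1", "g1", "g2", "g2", "g2"),
  col1        = c(1, 2, 3, 4, 5),
  col2        = c(10, 20, 30, 40, 50),
  description = c("a", "b", "c", "d", "e")
)

# Only col1 and col2 are aggregated; description is left out of the result
newdataframe <- aggregate(cbind(col1, col2) ~ gene_id, dataframe, sum)
print(newdataframe)
#   gene_id col1 col2
# 1      g1    3   30
# 2      g2   12  120
```

The result keeps gene_id plus one column per name passed to cbind(), in the order given.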