Converting from Ensembl gene IDs to a different identifier

I've inherited a dataset of RNA-seq output data from Canis lupus (dog). I have the gene identifiers in the Ensembl format; specifically, they look like this: ENSCAFT00000001452.3.

1 answer

南方客

2021-01-27 02:22

Here is a step-by-step example:

Load the biomaRt library.
```
library(biomaRt)
```
The query input consists of Canis lupus familiaris Ensembl transcript IDs (note the ENSCAFT prefix: these are not Ensembl gene IDs, which start with ENSCAFG). We also need to strip the trailing dot and digit(s), which form the version suffix used to indicate annotation updates.
```
tx <- c("ENSCAFT00000001452.3", "ENSCAFT00000001656.3")
tx <- gsub("\\.\\d+$", "", tx)
```
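Before querying, it can help to check which kind of Ensembl ID each entry actually is. The following is a minimal base-R sketch, assuming dog stable IDs follow the pattern ENSCAF, then a feature letter (G for gene, T for transcript, P for protein), then 11 digits. The helper name classify_ensembl_id is hypothetical and not part of biomaRt.

```r
# Hypothetical helper (not part of biomaRt): classify dog Ensembl stable IDs
# by the feature letter that follows the ENSCAF species prefix.
classify_ensembl_id <- function(ids) {
  # Drop a trailing version suffix such as ".3" before matching.
  bare <- sub("\\.\\d+$", "", ids)
  type <- rep(NA_character_, length(bare))
  type[grepl("^ENSCAFG\\d{11}$", bare)] <- "gene"
  type[grepl("^ENSCAFT\\d{11}$", bare)] <- "transcript"
  type[grepl("^ENSCAFP\\d{11}$", bare)] <- "protein"
  data.frame(input = ids, id = bare, type = type, stringsAsFactors = FALSE)
}

ids <- c("ENSCAFT00000001452.3", "ENSCAFG00000000934", "not_an_id")
classify_ensembl_id(ids)
```

Entries whose type comes back as NA do not match the assumed pattern and are worth inspecting by hand before they are passed to a query.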

We now query the database for the Ensembl transcript IDs in tx:
```
ensembl <- useEnsembl(biomart = "ensembl", dataset = "cfamiliaris_gene_ensembl")
res <- getBM(
    attributes = c("ensembl_gene_id", "ensembl_transcript_id", "external_gene_name", "description"),
    filters = "ensembl_transcript_id",
    values = tx,
    mart = ensembl)
res
#     ensembl_gene_id ensembl_transcript_id external_gene_name
#1 ENSCAFG00000000934    ENSCAFT00000001452            COL14A1
#2 ENSCAFG00000001086    ENSCAFT00000001656                MYC
#                                                                   description
#1               collagen type XIV alpha 1 chain [Source:VGNC Symbol;Acc:VGNC:51768]
#2 MYC proto-oncogene, bHLH transcription factor [Source:VGNC Symbol;Acc:VGNC:43527]
```
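To attach the annotation to the original data, one option is a merge on the version-stripped transcript ID. This is only a sketch: the counts table, its column names, and its values are made up for illustration, and res is rebuilt by hand from the output above so that the example runs without a BioMart connection.

```r
# Hypothetical expression table keyed by versioned transcript IDs
# (the column names and count values are made up for illustration).
counts <- data.frame(
  transcript = c("ENSCAFT00000001452.3", "ENSCAFT00000001656.3"),
  count = c(120, 45),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() result shown above, so the example runs offline.
res <- data.frame(
  ensembl_gene_id = c("ENSCAFG00000000934", "ENSCAFG00000001086"),
  ensembl_transcript_id = c("ENSCAFT00000001452", "ENSCAFT00000001656"),
  external_gene_name = c("COL14A1", "MYC"),
  stringsAsFactors = FALSE
)

# Build the join key by stripping the version suffix, as done for tx.
counts$ensembl_transcript_id <- sub("\\.\\d+$", "", counts$transcript)

# all.x = TRUE keeps transcripts that have no match in res (annotation is NA).
annotated <- merge(counts, res, by = "ensembl_transcript_id", all.x = TRUE)
annotated
```

Keeping the original versioned column alongside the stripped key makes it easy to trace each annotated row back to the input file.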

Note that you can get a data.frame of all attributes for a particular mart with listAttributes(ensembl).

In addition to the link @GordonShumway gives in the comment above, another good (and succinct) introduction to biomaRt can be found on the Ensembl website.
