Question
I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, and then the header line starting with #.
## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552 56 . C T 228 . DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59 GT:PL 1/1:255,255,0
How can I read such a table without losing the header? Using read.table() with comment.char = "##" returns an error: "invalid 'comment.char' argument"
Answer 1:
If you want to read VCF files, you can also just use readVcf from the VariantAnnotation package in Bioconductor:
https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html
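A minimal sketch of what that could look like, assuming VariantAnnotation is installed from Bioconductor. It reads the example file bundled with the package rather than the asker's data, and the "hg19" genome label applies to that example file only; for your own VCF you would pass your own path and a label that fits your reference.

```r
library(VariantAnnotation)

# Example VCF shipped with the package (not the file from the question).
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# The ## meta lines end up in the header; the # line defines the columns.
vcf <- readVcf(fl, "hg19")

header(vcf)          # parsed ## meta-information
rowRanges(vcf)       # CHROM, POS, ID, REF, ALT, QUAL, FILTER
head(geno(vcf)$GT)   # per-sample genotypes from the FORMAT column
```

The result is a VCF object rather than a plain data frame, so the fields are reached through accessors such as info() and geno().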
Otherwise, I can highly recommend the fread function from the data.table package. Its skip argument accepts a string, in which case reading starts at the first line containing that substring. For example,
fread("test.vcf", skip = "CHROM")
should work.
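To try this without a real file, here is a small sketch that writes a made-up, tab-separated VCF to a temporary file and reads it back. The file contents and the names vcf_lines, vcf_path, vcf and vcf_base are hypothetical. The second half shows a base R fallback: read.table() rejects "##" because comment.char must be a single character (or an empty string), so the ## lines are dropped by hand before parsing.

```r
library(data.table)

# Hypothetical miniature VCF, written to a temp file for illustration only.
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "##contig=<ID=OTU1431,length=253>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
  "OTU1431\t56\t.\tC\tT\t228\t.\tDP=212\tGT:PL\t1/1:255,255,0"
)
vcf_path <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, vcf_path)

# data.table: start reading at the first line that contains "CHROM".
vcf <- fread(vcf_path, skip = "CHROM", sep = "\t", header = TRUE)
setnames(vcf, "#CHROM", "CHROM")  # drop the leading '#' from the first column name

# Base R fallback: remove the ## lines, then turn comment handling off
# so that the #CHROM line is parsed as the header.
all_lines <- readLines(vcf_path)
body <- all_lines[!startsWith(all_lines, "##")]
vcf_base <- read.table(text = body, sep = "\t", header = TRUE,
                       comment.char = "", check.names = FALSE,
                       stringsAsFactors = FALSE)
```

With check.names = FALSE the base R result keeps "#CHROM" as its first column name; rename it if the leading # gets in the way.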
Source: https://stackoverflow.com/questions/42370218/read-table-with-comment-lines-starting-with