genome | 易学教程

Common genomic intervals in R

I would like to infer the genomic intervals shared between different samples.

My input:

    sample  chr  start  end
    NE001   1    100    200
    NE001   2    100    200
    NE002   1    50     150
    NE002   2    50     150
    NE003   2    250    300

My expected output:

    chr  start  end  freq
    1    100    150  2
    2    100    150  2

Here "freq" is the number of samples that contributed to the shared region. In the example above, freq = 2 (NE001 and NE002). Cheers!

If your data is in a data.frame (see below), I use the Bioconductor GenomicRanges package to create a GRanges instance, keeping the non-range columns too:

    library(GenomicRanges)
    gr <- makeGRangesFromDataFrame(df,
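
For readers who want to see the counting logic outside R, here is a minimal Python sketch of one way to get the expected output from the question's input. It is not the GenomicRanges approach from the answer: it is a plain sweep over interval boundaries, and it assumes that the intervals of a single sample never overlap each other on the same chromosome, so that coverage depth equals the number of samples. The function name `shared_intervals` and the `min_samples` parameter are made up for illustration.

```python
from collections import defaultdict

def shared_intervals(rows, min_samples=2):
    """Return (chr, start, end, freq) for regions covered by >= min_samples samples.

    Assumption: one sample never has overlapping intervals on the same
    chromosome, so the coverage depth equals the number of samples.
    """
    # Collect a +1 event at every start and a -1 event at every end.
    events = defaultdict(list)
    for _sample, chrom, start, end in rows:
        events[chrom].append((start, 1))
        events[chrom].append((end, -1))

    result = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos, delta in sorted(events[chrom]):
            # The segment prev..pos was covered by `depth` samples.
            if prev is not None and pos > prev and depth >= min_samples:
                result.append((chrom, prev, pos, depth))
            depth += delta
            prev = pos
    return result

# Input table from the question.
rows = [
    ("NE001", 1, 100, 200),
    ("NE001", 2, 100, 200),
    ("NE002", 1, 50, 150),
    ("NE002", 2, 50, 150),
    ("NE003", 2, 250, 300),
]

for region in shared_intervals(rows):
    print(*region)
```

The sketch does not merge neighbouring segments that happen to have the same depth, and it treats start and end simply as boundaries; whether coordinates are inclusive would need to be settled for real data.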

AWK: extract lines if column in file 1 falls within a range declared in two columns in other file

I'm currently struggling with an AWK problem that I haven't been able to solve yet. I have one huge file (30 GB) of genomic data that holds a list of positions (declared in columns 1 and 2), and a second file that holds a number of ranges (declared in columns 3, 4 and 5). I want to extract all lines from the first file where the position falls within a range declared in the second file. Because a position is only unique within a given chromosome (chr), the chromosomes first have to be tested for identity (i.e. column 1 in file 1 must match column 3 in file 2).

file 1 chromosome position another....hundred....
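
The matching logic described above can be sketched as follows. The question asks for AWK; this Python version only illustrates the idea of loading the (small) range file into memory and streaming the large position file line by line. It assumes whitespace-separated columns, that columns 4 and 5 of the range file are the start and end of each range, and that range ends are inclusive; none of this is confirmed by the excerpt. The function names are hypothetical.

```python
from collections import defaultdict

def load_ranges(range_lines):
    """Map chromosome -> list of (start, end) from columns 3, 4 and 5."""
    ranges = defaultdict(list)
    for line in range_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # skip header or malformed lines
        ranges[fields[2]].append((start, end))
    return ranges

def filter_positions(position_lines, ranges):
    """Yield lines whose chromosome (col 1) and position (col 2) hit a range."""
    for line in position_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip header or malformed lines
        pos = int(fields[1])
        # Only ranges on the same chromosome are compared.
        if any(start <= pos <= end for start, end in ranges.get(fields[0], ())):
            yield line

# Tiny made-up example; real use would pass open file handles instead.
range_lines = ["geneA x chr1 100 200", "geneB x chr2 50 60"]
position_lines = ["chr1 150 foo", "chr1 250 bar", "chr2 55 baz", "chr3 55 qux"]

hits = list(filter_positions(position_lines, load_ranges(range_lines)))
print(hits)
```

Because file handles are iterated lazily, the 30 GB file never has to fit in memory; only the range table does. With many ranges per chromosome, a sorted list with binary search would be a natural refinement.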

Organizing the output of my shell script into tables within the text file

I am working with a Unix shell script that does genome construction and then creates a phylogeny. Depending on the genome assembler you use, the final output (the phylogeny) may change. I wish to compare the effects of using various genome assemblers. I have developed some metrics to compare them on, but I need help organizing them so I can run useful analyses. I would like to import my data into Excel in columns. This is the script I am using to output data:

    echo "Enter the size (Mb or Gb) of your data set:"
    read SIZEOFDATASET
    echo "The size of your data set is $SIZEOFDATASET"
    echo "Size of Data
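
One common way to get values into Excel columns is to write them as comma-separated values, one row per assembler run. The sketch below is in Python, not in the shell of the original script, and only shows the table layout. The column names, the assembler labels, and the values are placeholders invented for illustration; the excerpt does not list the actual metrics.

```python
import csv
import io

def write_metrics_table(records, columns, out):
    """Write one header row, then one CSV row per record."""
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(record)

# Placeholder columns and values, not taken from the original script.
columns = ["assembler", "dataset_size"]
records = [
    {"assembler": "assembler_A", "dataset_size": "500Mb"},
    {"assembler": "assembler_B", "dataset_size": "500Mb"},
]

# An in-memory buffer keeps the example self-contained; a real script
# would pass a file opened with open("metrics.csv", "w", newline="").
buffer = io.StringIO()
write_metrics_table(records, columns, buffer)
print(buffer.getvalue())
```

Excel opens such a file directly, with each comma-separated field in its own column.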

blast against genomes in biopython

Question

    from Bio.Blast import NCBIXML
    from Bio.Blast import NCBIWWW

    result_handle = NCBIWWW.qblast(
        "blastn",
        "nr",
        "CACTTATTTAGTTAGCTTGCAACCCTGGATTTTTGTTTACTGGAGAGGCC",
        entrez_query='"Beutenbergia cavernae DSM 12333" [Organism]')
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                print(hsp.query[0:75] + '...')
                print(hsp.match[0:75] + '...')
                print(hsp.sbjct[0:75] + '...')

This does not give me any output,