genome | 易学教程

Filter overlapping entries in bed file

Question: I have a BED file that looks like this:

    1  183113  183114  chr1:183113-183240  0  +
    1  187286  187287  chr1:187128-187287  0  -
    1  187576  187587  chr1:187375-187577  0  -
    1  187580  187590  chr1:187379-187577  0  -

My aim is to extract only those rows whose entries do not overlap with any others. For some time I have been trying bedtools merge according to the docs. I wanted to use specific flags to count the entries that contributed to each "merged" fragment and later keep only those with value "1", but here
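As an illustration of the goal described in the question (not the bedtools approach the asker was attempting), here is a minimal Python sketch that keeps only rows overlapping no other row. It assumes BED-style half-open intervals, with the chromosome in column 1 and start/end in columns 2 and 3; the function name `non_overlapping` is made up for this example.

```python
from collections import defaultdict


def non_overlapping(rows):
    """Return the rows (chrom, start, end, ...) that overlap no other row.

    Assumes half-open [start, end) coordinates, as in BED files.
    """
    by_chrom = defaultdict(list)
    for i, row in enumerate(rows):
        by_chrom[row[0]].append((int(row[1]), int(row[2]), i))

    overlapping = set()
    for intervals in by_chrom.values():
        intervals.sort()
        max_end = float("-inf")  # furthest end seen among earlier intervals
        for k, (start, end, i) in enumerate(intervals):
            # Overlaps an earlier interval if it starts before one of them ends.
            overlaps_earlier = start < max_end
            # Overlaps a later interval if the next one starts before this ends.
            overlaps_later = k + 1 < len(intervals) and intervals[k + 1][0] < end
            if overlaps_earlier or overlaps_later:
                overlapping.add(i)
            max_end = max(max_end, end)

    return [row for i, row in enumerate(rows) if i not in overlapping]


rows = [
    ("1", "183113", "183114", "chr1:183113-183240", "0", "+"),
    ("1", "187286", "187287", "chr1:187128-187287", "0", "-"),
    ("1", "187576", "187587", "chr1:187375-187577", "0", "-"),
    ("1", "187580", "187590", "chr1:187379-187577", "0", "-"),
]
for row in non_overlapping(rows):
    print("\t".join(row))
```

With the four example rows, only the first two are printed, because the last two overlap each other.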

How to determine characteristics for a genome?

Question: In AI, are there any simple and/or very visual examples of how one could implement a genome in a simulation? Basically, I'm after a simple walkthrough (not a tutorial, but rather something of a summarizing nature) which details how to implement a genome that changes the characteristics of an 'individual' in a simulation. These genes would not be things like mass, strength, length, etc., but rather the things defining the above, abstracting the genome from the actual
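One way to picture the separation the question asks about is a genome of abstract genes that is "expressed" into visible traits. The sketch below is only an illustration under invented assumptions: the gene names (`bone_density`, `muscle_ratio`, `growth_rate`) and the formulas in `express` are hypothetical and do not come from the question.

```python
import random

# Hypothetical underlying genes, each a float in [0, 1].
GENE_NAMES = ("bone_density", "muscle_ratio", "growth_rate")


def random_genome(rng):
    """Create a genome with one random value per gene."""
    return [rng.random() for _ in GENE_NAMES]


def express(genome):
    """Map genes (genotype) to visible traits (phenotype).

    The formulas are arbitrary; the point is that mass, strength and
    length are derived from the genes rather than stored in the genome.
    """
    bone_density, muscle_ratio, growth_rate = genome
    length = 1.0 + 2.0 * growth_rate
    mass = length * (0.5 + bone_density)
    strength = mass * muscle_ratio
    return {"length": length, "mass": mass, "strength": strength}


def mutate(genome, rng, rate=0.1, scale=0.05):
    """Return a copy where each gene is nudged with probability `rate`."""
    child = []
    for gene in genome:
        if rng.random() < rate:
            gene = min(1.0, max(0.0, gene + rng.gauss(0.0, scale)))
        child.append(gene)
    return child


rng = random.Random(0)
parent = random_genome(rng)
child = mutate(parent, rng)
print(express(parent))
print(express(child))
```

Because selection acts on the expressed traits while mutation acts on the genes, a single mutated gene can shift several traits at once.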

Plotting coverage depth in 1kb windows?

Question: I would like to plot average coverage depth across my genome, with chromosomes lined up in increasing order. I have calculated coverage depth per position for my genome using samtools. I would like to generate a plot (which uses 1kb windows) like Figure 7: http://www.g3journal.org/content/ggg/6/8/2421/F7.large.jpg?width=800&height=600&carousel=1

Example dataframe:

    Chr   locus  depth
    chr1  1      20
    chr1  2      24
    chr1  3      26
    chr2  1      53
    chr2  2      71
    chr2  3      74
    chr3  1      29
    chr3  2      36
    chr3  3      39

Do I need to change the
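The windowing step behind such a plot can be sketched in plain Python. This is only a sketch of the averaging, not of the plotting, and it assumes 1-based positions and `(chr, locus, depth)` records like the example dataframe; the function name `window_means` is made up here. Positions missing from the input are not counted as zero depth.

```python
from collections import defaultdict


def window_means(records, window=1000):
    """Average depth per (chromosome, window index).

    `records` is an iterable of (chrom, locus, depth) with 1-based loci,
    so loci 1..window fall into window 0, the next `window` loci into
    window 1, and so on.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of depths, count]
    for chrom, locus, depth in records:
        key = (chrom, (locus - 1) // window)
        totals[key][0] += depth
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


records = [
    ("chr1", 1, 20), ("chr1", 2, 24), ("chr1", 3, 26),
    ("chr2", 1, 53), ("chr2", 2, 71), ("chr2", 3, 74),
    ("chr3", 1, 29), ("chr3", 2, 36), ("chr3", 3, 39),
]
for (chrom, index), mean in window_means(records).items():
    print(chrom, index * 1000 + 1, round(mean, 2))
```

Each resulting `(chromosome, window)` mean is one point on the plot. Ordering chromosomes numerically (chr2 before chr10) would need a separate sort key.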

NEAT: Speciating

Question: I was trying to implement NEAT myself, using the original paper, but got stuck. Let's say that in the last generation I had the following species:

    Species 1: members: 100, avg_score: 100
    Species 2: members: 150, avg_score: 120
    Species 3: members: 300, avg_score: 50
    Species 4: members: 10,  avg_score: 110

My attempt right now for the next generation is the following: from each species, remove every genome except one random genome. Place each genome in the species / perhaps create a new one. Set the score of the
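The first two steps of the attempt above (keep one random representative per species, then place each genome into a species or create a new one) can be sketched as follows. This is a minimal sketch, not a full NEAT implementation: the `distance` function and `threshold` are supplied by the caller, and the toy usage below stands in plain numbers for genomes and absolute difference for the compatibility distance.

```python
import random


def pick_representatives(species, rng):
    """Keep one randomly chosen genome from each non-empty species."""
    return [rng.choice(members) for members in species if members]


def speciate(genomes, representatives, distance, threshold):
    """Place each genome in the first species whose representative is
    closer than `threshold`; otherwise start a new species with the
    genome as its representative."""
    reps = list(representatives)
    species = [[] for _ in reps]
    for genome in genomes:
        for k, rep in enumerate(reps):
            if distance(genome, rep) < threshold:
                species[k].append(genome)
                break
        else:
            # No compatible species found: the genome founds a new one.
            reps.append(genome)
            species.append([genome])
    return reps, species


# Toy usage: numbers stand in for genomes (an assumption for illustration).
rng = random.Random(42)
old_species = [[0.0, 0.4, 0.9], [10.0, 10.5]]
reps = pick_representatives(old_species, rng)
new_generation = [0.5, 9.0, 20.0, 21.0]
reps, species = speciate(new_generation, reps, lambda a, b: abs(a - b), 3.0)
print(len(species), "species")
```

A species left empty after this step has no members in the new generation and would normally be dropped.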

What's a sensible way to represent a binary genome for a genetic algorithm?

Question: My previous question belied my inexperience and was based on an assumption. Now I am much wiser. (Put 1s and 0s in a string? Pah! I laugh at the suggestion!) My question is then, how should I encode my genomes? On paper, they look like this:

    01010011010110010

17 bits that encode (in some cases singly and in some cases as groups) the parameters to be tested. The requirements are: Needs to be scalable. There might be 17 at the moment, but this could grow/shrink as options are added, removed or
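One possible encoding, offered as a sketch rather than as the accepted answer, is to pack the bits into a single integer alongside an explicit length, so the genome can grow or shrink without changing the code. The class name `BitGenome` and its methods are invented for this example; bit 0 is taken to be the leftmost bit of the on-paper form.

```python
class BitGenome:
    """Fixed-length bit string stored in one Python int (illustrative)."""

    def __init__(self, length, bits=0):
        self.length = length
        self.bits = bits & ((1 << length) - 1)  # drop bits beyond `length`

    @classmethod
    def from_string(cls, text):
        """Build a genome from its on-paper form, e.g. '0101'."""
        return cls(len(text), int(text, 2))

    def get(self, start, width=1):
        """Read `width` bits starting at index `start` as an integer."""
        shift = self.length - start - width
        return (self.bits >> shift) & ((1 << width) - 1)

    def flip(self, index):
        """Return a copy with the bit at `index` inverted (mutation)."""
        return BitGenome(self.length, self.bits ^ (1 << (self.length - 1 - index)))

    def crossover(self, other, point):
        """Single-point crossover: first `point` bits from self, rest from other."""
        low_mask = (1 << (self.length - point)) - 1
        return BitGenome(self.length, (self.bits & ~low_mask) | (other.bits & low_mask))

    def __str__(self):
        return format(self.bits, "0{}b".format(self.length))


genome = BitGenome.from_string("01010011010110010")
print(genome.get(1))      # a single-bit parameter
print(genome.get(4, 4))   # a 4-bit group read as one integer
print(genome.flip(0))
```

Reading groups with `get(start, width)` covers the case where several bits jointly encode one parameter.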

Organizing the output of my shell script into tables within the text file

Question: I am working with a Unix shell script that does genome construction and then creates a phylogeny. Depending on the genome assembler you use, the final output (the phylogeny) may change. I wish to compare the effects of using various genome assemblers. I have developed some metrics to compare them on, but I need help organizing them so I can run useful analyses. I would like to import my data into Excel in columns. This is the script I am using to output data: echo "Enter the size (Mb or Gb) of

Gnome glib status for Windows/OSX/Unix-like and binaries [closed]

Question: I am trying to understand what the current situation of glib is regarding Windows, Unix-like systems (not necessarily Linux) and OSX. I am analyzing whether I could use glib for a project, and I will need all of those OSes to work. I am searching for the Windows binaries, and the latest I found are very old (from 2010 and 2011). Does this mean that Windows support is being dropped by Gnome glib? I

Common genomic intervals in R

Question: I would like to infer shared genomic intervals between different samples. My input:

    sample  chr  start  end
    NE001   1    100    200
    NE001   2    100    200
    NE002   1    50     150
    NE002   2    50     150
    NE003   2    250    300

My expected output:

    chr  start  end  freq
    1    100    150  2
    2    100    150  2

where "freq" is how many samples have contributed to the shared region. In the above example freq = 2 (NE001 and NE002). Cheers!

Answer 1: If your data is in a data.frame (see below), using the Bioconductor GenomicRanges package I create a
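For comparison with the GenomicRanges-based answer, the same idea can be sketched in plain Python with a sweep over interval boundaries. This is a different technique from the one the answer uses. It assumes each sample contributes at most one interval at any position (so coverage equals the number of samples), and the function name `shared_intervals` is invented for this example.

```python
from collections import defaultdict


def shared_intervals(rows, min_freq=2):
    """Return (chr, start, end, freq) for regions covered by at least
    `min_freq` intervals, where rows are (sample, chr, start, end)."""
    # Per chromosome, record the net change in coverage at each boundary.
    deltas = defaultdict(lambda: defaultdict(int))
    for _sample, chrom, start, end in rows:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    result = []
    for chrom in sorted(deltas):
        coverage = 0
        previous = None
        for position in sorted(deltas[chrom]):
            # The stretch from `previous` to `position` has constant coverage.
            if previous is not None and coverage >= min_freq:
                result.append((chrom, previous, position, coverage))
            coverage += deltas[chrom][position]
            previous = position
    return result


rows = [
    ("NE001", 1, 100, 200),
    ("NE001", 2, 100, 200),
    ("NE002", 1, 50, 150),
    ("NE002", 2, 50, 150),
    ("NE003", 2, 250, 300),
]
for record in shared_intervals(rows):
    print(*record)
```

On the example input this prints the two rows of the expected output; the NE003 interval is dropped because only one sample covers it.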

AWK: extract lines if column in file 1 falls within a range declared in two columns in other file

Question: I'm currently struggling with an AWK problem that I haven't been able to solve yet. I have one huge file (30GB) of genomic data that holds a list of positions (declared in col 1 and 2) and a second list that holds a number of ranges (declared in col 3, 4 and 5). I want to extract all lines in the first file where the position falls within a range declared in the second file. As the position is only unique within a certain chromosome (chr), first it has to be tested if the chr's are
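The lookup described above can be sketched in Python instead of AWK: load the (small) range file into memory, then stream the large file line by line. Several details here are assumptions, since the excerpt is truncated: whitespace-separated columns, chromosome and position in columns 1 and 2 of the first file, chromosome, start and end in columns 3, 4 and 5 of the second file, inclusive range ends, and no header lines. The function names are invented for this example.

```python
import bisect
from collections import defaultdict


def load_ranges(lines):
    """Read ranges (chr, start, end in columns 3-5) and merge overlaps.

    Returns {chrom: (sorted starts, matching ends)} for binary search.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        fields = line.split()
        by_chrom[fields[2]].append((int(fields[3]), int(fields[4])))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)  # extend overlapping range
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def filter_positions(lines, ranges):
    """Yield lines whose (chr, position) in columns 1-2 fall in a range."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] not in ranges:
            continue  # the chromosome must match before positions are compared
        starts, ends = ranges[fields[0]]
        position = int(fields[1])
        k = bisect.bisect_right(starts, position) - 1
        if k >= 0 and position <= ends[k]:
            yield line


range_lines = ["id1 x chr1 100 200", "id2 x chr1 150 300", "id3 x chr2 10 20"]
position_lines = ["chr1 99 A", "chr1 100 C", "chr1 250 G", "chr2 25 T"]
for hit in filter_positions(position_lines, load_ranges(range_lines)):
    print(hit)
```

For real files, `lines` can simply be open file objects, so the 30GB file is never held in memory; only the merged ranges are.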