biopython

Extracting specific data from a file and writing it to another file

若如初见. Posted on 2019-12-12 03:27:45
Question: I tagged python and perl in this only because that's what I've used thus far. If anyone knows a better way to go about this, I'd certainly be willing to try it out. Anyway, my problem: I need to create an input file for a gene prediction program that uses the following format:
    seq1 5 15
    seq1 20 34
    seq2 50 48
    seq2 45 36
    seq3 17 20
where seq# is the geneID and the numbers to the right are the positions of exons within an open reading frame. Now I have this information, in a .gff3 file that
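A minimal sketch of one way to produce that three-column format from GFF3, using only the Python standard library. The question is cut off before it describes the file, so several things here are assumptions: that exons appear as features of type `exon`, that the first GFF3 column can stand in for the gene ID, and that minus-strand exons should be written high-to-low (as the `seq2 50 48` line suggests). The function name `exon_lines` and the demo records are hypothetical.

```python
import io

def exon_lines(gff3_handle, feature_type="exon"):
    """Yield 'seqid start end' strings for each matching GFF3 feature."""
    for line in gff3_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # not a well-formed feature row of the wanted type
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if strand == "-":
            # Assumption: minus-strand exons are written high-to-low.
            start, end = end, start
        yield f"{seqid} {start} {end}"

# Hypothetical demo records; a real file would be opened with open(path).
demo_gff3 = io.StringIO(
    "##gff-version 3\n"
    "seq1\tdemo\tgene\t1\t40\t.\t+\t.\tID=gene1\n"
    "seq1\tdemo\texon\t5\t15\t.\t+\t.\tParent=gene1\n"
    "seq1\tdemo\texon\t20\t34\t.\t+\t.\tParent=gene1\n"
    "seq2\tdemo\texon\t48\t50\t.\t-\t.\tParent=gene2\n"
)
result = list(exon_lines(demo_gff3))
print("\n".join(result))
```

If the gene ID lives in the attributes column (for example `Parent=`), the ninth column would need to be parsed instead of using the first column.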

Nucleotides separator in the pairwise sequence alignment bio python

情到浓时终转凉″ Posted on 2019-12-12 03:24:16
Question: I have RNA sequences that contain various modified nucleotides and residues, for example N79, 8XU, SDG, and I. I want to pairwise align them using Biopython's pairwise2.align.localms. Is it possible to pass the input as a list rather than a string, so that these modified bases are accounted for accurately? What is the correct technique? Answer 1: Biopython's pairwise2 module works on strings of letters, which can be anything - for example:
    >>> from Bio import pairwise2
    >>> from Bio
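To illustrate why token lists matter here, the following is a small pure-Python Smith-Waterman scorer that treats each list element as one residue, so `N79` is compared as a unit rather than as the three characters `N`, `7`, `9`. This is not pairwise2 and not the answer's code: it is a stand-in sketch with a linear gap penalty, and the function name, scoring values, and demo sequences are all assumptions.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score for two token sequences.

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if tok_a == tok_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

# Hypothetical sequences: each modified residue is a single list element.
rna_a = ["A", "N79", "G", "8XU", "C"]
rna_b = ["G", "N79", "G", "8XU", "U"]
score = local_align_score(rna_a, rna_b)
print(score)
```

Here the shared run `N79, G, 8XU` scores three matches. Passing plain strings would instead split each modified residue into separate characters.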

Search for motifs with degenerate positions

元气小坏坏 Posted on 2019-12-12 02:53:36
Question: I have a 15-mer nucleotide motif that uses degenerate nucleotide codes. Example: ATNTTRTCNGGHGCN. I would like to search a set of sequences for occurrences of this motif. However, my other sequences are exact sequences, i.e. they have no ambiguity. I have tried a for loop over the sequences to search for it, but I have not been able to do non-exact searches. The code I use is modeled after the code in the Biopython cookbook.
    for pos,seq in m.instances.search(test_seq):
        print pos, seq
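One way to search exact sequences for a degenerate motif, sketched here with the standard library only, is to expand each IUPAC code into a regular-expression character class. This is an alternative to the `m.instances.search` call in the question, not a description of it. The helper names and the test sequence are hypothetical, and the search covers the given strand only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex that also finds overlapping hits."""
    parts = []
    for code in motif.upper():
        bases = IUPAC_DNA[code]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    # A lookahead consumes no characters, so overlapping occurrences are reported.
    return re.compile("(?=(" + "".join(parts) + "))")

def find_motif(motif, sequence):
    """Return (position, matched_text) for every occurrence on the given strand."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

motif = "ATNTTRTCNGGHGCN"
test_seq = "CCATGTTATCAGGAGCTTT"  # hypothetical exact sequence
hits = find_motif(motif, test_seq)
print(hits)
```

To cover the reverse strand as well, the same function could be run on the reverse complement of each sequence.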

Reading at three different frames

若如初见. Posted on 2019-12-11 19:27:27
Question: So I'm trying to create a class that reads a DNA string in three different frames: one that starts at position 0 (the first base), another that starts at position 1 (the second base), and a third that starts at position 2 (the third base). So far, this is what I've been playing around with:
    def codons(self, frame_one, frame_two, frame_three):
        start = frame_one
        while start + 3 <=len(self.seq):
            yield (self.seq[start:start+3], start)
            start += 3
        start+1 = frame_two
        while start + 3 <
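A minimal sketch of the three-frame idea, assuming the goal is simply to list the complete codons in each forward frame. Instead of three separate `while` loops, a single generator takes the frame offset as a parameter. The function names and the demo sequence are hypothetical, and plain functions are used rather than the asker's class.

```python
def codons(seq, frame):
    """Yield (codon, start) for each complete codon in the given frame (0, 1, or 2)."""
    # Stopping at len(seq) - 2 guarantees every slice has exactly three bases.
    for start in range(frame, len(seq) - 2, 3):
        yield seq[start:start + 3], start

def three_frames(seq):
    """Map each forward frame offset to its list of (codon, start) pairs."""
    return {frame: list(codons(seq, frame)) for frame in range(3)}

frames = three_frames("ATGGCCTAA")
for frame, found in frames.items():
    print(frame, found)
```

Note that an assignment such as `start+1 = frame_two` is a syntax error in Python, because the left-hand side must be a name rather than an expression.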

Biopython error - The system cannot find the file specified

半世苍凉 Posted on 2019-12-11 05:42:09
Question: I have encountered an error that I am not able to resolve. I am trying to run the simplest set of commands that will perform a tBLASTn search, looking for a sequence (given as the file "pytanie.fasta") in a database (also given as a file, cucumber.fasta). The result will be saved in the "wynik.txt" file. The code looks as follows:
    from Bio.Blast.Applications import NcbitblastnCommandline
    database = r"\Biopython\cucumber.fasta"
    qr = r"\Biopython\pytanie.fasta"
    output =
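The question is cut off before the failing call, so the actual cause is not known. As a hedged diagnostic sketch: "The system cannot find the file specified" on Windows can refer to the program being launched as well as to its input files, so it may help to check both before running the command. The helper name `missing_inputs` is hypothetical, and the assumption that the executable is called `tblastn` comes from the NCBI BLAST+ naming, not from the question.

```python
import shutil
from pathlib import Path

def missing_inputs(executable, *paths):
    """Return a description of each item the command would fail to find."""
    problems = []
    if shutil.which(executable) is None:
        problems.append(f"executable not on PATH: {executable}")
    for p in paths:
        if not Path(p).is_file():
            problems.append(f"file not found: {p}")
    return problems

# Paths copied from the question; the executable name is an assumption.
problems = missing_inputs(
    "tblastn",
    r"\Biopython\cucumber.fasta",
    r"\Biopython\pytanie.fasta",
)
for problem in problems:
    print(problem)
```

An empty list means the executable and both files were found, in which case the cause lies elsewhere.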

ModuleNotFoundError in Spyder

杀马特。学长 韩版系。学妹 Posted on 2019-12-11 05:07:02
Question: I tried to import the biopython package in Spyder. I always get the error message ModuleNotFoundError: No module named 'biopython', although biopython is installed. I also checked the PYTHONPATH: it contains a path pointing to the directory where the packages are stored. Can somebody help? Did I miss something? Thanks for your help! Answer 1: If you're using Anaconda, it's best to install all the packages you want from Anaconda if possible. You can check if a package is available with (e.g.): conda
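Separately from the installation advice, note that the distribution is named `biopython` but the importable package is `Bio`, so `import biopython` fails even on a working install. The sketch below checks which names the current interpreter can import; the helper name `is_importable` is hypothetical, and the printed results depend on the environment it runs in.

```python
import importlib.util

def is_importable(module_name):
    """Report whether the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None

# 'biopython' is the distribution name; 'Bio' is the package that gets imported.
for name in ("biopython", "Bio"):
    print(name, is_importable(name))
```

Running this inside Spyder's console shows whether the interpreter Spyder uses is the same one Biopython was installed into.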

type object 'RestrictionType' has no attribute 'size'

邮差的信 Posted on 2019-12-11 04:06:10
Question: I ran into this problem today and wanted to bring it up to see if anyone else has seen it. Searching Google/SO/Biostars didn't get me anywhere. I'm running a simple restriction analysis (on a randomly generated "genome") and getting this error. If I look for cut sites with the enzymes individually, it works for each. However, when I put them into a RestrictionBatch, I get an error on the class: type object 'RestrictionType' has no attribute 'size'. I put up an IPython notebook describing

Transform data frame into matrix with counts

跟風遠走 Posted on 2019-12-11 03:39:15
Question: I have data files structured like this:
    OTU1 PIA0 1120
    OTU2 PIA1 2
    OTU2 PIA3 6
    OTU2 PIA4 10
    OTU2 PIA5 1078
    OTU2 PIN1 24
    OTU2 PIN2 45
    OTU2 PIN3 261
    OTU2 PIN4 102
    OTU3 PIA0 16
    OTU3 PIA1 59
    OTU3 PIA2 27
    OTU3 PIA3 180
    OTU3 PIA4 200
    OTU3 PIA5 251
    OTU3 PIN0 36
    OTU3 PIN1 61
    OTU3 PIN2 156
    OTU3 PIN3 590
    OTU3 PIN4 277
    OTU4 PIA0 401
    OTU4 PIN0 2
And I want to create a matrix that shows combinations of the values in the second column, using the first column as the reference for counting each combination (showing
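The question is truncated, so the exact matrix wanted is unclear. The sketch below follows one possible reading, stated as an assumption: a symmetric matrix over the second-column labels in which each cell counts how many first-column groups (OTUs) contain both labels. The third column is ignored under this reading. The function name is hypothetical, and the demo rows are a subset of the data shown in the question.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def shared_otu_matrix(rows):
    """Count, for every pair of samples, the number of OTUs containing both."""
    samples_by_otu = defaultdict(set)
    for otu, sample, _count in rows:
        samples_by_otu[otu].add(sample)

    samples = sorted({s for group in samples_by_otu.values() for s in group})
    index = {s: i for i, s in enumerate(samples)}
    matrix = [[0] * len(samples) for _ in samples]

    for group in samples_by_otu.values():
        # Pairs include (a, a), so the diagonal counts OTUs per sample.
        for a, b in combinations_with_replacement(sorted(group), 2):
            matrix[index[a]][index[b]] += 1
            if a != b:
                matrix[index[b]][index[a]] += 1
    return samples, matrix

rows = [
    ("OTU1", "PIA0", 1120),
    ("OTU2", "PIA1", 2),
    ("OTU2", "PIA3", 6),
    ("OTU3", "PIA0", 16),
    ("OTU3", "PIA1", 59),
    ("OTU3", "PIA3", 180),
]
samples, matrix = shared_otu_matrix(rows)
print(samples)
for row in matrix:
    print(row)
```

If the intended matrix is instead OTUs by samples filled with the third-column counts, the same grouping step applies but the cell values would come from that column.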

ImportError: cannot import name _aligners [biopython]

情到浓时终转凉″ Posted on 2019-12-10 18:55:50
Question: I am doing bioinformatics work that depends on Biopython. Biopython always gives me the following error: I hope someone can help me with this issue. Thank you! Answer 1: This can occur on Biopython version >= 1.72 and has been discussed on the biopython mailing list here. The error occurs when you try to import while inside the biopython/ directory. To fix it, move to another directory outside the source tree and then run your code. If the error still occurs then likely the

Can Biopython perform Seq.find() accounting for ambiguity codes

情到浓时终转凉″ Posted on 2019-12-10 13:58:11
Question: I want to be able to search a Seq object for a subsequence Seq object while accounting for ambiguity codes. For example, the following should be true:
    from Bio.Seq import Seq
    from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA
    amb = IUPACAmbiguousDNA()
    s1 = Seq("GGAAAAGG", amb)
    s2 = Seq("ARAA", amb) # R = A or G
    print s1.find(s2)
If ambiguity codes were taken into account, the answer should be
    >>> 2
But the answer I get is that no match is found, or
    >>> -1
Looking at the biopython source code, it
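A sketch of an ambiguity-aware `find`, written in plain Python rather than as a Biopython method. Two symbols are treated as compatible when the sets of bases they stand for overlap, which lets either sequence contain ambiguity codes. The function names are hypothetical, and "overlapping sets" is one assumed definition of a match; a stricter one would require the subsequence's set to contain the target's.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def compatible(x, y):
    """True when the two IUPAC codes share at least one possible base."""
    return bool(set(IUPAC_DNA[x]) & set(IUPAC_DNA[y]))

def ambiguous_find(seq, sub):
    """Return the first index where sub matches seq under IUPAC rules, or -1."""
    seq, sub = str(seq).upper(), str(sub).upper()
    for start in range(len(seq) - len(sub) + 1):
        # zip stops at the end of sub, so only len(sub) positions are compared.
        if all(compatible(a, b) for a, b in zip(seq[start:], sub)):
            return start
    return -1

position = ambiguous_find("GGAAAAGG", "ARAA")
print(position)
```

Because the inputs are converted with `str()`, the function accepts either plain strings or objects such as `Seq` whose string form is the sequence.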