bioinformatics

AttributeError: 'str' object has no attribute 'id' when parsing FASTA with BioPython

时光怂恿深爱的人放手, submitted on 2020-06-28 03:21:47
Question: I am trying to use Bio and SeqIO to open a FASTA file that contains multiple sequences, edit the sequence names to remove the '.seq' suffix from every name (>SeqID20.seq should become >SeqID20), and then write all the sequences to a new FASTA file. But I get the following error:

AttributeError: 'str' object has no attribute 'id'

This is what I started with:

with open('lots_of_fasta_in_file.fasta') as f:
    for seq_record in SeqIO.parse(f, 'fasta'):
        name, sequence = seq_record.id, str
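
The traceback most likely comes from handing plain strings (such as `name` or `str(seq_record.seq)`) to `SeqIO.write`, which expects SeqRecord objects and reads `.id` from each one. Within BioPython the usual fix is to change `seq_record.id` on the record itself, also set `seq_record.description = ""` (otherwise the FASTA writer prints the old header text after the new ID), and pass the records to `SeqIO.write`. If all you need is the header rename, a minimal standard-library sketch follows; the function names are hypothetical, and it assumes the ID is the first whitespace-separated word of each `>` line.

```python
def strip_header_suffix(fasta_text: str, suffix: str = ".seq") -> str:
    """Remove `suffix` from the end of each record ID in FASTA text."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the header into the ID (first word) and any description.
            seq_id, sep, rest = line[1:].partition(" ")
            if seq_id.endswith(suffix):
                seq_id = seq_id[: -len(suffix)]
            line = ">" + seq_id + sep + rest
        out.append(line)  # sequence lines pass through unchanged
    return "\n".join(out) + "\n"


def rewrite_file(in_path: str, out_path: str, suffix: str = ".seq") -> None:
    """Hypothetical helper: read one FASTA file, write the renamed copy."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(strip_header_suffix(src.read(), suffix))


if __name__ == "__main__":
    print(strip_header_suffix(">SeqID20.seq\nACGT\n"), end="")
```

Because it never parses the sequences, this approach cannot hit the SeqRecord error at all; the BioPython route is preferable when you also need to inspect or filter the sequences.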

Biopython: No module named Bio

a 夏天, submitted on 2020-06-27 06:49:11
Question: FYI, this is NOT a duplicate! Before running my Python code I installed Biopython from the cmd prompt:

pip install biopython

I then get the error 'No module named Bio' when I try to import it in Python:

import Bio

The same thing happens with import biopython. I should note that I have updated pip and am running Python 3.5.2. I appreciate anyone's help.

Answer 1: Use this:

pip3 install biopython

and then import Bio worked for me.

Answer 2: When I came across this problem I noticed that after I installed
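
This error usually means `pip` installed the package into a different interpreter than the one running the script (for example a Python 2 `pip` next to Python 3.5.2). Running `python -m pip install biopython` with the same `python` you use for the script avoids the mismatch. Note also that the distribution is named `biopython` but the importable package is `Bio`, so `import biopython` fails even when the install succeeded. A minimal diagnostic sketch (the helper name is hypothetical):

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Install packages into this exact interpreter, e.g.
    #   <path printed below> -m pip install biopython
    print("Interpreter:", sys.executable)
    print("Bio importable:", module_available("Bio"))
```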

Error message: h5py.h5py_warnings.H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead

一个人想着一个人, submitted on 2020-06-15 05:53:10
Question: I'm trying to run mbin for methylation analysis, but on several attempts I get the following error message while trying to extract control IPDs with buildcontrols:

h5py.h5py_warnings.H5pyDeprecationWarning: dataset.value has been deprecated. Use dataset[()] instead.

Environment:
mbin version: 1.1.1
Python version: 2.7.12
Operating system: CentOS, running under virtualenv

I suspected it was, again, caused by a version mismatch. What I've tried: I tried on a server with both Python 3 and 2, and specified virtualenv to use

How to conditionally count and record if a sample appears in rows of another dataset?

家住魔仙堡, submitted on 2020-05-30 09:44:36
Question: I have a genetic dataset of IDs (dataset1) and a dataset of IDs that interact with each other (dataset2). I am trying to count the IDs in dataset1 that appear in either of the 2 interaction columns in dataset2, and also to record the interacting (matching) IDs in a 3rd column.

Dataset1:
ID
1
2
3

Dataset2:
Interactor1  Interactor2
1            5
2            3
1            10

Output:
ID  InteractionCount  Interactors
1   2                 5, 10
2   1                 3
3   1                 2

So the output contains all IDs of dataset1 and a count of how many times those IDs also appear in either
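
The question does not name a language, so here is a minimal Python sketch of the logic, with plain lists standing in for the two datasets (a data-frame version would follow the same steps). The idea is to index every interaction pair in both directions, so an ID is found whichever column it sits in, and its partner is the value in the other column. The function name is hypothetical.

```python
from collections import defaultdict


def count_interactions(ids, pairs):
    """Return (ID, interaction count, comma-separated partners) for each ID."""
    partners = defaultdict(list)
    for a, b in pairs:
        partners[a].append(b)
        if a != b:  # a self-interaction is recorded only once
            partners[b].append(a)
    rows = []
    for i in ids:
        found = partners.get(i, [])  # IDs with no interactions get 0 and ""
        rows.append((i, len(found), ", ".join(str(p) for p in found)))
    return rows


if __name__ == "__main__":
    dataset1 = [1, 2, 3]
    dataset2 = [(1, 5), (2, 3), (1, 10)]
    for row in count_interactions(dataset1, dataset2):
        print(*row, sep="\t")
```

On the example data this reproduces the desired output table row for row.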

Python: How to encode DNA sequence using binary values?

早过忘川, submitted on 2020-05-18 18:38:46
Question: I would like to convert a file containing a few DNA sequences into binary values, using the following encoding:

A=1000
C=0100
G=0010
T=0001

FileA.txt:
CCGAT
GCTTA

Desired output:
01000100001010000001
00100100000100011000

I have tried the code below, but the binary output file does not contain the output I want. Can anyone help me?

Code:

import sys
if len(sys.argv) != 2 :
    sys.stderr.write('Usage: {} <nucleotide file>\n'.format(sys.argv[0]))
    sys.exit()
# assumes the file only contains
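
A minimal sketch of the encoding, assuming one sequence per line and only the bases A, C, G and T (anything else, such as N, raises a KeyError). It keeps the question's usage check but prints the encoded lines instead of writing a separate file; redirect stdout to save them.

```python
import sys

# 4-bit code per base, as given in the question.
ENCODING = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def encode(seq: str) -> str:
    """Concatenate the 4-bit code of every base in one sequence line."""
    return "".join(ENCODING[base] for base in seq.strip().upper())


def main() -> None:
    if len(sys.argv) != 2:
        sys.stderr.write("Usage: {} <nucleotide file>\n".format(sys.argv[0]))
        sys.exit(1)
    with open(sys.argv[1]) as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                print(encode(line))


if __name__ == "__main__":
    main()
```

For FileA.txt this prints the two desired lines, since each base maps to exactly four characters and the codes are simply joined in order.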

How to group values into smaller sets in a data set?

会有一股神秘感。, submitted on 2020-05-17 05:49:14
Question: Basically, I have a single-column data set of 53 values. What I am trying to achieve is binning them into sets 400 points wide, ranging from ~500 to 4500. You can be vague if needed and just name a function for doing so; I can work out the rest.

Answer 1: A dplyr option:

library(dplyr)
df_test <- data.frame(x = runif(1000, 400, 5000), y = rep("A", 1000))
df_test <- df_test %>%
  mutate(bins = case_when(between(x, 400, 800) ~ "Set 1",
                          between(x, 801, 1600) ~ "Set 2",
                          between(x, 1601,
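
Writing one `case_when` branch per bin gets long, and `between()` limits are inclusive, so adjacent ranges need care at the edges. For equal-width bins the bin index can be computed arithmetically instead; R's `cut()` or `pandas.cut` do the same with explicit breaks. A minimal Python sketch, with `start=500` and `width=400` taken from the question and a hypothetical function name; bins are half-open, so 4500 itself starts a new bin `[4500, 4900)`.

```python
from collections import defaultdict


def bin_values(values, start=500, width=400):
    """Group values into half-open bins [lo, lo + width) counted from `start`."""
    bins = defaultdict(list)
    for v in values:
        k = int((v - start) // width)  # index of the bin this value falls in
        lo = start + k * width
        bins[(lo, lo + width)].append(v)
    return dict(sorted(bins.items()))  # bins ordered by lower edge


if __name__ == "__main__":
    for (lo, hi), members in bin_values([512, 899, 900, 1350, 4480]).items():
        print(f"[{lo}, {hi}): {members}")
```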