grep

Keep FASTA records that have 2 matches of OX values

╄→гoц情女王★ submitted on 2020-05-16 22:05:49
Question: I have a file that looks as follows:

    >sp|rin-1 ghsfdhjkuesl OX=10116 GN=Cdh1 PE=1 SV=1|sp|P10287|ghsfdjdeosd gdhkhs OX=10090 GN=Cdh3 PE=1 SV=2
    WRDTANWLEINPETGVISTRAEMDREDSEHVKNSTYTALIIATDDGSPIATGTGTLLLVLSDVNDNAPIPEPRNMQFCQRNPKPHVITILDPDLPP
    >sp|erin-1 ghsfdshkd OX=10116 GN=Cdh1 PE=1 SV=1|sp|P22223|CADH3_HUMAN Cadherin-3 OX=9606 GN=CDH3 PE=1 SV=2
    ESYPTYTLVVQAADLQGEGLSTTAKAVITVKDINDNAPIFNPSTYLQCAASEPCRAVFREAEVTLEAGGAEQEPGQALGKVFMGCPGQEPALFSTD
    >sp|n-1 ghsfd OX=10116 GN=Cdh1 PE=1 SV=1|tr|F1LMI3
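The question is cut off before it states the exact rule, so the following is only a sketch under one reading of the title: keep a record when its header line contains exactly two OX=<number> fields. The names read_fasta and keep_two_ox and the wanted parameter are hypothetical, not from the original post.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: an OX value is written as "OX=" followed by digits.
OX_FIELD = re.compile(r"\bOX=(\d+)")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) pairs from FASTA-formatted lines."""
    header = None
    seq: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def keep_two_ox(lines: Iterable[str], wanted: int = 2) -> Iterator[str]:
    """Yield the lines of records whose header has exactly `wanted` OX fields."""
    for header, seq in read_fasta(lines):
        if len(OX_FIELD.findall(header)) == wanted:
            yield header
            yield from seq
```

If the intended rule is different (for example, both OX values must be equal), only the condition inside keep_two_ox would need to change.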

Search for Unicode values in a character string

老子叫甜甜 submitted on 2020-05-15 04:49:17
Question: I am trying to identify unique Unicode values in a data frame composed of character strings. I have tried using the grep function, but I get the following error:

    Error: '\U' used without hex digits in character string starting ""\U"

An example data frame:

       time                sender message
    1  2012-12-04 13:40:00 1      Hello handsome!
    2  2012-12-04 13:40:08 1      \U0001f618
    3  2012-12-04 14:39:24 1      \U0001f603
    4  2012-12-04 16:04:25 2      <image omitted>
    73 2012-12-05 06:02:17 1      Haha not white and blue... White with
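The question is about R, so the following is not a fix for that grep call. It is a Python sketch of the underlying idea: walk the strings and collect every distinct character outside the ASCII range. It assumes the messages hold the actual characters (not the literal text "\U0001f618"); the function name unique_non_ascii is hypothetical.

```python
from typing import Iterable, List


def unique_non_ascii(messages: Iterable[str]) -> List[str]:
    """Return sorted 'U+XXXX' labels for every distinct non-ASCII character."""
    seen = set()
    for text in messages:
        for ch in text:
            if ord(ch) > 127:  # anything above 127 is outside ASCII
                seen.add(ord(ch))
    return [f"U+{cp:04X}" for cp in sorted(seen)]


# Sample messages taken from the data frame shown in the question.
messages = ["Hello handsome!", "\U0001f618", "\U0001f603", "<image omitted>"]
print(unique_non_ascii(messages))  # ['U+1F603', 'U+1F618']
```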

Using grep in Python

╄→гoц情女王★ submitted on 2020-05-13 04:35:44
Question: There is a file (query.txt) containing keywords/phrases that are to be matched against other files using grep. The last three lines of the following code work perfectly, but when the same command is used inside the while loop it hangs (it goes into an infinite loop, or at least stops responding).

    import os
    f=open('query.txt','r')
    b=f.readline()
    while b:
        cmd='grep %s my2.txt'%b #my2 is the file in which we are looking for b
        os.system(cmd)
        b=f.readline()
    f.close()
    a='He is'
    cmd='grep %s my2.txt
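A likely cause, as far as the truncated post allows a diagnosis: readline() keeps the trailing newline, so the shell receives the command split over two lines, and a bare grep with a pattern but no file name waits for standard input. The sketch below strips the newline and passes the arguments as a list, so no shell quoting is involved. The helper names are hypothetical, and the -F flag (fixed-string matching) is an assumption; drop it if the queries are meant to be regular expressions.

```python
import subprocess
from typing import Iterable, List


def clean_queries(lines: Iterable[str]) -> List[str]:
    """Strip the trailing newline from each line and drop empty lines."""
    queries = []
    for line in lines:
        query = line.rstrip("\r\n")
        if query:
            queries.append(query)
    return queries


def grep_command(query: str, path: str) -> List[str]:
    """Build the argument list for one grep call.

    -F treats the query as a fixed string (an assumption, see above);
    -e marks it as the pattern even if it starts with '-'.
    """
    return ["grep", "-F", "-e", query, path]


def grep_all(query_file: str, target: str) -> None:
    """Run grep once per query; matches go to standard output."""
    with open(query_file) as handle:
        for query in clean_queries(handle):
            # Arguments are passed as a list, so a phrase such as
            # 'He is' reaches grep as a single pattern.
            subprocess.run(grep_command(query, target), check=False)
```

With the files from the question, this would be called as grep_all("query.txt", "my2.txt").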

What's the difference between [:space:] and [:blank:]?

懵懂的女人 submitted on 2020-05-12 12:05:08
Question: From "A Brief Introduction to Regular Expressions":

    [:blank:] matches a space or a tab.
    [:space:] matches whitespace characters (space and horizontal tab).

To me both definitions read the same, and I was wondering whether they really are duplicates. If they are different, what are the differences?

Answer 1: For the GNU tools, the following from grep.info applies:

    [:blank:] Blank characters: space and tab.
    [:space:] Space characters: in the 'C' locale, this is tab, newline, vertical tab, form feed,
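To make the difference concrete, here is a small sketch that writes both classes out as explicit sets. The contents are the POSIX definitions for the C locale (not read from grep itself), and the names BLANK and SPACE are just labels for this example.

```python
# POSIX character classes in the C locale, written out as explicit sets.
BLANK = {" ", "\t"}                          # [:blank:]
SPACE = {" ", "\t", "\n", "\v", "\f", "\r"}  # [:space:]

# Every blank character is also a space character, but not the reverse.
only_in_space = sorted(SPACE - BLANK)

print(BLANK <= SPACE)  # True
print([repr(ch) for ch in only_in_space])
```

So [:blank:] is the horizontal subset of [:space:]: the characters that differ are the ones that end or break a line.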

How to search for non-ASCII characters with bash tools?

别等时光非礼了梦想. submitted on 2020-05-09 19:08:33
Question: I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find non-ASCII characters in a file with sed or similar tools in Bash on Linux?

Answer 1: Try:

    nonascii() { LANG=C grep --color=always '[^ -~]\+'; }

which can be used like:

    printf 'ŨTF8\n' | nonascii

Within [], ^ means "not", so [^ -~] means characters not between space and ~. Control characters aside, this matches non-ASCII characters, and is a more portable though slightly less accurate version of [^
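The same character class can be tried out in Python, which may help when checking what the pattern does and does not catch. This is only a sketch: the function name find_non_ascii is hypothetical, and it works on decoded text, whereas the grep call with LANG=C works on bytes.

```python
import re
from typing import Iterable, List, Tuple

# Same class as in the grep call: anything outside the range space..tilde.
NOT_PRINTABLE_ASCII = re.compile(r"[^ -~]+")


def find_non_ascii(lines: Iterable[str]) -> List[Tuple[int, List[str]]]:
    """Return (line_number, matched_runs) for each line with a match."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        runs = NOT_PRINTABLE_ASCII.findall(line)
        if runs:
            hits.append((number, runs))
    return hits


print(find_non_ascii(["plain text\n", "ŨTF8\n"]))  # [(2, ['Ũ'])]
```

Note that a tab is also outside the space-to-tilde range, so a line containing one is reported as well; that is the "slightly less accurate" part.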

How to pick multiple FASTA sequences from a gene list

ぐ巨炮叔叔 submitted on 2020-05-09 07:55:32
Question: I have two files.

The gene list file looks like this:

    LOC_Os06g12230.1
    Pavir.Ab03005
    Pavir.J14065
    ChrUn.fgenesh
    Sevir.1G325700
    LOC_Os02g51280.1
    Bradi3g59320
    Brast04G017400

The FASTA sequence file looks like this:

    >LOC_Os03g57190.1 pacid=33130570 polypeptide=LOC_Os03g57190.1 locus=LOC_Os03g57190 ID=LOC_Os03g57190.1.MSUv7.0 annot-version=v7.0
    ATGGAGGCGGCGGTGGGGGACGGGGAAGGCGGTGGCGGCGGCGGCGGGCGGGGGAAGCGTGGGCGGGGAGGAGGAGGAGG
    GGAGATGGTGGAGGCGGTGTGGGGGCAGACGGGGAGTACGGCGTCGCGGATCTACAGGGTGAGGGCGACGGGGGGGAAGG
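The post is cut off before it states the goal in full, so this is a sketch under the assumption that the task is to print every FASTA record whose ID appears in the gene list, where the ID is taken to be the first whitespace-delimited token after ">". The names load_ids and pick_records are hypothetical.

```python
from typing import Iterable, Iterator, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect gene IDs, whether they come one per line or space-separated."""
    return {token for line in lines for token in line.split()}


def pick_records(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield header and sequence lines of records whose ID is in `wanted`."""
    keep = False
    for raw in fasta_lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the record ID is the first token of the header.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line
```

Both functions accept any iterable of lines, so open file handles can be passed directly, for example pick_records(open("seqs.fa"), load_ids(open("genes.txt"))), where both file names are placeholders.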
