Regular Expression for String representing DNA code

慢半拍i 2021-01-22 21:42

Hello, I am trying to use regular expressions in a Java program. I would like the regex to identify a String of unknown length whose characters are only 'C', 'A', 'G'

3 Answers
  • 2021-01-22 22:00

    Easy, just use a character class:

    [CAGT]+
    

    Or, if the entire string must consist only of the characters CAGT for it to match:

    ^[CAGT]+$
    
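
Since the question is about Java, here is a minimal sketch of how the two patterns above might be applied there. The class and method names (`DnaMatchDemo`, `isDna`, `containsDna`) are made up for illustration. Note that `Matcher.matches()` already requires the whole input to match, so the `^` and `$` anchors are only needed when using `find()`.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaMatchDemo {
    // Compiled once and reused; Pattern objects are immutable and thread-safe.
    private static final Pattern DNA = Pattern.compile("[CAGT]+");

    /** True if s is non-empty and consists only of C, A, G, T. */
    static boolean isDna(String s) {
        // matches() succeeds only if the ENTIRE input matches the pattern,
        // which is equivalent to find() with ^[CAGT]+$.
        return DNA.matcher(s).matches();
    }

    /** True if s contains at least one C, A, G or T anywhere. */
    static boolean containsDna(String s) {
        // find() looks for a match anywhere in the input (unanchored).
        return DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isDna("GATTACA"));        // true
        System.out.println(isDna("GATTXACA"));       // false
        System.out.println(containsDna("xxGATTxx")); // true
        System.out.println(isDna(""));               // false
    }
}
```

For a one-off check, `"GATTACA".matches("[CAGT]+")` does the same as `isDna`, but it recompiles the pattern on every call.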
  • 2021-01-22 22:03

    Adding to the above:

    ^[CAGTcagt]+$
    

    This ensures that both lowercase and uppercase characters are detected.

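
In Java, an alternative to listing both cases in the character class is the `Pattern.CASE_INSENSITIVE` flag. A rough sketch, with the class and method names (`DnaIgnoreCaseDemo`, `isDnaAnyCase`) invented for this example:

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaIgnoreCaseDemo {
    // CASE_INSENSITIVE makes [CAGT] also accept c, a, g and t.
    private static final Pattern DNA_ANY_CASE =
            Pattern.compile("[CAGT]+", Pattern.CASE_INSENSITIVE);

    /** True if s is non-empty and consists only of C, A, G, T in either case. */
    static boolean isDnaAnyCase(String s) {
        return DNA_ANY_CASE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDnaAnyCase("GattACA")); // true
        System.out.println(isDnaAnyCase("gattaca")); // true
        System.out.println(isDnaAnyCase("gattuca")); // false
    }
}
```

The inline form `(?i)[CAGT]+` should behave the same way and also works with `String.matches`.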
  • 2021-01-22 22:11

    I disagree with the top-voted answer. With [ACGT]+, a very long string can lead to high memory usage. I would use a negated character class instead and check that the string contains no character outside [ACGT]:

    str !~ [^ACGTacgt]
    
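
Java has no `!~` operator, so the negated check above would have to be spelled out with `Matcher.find()`. A minimal sketch, assuming made-up names (`DnaNegatedDemo`, `hasOnlyDnaChars`); it illustrates the technique only and does not measure the memory claim:

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class DnaNegatedDemo {
    // Matches any single character that is NOT one of A, C, G, T (either case).
    private static final Pattern NON_DNA = Pattern.compile("[^ACGTacgt]");

    /** True if s contains no character outside ACGTacgt. */
    static boolean hasOnlyDnaChars(String s) {
        // find() stops at the first offending character, if any.
        return !NON_DNA.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOnlyDnaChars("GATTaca"));  // true
        System.out.println(hasOnlyDnaChars("GATT-ACA")); // false
        System.out.println(hasOnlyDnaChars(""));         // true
    }
}
```

One behavioral difference to keep in mind: the negated check accepts the empty string, whereas `^[CAGT]+$` rejects it, so an extra `!s.isEmpty()` test is needed if empty input should fail.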