Swap letters in a string

我寻月下人不归 2021-01-12 12:59

I need to swap letters in a string with the following rules:

  • A is replaced by T
  • T is replaced by A
  • C is replaced by G
  • G is replaced by C
4 Answers
  •  迷失自我
    2021-01-12 13:37

    DNA has a small alphabet, so you can use a lookup table and replace the conditional statements with simple array indexing.

    This approach:

    • Traverses the sequence only once.
    • Eliminates the conditional statements (no if/else or switch per base).
    • Preserves letter case, which is sometimes used to convey information in DNA sequences.
    • Can handle IUPAC ambiguity codes.
    • Can handle gaps.
    • Can easily provide a reverse complement.

    First, you need a lookup table.

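    import java.nio.charset.StandardCharsets; // goes at the top of the source file

    // One entry per 7-bit ASCII code: the character at index c is the complement of c.
    // Characters with no defined complement map to a space.
    private static final String COMPLEMENT_TABLE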
      // 0123456789ABCDEF0123456789ABCDEF
      = "                                " // 0-31
      + "             -                  " // 32-63
      + " TVGH  CD  M KN   YSAABWXR      " // 64-95
      + " tvgh  cd  m kn   ysaabwxr      "; // 96-127
      //  ABCDEFGHIJKLMNOPQRSTUVWXYZ
    
    private static final byte[] COMPLEMENT_TABLE_BYTES 
      = COMPLEMENT_TABLE.getBytes( StandardCharsets.US_ASCII );
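
    The string literal only works if every character sits at exactly the right column, so a misplaced space silently shifts the mapping. As a minimal sketch of an alternative (the class name ComplementTableBuilder and the FROM/TO constants are hypothetical, not part of the code above), the same 128-entry table could be filled from explicit pairs:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class name; builds the same mapping as the string literal above.
final class ComplementTableBuilder {
    // FROM.charAt(i) is complemented to TO.charAt(i).
    // Covers the four bases, U, the IUPAC ambiguity codes, X, and the gap character.
    private static final String FROM = "ACGTUMRWSYKVHDBNX-";
    private static final String TO   = "TGCAAKYWSRMBDHVNX-";

    static byte[] buildTable() {
        byte[] table = new byte[128];
        Arrays.fill(table, (byte) ' '); // unsupported characters map to a space
        for (int i = 0; i < FROM.length(); i++) {
            char from = FROM.charAt(i);
            char to = TO.charAt(i);
            table[from] = (byte) to;
            // Mirror each letter in lower case; toLowerCase leaves '-' unchanged.
            table[Character.toLowerCase(from)] = (byte) Character.toLowerCase(to);
        }
        return table;
    }

    public static void main(String[] args) {
        byte[] table = buildTable();
        // Print the upper-case row (codes 64-95) to compare it with the literal.
        System.out.println(new String(table, 64, 32, StandardCharsets.US_ASCII));
    }
}
```

    The trade-off is a few lines of setup code in exchange for not having to count columns by hand.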
    

    Then the complement of each base is a single table lookup.

    public static byte[] complement( byte[] sequence ) {
        int length = sequence.length;
        byte[] result = new byte[ length ];
    
        for ( int i = 0; i < length; ++i ) {
            // Index the table by the ASCII code of the base (input must be 7-bit ASCII).
            result[i] = COMPLEMENT_TABLE_BYTES[ sequence[i] ];
        }
    
        return result;
    }
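
    One caveat with the byte-array version: Java bytes are signed, so a byte outside 0-127 becomes a negative index and the lookup throws ArrayIndexOutOfBoundsException. A minimal sketch of a checked variant follows; complementChecked and the reduced four-base table are assumptions made for illustration, not the code from this answer:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class; uses a reduced table (four bases plus gap) to stay short.
final class SafeComplement {
    private static final byte[] TABLE = buildTable();

    private static byte[] buildTable() {
        byte[] table = new byte[128];
        Arrays.fill(table, (byte) ' ');
        String from = "ACGTacgt-";
        String to   = "TGCAtgca-";
        for (int i = 0; i < from.length(); i++) {
            table[from.charAt(i)] = (byte) to.charAt(i);
        }
        return table;
    }

    // Same lookup as complement(), but rejects bytes the table cannot index.
    static byte[] complementChecked(byte[] sequence) {
        byte[] result = new byte[sequence.length];
        for (int i = 0; i < sequence.length; ++i) {
            byte b = sequence[i];
            if (b < 0) { // values 128-255 appear as negative bytes in Java
                throw new IllegalArgumentException("Non-ASCII byte at index " + i);
            }
            result[i] = TABLE[b];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] out = complementChecked("ACGTA".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(out, StandardCharsets.US_ASCII));
    }
}
```

    The String overloads do not need this check, because encoding a String as US_ASCII replaces unmappable characters with '?', which is a valid table index.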
    

    For convenience with small sequences, you can also provide an overload that accepts and returns a String.

    public static String complement( String sequence ) {
        byte[] complementBytes = complement( 
          sequence.getBytes( StandardCharsets.US_ASCII ));
        return new String( complementBytes, StandardCharsets.US_ASCII );
    }
    

    The reverse complement can be computed in the same kind of loop by writing each complemented base to the mirrored index.

    public static byte[] reverseComplement( byte[] sequence ) {
        int length = sequence.length;
        byte[] result = new byte[ length ];
    
        for ( int i = 0; i < length; ++i ) {
            result[ length - 1 - i ] = COMPLEMENT_TABLE_BYTES[ sequence[i] ];
        }
    
        return result;
    }
    
    public static String reverseComplement( String sequence ) {
        byte[] complementBytes = reverseComplement( 
          sequence.getBytes( StandardCharsets.US_ASCII ));
        return new String( complementBytes, StandardCharsets.US_ASCII );
    }
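
    For large sequences, the separate result array doubles the memory needed. As a sketch under the assumption that the caller is allowed to overwrite the input, the reverse complement could also be computed in place by swapping complemented bases from both ends; reverseComplementInPlace and the reduced four-base table are hypothetical names used only for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class; uses a reduced table (four bases plus gap) to stay short.
final class InPlaceReverseComplement {
    private static final byte[] TABLE = buildTable();

    private static byte[] buildTable() {
        byte[] table = new byte[128];
        Arrays.fill(table, (byte) ' ');
        String from = "ACGTacgt-";
        String to   = "TGCAtgca-";
        for (int i = 0; i < from.length(); i++) {
            table[from.charAt(i)] = (byte) to.charAt(i);
        }
        return table;
    }

    // Overwrites the input with its reverse complement (7-bit ASCII input assumed).
    static void reverseComplementInPlace(byte[] sequence) {
        int left = 0;
        int right = sequence.length - 1;
        while (left < right) {
            // Complement both ends and swap them.
            byte tmp = TABLE[sequence[left]];
            sequence[left] = TABLE[sequence[right]];
            sequence[right] = tmp;
            left++;
            right--;
        }
        if (left == right) {
            // Odd length: the middle base stays in place but is still complemented.
            sequence[left] = TABLE[sequence[left]];
        }
    }

    public static void main(String[] args) {
        byte[] sequence = "ACGTA".getBytes(StandardCharsets.US_ASCII);
        reverseComplementInPlace(sequence);
        System.out.println(new String(sequence, StandardCharsets.US_ASCII));
    }
}
```

    For the input ACGTA this yields TACGT, the same result as the allocating version.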
    

    Using your example sequence:

    public static void main(String[] args) {
        String sequence = "ACGTA";
    
        String complementSequence = complement( sequence );
        System.out.printf( 
           "complement(%s) = %s%n", sequence, complementSequence );
    
        String reverseComplementSequence = reverseComplement( sequence );
        System.out.printf( 
          "reverseComplement(%s) = %s%n", sequence, reverseComplementSequence );
    }
    

    We get this output:

    complement(ACGTA) = TGCAT
    reverseComplement(ACGTA) = TACGT
    
