Random sampling of non-overlapping substrings of length k

前端 未结 2 1660
野的像风
野的像风 2021-02-10 06:28

Given a string of length n, how would I (pseudo)randomly sample m substrings of size k such that none of the sampled substrings overlap? Most of my sc

2条回答
  •  自闭症患者
    2021-02-10 07:01

    If there is a character that cannot occur in the input, e.g. X, just:

    my $size = 20;
    my $count = 20;
    my $mark = 'X';
    my $input = 'CCACGCATTTTTGTTCATTGTTCTGGCTTCTTACAAGGTTCAGTAGACTTTGTAACACAGTTGTGTCTCTCACAGATTGGCAGATGTTTGGTAAAGGATTGACTTTTCAGCCAACTCATGGGAAAGTGAAATAATGTAAAAAACAGGAAGAATACAGTTTTAGGCCTTTCAAGTGAGGCATGGCTTTCAGCTCTTGGCAAGAACAGGCAAGGAGATGCAAGTTTTAGGACTCTAAGAGGCTAGGCTTTTCAAAGTGCTTCTCTCCCCTTCACCCTCCTTCAGTTACAGCACCAAGCACCACCGAGGTGTTACCTGCAGCCTCACTCTCTACCTGGTTGTGGGATCCTGCCACTTCCTTAACCCACACTGAGTTCCTTGTGGTTCACAGGGTCACACAGAGGGCTGTAGAGATACAAAAGATATATGTGATTTTATATCACCTATCATATGAAGATATATTTATAAAATAGGAAACATATTAACCACTTATCATTTTATATATTTATGGTTTTATGTGTCAAAAATATATTGTTTCATGTATGTATTAAAGGATAAGTATGTATAAGAGGTTTTATAGATGTGTAAAATTATATATTTATACGTATCTTTACAAATTTAAGAATAAAGGAAGGAAAATTCTCAAAGAGGAATTCAGATATCAAGCAGTGCCCTTTGACCAAGAGCCTTGGTTACAACATACCTACAAAAGTGAACTATCATTGAAAGACCTATGGACACTGGATTTCTCTTTCCTTATTTAGAAGGGCAGTCTGTGTCTTGGAAAAGCATACAGTTTGTTGTATCTTGCTGGACAACAGGAGTCA';
    
    if (2*$size*$count-$size-$count >= length($input)) {
        die "selection may not complete; choose a shorter length or fewer substrings, or provide a longer input string\n";
    }
    
    my @substrings;
    while (@substrings < $count) {
        my $pos = int rand(length($input)-$size+1);
        push @substrings, substr($input, $pos, $size, $mark x $size)
            if substr($input, $pos, $size) !~ /\Q$mark/;
    }
    

提交回复
热议问题