Count consecutive characters

别那么骄傲 2020-11-28 06:44

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do

9 answers
  • 2020-11-28 07:17

    This is my simple code for finding the maximum number of consecutive 1s in a binary string in Python 3:

    count = 0
    maxcount = 0
    for ch in bin(13)[2:]:  # bin(13) is '0b1101'; drop the '0b' prefix
        if ch == '1':
            count += 1
            maxcount = max(maxcount, count)  # track the longest run so far
        else:
            count = 0  # a '0' ends the current run
    print(maxcount)  # 2
    
  • 2020-11-28 07:24

    This is my simple and efficient code for finding maximum number of consecutive binary 1's in python:

    def consec(x):
        count=0
        while x!=0:
            x= x & (x<<1)
            count+=1
        return count
    
    n = int(input())
    print(consec(n))
    

    Using Bit Magic: The idea is based on the concept that if we AND a bit sequence with a shifted version of itself, we’re effectively removing the trailing 1 from every sequence of consecutive 1s.

      11101111   (x)
    & 11011110   (x << 1)
    ----------
      11001110   (x & (x << 1)) 
        ^    ^
        |    |
        trailing 1 removed

    So the operation x = (x & (x << 1)) shortens every sequence of 1s in the binary representation of x by one. If we keep applying it in a loop, we end up with x = 0. The number of iterations needed to reach 0 is the length of the longest consecutive sequence of 1s.
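
    To make the shrinking visible, here is a minimal sketch that records x after each step. The names longest_run_of_ones and trace are hypothetical, and the check for negative input is an added assumption: Python integers have no fixed width, so a negative x would never reach 0.

```python
def longest_run_of_ones(x: int) -> int:
    """Length of the longest run of 1 bits in a non-negative integer."""
    if x < 0:
        raise ValueError("expects a non-negative integer")
    count = 0
    while x:
        x &= x << 1  # every run of 1s loses its lowest bit
        count += 1
    return count


def trace(x: int) -> list:
    """Binary form of x before the first step and after each step."""
    steps = [bin(x)]
    while x:
        x &= x << 1
        steps.append(bin(x))
    return steps


print(trace(0b11101111))
# ['0b11101111', '0b11001110', '0b10001100', '0b1000', '0b0']
print(longest_run_of_ones(0b11101111))  # 4
```

    The loop runs once per bit of the longest run, not once per bit of x.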

  • 2020-11-28 07:26

    Consecutive counts:

    Ooh nobody's posted itertools.groupby yet!

    s = "111000222334455555"
    
    from itertools import groupby
    
    groups = groupby(s)
    result = [(label, sum(1 for _ in group)) for label, group in groups]
    

    After which, result looks like:

    [("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
    

    And you could format with something like:

    ", ".join("{}x{}".format(label, count) for label, count in result)
    # "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
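
    Wrapped as functions, the groupby approach might look like the sketch below. The names run_lengths and format_runs are hypothetical. Note that groupby only merges adjacent equal items, so a character that reappears later starts a new group.

```python
from itertools import groupby


def run_lengths(s):
    """Return [(item, run_length), ...] for each run of adjacent equal items."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]


def format_runs(runs):
    """Render runs as 'label x count' pairs, e.g. '1x3, 0x3'."""
    return ", ".join(f"{label}x{count}" for label, count in runs)


print(run_lengths("111000222334455555"))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
print(format_runs(run_lengths("aabaa")))
# ax2, bx1, ax2
```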
    

    Total counts:

    Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:

    from collections import Counter
    
    s = "11100111"
    result = Counter(s)
    # {"1":6, "0":2}
    

    Your method:

    As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so i+1 raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more pythonic to generate something to iterate over.
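
    The question's original loop is not reproduced here, so the following is only a sketch of what the range(len(s)-1) fix could look like; run_lengths_indexed is a hypothetical name. Stopping one index early keeps s[i + 1] in range, and the final run is closed after the loop.

```python
def run_lengths_indexed(s):
    """Index-based run counting that never reads past the end of s."""
    if not s:
        return []
    counts = []
    count = 1
    for i in range(len(s) - 1):  # last i is len(s) - 2, so s[i + 1] is valid
        if s[i] == s[i + 1]:
            count += 1
        else:
            counts.append((s[i], count))
            count = 1
    counts.append((s[-1], count))  # the final run is still open here
    return counts


print(run_lengths_indexed("111000222334455555"))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```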

    For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:

    counts = []
    count = 1
    for a, b in zip(s, s[1:]):
        if a==b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    

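    The only problem is that the final run is never appended inside the loop, so you would have to special-case it afterwards. That can be fixed with itertools.zip_longest, which pads the shorter input so the last character is compared against the fill value: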

    import itertools
    
    counts = []
    count = 1
    for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
        if a==b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    

    If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.

    def pairwise(iterable):
        """iterates pairwise without holding an extra copy of iterable in memory"""
        a, b = itertools.tee(iterable)
        next(b, None)
        return itertools.zip_longest(a, b, fillvalue=None)
    
    counts = []
    count = 1
    for a, b in pairwise(s):
        ...
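
    One possible way to use such a padded pairwise helper lazily is sketched below. It is an assumption, not the answer's own code: pairwise_padded, iter_run_lengths and the _END sentinel are hypothetical names, and the sentinel replaces None as the fill value so that data containing None is still handled.

```python
import itertools

_END = object()  # hypothetical sentinel; cannot equal any real element


def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with _END."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=_END)


def iter_run_lengths(iterable):
    """Lazily yield (item, run_length) for any iterable, not just strings."""
    count = 1
    for a, b in pairwise_padded(iterable):
        if a == b:
            count += 1
        else:
            yield a, count
            count = 1


# Works on a generator, which cannot be sliced like s[1:].
print(list(iter_run_lengths(c for c in "aabccc")))
# [('a', 2), ('b', 1), ('c', 3)]
```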
    
  • 2020-11-28 07:26

    Totals (without sub-groupings)

    #!/usr/bin/python3 -B
    
    charseq = 'abbcccffffdd'
    distros = { c:1 for c in charseq  }
    
    for c in range(len(charseq)-1):
        if charseq[c] == charseq[c+1]:
            distros[charseq[c]] += 1
    
    print(distros)
    

    I'll provide a brief explanation for the interesting lines.

    distros = { c:1 for c in charseq  }
    

    The line above is a dictionary comprehension. It iterates over the characters in charseq and creates one key/value pair per distinct character, where the key is the character and the value starts at 1, since every character present has been seen at least once.

    Then comes the loop:

    for c in range(len(charseq)-1):
    

    The loop runs over indices 0 through len(charseq) - 2, so the c+1 lookup in the loop's body never goes out of bounds.

    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1
    

    At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):

    # replacing vars for their values
    if charseq[1] == charseq[1+1]:
        distros[charseq[1]] += 1
    
    # this is a snapshot of a single comparison here and what happens later
    if 'b' == 'b':
        distros['b'] += 1
    

    You can see the program output for this input below (key order as on Python 3.7+, where dicts preserve insertion order):

    ➜  /tmp  ./counter.py
    {'a': 1, 'b': 2, 'c': 3, 'f': 4, 'd': 2}
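
    One caveat worth checking: this tally equals the true total only when each character occurs in a single run. The sketch below compares it with collections.Counter and with a per-character longest run on an input where a character comes back later. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def adjacent_match_counts(charseq):
    """The tally described above: 1 per character, plus 1 per adjacent equal pair."""
    distros = {c: 1 for c in charseq}
    for i in range(len(charseq) - 1):
        if charseq[i] == charseq[i + 1]:
            distros[charseq[i]] += 1
    return distros


def longest_runs(charseq):
    """Longest run per character, computed with itertools.groupby."""
    best = {}
    for label, group in groupby(charseq):
        best[label] = max(best.get(label, 0), sum(1 for _ in group))
    return best


seq = "aabaa"
print(adjacent_match_counts(seq))  # {'a': 3, 'b': 1}
print(dict(Counter(seq)))          # {'a': 4, 'b': 1}
print(longest_runs(seq))           # {'a': 2, 'b': 1}
```

    For 'aabaa' the tally gives 3 for 'a', which is neither the total (4) nor the longest run (2), because each extra run of a character loses one count.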
    
  • 2020-11-28 07:27

    If we want to count consecutive characters without looping, we can make use of pandas:

    In [1]: import pandas as pd
    
    In [2]: sample = 'abbcccffffddaaaaffaaa'
    In [3]: d = pd.Series(list(sample))
    
    In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
    Out[4]: [('a', 1), ('b', 2), ('c', 3), ('f', 4), ('d', 2), ('a', 4), ('f', 2), ('a', 3)]
    

    The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:

    In [5]: sample = 'abba'
    In [6]: d = pd.Series(list(sample))
    
    In [7]: d.ne(d.shift())
    Out[7]:
    0     True
    1     True
    2    False
    3     True
    dtype: bool
    
    In [8]: d.ne(d.shift()).cumsum()
    Out[8]:
    0    1
    1    2
    2    2
    3    3
    dtype: int32
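
    For readers without pandas, the same two steps (flag each change, then take a running sum) can be sketched in plain Python. This only illustrates what the d.ne(d.shift()).cumsum() expression computes; it does use loops, and run_ids and run_lengths_from_ids are hypothetical names.

```python
from collections import Counter
from itertools import accumulate


def run_ids(seq):
    """Label each element with the 1-based number of the run it belongs to."""
    seq = list(seq)
    # True where an element differs from its predecessor; the first always does.
    changed = [i == 0 or seq[i] != seq[i - 1] for i in range(len(seq))]
    return list(accumulate(int(c) for c in changed))  # running sum of the flags


def run_lengths_from_ids(seq):
    """Group elements by run id and report (element, run_length) per run."""
    seq = list(seq)
    ids = run_ids(seq)
    sizes = Counter(ids)
    first_item = {}
    for rid, item in zip(ids, seq):
        first_item.setdefault(rid, item)
    return [(first_item[rid], sizes[rid]) for rid in sorted(sizes)]


print(run_ids("abba"))               # [1, 2, 2, 3]
print(run_lengths_from_ids("abba"))  # [('a', 1), ('b', 2), ('a', 1)]
```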
    
  • 2020-11-28 07:28

    There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.

    w = "111000222334455555"
    iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
    dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
    cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
    
    print(dw)  # digits
    # ['1', '0', '2', '3', '4', '5']
    print(cw)  # counts
    # [3, 3, 3, 2, 2, 5]
    
    w = 'XXYXYYYXYXXzzzzzYYY'
    iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
    dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
    cw = [ iw[j] - iw[j-1] for j in range(1, len(iw) ) ]
    print(dw)  # characters
    print(cw)  # counts
    
    ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
    [2, 1, 1, 3, 1, 1, 2, 5, 3]
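
    The boundary idea can be packaged as a function, as in the sketch below. run_lengths_by_boundaries is a hypothetical name, and the empty-input guard is an addition, since w[-1] in the snippets above raises an IndexError on an empty string.

```python
def run_lengths_by_boundaries(w):
    """Return (items, counts) by differencing the indices where runs start."""
    if not w:
        return [], []
    # Index of the first element of every run.
    starts = [0] + [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]]
    bounds = starts + [len(w)]  # len(w) closes the last run
    items = [w[i] for i in starts]
    counts = [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    return items, counts


print(run_lengths_by_boundaries("111000222334455555"))
# (['1', '0', '2', '3', '4', '5'], [3, 3, 3, 2, 2, 5])
```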
    