Count consecutive characters


How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do

9 Answers

情书的邮戳

2020-11-28 07:17

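This is my simple code for finding the maximum number of consecutive 1s in a binary string in Python 3:

```
count = 0
maxcount = 0
for i in bin(13):  # bin(13) is the string '0b1101'
    if i == '1':
        count += 1
    else:
        # a run of 1s just ended (or this is the '0b' prefix): record it, then reset
        if count > maxcount:
            maxcount = count
        count = 0
if count > maxcount:  # the string may end in the middle of a run of 1s
    maxcount = count
print(maxcount)
```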


别那么骄傲

2020-11-28 07:24
This is my simple and efficient code for finding the maximum number of consecutive binary 1s in Python:
```
def consec(x):
    count=0
    while x!=0:
        x= x & (x<<1)
        count+=1
    return count

n = int(input())
print(consec(n))
```
Using Bit Magic: The idea is based on the concept that if we AND a bit sequence with a shifted version of itself, we’re effectively removing the trailing 1 from every sequence of consecutive 1s.
```
  11101111   (x)
& 11011110   (x << 1)
----------
  11001110   (x & (x << 1))
    ^    ^
    |    |
   trailing 1 removed
```

So the operation x = (x & (x << 1)) reduces the length of every sequence of 1s by one in the binary representation of x. If we keep applying it in a loop, we end up with x = 0. The number of iterations required to reach 0 is the length of the longest consecutive sequence of 1s.
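To watch the runs shrink step by step, here is a small sketch. The helper names `longest_run_of_ones` and `longest_run_by_string` are illustrative, and the string-based version is only a cross-check added here, not part of the answer above. It assumes a non-negative integer; with a negative Python int the loop would never reach 0.

```python
def longest_run_of_ones(x):
    """Count how often x & (x << 1) can be applied before x becomes 0."""
    count = 0
    while x != 0:
        x &= x << 1  # every run of 1s loses its lowest bit
        count += 1
    return count


def longest_run_by_string(x):
    """Cross-check: split the binary text on '0' and take the longest piece."""
    return max(len(run) for run in bin(x)[2:].split("0"))


# Trace the intermediate values for the example from the diagram.
x = 0b11101111
while x:
    print(format(x, "b"))
    x &= x << 1

print(longest_run_of_ones(0b11101111))    # 4
print(longest_run_by_string(0b11101111))  # 4
```

The trace prints `11101111`, `11001110`, `10001100` and `1000`, that is, four values before reaching 0, matching the longest run of four 1s.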

挽巷

2020-11-28 07:26
Consecutive counts:

Ooh nobody's posted itertools.groupby yet!
```
s = "111000222334455555"

from itertools import groupby

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
```
After which, result looks like:
```
[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
```
And you could format with something like:
```
", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
```
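If you need this in more than one place, the pieces above can be wrapped into small helpers. This is only a sketch: the function names `run_lengths`, `format_runs` and `expand_runs` are made up for illustration. The round trip at the end shows that the (label, count) pairs fully describe the original string.

```python
from itertools import groupby


def run_lengths(s):
    """Return [(char, run_length), ...], one entry per run of equal characters."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]


def format_runs(pairs):
    """Render the pairs as e.g. '1x3, 0x3'."""
    return ", ".join("{}x{}".format(label, count) for label, count in pairs)


def expand_runs(pairs):
    """Inverse of run_lengths: rebuild the string from the pairs."""
    return "".join(label * count for label, count in pairs)


s = "111000222334455555"
pairs = run_lengths(s)
print(pairs)
print(format_runs(pairs))
print(expand_runs(pairs) == s)  # True
print(run_lengths(""))          # [] (an empty string has no runs)
```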
Total counts:

Someone in the comments is concerned that you want a total count of numbers so "11100111" -> {"1":6, "0":2}. In that case you want to use a collections.Counter:
```
from collections import Counter

s = "11100111"
result = Counter(s)
# Counter({"1": 6, "0": 2})
```
Your method:

As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This leads to an off-by-one error when i is pointing at the last index of s, so s[i+1] raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more Pythonic to generate something to iterate over.

For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:
```
counts = []
count = 1
for a, b in zip(s, s[1:]):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
```
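The only problem is that the final run is never appended inside the loop, so you'd have to special-case it afterwards. That can be fixed with itertools.zip_longest, which pairs the last character with the fill value and so forces one more "change":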
```
import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    if a==b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
```
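To make the difference between the two loops concrete, here is a sketch that wraps each one in a function and runs both on the same input. The function names are illustrative only.

```python
from itertools import zip_longest


def runs_with_zip(s):
    """Plain zip: the last run is still pending when the loop ends."""
    counts, count = [], 1
    for a, b in zip(s, s[1:]):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts


def runs_with_zip_longest(s):
    """zip_longest pairs the last character with None, which flushes the last run."""
    counts, count = [], 1
    for a, b in zip_longest(s, s[1:], fillvalue=None):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts


s = "111000222334455555"
print(runs_with_zip(s))          # stops at ('4', 2); the run of 5s is missing
print(runs_with_zip_longest(s))  # also ends with ('5', 5)
```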
If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use the itertools recipe pairwise.
```
def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
```
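One possible way to fill in that loop is a generator, so that neither the input nor the result has to sit in memory at once. This is a sketch under a few assumptions: the names `pairwise_padded` and `iter_runs` are invented here, and it pads with a private sentinel object instead of None so that an input which itself contains None is still handled.

```python
import itertools

_END = object()  # private sentinel; never equal to any real element


def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with _END."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=_END)


def iter_runs(iterable):
    """Lazily yield (item, run_length) for each run of equal items."""
    count = 1
    for a, b in pairwise_padded(iterable):
        if a == b:
            count += 1
        else:
            yield a, count
            count = 1


print(list(iter_runs("111000222334455555")))
# Works on one-shot iterators too, since tee buffers only what is needed.
print(list(iter_runs(iter("aab"))))
```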
星月不相逢

2020-11-28 07:26
Totals (without sub-groupings)
```
#!/usr/bin/python3 -B

charseq = 'abbcccffffdd'
distros = { c:1 for c in charseq  }

for c in range(len(charseq)-1):
    if charseq[c] == charseq[c+1]:
        distros[charseq[c]] += 1

print(distros)
```
I'll provide a brief explanation for the interesting lines.
```
distros = { c:1 for c in charseq  }
```
The line above is a dictionary comprehension. It iterates over the characters in charseq and creates one key/value pair per distinct character, where the key is the character and the value starts at 1 (every character that appears has been seen at least once).

Then comes the loop:
```
for c in range(len(charseq)-1):
```
We go from 0 to length - 1 to avoid going out of bounds with the c+1 indexing in the loop's body.
```
if charseq[c] == charseq[c+1]:
    distros[charseq[c]] += 1
```
At this point, every match we encounter we know is consecutive, so we simply add 1 to the character key. For example, if we take a snapshot of one iteration, the code could look like this (using direct values instead of variables, for illustrative purposes):
```
# replacing vars for their values
if charseq[1] == charseq[1+1]:
    distros[charseq[1]] += 1

# this is a snapshot of a single comparison here and what happens later
if 'b' == 'b':
    distros['b'] += 1
```
You can see the program output below with the correct counts:
```
➜  /tmp  ./counter.py
{'a': 1, 'b': 2, 'c': 3, 'f': 4, 'd': 2}
```

名媛妹妹

2020-11-28 07:27

If we want to count consecutive characters without looping, we can make use of pandas:

```
In [1]: import pandas as pd

In [2]: sample = 'abbcccffffddaaaaffaaa'
In [3]: d = pd.Series(list(sample))

In [4]: [(cat[1], grp.shape[0]) for cat, grp in d.groupby([d.ne(d.shift()).cumsum(), d])]
Out[4]: [('a', 1), ('b', 2), ('c', 3), ('f', 4), ('d', 2), ('a', 4), ('f', 2), ('a', 3)]
```

The key is to find the first elements that are different from their previous values and then make proper groupings in pandas:

```
In [5]: sample = 'abba'
In [6]: d = pd.Series(list(sample))

In [7]: d.ne(d.shift())
Out[7]:
0     True
1     True
2    False
3     True
dtype: bool

In [8]: d.ne(d.shift()).cumsum()
Out[8]:
0    1
1    2
2    2
3    3
dtype: int32
```
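For readers without pandas at hand, the same run-labelling idea can be sketched in plain Python. This is not how pandas implements it; `run_ids` and `run_sizes` are illustrative names. A flag marks each position that differs from its predecessor, and a running sum of those flags gives every run its own id.

```python
from collections import Counter
from itertools import accumulate


def run_ids(seq):
    """Return (starts, ids): starts[i] is True where a new run begins,
    ids[i] is the running total of starts, i.e. the run number of position i."""
    starts = [i == 0 or seq[i] != seq[i - 1] for i in range(len(seq))]
    ids = list(accumulate(int(flag) for flag in starts))
    return starts, ids


def run_sizes(seq):
    """Group positions by (run id, character) and count the group sizes."""
    _, ids = run_ids(seq)
    sizes = Counter(zip(ids, seq))  # keys keep first-seen order
    return [(ch, n) for (_, ch), n in sizes.items()]


starts, ids = run_ids("abba")
print(starts)             # [True, True, False, True]
print(ids)                # [1, 2, 2, 3]
print(run_sizes("abba"))  # [('a', 1), ('b', 2), ('a', 1)]
```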


时光说笑

2020-11-28 07:28

There is no need to count or groupby. Just note the indices where a change occurs and subtract consecutive indices.

```
w = "111000222334455555"
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]  # run boundaries
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]        # character of each run
cw = [iw[j] - iw[j-1] for j in range(1, len(iw))]                       # run lengths

print(dw)  # digits
# ['1', '0', '2', '3', '4', '5']
print(cw)  # counts
# [3, 3, 3, 2, 2, 5]

w = 'XXYXYYYXYXXzzzzzYYY'
iw = [0] + [i+1 for i in range(len(w)-1) if w[i] != w[i+1]] + [len(w)]
dw = [w[i] for i in range(len(w)-1) if w[i] != w[i+1]] + [w[-1]]
cw = [iw[j] - iw[j-1] for j in range(1, len(iw))]

print(dw)  # characters
# ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'z', 'Y']
print(cw)  # counts
# [2, 1, 1, 3, 1, 1, 2, 5, 3]
```
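The boundary idea can also be packaged as a function. The sketch below uses a hypothetical name, `runs_by_boundaries`, and adds a guard for the empty string, which the snippet above does not handle because `w[-1]` would raise an IndexError.

```python
def runs_by_boundaries(w):
    """Return (chars, counts) by differencing the indices where w changes."""
    if not w:
        return [], []
    # Index of the first character of every run after the first one.
    breaks = [i + 1 for i in range(len(w) - 1) if w[i] != w[i + 1]]
    starts = [0] + breaks
    ends = breaks + [len(w)]
    chars = [w[i] for i in starts]
    counts = [end - start for start, end in zip(starts, ends)]
    return chars, counts


print(runs_by_boundaries("111000222334455555"))
print(runs_by_boundaries("XXYXYYYXYXXzzzzzYYY"))
print(runs_by_boundaries(""))
```

Reading the run character at each start index, rather than at each change plus the final character, gives the same list while keeping `chars` and `counts` derived from one set of boundaries.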
