I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example if the string is "*
As far as regular expressions go, I would do exactly as JoshD has suggested, with one improvement: compile the pattern first.
Use:
import re
regex = re.compile(r'\*+')
result = re.sub(regex, "*", string)
This compiles your regex once. Subsequent uses of it in a loop should then make your regex operations faster.
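To make the loop case concrete, here is a minimal sketch that reuses one compiled pattern across many strings. The names STAR_RUN and squeeze_all are made up for this illustration, and how much time it saves in practice is something you would need to measure yourself.

```python
import re

# Compile once at module level; STAR_RUN is a hypothetical name for this sketch.
STAR_RUN = re.compile(r"\*+")


def squeeze_all(lines):
    """Collapse each run of '*' to a single '*' in every string of `lines`."""
    # The compiled pattern object has its own sub() method.
    return [STAR_RUN.sub("*", line) for line in lines]


cleaned = squeeze_all(["a**b", "***", "no stars"])
print(cleaned)  # ['a*b', '*', 'no stars']
```

Calling sub() on the compiled object is equivalent to passing it as the first argument to re.sub.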
I'd suggest using the re module's sub function:
import re
result = re.sub(r"\*+", "*", "***abc**de*fg******h")
I highly recommend reading through the article about RE and good practices. Regular expressions can be tricky if you're not familiar with them. In practice, using raw strings is a good idea.
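As a small illustration of the raw-string advice (the variable names here are just for the example): a raw string passes the backslash through to the regex engine untouched, so r"\*+" is the same string as "\\*+".

```python
import re

sample = "***abc**de*fg******h"

# r"\*+" means: a literal '*' (escaped), repeated one or more times.
result = re.sub(r"\*+", "*", sample)
print(result)  # *abc*de*fg*h

# The raw string and the doubled-backslash spelling are the same three characters.
same_pattern = r"\*+" == "\\*+"
print(same_pattern)  # True
```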
Let's assume, for the sake of this example, that your character is a space.
You can also do it this way:
while True:
    if "  " in pattern:  # if two spaces are in the variable pattern
        pattern = pattern.replace("  ", " ")  # replace two spaces with one
    else:  # otherwise
        break  # break from the infinite while loop
This:
File Type                       : Win32 EXE
File Type Extension             : exe
MIME Type                       : application/octet-stream
Machine Type                    : Intel 386 or later, and compatibles
Time Stamp                      : 2017:04:24 09:55:04-04:00
Becomes:
File Type : Win32 EXE
File Type Extension : exe
MIME Type : application/octet-stream
Machine Type : Intel 386 or later, and compatibles
Time Stamp : 2017:04:24 09:55:04-04:00
I find this is a little easier than having to muck around with the re module, which can get a little annoying sometimes (I think).
Hope that was helpful.
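If you want the same loop for any character, it could be wrapped up roughly like this. The function name squeeze and its parameters are my own invention for the sketch, not something from the standard library.

```python
def squeeze(text, char):
    """Collapse every run of `char` in `text` down to a single `char`."""
    if not char:
        # An empty string is "in" every string, so the loop would never end.
        raise ValueError("char must not be empty")
    double = char * 2
    while double in text:
        # Each pass roughly halves the length of the remaining runs.
        text = text.replace(double, char)
    return text


print(squeeze("a    b  c", " "))             # a b c
print(squeeze("***abc**de*fg******h", "*"))  # *abc*de*fg*h
```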
You wrote:
pattern.replace("*"\*, "*")
You meant:
pattern.replace("\**", "*")
#                ^^^^
You really meant:
pattern_after_substitution = re.sub(r"\*+", "*", pattern)
which does what you wanted.
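The reason for the difference, sketched with a made-up sample string: str.replace takes its first argument literally, so the regex-looking text is searched for as a backslash followed by two stars and never matches.

```python
import re

pattern = "***abc**de*fg******h"

# str.replace() does no regex matching: it looks for the literal
# three-character sequence backslash, star, star, which is not in the string.
literal_attempt = pattern.replace(r"\**", "*")
print(literal_attempt == pattern)  # True (nothing was replaced)

# re.sub() interprets the first argument as a regular expression.
regex_result = re.sub(r"\*+", "*", pattern)
print(regex_result)  # *abc*de*fg*h
```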
I timed all the methods in the current answers (with Python 3.7.2, macOS High Sierra).
b() was the best overall; c() was best when no matches are made.
def b(text):
    return re.sub(r"\*\*+", "*", text)

# aka squeeze()
def c(text):
    while "*" * 2 in text:
        text = text.replace("*" * 2, "*")
    return text
Input 1, no repeats:
'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'
Input 2, with repeats:
'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*****************************************************************************************************a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'
The methods:
#!/usr/bin/env python
# encoding: utf-8
"""
See which function variants are fastest. Run like:
python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.a(t)"
python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.b(t)"
etc.
"""
import re


def a(text):
    return re.sub(r"\*+", "*", text)


def b(text):
    return re.sub(r"\*\*+", "*", text)


# aka squeeze()
def c(text):
    while "*" * 2 in text:
        text = text.replace("*" * 2, "*")
    return text


regex = re.compile(r"\*+")


# like a() but with (premature) optimisation
def d(text):
    return re.sub(regex, "*", text)


def e(text):
    return "".join(c for c, n in zip(text, text[1:] + " ") if c + n != "**")


def f(text):
    while True:
        if "**" in text:  # if two stars are in the variable pattern
            text = text.replace("**", "*")  # replace two stars with one
        else:  # otherwise
            break  # break from the infinite while loop
    return text
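If you would rather time things from inside Python than from the command line, something along these lines should work with the standard timeit module. It is only a sketch: it re-defines a() and c() so it runs on its own, the shape of with_repeats is my assumption about how the second input was built, and the numbers will differ between machines, so none are shown.

```python
import re
import timeit


def a(text):
    return re.sub(r"\*+", "*", text)


def c(text):
    while "**" in text:
        text = text.replace("**", "*")
    return text


no_repeats = "a*" * 100
# Assumed shape: a long run of stars between two non-repeating halves.
with_repeats = "a*" * 100 + "*" * 100 + "a*" * 100

timings = {}
for label, sample in (("no repeats", no_repeats), ("with repeats", with_repeats)):
    # Sanity check: both variants must agree before timing them.
    assert a(sample) == c(sample)
    for func in (a, c):
        seconds = timeit.timeit(lambda: func(sample), number=1000)
        timings[(func.__name__, label)] = seconds

for (name, label), seconds in sorted(timings.items()):
    print(f"{name}() on {label}: {seconds:.4f}s for 1000 calls")
```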
text = "aaaaaaaaaabbbbbbbbbbcccccccffffdffffdaaaaaa"
result = ""
for char in text:
    if len(result) > 0 and result[-1] == char:
        continue
    else:
        result += char
print(result)  # abcfdfda
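The same idea can probably be written more compactly with itertools.groupby from the standard library, which groups consecutive equal characters. This is a different technique from the loop above, and collapse_runs is just a name picked for the sketch. Note that, like the loop, it collapses runs of every character, not only "*".

```python
from itertools import groupby


def collapse_runs(text):
    """Keep one character from each run of consecutive equal characters."""
    # groupby yields (key, group) pairs; the key is the repeated character.
    return "".join(key for key, _group in groupby(text))


print(collapse_runs("aaaaaaaaaabbbbbbbbbbcccccccffffdffffdaaaaaa"))  # abcfdfda
```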