import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2=re.match("[a-zA-Z]*//([a-zA-Z]*)",str)
print str2.group()
current result=> error
expec
import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
re.findall('//([a-z.]*)', str)
print re.sub(r"[.]","",re.search(r"(?<=//).*?(?=/)",str).group(0))
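re.match only matches at the beginning of the string, and your pattern cannot match there because digits and a colon come before the //. That is why you get None back and the call to group() fails. Use re.search instead. The following pattern would then match your requirements: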
m = re.search(r"//([^/]*)", str)
print m.group(1)
Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
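To make the difference between the two functions concrete, here is a small sketch using the string from the question. The variable names (url, anchored, found) are just illustrative, and it is written with print() so it runs on Python 3:

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match only tries the pattern at position 0, so the "//" further
# into the string is never reached and the result is None.
anchored = re.match(r"//([^/]*)", url)

# re.search scans forward and stops at the first position where the pattern fits.
found = re.search(r"//([^/]*)", url)

print(anchored)        # None
print(found.group(1))  # www.qqq.zzz
```

Checking the result for None before calling group() avoids the AttributeError from the question.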
In fact, there is a slightly more advanced technique that does the same, but does not require capturing (which adds a little overhead). It uses a so-called lookbehind:
m = re.search(r"(?<=//)[^/]*", str)
print m.group()
Lookarounds are not included in the actual match, hence the desired result.
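As a quick illustration of that point, the sketch below puts the capturing version and the lookbehind version side by side. The names with_group and with_lookbehind are made up for this example:

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

with_group = re.search(r"//([^/]*)", url)
with_lookbehind = re.search(r"(?<=//)[^/]*", url)

# With a capturing group, the overall match still contains the slashes;
# only group 1 is the bare host.
print(with_group.group(0))       # //www.qqq.zzz
print(with_group.group(1))       # www.qqq.zzz

# With the lookbehind, the slashes are asserted but not consumed,
# so the overall match is already the host.
print(with_lookbehind.group(0))  # www.qqq.zzz
```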
This (or any other reasonable regex solution) will not remove the . characters immediately. But this can easily be done in a second step:
m = re.search(r"(?<=//)[^/]*", str)
host = m.group()
cleanedHost = host.replace(".", "")
That does not even require regular expressions.
Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo), then you are better off using the regex version of replace, re.sub:
cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
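Putting both steps together, one way to package this is a small helper. The function name cleaned_host and the http:// prefix on the second test URL are assumptions for illustration, not something from the question, and the sketch uses print() so it runs on Python 3:

```python
import re

def cleaned_host(url):
    """Return the part between '//' and the next '/', keeping only letters and digits."""
    m = re.search(r"(?<=//)[^/]*", url)
    if m is None:
        # No "//" in the input, so there is nothing to extract.
        return None
    return re.sub(r"[^a-zA-Z0-9]+", "", m.group())

print(cleaned_host("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"))  # wwwqqqzzz
print(cleaned_host("http://www.regular-expressions.info/"))              # wwwregularexpressionsinfo
print(cleaned_host("no slashes here"))                                   # None
```

Returning None for input without // is a design choice; raising an exception would work just as well if a missing host should be treated as an error.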
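# Raw string keeps \w intact; note that the greedy .* runs up to the last "/" in the string
output=re.findall(r"(?<=//)\w+.*(?=/)",str)
# Strip everything that is not a letter or digit from the first match
final=re.sub(r"[^a-zA-Z0-9]+", "", output[0])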
print final