Extract string with Python re.match

盖世英雄少女心 2020-12-01 14:16
import re
str = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

str2 = re.match("[a-zA-Z]*//([a-zA-Z]*)", str)
print(str2.group())

current result=> error
expec         


        
4 Answers
  • 2020-12-01 14:34
    import re
    str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
    re.findall('//([a-z.]*)', str)
    
  • 2020-12-01 14:35
    print(re.sub(r"[.]", "", re.search(r"(?<=//).*?(?=/)", str).group(0)))
    


  • 2020-12-01 14:40

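    match only matches at the start of the string. Your pattern cannot match there, so match returns None and calling .group() on it raises the error. Use search instead. The following pattern would then match your requirements: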

    m = re.search(r"//([^/]*)", str)
    print(m.group(1))
    

    Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
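To see the difference between match and search on the string from the question, here is a small sketch (Python 3; the variable name `url` is my own choice, not from the question):

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match anchors at position 0: "[a-zA-Z]*" consumes "x", then "//" is
# required but the next character is "8", so nothing matches.
print(re.match(r"[a-zA-Z]*//([a-zA-Z]*)", url))  # None

# re.search scans forward until the pattern fits somewhere in the string.
m = re.search(r"//([^/]*)", url)
print(m.group(0))  # //www.qqq.zzz
print(m.group(1))  # www.qqq.zzz
```

Since search can also return None when nothing is found, it is worth checking the result before calling .group() on it.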

    In fact, there is a slightly more advanced technique that does the same without a capturing group (capturing adds a little bookkeeping overhead, though it rarely matters in practice). It uses a so-called lookbehind:

    m = re.search(r"(?<=//)[^/]*", str)
    print(m.group())
    

    Lookarounds are not included in the actual match, hence the desired result.
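A quick side-by-side of the two patterns, as a sketch: with the capturing group the slashes are part of the overall match and the host sits in group 1, while with the lookbehind the overall match is the host itself. The names `url`, `captured` and `lookbehind` are just mine for illustration.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

captured = re.search(r"//([^/]*)", url)
lookbehind = re.search(r"(?<=//)[^/]*", url)

print(captured.group())    # //www.qqq.zzz  (whole match includes the slashes)
print(captured.group(1))   # www.qqq.zzz    (group 1 holds the host)
print(lookbehind.group())  # www.qqq.zzz    (the slashes are asserted, not consumed)
```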

    This (or any other reasonable regex solution) will not remove the dots in the same step. But that can easily be done in a second step:

    m = re.search(r"(?<=//)[^/]*", str)
    host = m.group()
    cleanedHost = host.replace(".", "")
    

    That does not even require regular expressions.

    Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo) then you are better off using the regex version of replace:

    cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
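To make the difference between the two cleanup steps concrete, here is a short sketch using the example host from above (Python 3):

```python
import re

host = "www.regular-expressions.info"

# str.replace removes only the dots; the hyphen stays.
print(host.replace(".", ""))               # wwwregular-expressionsinfo

# re.sub with a negated character class drops every character that is
# not an ASCII letter or digit.
print(re.sub(r"[^a-zA-Z0-9]+", "", host))  # wwwregularexpressionsinfo
```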
    
  • 2020-12-01 14:48
    output = re.findall(r"(?<=//)\w+.*(?=/)", str)

    final = re.sub(r"[^a-zA-Z0-9]+", "", output[0])

    print(final)
    