How to input a regex in string.replace?

佛祖请我去吃肉 2020-11-22 06:50

I need some help on declaring a regex. My inputs are like the following:

this is a paragraph with<[1> in between and then there are cases ..         


        
7 answers
  • 2020-11-22 07:00
    import glob
    import os
    import re
    import sys

    # Matches opening and closing tags such as <[1> or </[99>.
    pattern = re.compile(r"</?\[\d+>")

    # Strip the tags from every .txt file in the current directory.
    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        with open(infile) as reader:
            for line in reader:
                # sub(replacement, string): replace each match with nothing.
                sys.stdout.write(pattern.sub("", line))
    
  • 2020-11-22 07:05

    str.replace() does fixed replacements. Use re.sub() instead.
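
    A minimal sketch of the difference, using a shortened version of the sample text from the question (the variable names are only illustrative):

```python
import re

text = "with<[1> in between</[1> and the<[99> number</[99>"

# str.replace() only removes the exact substring it is given.
fixed = text.replace("<[1>", "")

# re.sub() removes everything the pattern matches, whatever the number.
by_pattern = re.sub(r"</?\[\d+>", "", text)

print(fixed)
print(by_pattern)
```

    With str.replace() the closing tag and the <[99> pair are left behind; re.sub() removes all four tags in one call.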

  • 2020-11-22 07:08

    You don't have to use a regular expression (for your sample string):

    >>> s
    'this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. \nand there are many other lines in the txt files\nwith<[3> such tags </[3>\n'
    
    >>> for w in s.split(">"):
    ...   if "<" in w:
    ...      print(w.split("<")[0])
    ...
    this is a paragraph with
     in between
     and then there are cases ... where the
     number ranges from 1-100
    .
    and there are many other lines in the txt files
    with
     such tags
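
    If you want one cleaned string back instead of printed fragments, the same split idea can be wrapped in a small helper. This is only a sketch: strip_tags is a hypothetical name, and it assumes every "<" in the text opens a tag that is closed by the next ">".

```python
def strip_tags(s):
    """Drop everything from each '<' up to the next '>' (hypothetical helper)."""
    pieces = []
    for chunk in s.split(">"):
        # Keep only the text before the tag opener, if there is one.
        pieces.append(chunk.split("<")[0])
    return "".join(pieces)


print(strip_tags("with<[3> such tags </[3>\n"))
```

    Note that a stray ">" with no matching "<" is silently dropped by this approach, so it only suits input as regular as the sample.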
    
  • 2020-11-22 07:14

    The replace method of string objects does not accept regular expressions, only fixed strings (see the documentation: http://docs.python.org/2/library/stdtypes.html#str.replace).

    You have to use the re module:

    import re
    newline = re.sub(r"</?\[[0-9]+>", "", line)
    
  • 2020-11-22 07:19

    The easiest way

    import re
    
    txt='this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>.  and there are many other lines in the txt files with<[3> such tags </[3>'
    
    out = re.sub("(<[^>]+>)", '', txt)
    print(out)
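
    One thing to be aware of: <[^>]+> matches anything between angle brackets, not just the numbered tags. A quick comparison on a made-up string (not from the question) shows the difference from a pattern that targets only the <[N> form:

```python
import re

# Made-up input containing comparison operators as well as tags.
txt = "if a<b and c>d then keep<[1> this</[1>"

# Broad pattern: removes any run that starts with '<' and ends with '>'.
broad = re.sub(r"<[^>]+>", "", txt)

# Narrow pattern: removes only <[digits> and </[digits>.
narrow = re.sub(r"</?\[\d+>", "", txt)

print(broad)
print(narrow)
```

    The broad pattern also eats "<b and c>", so it is only the easiest choice when the text contains no other angle brackets.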
    
  • 2020-11-22 07:21

    This tested snippet should do it:

    import re
    line = re.sub(r"</?\[\d+>", "", line)
    

    Edit: Here's a commented version explaining how it works:

    line = re.sub(r"""
      (?x) # Use free-spacing mode.
      <    # Match a literal '<'
      /?   # Optionally match a '/'
      \[   # Match a literal '['
      \d+  # Match one or more digits
      >    # Match a literal '>'
      """, "", line)
    

    Regexes are fun! But I would strongly recommend spending an hour or two studying the basics. For starters, you need to learn which characters are special: the "metacharacters", which need to be escaped (i.e. with a backslash placed in front), and the rules differ inside and outside character classes. There is an excellent online tutorial at www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!
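
    As a rough illustration of the escaping rules (the sample string here is invented for the example), compare a dot outside and inside a character class, and note that re.escape() can add the backslashes for you when you just want to match fixed text:

```python
import re

sample = "price: 3.50 [approx]"

# Outside a character class '.' matches any character, so it is escaped
# here to match a literal dot.
literal_dot = re.findall(r"\d\.\d+", sample)

# Inside a character class '.' is already literal; no backslash is needed.
dots_and_colons = re.findall(r"[.:]", sample)

# re.escape() escapes the metacharacters in a fixed string such as "<[1>".
escaped = re.escape("<[1>")
stripped = re.sub(escaped, "", "with<[1> in between")

print(literal_dot, dots_and_colons, stripped)
```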
