How to combine multiple regular expressions into one line?

一个人的身影 2021-01-22 08:20

My script works fine doing this:

images = re.findall(r'src."(\S*?media.tumblr\S*?tumblr_\S*?jpg)', doc)
videos = re.findall(r'\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)', doc)

2 answers
  • 2021-01-22 08:38

    If you really want it to be efficient:

    For starters, I would cut out the leading \S*? in the second regex. It serves no purpose other than giving the engine lots of opportunities to backtrack. Then join the two patterns with a pipe (|):

    src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)|(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)
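A minimal sketch of using this combined pattern from Python, assuming `doc` holds the page source as a string (the sample markup and the `split_media` helper are hypothetical, made up for illustration). Because the pattern has two capture groups, `re.findall` returns `(image, video)` tuples in which the group of the alternative that did not match is an empty string, so the results still need to be split afterwards:

```python
import re

# Combined pattern from the answer: alternative 1 captures image URLs,
# alternative 2 captures video URLs.
COMBINED = re.compile(
    r'src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg)'
    r'|(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*)'
)

def split_media(doc):
    """Hypothetical helper: return (images, videos) from one pass over doc."""
    images, videos = [], []
    # With two groups, findall yields (group1, group2) tuples; the group
    # of the alternative that did not match comes back as ''.
    for image, video in COMBINED.findall(doc):
        if image:
            images.append(image)
        else:
            videos.append(video)
    return images, videos

# Hypothetical sample markup for illustration only.
doc = (
    '<img src="https://66.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<video src="https://vt.tumblr.com/video_file/123/tumblr_abcDEF123.mp4"></video>'
)
images, videos = split_media(doc)
print(images)  # ['https://66.media.tumblr.com/abc/tumblr_xyz_500.jpg']
print(videos)  # ['https://vt.tumblr.com/video_file/123/tumblr_abcDEF123']
```

This scans the document once instead of twice; whether that is measurably faster depends on the input.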
    

    Other ideas

    You can get rid of the capture groups by using a small lookbehind in the first alternative. That removes all the parentheses and matches exactly what you want, so re.findall returns plain strings instead of tuples. Not faster, but tidier:

    (?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg|http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*
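For comparison, a sketch of the lookbehind variant, reusing the same hypothetical sample markup. Python requires lookbehinds to be fixed-width, and `src.\"` is always five characters, so it qualifies. The trade-off is that images and videos come back in one list; separating them by the `jpg` suffix, as below, is just one illustrative choice:

```python
import re

# Lookbehind variant: (?<=src.\") checks the 5 characters before the URL
# without consuming them, so no capture groups are needed.
PATTERN = re.compile(
    r'(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg'
    r'|http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*'
)

# Hypothetical sample markup for illustration only.
doc = (
    '<img src="https://66.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<video src="https://vt.tumblr.com/video_file/123/tumblr_abcDEF123.mp4"></video>'
)

urls = PATTERN.findall(doc)  # plain strings, in document order
# Illustrative split: image URLs end in 'jpg', the video matches do not.
images = [u for u in urls if u.endswith('jpg')]
videos = [u for u in urls if not u.endswith('jpg')]
print(urls)
```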
    

    Do you intend for the periods after src and media to mean "any character", or to mean "a literal period"? If the latter, escape them: \.
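A quick illustration of the difference, using made-up strings: an unescaped `.` matches any single character, so `media.tumblr` also accepts `mediaXtumblr`, while `media\.tumblr` only accepts a literal period:

```python
import re

loose = re.compile(r'media.tumblr')    # '.' = any character
strict = re.compile(r'media\.tumblr')  # '\.' = literal period

# Made-up strings for illustration.
print(bool(loose.search('66.media.tumblr.com')))   # True
print(bool(loose.search('66.mediaXtumblr.com')))   # True
print(bool(strict.search('66.mediaXtumblr.com')))  # False
```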

    You can use the re.IGNORECASE flag and drop the A-Z range from the character class:

    (?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg|http\S*?video_file\S*?tumblr_[a-z0-9]*
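A sketch of the case-insensitive version with a made-up sample string. Passing `re.IGNORECASE` to `re.compile` lets `[a-z0-9]` match uppercase letters too; without the flag, the same class stops at the first capital. Note that the flag applies to the whole pattern, so literals such as `src`, `jpg` and `http` also become case-insensitive:

```python
import re

# With re.IGNORECASE, [a-z0-9] also matches A-Z. The flag affects every
# literal in the pattern (src, media, jpg, http, ...), not just the class.
PATTERN_I = re.compile(
    r'(?<=src.\")\S*?media.tumblr\S*?tumblr_\S*?jpg'
    r'|http\S*?video_file\S*?tumblr_[a-z0-9]*',
    re.IGNORECASE,
)

# Same video alternative without the flag, for contrast.
NO_FLAG = re.compile(r'http\S*?video_file\S*?tumblr_[a-z0-9]*')

# Hypothetical sample: mixed-case ID after tumblr_.
doc = '<video src="https://vt.tumblr.com/video_file/123/tumblr_abcDEF123.mp4"></video>'
print(PATTERN_I.findall(doc))  # ['https://vt.tumblr.com/video_file/123/tumblr_abcDEF123']
print(NO_FLAG.findall(doc))    # ['https://vt.tumblr.com/video_file/123/tumblr_abc']
```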
    
  • 2021-01-22 08:51

    As mentioned in the comments, a pipe (|) should do the trick.

    The regular expression

    (src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))
    

    catches either of the two patterns.
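A sketch of reading results from this pattern, again with hypothetical sample markup and a hypothetical `classify` helper. The pattern has four groups (1 and 2 for the image alternative, 3 and 4 for the video one), so `re.findall` would return 4-tuples; `re.finditer` plus the inner groups 2 and 4 gives clean URLs. Because of the leading `\S*?`, the outer group 3 can also pick up non-space text before `http` (in this sample, the `src="` prefix):

```python
import re

# Pattern from this answer. Group 1: whole image alternative, group 2: image URL,
# group 3: whole video alternative, group 4: video URL.
PATTERN = re.compile(
    r'(src.\"(\S*?media.tumblr\S*?tumblr_\S*?jpg))'
    r'|(\S*?(http\S*?video_file\S*?tumblr_[a-zA-Z0-9]*))'
)

def classify(doc):
    """Hypothetical helper: split matches into (images, videos) via inner groups."""
    images, videos = [], []
    for m in PATTERN.finditer(doc):
        # In finditer, groups of the alternative that did not match are None.
        if m.group(2) is not None:
            images.append(m.group(2))
        else:
            videos.append(m.group(4))
    return images, videos

# Hypothetical sample markup for illustration only.
doc = (
    '<img src="https://66.media.tumblr.com/abc/tumblr_xyz_500.jpg"> '
    '<video src="https://vt.tumblr.com/video_file/123/tumblr_abcDEF123.mp4"></video>'
)
images, videos = classify(doc)
print(images)  # ['https://66.media.tumblr.com/abc/tumblr_xyz_500.jpg']
print(videos)  # ['https://vt.tumblr.com/video_file/123/tumblr_abcDEF123']
```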

