Matching TV and Movie File names with Regex

有刺的猬 2021-01-29 00:35

I've been working on a regular expression to grab the TV show or movie name, the year it aired (if present), the season number, and the episode number from the file name of a vi

2 Answers
  • 2021-01-29 01:10

    I made some modifications to your regex, and it seems to work, if I understood you correctly.

    ^(
      (?P<ShowNameA>.*[^ (_.]) # Show name
        [ (_.]+
        ( # Year with possible Season and Episode
          (?P<ShowYearA>\d{4})
          ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
        | # Season and Episode only
          (?<!\d{4}[ (_.])
          S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
        | # Alternate format for episode
          (?P<EpisodeC>\d{3})
        )
    |
      # Show name with no other information
      (?P<ShowNameB>.+)
    )
    

    See demo on regex101
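
    To try the pattern locally, here is a minimal Python sketch. It assumes the regex is compiled with re.VERBOSE so that the layout and the # comments are ignored; the helper matched_groups and the sample file names are made up for illustration and are not from the question.

```python
import re

# The pattern from this answer, compiled in verbose mode so the
# indentation and "#" comments are not part of the match.
PATTERN = re.compile(r"""
^(
  (?P<ShowNameA>.*[^ (_.]) # Show name
    [ (_.]+
    ( # Year with possible Season and Episode
      (?P<ShowYearA>\d{4})
      ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
    | # Season and Episode only
      (?<!\d{4}[ (_.])
      S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
    | # Alternate format for episode
      (?P<EpisodeC>\d{3})
    )
|
  # Show name with no other information
  (?P<ShowNameB>.+)
)
""", re.VERBOSE)


def matched_groups(filename):
    """Return only the named groups that took part in the match."""
    m = PATTERN.match(filename)
    if m is None:
        return {}
    return {k: v for k, v in m.groupdict().items() if v is not None}


# Hypothetical file names, one per layout the pattern distinguishes.
SAMPLES = [
    "The.Show.2010.S01E02.mkv",  # year + season/episode
    "The.Show.S01E02.mkv",       # season/episode only
    "The Show 102.avi",          # 3-digit episode
    "Just A Title",              # name only
]

for name in SAMPLES:
    print(name, "->", matched_groups(name))
```

    Each sample should land in a different branch of the alternation, so the group suffix (A, B or C) tells you which layout was recognised.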

    EDIT: I've updated the regex to handle those last 3 situations you mentioned in comments.

    One main problem was that you had no parentheses around the main alternation, so the | split the entire regex into two independent halves instead of just the part it was meant to cover. I also had to add an alternation to allow for none of the year/episode formats following the name.
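
    The effect of the missing parentheses is easy to reproduce on a toy pattern. The two patterns below are hypothetical stand-ins, not the regex from the question; they only illustrate that | has the lowest precedence, so an unparenthesised alternation takes everything on either side of it, anchors included.

```python
import re

# Hypothetical toy patterns, not the original poster's regex.
ungrouped = re.compile(r"^ab|cd")   # parsed as (^ab) | (cd)
grouped = re.compile(r"^(ab|cd)")   # the anchor now applies to both branches

text = "xxcd"
print(ungrouped.search(text))  # the bare "cd" branch matches mid-string
print(grouped.search(text))    # None: the string starts with neither branch
```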

    Because you have so many possible layouts that can conflict with each other, the regex ended up as a series of alternations, one per scenario. For example, to match a title that has no year or episode information at all, I had to add an alternation around the whole regex so that, if it can't find any known pattern, it just matches the whole thing.

    Note: now that you seem to have expanded show years to match any four digits, there's no need for the lookahead. In other words, (?=\d{4,4})(?P<ShowYear>\d{4}) is the same as (?P<ShowYear>\d{4}). This also means that your alternate format for episode must match 3 digits only, not 4. Otherwise, there's no way to tell whether a stand-alone 4-digit sequence is a year or an episode.
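
    A quick way to convince yourself that the lookahead is redundant is to run both forms over the same inputs. This is only a sketch: the helper year and the sample strings are invented for the comparison.

```python
import re

with_lookahead = re.compile(r"(?=\d{4,4})(?P<ShowYear>\d{4})")
without_lookahead = re.compile(r"(?P<ShowYear>\d{4})")


def year(pattern, text):
    """Return the ShowYear capture, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("ShowYear") if m else None


# Hypothetical inputs: a year, a 3-digit episode, and a year in parentheses.
samples = ["Show 2010", "Show 102", "Show (1999) S01E02"]
for s in samples:
    print(s, "|", year(with_lookahead, s), "|", year(without_lookahead, s))
```

    The lookahead asserts exactly what the capture then consumes, so both patterns succeed or fail at the same positions.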

    General pattern:

    [ (_.]+                   the delimiter used throughout
    (?P<ShowNameA>.*[^ (_.])  the show name, greedy but never ending in a delimiter
    (?P<ShowNameB>.+)         the show name when it's the whole line
    

    Format A (Year with possible Season and Episode):

    (?P<ShowYearA>\d{4})
    ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
    

    Format B (Season and Episode only):

    (?<!\d{4}[ (_.])
    S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
    

    Format C (Alternate format for episode):

    (?P<EpisodeC>\d{3})
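
    Because every format uses its own group names, the caller ends up with several groups for the same field. One possible way to fold them back together is sketched below; merge_groups and the output keys are hypothetical names, and the 3-digit EpisodeC value is passed through unchanged since nothing above says how it should be split.

```python
def merge_groups(groups):
    """Collapse the A/B/C variants of each field into one value (or None).

    `groups` is expected to look like the result of match.groupdict().
    """
    def first(*keys):
        # Return the first of the given groups that actually matched.
        for key in keys:
            if groups.get(key) is not None:
                return groups[key]
        return None

    return {
        "name": first("ShowNameA", "ShowNameB"),
        "year": first("ShowYearA"),
        "season": first("SeasonA", "SeasonB"),
        "episode": first("EpisodeA", "EpisodeB", "EpisodeC"),
    }


# Hand-built stand-in for match.groupdict() on a Format B file name.
example = {
    "ShowNameA": "The.Show", "ShowNameB": None, "ShowYearA": None,
    "SeasonA": None, "EpisodeA": None,
    "SeasonB": "01", "EpisodeB": "02", "EpisodeC": None,
}
print(merge_groups(example))
```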
    
  • 2021-01-29 01:10

    If I may: I adapted brian's regex to match something like

    SHOW.NAME.201X.SXXEXX.XSUB.VOSTFR.720p.HDTV.x264-ADDiCTiON.mkv

    Here it is (PHP PCRE):

    /^(
        (?P<ShowNameA>.*[^ (_.]) # Show name
            [ (_.]+
            ( # Year with possible Season and Episode
                (?P<ShowYearA>\d{4})
                ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
            | # Season and Episode only
                (?<!\d{4}[ (_.])
                S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
            )
    |
            # Show name with no other information
            (?P<ShowNameB>.+)
    )/mx
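
    For readers not using PHP, here is a rough Python equivalent. It assumes the /m and /x modifiers map to re.MULTILINE and re.VERBOSE, and the file name fills the 201X/SXXEXX placeholders with made-up values (2013, S01E02) purely for the demonstration.

```python
import re

# Same pattern as above, minus the PHP delimiters; flags stand in for /mx.
PATTERN = re.compile(r"""
^(
    (?P<ShowNameA>.*[^ (_.]) # Show name
        [ (_.]+
        ( # Year with possible Season and Episode
            (?P<ShowYearA>\d{4})
            ([ (_.]+S(?P<SeasonA>\d{1,2})E(?P<EpisodeA>\d{1,2}))?
        | # Season and Episode only
            (?<!\d{4}[ (_.])
            S(?P<SeasonB>\d{1,2})E(?P<EpisodeB>\d{1,2})
        )
|
        # Show name with no other information
        (?P<ShowNameB>.+)
)
""", re.MULTILINE | re.VERBOSE)

# Hypothetical file name: placeholders replaced with example digits.
FILENAME = "SHOW.NAME.2013.S01E02.XSUB.VOSTFR.720p.HDTV.x264-ADDiCTiON.mkv"

match = PATTERN.search(FILENAME)
result = match.groupdict()
print(result)
```

    Note that this variant has no 3-digit episode branch, so the digits in trailing tags such as 720p cannot be picked up as an episode number.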
    