Regex to pick out artist name and song title, issue with lazy matching

Submitted by 江枫思渺然 on 2019-12-22 18:26:36

Question


I'm trying to build a flexible regular expression to pick out the artist name and song title from a media file's name. It should support all of the following:

01 Example Artist - Example Song.mp3

01 Example Song.mp3 (no artist in this example, so that group should be null)

Example Artist - Example Song.mp3

Example Song.mp3 (Again, no artist)

I've come up with the following (in .NET syntax, particularly for named capture groups):

\d{0,2}\s*(?<artist>[^-]*)?[\s-]*(?<songname>.*)(\.mp3|\.m4a)

This works well, but fails for this input: 01 Example Song.mp3
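The failure can be reproduced with a minimal Python sketch. It assumes Python's `re` module, which spells .NET's `(?<name>...)` as `(?P<name>...)`; `parse` is a hypothetical helper used only for illustration, and `re.search` is used because it is unanchored, like .NET's `Regex.Match`.

```python
import re

# The question's first (greedy) pattern. Assumption: translated to Python's
# `re` syntax, where .NET's (?<name>...) is written (?P<name>...).
GREEDY = re.compile(r"\d{0,2}\s*(?P<artist>[^-]*)?[\s-]*(?P<songname>.*)(\.mp3|\.m4a)")


def parse(pattern, filename):
    """Hypothetical helper: return (artist, songname), or None if no match."""
    m = pattern.search(filename)  # unanchored search, like Regex.Match
    return (m.group("artist"), m.group("songname")) if m else None


for name in ["01 Example Artist - Example Song.mp3", "01 Example Song.mp3"]:
    print(repr(name), "->", parse(GREEDY, name))
```

In `01 Example Song.mp3` there is no hyphen for `[^-]*` to stop at, so it runs to the end of the string and backs off only as far as needed for `\.mp3` to match: the artist becomes `Example Song` and `songname` is empty. On the hyphenated name the artist capture also keeps the space before the hyphen (`Example Artist `).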

It swallows the song name into the artist group, I believe because of greedy matching. So I tried modifying the expression so that the artist part would match lazily:

\d{0,2}\s*(?<artist>[^-]*)*?[\s-]*(?<songname>.*)(\.mp3|\.m4a)

The change is:

(?<artist>[^-]*)?

became

(?<artist>[^-]*)*?

This does indeed fix the above problem. But now, it fails for this input:

01 Example Artist - Example Song.mp3

Now it's too lazy: it captures "Example Artist - Example Song" as the songname and nothing for the artist name.
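The same kind of Python sketch shows why the lazy version overshoots. Again this assumes Python's `re` syntax (`(?P<name>...)` for `(?<name>...)`) and a hypothetical `parse` helper. The lazy quantifier `*?` tries zero repetitions of the artist group first, and because the rest of the pattern can already match from there, the group never participates.

```python
import re

# The question's second ("lazy") pattern, translated to Python syntax:
# (?<name>...) -> (?P<name>...).
LAZY = re.compile(r"\d{0,2}\s*(?P<artist>[^-]*)*?[\s-]*(?P<songname>.*)(\.mp3|\.m4a)")


def parse(pattern, filename):
    """Hypothetical helper: return (artist, songname), or None if no match."""
    m = pattern.search(filename)
    return (m.group("artist"), m.group("songname")) if m else None


for name in ["01 Example Song.mp3", "01 Example Artist - Example Song.mp3"]:
    # Zero repetitions of the artist group already lead to a full match,
    # so the group is left unset (None) and songname takes everything.
    print(repr(name), "->", parse(LAZY, name))
```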

Does anyone have a suggestion regarding this?


Answer 1:


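You can't achieve this with greediness alone; you need to describe the structure more precisely using groups (optional or not). The key is to make the artist optional together with its hyphen separator, so the artist group can only match when a " - " actually follows it. An example: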

(?x) # switch on comment mode
^    # start of the string
(?: (?<track>\d{1,3}) \s*[\s-]\s* )? # the track is optional (including separators)
(?: (?<artist>.+?) \s*-\s* )? # the same for the artist name
(?<title> .+ )
(?<ext> \.m(?:p3|4a) )

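A minimal sketch of this pattern in Python, run against the four filenames from the question. Assumptions: Python's `re` module, `(?P<name>...)` instead of `(?<name>...)`, and the `re.VERBOSE` flag instead of an inline `(?x)`; `parse` is a hypothetical helper.

```python
import re

# Answer 1's first pattern, translated to Python's `re`:
# (?<name>...) becomes (?P<name>...), and comment mode is enabled with
# re.VERBOSE rather than an inline (?x).
FILENAME = re.compile(
    r"""
    ^                                       # start of the string
    (?: (?P<track>\d{1,3}) \s*[\s-]\s* )?   # optional track number + separator
    (?: (?P<artist>.+?) \s*-\s* )?          # optional artist + hyphen separator
    (?P<title> .+ )                         # title; backtracks to leave the extension
    (?P<ext> \.m(?:p3|4a) )                 # .mp3 or .m4a
    """,
    re.VERBOSE,
)


def parse(filename):
    """Hypothetical helper: named groups as a dict; unmatched groups are None."""
    m = FILENAME.match(filename)
    return m.groupdict() if m else None


SAMPLES = [
    "01 Example Artist - Example Song.mp3",
    "01 Example Song.mp3",
    "Example Artist - Example Song.mp3",
    "Example Song.mp3",
]
for name in SAMPLES:
    print(name, "->", parse(name))
```

Groups that do not take part in the match come back as `None`, which corresponds to the "null" artist the question asks for.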

As an aside, audio filenames can be very irregular; even with the best pattern in the world, I doubt you can handle every case.

You can be a little more flexible and more efficient if you replace .+ with something more explicit:

^(?x)
(?: (?<track>\d{1,3}) \s*[\s-]\s* )?
(?: (?<artist> \S+ (?>[ .-][^\s.-]*)*? ) \s*-\s*)?
(?<title> [^.\n]+ (?>\.[^.\n]*)*? )
(?<ext> \.m(?:p3|4a) )

(The \n in the negated character classes is only there for testing against several filenames at once; you can remove it when you apply the pattern to one filename at a time.)
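A Python sketch of this second pattern, under a few stated assumptions. Named groups are written `(?P<name>...)`, and `re.VERBOSE` replaces `^(?x)` because Python requires inline global flags at the very start of a pattern. The atomic groups `(?>...)` are written as plain `(?:...)` so the sketch also runs on Python versions before 3.11, which added atomic groups. That substitution should not change the captures here: giving characters back from a `[^\s.-]*` or `[^.\n]*` run always exposes a character that the next token cannot match, so only the amount of backtracking work differs. `parse` is again a hypothetical helper.

```python
import re

# Answer 1's second pattern, translated to Python's `re`. Assumptions:
# - (?<name>...) is spelled (?P<name>...); re.VERBOSE replaces ^(?x).
# - Atomic groups (?>...) are written as (?:...) for pre-3.11 Pythons;
#   backtracking into them can never succeed here, so captures are unchanged.
# The space inside [ .-] is kept in verbose mode, because whitespace inside
# a character class is not ignored.
FILENAME = re.compile(
    r"""
    ^
    (?: (?P<track>\d{1,3}) \s*[\s-]\s* )?                  # optional track number
    (?: (?P<artist> \S+ (?:[ .-][^\s.-]*)*? ) \s*-\s* )?    # optional artist + hyphen
    (?P<title> [^.\n]+ (?:\.[^.\n]*)*? )                   # title; may contain dots
    (?P<ext> \.m(?:p3|4a) )                                # .mp3 or .m4a
    """,
    re.VERBOSE,
)


def parse(filename):
    """Hypothetical helper: named groups as a dict; unmatched groups are None."""
    m = FILENAME.match(filename)
    return m.groupdict() if m else None


for name in [
    "01 Example Artist - Example Song.mp3",
    "01 Example Song.mp3",
    "Example Artist - Example Song.mp3",
    "Example Song.mp3",
    "Example Artist - Vol. 2.mp3",  # a dot inside the title
]:
    print(name, "->", parse(name))
```

The last sample shows what the lazy `(?:\.[^.\n]*)*?` part is for: the title first stops at `Vol`, the extension cannot match at `. 2`, so the group extends the title to `Vol. 2` before `.mp3` matches.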



Source: https://stackoverflow.com/questions/32288423/regex-to-pick-out-artist-name-and-song-title-issue-with-lazy-matching
