Question
I have a directory of nuget packages that I've downloaded from nuget.org. I'm trying to create a regex that will parse out the package name and version number from the filename. It doesn't seem difficult at first glance; the filenames have a clear pattern:
{PackageName}.{VersionNumber}.nupkg
Edge cases make it challenging though.
- Package names can have dashes, underscores, and numbers
- Package names can have effectively unlimited parts separated by dots
- Version numbers consist of 3-4 groups of numbers, separated by dots
- Version numbers sometimes are suffixed with pre-release tags (-alpha, -beta, etc)
Here's a sample list of nuget package filenames:
knockoutjs.3.4.2.nupkg
log4net.2.0.8.nupkg
runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg
nuget.core.2.7.0-alpha.nupkg
microsoft.identitymodel.6.1.7600.16394.nupkg
I want to be able to do a search/replace in a Serious Text Editor where the search is a regex with two groups, one for the package name and one for the version number. The output should be "Package: \1 Version: \2". With the 5 packages above, the output should be:
Package: knockoutjs Version: 3.4.2
Package: log4net Version: 2.0.8
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0
Package: nuget.core Version: 2.7.0-alpha
Package: microsoft.identitymodel Version: 6.1.7600.16394
The closest relatively concise regex I've come up with is:
^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$
...which results in the following output:
Package: knockoutjs Version: 3.4.2.
Package: log4net Version: 2.0.8.
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0.
nuget.core.2.7.0-alpha.nupkg
Package: microsoft.identitymodel.6 Version: 1.7600.16394.
It handles the first three decently, although I don't want that trailing dot. It doesn't even match on the fourth one, and the fifth one has the first part of the version number lumped in with the package name.
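To make the three failure modes concrete, here is a minimal sketch that runs the pattern above unchanged through Python's `re` module. Python and the helper name `try_attempt` are choices made for this illustration, not part of the original setup. The trailing dot appears because the dot sits inside the repeated group, the greedy first group swallows the leading `6`, and nothing in the pattern allows for a `-alpha` suffix.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$")

FILENAMES = [
    "knockoutjs.3.4.2.nupkg",
    "log4net.2.0.8.nupkg",
    "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
    "nuget.core.2.7.0-alpha.nupkg",
    "microsoft.identitymodel.6.1.7600.16394.nupkg",
]


def try_attempt(filename):
    """Return (name, version) as captured by the question's pattern, or None."""
    match = ATTEMPT.match(filename)
    return match.groups() if match else None


for filename in FILENAMES:
    # Prints the raw captures so the trailing dot and the misplaced "6" are visible.
    print(filename, "->", try_attempt(filename))
```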
Save the day!
Answer 1:
I modified your expression slightly to:
^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$
The main points are that I moved the . in front of the digits in the first non-capturing group (and made it optional), and that I added an optional non-capturing group for the -alpha in the fourth string.
Replace with:
Package: \1 Version: \2
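For anyone who wants to check this outside a text editor, the following is a rough sketch of the same search/replace using Python's `re` module. Python, the `re.MULTILINE` flag, and the variable names are assumptions made for illustration; the pattern and replacement string are the ones given above.

```python
import re

# Pattern from this answer. re.MULTILINE lets ^ and $ anchor on every line,
# which mimics a line-by-line search/replace in an editor.
PATTERN = re.compile(
    r"^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$",
    re.MULTILINE,
)

listing = "\n".join([
    "knockoutjs.3.4.2.nupkg",
    "log4net.2.0.8.nupkg",
    "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
    "nuget.core.2.7.0-alpha.nupkg",
    "microsoft.identitymodel.6.1.7600.16394.nupkg",
])

# Python's re.sub accepts the same \1 and \2 backreferences as the editor.
rewritten = PATTERN.sub(r"Package: \1 Version: \2", listing)
print(rewritten)
```

One thing to be aware of: because the dot inside the repeated group is optional, a bare run of three or more digits also satisfies `{3,}`, so a hypothetical name such as `foo.100.nupkg` would be split into package `foo` and version `100`. None of the sample filenames trigger this.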
Answer 2:
I think this regex will do what you want:
^(.*?)\.(?=(?:[0-9]+\.){2,}[0-9]+(?:-[a-z]+)?\.nupkg)(.*?)\.nupkg$
It uses a positive lookahead to look for the version number followed (possibly) by a tag in the form -[a-z]+ (e.g. -alpha) followed by \.nupkg. This last part prevents it matching the 4.0.0-armel in the third sample. For your edge cases, and substituting with "Package: $1 Version: $2", the output is:
Package: knockoutjs Version: 3.4.2
Package: log4net Version: 2.0.8
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0
Package: nuget.core Version: 2.7.0-alpha
Package: microsoft.identitymodel Version: 6.1.7600.16394
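If the filenames need to be split programmatically rather than in an editor, the lookahead pattern can be wrapped in a small function. This is only a sketch assuming Python's `re` module; the function name `split_nupkg` and the choice to raise `ValueError` on a non-match are hypothetical and not part of the answer above.

```python
import re

# Pattern from this answer. The lookahead only accepts a position where a
# dotted run of at least three numbers, an optional lowercase -tag, and
# ".nupkg" follow, so "4.0.0-armel" in the middle of a name is skipped.
PATTERN = re.compile(
    r"^(.*?)\.(?=(?:[0-9]+\.){2,}[0-9]+(?:-[a-z]+)?\.nupkg)(.*?)\.nupkg$"
)


def split_nupkg(filename):
    """Split a .nupkg filename into (package_name, version)."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognised nupkg filename: {filename!r}")
    return match.group(1), match.group(2)


for filename in [
    "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
    "nuget.core.2.7.0-alpha.nupkg",
]:
    name, version = split_nupkg(filename)
    print(f"Package: {name} Version: {version}")
```

Note that Python's `re.sub` would use `\1` and `\2` in the replacement string instead of `$1` and `$2`; returning the groups directly sidesteps that difference.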
Source: https://stackoverflow.com/questions/51662737/regex-to-parse-package-name-and-version-number-from-nuget-package-filenames