Question
I'm going to try to phrase this clearly... (I'm pretty new at regex). I'm working on a PDF document, with a program called AutoBookmark (from Evermap). I'm trying to set it up to link numbered citations to numbered references in a bibliography.
The goal is to match each numbered citation within brackets, and return that number within brackets, alone. In other words, if I have [85], I'd just return [85]. If I have [85, 93], I'd return both [85] and [93]. If there are more numbers in brackets, up to N numbers, I'd return N of them (in brackets). If there is a range, i.e., [85-93], I only need to return the first.
So it seems to me I'm asking for this: the number (1 to 3 digits), only if preceded by EITHER an opening bracket, OR another number followed by a comma and a space, but only if that number is preceded by an opening bracket OR by a number followed by a comma and a space, but only if... you get the picture. Iterate until you hit a bracket (then return the number) or a non-number, in which case, don't return the number. Is this something even reasonable to ask of a regular expression? Or, since I'm doing this in a PDF, must I write a JavaScript routine (which, by the way, I also don't know how to do)? Thanks! I know I'm a newbie at this, and I'm grateful for any thoughts.
Answer 1:
I have no experience with this program, but this should work in JavaScript, and thus in other feature-minimal regex implementations.
\[?\s*(\d+)\s*(?=(?:,\s*\d+)+|\])(?=[^\[]*\]).
\[? # Literal [, zero or 1 times
\s* # Any number (*) of whitespace characters
(\d+) # Capture group 1: one or more (+) digits
\s* # Any number (*) of whitespace characters
(?= # Open positive lookahead; support for positive lookahead is key to this regex
(?: # Open non-capturing group
,\s*\d+ # Literal ",", any number of whitespace characters,
# one or more digits
)+ # Close non-capturing group; repeat it one or more times (+)
| # or
\] # Literal "]"
) # Close positive lookahead
(?= # Open another positive lookahead
[^\[]*\] # Any number of characters that are not "[", as long as they're followed by "]".
# This is only a validation check, those characters won't be caught
) # Close positive lookahead
. # Match any character except newline
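To see what the pattern above captures, here is a minimal sketch that runs it in Python's re module instead of AutoBookmark, so results in the PDF tool may differ. The helper name cited_numbers and the sample strings are made up for illustration.

```python
import re

# The answer's pattern, copied verbatim (including the trailing ".").
PATTERN = re.compile(r"\[?\s*(\d+)\s*(?=(?:,\s*\d+)+|\])(?=[^\[]*\]).")

def cited_numbers(text):
    """Return capture group 1 (the digits) of every match, in order."""
    return [m.group(1) for m in PATTERN.finditer(text)]

sample = "See [85], then [85, 93] and [13, 14, 21]."
print(cited_numbers(sample))     # ['85', '85', '93', '13', '14', '21']
print(cited_numbers("[85-93]"))  # ['93']
```

In this engine every number in a comma-separated series is captured, including the middle ones. A range such as [85-93] yields its last number rather than its first, so ranges would still need a separate rule.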
If this program supports variable-length lookbehinds, you can use this, which only adds a lookbehind to make sure the number is prefixed by valid characters as well.
\[?\s*(?<=\[[,\d ]*)(\d+)\s*(?=(?:,\s*\d+)+|\])(?=[^\[]*\]).
If your citation format is 100% reliable, e.g. [1], [12], [13, 14, 21], etc., you can use a simpler version:
\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).
or, if your program supports variable-length lookbehinds, this:
\[(?<=\[[,\d ]*)(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).
With any of these expressions, you can change the last character (the dot) to \]? to see the citations still separated by commas, e.g. [1],[15],[22].
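The effect of swapping the final dot for \]? is easiest to see by printing the full text of each match. This is a small sketch using the simpler pattern in Python's re module. The helper full_matches and the sample text are hypothetical, and AutoBookmark's engine may behave differently.

```python
import re

# The simpler pattern from the answer, without its final character.
SIMPLE = r"\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\])"

def full_matches(pattern, text):
    """Return the whole matched text (group 0) of every match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

text = "[13, 14, 21]"
print(full_matches(SIMPLE + r".", text))    # ['[13,', ' 14,', ' 21]']
print(full_matches(SIMPLE + r"\]?", text))  # ['[13', ' 14', ' 21]']
```

With the dot, each match swallows the comma that follows the number. With \]?, the commas are left in the text between matches. Capture group 1 holds the same digits either way.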
* In many flavors of regular expressions, lookbehinds, if supported at all, must be fixed-length, with no quantifiers and with all alternatives the same width. For instance, (?<=a|1) will work, but (?<=a|12), (?<=a|1+), or (?<=a+) will fail, as will quantifiers applied to the lookbehind itself, such as (?<=a)+.
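Python's re module is one flavor with this fixed-width restriction, so it makes a quick test bench. The sketch below only checks whether each lookbehind compiles, and the helper name compiles is made up for illustration. Other engines, AutoBookmark's included, may accept or reject a different set.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=a|1)"))   # True: both alternatives are one character wide
print(compiles(r"(?<=a|12)"))  # False: alternatives differ in width
print(compiles(r"(?<=a|1+)"))  # False: quantifier inside the lookbehind
print(compiles(r"(?<=a+)"))    # False: quantifier inside the lookbehind
```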
Edit: And thanks to Rawing for the input.
Answer 2:
Thanks for the suggestions! Here's what happens. Apparently, Evermap doesn't understand variable-length lookarounds, so I tried your other ones. They give some results, but not all. They match simple numbers in brackets, and they match the last number in a series within brackets.
AutoBookmark does offer a "multiple rule" way of searching for text patterns, so I could look for [35] or [35 or , 35] or , 35, or 35- all individually.
Right now, I'm using three rules:
(\[)(\d{1,3})(\]|,)
\[?\s*(\d+)(?=(?:, \d+)|\])(?=[^\[]*\]).
(\[|\s)(\d{1,3})\-
For each of these, the 'replace', or what the program calls 'link action', is the extracted number, or \2.
This gets me most of what I want, but if there are more than two numbers in a series, separated by comma+space, it doesn't match the middle numbers. I would do that by hand, I suppose, if I can't find a better way.
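If a scripted pass turns out to be an option, one way around the missing middle numbers is to work in two steps: first find each whole bracket group, then split it into items. The following is only a sketch in Python, not something AutoBookmark runs. The names ITEM, GROUP and extract_citations are hypothetical, and it assumes items are separated by exactly a comma and a space and that ranges use a plain hyphen.

```python
import re

# One item: a 1-3 digit number, optionally followed by "-" and a range end.
ITEM = r"\d{1,3}(?:-\d{1,3})?"
# A whole citation group: "[" item (", " item)* "]".
GROUP = re.compile(r"\[(" + ITEM + r"(?:, " + ITEM + r")*)\]")

def extract_citations(text):
    """Return one bracketed number per item; ranges keep only their start."""
    links = []
    for group in GROUP.finditer(text):
        for item in group.group(1).split(", "):
            first = item.split("-")[0]  # "85-93" -> "85"
            links.append("[" + first + "]")
    return links

print(extract_citations("[85], [85, 93], [13, 14, 21], [85-93], [see 12]"))
# ['[85]', '[85]', '[93]', '[13]', '[14]', '[21]', '[85]']
```

Brackets that contain anything other than numbers, such as [see 12], are skipped entirely, which matches the "don't return the number" case described in the question.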
I know I'm stumbling around here... Thanks for helping, and thanks for being patient with a newbie! (If I work this out so it's fully automated, I'll be a god at work...)
Source: https://stackoverflow.com/questions/42603140/regex-for-n-1-to-3-digit-numbers-in-square-brackets-with-commasspaces-between