Question
I am trying to match filenames in a comma-separated list such as:
text.txt,temp_doc.doc,template.tmpl,empty.zip
I use Java's regex implementation. Requirements for output are as follows:
- Display only filenames and not their respective extensions
- Exclude files that begin with "temp_"
It should look like:
text
template
empty
So far I have managed to write a more or less satisfactory regex to cope with the first task:
[^\\.,]++(?=\\.[^,]*+,?+)
I believe the best way to satisfy the second requirement is to use lookaround constructs, but I am not sure how to write a reliable and optimized expression. The following regex does seem to do what is required, but it is obviously flawed, if for no other reason than that it relies on an explicit maximum filename length:
(?!temp_|emp_|mp_|p_|_)(?<!temp_\\w{0,50})[^\\.,]++(?=\\.[^,]*+,?+)
P.S. I've been studying regexes only for a few days, so please don't laugh at this newbie-style overcomplicated code :)
Answer 1:
How about this:
Pattern regex = Pattern.compile(
    "\\b         # Start at word boundary\n" +
    "(?!temp_)   # Exclude words starting with temp_\n" +
    "[^,]+       # Match one or more characters except comma\n" +
    "(?=\\.)     # until the last available dot",
    Pattern.COMMENTS);
This also allows dots within filenames.
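As a rough usage sketch, the pattern above can be driven with `Matcher.find()` to collect every match. The class name `WordBoundaryDemo` and the helper `baseNames` are hypothetical names chosen for illustration, and the pattern is written here without `Pattern.COMMENTS` for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Same pattern as in the answer, minus the inline comments.
    static final Pattern REGEX = Pattern.compile("\\b(?!temp_)[^,]+(?=\\.)");

    // Hypothetical helper: returns every match found in the list.
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        Matcher m = REGEX.matcher(csv);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [text, template, empty]
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

Because `[^,]+` is greedy and backtracks to the last dot before the comma, an entry such as `file.name.ext` yields `file.name`.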
Answer 2:
- Display only filenames and not their respective extensions
- Exclude files that begin with "temp_"
One variant would be like this:
(?:^|,)(?!temp_)((?:(?!\.[^.]*(?:,|$)).)+)
This allows
- file names that do not begin with a "word character" (Tim Pietzcker's solution does not)
- file names that contain a dot (something like file.name.ext will be matched as file.name)
But actually, this is really complex. You'll be better off writing a small function that splits the input at the commas and strips the extension from the parts.
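A minimal sketch of such a function might look like the following. The names `SplitAndStrip` and `baseNames` are hypothetical, and two behaviours are assumptions rather than anything stated in the question: empty entries are skipped, and an entry with no extension (or with only a leading dot) is kept unchanged.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndStrip {
    // Hypothetical helper: split on commas, skip "temp_" files, strip the extension.
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        for (String file : csv.split(",")) {
            if (file.isEmpty() || file.startsWith("temp_")) {
                continue; // assumption: ignore empty entries as well as temp_ files
            }
            int dot = file.lastIndexOf('.');
            // dot > 0 keeps names without an extension, and names like ".hidden", intact
            names.add(dot > 0 ? file.substring(0, dot) : file);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [text, template, empty]
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```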
Anyway, here's the tear-down:
(?:^|,)              # filename start: either start of the string or comma
(?!temp_)            # negative look-ahead: disallow filenames starting with "temp_"
(                    # match group 1 (will contain your file name)
  (?:                # non-capturing group (matches one allowed character)
    (?!              # negative look-ahead (not followed by):
      \.             # a dot
      [^.]*          # any number of non-dots (this matches the extension)
      (?:,|$)        # filename-end (either end of string or comma)
    )                # end negative look-ahead
    .                # this character is valid, match it
  )+                 # end non-capturing group, repeat
)                    # end group 1
http://rubular.com/r/4jeHhsDuJG
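Since the question is about Java, here is a hedged sketch of how this expression could be used there. The backslash has to be doubled inside the string literal, and the file name is read from group 1 because the overall match may include the leading comma. `TemperedDotDemo` and `baseNames` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedDotDemo {
    // The regex from this answer, escaped for a Java string literal.
    static final Pattern REGEX =
        Pattern.compile("(?:^|,)(?!temp_)((?:(?!\\.[^.]*(?:,|$)).)+)");

    // Hypothetical helper: collects group 1 (the name without extension) of each match.
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        Matcher m = REGEX.matcher(csv);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [text, template, empty]
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```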
Answer 3:
Another option:
(?:temp_[^,.]*|([^,.]*))\.[^,]*
That pattern will match all file names, but will capture only valid names.
- If at the current position the pattern can match temp_file.ext, it matches it and does not capture.
- If it cannot match temp_, it tries to match ([^,.]*)\.[^,]* and captures the file's name.
You can see an example here: http://www.rubular.com/r/QywiDgFxww
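In Java, the "match everything, capture only the valid names" idea could be sketched as below. A match that went through the `temp_` branch leaves group 1 unset, so `m.group(1)` returns `null` and the entry is skipped. `CaptureOnlyValidDemo` and `baseNames` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureOnlyValidDemo {
    // The regex from this answer, escaped for a Java string literal.
    static final Pattern REGEX =
        Pattern.compile("(?:temp_[^,.]*|([^,.]*))\\.[^,]*");

    // Hypothetical helper: keeps only the matches where group 1 participated.
    static List<String> baseNames(String csv) {
        List<String> names = new ArrayList<>();
        Matcher m = REGEX.matcher(csv);
        while (m.find()) {
            String name = m.group(1); // null when the temp_ branch matched
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [text, template, empty]
        System.out.println(baseNames("text.txt,temp_doc.doc,template.tmpl,empty.zip"));
    }
}
```

Note that `[^,.]*` stops at the first dot, so with this pattern `file.name.ext` is captured as `file` rather than `file.name`.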
Source: https://stackoverflow.com/questions/11817249/regex-lookaround-construct-in-java-advise-on-optimization-needed