Question
I'm writing a Ruby script that uses regex to find all comments of a specific format in Objective-C source code files.
The format is
/* <Headline_in_caps> <#>:
<Comment body>
**/
I want to capture the headline in caps, the number and the body of the comment.
With the regex below I can find one comment in this format within a larger body of text.
My problem is that if there is more than one comment in the file, I end up with all the text, including code, between the first /* and the last **/. I don't want it to capture everything in between, only what is within each /* and **/ pair.
The body of the comment can include any characters except **/ and */, which both signify the end of a comment. Am I correct in assuming that a regex can find multiple complete matches in a single pass over the text?
/\/\*\s*([A-Z]+). (\d)\:([\w\d\D\W]+)\*{2}\//x
Broken apart, the regex does this:
\/\* - finds the start of a comment
\s* - finds whitespace
([A-Z]+) - captures the caps word
.<space> - finds the space between the caps word and the digit
(\d) - captures the digit
\: - finds the colon
([\w\W\d\D]+) - captures the body of the message, which can include all valid characters except **/ or */
\*{2}\/ - finds the end of a comment
Here is a sample. Everything from the first /* to the second **/ is captured:
/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// This text and method declaration are captured
// The regex captures from HEADLINE to the end of the comment "meddled in." inclusively.
/*
HEADLINE 2:
Should be captured separately and without Objective-C code meddled in.
**/
}
Here is the sample on Rubular: http://rubular.com/r/4EoXXotzX0
I'm using gsub to run the regex on a string containing the whole file, on Ruby 1.9.3. Another issue I have is that gsub gives me what Rubular ignores. Is this a regression, or is Rubular using a different method that gives what I want?
The answer to the related question "Regex matching multiple occurrences per file and per line" is to use g for the global option, but that is not valid in a Ruby regex.
Answer 1:
Change this: ([\w\W\d\D]+)
To this: ([\w\W\d\D]+?)
This makes the quantifier non-greedy, so the match stops as soon as it reaches the next closing **/. (Updated rubular: http://rubular.com/r/Whm31AJ6Kg)
Also, note that [\w\W\d\D] matches absolutely any character and can be written more simply as just [\w\W]. Alternatively, you could match the body with the character class [^*\/], as in ([^*\/]+), which also avoids the problem of matching through the closing **/, although it will not match a body that itself contains a * or a /. (Updated rubular: http://rubular.com/r/2h0kGYkdVQ)
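To illustrate the non-greedy fix in a script, here is a minimal sketch. It assumes String#scan is an acceptable replacement for gsub when the goal is to collect every match (scan returns all matches, so no g option is needed). The names SOURCE, COMMENT_RE and comments, and the sample text, are hypothetical and used only for illustration.

```ruby
# Hypothetical sample input, modeled on the example in the question.
SOURCE = <<'OBJC'
/*
HEADLINE 1:
Comment body.
**/
- (BOOL)application:(UIApplication *)application
{
/*
HEADLINE 2:
Should be captured separately.
**/
}
OBJC

# The lazy body ([\w\W]+?) stops at the first closing **/ it can reach,
# so each comment becomes its own match.
COMMENT_RE = %r{/\*\s*([A-Z]+) (\d):([\w\W]+?)\*\*/}

# String#scan returns one [headline, number, body] array per match.
comments = SOURCE.scan(COMMENT_RE).map do |headline, number, body|
  { headline: headline, number: number.to_i, body: body.strip }
end

comments.each { |c| puts "#{c[:headline]} #{c[:number]}: #{c[:body]}" }
```

With the greedy ([\w\W]+) in place of the lazy version, the same scan would return a single match running from the first headline to the last **/.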
Answer 2:
A solution:
- Split the whole String on '*/' (the end of a comment).
- If the split returns only one element, there is no comment in the String.
- Otherwise, for each token except the last one, use the RegExp %r{/\*(.*)\z}m (starting at '/*' and running to the end of the token) to capture the whole commented content. The m flag lets . match newlines, and \z anchors to the end of the token, so multi-line comments are captured whole. You may use a more complex RegExp here to capture more data from the comment.
It may not be the most beautiful solution, but it should do the job. It is not bulletproof, though: if your Objective-C source code contains something like the line below, my solution will fail.
char *myString = "a comment /* */";
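The split-based approach can be sketched as follows. This is only an illustration under a few assumptions: extract_comments is a hypothetical helper name, the regex uses the m flag and \z so that a comment spanning several lines is captured whole, and the single * left behind by a **/ terminator is trimmed.

```ruby
# Hypothetical helper: returns the text of each /* ... */ comment in source.
def extract_comments(source)
  # A limit of -1 keeps a trailing empty token, so the token that precedes
  # the final '*/' is never mistaken for the "last one" and skipped.
  tokens = source.split('*/', -1)
  return [] if tokens.length == 1 # no '*/' found, so no complete comment

  tokens[0...-1].map { |token|
    match = token.match(%r{/\*(.*)\z}m)
    # Splitting on '*/' leaves one '*' behind for a '**/' terminator; drop it.
    match && match[1].sub(/\*\z/, '').strip
  }.compact
end

sample = "/*\nHEADLINE 1:\nComment body.\n**/\nint x = 1;\n" \
         "/*\nHEADLINE 2:\nSecond body.\n**/\n"

extract_comments(sample).each { |text| puts text, '---' }
```

As the answer warns, this sketch has no notion of string literals, so it still reports a comment for the char *myString line above.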
Source: https://stackoverflow.com/questions/8946503/find-multiple-objective-c-comments-per-file-in-certain-format-with-ruby-regex