I asked the original question here and got a practical response that mixed Ruby and regular expressions. Now the purist in me wants to know: can this be done in regular expressions alone? My gut says it can. There's an ABNF floating around for bash 2.0, though it doesn't include string escapes.
The Spec
Given an input line that is either (1) a variable ("key") assignment from a bash-flavored script or (2) a key-value setting from a typical configuration file like postgresql.conf, this regex (or pair of regexen) should capture the key and value in such a way that I can use those captures to substitute a new value for that key.
You may use a different regular expression for shell-flavored and config-flavored lines; the caller will know which to use.
There will be a 50-point bounty here. I can't add a bounty for two days, so I won't accept an answer till then, but you can start answering immediately. You earn points for:
- Readability (named capture groups, definitions via ?(DEFINE) or {0})
- Using a single regex instead of two
- Teaching me something about DFA
- Regex performance, if relevant
- Getting upvoted
- First to use a technique
Example:
Given the input
export RAILS_ENV=production
I should be able to write in Ruby:
match = THE_REGEX.match("export RAILS_ENV=production")
newline = "export #{match[:key]}=#{match[:value]}"
Test cases: shell style
RAILS_ENV=development # Don't forget to change this for TechCrunch
HOSTNAME=`cat /etc/hostname`
plist=`cat "/Applications/Sublim\`e Text 2.app/Content's/Info.plist"`
# Optional bonus input: "#" present in the string
FORMAT=" ##0.00 passe\`" #comment
Test cases: config style
listen_addresses = 127.0.0.1 #localhost only by default
# listen_addresses = 0.0.0.0 commented out, should not match
For the purpose of this challenge, "regular expression" and "regex" mean the same thing and both can refer to any common flavor you like, though I prefer Ruby 1.9-compatible.
I'm not sure about the full specs and what exactly you want in the value capturing group, but this should work for your test cases:
/
^\s*+
(?:export\s++)?
(?<key>\w++)
\s*+
=
\s*+
(?<value>
(?> "(?:[^"\\]+|\\.)*+"
| '(?:[^'\\]+|\\.)*+'
| `(?:[^`\\]+|\\.)*+`
| [^#\n\r]++
)
)
\s*+
(?:\#.*+)?
$
/mx;
It handles trailing comments and quoted values containing backslash escapes.
The flavor and the quoting are Perl/PCRE.
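Since the question prefers Ruby 1.9, here is a sketch of how the same pattern might look as a Ruby literal. The constant name ASSIGNMENT is my own, not something from the question. Two flavor differences are assumed: Ruby's ^ and $ already match at line boundaries, so the /m flag (which in Ruby makes the dot match newlines) is dropped, and # is escaped everywhere because /x treats a bare # as the start of a comment.

```ruby
# Hypothetical constant name; the pattern is the Perl one ported to Ruby syntax.
ASSIGNMENT = /
  ^\s*+
  (?:export\s++)?            # optional shell "export" prefix
  (?<key>\w++)
  \s*+ = \s*+
  (?<value>
    (?> "(?:[^"\\]+|\\.)*+"  # double-quoted, backslash escapes allowed
      | '(?:[^'\\]+|\\.)*+'  # single-quoted
      | `(?:[^`\\]+|\\.)*+`  # backtick command substitution
      | [^\#\n\r]++          # bare value, up to a comment or end of line
    )
  )
  \s*+
  (?:\#.*+)?                 # optional trailing comment
  $
/x

match = ASSIGNMENT.match("export RAILS_ENV=production")
puts "export #{match[:key]}=#{match[:value]}"  # prints "export RAILS_ENV=production"
```

One thing to keep in mind: the bare-value branch consumes everything up to the #, so an unquoted value followed by a comment keeps its trailing blanks in the capture. Call rstrip on it if that matters.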
Example usage in Perl:
my $re = qr/
^\s*+
(?:export\s++)?
(?<key>\w++)
\s*+
=
\s*+
(?<value>
(?> "(?:[^"\\]+|\\.)*+"
| '(?:[^'\\]+|\\.)*+'
| `(?:[^`\\]+|\\.)*+`
| [^#\n\r]++
)
)
\s*+
(?:\#.*+)?
$
/mx;
my $str = <<'_TESTS_';
RAILS_ENV=development # Don't forget to change this for TechCrunch
HOSTNAME=`cat /etc/hostname`
plist=`cat "/Applications/Sublim\`e Text 2.app/Content's/Info.plist"`
# Optional bonus input: "#" present in the string
FORMAT=" ##0.00 passe\`" #comment
listen_addresses = 127.0.0.1 #localhost only by default
# listen_addresses = 0.0.0.0 commented out, should not match
TEST="foo'bar\"baz#"
TEST='foo\'bar"baz#\\'
_TESTS_
for (split /[\r\n]+/, $str) {
    print "line: $_\n";
    print /$re/ ? "match: $1, $2\n" : "no match\n";
    print "\n";
}
Output:
line: RAILS_ENV=development # Don't forget to change this for TechCrunch
match: RAILS_ENV, development
line: HOSTNAME=`cat /etc/hostname`
match: HOSTNAME, `cat /etc/hostname`
line: plist=`cat "/Applications/Sublim\`e Text 2.app/Content's/Info.plist"`
match: plist, `cat "/Applications/Sublim\`e Text 2.app/Content's/Info.plist"`
line: # Optional bonus input: "#" present in the string
no match
line: FORMAT=" ##0.00 passe\`" #comment
match: FORMAT, " ##0.00 passe\`"
line: listen_addresses = 127.0.0.1 #localhost only by default
match: listen_addresses, 127.0.0.1
line: # listen_addresses = 0.0.0.0 commented out, should not match
no match
line: TEST="foo'bar\"baz#"
match: TEST, "foo'bar\"baz#"
line: TEST='foo\'bar"baz#\\'
match: TEST, 'foo\'bar"baz#\\'
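For the substitution the question is after, one possible approach is to splice the new value in at the captured offsets so the export prefix, spacing and trailing comment survive untouched. This is only a sketch: the names ASSIGNMENT and replace_value are hypothetical, and the pattern is repeated as a Ruby literal so the snippet runs on its own.

```ruby
# Hypothetical names; the same pattern as above, written as a Ruby literal.
ASSIGNMENT = /
  ^\s*+ (?:export\s++)? (?<key>\w++) \s*+ = \s*+
  (?<value>
    (?> "(?:[^"\\]+|\\.)*+"
      | '(?:[^'\\]+|\\.)*+'
      | `(?:[^`\\]+|\\.)*+`
      | [^\#\n\r]++
    )
  )
  \s*+ (?:\#.*+)? $
/x

# Returns a copy of line with its value replaced, or line itself if it
# is not an assignment (for example a commented-out setting).
def replace_value(line, new_value)
  match = ASSIGNMENT.match(line)
  return line if match.nil?

  start = match.begin(:value)
  # A bare value captures the blanks before a comment; leave those in place.
  stop = start + match[:value].rstrip.length
  line[0...start] + new_value + line[stop..-1]
end

puts replace_value("listen_addresses = 127.0.0.1 #localhost only by default", "0.0.0.0")
# prints "listen_addresses = 0.0.0.0 #localhost only by default"
```

Working from offsets rather than rebuilding the line from the captures means nothing outside the value has to be captured at all, which keeps the pattern the same for shell-style and config-style lines.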
Source: https://stackoverflow.com/questions/8658722/challenge-regex-only-tokenizer-for-shell-assignment-like-config-lines