How can I use a look after to match either a single or a double quote?


I have a series of strings I want to extract:

hello.this_is("bla bla bla")
some random text
hello.this_is('hello hello')
other stuff

What I


3 Answers

一生所求

2021-01-26 20:40
Note: The sed command at the bottom of this answer works only as long as your strings are well-behaved, like
```
"foo"
```
or
```
'bar'
```
As soon as your strings start to misbehave :), for example:
```
"hello \"world\""
```
it won't work any more.

Your input looks like source code. For a stable solution I recommend using a parser for that language to extract the strings.

For trivial use cases:

You can use sed. Unlike grep -oP, which only works with GNU grep, this solution is supposed to work on any POSIX platform:
```
sed -n 's/hello\.this_is(\(["'\'']\)\([^"]*\)\(["'\'']\).*/\2/gp' file
#                                    ^^^^^^^^              ^^
#                                          capture group 2 ^
```
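For what it's worth, here is a minimal sketch of a variant, not the command above: it assumes well-behaved strings and swaps the second bracket expression for the backreference \1, so the closing quote has to be the same character as the opening one. The sample lines are taken from the question; the names sample and sed_out are only for illustration.

```shell
# Hypothetical helper that prints the sample input from the question.
sample() {
  cat <<'EOF'
hello.this_is("bla bla bla")
some random text
hello.this_is('hello hello')
other stuff
EOF
}

# \(["'\'']\) captures the opening quote as group 1.
# \(.*\)      captures the string body as group 2.
# \1)         requires the same quote character, followed by ")".
# -n together with the p flag prints only lines where the substitution succeeded.
sed_out=$(sample | sed -n 's/hello\.this_is(\(["'\'']\)\(.*\)\1).*/\2/p')
printf '%s\n' "$sed_out"
```

Because .* is greedy, this still breaks on lines that contain more than one quoted string, so the caveat about misbehaving input applies here too.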

独厮守ぢ

2021-01-26 20:44
Use a capturing group for the opening quote and refer back to it with a backreference, like this:
```
grep -Po 'hello\.this_is\(([\047"])((?!\1).|\\.)*\1\)' file
```
This also handles escaped characters, e.g. hello.this_is("bla b\"la bla").

If the output should be only what comes between the quotes, use both \K and a positive lookahead:
```
grep -Po 'hello\.this_is\(([\047"])\K((?!\1).|\\.)*(?=\1\))' file
```
Outputs:
```
bla bla bla
hello hello
```
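As a quick check of the escaped-quote case, the second command can be run against a small sample. This is only a sketch: the sample lines, the helper name sample and the variable grep_out are made up for illustration, and the guard is there because -P is a GNU grep extension that may be missing on other systems.

```shell
# Hypothetical sample input, including one string with an escaped quote.
sample() {
  cat <<'EOF'
hello.this_is("bla b\"la bla")
some random text
hello.this_is('hello hello')
EOF
}

# Only run the PCRE pattern if this grep actually supports -P.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
  # \K drops everything matched so far (prefix and opening quote) from the output;
  # (?=\1\)) requires the same quote plus ")" without including them in the match.
  grep_out=$(sample | grep -Po 'hello\.this_is\(([\047"])\K((?!\1).|\\.)*(?=\1\))')
  printf '%s\n' "$grep_out"
else
  echo 'grep -P is not available here' >&2
fi
```

With GNU grep this should print the body of each string, keeping the escaped quote inside the first one.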

生来不讨喜

2021-01-26 20:51
Based on revo's and hek2mgl's excellent answers, I ended up using grep like this:
```
grep -Po '(?<=hello\.this_is\((["'\''])).*(?=\1)' file
```
Which can be explained as:
- grep
- -Po: -P uses Perl-compatible regular expressions and -o prints only the matched part of each line
- '(?<=hello\.this_is\((["'\''])).*(?=\1)' the expression
  - (?<=hello\.this_is\((["'\''])) look-behind: find text preceded by "hello.this_is(" followed by either ' or ". Also capture that quote character to be used later on.
  - .* match everything (greedily)...
  - (?=\1) up to the last place on the line where the captured character (that is, either ' or ") appears again.
The key here was to use ["'\''] to indicate either ' or ". With '\'' we close the single-quoted string, add a literal ' (which has to be escaped as \') and then open the single-quoted string again.
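
The quoting trick can be checked on its own, without grep. A small sketch (the variable name class is just for illustration) that shows what the shell actually hands over once the three pieces are glued together:

```shell
# The shell concatenates three adjacent pieces into one word:
#   '["'   a single-quoted string containing [ and "
#   \'     an escaped, literal single quote
#   ']'    a single-quoted string containing ]
class='["'\'']'
printf '%s\n' "$class"
```

The printed value is the bracket expression with both quote characters inside it, which is exactly what grep receives as part of the pattern.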
