Question
I have the following types of strings:
abc - xyz
abc - pqr - xyz
abc - - xyz
abc - pqr uvw - xyz
I want to retrieve the text xyz from the 1st string, pqr from the 2nd string, `` (empty) from the 3rd, and pqr uvw from the 4th. The 2nd hyphen is optional. abc is a static string; it has to be there. I've tried the following regex:
/^(?:abc) - (.*)[^ -]?/
But it gives me the following output:
xyz
pqr - xyz
- xyz
pqr uvw - xyz
I don't need the last part in the second string. I'm using Perl for scripting. Can it be done via regex?
Answer 1:
Note that the (.*) part is a greedily quantified dot: it grabs any 0+ chars other than line break chars, as many as possible, up to the end of the line. The [^ -]? that follows can match an empty string thanks to the ? quantifier (1 or 0 repetitions), so it matches the empty string at the end of the line. Thus, the pqr - xyz output for abc - pqr - xyz is only logical for the regex engine.
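To see the greedy behavior described above, here is a minimal sketch that runs the pattern from the question against the four sample strings. The helper name greedy_capture is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question and returns
# whatever group 1 captured, or undef when the pattern does not match.
sub greedy_capture {
    my ($line) = @_;
    # (.*) consumes the rest of the line, so [^ -]? is left matching nothing.
    return $line =~ /^(?:abc) - (.*)[^ -]?/ ? $1 : undef;
}

for my $s ('abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz') {
    my $got = greedy_capture($s);
    print "'$s' => '", (defined $got ? $got : 'undef'), "'\n";
}
```

The captures printed for the four strings should be the same ones listed in the question: everything after the first hyphen, including the part that was supposed to be dropped.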
You need to use a more restrictive pattern here. E.g.
/^abc\h*-\h*((?:[^\s-]+(?:\h+[^\s-]+)*)?)/
Details
^ - start of a string
abc - the literal abc
\h*-\h* - a hyphen enclosed with 0+ horizontal whitespaces
((?:[^\s-]+(?:\h+[^\s-]+)*)?) - Group 1 capturing an optional occurrence of
  [^\s-]+ - 1 or more chars other than whitespace and -
  (?:\h+[^\s-]+)* - zero or more repetitions of
    \h+ - 1+ horizontal whitespaces
    [^\s-]+ - 1 or more chars other than whitespace and -
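The pattern broken down above can be tried with a short script. This is a sketch, assuming the four sample strings from the question as input; the helper name middle_text is hypothetical, and \h needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first hyphen and the
# optional second one, or undef if the line does not start with "abc"
# followed by a hyphen.
sub middle_text {
    my ($line) = @_;
    return $line =~ /^abc\h*-\h*((?:[^\s-]+(?:\h+[^\s-]+)*)?)/ ? $1 : undef;
}

for my $s ('abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz') {
    my $got = middle_text($s);
    print "'$s' => '", (defined $got ? $got : 'undef'), "'\n";
}
```

Because Group 1 is optional as a whole, the third string still matches and yields an empty capture rather than failing.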
Answer 2:
You could use ^[^-]*-\s*\K[^\s-]*.
Here's how it works:
^        # Matches at the beginning of the line (in multiline mode)
[^-]*    # Matches any run of characters other than -
-        # Followed by -
\s*      # Matches any run of whitespace characters
\K       # Resets the start of the match to the current position
[^\s-]*  # Matches any run of characters other than whitespace or -
Update for multiple enclosed words: ^[^-]*-\s*\K[^\s-]*(?:\s*[^\s-]+)*
The last part, (?:\s*[^\s-]+)*, checks for the existence of any other word preceded by space(s).
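Since \K drops everything matched before it, the wanted text ends up in the overall match rather than in a capture group. A minimal sketch of using the updated pattern from Perl follows; the helper name match_after_hyphen is hypothetical, and reading the match from $& is one option among several.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the overall match, which \K has trimmed down
# to the text after the first hyphen, or undef when there is no hyphen.
sub match_after_hyphen {
    my ($line) = @_;
    return $line =~ /^[^-]*-\s*\K[^\s-]*(?:\s*[^\s-]+)*/ ? $& : undef;
}

for my $s ('abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz') {
    my $got = match_after_hyphen($s);
    print "'$s' => '", (defined $got ? $got : 'undef'), "'\n";
}
```

Note that this pattern does not require the leading abc; any text without a hyphen is accepted before the first hyphen.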
Answer 3:
You could use split:
$answer = (split / \- /, $t)[1];
Here $t is the text string and [1] selects the second field, since indexing starts at 0. This works for every case except abc - - xyz: with " - " as the separator, the string would need two spaces between the hyphens for the middle field to come back empty. If abc - - xyz is valid input, run this substitution before the split so that all cases work:
$t =~ s/\- \-/-  -/;
It simply inserts an extra space, so " - " matches twice with nothing in between.
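Putting the substitution and the split together, the approach can be sketched as below. The helper name second_field is hypothetical, and the replacement string is assumed to contain two spaces between the hyphens, as the explanation above describes.

```perl
use strict;
use warnings;

# Hypothetical helper: widens "- -" to "-  -" (two spaces), then returns the
# second field produced by splitting on " - ". Returns undef if there is no
# second field.
sub second_field {
    my ($t) = @_;
    $t =~ s/- -/-  -/;                 # make room for an empty middle field
    my $field = (split / - /, $t)[1];  # fields are indexed from 0
    return $field;
}

for my $s ('abc - xyz', 'abc - pqr - xyz', 'abc - - xyz', 'abc - pqr uvw - xyz') {
    my $got = second_field($s);
    print "'$s' => '", (defined $got ? $got : 'undef'), "'\n";
}
```

The substitution is a no-op for strings that do not contain two adjacent hyphens separated by a single space, so it is safe to apply to every line.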
Answer 4:
Can it be done via regex?
Yes, with three simple regexes: -, ^\s+, and \s+$.
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;

open my $INFILE, '<', 'data.txt';

# Split each line on hyphens into at most 3 fields and keep only the second.
my @results = map {
    (undef, my $target) = split /-/, $_, 3;
    $target =~ s/^\s+//;  # remove leading spaces
    $target =~ s/\s+$//;  # remove trailing spaces (including the newline)
    $target;
} <$INFILE>;

close $INFILE;

say Dumper \@results;
--output:--
$VAR1 = [
'xyz',
'pqr',
'',
'pqr uvw'
];
Source: https://stackoverflow.com/questions/48768612/capture-word-between-optional-hyphens-regex