Splitting a string based on locating a varying code (with a similar format)


I uploaded a txt file into R as follows:

Election_Parties <- readr::read_lines("Election_Parties.txt")

Let's say the following text


1 answer

孤独总比滥情好

2021-01-26 09:44

You may use

strsplit(paste(Election_Parties, collapse=" "), "\\s+(?=P\\d+-)", perl=TRUE)[[1]]

See the R demo online.

Output:

[1] "P23-Andalusian Social Democratic Party (Partido Social-Demócrata Andaluz [PSDA])"                              
[2] "P24-Andalusian Socialist Movement (Movimiento Socialista Andaluz [MSA])"                                       
[3] "P235-Andalusian Socialist Party-Andalucian Party (Partido Socialista Andalucista-Partido Andalucista [PSA-PA])"
[4] "P26-Andalusist Party (Partido Andalucista [PA])"                                                               
[5] "P217-Andecha Astur (Andecha Astur [AA])"

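Here, paste(Election_Parties, collapse=" ") first joins all lines into a single string, so entries that wrap across lines are reunited before splitting. The \s+(?=P\d+-) pattern matches one or more whitespace characters that are followed by P, one or more digits, and a hyphen. The P<numbers>- part is not consumed, because it sits inside a positive lookahead, which is a zero-width assertion, so each code stays at the start of its entry. R's default regex engine (TRE) does not support lookaheads, so perl=TRUE is required to process the pattern with the PCRE engine.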
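For a version that can be run without the original file, here is a minimal sketch. The input vector is an assumption: it reuses three entries from the output above, wrapped across lines arbitrarily, because the question does not show the file's real line breaks. The variable names joined, parties, codes and party_names are illustrative only.

```r
# Hypothetical stand-in for readr::read_lines("Election_Parties.txt").
# The line breaks below are invented for illustration.
Election_Parties <- c(
  "P24-Andalusian Socialist Movement (Movimiento Socialista",
  "Andaluz [MSA]) P235-Andalusian Socialist Party-Andalucian Party",
  "(Partido Socialista Andalucista-Partido Andalucista [PSA-PA])",
  "P26-Andalusist Party (Partido Andalucista [PA])"
)

# Join all lines into one string so wrapped entries are reunited.
joined <- paste(Election_Parties, collapse = " ")

# Split on whitespace that is immediately followed by P<digits>-.
# The lookahead keeps the code at the start of each piece.
parties <- strsplit(joined, "\\s+(?=P\\d+-)", perl = TRUE)[[1]]
print(parties)

# Optional follow-up: separate the code from the party name.
codes <- sub("-.*$", "", parties)               # text before the first hyphen
party_names <- sub("^P[0-9]+-", "", parties)    # text after the leading code
print(codes)
```

Note that the hyphens inside "Party-Andalucian" and "PSA-PA" do not trigger a split, since the pattern only fires on whitespace followed by P, digits, and a hyphen.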
