Multiple regex matches in Google Sheets formula

Submitted by 我是研究僧i on 2019-11-30 04:10:18

Question


I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula:

=REGEXEXTRACT(A1, "\d-")

My problem is that it only returns the first match... how can I get all matches?

Example text:

"A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq"

My formula returns 1-, whereas I want to get 1-2-2-2-2-2-2-2-2-2-3-3- (either as an array or concatenated text).

I know I could use a script or another function (like SPLIT) to achieve the desired result, but what I really want to know is how to get an RE2 regular expression to return multiple matches in a "REGEX.*" Google Sheets formula, something like the "global - Don't return after first match" option on regex101.com.
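To make the difference concrete, here is a minimal sketch in Python rather than Sheets. Python's re module is only a stand-in for RE2 here, and the variable names are invented for illustration.

```python
import re

text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# First match only: comparable to what the single REGEXEXTRACT call returns.
first_match = re.search(r"\d-", text).group(0)

# "Global" matching: every non-overlapping match, left to right.
all_matches = re.findall(r"\d-", text)

print(first_match)
print("".join(all_matches))
```

The first value corresponds to what the formula returns today, the second to the concatenated result I am after.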

I've also tried removing the undesired text with REGEXREPLACE, with no success either (I couldn't get rid of other digits not preceding a hyphen).

Any help appreciated! Thanks :)


Answer 1:


Edit

I came up with a more general solution:

=regexreplace(A1,"(.)?(\d-)|(.)","$2")
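As a rough way to check how this pattern behaves, the sketch below replays it with Python's re module. That is an assumption: Sheets uses RE2, not Python's engine, although both try alternatives left to right for a pattern like this one. The function name is hypothetical.

```python
import re

def digits_before_hyphen(text):
    # Mirrors =regexreplace(A1,"(.)?(\d-)|(.)","$2"):
    # every match is replaced by group 2, which holds "digit-" when the first
    # alternative matched and nothing when the fallback (.) matched.
    return re.sub(r"(.)?(\d-)|(.)", lambda m: m.group(2) or "", text)

question_text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
                 "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
                 "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

print(digits_before_hyphen(question_text))
```

The optional (.)? swallows the character in front of a digit-hyphen pair, and the fallback (.) consumes any other single character, so only the pairs survive.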


Try this formula:

=regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")

It will handle a string like this:

"A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"

with output:

1-2-2-2-3-
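The two nested REGEXREPLACE calls can be replayed step by step. This is a sketch using Python's re module as a stand-in for RE2 (an assumption), with a hypothetical function name.

```python
import re

def two_step(text):
    # Step 1 (inner REGEXREPLACE): drop everything that is not a digit or hyphen.
    stripped = re.sub(r"[^\-0-9]", "", text)
    # Step 2 (outer REGEXREPLACE): keep each "digit-" pair, drop leftover characters.
    return re.sub(r"(\d-)|(.)", lambda m: m.group(1) or "", stripped)

sample = "A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"
print(two_step(sample))
```

One difference worth noting: because step 1 removes characters before the pairs are matched, a digit and a hyphen that were not adjacent in the original text can end up adjacent. In this sketch, two_step("5x-") yields "5-", even though that digit did not directly precede the hyphen.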




Answer 2:


You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:

=join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))

Basically, it surrounds every instance of \d- with a capture group, and REGEXEXTRACT then returns all the captures. If you want them back in a single cell as one string, wrap the result in JOIN, as the formula above does.
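The idea of turning the cell text itself into a pattern can be sketched like this in Python. The re module stands in for RE2, which is an assumption, and the variable names are made up for illustration.

```python
import re

text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# Inner REGEXREPLACE: wrap every "digit-" in parentheses, so the text
# becomes a pattern with one capture group per occurrence.
pattern = re.sub(r"(\d-)", r"(\1)", text)

# REGEXEXTRACT: match the text against that pattern and read the groups.
captures = re.search(pattern, text).groups()

# JOIN: pack the captures back into one string.
joined = "".join(captures)
print(joined)
```

Since the text is reused as a regular expression, this relies on the text containing no characters with special regex meaning, such as ".", "(" or "*". The example string happens to be safe in that respect.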




Answer 3:


This seems to work and I have tried to verify it.

The logic is

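(1) Replace a letter followed by a hyphen with nothing

(2) Replace any digit that is followed by a non-hyphen character, together with that character, with nothing

(3) Replace every remaining character that is not a digit or hyphen (here: letters, ";", "/" and "é") with nothing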

=regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")

Result

1-2-2-2-2-2-2-2-2-2-3-3-

Analysis

I had to step through these procedurally to convince myself that this was correct. According to this reference, when alternatives are separated by the pipe symbol, regex tries them in order, left to right. The formula above doesn't work properly unless rule (1) comes first: otherwise rule (3) removes the letter in front of the hyphen in "Patho-jour" before rule (1) can match it, and you get an extra hyphen.

Here are some examples of how I think it must deal with the text
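A minimal sketch of such a walkthrough, using Python's re module as a stand-in for RE2 (an assumption; both try alternatives left to right here). The fragments and variable names are chosen for illustration only.

```python
import re

RULES_IN_ORDER = r"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]"   # rule (1) first, as in the formula
RULE_1_LAST = r"[0-9][^-]|[a-zA-Z;/é]|[a-zA-Z]-"      # same rules, rule (1) moved to the end

text = ("A1-Nutrition;A2-ActPhysiq;A2-BioMeta;A2-Patho-jour;A2-StgMrktg2;"
        "H2-Bioth2/EtudeCas;H2-Bioth2/Gemmo;H2-Bioth2/Oligo;H2-Bioth2/Opo;"
        "H2-Bioth2/Organo;H3-Endocrino;H3-Génétiq")

# Fragments that exercise each rule: "o-" (rule 1), "2;" and "2/" (rule 2),
# and plain letters or separators (rule 3).
for fragment in ("A2-Patho-jour", "StgMrktg2;H2-", "Bioth2/Gemmo"):
    print(fragment, "->", repr(re.sub(RULES_IN_ORDER, "", fragment)))

ordered = re.sub(RULES_IN_ORDER, "", text)
reordered = re.sub(RULE_1_LAST, "", text)
print(ordered)
print(reordered)  # contains a doubled hyphen left over from "Patho-jour"
```

With rule (1) last, the "o" of "Patho" is consumed on its own by the letter rule, so the hyphen after it has nothing left to pair with and stays in the output.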




Answer 4:


I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:

Input:

1111 days, 123 hours 1234 minutes and 121 seconds

Expected output:

1111 123 1234 121

Formula:

=split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
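The same replace-then-split idea, sketched in Python for anyone who wants to test it outside Sheets. The function name is hypothetical, and Python's re module is only a stand-in for RE2.

```python
import re

def numbers_only(text):
    # REGEXREPLACE step: turn lowercase letters and commas into spaces.
    spaced = re.sub(r"[a-z,]", " ", text)
    # SPLIT step: str.split() with no argument splits on runs of whitespace
    # and drops empty pieces, so only the number strings remain.
    return spaced.split()

print(numbers_only("1111 days, 123 hours 1234 minutes and 121 seconds"))
```

Note that this sketch returns the numbers as strings, and that the character class only covers lowercase letters and commas, so any other character in the input would be left in place.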


Source: https://stackoverflow.com/questions/43432409/multiple-regex-matches-in-google-sheets-formula
