Multiple regex matches in Google Sheets formula

情歌与酒 2020-12-01 17:08

I'm trying to get the list of all digits preceding a hyphen in a given string (let's say in cell A1), using a Google Sheets regex formula:

=R         


        
5 Answers
  • 2020-12-01 17:24

    You may create your own custom function in the Script Editor:

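    // Returns every match of `pattern` in `input` (the capturing group `groupId`) as an array.
    function ExtractAllRegex(input, pattern, groupId) {
      // matchAll() requires the global ("g") flag; without it, it throws a TypeError.
      return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]);
    }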
    

    Or, if you need to return all matches in a single cell joined with some separator:

    // Same idea, but joins all matches into a single string using `separator`.
    function ExtractAllRegex(input, pattern, groupId, separator) {
      return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]).join(separator);
    }
    

    Then, just call it like =ExtractAllRegex(A1, "\d-", 0, ", ").

    Description:

    • input - the text to search (usually a cell reference such as A1)
    • pattern - regex pattern
    • groupId - index of the capturing group to extract (0 = the whole match)
    • separator - text used to join the matched results
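A slightly more defensive variant, sketched in the same Apps Script JavaScript. The name `extractAllRegexSafe` is hypothetical, and the extra handling (coercing the input to text, defaulting `groupId` to 0, substituting an empty string when the requested group did not take part in a match, joining only when a separator is given) is my assumption about what is useful, not part of the answer above. Numeric cells reach a custom function as numbers, which have no `matchAll` method, hence the `String(...)` call.

```javascript
/**
 * Hypothetical defensive variant of ExtractAllRegex.
 * Returns all matches of `pattern` in `input` (capturing group `groupId`);
 * joins them with `separator` when one is given, otherwise returns an array.
 */
function extractAllRegexSafe(input, pattern, groupId = 0, separator) {
  // Numeric cells arrive as numbers, so coerce to text first (blank is treated as "").
  const text = input === null || input === undefined ? "" : String(input);
  // The "g" flag is required by matchAll().
  const matches = Array.from(
    text.matchAll(new RegExp(pattern, "g")),
    // A group that did not participate in a match is undefined; use "" instead.
    m => (m[groupId] === undefined ? "" : m[groupId])
  );
  // No separator: return the array (Sheets spills it into several cells).
  return separator === undefined ? matches : matches.join(separator);
}
```

In a sheet it would be called as `=extractAllRegexSafe(A1, "\d-")` or `=extractAllRegexSafe(A1, "\d-", 0, ", ")`. Note that the backslash needs doubling only inside JavaScript string literals, not in a Sheets formula.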
  • 2020-12-01 17:36

    Edit

    I came up with a more general solution:

    =regexreplace(A1,"(.)?(\d-)|(.)","$2")

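    It replaces every match with the contents of the second group ($2). Where (\d-) matched, the digit-hyphen pair is kept; everywhere else the second group is empty, so all other text is removed.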

    "(.)?(\d-)|(.)"
      1    2    3  
      Groups are in ()
      ---------------------------------------
     "$2" -- means return the group number 2
    

    Learn regular expressions: https://regexone.com
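To see the `(.)?(\d-)|(.)` substitution from the edit above at work outside Sheets, here is a plain-JavaScript mirror of it. It assumes JavaScript's regex engine behaves like RE2 (the engine Sheets uses) for this pattern: both try the alternatives left to right at each position, and in both a group that did not participate expands to an empty string in the replacement. The function name is illustrative only.

```javascript
// Plain-JavaScript mirror of =REGEXREPLACE(A1, "(.)?(\d-)|(.)", "$2").
function keepDigitHyphenPairs(text) {
  // Group 2 only participates when "\d-" matched; otherwise "$2" expands to "",
  // so every character outside a digit-hyphen pair is deleted.
  return text.replace(/(.)?(\d-)|(.)/g, "$2");
}

console.log(keepDigitHyphenPairs("A1-Nutrition;A2-ActPhysiq")); // "1-2-"
```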


    Try this formula:

    =regexreplace(regexreplace(A1,"[^\-0-9]",""),"(\d-)|(.)","$1")

    It handles strings like this:

    "A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"

    with output:

    1-2-2-2-3-
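The same two passes in plain JavaScript, for checking the example above by hand (the function name is illustrative). One side effect of the first pass is worth knowing: because non-digits are deleted before pairs are looked for, a digit that was originally separated from a hyphen, as in `1x-`, becomes adjacent to it and is kept.

```javascript
// Mirror of =REGEXREPLACE(REGEXREPLACE(A1, "[^\-0-9]", ""), "(\d-)|(.)", "$1").
function digitsBeforeHyphens(text) {
  // Pass 1: drop everything that is not a digit or a hyphen.
  const digitsAndHyphens = text.replace(/[^\-0-9]/g, "");
  // Pass 2: keep each "\d-" pair (group 1); any other single character becomes "".
  return digitsAndHyphens.replace(/(\d-)|(.)/g, "$1");
}

console.log(digitsBeforeHyphens("A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q"));
// "1-2-2-2-3-"
```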

  • 2020-12-01 17:38

    You can actually do this in a single formula using regexreplace to surround all the values with a capture group instead of replacing the text:

    =join("",REGEXEXTRACT(A1,REGEXREPLACE(A1,"(\d-)","($1)")))
    

    Basically, REGEXREPLACE surrounds every instance of \d- with a capture group, turning the text itself into a pattern. REGEXEXTRACT then matches that pattern against A1 and neatly returns all the captures, one per cell. The outer JOIN packs them back into a single cell; drop it if you want each capture in its own cell.
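A plain-JavaScript sketch of the same trick (the names `escapeRegex` and `extractViaSelfPattern` are hypothetical). Because the cell text itself becomes the pattern, any regex metacharacters in it (for example the `***` in the sample string from the previous answer) would produce an invalid or different pattern. The sketch therefore escapes them first; that step is my addition and is not something the Sheets formula above does.

```javascript
// Hypothetical helper: escape regex metacharacters so the text matches itself literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sketch of the answer's trick: wrap every "\d-" in a capture group, then
// match the resulting pattern against the original text to collect the groups.
function extractViaSelfPattern(text) {
  const pattern = escapeRegex(text).replace(/(\d-)/g, "($1)");
  const match = text.match(new RegExp(pattern));
  // match[0] is the whole text; match[1..] are the wrapped "\d-" pairs.
  return match ? match.slice(1) : [];
}

const sample = "A1-Nutrition;A2-ActPhysiq;A2-BioM---eta;A2-PH3-Généti***566*9q";
console.log(extractViaSelfPattern(sample).join("")); // "1-2-2-2-3-"
```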

  • 2020-12-01 17:39

    This seems to work and I have tried to verify it.

    The logic is:

    (1) Replace a letter followed by a hyphen with nothing

    (2) Replace a digit followed by any character other than a hyphen with nothing (both characters are removed)

    (3) Replace everything which is not a digit or hyphen with nothing (in the formula: letters, é, ; and /)

    =regexreplace(A1,"[a-zA-Z]-|[0-9][^-]|[a-zA-Z;/é]","")
    

    Result

    1-2-2-2-2-2-2-2-2-2-3-3-
    

    Analysis

    I had to step through these procedurally to convince myself that this was correct. According to this reference, when alternatives are separated by the pipe symbol, the regex engine tries them in order, left to right, at each position. The formula above doesn't work properly unless rule (1) comes first: otherwise rule (3) removes the letter in front of a hyphen before rule (1) can come into play, and you get an extra hyphen from "Patho-jour".
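A plain-JavaScript sketch of the ordering effect described above, assuming JavaScript tries the alternatives left to right just as the answer says the Sheets engine does. The first regex is copied from the formula; the second moves rule (3) to the front purely for comparison. Note that rule (2) removes two characters, so a digit that is followed by another digit and then a hyphen is lost as well: `12-` becomes `-`.

```javascript
// The answer's alternation, in its original order:
// (1) letter + hyphen, (2) digit + non-hyphen character, (3) letters, ; / and é.
const answerOrder = /[a-zA-Z]-|[0-9][^-]|[a-zA-Z;\/é]/g;
// Same alternatives with rule (3) moved to the front, to show why the order matters.
const rule3First = /[a-zA-Z;\/é]|[a-zA-Z]-|[0-9][^-]/g;

function strip(text, re) {
  return text.replace(re, "");
}

console.log(strip("Patho-jour", answerOrder)); // ""
console.log(strip("Patho-jour", rule3First));  // "-" (the extra hyphen)
console.log(strip("A1-Nutrition;A2-ActPhysiq", answerOrder)); // "1-2-"
```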

    Here are some examples of how I think it must deal with the text

  • 2020-12-01 17:41

    I wasn't able to get the accepted answer to work for my case. I'd like to do it that way, but needed a quick solution and went with the following:

    Input:

    1111 days, 123 hours 1234 minutes and 121 seconds
    

    Expected output:

    1111 123 1234 121
    

    Formula:

    =split(REGEXREPLACE(C26,"[a-z,]"," ")," ")
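A plain-JavaScript mirror of this formula, in case you want to test the idea outside Sheets (the function name is illustrative). Sheets' SPLIT drops empty pieces by default (its `remove_empty_text` argument defaults to TRUE), which is why the runs of spaces left by REGEXREPLACE do no harm; the sketch imitates that with a filter. Like the formula, it only replaces lowercase letters and commas, and it returns the numbers as strings.

```javascript
// Mirror of =SPLIT(REGEXREPLACE(C26, "[a-z,]", " "), " ").
function numbersFromDuration(text) {
  return text
    .replace(/[a-z,]/g, " ") // lowercase letters and commas become spaces
    .split(" ")
    // SPLIT removes empty pieces by default, so drop them here too.
    .filter(piece => piece !== "");
}

console.log(numbersFromDuration("1111 days, 123 hours 1234 minutes and 121 seconds"));
```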
    