Split a string (cell) in Excel without VBA (e.g. for array formula)

后端 未结 6 525
执笔经年
执笔经年 2020-11-27 21:30

I have data in an Excel spreadsheet with delimited strings. For simplicity, think of something like \"4#8#10\", with an arbitrary number of pieces.

Is

相关标签:
6条回答
  • 2020-11-27 21:34

    To sum the entries 4,8,10 you could use something like:

    =SUMPRODUCT(1*TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99)))
    

    The array that is returned is a text array, so the 1* at the beginning is one way to convert them to numbers

    This part:

    TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),(ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1))-1)*99+((ROW(OFFSET($A$1,,,LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)))=1),99))
    

    returns the array:

    {"4";"8";"10"}
    

    And with 1* before it:

    {4;8;10}
    

    Edit After six years, more succinct formulas are available to create the array:

    With Excel O365 and the SEQUENCE function:

    =1*(TRIM(MID(SUBSTITUTE(A1,"#",REPT(" ",99)),IF(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)=1,1,(SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"#",""))+1)-1)*99),99)))
    

    With Excel 2010+ for Windows (not MAC versions) with the FILTERXML funcdtion:

    =FILTERXML("<t><s>" & SUBSTITUTE(A1,"#","</s><s>") & "</s></t>","//s")
    

    Note that the FILTERXML function extracts the values as numbers, whereas the first formula extracts the numbers as text which must be converted to numbers before being used in a numeric formula.

    0 讨论(0)
  • 2020-11-27 21:34

    With your data in A1, in B1 enter:

    =TRIM(MID(SUBSTITUTE($A1,"#",REPT(" ",999)),COLUMNS($A:A)*999-998,999))
    

    and copy across.

    0 讨论(0)
  • 2020-11-27 21:43

    Regarding Ron Rosenfeld's answer and Austin Wismer's question as to why it wasn't working for him, I spent the good part of an hour breaking it down and trying to figure out what makes it work(I think).

    The OFFSET part of the function is taking the cell with the text that you want to split(my examples):

    "Haz|Tank|Doub"

    and duplicating it into an array column, with the height specified by taking the total number of characters in the cell

    LEN("Haz|Tank|Doub") = 13 characters
    

    and subtracting it taking the total number of characters in that cell, when the delimiter is replaced by empty space ""

    LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) =
    LEN("HazTankDoub") = 11 characters
    

    Which gives the number of delimiters in the cell being referenced:

    LEN("Haz|Tank|Doub") - LEN(SUBSTITUTE("Haz|Tank|Doub","|","")) = 2
    

    Adding 1 to this:

    LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))**+1))** 
    
    13 - 11 + 1 = 3
    

    This gives the number of separate text strings that we want split up. The result is the OFFSET function duplicates the cell by 3(in my example) into separate rows

    "Haz|Tank|Doub" "Haz|Tank|Doub" "Haz|Tank|Doub"

    or

    OFFSET("Haz|Tank|Doub",,,LEN("Haz|Tank|Doub")-LEN(SUBSTITUTE("Haz|Tank|Doub","|",""))+1))
    
    OFFSET("Haz|Tank|Doub",,,3) = {"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}
    

    Which is then fed into the ROW function, which gives the row number for each cell in a range, and this is where it gets tricky.

    So in my case, the Row # of the first cell in a list of cells with text split by delimiters starts at R2, which OFFSET then duplicates 3 rows down, so when I feed this into the ROW function, I get:

    ROW({"Haz|Tank|Doub";"Haz|Tank|Doub";"Haz|Tank|Doub"}) = {2;3;4}
    

    or

    ROW({$R$2:$R$4}) = {2;3;4}
    

    Seems fine, but then this is subtracted by 1

    (ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1)
    
    {2;3;4} - 1 = {1;2;3}
    

    Ok, then that is multiplied by 99

    (ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1))-1) * 99
    
    {1;2;3} * 99 = {99;297;396}
    

    After this, the formula adds it to this

    +((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)
    

    Which is almost the same as the last part, except, it does not subtract 1 AND it checks if any of these results will = 1. This is done because the assumption is that the row number of the cell you're trying to split up into an array is 1. So for now, sticking with my numbers, this is what happens:

    ((ROW(OFFSET($R$2,,,LEN($R$2)-LEN(SUBSTITUTE($R$2,"|",""))+1)))=1)   
    ((ROW(OFFSET($R$2,,,13 - 11 + 1)))=1)
    ((ROW(OFFSET($R$2,,,3)))=1)
    ((ROW({$R$2:$R$4}))=1)
    (({2;3;4})=1)
    (({FALSE;FALSE;FALSE})
    
    {99;297;396} + ({FALSE;FALSE;FALSE})
    {99;297;396} + ({0;0;0})
    {99;297;396}
    

    This is fed into the MID function as the start number to extract the text from, which is the number of characters into the text string to extract the text from. The text the MID function looks through is the cell we want split, however it first substitutes the delimiter with 99 " " empty spaces to separate them out.

    MID(SUBSTITUTE($R$2,"|",REPT(" ",99)) = "Haz                                Tank                           Doub"
    

    not exactly 99 spaces between each above, just wanted to show a representation

    So our start numbers for each of the 3 rows in the array are {99;297;396}

    So the MID function is going to start at 99 characters into the above text string, and extract 99 characters after that for the first row. Then it'll start at 297 characters in, and extract 99 characters after this.

    MID(SUBSTITUTE($R$2,"|",REPT(" ",99)),{99;297;396},99)
    MID("Haz                            Tank                           Doub",{99;297;396},99)
    
    MID("Haz                            Tank                           Doub",{99},99)
    MID("Haz                            **|Tank                           |**Doub",{99},99)
    Tank
    
    MID("Haz                            Tank                           Doub",{297},99)
    MID("Haz                            Tank                           **|Doub               
         |**",{297},99)
    Doub
    
    MID("Haz                            Tank                           Doub",{396},99)
    MID("Haz                            |Tank                          Doub",{396},99)
    ""
    

    On the last one, it starts 396 characters in and goes 99 character past that, which contains nothing, and returns nothing.

    Now if I my cell started on row $R$1, then it would do this:

    ROW({$R$1:$R$3}) = {1;2;3}
    
    (ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1)
    
    {1;2;3} - 1 = {0;1;2}
    
    (ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1))-1) * 99
    
    {0;1;2} * 99 = {0;99;198}
    
    {0;99;198} + ((ROW(OFFSET($R$1,,,LEN($R$1)-LEN(SUBSTITUTE($R$1,"|",""))+1)))=1)   
    {0;99;198} + ((ROW(OFFSET($R$1,,,13 - 11 + 1)))=1)
    {0;99;198} + ((ROW(OFFSET($R$1,,,3)))=1)
    {0;99;198} + ((ROW({$R$1:$R$3}))=1)
    {0;99;198} + (({1;2;3})=1)
    {0;99;198} + (({TRUE;FALSE;FALSE})
    {0;99;198} + {1;0;0}
    {1;99;198}
    

    And this, fed into MID, would correctly start at 1 character in, and extract everything 99 characters to the right of it:

    MID("Haz                            Tank                           Doub",{1;99;198},99)
    {Haz;Tank;Doub}
    

    So all of that said, I substituted 2 things in the code so that it'll do it regardless of where you start in the sheet:

    TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
    (ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1))-(ROW([@Endorsements])))*99+
    ((ROW(OFFSET([@Endorsements],,,LEN([@Endorsements])-LEN(SUBSTITUTE([@Endorsements],"|",""))+1)))=(ROW([@Endorsements]))),99))
    

    Please note that [@Endorsements] is $R$2 for me, it's just the cell is in a named Table, and @ corresponds to the same row in the table, but another named column, so I think $R2 could be substituted and still work

    TRIM(MID(SUBSTITUTE([@Endorsements],"|",REPT(" ",99)),
    (ROW(OFFSET($R2,,,LEN(R$2)-LEN(SUBSTITUTE($R2,"|",""))+1))-(ROW($R2)))*99+
    ((ROW(OFFSET($R2,,,LEN($R2)-LEN(SUBSTITUTE($R2,"|",""))+1)))=(ROW($R2))),99))
    

    Basically instead of subtracting 1, it subtracts the row number of whatever the row of the cell is that you want to split up. Not much of a change at all, so full credit still goes to Ron, but I figured I'd share it since I just figured it out.

    0 讨论(0)
  • 2020-11-27 21:45
    =INDEX(FILTERXML("<t><s>"&SUBSTITUTE(A1,"#","</s><s>")&"</s></t>","//s"),3,1)
    

    so you can refer for each instance without having an array formula.in this one you refer to the 3rd

    0 讨论(0)
  • 2020-11-27 21:52

    Use XML functionality:

    ={SUM(FILTERXML("<t><s>" & SUBSTITUTE(A1, "#", "</s><s>") & "</s></t>", "//s"))}
    
    0 讨论(0)
  • 2020-11-27 21:59

    Just in case someone is looking to do the same but with Google Sheets, here's how I do it

    Let's say A1 contains 4#8#10

    =ARRAYFORMULA(SPLIT(A1, "#"))
    

    That will create an array, and display the array entry in columns.

    If you need the array to be shown in rows, you can use

    =ARRAYFORMULA(TRANSPOSE(SPLIT(A1, "#")))
    

    I wish there was a SPLIT formula in MS Excel..

    0 讨论(0)
提交回复
热议问题