Longest common prefix of two strings in bash

一向 2020-12-03 00:49

I have two strings. For the sake of the example they are set like this:

string1="test toast"
string2="test test"

What I want is to find the longest common prefix of the two strings, which in this example is "test t".

  • 2020-12-03 01:37

    An improved version of the sed example that finds the common prefix of N strings (N >= 0):

    string1="test toast"
    string2="test test"
    string3="teaser"
    { echo "$string1"; echo "$string2"; echo "$string3"; } | sed -e 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'
    

    If the strings are stored in an array, they can be piped to sed with printf:

    strings=("test toast" "test test" "teaser")
    printf "%s\n" "${strings[@]}" | sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}'
    

    You can also use a here-string:

    strings=("test toast" "test test" "teaser")
    oIFS=$IFS
    IFS=$'\n'
    <<<"${strings[*]}" sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}'
    IFS=$oIFS
    # for a local IFS:
    (IFS=$'\n'; sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}' <<<"${strings[*]}")
    

    The here-string (as with all redirections) can go anywhere within a simple command.
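For comparison, here is a minimal pure-bash sketch of the same pairwise reduction the sed script performs: start from the first string and, for each further string, trim the running prefix from the end until that string starts with it. The function name common_prefix is hypothetical, and the sketch assumes bash (it relies on [[ ]] and ${var%?}).

```shell
#!/usr/bin/env bash
# Hypothetical helper: common_prefix prints the longest common prefix of all
# of its arguments (and nothing at all when called with no arguments).
common_prefix() {
  (( $# == 0 )) && return 0
  local prefix=$1 s
  shift
  for s in "$@"; do
    # Trim the last character of prefix until s starts with it.
    # "$prefix" is quoted, so glob characters in it are matched literally.
    while [[ $s != "$prefix"* ]]; do
      prefix=${prefix%?}
    done
  done
  printf '%s\n' "$prefix"
}

strings=("test toast" "test test" "teaser")
common_prefix "${strings[@]}"   # prints: te
```

Quoting "$prefix" inside [[ ]] keeps characters such as * or ? from acting as patterns, and the inner loop always terminates because an empty prefix matches every string.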

  • 2020-12-03 01:38

    A short grep variant (idea borrowed from the sed answer):

    $ echo -e "String1\nString2" | grep -zoP '^(.*)(?=.*?\n\1)'
    String
    

    This assumes the strings contain no newline characters, but it can easily be adapted to use any other delimiter.

    Update 2016-10-24: newer versions of grep may complain "grep: unescaped ^ or $ not supported with -Pz"; in that case, use \A instead of ^:

    $ echo -e "String1\nString2" | grep -zoP '\A(.*)(?=.*?\n\1)'
    String
    
  • 2020-12-03 01:40

    Ok, in bash:

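    #!/bin/bash
    
    s="$1"
    t="$2"
    l=1
    
    # Grow the candidate prefix ${s:0:l} while it is still a prefix of t.
    # Quoting the inner expansion makes glob characters in s match literally;
    # the ${#s} bound stops the loop when s itself is a prefix of t.
    while (( l <= ${#s} )) && [ "${t#"${s:0:l}"}" != "$t" ]
    do
      (( l = l + 1 ))
    done
    (( l = l - 1 ))
    
    echo "${s:0:$l}"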
    

    It's the same algorithm as in other languages, but using only bash built-ins. And, might I say, a bit uglier, too :-)
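A variant of the same loop that sidesteps pattern matching entirely compares the two strings one character at a time with ${s:i:1}. This is a sketch rather than the answer's own code; lcp2 is a hypothetical name, and in a UTF-8 locale bash indexes characters, not bytes.

```shell
#!/usr/bin/env bash
# Hypothetical function lcp2: prints the longest common prefix of $1 and $2
# by comparing characters directly, so no glob patterns are involved.
lcp2() {
  local s=$1 t=$2 i=0
  # Stop at the end of the shorter string or at the first mismatch.
  while (( i < ${#s} && i < ${#t} )) && [[ ${s:i:1} == "${t:i:1}" ]]; do
    i=$((i + 1))
  done
  printf '%s\n' "${s:0:i}"
}

lcp2 "test toast" "test test"   # prints: test t
```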

  • 2020-12-03 01:43

    Without sed, you can use the cmp utility to get the index of the first differing character, with process substitution feeding the two strings to cmp (cmp reports a byte offset, so this assumes single-byte text such as ASCII):

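    string1="test toast"
    string2="test test"
    # cmp prints "... differ: byte N, line M" (some versions say "char"); field 5 is "N,".
    # It prints nothing when the inputs are identical.
    first_diff_char=$(cmp <( printf '%s\n' "$string1" ) <( printf '%s\n' "$string2" ) | cut -d " " -f 5 | tr -d ",")
    if [ -z "$first_diff_char" ]; then echo "$string1"; else echo "${string1:0:$((first_diff_char-1))}"; fi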
    
  • 2020-12-03 01:44

    Another Python-based answer, this one using the os.path module's built-in commonprefix function (which compares character by character, not path component by component):

    #!/bin/bash
    python3 -c $'import sys, os\nlines = sys.stdin.read().splitlines()\nsys.stdout.write(os.path.commonprefix(lines) + "\\n")' < mystream
    

    In long form, that's:

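    import sys
    import os
    # splitlines() drops the line terminators so they never become part of the prefix.
    lines = sys.stdin.read().splitlines()
    sys.stdout.write(os.path.commonprefix(lines) + "\n")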
    

    Note: with this method, the entire stream is loaded into memory as Python string objects before the prefix is computed.


    If avoiding buffering the entire stream in memory is a requirement, we can use the fact that the common-prefix operation is associative and commutative, and fold it over the input one line at a time:

    #!/bin/bash
    python3 -c $'import sys, os\nprefix = None\nfor line in sys.stdin:\n    line = line.rstrip("\\n")\n    prefix = line if prefix is None else os.path.commonprefix([prefix, line])\nsys.stdout.write((prefix or "") + "\\n")' < mystream
    

    Long form

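    import sys
    import os
    prefix = None
    for line in sys.stdin:
        line = line.rstrip("\n")
        # Fold: compare the running prefix with each new line in turn.
        prefix = line if prefix is None else os.path.commonprefix([prefix, line])
    sys.stdout.write((prefix or "") + "\n")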
    

    As written, both methods read text, so Python 3 decodes the input using the locale's encoding. If you run into encoding errors, read from sys.stdin.buffer and write to sys.stdout.buffer instead: these are the underlying binary streams, which do no decoding or encoding (the "\n" literals then have to become b"\n").
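Following that advice, a sketch of the streaming version switched to the binary streams might look like this. It assumes python3 is on PATH, and common_prefix_bytes is a hypothetical name. Lines are compared as raw bytes, so input that is not valid in the locale's encoding does not raise decoding errors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: streaming common-prefix fold over stdin using Python 3's
# binary streams, so bytes pass through without any decoding or encoding.
common_prefix_bytes() {
  python3 -c '
import os
import sys

prefix = None
for line in sys.stdin.buffer:
    line = line.rstrip(b"\n")
    # os.path.commonprefix also works on lists of bytes objects.
    prefix = line if prefix is None else os.path.commonprefix([prefix, line])
sys.stdout.buffer.write((prefix or b"") + b"\n")
'
}

printf '%s\n' "test toast" "test test" "teaser" | common_prefix_bytes   # prints: te
```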

  • 2020-12-03 01:46

    If you can install a Python package, you can use the pythonp utility:

    # install pythonp
    python3 -m pip install pythonp
    
    echo -e "$string1\n$string2" | pythonp 'l1,l2=lines
    res=itertools.takewhile(lambda a: a[0]==a[1], zip(l1,l2)); "".join(r[0] for r in res)'
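If installing pythonp is not an option, the same takewhile/zip idea can be run with plain python3. This is a sketch assuming python3 is on PATH and that neither string contains a newline; lcp_py is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper lcp_py: plain-python3 version of the takewhile/zip idea.
# Assumes python3 is on PATH and that neither argument contains a newline.
lcp_py() {
  printf '%s\n%s\n' "$1" "$2" | python3 -c '
import itertools
import sys

l1, l2 = sys.stdin.read().splitlines()
# Keep character pairs while they match; stop at the first mismatch.
res = itertools.takewhile(lambda a: a[0] == a[1], zip(l1, l2))
print("".join(r[0] for r in res))
'
}

string1="test toast"
string2="test test"
lcp_py "$string1" "$string2"   # prints: test t
```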
    