Longest common prefix of two strings in bash


I have two strings. For the sake of the example they are set like this:

string1="test toast"
string2="test test"

What I want is to find

13 Answers

滥情空心

2020-12-03 01:37

An improved version of the sed example, this finds the common prefix of N strings (N>=0):

string1="test toast"
string2="test test"
string3="teaser"
{ echo "$string1"; echo "$string2"; echo "$string3"; } | sed -e 'N;s/^\(.*\).*\n\1.*$/\1\n\1/;D'

If the strings are stored in an array, they can be piped to sed with printf:

strings=("test toast" "test test" "teaser")
printf "%s\n" "${strings[@]}" | sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}'

You can also use a here-string:

strings=("test toast" "test test" "teaser")
oIFS=$IFS
IFS=$'\n'
<<<"${strings[*]}" sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}'
IFS=$oIFS
# for a local IFS:
(IFS=$'\n'; sed -e '$!{N;s/^\(.*\).*\n\1.*$/\1\n\1/;D;}' <<<"${strings[*]}")

The here-string (as with all redirections) can go anywhere within a simple command.

隐瞒了意图╮

2020-12-03 01:38
A short grep variant (idea borrowed from the sed one):
```
$ echo -e "String1\nString2" | grep -zoP '^(.*)(?=.*?\n\1)'
String
```
This assumes the strings contain no newline characters, but it can easily be adapted to use any other delimiter.

Update (2016-10-24): Modern versions of grep may complain "grep: unescaped ^ or $ not supported with -Pz". In that case, use \A instead of ^:
```
$ echo -e "String1\nString2" | grep -zoP '\A(.*)(?=.*?\n\1)'
String
```
后悔当初

2020-12-03 01:40
Ok, in bash:
```
#!/bin/bash

s="$1"
t="$2"
l=1

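# Grow the candidate prefix one character at a time while it still matches
# the start of $t. The length check stops the loop when $s is itself a
# prefix of $t; quoting the pattern makes glob characters match literally.
while (( l <= ${#s} )) && [ "${t#"${s:0:$l}"}" != "$t" ]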
do
  (( l = l + 1 ))
done
(( l = l - 1 ))

echo "${s:0:$l}"
```
It's the same algorithm as in other languages, but implemented with pure bash functionality. And, might I say, a bit uglier, too :-)
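For more than two strings, the same idea can be folded over all arguments. The following is only a sketch, not part of the answer above: the function name common_prefix is made up for illustration, and instead of growing a length counter it shortens a candidate prefix until it fits each string.

```bash
#!/bin/bash

# common_prefix (hypothetical helper name): print the longest common prefix
# of all arguments, using only bash parameter expansion.
common_prefix() {
  # No arguments: nothing to compare, so print an empty line.
  (( $# )) || { echo; return; }

  local prefix=$1 s
  shift
  for s in "$@"; do
    # Shorten the candidate until the current string starts with it.
    # An empty candidate always matches, so the loop terminates.
    while [[ ${s:0:${#prefix}} != "$prefix" ]]; do
      prefix=${prefix:0:${#prefix}-1}
    done
  done
  printf '%s\n' "$prefix"
}

common_prefix "test toast" "test test"            # prints: test t
common_prefix "test toast" "test test" "teaser"   # prints: te
```

Because the comparison uses a quoted right-hand side inside [[ ]], characters such as * or ? in the strings are compared literally rather than as glob patterns.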
面向向阳花

2020-12-03 01:43
Without sed, using the cmp utility to get the index of the 1st different character, and using process substitution to get the 2 strings to cmp:
```
string1="test toast"
string2="test test"
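# cmp reports e.g. "... differ: byte 7, line 1"; field 5 is "7," so strip the comma.
first_diff_char=$(cmp <( echo "$string1" ) <( echo "$string2" ) | cut -d " " -f 5 | tr -d ",")
# cmp prints nothing for identical input, so fall back to the full length.
first_diff_char=${first_diff_char:-$(( ${#string1} + 1 ))}
echo "${string1:0:$((first_diff_char-1))}"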
```
一个人的身影

2020-12-03 01:44
Another python-based answer, this one based on the os.path module's native commonprefix function
```
#!/bin/bash
cat mystream | python -c $'import sys, os; sys.stdout.write(os.path.commonprefix(sys.stdin.readlines()) + \'\\n\')'
```
In long form, that's
```
import sys
import os
sys.stdout.write(
    # '\n' is a str under both Python 2 and 3, matching what readlines() returns
    os.path.commonprefix(sys.stdin.readlines()) + '\n'
)
```
/!\ Note: the entire text of the stream will be loaded into memory as python string objects before being crunched with this method

If avoiding buffering the entire stream in memory is a requirement, we can rely on the common-prefix operation being commutative and associative, and do the prefix check pairwise: each input line is compared against the running prefix of all lines seen so far.
```
#!/bin/bash
cat mystream | python -c $'import sys\nimport os\nprefix = None\nfor line in sys.stdin:\n\tprefix=os.path.commonprefix([line] + ([prefix] if prefix is not None else []))\nsys.stdout.write(prefix)'
```
Long form
```
import sys
import os
prefix = None
for line in sys.stdin:
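    # Compare against the running prefix once one exists; testing
    # "is not None" keeps an empty prefix from being reset by later lines.
    prefix=os.path.commonprefix(
        [line] + ([prefix] if prefix is not None else [])
    )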
sys.stdout.write(prefix)
```
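Under Python 2, both of these methods operate on raw bytes, so they should be binary-safe: the input and output data do not need to be ASCII or UTF-8 encoded. Under Python 3, sys.stdin and sys.stdout are text streams that decode and encode automatically. If you run into encoding errors there, read from sys.stdin.buffer and write to sys.stdout.buffer instead, which pass bytes through without decoding or encoding them (any literal you concatenate must then be bytes as well, e.g. b'\n').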
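For comparison, the same running-prefix fold can be sketched in bash itself, reading one line at a time so that only the current line and the running prefix are held in memory. This is an illustrative sketch rather than part of the answer above; the function name stream_common_prefix is made up, and unlike the Python versions it works on characters in the current locale and prints the result without the input's trailing newline.

```bash
#!/bin/bash

# stream_common_prefix (hypothetical helper name): read lines from stdin and
# print their longest common prefix, keeping only the running prefix in memory.
stream_common_prefix() {
  local prefix line first=1
  # "|| [[ -n $line ]]" also processes a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if (( first )); then
      prefix=$line
      first=0
      continue
    fi
    # Trim the running prefix until the current line starts with it.
    while [[ ${line:0:${#prefix}} != "$prefix" ]]; do
      prefix=${prefix:0:${#prefix}-1}
    done
    # An empty prefix can never grow again, so stop reading early.
    [[ -z $prefix ]] && break
  done
  printf '%s\n' "$prefix"
}

printf '%s\n' "test toast" "test test" "teaser" | stream_common_prefix   # prints: te
```

Stopping as soon as the prefix becomes empty means a long stream with no common prefix is not read to the end.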

走了就别回头了

2020-12-03 01:46

If installing a Python package is an option, you can use the pythonp utility:

# install pythonp
python -m pip install pythonp

echo -e "$string1\n$string2" | pythonp 'l1,l2=lines
res=itertools.takewhile(lambda a: a[0]==a[1], zip(l1,l2)); "".join(r[0] for r in res)'
