Longest common prefix of two strings in bash

前端未结

关注

 13  2252

I have two strings. For the sake of the example they are set like this:

string1=\"test toast\"
string2=\"test test\"

What I want is to find

相关标签:

13条回答

旧时难觅i

2020-12-03 01:22
In sed, assuming the strings don't contain any newline characters:
```
string1="test toast"
string2="test test"
printf "%s\n%s\n" "$string1" "$string2" | sed -e 'N;s/^$.*$.*\n\1.*$/\1/'
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

时光取名叫无心

2020-12-03 01:23

This can be done entirely inside bash. Although doing string manipulation in a loop in bash is slow, there is a simple algorithm that is logarithmic in the number of shell operations, so pure bash is a viable option even for long strings.

longest_common_prefix () {
  local prefix= n
  ## Truncate the two strings to the minimum of their lengths
  if [[ ${#1} -gt ${#2} ]]; then
    set -- "${1:0:${#2}}" "$2"
  else
    set -- "$1" "${2:0:${#1}}"
  fi
  ## Binary search for the first differing character, accumulating the common prefix
  while [[ ${#1} -gt 1 ]]; do
    n=$(((${#1}+1)/2))
    if [[ ${1:0:$n} == ${2:0:$n} ]]; then
      prefix=$prefix${1:0:$n}
      set -- "${1:$n}" "${2:$n}"
    else
      set -- "${1:0:$n}" "${2:0:$n}"
    fi
  done
  ## Add the one remaining character, if common
  if [[ $1 = $2 ]]; then prefix=$prefix$1; fi
  printf %s "$prefix"
}

The standard toolbox includes cmp to compare binary files. By default, it indicates the byte offset of the first differing bytes. There is a special case when one string is a prefix of the other: cmp produces a different message on STDERR; an easy way to deal with this is to take whichever string is the shortest.

longest_common_prefix () {
  local LC_ALL=C offset prefix
  offset=$(export LC_ALL; cmp <(printf %s "$1") <(printf %s "$2") 2>/dev/null)
  if [[ -n $offset ]]; then
    offset=${offset%,*}; offset=${offset##* }
    prefix=${1:0:$((offset-1))}
  else
    if [[ ${#1} -lt ${#2} ]]; then
      prefix=$1
    else
      prefix=$2
    fi
  fi
  printf %s "$prefix"
}

Note that cmp operates on bytes, but bash's string manipulation operates on characters. This makes a difference in multibyte locales, for examples locales using the UTF-8 character set. The function above prints the longest prefix of a byte string. To handle character strings with this method, we can first convert the strings to a fixed-width encoding. Assuming the locale's character set is a subset of Unicode, UTF-32 fits the bill.

longest_common_prefix () {
  local offset prefix LC_CTYPE="${LC_ALL:=LC_CTYPE}"
  offset=$(unset LC_ALL; LC_MESSAGES=C cmp <(printf %s "$1" | iconv -t UTF-32)
                                           <(printf %s "$2" | iconv -t UTF-32) 2>/dev/null)
  if [[ -n $offset ]]; then
    offset=${offset%,*}; offset=${offset##* }
    prefix=${1:0:$((offset/4-1))}
  else
    if [[ ${#1} -lt ${#2} ]]; then
      prefix=$1
    else
      prefix=$2
    fi
  fi
  printf %s "$prefix"
}

0 讨论(0)

孤街浪徒

2020-12-03 01:26
It's probably simpler in another language. Here's my solution:
```
common_bit=$(perl -le '($s,$t)=@ARGV;for(split//,$s){last unless $t=~/^\Q$z$_/;$z.=$_}print $z' "$string1" "$string2")
```
If this weren't a one-liner, I'd use longer variable names, more whitespace, more braces, etc. I'm also sure there's a faster way, even in perl, but, again, it's a trade-off between speed and space: this uses less space on what is already a long one-liner.
0 讨论(0)
发布评论:

提交评论
- 加载中...
野的像风

2020-12-03 01:28
Yet another variant, using GNU grep:
```
$ string1="test toast"
$ string2="test test"
$ grep -zPo '(.*).*\n\K\1' <<< "$string1"$'\n'"$string2"
test t
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
梦毁少年i

2020-12-03 01:33
Man, this is tough. It's an extremely trivial task, yet I don't know how to do this with the shell :)

here is an ugly solution:
```
echo "$2" | awk 'BEGIN{FS=""} { n=0; while(n<=NF) {if ($n == substr(test,n,1)) {printf("%c",$n);} n++;} print ""}' test="$1"
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

爱一瞬间的悲伤

2020-12-03 01:34

Just yet another way using Bash only.

string1="test toast"
string2="test test"
len=${#string1}

for ((i=0; i<len; i++)); do
   if [[ "${string1:i:1}" == "${string2:i:1}" ]]; then
      continue
   else
      echo "${string1:0:i}"                       
      i=len
   fi
done

0 讨论(0)

1 2 3 下一页