I was wondering how to remove:
I wanted to avoid a time-wasting regexp or an external library.
I chose plain Go instead of regexp, because every language has special characters that are not ASCII.
Go Golang!
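func RemoveDoubleWhiteSpace(str string) string {
	var b strings.Builder
	b.Grow(len(str))
	// Walk the string byte by byte. The byte 0x20 never occurs inside a
	// multi-byte UTF-8 sequence, so non-ASCII characters are copied intact.
	for i := 0; i < len(str); i++ {
		// Skip a space that is directly followed by another space,
		// so only the last space of each run is kept.
		if str[i] == ' ' && i+1 < len(str) && str[i+1] == ' ' {
			continue
		}
		b.WriteByte(str[i])
	}
	return b.String()
}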
And the related test:
func TestRemoveDoubleWhiteSpace(t *testing.T) {
	// Each input contains one doubled space and should shrink to 5 bytes.
	data := []string{`  test`, `test  `, `te  st`}
	for _, item := range data {
		str := RemoveDoubleWhiteSpace(item)
		t.Logf("Data ->|%s|Found: |%s| Len: %d", item, str, len(str))
		if len(str) != 5 {
			t.Fail()
		}
	}
}
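If the input may also contain tabs, newlines, or non-ASCII spaces, the same single-pass idea can be extended with unicode.IsSpace. This is only a sketch under that assumption: collapseWhitespace is a made-up name, and unlike RemoveDoubleWhiteSpace it replaces each run with one ASCII space and also drops leading and trailing runs.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// collapseWhitespace (hypothetical helper) replaces every run of Unicode
// white space with a single ASCII space and drops leading/trailing runs.
func collapseWhitespace(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	pendingSpace := false // a white-space run was seen after some text
	for _, r := range s { // ranging over a string yields whole runes
		if unicode.IsSpace(r) {
			// Only remember the run if something was already written,
			// which is what discards leading white space.
			pendingSpace = b.Len() > 0
			continue
		}
		if pendingSpace {
			b.WriteByte(' ')
			pendingSpace = false
		}
		b.WriteRune(r)
	}
	// A run still pending here is trailing white space and is not written.
	return b.String()
}

func main() {
	fmt.Printf("%q\n", collapseWhitespace("  héllo \t\u00a0 wörld \n"))
}
```

Ranging over the string with `for _, r := range s` decodes whole runes, so multi-byte characters are written back unchanged.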
It seems that you want to use both the \s shorthand character class and the \p{Zs} Unicode property to match Unicode spaces. However, the two steps (trimming the ends and collapsing inner runs) cannot be done with a single regex replacement, because they need different replacement strings, and ReplaceAllStringFunc only passes the whole match string to the callback (I have no idea how to check which group matched).
Thus, I suggest using two regexps:
- ^[\s\p{Zs}]+|[\s\p{Zs}]+$ to match all leading/trailing whitespace
- [\s\p{Zs}]{2,} to match 2 or more whitespace symbols inside a string
Sample code:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	input := "    Text   More here     "
	// Leading or trailing white space, ASCII (\s) or Unicode (\p{Zs}).
	reEdgeSpace := regexp.MustCompile(`^[\s\p{Zs}]+|[\s\p{Zs}]+$`)
	// Runs of two or more white-space characters inside the string.
	reInnerSpace := regexp.MustCompile(`[\s\p{Zs}]{2,}`)
	final := reEdgeSpace.ReplaceAllString(input, "")
	final = reInnerSpace.ReplaceAllString(final, " ")
	fmt.Println(final) // Text More here
}
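The reason both classes appear in the character set is that they cover different characters in Go's regexp syntax: \s is ASCII-only (tab, newline, form feed, carriage return, space), while \p{Zs} is the Unicode "space separator" category. A small sketch to check individual characters, where classify is a hypothetical helper introduced only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	reSpaceClass = regexp.MustCompile(`^\s$`)
	reZs         = regexp.MustCompile(`^\p{Zs}$`)
)

// classify (hypothetical helper) reports whether the single character c
// is matched by \s and whether it is matched by \p{Zs}.
func classify(c string) (bySpaceClass, byZs bool) {
	return reSpaceClass.MatchString(c), reZs.MatchString(c)
}

func main() {
	// ASCII space, tab, newline, no-break space, ideographic space.
	for _, c := range []string{" ", "\t", "\n", "\u00a0", "\u3000"} {
		s, zs := classify(c)
		fmt.Printf("%+q  \\s=%-5v  \\p{Zs}=%v\n", c, s, zs)
	}
}
```

The ASCII space is matched by both, tab and newline only by \s, and the no-break and ideographic spaces only by \p{Zs}.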
You can get quite far just using the strings package, as strings.Fields does most of the work for you:
package main

import (
	"fmt"
	"strings"
)

func standardizeSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	tests := []string{" Hello, World ! ", "Hello,\tWorld ! ", " \t\n\t Hello,\tWorld\n!\n\t"}
	for _, test := range tests {
		fmt.Println(standardizeSpaces(test))
	}
}

// "Hello, World !"
// "Hello, World !"
// "Hello, World !"
Use regexp for this.
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

func main() {
	data := []byte("   Hello,   World  !   ")
	re := regexp.MustCompile(" +")
	replaced := re.ReplaceAll(bytes.TrimSpace(data), []byte(" "))
	fmt.Println(string(replaced))
	// Hello, World !
}
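bytes.TrimSpace already strips leading and trailing Unicode white space, including newlines. To also trim other characters such as null bytes, use the bytes.Trim(s []byte, cutset string) function instead of bytes.TrimSpace and list every character to strip in cutset.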
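As a rough illustration of the bytes.Trim variant, assuming the unwanted edge characters are spaces, tabs, carriage returns, newlines, and null bytes (trimAndCollapse is a hypothetical helper name):

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

var spaceRun = regexp.MustCompile(" +")

// trimAndCollapse (hypothetical helper) strips every character listed in
// cutset from both ends of data, then collapses runs of ASCII spaces.
func trimAndCollapse(data []byte, cutset string) []byte {
	return spaceRun.ReplaceAll(bytes.Trim(data, cutset), []byte(" "))
}

func main() {
	data := []byte("\x00 \n Hello,   World  ! \n\x00")
	fmt.Printf("%q\n", trimAndCollapse(data, " \t\r\n\x00"))
}
```

Note that bytes.Trim only affects the two ends, so a newline or null byte in the middle of the data is kept.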
strings.Fields() splits on any run of white space and ignores leading and trailing white space, so a separate strings.TrimSpace call is not needed:
strings.Join(strings.Fields(s), " ")