Why is rune in golang an alias for int32 and not uint32?

我寻月下人不归

The type rune in Go is defined as an alias for int32 and is equivalent to int32 in all ways. It is used, by conv

5 answers

误落风尘

2021-01-31 15:22
The fact that it's allowed a negative value lets you define your own rune sentinel values.

For example:
```
const EOF rune = -1

func (l *lexer) next() (r rune) {
    if l.pos >= len(l.input) {
        l.width = 0
        return EOF
    }
    r, l.width = utf8.DecodeRuneInString(l.input[l.pos:])
    l.pos += l.width
    return r
}
```
Seen here in a talk by Rob Pike: Lexical Scanning in Go.
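
The snippet above depends on a lexer type that is not shown. As a minimal sketch, assuming a stripped-down lexer struct with only the three fields the method touches (the struct layout and the collect helper are hypothetical, not taken from the talk), the sentinel pattern can be run end to end like this:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// EOF is a sentinel that cannot collide with a real code point,
// because valid code points are never negative.
const EOF rune = -1

// lexer is a hypothetical, stripped-down scanner state for illustration.
type lexer struct {
	input string // text being scanned
	pos   int    // current byte offset into input
	width int    // byte width of the last rune read
}

// next returns the next rune in the input, or EOF when the input is exhausted.
func (l *lexer) next() rune {
	if l.pos >= len(l.input) {
		l.width = 0
		return EOF
	}
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.width = w
	l.pos += w
	return r
}

// collect drains a lexer over input and returns every rune seen before EOF.
func collect(input string) []rune {
	l := &lexer{input: input}
	var out []rune
	for r := l.next(); r != EOF; r = l.next() {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(string(collect("h\u00e9llo"))) // héllo
}
```

With an unsigned rune type, the sentinel would have to be some otherwise valid-looking large value instead of -1.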
清歌不尽

2021-01-31 15:24

I googled and found this:

This has been asked several times. rune occupies 4 bytes and not just one because it is supposed to store unicode codepoints and not just ASCII characters. Like array indices, the datatype is signed so that you can easily detect overflows or other errors while doing arithmetic with those types.
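
One way to read the quoted claim, as a rough sketch (the function names here are made up for illustration): subtracting two runes in a signed type gives a negative result that is easy to test for, while the same subtraction in uint32 silently wraps around to a very large number.

```go
package main

import "fmt"

// digitValue returns the numeric value of an ASCII digit, or -1 if r is not one.
// Because rune is signed, r - '0' goes negative for runes below '0',
// so the error case is visible directly in the difference.
func digitValue(r rune) int {
	d := r - '0'
	if d < 0 || d > 9 {
		return -1
	}
	return int(d)
}

// wrappedDiff performs the same subtraction in uint32: instead of going
// negative, the result wraps around to a huge positive number.
func wrappedDiff(r rune) uint32 {
	return uint32(r) - uint32('0')
}

func main() {
	fmt.Println(digitValue('7'))  // 7
	fmt.Println(digitValue('#'))  // -1 ('#' is 35, '0' is 48)
	fmt.Println(wrappedDiff('#')) // 4294967283
}
```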

后悔当初

2021-01-31 15:36
"Golang, Go : what is rune by the way?" mentioned:

With the recent Unicode 6.3, there are over 110,000 symbols defined. This requires at least 21-bit representation of each code point, so a rune is like int32 and has plenty of bits.

But regarding the overflow or negative value issues, note that the implementation of some of the unicode functions, such as unicode.IsGraphic, does include this comment:

We convert to uint32 to avoid the extra test for negative

Code:
```
const MaxLatin1 = '\u00FF' // maximum Latin-1 value.

// IsGraphic reports whether the rune is defined as a Graphic by Unicode.
// Such characters include letters, marks, numbers, punctuation, symbols, and
// spaces, from categories L, M, N, P, S, Zs.
func IsGraphic(r rune) bool {
    // We convert to uint32 to avoid the extra test for negative,
    // and in the index we convert to uint8 to avoid the range check.
    if uint32(r) <= MaxLatin1 {
        return properties[uint8(r)]&pg != 0
    }
    return In(r, GraphicRanges...)
}
```
That may be because a rune literal is a constant (as mentioned in "Go rune type explanation"): its constant value can be stored in an int32, a uint32, or even a float32, as long as that numeric type can represent it.
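
To make both points concrete, here is a hedged sketch. The names maxLatin1, inLatin1Signed and inLatin1Unsigned are illustrative and not part of the unicode package; the first half mirrors the single-comparison trick from IsGraphic, and the second half shows a rune literal being assigned to several numeric types.

```go
package main

import "fmt"

const maxLatin1 = '\u00FF' // same value as unicode.MaxLatin1

// inLatin1Signed needs two comparisons on the signed value.
func inLatin1Signed(r rune) bool {
	return r >= 0 && r <= maxLatin1
}

// inLatin1Unsigned needs one comparison: a negative rune converts to a
// uint32 above 0x7FFFFFFF, so it fails the upper-bound test on its own.
func inLatin1Unsigned(r rune) bool {
	return uint32(r) <= maxLatin1
}

func main() {
	for _, r := range []rune{'A', '\u00ff', '\u4e16', -1} {
		fmt.Println(r, inLatin1Signed(r), inLatin1Unsigned(r))
	}

	// A rune literal is an untyped constant, so it can initialize
	// any numeric type that is able to represent its value.
	var i int32 = 'A'
	var u uint32 = 'A'
	var f float32 = 'A'
	fmt.Println(i, u, f) // 65 65 65
}
```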
时光取名叫无心

2021-01-31 15:39

It doesn't become negative. The Unicode codespace currently has 1,114,112 code points (U+0000 through U+10FFFF), which is far from 2,147,483,647 (0x7fffffff), even considering all the reserved blocks.

遥遥无期

2021-01-31 15:40
In addition to the answers above, here are my two cents on why Go needed rune.
- Strings in Go are read-only sequences of bytes, and an ASCII character occupies a single byte. This keeps byte-level operations such as indexing and slicing cheap.
- But since we need a way to represent Unicode code points, which do not fit in an 8-bit range, we use rune to represent them.
- Why int32 and not uint32, you may ask? The choice was made deliberately, so that overflows and other arithmetic errors on character values show up as negative numbers.
This article discusses all of this in much more detail.
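
The difference between bytes and runes in a string can be seen in a short sketch (byteAndRuneCounts is a helper name made up for this example):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts returns the length of s in bytes and in code points.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": é takes two bytes in UTF-8

	nBytes, nRunes := byteAndRuneCounts(s)
	fmt.Println(nBytes, nRunes) // 6 5

	// Indexing a string yields a byte, not a character.
	fmt.Printf("%#x\n", s[1]) // 0xc3, the first byte of é

	// Ranging over a string decodes UTF-8 and yields runes with byte offsets.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r) // 0:h 1:é 3:l 4:l 5:o
	}
	fmt.Println()
}
```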