Count bytes in textarea using javascript


难免孤独

I need to count how long the contents of a textarea are in bytes when UTF-8 encoded, using JavaScript. Any idea how I would do this?

Thanks!

10 answers

一生所求

2020-12-02 11:21
[June 2020: The previous answer has been replaced due to it returning incorrect results].

Most modern JS environments (browsers and Node) now support the TextEncoder API, which can be used as follows to count UTF-8 bytes:
```
const textEncoder = new TextEncoder();
textEncoder.encode('⤀⦀⨀').length; // => 9
```
This is not quite as fast as the getUTF8Length() function mentioned in other answers, below, but should suffice for all but the most demanding use cases. Moreover, it has the benefit of leveraging a standard API that is well-tested, well-maintained, and portable.
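Applied to the question, the TextEncoder call could be wrapped as in the sketch below. The helper name `utf8ByteLength` and the element id `someText` are assumptions made for illustration; they are not part of the original answer.

```javascript
// Reuse one encoder; TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes `text` occupies when UTF-8 encoded.
function utf8ByteLength(text) {
    return encoder.encode(text).length;
}

// In a browser, assuming a <textarea id="someText"> exists on the page:
//   const bytes = utf8ByteLength(document.getElementById('someText').value);
console.log(utf8ByteLength('foo€')); // 3 ASCII bytes + 3 bytes for the euro sign
```

Keeping the encoder outside the function avoids creating a new object on every keystroke if the count is updated live.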
生来不讨喜

2020-12-02 11:21
How about simply:
```
unescape(encodeURIComponent(utf8text)).length
```
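The trick is that encodeURIComponent works on characters and emits the string's UTF-8 bytes as %XX escapes, while unescape works on bytes and turns each %XX escape back into a single character, so the resulting length equals the UTF-8 byte count. Note that unescape is deprecated.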
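The intermediate values make the trick easier to follow. The snippet below is a small sketch; the variable and function names (`sample`, `escaped`, `byteString`, `byteLengthViaEscape`) are chosen here for illustration only.

```javascript
var sample = 'a€';

// Step 1: every byte that needs escaping becomes one %XX escape.
var escaped = encodeURIComponent(sample);
console.log(escaped); // a%E2%82%AC

// Step 2: unescape maps each %XX back to a single character,
// so the string now has one character per UTF-8 byte.
var byteString = unescape(escaped);
console.log(byteString.length);

// Hypothetical wrapper around the one-liner from the answer.
function byteLengthViaEscape(text) {
    return unescape(encodeURIComponent(text)).length;
}
```

One caveat: encodeURIComponent throws a URIError when the string contains a lone surrogate, so input that may be malformed needs a try/catch around the call.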

悲&欢浪女

2020-12-02 11:22

I have been asking myself the same thing. This is the best answer I have stumbled upon:

http://www.inter-locale.com/demos/countBytes.html

Here is the code snippet:

    <script type="text/javascript">
    function checkLength() {
        var countMe = document.getElementById("someText").value;
        // encodeURI emits one %XX escape for every UTF-8 byte it escapes
        var escapedStr = encodeURI(countMe);
        var count;
        if (escapedStr.indexOf("%") != -1) {
            count = escapedStr.split("%").length - 1;  // number of escaped bytes
            if (count == 0) count++;  // perverse case; can't happen with real UTF-8
            var tmp = escapedStr.length - (count * 3);  // unescaped one-byte characters
            count = count + tmp;
        } else {
            count = escapedStr.length;
        }
        alert(escapedStr + ": size is " + count);
    }
    </script>

The link contains a live example of it to play with. "encodeURI(STRING)" is the building block here, but also look at encodeURIComponent(STRING) (as already pointed out in the previous answer) to see which one fits your needs.
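The arithmetic in checkLength() can also be separated from the DOM lookup and the alert, which makes it easier to test outside a page. This is a sketch under that assumption; the function name `countBytesViaEncodeURI` is hypothetical.

```javascript
// Each %XX escape is 3 characters long but stands for 1 byte,
// so subtract 2 from the escaped length for every "%".
function countBytesViaEncodeURI(str) {
    var escaped = encodeURI(str);
    var escapes = escaped.split('%').length - 1;
    return escaped.length - escapes * 2;
}

console.log(countBytesViaEncodeURI('foo€'));
console.log(countBytesViaEncodeURI('100%')); // a literal "%" is escaped as %25
```

This gives the same result as the original count + tmp computation, since count + (length - 3 * count) equals length - 2 * count.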

Regards


忘了有多久

2020-12-02 11:24

If you have non-BMP characters in your string, it's a little more complicated...

Because JavaScript strings are UTF-16 encoded and a "character" is a 16-bit (2-byte) code unit, characters outside the BMP (4 bytes in UTF-8) are stored as two code units, so counting per character will not work:

    <script type="text/javascript">
        var nonBmpString = "foo𝌆"; // U+1D306, outside the BMP
        console.log( nonBmpString.length );
        // will output 5
    </script>

The character "𝌆" is 4 bytes long in UTF-8. JavaScript reports it as 2 characters, because in JS a character is a 16-bit block and code points above U+FFFF are stored as a surrogate pair.
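To make the difference between code units, code points, and bytes concrete, the sketch below measures one BMP string and one non-BMP string in all three ways. The helper name `measure` is an assumption for illustration, and TextEncoder is used only as a reference for the byte count.

```javascript
// Report UTF-16 code units, Unicode code points, and UTF-8 bytes for a string.
function measure(str) {
    return {
        codeUnits: str.length,                 // what .length reports
        codePoints: Array.from(str).length,    // Array.from splits by code point
        utf8Bytes: new TextEncoder().encode(str).length
    };
}

console.log(measure('foo€'));             // euro sign: inside the BMP
console.log(measure('foo\uD834\uDF06'));  // U+1D306: a surrogate pair
```

For the BMP string the first two numbers agree; for the non-BMP string the code unit count is one higher than the code point count.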

So to correctly get the byte size of a mixed string, we first have to code our own function, fixedCharCodeAt():

    function fixedCharCodeAt(str, idx) {
        idx = idx || 0;
        var code = str.charCodeAt(idx);
        var hi, low;
        if (0xD800 <= code && code <= 0xDBFF) { // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters)
            hi = code;
            low = str.charCodeAt(idx + 1);
            if (isNaN(low)) {
                throw 'Not a valid character or memory error!';
            }
            return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
        }
        if (0xDC00 <= code && code <= 0xDFFF) { // Low surrogate
            // We return false to allow loops to skip this iteration since should have already handled high surrogate above in the previous iteration
            return false;
            /*hi = str.charCodeAt(idx-1);
            low = code;
            return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;*/
        }
        return code;
    }

Now we can count the bytes...

    function countUtf8(str) {
        var result = 0;
        for (var n = 0; n < str.length; n++) {
            var charCode = fixedCharCodeAt(str, n);
            if (typeof charCode === "number") {
                if (charCode < 128) {
                    result = result + 1;
                } else if (charCode < 2048) {
                    result = result + 2;
                } else if (charCode < 65536) {
                    result = result + 3;
                } else if (charCode < 2097152) {
                    result = result + 4;
                } else if (charCode < 67108864) {
                    result = result + 5;
                } else {
                    result = result + 6;
                }
            }
        }
        return result;
    }
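In ES2015 and later, for...of iterates a string by code point, so the surrogate handling can be left to the language. The following is a minimal sketch of that variant, not the author's code; the name `countUtf8ByCodePoint` is hypothetical. It stops at 4 bytes because Unicode code points end at U+10FFFF.

```javascript
function countUtf8ByCodePoint(str) {
    var bytes = 0;
    // for...of yields one full code point per iteration (surrogate pairs stay together)
    for (var ch of str) {
        var cp = ch.codePointAt(0);
        if (cp < 0x80) {
            bytes += 1;
        } else if (cp < 0x800) {
            bytes += 2;
        } else if (cp < 0x10000) {
            bytes += 3;
        } else {
            bytes += 4;
        }
    }
    return bytes;
}

console.log(countUtf8ByCodePoint('foo\uD834\uDF06')); // 3 ASCII bytes + one 4-byte code point
```

A lone surrogate is counted as 3 bytes here, which matches the length TextEncoder produces when it substitutes U+FFFD.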

By the way, you should not use the encodeURI method, because it's a native browser function ;)

More stuff:

Code on GitHub
More on Mozilla Developer Networks

Cheers

frankneff.ch / @frank_neff


小鲜肉

2020-12-02 11:25
Add a byte-length counting function to String.prototype:
```
String.prototype.Blength = function() {
    var arr = this.match(/[^\x00-\xff]/ig);
    return  arr == null ? this.length : this.length + arr.length;
}
```
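Then you can call .Blength() on a string to get the size. Note that this counts every character above U+00FF as two bytes and everything else as one, so it is an estimate for double-byte encodings rather than a UTF-8 byte count (for example, "€" is counted as 2 although it takes 3 bytes in UTF-8).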

温柔的废话

2020-12-02 11:25

Try the following:

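    function b(c) {
        var n = 0;
        for (var i = 0; i < c.length; i++) {   // declare i and p instead of leaking globals
            var p = c.charCodeAt(i);
            if (p < 128) {
                n++;
            } else if (p < 2048) {
                n += 2;
            } else if (p >= 0xD800 && p <= 0xDBFF &&
                       (c.charCodeAt(i + 1) & 0xFC00) === 0xDC00) {
                // surrogate pair: one code point above U+FFFF, 4 bytes in UTF-8
                n += 4;
                i++;   // skip the low surrogate
            } else {
                n += 3;
            }
        }
        return n;
    }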
