decoding | 易学教程

How can I decode a large, multi-byte string file progressively in Java?

Question: I have a program that may need to process large files, possibly containing multi-byte encodings. My current code has the problem that it builds an in-memory structure holding the entire file, which can cause an out-of-memory error if the file is large: Charset charset = Charset.forName( "UTF-8" ); CharsetDecoder decoder = charset.newDecoder(); FileInputStream fis = new FileInputStream( file ); FileChannel fc = fis.getChannel(); int lenFile = (int)fc.size(); MappedByteBuffer bufferFile
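A minimal sketch of progressive decoding, assuming UTF-8 input and a caller-supplied callback (the class name ChunkedDecoder and its sink parameter are illustrative, not part of any library). Instead of mapping the whole file, it reads fixed-size chunks into a ByteBuffer and feeds them to CharsetDecoder.decode, setting endOfInput only on the final call. ByteBuffer.compact() carries any incomplete multi-byte sequence over to the next read, so a character split across a chunk boundary is still decoded correctly, and a file that ends mid-sequence is reported as malformed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

class ChunkedDecoder {
    // Decodes a UTF-8 file progressively: at most chunkSize bytes and chunkSize chars are buffered.
    static void decode(Path file, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize < 4) throw new IllegalArgumentException("chunkSize must hold a full UTF-8 sequence");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // malformed input is REPORTed
        ByteBuffer in = ByteBuffer.allocate(chunkSize);
        CharBuffer out = CharBuffer.allocate(chunkSize);
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = fc.read(in) == -1;
                in.flip();
                CoderResult cr;
                // Keep decoding while the output buffer fills up.
                while ((cr = decoder.decode(in, out, eof)).isOverflow()) {
                    drain(out, sink);
                }
                if (cr.isError()) cr.throwException(); // throws CharacterCodingException
                drain(out, sink);
                in.compact(); // keep an incomplete multi-byte sequence for the next read
            }
            while (decoder.flush(out).isOverflow()) drain(out, sink);
            drain(out, sink);
        }
    }

    // Hands the decoded characters to the sink and empties the buffer.
    private static void drain(CharBuffer out, Consumer<String> sink) {
        out.flip();
        if (out.hasRemaining()) sink.accept(out.toString());
        out.clear();
    }
}
```

For most code, wrapping the file in a Reader (for example Files.newBufferedReader(path, StandardCharsets.UTF_8)) and reading into a char[] gives the same bounded-memory behavior with less bookkeeping; the explicit loop is mainly useful when you need direct control over the decoder.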

javax.xml.bind's Base64 encoder/decoder eats last two characters of string

Question: I need to convert some strings using Base64 encoding, and was delighted to see that I didn't have to roll my own converter; Java provides one with javax.xml.bind.DatatypeConverter. However, it has some problems. Here is the output from a Jython REPL session: >>> import javax.xml.bind.DatatypeConverter as DC >>> import java.lang.String as String >>> def foo(text): ... return DC.printBase64Binary(DC.parseBase64Binary(String(text))) ... >>> foo("hello") 'hell' >>> foo("This, it's a punctuated
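The REPL output is consistent with how Base64 works: foo() parses its argument as if it were already Base64 data. Every 4 Base64 characters carry 3 bytes, so "hell" decodes to 3 bytes, while the trailing "o" contributes only 6 bits, too few for a byte, and is dropped; re-encoding those 3 bytes yields "hell". To round-trip text, encode first and decode second. A minimal sketch using java.util.Base64 (Java 8+) in place of DatatypeConverter, assuming the text should be converted to bytes as UTF-8 (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64RoundTrip {
    // Text -> UTF-8 bytes -> Base64 string.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Base64 string -> bytes -> text (the inverse of encode).
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }
}
```

Unlike the behavior shown in the REPL, java.util.Base64's basic decoder rejects a string such as "hello" with an IllegalArgumentException instead of silently dropping the dangling character.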

Detect (or best guess of) incoming string encoding in Java

Question: I was wondering whether there are known methods to detect (or make a best guess at) the encoding of a particular string in Java. I know that you always need some additional metadata to tell what the encoding is, and that there are best practices, but in my situation I need to give the best approximation. A solution (or a pointer) for programmatically distinguishing between UTF-8 and UTF-16 is also welcome. Answer 1: The UTF-8 encoding should be easy to verify: UTF-8 strings can be fairly
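One common approach, sketched here under stated assumptions, is to check byte order marks first, then look for the zero bytes that ASCII-heavy UTF-16 text produces, and finally test whether the bytes are well-formed UTF-8 by decoding with CodingErrorAction.REPORT. The guess helper and its thresholds are hypothetical heuristics, not a library API, and a strict UTF-8 check can only rule UTF-8 out, not prove that the sender meant it. Note that this has to work on raw bytes: once the data is a Java String, it has already been decoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {
    // Strict check: true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical heuristic; the thresholds are assumptions, not a standard.
    static String guess(byte[] data) {
        // 1. Byte order marks are the strongest hint.
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) return "UTF-16BE";
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) return "UTF-16LE";
        // 2. ASCII-heavy UTF-16 text has a zero byte in every other position.
        int evenZeros = 0, oddZeros = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                if (i % 2 == 0) evenZeros++; else oddZeros++;
            }
        }
        int pairs = data.length / 2;
        if (pairs > 0 && evenZeros > pairs / 2 && oddZeros == 0) return "UTF-16BE";
        if (pairs > 0 && oddZeros > pairs / 2 && evenZeros == 0) return "UTF-16LE";
        // 3. Otherwise accept UTF-8 only if the bytes are well-formed.
        return isValidUtf8(data) ? "UTF-8" : "unknown";
    }
}
```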

Javascript html decoding

Question: When I receive HTML text via AJAX in an ASP.NET application, it looks like: <span%20style='color:green;font-weight:bold'>%20Text%20Msg</span> How can I decode that text to normal HTML in JavaScript, i.e. <span style='color:green;font-weight:bold'> Text Msg </span>? Thanks! Answer 1: There is a nice function here that does it for you: http://phpjs.org/functions/htmlspecialchars_decode:427 Answer 2: You are probably best served by finding a server-side solution, as already mentioned in the comments, since this
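The sample text is not HTML-escaped at all: %20 is URL percent-encoding of a space, so an HTML-entity decoder such as htmlspecialchars_decode would leave it untouched. In browser JavaScript the built-in for percent-decoding is decodeURIComponent. Keeping to Java for the examples here, the equivalent sketch uses java.net.URLDecoder (the Charset overload needs Java 10+; the class name is illustrative):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PercentDecode {
    // %XX sequences are percent-encoded bytes, interpreted here as UTF-8.
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }
}
```

One difference to keep in mind: URLDecoder implements application/x-www-form-urlencoded decoding, so it also turns + into a space, whereas decodeURIComponent leaves + alone.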

Why does Base64.decode produce the same byte array for different strings?

Question: I'm using URL-safe Base64 encoding to encode my randomly generated byte arrays, but I have a problem with decoding. When I decode two different strings (identical except for the last character), I get the same byte array. For example, both "dGVzdCBzdHJpbmr" and "dGVzdCBzdHJpbmq" decode to the same result: Array(116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106) For encoding/decoding I use java.util.Base64 in this way: // encoding... Base64.getUrlEncoder().withoutPadding()
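The two strings differ only in bits that the decoder discards. Fifteen Base64 characters carry 90 bits, but 11 bytes need only 88, so the last 2 bits of the final character are unused. In the URL-safe alphabet q is 42 (101010) and r is 43 (101011); they differ only in the lowest bit, which is one of those unused bits, and java.util.Base64 does not check that unused bits are zero. The encoder always writes zeros there, so the canonical encoding of these 11 bytes ends in o (40, 101000). If your application needs to reject non-canonical input, one sketch (an assumption about your requirements, not a built-in decoder option; the class name is illustrative) is to re-encode and compare:

```java
import java.util.Base64;

class Base64Canonical {
    static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }

    // True only if s is exactly what the unpadded URL-safe encoder would produce,
    // i.e. the unused trailing bits of the last character are zero.
    static boolean isCanonical(String s) {
        byte[] bytes = decode(s);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes).equals(s);
    }
}
```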

Decoding JSON format in Java

Question: I have JSON-encoded nested arrays that look like this: [ [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332]], [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332],[1234,67788,3450]], [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332],[1234,67788,34534]]] So I have one big array that contains three arrays (sometimes two, sometimes more than three), and in the example above each of those contains several arrays. What is the reverse
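The reverse of encoding is parsing the JSON text back into nested lists. In practice a JSON library would do this; to keep the example self-contained, here is a minimal hand-written recursive-descent parser that handles only nested arrays of integers (NestedArrayParser is hypothetical, not a library class). One detail worth noticing in the sample: 001 is not valid JSON, since RFC 8259 forbids leading zeros, so a strict JSON parser is likely to reject the input; this sketch deliberately accepts it and reads it as 1.

```java
import java.util.ArrayList;
import java.util.List;

class NestedArrayParser {
    // Minimal parser for nested arrays of integers only (no strings, objects, or floats).
    private final String s;
    private int pos;

    private NestedArrayParser(String s) { this.s = s; }

    // Returns a Long or a List<Object> whose elements are Longs or further lists.
    static Object parse(String json) {
        NestedArrayParser p = new NestedArrayParser(json);
        Object v = p.value();
        p.skipWs();
        if (p.pos != json.length()) throw new IllegalArgumentException("trailing data at " + p.pos);
        return v;
    }

    private Object value() {
        skipWs();
        if (pos >= s.length()) throw new IllegalArgumentException("unexpected end of input");
        return s.charAt(pos) == '[' ? array() : number();
    }

    private List<Object> array() {
        List<Object> items = new ArrayList<>();
        pos++; // consume '['
        skipWs();
        if (pos < s.length() && s.charAt(pos) == ']') { pos++; return items; }
        while (true) {
            items.add(value());
            skipWs();
            if (pos >= s.length()) throw new IllegalArgumentException("unclosed array");
            char c = s.charAt(pos++);
            if (c == ']') return items;
            if (c != ',') throw new IllegalArgumentException("expected ',' or ']' at " + (pos - 1));
        }
    }

    private Long number() {
        int start = pos;
        if (pos < s.length() && s.charAt(pos) == '-') pos++;
        while (pos < s.length() && s.charAt(pos) >= '0' && s.charAt(pos) <= '9') pos++;
        // Strict JSON forbids leading zeros such as 001; this sketch accepts them.
        return Long.parseLong(s.substring(start, pos));
    }

    private void skipWs() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }
}
```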

decode-encode UTF-8 doesn't lead to the original unicode

Question: When I try to separate two Unicode characters by decoding and then re-encoding them, I do not get the original characters back but different ones. Here is what I get when I try to do so: >>> s ='\xf0\x9f\x93\xb1\xf0\x9f\x9a\xac' >>> u = s.decode("utf-8") >>> u u'\U0001f4f1\U0001f6ac' >>> u[0].encode("utf-8") '\xed\xa0\xbd' >>> u[1].encode("utf-8") '\xed\xb3\xb1' >>> u[0] u'\ud83d' >>> u[1] u'\udcf1' Answer 1: Your version of Python is using UCS-2 (16 bits per character) but these
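The same UTF-16 effect shows up in Java, whose String is also a sequence of 16-bit code units: each emoji from the question is a surrogate pair, so indexing by char yields halves such as \ud83d, just as in the narrow Python build. A minimal sketch that splits by code point instead (class and method names are illustrative):

```java
class CodePoints {
    // Splits a string into whole code points rather than UTF-16 code units,
    // so a surrogate pair stays together in one element.
    static String[] splitCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .toArray(String[]::new);
    }
}
```

Encoding each element back to UTF-8 then yields the original 4-byte sequences rather than the 3-byte encodings of lone surrogates seen in the question.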

Reading double to platform endianness with union and bit shift, is it safe?

Question: All the examples I've seen of reading a double of known endianness from a buffer into the platform's endianness involve detecting the current platform's endianness and byte-swapping when necessary. On the other hand, I've seen another way of doing the same thing for integers, using bit shifting (one such example). This got me thinking that it might be possible to use a union and the bit-shift technique to read doubles (and floats) from buffers, and a quick test implementation
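Java has no unions, but the shift-based idea carries over directly: assemble the 8 bytes into a long with shifts in the buffer's byte order, which gives the same result on any platform, then reinterpret the bits with Double.longBitsToDouble. This is a sketch of the technique in Java, not an answer to the C aliasing question; the class and method names are illustrative. For comparison, java.nio.ByteBuffer with an explicit ByteOrder does the same job.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianRead {
    // Big-endian: the first byte is the most significant. Shifts work on values,
    // so the platform's own byte order never matters.
    static double readDoubleBE(byte[] buf, int off) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (buf[off + i] & 0xFFL);
        }
        return Double.longBitsToDouble(bits); // Java's analogue of the union reinterpretation
    }

    // Little-endian: the last byte is the most significant.
    static double readDoubleLE(byte[] buf, int off) {
        long bits = 0;
        for (int i = 7; i >= 0; i--) {
            bits = (bits << 8) | (buf[off + i] & 0xFFL);
        }
        return Double.longBitsToDouble(bits);
    }

    // Library equivalent for comparison.
    static double readDouble(byte[] buf, int off, ByteOrder order) {
        return ByteBuffer.wrap(buf, off, 8).order(order).getDouble();
    }
}
```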

How to solve this weird python encoding issue?

Question: I'm doing some NLP tasks on a corpus of strings from the web, and as you would expect, there are encoding issues. Here are a few examples: they don’t serve sushi : the apostrophe in don't is not the standard ' but \xe2\x80\x99 Delicious food – Wow : the dash before Wow is \xe2\x80\x93 So now I'm going to read such lines, pass them to NLTK for parsing, and use the parse information to train a CRF model through MALLET. Let's begin with the solution I've been seeing everywhere on Stack Overflow. Here are a few
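The bytes \xe2\x80\x99 and \xe2\x80\x93 are the UTF-8 encodings of U+2019 (RIGHT SINGLE QUOTATION MARK) and U+2013 (EN DASH), so the first step is to decode the input as UTF-8 rather than as a single-byte encoding. Whether to then map such characters to ASCII depends on the downstream tools; the sketch below does so with a small, hypothetical replacement table (not a standard mapping), shown in Java to match the other examples here:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

class PunctuationNormalizer {
    // Hypothetical table of typographic characters and their ASCII look-alikes.
    private static final Map<Character, Character> ASCII = Map.of(
            '\u2018', '\'', '\u2019', '\'',
            '\u201C', '"', '\u201D', '"',
            '\u2013', '-', '\u2014', '-');

    // Decode the raw bytes as UTF-8 first, then replace the mapped characters.
    static String toAscii(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(ASCII.getOrDefault(c, c));
        }
        return sb.toString();
    }
}
```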

How to decode nvarchar to text (SQL Server 2008 R2)?

Question: I have a SQL Server 2008 R2 table with an nvarchar(4000) field. The data stored in this table looks like '696D616765206D61726B65643A5472' or '303131' ("011"). I can see that each character is encoded as hex. How can I read this data from the table? I don't want to write a decoding function; I think a simpler way exists. P.S. Sorry for my English. Answer 1: SQL Server 2008 actually has a built-in hex encoding and decoding feature! Sample (note the third parameter with value "1" when converting your string to
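To see what the stored values mean: each pair of hex digits is one byte, and the bytes here are ASCII text ("303131" is 0x30 0x31 0x31, i.e. "011"). The answer points to SQL Server's built-in conversion; purely as a client-side illustration, a minimal Java sketch, assuming the bytes are ASCII/UTF-8 (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

class HexText {
    // Decodes a string such as "303131" (two hex digits per byte) into text.
    // Assumes the bytes are ASCII/UTF-8; that is a guess about how the column was filled.
    static String decodeHex(String hex) {
        if (hex.length() % 2 != 0) throw new IllegalArgumentException("odd number of hex digits");
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hex digit near index " + 2 * i);
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```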