decoding

How can I decode a large, multi-byte string file progressively in Java?

痞子三分冷 submitted on 2019-12-10 17:56:38
Question: I have a program that may need to process large files that can contain multi-byte encodings. The problem with my current code is that it creates a memory structure holding the entire file, which can cause an out-of-memory error if the file is large:
    Charset charset = Charset.forName( "UTF-8" );
    CharsetDecoder decoder = charset.newDecoder();
    FileInputStream fis = new FileInputStream( file );
    FileChannel fc = fis.getChannel();
    int lenFile = (int)fc.size();
    MappedByteBuffer bufferFile
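A minimal sketch of progressive decoding, assuming the data can be read as a plain `InputStream` (for a file, `Files.newInputStream(path)`) and that the encoding is UTF-8. Instead of mapping the whole file, it reads fixed-size byte chunks and feeds them to a `CharsetDecoder` with `endOfInput=false`. A multi-byte sequence split across two chunks stays in the byte buffer after `compact()` and is completed by the next read. The class name `ChunkedDecoder` and the `Consumer<String>` sink are illustrative choices, not part of the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

final class ChunkedDecoder {
    /** Decodes UTF-8 from 'in' chunk by chunk; memory use is bounded by chunkSize, not input size. */
    static void decode(InputStream in, int chunkSize, Consumer<String> sink) throws IOException {
        if (chunkSize < 4) {
            throw new IllegalArgumentException("chunkSize must hold at least one 4-byte UTF-8 sequence");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // reports malformed input by default
        ByteBuffer bytes = ByteBuffer.allocate(chunkSize);
        CharBuffer chars = CharBuffer.allocate(chunkSize);
        int n;
        while ((n = in.read(bytes.array(), bytes.position(), bytes.remaining())) != -1) {
            bytes.position(bytes.position() + n);
            bytes.flip();                             // switch the buffer to reading
            feed(decoder, bytes, chars, false, sink); // an incomplete trailing sequence stays in 'bytes'
            bytes.compact();                          // move leftovers to the front, back to writing
        }
        bytes.flip();
        feed(decoder, bytes, chars, true, sink);      // leftovers now count as malformed input
        while (decoder.flush(chars).isOverflow()) {
            emit(chars, sink);
        }
        emit(chars, sink);
    }

    private static void feed(CharsetDecoder decoder, ByteBuffer bytes, CharBuffer chars,
                             boolean endOfInput, Consumer<String> sink) throws CharacterCodingException {
        while (true) {
            CoderResult result = decoder.decode(bytes, chars, endOfInput);
            if (result.isError()) {
                result.throwException();
            }
            emit(chars, sink);
            if (result.isUnderflow()) {
                return; // all complete sequences consumed
            }
            // OVERFLOW: 'chars' was full; it has just been emptied, so decode again
        }
    }

    private static void emit(CharBuffer chars, Consumer<String> sink) {
        chars.flip();
        if (chars.hasRemaining()) {
            sink.accept(chars.toString());
        }
        chars.clear();
    }
}
```

In most programs, `new InputStreamReader(in, StandardCharsets.UTF_8)` read through a `char[]` buffer gives the same carry-over behavior with less code; the explicit loop mainly shows what happens underneath. Separately, the `(int)fc.size()` cast in the question wraps around for files larger than `Integer.MAX_VALUE` bytes, and a single `FileChannel.map` call cannot map more than that either.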

javax.xml.bind's Base64 encoder/decoder eats last two characters of string

泄露秘密 submitted on 2019-12-10 17:48:15
Question: I need to convert some strings using Base64 encoding, and was delighted to see that I didn't have to roll my own converter: Java provides one in javax.xml.bind.DatatypeConverter. However, it has some problems. Here's the output from a Jython REPL session:
    >>> import javax.xml.bind.DatatypeConverter as DC
    >>> import java.lang.String as String
    >>> def foo(text):
    ...     return DC.printBase64Binary(DC.parseBase64Binary(String(text)))
    ...
    >>> foo("hello")
    'hell'
    >>> foo("This, it's a punctuated
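The REPL output is consistent with `parseBase64Binary` treating `"hello"` as text that is already Base64: the first four characters `hell` decode to three bytes, the lone trailing `o` cannot form a byte and is dropped, and re-encoding those three bytes prints `hell` again. The conversion the question seems to want runs the other way (text to bytes to Base64, and back). A minimal sketch with `java.util.Base64` (Java 8+; `javax.xml.bind` was removed from the JDK in Java 11), assuming UTF-8 for the text-to-bytes step; the class name is illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64RoundTrip {
    /** Encodes text: characters -> UTF-8 bytes -> Base64 text. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes Base64 text: Base64 text -> bytes -> characters (assumes the bytes are UTF-8). */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

`encode("hello")` gives `aGVsbG8=`, and decoding that returns `hello`. Unlike the lenient behavior seen in the REPL, `Base64.getDecoder().decode("hello")` throws `IllegalArgumentException`, which makes this kind of mix-up visible.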

Detect (or best guess of) incoming string encoding in Java

不羁岁月 submitted on 2019-12-10 17:14:47
Question: I was wondering whether there are known methods to detect (or make a best guess at) the encoding of a particular string in Java. I know that you always need some additional metadata to tell what the encoding is, and that there are best practices, but in my situation I need to give the best approximation. A solution, or a pointer, for programmatically distinguishing between UTF-8 and UTF-16 is also welcome.
Answer 1: The UTF-8 encoding should be easy to verify: UTF-8 strings can be fairly
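A Java `String` is already decoded (it holds UTF-16 internally), so detection has to happen on the raw bytes before they become a `String`. The heuristic sketch below is one possible approach, not a standard algorithm: it checks for a byte-order mark, then for the zero-byte pattern that mostly-ASCII UTF-16 text produces, then for strict UTF-8 validity. The class name and the "more than half" threshold are assumptions. Pure ASCII input is reported as UTF-8, which is correct but not conclusive.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncodingGuesser {
    /** Returns "UTF-8", "UTF-16BE", "UTF-16LE", or "unknown". A best guess, not a guarantee. */
    static String guess(byte[] data) {
        // 1. A byte-order mark is the strongest hint.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. Mostly-ASCII UTF-16 has a zero in every other byte: odd positions for LE, even for BE.
        //    Checked before step 3 because zero bytes are also valid UTF-8.
        if (data.length >= 2 && data.length % 2 == 0) {
            int evenZeros = 0;
            int oddZeros = 0;
            for (int i = 0; i < data.length; i++) {
                if (data[i] == 0) {
                    if (i % 2 == 0) evenZeros++; else oddZeros++;
                }
            }
            int units = data.length / 2;
            if (oddZeros > units / 2 && oddZeros > evenZeros) return "UTF-16LE";
            if (evenZeros > units / 2 && evenZeros > oddZeros) return "UTF-16BE";
        }

        // 3. UTF-8 has a strict structure; most non-UTF-8 byte sequences fail validation.
        return isValidUtf8(data) ? "UTF-8" : "unknown";
    }

    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```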

JavaScript HTML decoding

。_饼干妹妹 submitted on 2019-12-10 16:54:52
Question: When I receive HTML text via AJAX in an ASP.NET application, it looks like this:
    <span%20style='color:green;font-weight:bold'>%20Text%20Msg</span>
How can I decode that text to normal HTML in JavaScript?
    <span style='color:green;font-weight:bold'> Text Msg </span>
Thanks!
Answer 1: There's a nice function here that does it for you: http://phpjs.org/functions/htmlspecialchars_decode:427
Answer 2: You are probably best served by finding a server-side solution, as already mentioned in the comments, since this
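The `%20` sequences are percent-encoding (URL encoding) of a space, not HTML entities, so an HTML-entity decoder such as `htmlspecialchars_decode` would leave them alone; in the browser, JavaScript's `decodeURIComponent` reverses percent-encoding. For the server-side route suggested in Answer 2, the Java counterpart is `URLDecoder`. A minimal sketch, assuming the text is UTF-8 and contains no literal `+` signs (which `URLDecoder` turns into spaces); the class name is hypothetical:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class PercentDecoding {
    /** Replaces %XX escapes with the characters they encode (UTF-8). Note: '+' also becomes a space. */
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8); // Charset overload: Java 10+
    }
}
```

Applied to the markup from the question, this yields `<span style='color:green;font-weight:bold'> Text Msg</span>`; there is no `%20` before `</span>` in the input, so no space appears there.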

Why does Base64.decode produce the same byte array for different strings?

别说谁变了你拦得住时间么 submitted on 2019-12-10 16:13:17
Question: I'm using URL-safe Base64 encoding to encode my randomly generated byte arrays, but I have a problem with decoding. When I decode two different strings that are identical except for the last character, I get the same byte array. For example, both "dGVzdCBzdHJpbmr" and "dGVzdCBzdHJpbmq" decode to:
    Array(116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106)
For encoding/decoding I use java.util.Base64 like this:
    // encoding...
    Base64.getUrlEncoder().withoutPadding()
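Fifteen Base64 characters carry 15 × 6 = 90 bits, but 11 bytes need only 88, so the lowest 2 bits of the last character are padding bits that the decoder discards. `q` (value 42, binary 101010) and `r` (43, 101011) differ only in those bits, so both strings decode to the same bytes; the encoder always writes those bits as zero, which here gives `o` (40, 101000). If distinct strings must map to distinct arrays, one option is to reject any input whose re-encoding differs from it. A sketch with an illustrative class name, assuming unpadded URL-safe input as in the question:

```java
import java.util.Base64;

final class Base64Canonical {
    /** Decodes URL-safe Base64; '=' padding is optional for this decoder. */
    static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }

    /**
     * True if s is exactly what the unpadded URL-safe encoder would produce for its bytes,
     * i.e. its discarded low bits are zero. Throws IllegalArgumentException if s is not Base64.
     */
    static boolean isCanonical(String s) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(decode(s)).equals(s);
    }
}
```

With this check, `dGVzdCBzdHJpbmo` is accepted while `dGVzdCBzdHJpbmq` and `dGVzdCBzdHJpbmr` are flagged as non-canonical, even though all three decode to the same 11 bytes.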

Decoding JSON format in Java

耗尽温柔 submitted on 2019-12-10 15:38:37
Question: I have JSON-encoded nested arrays that look like this:
    [
      [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332]],
      [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332],[1234,67788,3450]],
      [[1234,245,10],[312,234,122],[1234,67788,345],[235,001,332],[1234,67788,34534]]
    ]
So I have one big array that contains three arrays (sometimes two, sometimes more than three), and in the example above each of these arrays in turn contains some arrays. What is the reverse
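Decoding this is normally a job for a JSON library such as Jackson or Gson. Note that `001` is not valid JSON (RFC 8259 forbids leading zeros), so a strict parser may reject the sample as written. To keep the example dependency-free, here is a minimal recursive-descent sketch that handles only nested arrays of integers, accepts `001` leniently as 1, and returns nested `List`s with `Long` leaves. The class name and the lenient choice are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal parser for nested JSON arrays of integers. Not a general JSON parser. */
final class NestedIntArrayParser {
    private final String text;
    private int pos;

    private NestedIntArrayParser(String text) {
        this.text = text;
    }

    /** Parses e.g. "[[[1,2],[3]],[[4]]]" into nested Lists whose leaves are Longs. */
    static Object parse(String json) {
        NestedIntArrayParser p = new NestedIntArrayParser(json);
        Object value = p.value();
        p.skipWhitespace();
        if (p.pos != json.length()) {
            throw p.error("trailing characters");
        }
        return value;
    }

    private Object value() {
        skipWhitespace();
        if (pos < text.length() && text.charAt(pos) == '[') {
            return array();
        }
        return number();
    }

    private List<Object> array() {
        pos++; // consume '['
        List<Object> items = new ArrayList<>();
        skipWhitespace();
        if (pos < text.length() && text.charAt(pos) == ']') {
            pos++;
            return items; // empty array
        }
        while (true) {
            items.add(value());
            skipWhitespace();
            if (pos >= text.length()) {
                throw error("unterminated array");
            }
            char c = text.charAt(pos++);
            if (c == ']') {
                return items;
            }
            if (c != ',') {
                throw error("expected ',' or ']'");
            }
        }
    }

    private Long number() {
        int start = pos;
        if (pos < text.length() && text.charAt(pos) == '-') {
            pos++;
        }
        int digitsStart = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == digitsStart) {
            throw error("expected a number");
        }
        // Lenient: "001" is read as 1 even though strict JSON forbids leading zeros.
        return Long.parseLong(text.substring(start, pos));
    }

    private void skipWhitespace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at index " + pos);
    }
}
```

For the sample data, the outer list has three elements, holding four, five and five inner arrays respectively, and the nesting depth does not need to be known in advance.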

Decoding and re-encoding UTF-8 doesn't give back the original Unicode

ε祈祈猫儿з submitted on 2019-12-10 11:07:51
Question: When I try to separate two Unicode characters by decoding and then re-encoding them, I don't get the same Unicode back; I get a different one. Here is what happens when I try:
    >>> s ='\xf0\x9f\x93\xb1\xf0\x9f\x9a\xac'
    >>> u = s.decode("utf-8")
    >>> u
    u'\U0001f4f1\U0001f6ac'
    >>> u[0].encode("utf-8")
    '\xed\xa0\xbd'
    >>> u[1].encode("utf-8")
    '\xed\xb3\xb1'
    >>> u[0]
    u'\ud83d'
    >>> u[1]
    u'\udcf1'
Answer 1: Your version of Python is using UCS-2 (16 bits per character) but these
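Java has the same underlying issue as the narrow Python build: a `String` is a sequence of UTF-16 code units, so each of these two emoji occupies two `char`s, and `charAt(0)` returns a lone high surrogate (`\uD83D`), just like `u[0]` above. Splitting by code point instead of by code unit keeps each surrogate pair together. A minimal Java sketch (class name illustrative):

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointSplitter {
    /** Splits text into one String per Unicode code point, never separating a surrogate pair. */
    static List<String> split(String text) {
        List<String> parts = new ArrayList<>();
        text.codePoints().forEach(cp -> parts.add(new String(Character.toChars(cp))));
        return parts;
    }
}
```

Decoding the question's bytes `F0 9F 93 B1 F0 9F 9A AC` with `new String(bytes, StandardCharsets.UTF_8)` gives a string whose `length()` is 4 but whose `codePointCount` is 2; `split` returns the two emoji, and each re-encodes to its original four UTF-8 bytes.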

Reading double to platform endianness with union and bit shift, is it safe?

末鹿安然 submitted on 2019-12-10 10:37:44
Question: All the examples I've seen of reading a double of known endianness from a buffer into the platform's endianness involve detecting the current platform's endianness and byte-swapping when necessary. On the other hand, I've seen another way of doing the same thing for integers that uses bit shifting (one such example). This got me thinking that it might be possible to use a union and the bit-shift technique to read doubles (and floats) from buffers, and a quick test implementation
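Java has no unions, but the bit-shift idea carries over directly: assemble the eight bytes into a `long` with shifts (shifts operate on values, so the result is the same on any host byte order), then reinterpret the bits with `Double.longBitsToDouble`. This relies on the bytes holding an IEEE 754 binary64 value, which Java guarantees for `double` and which the union approach in the question likewise has to assume for the platform's `double`. A sketch with hypothetical method names, plus the standard-library `ByteBuffer` equivalent for comparison:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DoubleReader {
    /** Reads 8 bytes at offset, most significant byte first (big-endian). */
    static double readDoubleBE(byte[] buf, int offset) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            bits = (bits << 8) | (buf[offset + i] & 0xFFL); // mask avoids sign extension
        }
        return Double.longBitsToDouble(bits);
    }

    /** Reads 8 bytes at offset, least significant byte first (little-endian). */
    static double readDoubleLE(byte[] buf, int offset) {
        long bits = 0;
        for (int i = 7; i >= 0; i--) {
            bits = (bits << 8) | (buf[offset + i] & 0xFFL);
        }
        return Double.longBitsToDouble(bits);
    }

    /** Same result via the standard library, for comparison. */
    static double readDoubleWithBuffer(byte[] buf, int offset, ByteOrder order) {
        return ByteBuffer.wrap(buf).order(order).getDouble(offset);
    }
}
```

For example, the big-endian bytes `3F F0 00 00 00 00 00 00` read as `1.0`, and the same bytes reversed read as `1.0` through `readDoubleLE`.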

How to solve this weird Python encoding issue?

Deadly submitted on 2019-12-10 10:32:09
Question: I'm doing an NLP task on a corpus of strings from the web, and as you'd expect, there are encoding issues. Here are a few examples:
    "they don’t serve sushi": the apostrophe in don't is not the standard ' but \xe2\x80\x99
    "Delicious food – Wow": the dash before Wow is \xe2\x80\x93
So now I'm going to read such lines, pass them to NLTK for parsing, and use the parse information to train a CRF model through MALLET. Let's begin with the solution I've been seeing everywhere on Stack Overflow. Here are a few
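`\xe2\x80\x99` and `\xe2\x80\x93` are the UTF-8 encodings of U+2019 RIGHT SINGLE QUOTATION MARK and U+2013 EN DASH, so the text itself is well-formed; problems usually come from treating the bytes as something other than UTF-8 somewhere along the pipeline. The question uses Python, but the same two-step idea in Java looks like this: decode the raw bytes as UTF-8 once, at the input boundary, and then, only if downstream tools need ASCII, map typographic punctuation to look-alikes. The class name and the particular mapping are assumptions, and the mapping is lossy:

```java
import java.nio.charset.StandardCharsets;

final class TypographicNormalizer {
    /** Decodes raw bytes as UTF-8, the encoding the byte sequences in the question come from. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Optionally maps common typographic punctuation to ASCII look-alikes (lossy). */
    static String toAsciiPunctuation(String text) {
        return text
                .replace('\u2018', '\'') // LEFT SINGLE QUOTATION MARK
                .replace('\u2019', '\'') // RIGHT SINGLE QUOTATION MARK (bytes E2 80 99)
                .replace('\u201C', '"')  // LEFT DOUBLE QUOTATION MARK
                .replace('\u201D', '"')  // RIGHT DOUBLE QUOTATION MARK
                .replace('\u2013', '-')  // EN DASH (bytes E2 80 93)
                .replace('\u2014', '-'); // EM DASH
    }
}
```

Keeping the decoded Unicode text and normalizing only where a tool requires it avoids mixing encodings between the parsing and training steps.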

How to decode nvarchar to text (SQL Server 2008 R2)?

我是研究僧i submitted on 2019-12-10 04:07:54
Question: I have a SQL Server 2008 R2 table with an nvarchar(4000) field. The data stored in this table looks like '696D616765206D61726B65643A5472' or '303131' ("011"). I can see that each character is encoded as hex. How can I read this data from the table? I don't want to write a decoding function; I suspect a simpler way exists.
Answer 1: SQL Server 2008 actually has a built-in hex encoding and decoding feature! Sample (note the third parameter with value "1" when converting your string to
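Each pair of hex digits is one byte of the original text (`30 31 31` are the ASCII codes for `0`, `1`, `1`), so decoding means hex to bytes, then bytes to characters. Answer 1 points to SQL Server's built-in conversion, which avoids custom code; if the column is instead decoded after reading it from Java, a minimal sketch looks like this. The class name is hypothetical, and the caller must supply the charset the hex was produced from (ASCII-compatible for the samples shown):

```java
import java.nio.charset.Charset;

final class HexText {
    /** Decodes a string of hex digit pairs (e.g. "303131") into bytes, then into text. */
    static String decode(String hex, Charset charset) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int high = Character.digit(hex.charAt(2 * i), 16);
            int low = Character.digit(hex.charAt(2 * i + 1), 16);
            if (high < 0 || low < 0) {
                throw new IllegalArgumentException("invalid hex digit near index " + 2 * i);
            }
            bytes[i] = (byte) ((high << 4) | low);
        }
        return new String(bytes, charset);
    }
}
```

For the samples in the question, this gives `image marked:Tr` and `011`.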