utf-8 | 易学教程

convert ucs-2 to utf-8 in visual basic 2010

Question: Hello, I'm using Visual Basic 2010 and a USB modem to send USSD AT commands ("AT+CUSD=1") through a SerialPort. My problem is that the result I receive comes back in UCS-2, like this: +CUSD: 0,"00430075007200720065006E007400540069006D0065002000690073003A002000320031002D004A0055004C002D0032003000310038002000310036003A00320036",72. How can I convert it to UTF-8?

Answer 1: It looks like that string, because of its composition, is in BigEndianUnicode format. This encoding format is available from .NET FW 3.5+ / VS 2008. The .NET version in
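
A minimal sketch of the decoding the answer describes, written in Python rather than VB because the steps are the same: pull the quoted hex payload out of the +CUSD line, turn every two hex digits into one byte, decode the bytes as big-endian UTF-16 (UCS-2 is its BMP subset), and encode the result as UTF-8 if bytes are needed. The regex and the name `decode_cusd` are illustrative assumptions, and the trailing 72 (the data coding scheme) is not inspected; in .NET the corresponding calls would be `Encoding.BigEndianUnicode.GetString(...)` followed by `Encoding.UTF8.GetBytes(...)`.

```python
import re

# Hypothetical pattern for lines like: +CUSD: 0,"0043...0036",72
# group 1 = hex payload, group 2 = data coding scheme (not interpreted here).
_CUSD_RE = re.compile(r'\+CUSD:\s*\d+\s*,\s*"([0-9A-Fa-f]+)"\s*,\s*(\d+)')


def decode_cusd(line: str) -> str:
    """Return the text of a +CUSD reply, assuming a UCS-2 (big-endian) hex payload."""
    match = _CUSD_RE.search(line)
    if match is None:
        raise ValueError("not a +CUSD response with a hex payload")
    payload = match.group(1)
    if len(payload) % 4 != 0:
        raise ValueError("a UCS-2 payload needs 4 hex digits per character")
    raw = bytes.fromhex(payload)      # "0043" -> b"\x00\x43"
    return raw.decode("utf-16-be")    # big-endian, no byte-order mark


reply = ('+CUSD: 0,"00430075007200720065006E007400540069006D0065002000690073003A0020'
         '00320031002D004A0055004C002D0032003000310038002000310036003A00320036",72')
text = decode_cusd(reply)
print(text)                       # CurrentTime is: 21-JUL-2018 16:26
utf8_bytes = text.encode("utf-8")  # the UTF-8 form, if bytes are what you need
```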

Beautiful Soup default decode charset?

Question: I have a huge set of web pages with different encodings, and I'm trying to parse them with Beautiful Soup. As far as I can tell, BS detects the encoding from meta charset tags or XML encoding declarations. But some documents have no such tags, or have typos in the charset name, and BS fails on all of them. I suppose its default guess is UTF-8, which is wrong here. Luckily, all such pages (or nearly all of them) share the same encoding. Is there any way to set it as the default? I've also tried to grep the charset and use iconv to
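
One workaround, sketched under assumptions rather than taken from Beautiful Soup itself, is to decode the bytes yourself and pass Beautiful Soup a `str`, so it has no encoding to guess. The helper below honors a declared charset only when Python recognizes the name and the bytes actually decode with it, and otherwise falls back to one fixed default. `DEFAULT_ENCODING` (windows-1251 here) is a hypothetical placeholder for whatever encoding the untagged pages really use, and only `charset=` declarations are examined, not XML `encoding=` attributes.

```python
import codecs
import re

# Hypothetical default: set this to the encoding your untagged pages actually use.
DEFAULT_ENCODING = "windows-1251"

# Finds charset=... (meta tags, http-equiv headers) near the start of the document.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:\-]+)', re.IGNORECASE)


def decode_page(raw: bytes, default: str = DEFAULT_ENCODING) -> str:
    """Decode raw HTML with its declared charset if usable, else with `default`."""
    match = _CHARSET_RE.search(raw[:4096])
    if match:
        declared = match.group(1).decode("ascii")
        try:
            codecs.lookup(declared)       # LookupError for typos such as "uft-8"
            return raw.decode(declared)   # UnicodeDecodeError if the bytes disagree
        except (LookupError, UnicodeDecodeError):
            pass                          # unusable declaration: fall back below
    return raw.decode(default, errors="replace")
```

The result would then be parsed as usual, e.g. `BeautifulSoup(decode_page(raw), "html.parser")`.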

UTF-8 Character Count

Question: I'm writing a program that counts the number of UTF-8 characters in a file. I've already written the base code, but I'm stuck on the part where the characters are actually counted. Here is what I have so far.

Contents of the text file: 黄埔炒蛋你好こんにちは 여보세요

My code so far:

    #include <stdio.h>
    typedef unsigned char BYTE;
    int main(int argc, char const *argv[]) {
        FILE *file = fopen("file.txt", "r");
        if (!file) {
            printf("Could not open file.\n");
            return 1;
        }
        int count = 0;
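
The usual trick relies on the fact that every UTF-8 character starts with exactly one byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx), so counting the non-continuation bytes counts characters, strictly speaking code points in well-formed input. A Python sketch of the rule follows; in the C program the same test, `(c & 0xC0) != 0x80`, would sit inside the loop that reads bytes with `fgetc`. A trailing newline in file.txt would be counted as well.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count code points in UTF-8 data by counting every byte that is NOT a
    continuation byte (continuation bytes look like 10xxxxxx)."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


sample = "黄埔炒蛋你好こんにちは 여보세요".encode("utf-8")
print(count_utf8_chars(sample))   # 16
```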

jmeter Invalid UTF-8 middle byte

Question: I'm using JMeter to send JSON in POST requests to my test server. The following request always fails:

    {
      "location": {
        "latitude": "37.390737",
        "longitude": "-121.973864"
      },
      "category": "Café & Bakeries"
    }

The error message in the response data is:

    Invalid UTF-8 middle byte 0x20 at [Source: org.apache.catalina.connector.CoyoteInputStream@6073ddf0; line: 6, column: 20]

The request is not sent to the server at all. Other requests (e.g. replacing the value in category with other valid
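
The error is consistent with the body being sent as ISO-8859-1 instead of UTF-8 (an assumption; the excerpt does not show the JMeter settings). In ISO-8859-1, é is the single byte 0xE9. A UTF-8 parser reads 0xE9 as the lead byte of a three-byte sequence and then meets 0x20, the space after "Café", where a continuation ("middle") byte is required. The Python sketch below reproduces that; the helper name is hypothetical. A commonly suggested remedy in JMeter is setting the HTTP Request sampler's Content encoding field to UTF-8.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, bad_byte, next_byte) for the first undecodable UTF-8
    sequence in `data`, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        nxt = data[err.start + 1] if err.start + 1 < len(data) else None
        return err.start, data[err.start], nxt
    return None


body = '{"category": "Café & Bakeries"}'
print(first_invalid_utf8(body.encode("latin-1")))  # (17, 233, 32): 0xE9 then 0x20
print(first_invalid_utf8(body.encode("utf-8")))    # None
```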

Simplest way to get rid of zero-width-space in c# string

Question: I am parsing emails using a regex in a C# VSTO project. Once in a while, the regex does not seem to work (although if I paste the text and the regex into RegexBuddy, the regex matches the text correctly). If I look at the email in Gmail, I see =E2=80=8B at the beginning and end of some lines (which I understand is the UTF-8 zero-width space); this appears to be what is breaking the regex. This seems to be the only such sequence showing up. What is the easiest way to get rid of this exact sequence? I cannot
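
=E2=80=8B is quoted-printable for the three bytes E2 80 8B, which are the UTF-8 encoding of U+200B ZERO WIDTH SPACE. Once the message body has been decoded to a string, removing it is a plain replace of that one character. The Python sketch below shows both steps with a made-up sample line; in C# the equivalent replace would be `text.Replace("\u200B", "")`.

```python
import quopri

ZWSP = "\u200b"  # ZERO WIDTH SPACE, UTF-8 bytes E2 80 8B


def strip_zwsp(text: str) -> str:
    """Remove every zero-width space from already-decoded text."""
    return text.replace(ZWSP, "")


# Hypothetical quoted-printable line, as it would appear in the raw message source.
raw = b"=E2=80=8BHello world=E2=80=8B"
decoded = quopri.decodestring(raw).decode("utf-8")  # bytes -> str with two U+200B
print(strip_zwsp(decoded))                          # Hello world
```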

Eclipse console not printing Chinese characters

Question: I have written a Java function that takes a string parameter and generates a random id from it using some logic. Everything works fine if my string contains English characters, but when I pass Chinese characters, they are replaced by ???. Here is my code:

    public static String generateId(String inputString) {
        /**
         * Split input string on the basis of white spaces
         */
        String arr[] = inputString.split(" ");
        /**
         * Change the first character of first substring to lowercase
         */
        String id = arr[0]
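
Question marks in place of Chinese characters are what an encoder emits when its target charset cannot represent them. A console set to a non-Unicode charset (ISO-8859-1 or a Windows code page, for example) does this on output, so the generated id may well be correct and only its display broken. The Python sketch below simulates that round trip; the function name is hypothetical. In Eclipse, the console charset can typically be changed per launch under Run Configurations > Common > Encoding.

```python
def show_on_console(text: str, console_encoding: str) -> str:
    """Simulate writing `text` to a console that uses `console_encoding`:
    characters the charset cannot represent come out as '?'."""
    return text.encode(console_encoding, errors="replace").decode(console_encoding)


print(show_on_console("你好 world", "iso-8859-1"))  # ?? world
print(show_on_console("你好 world", "utf-8"))       # 你好 world
```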

C# partial UTF-8 byte stream conversion

Question: I have written the following simple test:

    [Test]
    public void TestUTF8()
    {
        var c = "abc☰def";
        var b = Encoding.UTF8.GetBytes(c);
        Assert.That(b.Length, Is.EqualTo(9));
        // Assuming you are reading a byte stream and got a partial result with the first 5 bytes
        var p = Encoding.UTF8.GetChars(b, 0, 5);
        Trace.WriteLine(new string(p));
        Assert.That(p.Length, Is.EqualTo(3));
    }

The Trace outputs abc� and the last assert fails because p.Length is 4. However, I wanted Trace to output abc and the last assert
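
The replacement character appears because GetChars treats the five bytes as the complete input, so the dangling first two bytes of ☰ (E2 98 out of E2 98 B0) count as malformed. A decoder that keeps state between calls avoids this by holding back an incomplete sequence until the rest arrives; in .NET such an object comes from `Encoding.UTF8.GetDecoder()`. The sketch below shows the same idea with Python's incremental UTF-8 decoder, using the test's string and 5-byte split; `decode_stream` is a hypothetical helper name.

```python
import codecs


def decode_stream(chunks):
    """Decode UTF-8 arriving in arbitrary byte chunks. A character split across
    chunks is buffered by the decoder instead of becoming U+FFFD."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))  # raises if input ends mid-character
    return pieces


data = "abc☰def".encode("utf-8")              # 9 bytes; ☰ is E2 98 B0
print(decode_stream([data[:5], data[5:]]))   # ['abc', '☰def', '']
```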

How to get the value of UTF-8 character

Question: I have a UTF-8 character in Chinese or Arabic. I need to get the value of that UTF-8 character, the same way I can get the value of an ASCII character. I need to implement it in C. Can you please provide your suggestions? For example:

    char array[3] = "ab";
    int v1, v2;
    v1 = array[0];
    v2 = array[1];

In the above code I get the corresponding ASCII values in v1 and v2. In the same way, for a UTF-8 string I need to get the value of each character.

Answer 1: Only the C11 standard version of the C
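
Getting the "value" of a UTF-8 character amounts to decoding its 1 to 4 bytes back into a Unicode code point: the lead byte tells how many bytes the character uses and contributes its low bits, and each continuation byte contributes six more. The Python sketch below spells this out bit by bit so it can be ported to a C loop over `unsigned char`; the function name is hypothetical, and for brevity it does not reject overlong encodings or surrogate code points, which a strict decoder should.

```python
def utf8_code_points(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points."""
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                  # 0xxxxxxx: ASCII, 1 byte
            cp, n = lead, 1
        elif (lead >> 5) == 0b110:       # 110xxxxx: 2-byte sequence
            cp, n = lead & 0x1F, 2
        elif (lead >> 4) == 0b1110:      # 1110xxxx: 3-byte sequence
            cp, n = lead & 0x0F, 3
        elif (lead >> 3) == 0b11110:     # 11110xxx: 4-byte sequence
            cp, n = lead & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for byte in data[i + 1:i + n]:
            if (byte & 0xC0) != 0x80:    # continuation bytes must be 10xxxxxx
                raise ValueError(f"bad continuation byte 0x{byte:02X}")
            cp = (cp << 6) | (byte & 0x3F)
        points.append(cp)
        i += n
    return points


print([hex(cp) for cp in utf8_code_points("ab中".encode("utf-8"))])  # ['0x61', '0x62', '0x4e2d']
```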