Question
I'm working on processing Tweets from Twitter and storing them in a database (MySQL).
I have my process running perfectly but sometimes I get an error like this one:
2012-08-31 08:11:23,303 WARN org.hibernate.engine.jdbc.spi.SqlExceptionHelper - SQL Error: 1366, SQLState: HY000
2012-08-31 08:11:23,304 ERROR org.hibernate.engine.jdbc.spi.SqlExceptionHelper - Incorrect string value: '\xF0\x9F\x98\x9D #...' for column 'twe_text' at row 1
When looking for the problematic tweet in my logs I find the following one:
2012-08-31 08:11:22,971 INFO com.myapp.TweetLoaderJob - Text for tweet 241175722096480256: RT @totallytoyosi_: My goodies, my goodies, not your goodies <U+1F61D> #m&ms #sweeties #goodies #food @ The Ritzy Cinema Café, Brixton htt ...
And finally, looking up what the hell <U+1F61D> is, I discovered that it is an emoticon that Twitter sends as-is.
I have debugged, looking only at this specific tweet, and Eclipse does not seem to recognize this character. So the question is: how can I handle this exception? I looked into reconfiguring my MySQL database, but I cannot change the encoding (it's a requirement), so my options are to skip tweets of this kind or to suppress the problematic character.
But how to do it, if Java does not recognize it?
Answer 1:
You could filter your strings and remove the undesired part (with a simple regexp like <U\+[^>]+>) before storing them in your database.
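A minimal sketch of that filtering step, under a few assumptions. The bytes \xF0\x9F\x98\x9D in the MySQL error are the 4-byte UTF-8 encoding of U+1F61D, so the string Java holds probably contains the real character (as a surrogate pair), and the <U+1F61D> text in the log may only be how the logger rendered it. For that reason the sketch offers two filters: one drops every code point above U+FFFF, the other removes a literal <U+XXXX> placeholder in case the text really does arrive in that form. The class and method names (TweetSanitizer, stripSupplementary, stripPlaceholders) are made up for illustration.

```java
import java.util.regex.Pattern;

final class TweetSanitizer {

    // Literal "<U+XXXX>" placeholder, as it appears in the log line (hypothetical input form).
    private static final Pattern PLACEHOLDER = Pattern.compile("<U\\+[0-9A-Fa-f]+>");

    private TweetSanitizer() {
    }

    // Drops every code point outside the Basic Multilingual Plane (above U+FFFF).
    // These are the characters that need 4 bytes in UTF-8.
    static String stripSupplementary(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!Character.isSupplementaryCodePoint(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by 2 chars for a surrogate pair, otherwise by 1.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Removes literal "<U+XXXX>" markers from the text.
    static String stripPlaceholders(String text) {
        return PLACEHOLDER.matcher(text).replaceAll("");
    }
}
```

Calling TweetSanitizer.stripSupplementary(tweetText) before handing the text to Hibernate should leave everything else in the tweet untouched; whether to drop the character or replace it with a marker such as "?" is a design choice this sketch does not settle.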
Source: https://stackoverflow.com/questions/12214163/how-to-avoid-twitter-emoticon-character-while-processing-string-in-java