How to capture Hebrew with regex in Java?

Submitted by 岁酱吖の on 2019-12-12 11:05:19

Question


I'm trying to catch a section of Hebrew text (the origin is comments on a news site) using the following regex:

[\u0590-\u05FF \\p{Graph} \\s]+

It works for most comments but some comments are missed.

I've tried to debug this and it seems there's a Hebrew letter that doesn't match the pattern.

When I extract this letter and print its integer value, it seems to be correct, but the regex still doesn't match it.

Ideas?


Answer 1:


It would be more semantically correct to use \p{InHebrew} instead of \u0590-\u05FF.
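As a minimal sketch of that substitution, the snippet below compiles both forms and checks that they accept the same Hebrew word. The class name `HebrewBlockDemo`, its helper methods, and the sample word are assumptions made for illustration; they are not part of the original question.

```java
import java.util.regex.Pattern;

class HebrewBlockDemo {
    // Unicode block property: any code point in the Hebrew block (U+0590..U+05FF).
    private static final Pattern BLOCK = Pattern.compile("\\p{InHebrew}+");
    // The explicit range from the question, kept for comparison.
    private static final Pattern RANGE = Pattern.compile("[\\u0590-\\u05FF]+");

    static boolean matchesBlock(String s) {
        return BLOCK.matcher(s).matches();
    }

    static boolean matchesRange(String s) {
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u05E9\u05DC\u05D5\u05DD"; // illustrative sample: the word "shalom"
        System.out.println(matchesBlock(word));   // true
        System.out.println(matchesRange(word));   // true
        System.out.println(matchesBlock("abc"));  // false
    }
}
```

Both patterns cover the same code points, so the change improves readability rather than behavior; it will not by itself make a previously missed character match.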

You also need to match punctuation, digits (at least the commonly used ones), and the different kinds of whitespace. I don't know what \p{Graph} is or whether there are any Hebrew-specific punctuation symbols, but it seems you missed some of these.
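One way to find out which characters are being missed is to test each code point of a failing comment against the character class and print the ones that do not match. In Java, \p{Graph} and \s are ASCII-only by default (unless Pattern.UNICODE_CHARACTER_CLASS is set), so typographic punctuation, non-breaking spaces, and Hebrew presentation forms outside U+0590..U+05FF are plausible culprits. The following is only a diagnostic sketch: the class name `UnmatchedCharFinder`, the method `findUnmatched`, and the sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class UnmatchedCharFinder {
    // The character class from the question, with the Hebrew range written as a block.
    // The literal spaces in the original class are already covered by \s.
    private static final Pattern ALLOWED =
            Pattern.compile("[\\p{InHebrew}\\p{Graph}\\s]");

    // Returns the code points of `text` that the class does not accept, as "U+XXXX" strings.
    static List<String> findUnmatched(String text) {
        List<String> unmatched = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!ALLOWED.matcher(ch).matches()) {
                unmatched.add(String.format("U+%04X", cp));
            }
        });
        return unmatched;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a Hebrew letter, an en dash (U+2013),
        // and a Hebrew presentation form (U+FB2A, outside the Hebrew block).
        String sample = "\u05E9 \u2013 \uFB2A";
        System.out.println(findUnmatched(sample)); // [U+2013, U+FB2A]
    }
}
```

Once the offending code points are known, the class can be widened to include them, or the pattern can be compiled with Pattern.UNICODE_CHARACTER_CLASS so that \p{Graph} and \s follow the Unicode definitions.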



Source: https://stackoverflow.com/questions/8987119/how-to-capture-hebrew-with-regex-in-java
