I am trying to extract text from a PDF and write it to a JSON file. When the extracted text is serialized, the JSON library converts every & to \u0026. For example, my actual string is <
That's actually a valid (but not required) encoding. Any character may be encoded using a Unicode escape in JSON, and any valid JSON parsing library must be able to interpret those escapes.
& is not one of the characters that must be escaped (see the definition of string at json.org), but a few JSON libraries are quite "aggressive" in their encoding. That's usually not a problem, unless you process the resulting JSON with something other than a conforming JSON parser.
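To illustrate the round trip, here is a minimal sketch that assumes Gson is on the classpath. The class name `EscapeRoundTrip`, its helper methods, and the sample string are made up for illustration. It serializes a string with Gson's default settings and then parses the result back, showing that the escape is only a matter of representation.

```java
import com.google.gson.Gson;

public class EscapeRoundTrip {
    // Serialize with Gson's defaults, which escape HTML-sensitive characters such as &.
    static String encode(String text) {
        return new Gson().toJson(text);
    }

    // Parse a JSON string literal back into a Java String.
    static String decode(String json) {
        return new Gson().fromJson(json, String.class);
    }

    public static void main(String[] args) {
        String original = "Tom & Jerry"; // hypothetical sample value
        String json = encode(original);
        System.out.println(json);         // "Tom \u0026 Jerry"
        System.out.println(decode(json)); // Tom & Jerry
    }
}
```

The escaped form only shows up in the raw file. Once the JSON is read back through a parser, the original character is restored.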
GsonBuilder.disableHtmlEscaping() will help you turn that feature off if you absolutely need to.
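As a rough sketch of how that looks in practice (again assuming Gson is on the classpath; the class name `DisableHtmlEscapingDemo`, the helper method, and the sample string are hypothetical):

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class DisableHtmlEscapingDemo {
    // Build a Gson instance that leaves HTML-sensitive characters (&, <, >, =, ') as-is.
    static String toJsonUnescaped(String text) {
        Gson gson = new GsonBuilder().disableHtmlEscaping().create();
        return gson.toJson(text);
    }

    public static void main(String[] args) {
        String text = "Tom & Jerry"; // hypothetical sample value
        System.out.println(toJsonUnescaped(text)); // "Tom & Jerry"
    }
}
```

Characters that JSON itself requires to be escaped, such as double quotes and backslashes, are still escaped with this setting. Only the extra HTML-safe escaping is turned off.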