I am using Html.fromHtml(STRING).toString() to convert a string that may or may not have html and/or html entities in it, to a plain text string.
This is pretty slow.
Although I have not tried them yet, I found some possible solutions:
I hope it helps.
What about unescapeHtml() in org.apache.commons.lang.StringEscapeUtils? The library is available on the Apache site.
(EDIT: June 2019 - See the comments below for updates about the library)
This is an incredibly fast and simple option: Unbescape
It greatly improved our parsing performance, since we have to run every string through a decoder.
Have you looked at Strip HTML from Text JavaScript?
With a large batch of these it can add over a minute.
Any parsing will take some time, and 22 ms seems fast to me. Still, can you do it in the background? Could some kind of caching help?
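To make the caching and background suggestions concrete, here is a minimal sketch in plain Java. The class name PlainTextCache and its methods are hypothetical, and the converter is passed in as a function so the sketch does not depend on Android; in the question's setup it would presumably be something like s -> Html.fromHtml(s).toString(). The cache is unbounded, which is an assumption that only suits a limited set of distinct strings.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper: remembers converted strings and can convert off the caller's thread. */
final class PlainTextCache {
    // Unbounded map from raw input to converted text (assumption: the input set is small enough).
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> converter;
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    PlainTextCache(Function<String, String> converter) {
        this.converter = converter;
    }

    /** Converts synchronously; a repeated input is served from the cache. */
    String convert(String html) {
        return cache.computeIfAbsent(html, converter);
    }

    /** Runs the same conversion on a background thread. */
    CompletableFuture<String> convertAsync(String html) {
        return CompletableFuture.supplyAsync(() -> convert(html), executor);
    }

    /** Stops the background thread once no more work will be submitted. */
    void shutdown() {
        executor.shutdown();
    }
}
```

If the same strings show up repeatedly in a batch, each one is only parsed once; if they are all distinct, the cache does not help and only the background execution keeps the UI responsive.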
fromHtml() does not have a high-performance HTML parser, and I have no idea how quick the toString() implementation on SpannedString is. I doubt either was designed for your scenario.
Ideally, the strings are clean before they get to a low-power phone. Either clean them up in the build process (for resources/assets), or clean them up on a server (before you download them).
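As a rough illustration of what such a pre-cleaning step could look like in a build script or on a server, here is a sketch in plain Java. HtmlPreCleaner and toPlainText are hypothetical names, and this is not a real HTML parser: it drops anything between '<' and the next '>', decodes numeric entities, and knows only a handful of named entities. Text such as "1 < 2 and 3 > 2" would be mangled, so it assumes reasonably well-formed input.

```java
import java.util.Map;

/** Hypothetical pre-cleaning step; a deliberately small stand-in for a real HTML parser. */
final class HtmlPreCleaner {
    // Only a few named entities are covered here; the full HTML table is much larger.
    private static final Map<String, String> ENTITIES = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    /** Single pass over the input: skips tags, decodes known entities, copies everything else. */
    static String toPlainText(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '<') {
                int close = input.indexOf('>', i + 1);
                if (close >= 0) {          // treat "<...>" as a tag and drop it
                    i = close + 1;
                    continue;
                }
            } else if (c == '&') {
                int semi = input.indexOf(';', i + 1);
                if (semi > i + 1 && semi - i <= 10) {
                    String decoded = decode(input.substring(i + 1, semi));
                    if (decoded != null) { // known entity: emit its replacement
                        out.append(decoded);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            out.append(c);                 // ordinary character, or an unmatched '<' or '&'
            i++;
        }
        return out.toString();
    }

    /** Returns the text for an entity name such as "amp", "#65" or "#x42", or null if unknown. */
    private static String decode(String name) {
        if (name.startsWith("#")) {
            try {
                boolean hex = name.length() > 1 && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
                int cp = hex ? Integer.parseInt(name.substring(2), 16)
                             : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return ENTITIES.get(name);
    }
}
```

Run once over the source strings, this leaves the phone with nothing to parse at all; for messy real-world HTML a proper parser on the server would be the safer choice.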
If, for whatever reason, you absolutely need to clean them up on the device, you can perhaps use the NDK to create a C/C++ library that does the cleaning for you faster.