I need a way to transform numeric HTML entities into their plain-text character equivalents. For example, I would like to turn the entity &#0233; into the character é.
Here's another function that decodes all the decimal numeric HTML character entities in a string. It doesn't rely on XML parsing, so it works on strings that contain unbalanced XML tags. It's not efficient if the string has a large number of entities, but it's pretty good if there are none or few. I have only tested this on Railo, not Adobe CF.
<cffunction name="decodeHtmlEntities" returntype="String" output="false">
<cfargument name="s" type="String"/>
<cfset var LOCAL = {f = ReFind("&##([0-9]+);", ARGUMENTS.s, 1, true), map={}}>
<cfloop condition="LOCAL.f.pos[1] GT 0">
<cfset LOCAL.map[mid(ARGUMENTS.s, LOCAL.f.pos[1], LOCAL.f.len[1])] = chr(mid(ARGUMENTS.s, LOCAL.f.pos[2], LOCAL.f.len[2]))>
<cfset LOCAL.f = ReFind("&##([0-9]+);", ARGUMENTS.s, LOCAL.f.pos[1]+LOCAL.f.len[1], true)>
</cfloop>
<cfloop collection="#LOCAL.map#" item="LOCAL.key">
<cfset ARGUMENTS.s = Replace(ARGUMENTS.s, LOCAL.key, LOCAL.map[LOCAL.key], "all")>
</cfloop>
<cfreturn ARGUMENTS.s />
</cffunction>
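As a quick check of the function above, here is a minimal usage sketch. The sample string and the variable names raw and decoded are assumptions for illustration, not part of the original answer. Note the doubled ## needed to write a literal # inside a CFML string.

```cfml
<!--- Hypothetical input: two decimal entities plus an unbalanced tag --->
<cfset raw = "Caf&##233; cr&##232;me <b>unclosed">
<cfset decoded = decodeHtmlEntities(raw)>
<!--- The decimal entities become characters; the stray <b> is left as-is --->
<cfoutput>#HTMLEditFormat(decoded)#</cfoutput>
```

Because the pattern only matches digits after the &#, hexadecimal entities such as &#xE9; and named entities such as &eacute; pass through unchanged.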
It should be quite easy to code one up yourself. Just edit the HtmlUNEditFormat() function you found: add the numeric entities to the end of lEntities and the matching characters, in the same order, to the end of lEntitiesChars.
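A rough sketch of that idea, assuming the function keeps two parallel comma-delimited lists and passes them to ReplaceList(). The function name htmlUnEditFormatExtended and the list contents are stand-ins for illustration; the real lists in the function you found will be longer.

```cfml
<cffunction name="htmlUnEditFormatExtended" returntype="string" output="false">
    <cfargument name="str" type="string" required="true" />
    <!--- Parallel lists: the Nth entity maps to the Nth character (stand-in values) --->
    <cfset var lEntities = "&eacute;,&egrave;" />
    <cfset var lEntitiesChars = chr(233) & "," & chr(232) />
    <!--- Append the numeric forms to the end of both lists, in the same order --->
    <cfset lEntities = ListAppend(lEntities, "&##233;,&##232;") />
    <cfset lEntitiesChars = ListAppend(lEntitiesChars, chr(233) & "," & chr(232)) />
    <cfreturn ReplaceList(arguments.str, lEntities, lEntitiesChars) />
</cffunction>
```

The drawback of this approach is that every entity you want decoded has to be listed by hand, which is why the other answers decode the numbers generically.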
I found this question while working with a method that, by black-box principle, can't trust that an incoming string is either HTML entity encoded or that it is not.
I've adapted Peter Boughton's function so that it can be used safely on strings that haven't already been treated with HTML entities. (The only time this seems to matter is when loose ampersands, e.g. "Cats & Dogs", are present in the target string.) This modified version will also fail somewhat gracefully on any unforeseen XML parse error by returning the input unchanged.
<cffunction name="decodeHtmlEntity" returntype="string" output="false">
<cfargument name="str" type="string" hint="&##<number>; or &<name>;" />
<cfset var XML = '<xml>#arguments.str#</xml>' />
<cfset var XMLDoc = '' />
<!--- ampersands that aren't pre-encoded as entities cause errors --->
<cfset XML = REReplace(XML, '&(?!(\##\d+|\w+);)', '&amp;', 'all') />
<cftry>
<cfset XMLDoc = XmlParse(XML) />
<cfreturn XMLDoc.XMLRoot.XMLText />
<cfcatch>
<cfreturn arguments.str />
</cfcatch>
</cftry>
</cffunction>
This would support the following use case safely:
<cffunction name="notifySomeoneWhoCares" access="private" returntype="void">
<cfargument name="str" type="string" required="true"
hint="String of unknown preprocessing" />
<cfmail from="process@domain.com" to="someoneWhoCares@domain.com"
subject="Comments from Web User" format="html">
Some Web User Spoke Thus:<br />
<cfoutput>#HTMLEditFormat(decodeHTMLEntity(arguments.str))#</cfoutput>
</cfmail>
</cffunction>
This function is now incredibly useful for ensuring web-submitted content is entity-safe (think XSS) before it's sent out by email or submitted into a database table.
Hope this helps.
Thanks to Todd Sharp for pointing out a very simple way to do this, using the Apache Commons StringEscapeUtils library, which is packaged with CF (and Railo), so you can just do:
<cfset Entity = "&##0233;" />
<cfset StrEscUtils = createObject("java", "org.apache.commons.lang.StringEscapeUtils") />
<cfset Character = StrEscUtils.unescapeHTML(Entity) />
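If you want this as a reusable function, a thin wrapper might look like the sketch below. The function name unescapeHtmlEntities is made up for illustration, and it assumes the Commons Lang 2.x class is available on the classpath, as described above.

```cfml
<cffunction name="unescapeHtmlEntities" returntype="string" output="false">
    <cfargument name="str" type="string" required="true" />
    <!--- Assumes org.apache.commons.lang (Commons Lang 2.x) ships with the engine --->
    <cfset var StrEscUtils = createObject("java", "org.apache.commons.lang.StringEscapeUtils") />
    <!--- unescapeHtml decodes both named and numeric entities in the whole string --->
    <cfreturn StrEscUtils.unescapeHtml(arguments.str) />
</cffunction>
```

Unlike the XmlParse approach, this does not require the input to be well-formed XML, so stray tags or loose ampersands should not cause a parse error.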
That linked function is icky - there's no need to name them explicitly, and as you say it doesn't do numerics.
Much simpler is to let CF do the work for you, using the XmlParse function:
<cffunction name="decodeHtmlEntity" returntype="String" output="false">
<cfargument name="Entity" type="String" hint="&##<number>; or &<name>;" />
<cfreturn XmlParse('<xml>#Arguments.Entity#</xml>').XmlRoot.XmlText />
</cffunction>
That one works with Railo. I can't remember whether CF supports that syntax yet, though, so you might need to change it to:
<cffunction name="decodeHtmlEntity" returntype="String" output="false">
<cfargument name="Entity" type="String" hint="&##<number>; or &<name>;" />
<cfset var XmlDoc = XmlParse('<xml>#Arguments.Entity#</xml>') />
<cfreturn XmlDoc.XmlRoot.XmlText />
</cffunction>