Normalize String in ColdFusion

Posted by 半世苍凉 on 2020-08-23 06:18:28

Question


I'm trying to normalize a string in ColdFusion.

I want to use the Java class java.text.Normalizer for this since, as far as I know, CF has no equivalent function.

Here's my current code:

<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset string = "äéöè" />
<cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
<cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
<cfoutput>#string#</cfoutput>

Any ideas why it always outputs äéöè and not a normalized string?


Answer 1:


In ColdFusion, unlike in Java, backslashes in string literals do not need to be escaped. Your regex therefore contains a doubled backslash, which matches a literal backslash instead of starting the \p{...} class. Nothing in your string matches, so no replacement happens.

Other than that, your code is correct: at the time of the output the length of the string is 8, not 4, which is the effect of the normalize call.

However, remember that the decomposed (NFD) form is still an equivalent representation of the original string, so it is not surprising that you cannot tell the difference visually. That is correct Unicode rendering in action.
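
To make the length change concrete, here is a minimal sketch of the same two steps in plain Java rather than CFML, so the escaping is explicit. The class and method names are illustrative and not from the question. Note that Java source does need the doubled backslash that CFML does not.

```java
import java.text.Normalizer;

public class NfdLengthDemo {
    // Decompose each accented letter into base letter + combining mark (NFD).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Remove the combining marks that NFD split off from the base letters.
    static String stripMarks(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        String input = "\u00E4\u00E9\u00F6\u00E8"; // äéöè, precomposed
        System.out.println(input.length());            // 4
        System.out.println(decompose(input).length()); // 8
        System.out.println(stripMarks(input));         // aeoe
    }
}
```

The decomposed string renders the same as the input; only its length and the result of the regex replacement reveal the difference.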




Answer 2:


Your "\\p" should simply have been "\p". CFML string literals do not treat the backslash as an escape character, so your "\\p" reaches the regex engine as two backslashes, the equivalent of writing "\\\\p" in Java source.

One-liner:

<cfscript>
k = "mike's café"; // no "var" here: this snippet runs outside a function
k = createObject( 'java', 'java.text.Normalizer' ).normalize( k, createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') ).replaceAll('\p{InCombiningDiacriticalMarks}+','').replaceAll('[^\p{ASCII}]+','');
// k is now "mike's cafe"
</cfscript>

http://www.cfquickdocs.com/cf9/#rereplace
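
The one-liner above ends with a second replaceAll that deletes anything still outside ASCII. As a rough illustration, the same two calls in plain Java (class and method names are made up for this sketch) show the consequence: letters that have no canonical decomposition, such as ß or ø, are dropped rather than transliterated.

```java
import java.text.Normalizer;

public class AsciiFoldDemo {
    // Same two steps as the one-liner: strip combining marks after NFD,
    // then delete whatever is still outside ASCII.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
                .replaceAll("[^\\p{ASCII}]+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("mike's caf\u00E9")); // mike's cafe
        System.out.println(fold("Stra\u00DFe"));      // Strae (ß is removed)
        System.out.println(fold("S\u00F8ren"));       // Sren  (ø is removed)
    }
}
```

If losing such letters is not acceptable, a transliteration table or library is needed on top of normalization.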




Answer 3:


I recommend using a Java library such as Junidecode: https://github.com/gcardone/junidecode

It transliterates Unicode strings (UTF-8 or UTF-16) to 7-bit ASCII. Examples:

  • äéöè = aeoe
  • mike's café = mike's cafe
  • ℡ = TEL
  • 北亰 = Bei Jing
  • Mr. まさゆき たけだ = Mr. masayuki takeda
  • ⠏⠗⠑⠍⠊⠑⠗ = premier
  • ราชอาณาจักรไทย = raach`aanaacchakraithy
  • Ελληνικά = Ellenika
  • Москвa = Moskva
  • Հայաստան = Hayastan
  • ℰ𝒳𝒜ℳ𝓟ℒℰ = EXAMPLE

I've shared a full ColdFusion-based demo (which requires the Junidecode JAR file): https://dev.to/gamesover/convert-unicode-strings-to-ascii-with-coldfusion-junidecode-lhf

Here's the function:

<cfscript>
function JUnidecode(inputString){
    var JUnidecodeLib = "";
    var response = "";
    var temp = {};
    temp.encoder = createObject("java", "java.nio.charset.Charset").forName("utf-8").newEncoder();
    temp.isUTF = temp.encoder.canEncode(arguments.inputString);
    if (temp.isUTF){
        /* NFKC: UTF Compatibility Decomposition, followed by Canonical Composition */
        temp.normalizer = createObject( "java", "java.text.Normalizer" );
        temp.normalizerForm = createObject( "java", "java.text.Normalizer$Form" );
        arguments.inputString = temp.normalizer.normalize( javaCast( "string", arguments.inputString ), temp.normalizerForm.NFKC );
    }
    try {
        JUnidecodeLib = createObject("java", "net.gcardone.junidecode.Junidecode");
        response = JUnidecodeLib.unidecode( javacast("string", arguments.inputString) );
    } catch (any e) {
        response = "ERROR: JUnidecode is not installed";
    }
    return trim(response.replaceAll("\[\?\]", ""));
}
function isDiff(compareArr, val, pos){
    return (pos GT arrayLen(compareArr) OR compareArr[pos] neq val);
}
</cfscript>
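
The function above does two things with standard Java classes before it hands the string to Junidecode: it checks that the input can be encoded as UTF-8, and it applies NFKC normalization. The following plain-Java sketch shows only those two steps, without the Junidecode JAR; the class and method names are illustrative assumptions.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NfkcDemo {
    // True when the string can be encoded as UTF-8
    // (false, for example, when it contains an unpaired surrogate).
    static boolean isEncodable(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(s);
    }

    // Compatibility decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(nfkc("\u2121"));              // TEL (from the telephone sign)
        System.out.println(nfkc("\u2130"));              // E   (from script capital E)
        System.out.println(nfkc("cafe\u0301").length()); // 4: e + combining acute is recomposed
        System.out.println(isEncodable("caf\u00E9"));    // true
        System.out.println(isEncodable("\uDC00"));       // false: unpaired surrogate
    }
}
```

NFKC already maps compatibility characters such as ℡ to plain letters, but it recomposes accents, so accented letters stay non-ASCII until a later step handles them.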


Source: https://stackoverflow.com/questions/11626050/normalize-string-in-coldfusion
