Question
I'm trying to normalize a string in ColdFusion. I want to use the Java class java.text.Normalizer for this, as CF doesn't have any similar functions as far as I know.
Here's my current code:
<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset string = "äéöè" />
<cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
<cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
<cfoutput>#string#</cfoutput>
Any ideas why it always outputs äéöè and not a normalized string?
Answer 1:
In ColdFusion, unlike in Java, you don't need to escape backslashes in string literals. Your current regex will not match anything that does not start with a backslash, so no replacement happens.
Other than that, your code is correct. If you check the length of the string at the time of the output, it is 8, not 4. This is the effect of the normalize call: each accented letter has been split into a base letter plus a combining mark.
However, remember that it is still an equivalent representation of the original string, and so it is not surprising that you cannot tell the difference visually. This is correct Unicode rendering in action.
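The same behavior can be reproduced on the Java side, which is what the ColdFusion code ends up calling. The following is a minimal sketch, not the asker's code: the class name `NfdDemo` and its helper methods are made up for illustration. It shows the length going from 4 to 8 after NFD, and the combining marks being removed once the regex is correct. Note that the doubled backslash is needed here only because this is Java source; in a ColdFusion string literal a single backslash is enough.

```java
import java.text.Normalizer;

public class NfdDemo {
    // Decompose each accented letter into a base letter plus combining mark(s).
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Remove the combining marks left behind by NFD.
    // In Java source the backslash is doubled; the regex engine sees \p{...}.
    static String stripMarks(String s) {
        return decompose(s).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "äéöè" written with escapes to avoid source-encoding issues
        String input = "\u00e4\u00e9\u00f6\u00e8";
        System.out.println(input.length());            // 4
        System.out.println(decompose(input).length()); // 8
        System.out.println(stripMarks(input));         // aeoe
    }
}
```

The decomposed string still renders as äéöè, which is why the output looked unchanged even though normalization had worked.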
Answer 2:
Your "\\p" should have simply been "\p". CF string literals do not treat the backslash as an escape character, so your "\\p" reaches the regex engine as two literal backslashes, the equivalent of writing "\\\\p" in Java source.
One liner:
<cfscript>
k = "mike's café"; // no "var" here: that keyword is only valid inside a function
k = createObject( 'java', 'java.text.Normalizer' ).normalize( k, createObject( 'java', 'java.text.Normalizer$Form' ).valueOf('NFD') ).replaceAll('\p{InCombiningDiacriticalMarks}+','').replaceAll('[^\p{ASCII}]+','');
// k is now "mike's cafe"
</cfscript>
http://www.cfquickdocs.com/cf9/#rereplace
Answer 3:
I recommend using a Java library like Junidecode. https://github.com/gcardone/junidecode
It converts UTF-8 and UTF-16 strings to 7-bit ASCII. Examples:
- äéöè = aeoe
- mike's café = mike's cafe
- ℡ = TEL
- 北亰 = Bei Jing
- Mr. まさゆき たけだ = Mr. masayuki takeda
- ⠏⠗⠑⠍⠊⠑⠗ = premier
- ราชอาณาจักรไทย = raach`aanaacchakraithy
- Ελληνικά = Ellenika
- Москвa = Moskva
- Հայաստան = Hayastan
- ℰ𝒳𝒜ℳ𝓟ℒℰ = EXAMPLE
I've shared a full ColdFusion-based demo (which requires the Junidecode JAR file): https://dev.to/gamesover/convert-unicode-strings-to-ascii-with-coldfusion-junidecode-lhf
Here's the function:
<cfscript>
// Transliterates a Unicode string to 7-bit ASCII using the Junidecode JAR.
function JUnidecode(inputString){
    var JUnidecodeLib = "";
    var response = "";
    var temp = {};
    // Check that the input can be encoded as UTF-8 before normalizing it.
    temp.encoder = createObject("java", "java.nio.charset.Charset").forName("utf-8").newEncoder();
    temp.isUTF = temp.encoder.canEncode(arguments.inputString);
    if (temp.isUTF){
        /* NFKC: Unicode compatibility decomposition, followed by canonical composition */
        temp.normalizer = createObject( "java", "java.text.Normalizer" );
        temp.normalizerForm = createObject( "java", "java.text.Normalizer$Form" );
        arguments.inputString = temp.normalizer.normalize( javaCast( "string", arguments.inputString ), temp.normalizerForm.NFKC );
    }
    try {
        JUnidecodeLib = createObject("java", "net.gcardone.junidecode.Junidecode");
        response = JUnidecodeLib.unidecode( javaCast("string", arguments.inputString) );
    } catch (any e) {
        response = "ERROR: JUnidecode is not installed";
    }
    // Remove any "[?]" placeholders from the result.
    return trim(response.replaceAll("\[\?\]", ""));
}
// Returns true when pos is past the end of compareArr or the element at pos differs from val.
function isDiff(compareArr, val, pos){
    return (pos GT arrayLen(compareArr) OR compareArr[pos] NEQ val);
}
</cfscript>
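The NFKC step used in the function above can be looked at in isolation. This is a small Java sketch under the assumption that only the JDK is available (no Junidecode); the class name `NfkcDemo` and the method `nfkc` are hypothetical. It suggests why both steps are useful: NFKC folds compatibility characters such as ℡ into plain letters, but it leaves accented letters composed, so a transliteration step is still needed afterwards.

```java
import java.text.Normalizer;

public class NfkcDemo {
    // Compatibility decomposition followed by canonical composition.
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+2121 TELEPHONE SIGN folds to the three letters "TEL".
        System.out.println(nfkc("\u2121"));
        // U+FB01 LATIN SMALL LIGATURE FI folds to "fi".
        System.out.println(nfkc("\uFB01"));
        // A decomposed "café" (e + U+0301) is recomposed, so the accent survives.
        System.out.println(nfkc("cafe\u0301").equals("caf\u00e9"));
    }
}
```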
Source: https://stackoverflow.com/questions/11626050/normalize-string-in-coldfusion