Question
My content contains multiple BOM (EF BB BF) characters and I want to remove them. The characters appear in the middle of strings, and I simply want to remove them all.
The data comes from a JavaScript source, which I get from a CKEditor instance. I then POST the variable and read it as a string on my backend, and the BOMs are there. For now they are persisted as is, but this causes errors in post-processing, when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copy-pasted into my CKEditor.
I can step through the string char by char, but I don't know how to compare against the BOM. Would it somehow be possible to take the hex values of the string's bytes and compare them against the three-byte sequence?
Answer 1:
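The UTF-8 BOM bytes (EF BB BF) get translated to \ufeff when the request is decoded into a string. That is the Unicode character "zero width no-break space": can't see them, can't hear them. Because it is a single character in the string, there is no need to compare three-byte sequences. Filter them out with: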
var good = bad.Replace("\ufeff", "");
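A slightly fuller sketch of the same idea, assuming the backend is C#/.NET (as the `Replace` call suggests). The `BomCleaner` class and its method names are hypothetical, chosen only for illustration. It also shows where the three bytes from the question come from: they are simply the UTF-8 encoding of U+FEFF.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the names are illustrative, not from any library.
public static class BomCleaner
{
    // U+FEFF is what the UTF-8 bytes EF BB BF become once decoded into a .NET string.
    private const string Bom = "\uFEFF";

    // Removes every U+FEFF, wherever it occurs in the input.
    // string.Replace(string, string) compares ordinally, so the
    // zero-width character is matched exactly rather than ignored.
    public static string StripBom(string input)
    {
        return string.IsNullOrEmpty(input) ? input : input.Replace(Bom, "");
    }

    // Returns the UTF-8 encoding of U+FEFF as hex, for comparison with raw bytes.
    public static string BomAsUtf8Hex()
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(Bom);
        return BitConverter.ToString(bytes);
    }
}
```

For example, `BomCleaner.StripBom("Hello\uFEFF, wor\uFEFFld")` returns `"Hello, world"`, and `BomCleaner.BomAsUtf8Hex()` returns `"EF-BB-BF"`.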
Answer 2:
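Try the following. Note that this only matches if the BOM bytes were decoded as Latin-1/Windows-1252, so that each BOM shows up as the three characters ï»¿ rather than as a single \ufeff: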
CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
Source: https://stackoverflow.com/questions/13024978/removing-bom-characters-from-ajax-posted-string