Question
I'm using XMLStarlet (Windows build) to edit an RSS feed, but I'm running into problems with the Norwegian characters 'ÆØÅ'.
I'm following an example I found on this site (https://stackoverflow.com/a/14397390/3168446).
This is my feed.xml (Notepad++ reports it as UTF-8 encoded):
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>My RSS Feed</title>
<description>This is my RSS Feed</description>
</channel>
</rss>
I'm not using the following example directly, since it's a Linux shell script, but my long command line below does roughly the same thing:
#!/bin/sh
TITLE="Test title ÆØÅ"
LINK="http://www.example.com"
DATE="Sat, 26 Jul 2014 01:14:30 +0200"
xmlstarlet ed -L -a "//channel" -t elem -n item -v "" \
-s "//item[1]" -t elem -n title -v "$TITLE" \
-s "//item[1]" -t elem -n link -v "$LINK" \
-s "//item[1]" -t elem -n pubDate -v "$DATE" \
-d "//item[position()>10]" feed.xml ;
Windows command line (what I'm using):
xml.exe ed -L -a "//channel" -t elem -n item -v "" -s "//item[1]" -t elem -n title -v "Test title ÆØÅ" -s "//item[1]" -t elem -n link -v "http://www.example.com" -s "//item[1]" -t elem -n pubDate -v "Sat, 26 Jul 2014 01:14:30 +0200" -d "//item[position()>10]" feed.xml
The 'ÆØÅ' characters are causing trouble. The first item I add is already written incorrectly, but no error message appears until I add a second item containing 'ÆØÅ':
feed.xml:8.23: Input is not proper UTF-8, indicate encoding !
Bytes: 0xC6 0xD8 0xC5 0x3C
<title>Test title ãÏ┼</title>
Does anyone have any tips? I assume it's an encoding issue, but I don't understand why, because feed.xml is saved as UTF-8 and the XML declaration specifies encoding="utf-8".
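A note on the bytes in that error message: 0xC6 0xD8 0xC5 are the single-byte encodings of 'ÆØÅ' in Windows-1252 (and Latin-1), and 0x3C is the '<' that starts the closing </title> tag. That suggests the command-line argument was written into the file as raw ANSI bytes rather than being converted to UTF-8. The Python sketch below only illustrates that mismatch. The choice of cp1252 as the ANSI code page and cp850 as the console code page is an assumption on my part, not something the error message states.

```python
# Assumption: the argument reached the file in the Windows ANSI code page (cp1252)
# and the console displays text using the OEM code page cp850.
text = "ÆØÅ"

utf8_bytes = text.encode("utf-8")    # what a UTF-8 feed.xml should contain
ansi_bytes = text.encode("cp1252")   # the bytes reported in the error message

print(utf8_bytes.hex(" "))  # c3 86 c3 98 c3 85
print(ansi_bytes.hex(" "))  # c6 d8 c5

# The same three bytes read back through cp850 give the garbled title
# shown in the error output.
console_view = ansi_bytes.decode("cp850")
print(console_view == "ãÏ┼")  # True
```

If those assumptions hold, the file itself is fine as UTF-8 until the first item is added, which would explain why the parser only complains on the next run.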
Answer 1:
I can confirm that this problem is solved in the XMLStarlet 1.6.1+ win32 build.
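A feed that was already edited with an older build may still contain the bad bytes after upgrading, since the question notes that the first item is written incorrectly without any error. As a rough check, and not part of XMLStarlet itself, the following Python sketch reports where a byte string stops being valid UTF-8. The function name first_invalid_utf8_offset is hypothetical, chosen here for illustration.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if data is valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The same title line, once encoded correctly and once as cp1252 (assumed ANSI code page).
good = "<title>Test title ÆØÅ</title>".encode("utf-8")
bad = "<title>Test title ÆØÅ</title>".encode("cp1252")

print(first_invalid_utf8_offset(good))  # None
print(first_invalid_utf8_offset(bad))   # 18
```

To check a real file, read it in binary mode and pass the bytes in, for example first_invalid_utf8_offset(open("feed.xml", "rb").read()). A result of None means the whole file decodes as UTF-8.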
Source: https://stackoverflow.com/questions/24984162/xmlstarlet-utf-8-nordic-characters