Question
I want to use xmlstarlet from a PowerShell session started with Process in a C# application. My main problem is that when I run this command:
./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" > "C:\Users\1H144708H\Downloads\a.mul.ttml.conv"
in PowerShell, I get a file with the wrong encoding (I need UTF-8).
In Bash I used to just put
export LANG=it_IT.UTF-8 &&
before xmlstarlet, but in PowerShell I really don't know how to do it. Maybe there is an alternative: I saw that xmlstarlet accepts sel --encoding utf-8, but I don't know how to use it in ed mode (I tried putting it after xml.exe, after ed, and so on, but it always fails).
What is the PowerShell alternative to export LANG=it_IT.UTF-8, or how do I use --encoding utf-8 with ed?
PS. I tried many things, like:
$MyFile = Get-Content "C:\Users\1H144708H\Downloads\a.mul.ttml"; $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False; [System.IO.File]::WriteAllLines("C:\Users\1H144708H\Downloads\a.mul.ttml.conv", $MyFile, $Utf8NoBomEncoding)
And:
./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" | Out-File "C:\Users\1H144708H\Downloads\a.mul.ttml.conv" -Encoding utf8
But characters like è à ì ù are still wrong. If I save the original file with Notepad before the conversion, it works (but only if I don't use xmlstarlet). I need to do the same thing in PowerShell and I don't know how.
EDIT: I was able to print my UTF-8 text correctly in PowerShell:
Get-Content -Path "C:\Users\1H144708H\Downloads\a.mul.ttml" -Encoding UTF8
But I'm still not able to do the same thing with xmlstarlet.
Answer 1:
In the end I decided to write a native C# method instead, and I just used a StreamReader to read the file line by line with ReadLine. With a simple Contains I find the line where xml:lang="Language" appears, and from there I start appending every line to a string. Of course, I added the head and the end of my file before the while loop, and I stop appending lines when I read a line that Contains . I know that this is not the best way to do things, but it works for my case.
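For anyone who wants to try the same approach, here is a minimal sketch of what such a method could look like. It is not the exact code from this answer: the class name TtmlLanguageFilter, the method names, and the startMarker, endMarker, head and tail parameters are all hypothetical, and the sketch assumes the input file is UTF-8 and that the wanted block starts and ends on lines that can be recognized with a plain Contains.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not from the original answer.
static class TtmlLanguageFilter
{
    // Returns the lines from the first line containing startMarker
    // up to and including the first following line containing endMarker.
    public static string ExtractBlock(TextReader reader, string startMarker, string endMarker)
    {
        var sb = new StringBuilder();
        bool inside = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!inside && line.Contains(startMarker))
            {
                inside = true; // found the block for the wanted language
            }
            if (inside)
            {
                sb.Append(line).Append('\n');
                if (line.Contains(endMarker))
                {
                    break; // stop at the first end marker after the start
                }
            }
        }
        return sb.ToString();
    }

    // Reads inputPath as UTF-8 and writes head + block + tail as UTF-8 without a BOM.
    // head and tail are supplied by the caller (assumption: they are known in advance).
    public static void FilterFile(string inputPath, string outputPath,
                                  string head, string tail,
                                  string startMarker, string endMarker)
    {
        using (var reader = new StreamReader(inputPath, Encoding.UTF8))
        {
            string body = ExtractBlock(reader, startMarker, endMarker);
            File.WriteAllText(outputPath, head + body + tail, new UTF8Encoding(false));
        }
    }
}
```

A call might look like TtmlLanguageFilter.FilterFile(input, output, head, tail, "xml:lang=\"Italian\"", "</div>"), where the "</div>" end marker is only a guess based on the XPath in the question. Because the file is read and written with an explicit UTF-8 encoding inside the C# process, the text never passes through PowerShell's output redirection, which is why characters like è à ì ù should survive. Note that this only keeps the first matching block, and a real XML parser would be more robust than Contains.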
Source: https://stackoverflow.com/questions/46582162/xmlstarlet-ed-encoding-and-powershell-inside-process-c-sharp