Xmlstarlet ed encoding and powershell inside Process C#

Submitted by 扶醉桌前 on 2019-12-25 08:48:24

Question


I want to run xmlstarlet from a PowerShell instance that my C# application starts with Process. My main problem is that when I run this command:

./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" > "C:\Users\1H144708H\Downloads\a.mul.ttml.conv"

in PowerShell, the resulting file has the wrong encoding (I need UTF-8).

In Bash I simply put

export LANG=it_IT.UTF-8 && 

before the xmlstarlet command, but I don't know the PowerShell equivalent. There may be an alternative: I saw that xmlstarlet accepts sel --encoding utf-8, but I don't know how to use that option in ed mode (I tried placing it after xml.exe, after ed, and so on, but it always fails).

What is the PowerShell alternative to export LANG=it_IT.UTF-8, or how can I use --encoding utf-8 in ed mode?

PS: I have tried many things, such as:

$MyFile = Get-Content "C:\Users\1H144708H\Downloads\a.mul.ttml"; $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False; [System.IO.File]::WriteAllLines("C:\Users\1H144708H\Downloads\a.mul.ttml.conv", $MyFile, $Utf8NoBomEncoding)

And:

./xml.exe ed -N ns=http://www.w3.org/2006/04/ttaf1 -d '//ns:div[not(contains(@xml:lang,''Italian''))]' "C:\Users\1H144708H\Downloads\a.mul.ttml" |  Out-File "C:\Users\1H144708H\Downloads\a.mul.ttml.conv" -Encoding utf8

But characters like è à ì ù are still wrong. If I save the original file with Notepad before the conversion, it works (but only if I don't use xmlstarlet). I need to do the same thing in PowerShell and I don't know how.

EDIT: I was able to print the file correctly as UTF-8 in PowerShell:

Get-Content -Path "C:\Users\1H144708H\Downloads\a.mul.ttml" -Encoding UTF8 

But I still can't get the same result with xmlstarlet.
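
A likely cause of the symptoms above is that Windows PowerShell decodes the output of a native program using the console code page and then re-encodes it when > or Out-File writes the file, so UTF-8 bytes from xml.exe get mangled on the way. Since the shell is started from C# anyway, one option is to skip PowerShell and copy the program's raw output bytes to the file. The following is only a sketch under that assumption: it presumes xml.exe itself emits UTF-8, and the names XmlStarletRunner and RunToFile are hypothetical, not part of xmlstarlet or of the original code.

```csharp
using System.Diagnostics;
using System.IO;

public static class XmlStarletRunner
{
    // Hypothetical helper: runs an external program and writes its standard
    // output to outputPath byte for byte, returning the process exit code.
    public static int RunToFile(string exePath, string arguments, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = exePath,
            Arguments = arguments,
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        using (var output = File.Create(outputPath))
        {
            // BaseStream exposes the undecoded bytes, so no text
            // decoding or re-encoding happens between xml.exe and the file.
            process.StandardOutput.BaseStream.CopyTo(output);
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

With the command from the question, the call might look like this (the XPath is wrapped in double quotes for the Windows command line, so the inner single quotes no longer need doubling):

```csharp
string args = "ed -N ns=http://www.w3.org/2006/04/ttaf1 " +
              "-d \"//ns:div[not(contains(@xml:lang,'Italian'))]\" " +
              "\"C:\\Users\\1H144708H\\Downloads\\a.mul.ttml\"";
XmlStarletRunner.RunToFile("xml.exe", args, @"C:\Users\1H144708H\Downloads\a.mul.ttml.conv");
```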


Answer 1:


In the end I decided to write a native C# method instead. I use a StreamReader to read the file line by line. With a simple Contains I detect the line that has xml:lang="Language", and from that point I append every line to a string. Of course I added the head and the end of my file before the while loop, and I stop appending lines when I read a line that Contains the closing tag. I know that this is not the best way to do things, but it works for my case.
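
The approach described above could look roughly like the sketch below. It is not the author's actual code: the class and method names are hypothetical, and the end marker is passed in as a parameter because the original text does not say which string was used (something like </div> is only a guess).

```csharp
using System.IO;
using System.Text;

public static class TtmlLanguageFilter
{
    // Hypothetical helper: returns the lines from the first line containing
    // startMarker through the first following line containing endMarker
    // (both inclusive), each terminated by Environment.NewLine.
    public static string ExtractBlock(TextReader reader, string startMarker, string endMarker)
    {
        var result = new StringBuilder();
        bool inside = false;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            // Start collecting at the line that carries the wanted language.
            if (!inside && line.Contains(startMarker))
                inside = true;

            if (inside)
            {
                result.AppendLine(line);

                // Stop once the closing marker of the block has been copied.
                if (line.Contains(endMarker))
                    break;
            }
        }

        return result.ToString();
    }
}
```

A caller would open the input with new StreamReader(path, Encoding.UTF8), pass a start marker such as xml:lang="Italian", and write the head of the file, the extracted block, and the end of the file with File.WriteAllText(outputPath, text, new UTF8Encoding(false)) to get UTF-8 without a BOM. Like the original, this relies on plain string matching and on the start and end tags sitting on predictable lines, so it is not a general XML solution.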



Source: https://stackoverflow.com/questions/46582162/xmlstarlet-ed-encoding-and-powershell-inside-process-c-sharp
