DOS Batch : dealing with double quotes from XML files

你说的曾经没有我的故事 submitted on 2019-12-13 04:44:35

Question


I wrote the code below to read XML files (file_1.xml and file_2.xml), extract the string between the tags, and write it to a TXT file. The problem is that some strings contain double quotation marks, and the script then treats those characters as part of the command syntax rather than as part of the strings.

Content of file_1.xml:

<AAA>C086002-T1111</AAA>
<AAA>C086002-T1222 </AAA>
<AAA>C086002-TR333 "</AAA>
<AAA>C086002-T5444  </AAA>

Content of file_2.xml:

<AAA>C086002-T5555 </AAA>
<AAA>C086002-T1666</AAA>
<AAA>C086002-T1777 "</AAA>
<AAA>C086002-T1888          "</AAA>

My code:

@echo off

setlocal enabledelayedexpansion

for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (

    for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (

        (for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (

            set code=%%a
            set code=!code:""=!
            set code=!code: =!
            echo !code!

        )) >> result.txt
    )
)

I get this in result.txt:

C086002-T1111
C086002-T1222
C086002-T5444
C086002-T5555
C086002-T1666

In fact, 3 of the 8 lines are missing. These lines contain double quotation marks or follow lines that contain them.

How can I handle these characters and treat them as part of the strings?


Answer 1:


Please note - parsing XML with batch is a risky business because XML generally ignores white space. Any script you write could probably be broken by simply reformatting the XML into another equivalent valid form. That being said...

I haven't traced the problem through to fully explain your observed behavior, but the unbalanced quote is causing a problem with this line:

(for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "%%z" ^| Findstr /r "<AAA>"') do (

You can avoid that problem, and get your code to more or less work, by stripping all quotes beforehand.

@echo off

setlocal enabledelayedexpansion
del result.txt 2>nul
for /f "delims=;" %%f in ('dir /b D:\depart\*.xml') do (
  for /f "usebackq delims=;" %%z in ("D:\depart\%%f") do (
    set code=%%z
    set code=!code:"=!
    set code=!code: =!
    (for /f "delims=<AAA></AAA> tokens=2" %%a in ('echo "!code!" ^| Findstr /r "<AAA>"') do (
      echo %%a
    )) >> result.txt
  )
)

But you have a potential major problem. DELIMS does not specify a string - it specifies a list of characters. So your DELIMS=<AAA></AAA> is equivalent to DELIMS=<>/A. If your element value ever has an A or / in it, then your code will fail.
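
The character-set behavior is easy to check outside of batch. The following is a minimal Python sketch, not batch code: `for_f_tokens` is a hypothetical helper that only approximates how FOR /F splits a line (every DELIMS character separates tokens, and runs of separators count as one). The value `C086002-TA111` is an invented example of a value that contains an A.

```python
import re

def for_f_tokens(line, delims):
    """Rough model of FOR /F splitting: each character in `delims` is a
    separator, and consecutive separators are treated as one."""
    pattern = "[" + re.escape(delims) + "]+"
    return [tok for tok in re.split(pattern, line) if tok]

# Both option strings describe the same character set: <, >, / and A.
same = (for_f_tokens("<AAA>C086002-T1111</AAA>", "<AAA></AAA>")
        == for_f_tokens("<AAA>C086002-T1111</AAA>", "<>/A"))

# The original loop echoed the line inside quotes and took tokens=2.
ok = for_f_tokens('"<AAA>C086002-T1111</AAA>"', "<AAA></AAA>")

# Hypothetical value containing an A: the value itself gets split.
broken = for_f_tokens('"<AAA>C086002-TA111</AAA>"', "<AAA></AAA>")

print(same)       # True
print(ok[1])      # C086002-T1111
print(broken[1])  # C086002-T
```

In this model the second token of the last line is cut off at the A, which is the failure described above.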

There is a much better way:

First off, you can use FINDSTR to collect all your <AAA>----</AAA> lines from all files in one pass, without any loop:

findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"

Each matching line will be output as the file path, followed by a colon, followed by the matching line, as in:

D:\depart\file_1.xml:<AAA>C086002-T1111</AAA>

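The file path can never contain < or >. With those two characters as delimiters, the sample line splits into the path (with its trailing colon), AAA, the value, and /AAA, so the value is always the third token. You can use the following to iterate the result, capturing the appropriate token: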

for /f "delims=<> tokens=3" %%A in ( ...
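
As a rough illustration of why token 3 is the value, here is a small Python sketch that splits lines in the "path:line" shape shown above on < and > only. It approximates the FOR /F split and is not what cmd.exe actually executes. The second sample value, `TA1/2`, is invented to show that A and / no longer matter.

```python
import re

def split_on_angle_brackets(line):
    """Approximate FOR /F "delims=<>": split on runs of < or >."""
    return [tok for tok in re.split(r"[<>]+", line) if tok]

# A line in the shape findstr prints when it searches several files.
line = r'D:\depart\file_1.xml:<AAA>C086002-TR333 "</AAA>'
tokens = split_on_angle_brackets(line)
print(tokens[0])  # D:\depart\file_1.xml:
print(tokens[2])  # C086002-TR333 "

# Hypothetical value containing A and /: it stays in one piece.
other = split_on_angle_brackets(r"D:\depart\file_2.xml:<AAA>TA1/2</AAA>")
print(other[2])   # TA1/2
```

The quote and trailing space are still part of token 3 at this point, so they have to be trimmed in a later step.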

Finally, you can put parentheses around the entire loop, and redirect just once. I'm assuming you want each run to create a new file, so I use > instead of >>.

@echo off
setlocal enabledelayedexpansion
rem Redirect the output of the whole loop to result.txt in one go
>result.txt (
  for /f "delims=<> tokens=3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do (
    set code=%%A
    set code=!code:"=!
    set code=!code: =!
    echo(!code!
  )
)

If you only need to trim leading and trailing spaces and quotes, the solution is even simpler, although specifying a quote as a DELIMS character requires odd syntax. Note that there are two spaces between the last ^ and %%B. The first, escaped space is taken as a DELIMS character. The second, unescaped space terminates the FOR /F options string.

@echo off
>result.txt (
  for /f "delims=<> tokens=3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do for /f delims^=^"^  %%B in ("%%A") do echo(%%B
)
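
The trimming assumption can be made concrete with a short Python sketch. `first_token` is a hypothetical helper that models FOR /F with its default tokens=1 and the quote and space as delimiters: leading delimiters are skipped and everything after the next delimiter is dropped. The value `C086002 T1` is invented to show what happens when a space sits inside the value.

```python
def first_token(value, delims='" '):
    """Model of FOR /F with default tokens=1: skip leading delimiter
    characters, then keep everything up to the next delimiter."""
    stripped = value.lstrip(delims)
    for i, ch in enumerate(stripped):
        if ch in delims:
            return stripped[:i]
    return stripped

# Values taken from the sample XML files: trailing spaces and quotes go away.
print(first_token('C086002-TR333 "'))           # C086002-TR333
print(first_token('C086002-T1888          "'))  # C086002-T1888

# Hypothetical value with an inner space: everything after it is lost.
print(first_token("C086002 T1"))                # C086002
```

So this shorter form is only equivalent to the earlier script when spaces and quotes never appear inside a value.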

UPDATE in response to comment

I'm assuming your data value will never contain a colon.

If you want to append the source file name to each line of output, you simply need to alter the first FOR /F to capture the first token (the source file) as well as the third token (the data value). The first token contains the full path plus a trailing colon. The second FOR /F appends the file name to the data value, using the ~nx modifier to get just the name and extension (no drive or path), and a colon is added to the DELIMS option so the trailing colon is trimmed off.

@echo off
>result.txt (
  for /f "delims=<> tokens=1,3" %%A in (
    'findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"'
  ) do for /f delims^=:^"^  %%C in ("%%B;%%~nxA") do echo %%C
)



Answer 2:


If I keep @dbenham's suggestion and extend it to echo the file name:

@echo off
>result.txt (
    for /f %%f in ("D:\depart\*.xml") do (
        for /f "delims=<> tokens=3" %%A in ('findstr /r "<AAA>.*</AAA>" "D:\depart\*.xml"') do (
             for /f delims^=^"^  %%B in ("%%A") do (
               echo %%B;%%f
             )
         )
     )
 )

Thanks for your opinion on this code!



Source: https://stackoverflow.com/questions/26715603/dos-batch-dealing-with-double-quotes-from-xml-files
