Question
I have a 1.8 MB file that consists of a single, very long row of text. The values on that row are generally separated by 13 blank spaces. I am trying to replace each run of 13 blank spaces with a pipe (|) delimiter so that I can process the text file using SSIS.
So far, I have had no success processing this file programmatically with a batch file.
I have tried the code below, which I got from another SO post.
@echo off
REM create empty file:
break>R1.txt
setlocal enabledelayedexpansion
REM prevent empty lines by adding line numbers (find /v /n "")
REM parse the file, taking the second token (*, %%b) with delimiters
REM ] (to eliminate line numbers) and space (to eliminate leading spaces)
for /f "tokens=1,* delims=] " %%a in ('find /v /n "" ^<PXZP_SND_XZ01_GFT10553.dat') do (
    call :sub1 "%%b"
    REM write the string without quotes:
    REM removing the quotes from the string would make the special chars poisonous again
    >>PXZP_SND_XZ01_GFT10553.dat echo(!s:"=!
)
REM Show the written file:
type PXZP_SND_XZ01_GFT10553.dat
goto :eof
:sub1
set S=%*
REM do 13 times (adapt to your needs):
for /l %%i in (1,1,13) do (
    REM replace "space quote" with "quote" (= removing the last space)
    set S=!S: "=|!
)
goto :eof
Can someone help me here? Example of my text file:
96859471/971 AAAA HAWAII 96860471/971 BBBB HAWAII 96861471/971 CCCC HAWAII 96863471/971 DDDD HAWAII
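Answer 1:
Use appropriate tools. For example, this VBScript reads standard input and replaces each run of 13 spaces with a bar: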
Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
' Space(13) returns a string of 13 spaces
Outp.Write Replace(Inp.ReadAll, Space(13), "|")
To use it:
cscript //nologo "C:\Replace13Spaces.vbs" < "c:\folder\inputfile.txt" > "C:\Folder\Outputfile.txt"
Alternatively, use a regular expression to replace each run of two or more whitespace characters with a bar:
Set Inp = wscript.Stdin
Set Outp = wscript.Stdout
Set regEx = New RegExp
regEx.Pattern = "\s{2,}"
regEx.IgnoreCase = True
regEx.Global = True
Outp.Write regEx.Replace(Inp.ReadAll, "|")
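As a side note on the pattern, here is a minimal Python sketch (Python is used only for illustration; the answer itself is VBScript, and the sample line with its trailing CRLF is an assumption). It shows that \s{2,} also matches a line break at the end of the data, whereas a pattern restricted to the space character leaves it alone:

```python
import re

# Hypothetical sample: values separated by 13 spaces, ending with CRLF.
line = "96859471/971" + " " * 13 + "AAAA HAWAII" + " " * 13 + "96860471/971\r\n"

# \s matches any whitespace, so the trailing CR+LF pair is replaced too.
any_whitespace = re.sub(r"\s{2,}", "|", line)

# A literal space followed by {2,} only matches runs of spaces.
spaces_only = re.sub(r" {2,}", "|", line)

print(repr(any_whitespace))  # '96859471/971|AAAA HAWAII|96860471/971|'
print(repr(spaces_only))     # '96859471/971|AAAA HAWAII|96860471/971\r\n'
```

If a trailing bar is unwanted, restricting the pattern to spaces is the simpler choice.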
There are two other ways to handle this:
- Like the first way, Replace multiple times, going from the longest to the shortest predefined run of spaces, e.g. 13, 10, 8 or 5 spaces.
- Split the string on 2 spaces, Filter the array to exclude blank array elements, then Join the array with | as the delimiter.
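Both alternatives can be sketched in a few lines of Python (used here only for illustration, not the answer's VBScript; the function names and the sample data are hypothetical). One assumption goes beyond the answer: after splitting on two spaces, an odd-length run leaves a single leftover space on the next piece, so the sketch trims each piece before filtering.

```python
SEP = " " * 13  # thirteen spaces

def replace_longest_first(text, run_lengths=(13, 10, 8, 5)):
    """Replace fixed-length runs of spaces with a pipe, longest run first."""
    for length in sorted(run_lengths, reverse=True):
        text = text.replace(" " * length, "|")
    return text

def split_filter_join(text):
    """Split on two spaces, drop blank pieces, and join the rest with a pipe."""
    pieces = (piece.strip(" ") for piece in text.split("  "))
    return "|".join(piece for piece in pieces if piece)

sample = SEP.join(["96859471/971", "AAAA HAWAII", "96860471/971", "BBBB HAWAII"])
print(replace_longest_first(sample))
print(split_filter_join(sample))
# both print: 96859471/971|AAAA HAWAII|96860471/971|BBBB HAWAII
```

The second function tolerates separators of varying length, while the first only handles the run lengths it is given.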
Answer 2:
The for /F loop cannot handle lines longer than about 8190 characters. However, there is a way to read files with longer lines: use set /P in a loop together with input redirection (<). set /P reads at most 1023 characters unless a line break or the end of the file is encountered first. Because set /P does not reset the file pointer, executing it repeatedly on the same open (input-redirected) file handle reads a very long line in portions of 1023 characters.
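The idea of reading a long line in fixed-size portions can be sketched in Python (an illustration only, not a translation of the batch script; the function name, its parameters, and the in-memory sample are assumptions). The point it shows is that a search string may straddle two portions, so the last len(search) - 1 characters are held back until the next portion arrives:

```python
import io

def replace_stream(src, dst, search, replacement, chunk_size=1023):
    """Copy src to dst, replacing search with replacement, chunk by chunk.

    search must not be empty (str.split raises ValueError otherwise).
    """
    keep = len(search) - 1  # longest possible partial match at the buffer end
    buffer = ""
    while True:
        chunk = src.read(chunk_size)
        buffer += chunk
        # Everything before a complete occurrence can be written immediately.
        *done, rest = buffer.split(search)
        for piece in done:
            dst.write(piece + replacement)
        if not chunk:  # end of input: flush whatever is left
            dst.write(rest)
            return
        # Hold back a tail that might be the start of a split search string.
        cut = max(len(rest) - keep, 0)
        dst.write(rest[:cut])
        buffer = rest[cut:]

# Hypothetical in-memory demo; real files would be opened with newline="".
sep = " " * 13
text = sep.join(["96859471/971", "AAAA HAWAII"] * 300)
out = io.StringIO()
replace_stream(io.StringIO(text), out, sep, "|")
print(out.getvalue() == text.replace(sep, "|"))
```

Memory use stays bounded by roughly one chunk plus the held-back tail, regardless of how long the line is.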
Another challenge is returning (echoing) very long lines. This is not possible with a single echo command, again because of the limit of about 8190 characters, which applies to command lines and variable contents. Block-wise processing helps here as well:
- first, get an end-of-file character (EOF, ASCII 0x1A);
- then take a portion of the text, append an EOF, and write the result to a temporary file using echo (which appends a line break) together with output redirection (>);
- next, copy the file onto itself using copy, reading it in ASCII text mode to discard the EOF and everything after it (hence the line break appended by echo) and writing it in binary mode to get an exact copy of the remaining data;
- lastly, type out the file content using type.
The following script makes use of these techniques (see the explanatory rem remarks in the code):
@echo off
setlocal EnableExtensions DisableDelayedexpansion
rem // Define constants here:
set "_INPUT=.\PXZP_SND_XZ01_GFT10553.dat" & rem // (this is the input file)
set "_OUTPUT=.\R1.txt" & rem // (set to `con` to display the result on the console)
set "_TEMPF=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (specifies a temporary file)
set "_SEARCH=     " & rem // (this is the string to be found: 5 spaces)
set "_REPLAC=|" & rem // (this is the replacement string)
set "_LTRIM=#" & rem // (set to something to left-trim sub-strings)
(set _LF=^
%= blank line =%
) & rem // (this block stores a new-line character in a variable)
rem // This stores an end-of-file character in a variable:
for /F %%E in ('forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x1A"') do set "_EOF=%%E"
rem /* The input file is going to be processed in a sub-routine,
rem which accesses the file content via input redirection `<`: */
< "%_INPUT%" > "%_OUTPUT%" call :PROCESS
endlocal
exit /B
:PROCESS
    rem // Reset variables that store a partial string to be processed and a separator:
    set "PART=" & set "SEP="
    setlocal EnableDelayedExpansion
:READ
    rem /* At this point 1023 characters are read from the input file at most, until
    rem    a line-break or the end of the file is encountered: */
    set "NEW=" & set /P NEW=""
    rem // The read characters are appended to a string buffer that will be processed:
    set "PART=!PART!!NEW!"
    rem /* Skip processing when the string buffer is empty, which is the case when the end
    rem    of the file has already been reached: */
:LOOP
    if defined PART (
        rem /* Make the search string accessible as a `for` meta-variable reference in
        rem    order not to have to use normal (immediate) `%`-expansion, which could cause
        rem    trouble with some special characters under some circumstances: */
        for /F delims^=^ eol^= %%K in ("!_SEARCH!") do (
            rem /* Try to split the string buffer at the first search string and store the
            rem    portion at the right, using sub-string substitution: */
            set "RIGHT=!PART:*%%K=!"
            rem /* Check whether the split was successful, hence whether a search string
            rem    even occurred in the string buffer; if not, jump back and read more
            rem    characters; otherwise (when the end of the file was reached) clear the
            rem    right portion and continue processing: */
            if "!RIGHT!"=="!PART!" if not defined NEW (set "RIGHT=") else goto :READ
            rem /* Clear the variable that will receive the portion left to the first
            rem    occurrence of the search string in the string buffer; then replace each
            rem    occurrence in the string buffer by a new-line character: */
            set "LEFT=" & set ^"PART=!PART:%%K=^%_LF%%_LF%!^"
            rem /* Iterate over all lines of the altered string buffer, which is now a
            rem    multi-line string, then get the first line, which constitutes the
            rem    portion at the left of the first search string; the (first) line is
            rem    preceded by an `_` just for it not to appear blank, because `for /F`
            rem    skips over empty lines; this character is removed later: */
            for /F delims^=^ eol^= %%L in (^"_!PART!^") do (
                rem // Execute the loop body only for the first iteration:
                if not defined LEFT (
                    rem /* Store the (augmented) left portion with delayed expansion
                    rem    disabled in order not to get trouble with `!` in the string: */
                    setlocal DisableDelayedExpansion & set "LEFT=%%L"
                    rem // Enable delayed expansion to be able to safely echo the string:
                    setlocal EnableDelayedExpansion
                    rem /* Write to a temporary file the output string, which consists of
                    rem    a replacement string (except for the very first time), the left
                    rem    portion with the preceding `_` removed and an end-of-file
                    rem    character; a line-break is automatically appended by `echo`: */
                    > "!_TEMPF!" echo(!SEP!!LEFT:~1!%_EOF%
                    rem /* Copy the temporary file onto itself, but remove the end-of-file
                    rem    character and everything after, then type the file content;
                    rem    this is a safe way of echoing a string without a line-break: */
                    > nul copy /Y /A "!_TEMPF!" + nul "!_TEMPF!" /B & type "!_TEMPF!"
                    rem /* Restore the environment present at the beginning of the loop
                    rem    body, then ensure the left portion not to appear empty: */
                    endlocal & endlocal & set "LEFT=_"
                )
            )
            rem // If specified, left-trim the right portion, so remove leading spaces:
            if defined _LTRIM (
                for /F "tokens=* eol= delims= " %%T in ("!RIGHT!_") do (
                    for /F delims^=^ eol^= %%S in (^""!NEW!"^") do (
                        endlocal & set "NEW=%%~S" & set "RIGHT=%%T"
                    )
                    setlocal EnableDelayedExpansion & set "RIGHT=!RIGHT:~,-1!"
                )
            )
            rem // Set the replacement string now to skip it only for the first output:
            set "SEP=!_REPLAC!"
            rem /* Move the right portion into the string buffer; if there is still some
            rem    amount of text left, jump back to find more occurrences of the search
            rem    string; if not, jump back and read more characters, unless the end of
            rem    the file has already been reached: */
            set "PART=!RIGHT!" & if defined PART (
                if defined NEW if "!PART:~1024!"=="" goto :READ
                goto :LOOP
            ) else if defined NEW goto :READ
        )
    )
    endlocal
    rem // Clean up the temporary file:
    del "%_TEMPF%"
    exit /B
The following restrictions exist:
- the string portions between two consecutive search strings (= 5 × SPACE in the above approach) must be shorter than about 8190 characters;
- the search string must not be empty, must not begin with !, * or ~, and must not contain =;
- the replace string must not contain !.
Source: https://stackoverflow.com/questions/55503031/need-to-replace-13-blank-spaces-from-1-very-long-line-of-text-file