Remove trailing spaces from a file using Windows batch?

借酒劲吻你 2020-12-02 02:33

How could I trim all trailing spaces from a text file using the Windows command prompt?

7 answers
  • 2020-12-02 02:54

    Go get yourself a copy of Cygwin or the sed package from GnuWin32.

    Then use that with the command:

    sed "s/ *$//" inputFile >outputFile
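
    If sed is not an option but Python is, the same substitution can be sketched as below. This is only an illustration of what the expression " *$" does, not part of the original answer; the function names rtrim_spaces and rtrim_text are made up for this example.

```python
import re

# Mirrors the sed expression "s/ *$//": a run of spaces at the end of a line.
TRAILING_SPACES = re.compile(r" +$")


def rtrim_spaces(line: str) -> str:
    """Return the line without trailing space characters (tabs are left alone)."""
    return TRAILING_SPACES.sub("", line)


def rtrim_text(text: str) -> str:
    """Apply rtrim_spaces to every line while keeping each original line ending."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # line content without its line ending
        ending = line[len(body):]       # "\n", "\r\n" or "" for the last line
        out.append(rtrim_spaces(body) + ending)
    return "".join(out)
```

    Like the sed command, this only removes spaces; extending the pattern to "[ \t]+$" would also remove trailing tabs.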
    
  • 2020-12-02 02:56

    There is a nice trick to remove trailing spaces based on this answer of user Aacini; I modified it so that all other spaces occurring in the string are preserved. So here is the code:

    @echo off
    setlocal EnableDelayedExpansion
    
    rem // This is the input string:
    set "x=  This is   a text  string     containing  many   spaces.   "
    
    rem // Ensure there is at least one trailing space; then initialise auxiliary variables:
    set "y=%x% " & set "wd=" & set "sp="
    
    rem // Now here is the algorithm:
    set "y=%y: =" & (if defined wd (set "y=!y!!sp!!wd!" & set "sp= ") else (set "sp=!sp! ")) & set "wd=%"
    
    rem // Return messages:
    echo  input: "%x%"
    echo output: "%y%"
    
    endlocal
    

    However, this approach fails when the string contains any of the characters ^, ! or ".
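
    The one-line batch algorithm is dense, so here is a rough Python sketch of the logic as I read it: the string is split at every space, and spaces are only written back once another word follows them. The names result, pending and word stand in for the batch variables y, sp and wd; the function name is made up for this illustration.

```python
def rtrim_by_splitting(text: str) -> str:
    """Right-trim spaces by splitting at each space and rebuilding the string."""
    # Appending one space guarantees that the final token is empty,
    # which corresponds to: set "y=%x% "
    tokens = (text + " ").split(" ")
    result = tokens[0]   # y: the rebuilt string
    pending = ""         # sp: spaces seen since the last word
    word = ""            # wd: the previous token
    for token in tokens[1:]:
        if word:
            # A word is waiting: emit the pending spaces, then the word.
            result += pending + word
            pending = " "
        else:
            # An empty token means one more consecutive space.
            pending += " "
        word = token
    # Spaces still pending at this point are trailing spaces and are dropped.
    return result
```

    Leading and inner runs of spaces survive because they are always followed by a word; only the final run never gets flushed.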

  • 2020-12-02 02:56

    Good tool for removing trailing spaces in files in windows: http://mountwhite.net/en/spaces.html

  • 2020-12-02 03:08

    I just found a very nice solution for trimming whitespace off a string:
    Have you ever called a sub-routine using call and expanded all of its arguments using %*? If so, you may have noticed that any leading and trailing whitespace is removed. Whitespace between other characters is preserved, and so are the other command token separators (comma, semicolon and equals sign) and the non-break space (character code 0xFF). My script relies on this effect:

    @echo off
    
    set "STR="
    set /P STR="Enter string: "
    
    rem /* Enable Delayed Expansion to avoid trouble with
    rem    special characters: `&`, `<`, `>`, `|`, `^` */
    setlocal EnableDelayedExpansion
    echo You entered: `!STR!`
    call :TRIM !STR!
    echo And trimmed: `!RES!`
    endlocal
    
    exit /B
    
    :TRIM
    set "RES=%*"
    exit /B
    

    This script prompts the user for a string and then trims it. The same technique can of course be applied to the lines of a file, which is what the original question asks about; reading a file line by line using for /F is shown in other answers, so I skip it here. To trim the string on one side only, add a single character to the opposite side before trimming and remove it afterwards.
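
    The one-sided variant can be sketched in Python as follows. Here str.strip only stands in for the two-sided trimming that call and %* perform in batch; the marker character and the function names are assumptions made for this illustration.

```python
SENTINEL = "x"  # hypothetical guard character; any non-whitespace character works


def rtrim_via_two_sided_trim(text: str) -> str:
    """Trim only the right side using nothing but a two-sided trim."""
    guarded = SENTINEL + text            # protect the left side
    trimmed = guarded.strip(" \t")       # two-sided trim (stand-in for call / %*)
    return trimmed[len(SENTINEL):]       # remove the guard again


def ltrim_via_two_sided_trim(text: str) -> str:
    """Trim only the left side using nothing but a two-sided trim."""
    guarded = text + SENTINEL            # protect the right side
    trimmed = guarded.strip(" \t")
    return trimmed[:-len(SENTINEL)]
```

    Because the guard sits directly against the side to be kept, the trim cannot reach the whitespace behind it.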

    This approach has some limitations, though: it does not handle the characters %, !, ^ and " properly. Overcoming this requires several intermediate string manipulations:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    
    set "STR="
    set /P STR="Enter string: "
    
    setlocal EnableDelayedExpansion
    echo You entered: `!STR!`
    set "STR=!STR:%%=%%%%!"
    set "STR=!STR:"=""!^"
    if not "%STR%"=="%STR:!=%" set "STR=!STR:^=^^^^!"
    set "STR=%STR:!=^^^!%"
    call :TRIM !STR!
    set "RES=!RES:""="!^"
    echo And trimmed: `!RES!`
    endlocal
    
    endlocal
    exit /B
    
    :TRIM
    set "RES=%*"
    exit /B
    

    Update

    Neither of the above scripts can handle the characters &, <, > and |, because call seems to abort as soon as such a character appears unquoted and unescaped.

    However, I finally found a fix and came up with an approach that deals successfully with all characters (except perhaps some control characters, which I did not test):

    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    
    rem // The last white-space in `STRING` is a tabulator:
    set "RESULT=" & set "STRING=   (<&>"^|)^^!^^^^;,=   ^"
    echo Input string: `!STRING!`
    
    rem // Double quotes to avoid troubles with unbalanced ones:
    if defined STRING set "STRING=!STRING:"=""!^"
    rem // Particularly handle carets and exclamation marks as delayed expansion is enabled:
    if defined STRING set "STRING=!STRING:^=^^^^!"
    if defined STRING set "STRING=%STRING:!=^^^!%" !
    if defined STRING (
        rem // Escape all characters that `call` has got troubles with:
        set "STRING=!STRING:^=^^!"
        set "STRING=!STRING:&=^&!"
        set "STRING=!STRING:<=^<!"
        set "STRING=!STRING:>=^>!"
        set "STRING=!STRING:|=^|!"
    )
    rem /* Call the sub-routine here; the strings `!=!` constitute undefined dummy variables
    rem    with an illegal name, which eventually become removed; the purpose of them is to
    rem    enable usage of that `call` inside of a `for` loop with the meta-variable `%%S`,
    rem    which would otherwise become unintentionally expanded rather than `%%STRING%%`,
    rem    which literally contained `%%S`; the `!=!` at the end is just there in case you
    rem    want to append another string that could also match another `for` meta-variable;
    rem    note that `!!` is not possible as this would be collapsed to a single `!`, so
    rem    a (most probably undefined) variable `!STRING%!` would then become expanded: */
    call :TRIM %%!=!STRING%%!=!
    rem /* The caret doubling done by `call` does not need to be reverted, because due to
    rem    doubling of the quotes carets appear unquoted, so implicit reversion occurs here;
    rem    of course the doubling of the quotes must eventually be undone: */
    if defined RESULT set "RESULT=!RESULT:""="!^"
    echo Now trimmed: `!RESULT!`
    
    endlocal
    exit /B
    
    :TRIM
        rem // This is the effective line that does the left- and right-trimming:
        set "RESULT=%*" !
        exit /B
    
  • 2020-12-02 03:09

    I use this Python 2 script to print the lines that end with a space, and then remove the trailing whitespace manually:

    #!/usr/bin/env python2
    import sys
    
    if not sys.argv[1:]:
      sys.exit('usage: whitespace.py <filename>')
    
    for no, line in enumerate(open(sys.argv[1], 'rb').read().splitlines()):
      if line.endswith(' '):
        print no+1, line
    

    I know that Python is not preinstalled for Windows, but at least it works cross-platform.
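
    For anyone on Python 3, a minimal sketch of the same check might look like this. It is a port made for illustration, not the original script; the function names find_trailing_spaces and report are assumptions.

```python
def find_trailing_spaces(data: bytes):
    """Yield (line_number, line) for every line that ends with a space."""
    for no, line in enumerate(data.splitlines(), start=1):
        if line.endswith(b" "):
            yield no, line


def report(path: str) -> None:
    """Print the number and content of each offending line in the given file."""
    with open(path, "rb") as handle:
        for no, line in find_trailing_spaces(handle.read()):
            # Decode leniently so that odd bytes do not abort the report.
            print(no, line.decode(errors="replace"))
```

    Calling report("fileName.txt") would print one line per match; like the original, it only reports lines and leaves the file untouched.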

  • 2020-12-02 03:11

    The DosTips RTRIM function that Ben Hocking cites can be used to create a script that can right trim each line in a text file. However, the function is relatively slow.

    DosTips user (and moderator) aGerman developed a very efficient right trim algorithm. He implemented the algorithm as a batch "macro" - an interesting concept of storing complex mini scripts in environment variables so that they can be executed from memory. Macros with arguments are a major discussion topic in and of themselves and are not relevant to this question.

    I have extracted aGerman's algorithm and put it in the following batch script. The script expects the name of a text file as the only parameter and proceeds to right trim the spaces off each line in the file.

    @echo off
    setlocal enableDelayedExpansion
    set "spcs= "
    for /l %%n in (1 1 12) do set "spcs=!spcs!!spcs!"
    findstr /n "^" "%~1" >"%~1.tmp"
    setlocal disableDelayedExpansion
    (
      for /f "usebackq delims=" %%L in ("%~1.tmp") do (
        set "ln=%%L"
        setlocal enableDelayedExpansion
        set "ln=!ln:*:=!"
        set /a "n=4096"
        for /l %%i in (1 1 13) do (
          if defined ln for %%n in (!n!) do (
            if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
            set /a "n/=2"
          )
        )
        echo(!ln!
        endlocal
      )
    ) >"%~1"
    del "%~1.tmp" 2>nul
    

    Assuming the script is called rtrimFile.bat, then it can be called from the command line as follows:

    rtrimFile "fileName.txt"
    

    A note about performance
    The original DosTips rtrim function performs a linear search and defaults to trimming a maximum of 32 spaces. It has to iterate once per space.

    aGerman's algorithm uses a binary search and it is able to trim the maximum string size allowed by batch (up to ~8k spaces) in 13 iterations.
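
    To make the halving idea easier to follow, here is a rough Python sketch of the loop used in the batch script above. It is my own illustration of the technique, not aGerman's code; the function name and the max_chunk parameter are made up for this example.

```python
def rtrim_halving(line: str, max_chunk: int = 4096) -> str:
    """Right-trim spaces by testing chunk sizes 4096, 2048, ..., 2, 1."""
    n = max_chunk
    while n >= 1:
        # Mirrors: if "!ln:~-%%n!"=="!spcs:~-%%n!" set "ln=!ln:~0,-%%n!"
        # If the line is shorter than n, the slice is the whole line and
        # cannot equal n spaces, so nothing is removed in that step.
        if line[-n:] == " " * n:
            line = line[:-n]
        n //= 2
    return line
```

    With max_chunk set to 4096 the loop body runs 13 times and can remove any number of trailing spaces up to 8191, because each chunk size is needed at most once.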

    Unfortunately, batch is very SLOW when it comes to processing text. Even with the efficient rtrim function, it takes ~70 seconds to trim a 1MB file on my machine. The problem is that just reading and writing the file without any modification takes significant time. This answer uses a FOR loop to read the file, coupled with FINDSTR to prefix each line with the line number so that blank lines are preserved. It toggles delayed expansion to prevent ! from being corrupted, and uses a search and replace operation to remove the line number prefix from each line. All that before it even begins to do the rtrim.
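
    The line-number trick may be easier to see outside batch. FOR /F skips empty lines, so the script first makes every line non-empty by prefixing it with its number and a colon, and later strips everything up to the first colon. A small Python sketch of just that round trip, with made-up function names:

```python
def number_lines(lines):
    """Prefix each line with 'N:' in the same format as: findstr /n "^" """
    return [f"{no}:{line}" for no, line in enumerate(lines, start=1)]


def strip_number_prefix(numbered: str) -> str:
    """Remove everything up to and including the first colon, like !ln:*:=!"""
    return numbered.split(":", 1)[1]
```

    Only the first colon is consumed, so colons that belong to the line content are preserved.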

    Performance could be nearly doubled by using an alternate file read mechanism that uses set /p. However, the set /p method is limited to ~1k bytes per line, and it strips trailing control characters from each line.

    If you need to regularly trim large files, then even a doubling of performance is probably not adequate. Time to download (if possible) any one of many utilities that could process the file in the blink of an eye.

    If you can't use non-native software, then you can try VBScript or JScript executed via the CSCRIPT batch command. Either one would be MUCH faster.

    UPDATE - Fast solution with JREPL.BAT

    JREPL.BAT is a regular expression find/replace utility that can very efficiently solve the problem. It is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward. No 3rd party exe files are needed.

    With JREPL.BAT somewhere within your PATH, you can strip trailing spaces from file "test.txt" with this simple command:

    jrepl " +$" "" /f test.txt /o -
    

    If you put the command within a batch script, then you must precede the command with CALL:

    call jrepl " +$" "" /f test.txt /o -
    