Batch to remove duplicate rows from text file

旧巷少年郎 2020-11-29 10:37

Is it possible to remove duplicate rows from a text file? If yes, how?

  • 2020-11-29 11:06

    Pure batch - 3 effective lines.

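    @ECHO OFF
    SETLOCAL
    :: remove any existing variables whose names start with $
    FOR  /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
    
    :: one variable per line, named $ followed by the line text; a duplicate line just redefines the same variable
    FOR /f "delims=" %%a IN (q34223624.txt) DO SET $%%a=Y
    :: list the $ variables and strip the leading $ and the trailing =Y
    (FOR  /F "delims=$=" %%a In ('set $ 2^>Nul') DO ECHO %%a)>u:\resultfile.txt
    
    GOTO :EOF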
    

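    This works as long as the data contains no characters that batch is sensitive to. In particular, a line containing "=" or "$" comes back altered, because "=" ends the variable name in SET and both characters are delimiters in the listing loop. Lines starting with ";" are also skipped by the default FOR /F EOL setting.

    The file is named "q34223624.txt" because question 34223624 contained this data: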

    1.1.1.1
    1.1.1.1
    1.1.1.1
    1.2.1.2
    1.2.1.2
    1.2.1.2
    1.3.1.3
    1.3.1.3
    1.3.1.3
    

    on which it works perfectly.
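To see why this works (and why "=" and "$" are a problem), here is a rough Python model of the script. It is only an illustration under simplifying assumptions, not batch and not the author's code: the environment is a plain dict, names are compared case-sensitively (real environment variable names are not), and Python's sorted() stands in for the order in which SET lists variables. The function name batch_env_dedupe is made up for this sketch.

```python
import re

def batch_env_dedupe(lines):
    """Rough model of the three effective lines of the batch script above."""
    env = {}  # stands in for the environment after the $ variables are cleared
    for line in lines:
        # FOR /f "delims=" skips empty lines and, with the default EOL, lines starting with ';'
        if line == "" or line.startswith(";"):
            continue
        # SET $%%a=Y: the variable name ends at the first '='.
        # A duplicate line produces the same name, so it just overwrites itself.
        name, _, value = ("$" + line + "=Y").partition("=")
        env[name] = value
    result = []
    # 'set $' lists name=value pairs; sorted() approximates the listing order.
    for name in sorted(env):
        # FOR /F "delims=$=" keeps only the first token between '$' and '=' delimiters
        tokens = [t for t in re.split(r"[$=]", name + "=" + env[name]) if t]
        result.append(tokens[0])
    return result

sample = ["1.1.1.1"] * 3 + ["1.2.1.2"] * 3 + ["1.3.1.3"] * 3
print(batch_env_dedupe(sample))   # ['1.1.1.1', '1.2.1.2', '1.3.1.3']
print(batch_env_dedupe(["a=b"]))  # ['a']  (the '=b' part is lost)
```

In the model, a line such as "a=b" becomes SET $a=b=Y, and the listing loop only echoes the first token, "a".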

  • 2020-11-29 11:10

    Sure it can, but like most text file processing in batch, it is neither pretty nor particularly fast.

    This solution ignores case when looking for duplicates, and it sorts the lines. The name of the file is passed in as the 1st and only argument to the batch script.

    @echo off
    setlocal disableDelayedExpansion
    set "file=%~1"
    set "sorted=%file%.sorted"
    set "deduped=%file%.deduped"
    ::Define a variable containing a linefeed character
    set LF=^
    
    
    ::The 2 blank lines above are critical, do not remove
    sort "%file%" >"%sorted%"
    >"%deduped%" (
      set "prev="
      for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%sorted%") do (
        set "ln=%%A"
        setlocal enableDelayedExpansion
        if /i "!ln!" neq "!prev!" (
          endlocal
          (echo %%A)
          set "prev=%%A"
        ) else endlocal
      )
    )
    >nul move /y "%deduped%" "%file%"
    del "%sorted%"
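The sort-then-compare idea can be sketched in Python as follows. This is an approximation, not a port: str.casefold stands in for the case-insensitive behaviour of SORT and IF /I, and the function name is hypothetical. Because the input is sorted first, duplicates end up adjacent, so comparing each line with the last one written is enough.

```python
def sort_dedupe_ignore_case(lines):
    """Sketch of: SORT the file, then echo each line that differs (IF /I) from the last one echoed."""
    result = []
    prev = ""  # like: set "prev="
    # str.casefold approximates SORT's default case-insensitive ordering
    for line in sorted(lines, key=str.casefold):
        if line == "":
            continue  # FOR /F never passes empty lines to the loop body
        if line.casefold() != prev.casefold():  # like: if /i "!ln!" neq "!prev!"
            result.append(line)
            prev = line
    return result

print(sort_dedupe_ignore_case(["b", "A", "a", "B", "c"]))  # ['A', 'b', 'c']
```

Which spelling survives among lines that differ only in case depends on the sort order; the Python sort is stable, which may not match SORT exactly.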
    

    This solution is case-sensitive and leaves the lines in their original order (except for the removed duplicates, of course). Again, the name of the file is passed in as the 1st and only argument.

    @echo off
    setlocal disableDelayedExpansion
    set "file=%~1"
    set "line=%file%.line"
    set "deduped=%file%.deduped"
    ::Define a variable containing a linefeed character
    set LF=^
    
    
    ::The 2 blank lines above are critical, do not remove
    >"%deduped%" (
      for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file%") do (
        set "ln=%%A"
        setlocal enableDelayedExpansion
        >"%line%" (echo !ln:\=\\!)
        >nul findstr /xlg:"%line%" "%deduped%" || (echo !ln!)
        endlocal
      )
    )
    >nul move /y "%deduped%" "%file%"
    2>nul del "%line%"
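Here is a Python sketch of the order-preserving idea (the function name is hypothetical; Python stands in for the batch logic). The batch version runs FINDSTR /X /L once per input line against the output written so far, so its cost grows roughly quadratically with the number of lines; a set gives the same case-sensitive, whole-line check with an average constant-time lookup.

```python
def dedupe_keep_order(lines):
    """Keep the first occurrence of each line, in original order (case-sensitive)."""
    seen = set()  # replaces re-scanning the output with FINDSTR /X /L for every line
    result = []
    for line in lines:
        if line == "":
            continue  # FOR /F skips empty lines, so the batch version drops them as well
        if line not in seen:  # exact, whole-line, case-sensitive match
            seen.add(line)
            result.append(line)
    return result

print(dedupe_keep_order(["b", "a", "B", "b", "", "a"]))  # ['b', 'a', 'B']
```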
    


    EDIT

    Both solutions above strip blank lines. I didn't think blank lines were worth preserving when talking about distinct values.

    I've modified both solutions to disable the FOR /F "EOL" option so that all non-blank lines are preserved, regardless of what the 1st character is. The modified code sets the EOL option to a linefeed character.
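A small Python model of the FOR /F behaviour described above, assuming "delims=" (no delimiters): empty lines are always skipped, and a line is also skipped when its first character equals the EOL character, which is ";" by default. Setting EOL to a linefeed works because a line read from the file can never start with a linefeed. The function for_f_lines is hypothetical and only approximates cmd.exe.

```python
def for_f_lines(lines, eol=";"):
    # Lines FOR /F "delims=" would pass to the loop body (approximation):
    # empty lines are dropped, and so are lines starting with the EOL character.
    return [line for line in lines if line != "" and line[0] != eol]

data = ["alpha", ";starts with semicolon", "", "beta"]
print(for_f_lines(data))            # ['alpha', 'beta']
print(for_f_lines(data, eol="\n"))  # ['alpha', ';starts with semicolon', 'beta']
```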


    New solution 2016-04-13: JSORT.BAT

    You can use my JSORT.BAT hybrid JScript/batch utility to efficiently sort and remove duplicate lines with a simple one-liner (plus a MOVE to overwrite the original file with the final result). JSORT is pure script that runs natively on any Windows machine from XP onward.

    @jsort file.txt /u >file.txt.new
    @move /y file.txt.new file.txt >nul
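The write-to-a-new-file-then-MOVE pattern matters because cmd.exe opens (and truncates) the target of ">" before the command runs, so "jsort file.txt /u >file.txt" would empty the file before JSORT reads it. A Python sketch of the same pattern follows; sorted(set(...)) is only a case-sensitive stand-in for a sorted, unique listing, not JSORT's exact behaviour, and dedupe_file_in_place is a hypothetical name.

```python
import os
import tempfile

def dedupe_file_in_place(path):
    """Sort, drop duplicate lines, and overwrite the file via a temporary '.new' file."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    unique = sorted(set(lines))  # case-sensitive stand-in for a sorted, unique listing
    tmp = path + ".new"          # like: >file.txt.new
    with open(tmp, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in unique))
    os.replace(tmp, path)        # like: move /y file.txt.new file.txt

# Usage on a throwaway file
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "file.txt")
    with open(p, "w", encoding="utf-8") as f:
        f.write("b\na\nb\nc\na\n")
    dedupe_file_in_place(p)
    with open(p, encoding="utf-8") as f:
        print(f.read().splitlines())  # ['a', 'b', 'c']
```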
    