Batch to remove duplicate rows from text file

Asked by 旧巷少年郎 on 2020-11-29 10:37

Is it possible to remove duplicate rows from a text file? If yes, how?

8 Answers
  • 2020-11-29 10:51

    The Batch file below does what you want. Note that, like the Unix uniq program, it only removes duplicate lines that are adjacent:

    @echo off
    setlocal EnableDelayedExpansion
    set "prevLine="
    for /F "delims=" %%a in (theFile.txt) do (
       if "%%a" neq "!prevLine!" (
          echo %%a
          set "prevLine=%%a"
       )
    )
    

    If you need a more efficient method, try this Batch-JScript hybrid script, which is developed as a filter, i.e. similar to the Unix uniq program. Save it with a .bat extension, for example uniq.bat:

    @if (@CodeSection == @Batch) @then
    
    @CScript //nologo //E:JScript "%~F0" & goto :EOF
    
    @end
    
    var line, prevLine = "";
    while ( ! WScript.Stdin.AtEndOfStream ) {
       line = WScript.Stdin.ReadLine();
       if ( line != prevLine ) {
          WScript.Stdout.WriteLine(line);
          prevLine = line;
       }
    }
    

    Both programs were copied from this post.
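    For comparison, the same consecutive-duplicate filter can be sketched as a POSIX shell function (the name `dedup_consecutive` is my own; like the batch loop above, it keeps the first line of each run of identical adjacent lines):

    ```shell
    # uniq-style filter: print a line only when it differs from the line
    # immediately before it; non-adjacent duplicates are NOT removed.
    dedup_consecutive() {
      prev=
      while IFS= read -r line; do
        [ "$line" = "$prev" ] || printf '%s\n' "$line"
        prev=$line
      done
    }
    ```

    For example, `printf 'a\na\nb\na\n' | dedup_consecutive` prints a, b, a — the trailing a survives because it is not adjacent to the first run.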

  • 2020-11-29 10:52
    set "file=%CD%\%~1"
    sort "%file%">"%file%.sorted"
    del /q "%file%"
    SETLOCAL EnableDelayedExpansion
    set "ln="
    FOR /F "usebackq tokens=*" %%A IN ("%file%.sorted") DO (
        if not [%%A]==[!ln!] (
            set "ln=%%A"
            >>"%file%" echo %%A
        )
    )
    ENDLOCAL
    del /q "%file%.sorted"
    
    

    This should work much the same way, although, since the file is sorted first, the original line order is not preserved. That dbenham example seemed way too hardcore for me, so I tested my own solution. Usage example: filedup.cmd filename.ext
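    In Unix shell terms, the same sort-then-compare-with-previous idea collapses to a single command, since sort -u keeps only the first of each run of equal sorted lines; as with the batch version, the original order is lost:

    ```shell
    # The shell analogue of "sort, then skip lines equal to the previous one":
    printf 'b\na\nb\nc\n' | sort -u
    # prints: a, b, c (one per line)
    ```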

  • 2020-11-29 10:54

    I came across this issue and had to resolve it myself because the use case was particular to my needs. I needed to find duplicate URLs, and the order of lines was relevant, so it had to be preserved. The lines of text must not contain any double quotes, should not be very long, and sorting cannot be used.

    Thus I did this:

    setlocal enabledelayedexpansion
    type nul>unique.txt
    for /F "tokens=*" %%i in (list.txt) do (
        find "%%i" unique.txt 1>nul
        if !errorlevel! NEQ 0 (
        >>unique.txt echo %%i
        )
    )
    

    Caveat: FIND matches substrings, so a line that is a substring of an already-kept line will be skipped as well. And if the text does contain double quotes, then FIND needs to use a filtered set variable, as described in this post: Escape double quotes in parameter

    So instead of:

    find "%%i" unique.txt 1>nul
    

    it would be more like:

    set test=%%i
    set test=!test:"=""!
    find "!test!" unique.txt 1>nul
    

    Thus find will look like find """what""" file, and %%i will be unchanged.
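    As an aside, the same order-preserving, no-sort dedup is a one-liner in awk (available on Windows via ports such as UnxUtils), shown here on piped input rather than the list.txt file above:

    ```shell
    # Print each line only the first time it is seen; order is preserved.
    # seen[$0]++ is 0 (false) the first time a line appears, so the
    # pattern !seen[$0]++ is true exactly once per distinct line.
    printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
    ```

    Against files it would be awk '!seen[$0]++' list.txt > unique.txt.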

  • 2020-11-29 10:56

    I have used a fake "array" to accomplish this:

    @echo off
    :: filter out all duplicate lines (ip addresses in my case)
    REM your file takes the place of %1
    if "%~1"=="" goto :EOF
    set "file=%~1"
    setlocal EnableDelayedExpansion
    set size=0
    set cond=false
    for /F %%a IN ('type "%file%"') do (
          if [!size!]==[0] (
              set cond=true
              set /a size+=1
              set "arr[!size!]=%%a"
          ) ELSE (
              call :inner
              if [!cond!]==[true] (
                  set /a size+=1
                  set "arr[!size!]=%%a"
              )
          )
    )
    break> "%file%"
    :: destroys old output
    for /L %%b in (1,1,!size!) do >>"%file%" echo(!arr[%%b]!
    endlocal
    goto :eof
    
    :inner
    for /L %%b in (1,1,!size!) do (
          if "%%a" neq "!arr[%%b]!" (set cond=true) ELSE (set cond=false&goto :break)
    )
    :break
    

    The use of a label for the inner loop is specific to cmd.exe, and it is the only way I have been successful nesting for loops within each other. Basically, this compares each new value against every value already stored; if there is no match, the program adds the value into memory. When it is done, it destroys the target file's contents and replaces them with the unique strings.
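    The algorithm described above, translated into a POSIX shell sketch (`dedup_scan` is a hypothetical name; a delimited string stands in for the arr[!size!] bookkeeping, so this sketch assumes the input lines never contain the | character):

    ```shell
    # Order-preserving dedup without sorting: for each line, scan what has
    # been kept so far and append only on no match (O(n^2), like the batch
    # version). Assumes input lines do not contain the "|" delimiter.
    dedup_scan() {
      kept=
      while IFS= read -r line; do
        case "$kept" in
          *"|$line|"*) ;;                      # already stored: skip it
          *) kept="$kept|$line|"
             printf '%s\n' "$line" ;;
        esac
      done
    }
    ```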

  • 2020-11-29 10:56

    Some time ago I found an unexpectedly simple solution, which unfortunately only works on Windows 10: the sort command features some undocumented options that can be used:

    • /UNIQ[UE] to output only unique lines;
    • /C[ASE_SENSITIVE] to sort case-sensitively;

    So use the following line of code to remove duplicate lines (remove /C to do that in a case-insensitive manner):

    sort /C /UNIQUE "incoming.txt" /O "outgoing.txt"
    

    This removes duplicate lines from the text in incoming.txt and writes the result to outgoing.txt. Note that the original order is of course not preserved (because, well, that is the main purpose of sort).

    However, you should use these options with care, as there might be some (un)known issues with them; there is possibly a good reason why they are not documented (so far).

  • 2020-11-29 11:03

    You may use uniq (http://en.wikipedia.org/wiki/Uniq) from UnxUtils (http://sourceforge.net/projects/unxutils/).
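    Note that uniq only removes consecutive duplicates, so the input usually needs a sort pass first; a quick sketch (the same commands work from cmd.exe once the UnxUtils binaries are on the PATH):

    ```shell
    # uniq alone only collapses adjacent duplicates:
    printf 'a\na\nb\na\n' | uniq          # a, b, a -- the last "a" survives
    # sort first to remove all duplicates:
    printf 'a\na\nb\na\n' | sort | uniq   # a, b
    ```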
