How to join two text files, removing duplicates, in Windows

隐瞒了意图╮ 2020-12-10 08:42

file 1

A
B
C

file 2

B
C
D

file1 + file2 =

A
B
C
D

Is it possible to do this in Windows?

5 answers
  • 2020-12-10 09:18

    The solution below assumes that both input files are sorted in ascending order, using the same ordering as the IF command's comparison operators, and that they do not contain empty lines.

    @echo off
    setlocal EnableDelayedExpansion
    
    set "lastLine=ÿ"
    for /L %%i in (1,1,10) do set "lastLine=!lastLine!!lastLine!"
    
    < file1.txt (
       for /F "delims=" %%a in (file2.txt) do (
          set "line2=%%a"
          if not defined line1 set /P line1=
          if "!line1!" lss "!line2!" call :advanceLine1
          if "!line1!" equ "!line2!" (
             echo !line1!
             set "line1="
          ) else (
             echo !line2!
          )
       )
    )
    if "!line1!" neq "%lastLine%" echo !line1!
    goto :EOF
    
    
    :advanceLine1
    echo !line1!
    set "line1="
    set /P line1=
    if not defined line1 set "line1=%lastLine%"
    if "!line1!" lss "!line2!" goto advanceLine1
    exit /B
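
The merge logic above can be cross-checked outside cmd. Below is a minimal sketch in Python rather than Batch, assuming both inputs are already sorted ascending; `merge_sorted_unique` is a hypothetical name, and Python compares strings by code point, which may differ from the ordering the IF command uses.

```python
def merge_sorted_unique(lines1, lines2):
    """Merge two ascending-sorted lists, emitting each distinct line once."""
    merged = []

    def emit(candidate):
        # Skip a line equal to the one just written (inputs are sorted,
        # so duplicates are always adjacent in the output).
        if not merged or merged[-1] != candidate:
            merged.append(candidate)

    i = j = 0
    while i < len(lines1) and j < len(lines2):
        if lines1[i] < lines2[j]:
            emit(lines1[i])
            i += 1
        elif lines2[j] < lines1[i]:
            emit(lines2[j])
            j += 1
        else:
            # Same line in both files: write it once, advance both.
            emit(lines1[i])
            i += 1
            j += 1

    # One input is exhausted; copy whatever remains of the other.
    for candidate in lines1[i:] + lines2[j:]:
        emit(candidate)
    return merged


print(merge_sorted_unique(["A", "B", "C"], ["B", "C", "D"]))
```

Unlike a plain concatenation, this keeps the output sorted and needs only one pass over each input.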
    
  • 2020-12-10 09:22

    If you can afford to use a case-insensitive comparison, and if you know that none of the lines are longer than 511 bytes (127 for XP), then you can use the following:

    @echo off
    copy file1.txt merge.txt >nul
    findstr /lvxig:file1.txt file2.txt >>merge.txt
    type merge.txt
    

    For an explanation of the restrictions, see What are the undocumented features and limitations of the Windows FINDSTR command?.
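
What the copy-then-findstr approach computes can be sketched as follows. This is a Python illustration under stated assumptions, not a replacement for the batch file: `append_new_lines` is a hypothetical name, and `str.lower()` only approximates the case-insensitive matching that findstr's /i switch performs.

```python
def append_new_lines(lines1, lines2):
    """Keep every line of the first list, then add the lines of the second
    list that do not match (ignoring case) any whole line of the first."""
    seen = {line.lower() for line in lines1}
    return list(lines1) + [line for line in lines2 if line.lower() not in seen]


print(append_new_lines(["A", "B", "C"], ["B", "C", "D"]))
```

As in the batch version, the first file is copied unchanged, so duplicates that occur within a single file are not removed.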

  • 2020-12-10 09:37

    The first part (merging two text files) is possible with copy (see the documentation of the copy command):

    copy file1.txt+file2.txt file1and2.txt
    

    For the second part, you can use the sort and uniq utilities from CoreUtils for Windows. These are Windows ports of the Linux utilities.

    sort file1and2.txt > filesorted.txt
    uniq filesorted.txt fileunique.txt
    

    The limitation of this approach is that the original line order is lost.

    Update 1

    Windows also ships with a native SORT.EXE.

    Update 2

    Here is a very simple UNIQ in CMD script

  • 2020-12-10 09:37

    You can also apply the sort-then-uniq approach known from Unix or PowerShell in pure Batch by writing a simple uniq.bat filter program:

    @echo off
    setlocal EnableDelayedExpansion
    set "prevLine="
    for /F "delims=" %%a in ('findstr "^"') do (
       if "%%a" neq "!prevLine!" (
          echo %%a
          set "prevLine=%%a"
       )
    )
    

    EDIT: The program below is a Batch-JScript hybrid version of the uniq program, which is more reliable and faster; copy it into a file called uniq.bat:

    @if (@CodeSection == @Batch) @then
    
    @CScript //nologo //E:JScript "%~F0" & goto :EOF
    
    @end
    
    var line, prevLine = "";
    while ( ! WScript.Stdin.AtEndOfStream ) {
       line = WScript.Stdin.ReadLine();
       if ( line != prevLine ) {
          WScript.Stdout.WriteLine(line);
          prevLine = line;
       }
    }
    

    This way, you may use this solution:

    (type file1.txt & type file2.txt) | sort | uniq > result.txt
    

    However, in this case the result loses the original order.
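
Why the pipeline sorts before filtering, and why the order is lost, can be seen in a small sketch. It is written in Python rather than Batch, with hypothetical function names; `itertools.groupby` collapses only adjacent equal items, which is how a uniq filter behaves.

```python
from itertools import groupby


def uniq(lines):
    """Drop a line only when it equals the line immediately before it."""
    return [key for key, _group in groupby(lines)]


def sort_then_uniq(lines):
    """Sorting first makes all duplicates adjacent so uniq can remove them."""
    return uniq(sorted(lines))


combined = ["A", "B", "C"] + ["B", "C", "D"]
print(uniq(combined))            # non-adjacent duplicates survive
print(sort_then_uniq(combined))  # duplicates removed, output in sorted order
```

If the input order matters, a filter that remembers every line already seen is needed instead of sort plus uniq.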

  • 2020-12-10 09:38

    Using PowerShell:

    Get-Content file?.txt | Sort-Object | Get-Unique > result.txt
    

    For cmd.exe:

    @echo off
    type nul > temp.txt
    type nul > result.txt
    copy file1.txt+file2.txt temp.txt
    for /f "delims=" %%I in (temp.txt) do findstr /X /C:"%%I" result.txt >NUL ||(echo;%%I)>>result.txt
    del temp.txt
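
The cmd.exe loop above writes a line only if it is not yet in result.txt, so the first occurrence of each line keeps its position. A minimal Python sketch of the same idea, assuming exact, case-sensitive whole-line matching (`unique_in_order` is a hypothetical name):

```python
def unique_in_order(lines):
    """Return each distinct line once, in order of first appearance."""
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            result.append(line)
    return result


print(unique_in_order(["A", "B", "C"] + ["B", "C", "D"]))
```

The set lookup replaces the repeated findstr scan of result.txt, so the work grows with the number of lines rather than with its square.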
    