How to split a large text file in Windows?

野的像风 2020-12-12 13:30

I have a log file that is 2.5 GB in size. Is there any way to split it into smaller files using the Windows command prompt?

6 Answers
  • 2020-12-12 14:05

    The batch script below splits the file every LPF lines (15000 as written):

    @echo off
    setlocal ENABLEDELAYEDEXPANSION
    REM Edit this value to change the name of the file that needs splitting. Include the extension.
    SET BFN=upload.txt
    REM Edit this value to change the number of lines per file.
    SET LPF=15000
    REM Edit this value to change the name of each short file. It will be followed by a number indicating where it is in the list.
    SET SFN=SplitFile
    
    REM Do not change beyond this line.
    
    SET SFX=%BFN:~-3%
    
    SET /A LineNum=0
    SET /A FileNum=1
    
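    REM "usebackq delims=" reads each whole line; "delims==" would cut every line at its first equals sign.
    REM Note: For /F skips empty lines, and delayed expansion can mangle lines containing exclamation marks.
    For /F "usebackq delims=" %%l in ("%BFN%") Do (
    SET /A LineNum+=1
    
    >>"%SFN%!FileNum!.%SFX%" echo(%%l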
    
    if !LineNum! EQU !LPF! (
    SET /A LineNum=0
    SET /A FileNum+=1
    )
    
    )
    endlocal
    Pause
    

    Source: https://forums.techguy.org/threads/solved-split-a-100000-line-csv-into-5000-line-csv-files-with-dos-batch.1023949/

  • 2020-12-12 14:06

    You can also split the file with third-party software such as HJSplit (http://www.hjsplit.org/). Give it your input file, which can be up to 9 GB, and choose a part size; in my case I split into parts of 10 MB each.

  • 2020-12-12 14:17

    If you have installed Git for Windows, you should have Git Bash installed, since that comes with Git.

    Use the split command in Git Bash to split a file:

    • into files of size 500MB each: split myLargeFile.txt -b 500m

    • into files with 10000 lines each: split myLargeFile.txt -l 10000

    Tips:

    • If you don't have Git/Git Bash, download at https://git-scm.com/download

    • If you lost the shortcut to Git Bash, you can run it using C:\Program Files\Git\git-bash.exe

    That's it!


    I always like examples though...

    Example:

    (Screenshot not reproduced here.) The files generated by split are named xaa, xab, xac, etc.

    These names are made up of a prefix and a suffix, which you can specify. Since I didn't specify what I want the prefix or suffix to look like, the prefix defaulted to x, and the suffix defaulted to a two-character alphabetical enumeration.
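
    The default naming can be reproduced on a small scale. This is only a sketch: the 25-line sample file and the scratch directory are assumptions made for the demo, not part of the original example.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 25 numbered lines standing in for a large log.
seq 1 25 > myLargeFile.txt

# No prefix is given, so split falls back to "x" plus a two-letter suffix.
split -l 10 myLargeFile.txt

# List the pieces with their line counts.
wc -l x??
```

    With 25 input lines and 10 lines per piece, this should leave three files named xaa, xab and xac.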

    Another Example:

    This example demonstrates

    • using a filename prefix of MySlice (instead of the default x),
    • the -d flag for using numerical suffixes (instead of aa, ab, ac, etc...),
    • and the option -a 5 to tell it I want the suffixes to be 5 digits long:
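
    A minimal sketch of such a command. The line-based split, the 25-line sample file and the scratch directory are assumptions; the exact command from the original screenshot is not available.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 25 numbered lines standing in for a large log.
seq 1 25 > myLargeFile.txt

# -l 10    ten lines per piece
# -d       numeric suffixes instead of aa, ab, ac (a GNU split option)
# -a 5     suffixes five characters long
# MySlice  prefix used instead of the default x
split -l 10 -d -a 5 myLargeFile.txt MySlice

ls MySlice*
```

    The pieces should come out as MySlice00000, MySlice00001 and MySlice00002.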

  • 2020-12-12 14:20

    Of course there is! Win CMD can do a lot more than just split text files :)

    Split a text file into separate files of 'max' lines each:

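    REM Split text file (max lines each).
    REM Uses single-percent loop variables and delayed expansion, so run it at an interactive prompt started with: cmd /v:on
    : Initialize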
    set input=file.txt
    set max=10000
    
    set /a line=1 >nul
    set /a file=1 >nul
    set out=!file!_%input%
    set /a max+=1 >nul
    
    echo Number of lines in %input%:
    find /c /v "" < %input%
    
    : Split file
    for /f "tokens=* delims=[" %i in ('type "%input%" ^| find /v /n ""') do (
    
    if !line!==%max% (
    set /a line=1 >nul
    set /a file+=1 >nul
    set out=!file!_%input%
    echo Writing file: !out!
    )
    
    REM Write next file
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>!out!
    set /a line+=1 >nul
    )
    

    If the above code hangs or crashes, the following approach splits files faster, by writing data to intermediate files instead of keeping everything in memory.

    E.g., to split a file with 7,600 lines into smaller files of at most 3,000 lines:

    1. Generate the regexp pattern files with the set command, to be fed to the /g flag of findstr:

    list1.txt

    \[[0-9]\]
    \[[0-9][0-9]\]
    \[[0-9][0-9][0-9]\]
    \[[0-2][0-9][0-9][0-9]\]

    list2.txt

    \[[3-5][0-9][0-9][0-9]\]

    list3.txt

    \[[6-9][0-9][0-9][0-9]\]

    2. Split the file into smaller files:
    type "%input%" | find /v /n "" | findstr /b /r /g:list1.txt > file1.txt
    type "%input%" | find /v /n "" | findstr /b /r /g:list2.txt > file2.txt
    type "%input%" | find /v /n "" | findstr /b /r /g:list3.txt > file3.txt
    
    3. Remove the prefixed line numbers from each split file, e.g. for the 1st file:
    for /f "tokens=* delims=[" %i in ('type "%cd%\file1.txt"') do (
    set a=%i
    set a=!a:*]=]!
    echo:!a:~1!>>file_1.txt)
    

    Notes:
    Works with leading whitespace, blank lines & whitespace lines.

    Tested on Win 10 x64 CMD, on 4.4GB text file, 5651982 lines.
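
    For comparison, the same three line ranges (1 to 2999, 3000 to 5999, and 6000 onward) can be cut out with sed in Git Bash, if that is installed. This is a sketch under that assumption, not part of the CMD method above; the generated 7,600-line input and the scratch directory are hypothetical.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 7,600-line input, matching the size used in the example.
seq 1 7600 > file.txt

# Same ranges as list1.txt, list2.txt and list3.txt.
# The trailing q makes sed stop at the end of each range instead of reading on.
sed -n '1,2999p;2999q'    file.txt > file_1.txt
sed -n '3000,5999p;5999q' file.txt > file_2.txt
sed -n '6000,9999p;9999q' file.txt > file_3.txt

# Show how many lines ended up in each piece.
wc -l file_1.txt file_2.txt file_3.txt
```

    No line-number prefixes are added in this variant, so there is nothing to strip afterwards.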

  • 2020-12-12 14:21

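    You can use the split command for this task, provided a split executable is on your PATH (it is not built into Windows; Git for Windows, for example, ships one). For example, this command entered at the command prompt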

    split YourLogFile.txt -b 500m
    

    creates several files of 500 MB each. This will take several minutes for a file of your size. You can rename the output files (by default called "xaa", "xab", and so on) to *.txt to open them in the editor of your choice.

    Make sure to check the help file for the command. You can also split the log file by number of lines or change the name of your output files.
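
    Splitting by line count and renaming the pieces to *.txt could look like the sketch below in a Bash-style shell such as Git Bash. The 25-line sample file and the scratch directory are assumptions for the demo.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 25 numbered lines standing in for the log file.
seq 1 25 > YourLogFile.txt

# Split by number of lines instead of by size.
split -l 10 YourLogFile.txt

# Append .txt to every piece (xaa, xab, ...) so editors recognise them.
for piece in x??; do
    mv -- "$piece" "$piece.txt"
done

ls
```

    The pattern x?? only matches the default two-letter suffixes, so adjust it if you pass a different prefix or suffix length.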

    (tested on Windows 7 64 bit)

  • 2020-12-12 14:26
    Set Arg = WScript.Arguments
    set WshShell = createObject("Wscript.Shell")
    Set Inp = WScript.Stdin
    Set Outp = Wscript.Stdout
        Set rs = CreateObject("ADODB.Recordset")
        With rs
            .Fields.Append "LineNumber", 4          ' 4 = adSingle
    
            .Fields.Append "Txt", 201, 5000         ' 201 = adLongVarChar, defined size 5000
            .Open
            LineCount = 0
            Do Until Inp.AtEndOfStream
                LineCount = LineCount + 1
                .AddNew
                .Fields("LineNumber").value = LineCount
                .Fields("Txt").value = Inp.readline
                .UpDate
            Loop
    
            .Sort = "LineNumber ASC"
    
            If LCase(Arg(1)) = "t" then
                If LCase(Arg(2)) = "i" then
                    .filter = "LineNumber < " & LCase(Arg(3)) + 1
                ElseIf LCase(Arg(2)) = "x" then
                    .filter = "LineNumber > " & LCase(Arg(3))
                End If
            ElseIf LCase(Arg(1)) = "b" then
                If LCase(Arg(2)) = "i" then
                    .filter = "LineNumber > " & LineCount - LCase(Arg(3))
                ElseIf LCase(Arg(2)) = "x" then
                    .filter = "LineNumber < " & LineCount - LCase(Arg(3)) + 1
                End If
            End If
    
            Do While not .EOF
                Outp.writeline .Fields("Txt").Value
    
                .MoveNext
            Loop
        End With
    

    Cut

    filter cut {t|b} {i|x} NumOfLines
    

    Cuts the number of lines from the top or bottom of file.

    t - top of the file
    b - bottom of the file
    i - include n lines
    x - exclude n lines
    

    Example

    cscript /nologo filter.vbs cut t i 5 < "%systemroot%\win.ini"
    

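    Another way: this outputs lines 5001 onward; adapt it for your use. It uses almost no memory.

    Set Inp = WScript.Stdin
    Set OutP = WScript.Stdout
    Count = 0
    Do Until Inp.AtEndOfStream
             Count = Count + 1
             ' Read every line so the stream advances; only write lines past 5000.
             TextLine = Inp.ReadLine
             If Count > 5000 Then
                OutP.WriteLine TextLine
             End If
    Loop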
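
    If Git Bash is available, the four cut modes roughly correspond to head and tail. This is a hedged sketch rather than a replacement for the script: the ten-line sample file and the output file names are hypothetical, and head -n -N is a GNU extension.

```shell
# Work in a scratch directory so the demo leaves nothing behind in the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical ten-line sample input.
seq 1 10 > sample.txt

head -n 3  sample.txt > top_include.txt     # cut t i 3: the first 3 lines
tail -n +4 sample.txt > top_exclude.txt     # cut t x 3: everything after the first 3 lines
tail -n 3  sample.txt > bottom_include.txt  # cut b i 3: the last 3 lines
head -n -3 sample.txt > bottom_exclude.txt  # cut b x 3: everything except the last 3 lines
```

    By the same logic, tail -n +5001 would print lines 5001 onward without holding the file in memory.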
    