Split a text file into multiple smaller text files using the command line

借酒劲吻你 2020-12-22 18:03

I have multiple text files with about 100,000 lines each, and I want to split them into smaller text files of 5000 lines each.

I used:

split -l 5000 filena         


        
9 Answers
  • 2020-12-22 18:18

    You can do something like this with awk:

    awk '{outfile=sprintf("file%02d.txt",int((NR-1)/5000)+1);print > outfile}' yourfile
    

    Basically, it calculates the name of the output file by taking the record number (NR), subtracting 1, dividing by 5000, taking the integer part of that, adding 1, and zero-padding the result to 2 places. Lines 1 to 5000 therefore go to file01.txt, lines 5001 to 10000 to file02.txt, and so on.

    By default, awk prints the entire input record when you don't specify anything else. So, print > outfile writes the entire input record to the output file.

    As you are running on Windows, you can't use single quotes because cmd.exe does not treat them as quoting characters. I think you have to put the script in a file and then tell awk to use the file, something like this:

    awk -f script.awk yourfile
    

    and script.awk will contain the script like this:

    {outfile=sprintf("file%02d.txt",int((NR-1)/5000)+1);print > outfile}
    

    Or, it may work if you do this:

    awk "{outfile=sprintf(\"file%02d.txt\",int((NR-1)/5000)+1);print > outfile}" yourfile
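
    To check the file-name arithmetic on a small scale, here is a minimal sketch for a POSIX shell. It assumes a chunk size of 5 instead of 5000 and uses (NR - 1) in the division so that every file receives exactly 5 lines. The directory name awk_demo and the 12-line sample input are made up for illustration.

```shell
#!/bin/sh
# Scratch directory for the demo (hypothetical name).
rm -rf awk_demo
mkdir awk_demo

# Sample input: 12 lines, "line 1" to "line 12".
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > awk_demo/yourfile

# Naming rule with (NR - 1) and a chunk size of 5 instead of 5000.
(
    cd awk_demo || exit 1
    awk '{ outfile = sprintf("file%02d.txt", int((NR - 1) / 5) + 1); print > outfile }' yourfile
)

# Show how many lines ended up in each output file.
wc -l awk_demo/file*.txt
```

    With 12 input lines this should leave three files: two full ones and a final one holding the remainder.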
    
  • 2020-12-22 18:18

    Here is one in C# that doesn't run out of memory when splitting into large chunks, because it streams the input one line at a time. I needed to split a 95M file into files of 10M lines each.

    // Requires: using System.IO; `filename` is the path of the file to split.
    const int linesPerFile = 10000000;
    var fileSuffix = 0;
    var lines = 0;
    StreamWriter sw = null;
    
    using (var reader = new StreamReader(filename))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Open the next chunk lazily so no empty trailing file is created.
            if (sw == null)
            {
                // StreamWriter(path) overwrites an existing file; File.OpenWrite does not truncate it.
                sw = new StreamWriter($"{filename}.{++fileSuffix}");
            }
    
            sw.WriteLine(line);
    
            if (++lines >= linesPerFile)
            {
                // Chunk is full: close it and start a new one on the next line.
                sw.Dispose();
                sw = null;
                lines = 0;
            }
        }
    }
    
    // Close the last, possibly partial, chunk.
    sw?.Dispose();
    
  • 2020-12-22 18:18

    I wrote a simple program for this, and your question helped me complete it. I added one extra feature and a few configuration options: it can insert a specific character or string after every few lines (the interval is configurable). Please read the notes in the repository. The code is here: https://github.com/mohitsharma779/FileSplit

  • 2020-12-22 18:23
    @ECHO OFF
    SETLOCAL
    SET "sourcedir=U:\sourcedir"
    SET /a fcount=100
    SET /a llimit=5000
    SET /a lcount=%llimit%
    FOR /f "usebackqdelims=" %%a IN ("%sourcedir%\q25249516.txt") DO (
     CALL :select
     FOR /f "tokens=1*delims==" %%b IN ('set dfile') DO IF /i "%%b"=="dfile" >>"%%c" ECHO(%%a
    )
    GOTO :EOF
    :select
    SET /a lcount+=1
    IF %lcount% lss %llimit% GOTO :EOF
    SET /a lcount=0
    SET /a fcount+=1
    SET "dfile=%sourcedir%\file%fcount:~-2%.txt"
    GOTO :EOF
    

    Here's a native Windows batch file that should accomplish the task.

    I won't claim that it's fast (just under 2 minutes for each 5,000-line output file) or that it is immune to batch character sensitivities. It really depends on the characteristics of your target data.

    I used a file named q25249516.txt containing 100,000 lines of data for my testing.


    Revised quicker version

    @ECHO OFF
    SETLOCAL
    SET "sourcedir=U:\sourcedir"
    SET /a fcount=199
    SET /a llimit=5000
    SET /a lcount=%llimit%
    FOR /f "usebackqdelims=" %%a IN ("%sourcedir%\q25249516.txt") DO (
     CALL :select
     >>"%sourcedir%\file$$.txt" ECHO(%%a
    )
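    REM Force a rollover, then fall through into :select on purpose
    REM so that the final partial file$$.txt is renamed too.
    SET /a lcount=%llimit%
    :select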
    SET /a lcount+=1
    IF %lcount% lss %llimit% GOTO :EOF
    SET /a lcount=0
    SET /a fcount+=1
    MOVE /y "%sourcedir%\file$$.txt" "%sourcedir%\file%fcount:~-2%.txt" >NUL 2>nul
    GOTO :EOF
    

    Note that I used an llimit of 50000 for testing. The script will overwrite the early file numbers if the source file has more than llimit*100 lines, because only the last two digits of fcount are used in the file name (cure by setting fcount to 1999 and using ~-3 in place of ~-2 in the file-renaming line).

  • 2020-12-22 18:28

    Syntax looks like:

    $ split [OPTION] [INPUT [PREFIX]] 
    

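    where the output files are named PREFIXaa, PREFIXab, ...

    Pick a suitable prefix and you're done, or rename the pieces afterwards with mv. Note that mv * *.txt will not add the extension, because the shell expands both globs before mv runs; rename the files one at a time in a loop instead, or, with GNU split, pass --additional-suffix=.txt. Either way, test it on a small file first.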

    :)
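
    The prefix and rename steps can be sketched as follows for a POSIX shell. This is only an illustration under assumptions: the directory split_demo, the prefix part_, and the 12-line sample input are made up, and the pieces are 5 lines each rather than 5000. The files are renamed one at a time in a loop, since a single mv with two globs would not rename them individually.

```shell
#!/bin/sh
# Scratch directory for the demo (hypothetical name).
rm -rf split_demo
mkdir split_demo

# Sample input: 12 lines, "line 1" to "line 12".
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > split_demo/input.txt

# Split into 5-line pieces named part_aa, part_ab, part_ac.
split -l 5 split_demo/input.txt split_demo/part_

# Add the .txt extension, one file at a time.
for f in split_demo/part_*; do
    mv "$f" "$f.txt"
done

ls split_demo
```

    For the question's case, the same pattern would apply with -l 5000 and the real input file in place of the sample.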

  • 2020-12-22 18:30

    This "File Splitter" Windows command line program works nicely: https://github.com/dubasdey/File-Splitter

    It's open source, simple, documented, proven, and worked for me.

    Example:

    fsplit -split 50 mb mylargefile.txt
    