Find content of one file from another file in UNIX

攒了一身酷 2020-12-01 07:39

I have two files. The first contains a list of row IDs of tuples in a database table. The second contains SQL queries that use these row IDs in their "where" clause.

  • 2020-12-01 08:23

    The awk/grep solutions mentioned above were slow or memory-hungry on my machine (file1: 10^6 rows, file2: 10^7 rows), so I came up with an SQL solution using sqlite3.

    Turn file2 into a CSV-formatted file whose first field is the row ID (the value after ri=):

    gawk -F= '{ print $3 "," $0 }' file2.txt | sed 's/;,/,/' > file2_with_ids.txt
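The gawk one-liner assumes the ID is the third =-separated field, which only holds when exactly two = signs precede it, as in the sample statements. A minimal sketch that keys on the ri= marker instead, assuming each ID is a run of digits followed by a semicolon (add_id_column is a hypothetical helper name, not part of the original answer):

```shell
# Hypothetical helper: rewrite "...ri=<digits>;..." lines as "<digits>,<original line>".
# Lines without an ri=<digits>; marker are dropped (-n plus the p flag).
# Reads the files given as arguments, or stdin when none are given.
add_id_column() {
    sed -nE 's/^(.*ri=([0-9]+);.*)$/\2,\1/p' "$@"
}

# Usage: add_id_column file2.txt > file2_with_ids.txt
```

Like the original, this still assumes the statements themselves contain no commas, since the import step splits each line on the "," separator.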
    

    Create two tables:

    sqlite> CREATE TABLE file1(rowId char(10));
    sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));
    

    Import the row IDs from file1:

    sqlite> .import file1.txt file1
    

    Import the statements from file2, using the prepared version (file2_with_ids.txt):

    sqlite> .separator ,
    sqlite> .import file2_with_ids.txt file2
    

    Select all and only the statements in table file2 that have a matching rowId in table file1:

    sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
    

    To create file3, redirect the output to a file before issuing the SELECT statement:

    sqlite> .output file3.txt
    

    Test data:

    sqlite> select count(*) from file1;
    1000000
    sqlite> select count(*) from file2;
    10000000
    sqlite> select * from file1 limit 4;
    1610666927
    1610661782
    1610659837
    1610664855
    sqlite> select * from file2 limit 4;
    1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
    1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
    1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
    1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;
    

    Without creating any indices, the SELECT statement took about 15 seconds on a 64-bit Ubuntu 12.04 machine with an AMD A8 1.8 GHz CPU.
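The interactive session above can also be run as one non-interactive batch by feeding the same commands to sqlite3 on standard input. A sketch, assuming the sqlite3 command-line shell is on PATH and that file1.txt and file2_with_ids.txt are in the current directory; join_with_sqlite and the default database name rowids.db are hypothetical:

```shell
# Hypothetical wrapper around the interactive steps above.
# Assumes the sqlite3 CLI is installed and file1.txt / file2_with_ids.txt exist
# in the current directory. Optional first argument: database file name.
join_with_sqlite() {
    sqlite3 "${1:-rowids.db}" <<'EOF'
DROP TABLE IF EXISTS file1;
DROP TABLE IF EXISTS file2;
CREATE TABLE file1(rowId char(10));
CREATE TABLE file2(rowId char(10), statement varchar(200));
.import file1.txt file1
.separator ,
.import file2_with_ids.txt file2
.output file3.txt
SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
EOF
}

# Usage: join_with_sqlite            (writes the matching statements to file3.txt)
```

The DROP TABLE IF EXISTS lines make the script safe to rerun; without them a second run would fail to create the tables and append duplicate rows on import.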

  • Maybe try AWK and use the numbers from file 1 as keys, for example with a simple script.

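    The first script generates a second awk script, script2.awk, containing one rule per row ID:

    awk -f script1.awk file1.txt

    where script1.awk is:

    {
        # Emit one rule per row ID; each rule prints the lines that contain that ID (substring match)
        print "$0 ~", $0, "{ print $0 }" > "script2.awk"
    }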
    
    

    Then run script2.awk on file 2: awk -f script2.awk file2.txt
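Using the numbers from file 1 as keys maps naturally onto an awk associative array: instead of generating one regex rule per ID, each line of file 2 is checked with a single exact key lookup, and only the IDs are held in memory. A sketch of that single-pass variant, assuming each statement contains ri=<digits> as in the sample data in the first answer; filter_by_ids is a hypothetical name:

```shell
# Hypothetical helper: print the lines of the second file whose ri=<digits> value
# appears as a line in the first file. Only the IDs are held in memory.
filter_by_ids() {
    awk '
        NR == FNR { ids[$1]; next }      # first file: store each row ID as an array key
        match($0, /ri=[0-9]+/) {         # second file: locate the ri=<digits> part
            id = substr($0, RSTART + 3, RLENGTH - 3)
            if (id in ids) print         # exact key lookup, so 30 does not match 300
        }
    ' "$1" "$2"
}

# Usage: filter_by_ids file1.txt file2.txt > file3.txt
```

Because the lookup compares whole IDs, this also avoids the substring false positives of the generated $0 ~ ID rules.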
