Find value from one csv in another one (like vlookup) in bash (Linux)

猫巷女王i 2021-01-03 16:38

I have already tried every option I could find online to solve my issue, but without success.

Basically I have two csv files (pipe separated):

file1.

4 answers
  • 2021-01-03 17:10

    You can use Miller (https://github.com/johnkerl/miller).

    Starting from input01.txt

    123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
    123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
    123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
    

    and input02.txt

    MAYOBAN|BANGOR|2400
    MAYOBEL|BELLAVARY|2400
    CORKKIN|KINSALE|2200
    CORKCOR|CORK|2200
    DUBLD11|DUBLIN 11|2100
    

    and running

    mlr --csv -N --ifs "|" join  -j 7 -l 7 -r 1 -f input01.txt then cut -f 3 input02.txt
    

    you will have

    2400
    2200
    2200
    

    Some notes:

    • -N to treat input and output as having no header line;
    • --ifs "|" to set the input field separator;
    • -l 7 -r 1 to set the join fields of the left (-f input01.txt) and right (input02.txt) input files;
    • cut -f 3 to extract the field named 3 from the join output.
  • 2021-01-03 17:21

    This will work, but since the input files must be sorted, the output order will be affected:

    join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
    

    The output would look like:

    2200
    2200
    2400
    

    which is not very useful, because you cannot tell which key each value belongs to. For a useful output, include the key value:

    join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
    

    The output then looks like this:

    CORKCOR|2200
    CORKKIN|2200
    MAYOBAN|2400
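    If the original order of file1.csv matters, one possible workaround is to tag each key with its line number before sorting, join, and then sort back by that number. The following is only a sketch under assumptions: it copies the sample data from the first answer into a scratch directory (the directory and the keys.sorted / file2.sorted names are made up for illustration), and it sets LC_ALL=C so that sort and join agree on collation.

```shell
export LC_ALL=C   # make sort and join use the same collation

workdir=$(mktemp -d)   # hypothetical scratch directory for this sketch
printf '%s\n' \
  '123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|' \
  '123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|' \
  '123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|' \
  > "$workdir/file1.csv"
printf '%s\n' \
  'MAYOBAN|BANGOR|2400' \
  'MAYOBEL|BELLAVARY|2400' \
  'CORKKIN|KINSALE|2200' \
  'CORKCOR|CORK|2200' \
  'DUBLD11|DUBLIN 11|2100' \
  > "$workdir/file2.csv"

# Decorate: emit "key|line-number" for every row of file1.csv, sorted by key.
awk -F '|' '{ print $7 "|" NR }' "$workdir/file1.csv" |
  sort -t '|' -k1,1 > "$workdir/keys.sorted"
sort -t '|' -k1,1 "$workdir/file2.csv" > "$workdir/file2.sorted"

# Join on the key (field 1 of both files), output "line-number|key|value",
# then restore file1.csv order and drop the line number again.
ordered=$(join -t '|' -o 1.2,0,2.3 "$workdir/keys.sorted" "$workdir/file2.sorted" |
  sort -t '|' -k1,1n | cut -d '|' -f 2-)
printf '%s\n' "$ordered"

rm -rf "$workdir"
```

    With the sample data this should print the MAYOBAN, CORKKIN and CORKCOR rows in the same order as file1.csv.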
    

    Edit:

    Here's an AWK version:

    awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
    

    This loops through file1.csv and creates array entries for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files. When they're equal, the first file is being processed. The next instruction reads the next record, creating a loop. When FNR == NR is no longer true, the subsequent file(s) are processed.

    So file2.csv is now processed and if it has a field 1 that exists in the array, then its field 3 is printed.
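    The same FNR == NR idiom can be turned around to behave more like a vlookup: read file2.csv first into a key-to-value map, then walk file1.csv and print each key next to its looked-up value, which keeps the order of file1.csv. This is a sketch rather than part of the original answer; the lookup function name, the value array and the scratch directory are assumptions, and the data is the sample from the first answer.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory for this sketch
printf '%s\n' \
  '123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|' \
  '123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|' \
  '123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|' \
  > "$workdir/file1.csv"
printf '%s\n' \
  'MAYOBAN|BANGOR|2400' \
  'MAYOBEL|BELLAVARY|2400' \
  'CORKKIN|KINSALE|2200' \
  'CORKCOR|CORK|2200' \
  'DUBLD11|DUBLIN 11|2100' \
  > "$workdir/file2.csv"

# lookup DATA_FILE TABLE_FILE (hypothetical helper)
# Prints "key|value" for every row of DATA_FILE whose field 7
# appears as field 1 of TABLE_FILE, in DATA_FILE order.
lookup() {
  awk -F '|' '
    FNR == NR { value[$1] = $3; next }         # first file: build the lookup table
    ($7 in value) { print $7 "|" value[$7] }   # second file: emit key and value
  ' "$2" "$1"
}

result=$(lookup "$workdir/file1.csv" "$workdir/file2.csv")
printf '%s\n' "$result"

rm -rf "$workdir"
```

    Note that the lookup table is passed to awk first, so the roles of the two files are swapped compared with the one-liner above. Rows of file1.csv whose key is missing from file2.csv are silently skipped in this sketch.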

  • 2021-01-03 17:21
    cut -d'|' -f7 file1.csv | while IFS= read -r key
    do
      # look the key up in the first field of file2.csv and print field 3
      grep "^${key}|" file2.csv | cut -d'|' -f3
    done
    
  • 2021-01-03 17:31

    A simple approach, far from perfect:

    DELIMITER="|"
    
    for i in $(cut -f 7 -d "${DELIMITER}" file1.csv ); 
    do 
        grep "${i}" file2.csv | cut -f 3 -d "${DELIMITER}"; 
    done
    