join two csv files with key value

故里飘歌 2020-12-16 00:47

I have two CSV files that I want to join on a key value, the city column.

One CSV file, d01.csv, has this form:

Barcelona, 19.5, 29.5
Tarra         


        
3 Answers
  • 2020-12-16 00:57

    This awk may do:

    awk 'FNR==NR {a[$1]=$2FS$3FS$4;next} $1 in a {print $0,a[$1]}' OFS=", " d02.csv d01.csv
    Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
    Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
    Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
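
    The one-liner above splits on whitespace, so the key it compares is the city name with its trailing comma attached. As a rough sketch of the same idea with an explicit comma separator: the sample rows below are reconstructed from the output quoted in this thread, while the `demo_` file names and the header line of the second file are assumptions made for illustration.

    ```shell
    # Sample inputs reconstructed from the output quoted in this thread (an assumption);
    # the demo_ file names are hypothetical so real data is not overwritten.
    printf '%s\n' 'Barcelona, 19.5, 29.5' 'Girona, 17.2, 32.5' 'Lleida, 16.5, 33.5' > demo_d01.csv
    printf '%s\n' 'City, Date, Tmin1, Tmax1' 'Barcelona, 20140916, 19.9, 28.5' \
      'Lleida, 20140916, 17.5, 32.5' 'Tortosa, 20140916, 20.5, 30.4' > demo_d02.csv

    # -F sets a regex separator: a comma with optional spaces on either side.
    awk -F' *, *' -v OFS=', ' '
      # First file: skip its header line, then store the extra columns by city.
      FNR == NR { if (FNR > 1) extra[$1] = $2 OFS $3 OFS $4; next }
      # Second file: print only the rows whose city was stored above.
      $1 in extra { print $1, $2, $3, extra[$1] }
    ' demo_d02.csv demo_d01.csv > demo_joined.csv
    cat demo_joined.csv
    ```

    With these inputs only the Barcelona and Lleida rows are printed; Girona and Tortosa are dropped because each appears in just one file.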
    
  • 2020-12-16 00:59

    I suggest CSV Cruncher, which treats CSV files as SQL tables and lets you run SQL queries against them, writing the result to another CSV file.

    Example:

    crunch input.csv output.csv \
       "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"
    

    The tool needs Java 5 or later.

    Some of the advantages:

    • You really get CSV support, not just "let's assume the data is correct".
    • You can join on multiple keys.
    • Easier to use and understand than join-based solutions.
    • You can combine more than 2 CSV files.
    • You can join by SQL expressions - the values don't have to be the same.

    Disclaimer: I wrote that tool. Unknown project state - Google Code was closed and I didn't transfer it soon enough. I might have a look at it if someone is interested.

  • Here's how to use join in bash:

    {
      echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
      join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
    } > d03.csv
    cat d03.csv
    
    City, Tmin, Tmax, Date, Tmin1, Tmax1
    Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
    Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
    Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
    

    Note that join only outputs records whose key exists in both files. To get all of them, ask for the unpaired records from both files (-a1 -a2), list the output fields you want (-o), and give a default value for the missing fields (-e):

    join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)
    
    Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
    Girona, 17.2, 32.5,?,?,?
    Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
    Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
    Tortosa,?,?, 20140916, 20.5, 30.4
    Vic, 17.5, 31.4,?,?,?
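
    For anyone who wants to try the outer join without the original files, here is a minimal sketch that builds small inputs first. The rows are taken from the output above, but the `demo_` file names and the header line of the second file are assumptions. It uses temporary sorted files instead of process substitution so it also runs in a plain POSIX shell, and it pins the locale because join expects both inputs in the same collation order.

    ```shell
    # Sample inputs reconstructed from the output quoted above (an assumption);
    # the demo_ file names are hypothetical so real data is not overwritten.
    printf '%s\n' 'Vic, 17.5, 31.4' 'Barcelona, 19.5, 29.5' 'Girona, 17.2, 32.5' > demo_d01.csv
    printf '%s\n' 'City, Date, Tmin1, Tmax1' 'Tortosa, 20140916, 20.5, 30.4' \
      'Barcelona, 20140916, 19.9, 28.5' > demo_d02.csv

    # join needs both inputs sorted on the key in the same collation order,
    # so sort under LC_ALL=C and run join under the same locale.
    LC_ALL=C sort demo_d01.csv > demo_d01.sorted
    sed 1d demo_d02.csv | LC_ALL=C sort > demo_d02.sorted   # sed 1d drops the header

    # -a1 -a2: also print unpaired lines; -o: output layout; -e: filler for gaps.
    LC_ALL=C join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' \
      demo_d01.sorted demo_d02.sorted > demo_outer.csv
    cat demo_outer.csv
    ```

    Dropping `-a2` from this sketch keeps every city of the first file and discards cities that occur only in the second, which gives a left join instead of a full outer join.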
    