Converting sparse matrix to ARFF using awk

春和景丽 2021-01-17 00:51

I am working with an extremely large data set in a sparse matrix format.

The data has the following format (3 tab separated columns, where the string in the first colum

1 answer
  • 2021-01-17 01:04

    I've no idea what arff is (nor do I need to know to help you transpose your text to a different format) so let's start with this:

    $ cat tst.awk
    BEGIN{ FS="\t" }
    NR==1 { printf "@relation '%s'\n", FILENAME }
    {
        row = $1
        attr = $2
    
        if (!seenRow[row]++) {
            rows[++numRows] = row
        }
    
        if (!seenAttr[attr]++) {
            printf "@attribute \"%s\" string\n", attr
            attrs[++numAttrs] = attr
        }
    
        score[row,attr] = $3
    }
    END {
        print "\n\n@data"
        for (rowNr=1; rowNr<=numRows; rowNr++) {
            row = rows[rowNr]
            for (attrNr=1;attrNr<=numAttrs;attrNr++)  {
                attr = attrs[attrNr]
                printf "%d,", score[row,attr]
            }
            print row
        }
    }
    $
    $ cat file
    church  place   3
    church  institution     6
    man     place   86
    man     food    63
    woman   book    37
    $
    $ awk -f tst.awk file
    @relation 'file'
    @attribute "place" string
    @attribute "institution" string
    @attribute "food" string
    @attribute "book" string
    
    
    @data
    3,6,0,0,church
    86,0,63,0,man
    0,0,0,37,woman
    

    Now, tell us what's wrong with that and we can go from there.
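One likely answer to that question, assuming the target is the ARFF dialect read by Weka: every value in a data row needs a declared attribute, and the scores are numbers rather than strings. The script above declares four attributes but writes five values per row. The sketch below keeps the same pivot logic, declares the score columns as `numeric`, and adds a trailing string attribute for the row name. The attribute name `label`, the relation name `sample`, and the file names are placeholders chosen for illustration, not anything from the question.

```shell
# Sample input from the answer above: row, attribute, score (tab-separated).
printf 'church\tplace\t3\nchurch\tinstitution\t6\nman\tplace\t86\nman\tfood\t63\nwoman\tbook\t37\n' > sample.tsv

# Same pivot as tst.awk, but scores are declared numeric and the row name
# gets its own trailing string attribute ("label" is a hypothetical name).
awk -F'\t' -v rel="sample" '
NR == 1 { printf "@relation %s\n", rel }
{
    row = $1; attr = $2
    if (!seenRow[row]++)  rows[++numRows] = row
    if (!seenAttr[attr]++) {
        printf "@attribute \"%s\" numeric\n", attr
        attrs[++numAttrs] = attr
    }
    score[row, attr] = $3
}
END {
    print "@attribute \"label\" string"
    print ""
    print "@data"
    for (r = 1; r <= numRows; r++) {
        for (a = 1; a <= numAttrs; a++)
            printf "%d,", score[rows[r], attrs[a]]
        printf "\"%s\"\n", rows[r]
    }
}' sample.tsv > sample.arff

cat sample.arff
```

Like the original, this holds the whole matrix in memory and writes a dense row for every entity, so it may not suit a truly huge input. It also assumes no input attribute is itself called `label`.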
