Parsing a lisp file with Python

你的背包 2021-01-02 15:02

I have the following Lisp file, which is from the UCI machine learning database. I would like to convert it into a flat text file using Python. A typical line looks like th

4 Answers
  • 2021-01-02 15:37

    Since the data is already in Lisp, use Lisp itself:

    (let ((input '(1 ((ST 8) (PITCH 67) (DUR 4) (KEYSIG 1) (TIMESIG 12) (FERMATA 0))
                     ((ST 12) (PITCH 67) (DUR 8) (KEYSIG 1) (TIMESIG 12) (FERMATA 0)))))
      ;; Column names come from the first row, values from every row.
      (let ((row-headers (mapcar 'car (second input)))
            (row-data (mapcar (lambda (row) (mapcar 'second row)) (cdr input))))
        (format t "~{~A~^ ~}~%" row-headers)
        (format t "~{~{~A~^ ~}~^ ~%~}" row-data)))
    
  • 2021-01-02 15:39

    As shown in this answer, pyparsing appears to be the right tool for that:

    inputdata = '(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'
    
    from pyparsing import OneOrMore, nestedExpr
    
    data = OneOrMore(nestedExpr()).parseString(inputdata)
    print(data)
    
    # [['1', [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']], [['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]]]
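
    If pyparsing is not available, the same nested-list shape can be built with the standard library alone. The following is only a minimal sketch, not what pyparsing does internally: the helper names `tokenize` and `parse` are made up for illustration, the input is shortened, and it assumes atoms contain no whitespace, parentheses, or quoted strings.

```python
import re

def tokenize(text):
    # Split into "(", ")" and runs of other non-space characters.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Turn a flat token list into nested Python lists (atoms stay strings)."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]

# Shortened version of the sample line, for illustration only.
inputdata = "(1 ((st 8) (pitch 67)) ((st 12) (pitch 67)))"
print(parse(tokenize(inputdata)))
```

    The stack keeps one partially built list per open parenthesis, so arbitrary nesting depth works without recursion.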
    

    For completeness' sake, this is how to format the results (using texttable):

    from texttable import Texttable
    
    tab = Texttable()
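    for row in data.asList()[0][1:]:
        row = dict(row)
        # Column order follows dict order: insertion order on Python 3.7+,
        # arbitrary on the Python 2 run that produced the table below.
        tab.header(list(row.keys()))
        tab.add_row(list(row.values()))
    print(tab.draw())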
    
    +---------+--------+----+-------+-----+---------+
    | timesig | keysig | st | pitch | dur | fermata |
    +=========+========+====+=======+=====+=========+
    | 12      | 1      | 8  | 67    | 4   | 0       |
    +---------+--------+----+-------+-----+---------+
    | 12      | 1      | 12 | 67    | 8   | 0       |
    +---------+--------+----+-------+-----+---------+
    

    To convert that data back to Lisp notation:

    def lisp(x):
        return '(%s)' % ' '.join(lisp(y) for y in x) if isinstance(x, list) else x
    
    # asList() is needed because lisp() tests for plain Python lists.
    d = lisp(data.asList()[0])
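
    As a quick check of the round trip, here is the same function applied to a small hand-written nested list. The sample data is shortened and made up for illustration; it assumes every atom is already a string, as in the pyparsing result.

```python
def lisp(x):
    # Lists become parenthesised, space-separated forms; atoms are returned unchanged.
    if isinstance(x, list):
        return "(%s)" % " ".join(lisp(y) for y in x)
    return x

nested = ['1', [['st', '8'], ['pitch', '67']]]
text = lisp(nested)
print(text)  # (1 ((st 8) (pitch 67)))
```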
    
  • 2021-01-02 15:51

    Separate it into pairs with a regular expression:

    In [1]: import re
    
    In [2]: txt = '(((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))'
    
    In [3]: [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
    Out[3]: [['st', '8'], ['pitch', '67'], ['dur', '4'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0'], ['st', '12'], ['pitch', '67'], ['dur', '8'], ['keysig', '1'], ['timesig', '12'], ['fermata', '0']]
    

    Then make it into a dictionary:

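    # data is the list of [name, value] pairs produced above.
    data = [p.split() for p in re.findall(r'\w+\s+\d+', txt)]
    dct = {}
    for name, value in data:
        # Collect every value seen for a name, in order of appearance.
        dct.setdefault(name, []).append(value)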
    

    The result:

    In [10]: dct
    Out[10]: {'timesig': ['12', '12'], 'keysig': ['1', '1'], 'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'], 'fermata': ['0', '0']}
    

    Printing:

    print('time pitch duration keysig timesig fermata')
    for t in range(len(dct['st'])):
        print(dct['st'][t], dct['pitch'][t], dct['dur'][t],
              dct['keysig'][t], dct['timesig'][t], dct['fermata'][t])
    

    Proper formatting is left as an exercise for the reader...
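
    One possible way to do that formatting is to pad each column to its widest entry. This is only a sketch: `format_table` is a hypothetical helper, and it assumes every column in the dictionary holds the same number of values.

```python
def format_table(dct, columns):
    """Return the chosen columns of dct as aligned, space-separated text lines."""
    n_rows = len(dct[columns[0]])
    rows = [columns] + [[dct[c][i] for c in columns] for i in range(n_rows)]
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(str(row[j])) for row in rows) for j in range(len(columns))]
    return "\n".join(
        " ".join(str(cell).ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )

dct = {'st': ['8', '12'], 'pitch': ['67', '67'], 'dur': ['4', '8'],
       'keysig': ['1', '1'], 'timesig': ['12', '12'], 'fermata': ['0', '0']}
columns = ['st', 'pitch', 'dur', 'keysig', 'timesig', 'fermata']
print(format_table(dct, columns))
```

    Passing the column list explicitly fixes the output order, so it does not depend on the dictionary's ordering.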

  • 2021-01-02 15:55

    If you know that the data is correct and the format is uniform (it seems so at first sight), and if you need just this data and don't need to solve the general problem, then why not just replace every non-numeric character with a space and then use split?

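    import re

    with open("chorales.lisp") as f:
        lines = f.read().split("\n")
    # Keep only digits and minus signs; everything else becomes a space.
    lines = [re.sub("[^-0-9]+", " ", x) for x in lines]
    for line in lines:
        values = list(map(int, line.split()))  # list() so len() and slicing work on Python 3
        i = 1  # first element is chorale number
        while i < len(values):
            st, pitch, dur, keysig, timesig, fermata = values[i:i+6]
            i += 6
            # ... your processing goes here ...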
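
    The same idea can be wrapped in a small function and tried on a single line without touching the file. This is a sketch under the same assumptions as above (uniform format, six fields per note); `rows_from_line` and `FIELDS` are hypothetical names introduced for illustration.

```python
import re

FIELDS = ("st", "pitch", "dur", "keysig", "timesig", "fermata")

def rows_from_line(line):
    """Return (chorale_number, list of 6-value tuples) for one line of the file."""
    numbers = [int(tok) for tok in re.sub(r"[^-0-9]+", " ", line).split()]
    if not numbers:
        return None, []  # blank line
    chorale, values = numbers[0], numbers[1:]
    step = len(FIELDS)
    if len(values) % step != 0:
        raise ValueError("expected a multiple of %d values" % step)
    return chorale, [tuple(values[i:i + step]) for i in range(0, len(values), step)]

sample = ("(1 ((st 8) (pitch 67) (dur 4) (keysig 1) (timesig 12) (fermata 0))"
          "((st 12) (pitch 67) (dur 8) (keysig 1) (timesig 12) (fermata 0)))")
chorale, rows = rows_from_line(sample)
print(" ".join(FIELDS))
for row in rows:
    print(" ".join(str(v) for v in row))
```

    Checking that the value count is a multiple of six makes a malformed line fail loudly instead of silently shifting columns.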
    