Parsing files (ics/icalendar) using Python

感情败类 2020-11-29 00:30

I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.

5 answers
  • 2020-11-29 00:36

    You could probably also use the vobject module for this: http://pypi.python.org/pypi/vobject

    If you have a sample.ics file, you can read its contents like so:

    import vobject
    
    # read the data from the file
    with open("sample.ics") as f:
        data = f.read()
    
    # parse the top-level component (the VCALENDAR) with vobject
    cal = vobject.readOne(data)
    
    # Get Summary (cal.vevent is the first VEVENT in the calendar)
    print('Summary: ', cal.vevent.summary.valueRepr())
    # Get Description
    print('Description: ', cal.vevent.description.valueRepr())
    
    # Get Time
    print('Time (as a datetime object): ', cal.vevent.dtstart.value)
    print('Time (as a string): ', cal.vevent.dtstart.valueRepr())
    
  • 2020-11-29 00:44

    New to Python; the answers above were very helpful, so I wanted to post a more complete sample.

    # ics to csv example
    # dependency: https://pypi.org/project/vobject/
    
    import csv
    import vobject
    
    
    def value_of(component, name):
        # vobject raises AttributeError for a missing property, so fall
        # back to an empty string when an event lacks e.g. ATTENDEE
        prop = getattr(component, name, None)
        return prop.valueRepr() if prop is not None else ''
    
    
    # read the data from the file
    with open("sample.ics") as ics_in:
        data = ics_in.read()
    
    # newline='' is what the csv module expects for files it writes
    with open('sample.csv', mode='w', newline='') as csv_out:
        csv_writer = csv.writer(csv_out, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        csv_writer.writerow(['WHAT', 'WHO', 'FROM', 'TO', 'DESCRIPTION'])
    
        # iterate through the contents
        for cal in vobject.readComponents(data):
            for component in cal.components():
                if component.name == "VEVENT":
                    # write to csv (only the first ATTENDEE of each event is kept)
                    csv_writer.writerow([value_of(component, field) for field in
                                         ('summary', 'attendee', 'dtstart', 'dtend', 'description')])
    
    
  • 2020-11-29 00:54

    Four years later, and understanding the ICS format a bit better: if those were the only fields I needed, I'd just use the native string methods:

    import io
    
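    # Probably not a valid .ics file (there is no BEGIN:VEVENT), but we don't
    # really care for the example; it works fine regardless.
    # Note that each \n in this (non-raw) string becomes a real line break,
    # so the DESCRIPTION value spans several lines.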
    file = io.StringIO('''
    BEGIN:VCALENDAR
    X-LOTUS-CHARSET:UTF-8
    VERSION:2.0
    DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n
    
    SUMMARY:smart energy management
    LOCATION:8778/92050462
    DTSTART;TZID="India":20100629T110000
    DTEND;TZID="India":20100629T120000
    TRANSP:OPAQUE
    DTSTAMP:20100713T071037Z
    CLASS:PUBLIC
    SUMMARY:meeting
    UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
    X-LOTUS-UPDATE-SEQ:1
    X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
    X-LOTUS-NOTESVERSION:2
    X-LOTUS-APPTTYPE:0
    X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
    END:VEVENT
    '''.strip())
    
    parsing = False
    for line in file:
        line = line.rstrip('\n')
        field, _, data = line.partition(':')
        if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
            # start of a field we want
            parsing = True
            print(field)
            print('\t' + data)
        elif parsing and not data:
            # a line without a value (no ':') continues the current field
            print('\t' + field)
        else:
            parsing = False
    

    Storing the data and parsing the datetime are left as an exercise for the reader (DTSTAMP is always in UTC).
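
For the datetime half of that exercise, a minimal standard-library sketch, assuming the value uses the UTC `YYYYMMDDTHHMMSSZ` form that RFC 5545 requires for DTSTAMP. Values carrying a `TZID` parameter (like the DTSTART lines in the sample) need real time-zone handling, which this skips. The helper name `parse_utc_stamp` is made up for the example:

```python
from datetime import datetime, timezone


def parse_utc_stamp(value):
    # hypothetical helper: DTSTAMP values look like 20100713T071037Z,
    # and the trailing Z means the time is in UTC
    naive = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    return naive.replace(tzinfo=timezone.utc)


stamp = parse_utc_stamp("20100713T071037Z")
print(stamp.isoformat())  # 2010-07-13T07:10:37+00:00
```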

    Old answer below:


    You could use a regex:

    import re
    
    with open("sample.ics") as f:
        text = f.read()  # your text
    print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
    print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
    print(re.search("DTSTAMP:.*?:", text, re.DOTALL).group())
    

    Each match still includes the field name and the name of the following field. It's probably possible to skip those first and last words in the regex itself; I'm just not sure how. You could do it this way, though:

    print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1]))
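
One way to leave the field names out of the match is a capturing group anchored to line boundaries with `re.MULTILINE`, so that `.*?` stops at the end of the line instead of running on to the next colon. This is a sketch that assumes each field fits on one physical line (folded continuation lines starting with a space are not joined) and, like `re.search` above, returns only the first occurrence. The helper name `field_value` is hypothetical:

```python
import re

text = "\r\n".join([
    "BEGIN:VEVENT",
    "SUMMARY:smart energy management",
    "DTSTAMP:20100713T071037Z",
    "END:VEVENT",
])


def field_value(name, text):
    # ^ and $ match at every line boundary because of re.MULTILINE, and
    # without re.DOTALL '.' cannot cross a newline; the optional \r covers
    # files with CRLF line endings
    match = re.search(rf"^{re.escape(name)}:(.*?)\r?$", text, re.MULTILINE)
    # group(1) holds only the value, without the surrounding field names
    return match.group(1) if match else None


print(field_value("SUMMARY", text))
print(field_value("DTSTAMP", text))
```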
    
  • 2020-11-29 01:00

    The icalendar package looks nice.

    For instance, to write a file:

    from icalendar import Calendar, Event
    from datetime import datetime
    from pytz import UTC # timezone
    
    cal = Calendar()
    cal.add('prodid', '-//My calendar product//mxm.dk//')
    cal.add('version', '2.0')
    
    event = Event()
    event.add('summary', 'Python meeting about calendaring')
    event.add('dtstart', datetime(2005,4,4,8,0,0,tzinfo=UTC))
    event.add('dtend', datetime(2005,4,4,10,0,0,tzinfo=UTC))
    event.add('dtstamp', datetime(2005,4,4,0,10,0,tzinfo=UTC))
    event['uid'] = '20050115T101010/27346262376@mxm.dk'
    event.add('priority', 5)
    
    cal.add_component(event)
    
    with open('example.ics', 'wb') as f:
        f.write(cal.to_ical())
    

    Tadaaa, you get this file:

    BEGIN:VCALENDAR
    PRODID:-//My calendar product//mxm.dk//
    VERSION:2.0
    BEGIN:VEVENT
    DTEND;VALUE=DATE:20050404T100000Z
    DTSTAMP;VALUE=DATE:20050404T001000Z
    DTSTART;VALUE=DATE:20050404T080000Z
    PRIORITY:5
    SUMMARY:Python meeting about calendaring
    UID:20050115T101010/27346262376@mxm.dk
    END:VEVENT
    END:VCALENDAR
    

    But what lies in this file?

    with open('example.ics', 'rb') as g:
        gcal = Calendar.from_ical(g.read())
    for component in gcal.walk():
        print(component.name)
    

    You can see it easily:

    >>> 
    VCALENDAR
    VEVENT
    >>> 
    

    What about parsing the event data?

    with open('example.ics', 'rb') as g:
        gcal = Calendar.from_ical(g.read())
    for component in gcal.walk():
        if component.name == "VEVENT":
            print(component.get('summary'))
            print(component.get('dtstart'))
            print(component.get('dtend'))
            print(component.get('dtstamp'))
    

    Now you get:

    >>> 
    Python meeting about calendaring
    20050404T080000Z
    20050404T100000Z
    20050404T001000Z
    >>> 
    
  • 2020-11-29 01:00

    I'd parse line by line and search for your terms, then take the index where each one is found and extract it plus X more characters (however many you think you'll need). Then parse that much smaller string to get what you need.
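
A standard-library sketch of the line-by-line idea. Instead of extracting a fixed number of characters, it first unfolds continuation lines (RFC 5545 folds long lines by starting the continuation with a space or tab), then splits each line at the first `:` and drops any `;PARAM=...` part of the name. The helper names `unfold` and `parse_events` are made up for this example; it assumes no quoted parameter value contains a `:`, ignores properties of nested components such as VALARM, and leaves text escapes like `\n` and `\,` undecoded and time zones unresolved:

```python
def unfold(ics_text):
    """Join RFC 5545 folded lines back into logical content lines."""
    lines = []
    for raw in ics_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            # continuation line: drop the single leading space/tab
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_events(ics_text, wanted=("SUMMARY", "DESCRIPTION", "DTSTART")):
    """Return one dict per VEVENT holding the raw values of `wanted`."""
    events, current, depth = [], None, 0
    for line in unfold(ics_text):
        # 'NAME;PARAM=x:value' -> head 'NAME;PARAM=x', value 'value'
        head, _, value = line.partition(":")
        name = head.split(";", 1)[0].upper()
        if name == "BEGIN":
            if value.upper() == "VEVENT":
                current, depth = {}, 0
            elif current is not None:
                depth += 1  # e.g. a VALARM nested inside the event
        elif name == "END":
            if value.upper() == "VEVENT" and current is not None:
                events.append(current)
                current = None
            elif current is not None:
                depth -= 1
        elif current is not None and depth == 0 and name in wanted:
            current[name] = value
    return events


sample = "\r\n".join([
    "BEGIN:VCALENDAR",
    "BEGIN:VEVENT",
    "SUMMARY:smart energy management",
    "DESCRIPTION:Emails\\nDarlene\\n",
    " Murphy",
    'DTSTART;TZID="India":20100629T110000',
    "BEGIN:VALARM",
    "DESCRIPTION:Reminder",
    "END:VALARM",
    "END:VEVENT",
    "END:VCALENDAR",
])

events = parse_events(sample)
for event in events:
    print(event.get("SUMMARY"), event.get("DTSTART"))
```

Keeping the raw strings and leaving conversion to the caller keeps the parser small; the DTSTART value here is local to the `India` TZID, not UTC.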
