Parsing SRT subtitles

粉色の甜心 2021-02-08 12:31

I want to parse SRT subtitles:

    1
    00:00:12,815 --> 00:00:14,509
    Chlapi, jak to jde s
    těma pracovníma světlama?.

    2
    00:00:14,815 -->          


        
6 answers
  •  后悔当初
    2021-02-08 12:52

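    Here's a snippet I wrote that converts the text of an SRT file into a list of dictionaries, one per subtitle, each holding the start and end times in seconds and the subtitle text: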

    import re

    def srt_time_to_seconds(time):
        # "HH:MM:SS,mmm" -> seconds as a float (1 hour = 3600 s)
        hms, millis = time.split(',')
        hours, minutes, seconds = hms.split(':')
        return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float(millis) / 1000

    def srt_to_dict(srtText):
        subs = []
        # Normalise Windows line endings, then split into blocks on blank lines
        # (extra blank lines and trailing whitespace are tolerated)
        for s in re.split(r'\n\s*\n', srtText.replace('\r\n', '\n').strip()):
            st = s.split('\n')
            # Each block: index line, timing line, then one or more text lines
            if len(st) >= 3:
                start, end = st[1].split(' --> ')
                subs.append({'start': srt_time_to_seconds(start.strip()),
                             'end': srt_time_to_seconds(end.strip()),
                             'text': '\n'.join(st[2:])})
        return subs
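The timestamp conversion is plain arithmetic: an SRT time HH:MM:SS,mmm is hours × 3600 + minutes × 60 + seconds + milliseconds / 1000, so 01:02:03,500 works out to 3600 + 120 + 3 + 0.5 = 3723.5 s. Below is a minimal standalone sketch of just that step; the name timestamp_to_seconds and the regex that checks the field widths are hypothetical additions, not part of the snippet above.

```python
import re

# Hypothetical pattern for an SRT timestamp such as "00:00:12,815"
SRT_TIMESTAMP = re.compile(r'(\d+):(\d{2}):(\d{2}),(\d{3})')


def timestamp_to_seconds(ts):
    """Convert 'HH:MM:SS,mmm' to seconds as a float."""
    m = SRT_TIMESTAMP.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f'not an SRT timestamp: {ts!r}')
    hours, minutes, seconds, millis = (int(g) for g in m.groups())
    # 1 hour = 60 * 60 = 3600 seconds
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


print(timestamp_to_seconds('01:02:03,500'))  # 3723.5
print(timestamp_to_seconds('00:00:12,815'))
```

Validating the timestamp with fullmatch means a malformed timing line raises a clear ValueError instead of silently producing a wrong number.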

    Usage:

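    # assumes the snippet above is saved as srt_to_dict.py
    from srt_to_dict import srt_to_dict

    with open('test.srt', encoding='utf-8-sig') as f:  # assumes UTF-8; utf-8-sig also drops a BOM
        srtText = f.read()
    print(srt_to_dict(srtText))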
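If you would rather match whole blocks with a single regular expression instead of splitting on blank lines, a sketch along these lines should work. It is an alternative technique, not the snippet above; the names parse_srt, BLOCK and _to_seconds are hypothetical, and it assumes every block has an index line, an HH:MM:SS,mmm --> HH:MM:SS,mmm timing line and at least one text line.

```python
import re

# Hypothetical pattern for one block: index line, timing line (anything after
# the end time, e.g. position hints, is ignored), then the text up to the
# next blank line or the end of the input.
BLOCK = re.compile(
    r'^\d+[ \t]*\n'
    r'(\d+):(\d\d):(\d\d),(\d{3})[ \t]*-->[ \t]*'
    r'(\d+):(\d\d):(\d\d),(\d{3})[^\n]*\n'
    r'(.*?)(?=\n[ \t]*\n|\Z)',
    re.MULTILINE | re.DOTALL,
)


def _to_seconds(h, m, s, ms):
    # Four captured timestamp fields -> seconds as a float
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    # Drop a UTF-8 BOM, normalise Windows line endings, trim outer blank lines
    text = text.lstrip('\ufeff').replace('\r\n', '\n').strip()
    subs = []
    for m in BLOCK.finditer(text):
        g = m.groups()
        subs.append({'start': _to_seconds(*g[0:4]),
                     'end': _to_seconds(*g[4:8]),
                     'text': g[8].strip()})
    return subs
```

The lazy (.*?) group combined with the lookahead stops each subtitle's text at the first blank line, so multi-line subtitles stay together while runs of several blank lines between blocks are skipped.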
    
