我在下载的udacity中教程时,字幕和视频是分离的,对于英文还无法完全听懂的我来说,字幕还是比较重要.不想看解释的可直接跳到最后复制代码运行即可.
查看了vtt和srt的区别,使用记事本打开vtt和srt,发现主要有两个
- 首行多了 WEBVTT\n\n 标识符
- 标点格式区别,vtt内部的"."在srt中为","
流程图:
基于python写了一个简单的脚本对其进行批量修改
-
1 引入依赖
-
- os获取文件信息
- sys获取命令行输入args
- re对获取的文件内容进行匹配或更换
import os
import sys
import re
-
2 定义主函数
-
if __name__ == '__main__': args = sys.argv print(args) if os.path.isdir(args[1]): file_list = get_file_name(args[1], ".vtt") for file in file_list: vtt2srt(file) elif os.path.isfile(args[1]): vtt2srt(args[1]) else: print("arg[0] should be file name or dir")
-
3 定义获取文件名称函数get_file_name
-
def get_file_name(dir, file_extension): f_list = os.listdir(dir) result_list = [] for file_name in f_list: if os.path.splitext(file_name)[1] == file_extension: result_list.append(os.path.join(dir, file_name)) return result_list
-
4 定义转换逻辑
-
def vtt2srt(file_name): content = open(file_name, "r", encoding="utf-8").read() # 删除WEBVTT行 content = re.sub("WEBVTT\n\n",'',content) # 替换“.”为“,” content = re.sub("(\d{2}:\d{2}:\d{2}).(\d{3})", lambda m: m.group(1) + ',' + m.group(2), content) output_file = os.path.splitext(file_name)[0] + '.srt' open(output_file, "w", encoding="utf-8").write(content) def srt2vtt(file_name): content = open(file_name, "r", encoding="utf-8").read() # 添加WEBVTT行 content = "WEBVTT\n\n" + content # 替换“,”为“.” content = re.sub("(\d{2}:\d{2}:\d{2}),(\d{3})", lambda m: m.group(1) + '.' + m.group(2), content) output_file = os.path.splitext(file_name)[0] + '.vtt' open(output_file, "w", encoding="utf-8").write(content)
-
5 完整代码
-
import os import sys import re def get_file_name(dir, file_extension): f_list = os.listdir(dir) result_list = [] for file_name in f_list: if os.path.splitext(file_name)[1] == file_extension: result_list.append(os.path.join(dir, file_name)) return result_list def vtt2srt(file_name): content = open(file_name, "r", encoding="utf-8").read() # 删除WEBVTT行 content = re.sub("WEBVTT\n\n",'',content) # 替换“.”为“,” content = re.sub("(\d{2}:\d{2}:\d{2}).(\d{3})", lambda m: m.group(1) + ',' + m.group(2), content) output_file = os.path.splitext(file_name)[0] + '.srt' open(output_file, "w", encoding="utf-8").write(content) def srt2vtt(file_name): content = open(file_name, "r", encoding="utf-8").read() # 添加WEBVTT行 content = "WEBVTT\n\n" + content # 替换“,”为“.” content = re.sub("(\d{2}:\d{2}:\d{2}),(\d{3})", lambda m: m.group(1) + '.' + m.group(2), content) output_file = os.path.splitext(file_name)[0] + '.vtt' open(output_file, "w", encoding="utf-8").write(content) if __name__ == '__main__': args = sys.argv if os.path.isdir(args[1]): file_list = get_file_name(args[1], ".vtt") for file in file_list: vtt2srt(file) elif os.path.isfile(args[1]): vtt2srt(args[1]) print('done') else: print("arg[0] should be file name or dir")
注意:
-
1 为避免路径错误,请使用文件夹的绝对路径
- 代码基于python3.x
-
来源:oschina
链接:https://my.oschina.net/u/4398995/blog/4011358