Question
I'm trying to load and parse a .srt subtitle file in VB.NET. It is a very simple text file, but I'm having difficulty.
Here is the structure:
1
00:00:01,600 --> 00:00:04,200
English (US)
2
00:00:05,900 --> 00:00:07,999
This is a subtitle in American English
Sometimes subtitles have 2 lines
3
00:00:10,000 --> 00:00:14,000
Adding subtitles is very easy to do
- A number
- Followed by the start and end time
- Followed by the text, which can be one or multiple lines
What I'm really trying to do is find the length in time of the subtitle file, meaning the last end time in the file. I'm creating a program that hardcodes subtitles into a video file, so I need to know how long the video should be based on the length of the subtitle file.
The outcome I'm looking for: after reading a .srt file, I want to know its "length" in time, meaning the last time code. In the example above that would be 00:00:14,000, the last moment a subtitle is displayed.
Answer 1:
You can do it easily with LINQ and File.ReadLines:
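Imports System.IO
Imports System.Linq
'...
Dim SrtTimeCode As String = ""
'Last line containing the " --> " separator, i.e. the last time-code line
Dim lastTimeLine As String = File.ReadLines(FILE_NAME) _
    .LastOrDefault(Function(s) s.Contains(" --> "))
If lastTimeLine IsNot Nothing Then
    'The part after the separator is the end time, e.g. 00:00:14,000
    SrtTimeCode = lastTimeLine.Split(New String() {" --> "}, StringSplitOptions.None)(1)
End If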
Note that File.ReadLines keeps only the current line in memory when enumerating the lines. It does not store the whole file, so it scales better with big files.
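If the end time is needed as a duration rather than as a string, it can be converted to a TimeSpan. The following is only a sketch: the module name SrtDuration and the helper ParseSrtTime are hypothetical and not part of the answer above, and it assumes the time code always has the form hh:mm:ss,fff with fewer than 24 hours.

```vbnet
Imports System
Imports System.Globalization

Module SrtDuration
    ' Hypothetical helper (an assumption, not from the answer above):
    ' converts an SRT time code such as "00:00:14,000" into a TimeSpan.
    Function ParseSrtTime(timeCode As String) As TimeSpan
        ' In a custom TimeSpan format string the literal ":" and ","
        ' separators must be escaped with a backslash.
        ' "hh" only accepts 00-23, so this assumes files shorter than 24 hours.
        Return TimeSpan.ParseExact(timeCode.Trim(), "hh\:mm\:ss\,fff",
                                   CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim length As TimeSpan = ParseSrtTime("00:00:14,000")
        Console.WriteLine(length.TotalSeconds) ' prints 14
    End Sub
End Module
```

A TimeSpan is easier to compare or pass on than the raw string, for example when the video length has to be given in seconds.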
Answer 2:
This can also be done with regular expressions:
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions
'...
Private Sub TheCaller()
    Dim srtFile As String = "English.srt"
    Dim endTime = "Not Found!"
    If File.Exists(srtFile) Then
        '">" plus one character, then hh:mm:ss,fff; \s? tolerates a space around the comma
        Dim patt As String = ">.(\d\d:\d\d:\d\d\s?,\s?\d{3})"
        'Get the last match, --> 00:00:14,000 in your example:
        Dim lastMatch = File.ReadLines(srtFile).
            LastOrDefault(Function(x) Regex.IsMatch(x, patt))
        If lastMatch IsNot Nothing Then
            'Group 1 holds the end time without the leading "> "
            endTime = Regex.Match(lastMatch, patt).Groups(1).Value
        End If
    End If
    Console.WriteLine(endTime)
End Sub
The output is:
00:00:14,000
If you want to get rid of the milliseconds part, then use the following pattern instead:
Dim patt As String = ">.(\d\d:\d\d:\d\d)"
and you will get:
00:00:14
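To try the pattern without a file on disk, the same idea can be run against lines held in memory. This is a minimal sketch, not the answerer's code: the module SrtRegexDemo and the helper LastEndTime are hypothetical names, and the sample lines are taken from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module SrtRegexDemo
    ' Hypothetical helper (an assumption): returns the end time of the last
    ' time-code line without milliseconds, or Nothing if no line matches.
    Function LastEndTime(lines As IEnumerable(Of String)) As String
        Dim patt As String = ">.(\d\d:\d\d:\d\d)"
        Dim lastMatch = lines.LastOrDefault(Function(x) Regex.IsMatch(x, patt))
        If lastMatch Is Nothing Then Return Nothing
        Return Regex.Match(lastMatch, patt).Groups(1).Value
    End Function

    Sub Main()
        ' Sample lines copied from the question's example file
        Dim sample As String() = {
            "2",
            "00:00:05,900 --> 00:00:07,999",
            "This is a subtitle in American English",
            "3",
            "00:00:10,000 --> 00:00:14,000",
            "Adding subtitles is very easy to do"
        }
        Console.WriteLine(LastEndTime(sample)) ' prints 00:00:14
    End Sub
End Module
```

Because the helper accepts any IEnumerable(Of String), the result of File.ReadLines can be passed to it directly.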
Answer 3:
Comments and explanations in-line.
Private Sub OpCode()
    'Using Path.Combine you don't have to worry about whether the backslash is there or not
    Dim theFile1 = Path.Combine(Application.StartupPath(), ListBox1.SelectedItem.ToString())
    'A StreamReader needs to be closed and disposed; File.ReadAllLines opens the file, reads it, and closes it.
    'It returns an array of lines
    Dim lines = File.ReadAllLines(theFile1)
    Dim lastLineIndex = lines.Length - 1
    'You tried to parse the entire line. You only want the first character.
    'Walk backwards to the last line that starts with a digit (the last time-code line,
    'provided the final subtitle text does not itself start with a digit).
    'The Length check skips blank lines, where Substring(0, 1) would throw.
    Do While lastLineIndex >= 0 AndAlso
            (lines(lastLineIndex).Length = 0 OrElse
             Not Integer.TryParse(lines(lastLineIndex).Substring(0, 1), Nothing))
        lastLineIndex -= 1
    Loop
    If lastLineIndex < 0 Then Return 'No time-code line found
    Dim lastLine As String = lines(lastLineIndex)
    'The lower case c tells the compiler that the preceding string is really a Char.
    Dim splitLine = lastLine.Split(">"c)
    'Starting at index 1 because there is a space between > and 0
    Dim SrtEndTimeCode As String = splitLine(1).Substring(1, 12)
    MessageBox.Show(SrtEndTimeCode)
End Sub
Answer 4:
Well I guess I got it - it's probably not the best code, but it works:
Here's what's going on in the code. I have a ListBox with .srt files.
- The code takes the selected .srt file and puts it in a TextBox.
- It parses the text starting with the last line and goes back up to 20 lines (to leave room for extra line breaks at the end of the file, etc.).
- It looks for the first line that holds only an integer (meaning the number of the last subtitle).
- It takes the line after that, which is the time code.
- It takes the part on the right, which is the end time code.
And that is the "length" of the .srt file.
Dim appPath As String = Application.StartupPath() ' app path
Dim theFile1 As String
theFile1 = appPath & "\" & ListBox1.SelectedItem.ToString() ' this is where I have the .srt files
Dim FILE_NAME As String = theFile1
Dim TextLine As String = ""
If System.IO.File.Exists(FILE_NAME) Then
    ' Using closes and disposes the reader when the block ends
    Using objReader As New System.IO.StreamReader(FILE_NAME)
        Do While objReader.Peek() <> -1
            TextLine = TextLine & objReader.ReadLine() & vbNewLine
        Loop
    End Using
    TextBox7.Text = TextLine ' load .srt into textbox
Else
    MessageBox.Show("File Does Not Exist")
End If
Dim SrtTimeCode As String = ""
Dim srtLines As String() = TextBox7.Lines ' read once; the Lines property builds a new array on each access
If srtLines.Any() Then ' only execute if textbox has lines
    ' Check from the end of the text back up to 20 lines for the final subtitle chunk
    For i = 1 To Math.Min(20, srtLines.Length)
        Dim lastLine As String = srtLines(srtLines.Length - i)
        ' i > 1 makes sure there is a line after the number to read the timecode from
        If i > 1 AndAlso Integer.TryParse(lastLine, Nothing) Then ' the number of the last subtitle is found
            SrtTimeCode = srtLines(srtLines.Length - i + 1) ' the last timecode has been found - now it needs to be split
            Exit For
        End If
    Next i
End If
Dim arrowPos As Integer = SrtTimeCode.IndexOf(">"c)
If arrowPos >= 0 Then
    ' Keep the part to the right of "-->" and drop the leading space
    Dim ChoppedSRTTimeCode As String = SrtTimeCode.Substring(arrowPos + 1).Trim()
    Dim commaPos As Integer = ChoppedSRTTimeCode.IndexOf(","c)
    If commaPos >= 0 Then
        ' Drop the milliseconds: keep everything before the comma
        Dim ChoppedSRTTimeCodeFinal As String = ChoppedSRTTimeCode.Substring(0, commaPos)
        MsgBox(ChoppedSRTTimeCodeFinal) ' this is the final timecode parsed
    End If
End If
Source: https://stackoverflow.com/questions/59326128/how-do-i-parse-a-srt-subtitle-file