Parsing HLS m3u8 file using regular expressions

一个人的身影 2021-02-10 09:26

I want to parse an HLS master m3u8 file and get the bandwidth, resolution and file name from it. Currently I am using string parsing to search the string for some patterns and do the s…

2 Answers
  •  礼貌的吻别
    2021-02-10 10:11

    You could try something like this:

        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Group 1 captures the BANDWIDTH value, group 2 the RESOLUTION value
        final Pattern pattern = Pattern.compile("^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");
    
        Matcher matcher = pattern.matcher("#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234");
        String bandwidth = "";
        String resolution = "";
    
        if (matcher.find()) {
            bandwidth = matcher.group(1);   // "476416"
            resolution = matcher.group(2);  // "416x234"
        }
    

    This sets bandwidth to "476416" and resolution to "416x234", both as Strings.

    I haven't tried this on an Android device or emulator, but judging from the link you sent and the Android API, it should behave the same as the plain Java above.

    The regex matches lines that start with #EXT-X-STREAM-INF: and contain BANDWIDTH= followed by digits and, later on the same line, RESOLUTION= followed by digits and x characters. Those two values are captured in capturing groups 1 and 2, which is why matcher.group(1) and matcher.group(2) return them.
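    The question also asks for the file name, which the regex alone does not give you. In an HLS master playlist (RFC 8216), the URI of each variant is on the line that follows its #EXT-X-STREAM-INF tag. Below is a minimal sketch, assuming the whole playlist is already loaded into a String; VariantParser and Variant are hypothetical names. It reuses the pattern above unchanged, so a variant without RESOLUTION is skipped together with its URI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects BANDWIDTH, RESOLUTION and URI for each variant
// of an HLS master playlist that is already held in a String.
class VariantParser {

    // One variant stream entry.
    static final class Variant {
        final String bandwidth;
        final String resolution;
        final String uri;

        Variant(String bandwidth, String resolution, String uri) {
            this.bandwidth = bandwidth;
            this.resolution = resolution;
            this.uri = uri;
        }
    }

    // Same pattern as in the answer: requires both BANDWIDTH and RESOLUTION.
    private static final Pattern STREAM_INF = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");

    static List<Variant> parse(String playlist) {
        List<Variant> variants = new ArrayList<>();
        String bandwidth = null;
        String resolution = null;
        boolean pending = false; // true after a matching tag line, until its URI line is seen

        for (String raw : playlist.split("\\R")) { // \R matches \n, \r\n, \r, ...
            String line = raw.trim();
            Matcher matcher = STREAM_INF.matcher(line);
            if (matcher.find()) {
                bandwidth = matcher.group(1);
                resolution = matcher.group(2);
                pending = true;
            } else if (pending && !line.isEmpty() && !line.startsWith("#")) {
                // First non-blank, non-tag line after the tag: the variant's URI (file name).
                variants.add(new Variant(bandwidth, resolution, line));
                pending = false;
            }
        }
        return variants;
    }
}
```

    The URI is returned exactly as written in the playlist; if it is relative, you would still have to resolve it against the playlist's own URL.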

    Edit:

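    If RESOLUTION isn't always present, you can make that portion optional. The .* in front of RESOLUTION has to go inside the optional group; otherwise the greedy .* consumes the rest of the line, the optional group matches nothing, and resolution is always null:

    "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?.*"
    

    In cases where only BANDWIDTH is present, matcher.group(2) returns null, so resolution would be null.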
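    To check the behaviour of the optional group, here is a small sketch (class and method names are hypothetical) that runs two patterns against the same lines: one with .* in front of the optional (?:RESOLUTION=...)? group, and one with the .* moved inside it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares two ways of making RESOLUTION optional.
class OptionalResolutionDemo {

    // .* before the optional group: the greedy .* eats ",RESOLUTION=..." first,
    // the optional group then matches the empty string, and group 2 stays null.
    static final Pattern OUTSIDE = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*(?:RESOLUTION=([\\dx]+))?.*");

    // .* inside the optional group: the group can only succeed by reaching RESOLUTION=.
    static final Pattern INSIDE = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+)(?:.*RESOLUTION=([\\dx]+))?.*");

    // Returns {bandwidth, resolution}, or null if the line does not match at all.
    // resolution is null when group 2 did not take part in the match.
    static String[] extract(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String withRes = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416,RESOLUTION=416x234";
        String withoutRes = "#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=476416";
        System.out.println(extract(OUTSIDE, withRes)[1]);   // null
        System.out.println(extract(INSIDE, withRes)[1]);    // 416x234
        System.out.println(extract(INSIDE, withoutRes)[1]); // null
    }
}
```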

    Edit2:

    ? makes the preceding item optional, and (?:___) is a passive (non-capturing) group, as opposed to a capturing group (___). So (?:___)? is an optional passive group: yes, everything inside it is optional as a unit.
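    One practical reason to prefer the passive group is group numbering: (?:...) does not get a number, so the RESOLUTION value stays in group 2. A small sketch (the class name is hypothetical) using Matcher.groupCount() to show the difference:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shows that (?:...) groups are not numbered, while (...) groups are.
class GroupNumberingDemo {

    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Passive group around RESOLUTION: only (\d+) and ([\dx]+) are numbered.
        System.out.println(countGroups("BANDWIDTH=(\\d+)(?:,RESOLUTION=([\\dx]+))?")); // 2
        // Capturing group around RESOLUTION: it becomes group 2 and the value moves to group 3.
        System.out.println(countGroups("BANDWIDTH=(\\d+)(,RESOLUTION=([\\dx]+))?"));   // 3

        Matcher m = Pattern.compile("BANDWIDTH=(\\d+)(,RESOLUTION=([\\dx]+))?")
                .matcher("BANDWIDTH=476416,RESOLUTION=416x234");
        if (m.matches()) {
            System.out.println(m.group(2)); // ,RESOLUTION=416x234
            System.out.println(m.group(3)); // 416x234
        }
    }
}
```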

    A . matches any single character (other than a line terminator, by default), and * means the preceding item is repeated zero or more times, so .* matches zero or more characters. We need it to consume whatever lies between the parts we are matching, e.g. anything between #EXT-X-STREAM-INF: and BANDWIDTH. There are many ways of doing this, but .* is the most generic/broad one.
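    Being that broad has a cost: .* is greedy, so it first takes the rest of the line and then backtracks, which means .*BANDWIDTH= settles on the last BANDWIDTH= on the line. RFC 8216 also defines an AVERAGE-BANDWIDTH attribute for #EXT-X-STREAM-INF, so on a line carrying both, the answer's pattern captures the average value. The sketch below demonstrates this and shows one possible workaround (an assumption, not from the answer): require BANDWIDTH= to start the attribute list or to follow a comma. It ignores the case of a quoted attribute value containing ",BANDWIDTH=".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demonstrates greedy .* picking the last "BANDWIDTH=" on the line.
class GreedyBandwidthDemo {

    // The answer's pattern: .* backtracks only as far as the LAST "BANDWIDTH=".
    static final Pattern GREEDY = Pattern.compile(
            "^#EXT-X-STREAM-INF:.*BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");

    // Assumed workaround: BANDWIDTH= must start the attribute list or follow a comma,
    // so "AVERAGE-BANDWIDTH=" (preceded by '-') is never taken for it.
    static final Pattern ATTRIBUTE_START = Pattern.compile(
            "^#EXT-X-STREAM-INF:(?:.*,)?BANDWIDTH=(\\d+).*RESOLUTION=([\\dx]+).*");

    static String bandwidth(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "#EXT-X-STREAM-INF:BANDWIDTH=1280000,AVERAGE-BANDWIDTH=1000000,RESOLUTION=640x360";
        System.out.println(bandwidth(GREEDY, line));          // 1000000
        System.out.println(bandwidth(ATTRIBUTE_START, line)); // 1280000
    }
}
```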

    \d is a predefined character class that matches a digit (0-9). Because the pattern is written as a Java string literal, the backslash has to be doubled: \d on its own is not a valid Java escape sequence, so the compiler rejects it as an illegal escape character. The literal "\\d" compiles to the two characters \d, which is what ends up in the string passed to Pattern.compile.
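    A quick way to see what the compiler hands to the regex engine is to print the literal. A minimal sketch (class, field and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Shows what the Java compiler turns the literal "\\d+" into.
class EscapingDemo {

    // In source: "\\d+". At runtime the string holds three characters: backslash, d, plus.
    static final String DIGITS = "\\d+";

    static boolean allDigits(String s) {
        return Pattern.matches(DIGITS, s);
    }

    public static void main(String[] args) {
        System.out.println(DIGITS);               // \d+
        System.out.println(DIGITS.length());      // 3
        System.out.println(allDigits("476416"));  // true
        System.out.println(allDigits("416x234")); // false
        // String broken = "\d+";  // would not compile: illegal escape character
    }
}
```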

    [\dx]+ means one or more (+) characters from the set 0-9 and x. [\dx] without the + would match only a single character from that set.
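    Note that [\dx]+ only restricts which characters may appear, so it also accepts values such as "4x4x4". If you need the width and height as numbers, a stricter pattern can capture the two digit runs separately. This is a sketch of one option, not part of the original answer; ResolutionParser is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a RESOLUTION value such as "416x234" into width and height.
class ResolutionParser {

    // Exactly: digits, 'x', digits (unlike [\dx]+, which would also accept "4x4x4").
    private static final Pattern RESOLUTION = Pattern.compile("(\\d+)x(\\d+)");

    // Returns {width, height}, or null if the value is not of the form <digits>x<digits>.
    static int[] parse(String value) {
        Matcher matcher = RESOLUTION.matcher(value);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {Integer.parseInt(matcher.group(1)), Integer.parseInt(matcher.group(2))};
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[\\dx]+", "4x4x4"));  // true
        System.out.println(Pattern.matches("[\\dx]", "416x234")); // false: one character only
        int[] wh = parse("416x234");
        System.out.println(wh[0] + " x " + wh[1]);                // 416 x 234
        System.out.println(parse("4x4x4") == null);               // true
    }
}
```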

    If you are interested in regex, you could check out regular-expressions.info and/or regexone.com, where you will find much more in-depth answers to all your questions.
