For the sake of interest I want to convert video durations from YouTube's ISO 8601 format
to seconds. To future-proof my solution, I picked a really long video to test it a
The dateutil module (a third-party package, not a built-in) only supports parsing ISO 8601 dates, not ISO 8601 durations. For that, you can use the "isodate" library (on PyPI at https://pypi.python.org/pypi/isodate; install it with pip). This library has full support for ISO 8601 durations, converting them to datetime.timedelta objects. So once you've imported the library, it's as simple as:
import isodate
dur = isodate.parse_duration('P1W2DT6H21M32S')
print(dur.total_seconds())
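As a quick sanity check that needs no third-party package, the same example duration can be built by hand with the standard library's datetime.timedelta. This is only a sketch: the field values below are the parts of 'P1W2DT6H21M32S' typed in manually, and nothing is parsed.

```python
from datetime import timedelta

# Fields of 'P1W2DT6H21M32S' entered by hand:
# 1 week, 2 days, 6 hours, 21 minutes, 32 seconds.
expected = timedelta(weeks=1, days=2, hours=6, minutes=21, seconds=32)

# total_seconds() returns a float.
total = expected.total_seconds()
print(total)  # 800492.0
```

Any parser for this duration should arrive at the same number, provided it counts the weeks and days as well.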
Here's my answer, which takes 9000's regex solution (thank you - amazing mastery of regex!) and finishes the job for the original poster's YouTube use case, i.e. converting hours, minutes, and seconds to seconds. I used .groups() instead of .groupdict(), followed by a couple of lovingly constructed list comprehensions.
import re

def yt_time(duration="P1W2DT6H21M32S"):
    """
    Converts YouTube duration (ISO 8601)
    into seconds (only hours, minutes and seconds are counted)
    see http://en.wikipedia.org/wiki/ISO_8601#Durations
    """
    ISO_8601 = re.compile(
        r'P'                         # designates a period
        r'(?:(?P<years>\d+)Y)?'      # years
        r'(?:(?P<months>\d+)M)?'     # months
        r'(?:(?P<weeks>\d+)W)?'      # weeks
        r'(?:(?P<days>\d+)D)?'       # days
        r'(?:T'                      # time part must begin with a T
        r'(?:(?P<hours>\d+)H)?'      # hours
        r'(?:(?P<minutes>\d+)M)?'    # minutes
        r'(?:(?P<seconds>\d+)S)?'    # seconds
        r')?')                       # end of time part
    # Convert regex matches into a short list of time units (H, M, S)
    units = list(ISO_8601.match(duration).groups()[-3:])
    # Put list in ascending order (S, M, H) & replace None with 0
    units = list(reversed([int(x) if x is not None else 0 for x in units]))
    # Do the maths; enumerate() gives the position even when two values are equal,
    # which units.index(x) would get wrong (e.g. 'PT1H1M1S')
    return sum([x * 60**i for i, x in enumerate(units)])
Sorry for not posting higher up - still new here and not enough reputation points to add comments.
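For comparison, here is a minimal sketch of the .groupdict() route that the answer above mentions but does not take. It assumes durations with weeks, days and a time part only; years and months are left out because their length in seconds is not fixed. The names DURATION_RE and duration_to_seconds are made up for this example.

```python
import re
from datetime import timedelta

# Same idea as the regex above, minus years and months.
# DURATION_RE is a hypothetical name used only in this sketch.
DURATION_RE = re.compile(
    r'P'
    r'(?:(?P<weeks>\d+)W)?'
    r'(?:(?P<days>\d+)D)?'
    r'(?:T'
    r'(?:(?P<hours>\d+)H)?'
    r'(?:(?P<minutes>\d+)M)?'
    r'(?:(?P<seconds>\d+)S)?'
    r')?'
)

def duration_to_seconds(duration):
    """Hypothetical helper: total seconds for a week/day/time duration."""
    match = DURATION_RE.fullmatch(duration)
    if match is None:
        raise ValueError("unsupported duration: %r" % duration)
    # groupdict() maps each group name to its text, or None if absent.
    parts = {name: int(value or 0)
             for name, value in match.groupdict().items()}
    # The group names were chosen to match timedelta's keyword arguments.
    return int(timedelta(**parts).total_seconds())

print(duration_to_seconds('P1W2DT6H21M32S'))  # 800492
print(duration_to_seconds('PT15M33S'))        # 933
```

Using fullmatch() rather than match() means a string with unsupported parts, such as 'P1Y', raises an error instead of silently returning 0.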
Works on Python 2.7+. Adapted from a JavaScript one-liner for a YouTube v3 question.
import re

def YTDurationToSeconds(duration):
    # each group captures the number together with its unit letter, e.g. '15M'
    match = re.match(r'PT(\d+H)?(\d+M)?(\d+S)?', duration).groups()
    hours = _js_parseInt(match[0]) if match[0] else 0
    minutes = _js_parseInt(match[1]) if match[1] else 0
    seconds = _js_parseInt(match[2]) if match[2] else 0
    return hours * 3600 + minutes * 60 + seconds

# js-like parseInt
# https://gist.github.com/douglasmiranda/2174255
def _js_parseInt(string):
    # keep only the digits, dropping the trailing unit letter
    return int(''.join([x for x in string if x.isdigit()]))

# example output
YTDurationToSeconds(u'PT15M33S')
# 933
Handles the ISO 8601 duration format to the extent YouTube uses it, up to hours.
Extending 9000's answer: apparently YouTube's format uses weeks but not months, which means the total seconds can be computed easily.
I'm not using named groups here because I initially needed this to work with PySpark.
from operator import mul
from itertools import accumulate
import re
from typing import Pattern, List

SECONDS_PER_SECOND: int = 1
SECONDS_PER_MINUTE: int = 60
MINUTES_PER_HOUR: int = 60
HOURS_PER_DAY: int = 24
DAYS_PER_WEEK: int = 7
WEEKS_PER_YEAR: int = 52  # approximation: a year is treated as 52 weeks

# Note: the 'T' is mandatory here, so a date-only duration will not match.
ISO8601_PATTERN: Pattern = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)W)?(?:(\d+)D)?"
    r"T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?"
)

def extract_total_seconds_from_ISO8601(iso8601_duration: str) -> int:
    """Compute duration in seconds from a Youtube ISO8601 duration format. """
    MULTIPLIERS: List[int] = [
        SECONDS_PER_SECOND, SECONDS_PER_MINUTE, MINUTES_PER_HOUR,
        HOURS_PER_DAY, DAYS_PER_WEEK, WEEKS_PER_YEAR
    ]
    # Groups come out as (Y, W, D, H, M, S); missing parts count as 0.
    groups: List[int] = [int(g) if g is not None else 0 for g in
                         ISO8601_PATTERN.match(iso8601_duration).groups()]
    # The running product of MULTIPLIERS gives the seconds per unit, smallest first.
    return sum(g * multiplier for g, multiplier in
               zip(reversed(groups), accumulate(MULTIPLIERS, mul)))
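To make the accumulate() step easier to follow, here is a small worked sketch of the running product on its own. The literal numbers mirror the constants above, and the example values are the parts of 'P1W2DT6H21M32S' written out by hand, seconds first.

```python
from itertools import accumulate
from operator import mul

# Unit sizes, smallest first: s per s, s per min, min per h, h per day,
# days per week, weeks per year (a year is approximated as 52 weeks).
multipliers = (1, 60, 60, 24, 7, 52)

# Running product: seconds in one second, minute, hour, day, week, year.
seconds_per_unit = list(accumulate(multipliers, mul))
print(seconds_per_unit)
# [1, 60, 3600, 86400, 604800, 31449600]

# Worked example for 'P1W2DT6H21M32S', values listed seconds-first
# (32 s, 21 min, 6 h, 2 days, 1 week, 0 years).
values = (32, 21, 6, 2, 1, 0)
total = sum(v * s for v, s in zip(values, seconds_per_unit))
print(total)  # 800492
```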
This works by parsing the input string one character at a time. If the character is a digit, it is appended (string concatenation, not arithmetic addition) to the value currently being parsed. If it is one of 'wdhms', the current value is converted to seconds and assigned to the matching variable (week, day, hour, minute, second), and the value is reset, ready for the next number. Finally, it sums the seconds from the five parsed values.
def ytDurationToSeconds(duration):  # eg P1W2DT6H21M32S
    week = 0
    day = 0
    hour = 0
    minute = 0  # not called 'min', to avoid shadowing the built-in
    sec = 0
    duration = duration.lower()
    value = ''
    for c in duration:
        if c.isdigit():
            value += c
            continue
        elif c == 'p':
            pass
        elif c == 't':
            pass
        elif c == 'w':
            week = int(value) * 604800
        elif c == 'd':
            day = int(value) * 86400
        elif c == 'h':
            hour = int(value) * 3600
        elif c == 'm':
            minute = int(value) * 60
        elif c == 's':
            sec = int(value)
        # any non-digit character ends the current number
        value = ''
    return week + day + hour + minute + sec
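One thing the character loop cannot tell apart is 'M' as months (before the 'T') and 'M' as minutes (after it). A possible variant, sketched here under the assumption that months and years should simply be rejected, keeps a flag for whether the 'T' has been seen. The function name duration_to_seconds_strict is made up for this example.

```python
def duration_to_seconds_strict(duration):
    """Hypothetical variant of the loop above that rejects months/years."""
    # Seconds per designator, kept separately for the date and time parts
    # because 'M' means months before the 'T' and minutes after it.
    date_units = {'W': 604800, 'D': 86400}
    time_units = {'H': 3600, 'M': 60, 'S': 1}

    total = 0
    value = ''
    in_time_part = False
    for c in duration.upper():
        if c.isdigit():
            value += c  # string concatenation, not addition
        elif c == 'P':
            pass
        elif c == 'T':
            in_time_part = True
        else:
            units = time_units if in_time_part else date_units
            if c not in units or not value:
                raise ValueError("cannot convert %r in %r" % (c, duration))
            total += int(value) * units[c]
            value = ''
    return total

print(duration_to_seconds_strict('P1W2DT6H21M32S'))  # 800492
```

With this, 'PT1M' is 60 seconds, while 'P1M' (one month) raises a ValueError instead of being counted as a minute.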
So this is what I came up with - a custom parser to interpret the time:
def durationToSeconds(duration):
    """
    duration - ISO 8601 time format
    examples :
        'P1W2DT6H21M32S' - 1 week, 2 days, 6 hours, 21 mins, 32 secs,
        'PT7M15S' - 7 mins, 15 secs
    """
    # partition() also copes with durations that have no 'T' part
    period, _, time = duration.partition('T')
    period = period.replace('P', '')
    timeD = {}
    # weeks & days (split on the unit letter so multi-digit values work)
    if len(period.split('W')) > 1:
        timeD['weeks'] = int(period.split('W')[0])
        period = period.split('W')[1]
    if len(period.split('D')) > 1:
        timeD['days'] = int(period.split('D')[0])
    # hours, minutes & seconds
    if len(time.split('H')) > 1:
        timeD['hours'] = int(time.split('H')[0])
        time = time.split('H')[1]
    if len(time.split('M')) > 1:
        timeD['minutes'] = int(time.split('M')[0])
        time = time.split('M')[1]
    if len(time.split('S')) > 1:
        timeD['seconds'] = int(time.split('S')[0])
    # convert to seconds
    timeS = (timeD.get('weeks', 0) * (7*24*60*60) +
             timeD.get('days', 0) * (24*60*60) +
             timeD.get('hours', 0) * (60*60) +
             timeD.get('minutes', 0) * (60) +
             timeD.get('seconds', 0))
    return timeS
Now it probably is super non-cool and so on, but it works, so I'm sharing because I care about you people.