How to convert objects or strings to a time format when the input is malformed, i.e. %H does not exist for all rows

Posted by 别说谁变了你拦得住时间么 on 2020-05-31 04:58:10

Question


A similar question has been asked before but has received no responses.

I have looked through a number of forums for a solution. The other questions involve a year, but mine does not: it is simply H:M:S.

I web-scraped this data, which returned:

Time - 36:42 38:34 1:38:32 1:41:18

Data samples here: Source data 1 and Source data 2

I need these times in minutes, like so: 36.70 38.57 98.53 101.30

To do this I tried this:

time_mins = []
for i in time_list:
    h, m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    time_mins.append(math)

But that didn't work because 36:42 is not in the format H:M:S, so unpacking into h, m, s fails. I then tried to convert 36:42 using this:

df1.loc[1:,6] = df1[6]+ timedelta(hours=0)

and this

df1['minutes'] = pd.to_datetime(df1[6], format='%H:%M:%S')

but have had no luck.

Can I do it at the extraction stage? I have to do it for over 500 rows.

row_td = soup.find_all('td') 

If not, how can I do it after conversion into a data frame?

Thanks in advance.


Answer 1:


If your input (time delta string) only contains hours/minutes/seconds (no days etc.), you could use a custom function that you apply to the column:

import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

def to_minutes(s):
    # split string s on ':', reverse so that seconds come first
    # multiply the result as type int with elements from tuple (1/60, 1, 60) to get minutes for each value
    # return the sum of these multiplications
    return sum(int(a)*b for a, b in zip(s.split(':')[::-1], (1/60, 1, 60)))

df['Minutes'] = df['Time'].apply(to_minutes)
# df['Minutes']
# 0     36.700000
# 1     38.566667
# 2     98.533333
# 3    101.300000
# Name: Minutes, dtype: float64
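
If a vectorized route is preferred over apply, one possible sketch is to prefix the M:S strings with '0:' so that every row has the H:M:S shape, and then let pd.to_timedelta parse the whole column. The helper name to_minutes_vectorized is hypothetical, and the sketch assumes every value contains either one or two colons and nothing else unusual.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

def to_minutes_vectorized(col):
    # Assumption: each value is either 'M:S' or 'H:M:S'.
    has_hours = col.str.count(':') == 2
    # Prefix '0:' where the hour field is missing; keep other rows unchanged.
    padded = col.where(has_hours, '0:' + col)
    # Parse as timedeltas, then convert total seconds to minutes.
    return pd.to_timedelta(padded).dt.total_seconds() / 60

df['Minutes'] = to_minutes_vectorized(df['Time']).round(2)
```

Padding the strings first keeps the parsing inside pandas, which may matter more as the number of rows grows; for around 500 rows either approach should be fine.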

Edit: it took me a while to find it but this is a variation of this question. And my answer here is based on this reply.




Answer 2:


You were on the right track. The version below makes a few modifications to your code and returns the minutes.

Create a function

import numpy as np

def get_time(i):
    # split once and branch on whether an hour field is present
    ilist = i.split(':')
    if(len(ilist)==3):
        h, m, s = i.split(':')
    else:
        m, s = i.split(':')
        h = 0
    # total seconds divided by 60 gives minutes
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    return np.round(math, 2)

Call the function using split

x = "36:42 38:34 1:38:32 1:41:18"
x = x.split(" ")
xmin = [get_time(i) for i in x]
xmin

Output

[36.7, 38.57, 98.53, 101.3]
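
Since the question is about malformed input, it may also help to fail loudly on strings that are neither M:S nor H:M:S. The following is a minimal standard-library sketch under that assumption; the name to_minutes_checked is hypothetical and not part of the answer above.

```python
from datetime import timedelta

def to_minutes_checked(text):
    # Hypothetical helper: accepts 'M:S' or 'H:M:S' and rejects anything else.
    parts = text.strip().split(':')
    if len(parts) not in (2, 3):
        raise ValueError(f"expected M:S or H:M:S, got {text!r}")
    # Left-pad with zero hours so the unpacking always sees three fields.
    h, m, s = [0] * (3 - len(parts)) + [int(p) for p in parts]
    return round(timedelta(hours=h, minutes=m, seconds=s).total_seconds() / 60, 2)

times = "36:42 38:34 1:38:32 1:41:18".split()
minutes = [to_minutes_checked(t) for t in times]
```

Raising ValueError instead of guessing makes unexpected rows visible early, which can be useful when the data comes from scraping.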



Answer 3:


I have no experience with pandas, but here is something you may find useful

...
time_mins = []
for i in time_list:
    parts = i.split(':')
    minutes_multiplier = 1/60
    math = 0
    for part in reversed(parts):
        math += (minutes_multiplier * int(part))
        minutes_multiplier *= 60
    time_mins.append(math)
...
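
The same accumulation can be sketched as a left-to-right fold that works in whole seconds and divides only once at the end, which avoids repeatedly multiplying by the floating-point value 1/60. The name to_minutes_horner is hypothetical, and the sample list is the one from the question.

```python
from functools import reduce

def to_minutes_horner(text):
    # Fold left to right: each step multiplies the running total by 60
    # and adds the next field, which yields the total in seconds.
    total_seconds = reduce(lambda acc, part: acc * 60 + int(part), text.split(':'), 0)
    return total_seconds / 60

time_list = ['36:42', '38:34', '1:38:32', '1:41:18']
time_mins = [to_minutes_horner(t) for t in time_list]
```

Like the loop above, this handles both M:S and H:M:S without a branch, because the number of fields only changes how many fold steps are taken.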



Answer 4:


I had earlier commented that @NileshIngle's response above was not working as it was giving me a

NameError: name 'h' is not defined.

A simple correction was required - moving h above m,s as it is the first variable referenced

h = 0 # move this above
m, s = i.split(':') 


import numpy as np

def get_time(i):
    ilist = i.split(':')
    if(len(ilist)==3):
        h, m, s = i.split(':')
    else:
        # no hour field present, so default it to zero
        h = 0
        m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    return np.round(math, 2)

I would like to thank @MrFuppes, @NileshIngle and @KaustubhBadrike for taking the time to respond. I have learned three different methods.



Source: https://stackoverflow.com/questions/61995881/how-to-convert-objects-or-string-to-time-format-when-the-input-string-object-is
