Question
I have a NumPy array of datetime64 objects that I need to convert to a specific time format: yyyy-mm-dd,HH:MM:SS.SSS
NumPy has a function called datetime_as_string that outputs ISO 8601 (yyyy-mm-ddTHH:MM:SS.SSS) time, which is extremely close to what I want; the only difference is that there is a T where I want a comma.
Is there a way to quickly swap the "T" for a ","? Here is an example data set:
import numpy as np
offset = np.arange(0, 1000)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_objects = epoch + offset.astype('timedelta64[ms]')
time_strings = np.datetime_as_string(time_objects)
I have had success using a lambda and a list comprehension, but it seems awkward to switch back and forth between a Python list and a NumPy array.
f = lambda x: x[:10] + ',' + x[11:]
np.array([f(x) for x in time_strings])
I know that in some cases a lambda can be applied directly to a NumPy array, but that doesn't work in this case: f(time_strings) produces a TypeError. Any thoughts?
I know I could convert back to Python datetime objects (which is the direction I'm coming from) or use Pandas, but the datetime_as_string function is really fast and I'd like to stick to a NumPy solution.
--- Conclusions based on answers ---
It turns out that Paul's view-casting black magic was about 75x faster than my list comprehension and nearly 100x faster than np.char.replace(). Here are the results from the three methods (each was initialized with the above dataset, but with 1000000 elements).
import time
start = time.time()
time_strings[..., None].view('U1')[..., 10] = ','
print(time.time() - start, 'seconds')
0.016000747680664062 seconds
start = time.time()
f = lambda x: x[:10] + ',' + x[11:]
time_strings = np.array([f(x) for x in time_strings])
print(time.time() - start, 'seconds')
1.1740672588348389 seconds
start = time.time()
time_strings = np.char.replace(time_strings,'T',',')
print(time.time() - start, 'seconds')
1.4980854988098145 seconds
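For anyone who wants to repeat the comparison, here is a minimal sketch of a self-contained benchmark. The helper names (make_strings, with_view, with_list_comprehension, with_char_replace) are made up for this example, and the array is 100000 elements rather than 1000000 to keep the run short. Every method gets the same untouched input (the view version works on a copy), so np.char.replace still has a 'T' to find. Timings will differ from machine to machine, so none are quoted here.

```python
import timeit

import numpy as np


def make_strings(n):
    """Build n ISO 8601 strings, 1 ms apart, as in the question."""
    offset = np.arange(0, n)
    epoch = np.datetime64('1970-01-01T00:00:00.000')
    return np.datetime_as_string(epoch + offset.astype('timedelta64[ms]'))


def with_view(strings):
    out = strings.copy()  # copy so the caller's array is left untouched
    out[..., None].view('U1')[..., 10] = ','
    return out


def with_list_comprehension(strings):
    return np.array([x[:10] + ',' + x[11:] for x in strings])


def with_char_replace(strings):
    return np.char.replace(strings, 'T', ',')


strings = make_strings(100_000)
results = {}
for func in (with_view, with_list_comprehension, with_char_replace):
    # Keep one result per method so the outputs can be compared afterwards.
    results[func.__name__] = func(strings)
    # Best of three single runs; the lambda is called within this iteration.
    seconds = min(timeit.repeat(lambda: func(strings), number=1, repeat=3))
    print(f"{func.__name__}: {seconds:.4f} seconds")
```

Note that with_view includes the cost of the copy, so the pure in-place assignment would be somewhat faster than what this reports.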
Answer 1:
You could use view casting to get access to the individual characters:
time_strings[...,None].view('U1')[...,10] = ','
This changes time_strings in place.
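To make the trick less opaque, here is a small sketch of what the one-liner does step by step, using a 5-element version of the question's dataset. The names chars and width are introduced only for this illustration. The width is read from the dtype rather than hard-coded, because the fixed string width that datetime_as_string picks may be larger than the 23 visible characters.

```python
import numpy as np

# Small version of the question's dataset: 5 timestamps, 1 ms apart.
offset = np.arange(0, 5)
epoch = np.datetime64('1970-01-01T00:00:00.000')
time_strings = np.datetime_as_string(epoch + offset.astype('timedelta64[ms]'))

# Every element is a fixed-width unicode string; each character takes 4 bytes.
width = time_strings.dtype.itemsize // 4
print(time_strings.dtype, width)

# Adding a trailing axis of length 1 and viewing it as 'U1' exposes the
# individual characters without copying: shape (5,) -> (5, 1) -> (5, width).
chars = time_strings[..., None].view('U1')
print(chars.shape)

# Column 10 is the 'T' separator of every string. Writing through the view
# modifies time_strings itself, because both share the same memory.
chars[..., 10] = ','
print(time_strings)
```

Because the view shares memory with the original array, take a copy first (time_strings.copy()) if the unmodified strings are still needed.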
Answer 2:
In [309]: np.char.replace(time_strings,'T',',')
Out[309]:
array(['1970-01-01,00:00:00.000', '1970-01-01,00:00:00.001',
'1970-01-01,00:00:00.002', '1970-01-01,00:00:00.003',
'1970-01-01,00:00:00.004', '1970-01-01,00:00:00.005',
'1970-01-01,00:00:00.006', '1970-01-01,00:00:00.007',
....
But @PaulPanzer's in-place version is much faster (even if it is a bit more obscure):
In [316]: %%timeit temp=time_strings.copy()
...: temp[...,None].view('U1')[...,10] = ','
8.48 µs ± 34.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [317]: timeit np.char.replace(time_strings,'T',',')
1.23 ms ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Source: https://stackoverflow.com/questions/58227354/replace-a-single-character-in-a-numpy-list-of-strings