Is there a preferred way to keep the data type of a numpy array fixed as int (or int64 or whatever), while still having an element ins
This capability has been added to pandas (beginning with version 0.24): https://pandas.pydata.org/pandas-docs/version/0.24/whatsnew/v0.24.0.html#optional-integer-na-support
At this point, it requires the use of extension dtype Int64 (capitalized), rather than the default dtype int64 (lowercase).
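A minimal sketch of what that looks like, assuming a pandas version with the nullable integer dtype available (the variable names here are only illustrative):

```python
import numpy as np
import pandas as pd

# Ask for the nullable extension dtype explicitly with the capitalized name.
s = pd.Series([1, 2, np.nan], dtype="Int64")

# An existing float column can be converted the same way,
# as long as its non-missing values are whole numbers.
f = pd.Series([1.0, np.nan, 3.0])
converted = f.astype("Int64")

print(s.dtype)          # the extension dtype, not float64
print(s.isna().sum())   # the missing entry is still tracked as missing
```

With the default lowercase int64 the same data would be silently upcast to float64, which is the behaviour the question is trying to avoid.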
If performance is not the main issue, you can store strings instead.
df.col = df.col.dropna().apply(lambda x: str(int(x)))
Then you can mix them with NaN as much as you want. If you really want to have integers, depending on your application, you can use -1, or 0, or 1234567890, or some other dedicated value to represent NaN.
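A rough sketch of the dedicated-value idea with plain numpy. The name SENTINEL and the choice of -1 are assumptions for illustration; pick a value that can never occur in your real data:

```python
import numpy as np

SENTINEL = -1  # hypothetical marker for "missing"; must not be a valid data value

raw = np.array([3.0, np.nan, 7.0])

# Replace NaN with the sentinel first, then cast, so the array stays int64.
ints = np.where(np.isnan(raw), SENTINEL, raw).astype(np.int64)

# Every computation has to mask the sentinel out explicitly.
valid = ints != SENTINEL
mean_valid = ints[valid].mean()
```

The catch is that nothing enforces the convention: any code that forgets the mask will treat the sentinel as a real number.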
You can also temporarily duplicate the columns: keep one as you have it, with floats, and add an experimental one with ints or strings. Then insert asserts in every reasonable place, checking that the two are in sync. After enough testing you can let go of the floats.
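One way such a check could look, as a sketch only. The column names col and col_int and the helper assert_in_sync are made up for this example, and the experimental column here uses the nullable Int64 dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})
df["col_int"] = df["col"].astype("Int64")  # experimental duplicate

def assert_in_sync(frame, float_col="col", int_col="col_int"):
    """Hypothetical helper: fail loudly if the two columns have drifted apart."""
    float_missing = frame[float_col].isna().to_numpy()
    int_missing = frame[int_col].isna().to_numpy()
    assert (float_missing == int_missing).all(), "missing values differ"

    # Compare only the rows where a value is present.
    present = frame[float_col].notna()
    same = frame.loc[present, float_col] == frame.loc[present, int_col].astype("float64")
    assert same.all(), "values differ"

assert_in_sync(df)  # call this after every step that modifies either column
```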