Question
I am looking into rewriting some data analysis code using Pandas (since I just discovered it) on Ubuntu 14.04 64-bit and I have hit upon some strange behaviour. My data files look like this:
26/09/2014 00:00:00 2.423009 -58.864655 3.312355E-7 6.257226E-8 302 305
26/09/2014 00:00:00 2.395637 -62.73302 3.321525E-7 7.065322E-8 302 305
26/09/2014 00:00:01 2.332541 -57.763269 3.285718E-7 6.873837E-8 302 305
26/09/2014 00:00:02 2.366828 -51.900812 3.262279E-7 7.397762E-8 302 305
26/09/2014 00:00:03 2.435500 -40.820161 3.241068E-7 6.777224E-8 302 305
26/09/2014 00:00:04 2.428922 -65.573049 3.212358E-7 6.761804E-8 302 305
26/09/2014 00:00:05 2.419931 -59.414711 3.185517E-7 7.243236E-8 302 305
26/09/2014 00:00:06 2.416663 -60.064279 3.209795E-7 6.242328E-8 302 305
26/09/2014 00:00:07 2.411954 -52.586242 3.184297E-7 5.825581E-8 302 304
26/09/2014 00:00:08 2.457342 -61.874388 3.151493E-7 6.327384E-8 303 304
The columns are tab-separated. To read these into Pandas, I am using the following simple commands:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("path/to/file.dat", sep="\t", header=None)
print(data)
This produces the following output:
0 1 2 3 4 5 6 7
0 26/09/2014 00:00:00 2.423009 -58.864655 0 6.257226e-08 302 305
1 26/09/2014 00:00:00 2.395637 -62.733020 0 7.065322e-08 302 305
2 26/09/2014 00:00:01 2.332541 -57.763269 0 6.873837e-08 302 305
3 26/09/2014 00:00:02 2.366828 -51.900812 0 7.397762e-08 302 305
4 26/09/2014 00:00:03 2.435500 -40.820161 0 6.777224e-08 302 305
5 26/09/2014 00:00:04 2.428922 -65.573049 0 6.761804e-08 302 305
6 26/09/2014 00:00:05 2.419931 -59.414711 0 7.243236e-08 302 305
7 26/09/2014 00:00:06 2.416663 -60.064279 0 6.242328e-08 302 305
8 26/09/2014 00:00:07 2.411954 -52.586242 0 5.825581e-08 302 304
9 26/09/2014 00:00:08 2.457342 -61.874388 0 6.327384e-08 303 304
[10 rows x 8 columns]
The important thing to notice here is column 4. Compare it to column 5, and to the original data. Column 5 has been rendered in scientific notation, while column 4 is shown as 0. Pandas hasn't actually zeroed out the column or converted it to int, because:
>>> data[4][0]*1e7
3.3123550000000002
This is what I would expect, so the data values are the same and only the representation has changed. If this is just a cosmetic thing, I could put up with it, but it makes me uneasy and I'd like to know what's going on here.
Answer 1:
Yes, it's a cosmetic thing. You can change it using set_option:
In [21]:
pd.set_option('display.precision',20)
df[4]  # df is the DataFrame that the question calls data
Out[21]:
0 0.0000003312355
1 0.0000003321525
2 0.0000003285718
3 0.0000003262279
4 0.0000003241068
5 0.0000003212358
6 0.0000003185517
7 0.0000003209795
8 0.0000003184297
9 0.0000003151493
Name: 4, dtype: float64
The underlying data has not been truncated, and it is preserved in full, including when you write it back out to CSV.
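To check this yourself, here is a minimal sketch of a CSV round trip. It assumes a reasonably recent pandas on Python 3 and uses two rows copied from the question, held in memory instead of in a file. The `float_precision="round_trip"` argument is my addition, not part of the original code; it asks `read_csv` for exact float parsing so that the comparison can use strict equality.

```python
import io

import pandas as pd

# Two rows from the question, tab-separated, held in memory instead of a file.
raw = (
    "26/09/2014\t00:00:00\t2.423009\t-58.864655\t3.312355E-7\t6.257226E-8\t302\t305\n"
    "26/09/2014\t00:00:00\t2.395637\t-62.73302\t3.321525E-7\t7.065322E-8\t302\t305\n"
)

# float_precision="round_trip" is an assumption added for this check:
# it makes the parser convert text to float64 exactly.
data = pd.read_csv(io.StringIO(raw), sep="\t", header=None,
                   float_precision="round_trip")

# Write the frame back out, then parse the written text again.
buf = io.StringIO()
data.to_csv(buf, sep="\t", header=False, index=False)
roundtrip = pd.read_csv(io.StringIO(buf.getvalue()), sep="\t", header=None,
                        float_precision="round_trip")

# Column 4 prints as 0 under some display settings, but the stored values survive.
values_preserved = bool((data[4] == roundtrip[4]).all())
print(values_preserved)
```

If `values_preserved` is true, the display setting had no effect on what was stored or written.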
If you are in IPython, you can check what the default settings are; for display precision (significant digits) the default was 7 at the time of writing.
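The same options can also be read and changed from a plain script. The sketch below is one possible approach, not the only one: it reads the current precision with `pd.get_option` and temporarily forces scientific notation through the `display.float_format` option inside `pd.option_context`. The Series `s` and the `"{:.6e}"` format string are illustrative choices of mine, using two values taken from the question.

```python
import pandas as pd

# Two values from columns 4 and 5 of the question's data.
s = pd.Series([3.312355e-7, 6.257226e-8])

# Read the current display precision; the default depends on the pandas version.
current_precision = pd.get_option("display.precision")
print(current_precision)

# Temporarily render every float in scientific notation with 6 decimals.
# This changes only the text representation, not the stored values.
with pd.option_context("display.float_format", "{:.6e}".format):
    shown = s.to_string()

print(shown)
```

Using `option_context` rather than `set_option` keeps the change local to the `with` block, so the rest of the session keeps its usual display settings.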
Source: https://stackoverflow.com/questions/26464334/python-pandas-scientific-notation-iconsistent