write_csv read_csv with scientific notation after 1000th row


Combining the two comments below, both correct, plus the rationale, into a Community Wiki answer.

read_csv has an argument guess_max, which defaults to 1000, so read_csv only inspects the first 1000 records when deciding how each column should be parsed. Increasing guess_max beyond the total number of rows should fix the problem. – Marius
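
A minimal sketch of that fix (the file name tib.csv matches the demo below; the exact guess_max value is an assumption and only needs to exceed the row count):

library(readr)

# Inspect more rows before guessing column types; the default is 1000.
# 10000 is an arbitrary value assumed to be larger than the file.
tib <- read_csv("tib.csv", guess_max = 10000)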

You could also specify col_types = ..., as double or character. – CPak

Using @CPak's suggestion will make your code more reproducible and your analyses more predictable in the long run. That's a primary reason read_csv() prints a message with the column specification on reading: so you can copy it, modify it, and tell read_csv() to use a different type, as sketched below.
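
A minimal sketch of that approach (file and column name taken from the demo below):

library(readr)

# State the column spec explicitly instead of relying on guessing.
tib <- read_csv("tib.csv", col_types = cols(a = col_double()))

# Or, if the values are known to be whole numbers:
tib <- read_csv("tib.csv", col_types = cols(a = col_integer()))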

I just installed the dev version of readr with devtools::install_github("tidyverse/readr"), so I now have readr 1.2.0, and the NA problem went away. But column "a" is now guessed by read_csv() as dbl (whether or not there is a large integer in it), whereas it was correctly read as int before, so if I need it as int I still have to do an as.integer() conversion. At least it no longer breaks my code.

tib <- test_write_read(1, 1002, 1001, 1000)
tib %>% tail(n = 3)
# A tibble: 3 x 1
#        a
#    <dbl>
#1    1.00
#2 1000
#3    1.00

The large value is still written as 1e3 by write_csv(), though, so in my opinion this is not quite a final solution.

$ tail -n3 tib.csv
#1
#1e3
#1
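
Until that is fixed upstream, one workaround is to coerce the column yourself before writing, so write_csv() never sees a double it wants to abbreviate. A sketch, assuming the values fit in R's integer range:

library(readr)
library(dplyr)

# Integer columns are written verbatim, so 1000 stays "1000", not "1e3".
tib %>%
  mutate(a = as.integer(a)) %>%
  write_csv("tib.csv")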