I am using R to process Census data, which uses really long numeric GEOIDs to identify the geographies. The issue I am facing is that when writing out the processed data with write_csv from the readr package, the long GEOIDs are written in scientific notation.
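A minimal example that reproduces the issue (it uses the same GEOID range as the answers below; with the readr version current at the time, the file contains scientific-notation values like 6.015e+13 rather than the full 14-digit IDs):
library(dplyr)
library(readr)
tbl_df <- tbl_df(data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, by = 1)))
write_csv(tbl_df, "test.csv")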
I'd rather suggest recoding such columns to integer, because then write_* won't use scientific notation anymore. To convert all numeric columns in one pass (e.g. in case you're dealing with a counts matrix), you could do:
require(dplyr)
tbl_df = mutate_if(tbl_df, is.numeric, as.integer)
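A quick hedged illustration with a small made-up counts table (the tract/pop columns and counts.csv file name are only for the example). Note that base R integers are 32-bit, so this pattern suits counts; the 14-digit GEOIDs themselves are too large for as.integer, and the bit64 or character-based answers below handle those:
library(dplyr)
library(readr)
counts <- data.frame(tract = c(1, 2, 3), pop = c(1200, 980, 2150))
counts <- mutate_if(counts, is.numeric, as.integer)   # all numeric columns become integer
write_csv(counts, "counts.csv")                       # plain integers, no scientific notation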
I would suggest you use
write.csv((tbl_df(data.frame(GEOID = seq(from=60150001022000, to=60150001022005, 1)))), "test.csv")
instead of
write_csv((tbl_df(data.frame(GEOID = seq(from=60150001022000, to=60150001022005, 1)))), "test.csv")
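As I understand the write.table documentation, base write.csv converts numerics with the internal equivalent of digits = 15 and respects the scipen option, so a 14-digit GEOID stays in fixed notation. If needed, you can push base R further in that direction (the scipen setting and row.names = FALSE are additions for illustration):
options(scipen = 999)   # discourage scientific notation in base R output
write.csv(tbl_df(data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 1))),
          "test.csv", row.names = FALSE)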
If I double-click test.csv, it opens in Excel and Excel turns the values into scientific notation. When I right-click and open it with Notepad instead, it looks good and I see the original numbers without scientific notation.
Use bit64; it provides an S3 class for vectors of 64-bit integers.
library(dplyr)
library(readr)
options(digits = 22)
tbl_df <- data.frame(GEOID = seq(from=60150001022000, to=60150001022005, 1))
> tbl_df
GEOID
1 60150001022000
2 60150001022001
3 60150001022002
4 60150001022003
5 60150001022004
6 60150001022005
library(bit64)
tbl_df$GEOID <- as.integer64(tbl_df$GEOID)
write_csv(tbl_df,'test.csv')
If you read this data back into R, it will assign the correct data type.
dfr <- read_csv('test.csv')
> dfr
Source: local data frame [6 x 1]
GEOID
1 60150001022000
2 60150001022001
3 60150001022002
4 60150001022003
5 60150001022004
6 60150001022005
> str(dfr)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 1 variable:
 $ GEOID: num 6.02e+13 6.02e+13 6.02e+13 6.02e+13 6.02e+13 ...
Hope this helps. I opened the csv in a text editor; the numbers had quotes ("") around them, but it still worked.
It would probably be safer to use character values:
X <- tbl_df(data.frame(GEOID = as.character(seq(from=60150001022000, to=60150001022005))))
write_csv(X, "test.csv")
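If you later read that file back with readr, you can pin the column type so the IDs are not re-guessed as doubles (col_types, cols() and col_character() are standard readr arguments; the object name X2 is just for the example):
library(readr)
X2 <- read_csv("test.csv", col_types = cols(GEOID = col_character()))
# GEOID stays character, so there is no precision loss and no scientific notation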
It's a bit ironic that the write_csv function does coerce some of its output to character values, but not numeric columns. Only if a column passes the is.object test will it be coerced. There does not appear to be a switch to throw that will preserve maximal precision. The write.table function and its offspring write.csv have several switches that allow suppression of quotes and other settings for tailoring the output, but write_csv has very little of that.
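For comparison, a sketch of the base-R switches that paragraph refers to (quote and row.names are documented write.csv arguments; scipen is a global option rather than a write.csv argument, and test_base.csv is just an example file name):
old <- options(scipen = 100)   # discourage scientific notation for numeric columns
write.csv(data.frame(GEOID = seq(from = 60150001022000, to = 60150001022005, 1)),
          "test_base.csv", quote = FALSE, row.names = FALSE)
options(old)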
You can trick write_csv into thinking that a numeric column is something more complex, and this does result in the as.character output, albeit with quotes.
class(X[[1]]) <- c("num", "numeric")   # prepend a dummy class so the column passes is.object()
vapply(X, is.object, logical(1))
#GEOID
# TRUE
write_csv(X, "")
# [1] "\"GEOID\"\n\"60150001022000\"\n\"60150001022001\"\n\"60150001022002\"\n\"60150001022003\"\n\"60150001022004\"\n\"60150001022005\"\n"
As a matter of best practices, I do not agree with your choice of insisting that ID variables remain numeric. There is too much violence that can be applied to that storage mode for an object. You do not need any of the arithmetic operations for an ID variable.
I think it would be safer to recode those columns to strings when exporting the CSV in such a case.
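A sketch of that approach, reusing the mutate_if pattern from above (it assumes the data is in the tbl_df data frame used throughout this thread):
library(dplyr)
library(readr)
tbl_df <- mutate_if(tbl_df, is.numeric, as.character)   # numeric columns become strings
write_csv(tbl_df, "test.csv")                           # nothing left for write_csv to abbreviate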
I wrote a pull request with a patch to improve the control of scientific notation in write_csv. With this patch you would have an int_use_scientific = FALSE argument in write_csv that would solve your issue. Hopefully it will eventually be merged.