readr

big integers when reading file with readr in r

梦想的初衷 submitted on 2019-12-11 00:01:48
Question: I want to use the readr package since I will be working with some bigger files in the future. One column, Intensity, contains some very large values (e.g. 5493500000). The first such value appears only in line 2200, by which point readr has already typed the column as integer instead of numeric, and this produces a buffer overflow. Is there a way to provide only a single column type to the read_tsv function, since I don't want to provide all (about) 40 columns the
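One way to approach the question above is to name only the problem column in cols() and let readr guess the rest. The following is a minimal sketch, not the accepted answer: the temporary file and its two-column layout are made up for illustration, and only the column name Intensity comes from the question.

```r
library(readr)

# Hypothetical sample file; only the Intensity column name is from the question
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Name\tIntensity", "a\t12", "b\t5493500000"), tmp)

# Columns not named in cols() fall back to the default, col_guess(),
# so only Intensity is forced to double
dat <- read_tsv(tmp, col_types = cols(Intensity = col_double()))
```

With this approach the other (about) 40 columns do not have to be listed.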

write_csv read_csv with scientific notation after 1000th row

只愿长相守 submitted on 2019-12-05 14:09:24
Writing a data frame with a mix of small integer entries (values less than 1000) and "large" ones (values of 1000 or more) to a csv file with write_csv() mixes scientific and non-scientific notation. If the first 1000 rows hold small values but a large value appears after them, read_csv() seems to get confused by this mix and outputs NA for the entries in scientific notation:

    test_write_read <- function(small_value, n_fills, position, large_value) {
      tib <- tibble(a = rep(small_value, n_fills))
      tib$a[position] <- large_value
      write_csv(tib, "tib.csv")
      tib <- read_csv("tib.csv")
    }

The following lines do not make
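A possible workaround for the situation described above is to stop relying on type guessing and declare the column as double. This is a sketch under assumptions: the file is written by hand so that a value in scientific notation sits beyond the first 1000 rows, and the column name a is taken from the question.

```r
library(readr)

# Hand-built file: 1500 small values, then one value in scientific notation
tmp <- tempfile(fileext = ".csv")
writeLines(c("a", rep("1", 1500), "1e+05"), tmp)

# Declaring the column as double means the guess from the first rows is never used
tib <- read_csv(tmp, col_types = cols(a = col_double()))
```

Raising the guess_max argument of read_csv() so that guessing covers more rows is another option, at the cost of a slower guess on large files.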

What are permissible column objects of the form “col_*()” used in readr?

强颜欢笑 submitted on 2019-12-02 01:08:04
Question: readr::read_csv is misreading some column types in a file I am loading, so I want to use cols to set them manually. In ?read_csv, it says the col_types argument should be "One of ‘NULL’, a ‘cols()’ specification, or a string. See ‘vignette("column-types")’ for more details". Well, vignette("column-types") gives "vignette("column-types") not found", so I tried ?cols. It says it accepts "column objects created by ‘col_*()’ or their abbreviated character names". What are the acceptable functions
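For reference, readr's collectors include col_logical(), col_integer(), col_double(), col_character(), col_factor(), col_date(), col_datetime(), col_time(), col_number(), col_skip() and col_guess(). The sketch below shows the two equivalent ways of writing a specification; the sample file and its column names are invented for illustration.

```r
library(readr)

# Hypothetical four-column file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score,note", "1,a,2.5,x", "2,b,3.5,y"), tmp)

# Full collector objects, one per column; col_skip() drops "note"
a <- read_csv(tmp, col_types = cols(
  id = col_integer(),
  name = col_character(),
  score = col_double(),
  note = col_skip()
))

# The same specification as a compact string:
# i = integer, c = character, d = double, _ = skip
b <- read_csv(tmp, col_types = "icd_")
```

The remaining single-letter abbreviations are l (logical), n (number), f (factor), D (date), T (datetime), t (time) and ? (guess).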

passing named list to cols_only() [closed]

泄露秘密 submitted on 2019-12-01 12:40:29
When I try to do something like this:

    data <- read_csv("blah.csv",
      n_max = 100,
      col_types = cols_only(list(files = "c"))
    )
    Error: Some `col_types` are not S3 collector objects: 1

so the question is whether it is possible to pass a named list to cols_only().

Sure, just use do.call to use the list as the parameters for the function, e.g.

    library(readr)
    read_csv(system.file('extdata', 'mtcars.csv', package = 'readr'), # sample data from readr
             col_types = do.call(cols_only, list(cyl = 'i')))
    #> # A tibble: 32 × 1
    #>      cyl
    #>    <int>
    #> 1      6
    #> 2      6
    #> 3      4
    #> 4      6
    #> 5      8
    #> 6      6
    #> 7      8
    #> 8      4
    #> 9      4
    #> 10

passing named list to cols_only() [closed]

心已入冬 submitted on 2019-12-01 10:57:13
Question: (Closed as off-topic; not currently accepting answers.) When I try to do something like this:

    data <- read_csv("blah.csv",
      n_max = 100,
      col_types = cols_only(list(files = "c"))
    )
    Error: Some `col_types` are not S3 collector objects: 1

so the question is whether it is possible to pass a named list to cols_only().

Answer 1: Sure, just use do.call to use the list as the
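The do.call approach mentioned in the answer can be sketched with the list held in a variable, which is the usual reason for wanting a named list in the first place. The variable name wanted and the choice of columns are illustrative assumptions; the sample file is the mtcars.csv that ships with readr.

```r
library(readr)

# Hypothetical named list of column types, e.g. built elsewhere in a script
wanted <- list(cyl = "i", mpg = "d")

# do.call() spreads the list elements out as named arguments to cols_only()
dat <- read_csv(
  system.file("extdata", "mtcars.csv", package = "readr"),
  col_types = do.call(cols_only, wanted)
)
```

Only the columns named in the list are read; all others are dropped.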

How to use “cols()” and “col_double” with respect to comma as decimal mark

半世苍凉 submitted on 2019-12-01 09:26:37
I would like to parse my columns to the right type with the readr package while reading. The difficulty: the fields are separated by semicolons (;), while the comma (,) is used as the decimal mark.

    library(readr)
    # Test data:
    T <- "Date;Time;Var1;Var2
    01.01.2011;11:11;2,4;5,6
    02.01.2011;12:11;2,5;5,5
    03.01.2011;13:11;2,6;5,4
    04:01.2011;14:11;2,7;5,3"
    read_delim(T, ";")
    # A tibble: 4 × 4
    #         Date     Time  Var1  Var2
    #        <chr>   <time> <dbl> <dbl>
    # 1 01.01.2011 11:11:00    24    56
    # 2 02.01.2011 12:11:00    25    55
    # 3 03.01.2011 13:11:00    26    54
    # 4 04:01.2011 14:11:00    27    53

So, I thought the parsing thing would work like
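In the output above, 2,4 is read as 24, presumably because the comma is treated as a grouping mark under the default locale. A minimal sketch of one way to handle this is to pass a locale with decimal_mark = "," alongside the column specification. The data are the first rows from the question written to a temporary file; the date and time format strings are assumptions based on how those values look.

```r
library(readr)

# First rows of the question's test data, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Date;Time;Var1;Var2",
  "01.01.2011;11:11;2,4;5,6",
  "02.01.2011;12:11;2,5;5,5"
), tmp)

dat <- read_delim(
  tmp,
  delim = ";",
  # Tell the parsers that "," is the decimal mark
  locale = locale(decimal_mark = ","),
  col_types = cols(
    Date = col_date(format = "%d.%m.%Y"),
    Time = col_time(format = "%H:%M"),
    Var1 = col_double(),
    Var2 = col_double()
  )
)
```

read_csv2() is a shortcut for the same semicolon-plus-decimal-comma layout.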

readr::read_csv issue: Chinese Character becomes messy codes

房东的猫 submitted on 2019-12-01 08:02:28
I'm trying to import a dataset into RStudio, but I am stuck with Chinese characters, which turn into garbled text. Here is the code:

    library(tidyverse)
    df <- read_csv("中文,英文\n英文,德文")
    df
    # A tibble: 1 x 2
      `\xd6\xd0\xce\xc4` `Ӣ\xce\xc4`
      <chr>              <chr>
    1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"

When I use the base function read.csv, it works well. I guess I must be doing something wrong with the encoding, but there is no encoding option in read_csv. How can I do this?

This is because the characters are marked as UTF-8 whereas the actual encoding is the system default (you can get by stringi::stri_enc
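For data stored in a file rather than typed as a literal string, read_csv() does take an encoding through its locale argument. The sketch below assumes the file is GBK-encoded and that the platform's iconv supports GBK; neither is stated in the question. The Chinese strings are the ones from the question, written as Unicode escapes so the script does not depend on the source file's own encoding.

```r
library(readr)

# The question's header and data row, as Unicode escapes
header <- "\u4e2d\u6587,\u82f1\u6587"
row1   <- "\u82f1\u6587,\u5fb7\u6587"

# Write the lines as GBK bytes to simulate a non-UTF-8 file (assumed encoding)
tmp <- tempfile(fileext = ".csv")
writeLines(iconv(c(header, row1), from = "UTF-8", to = "GBK"), tmp, useBytes = TRUE)

# Declare the file's encoding so readr converts it to UTF-8 on input
df <- read_csv(
  tmp,
  locale = locale(encoding = "GBK"),
  col_types = cols(.default = col_character())
)
```

If the real encoding is unknown, readr's guess_encoding() can suggest candidates for a file.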

Suppress reader parse problems in r

試著忘記壹切 submitted on 2019-11-30 18:35:49
I am currently reading in a file using the readr package. The idea is to use read_delim to read the file row by row to find the maximum number of columns in my unstructured data file. The code reports parsing problems. I know about these and will deal with column types after import. Is there a way to turn off the problems() output, as the usual options(warn) is not working?

    i = 1
    max_col <- 0
    options(warn = -1)
    while (i != "stop") {
      n_col <- ncol(read_delim("file.txt", n_max = 1, skip = i, delim = "\t"))
      if (n_col > max_col) {
        max_col <- n_col
        print(max_col)
      }
      i <- i + 1
      if (n_col == 0) i <- "stop"
    }
    options(warn = 0)
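A different route to the same goal, offered as a sketch rather than as the answer to the question, is to count fields per line with readr's count_fields() instead of parsing each row as a data frame, and to wrap any remaining read in suppressWarnings(). The ragged sample file is invented for illustration.

```r
library(readr)

# Hypothetical ragged tab-separated file
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb", "1\t2\t3\t4", "5"), tmp)

# count_fields() tokenizes each line without building a data frame,
# so no column types are parsed at this stage
n_fields <- count_fields(tmp, tokenizer_tsv())
max_col <- max(n_fields)

# If the file still has to be read despite the ragged rows, warnings
# can be silenced around the call itself
raw <- suppressWarnings(
  read_delim(tmp, delim = "\t", col_types = cols(.default = col_character()))
)
```

This avoids the row-by-row loop entirely, since a single pass gives the field count for every line.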

How can I write dplyr groups to separate files?

╄→гoц情女王★ submitted on 2019-11-30 12:59:46
I'm trying to create separate .csv files for each group in a data frame grouped with dplyr's group_by function. So far I have something like

    by_cyl <- group_by(mtcars, cyl)
    do(by_cyl, write_csv(., "test.csv"))

As expected, this writes a single .csv file with only the data from the last group. How can I modify this to write multiple .csv files, each with a filename that includes cyl?

You can wrap the csv write process in a custom function as follows. Note that the function has to return a data.frame, otherwise it fails with

    Error: Results are not data frames at positions

This will return 3 csv
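The answer's custom function is not shown in this excerpt. As an alternative sketch, group_walk() (available in dplyr 0.8.0 and later) calls a function once per group purely for its side effects, so nothing has to be returned as a data frame. The output directory and the cyl_<value>.csv naming pattern are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(readr)

# Hypothetical output directory under the session's temp folder
out_dir <- tempfile("by_cyl_")
dir.create(out_dir)

mtcars %>%
  group_by(cyl) %>%
  group_walk(
    # .x is the group's rows, .y is a one-row tibble holding the group key
    ~ write_csv(.x, file.path(out_dir, paste0("cyl_", .y$cyl, ".csv"))),
    .keep = TRUE  # keep the cyl column in each written file
  )
```

For mtcars this writes one file per distinct cyl value (4, 6 and 8).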

Suppress reader parse problems in r

我的梦境 submitted on 2019-11-30 03:22:48
Question: I am currently reading in a file using the readr package. The idea is to use read_delim to read the file row by row to find the maximum number of columns in my unstructured data file. The code reports parsing problems. I know about these and will deal with column types after import. Is there a way to turn off the problems() output, as the usual options(warn) is not working?

    i = 1
    max_col <- 0
    options(warn = -1)
    while (i != "stop") {
      n_col <- ncol(read_delim("file.txt", n_max = 1, skip = i, delim = "\t"))
      if(n