readr

Parsing a CSV with irregular quoting rules using readr

[亡魂溺海] submitted on 2019-12-25 01:48:40
Question: I have a weird CSV that I can't parse with readr. Let's call it data.csv. It looks something like this:

    name,info,amount_spent
    John Doe,Is a good guy,5412030
    Jane Doe,"Jan Doe" is cool,3159
    Senator Sally Doe,"Sally "Sal" Doe is from New York, NY",4451

If all of the rows were like the first one below the header row (two character columns followed by an integer column), this would be easy to parse with read_csv:

    df <- read_csv("data.csv")

However, some rows are formatted like the second one, in

How to accumulate the results of readr::read_lines_chunked?

戏子无情 submitted on 2019-12-25 01:37:36
Question: I'm using readr::read_lines_chunked in the following way:

    if (!require(readr)) install.packages("readr", repos = "http://cran.us.r-project.org")
    mytb <- NULL
    read_lines_chunked(file = "/tmp/huge.xml", chunk_size = 10, callback = function(xml, pos) {
      # extract values from xml into tmp
      if (is.null(mytb)) {
        users <- as_tibble(tmp)
      } else {
        users <- bind_rows(users, as_tibble(tmp))
      }
    })

but this doesn't work, as mytb always ends up being NULL. How do you accumulate the results into a tibble?

Answer 1: I
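One way the accumulation could be done, sketched under assumptions: readr ships callback wrappers such as DataFrameCallback, which collects whatever the per-chunk function returns and row-binds the pieces at the end, so no outer variable needs to be modified from inside the callback. The temporary file and the columns `start` and `n_chars` are hypothetical stand-ins for /tmp/huge.xml and the values extracted from it.

```r
library(readr)
library(tibble)

# Hypothetical input: a small temp file standing in for /tmp/huge.xml
path <- tempfile(fileext = ".txt")
write_lines(sprintf("line %d", 1:25), path)

# DataFrameCallback row-binds the data frame returned for each chunk
users <- read_lines_chunked(
  path,
  callback = DataFrameCallback$new(function(lines, pos) {
    # `lines` holds the chunk's lines, `pos` the line number where it starts
    tibble(start = pos, n_chars = nchar(lines))
  }),
  chunk_size = 10
)
```

The value returned by read_lines_chunked is the combined result, so it is assigned once outside the callback rather than built up with assignments inside it.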

Finding the cause of an unwanted deletion within an lapply function

有些话、适合烂在心里 submitted on 2019-12-25 01:12:03
Question: I uploaded a .txt file into R as follows:

    Election_Parties <- readr::read_lines("Election_Parties.txt")

The following text is in the file: pastebin link. The text looks more or less as follows (please use the actual file for a solution!):

    BOLIVIA
    P1-Nationalist Revolutionary Movement-Free Bolivia Movement (Movimiento Nacionalista Revolucionario [MNR])
    P19-Liberty and Justice (Libertad y Justicia [LJ])
    P20-Tupak Katari Revolutionary Movement (Movimiento Revolucionario Tupak Katari [MRTK])
    COLOMBIA

Remove empty columns from read_csv()

こ雲淡風輕ζ submitted on 2019-12-24 17:24:40
Question: I am trying to read the csv file linked here using read_csv() from the readr package, and then remove empty columns. If I use read.csv() instead, the empty columns 8:12 can easily be removed using

    library(dplyr)
    select(data, 1:7)

However, when I read the csv file using the read_csv() function, the same code gives an error:

    Error: found duplicated column name: NA, NA, NA, NA

How can I remove these empty columns? It seems pointless to properly name empty columns just so I can remove
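A minimal sketch of one way to drop all-empty columns without referring to their names, assuming "empty" means every value is NA. The data frame below is a hypothetical stand-in for the linked CSV, and this is base R subsetting rather than anything specific to readr or dplyr.

```r
# Hypothetical data standing in for the linked CSV: columns c and d are empty
data <- data.frame(a = 1:3, b = c("x", "y", "z"), c = NA, d = NA_character_)

# Keep only the columns that contain at least one non-missing value
keep <- colSums(!is.na(data)) > 0
cleaned <- data[, keep, drop = FALSE]
```

Because the selection is a logical vector over column positions, duplicated or missing column names do not get in the way.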

readr::read_csv() — pass vector of character column names to import

我们两清 submitted on 2019-12-23 17:23:21
Question: I am writing a function that accepts a vector of column names to be read from a CSV file using readr::read_csv(). I would like to read only the columns named in the vector, and I would like to use readr's default column-type guessing algorithm. Is there a more direct way to accomplish this than creating a named list of col_guess() specifications as below?

    # test csv data
    test_csv <- "x,y,z\n1,2,3\n3,4,4\n5,6,7"
    # vector of column names to import
    col_names <- c("x", "y")
    #
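A hedged sketch of a more direct route, assuming readr 2.0 or later is installed: that release added a `col_select` argument to read_csv() which accepts tidyselect helpers, so a character vector can be passed through tidyselect::all_of() while type guessing is left at its default. Wrapping the string in I() marks it as literal data, which is also a readr 2.0 convention.

```r
library(readr)

# test csv data and the vector of column names from the question
test_csv <- "x,y,z\n1,2,3\n3,4,4\n5,6,7"
col_names <- c("x", "y")

# Read only the requested columns; their types are still guessed by readr
df <- read_csv(I(test_csv), col_select = tidyselect::all_of(col_names))
```

On older readr versions without `col_select`, this call would fail, and building a column specification by hand remains the fallback.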

readr::read_csv issue: Chinese characters become garbled

僤鯓⒐⒋嵵緔 submitted on 2019-12-19 09:25:23
Question: I'm trying to import a dataset into RStudio, but I am stuck with Chinese characters, which come out garbled. Here is the code:

    library(tidyverse)
    df <- read_csv("中文,英文\n英文,德文")
    df
    # A tibble: 1 x 2
      `\xd6\xd0\xce\xc4` `Ӣ\xce\xc4`
      <chr>              <chr>
    1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"

When I use the base function read.csv, it works well. I guess I must be doing something wrong with the encoding. But there is no encoding option in read_csv; how can I do this?

Answer 1: This is because the

R readr::read_fwf ignore characters using fwf_widths

淺唱寂寞╮ submitted on 2019-12-12 01:44:17
Question: I would like to know if there is an easy way to skip characters using read_fwf from the readr package in R. For example, modifying one of the examples in the documentation

    library(readr)
    fwf_sample <- system.file("extdata/fwf-sample.txt", package = "readr")
    read_fwf(fwf_sample, fwf_widths(c(2, -3, 2, 3)))

throws the error:

    Error: Begin offset (2) must be smaller than end offset (-1)

Using the base read.fwf function works just fine, however:

    read.fwf(fwf_sample, widths = c(2, -3, 2, 3))
    # V1 V2
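One possible workaround, sketched under the assumption that the goal is to mimic read.fwf's negative widths: readr's fwf_positions() takes explicit start and end positions, so the skipped ranges can simply be left out. The conversion from widths to positions below is illustrative arithmetic, not a readr feature.

```r
library(readr)

fwf_sample <- system.file("extdata/fwf-sample.txt", package = "readr")

# read.fwf-style widths: negative values mark characters to skip
widths <- c(2, -3, 2, 3)

# Convert widths to 1-based start/end positions, then drop the skipped fields
ends <- cumsum(abs(widths))
starts <- ends - abs(widths) + 1
keep <- widths > 0

df <- read_fwf(fwf_sample, fwf_positions(starts[keep], ends[keep]))
```

For these widths the kept fields span characters 1 to 2, 6 to 7, and 8 to 10, which is the same layout read.fwf derives from c(2, -3, 2, 3).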

Read All Excel Files into R by Sheet with file name as column

╄→尐↘猪︶ㄣ submitted on 2019-12-11 18:45:57
Question: I have a local folder with Excel files in the same format. Each Excel file has 10 sheets. I want to be able to do the following:
1) Read all the Excel files into R
2) Rbind all the results together, but by sheet
3) The result would be 10 new data frames with all the Excel files rbinded together
4) A new column will be added with the file name
I have looked up code and the best I could find is this, but it doesn't do it by sheet:

    files = list.files()
    library(plyr)
    library(readr)
    library(readxl)
    data2

fread : Specifying colClasses of file with 2 columns with same name, using col and col.1 does not work

懵懂的女人 submitted on 2019-12-11 15:20:47
Question:

    fread(file, colClasses = c(col = "character", col.1 = "character"), check.names = TRUE)

It seems that check.names = TRUE is applied later, after the file is read. Is there a workaround? I need to preserve the precision of the columns col...

Source: https://stackoverflow.com/questions/58178779/fread-specifying-colclasses-of-file-with-2-columns-with-same-name-using-col-a
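A hedged sketch of one workaround, assuming the two same-named columns are the first two in the file: fread's colClasses also accepts a named list whose names are classes and whose values are column numbers, which avoids referring to the duplicated name at all. The inline CSV text is hypothetical sample data.

```r
library(data.table)

# Hypothetical file contents: two columns share the name "col"
csv <- "col,col,n\n0.12345678901234567890,1.5,3\n"

# Select the columns by position, so the duplicated name never has to be matched
dt <- fread(text = csv, colClasses = list(character = 1:2))
```

Reading the values as character keeps every digit as written; the columns can be renamed afterwards, for example with setnames(), if unique names are needed.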

Parser does not match column name in .csv file when importing using readr package

馋奶兔 submitted on 2019-12-11 04:41:50
Question: I am trying to import a .csv file into R that contains employment data from the BLS. When I attempt to import the data, every column works except the first, which gives me the error:

    EmpEd <- read_csv("~/Documents/Research/Global Business Research Center/Future of Education/EmploymentbyEd.csv",
      col_types = cols(
        `Date` = col_date(format = "%B-%y"),
        `LessHsPart` = col_number(),
        `HsPart` = col_number(),
        `SomeUgPart` = col_number(),
        `UgHighPart` = col_number(),
        `LessHsUp` = col_number(),
        `HsUp` =