sapply

sapply vs. lapply while reading files and rbind'ing them

天大地大妈咪最大 submitted on 2019-12-08 06:09:19
Question: I followed Hadley's thread, "Issue in Loading multiple .csv files into single dataframe in R using rbind", to read multiple CSV files and then combine them into one data frame. I also experimented with lapply vs. sapply, as discussed in "Grouping functions (tapply, by, aggregate) and the *apply family". Here's my first CSV file: dput(File1) structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", "C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("B",
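The snippet above is cut off before the reading code, so the following is only a minimal sketch of the usual pattern, not the asker's actual code. The temporary directory, the file names file1.csv and file2.csv, and their contents are all hypothetical. It shows why lapply is the safer choice before rbind: lapply always returns a list of data frames, whereas sapply tries to simplify and can return a list matrix instead.

```r
# Hypothetical setup: write two small CSV files into a temporary directory.
tmp <- tempfile("csv_demo")
dir.create(tmp)
write.csv(data.frame(First.Name = c("A", "C"), Last.Name = c("B", "D")),
          file.path(tmp, "file1.csv"), row.names = FALSE)
write.csv(data.frame(First.Name = "A", Last.Name = "D"),
          file.path(tmp, "file2.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# lapply always returns a list; here, a list of data frames.
frames <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Bind the data frames row-wise into a single data frame.
combined <- do.call(rbind, frames)

# sapply simplifies when every result has the same length. Both files have
# two columns, so the result is a list matrix (one column per file), not a
# list of data frames.
simplified <- sapply(files, read.csv, stringsAsFactors = FALSE)
```

Passing simplify = FALSE to sapply would make it behave like lapply here, apart from the names it attaches to the result.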

Matching two Columns in R

安稳与你 submitted on 2019-12-08 05:42:05
Question: I have a big dataset df (354903 rows) with two columns named df$CompleteName and df$CompleteName.1:
    head(df)
           CompleteName       CompleteName.1
    1   Lefebvre Arnaud Lefebvre Schuhl Anne
    1.1 Lefebvre Arnaud              Abe Lyu
    1.2 Lefebvre Arnaud              Abe Lyu
    1.3 Lefebvre Arnaud       Louvet Nicolas
    1.4 Lefebvre Arnaud   Muller Jean Michel
    1.5 Lefebvre Arnaud  De Dinechin Florent
I am trying to create labels to see whether the name is the same or not. When I try a small subset it works [1 if they are the same, 0 if not]: > match(df

Apply vs For loop in R

余生颓废 submitted on 2019-12-08 03:51:55
Question: I wrote the following code to scrape tendering information from a portal on a daily basis.
    packages <- c('rvest', 'stringi', 'tidyverse','lubridate','dplyr')
    purrr::walk(packages, library, character.only = TRUE, warn.conflicts = FALSE)
    start_time <- proc.time()
    # Main page to scrape and get total no. of records.
    data <- read_html('https://eprocure.gov.in/mmp/latestactivetenders')
    total_tenders_raw <- html_nodes(data,xpath = '//*[(@id = "table")]')
    All_tenders <- data.frame(html_table(total_tenders

R - Select Elements from list that meet the criteria

偶尔善良 submitted on 2019-12-08 03:33:11
Question: I had a tough time selecting the elements of a list that satisfy a predicate function, so I am documenting the problem together with a solution.
    check.digits <- function(x){ grepl('^(\\d+)$' , x) }
    x = "741 abc pqr street 71 15 41 510741"
    lx = strsplit(x, split = " ", fixed = TRUE)
    lapply(lx, check.digits)
This does not work: lx[[1]][c(lapply(lx, check.digits))]
Use this instead: lx[[1]][sapply(lx, check.digits)]
Thanks!
Answer 1: Given what you're after, perhaps you should just use gregexpr + regmatches: regmatches(x, gregexpr("\\d+",

R - Select Elements from list that meet the criteria

一世执手 submitted on 2019-12-07 02:29:28
I had a tough time selecting the elements of a list that satisfy a predicate function, so I am documenting the problem together with a solution.
    check.digits <- function(x){ grepl('^(\\d+)$' , x) }
    x = "741 abc pqr street 71 15 41 510741"
    lx = strsplit(x, split = " ", fixed = TRUE)
    lapply(lx, check.digits)
This does not work: lx[[1]][c(lapply(lx, check.digits))]
Use this instead: lx[[1]][sapply(lx, check.digits)]
Thanks!
Given what you're after, perhaps you should just use gregexpr + regmatches:
    regmatches(x, gregexpr("\\d+", x))
    # [[1]]
    # [1] "741" "71" "15" "41" "510741"
Or, from "qdapRegex", use rm_number: library(qdapRegex) rm
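The difference between the two indexing attempts comes down to the type each function returns. The sketch below reuses the question's own check.digits, x, and lx; the names as_list, as_vec, and direct are introduced here purely for illustration.

```r
check.digits <- function(x) grepl("^(\\d+)$", x)
x  <- "741 abc pqr street 71 15 41 510741"
lx <- strsplit(x, split = " ", fixed = TRUE)

# lapply returns a list of length 1 that wraps the logical vector,
# and a list cannot be used as a subscript for lx[[1]].
as_list <- lapply(lx, check.digits)

# sapply simplifies the same result to a logical matrix (8 rows, 1 column),
# which R accepts as a logical subscript.
as_vec <- sapply(lx, check.digits)
digits_only <- lx[[1]][as_vec]

# Because lx holds a single character vector, the predicate can also be
# applied to that vector directly, with no apply function at all.
direct <- lx[[1]][check.digits(lx[[1]])]
```

Since grepl is already vectorised, the direct form is probably the simplest choice whenever strsplit was called on a single string.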

Convert a list of lists to a character vector

故事扮演 submitted on 2019-12-06 21:05:11
Question: I have a list of lists of characters. For example:
    l <- list(list("A"),list("B"),list("C","D"))
As you can see, some elements are lists of length > 1. I want to convert this list of lists to a character vector, but I'd like each list of length > 1 to appear as a single element in the character vector. The unlist function does not achieve that; instead it returns:
    > unlist(l)
    [1] "A" "B" "C" "D"
Is there anything faster than:
    sapply(l,function(x) paste(unlist(x),collapse=""))
To get my desired
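One common variant of the sapply call above uses vapply, which states the expected result type up front. This is a sketch only: it is not claimed to be faster than the asker's version, and the names collapsed and collapsed2 are illustrative.

```r
l <- list(list("A"), list("B"), list("C", "D"))

# vapply checks that every result is a single character string,
# so a malformed element fails loudly instead of changing the output type.
collapsed <- vapply(l, function(x) paste(unlist(x), collapse = ""),
                    character(1))

# Alternative: pass each inner list to paste0 as separate arguments.
collapsed2 <- vapply(l, function(x) do.call(paste0, x), character(1))
```

Both calls give one string per top-level element, so the two-element inner list becomes the single string "CD".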

Apply List of functions on List of columns based on different combinations

萝らか妹 submitted on 2019-12-06 19:49:40
I have a data frame df with three categorical variables cat1, cat2, cat3 and two continuous variables con1, con2. I would like to apply a list of functions (sd, mean) to a list of columns (con1, con2), grouped by every combination of the columns cat1, cat2, cat3. So far I have done this by explicitly subsetting each combination.
    # Random generation of values for categorical data
    set.seed(33)
    df <- data.frame(cat1 = sample( LETTERS[1:2], 100, replace=TRUE ),
                     cat2 = sample( LETTERS[3:5], 100, replace=TRUE ),
                     cat3 = sample( LETTERS[2:4], 100, replace=TRUE ),
                     con1 = runif(100,0,100), con2
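A possible way to avoid the explicit subsetting is to enumerate the grouping-column combinations with combn and call aggregate once per combination. This is a sketch under assumptions: the definition of con2 is cut off in the snippet, so the runif call used for it here is a guess, and cats, cons, combos, and results are illustrative names.

```r
set.seed(33)
df <- data.frame(
  cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
  cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
  cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
  con1 = runif(100, 0, 100),
  con2 = runif(100, 0, 100)  # assumption: the original definition is truncated
)

cats <- c("cat1", "cat2", "cat3")
cons <- c("con1", "con2")

# Every non-empty combination of grouping columns: 3 singles, 3 pairs, 1 triple.
combos <- unlist(
  lapply(seq_along(cats), function(k) combn(cats, k, simplify = FALSE)),
  recursive = FALSE
)

# One aggregate call per combination; FUN returns both statistics at once.
results <- lapply(combos, function(g) {
  aggregate(df[cons], by = df[g],
            FUN = function(v) c(mean = mean(v), sd = sd(v)))
})
names(results) <- vapply(combos, paste, character(1), collapse = "+")
```

Each element of results is a data frame with one row per observed group; groups containing a single row get NA for sd.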

Matching two Columns in R

百般思念 submitted on 2019-12-06 15:10:52
I have a big dataset df (354903 rows) with two columns named df$CompleteName and df$CompleteName.1:
    head(df)
           CompleteName       CompleteName.1
    1   Lefebvre Arnaud Lefebvre Schuhl Anne
    1.1 Lefebvre Arnaud              Abe Lyu
    1.2 Lefebvre Arnaud              Abe Lyu
    1.3 Lefebvre Arnaud       Louvet Nicolas
    1.4 Lefebvre Arnaud   Muller Jean Michel
    1.5 Lefebvre Arnaud  De Dinechin Florent
I am trying to create labels to see whether the name is the same or not. When I try a small subset it works [1 if they are the same, 0 if not]:
    > match(df$CompleteName[1], df$CompleteName.1[1], nomatch = 0)
    [1] 0
    > match(df$CompleteName[1:10], df$CompleteName
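The snippet is truncated before it shows what happens on the full data, so the sketch below rests on an assumption: that the goal is a row-by-row comparison of the two columns. In that case an elementwise == is probably closer to the intent than match, because match looks each value up anywhere in the second vector and returns a position rather than a 0/1 label. The three-row data frame is made up for illustration.

```r
# Hypothetical miniature of the data described in the question.
df <- data.frame(
  CompleteName   = c("Lefebvre Arnaud", "Lefebvre Arnaud", "Abe Lyu"),
  CompleteName.1 = c("Lefebvre Schuhl Anne", "Abe Lyu", "Abe Lyu"),
  stringsAsFactors = FALSE
)

# Row-wise label: 1 if both names in the same row are equal, 0 otherwise.
df$same <- as.integer(df$CompleteName == df$CompleteName.1)

# match() instead reports the first position of each value within the whole
# second column (0 when absent), which is not a same-row comparison.
pos <- match(df$CompleteName, df$CompleteName.1, nomatch = 0)
```

If the columns are factors, as they often were by default in older versions of R, wrapping them in as.character before comparing avoids errors caused by differing level sets.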

How to subset a list using another list?

怎甘沉沦 submitted on 2019-12-06 15:02:22
Question: I have two lists, and I want to subset the data in one list using the other. Say I have lists called mylist and hislist:
    mylist <- list(a = data.frame(cola = 1:3, colb = 4:6), b = data.frame(cola = 1:3, colb = 6:8))
    > mylist
    $a
      cola colb
    1    1    4
    2    2    5
    3    3    6
    $b
      cola colb
    1    1    6
    2    2    7
    3    3    8
and hislist:
    hislist <- list(a = 5:6, b = 7:8)
    > hislist
    $a
    [1] 5 6
    $b
    [1] 7 8
I tried to subset mylist using the lapply function:
    lapply(mylist, function(x) subset(x, colb %in% hislist))
    #or
    lapply(mylist, function
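The desired output is cut off in the snippet, so this sketch assumes each data frame in mylist should be filtered by the element of hislist in the same position. Under that assumption, Map walks the two lists in parallel, which a single lapply over mylist cannot do because the whole of hislist is visible inside the function. The name result is illustrative.

```r
mylist  <- list(a = data.frame(cola = 1:3, colb = 4:6),
                b = data.frame(cola = 1:3, colb = 6:8))
hislist <- list(a = 5:6, b = 7:8)

# Map pairs mylist$a with hislist$a and mylist$b with hislist$b, then keeps
# the rows whose colb value appears in the paired vector.
result <- Map(function(d, keep) d[d$colb %in% keep, , drop = FALSE],
              mylist, hislist)
```

Map takes its names from the first list, so result has elements a and b. If the two lists could be ordered differently, indexing hislist by names(mylist) first would make the pairing explicit.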

Determining if one value occurs once in a row of columns, but a second value doesn't occur at all

╄→гoц情女王★ submitted on 2019-12-05 13:08:08
Probably a terrible title, but I have a table of qualifiers stored as "1", "2", and "3". What I'm trying to do is look at each row (approximately 300,000 rows, but the number varies) and determine where a single "3" occurs (if it occurs more than once, I am not interested in that row) while the rest of the columns in that row hold a "1", and return those rows as a list. (The number of columns and the column names change based on the input files.) Instinctively I want to attempt this with nested for loops that index the row count and then the column count, followed by some function that looks for one "3" and no "2"s.
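The row-wise test described above can usually be written without nested loops by counting matches per row with rowSums. This is a sketch on made-up data: the column names s1 to s3 and the five rows are hypothetical, and it assumes the values are stored as character strings as the question states.

```r
# Hypothetical qualifier table; the real one has a varying number of columns.
q <- data.frame(
  s1 = c("1", "3", "3", "1", "2"),
  s2 = c("3", "3", "1", "1", "3"),
  s3 = c("1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

m <- as.matrix(q)

# Count the "3"s and "2"s in every row at once.
n_three <- rowSums(m == "3")
n_two   <- rowSums(m == "2")

# Keep rows with exactly one "3" and no "2"; every other cell is then a "1".
hits <- which(n_three == 1 & n_two == 0)
```

Because the comparison runs over the whole matrix, the same code works regardless of how many columns the input has.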