sapply | 易学教程

sapply vs. lapply while reading files and rbind'ing them

Question: I followed Hadley's thread "Issue in Loading multiple .csv files into single dataframe in R using rbind" to read multiple CSV files and then combine them into one data frame. I also experimented with lapply vs. sapply, as discussed in "Grouping functions (tapply, by, aggregate) and the *apply family". Here's my first CSV file: dput(File1) structure(list(First.Name = structure(c(1L, 2L, 1L, 1L, 1L), .Label = c("A", "C"), class = "factor"), Last.Name = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("B",
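
A minimal sketch of why lapply is usually paired with do.call(rbind, ...) for this task while sapply is not. The two CSV files and their contents are hypothetical stand-ins written to a temporary directory, not the asker's actual data.

```r
# Hypothetical example files standing in for the asker's CSVs
tmp <- tempdir()
files <- file.path(tmp, c("file1.csv", "file2.csv"))
write.csv(data.frame(First.Name = c("A", "C"), Last.Name = c("B", "B")),
          files[1], row.names = FALSE)
write.csv(data.frame(First.Name = c("A", "A"), Last.Name = c("B", "B")),
          files[2], row.names = FALSE)

# lapply() always returns a list: here, one data frame per file
dfs <- lapply(files, read.csv, stringsAsFactors = FALSE)

# do.call() hands the list elements to rbind() as separate arguments,
# so the data frames are stacked into a single data frame
combined <- do.call(rbind, dfs)

# sapply() tries to simplify its result. Each data frame is a list of two
# columns, so the result collapses into a 2 x 2 matrix of mode "list"
# (columns by files) instead of a list of data frames
simplified <- sapply(files, read.csv, stringsAsFactors = FALSE)
```

Here combined has four rows, while simplified is a list-matrix that rbind() cannot stack as intended; sapply(..., simplify = FALSE) behaves like lapply and avoids the simplification.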

Apply vs For loop in R

Question: I wrote the following code to scrape tender information from a portal on a daily basis. packages <- c('rvest', 'stringi', 'tidyverse','lubridate','dplyr') purrr::walk(packages, library, character.only = TRUE, warn.conflicts = FALSE) start_time <- proc.time() Main page to scrape and get the total number of records. data <- read_html('https://eprocure.gov.in/mmp/latestactivetenders') total_tenders_raw <- html_nodes(data,xpath = '//*[(@id = "table")]') All_tenders <- data.frame(html_table(total_tenders
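
A minimal sketch comparing the two styles on the same job. fetch_page() is a hypothetical stand-in for scraping one page (the real script would call read_html()/html_table()); it makes no network requests.

```r
# Hypothetical stand-in for scraping one page: builds a small data frame
fetch_page <- function(page) {
  data.frame(page = page, tender_id = paste0("T", page, "-", 1:2),
             stringsAsFactors = FALSE)
}
pages <- 1:3

# for loop: preallocate a list, fill it, then combine once at the end
out_loop <- vector("list", length(pages))
for (i in seq_along(pages)) {
  out_loop[[i]] <- fetch_page(pages[i])
}
res_loop <- do.call(rbind, out_loop)

# lapply: the same iteration, with the list allocation handled for you
res_apply <- do.call(rbind, lapply(pages, fetch_page))
```

Both produce identical results. When most of the time goes into the per-page work (such as network requests), the choice between the two usually matters little for speed; a common, larger cost is growing a data frame with rbind() inside the loop, which copies it on every iteration.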

R - Select Elements from list that meet the criteria

Question: I had a tough time selecting the elements of a list that satisfy a function, so I'm documenting it here together with a solution. check.digits <- function(x){ grepl('^(\\d+)$' , x) } x = "741 abc pqr street 71 15 41 510741" lx = strsplit(x, split = " ", fixed = TRUE) lapply(lx, check.digits) This does not work: lx[[1]][c(lapply(lx, check.digits))] Use instead: lx[[1]][sapply(lx, check.digits)] Thanks! Answer 1: Given what you're after, perhaps you should just use gregexpr + regmatches: regmatches(x, gregexpr("\\d+", x)) # [[1]] # [1] "741" "71" "15" "41" "510741" Or, from "qdapRegex", use rm_number: library(qdapRegex) rm
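
A short sketch of why the lapply version fails and two ways that work, using the question's own check.digits and input string. The variable names tokens, digits_only, digits_via_lapply and digits_regex are illustrative.

```r
check.digits <- function(x) grepl("^(\\d+)$", x)

x  <- "741 abc pqr street 71 15 41 510741"
lx <- strsplit(x, split = " ", fixed = TRUE)

# grepl() is vectorised, so it can be applied to the tokens directly;
# the logical vector it returns is a valid index
tokens <- lx[[1]]
digits_only <- tokens[check.digits(tokens)]

# lapply() returns a list, which is not a valid index;
# unlist() turns it back into a logical vector (lx has one element here)
digits_via_lapply <- lx[[1]][unlist(lapply(lx, check.digits))]

# The regmatches()/gregexpr() approach from the answer, for comparison
digits_regex <- regmatches(x, gregexpr("\\d+", x))[[1]]
```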

Convert a list of lists to a character vector

Question: I have a list of lists of characters. For example: l <- list(list("A"),list("B"),list("C","D")) As you can see, some elements are lists of length > 1. I want to convert this list of lists to a character vector, but I'd like each list of length > 1 to appear as a single element in the character vector. The unlist function does not achieve that; instead it gives: > unlist(l) [1] "A" "B" "C" "D" Is there anything faster than: sapply(l,function(x) paste(unlist(x),collapse="")) To get my desired
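
A compact variant of the asker's sapply call, shown as a sketch: paste() already coerces a list argument to character, so the inner unlist() can be dropped, and vapply() guarantees one string per element. Whether this is actually faster on real data would need to be measured.

```r
l <- list(list("A"), list("B"), list("C", "D"))

# vapply() is like sapply() but checks that each call returns exactly one
# string; paste() coerces the inner list to character before collapsing
flat <- vapply(l, paste, character(1), collapse = "")
```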

Apply List of functions on List of columns based on different combinations

I have a data frame df with three categorical variables (cat1, cat2, cat3) and two continuous variables (con1, con2). I would like to apply a list of functions (sd, mean) to a list of columns (con1, con2) for different combinations of the grouping columns cat1, cat2, cat3. So far I have done this explicitly, by subsetting every combination by hand. # Random generation of values for categorical data set.seed(33) df <- data.frame(cat1 = sample( LETTERS[1:2], 100, replace=TRUE ), cat2 = sample( LETTERS[3:5], 100, replace=TRUE ), cat3 = sample( LETTERS[2:4], 100, replace=TRUE ), con1 = runif(100,0,100), con2
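
One way to avoid hand-written subsets, sketched under an assumption: the excerpt is cut off at con2, so con2 is generated here the same way as con1. combn() enumerates the groupings and aggregate() applies the function list per group.

```r
set.seed(33)
df <- data.frame(cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
                 cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
                 cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
                 con1 = runif(100, 0, 100),
                 con2 = runif(100, 0, 100))  # assumption: con2 generated like con1

cats <- c("cat1", "cat2", "cat3")
cons <- c("con1", "con2")
funs <- list(mean = mean, sd = sd)

# Every non-empty combination of grouping columns: 3 singles, 3 pairs, 1 triple
combos <- unlist(lapply(seq_along(cats),
                        function(k) combn(cats, k, simplify = FALSE)),
                 recursive = FALSE)

# For each combination, apply every function in `funs` to every column in
# `cons`; each result column becomes a matrix with "mean" and "sd" columns
results <- lapply(combos, function(g) {
  aggregate(df[cons], by = df[g],
            FUN = function(v) vapply(funs, function(f) f(v), numeric(1)))
})
names(results) <- vapply(combos, paste, character(1), collapse = "+")
```

results[["cat1+cat2"]], for example, then holds mean and sd of con1 and con2 for every cat1/cat2 pair. Adding a function to funs or a column to cons extends every table at once.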

Matching two Columns in R

I have a big dataset df (354,903 rows) with two columns named df$ColumnName and df$ColumnName.1 head(df) CompleteName CompleteName.1 1 Lefebvre Arnaud Lefebvre Schuhl Anne 1.1 Lefebvre Arnaud Abe Lyu 1.2 Lefebvre Arnaud Abe Lyu 1.3 Lefebvre Arnaud Louvet Nicolas 1.4 Lefebvre Arnaud Muller Jean Michel 1.5 Lefebvre Arnaud De Dinechin Florent I am trying to create labels showing whether the two names are the same or not. When I try a small subset it works [1 if they are the same, 0 if not]: > match(df$CompleteName[1], df$CompleteName.1[1], nomatch = 0) [1] 0 > match(df$CompleteName[1:10], df$CompleteName
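
A sketch of a row-wise comparison, on a few hypothetical rows shaped like the head(df) output. match() looks each name up anywhere in the other column, so it is not the same test as comparing the two columns row by row; the vectorised == does that for all rows at once.

```r
# Hypothetical rows in the shape of the question's head(df) output
df <- data.frame(
  CompleteName   = c("Lefebvre Arnaud", "Lefebvre Arnaud", "Abe Lyu"),
  CompleteName.1 = c("Lefebvre Schuhl Anne", "Lefebvre Arnaud", "Abe Lyu"),
  stringsAsFactors = FALSE
)

# Row-wise comparison: TRUE/FALSE per row, converted to 1/0 labels.
# as.character() guards against factor columns with different level sets.
df$same <- as.integer(as.character(df$CompleteName) ==
                      as.character(df$CompleteName.1))

# match() answers a different question: for each element of the first vector,
# where does it first occur anywhere in the second one (0 if nowhere)?
positions <- match(df$CompleteName, df$CompleteName.1, nomatch = 0)
```

Here df$same is 0, 1, 1, while positions is 2, 2, 3: row 1 gets a non-zero position from match() even though its two names differ.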

How to subset a list using another list?

Question: I have two lists, and I want to subset the data in one list using the other. Say I have lists called mylist and hislist: mylist <- list(a = data.frame(cola = 1:3, colb = 4:6), b = data.frame(cola = 1:3, colb = 6:8)) > mylist $a cola colb 1 1 4 2 2 5 3 3 6 $b cola colb 1 1 6 2 2 7 3 3 8 > and hislist: hislist <- list(a = 5:6, b = 7:8) > hislist $a [1] 5 6 $b [1] 7 8 I tried to subset mylist using the lapply function: lapply(mylist, function(x) subset(x, colb %in% hislist)) #or lapply(mylist, function
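
In the lapply attempt, colb %in% hislist compares against the whole list rather than the vector that belongs to the same name. A sketch that pairs the two lists element by element with Map(), using the question's own data:

```r
mylist  <- list(a = data.frame(cola = 1:3, colb = 4:6),
                b = data.frame(cola = 1:3, colb = 6:8))
hislist <- list(a = 5:6, b = 7:8)

# Map() walks both lists in parallel, so each data frame is filtered with
# its own vector of allowed colb values; indexing hislist by names(mylist)
# keeps the pairing correct even if the lists are ordered differently
subsetted <- Map(function(d, keep) d[d$colb %in% keep, ],
                 mylist, hislist[names(mylist)])
```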

Determining if one value occurs once in a row of columns, but a second value doesn't occur at all

Probably a terrible title, but I have a table of qualifiers stored as "1", "2", and "3". What I'm trying to do is look in each row (approximately 300,000 rows, but the number varies) and determine where a single "3" occurs (if it occurs more than once, I am not interested in that row) and the rest of the columns in that row are "1", and return those rows as a list. (The number of columns and the column names change based on the input files.) Instinctively, I want to attempt this with nested for loops that index the row count and then the column count, plus some function that looks for one "3" and no "2"s.
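
Instead of nested loops, the counts can be taken for all rows at once with rowSums() on a logical matrix. A minimal sketch on a small hypothetical table, assuming the cells only ever contain "1", "2" or "3", so that "one 3 and no 2" means every other cell is "1". The code does not depend on the number or names of the columns.

```r
# Hypothetical qualifier table; the real one has ~300,000 rows and a
# variable set of columns
tab <- data.frame(q1 = c("1", "3", "3", "1"),
                  q2 = c("3", "3", "1", "2"),
                  q3 = c("1", "1", "1", "3"),
                  stringsAsFactors = FALSE)

m <- as.matrix(tab)

# Per-row counts for all rows at once: exactly one "3" and no "2"
keep <- rowSums(m == "3") == 1 & rowSums(m == "2") == 0

matching_rows <- which(keep)   # row indices
matching_tab  <- tab[keep, ]   # or the rows themselves
```

Rows 1 and 3 are selected: row 2 has two "3"s and row 4 contains a "2".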