tibble

dplyr unquoting does not work with filter function

為{幸葍}努か submitted on 2019-12-22 06:44:12
Question: Maybe I am missing something, but I can't seem to make dplyr's unquoting operator work with the filter function. It works with select, but not with filter... Example:
    set.seed(1234)
    A = matrix(rnorm(100), nrow = 10, ncol = 10)
    colnames(A) <- paste("var", seq(1:10), sep = "")
    varname_test <- "var2"
    A <- as_tibble(A)
    select(A, !!varname_test) # this works as expected
    # this does NOT give me only the rows where var2
    # is positive
    (result1 <- filter(A, !!varname_test > 0))
    # This is how the
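A likely explanation, offered as a sketch rather than a confirmed diagnosis: `varname_test` holds a string, so `!!varname_test > 0` compares the literal string "var2" with 0 instead of looking up the column, whereas `select()` accepts column names as strings. Two common workarounds are the `.data` pronoun and converting the string to a symbol with `rlang::sym()`. The names `result_pronoun` and `result_sym` are illustrative and not from the question.

```r
library(dplyr)
library(tibble)

set.seed(1234)
A <- matrix(rnorm(100), nrow = 10, ncol = 10)
colnames(A) <- paste0("var", 1:10)
A <- as_tibble(A)
varname_test <- "var2"

# Option 1: the .data pronoun looks the column up by its string name
result_pronoun <- filter(A, .data[[varname_test]] > 0)

# Option 2: turn the string into a symbol, then unquote it.
# The parentheses make the precedence of !! explicit.
result_sym <- filter(A, (!!rlang::sym(varname_test)) > 0)
```

Both calls should keep only the rows where `var2` is positive.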

How to add metadata to a tibble

余生颓废 submitted on 2019-12-21 19:24:40
Question: How does one add metadata to a tibble? I would like a sentence describing each of my variable names, so that I could print the tibble with its associated metadata and someone who hadn't seen the data before could make some sense of it.
    as_tibble(iris)
    # A tibble: 150 × 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3
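One possible approach, sketched here with base R attributes only, is to store a description on each column as a `"label"` attribute and gather the labels into a lookup table when needed. The helper `describe_columns()` and the label texts are hypothetical, and attributes can be dropped by some operations, so treat this as a starting point rather than a robust solution.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# Attach a free-text description to individual columns as a "label" attribute
attr(iris_tbl$Sepal.Length, "label") <- "Sepal length in cm"
attr(iris_tbl$Species, "label") <- "Iris species"

# Hypothetical helper: collect the labels into a small lookup table;
# columns without a label get NA
describe_columns <- function(tbl) {
  labels <- vapply(tbl, function(col) {
    label <- attr(col, "label", exact = TRUE)
    if (is.null(label)) NA_character_ else label
  }, character(1))
  tibble(variable = names(tbl), description = unname(labels))
}

column_info <- describe_columns(iris_tbl)
print(column_info)
```

Handing over `column_info` alongside the data gives a reader one sentence per labelled variable.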

R: add a new column to dataframes from a function

放肆的年华 submitted on 2019-12-20 05:19:08
Question: I have many tibbles similar to this:
    dftest_tw <- structure(list(text = c("RT @BitMEXdotcom: A new high: US$500M turnover in the last 24 hours, over 80% of it on $XBTUSD. Congrats to the team and thank you to our u…",
        "RT @Crowd_indicator: Thank you for this nice video, @Nicholas_Merten",
        "RT @Crowd_indicator: Review of #Cindicator by DataDash: t.co/D0da3u5y3V"),
        Tweet.id = c("896858423521837057", "896858275689398272", "896858135314538497"),
        created.date = structure(c(17391, 17391, 17391),

Column name of last non-NA row per row; using tidyverse solution?

▼魔方 西西 submitted on 2019-12-20 03:43:21
Question: Brief dataset description: I have survey data generated from Qualtrics, which I've imported into R as a tibble. Each column corresponds to a survey question, and I've preserved the original column order (to correspond with the order of the questions in the survey). Problem in plain language: Due to normal participant attrition, not all participants completed all of the questions in the survey. I want to know how far each participant got in the survey, and the last question they each answered
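The task described above can be sketched in base R by turning the tibble into a logical "answered" matrix and taking, per row, the name of the last column that is not `NA`. The `survey` data and its column names `q1` to `q3` are hypothetical stand-ins for the Qualtrics export, which is not shown in the excerpt.

```r
library(tibble)

# Hypothetical survey data: one column per question, in survey order
survey <- tibble(
  q1 = c(1, 2, NA),
  q2 = c(3, NA, NA),
  q3 = c(5, NA, NA)
)

# Logical matrix: TRUE where a participant answered the question
answered <- !is.na(survey)

# For each row, the name of the last answered question (NA if none)
last_answered <- apply(answered, 1, function(row) {
  idx <- which(row)
  if (length(idx) == 0) NA_character_ else names(row)[max(idx)]
})

survey$last_answered <- unname(last_answered)
```

A row with no answers at all is reported as `NA` rather than being assigned a question.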

What is the difference between as.tibble(), as_data_frame(), and tbl_df()?

和自甴很熟 submitted on 2019-12-18 13:07:16
Question: I remember reading somewhere that as.tibble() is an alias for as_data_frame(), but I don't know what exactly an alias is in programming terminology. Is it similar to a wrapper? So I guess my question probably comes down to the difference in possible usages between tbl_df() and as_data_frame(): what are the differences between them, if any? More specifically, given a (non-tibble) data frame df, I often turn it into a tibble by using: df <- tbl_df(df) Wouldn't df <- as_data_frame(df) do the
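For context, in recent releases of tibble and dplyr the three functions named in the question are deprecated in favour of `as_tibble()`. A minimal sketch of the conversion with that function, using a made-up data frame `df`:

```r
library(tibble)

# Hypothetical plain data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# as_tibble() is the current spelling of the conversion
tbl <- as_tibble(df)

class(tbl)
is_tibble(tbl)
```

An alias here simply means a second name bound to the same function, so calling it behaves the same as calling the original.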

Unnest a list column directly into several columns

给你一囗甜甜゛ submitted on 2019-12-17 19:57:53
Question: Can I unnest a list column directly into n columns? The list can be assumed to be regular, with all elements being of equal length. If I had a character vector instead of a list column, I could use tidyr::separate. I can tidyr::unnest, but then I need another helper variable to be able to tidyr::spread. Am I missing an obvious method? Example data:
    library(tibble)
    df1 <- data_frame(
      gr = c('a', 'b', 'c'),
      values = list(1:2, 3:4, 5:6)
    )
    # A tibble: 3 x 2
      gr    values
      <chr> <list>
    1 a     <int [2]>
    2 b
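Two ways this could be approached, as a sketch: `tidyr::unnest_wider()` (available in tidyr 1.0.0 and later, so newer than the question) and a base R route that row-binds the list elements into a matrix. The column names `value_1`, `value_2` and the object names `wide_tidyr` and `wide_base` are illustrative; `tibble()` replaces the older `data_frame()` from the question.

```r
library(tibble)
library(tidyr)
library(dplyr)

df1 <- tibble(
  gr = c("a", "b", "c"),
  values = list(1:2, 3:4, 5:6)
)

# Option 1: unnest_wider() spreads each list element across columns;
# names_sep is needed because the list elements are unnamed
wide_tidyr <- unnest_wider(df1, values, names_sep = "_")

# Option 2: base R, relying on all elements having equal length
value_matrix <- do.call(rbind, df1$values)
colnames(value_matrix) <- paste0("value_", seq_len(ncol(value_matrix)))
wide_base <- bind_cols(df1["gr"], as_tibble(value_matrix))
```

The base R route depends on the regularity assumption stated in the question; ragged list elements would need padding first.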

Distinct in dplyr does not work (sometimes)

倖福魔咒の submitted on 2019-12-11 15:28:54
Question: I have the following data frame, which I obtained from a count. I used dput to make the data frame available and then edited it so there is a duplicate of A.
    df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L),
        .Label = c("A", "A", "C", "D", "-1"), class = "factor"),
        n = c(10717L, 4412L, 2058L, 1480L)),
        class = c("tbl_df", "tbl", "data.frame"),
        row.names = c(NA, -4L), .Names = c("Procedure", "n"))
    print(df)
    # A tibble: 4 x 2
      Procedure     n
      <fct>     <int>
    1 D         10717
    2 A

Calculate function on a column of nested tibbles?

这一生的挚爱 submitted on 2019-12-11 06:24:44
Question: I have a data frame with a column of tibbles. Here is a portion of my data:
    date       time     uuid             data
    2018-06-23 18:25:24 0b27ea5fad61c99d <tibble>
    2018-06-23 18:25:38 0b27ea5fad61c99d <tibble>
    2018-06-23 18:26:01 0b27ea5fad61c99d <tibble>
    2018-06-23 18:26:23 0b27ea5fad61c99d <tibble>
    2018-06-23 18:26:37 0b27ea5fad61c99d <tibble>
    2018-06-23 18:27:00 0b27ea5fad61c99d <tibble>
    2018-06-23 18:27:22 0b27ea5fad61c99d <tibble>
    2018-06-23 18:27:39 0b27ea5fad61c99d <tibble>
    2018-06-23 18:28:06

Running “apply” command on a very large data frame

£可爱£侵袭症+ submitted on 2019-12-11 06:03:44
Question: I have a tibble in R with dimensions 15,000,000 x 140. Size-wise it's about 6 GB. For each row, I want to check whether any of columns 11-40 starts with one of a specific list of prefixes. I want to get a vector of 1s and 0s that is 15,000,000 long. I can do this using the following:
    subResult <- apply(rawData[,11:40], c(1,2), function(x){substring(x,1,3) %in% c("295", "296", "297", "298", "299")})
    result <- apply(subResult, 1, sum)
The problem is that this is far too slow: it would take over 1 day to do
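A column-wise alternative may be considerably faster than the element-wise `apply()`, since `substr()` and `%in%` are vectorised over whole columns. The sketch below uses a small hypothetical `rawData` with three code columns standing in for columns 11-40; no timing on the real 15,000,000-row data is claimed.

```r
library(tibble)

# Hypothetical stand-in for rawData: three code columns instead of columns 11-40
rawData <- tibble(
  id = 1:3,
  code_a = c("29512", "30000", "10101"),
  code_b = c("40000", "31111", "29999"),
  code_c = c(NA, "12345", "55555")
)

prefixes <- c("295", "296", "297", "298", "299")

# In the question this would be rawData[, 11:40]
code_cols <- rawData[, c("code_a", "code_b", "code_c")]

# One vectorised pass per column: does the 3-character prefix match?
col_hits <- lapply(code_cols, function(col) substr(col, 1, 3) %in% prefixes)

# Combine columns with a logical OR, then convert to 1/0 per row
result <- as.integer(Reduce(`|`, col_hits))
```

Unlike the `sum` in the question, this yields 1 whenever at least one column matches, which is what the stated goal of a 1/0 vector asks for; missing values count as no match.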

Exclude groups with NAs in tidy dataset

假装没事ソ submitted on 2019-12-11 05:59:43
Question: I have a tidy tibble with a value column identified by 4 ID columns.
    > MWA
    # A tibble: 16 x 5
    # Groups: Dir [2]
          VP   Con   Dir   Seg time_seg
       <int> <int> <int> <int>    <int>
     1    10     2     1     1     1810
     2    10     2     1     2      260
     3    10     2     1     3      540
     4    10     2     1     4     1470
     5    10     2     1     5      460
     6    10     2     1     6      690
     7    10     2     1     7      760
     8    10     2     1     8       NA
     9    10     2     2     1      320
    10    10     2     2     2     1110
    11    10     2     2     3      450
    12    10     2     2     4      600
    13    10     2     2     5     1680
    14    10     2     2     6      730
    15    10     2     2     7      850
    16    10     2     2     8      840
The dput to reproduce is
    > dput(MWA)
    structure(list(VP = c(10L, 10L, 10L, 10L,
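One way to drop every group that contains an `NA`, sketched with dplyr: group by the ID columns and keep a group only if none of its `time_seg` values is missing. The data below is rebuilt from the printed table; treating `VP`, `Con` and `Dir` as the grouping columns is an assumption, since the rest of the question is cut off.

```r
library(dplyr)
library(tibble)

# Data rebuilt from the printed tibble in the question
MWA <- tibble(
  VP = 10L,
  Con = 2L,
  Dir = rep(1:2, each = 8),
  Seg = rep(1:8, times = 2),
  time_seg = c(1810L, 260L, 540L, 1470L, 460L, 690L, 760L, NA,
               320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 840L)
)

# Keep only groups (assumed: VP x Con x Dir) without any missing time_seg
complete_groups <- MWA %>%
  group_by(VP, Con, Dir) %>%
  filter(!anyNA(time_seg)) %>%
  ungroup()
```

With this data the whole `Dir == 1` group is removed because its eighth segment is `NA`.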