data-manipulation | 易学教程

Importing from CSV from a specified range of values

Question: I am trying to read in a CSV file and I am running into the following error:
    Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1097 did not have 5 elements
After further inspection of the CSV file, I find that around line 1097 there is a break in the rows, after which a new header starts with annualised data (I am interested in the monthly data for now).
    temp <- tempfile()
    download.file("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV
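One possible way to read only the first block of such a file is to find the break and parse the lines before it. The sketch below assumes the break is a blank line and uses a small made-up `lines` vector in place of the downloaded file; the real file may also carry descriptive lines before its header, which this sketch does not handle.

```r
# Hypothetical miniature of a file with a monthly block followed by an annual block.
# The values are illustrative only, not taken from the real data set.
lines <- c(
  "Date,Mkt-RF,SMB,HML,RF",
  "192607,2.96,-2.30,-2.87,0.22",
  "192608,2.64,-1.40,4.19,0.25",
  "",
  "Annual Factors: January-December",
  "Date,Mkt-RF,SMB,HML,RF",
  "1927,29.47,-2.46,-3.75,3.12"
)

# Index of the first blank line, assumed to mark the end of the monthly block.
first_blank <- which(trimws(lines) == "")[1]

# If there is no blank line, fall back to reading every line.
if (is.na(first_blank)) first_blank <- length(lines) + 1

# Parse only the lines before the break; check.names = FALSE keeps "Mkt-RF" as is.
monthly <- read.csv(text = lines[seq_len(first_blank - 1)], check.names = FALSE)
```

With a file on disk, `lines` could come from `readLines()` on the downloaded path instead of the inline vector.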

time to event for panel data

Question: I have a panel data set of country-years. I would like to calculate the time since an event, as well as a running total of events per country that I can decay over time. I am using the timeSinceEvent function in the doBy package, which returns a data frame with the values I want, but I am having trouble applying it to my main df.
    structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

change the hierarchy of a list in R

Question: I have a list like this:
    myList <- lapply(unique(diamonds$cut), function(x){
      lst <- lapply(unique(diamonds$color), function(y){
        dta <- diamonds[diamonds$cut == x & diamonds$color == y, ]
        lm(price ~ carat, data = dta)
      })
      names(lst) <- unique(diamonds$color)
      return(lst)
    })
    names(myList) <- unique(diamonds$cut)
The structure is:
    > str(myList, max.level=2)
    List of 5
     $ Ideal :List of 7
      ..$ E:List of 12
      .. ..- attr(*, "class")= chr "lm"
      ..$ I:List of 12
      .. ..- attr(*, "class")= chr "lm"
      ..$ J:List of

R: Create barplot from aggregated data.frame

Question: I have a result of "aggregate" like this:
      week year Severity
    1   10 2013       26
    2   11 2013        5
    3   16 2013       26
I would like to draw a barplot with at most 52 bars (one for every week), each stacked by year with segments of "Severity" height. I see from the "barplot" documentation that I need a matrix for that. Of course I could use for/while loops or something similar to get what I need, but I wonder whether there is a more "R-ish" way to solve this (seemingly pretty typical) task. So, in more technical terms, I
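As a rough sketch of a loop-free route to that matrix, `xtabs()` can cross-tabulate the summed Severity by year and week. The 2014 row below is a made-up addition so that the stacking has a second year to show; the `agg` and `m` names are also just placeholders.

```r
# The three rows from the question, plus one hypothetical 2014 row (an assumption).
agg <- data.frame(
  week     = c(10, 11, 16, 10),
  year     = c(2013, 2013, 2013, 2014),
  Severity = c(26, 5, 26, 7)
)

# Declare all 52 weeks as factor levels so that weeks without data still get a column.
agg$week <- factor(agg$week, levels = 1:52)

# Rows are years, columns are weeks, cells hold the summed Severity (0 where absent).
m <- xtabs(Severity ~ year + week, data = agg)
```

Passing `m` to `barplot(m, legend.text = TRUE)` then draws one bar per week column, with one stacked segment per year row.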

Model Prediction for pooled regression model in panel data

Question: I'm trying to build a predictive model in which I run a separate pooled regression for each year (based on previous years), so that the coefficients can vary over time. (This might not make sense for the sample data provided, but it is done in practice for my sample.) Here is what I have come up with so far; I adapted my code to a reproducible sample from the plm package. The data is a panel indexed by firm and year:
    > head(Grunfeld)
      firm year inv value capital
    1 1

Delete rows containing specific words with additional conditions in R

Question: I would like to get rid of rows where the word "plan" is included in keyword, unless "advertising" or "marketing" is also included. Specifically, in the sample dataset, the rows whose keyword contains "hr plan" or "operation plan" should be deleted.
    keyword <- c("advertising plan", "advertising budget", "marketing plan", "marketing budget", "hr plan", "hr budget", "operation plan", "operation budget")
    indicator <- c(1,0,0,1,1,1,0,1)
    sample <- cbind(keyword,indicator)
Answer 1: Without using fancy
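The rule stated in the question can be sketched in base R with two `grepl()` masks. This is only one possible reading: it builds a data frame (named `df` here, an assumption) instead of the `cbind()` matrix, so that `indicator` stays numeric and the base function `sample` is not masked.

```r
keyword <- c("advertising plan", "advertising budget", "marketing plan",
             "marketing budget", "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# Assumption: a data frame rather than the question's cbind() matrix.
df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# TRUE where the keyword mentions "plan".
has_plan <- grepl("plan", df$keyword, fixed = TRUE)

# TRUE where the keyword mentions either exempting word.
is_exempt <- grepl("advertising|marketing", df$keyword)

# Drop rows that mention "plan" without an exempting word.
kept <- df[!(has_plan & !is_exempt), ]
```

Here `kept` retains six rows; only "hr plan" and "operation plan" are dropped.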

R: Convert values into pipe-delimited format

Question: I'm trying to create a REDCap data dictionary from an SPSS output. SPSS lists the allowed values, or factors, for each variable like this:
    SEX
    0 Male
    1 Female
    LANGUAGE
    1 English
    2 Spanish
    3 Other
    6 Unknown
How can I convert the above to this format for REDCap?
    Variable   Values
    SEX        0, Male | 1, Female
    LANGUAGE   1, English | 2, Spanish | 3, Other | 6, Unknown
The language I'm best with is R.
Answer 1: Here's one approach that relies on sub() and tidyr::fill(). It returns a dataset that you may want to
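For comparison, here is a base R sketch of the same conversion. It uses `sub()` together with `cumsum()` and `tapply()` rather than `tidyr::fill()`. It assumes that the listing is already available as one string per line (the `raw` vector below), that value lines start with a numeric code followed by a space, and that every variable has at least one value line.

```r
# The SPSS listing from the question, one element per line (assumed input shape).
raw <- c("SEX", "0 Male", "1 Female",
         "LANGUAGE", "1 English", "2 Spanish", "3 Other", "6 Unknown")

# Lines starting with digits and a space are value rows; the others are variable names.
is_value <- grepl("^[0-9]+ ", raw)

# A running count of variable-name lines assigns every row to its variable.
group <- cumsum(!is_value)

# Turn "0 Male" into "0, Male".
choice <- sub("^([0-9]+) +", "\\1, ", raw[is_value])

# Collapse the choices of each variable into one pipe-delimited string.
values <- tapply(choice, group[is_value], paste, collapse = " | ")

dict <- data.frame(
  Variable = raw[!is_value],
  Values   = as.vector(values),
  stringsAsFactors = FALSE
)
```

A variable with no value lines would misalign the two columns, so that case would need a separate check before building `dict`.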

Combining an ifelse statement with shift data.table function in R

Question: I am trying to work out how to combine an ifelse statement with the shift function in data.table. My data looks like this:
    DF <- structure(list(CHR = c(1, 1, 1, 1, 1,1), SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583","rs2292242"), BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059), KBdist= c(NA, 2215, 1135, 4366357, 1614613, 1590), locus = c(1, NA, NA, NA, NA, NA)), .Names = c("CHR","SNP","BP","KBdist","locus"), row.names = c(NA, 6L), class = "data

Mapping (x, y) coordinates to nearest point of a set in R

Question: I am building a Shiny application, and I have a line of code that is currently slowing me down quite a bit. I have the following data frame, with 1008 unique (x, y) coordinates (apologies for the large copy and paste, although I think sharing the whole data frame is helpful):
    dput(rounded_coords)
    structure(list(xspots = c(1, 2.5, 4, 5.5, 7, 8.5, 10, 11.5, 13, 14.5, 16, 17.5, 19, 20.5, 22, 23.5, 25, 26.5, 28, 29.5, 31, 32.5, 34, 35.5, 37, 38.5, 40, 41.5, 43, 44.5, 46, 47.5, 49, 1.75, 3.25, 4.75,

Pandas dataframe: count number of string value is in row for specific ID

Question: I have the following use case: I want to build a dataframe in which each row has a column showing how many interactions there have been for this ID (user) in the categories. The hardest part for me is that they can't be double counted, while a match in just one of the categories is enough to be counted as 1. For example, I have:
       richtingen              id
    0  Marketing, Sales        1110
    1  Marketing, Sales        1110
    2  Finance                 220
    3  Marketing, Engineering  1110
    4  IT                      3300
Now I want to create a third row