data-manipulation

Importing from CSV from a specified range of values

☆樱花仙子☆ submitted on 2019-12-11 16:20:01
Question: I am trying to read in a CSV file and I am running into the following error:
    Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1097 did not have 5 elements
On further inspection of the CSV file, I found that around line 1097 the rows break off and a new header starts for annualised data (I am only interested in the monthly data for now).
    temp <- tempfile()
    download.file("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_Factors_CSV
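One way to read only the first block of such a file is to stop at the break before parsing. The sketch below is a minimal illustration, not the asker's solution: it assumes the break is a blank line, and `csv_lines` is a hypothetical stand-in for the downloaded file (its column names and values are made up for the example).

```r
# Hypothetical stand-in for the file contents: a monthly block, a blank
# line, then a second header and an annual block (assumed layout).
csv_lines <- c(
  "date,a,b",
  "192607,2.96,-2.30",
  "192608,2.64,-1.40",
  "",
  "Annual Factors: January-December",
  "date,a,b",
  "1927,29.47,-2.46"
)

# Locate the first blank line; if there is none, keep every line.
first_break <- which(trimws(csv_lines) == "")[1]
if (is.na(first_break)) first_break <- length(csv_lines) + 1

# Parse only the lines before the break.
monthly <- read.csv(text = csv_lines[seq_len(first_break - 1)])
```

For a real file, `csv_lines` would come from `readLines()` on the unzipped CSV, and any descriptive lines before the header would need to be skipped as well.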

time to event for panel data

半世苍凉 submitted on 2019-12-11 14:04:48
Question: I have a panel data set of country-years. I would like to calculate the time since an event, and also get a running total of events per country that I can decay over time. I am using the timeSinceEvent function in the doBy package, which returns a data frame with the values I want, but I am having trouble applying it to my main df.
    structure(list(ccode.a = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,

change the hierarchy of a list in R

萝らか妹 submitted on 2019-12-11 12:47:31
Question: I have a list like this:
    myList <- lapply(unique(diamonds$cut), function(x){
      lst <- lapply(unique(diamonds$color), function(y){
        dta <- diamonds[diamonds$cut == x & diamonds$color == y, ]
        lm(price ~ carat, data = dta)
      })
      names(lst) <- unique(diamonds$color)
      return(lst)
    })
    names(myList) <- unique(diamonds$cut)
The structure is:
    > str(myList, max.level=2)
    List of 5
     $ Ideal :List of 7
      ..$ E:List of 12
      .. ..- attr(*, "class")= chr "lm"
      ..$ I:List of 12
      .. ..- attr(*, "class")= chr "lm"
      ..$ J:List of
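The excerpt is cut off before the desired structure is shown, so the following is only a guess at the goal: swapping the two levels of a nested list so that the inner names (colour) become the outer ones. It uses a small hypothetical list of strings in place of the `lm` fits to stay self-contained.

```r
# Hypothetical two-level list standing in for myList (cut -> colour).
nested <- list(
  Ideal   = list(E = "Ideal-E",   I = "Ideal-I"),
  Premium = list(E = "Premium-E", I = "Premium-I")
)

# Assumes every outer element carries the same inner names.
inner_names <- names(nested[[1]])

# For each inner name, pull that element out of every outer element.
inverted <- lapply(setNames(inner_names, inner_names), function(inner) {
  lapply(nested, `[[`, inner)
})
```

After this, `inverted$E$Premium` holds what `nested$Premium$E` held before.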

R: Create barplot from aggregated data.frame

五迷三道 submitted on 2019-12-11 10:50:33
Question: I have a result of "aggregate" like this:
      week year Severity
    1   10 2013       26
    2   11 2013        5
    3   16 2013       26
I would like to draw a barplot with at most 52 bars (one for every week), where each bar stacks the "Severity" values for every year. I see from the "barplot" documentation that I need a matrix for that. Of course I could use for/while loops or something similar to get what I need, but I wonder whether there is a more "R-ish" way to solve this (seemingly pretty typical) task. So, in more technical terms, I
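One loop-free way to get the matrix that `barplot` expects is a cross-tabulation. The sketch below assumes the aggregated data frame is called `agg` (a name chosen here for illustration) and uses `xtabs()` to build a year-by-week matrix, with the week levels fixed to 1:52 so that empty weeks still get a bar position.

```r
# The aggregated result from the question, rebuilt as a data frame.
agg <- data.frame(
  week     = c(10, 11, 16),
  year     = c(2013, 2013, 2013),
  Severity = c(26, 5, 26)
)

# Fix the week levels so that all 52 weeks get a column, even if empty.
agg$week <- factor(agg$week, levels = 1:52)

# Rows are years, columns are weeks, cells are summed Severity.
severity_matrix <- xtabs(Severity ~ year + week, data = agg)

# Each column becomes one bar; rows (years) are stacked within it.
barplot(severity_matrix, xlab = "week", ylab = "Severity",
        legend.text = rownames(severity_matrix))
```

With more than one year in the data, each year becomes its own row and therefore its own segment in the stacked bars.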

Model Prediction for pooled regression model in panel data

旧巷老猫 submitted on 2019-12-11 09:07:59
Question: I'm trying to build a predictive model in which I run multiple pooled regressions, one per year (each based on the previous years), and thus allow the coefficients to vary over time. (This might not make sense for the sample data provided, but it is done in practice for my sample.) Here is what I have come up with so far. I adapted my code to a reproducible sample from the plm package. The data is structured as a panel, indexed by firm and year.
    > head(Grunfeld)
      firm year inv value capital
    1 1

Delete rows containing specific words with additional conditions in R

不羁岁月 submitted on 2019-12-11 07:03:55
Question: I would like to get rid of rows where the word "plan" is included in keyword, unless "advertising" or "marketing" is also included. Specifically, in the sample dataset, the rows whose keyword contains "hr plan" or "operation plan" should be deleted.
    keyword <- c("advertising plan", "advertising budget", "marketing plan", "marketing budget", "hr plan", "hr budget", "operation plan", "operation budget")
    indicator <- c(1,0,0,1,1,1,0,1)
    sample <- cbind(keyword,indicator)
Answer 1: Without using fancy
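The rule stated in the question can be sketched in base R with two `grepl()` masks. This is an illustration rather than the truncated answer's code; `sample_df`, `has_plan`, and `is_exempt` are names introduced here, and the data is built with `data.frame()` instead of `cbind()` so that `indicator` stays numeric.

```r
# Sample data from the question, as a data frame.
keyword <- c("advertising plan", "advertising budget", "marketing plan",
             "marketing budget", "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# TRUE where the keyword mentions "plan".
has_plan <- grepl("plan", sample_df$keyword, fixed = TRUE)
# TRUE where the keyword mentions one of the exempt words.
is_exempt <- grepl("advertising|marketing", sample_df$keyword)

# Drop a row only when it mentions "plan" and is not exempt.
filtered <- sample_df[!(has_plan & !is_exempt), ]
```

Note that `grepl("plan", ...)` also matches longer words such as "planning"; a pattern like `"\\bplan\\b"` (without `fixed = TRUE`) would restrict it to the whole word.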

R: Convert values into pipe-delimited format

懵懂的女人 submitted on 2019-12-11 06:49:56
Question: I'm trying to create a REDCap data dictionary from an SPSS output. SPSS lists the allowed values, or factors, for each variable like this:
    SEX      0 Male
             1 Female
    LANGUAGE 1 English
             2 Spanish
             3 Other
             6 Unknown
How can I convert the above to this format for REDCap?
    Variable  Values
    SEX       0, Male | 1, Female
    LANGUAGE  1, English | 2, Spanish | 3, Other | 6, Unknown
The language I'm best with is R.
Answer 1: Here's one approach that relies on sub() and tidyr::fill(). It returns a dataset that you may want to
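The conversion can also be sketched in base R alone. Note that this is a different technique from the `sub()` and `tidyr::fill()` approach the answer mentions, whose code is not shown in the excerpt. The input layout is an assumption: one value per line, with the variable name present only on the first line of each group. `spss_lines` and the other names are introduced here for illustration.

```r
# Assumed input layout: the variable name appears only on the first line
# of each group; continuation lines start with the numeric code.
spss_lines <- c("SEX 0 Male", " 1 Female",
                "LANGUAGE 1 English", " 2 Spanish", " 3 Other", " 6 Unknown")

# Split every line into whitespace-separated tokens.
tokens <- strsplit(trimws(spss_lines), "\\s+")

# A line starts a new variable when its first token is not a number.
has_name <- !grepl("^[0-9]+$", vapply(tokens, `[`, character(1), 1))
group <- cumsum(has_name)
var_names <- vapply(tokens[has_name], `[`, character(1), 1)

# Drop the name token where present, then format "code, label".
pairs <- mapply(function(tok, named) {
  if (named) tok <- tok[-1]
  paste0(tok[1], ", ", paste(tok[-1], collapse = " "))
}, tokens, has_name)

# Collapse the pairs of each variable into one pipe-delimited string.
dictionary <- data.frame(
  Variable = var_names,
  Values = as.vector(tapply(pairs, group, paste, collapse = " | ")),
  stringsAsFactors = FALSE
)
```

Labels containing spaces survive because everything after the code is pasted back together. Variable names that consist only of digits would be misread as codes under this rule.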

Combining an ifelse statement with shift data.table function in R

我与影子孤独终老i submitted on 2019-12-11 06:39:13
Question: I am trying to work out how to combine an ifelse statement with the shift function in data.table. My data looks like this:
    DF <- structure(list(CHR = c(1, 1, 1, 1, 1, 1),
                         SNP = c("rs2494631", "rs4648637", "rs2494627", "rs11122119", "rs1844583", "rs2292242"),
                         BP = c(2399149, 2401364, 2402499, 6768856, 8383469, 8385059),
                         KBdist = c(NA, 2215, 1135, 4366357, 1614613, 1590),
                         locus = c(1, NA, NA, NA, NA, NA)),
                    .Names = c("CHR", "SNP", "BP", "KBdist", "locus"),
                    row.names = c(NA, 6L), class = "data

Mapping (x, y) coordinates to nearest point of a set in R

风流意气都作罢 submitted on 2019-12-11 05:57:16
Question: I am building a shiny application, and I have a line of code that is currently slowing me down quite a bit. I have the following dataframe, with 1008 unique (x,y) coordinates (apologies for the large copy and paste, although I think sharing the whole dataframe is helpful):
    dput(rounded_coords)
    structure(list(xspots = c(1, 2.5, 4, 5.5, 7, 8.5, 10, 11.5, 13, 14.5, 16, 17.5, 19, 20.5, 22, 23.5, 25, 26.5, 28, 29.5, 31, 32.5, 34, 35.5, 37, 38.5, 40, 41.5, 43, 44.5, 46, 47.5, 49, 1.75, 3.25, 4.75,
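The slow line itself is not visible in the excerpt, so the following is only a general sketch of a vectorised nearest-point lookup, which tends to be much faster than looping over points. It uses a tiny hypothetical grid: `yspots`, `queries`, and the coordinate values are assumptions made for the example, and only `xspots` appears in the original.

```r
# Hypothetical reference set; the real one has 1008 rows.
spots <- data.frame(xspots = c(1, 2.5, 4), yspots = c(1, 1, 1))

# Hypothetical points that should be snapped to the nearest spot.
queries <- data.frame(x = c(1.2, 3.9), y = c(0.8, 1.4))

# Pairwise coordinate differences: one row per query, one column per spot.
dx <- outer(queries$x, spots$xspots, "-")
dy <- outer(queries$y, spots$yspots, "-")

# Row-wise argmin of the squared distance, via max.col on its negation.
nearest_idx <- max.col(-(dx^2 + dy^2), ties.method = "first")
nearest <- spots[nearest_idx, ]
```

The distance matrix has one cell per query/spot pair, so this trades memory for speed; for very large inputs a spatial index would scale better.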

Pandas dataframe: count number of string value is in row for specific ID

徘徊边缘 submitted on 2019-12-11 05:48:16
Question: I have the following use case: I want to make a dataframe in which each row has a column showing how many interactions there have been for this ID (user) in the categories. The hardest part for me is that they must not be double counted, while a match in just one of the categories is enough to count as 1. So for example I have:
       richtingen              id
    0  Marketing, Sales        1110
    1  Marketing, Sales        1110
    2  Finance                 220
    3  Marketing, Engineering  1110
    4  IT                      3300
Now I want to create a third row