data-manipulation

Finding maximum value of one column (by group) and inserting value into another data frame in R

痞子三分冷 submitted on 2020-01-09 08:13:41
Question: All, I was hoping someone could find a solution to an issue of mine that isn't necessarily causing headaches but that, as of right now, invites human error in creating a data set for a project I'm working on. The data set I'm using right now is a directed dyad-year (A vs. B, B vs. A) data set for select pairs of countries for every year between 1950 and 2010. Some countries, like A in my example, will be paired with every country in the world and every country will be

How to extract certain under specific condition in pandas? (Sentimental analysis)

自古美人都是妖i submitted on 2020-01-07 08:25:18
Question: The picture shows what my dataframe looks like. I have user_name, movie_name and time columns. I want to extract only the rows that fall on the first day of each movie. For example, if movie a's first date in the time column is 2018-06-27, I want all the rows with that date, and if movie b's first date in the time column is 2018-06-12, I want only those rows. How would I do that with pandas?
Answer 1: I assume that the time column is of datetime type. If not, convert it by calling pd.to_datetime. Then run: df
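
The answer above is cut off, so the following is only a minimal sketch of one way to keep each movie's first-day rows, not the original answer's code. The column names come from the question; the sample rows are hypothetical, and the sketch assumes that "first day" means the earliest calendar date seen for each movie.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_name": ["u1", "u2", "u3", "u4", "u5"],
    "movie_name": ["a", "a", "a", "b", "b"],
    "time": ["2018-06-27 10:00", "2018-06-27 21:30", "2018-06-28 09:00",
             "2018-06-12 08:00", "2018-06-13 12:00"],
})

# Make sure the column is datetime, as the answer suggests.
df["time"] = pd.to_datetime(df["time"])

# Drop the time-of-day part so rows are compared by calendar date.
day = df["time"].dt.normalize()

# Earliest date per movie, broadcast back to every row of that movie.
first_day = day.groupby(df["movie_name"]).transform("min")

# Keep only the rows whose date equals their movie's first date.
first_day_rows = df[day == first_day]
print(first_day_rows)
```

Using transform keeps the result aligned with the original index, so the comparison works as a plain boolean mask without a merge.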

How to eliminate suspicious barcode (like 123456) data [closed]

有些话、适合烂在心里 submitted on 2020-01-06 06:58:29
Question: Here's some barcode data from a pandas DataFrame:

    737318  Sikat Botol Pigeon        4902508045506  75170
    737379  Natur Manual Breast Pump  8850851860016  75170
    738753  Sunlight                  1232131321313  75261
    739287  Bodymist bodyshop         1122334455667  75296
    739677  Bodymist ale              1234567890123  75367

I want to
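
The question is cut off before it says what counts as "suspicious", so the sketch below swaps in one possible heuristic rather than the asker's intended rule: checking the standard EAN-13 check digit. It assumes the barcodes are stored as 13-character strings; the column names `name` and `barcode` and the helper `ean13_check_ok` are hypothetical.

```python
import pandas as pd


def ean13_check_ok(code: str) -> bool:
    """Return True if code is 13 digits and its last digit matches the EAN-13 checksum."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(c) for c in code]
    # Weights alternate 1, 3, 1, 3, ... across the first 12 digits.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


# The five rows from the question; the column names are assumptions.
products = pd.DataFrame({
    "name": ["Sikat Botol Pigeon", "Natur Manual Breast Pump", "Sunlight",
             "Bodymist bodyshop", "Bodymist ale"],
    "barcode": ["4902508045506", "8850851860016", "1232131321313",
                "1122334455667", "1234567890123"],
})

# Flag each row, then keep only those that pass the checksum.
products["checksum_ok"] = products["barcode"].map(ean13_check_ok)
clean = products[products["checksum_ok"]]
print(products)
```

On these rows the checksum rejects 1122334455667 and 1234567890123, but 1232131321313 happens to satisfy it, so a checksum test alone would not catch every made-up code and would likely need to be combined with other rules.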

Transform unique values of a column into multiple columns containing their corresponding values in another column

元气小坏坏 submitted on 2020-01-05 06:59:12
Question: Here is a sample dataframe similar to the one I'm working with:

    set.seed(74.3)
    df<-data.frame(ID=sample(c(1000:1004),size=20,replace=T), Fruit=sample(c("Apple","Banana","Pear","Orange","Plum"),size=20,replace=T))
    library(dplyr)
    df <- df %>% group_by(ID,Fruit) %>% summarise(n=n())

          ID  Fruit     n
       (int) (fctr) (int)
    1   1000 Banana     1
    2   1000 Orange     3
    3   1000   Pear     1
    4   1001 Banana     1
    5   1001   Plum     2
    6   1002 Banana     1
    7   1003 Banana     1
    8   1003 Orange     2
    9   1003   Pear     1
    10  1003   Plum     1
    11  1004  Apple     2
    12  1004 Banana     2

python flatten an array of array

随声附和 submitted on 2020-01-05 04:39:07
Question: I have an array of arrays, something like this:

    array([[array([33120, 28985, 9327, 45918, 30035, 17794, 40141, 1819, 43668], dtype=int64)],
           [array([33754, 24838, 17704, 21903, 17668, 46667, 17461, 32665], dtype=int64)],
           [array([46842, 26434, 39758, 27761, 10054, 21351, 22598, 34862, 40285, 17616, 25146, 32645, 41276], dtype=int64)],
           ...,
           [array([24534, 8230, 14267, 9352, 3543, 29397, 900, 32398, 34262, 37646, 11930, 37173], dtype=int64)],
           [array([25157], dtype=int64)],
           [array([ 8859, 20850,
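
The question is cut off before it states the desired output, so this is a minimal sketch under the assumption that "flatten" means joining all inner arrays into one 1-D array. It rebuilds a small object array of the same shape as the one shown (one inner array per row), using a few values from the question; the names `nested` and `flat` are hypothetical.

```python
import numpy as np

# Rebuild a small (n, 1) object array whose cells hold 1-D int64 arrays of
# different lengths, mirroring the structure shown in the question.
nested = np.empty((3, 1), dtype=object)
nested[0, 0] = np.array([33120, 28985, 9327], dtype=np.int64)
nested[1, 0] = np.array([25157], dtype=np.int64)
nested[2, 0] = np.array([8859, 20850], dtype=np.int64)

# ravel() removes the outer (n, 1) nesting; concatenate then joins the
# variable-length inner arrays end to end.
flat = np.concatenate(list(nested.ravel()))
print(flat)
```

A plain `nested.flatten()` would not be enough here, because it only flattens the outer object array and leaves each cell as a separate inner array.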

R - Composite functions vs piped functions in `purrr::map()`

三世轮回 submitted on 2020-01-05 03:37:48
Question: I have the following list:

    my_list = list(alpha = list('hi'), beta = list(1:4, 'how'), gamma = list(TRUE, FALSE, 'are'))
    str(my_list)
    List of 3
     $ alpha:List of 1
      ..$ : chr "hi"
     $ beta :List of 2
      ..$ : int [1:4] 1 2 3 4
      ..$ : chr "how"
     $ gamma:List of 3
      ..$ : logi TRUE
      ..$ : logi FALSE
      ..$ : chr "are"

I would like to figure out which data types are contained within each level 1 element. To do this, I can use the following pipeline:

    piped = map(my_list, ~map(., class) %>% unique %>% unlist)
    str

R - Calculate difference (similarity measure) between similar datasets

时光怂恿深爱的人放手 submitted on 2020-01-02 11:03:50
Question: I have seen many questions that touch on this topic but haven't yet found an answer. If I have missed a question that does answer this one, please mark this and point us to it.
Scenario: We have a benchmark dataset and two imputation methods. We systematically delete values from the benchmark and apply each imputation method, so we end up with a benchmark, imputedData1 and imputedData2.
Question: Is there a function that can produce a number that represents the

Writing a generic function for “find and replace” in R

放肆的年华 submitted on 2020-01-01 07:37:09
Question: I need to write a generic function for "find and replace" in R. How can I write a function that takes the following inputs:

- A CSV file (or data frame)
- A string to find, for example "name@email.com"
- A string to replace the found string with, for example "medium"

and rewrites the CSV file/data frame so that all the found strings are replaced with the replacement string?
Answer 1: Here's a quick function to do the job:

    library(stringr)
    replace_all <- function(df, pattern, replacement) {
      char <-

Reading csv from pandas having both quotechar and delimiter for a column value

倾然丶 夕夏残阳落幕 submitted on 2019-12-31 03:08:08
Question: Here is the content of a CSV file, 'test.csv', that I am trying to read with pandas read_csv():

    "col1", "col2", "col3", "col4"
    "v1", "v2", "v3", "v4"
    "v21", "v22", "v23", "this, "creating, what to do? " problems"

This is the command I am using:

    messages = pd.read_csv('test.csv', sep=',', skipinitialspace=True)

But I am getting the following error:

    CParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

I want the content for column4 in line3 to be 'this, "creating, what
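
The question is cut off before the full desired value, so the following is only a sketch of one possible workaround, not a fix inside read_csv itself. It assumes that every field is wrapped in double quotes, that fields are separated by exactly `", "`, and that the whole remainder of line 3 should land in col4. The helper `split_row` is hypothetical, and the file content is inlined as a string so the example runs without 'test.csv'.

```python
import io

import pandas as pd

# File content as shown in the question, inlined for a self-contained example.
raw = '''"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"
'''


def split_row(line: str) -> list:
    """Strip the outermost quotes, then split on the quote-comma-space-quote separator."""
    return line.strip()[1:-1].split('", "')


# Parse every non-empty line by hand instead of relying on the CSV tokenizer.
rows = [split_row(line) for line in io.StringIO(raw) if line.strip()]

# First parsed row is the header; the rest are data rows.
messages = pd.DataFrame(rows[1:], columns=rows[0])
print(messages)
```

Splitting on the full `", "` sequence works here because the stray quotes inside the last field are never followed by a comma, a space and another quote; data where that sequence can occur inside a field would need a different approach, ideally fixing the escaping when the file is written.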