data-munging

openxlsx::write.xlsx overwriting existing worksheet instead of appending

感情迁移 submitted on 2021-01-29 11:22:37
Question: The openxlsx::write.xlsx function is overwriting the spreadsheet instead of adding another tab. I tried to follow some guidance from Stack Overflow, but without success.

    dt.escrita <- format(Sys.time(), '%Y%m%d%H%M%S')
    write.xlsx(
      tbl.messages
      ,file = paste('.\\2_Datasets\\messages_',dt.escrita,'.xlsx')
      ,sheetName = format(Sys.time(), '%d-%m-%y')
      ,append = FALSE)
    write.xlsx(
      tbl.dic.dados
      ,file = paste('.\\2_Datasets\\messages_',dt.escrita,'.xlsx')
      ,sheetName = 'Dicionario_Dados'
      ,append = TRUE)

wide to long data table transformation with variables in columns and rows

元气小坏坏 submitted on 2020-06-28 06:50:51
Question: I have a csv with multiple tables, with variables stored in both rows and columns. About this csv:

- I want to go from "wide" to "long"
- There are multiple "data frames" in one csv
- There are different types of variables for each "data frame"

    > df3
       V1          V2    V3    V4     V5     V6    V7    V8
    1 nyc 123 main st month     1      2      3     4     5
    2 nyc 123 main st     x 58568 567567 567909 35876 56943
    3 nyc 123 main st     y  5345   3673   3453  3467   788
    4 nyc 123 main st     z 53223 563894 564456 32409 56155
    5
    6  la  63 main st month     1      2      3     4     5
    7  la  63 main st
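A minimal sketch of one way to reshape such a file, assuming every block starts with a `month` row that supplies the period labels and that a blank row separates blocks. Plain Python lists stand in for the parsed csv, only the complete `nyc` rows shown above are included, and the function name `blocks_to_long` is hypothetical.

```python
# Rows as they might come out of csv.reader. Only the complete nyc block
# from the question is used; an empty list stands for a blank separator row.
rows = [
    ["nyc", "123 main st", "month", "1", "2", "3", "4", "5"],
    ["nyc", "123 main st", "x", "58568", "567567", "567909", "35876", "56943"],
    ["nyc", "123 main st", "y", "5345", "3673", "3453", "3467", "788"],
    ["nyc", "123 main st", "z", "53223", "563894", "564456", "32409", "56155"],
    [],
]


def blocks_to_long(rows):
    """Return (city, address, month, variable, value) records, one per cell."""
    records = []
    months = None
    for row in rows:
        if not row or not any(cell.strip() for cell in row):
            months = None  # blank row: the current block has ended
            continue
        city, address, variable, *values = row
        if variable == "month":
            months = [int(v) for v in values]  # header row of a new block
            continue
        if months is None:
            raise ValueError("variable row found before a 'month' header row")
        for month, value in zip(months, values):
            records.append((city, address, month, variable, int(value)))
    return records


long_records = blocks_to_long(rows)
for record in long_records[:3]:
    print(record)
```

Resetting `months` at every blank row means a block without its own `month` header raises an error instead of silently reusing the labels of the previous block.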

melt column by substring of the columns name in pandas (python)

 ̄綄美尐妖づ submitted on 2020-01-24 12:52:47
Question: I have a dataframe:

    subject  A_target_word_gd  A_target_word_fd  B_target_word_gd  B_target_word_fd  subject_type
    1        1                 2                 3                 4                 mild
    2        11                12                13                14                moderate

And I want to melt it into a dataframe that looks like this:

    cond  subject  subject_type  value_type  value
    A     1        mild          gd          1
    A     1        mild          fd          2
    B     1        mild          gd          3
    B     1        mild          fd          4
    A     2        moderate      gd          11
    A     2        moderate      fd          12
    B     2        moderate      gd          13
    B     2        moderate      fd          14
    ...

That is, I want to melt based on the delimiter in the column names. What is the best way to do that?

Answer 1: One more approach
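One possible approach, sketched under assumptions and not necessarily the one the truncated answer goes on to give: melt on the two identifier columns, then split each former column name on `_` and keep the first piece as `cond` and the last as `value_type`. The intermediate column name `column` and the variable `long_df` are illustrative choices, and the value types are taken from the column suffixes (`gd`, `fd`).

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "subject": [1, 2],
    "A_target_word_gd": [1, 11],
    "A_target_word_fd": [2, 12],
    "B_target_word_gd": [3, 13],
    "B_target_word_fd": [4, 14],
    "subject_type": ["mild", "moderate"],
})

# Wide to long: one row per (subject, original column) pair.
long_df = df.melt(
    id_vars=["subject", "subject_type"],
    var_name="column",
    value_name="value",
)

# Split names such as "A_target_word_gd" on the delimiter.
parts = long_df["column"].str.split("_")
long_df["cond"] = parts.str[0]         # "A" or "B"
long_df["value_type"] = parts.str[-1]  # "gd" or "fd"

# Drop the helper column and order the columns as in the desired output.
long_df = long_df.drop(columns="column")[
    ["cond", "subject", "subject_type", "value_type", "value"]
]
print(long_df)
```

Splitting after the melt keeps the reshaping and the name parsing as two separate steps, which makes it easy to swap in a regular expression if the column names follow a less regular pattern.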

Retaining the previous date in R

丶灬走出姿态 submitted on 2020-01-04 07:18:54
Question: I got stuck on a fairly easy data munging task. I have a transactional data frame in R that resembles this one:

    id<-c(11,11,22,22,22)
    dates<-as.Date(c('2013-11-15','2013-11-16','2013-11-15','2013-11-16','2013-11-17'), "%Y-%m-%d")
    example<-data.frame(id=id,dates=dates)

      id      dates
    1 11 2013-11-15
    2 11 2013-11-16
    3 22 2013-11-15
    4 22 2013-11-16
    5 22 2013-11-17

I'm looking for a way to retain the date of the previous transaction. The resulting table would look like this:

    previous_dates<-as.Date(c('

Expanding pandas Data Frame rows based on number and group ID (Python 3).

狂风中的少年 submitted on 2019-12-31 03:24:10
Question: I have been struggling to find a way to expand/clone observation rows based on a pre-determined number and a grouping variable (id). For context, here is an example data frame using pandas and numpy (python3).

    df = pd.DataFrame([[1, 15], [2, 20]], columns = ['id', 'num'])
    df
    Out[54]:
       id  num
    0   1   15
    1   2   20

I want to expand/clone the rows by the number given in the "num" variable based on their ID group. In this case, I would want 15 rows for id = 1 and 20 rows for id = 2. This is probably
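A minimal sketch of one way to do this, assuming each `id` appears in exactly one row as in the example: repeat every index label `num` times with `Index.repeat` and select those labels with `.loc`. The name `expanded` is an illustrative choice.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame([[1, 15], [2, 20]], columns=["id", "num"])

# Repeat each index label as many times as its "num" value says,
# then pull the matching rows and renumber them from zero.
expanded = df.loc[df.index.repeat(df["num"])].reset_index(drop=True)

# Count the rows per id to check the expansion.
print(expanded["id"].value_counts().sort_index())
```

If the same `id` could span several rows with different counts, the same call still works, because the repetition is driven row by row and not by group.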

How to efficiently rearrange pandas data as follows?

ε祈祈猫儿з submitted on 2019-12-22 08:30:04
Question: I need some help with a concise and, above all, efficient formulation in pandas of the following operation. Given a data frame of the format

    id   a   b  c  d
    1    0  -1  1  1
    42   0   1  0  0
    128  1  -1  0  1

construct a data frame of the format:

    id   one_entries
    1    "c d"
    42   "b"
    128  "a d"

That is, the column "one_entries" contains the concatenated names of the columns for which the entry in the original frame is 1.

Answer 1: Here's one way, using a boolean rule and applying a lambda function.

    In [58]: df
    Out[58]:
       id  a  b  c  d
    0   1  0
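A minimal sketch of the operation, which is an alternative to the lambda-based answer and not a reconstruction of it: build a boolean mask of the cells equal to 1, then join the matching column names row by row. The names `value_cols`, `mask`, and `result` are illustrative choices.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "id": [1, 42, 128],
    "a": [0, 0, 1],
    "b": [-1, 1, -1],
    "c": [1, 0, 0],
    "d": [1, 0, 1],
})

value_cols = ["a", "b", "c", "d"]

# True where the entry equals 1, as a plain NumPy array for fast row iteration.
mask = df[value_cols].eq(1).to_numpy()

# For every row, keep the names of the columns flagged in the mask.
one_entries = [
    " ".join(name for name, hit in zip(value_cols, row) if hit)
    for row in mask
]

result = pd.DataFrame({"id": df["id"], "one_entries": one_entries})
print(result)
```

Iterating over the NumPy array avoids the per-row overhead of `DataFrame.apply` with `axis=1`, which tends to matter on large frames.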

How to convert a python datetime.datetime to excel serial date number

感情迁移 submitted on 2019-12-17 10:42:31
Question: I need to convert dates into Excel serial numbers for a data munging script I am writing. By playing with dates in my OpenOffice Calc workbook, I was able to deduce that '1-Jan 1899 00:00:00' maps to the number zero. I wrote the following function to convert from a python datetime object into an Excel serial number:

    def excel_date(date1):
        temp=dt.datetime.strptime('18990101', '%Y%m%d')
        delta=date1-temp
        total_seconds = delta.days * 86400 + delta.seconds
        return total_seconds

However, when I try
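A minimal sketch of a corrected conversion, under two assumptions that differ from the function above: serial numbers count days (with the time of day as a fraction) and not seconds, and day zero is 1899-12-30. That epoch matches Excel's default 1900 date system for dates from 1 March 1900 onward; earlier dates are off by one in Excel because it treats 1900 as a leap year. The constant name `EXCEL_EPOCH` is an illustrative choice.

```python
import datetime as dt

# Assumed day zero of the serial-number scale (see the note above).
EXCEL_EPOCH = dt.datetime(1899, 12, 30)


def excel_date(date1):
    """Return the serial number for a datetime as days since EXCEL_EPOCH."""
    delta = date1 - EXCEL_EPOCH
    # Whole days plus the elapsed part of the current day as a fraction.
    return delta.days + delta.seconds / 86400.0


print(excel_date(dt.datetime(2000, 1, 1)))         # 36526.0
print(excel_date(dt.datetime(2000, 1, 1, 12, 0)))  # 36526.5
```

Microseconds are ignored here, as in the original function; adding `delta.microseconds / 86400e6` would include them.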