dataframe

Backfilling columns by groups in Pandas

不打扰是莪最后的温柔 submitted on 2021-02-16 20:06:38

Question: I have a CSV like

    A,B,C,D
    1,2,,
    1,2,30,100
    1,2,40,100
    4,5,,
    4,5,60,200
    4,5,70,200
    8,9,,

In rows 1 and 4 the C value is missing (NaN). I want to take their values from rows 2 and 5 respectively (the first occurrence of the same A,B values). If no matching row is found, just put 0 (as in the last line). Expected output:

    A,B,C,D
    1,2,30,
    1,2,30,100
    1,2,40,100
    4,5,60,
    4,5,60,200
    4,5,70,200
    8,9,0,

Using fillna I found bfill ("use NEXT valid observation to fill gap"), but here the NEXT observation has to be taken logically
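A minimal sketch of one way to do this, assuming the CSV above has been loaded into a DataFrame (variable names are my own): backfill C within each (A, B) group, then fall back to 0 where the group never has a valid value.

    import numpy as np
    import pandas as pd

    # Reconstruction of the sample data from the question.
    df = pd.DataFrame({
        "A": [1, 1, 1, 4, 4, 4, 8],
        "B": [2, 2, 2, 5, 5, 5, 9],
        "C": [np.nan, 30, 40, np.nan, 60, 70, np.nan],
        "D": [np.nan, 100, 100, np.nan, 200, 200, np.nan],
    })

    # Backfill C within each (A, B) group, then use 0 where the group
    # has no valid C value at all (the 8,9 row).
    df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0)
    print(df)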

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

拜拜、爱过 submitted on 2021-02-16 20:06:35

Question: I have a data frame df.sample built like this:

    id <- c("A","A","A","A","A","A","A","A","A","A","A")
    date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
              "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
              "2018-11-12")
    hour <- c(8,8,9,9,13,13,16,6,7,19,7)
    min <- c(47,59,6,18,22,36,12,32,12,21,47)
    value <- c(70,70,86,86,86,74,81,77,79,83,91)
    df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F)
    df.sample$date <- as.Date(df.sample$date,format="%Y-%m-
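The excerpt cuts off before df2 or the matching conditions are shown, so the following is only a guess at the intent, sketched in pandas rather than R: combine date, hour and min into a single timestamp, then keep the rows whose timestamp falls inside a datetime interval of a second frame. df1 is a partial reconstruction of df.sample; df2 and its columns are entirely invented for illustration.

    import pandas as pd

    # Partial reconstruction of df.sample from the question.
    df1 = pd.DataFrame({
        "id": ["A", "A", "A", "A"],
        "date": ["2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14"],
        "hour": [8, 9, 13, 6],
        "min": [47, 6, 22, 32],
        "value": [70, 86, 86, 77],
    })

    # Combine date, hour and min into one timestamp so it can be compared
    # against datetime columns of another frame.
    df1["ts"] = (pd.to_datetime(df1["date"])
                 + pd.to_timedelta(df1["hour"], unit="h")
                 + pd.to_timedelta(df1["min"], unit="m"))

    # Invented second frame with datetime intervals.
    df2 = pd.DataFrame({
        "start": pd.to_datetime(["2018-11-12 08:00", "2018-11-12 13:00"]),
        "end":   pd.to_datetime(["2018-11-12 10:00", "2018-11-12 14:00"]),
        "label": ["morning", "afternoon"],
    })

    # Range join via a cross merge plus a filter on the interval condition.
    merged = df1.merge(df2, how="cross")
    merged = merged[(merged["ts"] >= merged["start"]) & (merged["ts"] <= merged["end"])]
    print(merged)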

Create separate vectors for each of a data frame's columns (variables)

馋奶兔 submitted on 2021-02-16 20:05:25

Question: Goal: take a data frame and create a separate vector for each of its columns (variables). The following code gets me close:

    batting <- read.csv("mlb_2014.csv", header = TRUE, sep = ",")
    hr <- batting[(batting$HR >= 20 & batting$PA >= 100), ]
    var_names <- colnames(hr)
    for(i in var_names) {
      path <- paste("hr$", i, sep = "")
      assign(i, as.vector(path))
    }

It creates a vector for each column in the data frame, as shown by the output below:

    > ls()
    [1] "AB"      "Age"     "BA"      "batting" "BB"      "CS"
    [7] "G"
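A rough pandas analogue of the same goal (one top-level variable per column); the frame below is a made-up stand-in, since mlb_2014.csv is not available here, and a plain dict is usually the better design than injecting names into the namespace.

    import pandas as pd

    # Stand-in for the filtered 'hr' data frame.
    hr = pd.DataFrame({"HR": [25, 31, 22], "PA": [450, 502, 391], "Age": [27, 30, 24]})

    # Build one array per column, keyed by column name...
    columns = {name: hr[name].to_numpy() for name in hr.columns}

    # ...and, if separate top-level variables are really wanted, inject them
    # into the module namespace (the rough equivalent of R's assign()).
    globals().update(columns)
    print(HR, PA, Age)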

pyspark 'DataFrame' object has no attribute '_get_object_id'

百般思念 submitted on 2021-02-16 18:57:44

Question: I am trying to run some code, but I get the error: 'DataFrame' object has no attribute '_get_object_id'. The code:

    items = [(1,12),(1,float('Nan')),(1,14),(1,10),(2,22),(2,20),(2,float('Nan')),(3,300),
             (3,float('Nan'))]
    sc = spark.sparkContext
    rdd = sc.parallelize(items)
    df = rdd.toDF(["id", "col1"])

    import pyspark.sql.functions as func
    means = df.groupby("id").agg(func.mean("col1"))

    # The error is thrown at this line
    df = df.withColumn("col1", func.when((df["col1"].isNull()), means.where(func
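The truncated snippet passes the aggregated DataFrame means into func.when(), which expects a Column, hence the '_get_object_id' error. One possible fix, sketched under the assumption that the goal is to fill the missing col1 values with the per-id mean (the mean_col1 name is my own):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as func

    spark = SparkSession.builder.getOrCreate()

    items = [(1, 12.0), (1, float("nan")), (1, 14.0), (1, 10.0),
             (2, 22.0), (2, 20.0), (2, float("nan")),
             (3, 300.0), (3, float("nan"))]
    df = spark.createDataFrame(items, ["id", "col1"])

    # float('nan') is NaN, not null, in Spark; turn it into null first so avg() skips it.
    df = df.withColumn(
        "col1",
        func.when(func.isnan("col1"), func.lit(None)).otherwise(func.col("col1")))

    # Compute the per-id means and join them back onto the original frame,
    # instead of nesting the aggregated DataFrame inside when().
    means = df.groupBy("id").agg(func.avg("col1").alias("mean_col1"))
    df = (df.join(means, "id", "left")
            .withColumn("col1", func.coalesce(func.col("col1"), func.col("mean_col1")))
            .drop("mean_col1"))
    df.show()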

How do I change column names in list of data frames inside a function?

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-16 18:51:48

Question: I know that "how to change names in a list of data frames" has been answered multiple times. However, I'm stuck trying to write a function that can take any list as an argument and change all of the column names of all of the data frames in the list. I am working with a large number of .csv files, all of which have the same 3 column names. I'm importing the files in groups as follows:

    # Get a group of drying data files, remove 1st column
    files <- list.files('Mang
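A rough pandas sketch of such a function, with made-up frames and column names since the question's files are not shown:

    import pandas as pd

    def rename_all(frames, new_names):
        """Return copies of every DataFrame in `frames` with columns set to `new_names`."""
        renamed = []
        for df in frames:
            df = df.copy()
            df.columns = new_names      # assumes every frame has len(new_names) columns
            renamed.append(df)
        return renamed

    # Made-up example data standing in for the imported .csv files.
    frames = [pd.DataFrame({"a": [1], "b": [2], "c": [3]}),
              pd.DataFrame({"x": [4], "y": [5], "z": [6]})]
    frames = rename_all(frames, ["time", "mass", "temp"])
    print(frames[0].columns.tolist())   # ['time', 'mass', 'temp']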

Extract value of a particular column name in pandas as listed in another column

谁说胖子不能爱 submitted on 2021-02-16 15:09:30

Question: The title wasn't too clear, so here's an example. Suppose I have:

    person  apple  orange  type
    Alice   11     23      apple
    Bob     14     20      orange

and I want to get this column:

    person  new_col
    Alice   11
    Bob     20

so we get the column 'apple' for row 'Alice' and 'orange' for row 'Bob'. I'm thinking of iterrows, but that would be slow. Are there faster ways to do this?

Answer 1: Use DataFrame.lookup:

    df['new_col'] = df.lookup(df.index, df['type'])
    print (df)
      person  apple  orange    type  new_col
    0  Alice     11      23   apple       11
    1    Bob     14      20
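Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A sketch of the same row-wise lookup using NumPy indexing instead, with the data re-created from the example above:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"person": ["Alice", "Bob"],
                       "apple": [11, 14],
                       "orange": [23, 20],
                       "type": ["apple", "orange"]})

    # For each row, pick the value from the column named in 'type'.
    codes, cols = pd.factorize(df["type"])
    df["new_col"] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), codes]
    print(df[["person", "new_col"]])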
