dataframe

Backfilling columns by groups in Pandas

不打扰是莪最后的温柔 submitted on 2021-02-16 20:06:38

Question: I have a CSV like

    A,B,C,D
    1,2,,
    1,2,30,100
    1,2,40,100
    4,5,,
    4,5,60,200
    4,5,70,200
    8,9,,

In rows 1 and 4 the C value is missing (NaN). I want to take their values from rows 2 and 5 respectively (the first occurrence of the same A,B values). If no matching row is found, just put 0 (as in the last line). Expected output:

    A,B,C,D
    1,2,30,
    1,2,30,100
    1,2,40,100
    4,5,60,
    4,5,60,200
    4,5,70,200
    8,9,0,

Using fillna I found bfill ("use NEXT valid observation to fill gap"), but here the NEXT observation has to be taken logically
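A minimal sketch of one way to do this, assuming the CSV above has been loaded into a DataFrame (variable names are my own): backfill C within each (A, B) group, then fall back to 0 where the group never has a valid value.

    import numpy as np
    import pandas as pd

    # Reconstruction of the sample data from the question.
    df = pd.DataFrame({
        "A": [1, 1, 1, 4, 4, 4, 8],
        "B": [2, 2, 2, 5, 5, 5, 9],
        "C": [np.nan, 30, 40, np.nan, 60, 70, np.nan],
        "D": [np.nan, 100, 100, np.nan, 200, 200, np.nan],
    })

    # Backfill C within each (A, B) group, then use 0 where the group
    # has no valid C value at all (the 8,9 row).
    df["C"] = df.groupby(["A", "B"])["C"].bfill().fillna(0)
    print(df)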

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

拜拜、爱过 submitted on 2021-02-16 20:06:35

Question: I have a data frame df.sample built like this:

    id <- c("A","A","A","A","A","A","A","A","A","A","A")
    date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12",
              "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14",
              "2018-11-12")
    hour <- c(8,8,9,9,13,13,16,6,7,19,7)
    min <- c(47,59,6,18,22,36,12,32,12,21,47)
    value <- c(70,70,86,86,86,74,81,77,79,83,91)
    df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F)
    df.sample$date <- as.Date(df.sample$date,format="%Y-%m-
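The excerpt cuts off before df2 or the matching conditions are shown, so the following is only a guess at the intent, sketched in pandas rather than R: combine date, hour and min into a single timestamp, then keep the rows whose timestamp falls inside a datetime interval of a second frame. df1 is a partial reconstruction of df.sample; df2 and its columns are entirely invented for illustration.

    import pandas as pd

    # Partial reconstruction of df.sample from the question.
    df1 = pd.DataFrame({
        "id": ["A", "A", "A", "A"],
        "date": ["2018-11-12", "2018-11-12", "2018-11-12", "2018-11-14"],
        "hour": [8, 9, 13, 6],
        "min": [47, 6, 22, 32],
        "value": [70, 86, 86, 77],
    })

    # Combine date, hour and min into one timestamp so it can be compared
    # against datetime columns of another frame.
    df1["ts"] = (pd.to_datetime(df1["date"])
                 + pd.to_timedelta(df1["hour"], unit="h")
                 + pd.to_timedelta(df1["min"], unit="m"))

    # Invented second frame with datetime intervals.
    df2 = pd.DataFrame({
        "start": pd.to_datetime(["2018-11-12 08:00", "2018-11-12 13:00"]),
        "end":   pd.to_datetime(["2018-11-12 10:00", "2018-11-12 14:00"]),
        "label": ["morning", "afternoon"],
    })

    # Range join via a cross merge plus a filter on the interval condition.
    merged = df1.merge(df2, how="cross")
    merged = merged[(merged["ts"] >= merged["start"]) & (merged["ts"] <= merged["end"])]
    print(merged)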

Create separate vectors for each of a data frame's columns (variables)

馋奶兔 submitted on 2021-02-16 20:05:25

Question: Goal: take a data frame and create a separate vector for each of its columns (variables). The following code gets me close:

    batting <- read.csv("mlb_2014.csv", header = TRUE, sep = ",")
    hr <- batting[(batting$HR >= 20 & batting$PA >= 100), ]
    var_names <- colnames(hr)
    for(i in var_names) {
      path <- paste("hr$", i, sep = "")
      assign(i, as.vector(path))
    }

It creates a vector for each column in the data frame, as shown by the output below:

    > ls()
    [1] "AB"      "Age"     "BA"      "batting" "BB"      "CS"
    [7] "G"
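A rough pandas analogue of the same goal (one top-level variable per column); the frame below is a made-up stand-in, since mlb_2014.csv is not available here, and a plain dict is usually the better design than injecting names into the namespace.

    import pandas as pd

    # Stand-in for the filtered 'hr' data frame.
    hr = pd.DataFrame({"HR": [25, 31, 22], "PA": [450, 502, 391], "Age": [27, 30, 24]})

    # Build one array per column, keyed by column name...
    columns = {name: hr[name].to_numpy() for name in hr.columns}

    # ...and, if separate top-level variables are really wanted, inject them
    # into the module namespace (the rough equivalent of R's assign()).
    globals().update(columns)
    print(HR, PA, Age)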

pyspark 'DataFrame' object has no attribute '_get_object_id'

百般思念 submitted on 2021-02-16 18:57:44

Question: I am trying to run some code, but I get the error: 'DataFrame' object has no attribute '_get_object_id'. The code:

    items = [(1,12),(1,float('Nan')),(1,14),(1,10),(2,22),(2,20),(2,float('Nan')),(3,300),
             (3,float('Nan'))]
    sc = spark.sparkContext
    rdd = sc.parallelize(items)
    df = rdd.toDF(["id", "col1"])

    import pyspark.sql.functions as func
    means = df.groupby("id").agg(func.mean("col1"))

    # The error is thrown at this line
    df = df.withColumn("col1", func.when((df["col1"].isNull()), means.where(func
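The truncated snippet passes the aggregated DataFrame means into func.when(), which expects a Column, hence the '_get_object_id' error. One possible fix, sketched under the assumption that the goal is to fill the missing col1 values with the per-id mean (the mean_col1 name is my own):

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as func

    spark = SparkSession.builder.getOrCreate()

    items = [(1, 12.0), (1, float("nan")), (1, 14.0), (1, 10.0),
             (2, 22.0), (2, 20.0), (2, float("nan")),
             (3, 300.0), (3, float("nan"))]
    df = spark.createDataFrame(items, ["id", "col1"])

    # float('nan') is NaN, not null, in Spark; turn it into null first so avg() skips it.
    df = df.withColumn(
        "col1",
        func.when(func.isnan("col1"), func.lit(None)).otherwise(func.col("col1")))

    # Compute the per-id means and join them back onto the original frame,
    # instead of nesting the aggregated DataFrame inside when().
    means = df.groupBy("id").agg(func.avg("col1").alias("mean_col1"))
    df = (df.join(means, "id", "left")
            .withColumn("col1", func.coalesce(func.col("col1"), func.col("mean_col1")))
            .drop("mean_col1"))
    df.show()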

How do I change column names in list of data frames inside a function?

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-16 18:51:48

Question: I know that "how to change names in a list of data frames" has been answered multiple times. However, I'm stuck trying to write a function that can take any list as an argument and change all of the column names of all of the data frames in the list. I am working with a large number of .csv files, all of which have the same 3 column names. I'm importing the files in groups as follows:

    # Get a group of drying data files, remove 1st column
    files <- list.files('Mang
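A rough pandas sketch of such a function, with made-up frames and column names since the question's files are not shown:

    import pandas as pd

    def rename_all(frames, new_names):
        """Return copies of every DataFrame in `frames` with columns set to `new_names`."""
        renamed = []
        for df in frames:
            df = df.copy()
            df.columns = new_names      # assumes every frame has len(new_names) columns
            renamed.append(df)
        return renamed

    # Made-up example data standing in for the imported .csv files.
    frames = [pd.DataFrame({"a": [1], "b": [2], "c": [3]}),
              pd.DataFrame({"x": [4], "y": [5], "z": [6]})]
    frames = rename_all(frames, ["time", "mass", "temp"])
    print(frames[0].columns.tolist())   # ['time', 'mass', 'temp']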

Extract value of a particular column name in pandas as listed in another column

谁说胖子不能爱 submitted on 2021-02-16 15:09:30

Question: The title wasn't too clear, so here's an example. Suppose I have:

    person  apple  orange  type
    Alice   11     23      apple
    Bob     14     20      orange

and I want to get this column:

    person  new_col
    Alice   11
    Bob     20

so we get the column 'apple' for row 'Alice' and 'orange' for row 'Bob'. I'm thinking of iterrows, but that would be slow. Are there faster ways to do this?

Answer 1: Use DataFrame.lookup:

    df['new_col'] = df.lookup(df.index, df['type'])
    print (df)
      person  apple  orange    type  new_col
    0  Alice     11      23   apple       11
    1    Bob     14      20
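Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A sketch of the same row-wise lookup using NumPy indexing instead, with the data re-created from the example above:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"person": ["Alice", "Bob"],
                       "apple": [11, 14],
                       "orange": [23, 20],
                       "type": ["apple", "orange"]})

    # For each row, pick the value from the column named in 'type'.
    codes, cols = pd.factorize(df["type"])
    df["new_col"] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), codes]
    print(df[["person", "new_col"]])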
