dataframe

Python DataFrame - plot a bar chart for data frame with grouped-by columns (at least two columns)

别等时光非礼了梦想. submitted on 2021-02-16 21:11:06
Question: I've been struggling to recreate this Excel graph in Python using matplotlib: The data is in a DataFrame; I'm trying to automate the process of generating this graph. I've tried unstacking my DataFrame and subplotting, but I haven't managed to create the "Zone" index, which is so elegant in Excel. I have successfully plotted the graph without this "Zone" index, but that's not really what I want to do. Here is my code: data = pd.DataFrame( { 'Factory Zone': ["AMERICAS","APAC","APAC","APAC

Pandas column content to new columns, with other original columns

我的梦境 submitted on 2021-02-16 20:27:29
Question: I have a table like the one below, and I want to build a new table from it (using the values in the 'Color' column). I've tried: import pandas as pd import functools data = {'Seller': ["Mike","Mike","Mike","Mike","David","David","Pete","Pete","Pete"], 'Code' : ["9QBR1","9QBR1","9QBW2","9QBW2","9QD1X","9QD1X","9QEBO","9QEBO","9QEBO"], 'From': ["2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03"], 'Color_date' : ["2020-02-14","2020-02-14","2020

Replacing an unknown number in Pandas data frame with previous number

非 Y 不嫁゛ submitted on 2021-02-16 20:22:25
Question: I have some data frames I am trying to upload to a database. They are lists of values, but some of the columns contain the string 'null', and this is causing errors. I would like to use a function to remove these 'null' strings, and I am trying to use replace to back fill them, as below: df.replace("null", method = bfill) but it gives me the error message: ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2 I have also tried putting "bfill" instead and it
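One way to handle this is to split the task into two steps: first turn the literal string 'null' into a real missing value, then fill the gaps. The following is a minimal sketch under stated assumptions, not the answer from the original thread: the sample frame and its column names `a` and `b` are hypothetical, since the question's data is not shown. Because the title asks for the previous number while the attempt uses back fill, both directions are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the frames from the question are not shown.
df = pd.DataFrame({"a": ["1", "null", "3"], "b": ["null", "5", "6"]})

# Step 1: turn the literal string 'null' into a real missing value (NaN).
cleaned = df.replace("null", np.nan)

# Step 2a: back fill, so each gap takes the next valid value below it.
back_filled = cleaned.bfill()

# Step 2b: forward fill, so each gap takes the previous valid value above it.
# A gap in the first row has no previous value and stays NaN.
forward_filled = cleaned.ffill()
```

Note that `bfill` in the question's attempt is written as a bare name rather than a string, which by itself would raise a NameError unless a variable of that name exists.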

Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?

三世轮回 submitted on 2021-02-16 20:22:17
Question: I have a database as partially shown below. For each date, there are entries for duration (1-20 per date), with items (100s) listed for each duration. Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Then, I want to find the item with a value closest to a given input value. I would like to then obtain the ID for that item to be able to follow the value of this item through its time in the database.
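The steps described in this question can be sketched roughly as follows. Everything about the data here is an assumption: the column names `date`, `duration`, `value` and `item_id`, the sample rows, and the input value `target` are all hypothetical, because the original table is not shown.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "date": ["2021-01-01"] * 4 + ["2021-01-02"] * 3,
    "duration": [1, 20, 20, 20, 5, 5, 3],
    "value": [0.5, 1.0, 2.5, 4.0, 1.8, 3.1, 2.0],
    "item_id": ["a", "b", "c", "d", "e", "f", "g"],
})
target = 2.2  # hypothetical input value

# Keep only the rows whose duration equals the largest duration of their date.
max_duration = df.groupby("date")["duration"].transform("max")
longest = df[df["duration"] == max_duration]

# Within each date, find the row whose value is closest to the target.
distance = (longest["value"] - target).abs()
closest = longest.loc[distance.groupby(longest["date"]).idxmin()]

# The identifiers can then be used to follow each item through the data.
ids = closest["item_id"].tolist()
history = df[df["item_id"].isin(ids)]
```

Using `transform("max")` keeps the result aligned with the original rows, so the adjacent columns (including the identifier) remain available after filtering.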

Python Pandas dataframe find missing values

余生颓废 submitted on 2021-02-16 20:18:15
Question: I'm trying to find missing values and then drop them. I tried looking online but can't seem to find the answer. Extracted Dataframe: In the df, the entries for 1981 and 1982 are '-', i.e. missing values. I would like to find these missing values and then drop them. Exported Dataframe using isnull: I used df.isnull(), but 1981 and 1982 are reported as 'False', which means there is data. They should be '-', and therefore considered missing values. I had
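The behaviour described here is expected: `isnull()` only flags real missing values such as NaN or None, and '-' is an ordinary string. A minimal sketch of one way around it follows; the frame and its column names `year` and `value` are hypothetical, since the original data appears only as an image.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the real frame is not shown in the question.
df = pd.DataFrame({
    "year": [1980, 1981, 1982, 1983],
    "value": ["12.5", "-", "-", "14.1"],
})

# '-' is a normal string, so isnull() reports no missing values here.
flags_before = df["value"].isnull().tolist()

# Replace the placeholder with NaN so pandas treats it as missing.
marked = df.replace("-", np.nan)
flags_after = marked["value"].isnull().tolist()

# Drop the rows that now contain a missing value.
cleaned = marked.dropna()
```

If the data comes from a file, passing `na_values="-"` to `pd.read_csv` marks the placeholder as missing at load time instead.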

How to rank rows by id in Pandas Python

隐身守侯 submitted on 2021-02-16 20:13:25
Question: I have a DataFrame like this:

id  points1  points2
1   44       53
1   76       34
1   63       66
2   23       34
2   44       56

I want output like this:

id  points1  points2  points1_rank  points2_rank
1   44       53       3             2
1   76       34       1             3
1   63       66       2             1
2   23       79       2             1
2   44       56       1             2

Basically, I want to groupby('id') and find the rank of each column within the same id. I tried this: features = ["points1","points2"] df = pd.merge(df, df.groupby('id')[features].rank().reset_index(), suffixes=["", "_rank"], how='left', on=['id']) But I get KeyError: 'id'. Answer 1: You
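The answer text is cut off, so the following is only a sketch of one possible approach, not necessarily what the answer proposed. `groupby('id')[features].rank()` returns a frame that keeps the original row index and does not contain an `id` column, which is why merging on `id` raises a KeyError. Since the index is preserved, the ranks can be joined back directly. The data is the input table from the question (whose last rows differ slightly from the expected output), and `ascending=False` is an assumption inferred from that expected output.

```python
import pandas as pd

# Input table as given in the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "points1": [44, 76, 63, 23, 44],
    "points2": [53, 34, 66, 34, 56],
})
features = ["points1", "points2"]

# Rank within each id; highest value gets rank 1 (assumed from the
# expected output). The result is aligned with df's row index.
ranks = (
    df.groupby("id")[features]
    .rank(ascending=False)
    .astype(int)
    .add_suffix("_rank")
)

# Join on the shared index instead of merging on 'id'.
result = df.join(ranks)
```

The `astype(int)` step is safe here only because there are no ties or missing values; with ties, the default `method="average"` can produce fractional ranks.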

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

旧街凉风 submitted on 2021-02-16 20:07:29
Question: I have a dataframe df.sample like this: id <- c("A","A","A","A","A","A","A","A","A","A","A") date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12", "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14", "2018-11-12") hour <- c(8,8,9,9,13,13,16,6,7,19,7) min <- c(47,59,6,18,22,36,12,32,12,21,47) value <- c(70,70,86,86,86,74,81,77,79,83,91) df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) df.sample$date <- as.Date(df.sample$date,format="%Y-%m-

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

折月煮酒 submitted on 2021-02-16 20:07:05
Question: I have a dataframe df.sample like this: id <- c("A","A","A","A","A","A","A","A","A","A","A") date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12", "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14", "2018-11-12") hour <- c(8,8,9,9,13,13,16,6,7,19,7) min <- c(47,59,6,18,22,36,12,32,12,21,47) value <- c(70,70,86,86,86,74,81,77,79,83,91) df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) df.sample$date <- as.Date(df.sample$date,format="%Y-%m-

Sort date in string format in a pandas dataframe?

懵懂的女人 submitted on 2021-02-16 20:06:58
Question: I have a DataFrame like this. How can I sort it? df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})

   Date
0  Oct20
1  Nov19
2  Jan19
3  Sep20
4  Dec20

I am familiar with sorting a list of date strings: a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y")) Any thoughts? Should I split it? Answer 1: First convert the column to datetimes and get the positions of the sorted values with Series.argsort, which are then used to reorder the rows with DataFrame.iloc: df = df.iloc[pd.to_datetime(df['Date'], format='
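The approach described in the answer can be sketched as below. The answer's own code is cut off before the format string, so `"%b%y"` (abbreviated month name followed by a two-digit year) is an assumption based on the sample values, and it relies on English month abbreviations.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Oct20", "Nov19", "Jan19", "Sep20", "Dec20"]})

# Parse the strings as dates. The format "%b%y" is an assumption; the
# format in the original answer is truncated.
parsed = pd.to_datetime(df["Date"], format="%b%y")

# argsort gives the row positions that would put the dates in order.
order = parsed.argsort()

# Reorder the rows by position; the 'Date' column keeps its string form.
df_sorted = df.iloc[order]
```

Sorting by position this way leaves the original strings untouched, so no extra helper column needs to be added and dropped.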

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

纵然是瞬间 submitted on 2021-02-16 20:06:52
Question: I have a dataframe df.sample like this: id <- c("A","A","A","A","A","A","A","A","A","A","A") date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12", "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14", "2018-11-12") hour <- c(8,8,9,9,13,13,16,6,7,19,7) min <- c(47,59,6,18,22,36,12,32,12,21,47) value <- c(70,70,86,86,86,74,81,77,79,83,91) df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) df.sample$date <- as.Date(df.sample$date,format="%Y-%m-