dataframe | 易学教程 (Easy Learning Tutorial)

Python DataFrame - plot a bar chart for data frame with grouped-by columns (at least two columns)

Question: I've been struggling to recreate this Excel graph in Python using matplotlib. The data is in a DataFrame, and I'm trying to automate generating this graph. I've tried unstacking my DataFrame and using subplots, but I haven't managed to create the "Zone" index that looks so elegant in Excel. I have managed to plot the graph without this "Zone" index, but that's not really what I want. Here is my code: data = pd.DataFrame( { 'Factory Zone': ["AMERICAS","APAC","APAC","APAC
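A minimal sketch of one way to get the two-level "Zone" labels, assuming one bar per factory: sort by zone so each zone's bars are adjacent, label the ticks with factory names, then write each zone name once, centred under its group, with ax.text and ax.get_xaxis_transform() (x in data units, y in axes units, so a negative y lands below the tick labels). The 'Factory Name' and 'Value' columns, the EMEA zone, and all numbers are hypothetical; only 'Factory Zone' comes from the question.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: only 'Factory Zone' is taken from the question.
data = pd.DataFrame(
    {
        "Factory Zone": ["AMERICAS", "APAC", "APAC", "EMEA", "EMEA"],
        "Factory Name": ["F1", "F2", "F3", "F4", "F5"],
        "Value": [10, 7, 12, 5, 9],
    }
)


def plot_zone_bars(df):
    """One bar per factory, plus a second row of zone labels under the ticks."""
    # Sort so every zone's factories sit next to each other on the x axis.
    df = df.sort_values(["Factory Zone", "Factory Name"]).reset_index(drop=True)
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(df.index, df["Value"])
    ax.set_xticks(df.index)
    ax.set_xticklabels(df["Factory Name"])

    # x in data coordinates, y in axes coordinates.
    trans = ax.get_xaxis_transform()
    for zone, group in df.groupby("Factory Zone"):
        first, last = group.index.min(), group.index.max()
        # Zone name centred under its group of bars, below the factory labels.
        ax.text((first + last) / 2, -0.15, zone, ha="center", va="top", transform=trans)
        # Dashed separator between this zone and the next one.
        if last < df.index.max():
            ax.axvline(last + 0.5, color="grey", linestyle="--", linewidth=0.8)
    fig.subplots_adjust(bottom=0.25)  # leave room for the second label row
    return fig, ax, bars


fig, ax, bars = plot_zone_bars(data)
```

Call fig.savefig(...) or plt.show() (with an interactive backend) to see the result; the -0.15 offset is a guess that may need tuning for longer labels.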

Pandas column content to new columns, with other original columns

Question: I have a table like the one below, and I want to make a new table from it using the values in the 'Color' column. I've tried: import pandas as pd import functools data = {'Seller': ["Mike","Mike","Mike","Mike","David","David","Pete","Pete","Pete"], 'Code' : ["9QBR1","9QBR1","9QBW2","9QBW2","9QD1X","9QD1X","9QEBO","9QEBO","9QEBO"], 'From': ["2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03","2020-01-03"], 'Color_date' : ["2020-02-14","2020-02-14","2020
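Because the question's data is cut off, here is a sketch under assumptions: each row has a 'Color' value and a 'Color_date', and the goal is one column per colour holding its date while keeping 'Seller', 'Code' and 'From'. The colours and dates below are made up. It uses set_index plus unstack; if the same (Seller, Code, From, Color) combination can appear twice, unstack raises a ValueError and pivot_table with an aggfunc would be needed instead.

```python
import pandas as pd

# Hypothetical sample: the 'Color' values and 'Color_date' values are invented.
df = pd.DataFrame(
    {
        "Seller": ["Mike", "Mike", "David"],
        "Code": ["9QBR1", "9QBR1", "9QD1X"],
        "From": ["2020-01-03", "2020-01-03", "2020-01-03"],
        "Color": ["Red", "Blue", "Red"],
        "Color_date": ["2020-02-14", "2020-02-20", "2020-03-01"],
    }
)


def colors_to_columns(df, keep=("Seller", "Code", "From")):
    # Put 'Color' in the index next to the columns we keep, then unstack it
    # so each distinct colour becomes its own column holding 'Color_date'.
    wide = df.set_index(list(keep) + ["Color"])["Color_date"].unstack("Color")
    # Turn the kept index levels back into ordinary columns.
    return wide.reset_index().rename_axis(columns=None)


wide = colors_to_columns(df)
print(wide)
```

Combinations that never occur (David has no Blue row here) come out as NaN.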

Replacing an unknown number in Pandas data frame with previous number

Question: I have some data frames that I am trying to upload to a database. They are lists of values, but some of the columns contain the string 'null', which is causing errors. So I would like to use a function to remove these 'null' strings, and I am trying to use replace to back-fill them from below: df.replace("null", method = bfill) but it gives me the error message: ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2 I have also tried putting "bfill" instead and it
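A sketch of one common approach: turn the literal 'null' strings into real missing values, then fill them with ffill() (previous row, as in the title) or bfill() (next row, as in the question). The `method=` keyword of replace is deprecated in recent pandas, and "Error tokenizing data" is a read_csv parser error, so it probably comes from the file-loading step rather than from replace. The sample frame and the helper name fill_nulls are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the literal string 'null' in numeric columns.
df = pd.DataFrame({"a": [1, "null", 3], "b": [4, 5, "null"]})


def fill_nulls(df, how="ffill"):
    # Convert the 'null' strings into real missing values first.
    cleaned = df.replace("null", np.nan)
    # 'ffill' copies the previous row's value down; 'bfill' copies the next row's up.
    filled = cleaned.ffill() if how == "ffill" else cleaned.bfill()
    # The columns were object dtype because of the strings; make them numeric again.
    return filled.apply(pd.to_numeric)


forward = fill_nulls(df, "ffill")
backward = fill_nulls(df, "bfill")
```

With bfill, a 'null' in the last row has nothing below it and stays NaN.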

Pandas groupby: how to select adjacent column data after selecting a row based on data in another column in pandas groupby groups?

Question: I have a database, partially shown below. For each date there are entries for duration (1-20 per date), with items (100s) listed for each duration. Each item has several associated data points in adjacent columns, including an identifier. For each date, I want to select the largest duration. Then I want to find the item whose value is closest to a given input value. I would then like to obtain the ID of that item so that I can follow its value through its time in the database.
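The two selection steps can be sketched with groupby: keep the rows whose duration equals the per-date maximum (via transform("max")), then take, per date, the row with the smallest absolute distance to the target (via idxmin) and read its ID. The column names 'date', 'duration', 'value', 'id', the helper closest_ids, and the data are hypothetical stand-ins for the asker's database.

```python
import pandas as pd

# Hypothetical data: two dates, several durations, one value and ID per item.
df = pd.DataFrame(
    {
        "date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
        "duration": [1, 2, 2, 5, 5, 3],
        "value": [10.0, 9.5, 11.0, 20.0, 12.0, 10.0],
        "id": ["A", "B", "C", "D", "E", "F"],
    }
)


def closest_ids(df, target):
    # Keep only rows whose duration is the largest one for their date.
    longest = df[df["duration"] == df.groupby("date")["duration"].transform("max")]
    # Distance of each remaining item's value from the target.
    dist = (longest["value"] - target).abs()
    # For each date, the row label of the smallest distance.
    best_rows = dist.groupby(longest["date"]).idxmin()
    # Look those rows up and return a date -> id Series.
    return df.loc[best_rows.to_numpy()].set_index("date")["id"]


result = closest_ids(df, target=10.0)
```

Row F matches the target exactly but is ignored, because its duration is not the largest for its date.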

Python Pandas dataframe find missing values

Question: I'm trying to find missing values and then drop them. I tried looking for the answer online but can't seem to find it. Extracted DataFrame: In the df, the values for 1981 and 1982 are '-', i.e. missing values. I would like to find the missing values and then drop them. Exported DataFrame using isnull: I used df.isnull(), but 1981 and 1982 are detected as 'False', which means there is data. But they are '-', and should therefore be considered missing values. I had
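isnull() only recognises real missing markers such as NaN and None, not a placeholder string like '-'. A minimal sketch, assuming the placeholder is exactly '-': map it to NaN, after which isnull() and dropna() behave as expected. When the data comes from a CSV, read_csv(..., na_values="-") does the same conversion at load time. The 'Year'/'Value' columns and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where '-' marks missing data, as in the question.
df = pd.DataFrame({"Year": [1980, 1981, 1982, 1983], "Value": ["5", "-", "-", "7"]})


def drop_dash_missing(df, marker="-"):
    # Replace the placeholder with a real missing value so isnull() sees it.
    cleaned = df.replace(marker, np.nan)
    # Drop every row that contains at least one missing value.
    return cleaned.dropna()


result = drop_dash_missing(df)
```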

How to rank rows by id in Pandas Python

Question: I have a DataFrame like this:

id  points1  points2
1   44       53
1   76       34
1   63       66
2   23       34
2   44       56

I want output like this:

id  points1  points2  points1_rank  points2_rank
1   44       53       3             2
1   76       34       1             3
1   63       66       2             1
2   23       79       2             1
2   44       56       1             2

Basically, I want to groupby('id') and find the rank of each column within the same id. I tried this: features = ["points1","points2"] df = pd.merge(df, df.groupby('id')[features].rank().reset_index(), suffixes=["", "_rank"], how='left', on=['id']) But I get KeyError 'id'. Answer 1: You
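The KeyError occurs because groupby('id')[features].rank() returns only the ranked columns, aligned to the original row index; after reset_index() the frame has an 'index' column but no 'id' column to merge on. Since the index already lines up, a join is enough. A sketch on the question's input table, with ascending=False so the largest value gets rank 1 (as in the desired output); method="min" for ties and the helper name are assumptions.

```python
import pandas as pd

# Input table from the question.
df = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 2],
        "points1": [44, 76, 63, 23, 44],
        "points2": [53, 34, 66, 34, 56],
    }
)
features = ["points1", "points2"]


def add_group_ranks(df, features):
    # Rank within each id; the result keeps df's original row index.
    ranks = df.groupby("id")[features].rank(ascending=False, method="min").astype(int)
    # Join back on that index, renaming the new columns to points1_rank, points2_rank.
    return df.join(ranks.add_suffix("_rank"))


out = add_group_ranks(df, features)
```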

Merge 2 dataframes using conditions on “hour” and “min” of df1 in datetimes of df2

Question: I have a dataframe df.sample like this: id <- c("A","A","A","A","A","A","A","A","A","A","A") date <- c("2018-11-12","2018-11-12","2018-11-12","2018-11-12","2018-11-12", "2018-11-12","2018-11-12","2018-11-14","2018-11-14","2018-11-14", "2018-11-12") hour <- c(8,8,9,9,13,13,16,6,7,19,7) min <- c(47,59,6,18,22,36,12,32,12,21,47) value <- c(70,70,86,86,86,74,81,77,79,83,91) df.sample <- data.frame(id,date,hour,min,value,stringsAsFactors = F) df.sample$date <- as.Date(df.sample$date,format="%Y-%m-
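The question is in R and its matching condition is cut off, so this is only a pandas sketch of one plausible reading: build a full timestamp from date + hour + min, then attach the latest df2 row at or before that moment for the same id using pd.merge_asof. The df2 columns ('datetime', 'reading'), its values, and the "latest earlier reading" rule are all assumptions; df1 reuses the first three rows of the question's df.sample.

```python
import pandas as pd

# First three rows of df.sample from the question.
df1 = pd.DataFrame(
    {
        "id": ["A", "A", "A"],
        "date": ["2018-11-12"] * 3,
        "hour": [8, 8, 9],
        "min": [47, 59, 6],
        "value": [70, 70, 86],
    }
)
# Hypothetical second frame with full datetimes.
df2 = pd.DataFrame(
    {
        "id": ["A", "A", "A"],
        "datetime": ["2018-11-12 08:45", "2018-11-12 08:58", "2018-11-12 09:10"],
        "reading": [1, 2, 3],
    }
)


def merge_on_time(df1, df2):
    left = df1.copy()
    # date + hour + min -> one timestamp (forced to ns so both keys share a dtype).
    left["ts"] = (
        pd.to_datetime(left["date"])
        + pd.to_timedelta(left["hour"], unit="h")
        + pd.to_timedelta(left["min"], unit="min")
    ).astype("datetime64[ns]")
    right = df2.copy()
    right["datetime"] = pd.to_datetime(right["datetime"]).astype("datetime64[ns]")
    # merge_asof needs both frames sorted on their time keys; 'backward' takes
    # the latest df2 row at or before each df1 timestamp, matched within id.
    return pd.merge_asof(
        left.sort_values("ts"),
        right.sort_values("datetime"),
        left_on="ts",
        right_on="datetime",
        by="id",
        direction="backward",
    )


merged = merge_on_time(df1, df2)
```

Other rules map to other arguments: direction="nearest" picks the closest reading either side, and tolerance=pd.Timedelta(...) limits how far apart a match may be.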

Sort date in string format in a pandas dataframe?

Question: I have a DataFrame like this; how do I sort it? df = pd.DataFrame({'Date':['Oct20','Nov19','Jan19','Sep20','Dec20']})

    Date
0  Oct20
1  Nov19
2  Jan19
3  Sep20
4  Dec20

I am familiar with sorting a list of dates stored as strings: a.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y")) Any thoughts? Should I split it? Answer 1: First convert the column to datetimes and get the positions of the sorted values with Series.argsort, which are then used to change the ordering with DataFrame.iloc: df = df.iloc[pd.to_datetime(df['Date'], format='
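A sketch of the argsort + iloc idea from the answer, assuming the strings are an abbreviated English month name followed by a two-digit year, which the strptime-style format "%b%y" parses (so 'Oct20' becomes 2020-10-01). The asker's "%d-%b-%y" format would not match these strings. The helper name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["Oct20", "Nov19", "Jan19", "Sep20", "Dec20"]})


def sort_month_year(df, col="Date"):
    # '%b%y' = abbreviated month name + two-digit year (assumed from the sample).
    order = pd.to_datetime(df[col], format="%b%y").argsort()
    # argsort returns positions, so reorder the rows with iloc.
    return df.iloc[order.to_numpy()]


sorted_df = sort_month_year(df)
```

df.sort_values("Date", key=lambda s: pd.to_datetime(s, format="%b%y")) gives the same order in one call.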
