dataframe | 易学教程

Creating percentage stacked bar chart using groupby

Question: I'm looking at home ownership within levels of different loan statuses, and I'd like to display this as a stacked bar chart in percentages. I've been able to create a frequency stacked bar chart using this code:

    df_trunc1 = df[['loan_status', 'home_ownership', 'id']]
    sub_df1 = df_trunc1.groupby(['loan_status', 'home_ownership'])['id'].count()
    sub_df1.unstack().plot(kind='bar', stacked=True, rot=1, figsize=(8, 8), title="Home ownership across Loan Types")

which gives me this picture [1], but I can't
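One possible way to turn the frequency table from the question into percentages is to divide each row of the unstacked counts by its row total. The sketch below assumes a small made-up `df` with the question's column names (`loan_status`, `home_ownership`, `id`); the sample values and the helper `plot_percent_stacked` are hypothetical and not taken from the original post.

```python
import pandas as pd

# Hypothetical stand-in for the question's df (values are made up).
df = pd.DataFrame({
    "id": range(1, 9),
    "loan_status": ["Paid", "Paid", "Paid", "Paid",
                    "Default", "Default", "Default", "Default"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE",
                       "RENT", "RENT", "OWN", "RENT"],
})

# Same frequency table as in the question, one row per loan status.
counts = (
    df.groupby(["loan_status", "home_ownership"])["id"]
      .count()
      .unstack(fill_value=0)
)

# Divide every row by its own total so each bar sums to 100.
percentages = counts.div(counts.sum(axis=1), axis=0) * 100


def plot_percent_stacked(pct):
    """Draw the percentage table as stacked bars (needs matplotlib)."""
    return pct.plot(kind="bar", stacked=True, rot=0, figsize=(8, 8),
                    title="Home ownership across Loan Types")
```

Calling `plot_percent_stacked(percentages)` should then draw one bar per loan status, each stacked to 100.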

Merge data frame based on vector key

Question: I'm an absolute beginner and am hoping someone can help me with a merge problem that I've been stuck on for most of this evening. So far I have been unable to adapt solutions to similar problems to this particular example. I've made a dummy data frame and vector to help illustrate my problem:

    dumdata <- data.frame(id=c(1:5), pcode=c(1234,9876,4477,2734,3999), vlo=c(100,450,1000,1325,1500), vhi=c(300,950,1100,1450,1700))

    id  pcode  vlo  vhi
    1   1234   100  300
    2   9876   450

Adding rows to a dataframe based on column names and add NA to empty columns

Question: What I am asking is probably quite simple, but I still haven't figured out a quick and simple way to do it. I have a data frame with 96 columns, from A1 to H12. I will start receiving files every week that I want to compile into one single data frame. The problem is that these files are missing some of the columns (which can be the first columns or any other column in the middle), which makes the merge slightly nasty. Here is a sample of what I have: t = data.frame(A1 = c(1,2,3,4,5), B1 = c(7,8,9,10,11), C1 =
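The question is about R, but the same idea can be sketched in pandas, the library used by most other entries on this page: concatenate the frames (columns are aligned by name) and then reindex against the full column list so that missing columns appear as NaN. The column order in `all_columns` and the sample values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical full layout: 96 columns named A1..H12 (the order is assumed).
all_columns = [f"{row}{col}" for col in range(1, 13) for row in "ABCDEFGH"]

# Data compiled so far, and a weekly file that lacks some columns (made up).
compiled = pd.DataFrame({"A1": [1, 2], "B1": [7, 8], "C1": [13, 14]})
weekly = pd.DataFrame({"A1": [3], "C1": [15]})

# concat aligns columns by name; reindex adds every absent column as NaN
# and puts the columns into the full layout's order.
compiled = (
    pd.concat([compiled, weekly], ignore_index=True)
      .reindex(columns=all_columns)
)
```

Reindexing once after the concatenation keeps the weekly files free of placeholder columns until the final layout is applied.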

subtracting two columns from pandas dataframe and store the result in third column [closed]

Question: I have a DataFrame, df, with 3 columns and I want to perform a subtraction as follows:

    df['available'] = df['recommended'] - df['manual input']

But I am getting an error stating: unsupported operand type(s) for -: 'int' and 'str'. I have also tried doing df['available'] = df[
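The error message suggests that one of the columns holds strings. A minimal sketch of one common fix, assuming `manual input` is the string column, is to convert it with `pd.to_numeric` before subtracting. The sample values below are made up, and `errors="coerce"` is a choice made here to turn unparseable entries into NaN rather than raise.

```python
import pandas as pd

# Hypothetical data: 'manual input' was read in as text.
df = pd.DataFrame({
    "recommended": [10, 20, 30],
    "manual input": ["3", "5", "n/a"],
})

# Convert the text column to numbers; values that cannot be parsed become NaN.
df["manual input"] = pd.to_numeric(df["manual input"], errors="coerce")

# The subtraction now works element-wise on numeric columns.
df["available"] = df["recommended"] - df["manual input"]
```

Rows whose input could not be parsed end up with NaN in `available`, which makes bad entries easy to spot afterwards.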

Grouping data by id, var1 into consecutive dates in python using pandas

Question: I have some data that looks like:

    df_raw_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104], "var1": ['a', 'b', 'a', 'b', 'b', 'a', 'c'], "val": [9, 2, 4, 7, 6, 3, 2], "dates": [pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 3), pd.Timestamp(2020, 1, 5), pd.Timestamp(2020, 3, 12)]})

I want to group this data by id and var1 where the dates are consecutive; if a day is missed, I want to start a new record
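A common pattern for this kind of problem is to flag every row that does not follow its predecessor by exactly one day and take a cumulative sum of the flags as a block label. The sketch below uses the question's data; the output columns (`start`, `end`, `total`) are assumptions, since the desired output is cut off in the excerpt.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "var1": ["a", "b", "a", "b", "b", "a", "c"],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": pd.to_datetime([
        "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02",
        "2020-01-03", "2020-01-05", "2020-03-12",
    ]),
})

df = df_raw_dates.sort_values(["id", "var1", "dates"])

# True wherever a row starts a new run: the first row of each (id, var1)
# group (its diff is NaT) or any gap other than exactly one day.
new_run = df.groupby(["id", "var1"])["dates"].diff() != pd.Timedelta(days=1)

# A running count of run starts gives every consecutive block its own label.
df["block"] = new_run.cumsum()

# Assumed summary: first date, last date and summed val per block.
result = df.groupby(["id", "var1", "block"], as_index=False).agg(
    start=("dates", "min"),
    end=("dates", "max"),
    total=("val", "sum"),
)
```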

Filling missing middle values in pandas dataframe

Question: I have a pandas dataframe df as

    Date  cost  NC
    20    5     NaN
    21    7     NaN
    23    9     78.0
    25    6     80.0

What I need to do is fill in the missing dates, and fill each column with a value, say x, only if there is a number in the previous row. That is, I want the output to look like

    Date  cost  NC
    20    5     NaN
    21    7     NaN
    22    x     NaN
    23    9     78.0
    24    x     x
    25    6     80.0

Date 22 was missing and on 21 NC was missing, so on 22 cost is assigned x but NC is assigned NaN. Now setting the Date column as the index and reindexing it to missing
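The rule described in the question can be sketched as follows: reindex to the full date range, then write x only into the newly inserted rows, and only in columns where a forward fill finds a number above. The fill value `x = -1.0` is a hypothetical placeholder, and the loop is one possible approach rather than the method from the original answers.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [20, 21, 23, 25],
    "cost": [5, 7, 9, 6],
    "NC": [float("nan"), float("nan"), 78.0, 80.0],
})

x = -1.0  # hypothetical fill value standing in for the question's "x"

# Insert the missing dates; their cells start out as NaN.
full = df.set_index("Date").reindex(range(df["Date"].min(), df["Date"].max() + 1))

# Rows that did not exist in the original frame.
is_new = ~full.index.isin(df["Date"])

filled = full.copy()
for col in full.columns:
    # A number exists at or above this row if a forward fill is not NaN.
    has_prev = full[col].ffill().notna().to_numpy()
    # Only inserted rows with a number above them receive x.
    filled.loc[is_new & has_prev, col] = x
```

Original NaN cells (such as NC on dates 20 and 21) stay untouched because the assignment is limited to inserted rows.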

Pandas Dataframe nan values not replacing

Question: I'm trying to replace values in my data frame which are listed as 'nan' (note, not 'NaN'). I've read in an Excel file, then tried to replace the nan values like this:

    All_items_df = ALL_df[df_items].fillna(' ')

Finally I get an output that still contains 'nan':

    All_items_df['Colour'].head(10)
    Out[]:
    7     nan
    8     nan
    9     nan
    10    nan
    13    nan
    14    nan
    15    nan
    16    nan
    18    nan
    19    nan
    Name: Colour, dtype: object

Checking the nan values using isna() or isnull().value.all() gives me False for the above values. Why is
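Since `isna()` returns False for these entries, one plausible explanation is that they are the literal string 'nan' rather than real missing values, in which case `fillna` has nothing to fill. The sketch below illustrates that assumption with a made-up `Colour` column; it is not a confirmed diagnosis of the asker's data.

```python
import pandas as pd

# Hypothetical column holding the text 'nan' instead of a real missing value.
items = pd.DataFrame({"Colour": ["red", "nan", "blue"]})

# fillna only touches true missing values, so the text 'nan' survives.
after_fillna = items["Colour"].fillna(" ")

# Replacing the literal string does what the question is after.
cleaned = items["Colour"].replace("nan", " ")
```

If the column may also contain real missing values, chaining `.fillna(" ")` after the `replace` call would cover both cases.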

How to plot day and month

Question: I have a chart of a daily trend over time. The year is not relevant here and I want to show only the day and month. I know you can show year and month, but that is not what I need. I tried to create a new variable called "Day_Month":

    import datetime as dt
    df['Day'] = df['date'].dt.day
    df['Month'] = df['date'].dt.month
    df['Day_Month'] = df['Day'].astype(str) + "-" +

but it's not possible to plot it as a string, nor to convert it to a date type. Eventually, I would like my chart to look like this:

Answer 1:
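One way to approach this, sketched under the assumption that `df['date']` is already a datetime column, is to keep the real dates on the axis and change only the tick labels, or to build the label with `dt.strftime` when a string column is needed. The sample data, the `value` column and the helper `plot_day_month` are hypothetical.

```python
import pandas as pd

# Hypothetical daily data; 'value' stands in for whatever is being plotted.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-05", "2021-02-14", "2021-03-01"]),
    "value": [3, 5, 4],
})

# Day-month label as text, built in one step.
df["Day_Month"] = df["date"].dt.strftime("%d-%m")


def plot_day_month(frame):
    """Plot against real dates but label the ticks as day-month only."""
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    fig, ax = plt.subplots()
    ax.plot(frame["date"], frame["value"])
    # The axis stays chronological; only the tick text drops the year.
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m"))
    return ax
```

Keeping the x-axis as dates preserves the chronological spacing, which a plain string column would lose.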
