dataframe

Creating percentage stacked bar chart using groupby

大兔子大兔子 posted on 2021-02-19 06:20:06
Question: I'm looking at home ownership within levels of different loan statuses, and I'd like to display this using a stacked bar chart in percentages. I've been able to create a frequency stacked bar chart using this code:
    df_trunc1 = df[['loan_status', 'home_ownership', 'id']]
    sub_df1 = df_trunc1.groupby(['loan_status', 'home_ownership'])['id'].count()
    sub_df1.unstack().plot(kind='bar', stacked=True, rot=1, figsize=(8, 8), title="Home ownership across Loan Types")
which gives me this picture, but I can't
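The excerpt is cut off before the asker states what went wrong, so the following is only a minimal sketch of one way to turn the frequency chart into a percentage chart: divide each loan_status row of the unstacked counts by its row total before plotting. The sample data is invented for illustration; only the column names and plot arguments come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Invented sample data; only the column names come from the question.
df = pd.DataFrame({
    "id": range(1, 9),
    "loan_status": ["Current", "Current", "Current", "Late", "Late",
                    "Charged Off", "Charged Off", "Charged Off"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE", "RENT",
                       "OWN", "OWN", "RENT"],
})

# Same frequency table as in the question, with missing combinations as 0.
counts = (df.groupby(["loan_status", "home_ownership"])["id"]
            .count()
            .unstack(fill_value=0))

# Divide every row by its total so each loan_status bar sums to 100.
percent = counts.div(counts.sum(axis=1), axis=0) * 100

ax = percent.plot(kind="bar", stacked=True, rot=1, figsize=(8, 8),
                  title="Home ownership across Loan Types")
ax.set_ylabel("Share of loans (%)")
```

pd.crosstab(df['loan_status'], df['home_ownership'], normalize='index') would produce the same row-wise shares as fractions in a single call.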

Merge data frame based on vector key

爱⌒轻易说出口 posted on 2021-02-19 06:06:09
Question: I'm an absolute beginner and am hoping someone can help me with a merge problem that I've been stuck on for most of this evening. So far I have been unable to adapt solutions to similar problems to this particular example. I've made a dummy data frame and vector to help illustrate my problem:
    dumdata <- data.frame(id=c(1:5), pcode=c(1234,9876,4477,2734,3999), vlo=c(100,450,1000,1325,1500), vhi=c(300,950,1100,1450,1700))
    id pcode vlo vhi
    1 1234 100 300
    2 9876 450

Adding rows to a dataframe based on column names and add NA to empty columns

∥☆過路亽.° posted on 2021-02-19 06:01:27
Question: What I am asking is probably quite simple, but I still haven't figured out a quick and simple way to do it. I have a data frame with 96 columns, from A1 to H12. I will start receiving files every week that I want to compile into one single data frame. The problem is that these files are missing some of the columns (which can be the first columns or any column in the middle), making the merge slightly nasty. Here is a sample of what I have: t = data.frame(A1 = c(1,2,3,4,5), B1 = c(7,8,9,10,11), C1 =
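The question is written in R and is truncated, so the following is only a pandas analogue of the idea, not the asker's or an answerer's code: reindex every weekly frame to the full A1..H12 column list (absent columns become NaN) and then stack the frames. The second weekly frame and the column ordering are assumptions made for illustration.

```python
import pandas as pd

# Full plate layout A1..H12 (96 columns). The ordering A1, B1, ..., H1, A2, ...
# is an assumption based on the sample in the question.
all_columns = [f"{row}{col}" for col in range(1, 13) for row in "ABCDEFGH"]

# Weekly files with some columns missing. week1 reuses values from the
# question; week2 is a hypothetical second file.
week1 = pd.DataFrame({"A1": [1, 2, 3, 4, 5], "B1": [7, 8, 9, 10, 11]})
week2 = pd.DataFrame({"B1": [1, 2], "H12": [3, 4]})

# reindex(columns=...) adds every absent column filled with NaN and fixes
# the column order, so the frames can simply be stacked afterwards.
combined = pd.concat(
    [week.reindex(columns=all_columns) for week in (week1, week2)],
    ignore_index=True,
)
```

Recent pandas versions may emit a FutureWarning about all-NA columns during the concat; the resulting values are unaffected.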

subtracting two columns from pandas dataframe and store the result in third column [closed]

我的梦境 posted on 2021-02-19 05:42:54
Question: I have a DataFrame, df, with 3 columns and I want to perform subtraction as follows:
    df['available'] = df['recommended'] - df['manual input']
But I am getting an error stating:
    unsupported operand type(s) for -: 'int' and 'str'
I have also tried doing df['available'] = df[
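The error message indicates that at least one of the two columns holds strings. A minimal sketch of one common fix, assuming the 'manual input' column is the one stored as text, is to convert it with pd.to_numeric before subtracting. The sample values are invented.

```python
import pandas as pd

# Invented sample data: 'manual input' holds strings, as the error suggests.
df = pd.DataFrame({
    "recommended": [10, 20, 30],
    "manual input": ["4", "7", "n/a"],
})

# Convert text to numbers; values that cannot be parsed become NaN
# instead of raising an error.
df["manual input"] = pd.to_numeric(df["manual input"], errors="coerce")

df["available"] = df["recommended"] - df["manual input"]
```

With errors="coerce", unparseable entries propagate as NaN into 'available', which makes bad input easy to spot afterwards.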

Grouping data by id, var1 into consecutive dates in python using pandas

心不动则不痛 posted on 2021-02-19 05:32:45
Question: I have some data that looks like:
    df_raw_dates = pd.DataFrame({"id": [102, 102, 102, 103, 103, 103, 104], "var1": ['a', 'b', 'a', 'b', 'b', 'a', 'c'], "val": [9, 2, 4, 7, 6, 3, 2], "dates": [pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 3), pd.Timestamp(2020, 1, 5), pd.Timestamp(2020, 3, 12)]})
I want to group this data by id and var1 where the dates are consecutive; if a day is missed, I want to start a new record
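The desired output is cut off in the excerpt, so the following is a sketch under assumptions: each record spans a run of consecutive days within one (id, var1) pair, and it reports the first date, the last date, and the summed val. The names block, start and end are introduced here for illustration.

```python
import pandas as pd

df_raw_dates = pd.DataFrame({
    "id": [102, 102, 102, 103, 103, 103, 104],
    "var1": ["a", "b", "a", "b", "b", "a", "c"],
    "val": [9, 2, 4, 7, 6, 3, 2],
    "dates": [pd.Timestamp(2020, 1, 1), pd.Timestamp(2020, 1, 1),
              pd.Timestamp(2020, 1, 2), pd.Timestamp(2020, 1, 2),
              pd.Timestamp(2020, 1, 3), pd.Timestamp(2020, 1, 5),
              pd.Timestamp(2020, 3, 12)],
})

df = df_raw_dates.sort_values(["id", "var1", "dates"]).reset_index(drop=True)

# A new block starts whenever the gap to the previous row of the same
# (id, var1) pair is not exactly one day. The first row of each pair has
# a NaT gap, which also compares as "not one day".
starts_block = df.groupby(["id", "var1"])["dates"].diff() != pd.Timedelta(days=1)

# Running count of block starts gives every run of consecutive days its own label.
df["block"] = starts_block.cumsum()

# Assumed summary: first date, last date and total val per run.
result = (df.groupby(["id", "var1", "block"], as_index=False)
            .agg(start=("dates", "min"), end=("dates", "max"), val=("val", "sum")))
```

Because the first row of every (id, var1) pair always starts a block, the global cumulative sum never assigns the same label to two different pairs.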

Filling missing middle values in pandas dataframe

我与影子孤独终老i posted on 2021-02-19 05:25:06
Question: I have a pandas dataframe df as
    Date cost NC
    20   5    NaN
    21   7    NaN
    23   9    78.0
    25   6    80.0
What I need to do is fill in the missing dates and fill the new rows with a value, say x, only if there is a number in the previous row. That is, I want output like
    Date cost NC
    20   5    NaN
    21   7    NaN
    22   x    NaN
    23   9    78.0
    24   x    x
    25   6    80.0
Date 22 was missing and NC was missing on 21, so on 22 cost is assigned x but NC is assigned NaN. Now setting the Date column to index and reindexing it to missing
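The excerpt stops before the asker's own attempt is complete. One way to get the table described above, sketched under the assumption that x is a numeric placeholder (here the hypothetical value -1), is to reindex to the full date range and then fill only the newly created rows, using a forward fill to check whether the preceding row held a number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": [20, 21, 23, 25],
    "cost": [5, 7, 9, 6],
    "NC": [np.nan, np.nan, 78.0, 80.0],
})

x = -1  # hypothetical placeholder standing in for the question's "x"

# Reindex to every date between the first and the last; new rows are all NaN.
full_range = np.arange(df["Date"].min(), df["Date"].max() + 1)
full = df.set_index("Date").reindex(full_range)

# Rows that did not exist in the original frame.
is_new = pd.Series(~full.index.isin(df["Date"]), index=full.index)

# For a new row, the forward-filled value is non-NaN exactly when the
# closest earlier original row held a number.
prev_has_number = full["NC"].ffill().notna()

full.loc[is_new, "cost"] = x
full.loc[is_new & prev_has_number, "NC"] = x
```

The cost column becomes float after the reindex because the inserted rows briefly hold NaN.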

Pandas Dataframe nan values not replacing

痴心易碎 posted on 2021-02-19 05:20:22
Question: I'm trying to replace values in my data frame which are listed as 'nan' (note, not 'NaN'). I've read in an Excel file, then tried to replace the nan values like this:
    All_items_df = ALL_df[df_items].fillna(' ')
Finally I get an output that still contains 'nan':
    All_items_df['Colour'].head(10)
    Out[]:
    7     nan
    8     nan
    9     nan
    10    nan
    13    nan
    14    nan
    15    nan
    16    nan
    18    nan
    19    nan
    Name: Colour, dtype: object
Checking the nan values using isna() or isnull().values.all() gives me False for the above values. Why is
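A likely explanation, stated here as an assumption because the excerpt is truncated, is that the cells hold the literal string 'nan' rather than a missing value. fillna() and isna() only act on real missing values, so such strings pass through untouched, while replace() matches them. The sample Series is invented.

```python
import pandas as pd

# Invented sample: the column holds the text 'nan', not a missing value.
colour = pd.Series(["nan", "red", "nan"], name="Colour")

# isna() sees ordinary strings, so nothing is reported as missing ...
has_missing = colour.isna().any()

# ... and fillna() therefore leaves the 'nan' strings in place.
after_fillna = colour.fillna(" ")

# replace() matches the literal string and swaps it out.
cleaned = colour.replace("nan", " ")
```

If real missing values are wanted instead, replacing 'nan' with numpy.nan first makes fillna() and isna() behave as expected afterwards.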

How to plot day and month

杀马特。学长 韩版系。学妹 posted on 2021-02-19 05:15:00
Question: I have a chart of a daily trend over time. The year is not relevant here and I want to show only the day and month. I know you can show year and month, but that is not what I need. I tried to create a new variable called "Day_Month":
    import datetime as dt
    df['Day'] = df['date'].dt.day
    df['Month'] = df['date'].dt.month
    df['Day_Month'] = df['Day'].astype(str) + "-" +
but it's not possible to plot it as a string or to convert it to a date type. Eventually, I would like my chart to look like this:
Answer 1:
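The original answer is cut off in this listing. As a stand-in, and not necessarily the approach that answer took, here is a sketch that keeps the column as real dates and changes only the tick labels with matplotlib's DateFormatter, so no string column is needed. The sample data is invented; the "%d-%m" pattern mirrors the Day-Month order attempted in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Invented daily series; only the 'date' column name comes from the question.
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=60, freq="D"),
    "value": range(60),
})

fig, ax = plt.subplots()
ax.plot(df["date"], df["value"])

# Keep real dates on the axis and change only how the ticks are printed:
# day and month, no year.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Leaving the axis as dates preserves correct spacing between points, which a column of strings would not.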
