dataframe

Groupby and drop NaN rows while preserving one in Pandas

£可爱£侵袭症+ submitted on 2021-02-17 03:33:26
Question

Given a test dataset as follows:

       id city   name
    0   1   bj    NaN
    1   2   bj   jack
    2   3   bj    NaN
    3   4   bj    jim
    4   5   sh    NaN
    5   6   sh    NaN
    6   7   sh  steve
    7   8   sh  fiona
    8   9   sh    NaN

How can I group by city and drop the rows where name is NaN, while preserving only one NaN row for each group? Many thanks.

The expected result looks like this:

       id city   name
    0   1   bj    NaN
    1   2   bj   jack
    2   4   bj    jim
    3   5   sh    NaN
    4   7   sh  steve
    5   8   sh  fiona

The new dataset is read from an Excel file with df = pd.read_clipboard(na_filter = False); please note N/A should not be considered
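One possible approach, as a minimal sketch: keep every row that has a name, plus the first NaN row in each city. It assumes name holds genuine missing values (NaN); the na_filter = False case mentioned at the end of the excerpt is not covered, because the excerpt is cut off. The variable names is_nan, nan_rank, and keep are illustrative only.

```python
import numpy as np
import pandas as pd

# Test dataset from the question.
df = pd.DataFrame({
    "id": range(1, 10),
    "city": ["bj"] * 4 + ["sh"] * 5,
    "name": [np.nan, "jack", np.nan, "jim",
             np.nan, np.nan, "steve", "fiona", np.nan],
})

# True where name is missing.
is_nan = df["name"].isna()

# Running count of NaN rows within each city (1 for the first NaN row,
# 2 for the second, and so on).
nan_rank = is_nan.astype(int).groupby(df["city"]).cumsum()

# Keep all named rows, plus the first NaN row of each city.
keep = ~is_nan | (nan_rank == 1)
result = df[keep].reset_index(drop=True)
print(result)
```

The boolean mask leaves the original row order untouched, which is why the result lines up with the expected output in the question.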

Multiply and replace values in data frame according to condition in R

笑着哭i submitted on 2021-02-17 02:52:08
Question

I'm new to R and I've been trying to multiply and replace certain values in my data frame, with no success. Basically, whenever a value x in my df (in any column) satisfies 0 < x < 1, I want to multiply it by 10 and replace the original value with the result. A glimpse of my df, just in case...

    'data.frame': 404 obs. of 15 variables:
     $ D3: num 16.1 17.1 16.1 16.1 17.2 ...
     $ TH : num 9.9 8.6 9.7 7.7 7.6 7.6 8.7 9.8 9.8 7.7 ...
     $ D2 : num 33.3 29.3 30.3 29.3 33.3 ...
     $ D1 :

Pandas read csv where one header is missing

主宰稳场 submitted on 2021-02-17 02:44:26
Question

I am trying to read a CSV file with Pandas, but the first column contains a first name and a last name separated by a comma. This causes Pandas to think that there are 5 columns instead of 4, so the last column has no header and cannot be selected. The file looks like this:

    CustomerName,ClientID,EmailDate,EmailAddress
    FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
    FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
    FNAME3,LNAME3,100,2019-01-13 00:00:00.000
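A minimal sketch of one way to handle this, assuming every data row really has five fields: skip the four-name header line and supply five column names explicitly. The names FirstName and LastName are hypothetical (they are not in the file), and only the two complete rows from the excerpt are used.

```python
import io
import pandas as pd

# The two complete rows from the excerpt; the third row is cut off there.
raw = """CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
"""

# Hypothetical names for the two halves of CustomerName.
names = ["FirstName", "LastName", "ClientID", "EmailDate", "EmailAddress"]

# skiprows=1 drops the original header line; names= supplies one label per field.
df = pd.read_csv(io.StringIO(raw), names=names, skiprows=1)

# Optionally rebuild the combined name as a single column.
df["CustomerName"] = df["FirstName"] + "," + df["LastName"]
print(df)
```

With a real file, the path would replace io.StringIO(raw). This only works if no customer name is missing its comma; a row with four fields would shift its values into the wrong columns.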

How to filter dataframe by splitting categories of a columns into sets?

夙愿已清 submitted on 2021-02-17 02:06:26
Question

I have a dataframe:

    Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
    1        1        RESIDENTIAL               RESIDENTIAL
    1        2        RESIDENTIAL               COMMERCIAL
    1        3        RESIDENTIAL               INDUSTRIAL
    1        4        RESIDENTIAL               RESIDENTIAL
    2        1        COMMERCIAL                RESIDENTIAL
    2        2        COMMERCIAL                COMMERCIAL
    2        3        COMMERCIAL                COMMERCIAL
    3        1        INDUSTRIAL                INDUSTRIAL
    3        2        INDUSTRIAL                COMMERCIAL
    4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
    4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
    4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
    5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
    5        2        COMMERCIAL / RESIDENTIAL
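The excerpt is cut off before the desired output, so the following is only a sketch under an assumption: the goal is to keep rows whose Unit_Usage is one of the categories named in Prop_Usage, where Prop_Usage may list several categories separated by " - " or " / ". The names usage_sets and keep are illustrative, and only the thirteen complete rows are used.

```python
import re
import pandas as pd

rows = [
    (1, 1, "RESIDENTIAL", "RESIDENTIAL"),
    (1, 2, "RESIDENTIAL", "COMMERCIAL"),
    (1, 3, "RESIDENTIAL", "INDUSTRIAL"),
    (1, 4, "RESIDENTIAL", "RESIDENTIAL"),
    (2, 1, "COMMERCIAL", "RESIDENTIAL"),
    (2, 2, "COMMERCIAL", "COMMERCIAL"),
    (2, 3, "COMMERCIAL", "COMMERCIAL"),
    (3, 1, "INDUSTRIAL", "INDUSTRIAL"),
    (3, 2, "INDUSTRIAL", "COMMERCIAL"),
    (4, 1, "RESIDENTIAL - COMMERCIAL", "RESIDENTIAL"),
    (4, 2, "RESIDENTIAL - COMMERCIAL", "COMMERCIAL"),
    (4, 3, "RESIDENTIAL - COMMERCIAL", "INDUSTRIAL"),
    (5, 1, "COMMERCIAL / RESIDENTIAL", "RESIDENTIAL"),
]
df = pd.DataFrame(rows, columns=["Prop_ID", "Unit_ID", "Prop_Usage", "Unit_Usage"])

# Split each Prop_Usage on "-" or "/" (with optional surrounding spaces)
# and turn the pieces into a set of categories.
usage_sets = df["Prop_Usage"].map(lambda s: set(re.split(r"\s*[-/]\s*", s)))

# Assumed filter: keep a row when its Unit_Usage belongs to that set.
keep = [unit in cats for unit, cats in zip(df["Unit_Usage"], usage_sets)]
result = df[keep]
print(result)
```

If the intended filter is the opposite (rows whose unit usage does not match the property usage), negating the mask gives that instead.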

How to calculate time difference between two pandas column [duplicate]

久未见 submitted on 2021-02-16 22:40:30
Question

This question already has answers here: Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes (3 answers). Closed 2 years ago.

My df looks like this:

                    start                stop
    0 2015-11-04 10:12:00 2015-11-06 06:38:00
    1 2015-11-04 10:23:00 2015-11-05 08:30:00
    2 2015-11-04 14:01:00 2015-11-17 10:34:00
    4 2015-11-19 01:43:00 2015-12-21 09:04:00

    print(time_df.dtypes)
    start    datetime64[ns]
    stop     datetime64[ns]
    dtype: object

I am trying to find the time difference between stop and start.
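Since both columns are already datetime64[ns], subtracting them yields a Timedelta column. A minimal sketch using the data above; the column names diff and diff_hours are illustrative additions, not part of the question.

```python
import pandas as pd

# Rebuild the frame from the question, including its non-contiguous index.
time_df = pd.DataFrame(
    {
        "start": ["2015-11-04 10:12:00", "2015-11-04 10:23:00",
                  "2015-11-04 14:01:00", "2015-11-19 01:43:00"],
        "stop": ["2015-11-06 06:38:00", "2015-11-05 08:30:00",
                 "2015-11-17 10:34:00", "2015-12-21 09:04:00"],
    },
    index=[0, 1, 2, 4],
).apply(pd.to_datetime)

# Element-wise subtraction of datetime columns gives timedelta64[ns].
time_df["diff"] = time_df["stop"] - time_df["start"]

# Convert to fractional hours via total seconds.
time_df["diff_hours"] = time_df["diff"].dt.total_seconds() / 3600
print(time_df)
```

Using dt.total_seconds() rather than dt.seconds matters here, because dt.seconds ignores the whole-day part of each difference.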

Combine data frames from a vector of names

独自空忆成欢 submitted on 2021-02-16 21:27:18
Question

I have an issue that I thought would be easy to solve, but I have not managed to find a solution. I have a large number of data frames that I want to bind by rows. To avoid listing the names of all the data frames, I used "paste0" to quickly create a vector of their names. The problem is that I cannot get the rbind function to identify the data frames from this vector of names. More explicitly:

    df1 <- data.frame(x1 = sample(1:5,5), x2 = sample(1:5,5))
    df2 <- data.frame(x1 = sample(1:5,5

Python DataFrame - plot a bar chart for data frame with grouped-by columns (at least two columns)

自闭症网瘾萝莉.ら submitted on 2021-02-16 21:12:11
Question

I've been struggling to recreate this Excel graph in Python using matplotlib:

The data is in a dataframe; I'm trying to automate the process of generating this graph. I've tried unstacking my dataframe and subplotting, but I haven't managed to create the "Zone" index which is so elegant in Excel. I have successfully managed to plot the graph without this "Zone" index, but that's not really what I want to do. Here is my code:

    data = pd.DataFrame( { 'Factory Zone': ["AMERICAS","APAC","APAC","APAC