dataframe | 易学教程

Groupby and drop NaN rows while preserving one in Pandas

Question: Given a test dataset as follows:

   id city   name
0   1   bj    NaN
1   2   bj   jack
2   3   bj    NaN
3   4   bj    jim
4   5   sh    NaN
5   6   sh    NaN
6   7   sh  steve
7   8   sh  fiona
8   9   sh    NaN

How can I group by city and drop the rows where name is NaN, while preserving only one such row for each group? Many thanks.

The expected result looks like this:

   id city   name
0   1   bj    NaN
1   2   bj   jack
2   4   bj    jim
3   5   sh    NaN
4   7   sh  steve
5   8   sh  fiona

New dataset read by df = pd.read_clipboard(na_filter = False) from an Excel file; please note N/A should not be considered
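One possible approach, sketched here rather than taken from the original thread, is to count the NaN names within each city and keep only the first one alongside all non-NaN rows. The sketch assumes the missing names are real NaN values (not the strings produced by na_filter=False); the helper names is_nan, nan_rank and keep are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "id": range(1, 10),
    "city": ["bj"] * 4 + ["sh"] * 5,
    "name": [np.nan, "jack", np.nan, "jim",
             np.nan, np.nan, "steve", "fiona", np.nan],
})

# True where the name is missing.
is_nan = df["name"].isna()

# Running count of missing names within each city (1 for the first NaN, 2 for the second, ...).
nan_rank = is_nan.astype(int).groupby(df["city"]).cumsum()

# Keep every named row, plus only the first NaN row of each city.
keep = ~is_nan | (is_nan & (nan_rank == 1))
result = df[keep].reset_index(drop=True)
print(result)
```

On the sample data this keeps ids 1, 2, 4, 5, 7 and 8, which matches the expected result in the question.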

Multiply and replace values in data frame according to condition in R

Question: I'm new to R and I've been trying to multiply and replace certain values in my data frame, with no success. Basically, when a value in my df (in any column) satisfies 0 < x < 1, I want to multiply it by 10 and replace that value with the result. A glimpse of my df, just in case:

'data.frame': 404 obs. of 15 variables:
 $ D3: num 16.1 17.1 16.1 16.1 17.2 ...
 $ TH : num 9.9 8.6 9.7 7.7 7.6 7.6 8.7 9.8 9.8 7.7 ...
 $ D2 : num 33.3 29.3 30.3 29.3 33.3 ...
 $ D1 :

Pandas read csv where one header is missing

Question: I am trying to read a CSV file with Pandas, but the first column contains a first name and a last name separated by a comma. This causes Pandas to think that there are 5 columns instead of 4, so the last column has no header and cannot be selected. The file looks like this:

CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
FNAME3,LNAME3,100,2019-01-13 00:00:00.000
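A minimal sketch of one way to handle this, assuming every data row really has five comma-separated fields: skip the four-field header line and supply five column names by hand. The names FirstName and LastName are hypothetical replacements for the single CustomerName header, and the in-memory string stands in for the real file.

```python
import io
import pandas as pd

# Two complete rows from the question, held in memory instead of a file on disk.
raw = """CustomerName,ClientID,EmailDate,EmailAddress
FNAME1,LNAME1,100,2019-01-13 00:00:00.000,FNAME1@HOTMAIL.COM
FNAME2,LNAME2,100,2019-01-13 00:00:00.000,FNAME2@GMAIL.COM
"""

# Five names for five fields; FirstName/LastName are assumed, not from the file.
columns = ["FirstName", "LastName", "ClientID", "EmailDate", "EmailAddress"]

# skiprows=1 drops the original header line; names= supplies the replacement header.
df = pd.read_csv(io.StringIO(raw), skiprows=1, names=columns)
print(df["EmailAddress"])
```

With an explicit header, the last column can be selected by name like any other.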

How to filter dataframe by splitting categories of a columns into sets?

Question: I have a dataframe:

Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
1        1        RESIDENTIAL               RESIDENTIAL
1        2        RESIDENTIAL               COMMERCIAL
1        3        RESIDENTIAL               INDUSTRIAL
1        4        RESIDENTIAL               RESIDENTIAL
2        1        COMMERCIAL                RESIDENTIAL
2        2        COMMERCIAL                COMMERCIAL
2        3        COMMERCIAL                COMMERCIAL
3        1        INDUSTRIAL                INDUSTRIAL
3        2        INDUSTRIAL                COMMERCIAL
4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
5        2        COMMERCIAL / RESIDENTIAL
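The excerpt is cut off before the filter condition is stated, so the following is only a sketch under an assumption: that the goal is to keep rows whose Unit_Usage is one of the categories listed in Prop_Usage, with "-" and "/" treated as separators. The names usage_sets, mask and matching are illustrative, and only a few rows of the table are reproduced.

```python
import re
import pandas as pd

# A subset of the rows shown in the question.
df = pd.DataFrame({
    "Prop_ID": [1, 1, 4, 4, 4],
    "Unit_ID": [1, 2, 1, 2, 3],
    "Prop_Usage": [
        "RESIDENTIAL",
        "RESIDENTIAL",
        "RESIDENTIAL - COMMERCIAL",
        "RESIDENTIAL - COMMERCIAL",
        "RESIDENTIAL - COMMERCIAL",
    ],
    "Unit_Usage": ["RESIDENTIAL", "COMMERCIAL", "RESIDENTIAL", "COMMERCIAL", "INDUSTRIAL"],
})

# Split each Prop_Usage on "-" or "/" and store the trimmed parts as a set.
usage_sets = df["Prop_Usage"].apply(
    lambda s: {part.strip() for part in re.split(r"[-/]", s)}
)

# Assumed condition: the unit's usage must belong to its property's set.
mask = [unit in allowed for unit, allowed in zip(df["Unit_Usage"], usage_sets)]
matching = df[mask]
print(matching)
```

If the intended condition is different (for example, dropping the matching rows instead), the same sets can be reused and only the mask needs to change.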

How to calculate time difference between two pandas column [duplicate]

Question: This question already has answers here: Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes (3 answers). Closed 2 years ago.

My df looks like:

                 start                 stop
0  2015-11-04 10:12:00  2015-11-06 06:38:00
1  2015-11-04 10:23:00  2015-11-05 08:30:00
2  2015-11-04 14:01:00  2015-11-17 10:34:00
4  2015-11-19 01:43:00  2015-12-21 09:04:00

print(time_df.dtypes)
start    datetime64[ns]
stop     datetime64[ns]
dtype: object

I am trying to find the time difference between stop and start.
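Since both columns are already datetime64[ns], subtracting them yields a timedelta column directly. A minimal sketch using the rows above; the column names elapsed and hours are illustrative additions, not from the question.

```python
import pandas as pd

# Rebuild the sample frame, including its non-contiguous index (0, 1, 2, 4).
time_df = pd.DataFrame({
    "start": pd.to_datetime([
        "2015-11-04 10:12:00", "2015-11-04 10:23:00",
        "2015-11-04 14:01:00", "2015-11-19 01:43:00",
    ]),
    "stop": pd.to_datetime([
        "2015-11-06 06:38:00", "2015-11-05 08:30:00",
        "2015-11-17 10:34:00", "2015-12-21 09:04:00",
    ]),
})
time_df.index = [0, 1, 2, 4]

# Datetime minus datetime gives a timedelta64[ns] column.
time_df["elapsed"] = time_df["stop"] - time_df["start"]

# Convert to fractional hours via the total number of seconds.
time_df["hours"] = time_df["elapsed"].dt.total_seconds() / 3600
print(time_df)
```

Using total_seconds() rather than the .dt.seconds attribute matters here, because .dt.seconds ignores the whole-day part of each difference.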

Combine data frames from a vector of names

Question: I have an issue that I thought would be easy to solve, but I have not managed to find a solution. I have a large number of data frames that I want to bind by rows. To avoid listing the names of all the data frames, I used "paste0" to quickly create a vector of their names. The problem is that I cannot get the rbind function to identify the data frames from this vector of names. More explicitly:

df1 <- data.frame(x1 = sample(1:5,5), x2 = sample(1:5,5))
df2 <- data.frame(x1 = sample(1:5,5

Python DataFrame - plot a bar chart for data frame with grouped-by columns (at least two columns)

Question: I've been struggling to recreate this Excel graph in Python using matplotlib. The data is in a dataframe, and I'm trying to automate the process of generating this graph. I've tried unstacking my dataframe and using subplots, but I haven't managed to create the "Zone" index, which is so elegant in Excel. I have managed to plot the graph without this "Zone" index, but that's not really what I want. Here is my code:

data = pd.DataFrame( { 'Factory Zone': ["AMERICAS","APAC","APAC","APAC