dataframe | 易学教程

Filtering multiple conditions from a Dataframe in Python

Question: I want to filter data from a DataFrame using multiple conditions across multiple columns. I tried doing it like this:

    arrival_delayed_weather = [[[flight_data_finalcopy["ArrDelay"] > 0]] & [[flight_data_finalcopy["WeatherDelay"]>0]]]
    arrival_delayed_weather_filter = arrival_delayed_weather[["UniqueCarrier", "AirlineID"]]
    print(arrival_delayed_weather_filter)

However, I get this error message:

    TypeError: unsupported operand type(s) for &: 'list' and 'list'

How do I solve this? Thanks in…
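The error comes from the square brackets: `[[...]]` wraps each boolean Series in nested Python lists, and `&` is not defined between lists. A minimal sketch of the usual fix, assuming a small hypothetical DataFrame that reuses the column names from the question (the values are made up), combines two boolean Series with `&`, wrapping each comparison in parentheses, and then selects rows and columns in one `.loc` call:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
flight_data_finalcopy = pd.DataFrame({
    "UniqueCarrier": ["AA", "BB", "CC", "DD"],
    "AirlineID": [1, 2, 3, 4],
    "ArrDelay": [15, -3, 42, 7],
    "WeatherDelay": [10, 0, 0, 5],
})

# Each comparison yields a boolean Series. Parentheses are required because
# & binds more tightly than > in Python.
mask = (flight_data_finalcopy["ArrDelay"] > 0) & (flight_data_finalcopy["WeatherDelay"] > 0)

# Keep only the matching rows and the two columns of interest
arrival_delayed_weather_filter = flight_data_finalcopy.loc[mask, ["UniqueCarrier", "AirlineID"]]
print(arrival_delayed_weather_filter)
```

Use `|` for "or" and `~` for "not" in the same way; Python's `and`/`or` keywords do not work element-wise on a Series.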

Using Pandas to sample DataFrame using a specific column's weight

Question: I have a DataFrame that looks like this:

    index  name   city
    0      Yam    Hadera
    1      Meow   Hadera
    2      Don    Hadera
    3      Jazz   Hadera
    4      Bond   Tel Aviv
    5      James  Tel Aviv

I want pandas to choose rows at random, weighted by the number of appearances of each value in the city column (something like df.city.value_counts()), so that the result of my magic function, say:

    df.magic_sample(3, weight_column='city')

might look like:

    0  Yam   Hadera
    1  Meow  Hadera
    2  Bond  Tel Aviv

Thanks! :)

Answer 1: You can group by city and then sample each group based on…
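As an alternative to the group-based answer, `DataFrame.sample` accepts a `weights` argument directly. The sketch below assumes "weight" means each row's weight equals the count of its city (so each Hadera row is twice as likely as each Tel Aviv row); `weighted_sample` is a hypothetical helper standing in for the question's `magic_sample`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera", "Hadera", "Hadera", "Hadera", "Tel Aviv", "Tel Aviv"],
})

def weighted_sample(frame, n, weight_column, random_state=None):
    """Hypothetical helper: sample n rows, weighting each row by how often
    its value in weight_column occurs."""
    # Map every row's value to its frequency, e.g. Hadera -> 4, Tel Aviv -> 2
    counts = frame[weight_column].map(frame[weight_column].value_counts())
    # sample() normalizes the weights itself, so raw counts can be passed
    return frame.sample(n=n, weights=counts, random_state=random_state)

print(weighted_sample(df, 3, "city", random_state=0))
```

Sampling is without replacement by default; pass `replace=True` to `sample` if the same row may be drawn more than once.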

multithreading for data from dataframe pandas

Question: I'm struggling to use multithreading to calculate the relatedness between customers who have different shopping items in their baskets. My pandas DataFrame contains 1,000 customers, which means I have to calculate the relatedness 1 million times, and this takes too long to process. An example of the DataFrame looks like this:

    ID  Item
    1   Banana
    1   Apple
    2   Orange
    2   Banana
    2   Tomato
    3   Apple
    3   Tomato
    3   Orange

Here is the simplified version of the code:

    import pandas as pd
    def…
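The question's relatedness function is cut off, so the sketch below assumes Jaccard similarity (shared items divided by combined distinct items) purely for illustration. It also swaps multithreading for vectorization: in CPython the GIL generally keeps threads from speeding up CPU-bound pure-Python loops, whereas one matrix product computes every pairwise overlap at once:

```python
import pandas as pd

data = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Item": ["Banana", "Apple", "Orange", "Banana", "Tomato", "Apple", "Tomato", "Orange"],
})

# Binary customer x item matrix: 1 if the customer bought the item
basket = (pd.crosstab(data["ID"], data["Item"]) > 0).astype(int)
m = basket.to_numpy()

# intersection[i, j] = number of items customers i and j have in common
intersection = m @ m.T
# Number of distinct items per customer
sizes = m.sum(axis=1)
# |A union B| = |A| + |B| - |A intersect B|, broadcast over all pairs
union = sizes[:, None] + sizes[None, :] - intersection

# Assumed relatedness measure: Jaccard similarity for every pair of customers
jaccard = pd.DataFrame(intersection / union, index=basket.index, columns=basket.index)
print(jaccard.round(3))
```

For 1,000 customers this is a single 1,000 x 1,000 result, which fits comfortably in memory. If the real relatedness function cannot be vectorized, `multiprocessing` (rather than threads) is the usual way to spread CPU-bound Python work across cores.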

add a different random number to every cell in a pandas dataframe

Question: I need to add some 'noise' to my data, so I would like to add a different random number to every cell in my pandas DataFrame. This code works, but seems unpythonic. Is there a better way?

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(0.0, index=[1,2,3,4,5], columns=list('ABC'))
    print(df)
    for x,line in df.iterrows():
        for col in df:
            line[col] = line[col] + (np.random.rand()-0.5)/1000.0
    print(df)

Answer 1:

    df + np.random.rand(*df.shape) / 10000.0

OR, let's use applymap:

    df = pd.DataFrame(1.0…
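The answer's one-liner adds only non-negative noise, because `np.random.rand` draws from [0, 1). A minimal sketch that keeps the question's symmetric noise, (rand - 0.5) / 1000, while staying vectorized is shown below; it assumes NumPy's newer `Generator` API is available. It also sidesteps a pitfall of the loop version: pandas documents that rows yielded by `iterrows` may be copies, so writing to them is not guaranteed to change `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[1, 2, 3, 4, 5], columns=list("ABC"))

rng = np.random.default_rng(seed=0)
# One independent draw per cell, shifted to [-0.5, 0.5) and scaled,
# mirroring (np.random.rand() - 0.5) / 1000.0 from the question
noise = (rng.random(df.shape) - 0.5) / 1000.0

# Element-wise addition of an array with the same shape keeps df's index and columns
noisy = df + noise
print(noisy)
```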

Julia Dataframes vs Python pandas

Question: I am currently using Python pandas and want to know whether there is a way to output data from pandas into Julia DataFrames and vice versa. (I think you can call Python from Julia with PyCall, but I am not sure whether it works with DataFrames.) Is there a way to call Julia from Python and have it take in pandas DataFrames (without saving to another file format like CSV)? When would it be advantageous to use Julia DataFrames rather than pandas, other than for extremely large datasets and running things with…

Accessing a Pandas index like a regular column

Question: I have a pandas DataFrame with a named index. I want to pass it to a piece of code that takes a DataFrame, a column name, and some other arguments, and does a bunch of work involving that column. In this case, though, the column I want to highlight is the index, and passing the index's label to this code doesn't work, because you can't extract an index the way you can a regular column. For example, I can construct a DataFrame like this:

    import pandas as pd, numpy as np
    df=pd.DataFrame({'name…
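Two common workarounds are sketched below, assuming a hypothetical frame with an index named `name` (the question's own example is cut off). The simplest is to call `reset_index()` before handing the frame to the existing code, which turns the index into an ordinary column. The other is a small hypothetical helper, `get_column_or_index`, that looks a label up in the columns first and then in the index level names:

```python
import pandas as pd

# Hypothetical frame with a named index, standing in for the one in the question
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]}).set_index("name")

def get_column_or_index(frame, label):
    """Return a Series for `label`, whether it is a column or a named index level."""
    if label in frame.columns:
        return frame[label]
    if label in frame.index.names:
        # get_level_values works for both a plain Index and a MultiIndex;
        # to_series turns the values into a Series aligned with the frame's rows
        return frame.index.get_level_values(label).to_series(index=frame.index)
    raise KeyError(label)

print(get_column_or_index(df, "name"))

# Alternative that needs no helper: move the index into an ordinary column first
flat = df.reset_index()
print(flat["name"])
```

The `reset_index()` route needs no changes to the downstream code, while the helper avoids copying the frame when the code can be edited.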

combine two looping structures to obtain a matrix output

Question: I'm using two closely related formulas in R. I was wondering whether it is possible to combine B1 and B2 to get the desired matrix output shown below.

    z <- "group y1 y2
    1 1 2 3
    2 1 3 4
    3 1 5 4
    4 1 2 5
    5 2 4 8
    6 2 5 6
    7 2 6 7
    8 3 7 6
    9 3 8 7
    10 3 10 8
    11 3 9 5
    12 3 7 6"
    dat <- read.table(text = z, header = T)
    (B1 = Reduce("+", group_split(dat, group, .keep = FALSE) %>% map(~ nrow(.)*(colMeans(.)-colMeans(dat[-1]))^2)))
    #       y1       y2
    # 61.86667 19.05000
    (B2 = Reduce("+",group_split(dat, group, .keep =…

Extract rows with highest and lowest values from a data frame

Question: I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation. I need to write a function that removes a number of rows (2, 5 or 10) with the highest and lowest values in a specified column from a data frame, puts them into another data frame, and does this for each combination of two factors (in my case, for each day and server). Up to this point, I have done the following steps (MWE using the esoph example dataset). I…