dataframe

Filtering multiple conditions from a Dataframe in Python

梦想的初衷 submitted on 2021-02-18 12:09:35
Question: I want to filter a DataFrame on multiple conditions that span multiple columns. I tried this:

    arrival_delayed_weather = [[[flight_data_finalcopy["ArrDelay"] > 0]] & [[flight_data_finalcopy["WeatherDelay"]>0]]]
    arrival_delayed_weather_filter = arrival_delayed_weather[["UniqueCarrier", "AirlineID"]]
    print arrival_delayed_weather_filter

However, I get this error message:

    TypeError: unsupported operand type(s) for &: 'list' and 'list'

How do I solve this? Thanks in
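The TypeError in the question comes from wrapping each comparison in list brackets, so `&` ends up applied to two plain Python lists instead of two boolean Series. A minimal sketch of one way to express the filter follows. The small DataFrame is a made-up stand-in for `flight_data_finalcopy` (its values are assumptions for illustration only); the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for flight_data_finalcopy; the values are invented.
flight_data_finalcopy = pd.DataFrame({
    "UniqueCarrier": ["AA", "UA", "DL", "WN"],
    "AirlineID": [19805, 19977, 19790, 19393],
    "ArrDelay": [12, -3, 45, 7],
    "WeatherDelay": [5, 0, 0, 2],
})

# Each comparison yields a boolean Series. The parentheses are required
# because & binds more tightly than >.
mask = (flight_data_finalcopy["ArrDelay"] > 0) & (flight_data_finalcopy["WeatherDelay"] > 0)

# .loc selects the matching rows and the two columns of interest in one step.
arrival_delayed_weather_filter = flight_data_finalcopy.loc[mask, ["UniqueCarrier", "AirlineID"]]
print(arrival_delayed_weather_filter)
```

Note that `print` is called as a function here, as Python 3 requires; the question's code uses the Python 2 statement form.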

Filtering multiple conditions from a Dataframe in Python

自闭症网瘾萝莉.ら submitted on 2021-02-18 12:08:48
Question: I want to filter a DataFrame on multiple conditions that span multiple columns. I tried this:

    arrival_delayed_weather = [[[flight_data_finalcopy["ArrDelay"] > 0]] & [[flight_data_finalcopy["WeatherDelay"]>0]]]
    arrival_delayed_weather_filter = arrival_delayed_weather[["UniqueCarrier", "AirlineID"]]
    print arrival_delayed_weather_filter

However, I get this error message:

    TypeError: unsupported operand type(s) for &: 'list' and 'list'

How do I solve this? Thanks in

Using Pandas to sample DataFrame using a specific column's weight

夙愿已清 submitted on 2021-02-18 11:43:27
Question: I have a DataFrame which looks like this:

    index  name   city
    0      Yam    Hadera
    1      Meow   Hadera
    2      Don    Hadera
    3      Jazz   Hadera
    4      Bond   Tel Aviv
    5      James  Tel Aviv

I want Pandas to randomly choose rows, weighted by the number of appearances in the city column (something like using df.city.value_counts()), so that the result of my magic function, say

    df.magic_sample(3, weight_column='city')

might look like:

    0  Yam   Hadera
    1  Meow  Hadera
    2  Bond  Tel Aviv

Thanks! :)

Answer 1: You can group by city and then sample each group based on
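One possible reading of the request is that each row should be drawn with probability proportional to how often its city appears. The sketch below implements that reading with `DataFrame.sample` and its `weights` parameter. It is not the group-based approach the truncated answer starts to describe, and `magic_sample` is a hypothetical helper named after the question's wording, not a pandas method.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Yam", "Meow", "Don", "Jazz", "Bond", "James"],
    "city": ["Hadera", "Hadera", "Hadera", "Hadera", "Tel Aviv", "Tel Aviv"],
})

def magic_sample(frame, n, weight_column, random_state=None):
    """Sample n rows without replacement, weighting each row by how many
    times its value in weight_column occurs in the frame (hypothetical helper)."""
    counts = frame[weight_column].value_counts()
    # Map every row to the count of its own value, e.g. 4 for Hadera rows.
    weights = frame[weight_column].map(counts)
    # DataFrame.sample normalizes the weights so they sum to 1.
    return frame.sample(n=n, weights=weights, random_state=random_state)

sampled = magic_sample(df, 3, weight_column="city", random_state=0)
print(sampled)
```

Under this weighting a Hadera row is twice as likely to be picked as a Tel Aviv row on the first draw; if the goal is instead a fixed share of rows per city, a per-group sample would be the better fit.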

multithreading for data from dataframe pandas

守給你的承諾、 submitted on 2021-02-18 11:11:50
Question: I'm struggling to use multithreading to calculate the relatedness between customers who have different shopping items in their baskets. I have a pandas DataFrame covering 1,000 customers, which means I have to calculate the relatedness 1 million times, and this takes too long to process. An example of the DataFrame looks like this:

    ID  Item
    1   Banana
    1   Apple
    2   Orange
    2   Banana
    2   Tomato
    3   Apple
    3   Tomato
    3   Orange

Here is the simplified version of the code: import pandas as pd def
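The question's relatedness function is cut off, so the sketch below assumes Jaccard similarity (shared items divided by combined items) purely for illustration. It shows how all customer pairs can be scored in one matrix product rather than in a loop over pairs, which may make threads unnecessary for 1,000 customers. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

baskets = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2, 3, 3, 3],
    "Item": ["Banana", "Apple", "Orange", "Banana", "Tomato",
             "Apple", "Tomato", "Orange"],
})

# One row per customer, one column per item, 1 where the customer has it.
membership = (pd.crosstab(baskets["ID"], baskets["Item"]) > 0).astype(int)
m = membership.to_numpy()

# shared[i, j] = number of items customers i and j have in common.
shared = m @ m.T
# sizes[i] = number of distinct items in customer i's basket.
sizes = m.sum(axis=1)
# |A union B| = |A| + |B| - |A intersect B|, computed for every pair at once.
union = sizes[:, None] + sizes[None, :] - shared

# Assumed metric: Jaccard similarity. Every customer in the crosstab has at
# least one item, so the union is never zero.
jaccard = pd.DataFrame(shared / union,
                       index=membership.index, columns=membership.index)
print(jaccard)
```

If the real metric cannot be written as matrix operations, process-based parallelism is usually a better fit than threads for CPU-bound Python code.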

add a different random number to every cell in a pandas dataframe

自古美人都是妖i submitted on 2021-02-18 10:38:07
Question: I need to add some 'noise' to my data, so I would like to add a different random number to every cell in my pandas DataFrame. This code works, but seems unpythonic. Is there a better way?

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(0.0, index=[1,2,3,4,5], columns=list('ABC') )
    print df
    for x,line in df.iterrows():
        for col in df:
            line[col] = line[col] + (np.random.rand()-0.5)/1000.0
    print df

Answer 1:

    df + np.random.rand(*df.shape) / 10000.0

OR let's use applymap: df = pd.DataFrame(1.0
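As a sketch of the vectorized idea, the block below draws one array of random numbers with the same shape as the DataFrame and adds it in a single operation. It keeps the question's noise range of (rand - 0.5) / 1000 rather than the answer's scaling, and it uses NumPy's `default_rng` generator with a fixed seed, which is an assumption made here for reproducibility.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[1, 2, 3, 4, 5], columns=list("ABC"))

# Seeded generator so that repeated runs give the same noise.
rng = np.random.default_rng(seed=0)

# One independent draw per cell, uniform in [-0.0005, 0.0005).
noise = (rng.random(df.shape) - 0.5) / 1000.0

# Adding an array of matching shape applies the noise cell by cell.
noisy = df + noise
print(noisy)
```

The original `df` is left unchanged; assign the result back to `df` if in-place behavior is wanted.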

Julia Dataframes vs Python pandas

你离开我真会死。 submitted on 2021-02-18 09:55:02
Question: I am currently using Python pandas and want to know if there is a way to output the data from pandas into Julia DataFrames and vice versa. (I think you can call Python from Julia with PyCall, but I am not sure if it works with DataFrames.) Is there a way to call Julia from Python and have it take in pandas DataFrames (without saving to another file format like CSV)? When would it be advantageous to use Julia DataFrames rather than pandas, other than for extremely large datasets and running things with

Accessing a Pandas index like a regular column

不羁岁月 submitted on 2021-02-18 09:54:31
Question: I have a Pandas DataFrame with a named index. I want to pass it off to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. Only in this case the column I want to highlight is the index, but giving the index's label to this piece of code doesn't work because you can't extract an index like you can a regular column. For example, I can construct a DataFrame like this: import pandas as pd, numpy as np df=pd.DataFrame({'name
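One way to let downstream code treat a named index like a column is a small lookup helper that checks the columns first and then falls back to the index levels. The sketch below is an assumption-laden illustration: `get_values` is a hypothetical helper, and the example DataFrame is invented because the question's own construction is cut off.

```python
import pandas as pd

def get_values(frame, name):
    """Return the data labelled `name`, whether it is a regular column or a
    named index level (hypothetical helper, not part of pandas)."""
    if name in frame.columns:
        return frame[name]
    if name in frame.index.names:
        # get_level_values works for both a plain Index and a MultiIndex;
        # to_series keeps the result aligned with the frame's rows.
        return frame.index.get_level_values(name).to_series(index=frame.index)
    raise KeyError(name)

# Invented example data with a named index.
df = pd.DataFrame({"value": [10, 20, 30]},
                  index=pd.Index(["a", "b", "c"], name="key"))

print(get_values(df, "key"))
print(get_values(df, "value"))
```

When the receiving code cannot be changed, passing `df.reset_index()` instead turns the named index into an ordinary column with the same label.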

Accessing a Pandas index like a regular column

半城伤御伤魂 submitted on 2021-02-18 09:54:28
Question: I have a Pandas DataFrame with a named index. I want to pass it off to a piece of code that takes a DataFrame, a column name, and some other stuff, and does a bunch of work involving that column. Only in this case the column I want to highlight is the index, but giving the index's label to this piece of code doesn't work because you can't extract an index like you can a regular column. For example, I can construct a DataFrame like this: import pandas as pd, numpy as np df=pd.DataFrame({'name

combine two looping structures to obtain a matrix output

怎甘沉沦 submitted on 2021-02-18 08:38:49
Question: I'm using two closely related formulas in R. I was wondering if it might be possible to combine B1 and B2 to get my desired matrix output shown below?

    z <- "group y1 y2
    1 1 2 3
    2 1 3 4
    3 1 5 4
    4 1 2 5
    5 2 4 8
    6 2 5 6
    7 2 6 7
    8 3 7 6
    9 3 8 7
    10 3 10 8
    11 3 9 5
    12 3 7 6"
    dat <- read.table(text = z, header = T)
    (B1 = Reduce("+", group_split(dat, group, .keep = FALSE) %>% map(~ nrow(.)*(colMeans(.)-colMeans(dat[-1]))^2)))
    #      y1       y2
    #61.86667 19.05000

(B2 = Reduce("+",group_split(dat, group, .keep =

Extract rows with highest and lowest values from a data frame

ぐ巨炮叔叔 submitted on 2021-02-18 07:01:47
Question: I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation. I need to write a function that removes some number (2, 5 or 10) of rows with the highest and lowest values in a specified column from a data frame and puts them into another data frame, and does this for each combination of two factors (in my case: for each day and server). Up to this point, I have done the following steps (MWE using the esoph example dataset). I