pandas

Pandas Flatten a list of list within a column?

痞子三分冷 posted on 2021-02-19 03:46:49

Question: I am trying to flatten a column which is a list of lists:

               var                           var2
    0    9122532.0  [[458182615.0], [79834910.0]]
    1   79834910.0   [[458182615.0], [9122532.0]]
    2  458182615.0    [[79834910.0], [9122532.0]]

I want:

               var                       var2
    0    9122532.0  [458182615.0, 79834910.0]
    1   79834910.0   [458182615.0, 9122532.0]
    2  458182615.0    [79834910.0, 9122532.0]

Applying

    sample8['var2'] = sample8['var2'].apply(chain.from_iterable).apply(list)

gives me:

            var1                                               var2
    0  9122532.0  [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 7, ...
    1
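The character-by-character output suggests that the inner elements may be strings such as '[458182615.0]' rather than real lists, although the excerpt does not confirm this. The following is a minimal sketch under that assumption. The helper name flatten_cell is made up for illustration, and the sample frame is rebuilt from the values shown in the question.

```python
import ast
from itertools import chain

import pandas as pd

# Sample data rebuilt from the question; here the inner items are real lists.
sample8 = pd.DataFrame({
    "var": [9122532.0, 79834910.0, 458182615.0],
    "var2": [
        [[458182615.0], [79834910.0]],
        [[458182615.0], [9122532.0]],
        [[79834910.0], [9122532.0]],
    ],
})


def flatten_cell(cell):
    """Flatten one cell, parsing any element that is still a string."""
    # Assumption: a cell (or an inner item) may be the text form of a list.
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)
    parsed = [ast.literal_eval(item) if isinstance(item, str) else item
              for item in cell]
    return list(chain.from_iterable(parsed))


sample8["var2"] = sample8["var2"].apply(flatten_cell)
print(sample8)
```

Parsing with ast.literal_eval before chaining avoids iterating over the characters of a string, which is what produces output like [, 4, 5, 8, ...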

Python: How to update a value in Google BigQuery in less than 40 seconds?

穿精又带淫゛_ posted on 2021-02-19 03:43:07

Question: I have a table in Google BigQuery that I access and modify in Python using the pandas functions read_gbq and to_gbq. The problem is that appending 100,000 rows takes about 150 seconds, while appending 1 row takes about 40 seconds. I would like to update a value in the table rather than append a row. Is there a way to update a value in the table from Python that is very fast, or at least faster than 40 seconds?

Answer 1: Not sure if you can do so using pandas, but you certainly can using the google-cloud library.

How to pivot pandas DataFrame column to create binary “value table”?

狂风中的少年 posted on 2021-02-19 03:37:46

Question: I have the following pandas dataframe:

    import pandas as pd
    df = pd.read_csv("filename.csv")
    df

       A         B         C         D         E
    0  a  0.469112 -0.282863 -1.509059       cat
    1  c -1.135632  1.212112 -0.173215       dog
    2  e  0.119209 -1.044236 -0.861849       dog
    3  f -2.104569 -0.494929  1.071804      bird
    4  g -2.224569 -0.724929  2.234213  elephant
    ...

I would like to create more columns based on the categorical values in column E, so that the dataframe looks like this:

    df

       A         B         C         D  cat  dog  bird  elephant  ....
    0  a  0.469112 -0.282863 -1
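One common way to build such indicator columns is pd.get_dummies. The sketch below is a suggestion, not the answer from the original thread. It rebuilds the five rows shown in the question in place of the CSV file, and it reorders the new columns by first appearance, which is an assumption about the desired layout.

```python
import pandas as pd

# The five rows shown in the question, typed in place of filename.csv.
df = pd.DataFrame({
    "A": ["a", "c", "e", "f", "g"],
    "B": [0.469112, -1.135632, 0.119209, -2.104569, -2.224569],
    "C": [-0.282863, 1.212112, -1.044236, -0.494929, -0.724929],
    "D": [-1.509059, -0.173215, -0.861849, 1.071804, 2.234213],
    "E": ["cat", "dog", "dog", "bird", "elephant"],
})

# One 0/1 column per distinct value in E.
dummies = pd.get_dummies(df["E"]).astype(int)

# get_dummies sorts the new columns; restore order of first appearance.
order = list(df["E"].unique())
dummies = dummies[order]

# Replace E with the indicator columns.
result = pd.concat([df.drop(columns="E"), dummies], axis=1)
print(result)
```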

Assigning a scalar value to an empty DataFrame doesn't appear to do anything

大憨熊 posted on 2021-02-19 03:24:58

Question: I'm new to pandas and have a very basic question, please! On Python v3.6 through Spyder:

    x = pd.DataFrame(columns=['1', '2'])
    print(x)
    x['1'] = '25'
    print(x)

From the print statements, the dataframe x does not appear to change. My question: what does x['1'] = '25' do, if anything?

Answer 1: There is actually a difference between the semantics of assigning scalars and assigning iterables (containers such as lists and other list-like objects). Consider,

    df = pd.DataFrame(columns=['1', '2'])
    df

    Empty DataFrame
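The scalar-versus-iterable distinction can be checked directly. The sketch below is not taken from the truncated answer. It assumes the behaviour of recent pandas versions, where a scalar is broadcast over the existing rows (none here) while a non-empty list gives an empty frame its rows.

```python
import pandas as pd

# Scalar assignment: broadcast over the existing index, which has no rows.
x = pd.DataFrame(columns=["1", "2"])
x["1"] = "25"
rows_after_scalar = len(x)

# List assignment: a non-empty list on an empty frame creates the rows.
y = pd.DataFrame(columns=["1", "2"])
y["1"] = ["25"]
rows_after_list = len(y)

print(rows_after_scalar, rows_after_list)
print(y)
```

So x['1'] = '25' is a valid assignment, but with zero rows there is nothing for the scalar to fill.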

Reverse DataFrame Column, But Maintain the Index

依然范特西╮ posted on 2021-02-19 02:56:07

Question: Consider the following:

    In [214]: df = pd.DataFrame(index=range(4,8), data=[33,22,11,00])

    In [215]: df
    Out[215]:
        0
    4  33
    5  22
    6  11
    7   0

I'd like to reverse the order of the first column but keep the index in its current order, so df will look like:

    4   0
    5  11
    6  22
    7  33

I can't seem to find the right reset_index, reindex, etc. to make this happen.

Answer 1: Use iloc and slice appropriately:

    df.iloc[::-1]

        0
    7   0
    6  11
    5  22
    4  33

To preserve the index, assign the reversed values back through iloc:

    df.iloc[:] = df.iloc[::-1].values
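The reason for taking the raw values is index alignment: assigning a reversed Series back to its own column realigns it on the labels, so nothing changes. A short sketch of both cases follows. The variable names aligned and reversed_df are illustrative, and to_numpy() is used here in place of .values.

```python
import pandas as pd

df = pd.DataFrame(index=range(4, 8), data=[33, 22, 11, 0])

# Assigning a reversed Series: pandas realigns on the index labels,
# so every value lands back on its original row.
aligned = df.copy()
aligned[0] = aligned[0].iloc[::-1]

# Assigning a plain array: no labels to align, so positions are used.
reversed_df = df.copy()
reversed_df[0] = reversed_df[0].to_numpy()[::-1]

print(aligned)
print(reversed_df)
```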

Forward fill column with an index-based limit

倾然丶 夕夏残阳落幕 posted on 2021-02-19 02:55:07

Question: I want to forward fill a column and I want to specify a limit, but I want the limit to be based on the index, not on a simple number of rows as the limit argument allows. For example, say I have the dataframe given by:

    df = pd.DataFrame({
        'data': [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
        'group': [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
    })

which looks like

    In [27]: df
    Out[27]:
       data  group
    0   0.0      0
    1   1.0      0
    2   NaN      0
    3   3.0      1
    4   NaN      1
    5   5.0      0
    6   NaN      0
    7   NaN      0
    8   NaN      1
    9   NaN      1

If I group by the
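The question is cut off, so the exact requirement is unknown. One possible reading is that a value should be carried forward only while the index distance from the last valid observation stays within a bound, evaluated separately per group. The sketch below implements that reading only. The helper ffill_index_limit and the bound max_gap=2 are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "data": [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
    "group": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
})


def ffill_index_limit(s, max_gap):
    """Forward fill s, keeping fills at most max_gap index units old."""
    idx = s.index.to_series()
    # Index label of the most recent non-missing observation.
    last_valid = idx.where(s.notna()).ffill()
    gap = idx - last_valid
    return s.ffill().where(gap <= max_gap)


# Hypothetical bound: fill at most 2 index units past the last valid row.
filled = df.groupby("group")["data"].transform(
    lambda s: ffill_index_limit(s, max_gap=2)
)
print(filled)
```

With this data, rows 8 and 9 stay missing because the last valid value in group 1 sits at index 3, more than two index units away, even though only one row of that group lies in between.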

how to get a continuous rolling mean in pandas?

一曲冷凌霜 posted on 2021-02-19 02:46:39

Question: I'm looking to get a continuous rolling mean of a dataframe. df looks like this:

    index  price
    0      4
    1      6
    2      10
    3      12

The goal is a running mean over all the prices seen so far:

    index  price  mean
    0      4      4
    1      6      5
    2      10     6.67
    3      12     8

Thank you in advance!

Answer 1: You can use expanding:

    df['mean'] = df.price.expanding().mean()
    df

    index  price  mean
    0      4      4.000000
    1      6      5.000000
    2      10     6.666667
    3      12     8.000000

Answer 2: Welcome to SO: Hopefully people will soon remember you from
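As a cross-check of the expanding approach, the same running mean can be written as a cumulative sum divided by the number of rows seen so far. This is a small sketch using the four prices from the question, and the manual variable is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [4, 6, 10, 12]})

# Expanding window: each row averages every price up to and including it.
df["mean"] = df["price"].expanding().mean()

# Equivalent by hand: running total divided by running count.
manual = df["price"].cumsum() / np.arange(1, len(df) + 1)

print(df)
```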

Convert Bigquery results to Pandas Data Frame

末鹿安然 posted on 2021-02-19 02:38:02

Question: Below is the code to convert BigQuery results into a pandas data frame. I'm learning Python and pandas and wonder if I can get suggestions or ideas for improving the code.

    #...code to run query, that returns 3 columns: 'date' DATE, 'currency' STRING,'rate' FLOAT...
    rows, total_count, token = query.fetch_data()
    currency = []
    rate = []
    dates = []
    for row in rows:
        dates.append(row[0])
        currency.append(row[1])
        rate.append(row[2])
    dict = { 'currency' : currency, 'date' : dates, 'rate' :
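If rows is an iterable of three-item tuples, as the loop in the question implies, the three helper lists are not needed, because a DataFrame can be built from the rows directly. The sketch below uses placeholder rows invented for illustration in place of the real query result, and the column order follows the query description. It also avoids naming a variable dict, which shadows the built-in.

```python
import datetime as dt

import pandas as pd

# Placeholder rows standing in for the query result (made-up values).
rows = [
    (dt.date(2021, 1, 1), "USD", 1.0),
    (dt.date(2021, 1, 1), "EUR", 0.82),
]

# Build the frame in one step; column names follow the query's column order.
df = pd.DataFrame.from_records(rows, columns=["date", "currency", "rate"])
print(df)
```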

resample a start & end employee holiday table correctly

女生的网名这么多〃 posted on 2021-02-19 02:26:21

Question: I have the following dataframe.

    df = pd.DataFrame(
        {'name' : ['Khan','Khan','Khan','Dean','Dean','Dean'],
         'start_date' : ['01-01-2020','04-02-2020','02-03-2020','09-04-2020','06-08-2020','12-12-2020'],
         'end_date' : ['03-01-2020', '09-02-2020','02-03-2020','15-05-2020','19-08-2020','31-12-2020'],
         'holiday_type' : ['holiday','holiday','sick leave','holiday','holiday','sick leave']
        }
    )
    df[['start_date','end_date']] = df[['start_date','end_date']].apply(pd.to_datetime,format='%d-%m-%Y')
    print(df)
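The excerpt ends before the desired output is stated, so the goal can only be guessed. A frequent first step with start and end tables is to expand each interval into one row per calendar day, after which ordinary grouping or resampling applies. The sketch below shows that step only, as an assumption about where the question is heading. The day column and the daily frame are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Khan", "Khan", "Khan", "Dean", "Dean", "Dean"],
    "start_date": ["01-01-2020", "04-02-2020", "02-03-2020",
                   "09-04-2020", "06-08-2020", "12-12-2020"],
    "end_date": ["03-01-2020", "09-02-2020", "02-03-2020",
                 "15-05-2020", "19-08-2020", "31-12-2020"],
    "holiday_type": ["holiday", "holiday", "sick leave",
                     "holiday", "holiday", "sick leave"],
})
df[["start_date", "end_date"]] = df[["start_date", "end_date"]].apply(
    pd.to_datetime, format="%d-%m-%Y"
)

# One list of days per row, both endpoints included.
df["day"] = [
    pd.date_range(start, end, freq="D").tolist()
    for start, end in zip(df["start_date"], df["end_date"])
]

# One row per person per day off.
daily = df.explode("day").reset_index(drop=True)
daily["day"] = pd.to_datetime(daily["day"])

# Example aggregation: days off per person.
days_per_person = daily.groupby("name").size()
print(days_per_person)
```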

resample a start & end employee holiday table correctly

梦想与她 posted on 2021-02-19 02:26:09

Question: I have the following dataframe.

    df = pd.DataFrame(
        {'name' : ['Khan','Khan','Khan','Dean','Dean','Dean'],
         'start_date' : ['01-01-2020','04-02-2020','02-03-2020','09-04-2020','06-08-2020','12-12-2020'],
         'end_date' : ['03-01-2020', '09-02-2020','02-03-2020','15-05-2020','19-08-2020','31-12-2020'],
         'holiday_type' : ['holiday','holiday','sick leave','holiday','holiday','sick leave']
        }
    )
    df[['start_date','end_date']] = df[['start_date','end_date']].apply(pd.to_datetime,format='%d-%m-%Y')
    print(df)