pandas | 易学教程

Pandas: flatten a list of lists within a column?

Question: I am trying to flatten a column which is a list of lists:

           var                           var2
0    9122532.0  [[458182615.0], [79834910.0]]
1   79834910.0   [[458182615.0], [9122532.0]]
2  458182615.0    [[79834910.0], [9122532.0]]

I want:

           var                       var2
0    9122532.0  [458182615.0, 79834910.0]
1   79834910.0   [458182615.0, 9122532.0]
2  458182615.0    [79834910.0, 9122532.0]

Applying

sample8['var2'] = sample8['var2'].apply(chain.from_iterable).apply(list)

gives me:

        var1  var2
0  9122532.0  [[, 4, 5, 8, 1, 8, 2, 6, 1, 5, ., 0, ], [, 7, ...
1  ...
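A minimal sketch, not the accepted answer. The character-by-character output above is what chain.from_iterable produces when the inner items are strings such as '[458182615.0]' rather than real lists, so the hypothetical helper flatten_cell parses string items (or a whole string cell) with ast.literal_eval before chaining. The sample values are copied from the question; wrapping the iterator in list() inside one apply flattens exactly one level.

```python
import ast
from itertools import chain

import pandas as pd


def flatten_cell(cell):
    """Hypothetical helper: flatten one level, parsing list-looking strings first."""
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)  # the whole cell was stored as text
    # Inner items like '[458182615.0]' are strings; turn them into real lists.
    items = [ast.literal_eval(x) if isinstance(x, str) else x for x in cell]
    return list(chain.from_iterable(items))


sample8 = pd.DataFrame({
    'var': [9122532.0, 79834910.0, 458182615.0],
    'var2': [[[458182615.0], [79834910.0]],
             [[458182615.0], [9122532.0]],
             [[79834910.0], [9122532.0]]],
})
sample8['var2'] = sample8['var2'].apply(flatten_cell)
print(sample8)
```

If the cells are already genuine lists of lists, the literal_eval branches are simply skipped.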

Python: How to update a value in Google BigQuery in less than 40 seconds?

Question: I have a table in Google BigQuery that I access and modify in Python using the pandas functions read_gbq and to_gbq. The problem is that appending 100,000 lines takes about 150 seconds, while appending 1 line takes about 40 seconds. I would like to update a value in the table rather than append a line. Is there a way to update a value in the table using Python that is very fast, or at least faster than 40 seconds?

Answer 1: I'm not sure you can do it with pandas, but you certainly can with the google-cloud library.
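A sketch of the answer's suggestion, assuming the google-cloud-bigquery package is installed and credentials are configured: instead of appending rows with to_gbq, run one parameterized DML UPDATE through bigquery.Client.query. The table ID, the column names, their FLOAT64/INT64 types, and both helper functions are hypothetical. The sketch makes no claim about runtime; how long the statement takes depends on the table and on BigQuery's per-job overhead.

```python
def build_update_sql(table_id: str, set_column: str, key_column: str) -> str:
    """Hypothetical helper: identifiers are interpolated, values stay as @parameters."""
    return (
        f"UPDATE `{table_id}` "
        f"SET {set_column} = @new_value "
        f"WHERE {key_column} = @key"
    )


def update_value(table_id, set_column, key_column, key, new_value):
    """Hypothetical helper: run the UPDATE and return the number of rows changed."""
    # Imported here so the SQL builder can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # Assumed column types; adjust them to the real table schema.
            bigquery.ScalarQueryParameter("new_value", "FLOAT64", new_value),
            bigquery.ScalarQueryParameter("key", "INT64", key),
        ]
    )
    sql = build_update_sql(table_id, set_column, key_column)
    job = client.query(sql, job_config=job_config)
    job.result()  # wait for the DML statement to finish
    return job.num_dml_affected_rows
```

Query parameters keep the values out of the SQL string; only the identifiers, which BigQuery does not allow as parameters, are formatted in.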

How to pivot pandas DataFrame column to create binary “value table”?

Question: I have the following pandas dataframe:

import pandas as pd
df = pd.read_csv("filename.csv")
df

   A         B         C         D         E
0  a  0.469112 -0.282863 -1.509059       cat
1  c -1.135632  1.212112 -0.173215       dog
2  e  0.119209 -1.044236 -0.861849       dog
3  f -2.104569 -0.494929  1.071804      bird
4  g -2.224569 -0.724929  2.234213  elephant
...

I would like to create more columns based on the categorical values in column E, so that the dataframe looks like this:

df
   A         B         C   D  cat  dog  bird  elephant ....
0  a  0.469112 -0.282863  -1
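One common approach, offered as a sketch rather than the thread's answer, is pd.get_dummies, which creates one 0/1 indicator column per distinct value. The rows below are copied from the question because the CSV itself is not shown. get_dummies orders the new columns alphabetically, so they are reordered here to first-appearance order to match the desired layout.

```python
import pandas as pd

# Rows copied from the question; the real data comes from filename.csv.
df = pd.DataFrame({
    'A': ['a', 'c', 'e', 'f', 'g'],
    'B': [0.469112, -1.135632, 0.119209, -2.104569, -2.224569],
    'C': [-0.282863, 1.212112, -1.044236, -0.494929, -0.724929],
    'D': [-1.509059, -0.173215, -0.861849, 1.071804, 2.234213],
    'E': ['cat', 'dog', 'dog', 'bird', 'elephant'],
})

# One 0/1 column per distinct value of E, reordered to first appearance
# (cat, dog, bird, elephant) instead of get_dummies' alphabetical order.
dummies = pd.get_dummies(df['E'], dtype=int)[df['E'].unique()]

# Replace E with the indicator columns.
out = pd.concat([df.drop(columns='E'), dummies], axis=1)
print(out)
```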

Assigning a scalar value to an empty DataFrame doesn't appear to do anything

Question: I'm new to pandas and have a very basic question. On Python 3.6, in Spyder:

x = pd.DataFrame(columns=['1', '2'])
print(x)
x['1'] = '25'
print(x)

From the print statements, the dataframe x does not appear to change. My question: what does x['1'] = '25' do, if anything?

Answer 1: There is actually a difference between the semantics of assigning scalars and assigning iterables (think of containers such as lists as list-like objects). Consider:

df = pd.DataFrame(columns=['1', '2'])
df
Empty DataFrame
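The answer is cut off, so the following is an illustration of the scalar vs. list-like distinction it starts to describe, not its continuation. A scalar is broadcast to every existing row, and an empty frame has none; a list-like value assigned to a frame with no index supplies the rows itself.

```python
import pandas as pd

x = pd.DataFrame(columns=['1', '2'])

# Scalar: broadcast to every existing row. There are zero rows,
# so the column is set but nothing visible changes.
x['1'] = '25'
print(x)       # still an empty DataFrame
print(len(x))  # 0

# List-like: on a frame with no index, pandas takes the index from the
# value (0..n-1), so rows appear and the other column is filled with NaN.
y = pd.DataFrame(columns=['1', '2'])
y['1'] = ['25']
print(y)
```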

Reverse DataFrame Column, But Maintain the Index

Question: Consider the following:

In [214]: df = pd.DataFrame(index=range(4,8), data=[33,22,11,00])

In [215]: df
Out[215]:
    0
4  33
5  22
6  11
7   0

I'd like to reverse the order of the first column but keep the index in its current order, so df will look like:

    0
4   0
5  11
6  22
7  33

I can't seem to find the right reset_index, reindex, etc. to make this happen.

Answer 1: Use iloc and slice appropriately:

df.iloc[::-1]

    0
7   0
6  11
5  22
4  33

This reverses the index as well. To preserve the index, write the reversed values back through iloc:

df.iloc[:] = df.iloc[::-1].values
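A runnable check of the answer's one-liner. The .values call is what makes it work: it drops the reversed index, so the numbers are assigned back by position. The single-column variant on df2 is an equivalent alternative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(index=range(4, 8), data=[33, 22, 11, 0])

# df.iloc[::-1] alone would reverse the index too; .values strips it, so the
# reversed numbers are written back position by position under the old index.
df.iloc[:] = df.iloc[::-1].values
print(df)

# Alternative for a single column: reverse the underlying array directly.
df2 = pd.DataFrame(index=range(4, 8), data=[33, 22, 11, 0])
df2[0] = df2[0].to_numpy()[::-1]
```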

Forward fill column with an index-based limit

Question: I want to forward fill a column with a limit, but I want the limit to be based on the index, not on a simple number of rows as the limit argument allows. For example, say I have the dataframe given by:

df = pd.DataFrame({
    'data': [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
    'group': [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
})

which looks like

In [27]: df
Out[27]:
   data  group
0   0.0      0
1   1.0      0
2   NaN      0
3   3.0      1
4   NaN      1
5   5.0      0
6   NaN      0
7   NaN      0
8   NaN      1
9   NaN      1

If I group by the
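The question is truncated, so this is a sketch under two assumptions: "limit based on the index" means a gap is filled only when its index label is at most `limit` past the last valid value, and the fill is done within each group. The helper ffill_index_limit is hypothetical. It records the index label of the most recent non-NaN value, forward fills, and masks every position that is too far from that label.

```python
import numpy as np
import pandas as pd


def ffill_index_limit(s: pd.Series, limit) -> pd.Series:
    """Hypothetical helper: forward fill only up to `limit` index units past the last valid value."""
    idx = s.index.to_series()
    # Index label of the most recent non-NaN value at each position.
    last_valid = idx.where(s.notna()).ffill()
    # Keep the filled value only where the index distance is within the limit.
    return s.ffill().where(idx - last_valid <= limit)


df = pd.DataFrame({
    'data': [0.0, 1.0, np.nan, 3.0, np.nan, 5.0, np.nan, np.nan, np.nan, np.nan],
    'group': [0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
})

# Assumed per-group application; the question breaks off at "If I group by the".
df['filled'] = df.groupby('group')['data'].transform(lambda s: ffill_index_limit(s, 2))
print(df)
```

With limit=2, rows 8 and 9 stay NaN because their index is more than 2 past row 3, the last valid value in group 1. The row-based groupby(...).ffill(limit=2) would still fill row 8.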

How to get a continuous rolling mean in pandas?

Question: I'm looking to get a continuous rolling mean of a dataframe. df looks like this:

index  price
0      4
1      6
2      10
3      12

I want a continuous rolling mean of price, that is, a moving mean of all the prices so far:

index  price  mean
0      4      4
1      6      5
2      10     6.67
3      12     8

Thank you in advance!

Answer 1: You can use expanding:

df['mean'] = df.price.expanding().mean()
df

index  price  mean
0      4      4.000000
1      6      5.000000
2      10     6.666667
3      12     8.000000

Answer 2: Welcome to SO: Hopefully people will soon remember you from
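A runnable version of Answer 1. It also shows why expanding() gives the "continuous" mean: it is the running sum divided by the running count. The cumsum line is only a cross-check and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [4, 6, 10, 12]})

# expanding() uses every row from the start up to the current one,
# i.e. the cumulative (running) mean.
df['mean'] = df['price'].expanding().mean()

# Cross-check: running sum divided by the number of rows seen so far.
df['mean_cumsum'] = df['price'].cumsum() / np.arange(1, len(df) + 1)
print(df)
```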

Convert BigQuery results to a pandas DataFrame

Question: Below is the code to convert BigQuery results into a pandas DataFrame. I'm learning Python and pandas, and I wonder if I could get suggestions or ideas for improving the code:

#...code to run query, that returns 3 columns: 'date' DATE, 'currency' STRING,'rate' FLOAT...
rows, total_count, token = query.fetch_data()
currency = []
rate = []
dates = []
for row in rows:
    dates.append(row[0])
    currency.append(row[1])
    rate.append(row[2])
dict = {
    'currency' : currency,
    'date' : dates,
    'rate' :
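One possible simplification, sketched under the assumption that each row is a sequence in (date, currency, rate) order, as the question's row[0], row[1], row[2] indexing implies. DataFrame.from_records builds the frame in one call, which removes the three parallel lists. It also avoids naming a variable `dict`, which shadows the built-in. The helper rows_to_frame and the sample rows are illustrative only. Recent google-cloud-bigquery releases also offer a to_dataframe() method on query jobs and results, which may replace the loop entirely; this sketch keeps the question's fetch_data() style.

```python
import pandas as pd

COLUMNS = ['date', 'currency', 'rate']  # order in which the query returns them


def rows_to_frame(rows):
    """Hypothetical helper: build the DataFrame in one call from (date, currency, rate) rows."""
    return pd.DataFrame.from_records(list(rows), columns=COLUMNS)


# Illustrative placeholder rows; real ones would come from query.fetch_data().
sample_rows = [
    ('2020-01-01', 'EUR', 1.0),
    ('2020-01-01', 'USD', 2.0),
]
frame = rows_to_frame(sample_rows)
print(frame)
```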

Resample a start & end employee holiday table correctly

Question: I have the following dataframe:

df = pd.DataFrame(
    {'name' : ['Khan','Khan','Khan','Dean','Dean','Dean'],
     'start_date' : ['01-01-2020','04-02-2020','02-03-2020','09-04-2020','06-08-2020','12-12-2020'],
     'end_date' : ['03-01-2020', '09-02-2020','02-03-2020','15-05-2020','19-08-2020','31-12-2020'],
     'holiday_type' : ['holiday','holiday','sick leave','holiday','holiday','sick leave']
    }
)
df[['start_date','end_date']] = df[['start_date','end_date']].apply(pd.to_datetime, format='%d-%m-%Y')
print(df)
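The question is truncated, so this sketch assumes that "resampling correctly" means expanding each inclusive start/end pair into one row per calendar day, which makes per-day or per-month counts straightforward. pd.date_range includes both endpoints, and DataFrame.explode turns each list of days into separate rows. The monthly aggregation at the end is only an illustrative use of the daily table.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Khan', 'Khan', 'Khan', 'Dean', 'Dean', 'Dean'],
     'start_date': ['01-01-2020', '04-02-2020', '02-03-2020', '09-04-2020', '06-08-2020', '12-12-2020'],
     'end_date': ['03-01-2020', '09-02-2020', '02-03-2020', '15-05-2020', '19-08-2020', '31-12-2020'],
     'holiday_type': ['holiday', 'holiday', 'sick leave', 'holiday', 'holiday', 'sick leave']}
)
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(pd.to_datetime, format='%d-%m-%Y')

# One row per calendar day: build each inclusive start..end range, then explode it.
df['date'] = [list(pd.date_range(s, e, freq='D'))
              for s, e in zip(df['start_date'], df['end_date'])]
daily = df.explode('date', ignore_index=True)
daily['date'] = pd.to_datetime(daily['date'])

# Illustrative aggregation: days off per person, month and type.
per_month = daily.groupby(['name', daily['date'].dt.to_period('M'), 'holiday_type']).size()
print(per_month)
```

Because every day is its own row, a leave that spans a month boundary (Dean's 9 April to 15 May holiday) is split correctly between April and May.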
