multi-index | 易学教程

pandas apply function on multiindex

Question: I would like to apply a function to a MultiIndex DataFrame (basically the DataFrame returned by groupby describe) without using a for loop to traverse the level 0 index. The function I'd like to apply:

    def CI(x):
        import math
        sigma = x["std"]
        n = x["count"]
        return 1.96 * sigma / math.sqrt(n)

Sample of my dataframe:

    df = df.iloc[47:52, [3,4,-1]]

               a         b                                id
    47  0.218182  0.000000  0d1974107c6731989c762e96def73568
    48  0.000000  0.000000  0d1974107c6731989c762e96def73568
    49  0.218182  0.130909  0d1974107c6731989c762e96def73568
    50  0
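One loop-free way to approach this, as a rough sketch: the result of groupby(...).describe() has two-level columns (original column, statistic), so the "std" and "count" slices can be taken with xs and combined in a single vectorized expression. The sample data and the names desc, std, count and ci are illustrative assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame in the question is larger.
df = pd.DataFrame({
    "id": ["x", "x", "x", "y", "y", "y"],
    "a": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
    "b": [0.0, 0.0, 3.0, 1.0, 1.0, 1.0],
})

# describe() yields columns like ("a", "count"), ("a", "std"), ...
desc = df.groupby("id").describe()

# Select one statistic across all original columns (inner column level).
std = desc.xs("std", axis=1, level=1)
count = desc.xs("count", axis=1, level=1)

# Half-width of a 95% normal-approximation interval, per id and per column.
ci = 1.96 * std / np.sqrt(count)
print(ci)
```

Because std and count share the same index and column labels, the arithmetic aligns element by element and no explicit traversal of the level 0 index is needed.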

set_index not indexing in pandas

Question: For the simple program below, I expected the second output to be the same as the first. Why is this not happening? It's just an order change between data1 and data2.

    columnList = ["PID", "Sec", "Util", "random"]
    data1 = [('67123', 12, '85' , '100'),
             ('67123', 112, '15', '100'),
             ('87878', 23, "95", '100'),
            ]
    df1 = pd.DataFrame(data1, columns=columnList)
    df1 = df1.set_index(["PID", "Sec"])
    print(df1)

              Util random
    PID   Sec
    67123 12    85    100
          112   15    100
    87878 23    95    100

    data2 = [('67123', 12, '85' , '100'), ('87878',
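A likely explanation, sketched below under assumptions: set_index keeps the rows in their original order, and the printed output only blanks a repeated outer label when the repeats are consecutive. The excerpt cuts off before data2 is complete, so the row order used here is a guess that simply reorders the rows of data1.

```python
import pandas as pd

column_list = ["PID", "Sec", "Util", "random"]
data1 = [("67123", 12, "85", "100"),
         ("67123", 112, "15", "100"),
         ("87878", 23, "95", "100")]
# Assumed ordering for data2: same rows, with the two '67123' rows separated.
data2 = [data1[0], data1[2], data1[1]]

df1 = pd.DataFrame(data1, columns=column_list).set_index(["PID", "Sec"])
df2 = pd.DataFrame(data2, columns=column_list).set_index(["PID", "Sec"])

# set_index does not sort, so df2 keeps the interleaved row order.
print(df2)

# Sorting the index groups equal PID labels together again.
df2_sorted = df2.sort_index()
print(df2_sorted)
```

Both frames are indexed by the same MultiIndex entries; only the row order, and therefore the display, differs until the index is sorted.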

filter multi-indexed grouped pandas dataframe

Question: The data looks like the following:

    id  timestamp   date        value
    1   2001-01-01  2001-05-01  0
    1   2001-10-01  2001-05-01  1
    2   2001-01-01  2001-05-01  0
    2   2001-10-01  2001-05-01  0

As you can see, the table contains the columns id, timestamp, date and value. Every row with the same id also has the same date. Furthermore, date always falls somewhere between the first and the last timestamp of each id. The task is to filter the table so as to remove every id which does not contain at least one entry

Removing rows with NaN in MultiIndex with duplicates

Question: Updated with a DataFrame that reproduces my exact issue. I have an issue where NaN appearing in my indexes leads to non-unique rows (since NaN !== NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with a single NaN row; however, the original solution did not resolve my issue, as it did not meet this poorly advertised requirement: (Note that in the actual data I have thousands of such rows, including duplicate rows since NaN !== NaN so
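A minimal sketch of one way to drop such rows, assuming a two-level index and made-up data: converting the MultiIndex to a plain DataFrame with to_frame makes isna available per level, and the resulting boolean mask works even when the NaN entries are duplicated. The level names and values here are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical index with NaN in either level, including a duplicated entry.
index = pd.MultiIndex.from_tuples(
    [(1.0, "a"), (np.nan, "b"), (2.0, np.nan), (np.nan, "b"), (3.0, "c")],
    names=["key1", "key2"],
)
df = pd.DataFrame({"value": [0, 1, 2, 3, 4]}, index=index)

# True for every row whose index has NaN in at least one level.
has_nan = df.index.to_frame(index=False).isna().any(axis=1).to_numpy()

# Positional boolean filtering does not depend on the index being unique.
clean = df[~has_nan]
print(clean)
```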

How to re-order the multi-index columns using Pandas?

Question: The table is shown here. Code:

    dff = pd.DataFrame({'Country':['France']*4+['China']*4,
                        'Progress':['Develop','Middle','Operate','Start']*2,
                        'NumTrans':np.random.randint(100,900,8),
                        'TransValue':np.random.randint(10000,9999999,8)})
    dff = dff.set_index(['Country','Progress']).T

The data and code are shown above. Is there any way to re-order "Progress" as Start-Develop-Middle-Operate using Python? I tried using the map function and assigning each stage a number, but cannot extract "Progress
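One possible approach, shown as a sketch rather than the accepted answer: build the desired column MultiIndex explicitly with MultiIndex.from_product and pass it to reindex. Fixed numbers replace the random values from the question so the result is reproducible; the names order, countries and new_cols are illustrative.

```python
import pandas as pd

# Same shape as the frame in the question, with fixed instead of random values.
dff = pd.DataFrame({
    "Country": ["France"] * 4 + ["China"] * 4,
    "Progress": ["Develop", "Middle", "Operate", "Start"] * 2,
    "NumTrans": [110, 120, 130, 140, 210, 220, 230, 240],
    "TransValue": [1100, 1200, 1300, 1400, 2100, 2200, 2300, 2400],
})
dff = dff.set_index(["Country", "Progress"]).T

# Desired stage order within each country.
order = ["Start", "Develop", "Middle", "Operate"]

# Keep the countries in their current order and pair each with the new stage order.
countries = dff.columns.get_level_values("Country").unique()
new_cols = pd.MultiIndex.from_product([countries, order], names=dff.columns.names)

# reindex rearranges the existing columns to match new_cols.
reordered = dff.reindex(columns=new_cols)
print(reordered)
```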

Swapping/Ordering multi-index columns in pandas

Question: Following the documentation code on multi-indexing, I do the following:

    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo'],
              ['one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    df2 = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

This yields a dataframe that looks like:

    first        bar                 baz                 foo
    second       one       two       one       two       one       two
    A      -0.398965 -1.103247 -0.530605  0.758178  1.462003  2.175783
    B
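The excerpt ends before the actual request, so the following is only a guess based on the title: a sketch of swapping the two column levels with swaplevel and then regrouping the columns with sort_index. Integer data replaces the random values so the columns are easy to track.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo"],
          ["one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["first", "second"])

# Deterministic stand-in for np.random.randn(3, 6).
df2 = pd.DataFrame(np.arange(18).reshape(3, 6), index=["A", "B", "C"], columns=index)

# Swap the two column levels, then sort so equal 'second' labels sit together.
swapped = df2.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(swapped)
```

Without the sort_index call the levels are swapped but the columns stay in their original left-to-right order.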

select individual rows from multiindex pandas dataframe [duplicate]

Question: This question already has answers here: Dynamically filtering a pandas dataframe (3 answers). Closed 2 years ago.

I am trying to select individual rows from a MultiIndex dataframe using a list of multi-indices. For example, I have the following dataframe:

               Col1
    A B C
    1 1 1 -0.148593
        2  2.043589
      2 3 -1.696572
        4 -0.249049
    2 1 5  2.012294
        6 -1.756410
      2 7  0.476035
        8 -0.531612

I would like to select all 'C' with (A,B) = [(1,1), (2,2)]:

               Col1
    A B C
    1 1 1 -0.148593
        2  2.043589
    2 2 7  0.476035
        8 -0
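A short sketch of one way to do this selection: drop the 'C' level from the index to get (A, B) pairs, then test membership with MultiIndex.isin. The frame is rebuilt from the values shown in the question; the names wanted, mask and selected are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4),
     (2, 1, 5), (2, 1, 6), (2, 2, 7), (2, 2, 8)],
    names=["A", "B", "C"],
)
df = pd.DataFrame(
    {"Col1": [-0.148593, 2.043589, -1.696572, -0.249049,
              2.012294, -1.756410, 0.476035, -0.531612]},
    index=index,
)

# The (A, B) combinations to keep.
wanted = [(1, 1), (2, 2)]

# Compare only the first two levels against the wanted pairs.
mask = df.index.droplevel("C").isin(wanted)
selected = df[mask]
print(selected)
```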

Merging crosstabs in Python

Question: I am trying to merge multiple crosstabs into a single one. Note that the data provided is only for test purposes. The actual data is much larger, so efficiency is quite important for me. The crosstabs are generated, listed, and then merged with a lambda function on the word column. However, the result of this merge is not what I expect. I think the problem is that the crosstab columns containing only NA values are being dropped even when using dropna=False, which
