multi-index

pandas apply function on multiindex

淺唱寂寞╮ submitted on 2020-07-05 04:14:12
Question: I would like to apply a function to a MultiIndex DataFrame (essentially the output of groupby followed by describe) without using a for loop to traverse the level-0 index.

Function I'd like to apply:

    import math

    def CI(x):
        # Half-width of a 95% confidence interval (normal approximation)
        sigma = x["std"]
        n = x["count"]
        return 1.96 * sigma / math.sqrt(n)

Sample of my DataFrame:

    df = df.iloc[47:52, [3,4,-1]]

               a         b                                id
    47  0.218182  0.000000  0d1974107c6731989c762e96def73568
    48  0.000000  0.000000  0d1974107c6731989c762e96def73568
    49  0.218182  0.130909  0d1974107c6731989c762e96def73568
    50  0
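
A minimal sketch of one loop-free approach, assuming the frame comes straight from groupby("id").describe(), so that the statistics sit on the second level of the columns. The sample values are hypothetical; only the column names a, b and id come from the question. Selecting a whole statistic with xs lets the formula run on every variable at once.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "a": [0.2, 0.0, 0.2, 0.4, 0.1, 0.3],
    "b": [0.0, 0.0, 0.1, 0.2, 0.3, 0.1],
    "id": ["x", "x", "x", "y", "y", "y"],
})

# describe() yields MultiIndex columns of the form (variable, statistic).
desc = df.groupby("id").describe()

# Pull one statistic for all variables at once; xs drops the selected level.
std = desc.xs("std", axis=1, level=1)
count = desc.xs("count", axis=1, level=1)

# Vectorised version of CI(): one row per id, one column per variable.
ci = 1.96 * std / np.sqrt(count)
print(ci)
```

If the statistics live on an index level instead (for example after stacking), the same idea applies with axis=0.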

set_index not indexing in pandas

大城市里の小女人 submitted on 2020-06-28 08:55:45
Question: For the simple program below, I was expecting the second output to be the same as the first. Why is this not happening? The only change is the order of the rows in data1 and data2.

    import pandas as pd

    columnList = ["PID", "Sec", "Util", "random"]
    data1 = [('67123', 12, '85', '100'),
             ('67123', 112, '15', '100'),
             ('87878', 23, '95', '100'),
             ]
    df1 = pd.DataFrame(data1, columns=columnList)
    df1 = df1.set_index(["PID", "Sec"])
    print(df1)

              Util random
    PID   Sec
    67123 12    85    100
          112   15    100
    87878 23    95    100

    data2 = [('67123', 12, '85' , '100'), ('87878',
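
A sketch of the likely cause, under the assumption that data2 holds the same rows as data1 with the two 67123 rows no longer adjacent (the original data2 is cut off, so the ordering below is hypothetical). set_index does not sort, and the printed output only blanks out a repeated outer label when the repeats are consecutive, so sorting the index restores the grouped display.

```python
import pandas as pd

column_list = ["PID", "Sec", "Util", "random"]

# Hypothetical ordering: the two "67123" rows are separated by the "87878" row.
data2 = [
    ("67123", 12, "85", "100"),
    ("87878", 23, "95", "100"),
    ("67123", 112, "15", "100"),
]

df2 = pd.DataFrame(data2, columns=column_list).set_index(["PID", "Sec"])
print(df2)  # "67123" appears twice because its rows are not adjacent

# Sorting the MultiIndex puts equal PID values next to each other again.
df2_sorted = df2.sort_index()
print(df2_sorted)
```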

filter multi-indexed grouped pandas dataframe

最后都变了- submitted on 2020-05-24 03:31:03
Question: The data looks like the following:

    id  timestamp   date        value
    1   2001-01-01  2001-05-01  0
    1   2001-10-01  2001-05-01  1
    2   2001-01-01  2001-05-01  0
    2   2001-10-01  2001-05-01  0

As you can see, the table contains the columns id, timestamp, date and value. Every row with the same id also has the same date. Furthermore, date always falls somewhere between the first and the last timestamp of each id. The task is to filter the table so as to remove every id which does not contain at least one entry
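
The exact condition is cut off in the excerpt, so the following is only a sketch of the general pattern: groupby(...).filter(...) keeps or drops whole groups based on a per-group predicate. The predicate used here (at least one row with value equal to 1) is a placeholder assumption, not the condition from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime(
        ["2001-01-01", "2001-10-01", "2001-01-01", "2001-10-01"]
    ),
    "date": pd.to_datetime(["2001-05-01"] * 4),
    "value": [0, 1, 0, 0],
})


def keep_group(group):
    # Placeholder predicate (an assumption): keep the id if any row has value == 1.
    return bool((group["value"] == 1).any())


# filter() returns all rows of every group for which the predicate is True.
filtered = df.groupby("id").filter(keep_group)
print(filtered)
```

Swapping in the real condition only requires changing the body of keep_group.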

Removing rows with NaN in MultiIndex with duplicates

£可爱£侵袭症+ submitted on 2020-05-14 03:44:49
Question: (Updated with a DataFrame that reproduces my exact issue.) I have an issue where NaN appearing in my indexes leads to non-unique rows (since NaN != NaN). I need to drop all rows where NaN occurs in the index. My previous question had an example DataFrame with a single NaN row; however, the original solution did not resolve my issue, as it did not meet this poorly advertised requirement: (Note that in the actual data I have thousands of such rows, including duplicate rows since NaN != NaN so
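
One way this could be approached, as a sketch on a small hypothetical frame: turn the MultiIndex into a regular DataFrame, flag every row that has a NaN in any level, and select with the inverted boolean mask. Positional masking does not rely on label lookups, so duplicated index entries are not a problem.

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN in either level and a duplicated (a, NaN) entry.
idx = pd.MultiIndex.from_tuples(
    [("a", 1.0), ("a", np.nan), ("b", 2.0), ("a", np.nan), (np.nan, 3.0)],
    names=["k1", "k2"],
)
df = pd.DataFrame({"val": [10, 20, 30, 40, 50]}, index=idx)

# True for each row whose index contains NaN in at least one level.
has_nan = df.index.to_frame(index=False).isna().any(axis=1).to_numpy()

# Boolean mask selection is positional, so duplicate labels are harmless.
cleaned = df[~has_nan]
print(cleaned)
```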

How to re-order the multi-index columns using Pandas?

有些话、适合烂在心里 submitted on 2020-03-25 05:58:48
Question: The code that builds the table:

    import numpy as np
    import pandas as pd

    dff = pd.DataFrame({'Country': ['France']*4 + ['China']*4,
                        'Progress': ['Develop', 'Middle', 'Operate', 'Start']*2,
                        'NumTrans': np.random.randint(100, 900, 8),
                        'TransValue': np.random.randint(10000, 9999999, 8)})
    dff = dff.set_index(['Country', 'Progress']).T

Is there any way to re-order the "Progress" level as Start, Develop, Middle, Operate using Python? I tried using the map function to assign each stage a number, but cannot extract "Progress
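
A sketch of one possible approach: build the desired column MultiIndex explicitly and reindex against it. It assumes every country has all four stages, as in the question's data; fixed numbers replace the random values so the result is reproducible.

```python
import pandas as pd

# Fixed values stand in for the question's random numbers (an assumption).
dff = pd.DataFrame({
    "Country": ["France"] * 4 + ["China"] * 4,
    "Progress": ["Develop", "Middle", "Operate", "Start"] * 2,
    "NumTrans": [110, 120, 130, 140, 210, 220, 230, 240],
    "TransValue": [1100, 1200, 1300, 1400, 2100, 2200, 2300, 2400],
})
dff = dff.set_index(["Country", "Progress"]).T

order = ["Start", "Develop", "Middle", "Operate"]

# Keep countries in their current order and impose the stage order within each.
countries = dff.columns.get_level_values("Country").unique()
new_columns = pd.MultiIndex.from_product(
    [countries, order], names=dff.columns.names
)

reordered = dff.reindex(columns=new_columns)
print(reordered)
```

If some country lacked a stage, reindex would create that column filled with NaN, so the product of levels would need trimming first.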

Swapping/Ordering multi-index columns in pandas

◇◆丶佛笑我妖孽 submitted on 2020-03-14 06:58:49
Question: Following the documentation code on multi-indexing, I do the following:

    import numpy as np
    import pandas as pd

    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo'],
              ['one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    df2 = pd.DataFrame(np.random.randn(3, 6), index=['A', 'B', 'C'], columns=index)

This yields a DataFrame that looks like:

    first        bar                 baz                 foo
    second       one       two       one       two       one       two
    A      -0.398965 -1.103247 -0.530605  0.758178  1.462003  2.175783
    B
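
The excerpt stops before the actual request, so this is only a sketch of what the title suggests: swapping the two column levels and then sorting so that columns sharing the new outer label sit together. A seeded generator replaces np.random.randn to keep the output reproducible.

```python
import numpy as np
import pandas as pd

arrays = [["bar", "bar", "baz", "baz", "foo", "foo"],
          ["one", "two", "one", "two", "one", "two"]]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

rng = np.random.default_rng(0)  # seeded for reproducibility (an assumption)
df2 = pd.DataFrame(rng.standard_normal((3, 6)),
                   index=["A", "B", "C"], columns=index)

# Swap the column levels, then sort the columns lexicographically.
swapped = df2.swaplevel("first", "second", axis=1).sort_index(axis=1)
print(swapped)
```

Without the sort_index call the levels are swapped but the columns keep their original left-to-right order.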

select individual rows from multiindex pandas dataframe [duplicate]

人盡茶涼 submitted on 2020-01-15 09:39:55
Question: This question already has answers here: Dynamically filtering a pandas dataframe (3 answers). Closed 2 years ago.

I am trying to select individual rows from a MultiIndex DataFrame using a list of index tuples. For example, I have the following DataFrame:

               Col1
    A B C
    1 1 1 -0.148593
        2  2.043589
      2 3 -1.696572
        4 -0.249049
    2 1 5  2.012294
        6 -1.756410
      2 7  0.476035
        8 -0.531612

I would like to select all 'C' with (A, B) = [(1, 1), (2, 2)]:

               Col1
    A B C
    1 1 1 -0.148593
        2  2.043589
    2 2 7  0.476035
        8 -0
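
A minimal sketch of one way to do this, using the values shown in the question: drop level C to get an (A, B) index of the same length, test membership with isin, and use the resulting boolean array as a row mask.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(1, 1, 1), (1, 1, 2), (1, 2, 3), (1, 2, 4),
     (2, 1, 5), (2, 1, 6), (2, 2, 7), (2, 2, 8)],
    names=["A", "B", "C"],
)
df = pd.DataFrame(
    {"Col1": [-0.148593, 2.043589, -1.696572, -0.249049,
              2.012294, -1.756410, 0.476035, -0.531612]},
    index=idx,
)

wanted = [(1, 1), (2, 2)]

# One boolean per row: True where the (A, B) pair is in the wanted list.
mask = df.index.droplevel("C").isin(wanted)

selected = df[mask]
print(selected)
```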

Merging crosstabs in Python

邮差的信 submitted on 2020-01-14 13:14:37
Question: I am trying to merge multiple crosstabs into a single one. Note that the data provided is only for test purposes; the actual data is much larger, so efficiency is quite important to me. The crosstabs are generated, collected in a list, and then merged with a lambda function on the word column. However, the result of this merge is not what I expect. I think the problem is that crosstab columns containing only NA values are being dropped even when using dropna=False, which
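
The question's data and code are not included in the excerpt, so the following is a sketch on hypothetical data rather than a fix for the original program. It swaps the merge step for pd.concat and sidesteps the dropped-column concern by reindexing each crosstab to an explicit list of expected categories; the column names tag and label and the helper fixed_crosstab are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the "word" column name comes from the question.
df = pd.DataFrame({
    "word": ["a", "b", "a", "c"],
    "tag": ["x", "y", "x", "x"],
    "label": ["p", "p", "q", "p"],
})
all_words = sorted(df["word"].unique())


def fixed_crosstab(frame, col, categories):
    """Crosstab of word vs. col with a guaranteed set of rows and columns."""
    tab = pd.crosstab(frame["word"], frame[col])
    # Categories that never occur become all-zero columns instead of vanishing.
    return tab.reindex(index=all_words, columns=categories, fill_value=0)


tabs = [
    fixed_crosstab(df, "tag", ["x", "y", "z"]),   # "z" never occurs in the data
    fixed_crosstab(df, "label", ["p", "q"]),
]

# All tables share the same row index, so a column-wise concat lines them up.
merged = pd.concat(tabs, axis=1, keys=["tag", "label"])
print(merged)
```

Because every table is aligned to the same index up front, the concat avoids repeated pairwise merges, which may matter for larger inputs.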
