multi-index | 易学教程

Selecting columns from pandas MultiIndex

Question: I have a DataFrame with MultiIndex columns that looks like this:

# sample data
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)
data

What is the proper, simple way of selecting only specific columns (e.g. ['a', 'c'], not a range) from the second level? Currently I am doing it like this:

import itertools
tuples = [i for i in itertools.product(['one', 'two'], ['a', 'c'])]
new…
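The answers are not preserved here, so the following is only a sketch of two common ways to pick second-level columns without building the tuples by hand, using the question's sample data. `pd.IndexSlice` with `.loc` is most reliable when the column index is sorted (as it is here); the boolean mask built from `get_level_values(1)` does not depend on sort order.

```python
import numpy as np
import pandas as pd

# Sample data from the question
col = pd.MultiIndex.from_arrays([['one', 'one', 'one', 'two', 'two', 'two'],
                                 ['a', 'b', 'c', 'a', 'b', 'c']])
data = pd.DataFrame(np.random.randn(4, 6), columns=col)

# Option 1: IndexSlice; ':' keeps every first-level label, the list picks second-level labels
idx = pd.IndexSlice
subset = data.loc[:, idx[:, ['a', 'c']]]

# Option 2: boolean mask over the second level; works regardless of column order
mask = data.columns.get_level_values(1).isin(['a', 'c'])
subset_mask = data.loc[:, mask]

print(subset.columns.tolist())
```

Both return the four columns `('one', 'a')`, `('one', 'c')`, `('two', 'a')`, `('two', 'c')`.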

How to multi index rows using pandas

Question: I am new to pandas. Whenever I run my code, the first index level is repeated on every row. What I have tried (pg, ag, inc are arrays):

cases = ['a1', 'a2', 'a3']
data = {'RED': rg, 'GREEN': gg, 'BLUE': bb}
stat_index = ['HELO', 'HERE']
df = pd.DataFrame(data, pd.MultiIndex.from_product([cases, stat_index]), ['RED', 'GREEN', 'BLUE'])
df.to_csv("OUT.CSV")

What I get is:

             RED    GREEN    BLUE
a1 HELO  304.907  286.074  12.498
a1 HERE  508.670  509.784  94.550
a2 HELO  448.974  509.406  56.466
a2 HERE  764.727  432.084  43.462
a3 HELO  …
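`DataFrame.to_csv` writes the full label of every index level on each row; only the on-screen display blanks out repeated outer labels. If the goal is a CSV that shows each case name once, one possible sketch is to move the index into ordinary columns and blank the duplicates before writing. The level names `case` and `stat`, and the random stand-ins for the question's arrays, are hypothetical.

```python
import numpy as np
import pandas as pd

cases = ['a1', 'a2', 'a3']
stat_index = ['HELO', 'HERE']
rng = np.random.default_rng(0)
# Random stand-ins for the question's arrays (hypothetical data, 6 rows = 3 cases x 2 stats)
data = {'RED': rng.random(6), 'GREEN': rng.random(6), 'BLUE': rng.random(6)}

# Naming the levels ('case' and 'stat' are hypothetical names) gives the CSV header cells
index = pd.MultiIndex.from_product([cases, stat_index], names=['case', 'stat'])
df = pd.DataFrame(data, index=index, columns=['RED', 'GREEN', 'BLUE'])

# Move the index levels into ordinary columns, then blank repeated case labels
flat = df.reset_index()
flat.loc[flat['case'].duplicated(), 'case'] = ''

csv_text = flat.to_csv(index=False)  # pass a path such as "OUT.CSV" to write a file instead
print(csv_text)
```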

Pandas MultiIndex custom sort levels by categorical order, not alphabetically

Question: I'm new to pandas (0.16.1) and want a custom sort order in a MultiIndex, so I use Categoricals. Part of my MultiIndex (Итого means "total"):

Part   Defect  Own
Кузов  504     ИП
Кузов  504     Итого
Кузов  504     ПС
Кузов  505     ПС
Кузов  506     ПС
Кузов  507     ПС
Кузов  530     ИП
Кузов  530     Итого
Кузов  530     ПС

I create a pivot table with MultiIndex levels [Defect, Own]. Then I make "Own" Categorical (see the p.s. part of the question) to sort it as [ИП, ПС, Итого]. But when I prepend the "Part" level, which is also Categorical based on the "Defect" level, and sort…
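A minimal sketch for a recent pandas version (not 0.16.1, which the question uses): make the level an ordered Categorical before sorting, so the sort follows the category order rather than the alphabet. The `count` column and its values are hypothetical; only the level names and the desired order come from the question.

```python
import pandas as pd

# Hypothetical frame shaped like the question's index; the 'count' values are made up
df = pd.DataFrame({
    'Part': ['Кузов'] * 5,
    'Defect': [504, 504, 504, 530, 530],
    'Own': ['ПС', 'Итого', 'ИП', 'Итого', 'ИП'],
    'count': [1, 2, 3, 4, 5],
}).set_index(['Part', 'Defect', 'Own'])

own_order = ['ИП', 'ПС', 'Итого']  # desired order from the question

# Turn the 'Own' level into an ordered Categorical, then sort on all levels
flat = df.reset_index()
flat['Own'] = pd.Categorical(flat['Own'], categories=own_order, ordered=True)
result = flat.sort_values(['Part', 'Defect', 'Own']).set_index(['Part', 'Defect', 'Own'])

print(result)
```

Plain string sorting would put Итого before ПС; with the ordered Categorical, each Defect group comes out as ИП, ПС, Итого.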

How do I get a multilevel x axis labelled plot in pandas?

Question: I have a multi-indexed pandas DataFrame. I can produce a correctly shaped plot for what I need; however, the x-axis shows only the column headers of my MultiIndex. I am looking for a way to get a set of layered labels.

What I currently have:

[image: Current Data Frame]

The plot, using:

df.plot(x=None, y=['Published NIV', 'Future NIV'], kind='line')

Circled in blue is how I want the axis to look.

Answer 1: This is the question I've worked on most on Stack Overflow. I hope this fits your problem…
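The answer above is truncated, so this is not the answerer's code but one possible matplotlib sketch of layered x labels: plot against integer positions, label each tick with the inner index level, and write the outer level as centered text under each group. The two-level `year`/`quarter` index and the data are hypothetical; only the column names come from the question.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical two-level row index and data
idx = pd.MultiIndex.from_product([['2020', '2021'], ['Q1', 'Q2', 'Q3', 'Q4']],
                                 names=['year', 'quarter'])
df = pd.DataFrame({'Published NIV': np.arange(8.0),
                   'Future NIV': np.arange(8.0) * 1.5}, index=idx)

fig, ax = plt.subplots()
x = np.arange(len(df))
for name in ['Published NIV', 'Future NIV']:
    ax.plot(x, df[name].to_numpy(), label=name)
ax.legend()

# Inner level: one tick label per row
ax.set_xticks(x)
ax.set_xticklabels(list(df.index.get_level_values('quarter')))

# Outer level: centered text below each group (x in data units, y in axes units)
trans = ax.get_xaxis_transform()
outer = df.index.get_level_values('year')
for label in outer.unique():
    positions = x[outer == label]
    ax.text(positions.mean(), -0.12, label, transform=trans, ha='center', va='top')

fig.subplots_adjust(bottom=0.2)
fig.canvas.draw()  # render once so tick labels are populated
```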

How to retrieve pandas df multiindex from HDFStore?

Question: If the DataFrame has a simple index, one can retrieve the index from an HDFStore as follows:

df = pd.DataFrame(np.random.randn(2, 3), index=list('yz'), columns=list('abc'))
df
>>>           a         b         c
>>> y -0.181063  1.919440  1.550992
>>> z -0.701797  1.917156  0.645707

with pd.HDFStore('test.h5') as store:
    store.put('df', df, format='t')
    store.select_column('df', 'index')
>>> 0    y
>>> 1    z
>>> Name: index, dtype: object

This is as stated in the docs. But with a MultiIndex, this trick doesn't work:

df = pd…

Pandas multi-index subtract from value based on value in other column part 2

Question: Based on a thorough and accurate response to this question, I am now faced with a new issue with slightly different data. Given this DataFrame:

df = pd.DataFrame({
    ('A', 'a'): [23, 3, 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100]
})
df

    A    B possible
    a    b possible
0  23   23      100
1   3  n/a      100
2  54   54      100
3   7  n/a      100
4  32   32      100
5  76   76      100

I'd like to subtract 4 from 'possible', per row, for any instance (column) where the value is 'n/a' for that…
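Since the question is cut off, this sketch assumes the intent is "subtract 4 from `possible` once for every column in that row that holds the string `'n/a'`". It counts matches across the non-`possible` columns with `eq` and `sum(axis=1)`, then subtracts in one vectorized step; the helper names `target`, `value_cols`, and `na_count` are hypothetical.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({
    ('A', 'a'): [23, 3, 54, 7, 32, 76],
    ('B', 'b'): [23, 'n/a', 54, 7, 32, 76],
    ('possible', 'possible'): [100, 100, 100, 100, 100, 100],
})

target = ('possible', 'possible')
value_cols = [c for c in df.columns if c != target]

# Count, per row, how many value columns hold the string 'n/a'
na_count = df[value_cols].eq('n/a').sum(axis=1)

# Subtract 4 points for each 'n/a' found in that row
df[target] = df[target] - 4 * na_count
print(df)
```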

DataFrame: N largest indexes values (from level=1) to n columns

Question: I am trying to convert a df like this:

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})

by taking the n (here 2) largest index values (by count of B) into:

[image: desired output]

My way of doing it:

df = df.groupby(['A', 'B'])['A'].count()
df = df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)

which gives me (close to what I need):

[image: current output]

Now, the only methods I know to…
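The target layout survives only as a missing image, so this sketch assumes it is one row per `A` with its n most frequent `B` values spread across columns. It counts pairs, keeps the top n per group, numbers them with `cumcount`, and pivots; the tie-breaking rule (alphabetical by `B`) and the `top1`/`top2` column names are arbitrary choices, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})
n = 2

# Count each (A, B) pair
counts = df.groupby(['A', 'B']).size().reset_index(name='count')

# Highest counts first within each A; ties broken alphabetically by B (an arbitrary choice)
ranked = counts.sort_values(['A', 'count', 'B'], ascending=[True, False, True])
top = ranked.groupby('A').head(n)
top = top.assign(rank=top.groupby('A').cumcount())

# One row per A, one column per rank position
wide = top.pivot(index='A', columns='rank', values='B')
wide.columns = [f'top{i + 1}' for i in wide.columns]
print(wide)
```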

Boost.MultiIndex: How to make an effective set intersection?

Question: Assume that we have data1 and data2. How can I intersect them with std::set_intersection()?

struct pID
{
    int ID;
    unsigned int IDf; // position in the file
    pID(int id, const unsigned int idf) : ID(id), IDf(idf) {}
    bool operator<(const pID& p) const { return ID < p.ID; }
};

struct ID {};
struct IDf {};

typedef multi_index_container<
    pID,
    indexed_by<
        ordered_unique<tag<IDf>, BOOST_MULTI_INDEX_MEMBER(pID, unsigned int, IDf)>,
        ordered_non_unique<tag<ID>, BOOST_MULTI_INDEX_MEMBER(pID, int, ID)>
    >
> pID_set;

ID…

Partition pandas .diff() in multi-index level

Question: My question relates to calling .diff() within the partition of a MultiIndex level. In the following sample, the output of the first df.diff() is:

               values
Greek English
alpha a           NaN
      b             2
      c             2
      d             2
beta  e            11
      f             1
      g             1
      h             1

But I want it to be:

               values
Greek English
alpha a           NaN
      b             2
      c             2
      d             2
beta  e           NaN
      f             1
      g             1
      h             1

Here is a solution using a loop, but I think I can avoid that loop!

import pandas as pd
import numpy as np

df = pd.DataFrame({'values': [1., 3., 5., 7., 18., 19., 20., 21.],
                   'Greek': ['alpha', …
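A loop-free sketch: grouping on the outer index level and calling `diff()` computes the difference separately inside each group, so the first row of every group becomes NaN. The question's `Greek` list is truncated, so the labels below are assumed from the printed output.

```python
import pandas as pd

# 'Greek' labels assumed from the printed output (the question's list is truncated)
df = pd.DataFrame({
    'values': [1., 3., 5., 7., 18., 19., 20., 21.],
    'Greek': ['alpha'] * 4 + ['beta'] * 4,
    'English': list('abcdefgh'),
}).set_index(['Greek', 'English'])

# diff() runs separately inside each 'Greek' group, so each group starts with NaN
result = df.groupby(level='Greek').diff()
print(result)
```

The result keeps the original (Greek, English) index, with NaN at `alpha a` and `beta e`.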

Pandas set_levels, how to avoid sorting of labels?

Question: I came across a problem using set_levels on a MultiIndex:

from io import StringIO

txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(txt), header=[0, 1], na_values=['-1', ''])
df.columns = df.columns.set_levels(
    df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

  Name Height   Age
       Metres
0    A    NaN  25.0
1    B   95.0   NaN

If I run the same command again:

df.columns = df.columns.set_levels(
    df.columns.get_level_values(level=1).str.replace('Un.*', '', regex=True), level=1)

Name…
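`set_levels` replaces a level's unique values, which are mapped to columns through the index's internal codes, so feeding it the per-column labels returned by `get_level_values` can attach labels to the wrong columns. A sketch of one alternative (not taken from the truncated answers) is to rebuild the column index from per-column arrays with `MultiIndex.from_arrays`. That keeps each label on its own column, and re-running the relabeling leaves the columns unchanged.

```python
from io import StringIO

import pandas as pd

txt = '''Name,Height,Age
"",Metres,""
A,-1,25
B,95,-1'''

df = pd.read_csv(StringIO(txt), header=[0, 1], na_values=['-1', ''])

# Work with per-column labels and rebuild the index, instead of swapping unique level values
level0 = df.columns.get_level_values(0)
level1 = df.columns.get_level_values(1).str.replace('Un.*', '', regex=True)
df.columns = pd.MultiIndex.from_arrays([level0, level1])

print(df)
```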