Prepend a level to a pandas MultiIndex

一向 2020-11-28 02:25

I have a DataFrame with a MultiIndex created after some grouping:

import numpy as np
import pandas as p
from numpy.random import randn

df = p.DataFrame({
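
The snippet above breaks off in the source. As a stand-in, here is a minimal sketch of a frame whose shape matches the output quoted in the answers (index levels A and B, a single Vals column). The column values, the seed, and the groupby call are assumptions for illustration, not the asker's original code.

```python
import numpy as np
import pandas as pd

# Assumed data: labels chosen to match the a1/b1 ... a3/b4 output shown in the answers.
rng = np.random.default_rng(0)
df = (
    pd.DataFrame({
        "A": ["a1", "a1", "a2", "a3"],
        "B": ["b1", "b2", "b3", "b4"],
        "Vals": rng.standard_normal(4),
    })
    .groupby(["A", "B"])  # grouping on two columns produces a two-level MultiIndex
    .sum()
)
print(df)
```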

5 Answers
  • 2020-11-28 02:50

    A nice way to do this in one line using pandas.concat():

    import pandas as pd
    
    pd.concat([df], keys=['Foo'], names=['Firstlevel'])
    

    An even shorter way:

    pd.concat({'Foo': df}, names=['Firstlevel'])
    

    This generalizes to many data frames: each key labels the rows that come from its frame. See the pandas.concat docs.
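
A minimal sketch of both cases, one frame and several. The sample frame, the second frame `other`, and the key 'Bar' are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a two-level index (A, B), standing in for the question's df.
df = pd.DataFrame(
    {"Vals": [0.1, 0.2, 0.3, 0.4]},
    index=pd.MultiIndex.from_tuples(
        [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")],
        names=["A", "B"],
    ),
)

# One frame: the dict key becomes the value of the new outermost level.
single = pd.concat({"Foo": df}, names=["Firstlevel"])

# Several frames: each key labels the rows coming from its frame.
other = df * 10  # hypothetical second frame with the same index
combined = pd.concat({"Foo": df, "Bar": other}, names=["Firstlevel"])

print(single)
print(combined)
```

Note that concat builds a new frame, so the data is copied; the index-only approaches in the other answers avoid that.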

  • 2020-11-28 02:53

    How about building it from scratch with pandas.MultiIndex.from_tuples?

    df.index = p.MultiIndex.from_tuples(
        [(nl, A, B) for nl, (A, B) in
            zip(['Foo'] * len(df), df.index)],
        names=['FirstLevel', 'A', 'B'])
    

    Like cxrodgers' solution, this method is flexible and avoids modifying the dataframe's underlying array.
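
The snippet above hard-codes two existing levels. A sketch that works for any number of levels, assuming the existing index is already a MultiIndex (so iterating over it yields tuples); the sample frame is made up for illustration:

```python
import pandas as pd

# Hypothetical frame with a two-level index (A, B).
df = pd.DataFrame(
    {"Vals": [1.0, 2.0, 3.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")],
        names=["A", "B"],
    ),
)

# Prefix every existing index tuple with the new label and keep the old level names.
new_index = pd.MultiIndex.from_tuples(
    [("Foo", *old) for old in df.index],
    names=["FirstLevel", *df.index.names],
)

result = df.copy()        # copy only so the original df stays untouched in this demo
result.index = new_index
print(result)
```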

  • 2020-11-28 03:01

    I made a little function out of cxrodgers' answer, which IMHO is the best solution since it works purely on an index, independent of any data frame or series.

    There is one fix I added: the to_frame() method will invent new names for index levels that don't have one. As such the new index will have names that don't exist in the old index. I added some code to revert this name-change.
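
To see the renaming described above, here is a small sketch with an unnamed two-level index (the sample values are made up): to_frame() falls back to the level positions as column names, and a plain round trip through from_frame() keeps those invented names unless the originals are passed back in.

```python
import pandas as pd

# An index whose levels have no names.
idx = pd.MultiIndex.from_product([["x", "y"], [1, 2]])
print(idx.names)          # both names are None

# to_frame() uses the level positions as column names for unnamed levels.
frame = idx.to_frame()
print(list(frame.columns))

# A naive round trip keeps the invented names ...
roundtrip = pd.MultiIndex.from_frame(frame)
print(roundtrip.names)

# ... while passing the original names restores them.
restored = pd.MultiIndex.from_frame(frame, names=list(idx.names))
print(restored.names)
```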

    The code is below. I've used it myself for a while and it seems to work fine. If you find any issues or edge cases, I'd be happy to adjust my answer.

    from typing import Any, Optional
    
    import pandas as pd
    
    def _handle_insert_loc(loc: int, n: int) -> int:
        """
        Computes the insert index from the right if loc is negative for a given size of n.
        """
        return n + loc + 1 if loc < 0 else loc
    
    
    def add_index_level(old_index: pd.Index, value: Any, name: Optional[str] = None, loc: int = 0) -> pd.MultiIndex:
        """
        Expand a (multi)index by adding a level to it.
    
        :param old_index: The index to expand
        :param name: The name of the new index level
        :param value: Scalar or list-like, the values of the new index level
        :param loc: Where to insert the level in the index, 0 is at the front, negative values count back from the rear end
        :return: A new multi-index with the new level added
        """
        loc = _handle_insert_loc(loc, len(old_index.names))
        old_index_df = old_index.to_frame()
        old_index_df.insert(loc, name, value)
        new_index_names = list(old_index.names)  # sometimes new index level names are invented when converting to a df,
        new_index_names.insert(loc, name)        # here the original names are reconstructed
        new_index = pd.MultiIndex.from_frame(old_index_df, names=new_index_names)
        return new_index
    

    It passed the following unittest code:

    import unittest
    
    import numpy as np
    import pandas as pd
    
    class TestPandaStuff(unittest.TestCase):
    
        def test_add_index_level(self):
            df = pd.DataFrame(data=np.random.normal(size=(6, 3)))
            i1 = add_index_level(df.index, "foo")
    
            # it does not invent new index names where names are missing
            self.assertEqual([None, None], i1.names)
    
            # the new level values are added
            self.assertTrue(np.all(i1.get_level_values(0) == "foo"))
            self.assertTrue(np.all(i1.get_level_values(1) == df.index))
    
            # names stay missing for the old levels after adding named levels
            i2 = add_index_level(i1, ["x", "y"]*3, name="xy", loc=2)
            i3 = add_index_level(i2, ["a", "b", "c"]*2, name="abc", loc=-1)
            self.assertEqual([None, None, "xy", "abc"], i3.names)
    
            # the new level values are added
            self.assertTrue(np.all(i3.get_level_values(0) == "foo"))
            self.assertTrue(np.all(i3.get_level_values(1) == df.index))
            self.assertTrue(np.all(i3.get_level_values(2) == ["x", "y"]*3))
            self.assertTrue(np.all(i3.get_level_values(3) == ["a", "b", "c"]*2))
    
            # df.index = i3
            # print()
            # print(df)
    
  • 2020-11-28 03:05

    You can first add it as a normal column and then append it to the current index, so:

    df['Firstlevel'] = 'Foo'
    df.set_index('Firstlevel', append=True, inplace=True)
    

    And change the order if needed; reorder_levels returns a new frame, so assign the result:

    df = df.reorder_levels(['Firstlevel', 'A', 'B'])
    

    Which results in:

                          Vals
    Firstlevel A  B           
    Foo        a1 b1  0.871563
                  b2  0.494001
               a2 b3 -0.167811
               a3 b4 -1.353409
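
The same steps can also be chained without mutating df in place. This is a sketch using a made-up frame with index levels A and B; assign() adds the temporary column on a copy, so the original frame keeps its two-level index.

```python
import pandas as pd

# Hypothetical frame with a two-level index (A, B).
df = pd.DataFrame(
    {"Vals": [0.5, 1.5, 2.5, 3.5]},
    index=pd.MultiIndex.from_tuples(
        [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")],
        names=["A", "B"],
    ),
)

out = (
    df.assign(Firstlevel="Foo")                # temporary column holding the new label
      .set_index("Firstlevel", append=True)    # appended as the innermost level
      .reorder_levels(["Firstlevel", "A", "B"])  # move it to the front
)
print(out)
```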
    
  • 2020-11-28 03:11

    I think this is a more general solution:

    import pandas as pd
    
    # Convert index to dataframe
    old_idx = df.index.to_frame()
    
    # Insert new level at specified location (0 = outermost);
    # new_level_values is a scalar or a list-like with one entry per row
    old_idx.insert(0, 'new_level_name', new_level_values)
    
    # Convert back to MultiIndex
    df.index = pd.MultiIndex.from_frame(old_idx)
    

    Some advantages over the other answers:

    • The new level can be added at any location, not just the top.
    • It is purely a manipulation on the index and doesn't require manipulating the data, like the concatenation trick.
    • It doesn't require adding a column as an intermediate step, which can break multi-level column indexes.
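
To illustrate the first point, here is a sketch that inserts the new level between two existing ones rather than at the top. The sample frame, the level name 'Middle', and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a two-level index (A, B).
df = pd.DataFrame(
    {"Vals": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")],
        names=["A", "B"],
    ),
)

# Index -> frame, insert the new level at position 1 (between A and B), frame -> index.
old_idx = df.index.to_frame()
old_idx.insert(1, "Middle", ["m1", "m2", "m3", "m4"])

result = df.copy()        # copy only so the original df stays untouched in this demo
result.index = pd.MultiIndex.from_frame(old_idx)
print(result)
```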