Why is it required to typecast a map into a list to assign it to a pandas series?

梦如初夏 2021-02-14 09:07

I have just started learning the basics of pandas, and there is one thing which made me think.

import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B


        
1 answer
  • 2021-02-14 09:52

    TL;DR: ranges have __getitem__ and __len__, while maps have neither.


    The details

    I'm assuming that the syntax for creating a new DataFrame column is in some way syntactic sugar for pandas.DataFrame.insert, whose value argument accepts a

    scalar, Series, or array-like

    Given that, it seems the question reduces to "Why does pandas treat a list and a range as array-like, but not a map?"
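
    As a rough illustration of that distinction (a sketch only; the frame below and the column names Column2 and Column3 are made up for the example, not taken from the question), a list and a range can both be assigned directly, while a map is materialised with list() first:

```python
import pandas as pd

# Hypothetical data; the question's original frame is not reproduced here.
df = pd.DataFrame({"Column1": ["A", "B", "C"]})

# A range has a length and supports indexing, so it can be assigned as is.
df["Column2"] = range(len(df))

# A map is a one-shot iterator; list() turns it into a real sequence first.
df["Column3"] = list(map(str.lower, df["Column1"]))

print(df)
```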

    See the Stack Overflow question: numpy: formal definition of "array_like" objects?

    If you try making an array out of a range, it works fine, because range is close enough to array-like, but you can't do so with a map.

    >>> import numpy as np
    >>> foo = np.array(range(10))
    >>> bar = np.array(map(lambda x: x + 1, range(10)))
    >>> foo
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> bar
    array(<map object at 0x7f7e553219e8>, dtype=object)

    map is not "array-like", while range is.
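
    If the goal is an ordinary array, one way around this (a minimal sketch; np.fromiter is an alternative I'm adding here, not something the question used) is to materialise the map first:

```python
import numpy as np

# Passing the map directly wraps the iterator itself in a 0-d object array.
wrapped = np.array(map(lambda x: x + 1, range(10)))

# list() consumes the iterator and gives NumPy a real sequence to walk.
from_list = np.array(list(map(lambda x: x + 1, range(10))))

# np.fromiter builds a 1-D array from any iterable, given an explicit dtype.
from_iter = np.fromiter(map(lambda x: x + 1, range(10)), dtype=int)

print(wrapped.ndim, from_list.ndim, from_iter.ndim)
```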

    Looking further into PyArray_GetArrayParamsFromObject, referred to in the linked answer, the end of the function calls PySequence_Check. That function belongs to the CPython C API rather than to NumPy, and there's a good discussion of it on Stack Overflow: What is Python's sequence protocol?

    Earlier, in the same file, it says:

        /*
         * PySequence_Check detects whether an old type object is a
         * sequence by the presence of the __getitem__ attribute, and
         * for new type objects that aren't dictionaries by the
         * presence of the __len__ attribute as well. In either case it
         * is possible to have an object that tests as a sequence but
         * doesn't behave as a sequence and consequently, the
         * PySequence_GetItem call can fail. When that happens and the
         * object looks like a dictionary, we truncate the dimensions
         * and set the object creation flag, otherwise we pass the
         * error back up the call chain.
         */
    

    This seems to be a major part of "array-like": any object that has __getitem__ and __len__ is array-like. range has both, while map has neither.
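
    That difference is easy to check from plain Python. The snippet below is only a sketch; the variable names are mine, and collections.abc.Sequence is used as a rough stand-in for "behaves like a sequence":

```python
from collections.abc import Sequence

numbers = range(10)
mapped = map(lambda x: x + 1, numbers)

# Report which of the two sequence methods each object provides.
for name, obj in (("range", numbers), ("map", mapped)):
    print(
        name,
        hasattr(obj, "__getitem__"),
        hasattr(obj, "__len__"),
        isinstance(obj, Sequence),
    )
```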

    Try it yourself!

    For this purpose, __getitem__ and __len__ are necessary and sufficient to make a sequence, so with both defined the column displays as you wish instead of as a single object.

    Try this:

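    import numpy as np

    class Column(object):
        """Minimal sequence: defines only __len__ and __getitem__."""
        def __len__(self):
            return 5
        def __getitem__(self, index):
            # Raise IndexError outside the valid range so iteration stops.
            if 0 <= index < 5:
                return index + 5
            else:
                raise IndexError

    col = Column()
    a_col = np.array(col)  # NumPy walks the sequence element by element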
    
    • If either __getitem__() or __len__() is missing, numpy will still create an array for you, but it will hold the object itself as a single element, and it won't iterate through it for you.
    • If you have both methods, it displays the way you want.
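
    The first bullet can be checked with a sketch like this one (NoGetItem is a hypothetical class that only defines __len__):

```python
import numpy as np

class NoGetItem:
    """Hypothetical class with __len__ but no __getitem__."""
    def __len__(self):
        return 5

obj = NoGetItem()
arr = np.array(obj)

# NumPy does not see a sequence, so the object itself becomes the only element.
print(arr.ndim, arr.dtype)
```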

    (Thanks to user2357112 for correcting me. In a slightly simpler example, I thought __iter__ was required. It's not. The __getitem__ function does need to make sure the index is in range, though.)
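
    The point about __iter__ can be seen in plain Python as well (a small sketch; Column here mirrors the class above). When a class has no __iter__, iter() falls back to calling __getitem__ with 0, 1, 2, and so on until IndexError is raised, which is why the range check matters:

```python
class Column:
    def __len__(self):
        return 5

    def __getitem__(self, index):
        # Without this bound, iteration would never terminate.
        if 0 <= index < 5:
            return index + 5
        raise IndexError(index)

# No __iter__ is defined, yet list() can still walk the object.
items = list(Column())
print(items)
```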
