I have just started learning the basics of pandas, and there is one thing which made me think.
import pandas as pd
data = pd.DataFrame({'Column1': ['A', 'B
TL;DR: ranges have __getitem__ and __len__, while maps don't.
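As a quick illustration of that difference, here is a minimal sketch using only the standard library. The helper name supports is made up for this example; it simply checks whether an object's type defines a given special method.

```python
# range is a lazy sequence: it knows its length and supports indexing.
r = range(10)
range_len = len(r)
range_item = r[3]

# map is a one-shot iterator: it has neither a length nor indexing.
m = map(lambda x: x + 1, range(10))


def supports(obj, name):
    """Hypothetical helper: does type(obj) define the special method `name`?"""
    return hasattr(type(obj), name)


range_support = (supports(r, "__len__"), supports(r, "__getitem__"))
map_support = (supports(m, "__len__"), supports(m, "__getitem__"))

# Asking a map for its length fails outright.
try:
    len(m)
    map_len_error = None
except TypeError as exc:
    map_len_error = exc

print("range:", range_support)
print("map:  ", map_support)
print("len(map) raised:", type(map_len_error).__name__)
```

Both checks come back True for the range and False for the map, and len() on the map raises TypeError.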
I'm assuming that the syntax for creating a new DataFrame column is in some way syntactic sugar for pandas.DataFrame.insert, which accepts for its value argument a "scalar, Series, or array-like".
Given that, it seems the question reduces to "Why does pandas treat a list and a range as array-like, but not a map?"
See: numpy: formal definition of "array_like" objects?.
If you try making an array out of a range, it works fine, because range is close enough to array-like, but you can't do so with a map.
>>> import numpy as np
>>> foo = np.array(range(10))
>>> bar = np.array(map(lambda x: x + 1, range(10)))
>>> foo
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> bar
array(<map object at 0x...>, dtype=object)
map is not "array-like", while range is.
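To make the difference easier to see than in the reprs, the sketch below compares the shape and dtype of the two arrays, and shows two common ways to get a real array out of a map: materializing it with list() first, or using np.fromiter with an explicit dtype. The variable names are just for illustration.

```python
import numpy as np

from_range = np.array(range(10))
from_map = np.array(map(lambda x: x + 1, range(10)))

# The range became a 1-d array of ten integers.
print(from_range.shape)

# The map was wrapped whole: a 0-d array holding one Python object.
print(from_map.shape, from_map.dtype)

# Workaround 1: materialize the iterator into a list first.
from_list = np.array(list(map(lambda x: x + 1, range(10))))

# Workaround 2: let numpy consume the iterator, given an explicit dtype.
from_iter = np.fromiter(map(lambda x: x + 1, range(10)), dtype=int)

print(from_list)
print(from_iter)
```

The same idea should carry over to the DataFrame column: wrapping the map in list() before assigning it gives pandas something it can treat as array-like.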
Looking further into PyArray_GetArrayParamsFromObject, referred to in the linked answer, the end of the function calls PySequence_Check. That function is part of Python's own C API rather than numpy, and there's a good discussion of it on Stack Overflow: What is Python's sequence protocol?
Earlier, in the same file, it says:
/*
 * PySequence_Check detects whether an old type object is a
 * sequence by the presence of the __getitem__ attribute, and
 * for new type objects that aren't dictionaries by the
 * presence of the __len__ attribute as well. In either case it
 * is possible to have an object that tests as a sequence but
 * doesn't behave as a sequence and consequently, the
 * PySequence_GetItem call can fail. When that happens and the
 * object looks like a dictionary, we truncate the dimensions
 * and set the object creation flag, otherwise we pass the
 * error back up the call chain.
 */
This seems to be a major part of "array-like": any object that has __getitem__ and __len__ is array-like. range has both, while map has neither.
__getitem__ and __len__ are necessary and sufficient to make a sequence, and therefore get the column to display as you wish instead of as a single object.
Try this:
class Column(object):
    def __len__(self):
        return 5

    def __getitem__(self, index):
        # Raising IndexError for out-of-range indices is what ends iteration.
        if 0 <= index < 5:
            return index + 5
        else:
            raise IndexError

col = Column()
a_col = np.array(col)  # array([5, 6, 7, 8, 9])
If you leave out either __getitem__() or __len__(), numpy will still create an array for you, but it will contain the object itself, and it won't iterate through it for you. (Thanks to user2357112 for correcting me. In a slightly simpler example, I thought __iter__ was required. It's not. The __getitem__ function does need to make sure the index is in range, though.)
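The sketch below puts those cases side by side. The class names Full, NoLen and NoGetItem are invented for this example; the behaviour shown is what I'd expect from numpy, but treat it as an illustration rather than a specification.

```python
import numpy as np


class Full:
    """Defines both __len__ and __getitem__, but no __iter__."""

    def __len__(self):
        return 3

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 10
        raise IndexError(index)


class NoLen:
    """Defines __getitem__ only."""

    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 10
        raise IndexError(index)


class NoGetItem:
    """Defines __len__ only."""

    def __len__(self):
        return 3


full = np.array(Full())
no_len = np.array(NoLen())
no_getitem = np.array(NoGetItem())

# Only the class with both methods is unpacked into its elements.
print(full.shape, no_len.shape, no_getitem.shape)

# Plain iteration also works without __iter__: Python falls back to
# calling __getitem__ with 0, 1, 2, ... until IndexError is raised.
as_list = list(Full())
print(as_list)
```

Without the range check in __getitem__, that fallback iteration would never see an IndexError and so would never stop.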