Difference between map, applymap and apply methods in Pandas

刺人心 2020-11-22 03:00

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods.

10 answers
  • 2020-11-22 03:04

    Just wanted to point out, as I struggled with this for a bit

    def f(x):
        if x < 0:
            x = 0
        elif x > 100000:
            x = 100000
        return x
    
    df.applymap(f)
    df.describe()
    

    This does not modify the DataFrame itself; the result has to be reassigned:

    df = df.applymap(f)
    df.describe()
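
    A minimal sketch of the point above, using made-up data. It assumes a pandas version where either DataFrame.applymap or its newer name DataFrame.map (pandas 2.1 and later) is available, and it also shows DataFrame.clip, a different vectorised technique that gives the same clamped values here.

```python
import pandas as pd

# Hypothetical data, chosen only to exercise both bounds.
df = pd.DataFrame({"a": [-5, 50, 200000], "b": [10, -1, 100001]})

def f(x):
    # Clamp a single value to the range [0, 100000].
    return min(max(x, 0), 100000)

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

clamped = elementwise(f)                   # returns a new DataFrame
clipped = df.clip(lower=0, upper=100000)   # vectorised alternative

print(df["a"].tolist())        # the original frame is unchanged
print(clamped["a"].tolist())   # the clamped copy
```

    Nothing changes in df until the result is assigned back to it.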
    
  • 2020-11-22 03:04

    My understanding:

    From the function's point of view:

    If the function needs to compare values within a column or row, use apply.

    e.g.: lambda x: x.max()-x.mean().

    If the function is to be applied to each element:

    1> If a single column/row has been selected, use apply

    2> If it applies to the entire DataFrame, use applymap

    majority = lambda x : x > 17
    df2['legal_drinker'] = df2['age'].apply(majority)
    
    def times10(x):
      if type(x) is int:
        x *= 10 
      return x
    df2.applymap(times10)
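
    The three cases above can be sketched side by side. The frame and its values are invented for illustration, and the element-wise call falls back from DataFrame.map to applymap depending on the installed pandas version.

```python
import pandas as pd

# Hypothetical frame; the "age" column name follows the answer above.
df2 = pd.DataFrame({"age": [15, 18, 30], "score": [1, 2, 6]})

# Column-wise: the function receives a whole column (a Series).
spread = df2.apply(lambda col: col.max() - col.mean())

# Element-wise on one selected column: the function receives one value at a time.
legal = df2["age"].apply(lambda x: x > 17)

# Element-wise on the whole frame.
elementwise = df2.map if hasattr(pd.DataFrame, "map") else df2.applymap
times10 = elementwise(lambda x: x * 10)

print(spread)
print(legal.tolist())
print(times10)
```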
    
  • 2020-11-22 03:06

    @jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

    frame.apply(np.sqrt)
    Out[102]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
    
    frame.applymap(np.sqrt)
    Out[103]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
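
    One way to see why the two outputs agree is to check what each method hands to the function. The sketch below uses a small invented frame: apply passes a whole column, the element-wise method passes single values, and np.sqrt is a NumPy ufunc that accepts both.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with perfect squares so the roots are easy to read.
frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

# apply hands each column to the function as a Series...
received_by_apply = frame.apply(lambda col: type(col).__name__)

# ...while the element-wise method hands over one value at a time.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
received_elementwise = elementwise(lambda x: isinstance(x, float))

# np.sqrt works on a Series and on a scalar, so both routes give the same numbers.
same = np.allclose(frame.apply(np.sqrt), elementwise(np.sqrt))

print(received_by_apply.tolist())
print(same)
```

    A function that only works on scalars (for example one using an if statement on its argument) would fail with apply on a DataFrame, which is where the element-wise method is needed.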
    
  • 2020-11-22 03:06

    Based on the answer of cs95

    • map is defined on Series ONLY
    • applymap is defined on DataFrames ONLY
    • apply is defined on BOTH

    Some examples:

    In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
    
    In [4]: frame
    Out[4]:
                b         d         e
    Utah    0.129885 -0.475957 -0.207679
    Ohio   -2.978331 -1.015918  0.784675
    Texas  -0.256689 -0.226366  2.262588
    Oregon  2.605526  1.139105 -0.927518
    
    In [5]: myformat=lambda x: f'{x:.2f}'
    
    In [6]: frame.d.map(myformat)
    Out[6]:
    Utah      -0.48
    Ohio      -1.02
    Texas     -0.23
    Oregon     1.14
    Name: d, dtype: object
    
    In [7]: frame.d.apply(myformat)
    Out[7]:
    Utah      -0.48
    Ohio      -1.02
    Texas     -0.23
    Oregon     1.14
    Name: d, dtype: object
    
    In [8]: frame.applymap(myformat)
    Out[8]:
                b      d      e
    Utah     0.13  -0.48  -0.21
    Ohio    -2.98  -1.02   0.78
    Texas   -0.26  -0.23   2.26
    Oregon   2.61   1.14  -0.93
    
    In [9]: frame.apply(lambda x: x.apply(myformat))
    Out[9]:
                b      d      e
    Utah     0.13  -0.48  -0.21
    Ohio    -2.98  -1.02   0.78
    Texas   -0.26  -0.23   2.26
    Oregon   2.61   1.14  -0.93
    
    
    In [10]: myfunc=lambda x: x**2
    
    In [11]: frame.applymap(myfunc)
    Out[11]:
                b         d         e
    Utah    0.016870  0.226535  0.043131
    Ohio    8.870453  1.032089  0.615714
    Texas   0.065889  0.051242  5.119305
    Oregon  6.788766  1.297560  0.860289
    
    In [12]: frame.apply(myfunc)
    Out[12]:
                b         d         e
    Utah    0.016870  0.226535  0.043131
    Ohio    8.870453  1.032089  0.615714
    Texas   0.065889  0.051242  5.119305
    Oregon  6.788766  1.297560  0.860289
    
  • 2020-11-22 03:09

    Comparing map, applymap and apply: Context Matters

    First major difference: DEFINITION

    • map is defined on Series ONLY
    • applymap is defined on DataFrames ONLY
    • apply is defined on BOTH

    Second major difference: INPUT ARGUMENT

    • map accepts dicts, Series, or callable
    • applymap and apply accept callables only

    Third major difference: BEHAVIOR

    • map is elementwise for Series
    • applymap is elementwise for DataFrames
    • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depend on the function.

    Fourth major difference (the most important one): USE CASE

    • map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
    • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
    • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))
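
    The three use cases above might look like this on toy data. The column contents are invented, count_words is a hypothetical stand-in for a non-vectorisable function such as a tokenizer, and the element-wise call uses DataFrame.map where it exists (pandas 2.1 and later) and applymap otherwise.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, 2, 3], "B": [" x ", "y ", " z"], "C": ["p ", " q", "r"]})

# map with a dict: translate values from one domain to another.
letters = df["A"].map({1: "a", 2: "b", 3: "c"})

# Element-wise transformation across several columns.
sub = df[["B", "C"]]
elementwise = sub.map if hasattr(pd.DataFrame, "map") else sub.applymap
stripped = elementwise(str.strip)

# apply with an arbitrary Python function that has no vectorised form.
def count_words(sentence):
    return len(sentence.split())

sentences = pd.Series(["one two", "three four five"])
word_counts = sentences.apply(count_words)

print(letters.tolist())
print(stripped)
print(word_counts.tolist())
```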

    Summarising

    Footnotes

    1. map when passed a dictionary/Series will map elements based on the keys in that dictionary/Series. Missing values will be recorded as NaN in the output.
    2. applymap in more recent versions has been optimised for some operations. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
    3. map is optimised for elementwise mappings and transformation. Operations that involve dictionaries or Series will enable pandas to use faster code paths for better performance.
    4. Series.apply returns a scalar for aggregating operations, Series otherwise. Similarly for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
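
    Footnotes 1 and 4 can be checked with a short sketch on invented data. It shows only the DataFrame.apply side of footnote 4: a reducing function gives one value per column, while a function that keeps the column length gives back a DataFrame.

```python
import pandas as pd

s = pd.Series([1, 2, 4])

# Keys absent from the mapping (here 4) come back as NaN.
mapped = s.map({1: "a", 2: "b", 3: "c"})

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# A reducing function yields one value per column, so the result is a Series.
totals = df.apply(lambda col: col.sum())

# A function that returns a same-length Series yields a DataFrame.
doubled = df.apply(lambda col: col * 2)

print(mapped.isna().tolist())
print(totals)
print(doubled)
```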
  • 2020-11-22 03:13

    Adding to the other answers, in a Series there are also map and apply.

    Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

    In [40]: p=pd.Series([1,2,3])
    In [41]: p
    Out[41]:
    0    1
    1    2
    2    3
    dtype: int64
    
    In [42]: p.apply(lambda x: pd.Series([x, x]))
    Out[42]: 
       0  1
    0  1  1
    1  2  2
    2  3  3
    
    In [43]: p.map(lambda x: pd.Series([x, x]))
    Out[43]: 
    0    0    1
    1    1
    dtype: int64
    1    0    2
    1    2
    dtype: int64
    2    0    3
    1    3
    dtype: int64
    dtype: object
    

    Also if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.

    series.apply(download_file_for_every_element) 
    

    Map can use not only a function, but also a dictionary or another series. Let's say you want to manipulate permutations.

    Take

    1 2 3 4 5
    2 1 4 5 3
    

    The square of this permutation is

    1 2 3 4 5
    1 2 5 3 4
    

    You can compute it using map. The session below writes the same permutation with zero-based labels (each number reduced by one). Not sure if self-application is documented, but it works in 0.15.1.

    In [39]: p=pd.Series([1,0,3,4,2])
    
    In [40]: p.map(p)
    Out[40]: 
    0    0
    1    1
    2    4
    3    2
    4    3
    dtype: int64
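
    The same computation can be written with the one-based labels used in the text, which makes it easier to compare against the permutation written out above. This is a sketch that relies on Series.map looking each value up in the index of the Series passed as the argument.

```python
import pandas as pd

# One-based labels: the index is the top row, the values are the bottom row.
p = pd.Series([2, 1, 4, 5, 3], index=[1, 2, 3, 4, 5])

# Each value of p is looked up in p's own index, which composes p with itself.
squared = p.map(p)

print(squared.tolist())
```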
    