Python for loops and comprehension for loops

无人及你 2021-01-24 03:45

Can someone explain to me why these two statements (the for loop and the comprehension) would return two different answers? I thought they were the same, just different ways of

3 answers
  • 2021-01-24 04:11

    You are setting the whole column (vector) in each iteration step:

    Top152['HighRenew'] = 1
    

    Try this vectorized approach instead:

    Top152['HighRenew'] = (Top152['% Renewable'] >= Top152['% Renewable'].median()).astype(int)
    

    So your function may be implemented as follows:

    def answer_ten():
        return (Top15['% Renewable'] >= Top15['% Renewable'].median()).astype(int)
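
    To make the point about overwriting the whole column concrete, here is a minimal sketch. The frame `df` and its values are made up for illustration (they are not the asker's `Top152` data), and the loop is only a guess at the shape of the original code:

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame; the values are invented.
df = pd.DataFrame(
    {"% Renewable": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=["A", "B", "C", "D", "E"],
)
median = df["% Renewable"].median()  # 30.0

# Pitfall: assigning a scalar to df["HighRenew"] overwrites every row,
# so only the value from the last iteration survives.
for value in df["% Renewable"]:
    if value >= median:
        df["HighRenew"] = 1
    else:
        df["HighRenew"] = 0
loop_result = df["HighRenew"].tolist()

# Vectorized: compare the whole column at once, then cast the mask to int.
df["HighRenew"] = (df["% Renewable"] >= median).astype(int)
vectorized_result = df["HighRenew"].tolist()

print(loop_result)        # [1, 1, 1, 1, 1]
print(vectorized_result)  # [0, 0, 1, 1, 1]
```

    The loop ends with all ones only because the last row happens to be above the median; with a different last row it would end with all zeros.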
    
  • 2021-01-24 04:19

    It is better to convert the boolean mask to int, because pandas works fastest with vectorized functions:

    print (Top152['% Renewable']> Top152['% Renewable'].median())
    China                  True
    United States         False
    Japan                 False
    United Kingdom        False
    Russian Federation     True
    Canada                 True
    Germany                True
    India                 False
    France                False
    South Korea           False
    Italy                  True
    Spain                  True
    Iran                  False
    Australia             False
    Brazil                 True
    Name: % Renewable, dtype: bool
    

    def answer_ten():
        # The outer parentheses let the method chain continue on the next line.
        return ((Top152['% Renewable'] > Top152['% Renewable'].median())
                .astype(int).rename('HighRenew'))
    
    
    print (answer_ten())
    China                 1
    United States         0
    Japan                 0
    United Kingdom        0
    Russian Federation    1
    Canada                1
    Germany               1
    India                 0
    France                0
    South Korea           0
    Italy                 1
    Spain                 1
    Iran                  0
    Australia             0
    Brazil                1
    Name: HighRenew, dtype: int32
    

    A for loop is also possible with iterrows, but it is very slow; the first solution is faster:

    def answer_ten():
        for idx, x in Top152.iterrows():
            if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                Top152.loc[idx, 'HighRenew'] = 1
            else:
                Top152.loc[idx, 'HighRenew'] = 0
        return Top152['HighRenew'].astype(int)
    
    print (answer_ten())
    China                 1
    United States         0
    Japan                 0
    United Kingdom        0
    Russian Federation    1
    Canada                1
    Germany               1
    India                 0
    France                1
    South Korea           0
    Italy                 1
    Spain                 1
    Iran                  0
    Australia             0
    Brazil                1
    Name: HighRenew, dtype: int32
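
    Note that the two outputs above differ for France: the vectorized version compares with `>` and the loop with `>=`, so the row whose value equals the median flips. A small sketch of that effect, using an invented three-row Series rather than the real data:

```python
import pandas as pd

# Hypothetical data with an odd number of rows, so the median is one of the values.
s = pd.Series([5.0, 15.0, 25.0], index=["X", "Y", "Z"], name="% Renewable")
median = s.median()  # 15.0

# Strict comparison excludes the median row; inclusive comparison keeps it.
strict = (s > median).astype(int).rename("HighRenew")
inclusive = (s >= median).astype(int).rename("HighRenew")

print(strict.tolist())     # [0, 0, 1]
print(inclusive.tolist())  # [0, 1, 1]
```

    Which operator is right depends on how the exercise defines "at or above the median".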
    

    Timings:

    #[15000 rows x 1 columns]
    Top152 = pd.concat([Top152]*1000).reset_index(drop=True)  
    
    def answer_ten1():
        return (Top152['% Renewable']> Top152['% Renewable'].median()).astype(int).rename('HighRenew')
    
    def answer_ten2():
        for idx, x in Top152.iterrows():
            if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                Top152.loc[idx, 'HighRenew'] = 1
            else:
                Top152.loc[idx, 'HighRenew'] = 0
        return Top152['HighRenew'].astype(int)
    
    
    def answer_ten3():
        Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]
        return Top152['HighRenew']
    
    print (answer_ten1())   
    print (answer_ten2())
    print (answer_ten3())  
    

    In [169]: %timeit (answer_ten1())
    1000 loops, best of 3: 528 µs per loop
    
    In [170]: %timeit answer_ten2()
    1 loop, best of 3: 16 s per loop
    
    In [171]: %timeit (answer_ten3())
    1 loop, best of 3: 2.67 s per loop
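
    If you want to reproduce this kind of comparison outside IPython, something like the following should work with the standard `timeit` module. This is only a sketch: the random data, the function names and the repeat count are my own assumptions, and the comprehension here computes the median once instead of once per element (as `answer_ten3` does), so it will not be as slow as the figure above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 15,000 random percentages, not the asker's Top152 frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({"% Renewable": rng.uniform(0, 100, size=15_000)})


def vectorized():
    col = df["% Renewable"]
    return (col >= col.median()).astype(int).rename("HighRenew")


def comprehension():
    col = df["% Renewable"]
    median = col.median()  # computed once, not once per element
    return pd.Series(
        [1 if x >= median else 0 for x in col],
        index=col.index,
        name="HighRenew",
    )


# Both approaches should agree before their speed is compared.
assert (vectorized() == comprehension()).all()

runs = 20  # arbitrary repeat count chosen for this sketch
t_vec = timeit.timeit(vectorized, number=runs)
t_comp = timeit.timeit(comprehension, number=runs)
print(f"vectorized:    {t_vec / runs:.6f} s per call")
print(f"comprehension: {t_comp / runs:.6f} s per call")
```

    Absolute numbers depend on the machine and the pandas version, so only the relative ordering is worth reading into.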
    
  • 2021-01-24 04:26

    In the second approach you are editing your vector, while the for loop saves it (in the background) to avoid unwanted edits.
