Python for loops and comprehension for loops

无人及你 2021-01-24 03:45

Can someone explain to me why these two statements (the for loop and the comprehension) return two different answers? I thought they were the same, just different ways of

3 Answers
  •  臣服心动
    2021-01-24 04:19

    It is better to convert the boolean mask to int, because pandas works fastest with vectorized functions:

    print (Top152['% Renewable']> Top152['% Renewable'].median())
    China                  True
    United States         False
    Japan                 False
    United Kingdom        False
    Russian Federation     True
    Canada                 True
    Germany                True
    India                 False
    France                False
    South Korea           False
    Italy                  True
    Spain                  True
    Iran                  False
    Australia             False
    Brazil                 True
    Name: % Renewable, dtype: bool
    

    def answer_ten():
        # Wrap the whole chain in parentheses so it can continue on the next line.
        return ((Top152['% Renewable'] > Top152['% Renewable'].median())
                .astype(int).rename('HighRenew'))
    
    
    print (answer_ten())
    China                 1
    United States         0
    Japan                 0
    United Kingdom        0
    Russian Federation    1
    Canada                1
    Germany               1
    India                 0
    France                0
    South Korea           0
    Italy                 1
    Spain                 1
    Iran                  0
    Australia             0
    Brazil                1
    Name: HighRenew, dtype: int32
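
The vectorized approach above can be tried without the original data. The following is a minimal sketch on a small, made-up DataFrame; the name df, the row labels, and the values are assumptions for illustration and are not the Top152 data.

```python
import pandas as pd

# Hypothetical stand-in for Top152; the values are made up for illustration.
df = pd.DataFrame(
    {"% Renewable": [19.8, 11.6, 10.2, 17.0, 15.8]},
    index=["A", "B", "C", "D", "E"],
)

# Compute the median once; for these five values it is 15.8.
median = df["% Renewable"].median()

# Comparing a Series with a scalar yields a boolean mask;
# astype(int) turns True/False into 1/0 in one vectorized step.
high_renew = (df["% Renewable"] > median).astype(int).rename("HighRenew")
print(high_renew)
```

The comparison and the cast each run over the whole column at once, so no Python-level loop is involved.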
    

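    A for loop with iterrows is also possible, but it is very slow; the first solution is faster. Note that this version compares with >= instead of >, so the row whose value equals the median (France here) gets 1 rather than 0: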

    def answer_ten():
        for idx, x in Top152.iterrows():
            if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                Top152.loc[idx, 'HighRenew'] = 1
            else:
                Top152.loc[idx, 'HighRenew'] = 0
        return Top152['HighRenew'].astype(int)
    
    print (answer_ten())
    China                 1
    United States         0
    Japan                 0
    United Kingdom        0
    Russian Federation    1
    Canada                1
    Germany               1
    India                 0
    France                1
    South Korea           0
    Italy                 1
    Spain                 1
    Iran                  0
    Australia             0
    Brazil                1
    Name: HighRenew, dtype: int32
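
The two outputs above differ only for France, which is consistent with one version using > and the other >=. The sketch below reproduces that effect on made-up data; the Series s and its values are hypothetical, chosen so that an odd number of rows makes the median equal to one of the values.

```python
import pandas as pd

# Hypothetical data with an odd number of rows, so the median is one of the values.
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=list("abcde"), name="% Renewable")
median = s.median()  # 30.0, the value of row "c"

strict = (s > median).astype(int)      # row "c" is not above the median
inclusive = (s >= median).astype(int)  # row "c" counts as at-or-above the median

# Rows where the two versions disagree: only the row equal to the median.
differs = s[strict != inclusive].index.tolist()
print(differs)

# With the same operator, a list comprehension matches the vectorized result.
comprehension = pd.Series([1 if x >= median else 0 for x in s], index=s.index)
print(comprehension.equals(inclusive))
```

So a for loop and a comprehension are expected to agree when they apply the same comparison; a mismatch usually points at a difference in the condition rather than in the looping construct.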
    

    Timings:

    #[15000 rows x 1 columns]
    Top152 = pd.concat([Top152]*1000).reset_index(drop=True)  
    
    def answer_ten1():
        return (Top152['% Renewable']> Top152['% Renewable'].median()).astype(int).rename('HighRenew')
    
    def answer_ten2():
        for idx, x in Top152.iterrows():
            if Top152.loc[idx, '% Renewable'] >= Top152['% Renewable'].median():
                Top152.loc[idx, 'HighRenew'] = 1
            else:
                Top152.loc[idx, 'HighRenew'] = 0
        return Top152['HighRenew'].astype(int)
    
    
    def answer_ten3():
        Top152['HighRenew'] = [1 if x >= Top152['% Renewable'].median() else 0 for x in Top152['% Renewable']]
        return Top152['HighRenew']
    
    print (answer_ten1())   
    print (answer_ten2())
    print (answer_ten3())  
    

    In [169]: %timeit (answer_ten1())
    1000 loops, best of 3: 528 µs per loop
    
    In [170]: %timeit answer_ten2()
    1 loop, best of 3: 16 s per loop
    
    In [171]: %timeit (answer_ten3())
    1 loop, best of 3: 2.67 s per loop
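
One likely contributor to the slow comprehension timing is that the median is recomputed for every element. The sketch below compares that with computing the median once, using the standard timeit module on randomly generated data; the function names, the 2,000-row size, and the data are assumptions for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 2,000 random percentages (not the Top152 data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"% Renewable": rng.uniform(0, 100, size=2_000)})


def median_in_loop():
    # Recomputes the median for every element, as in answer_ten3.
    col = df["% Renewable"]
    return [1 if x >= col.median() else 0 for x in col]


def median_hoisted():
    # Computes the median once, then loops in Python.
    col = df["% Renewable"]
    median = col.median()
    return [1 if x >= median else 0 for x in col]


def vectorized():
    # Lets pandas do the comparison over the whole column.
    col = df["% Renewable"]
    return (col >= col.median()).astype(int).tolist()


# All three variants should produce the same flags.
assert median_in_loop() == median_hoisted() == vectorized()

for fn in (median_in_loop, median_hoisted, vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds / 3:.5f} s per call")
```

Hoisting the median out of the loop leaves the result unchanged and removes the repeated work; the vectorized form avoids the Python-level loop as well.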
    
