How to efficiently compare rows in a pandas DataFrame?

自闭症患者 2021-02-10 11:28

I have a pandas dataframe containing a record of lightning strikes with timestamps and global positions in the following format:

Index      Date      Time                


        
4 Answers
  •  春和景丽
    2021-02-10 12:06

    This is one of those problems that seems easy initially, but the more you think about it the more your head melts! We essentially have a three-dimensional (Lat, Lon, Time) clustering problem, followed by filtering based on cluster size. There are a number of questions a little like this (though more abstract), and the responses tend to involve scipy. Check out this one. I would also check out fuzzy c-means clustering. Here is the skfuzzy example.
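As a rough illustration of "cluster, then filter on cluster size", here is a minimal sketch using scipy's hierarchical clustering. The column names (`lat`, `lon`, `t_seconds`), the sample rows, and the scaling constants (0.1 degree, 600 seconds) are all assumptions made up for this example, and plain Euclidean distance on degrees is only an approximation of real distance.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "lat": [10.00, 10.01, 10.02, 45.00],
    "lon": [20.00, 20.01, 20.00, -70.00],
    "t_seconds": [0, 60, 120, 50000],
})

# Rescale each axis so that a distance of 1.0 means "close".
# Assumed scales: 0.1 degree in space, 600 seconds in time.
features = np.column_stack([
    df["lat"] / 0.1,
    df["lon"] / 0.1,
    df["t_seconds"] / 600,
])

# Single-linkage clustering, cut at a distance of 1.0 in the scaled space.
Z = linkage(features, method="single")
df["cluster"] = fcluster(Z, t=1.0, criterion="distance")

# Filter on cluster size: rows that are alone in their cluster.
sizes = df.groupby("cluster")["cluster"].transform("size")
isolated = df[sizes == 1]
```

Here `isolated` holds the strikes with no neighbour in space and time. Note that `linkage` builds a full pairwise distance matrix, so this only suits moderately sized frames.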

    In your case though, the geodesic distance might be key, in which case you might not want to disregard computing distance. The high-maths examples sort of miss the point.
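If you do want real distances, one common approximation is the haversine (great-circle) formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km, so it is not a true ellipsoidal geodesic; the function name `haversine_km` is just chosen for this example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees.

    Works on scalars or NumPy arrays / pandas Series (vectorised).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
```

Because it is vectorised, you can pass whole columns at once instead of looping over rows.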

    If accuracy is not important there may be more basic ways of doing it, like creating arbitrary time 'bins' using pandas.cut or similar. There would be an optimum bin size that balances speed against accuracy. For instance, if you cut into bins of t/4 (1800 seconds) and treat a gap of 4 bins as being far apart in time, then the actual time difference could be anywhere from 5401 to 8999 seconds. An example of cutting. Applying something similar to the lon and lat co-ordinates, and doing calculations on the approximate values, will be faster.
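A minimal sketch of the time-binning idea with `pandas.cut`, assuming 1800-second bins as in the example above. The `timestamp` column name and the sample values are assumptions, since the original table is not fully shown.

```python
import numpy as np
import pandas as pd

BIN_SECONDS = 1800  # bin width from the example above (t/4)

# Hypothetical sample data; the column name is an assumption.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00:10",
        "2021-01-01 00:20:00",
        "2021-01-01 00:31:00",
        "2021-01-01 02:05:00",
    ]),
})

# Seconds elapsed since the earliest strike.
seconds = (df["timestamp"] - df["timestamp"].min()).dt.total_seconds()

# Bin edges every BIN_SECONDS, extended far enough to cover the latest strike.
edges = np.arange(0, seconds.max() + BIN_SECONDS + 1, BIN_SECONDS)

# labels=False returns the integer bin index; right=False makes bins [start, end).
df["time_bin"] = pd.cut(seconds, bins=edges, right=False, labels=False)

# Two strikes count as "far apart in time" if their bin indices differ by 4 or more.
far_apart = abs(df["time_bin"].iloc[3] - df["time_bin"].iloc[0]) >= 4
```

Comparing small integer bin indices is much cheaper than comparing every pair of timestamps, at the cost of the uncertainty described above. The same pattern could be applied to rounded lat and lon values.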

    Hope that helps.
