How to efficiently compare rows in a pandas DataFrame?

自闭症患者 2021-02-10 11:28

I have a pandas dataframe containing a record of lightning strikes with timestamps and global positions in the following format:

Index      Date      Time                


        
4 answers
  •  被撕碎了的回忆
    2021-02-10 11:54

    You can use an unsupervised ML algorithm to speed this up. Before applying one, the data needs some transformation. For example:

    1. Merge the "Date" and "Time" columns into a single "Timestamp" feature.
    2. You can use the raw "Lat" and "Lon" values, but it may help to merge them into one feature. A common approach is to calculate the distance from some arbitrary reference point (for example, the center of the area). To give the geolocation more weight, you can measure the distance from more than one reference point. For the distance calculations, you can use get_distance from ysearka.
    3. Scale the data, for example with RobustScaler (http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler). Try with and without it.
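
    The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: the sample rows, the "Time" column name, the date format string, the helper haversine_km (a stand-in, not ysearka's get_distance) and the choice of the mean position as reference point are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Hypothetical sample rows; real data would come from the strike records.
df = pd.DataFrame({
    "Date": ["01/01/2016", "01/01/2016", "01/02/2016", "01/02/2016"],
    "Time": ["00:00:01", "00:05:30", "13:20:00", "13:21:10"],
    "Lat": [10.0, 10.1, -35.2, -35.1],
    "Lon": [20.0, 20.1, 140.5, 140.4],
})

# Step 1: merge date and time into one timestamp, then express it as
# seconds since the first strike so it can serve as a numeric feature.
# The format string is an assumption about how the dates are written.
df["Timestamp"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                 format="%m/%d/%Y %H:%M:%S")
df["t_seconds"] = (df["Timestamp"] - df["Timestamp"].min()).dt.total_seconds()


def haversine_km(lat, lon, ref_lat, ref_lon):
    """Great-circle distance in km from (lat, lon) to a reference point."""
    lat, lon, ref_lat, ref_lon = map(np.radians, (lat, lon, ref_lat, ref_lon))
    a = (np.sin((lat - ref_lat) / 2) ** 2
         + np.cos(lat) * np.cos(ref_lat) * np.sin((lon - ref_lon) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


# Step 2: collapse Lat/Lon into a distance from one arbitrary reference
# point (here the mean position, an assumption for this sketch).
ref_lat, ref_lon = df["Lat"].mean(), df["Lon"].mean()
df["dist_km"] = haversine_km(df["Lat"], df["Lon"], ref_lat, ref_lon)

# Step 3: scale the features; RobustScaler centers on the median and
# scales by the interquartile range.
X_scaled = RobustScaler().fit_transform(df[["t_seconds", "dist_km"]])
```

    Note that a single distance feature discards direction: two strikes on opposite sides of the reference point at the same distance look identical, which is one reason to measure from more than one point.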

    After preprocessing, you can use one of the scikit-learn clustering algorithms (http://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster) to arrange your data into clusters. KMeans is a good starting point.
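
    A minimal KMeans sketch, assuming the preprocessed features are already in a NumPy array. The synthetic data and the choice of two clusters are hypothetical; with real data the number of clusters has to be chosen or tuned.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical preprocessed features: two well-separated synthetic groups,
# each row standing in for (scaled time, scaled distance).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])

# n_clusters=2 is an assumption that matches the synthetic data above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster label per row
```

    Rows that share a label can then be compared with each other instead of comparing every row against every other row.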

    Also, take a look at NearestNeighbors (http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html) for retrieving a specific number of objects in order of similarity.
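
    As a small sketch of that lookup, assuming Euclidean distance on already-preprocessed features (the five sample points and n_neighbors=3 are hypothetical):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical preprocessed feature rows.
X = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.2],
    [5.0, 5.0],
    [5.1, 5.0],
])

# Build the index once, then ask for the 3 closest rows to a query row.
nn = NearestNeighbors(n_neighbors=3).fit(X)
distances, indices = nn.kneighbors(X[:1])
# indices lists row positions ordered from most to least similar; because
# the query is itself part of X, its own row comes first at distance 0.
```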
