Question
I have a dataset in which each record has 5 labels, and the labels differ in importance.
I know how to order the labels by importance but I don't know the actual weights, so the difference between two records looks like: a*dist of label1 + b*dist of label2 + c*dist of label3, such that a+b+c = 1.
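To make the formula concrete, here is a minimal sketch of the weighted distance, assuming each per-label distance has already been scaled to the range [0, 1]. The function name `weighted_distance` and the example weights are hypothetical, since the real weights are exactly what I don't know.

```python
import numpy as np

def weighted_distance(per_label_dists, weights):
    """Combine per-label distances into one dissimilarity value.

    per_label_dists: one distance per label, each assumed scaled to [0, 1]
    weights: non-negative importance weights that sum to 1
    """
    d = np.asarray(per_label_dists, dtype=float)
    w = np.asarray(weights, dtype=float)
    if d.shape != w.shape:
        raise ValueError("need exactly one weight per label distance")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return float(np.dot(w, d))

# Hypothetical weights, ordered from most to least important label.
example_weights = [0.5, 0.3, 0.2]
example = weighted_distance([0.2, 1.0, 0.0], example_weights)
```

Only the ordering of the weights is known, so any concrete values like the ones above would have to be tuned or guessed.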
The dataset contains around 3000 records and I want to cluster it in some way (I don't know the number of clusters).
I thought about DBSCAN, but it does not work well with high-dimensional data.
As far as I understand, hierarchical clustering needs to know the number of clusters, and I also think it depends on which record you compare to first, so the result may be wrong in this case.
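For reference, one variant that may avoid both concerns is agglomerative clustering cut at a distance threshold instead of at a fixed number of clusters. The sketch below uses SciPy on a precomputed pairwise distance matrix. The helper name `cluster_by_threshold`, the average linkage, the toy matrix and the threshold 0.3 are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_by_threshold(dist_matrix, threshold):
    """Agglomerative clustering on a precomputed distance matrix.

    The tree is cut where the linkage distance exceeds `threshold`,
    so the number of clusters does not have to be given in advance.
    """
    # squareform converts the symmetric matrix to SciPy's condensed form.
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=True)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=threshold, criterion="distance")

# Toy symmetric distance matrix for 4 records (made-up values).
toy = np.array([
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.05],
    [0.80, 0.90, 0.05, 0.00],
])
labels = cluster_by_threshold(toy, threshold=0.3)
```

With about 3000 records the full distance matrix has roughly 4.5 million distinct pairs, which should still fit in memory, but choosing the threshold remains an open problem.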
I also looked into graph clustering, where the difference between two records would be the weight of the edge between the two nodes, but I didn't find an algorithm that does that.
EDIT:
The data is CDR data representing the antennas a user connected to while using their cellphone for calls, SMS and internet, so the labels are:
location (longitude, latitude), part_of_day (night, morning-noon, afternoon, evening),
workday/weekend, day_of_week, number of days of connection to this antenna
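Because these labels mix coordinates and categories, each one needs its own "dist". A rough sketch of two of them follows, assuming great-circle (haversine) distance for location and a simple 0/1 mismatch for categorical labels such as part_of_day. The 1 km cap `max_km` and all function names are hypothetical choices, not something taken from the data.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def location_dist(p, q, max_km=1.0):
    """Location distance scaled to [0, 1]; anything beyond max_km counts as 1."""
    return min(haversine_km(*p, *q) / max_km, 1.0)

def categorical_dist(a, b):
    """0 if two categorical labels match, 1 otherwise."""
    return 0.0 if a == b else 1.0
```

Scaling every per-label distance to [0, 1] keeps the weights comparable across labels.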
I want to cluster it to detect this user's points of interest, such as a gym, a mall, etc. In particular, I want the gym and the mall to end up in separate clusters even when they are close to each other, because they are different activities.
Any ideas about how to do it?
Source: https://stackoverflow.com/questions/59248764/unsupervised-high-dimension-clustering