Clustering Algorithm for Paper Boys

后悔当初 2021-01-30 01:32

I need help selecting or creating a clustering algorithm according to certain criteria.

Imagine you are managing newspaper delivery persons.

  • You have a set
17 answers
  • 2021-01-30 02:15

    I know of a fairly novel approach to this problem that I have seen applied in bioinformatics, though it is valid for any sort of clustering problem. It's certainly not the simplest solution, but I think it is a very interesting one. The basic premise is that clustering involves multiple objectives. First, you want to minimise the number of clusters; the trivial solution is a single cluster containing all the data. The second standard objective is to minimise the variance within each cluster; the trivial solution is many clusters, each with only a single data point. The interesting solutions come about when you include both objectives and optimise the trade-off between them.

    At the core of the proposed approach is a memetic algorithm, which is a little like a genetic algorithm (which steve mentioned). However, it not only explores the solution space well but can also focus in on interesting regions, i.e. solutions. At the very least I recommend reading some of the papers on this subject, as memetic algorithms are an unusual approach. A word of warning, though: it may lead you to read The Selfish Gene, and I still haven't decided whether that was a good thing... If algorithms don't interest you, you could simply express your problem in the format the code requires and use the source code provided. Related papers and code can be found here: Multi Objective Clustering
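    To make the two competing objectives concrete, here is a minimal sketch in Python. It is not taken from the linked papers or code: the function names, the 2-D points, and the use of summed squared distance as the "variance" measure are all assumptions for illustration. A multi-objective search (memetic or otherwise) would keep only candidate clusterings that no other candidate dominates.

```python
from statistics import fmean


def within_cluster_scatter(clusters):
    """Sum of squared distances from each 2-D point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        cx = fmean(p[0] for p in points)
        cy = fmean(p[1] for p in points)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return total


def objectives(clusters):
    """Objective vector (cluster count, scatter); both are to be minimised."""
    return (len(clusters), within_cluster_scatter(clusters))


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Hypothetical data: two obvious groups of two points each.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
one_cluster = [points]                       # trivial solution for objective 1
singletons = [[p] for p in points]           # trivial solution for objective 2
two_clusters = [points[:2], points[2:]]      # a sensible trade-off
bad_split = [[points[0], points[2]], [points[1], points[3]]]

for name, candidate in [("one", one_cluster), ("two", two_clusters),
                        ("singletons", singletons), ("bad", bad_split)]:
    print(name, objectives(candidate))
```

    In this toy data neither trivial solution dominates the two-cluster split, while the badly chosen two-cluster split is dominated by the sensible one and would be discarded.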

  • 2021-01-30 02:18

    I think you want a hierarchical agglomerative technique rather than k-means. If you get your algorithm right, you can stop it when you have the right number of clusters. As someone else mentioned, you can seed subsequent clusterings with previous solutions, which may give you a significant performance improvement.

    You may want to look closely at the distance function you use, especially if your problem has high dimension. Euclidean distance is the easiest to understand but may not be the best; look at alternatives such as Mahalanobis distance.
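    As a rough illustration of stopping an agglomerative clustering at a chosen number of clusters, here is a naive pure-Python sketch. The function names and the choice of average linkage are my assumptions, not something the answer specifies, and the brute-force search is only suitable for small inputs. The distance function is a parameter, so Euclidean distance (the default here) could be swapped for another metric.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, metric):
    """Mean pairwise distance between the points of clusters a and b."""
    return sum(metric(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, target_clusters, metric=dist):
    """Merge the two closest clusters until only target_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > max(1, target_clusters):
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], metric
            ),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters


def manhattan(p, q):
    """An alternative metric, shown only to demonstrate the plug-in point."""
    return sum(abs(a - b) for a, b in zip(p, q))


addresses = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerate(addresses, 2))
print(agglomerate(addresses, 2, metric=manhattan))
```

    A production version would cache the pairwise distances rather than recomputing them on every merge.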

    I'm presuming that your real problem has nothing to do with delivering newspapers...

  • 2021-01-30 02:23

    I would use a basic algorithm to create a first set of paperboy routes according to where they live, and current locations of subscribers, then:

    when paperboys are:

    • Added: The new paperboy takes locations from one or more paperboys working in the same general area where he lives.
    • Removed: His locations are given to the other paperboys, each location going to the route closest to it.

    when locations are:

    • Added: As above, the location is added to the closest route.
    • Removed: The location is simply removed from that boy's route.

    Once a quarter, you could re-calculate the whole thing and change all the routes.
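    The incremental rules above could be sketched roughly as follows. Everything here is an assumption made for illustration: routes are a dict from paperboy name to a list of (x, y) stops, "closest route" means the route with the nearest existing stop, and `quota` (how many stops a new paperboy takes over) is a hypothetical parameter.

```python
from math import dist


def distance_to_route(stops, location):
    """Distance from a location to the nearest stop; empty routes count as far away."""
    return min((dist(stop, location) for stop in stops), default=float("inf"))


def closest_route(routes, location):
    """Name of the paperboy whose route passes nearest to the location."""
    return min(routes, key=lambda name: distance_to_route(routes[name], location))


def add_location(routes, location):
    routes[closest_route(routes, location)].append(location)


def remove_location(routes, location):
    for stops in routes.values():
        if location in stops:
            stops.remove(location)
            return


def add_paperboy(routes, name, home, quota):
    """The new paperboy takes over the `quota` existing stops nearest his home."""
    nearest = sorted(
        (dist(home, stop), owner, stop)
        for owner, stops in routes.items()
        for stop in stops
    )[:quota]
    routes[name] = []
    for _, owner, stop in nearest:
        routes[owner].remove(stop)
        routes[name].append(stop)


def remove_paperboy(routes, name):
    """Hand each orphaned stop to whichever remaining route is closest to it."""
    for location in routes.pop(name):
        add_location(routes, location)


routes = {"ann": [(0, 0), (1, 0)], "bob": [(10, 0), (11, 0)]}
add_location(routes, (2, 0))             # goes to ann
add_paperboy(routes, "cy", (6, 0), 2)    # takes (2, 0) from ann and (10, 0) from bob
remove_paperboy(routes, "bob")           # (11, 0) goes to cy
print(routes)
```

    Because each change only touches neighbouring routes, most paperboys keep the same stops between the quarterly full recalculations.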

  • 2021-01-30 02:24

    Rather than a clustering model, I think you really want some variant of the Set Covering location model, with an additional constraint on the number of addresses covered by each facility. I can't really find a good explanation of it online. You can take a look at this page, but they're solving it using areal units and you probably want to solve it in either Euclidean or network space. If you're willing to dig up something in dead tree format, check out chapter 4 of Network and Discrete Location by Daskin.
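    To give a feel for the formulation, here is a simple greedy heuristic for a capacity-limited covering problem in Euclidean space. It is only a sketch under my own assumptions, not the model from Daskin's book or an exact solver: `sites` are candidate facility positions (e.g. paperboy homes), `radius` is the maximum service distance, and `capacity` is the maximum number of addresses per facility.

```python
from math import dist


def greedy_capacitated_cover(sites, addresses, radius, capacity):
    """Repeatedly open the site that can serve the most uncovered addresses.

    Returns (assignment, uncovered): a dict mapping each opened site to the
    addresses it serves, and a list of addresses no site could take.
    """
    uncovered = list(addresses)
    unused = list(sites)
    assignment = {}

    def servable(site):
        # Nearest uncovered addresses within range, limited by capacity.
        in_range = [a for a in uncovered if dist(site, a) <= radius]
        return sorted(in_range, key=lambda a: dist(site, a))[:capacity]

    while uncovered and unused:
        best = max(unused, key=lambda s: len(servable(s)))
        served = servable(best)
        if not served:
            break  # nothing left is reachable from any unused site
        assignment[best] = served
        uncovered = [a for a in uncovered if a not in served]
        unused.remove(best)
    return assignment, uncovered


sites = [(0, 0), (10, 0), (5, 0)]
addresses = [(1, 0), (2, 0), (9, 0), (11, 0)]
assignment, uncovered = greedy_capacitated_cover(sites, addresses, radius=3, capacity=2)
print(assignment, uncovered)
```

    A greedy pass like this gives no optimality guarantee; an integer programming solver would be the usual way to solve the model exactly.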

  • 2021-01-30 02:24

    This is not directly related to the problem, but it is something I've heard and which should be considered if what you have is truly a route-planning problem. It would affect the ordering of the addresses within the set assigned to each driver.

    UPS has software which generates optimum routes for their delivery people to follow. The software tries to maximize the number of right turns that are taken during the route. This saves them a lot of time on deliveries.

    For people that don't live in the USA the reason for doing this may not be immediately obvious. In the US people drive on the right side of the road, so when making a right turn you don't have to wait for oncoming traffic if the light is green. Also, in the US, when turning right at a red light you (usually) don't have to wait for green before you can go. If you're always turning right then you never have to wait for lights.

    There's an article about it here: http://abcnews.go.com/wnt/story?id=3005890

  • 2021-01-30 02:27

    You can have k-means or expectation maximization leave the clusters as unchanged as possible by using the previous cluster as a clustering feature. Getting each cluster to have the same number of items seems a bit trickier. I can think of how to do it as a post-clustering step, by running k-means and then shuffling some points until things balance, but that doesn't seem very efficient.
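    A rough sketch of the post-clustering balancing idea, in plain Python. Two things here are my own assumptions rather than what the answer describes: stability is approximated by starting k-means from the previous run's centroids (a simpler substitute for using the previous cluster as a feature), and the balancing rule moves one point at a time from the largest cluster to the smallest.

```python
from math import dist


def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def kmeans(points, start_centroids, iterations=20):
    """Lloyd's algorithm started from the given centroids (e.g. last run's)."""
    centroids = list(start_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # A cluster that ends up empty keeps its old centroid.
        updated = [centroid(c) if c else old for c, old in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return clusters, centroids


def rebalance(clusters):
    """Move points from the largest to the smallest cluster until sizes differ by <= 1."""
    clusters = [list(c) for c in clusters]
    while True:
        big = max(clusters, key=len)
        small = min(clusters, key=len)
        if len(big) - len(small) <= 1:
            return clusters
        # Give away the point nearest the small cluster (or, if the small
        # cluster is empty, the point nearest the big cluster's own centroid).
        target = centroid(small) if small else centroid(big)
        moved = min(big, key=lambda p: dist(p, target))
        big.remove(moved)
        small.append(moved)


points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
previous_centroids = [(1, 0), (10, 0)]   # hypothetical result of an earlier run
clusters, centroids = kmeans(points, previous_centroids)
balanced = rebalance(clusters)
print(clusters)
print(balanced)
```

    As the answer suspects, this shuffle is crude: it can pull a point far from its natural cluster, so the balanced result trades compactness for equal sizes.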
