问题
Looking for some advice here. Does anyone know a good place to start looking into matching algorithm in a n-dimensional space. For example, any dating site out there must be using some sort of algorithm to match 2 people. What I have read is that we can map characteristics of a person in a n-dimensional array with a point system for each characteristic. Once we have all (available) characteristics of a person, we can represent this person in a point within a n-dimensional array. Then, to match 2 person would be as simple as finding the shortest distance between 2 point in this n-dim array. Does anyone has any reference in implementation of these kind of algorithm? What's the best language to write these kind of stuff in?
回答1:
If you want to find the closest match for one person, Bentley & Shamos published a multi-dimensional divide-and-conquer method: Divide-and-conquer in O(N log N) time: Divide-and-conquer in multidimensional space in Proceedings of the eighth annual ACM symposium on Theory of computing 1976. If you can't get a copy this may also be helpful.
However for your example application actually finding the nearest neighbour doesn't seem to be the biggest problem - much trickier is mapping inputs into dimensions. For example if one dimension is "likes animals", what value do you give to someone who likes dogs & cats but can't stand horses? What about someone who loves horses, thinks dogs are OK, is annoyed by cats and is ambivalent about goldfish?
回答2:
First off, pick the language you are most familiar with. The algorithms for handling this are fairly straightforward, and should work in any modern language. (As long as there is some concept of array, and potentially a matrix library, you should be fine.) I've implemented many of these in C, C++, and C# before, but seen implementations in python, vb.net, etc.
Depending on what you are trying to do, there are a couple of options.
That being said, what you want to do depends on your goals. If you just want to find the best match, you can use simple distance calculations (ie: sqrt of sum of squares for each dimension/property in the n-dimensional array), optionally weight each properties distance, and use the closest point.
If you want to group together people, you'll want to look at clustering algorithms. For data like this, I would suspect that some form of K-Means clustering or fuzzy c-means clustering would work the best.
回答3:
How about following solution.
Assume the users are U1, U2, U3, U4, U5.... Un. Attributes are A1, A2, A3, A4, A5..... Am
Store these as
A1 - U1, U2, U3... A2 - U4, U6, U7.... A3 -
Profile attribute is the index and stores all users. Now, if a new User comes, see the its attributes and for those attributes, find common people. number of times a person exists in these lists - higher ranking.
回答4:
The process you mention is known as k-nearest neighbor, with k=1. It's the most intuitive approach for finding similar vectors.
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
回答5:
What you describe with your example is not n-dimentional matching, but rather bipartite matching of nodes with multiple features. (You will need to provide a function that will, given two persons compute this distance). There should be very efficient algorithms for that. In n-dimentional matching you would try to match nodes from more than two sets (in your example, suppose you could cut people to body, soul, and music preferences, then recombine them to make new persons. Then the n-dimentional matching would cut the people apart, and recombine them so that the new persons engineered would make really nice couples :D ) Here is the wikipedia article for 3-dimentional matching, which is np-complete.
Also, as noted by another, if your goal is not to match persons in pairs, but rather find compatible groups, you should consider clustering them to groups. This can be done with e.g. Unsupervised Learning
来源:https://stackoverflow.com/questions/677987/n-dimensional-matching-algorithm