Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and
The problem of sorting your n pairs of socks is O(n). Before you throw them in the laundry basket, you thread the left one to the right one. On taking them out, you cut the thread and put each pair into your drawer - 2 operations on n pairs, so O(n).
Now the next question is simply whether you do your own laundry and your wife does hers. That is a problem likely in an entirely different domain of problems. :)
You are trying to solve the wrong problem.
Solution 1: Each time you put dirty socks in your laundry basket, tie them in a little knot. That way you will not have to do any sorting after the washing. Think of it like registering an index in a Mongo database. A little work ahead for some CPU savings in the future.
Solution 2: If it's winter, you don't have to wear matching socks. We are programmers. Nobody needs to know, as long as it works.
Solution 3: Spread the work. You want to perform such a complex CPU process asynchronously, without blocking the UI. Take that pile of socks and stuff them in a bag. Only look for a pair when you need it. That way the amount of work it takes is much less noticeable.
Hope this helps!
Create a hash table which will be used for unmatched socks, using the pattern as the hash. Iterate over the socks one by one. If the sock has a pattern match in the hash table, take the sock out of the table and make a pair. If the sock does not have a match, put it into the table.
As a practical solution:
If you have 1000 socks, with 8 colors and an average distribution, you can make 4 piles of each 125 socks in c*n time. With a threshold of 5 socks you can sort every pile in 6 runs. (Counting 2 seconds to throw a sock on the right pile it will take you little under 4 hours.)
If you have just 60 socks, 3 colors and 2 sort of socks (yours / your wife's) you can sort every pile of 10 socks in 1 runs (Again threshold = 5). (Counting 2 seconds it will take you 2 min).
The initial bucket sorting will speed up your process, because it divides your n socks into k buckets in c*n
time so than you will only have to do c*n*log(k)
work. (Not taking into account the threshold). So all in all you do about n*c*(1 + log(k))
work, where c is the time to throw a sock on a pile.
This approach will be favourable compared to any c*x*n + O(1)
method roughly as long as log(k) < x - 1
.
In computer science this can be helpful: We have a collection of n things, an order on them (length) and also an equivalence relation (extra information, for example the color of socks). The equivalence relation allows us to make a partition of the original collection, and in every equivalence class our order is still maintained. The mapping of a thing to it's equivalence class can be done in O(1), so only O(n) is needed to assign each item to a class. Now we have used our extra information and can proceed in any manner to sort every class. The advantage is that the data sets are already significantly smaller.
The method can also be nested, if we have multiple equivalence relations -> make colour piles, than within every pile partition on texture, than sort on length. Any equivalence relation that creates a partition with more than 2 elements that have about even size will bring a speed improvement over sorting (provided we can directly assign a sock to its pile), and the sorting can happen very quickly on smaller data sets.
List<Sock> UnSearchedSocks = getAllSocks();
List<Sock> UnMatchedSocks = new list<Sock>();
List<PairOfSocks> PairedSocks = new list<PairOfSocks>();
foreach (Sock newSock in UnsearchedSocks)
{
Sock MatchedSock = null;
foreach(Sock UnmatchedSock in UnmatchedSocks)
{
if (UnmatchedSock.isPairOf(newSock))
{
MatchedSock = UnmatchedSock;
break;
}
}
if (MatchedSock != null)
{
UnmatchedSocks.remove(MatchedSock);
PairedSocks.Add(new PairOfSocks(MatchedSock, NewSock));
}
else
{
UnmatchedSocks.Add(NewSock);
}
}
I hope I can contribute something new to this problem. I noticed that all of the answers neglect the fact that there are two points where you can perform preprocessing, without slowing down your overall laundry performance.
Also, we don't need to assume a large number of socks, even for large families. Socks are taken out of the drawer and are worn, and then they are tossed in a place (maybe a bin) where they stay before being laundered. While I wouldn't call said bin a LIFO-Stack, I'd say it is safe to assume that
Since all washing machines I know about are limited in size (regardless of how many socks you have to wash), and the actual randomizing occurs in the washing machine, no matter how many socks we have, we always have small subsets which contain almost no singletons.
Our two preprocessing stages are "putting the socks on the clothesline" and "Taking the socks from the clothesline", which we have to do, in order to get socks which are not only clean but also dry. As with washing machines, clotheslines are finite, and I assume that we have the whole part of the line where we put our socks in sight.
Here's the algorithm for put_socks_on_line():
while (socks left in basket) {
take_sock();
if (cluster of similar socks is present) {
Add sock to cluster (if possible, next to the matching pair)
} else {
Hang it somewhere on the line, this is now a new cluster of similar-looking socks.
Leave enough space around this sock to add other socks later on
}
}
Don't waste your time moving socks around or looking for the best match, this all should be done in O(n), which we would also need for just putting them on the line unsorted. The socks aren't paired yet, we only have several similarity clusters on the line. It's helpful that we have a limited set of socks here, as this helps us to create "good" clusters (for example, if there are only black socks in the set of socks, clustering by colours would not be the way to go)
Here's the algorithm for take_socks_from_line():
while(socks left on line) {
take_next_sock();
if (matching pair visible on line or in basket) {
Take it as well, pair 'em and put 'em away
} else {
put the sock in the basket
}
I should point out that in order to improve the speed of the remaining steps, it is wise not to randomly pick the next sock, but to sequentially take sock after sock from each cluster. Both preprocessing steps don't take more time than just putting the socks on the line or in the basket, which we have to do no matter what, so this should greatly enhance the laundry performance.
After this, it's easy to do the hash partitioning algorithm. Usually, about 75% of the socks are already paired, leaving me with a very small subset of socks, and this subset is already (somewhat) clustered (I don't introduce much entropy into my basket after the preprocessing steps). Another thing is that the remaining clusters tend to be small enough to be handled at once, so it is possible to take a whole cluster out of the basket.
Here's the algorithm for sort_remaining_clusters():
while(clusters present in basket) {
Take out the cluster and spread it
Process it immediately
Leave remaining socks where they are
}
After that, there are only a few socks left. This is where I introduce previously unpaired socks into the system and process the remaining socks without any special algorithm - the remaining socks are very few and can be processed visually very fast.
For all remaining socks, I assume that their counterparts are still unwashed and put them away for the next iteration. If you register a growth of unpaired socks over time (a "sock leak"), you should check your bin - it might get randomized (do you have cats which sleep in there?)
I know that these algorithms take a lot of assumptions: a bin which acts as some sort of LIFO stack, a limited, normal washing machine, and a limited, normal clothesline - but this still works with very large numbers of socks.
About parallelism: As long as you toss both socks into the same bin, you can easily parallelize all of those steps.