How can I pair socks from a pile efficiently?

后端 未结 30 956
旧巷少年郎
旧巷少年郎 2020-11-27 08:37

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and

相关标签:
30条回答
  • 2020-11-27 09:07

    Socks, whether real ones or some analogous data structure, would be supplied in pairs.

    The simplest answer is prior to allowing the pair to be separated, a single data structure for the pair should have been initialized that contained a pointer to the left and right sock, thus enabling socks to be referred to directly or via their pair. A sock may also be extended to contain a pointer to its partner.

    This solves any computational pairing problem by removing it with a layer of abstraction.

    Applying the same idea to the practical problem of pairing socks, the apparent answer is: don't allow your socks to ever be unpaired. Socks are provided as a pair, put in the drawer as a pair (perhaps by balling them together), worn as a pair. But the point where unpairing is possible is in the washer, so all that's required is a physical mechanism that allows the socks to stay together and be washed efficiently.

    There are two physical possibilities:

    For a 'pair' object that keeps a pointer to each sock we could have a cloth bag that we use to keep the socks together. This seems like massive overhead.

    But for each sock to keep a reference to the other, there is a neat solution: a popper (or a 'snap button' if you're American), such as these:

    http://www.aliexpress.com/compare/compare-invisible-snap-buttons.html

    Then all you do is snap your socks together right after you take them off and put them in your washing basket, and again you've removed the problem of needing to pair your socks with a physical abstraction of the 'pair' concept.

    0 讨论(0)
  • 2020-11-27 09:08

    Here's an Omega(n log n) lower bound in comparison based model. (The only valid operation is comparing two socks.)

    Suppose that you know that your 2n socks are arranged this way:

    p1 p2 p3 ... pn pf(1) pf(2) ... pf(n)

    where f is an unknown permutation of the set {1,2,...,n}. Knowing this cannot make the problem harder. There are n! possible outputs (matchings between first and second half), which means you need log(n!) = Omega(n log n) comparisons. This is obtainable by sorting.

    Since you are interested in connections to element distinctness problem: proving the Omega(n log n) bound for element distinctness is harder, because the output is binary yes/no. Here, the output has to be a matching and the number of possible outputs suffices to get a decent bound. However, there's a variant connected to element distinctness. Suppose you are given 2n socks and wonder if they can be uniquely paired. You can get a reduction from ED by sending (a1, a2, ..., an) to (a1, a1, a2, a2, ..., an, an). (Parenthetically, the proof of hardness of ED is very interesting, via topology.)

    I think that there should be an Omega(n2) bound for the original problem if you allow equality tests only. My intuition is: Consider a graph where you add an edge after a test, and argue that if the graph is not dense the output is not uniquely determined.

    0 讨论(0)
  • 2020-11-27 09:09

    Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.

    So hashing would be enough (and faster).

    1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
    2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into the second set of piles
    3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

    This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

    You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

    If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

    The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)


    Update:

    What about parallelism? Can multiple humans match the socks faster?

    1. The simplest parallelization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one "lock" there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.
    2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

    What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

    Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.

    Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

    0 讨论(0)
  • 2020-11-27 09:11

    What I do is that I pick up the first sock and put it down (say, on the edge of the laundry bowl). Then I pick up another sock and check to see if it's the same as the first sock. If it is, I remove them both. If it's not, I put it down next to the first sock. Then I pick up the third sock and compare that to the first two (if they're still there). Etc.

    This approach can be fairly easily be implemented in an array, assuming that "removing" socks is an option. Actually, you don't even need to "remove" socks. If you don't need sorting of the socks (see below), then you can just move them around and end up with an array that has all the socks arranged in pairs in the array.

    Assuming that the only operation for socks is to compare for equality, this algorithm is basically still an n2 algorithm, though I don't know about the average case (never learned to calculate that).

    Sorting, of course improves efficiency, especially in real life where you can easily "insert" a sock between two other socks. In computing the same could be achieved by a tree, but that's extra space. And, of course, we're back at NlogN (or a bit more, if there are several socks that are the same by sorting criteria, but not from the same pair).

    Other than that, I cannot think of anything, but this method does seem to be pretty efficient in real life. :)

    0 讨论(0)
  • 2020-11-27 09:11

    If the "move" operation is fairly expensive, and the "compare" operation is cheap, and you need to move the whole set anyway, into a buffer where search is much faster than in original storage... just integrate sorting into the obligatory move.

    I found integrating the process of sorting into hanging to dry makes it a breeze. I need to pick up each sock anyway, and hang it (move) and it costs me about nothing to hang it in a specific place on the strings. Now just not to force search of the whole buffer (the strings) I choose to place socks by color/shade. Darker left, brighter right, more colorful front etc. Now before I hang each sock, I look in its "right vicinity" if a matching one is there already - this limits "scan" to 2-3 other socks - and if it is, I hang the other one right next to it. Then I roll them into pairs while removing from the strings, when dry.

    Now this may not seem all that different from "forming piles by color" suggested by top answers but first, by not picking discrete piles but ranges, I have no problem classifying whether "purple" goes to "red" or "blue" pile; it just goes between. And then by integrating two operations (hang to dry and sort) the overhead of sorting while hanging is like 10% of what separate sorting would be.

    0 讨论(0)
  • 2020-11-27 09:11

    I have taken simple steps to reduce my effort into a process taking O(1) time.

    By reducing my inputs to one of two types of socks (white socks for recreation, black socks for work), I only need to determine which of two socks I have in hand. (Technically, since they are never washed together, I have reduced the process to O(0) time.)

    Some upfront effort is required to find desirable socks, and to purchase in sufficient quantity as to eliminate need for your existing socks. As I'd done this before my need for black socks, my effort was minimal, but mileage may vary.

    Such an upfront effort has been seen many times in very popular and effective code. Examples include #DEFINE'ing pi to several decimals (other examples exist, but that's the one that comes to mind right now).

    0 讨论(0)
提交回复
热议问题