Algorithm Question on File Search Indexing

离开以前 2021-02-14 04:41

I have a question and also its solution, but I couldn't understand the solution. Could you help explain it with some examples, drawing on your experience?

2 answers
  •  傲寒 (original poster)
    2021-02-14 05:03

    In round terms, you have about 1/3 of the numbers that could exist in the file, assuming no duplicates.

    The idea is to make two passes through the data. Treat each number as a 32-bit (unsigned) number. In the first pass, keep track of how many numbers share each value of the most significant 16 bits. In practice, a number of these 16-bit prefixes will have a count of zero (all those that would correspond to 10-digit SSNs, for example; quite likely, all those with a zero for the first digit are missing too). But of the ranges with a non-zero count, most will not have 65536 entries, which is how many would appear if there were no gaps in the range. So, with a bit of care, you can choose one of the ranges to concentrate on in the second pass.
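As a rough illustration of the first pass, here is a minimal Python sketch. The function name `count_by_high_bits` and the sample values are made up for this example; the only thing taken from the answer is the idea of counting numbers by their upper 16 bits.

```python
def count_by_high_bits(numbers):
    """Pass 1: count how many values share each upper 16-bit prefix.

    Assumes every value fits in an unsigned 32-bit integer.
    """
    counts = [0] * 65536           # one counter per possible 16-bit prefix
    for n in numbers:
        counts[n >> 16] += 1       # n >> 16 keeps the most significant 16 bits
    return counts


# Hypothetical sample data: the first two values fall in the same range.
sample = [123456789, 123456790, 987654321]
counts = count_by_high_bits(sample)
print(counts[123456789 >> 16], counts[987654321 >> 16])
```

Any prefix whose counter ends up below 65536 covers a range with at least one gap, so it is a candidate for the second pass.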

    If you're lucky, you can find a range within 100,000,000..999,999,999 that has zero entries; you can then choose any number from that range as missing.

    Assuming you aren't quite that lucky, choose the range with the lowest count (or any range with fewer than 65536 entries); call it the target range. Reset the array to all zeroes. Reread the data. If the number you read is not in your target range, ignore it. If it is in the range, record the number by setting the array element indexed by the low-order 16 bits of the number to 1. When you've read the whole file, any index whose array element is still zero represents a missing SSN.
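Putting the two passes together, the approach might look like the Python sketch below. This is only one possible reading of the answer: `find_missing`, the `read_numbers` callable (which must return a fresh iterable on each call, standing in for rereading the file), and the `lo`/`hi` bounds for valid 9-digit values are all assumptions introduced for the example. It also assumes no duplicates, as the answer does.

```python
def find_missing(read_numbers, lo=100_000_000, hi=999_999_999):
    """Return one value in [lo, hi] absent from the data, or None.

    read_numbers: zero-argument callable returning a fresh iterable of
    ints, so the data can be scanned twice (hypothetical interface).
    """
    # Pass 1: count values per upper-16-bit prefix.
    counts = [0] * 65536
    for n in read_numbers():
        counts[n >> 16] += 1

    # Choose the target range: the prefix with the lowest count among
    # those holding fewer values than the slots they cover in [lo, hi].
    target = None
    for prefix in range(lo >> 16, (hi >> 16) + 1):
        start = max(prefix << 16, lo)
        end = min((prefix << 16) | 0xFFFF, hi)
        capacity = end - start + 1
        if counts[prefix] < capacity:
            if target is None or counts[prefix] < counts[target]:
                target = prefix
    if target is None:
        return None  # no gap detected (or duplicates inflated the counts)

    # Pass 2: reset the array, then mark the low 16 bits of every value
    # that falls in the target range.
    seen = bytearray(65536)
    for n in read_numbers():
        if n >> 16 == target:
            seen[n & 0xFFFF] = 1

    # Any unmarked slot inside [lo, hi] is a missing number.
    for low in range(65536):
        candidate = (target << 16) | low
        if lo <= candidate <= hi and not seen[low]:
            return candidate
    return None


data = [100_000_000, 100_000_001, 100_000_003]
narrow = find_missing(lambda: data, lo=100_000_000, hi=100_000_003)
wide = find_missing(lambda: data)
print(narrow, wide)
```

Memory use stays fixed at two 65536-element arrays regardless of file size, which is the point of splitting the work into two passes.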
