YouTube URL algorithm?

遇见更好的自我 2020-12-04 09:39

How would you go about generating the unique video URLs that YouTube uses?

Example:

  • http://www.youtube.com/watch?v=CvUN8qg9lsk
11 answers
  • 2020-12-04 10:12

    I don't think that the URL's v parameter has anything to do with the content (video properties, title, description, etc.).

    It's a randomly generated string of fixed length, drawn from a very specific set of characters. No duplicates are allowed.

  • 2020-12-04 10:13

    You could generate a GUID and have that as the ID for the video. Guids are very unlikely to collide.
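
A minimal sketch of the GUID idea, assuming a random (version 4) UUID and a URL-safe base64 encoding to keep the ID short. The function names `new_video_id` and `decode_video_id` are hypothetical, and the resulting 22-character ID is longer than the 11 characters in the example URL.

```python
import base64
import uuid


def new_video_id() -> str:
    """Return a random version-4 UUID as a 22-character URL-safe string."""
    raw = uuid.uuid4().bytes  # 16 bytes, 122 of the 128 bits are random
    # URL-safe base64 of 16 bytes is 24 characters ending in "=="; drop the padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_video_id(video_id: str) -> uuid.UUID:
    """Recover the UUID from its 22-character form by restoring the padding."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(video_id + "=="))
```

Collisions are very unlikely but not impossible, so a unique constraint on the stored ID is still a sensible safety net.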

  • 2020-12-04 10:13

    Just pick random values until you have one never seen before.

    Randomly picking values until a set is exhausted runs in expected time O(n log n); see "What is O value for naive random selection from finite set?"

    In your case you wouldn't exhaust the set, so each pick should take constant expected time. Just use a fast data structure for the duplicate lookups.
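
The pick-until-unseen loop might look like the sketch below. The 64-character alphabet and the length of 11 are assumptions taken from the shape of the example URL, and the in-memory `set` stands in for whatever lookup structure (for example a unique database index) a real system would use.

```python
import secrets

# Assumed alphabet: the 64 URL-safe characters (letters, digits, "-" and "_").
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
ID_LENGTH = 11  # assumed from the example URL


def random_id() -> str:
    """Draw ID_LENGTH characters uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))


def allocate_id(used: set) -> str:
    """Keep drawing until the candidate is not in `used`, then record it."""
    while True:
        candidate = random_id()
        if candidate not in used:  # average O(1) lookup in a hash set
            used.add(candidate)
            return candidate
```

With 64^11 possible values and comparatively few in use, the loop almost always exits on the first draw.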

  • 2020-12-04 10:17

    There is no need to use a hash. It is probably just a quasi-random 64-bit value passed through base64 or some equivalent.

    By quasi-random, I mean it is just a one-to-one mapping with the counting integers, just shuffled.

    For example, you could take a monotonically increasing database id and multiply it by some prime near 2^64 (keeping the result modulo 2^64, which stays one-to-one because the multiplier is odd), then base64-encode the result. If you did not want people to be able to guess, you might choose a more complex mapping or just pick a random number that is not in the database yet.

    Normal base64 would add an equals sign at the end as padding, but in this case it is implied because the size is known. The character mapping could easily be something other than the standard one.
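
The mapping described in this answer can be sketched as follows. `MULTIPLIER` is an arbitrary odd constant chosen for illustration; the answer suggests a prime, but any odd multiplier is invertible modulo 2^64, and nothing here is known to be what YouTube does. The names `encode` and `decode` are hypothetical.

```python
import base64

MASK = (1 << 64) - 1
# Assumption: an arbitrary odd 64-bit constant. Odd numbers have an inverse
# modulo 2**64, so the multiplication is a one-to-one shuffle of 0..2**64-1.
MULTIPLIER = 0x9E3779B97F4A7C15
INVERSE = pow(MULTIPLIER, -1, 1 << 64)  # modular inverse, Python 3.8+


def encode(db_id: int) -> str:
    """Map a database id (0 <= db_id < 2**64) to an 11-character string."""
    shuffled = (db_id * MULTIPLIER) & MASK
    raw = shuffled.to_bytes(8, "big")
    # 8 bytes encode to 12 characters ending in one "="; the padding is implied.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode(video_id: str) -> int:
    """Invert encode(): restore the padding, then undo the multiplication."""
    raw = base64.urlsafe_b64decode(video_id + "=")
    shuffled = int.from_bytes(raw, "big")
    return (shuffled * INVERSE) & MASK
```

Because the mapping is a bijection, no duplicate check is needed, but anyone who learns the multiplier can walk the ids in order.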

  • 2020-12-04 10:18

    Use some non-trivial hash function. The probability of collision is very low, depending on the function, the parameters and the input domain. Keep in mind that cryptographic hashes were specifically designed to have very low collision rates for non-random input (i.e. completely different hashes for two close-but-unequal inputs).

    This post by Jeff Atwood is a nice overview of the topic.

    And here is an online hash calculator you can play with.
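
As a rough illustration of the hashing approach, the sketch below hashes some unique key with SHA-256 and keeps the first 8 bytes. The key format, the salt value and the name `hashed_id` are all hypothetical. Truncating to 64 bits makes collisions far more likely than with the full digest, so a uniqueness check is still needed.

```python
import base64
import hashlib


def hashed_id(key: str, salt: bytes = b"hypothetical-salt") -> str:
    """Return an 11-character ID derived from the first 8 bytes of SHA-256."""
    digest = hashlib.sha256(salt + key.encode("utf-8")).digest()
    # Keep 64 bits of the 256-bit digest, then encode without the padding.
    return base64.urlsafe_b64encode(digest[:8]).rstrip(b"=").decode("ascii")
```

The same key always yields the same ID, while keys that differ by one character yield unrelated IDs.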

  • 2020-12-04 10:20

    Eli's link to Jeff's article is, in my opinion, irrelevant. URL shortening is not the same thing as presenting an ID to the world. Instead, a nicer way would be to convert your existing integer ID to a different radix.

    An example in PHP:

    $id = 9999;
    //$url_id = base_convert($id, 10, 26+26+10); // base_convert() rejects bases above 36
    $url_id = base_convert($id, 10, 26+10); // Works, but only digits + lowercase letters
    

    Sadly, PHP's base_convert() only supports bases up to 36 (digits + lowercase alphabet). Base 62 would allow both upper-case and lower-case letters.
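
Base 62 is easy to implement by hand. The sketch below is in Python rather than PHP; the digit order (digits, then lowercase, then uppercase) and the function names are arbitrary choices for illustration.

```python
# Assumed digit order: 0-9, then a-z, then A-Z.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(DIGITS)  # 62


def to_base62(n: int) -> str:
    """Convert a non-negative integer to its base-62 representation."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, rem = divmod(n, BASE)
        out.append(DIGITS[rem])  # least significant digit first
    return "".join(reversed(out))


def from_base62(s: str) -> int:
    """Convert a base-62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * BASE + DIGITS.index(ch)  # raises ValueError on unknown characters
    return n


print(to_base62(9999))  # prints "2Bh" (2*62**2 + 37*62 + 17 = 9999)
```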


    People are talking about these other systems:

    • Random numbers/letters - Why? If you want people to not see the next video (id+1), then just make it private. On a website like YouTube, which actively shows any video it has, why bother with random ids?
    • Hashing an ID - This design concept really stinks. Think about it: you have an ID guaranteed by your DBMS to be unique, and you hash it (introducing the possibility of collisions)? Give me one reason to even consider this idea.
    • Using the ID in the URL - To be honest, I don't see any problem with this either, though the number will grow long when in fact you can express the same value with fewer characters (hence my solution).
    • Using Base64 - Base64 expects bytes of data, literally anything from nulls to spaces. Why use this function when your data consists of a number (i.e., a mix of 10 different characters, instead of 256)?