Datatype for URL

渐次进展 2021-01-03 16:59

I read that the maximum length of a URL can be 2,000 characters. I therefore have a table with a varchar(2000) column to store URLs. But this column can not be indexing only th

2 answers
  • 2021-01-03 17:57

    Your question leaves a lot to the imagination.

    For one thing, we must assume your index's purpose is to serve as a primary key to avoid duplicates. You won't be developing an application that ever says to a user, "sorry, there's a mistake in your 1800-character data entry; it doesn't match, please try again."

    For another thing, we must assume these URLs of yours potentially have lots of CGI parameters (?param=val&param=val&param=val) in them.

    If these assumptions are true, then here's what you can do.

    1. Make your URL column longer, as a varchar, if you need to.

    2. Add a SHA-1 hash column to your table. A SHA-1 hash, written in hexadecimal, is a string of 40 hex digits.

    3. Make that column your primary key.

    4. When you insert rows into your table, use the MySQL SHA1() function to compute the hash values.

    5. Use the MySQL INSERT ... ON DUPLICATE KEY UPDATE statement to add rows to your database.

    This will let you keep duplicate URLs out of your database without confusion, in a way that scales up nicely.

    http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html
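
    The steps above can be sketched roughly as follows. This is only an illustration, not the answer's own code: it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and SQLite's ON CONFLICT ... DO UPDATE clause stands in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE. The table name urls, the columns url_hash and hits, and the helper names are all made up for the example. The hash is computed in Python with hashlib rather than with the MySQL SHA1() function.

```python
import hashlib
import sqlite3


def url_key(url: str) -> str:
    """Return the SHA-1 digest of the URL as 40 hex digits."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def make_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the fixed-width hash is the primary key,
    # the full URL is stored alongside it without being indexed.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " url_hash CHAR(40) PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " hits INTEGER NOT NULL DEFAULT 1)"
    )


def add_url(conn: sqlite3.Connection, url: str) -> None:
    # SQLite upsert (needs SQLite 3.24 or newer), used here as a
    # stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
    conn.execute(
        "INSERT INTO urls (url_hash, url) VALUES (?, ?) "
        "ON CONFLICT(url_hash) DO UPDATE SET hits = hits + 1",
        (url_key(url), url),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)
    for u in ("http://example.com/?a=1",
              "http://example.com/?a=1",
              "http://example.com/?a=2"):
        add_url(conn, u)
    for row in conn.execute("SELECT url, hits FROM urls ORDER BY url"):
        print(row)
```

    Inserting the same URL twice leaves one row and bumps its counter, so the uniqueness check runs against a short fixed-width key instead of the long URL column.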

  • 2021-01-03 18:00

    How about

    ALTER TABLE myweb ADD FULLTEXT INDEX myweb_idx1 (url);

    Although I have to agree with zerkms that a 1000-character index should be more than enough: you are very unlikely to encounter a URL longer than that, and even then the 1000-character prefix should do a fine job.

    Regarding your original question: I think it's safe to save URLs in varchars. Where are these URLs coming from? Who's the producer of the data? You can probably enforce limits.
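
    If you do control the producer, enforcing a limit can be as simple as checking the length before a row is written. A minimal sketch, assuming the 2000-character figure from the question; the names MAX_URL_LENGTH, fits_column and split_oversized are invented for the example:

```python
MAX_URL_LENGTH = 2000  # figure quoted in the question; match it to your column


def fits_column(url: str, max_length: int = MAX_URL_LENGTH) -> bool:
    """Return True if the URL fits in a varchar(max_length) column."""
    return len(url) <= max_length


def split_oversized(urls, max_length: int = MAX_URL_LENGTH):
    """Separate URLs that fit the column from those that are too long."""
    accepted, rejected = [], []
    for url in urls:
        (accepted if fits_column(url, max_length) else rejected).append(url)
    return accepted, rejected


if __name__ == "__main__":
    candidates = [
        "http://example.com/",
        "http://example.com/?q=" + "x" * 2000,  # too long on purpose
    ]
    ok, too_long = split_oversized(candidates)
    print(len(ok), "accepted,", len(too_long), "rejected")
```

    What to do with the rejected ones (log them, truncate them, drop them) depends on where the data comes from.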

    If you're crawling the web for URLs, then you are almost certainly not going to happen upon a 2000-character URL, because the only way I can imagine getting there would be with GET data.

    Hope this rambling makes sense.
