SQL performance searching for long strings

I need to store user agent strings in a database for tracking and comparing customer behavior and sales performance between different browsers. A pretty plain user agent str…

2 Answers
  • 2021-01-18 21:07

    Your idea of hashing long strings to create a token on which to look up values within a store (cache or database) is a good one. I have seen this done for extremely large strings in high-volume environments, and it works great.

    "Which hash would you use for this application?"

    • I don't think the specific hashing algorithm really matters: you are not hashing to protect data, you are hashing to create a token to use as a key for looking up longer values. So the choice of hashing algorithm should be based on speed.

    "Would you compute the hash in code or let the db handle it?"

    • If it were my project, I would compute the hash at the app layer and then pass it along to look up the value in the store (cache first, then database); see the lookup sketch after these points.

    "Is there a radically different approach for storing/searching long strings in a database?"

    • As I mentioned, I think for your specific purpose, your proposed solution is a good one.
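
    As a rough illustration of that flow, here is a minimal MySQL sketch assuming the tables recommended below. The @-prefixed variables stand in for values the application would bind, and MD5() appears only as a placeholder for whatever hash the app layer actually computes:

        -- Placeholder inputs; in practice the application supplies these.
        SET @agent_string = 'Mozilla/5.0 (Windows NT 6.1; rv:9.0.1) Gecko/20100101 Firefox/9.0.1';
        SET @agent_hash   = MD5(@agent_string);
        SET @user_id      = 1;

        -- 1. Check whether this agent is already known, using the hash as the key.
        SELECT agent_hash FROM agent WHERE agent_hash = @agent_hash;

        -- 2. If it is not, store the full string once; INSERT IGNORE skips the row
        --    when this hash is already present (agent_hash being the primary key).
        INSERT IGNORE INTO agent (agent_hash, browser, agent)
        VALUES (@agent_hash, 'Firefox 9', @agent_string);

        -- 3. Link the agent to the user via the pivot table.
        INSERT INTO user_agent_history (user_id, agent_hash)
        VALUES (@user_id, @agent_hash);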

    Table recommendations (demonstrative only; a DDL sketch follows the notes below):

    user

    • id int(11) unsigned not null
    • name_first varchar(100) not null

    user_agent_history

    • user_id int(11) unsigned not null
    • agent_hash varchar(255) not null

    agent

    • agent_hash varchar(255) not null
    • browser varchar(100) not null
    • agent text not null

    A few notes on the schema:

    • From your OP it sounds like you need a M:M relationship between user and agent, since a user may be using Firefox at work but switch to IE9 at home. Hence the need for the pivot table.

    • The varchar(255) used for agent_hash is up for debate. MySQL suggests using a binary or varbinary column type for storing hashes, since a hash is fixed-length binary data.

    • I would also suggest either making agent_hash a primary key, or at the very least, adding a UNIQUE constraint to the column.
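
    For reference, a demonstrative DDL version of the tables above, assuming MySQL/InnoDB. The AUTO_INCREMENT, foreign keys, and covering index are additions not in the listing, included only to make the sketch complete:

        CREATE TABLE user (
            id          INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
            name_first  VARCHAR(100)     NOT NULL,
            PRIMARY KEY (id)
        ) ENGINE=InnoDB;

        -- agent_hash stays VARCHAR(255) to match the listing; a fixed-width binary
        -- column (e.g. BINARY(16) for an md5 digest) is the alternative mentioned
        -- in the notes above.
        CREATE TABLE agent (
            agent_hash  VARCHAR(255) NOT NULL,
            browser     VARCHAR(100) NOT NULL,
            agent       TEXT         NOT NULL,
            PRIMARY KEY (agent_hash)
        ) ENGINE=InnoDB;

        -- Pivot table for the M:M relationship between user and agent.
        CREATE TABLE user_agent_history (
            user_id     INT(11) UNSIGNED NOT NULL,
            agent_hash  VARCHAR(255)     NOT NULL,
            KEY idx_user_agent (user_id, agent_hash),
            FOREIGN KEY (user_id)    REFERENCES user (id),
            FOREIGN KEY (agent_hash) REFERENCES agent (agent_hash)
        ) ENGINE=InnoDB;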

  • 2021-01-18 21:09

    Your hash idea is sound. I've actually used hashing to speed up some searches across millions of records. A hash index will be quicker since each entry is the same size. md5 will likely be fine in your case and will probably give you the shortest hash length. If you are worried about hash collisions, you can also include the length of the agent string, as in the sketch below.
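
    A minimal sketch of that collision guard, assuming a hypothetical agent_len column (not part of the schema in the other answer) that stores CHAR_LENGTH(agent):

        -- Match on both the md5 digest and the stored length; an md5 collision
        -- alone is then not enough to return the wrong row.
        SELECT agent
        FROM agent
        WHERE agent_hash = MD5(@agent_string)
          AND agent_len  = CHAR_LENGTH(@agent_string);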
