I need to store user agent strings in a database for tracking and comparing customer behavior and sales performance between different browsers. A pretty plain user agent str
Your idea of hashing long strings to create a token that you can look up in a store (a cache or a database) is a good one. I have seen this done for extremely large strings in high-volume environments, and it works well.
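To make the idea concrete, here is a minimal sketch of hashing in application code, assuming Python and MD5. The names agent_token, remember and store are made up for illustration, the dict only stands in for whatever cache or database you actually use, and the user agent string is just a sample value.

```python
import hashlib

def agent_token(agent: str) -> str:
    """Return a fixed-length hex token for an arbitrarily long user agent string."""
    return hashlib.md5(agent.encode("utf-8")).hexdigest()

# Hypothetical in-memory store standing in for a cache or a database table.
store: dict[str, str] = {}

def remember(agent: str) -> str:
    """Store the agent under its token (only once) and return the token."""
    token = agent_token(agent)
    store.setdefault(token, agent)
    return token

# Sample user agent string, for illustration only.
ua = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"
token = remember(ua)
```

The token is always 32 hex characters no matter how long the agent string is, so lookups compare short fixed-size keys instead of the full text.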
"Which hash would you use for this application?"
"Would you compute the hash in code or let the db handle it?"
"Is there a radically different approach for storing/searching long strings in a database?"
Table recommendations (demonstrative only):

user

user_agent_history
    user_id       int(11) unsigned not null
    agent_hash    varchar(255) not null

agent
    agent_hash    varchar(255) not null
    browser       varchar(100) not null
    agent         text not null

Few notes on schema:
From your question it sounds like you need a many-to-many (M:M) relationship between user and agent, because a user may use Firefox at work but then switch to IE9 at home. Hence the need for the pivot table.
The varchar(255) used for agent_hash is up for debate. MySQL suggests using a varbinary column type for storing hashes, of which there are several types.
I would also suggest either making agent_hash a primary key, or at the very least, adding a UNIQUE constraint to the column.
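As a rough sketch of how the schema and the primary key suggestion fit together, the snippet below builds the three tables in an in-memory SQLite database and records two visits. SQLite is only a stand-in here so the example runs anywhere (the column types differ from the MySQL ones above), and record_visit is a hypothetical helper, not part of any library.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY
);
CREATE TABLE agent (
    agent_hash TEXT NOT NULL PRIMARY KEY,  -- hex digest of the agent string
    browser    TEXT NOT NULL,
    agent      TEXT NOT NULL
);
CREATE TABLE user_agent_history (
    user_id    INTEGER NOT NULL REFERENCES user(user_id),
    agent_hash TEXT NOT NULL REFERENCES agent(agent_hash)
);
""")

def record_visit(user_id: int, browser: str, agent: str) -> str:
    """Store the agent once (keyed by its hash) and link it to the user."""
    agent_hash = hashlib.md5(agent.encode("utf-8")).hexdigest()
    # The primary key on agent_hash lets INSERT OR IGNORE skip known agents.
    conn.execute(
        "INSERT OR IGNORE INTO agent (agent_hash, browser, agent) VALUES (?, ?, ?)",
        (agent_hash, browser, agent),
    )
    conn.execute(
        "INSERT INTO user_agent_history (user_id, agent_hash) VALUES (?, ?)",
        (user_id, agent_hash),
    )
    return agent_hash

# Two sample users sharing the same (sample) user agent string.
conn.executemany("INSERT INTO user (user_id) VALUES (?)", [(1,), (2,)])
sample_agent = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0"
first_hash = record_visit(1, "Firefox", sample_agent)
second_hash = record_visit(2, "Firefox", sample_agent)

agent_rows = conn.execute("SELECT COUNT(*) FROM agent").fetchone()[0]
history_rows = conn.execute("SELECT COUNT(*) FROM user_agent_history").fetchone()[0]
```

The long agent text is stored a single time, while the pivot table only ever holds the short hash, which is what keeps the history table small and its lookups cheap.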
Your hash idea is sound. I've actually used hashing to speed up some searches on millions of records. A hash index will be quicker since each entry is the same size. MD5 will likely be fine in your case and will probably give you the shortest hash length. If you are worried about hash collisions, you can also include the length of the agent string.
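One way to fold the length into the key, sketched in Python. The agent_key name and the "digest:length" layout are my own assumptions; you could just as well keep the length in a separate column and compare both.

```python
import hashlib

def agent_key(agent: str) -> str:
    """Combine the MD5 hex digest with the byte length of the agent string."""
    data = agent.encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}"

# Sample value, for illustration only.
key = agent_key("Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0")
```

Two agent strings would then have to share both the digest and the length before they could be mistaken for each other.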