I am writing a simple tool to detect duplicate files (i.e. files with identical contents). The mechanism is to generate a hash for each file using the SHA-512 algorithm and store these hashes in a MySQL database. I store the hashes in a BINARY(64) UNIQUE NOT NULL column. Each row holds a unique binary hash, which is used to check whether a file is a duplicate.
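The mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table name file_hashes and the helper names sha512_of_file, open_store and register_file are hypothetical. The idea carries over to a BINARY(64) UNIQUE NOT NULL column: let the unique constraint reject the insert when the hash is already present.

```python
import hashlib
import sqlite3


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the 64-byte SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def open_store():
    """Create an in-memory table with a unique hash column (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_hashes (hash BLOB NOT NULL UNIQUE)")
    return conn


def register_file(conn, path):
    """Insert the file's hash; return True if it is new, False if it is a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO file_hashes (hash) VALUES (?)",
                (sha512_of_file(path),),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row: these contents were seen before.
        return False
```

Relying on the constraint means one statement per file instead of a SELECT followed by an INSERT, which also avoids a race between the check and the write.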
-- My questions are --
Can I use indexes on a binary column? My default table collation is latin1 (the default collation).
Which indexing mechanism should I use, B-tree or hash, to get high performance? I need to update or add hundreds of rows per second.
What else should I take care of to get the best performance?
Can I use indexes on a binary column? My default table collation is latin1 (the default collation).
Yes, you can; collation is only relevant for character datatypes, not binary datatypes (it defines how characters should be ordered). Also, be aware that latin1 is a character encoding, not a collation.
Which indexing mechanism should I use, B-tree or hash, to get high performance? I need to update or add hundreds of rows per second.
Note that hash indexes are only available with the MEMORY and NDB storage engines, so you may not even have a choice.
In any event, either would typically be able to meet your performance criteria. For this particular application, though, I see no benefit from using B-Tree (which is ordered): you only ever look up a hash by equality, never by range or sort order, so Hash would give better performance. Therefore, if you have the choice, you may as well use Hash.
See "Comparison of B-Tree and Hash Indexes" in the MySQL Reference Manual for more information.
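As a rough analogy for the difference (not how MySQL implements either index type), a Python set behaves like a hash index and a sorted list searched with bisect behaves like an ordered index. The names hash_index, ordered_index, in_hash_index, in_ordered_index and range_scan are hypothetical, and the digests are generated from small integers purely as sample data. Both structures answer the equality question a duplicate checker asks; only the ordered one supports range scans, which this application never needs.

```python
import bisect
import hashlib

# Sample data: SHA-512 digests of the strings "0" .. "999".
digests = [hashlib.sha512(str(i).encode()).digest() for i in range(1000)]

# Hash-style index: supports equality lookups only.
hash_index = set(digests)

# Ordered index (B-tree analogy): keys kept sorted, found by binary search.
ordered_index = sorted(digests)


def in_hash_index(key):
    """Equality lookup via hashing."""
    return key in hash_index


def in_ordered_index(key):
    """Equality lookup via binary search over the sorted keys."""
    pos = bisect.bisect_left(ordered_index, key)
    return pos < len(ordered_index) and ordered_index[pos] == key


def range_scan(low, high):
    """Return keys with low <= key < high; needs the ordered structure."""
    start = bisect.bisect_left(ordered_index, low)
    stop = bisect.bisect_left(ordered_index, high)
    return ordered_index[start:stop]
```

Since cryptographic hashes are effectively random, their sort order carries no meaning, so the range capability of the ordered structure goes unused here.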
What else should I take care of to get the best performance?
That depends on your definition of "best performance" and on your environment. In general, remember Knuth's maxim that "premature optimisation is the root of all evil": only optimise once you know that the simplest approach will actually be a problem.
Source: https://stackoverflow.com/questions/16806986/which-index-should-i-use-on-binary-datatype-column-mysql