I have a table with a primary key that is a varchar(255). Some cases have arisen where 255 characters isn\'t enough. I tried changing the field to a text, but I get the foll
I got this error when adding an index to a table with text type columns. You need to declare the size amount you want to use for each text type.
Put the size amount within the parenthesis ( )
If too many bytes are used you can declare a size in the brackets for varchar to decrease the amount used for indexing. This is even if you declared a size for a type already like varchar(1000). You don't need to create a new table like others have said.
Adding index
alter table test add index index_name(col1(255),col2(255));
Adding unique index
alter table test add unique index_name(col1(255),col2(255));
Add another varChar(255) column (with default as empty string not null) to hold the overflow when 255 chars are not enough, and change this PK to use both columns. This does not sound like a well designed database schema however, and I would recommend getting a data modeler to look at what you have with a view towards refactoring it for more Normalization.
MySQL disallows indexing a full value of BLOB
, TEXT
and long VARCHAR
columns because data they contain can be huge, and implicitly DB index will be big, meaning no benefit from index.
MySQL requires that you define first N characters to be indexed, and the trick is to choose a number N that’s long enough to give good selectivity, but short enough to save space. The prefix should be long enough to make the index nearly as useful as it would be if you’d indexed the whole column.
Before we go further let us define some important terms. Index selectivity is ratio of the total distinct indexed values and total number of rows. Here is one example for test table:
+-----+-----------+
| id | value |
+-----+-----------+
| 1 | abc |
| 2 | abd |
| 3 | adg |
+-----+-----------+
If we index only the first character (N=1), then index table will look like the following table:
+---------------+-----------+
| indexedValue | rows |
+---------------+-----------+
| a | 1,2,3 |
+---------------+-----------+
In this case, index selectivity is equal to IS=1/3 = 0.33.
Let us now see what will happen if we increase number of indexed characters to two (N=2).
+---------------+-----------+
| indexedValue | rows |
+---------------+-----------+
| ab | 1,2 |
| ad | 3 |
+---------------+-----------+
In this scenario IS=2/3=0.66 which means we increased index selectivity, but we have also increased the size of index. Trick is to find the minimal number N which will result to maximal index selectivity.
There are two approaches you can do calculations for your database table. I will make demonstration on the this database dump.
Let's say we want to add column last_name in table employees to the index, and we want to define the smallest number N which will produce the best index selectivity.
First let us identify the most frequent last names:
select count(*) as cnt, last_name
from employees
group by employees.last_name
order by cnt
+-----+-------------+
| cnt | last_name |
+-----+-------------+
| 226 | Baba |
| 223 | Coorg |
| 223 | Gelosh |
| 222 | Farris |
| 222 | Sudbeck |
| 221 | Adachi |
| 220 | Osgood |
| 218 | Neiman |
| 218 | Mandell |
| 218 | Masada |
| 217 | Boudaillier |
| 217 | Wendorf |
| 216 | Pettis |
| 216 | Solares |
| 216 | Mahnke |
+-----+-------------+
15 rows in set (0.64 sec)
As you can see, the last name Baba is the most frequent one. Now we are going to find the most frequently occurring last_name prefixes, beginning with five-letter prefixes.
+-----+--------+
| cnt | prefix |
+-----+--------+
| 794 | Schaa |
| 758 | Mande |
| 711 | Schwa |
| 562 | Angel |
| 561 | Gecse |
| 555 | Delgr |
| 550 | Berna |
| 547 | Peter |
| 543 | Cappe |
| 539 | Stran |
| 534 | Canna |
| 485 | Georg |
| 417 | Neima |
| 398 | Petti |
| 398 | Duclo |
+-----+--------+
15 rows in set (0.55 sec)
There are much more occurrences of every prefix, which means we have to increase number N until the values are almost the same as in the previous example.
Here are results for N=9
select count(*) as cnt, left(last_name,9) as prefix
from employees
group by prefix
order by cnt desc
limit 0,15;
+-----+-----------+
| cnt | prefix |
+-----+-----------+
| 336 | Schwartzb |
| 226 | Baba |
| 223 | Coorg |
| 223 | Gelosh |
| 222 | Sudbeck |
| 222 | Farris |
| 221 | Adachi |
| 220 | Osgood |
| 218 | Mandell |
| 218 | Neiman |
| 218 | Masada |
| 217 | Wendorf |
| 217 | Boudailli |
| 216 | Cummings |
| 216 | Pettis |
+-----+-----------+
Here are results for N=10.
+-----+------------+
| cnt | prefix |
+-----+------------+
| 226 | Baba |
| 223 | Coorg |
| 223 | Gelosh |
| 222 | Sudbeck |
| 222 | Farris |
| 221 | Adachi |
| 220 | Osgood |
| 218 | Mandell |
| 218 | Neiman |
| 218 | Masada |
| 217 | Wendorf |
| 217 | Boudaillie |
| 216 | Cummings |
| 216 | Pettis |
| 216 | Solares |
+-----+------------+
15 rows in set (0.56 sec)
This are very good results. This means that we can make index on column last_name
with indexing only first 10 characters. In table definition column last_name
is defined as VARCHAR(16)
, and this means we have saved 6 bytes (or more if there are UTF8 characters in the last name) per entry. In this table there are 1637 distinct values multiplied by 6 bytes is about 9KB, and imagine how this number would grow if our table contains million of rows.
You can read other ways of calculating number of N in my post Prefixed indexes in MySQL.
Also, if you want to use index in this field, you should use the MyISAM storage engine and the FULLTEXT index type.
You have to change column type to varchar
or integer
for indexing.
Nobody mentioned it so far... with utf8mb4 which is 4-byte and can also store emoticons (we should never more use 3-byte utf8) and we can avoid errors like Incorrect string value: \xF0\x9F\x98\...
we should not use typical VARCHAR(255) but rather VARCHAR(191) because in case utf8mb4 and VARCHAR(255) same part of data are stored off-page and you can not create index for column VARCHAR(255) but for VARCHAR(191) you can. It is because the maximum indexed column size is 767 bytes for ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT.
For newer row formats ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED (which requires newer file format innodb_file_format=Barracuda not older Antelope) maximum indexed column size is 3072. It is available since MySQL >= 5.6.3 when innodb_large_prefix=1 (disabled by default for MySQL <= 5.7.6 and enabled by default for MySQL >= 5.7.7). So in this case we can use VARCHAR(768) for utf8mb4 (or VARCHAR(1024) for old utf8) for indexed column. Option innodb_large_prefix is deprecated since 5.7.7 because its behavior is built-in MySQL 8 (in this version is option removed).