utf8mb4_unicode_ci vs utf8mb4_bin

遇见更好的自我 2021-01-30 07:03

So first, let's see if I get it right:

A charset is a set of symbols and encodings. A collation is a set of rules for comparing characters in a charset.

2 Answers
  • 2021-01-30 07:34

    If, for example, I want to allow case-insensitive search using utf8mb4_bin, I will have to do things like:
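
The queries that originally followed are not preserved in this copy. As a rough sketch only, assuming a hypothetical table `users` whose `name` column is declared with utf8mb4_bin, the two usual workarounds might look like this:

```sql
-- Hypothetical table for illustration; `users`, `name` and `idx_name` are assumed names.
CREATE TABLE IF NOT EXISTS users (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  KEY idx_name (name)
);

-- Option 1: override the column's collation for this one comparison.
SELECT id, name
FROM users
WHERE name COLLATE utf8mb4_unicode_ci LIKE '%john%';

-- Option 2: fold case on both sides before comparing.
SELECT id, name
FROM users
WHERE LOWER(name) LIKE LOWER('%John%');
```

Option 2 folds case only; accented and unaccented letters still compare as different under utf8mb4_bin.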

    Keep in mind that wrapping the column in LOWER() prevents MySQL from using a regular index on that column.

  • 2021-01-30 07:39

    Did you "get things right"? Yes, except that I think French accents are 'correctly' compared in utf8mb4_unicode_520_ci.

    Your two SELECTs will both do a full table scan and will therefore be inefficient. The reason is that you are overriding the collation (for #1), hiding the column in a function (LOWER, for #2), or using a leading wildcard (LIKE %...).

    If you want it to be efficient, declare name with COLLATE utf8mb4_bin and simply use WHERE name = ....
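
A minimal sketch of that advice, assuming a hypothetical `users` table with an index on `name` (all names here are illustrative, not from the original post). With utf8mb4_bin the equality is an exact match, sensitive to both case and accents; a column declared with a `_ci` collation would let the same indexed `=` match case-insensitively.

```sql
-- Assumed names: table `users`, column `name`, index `idx_name`.
CREATE TABLE IF NOT EXISTS users (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  KEY idx_name (name)
);

-- Plain equality on the bare column: no COLLATE override, no function,
-- no leading wildcard, so the optimizer is free to use idx_name.
SELECT id, name FROM users WHERE name = 'John';

-- EXPLAIN reports which access path the optimizer actually chose.
EXPLAIN SELECT id, name FROM users WHERE name = 'John';
```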

    Do you think some of these equivalences and orderings are 'incorrect' for French?

    A=a=ª=À=Á=Â=Ã=Ä=Å=à=á=â=ã=ä=å=Ā=ā=Ą=ą  Aa  ae=Æ=æ  az  B=b  C=c=Ç=ç=Ć=ć=Č=č  ch  cz
    D=d=Ð=ð=Ď=ď  dz  E=e=È=É=Ê=Ë=è=é=ê=ë=Ē=ē=Ĕ=ĕ=Ė=ė=Ę=ę=Ě=ě  F=f  fz  ƒ  G=g=Ğ=ğ=Ģ=ģ
    gz  H=h  hz  I=i=Ì=Í=Î=Ï=ì=í=î=ï=Ī=ī=Į=į=İ  ij=ij  iz  ı  J=j  K=k=Ķ=ķ
    L=l=Ĺ=ĺ=Ļ=ļ=Ł=ł  lj=LJ=Lj=lj  ll  lz  M=m  N=n=Ñ=ñ=Ń=ń=Ņ=ņ=Ň=ň  nz
    O=o=º=Ò=Ó=Ô=Õ=Ö=Ø=ò=ó=ô=õ=ö=ø  oe=Œ=œ  oz  P=p  Q=q  R=r=Ř=ř  S=s=Ś=ś=Ş=ş=Š=š  sh
    ss=ß  sz  T=t=Ť=ť  TM=tm=™  tz  U=u=Ù=Ú=Û=Ü=ù=ú=û=ü=Ū=ū=Ů=ů=Ų=ų  ue  uz  V=v  W=w  X=x
    Y=y=Ý=ý=ÿ=Ÿ  yz  Z=z=Ź=ź=Ż=ż=Ž=ž  zh  zz  Þ=þ  µ
    

    More utf8 collations. 8.0 and utf8mb4 collations.

    The "520" (newer) version differs by not treating Æ, Ð, Ł, and Ø as separate 'letters', and perhaps in other ways.
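
One way to check any of these equivalences on your own server is to compare string literals under an explicit collation. This is a small sketch; the column aliases are arbitrary, and it assumes the client really sends UTF-8. Each expression returns 1 when the two strings compare equal and 0 otherwise.

```sql
-- Make the connection charset utf8mb4 so the COLLATE clauses below are valid.
SET NAMES utf8mb4;

SELECT
  'a' = 'A' COLLATE utf8mb4_bin            AS bin_case,       -- 0: binary comparison is case-sensitive
  'a' = 'A' COLLATE utf8mb4_unicode_ci     AS ci_case,        -- 1: _ci ignores case
  'é' = 'e' COLLATE utf8mb4_unicode_ci     AS ci_accent,      -- 1: accents are ignored as well
  -- The next two differ only in collation version; compare the results yourself.
  'Ł' = 'L' COLLATE utf8mb4_unicode_ci     AS ci_l_stroke,
  'Ł' = 'L' COLLATE utf8mb4_unicode_520_ci AS ci520_l_stroke;
```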
