How long should SQL email fields be? [duplicate]

余生长醉 提交于 2019-11-28 16:09:06

The theoretical limit is really long but do you really need worry about these long Email addresses? If someone can't login with a 100-char Email, do you really care? We actually prefer they can't.

Some statistical data may shed some light on the issue. We analyzed a database with over 10 million Email addresses. These addresses are not confirmed so there are invalid ones. Here are some interesting facts,

  1. The longest valid one is 89.
  2. There are hundreds longer ones up to the limit of our column (255) but they are apparently fake by visual inspection.
  3. The peak of the length distribution is at 19.
  4. There isn't long tail. Everything falls off sharply after 38.

We cleaned up the DB by throwing away anything longer than 40. The good news is that no one has complained but the bad news is not many records got cleaned out.

I've in the past just done 255 because that's the so-ingrained standard of short but not too short input. That, and I'm a creature of habit.

However, since the max is 319, I'd do nvarchar(320) on the column. Gotta remember the @!

nvarchar won't use the space that you don't need, so if you only have a 20 character email address, it will only take up 20 bytes. This is in contrast to a nchar which will always take up its maximum (it right-pads the value with spaces).

I'd also use nvarchar in lieu of varchar since it's Unicode. Given the volatility of email addresses, this is definitely the way to go.

The following email address is only 94 characters:

i.have.a.really.long.name.like.seetharam.krishnapillai@AReallyLongCompanyNameOfSomeKind.com.au

  • Would an organisation actually give you an email that long?
  • If they were stupid enough to, would you actually use an email address like that?
  • Would anyone? Of course not. Too long to type and too hard to remember.

Even a 92-year-old technophobe would figure out how to sign up for a nice short gmail address, and just use that, rather than type this into your registration page.

Disk space probably isn't an issue, but there are at least two problems with allowing user input fields to be many times longer than they need to be:

  • Displaying them could mess up your UI (at best they will be cut off, at worst they push your containers and margins around)
  • Malicious users can do things with them you can't anticipate (like those cases where hackers used a free online API to store a bunch of data)

I like 50 chars:

123456789.123456789.123456789@1234567890123456.com

If one user in a million has to use their other email address to use my app, so be it.

(Statistics show that no-one actually enters more than about 40 chars for email address, see e.g.: ZZ Coder's answer https://stackoverflow.com/a/1297352/87861)

If you're really being pendantic about it, make a username varchar(60), domain varchar(255). Then you can do ridiculous statistics on domain usage that is slightly faster than doing it as a single field. If you're feeling really gun-ho about optimization, that will also make your SMTP server able to send out emails with fewer connections / better batching.

RFC 5321 (the current SMTP spec, obsoletes RFC2821) states:

4.5.3.1.1. Local-part

The maximum total length of a user name or other local-part is 64
octets.

4.5.3.1.2. Domain

The maximum total length of a domain name or number is 255 octets.

This pertains to just localpart@domain, for a total of 320 ASCII (7-bit) characters.

If you plan to normalize your data, perhaps by splitting the localpart and domain into separate fields, additional things to keep in mind:

  • A technique known as VERP may result in full-length localparts for automatically generated mail (may not be relevant to your use case)
  • domains are case insensitive; recommend lowercasing the domain portion
  • localparts are case sensitive; user@domain.com and USER@domain.com are technically different addresses per the specs, although the policy at the domain.com may be to treat the two addresses as equivalent. It's best to restrict localpart case folding to domains that are known to do this.
SasQ

According to this text, based on the proper RFC documents, it's not 320 but 254: http://www.eph.co.uk/resources/email-address-length-faq/

Edit: Using WayBack Machine: https://web.archive.org/web/20120222213813/http://www.eph.co.uk/resources/email-address-length-faq/

What is the maximum length of an email address?

254 characters

There appears to be some confusion over the maximum valid email address size. Most people believe it to be 320 characters (64 characters for the username + 255 characters for the domain + 1 character for the @ symbol). Other sources suggest 129 (64 + 1 + 64) or 384 (128+1+255, assuming the username doubles in length in the future).

This confusion means you should heed the 'robustness principle' ("developers should carefully write software that adheres closely to extant RFCs but accept and parse input from peers that might not be consistent with those RFCs." - Wikipedia) when writing software that deals with email addresses. Furthermore, some software may be crippled by naive assumptions, e.g. thinking that 50 characters is adequate (examples). Your 200 character email address may be technically valid but that will not help you if most websites or applications reject it.

The actual maximum email length is currently 254 characters:

"The original version of RFC 3696 did indeed say 320 was the maximum length, but John Klensin (ICANN) subsequently accepted this was wrong."

"This arises from the simple arithmetic of maximum length of a domain (255 characters) + maximum length of a mailbox (64 characters) + the @ symbol = 320 characters. Wrong. This canard is actually documented in the original version of RFC3696. It was corrected in the errata. There's actually a restriction from RFC5321 on the path element of an SMTP transaction of 256 characters. But this includes angled brackets around the email address, so the maximum length of an email address is 254 characters." - Dominic Sayers

I use varchar(64) i do not think anyone could have longer email

For email, regardless of the spec, I virtually always go with 512 (nvarchar). Names and surnames are similar.

Really, you need to look at how much you care about having a little extra data. For me, mostly, it's not a worry, so I'll err on the conservative side. But if you've decided, through logically and accurate means, that you'll need to conserve space, then do so. But in general, be conservative with field sizes, and life shall be good.

Note that probably not all email clients support the RFC, so regardless of what it says, you may encounter different things in the wild.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!