I am not a database guy, but am trying to clean up another database. So my question is would normalizing the gender table be going too far?
User table:
userid in
I'll remark on another aspect: sorting. Normally, 'M' sorts after 'F'; in a project one time, a database table had a gender field with either of those two values. There was a desire to be able to sort results on the gender (census data) and a further preference to have 'M' appear before 'F'. My solution was to add a separate lookup table, assigning the Male value an ID of 0, and Female an ID of 1. So queries on the main table could easily be sorted on the new genderID field.
Whether or not you choose to normalize your table structure to accomodate gender is going to depend on the requirements of your application and your business requirements.
I would normalize if:
I would not normalize if:
I'm also not a database guy but I do it. It gives me the possibility to assure that only the genders are entered, that are valid (referencial integrity) and I can also use it to populate the selection control.
I can think of applications where I'd use different columns for sex and gender, have three values for sex (male/female/decline to state) and six for gender (male/female/transgendered male/transgendered female/asexual/decline to state). Granted, I live in San Francisco, where there's an level of public discussion of transgender issues that much of the rest of the world is behind the curve on.
The point is: without a compelling reason to think otherwise, I'd assume that any simplifying assumption I made about demographics was limited and parochial. The cost of breaking sex out to its own table is small now and expensive later. I wouldn't avoid the small cost on the basis of an assumption.
Yes. I think that You can use enum in code and bind eventuatly to it.
null - unknow ; 0 - male ; 1 - female;
or you can use bool type to define this
null - unknow; true - male; false - female
Just thought I'd throw an opinion in here. @Ben McCormack has a great answer with a minor caveat: Regarding localization, there are sometimes better ways of handling this than having the values defined directly in your database.
For example, you mention WPF. With .Net you have various localization resources that are much better suited to managing differences in whether to emit "Male" or "Samec" (Czech).
By letting the built in localization features take care of this you don't have to worry about having multiple database records defining the exact same thing.. which could complicate reporting.
That said, I'd suggest that you might want to consider if "gender" is really what you are after. Gender is defined as "a set of characteristics distinguishing between male and female".
On the face of it this sounds like your standard Male/Female options; but it's not. Gender is much more complicated than that as it needs context in order to have meaning. For example, in the context of a relationship a Male (by Sex) could have one of several "genders": Masculine, Feminine or even Neutral. This is regardless of what sex their partner is.
In the context of just an individual, a Male (by Sex) might be Masculine, Feminine, Neutral, Transgender, Intersex or any of a number of other options acceptable to the person filling out the form.
At least one person commented that Gender is necessary in order to determine the honorific used in mailings. I'd suggest that there is no relationship between gender and those honorifics. For example, a Female (by Sex) might want to be addressed as Ms/Miss/Mrs/Dr/Madam/Professor or even Mr if they are in the process of, or have completed, surgery to become "Male". That list is by no means all inclusive and in any event it's much better to allow that person to select how they want to be addressed.
Which leads me to my last item: Before collecting any piece of data you should have a defined reason for having it. My company specializes in data collection through online forms. One of the things we do is look at what our clients are asking for and go field by field to determine if the data is even used anywhere.
More often than not an entity (company/governmental/etc) asks for far more information than they care about. This can have additional consequences in the event the data is lost, stolen or simply viewed by unauthorized individuals. Further there is a burden placed on the person filling out the forms for each field they are asked to complete.
I bring this up because "Gender" is almost never needed for any normal system. Instead, Sex is a better qualifier and even then it has little value. Exempting dating sites and governmental census.