Our database powers a SaaS app (helpdesk software) where users have over 7k "custom fields". We use a combined approach:
1. An (EntityID, FieldID, Value) table for searching the data.
2. An entities table that holds all of an entity's values, used for displaying the data (this way you don't need a million JOINs just to get the values).

You could further split #1 into a "table per datatype" like this answer suggests; that way you can even index your UDFs.
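A minimal sketch of that combined layout, with illustrative names (how the entities table stores the values, wide columns or a serialized blob, isn't spelled out above; the blob column here is an assumption):

    -- #2: one row per entity, read when displaying a record (no JOINs needed)
    CREATE TABLE entities (
        entity_id   BIGINT PRIMARY KEY,
        subject     VARCHAR(200),
        custom_data TEXT                      -- serialized copy of all custom field values
    );

    -- #1: one row per (entity, field) pair, used only for searching/filtering
    CREATE TABLE entity_field_values (
        entity_id BIGINT NOT NULL REFERENCES entities(entity_id),
        field_id  INT    NOT NULL,
        value     VARCHAR(255),
        PRIMARY KEY (entity_id, field_id)
    );
    CREATE INDEX ix_efv_field_value ON entity_field_values (field_id, value);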
P.S. A couple of words in defense of the "Entity-Attribute-Value" approach everyone keeps bashing. We used #1 without #2 for decades and it worked just fine. Sometimes it's a business decision: do you have time to rewrite your app and redesign the DB, or can you throw a couple of bucks at cloud servers, which are really cheap these days? By the way, back when we used only the #1 approach, our DB held millions of entities, was accessed by hundreds of thousands of users, and a 16 GB dual-core DB server did just fine.
I have written about this problem a lot. The most common solution is the Entity-Attribute-Value antipattern, which is similar to what you describe in your option #3. Avoid this design like the plague.
What I do when I need truly dynamic custom fields is store them in a blob of XML, so I can add new fields at any time. But to make it speedy, also create additional tables for each field you need to search or sort on (you don't need a table per field, just a table per searchable field). This is sometimes called an inverted index design.
You can read an interesting article from 2009 about this solution here: http://backchannel.org/blog/friendfeed-schemaless-mysql
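A hedged sketch of that inverted-index layout; the names are mine, and due_date stands in for any field a user actually needs to search or sort on:

    -- main table: one opaque blob per entity holding all custom fields
    CREATE TABLE entity_blobs (
        id   BIGINT PRIMARY KEY,
        body TEXT NOT NULL                    -- XML (or similar) blob with every custom field
    );

    -- one narrow table per searchable field, not per field
    CREATE TABLE index_due_date (
        due_date  DATE   NOT NULL,
        entity_id BIGINT NOT NULL REFERENCES entity_blobs(id),
        PRIMARY KEY (due_date, entity_id)
    );

    -- searching hits the narrow table first, then pulls the blobs
    SELECT e.*
    FROM index_due_date i
    JOIN entity_blobs e ON e.id = i.entity_id
    WHERE i.due_date BETWEEN '2009-01-01' AND '2009-01-31';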
Or you can use a document-oriented database, where it's expected that you have custom fields per document. I'd choose Solr.
This sounds like a problem that might be better solved by a non-relational solution, like MongoDB or CouchDB.
They both allow for dynamic schema expansion while allowing you to maintain the tuple integrity you seek.
I agree with Bill Karwin: the EAV model is not a performant approach for you. Using name-value pairs in a relational system is not intrinsically bad, but it only works well when the name-value pair makes a complete tuple of information. When using it forces you to dynamically reconstruct a table at run-time, all kinds of things start to get hard. Querying becomes an exercise in pivot maintenance, or forces you to push the tuple reconstruction up into the object layer.
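For example, reconstructing one row per entity from a name-value table usually ends up as a hand-maintained pivot like the one below; the table and field names are hypothetical, and the CASE list has to be edited every time a new field appears:

    SELECT
        entity_id,
        MAX(CASE WHEN field_name = 'priority' THEN value END) AS priority,
        MAX(CASE WHEN field_name = 'due_date' THEN value END) AS due_date,
        MAX(CASE WHEN field_name = 'region'   THEN value END) AS region
    FROM name_value_pairs
    GROUP BY entity_id;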
You can't determine whether a null or missing value is a valid entry or lack of entry without embedding schema rules in your object layer.
You lose the ability to efficiently manage your schema. Is a 100-character varchar the right type for the "value" field? 200 characters? Should it be nvarchar instead? It can be a hard trade-off, and one that ends with you placing artificial limits on the dynamic nature of your fields: something like "you can only have x user-defined fields, and each can only be y characters long."
With a document-oriented solution like MongoDB or CouchDB, you maintain all attributes associated with a user within a single tuple. Since joins are not an issue, life is happy (neither of the two handles joins well anyway, despite the hype). Your users can define as many attributes as they want (or as many as you will allow), at lengths that don't get hard to manage until you reach about 4MB.
If you have data that requires ACID-level integrity, you might consider splitting the solution, with the high-integrity data living in your relational database and the dynamic data living in a non-relational store.
I would most probably create a value table along the lines of the one sketched further down, with one typed value column per data type.
The exact types depend on your needs (and, of course, on the DBMS you are using). You could also use the NumberValue (decimal) field for ints and booleans. You may need other types as well.
You need some link to the Master records which own the value. It's probably easiest and fastest to create a user fields table for each master table and add a simple foreign key. This way you can filter master records by user fields easily and quickly.
You may also want some kind of metadata about each field. So you end up with the following:
Table UdfMetaData
Table MasterUdfValues
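The column lists were not spelled out above, so the following is only my reading of the intent: one metadata row per UDF, a values table with a typed column per data type (NumberValue doubling for ints and booleans), and a foreign key to the owning master table (Master here stands in for whichever master table you attach the fields to):

    CREATE TABLE UdfMetaData (
        UdfId   INT PRIMARY KEY,
        UdfName VARCHAR(100) NOT NULL,
        UdfType VARCHAR(20)  NOT NULL         -- 'string', 'number', 'date', ...
    );

    -- one such table per master table, as suggested above
    CREATE TABLE MasterUdfValues (
        MasterId    INT NOT NULL REFERENCES Master(Id),
        UdfId       INT NOT NULL REFERENCES UdfMetaData(UdfId),
        StringValue VARCHAR(400),
        NumberValue DECIMAL(18,4),            -- also usable for ints and booleans
        DateValue   DATETIME,
        PRIMARY KEY (MasterId, UdfId)
    );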
Whatever you do, I would not change the table structure dynamically; it is a maintenance nightmare. I would also not use XML structures; they are much too slow.
If you're using SQL Server, don't overlook the sql_variant type. It's pretty fast and should do the job. Other databases might have something similar.
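A minimal SQL Server sketch, assuming a simple (record, field, value) layout; SQL_VARIANT_PROPERTY shows that the stored type is preserved:

    CREATE TABLE UserFieldValues (
        MasterId INT         NOT NULL,
        FieldId  INT         NOT NULL,
        Value    sql_variant NULL,            -- keeps the original type of whatever you store
        CONSTRAINT PK_UserFieldValues PRIMARY KEY (MasterId, FieldId)
    );

    INSERT INTO UserFieldValues VALUES (1, 10, CAST(42 AS int));
    INSERT INTO UserFieldValues VALUES (1, 11, CAST('overdue' AS varchar(20)));

    -- ask each value what type it actually holds
    SELECT MasterId, FieldId, SQL_VARIANT_PROPERTY(Value, 'BaseType') AS BaseType
    FROM UserFieldValues;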
XML datatypes are not so good, for performance reasons. If you're doing calculations on the server, then you're constantly having to deserialize them.
Option 1 sounds bad and looks cruddy, but performance-wise it can be your best bet. I have created tables with columns named Field00-Field99 before, because you just can't beat the performance. You might need to consider your INSERT performance too, in which case this is also the one to go for. You can always create views on this table if you want it to look neat!
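A hedged sketch of that idea; the view is where the generic slots get friendly, per-customer names (all names here are made up):

    -- wide table with a fixed set of generic columns; cheap to INSERT and scan
    CREATE TABLE record_custom_fields (
        record_id BIGINT PRIMARY KEY,
        Field00   VARCHAR(255),
        Field01   VARCHAR(255),
        Field02   VARCHAR(255)
        -- ... continue up to Field99
    );

    -- per-customer view mapping the slots they actually use to meaningful names
    CREATE VIEW acme_records AS
    SELECT record_id,
           Field00 AS contract_no,
           Field01 AS region,
           Field02 AS sla_tier
    FROM record_custom_fields;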
In the comments I saw you say that the UDF fields are there to dump imported data that the user has not mapped properly.
Perhaps another option is to track the number of UDFs created by each user and force them to reuse fields by capping them at, say, 6 custom fields (or some other equally arbitrary limit).
When you are faced with a database structuring problem like this, it is often best to go back to the basic design of the application (the import system in your case) and put a few more constraints on it.
What I would do is option 4, with the addition of a link to users:
general_data_table
    id
    ...

udfs_linked_table
    id
    general_data_id
    udf_id

udfs_table
    id
    name
    type
    owner_id       --> use this to filter for the current user and limit their UDFs
    string_link_id --> link table for string fields
    int_link_id
    type_link_id
Now make sure to create views to optimize performance and get your indexes right. This level of normalization makes the DB footprint smaller, but your application more complex.
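As a hedged example of the kind of view meant here (the view name and the owner id 42 are made up; the value tables reached via string_link_id / int_link_id would be joined in the same way):

    CREATE VIEW my_record_udfs AS
    SELECT g.id   AS record_id,
           u.name AS udf_name,
           u.type AS udf_type
    FROM general_data_table g
    JOIN udfs_linked_table l ON l.general_data_id = g.id
    JOIN udfs_table        u ON u.id = l.udf_id
    WHERE u.owner_id = 42;                    -- filter to the current user's UDFs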