I need to make a design decision for the following task:
I have a SQL Server database and it contains a table of orders. PDF documents will be uploaded by users through
This has been asked many times about storing images, but the discussion there still applies:
I would also create a separate table for the documents. That way, the search data and key fields used for document retrieval will be more cacheable. The only time your database will need to touch the document table is during an insert or a download.
With SQL Server 2008, when your documents are mostly 1 MB or more in size, the FILESTREAM feature is recommended. This is based on a paper published by Microsoft Research called "To BLOB or Not To BLOB", which analyzes the pros and cons of storing blobs in a database at great length - great read!
For documents of less than 256K on average, storing them in a VARBINARY(MAX)
column seems to be the best fit.
Anything in between is a bit of a toss-up, really.
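To make the size thresholds above concrete, here is a minimal sketch in Python. The function name `suggest_blob_storage` and the returned labels are made up for illustration; only the 256K and 1 MB cut-offs come from the text above, and they are rules of thumb rather than hard limits.

```python
KB = 1024
MB = 1024 * KB


def suggest_blob_storage(avg_size_bytes: int) -> str:
    """Map an average document size to the option discussed above.

    Hypothetical helper: the thresholds are the rough guidelines quoted
    in this answer, not limits enforced by SQL Server.
    """
    if avg_size_bytes < 256 * KB:
        return "VARBINARY(MAX)"   # small documents: store in a table column
    if avg_size_bytes >= 1 * MB:
        return "FILESTREAM"       # large documents: FILESTREAM recommended
    return "toss-up"              # in between: measure with your own workload


print(suggest_blob_storage(100 * KB))  # prints VARBINARY(MAX)
```

For the roughly 100K PDFs in the question, this lands in the VARBINARY(MAX) bucket.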
You say you'll have PDF documents mostly around 100K or so -> those will store very nicely in a SQL Server table, no problem. One thing you might want to consider is having a separate table for the documents that is linked to the main facts table. That way, the facts table will be faster to query, and the documents don't get in the way of your other data.
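The separate-table layout could look roughly like the sketch below. It uses Python's built-in `sqlite3` module purely so the example is runnable without a server; in SQL Server the `content` column would be `VARBINARY(MAX)`. The table names, column names, and helper functions are all assumptions made up for illustration.

```python
import sqlite3


def create_schema(conn):
    # orders holds the searchable facts; order_documents holds only the blobs.
    conn.executescript("""
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL
        );
        CREATE TABLE order_documents (
            document_id INTEGER PRIMARY KEY,
            order_id    INTEGER NOT NULL REFERENCES orders(order_id),
            file_name   TEXT NOT NULL,
            content     BLOB NOT NULL  -- VARBINARY(MAX) in SQL Server
        );
    """)


def add_document(conn, order_id, file_name, content):
    # Insert one uploaded PDF and return its generated document id.
    cur = conn.execute(
        "INSERT INTO order_documents (order_id, file_name, content) "
        "VALUES (?, ?, ?)",
        (order_id, file_name, content),
    )
    return cur.lastrowid


def get_document(conn, document_id):
    # Fetch the raw bytes for download, or None if the id is unknown.
    row = conn.execute(
        "SELECT content FROM order_documents WHERE document_id = ?",
        (document_id,),
    ).fetchone()
    return None if row is None else bytes(row[0])


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO orders (order_id, customer) VALUES (1, 'ACME')")
doc_id = add_document(conn, 1, "invoice.pdf", b"%PDF-1.4 sample")
```

Queries against `orders` never read the blob column, so the document table is only touched on upload (`add_document`) and download (`get_document`).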
I am sceptical about storing large blobs in SQL. Assuming the SQL Server page size is 8K (off the top of my head), it has to assemble the file from fragments in 8K blocks when serving it back to the user. I am not sure whether this is actually the case, though.
I would recommend AGAINST storing the files in SQL. You are adding extra overhead when retrieving the files. IIS is really efficient at serving up files, but with SQL as the storage facility you have introduced a bottleneck, because you now have to hop from your web server to your SQL Server and back to get the file.
When you store your files on the web server, your process can determine the appropriate file based on the criteria you've listed, point to it and serve it. Document management systems such as Documentum and Alfresco store the files on a share, and this gives you great flexibility with respect to backup and redundant storage.
We ran into a similar situation, albeit in principle only. We needed a way for documents stored in SharePoint to be accessed via a link on a web page. Since everything is project based, with a unique project number, the solution was to apply a common naming convention to the documents. As the web page is created server-side, the links are generated dynamically. The code takes the base path to the SharePoint server and then adds the project number and the specifics for the document.
Example:
[SharePoint Base Path][Project Number][Project Document Name]
[http://mysharepoint.mycompany.com/213990/213990_PC.pdf]
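A minimal sketch of that link construction in Python, assuming the document name is always the project number followed by an underscore and a short document code. The function name `build_document_url` and the `doc_code` parameter are hypothetical; our actual code is not shown here.

```python
def build_document_url(base_path: str, project_number: str, doc_code: str) -> str:
    """Build a document link from the naming convention described above.

    Assumed layout: <base>/<project number>/<project number>_<doc code>.pdf
    """
    base = base_path.rstrip("/")  # tolerate a trailing slash on the base path
    return f"{base}/{project_number}/{project_number}_{doc_code}.pdf"


print(build_document_url("http://mysharepoint.mycompany.com", "213990", "PC"))
# prints http://mysharepoint.mycompany.com/213990/213990_PC.pdf
```

Because the link is derived entirely from the project number, the page never has to look up where a document is stored.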