In our new project we have to provide search functionality to retrieve data from hundreds of XML files. I have a brief of our current plan below, and I would like to know your sug
Why don't you store the searchable data in a database table, with a key back to the actual file? Your search would then run against the database table rather than the XML files. I suppose this would be faster, because you can index the table for faster searching.
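A minimal sketch of that idea in Python, using the standard-library `sqlite3` and `xml.etree.ElementTree` modules in place of whatever database you choose. The table name `search_data`, its columns, and the assumption that each file holds `<item>` elements with a `<title>` child are hypothetical, made up for illustration; the files are passed in as strings here only to keep the example self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_index(conn, xml_docs):
    """Load searchable fields from each XML document into an indexed table.

    xml_docs maps a file path (the key back to the source file) to its XML text.
    Hypothetical layout: each document contains <item><title>...</title></item> elements.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS search_data (file_path TEXT, title TEXT)")
    # The index lets the database answer equality lookups without scanning every row.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_title ON search_data (title)")
    for path, text in xml_docs.items():
        root = ET.fromstring(text)
        rows = [(path, item.findtext("title")) for item in root.iter("item")]
        conn.executemany("INSERT INTO search_data VALUES (?, ?)", rows)
    conn.commit()

def find_files(conn, title):
    """Return the files containing an item with exactly this title, via the table."""
    cur = conn.execute(
        "SELECT DISTINCT file_path FROM search_data WHERE title = ? ORDER BY file_path",
        (title,),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    docs = {
        "a.xml": "<items><item><title>alpha</title></item><item><title>beta</title></item></items>",
        "b.xml": "<items><item><title>beta</title></item></items>",
    }
    build_index(conn, docs)
    print(find_files(conn, "beta"))  # ['a.xml', 'b.xml']
```

The XML is parsed once at load time; after that every search is a plain indexed SQL query, and the stored path tells you which file to open for the full record.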
Index your XML files. Look into http://incubator.apache.org/lucene.net/
I recently used it at my previous job to cache our SQL database for fast searching and very little overhead.
It provides fast searching of content inside xml files (all depending on how you organize your cache).
Very easy and straightforward to use.
Much easier than trying to loop through a bunch of files.
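Lucene.Net's own API isn't shown here. As a rough illustration of the underlying idea (build an inverted index once, then query it instead of looping through the files), here is a toy Python version. The regex tokenizer and AND-only query semantics are simplifying assumptions; Lucene adds analyzers, relevance scoring, and on-disk index storage on top of this.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def tokenize(text):
    """Lower-case word tokens; a crude stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(xml_docs):
    """Map each term to the set of file names whose text content contains it."""
    index = defaultdict(set)
    for path, text in xml_docs.items():
        root = ET.fromstring(text)
        # itertext() yields every text node under root, ignoring tag names.
        for chunk in root.itertext():
            for term in tokenize(chunk):
                index[term].add(path)
    return index

def search(index, query):
    """Return files containing every query term (AND semantics), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)
```

For example, with `a.xml` containing "Fast search" and `b.xml` containing "slow search" and "fast", `search(idx, "fast search")` matches both files, while `search(idx, "slow")` matches only `b.xml`.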
Hmm, sounds like you're building a database on top of XML. For performance I'd read those files into the DB of your choice and let it handle indexing and searching for you. If that's not an option, get really familiar with XPath, or roll your own exhaustive search using XmlReader.
XML is not the answer to every problem; however clean it appears to be, performance will suck.
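For the roll-your-own route, a rough Python analogue of a streaming XmlReader scan is `xml.etree.ElementTree.iterparse`, which reports elements incrementally instead of loading a whole DOM. The function name `stream_search` and the tag/substring matching rule are illustrative assumptions; the point is that every file is still read end to end on every search, which is why the database approaches above scale better.

```python
import io
import xml.etree.ElementTree as ET

def stream_search(sources, tag, needle):
    """Exhaustively scan XML sources without building a full DOM.

    sources maps a name to a file path or binary file object.
    Yields (name, text) for every <tag> element whose text contains needle.
    """
    for name, src in sources.items():
        for _event, elem in ET.iterparse(src, events=("end",)):
            if elem.tag == tag and elem.text and needle in elem.text:
                yield name, elem.text
            # Clear each finished element so memory stays roughly bounded.
            elem.clear()

if __name__ == "__main__":
    srcs = {"a.xml": io.BytesIO(b"<r><name>Alice Smith</name><name>Bob</name></r>")}
    print(list(stream_search(srcs, "name", "Smith")))  # [('a.xml', 'Alice Smith')]
```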
First: how big are the XML files? XmlDocument doesn't scale to "huge", but it can handle "large" OK.
Second: can you perhaps put the data into a regular database structure (perhaps SQL Server Express Edition), index it, and access it via regular T-SQL? That will usually outperform an XPath search. Equally, if it is structured, SQL Server 2005 and above supports the xml data type, which shreds the data; this allows you to index and query XML data in the database without holding the entire DOM in memory (it translates XPath into relational queries).
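SQL Server's internal storage format isn't reproduced here. This Python/SQLite sketch only illustrates the general idea of shredding: each element becomes a relational row keyed by its path, so a simple path-plus-value predicate (roughly `/order/customer[.='Acme']`) becomes an indexed SQL lookup. The table `nodes`, its columns, and the path encoding are assumptions for illustration; namespaces, attributes, and mixed content are ignored.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_store():
    """Create an in-memory table of shredded nodes with a (path, value) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (file TEXT, path TEXT, value TEXT)")
    conn.execute("CREATE INDEX idx_path_value ON nodes (path, value)")
    return conn

def shred(conn, name, xml_text):
    """Flatten each element with text into a (file, path, value) row."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        value = (elem.text or "").strip()
        if value:
            rows.append((name, path, value))
        for child in elem:
            walk(child, path)

    walk(root, "")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", rows)

def query_path(conn, path, value):
    """Answer a simple path = value predicate with an indexed SQL query."""
    cur = conn.execute(
        "SELECT DISTINCT file FROM nodes WHERE path = ? AND value = ? ORDER BY file",
        (path, value),
    )
    return [r[0] for r in cur]
```

For instance, after shredding `<order><customer>Acme</customer></order>` from `a.xml`, `query_path(conn, "/order/customer", "Acme")` returns `['a.xml']` without reparsing any XML.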
If you can store the data in a SQL Server database, then you could make use of SQL Server's built-in XPath query functionality.