Most efficient method for persisting complex types with variable schemas in SQL

旧巷少年郎 2021-01-03 08:21

What I'm doing

I am creating an SQL table that will provide the back-end storage mechanism for complex-typed objects. I am trying to determine how

5 answers
  • 2021-01-03 08:34

    Interesting question.

    I think you may be asking the wrong question here. Broadly speaking, as long as you have a FULLTEXT index on your text field, queries will be fast: much faster than a LIKE search over a varchar column if you have to use wildcards, for instance.

    However, if I were you, I'd concentrate on the actual queries you're going to be running. Do you need boolean operators? Wildcards? Numerical comparisons? That's where I think you will encounter the real performance worries.

    I would imagine you would need queries like:

    • "find all addresses in the states of New York, New Jersey and Pennsylvania"
    • "find all addresses between house numbers 1 and 100 on Mulberry Street"
    • "find all addresses where the zipcode is missing, and the city is New York"
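    For comparison, here is a minimal sketch of those three queries against a conventional fixed-schema table. It uses SQLite through Python's built-in sqlite3 module rather than SQL Server, and the table, columns and sample rows are hypothetical. Each question becomes a single WHERE clause over typed columns that an ordinary index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fixed schema: one typed column per attribute.
conn.execute("""
    CREATE TABLE addresses (
        id           INTEGER PRIMARY KEY,
        house_number INTEGER,
        street       TEXT,
        city         TEXT,
        state        TEXT,
        zipcode      TEXT  -- NULL when missing
    )
""")
conn.executemany(
    "INSERT INTO addresses (house_number, street, city, state, zipcode) VALUES (?, ?, ?, ?, ?)",
    [
        (12, "Mulberry Street", "New York", "NY", "10013"),
        (250, "Mulberry Street", "New York", "NY", None),
        (7, "Main Street", "Newark", "NJ", "07102"),
        (40, "Market Street", "Philadelphia", "PA", "19106"),
        (5, "Elm Street", "Boston", "MA", "02108"),
    ],
)

# 1. Addresses in New York, New Jersey and Pennsylvania.
in_states = conn.execute(
    "SELECT id FROM addresses WHERE state IN ('NY', 'NJ', 'PA') ORDER BY id"
).fetchall()

# 2. House numbers 1 to 100 on Mulberry Street: a plain integer range.
on_mulberry = conn.execute(
    "SELECT id FROM addresses "
    "WHERE street = 'Mulberry Street' AND house_number BETWEEN 1 AND 100"
).fetchall()

# 3. Zipcode missing and city is New York.
missing_zip = conn.execute(
    "SELECT id FROM addresses WHERE zipcode IS NULL AND city = 'New York'"
).fetchall()

print(in_states, on_mulberry, missing_zip)
```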

    At a high level, the solution you propose is to store your XML somewhere, and then de-normalize that XML into name/value pairs for querying.

    Name/value pairs have a long and proud history, but become unwieldy in complex query situations, because you're not using the built-in optimizations and concepts of the relational database model.
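    To make the "unwieldy" point concrete, here is a sketch of the third query above against a generic name/value table, using SQLite through Python's sqlite3 module. The attributes table and its sample rows are hypothetical. Even two conditions on the same object need a second alias of the table, and "missing" turns into a NOT EXISTS anti-join; every additional attribute in a filter adds another one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical name/value (entity-attribute-value) table: one row per property.
conn.execute("CREATE TABLE attributes (object_id INTEGER, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [
        (1, "city", "New York"), (1, "zipcode", "10013"),
        (2, "city", "New York"),  # no zipcode row at all
        (3, "city", "Newark"),    # no zipcode either, but a different city
    ],
)

# "Find all addresses where the zipcode is missing, and the city is New York":
# one table alias per attribute tested, and "missing" becomes an anti-join.
rows = conn.execute("""
    SELECT c.object_id
    FROM attributes AS c
    WHERE c.name = 'city' AND c.value = 'New York'
      AND NOT EXISTS (
          SELECT 1 FROM attributes AS z
          WHERE z.object_id = c.object_id AND z.name = 'zipcode'
      )
""").fetchall()
print(rows)
```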

    One refinement I'd recommend is to look at the domain model and at least see whether you can split the "value" column by data type; you might end up with "textValue", "moneyValue", "integerValue" and "dateValue". In the example you give, you might factor "address 1" into "housenumber" (as an integer) and "streetname".
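    A sketch of that refinement, using SQLite through Python's sqlite3 module; the table layout and the snake_case column names (text_value, integer_value, and so on) are hypothetical stand-ins for the columns named above. With only a text value column, the house-number range is compared as strings, so '12' and '20' fall outside '1'..'100'; the integer column restores numeric comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical typed name/value table: one nullable value column per data type.
conn.execute("""
    CREATE TABLE attributes (
        object_id     INTEGER,
        name          TEXT,
        text_value    TEXT,
        integer_value INTEGER,
        money_value   NUMERIC,
        date_value    TEXT  -- SQLite has no DATE type; ISO-8601 strings sort correctly
    )
""")

house_numbers = {1: 12, 2: 20, 3: 250}  # object_id -> house number, all on Mulberry Street
for object_id, number in house_numbers.items():
    conn.execute(
        "INSERT INTO attributes (object_id, name, text_value) VALUES (?, 'streetname', ?)",
        (object_id, "Mulberry Street"),
    )
    # Store the number both as text and as an integer, purely to contrast the comparisons.
    conn.execute(
        "INSERT INTO attributes (object_id, name, text_value, integer_value) "
        "VALUES (?, 'housenumber', ?, ?)",
        (object_id, str(number), number),
    )

QUERY = """
    SELECT h.object_id
    FROM attributes AS h
    JOIN attributes AS s ON s.object_id = h.object_id AND s.name = 'streetname'
    WHERE h.name = 'housenumber' AND s.text_value = 'Mulberry Street' AND {condition}
    ORDER BY h.object_id
"""
# Text comparison: '12', '20' and '250' all sort after '100', so nothing matches.
as_text = conn.execute(QUERY.format(condition="h.text_value BETWEEN '1' AND '100'")).fetchall()
# Integer comparison gives the intended answer: objects 1 and 2.
as_integer = conn.execute(QUERY.format(condition="h.integer_value BETWEEN 1 AND 100")).fetchall()
print(as_text, as_integer)
```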

    Having said all this, I don't think there's a better solution short of completely changing tack to a document-oriented database.

  • 2021-01-03 08:37

    Somehow what you want sounds like a painful thing to do in SQL. Basically, when querying an SQL database, you should treat the contents of a text field as opaque. Text fields were not made for efficient queries.

    If you just want to store serialized objects in a text field, that is fine. But do not try to build queries that look inside the text field to find objects.

    Your idea sounds like you want to perform some joins, XML parsing, and XPath application to get to a value. This doesn't strike me as the most efficient thing to do.

    So, my advice:

    • Either just store serialized objects in the db, and do nothing more than load them and perform all other operations in memory
    • Or, if you need to query complex data structures, you may really want to look into document stores/databases like CouchDB or MongoDB; you can also check Wikipedia on the subject. There are even databases specifically designed for storing XML, even though I personally don't like them very much.
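    A minimal sketch of the first option, with Python's json module standing in for .NET serialization and SQLite for the database; the Address class and objects table are hypothetical. The database only stores and returns opaque text, and all filtering happens after loading, which keeps the schema trivial at the cost of reading every object for every query.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Address:  # hypothetical domain object
    street: str
    city: str
    zipcode: Optional[str] = None


conn = sqlite3.connect(":memory:")
# One table, one opaque text column: the database never looks inside "body".
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def save(obj: Address) -> int:
    cur = conn.execute("INSERT INTO objects (body) VALUES (?)", (json.dumps(asdict(obj)),))
    return cur.lastrowid


def load_all() -> List[Address]:
    rows = conn.execute("SELECT body FROM objects ORDER BY id")
    return [Address(**json.loads(body)) for (body,) in rows]


save(Address("Mulberry Street", "New York", "10013"))
save(Address("Mulberry Street", "New York"))
save(Address("Main Street", "Newark", "07102"))

# All "querying" is ordinary in-memory code over the loaded objects.
missing_zip = [a for a in load_all() if a.city == "New York" and a.zipcode is None]
print(missing_zip)
```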

    Addendum, per your explanations above

    Simply put, don't go over the top with this thing:

    • If you just want to persist C#/.NET objects, just use the XML Serialization already built into the framework, a single table and be done with it.
    • If you, for some reason, need to store complex XML, use a dedicated XML store
    • If you have a fixed database schema, but it is too complex for efficient queries, use a Document Store in memory where you keep a denormalized version of your data for faster queries (or just simplify your database schema)
    • If you don't really need a fixed schema, just use a Document Store, and forget about having any "schema definition" at all

    As for your solution, yes, it could work somehow. As could a plain SQL schema if you set it up right. But for applying an XPath, you'll probably parse the whole XML document each time you access a record, which wouldn't be very efficient to begin with.
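    A sketch of that cost, assuming XML documents stored as plain text in SQLite and an XPath-style lookup done with Python's xml.etree.ElementTree (which implements only a subset of XPath). The documents table and the parse counter are hypothetical; the point is that every row must be parsed before the predicate can even be checked.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, xml TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO documents (xml) VALUES (?)",
    [
        ("<address><city>New York</city><zipcode>10013</zipcode></address>",),
        ("<address><city>New York</city></address>",),
        ("<address><city>Newark</city><zipcode>07102</zipcode></address>",),
    ],
)

parse_count = 0  # how many documents had to be parsed


def find_ids(path: str, expected: str) -> List[int]:
    """Ids of documents whose element at `path` (ElementTree's XPath subset) has text `expected`."""
    global parse_count
    matches = []
    for doc_id, text in conn.execute("SELECT id, xml FROM documents ORDER BY id").fetchall():
        root = ET.fromstring(text)  # every row is fully parsed, matching or not
        parse_count += 1
        if root.findtext(path) == expected:
            matches.append(doc_id)
    return matches


ny_ids = find_ids("city", "New York")
print(ny_ids, parse_count)
```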

    If you want to check out document databases, there are .NET drivers for CouchDB and MongoDB. The eXist XML database offers a number of Web protocols, and you can probably create a client class easily with Visual Studio's point-and-click interface. Or just google for someone who already has.

  • 2021-01-03 08:40

    How about looking for a solution at the architectural level? I was also racking my brain over complex object graphs and performance until I discovered CQRS.

    [start evangelist mode]

    • You can go document-based or relational as storage. Even both! (Event Sourcing)
    • Nice separation of concerns: Read Model vs Write Model
    • Have your cake and eat it too!

    Ok, there is an initial learning / technical curve to get over ;)

    [end evangelist mode]

    As you stated: "I need to be able to create variable schemas on the fly without changing anything about the database access layer." The key benefit is that your read model can be very fast since it's made for reading. If you add Event Sourcing to the mix, you can drop and rebuild your Read Model to whatever schema you want... even "online".
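    A toy sketch of that last idea in Python, not based on NServiceBus or any other framework: the write side only appends immutable events, and each read model is a projection that can be discarded and rebuilt in a different shape by replaying the same log. The Event type, field names and both projections are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:  # hypothetical event: "field of object_id was set to value"
    object_id: int
    field: str
    value: str


# Write model: an append-only log (a real system would persist it).
event_log: List[Event] = [
    Event(1, "city", "New York"),
    Event(1, "zipcode", "10013"),
    Event(2, "city", "Newark"),
    Event(1, "zipcode", "10012"),  # a later correction; the old event is kept
]


def project_current_state(events: List[Event]) -> Dict[int, Dict[str, str]]:
    """Read model 1: the latest value of every field, per object."""
    state: Dict[int, Dict[str, str]] = defaultdict(dict)
    for e in events:
        state[e.object_id][e.field] = e.value
    return dict(state)


def project_ids_by_city(events: List[Event]) -> Dict[str, List[int]]:
    """Read model 2, a different 'schema': object ids grouped by city."""
    by_city: Dict[str, List[int]] = defaultdict(list)
    for object_id, fields in project_current_state(events).items():
        if "city" in fields:
            by_city[fields["city"]].append(object_id)
    return dict(by_city)


# Either read model can be dropped and rebuilt from the same log at any time.
print(project_current_state(event_log))
print(project_ids_by_city(event_log))
```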

    There are some nice open-source frameworks out there, like NServiceBus, which save a lot of time and solve many technical challenges. It all depends on how far you want to take these concepts and how much time you are willing (or able) to spend. You can even start with just the basics if you follow Greg Young's approach. See the info in the links below.

    See

    • CQRS Examples and Screencasts
    • CQRS Questions
    • Intro (Also see the video)
  • 2021-01-03 08:47

    In part, it will depend on your DB engine. You're using SQL Server, aren't you?

    Answering your topics:

    1 - Comparing the value of a text field versus a varchar field: if you're comparing two DB fields, varchar fields are the better choice. nvarchar(max) stores Unicode data using 2*l + 2 bytes, where "l" is the length in characters. For performance, you will need to consider how much larger your tables will become when choosing the best way to index (or not index) your table fields. See the topic.
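    A rough worked example of that formula, with varchar's l + 2 bytes alongside for comparison. This sketch assumes a single-byte code page for varchar and characters that each take one UTF-16 code unit for nvarchar, and it ignores row and page overhead; the row count and average length are made up.

```python
def nvarchar_bytes(length: int) -> int:
    """2*l + 2: two bytes per character (one UTF-16 code unit each) plus 2 bytes overhead."""
    return 2 * length + 2


def varchar_bytes(length: int) -> int:
    """l + 2: one byte per character in a single-byte code page plus 2 bytes overhead."""
    return length + 2


# Hypothetical workload: 1,000,000 rows averaging 40 characters per value.
rows, avg_len = 1_000_000, 40
nvarchar_total = rows * nvarchar_bytes(avg_len)  # 82,000,000 bytes
varchar_total = rows * varchar_bytes(avg_len)    # 42,000,000 bytes
print(f"nvarchar: {nvarchar_total / 2**20:.1f} MiB, varchar: {varchar_total / 2**20:.1f} MiB")
```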

    2 - Nested queries (subqueries) are sometimes easy to write and execute, and they can also reduce query time. But depending on the complexity, a different kind of join may be better. The best way is to try both. Execute each query two or more times, because the DB engine "compiles" a query on its first execution, so subsequent executions are considerably faster. Measure the times for different parameters and choose the best option.

    "Sometimes you can rewrite a subquery to use JOIN and achieve better performance. The advantage of creating a JOIN is that you can evaluate tables in a different order from that defined by the query. The advantage of using a subquery is that it is frequently not necessary to scan all rows from the subquery to evaluate the subquery expression. For example, an EXISTS subquery can return TRUE upon seeing the first qualifying row." - link

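    The "try both ways and measure" advice can be sketched with SQLite and Python's timeit module. The customers and orders tables are hypothetical, and SQLite's optimizer and caching differ from SQL Server's, so the numbers are only illustrative. Both queries return the same customers, one through an EXISTS subquery and one through a JOIN with DISTINCT, and each is run repeatedly so that one-off costs do not decide the comparison.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(1, 2001)],
)
# Every third customer has orders; customer i has i % 5 + 1 of them.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i, 10.0 * n) for i in range(1, 2001, 3) for n in range(i % 5 + 1)],
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

SUBQUERY = """
    SELECT c.id FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
    ORDER BY c.id
"""
JOIN_QUERY = """
    SELECT DISTINCT c.id FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
"""

# Same answer either way; only the execution strategy differs.
assert conn.execute(SUBQUERY).fetchall() == conn.execute(JOIN_QUERY).fetchall()

for label, sql in (("EXISTS subquery", SUBQUERY), ("JOIN + DISTINCT", JOIN_QUERY)):
    # Several repetitions, keeping the best, so one-off costs such as
    # statement preparation and cold caches do not dominate the comparison.
    best = min(timeit.repeat(lambda: conn.execute(sql).fetchall(), number=20, repeat=5))
    print(f"{label}: {best:.4f} s per 20 runs")
```
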
    3 - There isn't much information in this question, but if you are going to fetch the XML document directly, reading it from the table rather than through a view would be a good idea. Again, it will depend on the view and the document.

    4 - Other issues concern the total number of records expected in your table and the indexing of its columns, for which you need to consider sorting, joining, filtering, PKs, and FKs. Each situation may demand a different approach. My suggestion is to invest some time reading about how your database engine and its queries work, and how that relates to your system.
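    To see whether a filter actually uses an index, ask the engine for its query plan (in SQL Server, the execution plan). This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table and index names are hypothetical, and since SQLite's plan wording varies between versions, the check looks only for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO addresses (city, zipcode) VALUES (?, ?)",
    [("New York" if i % 10 == 0 else "Elsewhere", f"{i:05d}") for i in range(1000)],
)

QUERY = "SELECT id FROM addresses WHERE city = ?"


def plan(sql: str, *params) -> str:
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


before = plan(QUERY, "New York")  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_addresses_city ON addresses (city)")
after = plan(QUERY, "New York")   # now a search through idx_addresses_city
print(before)
print(after)
```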

    I hope I've helped.

  • 2021-01-03 08:54

    I need to be able to create variable schemas on the fly without changing anything about the database access layer.

    You are re-implementing the RDBMS within an RDBMS. The DB can do this already - that is what DDL statements like CREATE TABLE and CREATE SCHEMA are for.

    I suggest you look into "schemas" and SQL security. With the correct security setup, there is no reason you cannot allow your users to create their own tables to store document attributes in, or even generate those tables automatically.
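    A sketch of generating such tables automatically from a type definition, using SQLite through Python's sqlite3 module. SQLite has no CREATE SCHEMA or per-user permissions, so the security side is out of scope here, and create_table_for_type and its type mapping are hypothetical. Because identifiers cannot be bound as query parameters, they are checked against a strict pattern before being spliced into the DDL.

```python
import re
import sqlite3
from typing import Dict

# Hypothetical mapping from schema-definition types to SQLite column types.
ALLOWED_TYPES = {"text": "TEXT", "integer": "INTEGER", "money": "NUMERIC", "date": "TEXT"}
# Conservative identifier rule: letters, digits, underscore; must not start with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def create_table_for_type(conn: sqlite3.Connection, type_name: str, fields: Dict[str, str]) -> str:
    """Create one table for a document type and return the DDL that was executed."""
    for name in [type_name, *fields]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # An unknown type name raises KeyError here, before anything reaches the database.
    columns = ", ".join(f'"{name}" {ALLOWED_TYPES[kind]}' for name, kind in fields.items())
    ddl = f'CREATE TABLE "{type_name}" (id INTEGER PRIMARY KEY, {columns})'
    conn.execute(ddl)
    return ddl


conn = sqlite3.connect(":memory:")
create_table_for_type(conn, "address", {"housenumber": "integer", "streetname": "text", "city": "text"})
conn.execute(
    'INSERT INTO "address" (housenumber, streetname, city) VALUES (?, ?, ?)',
    (12, "Mulberry Street", "New York"),
)
print(conn.execute('SELECT housenumber, streetname FROM "address"').fetchall())
```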

    Edit: a slightly longer answer. If you don't have the full requirements up front, I would store the data using the XML data type and query it with XPath queries. This will be OK for occasional queries over a smallish number of rows (certainly fewer than a few thousand).

    Also, your RDBMS may support indexes over XML, which may be another way of solving your problem; CREATE XML INDEX in SQL Server 2008, for example.

    However, for frequent queries, you can use triggers or materialized views to create copies of the relevant data in table form, so that more intensive reports can be sped up by querying these breakout tables.
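    SQLite has neither an XML type nor materialized views, so this sketch approximates the idea with an explicit refresh step: the hypothetical refresh_breakout function re-parses the stored XML with Python's xml.etree.ElementTree and rewrites a plain address_breakout table, which reports can then filter and index like any other table. With triggers or materialized views, as described above, the copy would instead be kept current automatically.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, xml TEXT NOT NULL)")
conn.execute("CREATE TABLE address_breakout (doc_id INTEGER PRIMARY KEY, city TEXT, zipcode TEXT)")
conn.execute("CREATE INDEX idx_breakout_city ON address_breakout (city)")


def refresh_breakout(conn: sqlite3.Connection) -> None:
    """Rebuild the breakout table from the source documents in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM address_breakout")
        for doc_id, text in conn.execute("SELECT id, xml FROM documents").fetchall():
            root = ET.fromstring(text)
            # findtext returns None for a missing element, which is stored as NULL.
            conn.execute(
                "INSERT INTO address_breakout VALUES (?, ?, ?)",
                (doc_id, root.findtext("city"), root.findtext("zipcode")),
            )


with conn:
    conn.executemany("INSERT INTO documents (xml) VALUES (?)", [
        ("<address><city>New York</city><zipcode>10013</zipcode></address>",),
        ("<address><city>New York</city></address>",),
    ])
refresh_breakout(conn)

# Reports now hit an ordinary, indexable table instead of parsing XML per query.
missing = conn.execute(
    "SELECT doc_id FROM address_breakout WHERE city = 'New York' AND zipcode IS NULL"
).fetchall()
print(missing)
```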

    I don't know your requirements, but if you are responsible for creating the reports/queries yourself, this may be an approach to use. If you need to enable users to create their own reports that's a bigger mountain to climb.

    I guess what I am saying is: are you sure you need to do this, and that XML can't just do the job?
