Database with “Open Schema” - Good or Bad Idea?

深忆病人 2020-12-24 08:00

The co-founder of Reddit gave a presentation on issues they had while scaling to millions of users. A summary is available here.

What surprised me is point 3:

4 answers
  • 2020-12-24 08:15

    This is a data model known as EAV for entity-attribute-value. It has its uses. A prime example is patient test data which is naturally sparse since there are hundreds of thousands of tests which might be run, but typically only a handful are present for a patient. A table with hundreds of thousands of columns is silly, but a table with EAV makes good sense.

  • 2020-12-24 08:18

    We worked on a similar problem not long ago. At first it was neither easy nor fun, but once you get used to it, it has its own benefits. It is like developing another database within your tables. In some areas it is overkill, but once you get past that stage it gives you a lot of functionality: after a certain point we didn't create any new tables, we just created dynamic forms for everything, even for our own programming tasks. As for performance, the system never reached millions of rows, so this isn't a fair comparison, but in daily use I never noticed any difference. Some problems I want to share:

    1. We didn't delete any rows; we just hid them by setting a flag, and a nightly (or weekly) service cleaned up the physical rows.
    2. Orphan rows: we basically didn't care about cleaning up children. We just set "IsDeleted" on the parent, and the nightly service would clean up every row that was orphaned or no longer needed.
    3. You should keep your indexes up to date, but skip rebuilding them whenever possible (again, a nightly service kept the indexes up to date).
    4. We prepared our reporting data ahead of time (AOT), which means our reports lagged behind the actual data :))

       We tried hard not to jump into an ocean of rows to calculate values on user demand. If we had prepared it, you could use it; if not, you couldn't.
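
    The flag-and-clean-up approach from points 1 and 2 can be sketched roughly as follows. This is an illustration only, not our actual code: the `parent` and `child` tables, the `is_deleted` column, and the function names are all hypothetical, and SQLite stands in for whatever database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY,
                         is_deleted INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE child  (id INTEGER PRIMARY KEY,
                         parent_id INTEGER NOT NULL);
    INSERT INTO parent (id) VALUES (1), (2);
    INSERT INTO child (id, parent_id) VALUES (10, 1), (11, 1), (20, 2);
    """
)


def soft_delete(parent_id):
    """Hide a parent row by setting its flag; children are left alone."""
    with conn:
        conn.execute(
            "UPDATE parent SET is_deleted = 1 WHERE id = ?", (parent_id,)
        )


def visible_parents():
    """What the application shows: only rows that are not flagged."""
    rows = conn.execute(
        "SELECT id FROM parent WHERE is_deleted = 0 ORDER BY id"
    )
    return [r[0] for r in rows]


def nightly_cleanup():
    """Physically remove flagged parents, then children with no parent left."""
    with conn:
        conn.execute("DELETE FROM parent WHERE is_deleted = 1")
        conn.execute(
            "DELETE FROM child WHERE parent_id NOT IN (SELECT id FROM parent)"
        )
```

    Between `soft_delete(1)` and the next `nightly_cleanup()` run, the children of parent 1 are still physically present, so every query that reads children has to account for hidden parents.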

    In the end, this approach comes with many unique challenges that you have to find a way to solve, problems you never faced in your routine job, but in return you gain a lot more flexibility.

  • 2020-12-24 08:22

    I noted that they did not mention anything about the ease or difficulty of creating reports against that data. When used in a narrow set of circumstances, EAVs can be beneficial. As a central part of most systems, they will become a nightmare when you hit reporting. The problem with EAVs is that most of the benefit comes at the outset of the project and most of the pain comes later, in analysis and reporting, especially due to the severe lack of data integrity. "Not having to worry about foreign keys" sounds to me like a nightmare of orphaned rows. Add in the use of surrogate keys for everything and you have a tangled morass which generally ends in a complete rewrite.
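
    To make the reporting cost concrete, here is a rough sketch of what a simple "one row per entity" report takes against an EAV table. The `eav` table, its columns, and the sample rows are hypothetical, and conditional aggregation is just one common way to pivot; it is not taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT);
    INSERT INTO eav VALUES
        (1, 'name', 'Alice'), (1, 'city', 'Oslo'),
        (2, 'name', 'Bob');
    """
)


def report(attributes):
    """Pivot EAV rows into one row per entity, one column per attribute."""
    # One conditional aggregate per requested attribute; an entity that
    # lacks an attribute gets None in that column.
    cols = ", ".join(
        "MAX(CASE WHEN attribute = ? THEN value END)" for _ in attributes
    )
    sql = (
        f"SELECT entity_id, {cols} FROM eav "
        "GROUP BY entity_id ORDER BY entity_id"
    )
    return conn.execute(sql, list(attributes)).fetchall()
```

    Every report has to rebuild its columns this way, and because all values share one `value` column, the database cannot enforce a type or a required attribute for any of them.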

  • 2020-12-24 08:23

    Most of the really big web sites end up using some sort of incredibly simple design on the database side of things. This has the advantage that it's fast and scalable. It has the disadvantage that all the relationships you'd otherwise get the database to enforce automatically (via triggers and such) have to be enforced in your client code instead. Maintaining consistency is a pain in the neck, and there's almost always at least some chance that your data will be inconsistent, at least for short periods of time.
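
    As a small illustration of what "enforce it yourself" looks like, here is a sketch in Python with sqlite3. The `item` and `vote` tables and the `add_vote` helper are hypothetical; the point is only that the schema declares no foreign key, so the existence check lives in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY constraint: the schema itself accepts any item_id.
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE vote (item_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
    INSERT INTO item (id) VALUES (1);
    """
)


def add_vote(item_id, user_id):
    """Insert a vote only if the item exists; the check is client code."""
    with conn:
        exists = conn.execute(
            "SELECT 1 FROM item WHERE id = ?", (item_id,)
        ).fetchone()
        if exists is None:
            raise ValueError(f"no such item: {item_id}")
        conn.execute("INSERT INTO vote VALUES (?, ?)", (item_id, user_id))
```

    Any code path that writes to `vote` without going through `add_vote` can still create orphaned rows, and nothing stops another client from deleting the item between the check and the insert.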

    For a social networking site, it's a worthwhile compromise. Data that's mostly right most of the time is adequate (e.g., who really cares if the number of up-votes you receive for an item is really 20 milliseconds out of date when it's sent), and keeping costs reasonable while scaling to support a gazillion users matters a lot.
