DB design and optimization considerations for a social application

花落未央 2021-02-06 14:43

The usual case. I have a simple app that will allow people to upload photos and follow other people. As a result, every user will have something like a "wall" or an "activity feed" …

5 Answers
  • 2021-02-06 14:56

    There are many options you can take:

    • Add more hardware (memory, CPU) -- enter cloud hosting.
    • How does 24GB of memory sound? Most of your frequently accessed DB data can then sit entirely in memory.
    • Choose a host with expandable SSDs.
    • Use an event-based system in your application to record the "history" of all users. A row would look like: id, user_id, event_name, date, event_parameters -- for example: 1, 8, CHANGED_PROFILE_PICTURE, 26-03-2011 12:34, <id of picture>. Most important of all, keep this table in memory, so you no longer need to worry about write performance. Once records are older than, say, 3 days, they can be purged into another (on-disk) table and included in query results only if the user chooses to go back that far. Keeping all of this in one table means you avoid doing multiple queries and SELECTs to build up the feed (see the sketch just after this list).
    • Consider using InnoDB for the history/feeds table.
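
    A minimal sketch of that in-memory events table and the feed query, using Python with pymysql; the table name (user_events), column names, and connection details are assumptions for illustration, not a prescribed schema:

    ```python
    import pymysql

    # Illustrative connection details.
    conn = pymysql.connect(host="localhost", user="app", password="secret", database="social")

    with conn.cursor() as cur:
        # Hot, recent events live in RAM (MEMORY engine); older rows get purged to an on-disk table.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS user_events (
                id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
                user_id      BIGINT UNSIGNED NOT NULL,
                event_name   VARCHAR(64)     NOT NULL,
                created_at   DATETIME        NOT NULL,
                event_params VARCHAR(255),
                PRIMARY KEY (id),
                KEY idx_user_date (user_id, created_at)
            ) ENGINE=MEMORY
        """)

    def record_event(user_id, event_name, params):
        # One insert per user action, e.g. record_event(8, "CHANGED_PROFILE_PICTURE", "picture:123").
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO user_events (user_id, event_name, created_at, event_params) "
                "VALUES (%s, %s, NOW(), %s)",
                (user_id, event_name, params),
            )
        conn.commit()

    def recent_feed(followed_user_ids, limit=50):
        # One SELECT over one table builds the whole feed; followed_user_ids must be non-empty.
        with conn.cursor() as cur:
            cur.execute(
                "SELECT user_id, event_name, created_at, event_params "
                "FROM user_events WHERE user_id IN %s "
                "ORDER BY created_at DESC LIMIT %s",
                (tuple(followed_user_ids), limit),
            )
            return cur.fetchall()
    ```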

    Good Resources to read

    • Exploring the software behind Facebook, the world’s largest site
    • Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
    • Caching & Performance: Lessons from Facebook
  • 2021-02-06 15:14

    This kind of problem is why NoSQL solutions are used these days. What I did in my previous projects is really simple: I keep user->wall and user->history lists, which contain purely feed ids, in memory stores (my favorite is Redis). So on every insert I do one insert operation on the database and n insert operations in the memory store (the fan-out is what buys the read optimization). I design the memory store to optimize my reads: if I want to be able to filter a user's history (or wall) for videos, I push the feed id onto a list like user::{userid}::wall::videos.

    Of course you could build the whole system purely on memory stores as well, but it's nice to have two systems each doing what they do best.
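
    As a rough sketch of that write path, assuming redis-py; the key names follow the user::{userid}::wall pattern above, while the follower list and the trimming limit are illustrative assumptions:

    ```python
    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    def publish_feed_item(feed_id, kind, follower_ids):
        # 1. The authoritative row goes into the relational database (not shown).
        # 2. Fan the feed id out to every follower's in-memory lists.
        pipe = r.pipeline()
        for uid in follower_ids:
            pipe.lpush(f"user::{uid}::wall", feed_id)
            if kind == "video":
                pipe.lpush(f"user::{uid}::wall::videos", feed_id)
            # Keep the hot lists bounded so memory use stays predictable.
            pipe.ltrim(f"user::{uid}::wall", 0, 999)
        pipe.execute()

    def wall(user_id, only_videos=False, count=50):
        key = f"user::{user_id}::wall::videos" if only_videos else f"user::{user_id}::wall"
        return r.lrange(key, 0, count - 1)   # feed ids, newest first
    ```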

    Edit: check out these applications to get an idea:

    http://retwis.antirez.com/

    http://twissandra.com/

  • 2021-02-06 15:18

    If your application is successful, then it's a good bet that you'll have more reads than writes - I only upload a photo once (write), but each of my friends reads it whenever they refresh their feed. Therefore you should optimize for fast reads, not fast writes, which points in the direction of a denormalized schema.

    The problem here is that the amount of data you create could quickly get out of hand if you have a large number of users. Very large tables are hard for the DB to query, so again there's a potential performance issue. (There's also the question of having enough storage, but that's much more easily solved.)

    If, as you suggest, you can delete rows after a certain amount of time, then this could be a good solution. You can reduce that amount of time (up to a point) as you grow and run into performance issues.

    Regarding storing serialized objects, it's a good option if these objects are immutable (you won't change them after writing) and you don't need to index them or query on them. Note that if you denormalize your data, it probably means that you have a single table for the activity feed. In that case I see little gain in storing blobs. If you're going the serialized objects way, consider using some NoSQL solution, such as CouchDB - they're better optimized for handling that kind of data, so in principle you should get better performance for the same hardware setup. Note that I'm not suggesting that you move all your data to NoSQL - only for that part where it's a better solution.
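
    For instance, here is a hedged sketch of storing an immutable, serialized activity document in CouchDB over its plain HTTP API (PUT /{db}/{doc_id}); the database name and document shape are assumptions:

    ```python
    import uuid
    import requests

    # Illustrative CouchDB endpoint and database name.
    COUCH = "http://localhost:5984/activities"

    def store_activity(user_id, event_name, payload):
        # Written once, never updated afterwards (the "immutable" case described above).
        doc_id = uuid.uuid4().hex
        doc = {"user_id": user_id, "event": event_name, "payload": payload}
        resp = requests.put(f"{COUCH}/{doc_id}", json=doc)
        resp.raise_for_status()
        return doc_id
    ```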

    Finally, a word of caution, spoken from experience: building an application that can scale is hard and takes time better spent elsewhere. You should spend your time worrying about how to get millions of users to your app before you worry about how you're going to serve those millions - the first is the more difficult problem. Once you're hugely successful, you can re-architect and rebuild your application.

  • 2021-02-06 15:19

    I would probably start with a normalized schema so that you can write quickly and compactly. Then use non-transactional (non-locking) reads to pull the information back out, making sure to use a cursor so that you can process results as they come back rather than waiting for the entire result set. Since it doesn't sound like the information has any particularly critical implications, you don't really need to worry about a lot of the concerns that would normally push you toward transactional reads.
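
    A minimal sketch of that read path, assuming pymysql: an unbuffered server-side cursor (SSCursor) streams rows as they arrive, and READ UNCOMMITTED keeps the read from taking locks. The table and column names are assumptions:

    ```python
    import pymysql
    from pymysql.cursors import SSCursor

    # Illustrative connection details and table/column names.
    conn = pymysql.connect(host="localhost", user="app", password="secret", database="social")

    with conn.cursor() as cur:
        # Non-locking reads: dirty reads are acceptable for a feed that isn't critical.
        cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")

    with conn.cursor(SSCursor) as cur:
        # SSCursor is unbuffered, so rows stream back and can be processed one at a time.
        cur.execute(
            "SELECT user_id, event_name, created_at FROM user_events "
            "ORDER BY created_at DESC"
        )
        for row in cur:
            print(row)   # process each row without waiting for the full result set
    ```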

  • 2021-02-06 15:19

    I'm reading more and more about NoSQL solutions and people suggesting them; however, no one ever mentions the drawbacks of such a choice. The most obvious one for me is the lack of transactions - imagine losing a few records every now and then (there are reports that this happens often).

    But what surprises me is that no one mentions MySQL being used as NoSQL - here's a link for some reading.

    In the end, no matter which solution you choose (relational database or NoSQL storage), they scale in a similar manner - by sharding data across the network (naturally, there are more choices, but this is the most obvious one). Since NoSQL does less work (there's no SQL layer, so CPU cycles aren't wasted interpreting SQL), it's faster, but it can hit the ceiling too.
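
    As a toy illustration of the sharding idea (host names, shard count, and the modulo routing are assumptions, not a recommendation of a specific scheme):

    ```python
    import pymysql

    # Hypothetical shard hosts; a real deployment also needs to handle re-sharding and lookups.
    SHARD_HOSTS = [
        "db-shard-0.internal",
        "db-shard-1.internal",
        "db-shard-2.internal",
        "db-shard-3.internal",
    ]

    def shard_for(user_id):
        # Route all of a user's rows to one shard so their feed can be read from a single host.
        return SHARD_HOSTS[user_id % len(SHARD_HOSTS)]

    def connect_for_user(user_id):
        return pymysql.connect(host=shard_for(user_id), user="app",
                               password="secret", database="social")
    ```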

    As Elad already pointed out, building an app that's scalable from the get-go is a painful process. It's better to spend your time focusing on making it popular and then scale it out.
