MongoDB: Sharding on single machine. Does it make sense?

后端 未结 5 1275
说谎
说谎 2020-12-30 06:56

created a collection in MongoDB consisting of 11446615 documents.

Each document has the following form:

{ 
 \"_id\" : ObjectId(\"4e03dec7c3c365f57482         


        
相关标签:
5条回答
  • 2020-12-30 07:24

    Yes, it does make sense to shard on a single server.

    1. At this time, MongoDB still uses a global lock per mongodb server. Creating multiple servers will release a server from one another's locks.

    2. If you run a multiple core machine with seperate NUMAs, this can also increase performance.

    3. If your load increases too much for your server, initial sharding makes for easier horizontal scaling in the future. You might as well do it now.

    Machines vary. I suggest writing your own bulk insertion benchmark program and spin up a various number of MongoDB server shards. I have a 16 core RAIDed machine and I've found that 3-4 shards seem to be ideal for my heavy write database. I'm finding that my two NUMAs are my bottleneck.

    0 讨论(0)
  • 2020-12-30 07:28

    This doesn't have to be a mongo question, it's a general operating system question. There are three possible bottlenecks for your database use.

    1. network (i.e. you're on a gigabit line, you're using most of it at peak times, but your database isn't really loaded down)
    2. CPU (your CPU is near 100% but disk and network are barely ticking over)
    3. disk

    In the case of network, rewrite your network protocol if possible, otherwise shard to other machines. In the case of CPU, if you're 100% on a few cores but others are free, sharding on the same machine will improve performance. If disk is fully utilized add more disks and shard across them -- way cheaper than adding more machines.

    0 讨论(0)
  • 2020-12-30 07:35

    In modern day(2015) with mongodb v3.0.x there is collection-level locking with mmap, which increases write throughput slightly(assuming your writing to multiple collections), but if you use the wiredtiger engine there is document level locking, which has a much higher write throughput. This removes the need for sharding across a single machine. Though you can technically still increase the performance of mapReduce by sharding across a single machine, but in this case you'd be better off just using the aggregation framework which can exploit multiple cores. If you heavily rely on map reduce algorithms it might make most sense to just use something like Hadoop.

    The only reason for sharding mongodb is to horizontally scale. So in the event that a single machine cannot house enough disk space, memory, or CPU power(rare), then sharding becomes beneficial. I think its really really seldom that someone has enough data that they need to shard, even a large business, especially since wiredtiger added compression support that can reduce disk usage to over 80% less. Its also infrequent that someone uses mongodb to perform really CPU heavy queries at a large scale, because there are much better technologies for this. In most cases IO is the most important factor in performance, not many queries are CPU intensive, unless you're running a lot of complex aggregations, even geo-spatial is indexed upon insertion.

    Most likely reason you'd need to shard is if you have a lot of indexes that consume a large amount of RAM, wiredtiger reduces this, but its still the most common reason to shard. Where as sharding across a single machine is likely just going to cause undesired overhead, with very little or possible no benefits.

    0 讨论(0)
  • 2020-12-30 07:38

    No, it does not make sense to shard a on a single server.

    There are a few exceptional cases but they mostly come down to concurrency issues related to things like running map/reduce or javascript.

    0 讨论(0)
  • 2020-12-30 07:38

    This is answered in the first paragraph of the Replica set tutorial

    http://www.mongodb.org/display/DOCS/Replica+Set+Tutorial

    0 讨论(0)
提交回复
热议问题