This is very system dependent, but chances are near certain we'll scale past some arbitrary cliff and get into Real Trouble. I'm curious what kind of rules-of-thumb exist for a good RAM to Disk-space ratio. We're planning our next round of systems, and need to make some choices regarding RAM, SSDs, and how much of each the new nodes will get.
But now for some performance details!
During the normal workflow of a single project-run, MongoDB sees a very high percentage of writes (70-80%). Once the second stage of the processing pipeline hits, it's extremely read-heavy, as it needs to deduplicate records identified in the first half of processing. This is the workflow that "keep your working set in RAM" is made for, and we're designing around that assumption.
The entire dataset is continually hit with random queries from end-user derived sources; though the frequency is irregular, the size is usually pretty small (groups of 10 documents). Since this is user-facing, the replies need to be under the "bored-now" threshold of 3 seconds. This access pattern is much less likely to be in cache, so will be very likely to incur disk hits.
A secondary processing workflow is high read of previous processing runs that may be days, weeks, or even months old, and is run infrequently but still needs to be zippy. Up to 100% of the documents in the previous processing run will be accessed. No amount of cache-warming can help with this, I suspect.
Finished document sizes vary widely, but the median size is about 8K.
The high-read portion of the normal project processing strongly suggests the use of Replicas to help distribute the read traffic. I have read elsewhere that 1:10 RAM-GB to HD-GB is a good rule of thumb for slow disks. As we are seriously considering much faster SSDs, I'd like to know if there is a similar rule of thumb for fast disks.
I know we're using Mongo in a way where cache-everything really isn't going to fly, which is why I'm looking at ways to engineer a system that can survive such usage. The entire dataset will likely be most of a TB within half a year and keep growing.
Answer
This is going to be a bunch of small points. There is sadly no single answer to your question, however.
MongoDB allows the OS kernel to handle memory management. Aside from throwing as much RAM as possible at the problem, there are only a few things you can do to 'actively manage' your Working Set.
One thing you can do to optimize writes is to first query for the record (do a read), so that it's in working memory. This avoids the performance problems associated with the process-wide Global Lock (which is supposed to become per-db in v2.2).
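A minimal sketch of that touch-before-write pattern, assuming a pymongo-style collection object. The helper name and its duck-typed `collection` argument are my own; with a real pymongo `Collection` the calls would be the same:

```python
def touch_then_update(collection, key, changes):
    """Read the document first so the page it lives on is pulled into
    the working set, then apply the update. The read happens before
    the write path, so the write is less likely to stall on a page
    fault while holding the lock."""
    collection.find_one({"_id": key})  # warm the page; result unused
    collection.update_one({"_id": key}, {"$set": changes})
```

Usage with a live database would just be `touch_then_update(db.records, some_id, {"status": "done"})`.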
There is no hard-and-fast rule for RAM vs SSD ratio, but I think that the raw IOPS of SSDs should allow you to go with a much lower ratio. Off the top of my head, 1:3 is probably the lowest you want to go with. But given the higher costs and lower capacities, you are likely going to need to keep that ratio down anyway.
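To make that concrete, here is the back-of-the-envelope arithmetic for the dataset size mentioned in the question. The 1 TB figure comes from the question; the 1:10 and 1:3 ratios are the rules of thumb discussed here, not measured numbers:

```python
def ram_needed_gb(disk_gb, ram_to_disk_ratio):
    """RAM implied by a RAM:disk rule of thumb, where a 1:N ratio
    is passed as the fraction 1/N."""
    return disk_gb * ram_to_disk_ratio

# 1 TB of data at the 1:10 rule of thumb for spinning disk:
spinning = ram_needed_gb(1024, 1 / 10)  # about 102 GB of RAM
# The same data at the 1:3 guess floated for SSD-backed nodes:
ssd = ram_needed_gb(1024, 1 / 3)        # about 341 GB of RAM
```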
Regarding 'Write vs Reading phases', am I reading correctly that once a record is written, it is seldom updated ("upserted")? If that is the case, it may be worthwhile to host two clusters: the normal write cluster, and a read-optimized cluster for "aged" data that hasn't been modified in [X time period]. I would definitely enable slave-read on that cluster. (Personally, I'd manage this by including a date-modified value in your db's object documents.)
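A sketch of how that age cutoff might be expressed as a query, assuming each document carries the suggested date-modified value (the `dateModified` field name and the 30-day window are illustrative, not anything from the original setup):

```python
from datetime import datetime, timedelta

def aged_filter(days, now=None):
    """Build a MongoDB query document matching records whose
    dateModified is older than `days` days -- candidates for
    migration to the read-optimized "aged" cluster."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return {"dateModified": {"$lt": cutoff}}

# e.g. everything untouched for a month:
query = aged_filter(30)
```

A periodic job could run this filter against the write cluster and copy the matches over.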
If you have the ability to load-test before going into Prod, perf monitor the hell out of it. MongoDB was written with the assumption that it would often be deployed in VMs (their reference systems are in EC2), so don't be afraid to shard out to VMs.