Briefly: if I have 5 TB of data and want to deploy it on 5 Cassandra servers, does each machine need 5 TB of disk space for data (not counting log space)? From the docs it sounds like Cassandra will at times need 2x the data size, so is that 10 TB per server or 10 TB total across the cluster?
How much RAM should each machine have? Assume that the 5 TB is all in the same column space. I had been planning to max out the RAM on each machine, but I'm not sure that's enough. Do I need an array of servers with a total of 5 TB of RAM?
Answer
If you spread your 5 TB of data evenly across your 5 servers, each server will host 1 TB of data. Because of compaction, each server will need 2 TB of disk space (in the worst case, a compaction needs twice as much disk space as you have data), which means 10 TB total in your cluster.
The case above assumes you store only a single replica of your data in the cluster. In that case, if a server fails, one fifth of your data will be unreachable. If you want to store 2 replicas of your data in the cluster, each node will hold 2 TB of data and need 4 TB of disk space, which means 20 TB total in your cluster.
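The sizing arithmetic above can be written as a small calculator. This is a minimal sketch, assuming data is spread evenly across nodes, every row is stored `replication_factor` times, and the worst-case 2x compaction headroom from the answer applies (the real headroom depends on the compaction strategy in use). The function names `disk_per_node_tb` and `disk_total_tb` are hypothetical helpers, not part of any Cassandra tooling.

```python
def disk_per_node_tb(data_tb, nodes, replication_factor=1, compaction_headroom=2.0):
    """Estimate the disk space (TB) each node needs for data, excluding logs.

    Assumptions: data is evenly distributed, each row is stored
    replication_factor times, and worst-case compaction needs
    compaction_headroom times the live data on disk.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    if replication_factor > nodes:
        raise ValueError("replication_factor cannot exceed the number of nodes")
    # Live data each node holds once replicas are counted.
    stored_per_node = data_tb * replication_factor / nodes
    # Reserve extra space so a worst-case compaction can complete.
    return stored_per_node * compaction_headroom


def disk_total_tb(data_tb, nodes, replication_factor=1, compaction_headroom=2.0):
    """Total disk space (TB) across the whole cluster."""
    return disk_per_node_tb(data_tb, nodes, replication_factor, compaction_headroom) * nodes


# Reproduce the two cases from the answer: 5 TB of data on 5 nodes.
for rf in (1, 2):
    per_node = disk_per_node_tb(5, 5, rf)
    total = disk_total_tb(5, 5, rf)
    print(f"RF={rf}: {per_node:g} TB per node, {total:g} TB total")
```

With one replica this prints 2 TB per node and 10 TB total; with two replicas it prints 4 TB per node and 20 TB total, matching the figures above.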