Monday, May 25, 2015

Cassandra - hardware planning



Briefly: if I have 5 TB of data and want to deploy it on 5 Cassandra servers, does each machine need 5 TB of disk space for data (not counting log space)? From the docs it sounds like Cassandra will at times need 2x the data size, so is that 10 TB per server or 10 TB total across the array?



How much RAM should each machine have? Assume that the 5 TB is all in the same column space. I had been planning to max out the RAM on each machine, but I'm not sure that's enough. Do I need an array of servers with a total of 5 TB of RAM?



Answer



If you spread your 5 TB of data evenly across your 5 servers, each server will host 1 TB of data. Because of compaction, each server will need 2 TB of disk space (in the worst case, a compaction needs twice as much disk space as the data it compacts), which means 10 TB total in your cluster.
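The arithmetic above can be written as a small helper. This is a minimal sketch, assuming the data is spread perfectly evenly, only one replica is stored, and the worst-case compaction headroom is the 2x factor stated in the answer. The function name `disk_needed_per_node` is hypothetical, not part of any Cassandra tool.

```python
def disk_needed_per_node(total_data_tb: float, nodes: int,
                         compaction_headroom: float = 2.0) -> float:
    """Hypothetical helper: disk (TB) each node needs for a single replica.

    Assumes even data distribution and a worst-case compaction factor of 2x.
    """
    data_per_node = total_data_tb / nodes          # 5 TB / 5 nodes = 1 TB each
    return data_per_node * compaction_headroom     # compaction may temporarily double it


per_node = disk_needed_per_node(5, 5)
print(f"per node: {per_node} TB, cluster total: {per_node * 5} TB")
```

With the numbers from the question this gives 2 TB per node and 10 TB for the whole cluster.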



The case above assumes you store only a single replica of your data in the cluster. In that case, if a server fails, one fifth of your data becomes unreachable. If you want to store 2 replicas of your data in the cluster, each node will host 2 TB of data and need 4 TB of disk space, which means 20 TB total in your cluster.
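Replication multiplies the stored data before it is divided among the nodes. The sketch below is illustrative only: it assumes even distribution, that each replica lives on a distinct node (so the replication factor cannot exceed the node count in this model), and the same 2x worst-case compaction headroom. The name `cluster_disk_tb` is hypothetical.

```python
def cluster_disk_tb(total_data_tb: float, nodes: int, replication_factor: int = 1,
                    compaction_headroom: float = 2.0) -> tuple[float, float]:
    """Hypothetical helper: return (disk per node, disk for the whole cluster) in TB."""
    # This sketch assumes every replica sits on a different node.
    if not 1 <= replication_factor <= nodes:
        raise ValueError("replication_factor must be between 1 and the number of nodes")
    stored_tb = total_data_tb * replication_factor   # every replica takes real disk space
    per_node_data = stored_tb / nodes                 # even spread across the ring
    per_node_disk = per_node_data * compaction_headroom
    return per_node_disk, per_node_disk * nodes


for rf in (1, 2, 3):
    per_node, total = cluster_disk_tb(5, 5, rf)
    print(f"RF={rf}: {per_node} TB per node, {total} TB in the cluster")
```

For 5 TB on 5 nodes this reproduces the figures above: 2 TB per node (10 TB total) with one replica, and 4 TB per node (20 TB total) with two.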

