We're a small consulting shop that hosts public-facing websites and web applications for clients (apps we've either written or inherited). Our strength lies in coding, not necessarily in server management. However, a managed hosting solution is outside our budget: the monthly costs would exceed any income we derive from hosting these applications.
Yesterday one of our servers running RAID 5 suffered a double hard drive failure, which is rare. Luckily we had backups and simply migrated the affected databases and web applications to other servers. We also got lucky in that only one drive failed completely; the other was only marked as pending failure, so we were able to live-migrate almost everything [one db had to be restored from backup]. Each client saw only about 5 minutes of downtime while we took their database offline and moved it.
However, we're now worried that we've grown a bit... organically, and we're trying to work out the best plan going forward.
Our current infrastructure (all bare metal):
- pfSense router [old repurposed hardware]
- 1U DC [no warranty, old hardware]
- 2U web & app server (Server 2k8 R2, IIS, MSSQL, 24 GB RAM, dual 4C8T Xeon); this is the one that had the drive failures [warranty good for another year, drives being replaced under the warranty]
- 4U inherited POS server (128 GB RAM, but only a 32-bit OS, Server 2k3) [no warranty]
- (2) 1U web servers (2k8, IIS, 4C8T Xeon, 4 GB RAM) in a load-balanced cluster (via pfSense) [newish with warranty]
- 1U database server (2k8, MSSQL, 4C8T Xeon, 4 GB RAM) [new with warranty]
- NAS running unRaid with 3 TB of storage (used for backups and for serving web app files to the 2 load-balanced web servers)
Our traffic is fairly light; however, downtime is pretty much unacceptable. The CPU monitors show very, very little CPU usage throughout the day.
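To put "downtime is pretty much unacceptable" into numbers, it can help to translate an availability target into an annual downtime budget. This is a minimal sketch; the targets it loops over are illustrative assumptions, since we haven't committed to a formal SLA.

```python
# Convert an availability target into an allowed-downtime budget per year.
# The targets below are illustrative only; no SLA is stated for our clients.

HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted by an availability fraction (e.g. 0.999)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} -> {downtime_minutes_per_year(target):.1f} min/year")
```

For scale, 99.99% works out to roughly 52.6 minutes a year, so the ~5 minutes per client from this incident used about a tenth of that budget.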
We've been playing with ESXi as a development server host and it has worked reasonably well, well enough that I'd suggest running something like it in our production environment.
My initial inclination is to build a SAN (following a guide such as this: http://www.smallnetbuilder.com/nas/nas-howto/31485-build-your-own-fibre-channel-san-for-less-than-1000-part-1) to host the VMs. I'd configure it as RAID 1+0 so that a repeat of yesterday's double drive failure is much less likely to take the array down.
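RAID 1+0 reduces, but does not eliminate, the double-failure risk: it survives two dead drives only when they sit in different mirror pairs. The sketch below enumerates every two-drive failure for a small array and counts how many each layout survives. It assumes a simple layout where disks (0,1), (2,3), ... are mirror pairs, treats failures as independent, and ignores rebuild-time effects such as read errors during a rebuild.

```python
from itertools import combinations
from typing import Callable


def raid5_survives(failed: set[int], n_disks: int) -> bool:
    # RAID 5 keeps one disk's worth of parity, so it tolerates at most one failed disk.
    return len(failed) <= 1


def raid10_survives(failed: set[int], n_disks: int) -> bool:
    # Assumed layout: disks (0,1), (2,3), ... are mirror pairs striped together.
    # The array survives as long as every mirror pair still has at least one disk.
    for first in range(0, n_disks, 2):
        if {first, first + 1} <= failed:
            return False
    return True


def surviving_fraction(survives: Callable[[set[int], int], bool],
                       n_disks: int, n_failed: int) -> float:
    """Fraction of all n_failed-disk failure combinations the layout survives."""
    cases = [set(c) for c in combinations(range(n_disks), n_failed)]
    ok = sum(1 for failed in cases if survives(failed, n_disks))
    return ok / len(cases)


if __name__ == "__main__":
    for disks in (4, 6):  # RAID 1+0 needs an even number of disks in this model
        r5 = surviving_fraction(raid5_survives, disks, 2)
        r10 = surviving_fraction(raid10_survives, disks, 2)
        print(f"{disks} disks, 2 failures: RAID 5 {r5:.0%}, RAID 1+0 {r10:.0%}")
```

With four disks, RAID 1+0 survives 4 of the 6 possible two-drive failures, while RAID 5 survives none; the remaining cases are why backups still matter.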
We'd run the majority of the VMs on the server that currently has the failed drives, since it is the most powerful, and run the other VMs on the 1U servers that are currently load-balanced. We'd P2V the old out-of-warranty hardware (except pfSense, which I prefer to keep on physical hardware) and continue running unRaid for backups.
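Before consolidating, it's worth checking whether the remaining hosts could take over if any one host fails (an "N+1" check). This is a rough sketch: the host RAM figures come from the inventory above, but every VM size is a hypothetical placeholder to be replaced with measured values, and it compares aggregate RAM only. That is a necessary but not sufficient condition, since each VM must also fit on a single surviving host, and hypervisor overhead is ignored.

```python
# Rough N+1 capacity check: if any single host fails, is there enough RAM left on
# the surviving hosts to restart every VM? Aggregate totals only; ignores hypervisor
# overhead and whether each individual VM fits on one surviving host.

HOSTS_GB = {"2U": 24, "1U-a": 4, "1U-b": 4}  # physical RAM from our inventory

# Hypothetical VM allocations in GB; replace with measured figures.
VMS_GB = {"dc": 2, "pos": 4, "web-1": 2, "web-2": 2, "app": 4, "sql": 8}


def survives_single_host_loss(hosts: dict[str, int],
                              vms: dict[str, int]) -> dict[str, bool]:
    """For each host, report whether the others have enough total RAM if it fails."""
    needed = sum(vms.values())
    total = sum(hosts.values())
    return {name: total - ram >= needed for name, ram in hosts.items()}


if __name__ == "__main__":
    for host, ok in survives_single_host_loss(HOSTS_GB, VMS_GB).items():
        print(f"lose {host}: {'OK' if ok else 'NOT enough RAM on remaining hosts'}")
```

With these placeholder sizes, losing either 1U box is survivable, but losing the 2U host is not: the two 4 GB machines cannot absorb its load.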
I've got a bunch of questions, but the infrastructure-related ones are:
- Is this a reasonable way to mitigate physical hardware failures? I think the SAN becomes a massive SPOF, and the large server hosting the VMs is another. I've read that the paid editions of VMware support automatic failover of VMs, which might be worth looking into to reduce the risk of a VM host failure.
- Is there another option I'm missing? I've considered essentially "thin provisioning" the applications: using our cheaper 1U server model and running the db and the app on one box (no VM). A hardware failure would then affect a smaller segment of our client base. The downside is higher hardware, rack space, and system management costs (a "sunk" cost in employee time).
- Would Xen be more cost-effective if we do need VM failover?
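The SPOF concern can be made concrete with basic availability arithmetic: components that must all be up multiply their availabilities, while independent redundant copies fail only if every copy fails. The sketch below uses a hypothetical 99% availability per box; it assumes failures are independent and ignores shared dependencies such as power, the pfSense router, or the network.

```python
from math import prod


def series(*availabilities: float) -> float:
    # Every component must be up (e.g. a VM host AND the shared SAN): multiply.
    return prod(availabilities)


def redundant(a: float, copies: int) -> float:
    # Up if at least one of `copies` independent replicas is up.
    return 1 - (1 - a) ** copies


if __name__ == "__main__":
    box = 0.99  # hypothetical per-box availability, for illustration only
    scenarios = {
        "one host + one SAN": series(box, box),
        "two failover hosts + one SAN": series(redundant(box, 2), box),
        "two failover hosts + two SANs": series(redundant(box, 2), redundant(box, 2)),
    }
    for name, a in scenarios.items():
        print(f"{name}: {a:.4%}")
```

Under these assumptions, adding VM failover hosts barely helps while a single SAN remains: overall availability stays capped below the SAN's own 99%. The one-box-per-client approach doesn't raise any individual client's availability either (each still depends on one box); what it changes is how many clients a single failure affects.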