Sorry for the wall of text below.
At my workplace we have quite a few Exchange servers and supporting Domain Controllers, all currently monitored by SCOM. I've set up some additional monitoring on top of what SCOM can do, in an ASP.NET website backed by a SQL DB.
I am keen to monitor the servers more closely so that we can:
Baseline the servers better for historical comparison
Pull out more raw data when a problem occurs
To this end, I want to pull more data from the Perfmon counters. I know that SCOM can collect some Perfmon data, and it does, but our SCOM implementation is quite large, and the team that manages it won't increase the collection frequency of the counters to a level that would actually be useful.
Also, querying the SCOM DB for the data means I can't change the indexes, control the retention cut-off for the data I need, or otherwise tune things to suit.
My question here is really about how I should approach this, not how to actually pull the data and insert it; I already have scripts that can pull the data and insert it into SQL.
Loosely, our servers are split as follows:
Exchange 2007 in Domain1.com
Exchange 2010 in Domain2.com
Servers are split logically as follows for both Exchange 2007/2010:
Mailbox Servers
Client Access Servers
Hub Transport Servers
Domain Controllers
I would like to grab the basic counters from all of the servers above, and then go more in depth on specific server types: for example, RPC Averaged Latency from the Mailbox servers, connection counts from the CAS servers, Messages Submitted/sec from the HT servers, and so on.
So what I want to do is create four script types, one for each class of server, run them on each server on a schedule, and have them connect to my SQL DB and insert the records into tables (a rough sketch follows below).
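For reference, each of those scripts would look roughly like this minimal sketch. I'm assuming PowerShell with Invoke-Sqlcmd (from the SQL Server module); the SQL instance, database, table, and the Mailbox-specific counter path are all placeholders, so verify the exact paths on your own servers with Get-Counter -ListSet.

    # Per-class collector sketch: sample a few counters and insert each
    # sample as a row. Server/DB/table names here are hypothetical.
    $counters = @(
        '\Processor(_Total)\% Processor Time'   # baseline counter, all classes
        '\Memory\Available MBytes'              # baseline counter, all classes
        '\MSExchangeIS\RPC Averaged Latency'    # Mailbox-specific; confirm the path for your Exchange version
    )

    $samples = Get-Counter -Counter $counters

    foreach ($s in $samples.CounterSamples) {
        # String formatting keeps the sketch short; a real script should use
        # SqlClient parameters to avoid quoting/injection problems.
        $sql = "INSERT INTO dbo.PerfSample (SampleTime, ServerName, CounterPath, Value) VALUES ('{0:s}', '{1}', '{2}', {3})" -f $s.Timestamp, $env:COMPUTERNAME, $s.Path, $s.CookedValue
        Invoke-Sqlcmd -ServerInstance 'MON-SQL01' -Database 'PerfMonDB' -Query $sql
    }

Run from Task Scheduler every few minutes, that keeps the sampling clock local to each server.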
For my SQL tables, should I create one per server class, or just throw all of the records into a single table?
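For context, the single-table option would look something like this (all names hypothetical), with the server class stored as a column so one set of indexes and one retention job covers everything:

    # Hypothetical single-table layout, created once against the collector DB.
    $ddl = @"
    CREATE TABLE dbo.PerfSample (
        SampleTime  datetime2(0)  NOT NULL,
        ServerName  nvarchar(64)  NOT NULL,
        ServerClass tinyint       NOT NULL,  -- 1=MBX, 2=CAS, 3=HT, 4=DC
        CounterPath nvarchar(256) NOT NULL,
        Value       float         NOT NULL
    );
    CREATE CLUSTERED INDEX IX_PerfSample_Time
        ON dbo.PerfSample (SampleTime, ServerName);
    "@
    Invoke-Sqlcmd -ServerInstance 'MON-SQL01' -Database 'PerfMonDB' -Query $ddl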
Is it a terrible idea to run a local script on each server that throws its data at the SQL table? I considered collecting the data remotely, but when capturing a small amount of data every few minutes I saw wildly varying collection times, caused by network latency and other factors, which threw the timing off.
I would also be keen to hear about potential archival solutions: for example, keeping frequent records for recent days (maybe every 5 minutes for all counters for 5 days) but less frequent records going back in time.
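For example, I could imagine a nightly job that rolls 5-minute detail older than 5 days up into hourly averages and then purges the detail, something like this sketch against the hypothetical table above (it assumes a dbo.PerfSampleHourly table with matching columns):

    # Nightly retention job (sketch): roll old detail into hourly averages,
    # then delete it. Table and column names are hypothetical.
    $rollup = @"
    DECLARE @cutoff datetime2(0) = DATEADD(day, -5, SYSDATETIME());

    INSERT INTO dbo.PerfSampleHourly (SampleHour, ServerName, ServerClass, CounterPath, AvgValue)
    SELECT DATEADD(hour, DATEDIFF(hour, 0, SampleTime), 0),
           ServerName, ServerClass, CounterPath, AVG(Value)
    FROM dbo.PerfSample
    WHERE SampleTime < @cutoff
    GROUP BY DATEADD(hour, DATEDIFF(hour, 0, SampleTime), 0),
             ServerName, ServerClass, CounterPath;

    DELETE FROM dbo.PerfSample WHERE SampleTime < @cutoff;
    "@
    Invoke-Sqlcmd -ServerInstance 'MON-SQL01' -Database 'PerfMonDB' -Query $rollup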
Answer
In the end I just ran a remote collection (sketched below), but for a limited set of data, with the interval set to roughly once every 5 minutes, which seems OK so far. I still need to think about data retention, querying efficiency, and so on, but for the meantime I have answered my own question, as no one proposed an alternative.
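The remote version reduces to roughly this (server names and the counter list are placeholders):

    # Central collector: Get-Counter fans out to the listed servers, so no
    # per-server script or scheduled task is needed on the targets.
    $servers  = 'MBX01','MBX02','CAS01','HT01','DC01'
    $counters = '\Processor(_Total)\% Processor Time',
                '\Memory\Available MBytes'

    $samples = Get-Counter -ComputerName $servers -Counter $counters
    # ...then insert $samples.CounterSamples into SQL as in the local sketch,
    # with the whole thing run from one scheduled task every 5 minutes.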