We use Graphite to track disk utilisation over time. Our alerting system reads the data from Graphite and alerts us when free space falls below a fixed number of blocks.
I'd like smarter alerts. What I really care about is "how long do I have before I need to do something about the free space?" For example, if the trend shows that I'll run out of disk space within 7 days, raise a Warning; if it's less than 2 days, raise an Error.
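A minimal sketch of the "how long until full" calculation, assuming free space is available as (timestamp in seconds, free blocks) samples and that a straight-line trend is good enough: fit an ordinary least-squares line and extrapolate it to zero. The function names (`linear_fit`, `days_until_full`, `classify`) and the default thresholds are illustrative choices, not anything Graphite provides.

```python
from typing import Optional, Sequence, Tuple

SECONDS_PER_DAY = 86400


def linear_fit(times: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: values ~ slope * t + intercept."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def days_until_full(times: Sequence[float], free: Sequence[float], now: float) -> Optional[float]:
    """Days from `now` until the fitted free-space line hits zero.

    Returns None when the trend is flat or free space is growing.
    """
    slope, intercept = linear_fit(times, free)
    if slope >= 0:
        return None
    t_zero = -intercept / slope  # time at which the fitted line reaches 0 free blocks
    return max(0.0, (t_zero - now) / SECONDS_PER_DAY)


def classify(days: Optional[float], warn_days: float = 7.0, error_days: float = 2.0) -> str:
    """Map the projected days remaining to an alert level (hypothetical thresholds)."""
    if days is None or days > warn_days:
        return "OK"
    if days < error_days:
        return "ERROR"
    return "WARNING"
```

For example, daily samples of 10000, 9000, 8000, 7000 and 6000 free blocks give a projection of roughly 6 days left at the last sample, which `classify` reports as a Warning. A plain linear fit over noisy data will swing around as files come and go, which is the complication raised below.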
Graphite's standard dashboard interface can be pretty smart with derivatives and Holt-Winters confidence bands, but so far I haven't found a way to turn these into actionable metrics. I'm also fine with crunching the numbers some other way (e.g. extracting the raw numbers from Graphite and running a script on them).
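One way to get the raw numbers out is Graphite's render URL API with `format=json`, which returns a list of series, each with a `datapoints` list of `[value, timestamp]` pairs (value is `null` for empty intervals). The sketch below assumes a hypothetical host `graphite.example.com` and leaves the metric path up to you; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Hypothetical Graphite host; replace with your own.
GRAPHITE_URL = "http://graphite.example.com"


def render_url(target: str, since: str = "-7d") -> str:
    """Build a render API URL asking for JSON output over the given period."""
    query = urllib.parse.urlencode({"target": target, "from": since, "format": "json"})
    return f"{GRAPHITE_URL}/render?{query}"


def parse_render_json(body: str) -> List[Tuple[float, float]]:
    """Turn render API JSON into (timestamp, value) pairs for the first series,
    dropping the null values Graphite emits for intervals with no data."""
    series = json.loads(body)
    if not series:
        return []
    return [(float(ts), float(v)) for v, ts in series[0]["datapoints"] if v is not None]


def fetch_series(target: str, since: str = "-7d") -> List[Tuple[float, float]]:
    """Fetch one metric from Graphite and return its non-null samples."""
    with urllib.request.urlopen(render_url(target, since), timeout=30) as resp:
        return parse_render_json(resp.read().decode("utf-8"))
```

The resulting pairs can be split into timestamp and value lists and fed to whatever trend calculation the alerting script uses.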
One complication is that the graph is not smooth: files get added and removed, but the general trend is for disk usage to increase over time. So perhaps I need to look at the local minima (of the "disk free" metric) and draw a trend line between the troughs.
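A rough sketch of the trough idea, assuming evenly sampled "disk free" data: treat a sample as a trough if it is the minimum within a hypothetical `window` of samples on each side (clipped at the ends, so the latest low point counts), then fit a least-squares line through the troughs only and extrapolate it to zero. Because troughs are the low points of free space, this tends to give a more pessimistic (earlier) estimate than fitting all samples. The window size is a tuning assumption and should be chosen relative to your sampling interval.

```python
from typing import List, Optional, Sequence, Tuple

SECONDS_PER_DAY = 86400


def troughs(times: Sequence[float], values: Sequence[float], window: int = 1) -> List[Tuple[float, float]]:
    """Return (t, v) samples that are the minimum of their neighbourhood of
    `window` samples on each side (neighbourhood clipped at the series ends)."""
    points = []
    n = len(values)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        if values[i] == min(values[lo:hi]):
            points.append((times[i], values[i]))
    return points


def trough_trend_days_left(times: Sequence[float], values: Sequence[float],
                           now: float, window: int = 1) -> Optional[float]:
    """Days from `now` until a line fitted through the troughs reaches zero,
    or None if there are too few troughs or free space is not shrinking."""
    pts = troughs(times, values, window)
    if len(pts) < 2:
        return None
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in pts) / sxx
    if slope >= 0:
        return None
    intercept = mean_v - slope * mean_t
    return max(0.0, (-intercept / slope - now) / SECONDS_PER_DAY)
```

With a daily sawtooth such as 100, 90, 95, 85, 92, 80, 88, 75, 84, 70, the troughs are 90, 85, 80, 75 and 70, and the line through them projects roughly 28 days of free space remaining at the last sample.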
Has anyone done this?