|Cheshire Cat Computing
|Desperately Seeking Trending
|Page 1 of 2|
|Author:||cracraft [ Sat Dec 06, 2003 10:47 am ]|
|Post subject:||Desperately Seeking Trending|
Hi -- seems that MRTG/RRDTOOL/ROUTERS.CGI would be a good
base to build a trending system on. For each graph, show N
bars beyond the current bar as a predicted trend.
The way I envision this, short of adding support code to
implement it in the existing graphs (and I doubt that solution
would fly), would be one extra trending graph for each regular
graph. It would hold the two normal data points, but the
whole thing would be shifted one day, one week, etc. into the future.
If you can't shift the X axis on a per-graph basis into the future,
don't shift and just label the legend as "one period in the future"
to remind the viewer.
Then, insert the two data points normally for the entire period
but based on a mathematical extrapolation of some kind indicating
the likely trend from the last period of the same duration. Best
fit, polynomial, whatever seems decent.
Is there trending already for MRTG/RRDTOOL/ROUTERS.CGI, or would
the above work? It uses the same ideas as already in the tools
and just throws in more graphs and data that is time-shifted which
these can already handle. The only issue would be deriving the
equation that provides a good fit, extending out past the end
of previous data, taking sample points every 5 minutes, using
them instead with MRTG, and voila.
|Author:||cracraft [ Sat Dec 06, 2003 12:13 pm ]|
|Post subject:||trending, reply|
Upon re-reading my note, I realize now it looks absurd. My point
was not to trend by shifting an existing graph to the right on
the X axis, obviously. The point is to use the appropriate math
to plot a best fit, sample beyond the existing curve N data points
5 minutes apart, and fit those in.
The main thing that I see missing in MRTG/RRDTOOL/ROUTERS.CGI
Perhaps it is there or someone has done it and I simply haven't
the knowledge yet.
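
A minimal sketch of the idea in this post: fit a straight line to the existing samples, then sample that line N points beyond the end of the data, 5 minutes apart. The function names and the (timestamp, value) layout are assumptions made for illustration, not anything taken from MRTG, RRDtool or routers.cgi.

```python
STEP = 300  # seconds between samples (the 5 minute interval discussed above)


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(samples, n_future, step=STEP):
    """samples: list of (timestamp, value) pairs in time order.

    Returns n_future predicted (timestamp, value) pairs that follow the
    last real sample, spaced `step` seconds apart.
    """
    t0 = samples[0][0]
    # Fit against offsets from the first timestamp to keep the numbers small.
    slope, intercept = linear_fit(
        [t - t0 for t, _ in samples], [v for _, v in samples]
    )
    last = samples[-1][0]
    future = []
    for k in range(1, n_future + 1):
        t = last + k * step
        future.append((t, slope * (t - t0) + intercept))
    return future
```

A straight line is only the simplest choice here; any other fit could be swapped in behind the same `extrapolate` interface.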
|Author:||stevesh [ Sun Dec 07, 2003 9:36 am ]|
This is actually incredibly difficult to do. The hardest part is the 'some mathematical function' you mention. Also, the RRDTool graphing libraries just don't have the ability to add this sort of extension line to the graph.
With trending, the main problems are
1) what shape of line to predict? A straight line, a second-order (quadratic) curve, etc?
2) how far back to look at existing data when predicting?
3) should we weight past data?
4) how far into the future to predict? should we give a range or a line?
5) should we look at duplicating time/day of week patterns?
While it would be easy to predict (e.g.) disk usage that increases in a probably linear, constant and regular fashion, predicting network activity on a daily basis would be very hard, as it has a daily and weekly pattern. There is no easy way to work out which sort of prediction method should be used, given a data set. I could write in a linear daily prediction based on the last week's data (for example) but it would be useless for 90% of data sets, and still there would be no way to get it into the RRD graph function.
If anyone has some ideas or working code, then I'd be very interested to see it...
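
To illustrate the point about disk usage versus network activity, here is a rough sketch that fits a straight line to two synthetic series and compares how much of the variation the line explains (the R-squared value). Both data sets are made up for the example: one grows steadily, the other follows a pure 24 hour cycle.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    slope, intercept = linear_fit(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


hours = list(range(24 * 7))  # one week of hourly samples

# Synthetic "disk usage": steady linear growth.
disk = [50.0 + 0.1 * h for h in hours]

# Synthetic "network traffic": a daily cycle with no long-term growth.
traffic = [100.0 + 80.0 * math.sin(2 * math.pi * h / 24) for h in hours]

print("disk    R^2:", r_squared(hours, disk))
print("traffic R^2:", r_squared(hours, traffic))
```

On the steady series the line explains essentially everything; on the cyclic series it explains almost nothing, which is the problem described above.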
|Author:||cracraft [ Sun Dec 07, 2003 9:54 am ]|
Instead of embedding it in the current graph, create a whole
new graph that represents the trend itself. Then display that
graph on the same physical screen as the original non-trended
data. One graph for the original data. One graph for the trend
without any original data on it (just the trend as if it itself
For example, assume the fetch feature is used to gather
the last 30 days worth of datapoints. Now use best-fit
to calculate a matching formula for the data from fetch. Next,
take this formula and for each time (x axis) output the
y (trend). Take these two and stuff them into an RRD
database. Do this all "really fast" and build up the RRD
and it will be graphed by MRTG and your ROUTERS.CGI.
Then, create a new window using your CGI view that displays
that graph plus the graph it was derived from on the same
screen.
Voila! You have limited trending suitable for some things.
Add more complex formulae later as a trend-formula menu
on the side.
Looks like rrdtool update should be able to do the business
of storing the trend-formula-calculated y-axis values from the
x-axis time periods from the original graph.
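
The fetch / fit / update steps above can be sketched roughly as follows. The code only handles text: it parses output in the shape that `rrdtool fetch` prints (a header line of data source names, then `timestamp: value value` rows) and builds `timestamp:value` strings of the kind `rrdtool update` accepts. Running the actual commands is left out, and the helper names and the single-DS trend RRD are assumptions for illustration. Note that `rrdtool update` only accepts timestamps newer than the last update, so the trend rows would have to go into a freshly created RRD in increasing time order.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def parse_fetch(text, column=0):
    """Parse `rrdtool fetch` style output into (timestamp, value) pairs.

    Lines without a colon (the DS-name header and blank lines) are skipped,
    as are rows whose chosen column is unknown (printed as nan).
    """
    rows = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        ts, _, rest = line.partition(":")
        value = float(rest.split()[column])
        if math.isnan(value):
            continue
        rows.append((int(ts), value))
    return rows


def trend_updates(rows):
    """Return 'timestamp:value' strings holding the fitted value for each row."""
    t0 = rows[0][0]
    slope, intercept = linear_fit(
        [t - t0 for t, _ in rows], [v for _, v in rows]
    )
    return [f"{t}:{slope * (t - t0) + intercept:.2f}" for t, _ in rows]
```

Each returned string could then be passed as an argument to `rrdtool update` against the separate trend RRD.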
|Author:||stevesh [ Sun Dec 07, 2003 4:37 pm ]|
It sounds as if what you are basically saying is to create a temporary rrd file so as to be able to use the RRD library. I hadn't thought of that ... it would probably be easier to output the temporary data as XML and then load this into a tmp.rrd file, so as to only create the one required RRA. This would be a handy way of doing it, if we had the data. We could even use some extra datasets so as to be able to plot predicted data in a different colour.
However, we still have the primary problem of how to meaningfully "use best-fit to calculate a matching formula for the data from fetch." This is by no means simple -- is the best fit a curve? A line? A repeating pattern? I could relatively simply generate a best fit line, but this would be useless for most data sets. All we have to go on is a sample set of data, and we want to generate a formula.
|Author:||cracraft [ Wed Dec 10, 2003 6:59 am ]|
|Post subject:||new idea|
Just thought of a new idea...
For each data graph, you have two points A/B, that are graphed.
Assume now that either is completely blank, say B. A is the
real data. B is simply zero and being supplied as zero for
each 5 minute interval.
In such a case, have the CGI curve-fit the non-zero data
of the A/B pair and replot the curve-fit as data point B.
Do this with rrdtool update every 5 minutes for all data
points in the rrd file that includes the B data.
When your cgi script displays the A/B data, the effect will
be real data and a curve-fit, for every plot!
As regards to what type of formula to use, start out with a
simple basic linear fit to get it started and we can worry about
polynomials and other levels later.
The router script can ensure that the B data is zero by
ensuring it is not an "error count", i.e. has the string "err"
anywhere in its cfg description or associated names, and then
by scanning the last N 5 minute entries to ensure they are
all zero.
Conversely, if you don't like that way of identifying it, have
the requirement that the B datapoint always has to be some
magic value, say -1, or -777, or something unlikely to turn
up too often, and then replace all those occurrences with
the value from the curve-fit for the given time quantum.
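
The magic-value variant could look something like the sketch below. It assumes the rows have already been read into (timestamp, A, B) tuples and that -1 is the agreed placeholder; both are assumptions for the example, and writing the results back with rrdtool update is not shown.

```python
SENTINEL = -1.0  # assumed "magic value" marking a B entry to be replaced


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_trend(rows, sentinel=SENTINEL):
    """rows: list of (timestamp, a, b) tuples in time order.

    Fits a line to the real A values and returns a new list in which every
    B equal to `sentinel` is replaced by the fitted value for that timestamp.
    B values that are not the sentinel are left untouched.
    """
    t0 = rows[0][0]
    slope, intercept = linear_fit(
        [t - t0 for t, _, _ in rows], [a for _, a, _ in rows]
    )
    filled = []
    for t, a, b in rows:
        if b == sentinel:
            b = slope * (t - t0) + intercept
        filled.append((t, a, b))
    return filled
```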
|Author:||stevesh [ Wed Dec 10, 2003 1:23 pm ]|
Now that there is a workable way to use the RRD graphing libraries to show a predicted line, we still have the (major) problem of how to, given a set of datapoints in an RRA, generate a 'best fit' line or curve. I have a possible way of doing it using decayed standard deviations but it is computationally heavy, and only generates a first-order curve (ie, a straight line) which is not really appropriate in most cases. There is also the question of what decay parameters and how many datapoints to consider.
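
One possible way to weight past data when fitting a straight line is exponentially weighted least squares, sketched below. This is not the decayed-standard-deviation method mentioned in the post, whose details are not given; it is a swapped-in textbook technique, and the `decay` parameter is an assumption that would need tuning per data set.

```python
def weighted_linear_fit(xs, ys, decay=0.9):
    """Least-squares line where older points count for less.

    The newest point has weight 1, the one before it `decay`, the one
    before that `decay ** 2`, and so on. decay=1.0 gives an ordinary fit.
    Returns (slope, intercept).
    """
    n = len(xs)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Example: flat for the first ten samples, then rising by 1 per sample.
xs = list(range(20))
ys = [max(0, x - 9) for x in xs]

plain_slope, _ = weighted_linear_fit(xs, ys, decay=1.0)
recent_slope, _ = weighted_linear_fit(xs, ys, decay=0.5)
print("unweighted slope:", plain_slope)
print("weighted slope:  ", recent_slope)
```

With heavy decay the fitted slope follows the recent rise rather than averaging it with the flat history, which shows how strongly the choice of decay parameter drives the prediction.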
I will look into this (it is a very interesting challenge) but if anyone else wants to try it (as a routers.cgi*Extension script?) then please let me know so we can share notes.
|Author:||cracraft [ Wed Dec 10, 2003 1:40 pm ]|
I think that if you can get a reasonable approximation, even something
half-way reasonable, the usefulness of MRTG/RRD/ROUTER goes up exponentially.
For example, half of the commercial competition (TeamQuest, Sysload,
HP Performance Agent, etc.) just falls away since trending is their
major advantage and all management worth their salt need trending
to predict and build budgets and procure the iron.
It would be super to have a strategic view (trending) in addition to
the shorter-range, current, troubleshooting view.
Also, I don't see why a basic linear fit isn't a good one to just start
off with to prove the technology.
The trend-builder only needs to rebuild the trend for any given
RRD occasionally (say once a day, or settable?), so it won't be
a terrible load on the system. It could be kept separate to stay
out of MRTG's way and not slow it down.
I'd suggest starting with just a basic linear fit, keep the trendbuilder
separate from the MRTG processes, institute a locking mechanism to
prevent the two from trampling each other, trend in column B
of any RRD file found to have N recent 5 minute column B entries
all set to some standard number (-1 for instance), etc.
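
For the locking mechanism, one simple convention is a lock file created atomically next to the RRD, sketched below. It only helps if every process that writes the RRD checks the same lock file, which is an assumption here; the `.lock` suffix and function names are made up for the example, and stale locks left by a crashed process are not handled.

```python
import os


def acquire_lock(rrd_path):
    """Try to take the lock for rrd_path. Returns True on success.

    O_CREAT | O_EXCL makes the create fail if the file already exists,
    so only one process can win.
    """
    try:
        fd = os.open(rrd_path + ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(rrd_path):
    """Release a lock previously taken with acquire_lock."""
    os.remove(rrd_path + ".lock")
```

The trend-builder would call `acquire_lock` before touching column B, skip the file for this run if it returns False, and call `release_lock` when done.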
Then work on improved curve-fitting later. If you start with
curve-fitting first, you're starting with by far the hardest issue.
The rest is an afternoon hack session. The curve-fitting algorithm
picking is a much longer process best suited for the occasional
kaizen satori rather than a heavy research approach. Also, once
you announce a basic linear fit version, the mathematicians out
there will be all over you with suggestions for various curve-fitting
|Author:||stevesh [ Wed Dec 10, 2003 10:58 pm ]|
Well, I've started to put together a prototype for a trending module that works using the routers.cgi*Extension interface. I think it should make a linear trend based on the yearly data (since a daily trend analysis is pretty useless). Maybe it should also highlight when/if the value reaches MaxBytes or 0?
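
Highlighting when the value reaches MaxBytes or 0 is straightforward once a linear trend is available: solve the line for the time at which it crosses the limit. A minimal sketch, assuming the trend is given as a slope and intercept in whatever time unit the fit used (the function name is hypothetical):

```python
def time_to_reach(slope, intercept, t_now, limit):
    """Time remaining until the line slope * t + intercept hits `limit`.

    Returns None if the trend is flat or the crossing lies in the past,
    i.e. the trend is moving away from the limit.
    """
    if slope == 0:
        return None
    t_hit = (limit - intercept) / slope
    if t_hit <= t_now:
        return None
    return t_hit - t_now
```

For a rising trend the limit would be MaxBytes; for a falling one it would be 0.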
So far, I've still to write the code to generate the graph (but I should be able to steal this from routers.cgi), the .cfg file parser (again stealable) and of course, the trending code (although I have a nifty algorithm on paper). I'm trying to make it sufficiently modular that it will be easy to plug in different trending functions in the future.
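
One way to keep trending functions pluggable is a small registry keyed by name, sketched here. This is not the prototype's design, which is not shown in the thread; the registry, decorator and function names are all hypothetical.

```python
TREND_FUNCTIONS = {}  # name -> function(xs, ys) returning a predictor


def register(name):
    """Decorator that adds a trend function to the registry under `name`."""
    def wrap(fn):
        TREND_FUNCTIONS[name] = fn
        return fn
    return wrap


@register("mean")
def mean_trend(xs, ys):
    """Predict the historical average for every future x."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


@register("linear")
def linear_trend(xs, ys):
    """Predict along an ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def predict(name, xs, ys, x):
    """Fit the named trend function to (xs, ys) and evaluate it at x."""
    return TREND_FUNCTIONS[name](xs, ys)(x)
```

A new curve-fitting method then only needs one decorated function, and a per-graph setting could select it by name.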
This will be something for me to work on in the office while I man the phones over the Christmas break.
Although maybe I should concentrate on getting v2.13 out!
|Author:||stevesh [ Fri Dec 12, 2003 11:42 am ]|
I have a very early prototype version of this now for trial. Anyone who wants a copy can email me, but it is a long way from general release yet...
|All times are UTC + 12 hours [ DST ]|