Cheshire Cat Computing

Software support and information

All times are UTC + 12 hours [ DST ]




Posted: Wed Jul 27, 2011 12:26 pm
User

Joined: Wed Jul 27, 2011 12:19 pm
Posts: 2
Hello, I'm using check_vmware.pl 1.13 and am still getting the CPU utilization calculation error as mentioned above.

[root@lib-systems plugins]# ./check_vmware.pl --config=XXXXX --report=cpu --debug
Starting.
Connecting
Connected
Server Time : 2011-07-26T23:21:23.342901Z
Report type requested is [cpu]
Base is ha-folder-root
Retrieving PerfMgr data
Selected interval is: 0
Retrieving list of hosts...
Making new propertyspec
Making new filterspec from
Retrieving vim_service
Retrieving properties from VimService
Checking faults on SoapResponse
Processing entities:
ha-host
Creating query for MORef ha-host
Start time: 2011-07-26T23:16:00Z
End time : 2011-07-26T23:21:00Z
Retrieving data...
Perfstats retrieved...
Results for value ha-host
Perfdata object: 2011-07-26T23:21:20Z
Disconnecting...
Exiting with status (2)
CRIT: CPU usage at 7825528.47% (need more CPU allocation?)|cpu=7825528.47%;80;90;0;100

(I realize that it's an error in the API)

And oddly, I'm getting a similar error for 'net'.

[root@lib-systems plugins]# ./check_vmware.pl --config=XXXXX --report=net --debug
Starting.
Connecting
Connected
Server Time : 2011-07-26T23:24:11.544703Z
Report type requested is [net]
Base is ha-folder-root
Running network report
Retrieving PerfMgr data
Selected interval is: 0
Retrieving list of hosts...
Making new propertyspec
Making new filterspec from
Retrieving vim_service
Retrieving properties from VimService
Checking faults on SoapResponse
Processing entities:
ha-host
Creating query for MORef ha-host
Start time: 2011-07-26T23:19:00Z
End time : 2011-07-26T23:24:00Z
Retrieving data...
Perfstats retrieved...
Results for value ha-host
Perfdata object: 2011-07-26T23:24:00Z
Disconnecting...
Exiting with status (2)
CRIT: Network usage at 242144808Kbps (need more resource shares or physical interfaces?)|net=242144808;7680;10240;0;


The other parameters (memory, disk) produce meaningful output and jibe with my understanding.


Posted: Wed Jul 27, 2011 12:42 pm
Site Admin

Joined: Tue Jul 29, 2003 11:42 am
Posts: 2921
Location: Auckland, New Zealand
The CPU error has been tracked down to an error in the API implementation on the VirtualCentre, so I'm unable to fix it. The API is returning a value in MHz (actually the value of the cpu:usagemhz counter) for the counter cpu:usage, where it is supposed to be returning a percentage. Debug output from the plugin has confirmed this as the problem. It only seems to affect certain versions of the VirtualCentre software, though, and I can't confirm which; it doesn't affect the one we're running here, which makes testing more difficult.

I suspect the Network problem you report (which I've not had reported before) is related to the same issue.
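
As an illustration only (this is not what check_vmware.pl does), a plugin could guard against this kind of bad reading by treating an impossible percentage as UNKNOWN rather than CRITICAL. The sketch below assumes the standard Nagios exit codes and the 80/90 thresholds visible in the perfdata above; the function name `classify_cpu` is made up for the example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_cpu(percent, warn=80.0, crit=90.0):
    """Map a CPU percentage to a Nagios status, rejecting impossible values.

    Hypothetical helper: the thresholds default to the 80/90 values seen
    in the plugin's perfdata, but nothing here is taken from the plugin.
    """
    # A value outside 0..100 cannot be a real percentage, so report
    # UNKNOWN instead of raising a misleading CRITICAL alert.
    if percent < 0.0 or percent > 100.0:
        return UNKNOWN
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK
```

With this guard, the 7825528.47% reading above would come back as UNKNOWN (3) rather than CRITICAL (2).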

_________________
Steve Shipway
UNIX Systems, ITSS, University of Auckland, NZ
Woe unto them that rise up early in the morning... -- Isaiah 5:11


Posted: Wed Jul 27, 2011 3:01 pm
User

Joined: Wed Jul 27, 2011 12:19 pm
Posts: 2
Thanks. I guess then you could convert it to a percentage if you knew the theoretical maximum.

At a guess, I wonder if I could use the API to scrape the total number and speed of the CPUs, and sum them up to create a "maximum possible cycles per second" number, for which the CPU tally returned by check_vmware is a percentage (since the unit is probably Hz).
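
The arithmetic for that idea might look roughly like the sketch below. It assumes the value really is in MHz, and that the core count and per-core clock speed have already been fetched from somewhere; the function name and the numbers in the usage note are invented for illustration and do not come from the plugin or from any real host.

```python
def usage_percent(used_mhz, num_cores, core_mhz):
    """Express used MHz as a percentage of the host's theoretical maximum.

    All three inputs are assumed to be known already; how to obtain the
    core count and clock speed from the API is not shown here.
    """
    # Theoretical maximum: every core running flat out at its rated speed.
    capacity_mhz = num_cores * core_mhz
    if capacity_mhz <= 0:
        raise ValueError("host capacity must be positive")
    return 100.0 * used_mhz / capacity_mhz
```

For a made-up host with 8 cores at 2400 MHz, `usage_percent(4800, 8, 2400)` gives 25.0.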

CPU is probably a naive metric anyway. Do you know offhand if the API provides system load? That's probably more reliable.

More work than I'm really authorized to do, since I'm only supposed to be evaluating Nagios and my manager is pushing for Hitachi Network Analyzer. Maybe I can do it in my off hours if I get approval.


Posted: Wed Jul 27, 2011 3:20 pm
Site Admin

Joined: Tue Jul 29, 2003 11:42 am
Posts: 2921
Location: Auckland, New Zealand
The CPU as reported from the API is the CPU usage by the guest; this should be a percentage of allocated resources.

Load is not relevant to VMware; it is a function of the underlying OS, so you need to get it there. The load average is the average number of processes in the guest OS that are in the run state over a period of time; this is not equivalent to CPU usage, since processes can be in the run state without being heavy users of CPU.
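
If the guest is Linux, one way to get the load from the OS itself is to read /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute load averages. This is only a sketch under that assumption; the helper names are made up, and other guest operating systems will need a different approach.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    # The file looks like "0.20 0.18 0.12 1/80 11206"; only the first
    # three whitespace-separated fields are load averages.
    fields = text.split()
    return tuple(float(field) for field in fields[:3])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages on a Linux guest."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```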

It is worth noting that CPU usage by guest is not equivalent to CPU usage as reported by the guest OS itself; and the guest OS will LIE about CPU and Memory usage as it gets tricked by the virtualisation layer.

For a guest, the API reports the CPU usage (how many MHz are used), the Ready time (how many cycles it wants to run, but cannot due to VM resources being unavailable), and the Wait time (how many MHz it doesn't want and so are used elsewhere). What matters to you is the Ready time (if >10% you have serious resource problems) and the Usage (if too high then consider adding a vCPU to the guest). Note that Wait time also includes time waiting for virtualised IO.
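
To put the 10% rule of thumb into numbers, a rough sketch follows. It assumes the ready counter arrives as milliseconds summed over one sample interval, and that the real-time interval is 20 seconds; both are assumptions to check against your own setup, and the function names are invented for the example.

```python
def ready_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    Assumes ready_ms is the total ready time accumulated during one
    sample interval of interval_s seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 100.0 * ready_ms / (interval_s * 1000.0)


def ready_is_serious(ready_ms, interval_s=20, threshold=10.0):
    """Apply the '>10% means serious resource problems' rule of thumb."""
    return ready_percent(ready_ms, interval_s) > threshold
```

Under those assumptions, 3000 ms of ready time in a 20 second sample works out to 15% and would be flagged, while 2000 ms is exactly 10% and would not.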

_________________
Steve Shipway
UNIX Systems, ITSS, University of Auckland, NZ
Woe unto them that rise up early in the morning... -- Isaiah 5:11

