Tuesday, March 10, 2009

Using EEM 3.0 to get the top 3 processes

As you may already know, IOS 12.3(4)T and later provides an easy way to get notified, through syslog and SNMP, when the CPU load of your router exceeds a specified threshold for a specified duration.

This is accomplished with the following global configuration commands :


process cpu threshold type total rising 40 interval 30
snmp-server enable traps cpu threshold

The output produced, when the threshold is exceeded, looks like the following :

%SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 42%/19%, Top 3 processes(Pid/Util): 89/38%, 138/2%, 28/0%

%SYS-1-CPUFALLINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr) 3%/19%.
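
The threshold command also accepts an optional falling clause, which makes the falling side explicit. The lines below are only a sketch: the 20% falling value, the collector address 192.0.2.10 and the community string "public" are placeholders I picked for illustration, not values taken from the router above. As far as I know, when the falling clause is left out, the falling threshold simply defaults to the rising value.

```
! placeholder values: falling 20, host 192.0.2.10 and community "public" are assumptions
process cpu threshold type total rising 40 interval 30 falling 20 interval 30
snmp-server enable traps cpu threshold
snmp-server host 192.0.2.10 traps public
```

The "snmp-server host" line is there because enabling the trap does not by itself tell the router where to send it.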

It sure looks quite informative, since the top 3 process IDs are included along with their CPU usage. You can make it even more informative by using EEM 3.0 to get the process names too (plus other details). Here is a sample applet that uses a regexp to extract the 3 PIDs from the syslog message and then passes them to the "sh proc cpu" command, whose output is filtered by another regexp :

event manager applet SHOW-TOP3-PROCS-APPLET
event syslog pattern "^.*CPURISINGTHRESHOLD.*Top 3 processes.*"
action 1.1 regexp "^.*CPURISINGTHRESHOLD.*Top 3 processes.*:[ ]+([0-9]+)/[0-9]+%, ([0-9]+)/[0-9]+%, ([0-9]+)/[0-9]+%$" "$_syslog_msg" _match _sub1 _sub2 _sub3
action 2.1 if $_regexp_result eq 1
action 3.1 cli command "sh proc cpu sort | inc (CPU utilization|Runtime|^[ ]*($_sub1|$_sub2|$_sub3)_)"
action 3.3 syslog msg "$_cli_result"
action 4.1 else
action 5.1 syslog msg "No Match. Please re-check your regexp code."
action 6.1 end

The 3 variables $_sub1, $_sub2 and $_sub3 hold the PIDs: the first regexp captures them and the second one reuses them to filter the command output.

When an applet is triggered by an event (like the syslog in our example), you cannot run it manually :

R3#event manager run SHOW-TOP3-PROCS-APPLET
EEM policy SHOW-TOP3-PROCS-APPLET not registered with event none Event Detector

If you want to run an applet manually, it has to be registered with "event none".
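
A minimal sketch of such a manually runnable applet follows. The applet name is hypothetical, and since there is no syslog message (and therefore no PIDs to extract) when you run it by hand, this version just logs the sorted process list, dropping every line that contains a 0.00% column. Like the applet above, it assumes that "sh proc cpu" can be run without entering enable mode first.

```
! hypothetical companion applet, meant to be started by hand only
event manager applet SHOW-TOP-PROCS-MANUAL
event none
action 1.1 cli command "sh proc cpu sort | exc 0.00%"
action 2.1 syslog msg "$_cli_result"
```

It can then be started with :

```
R3#event manager run SHOW-TOP-PROCS-MANUAL
```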

But, when the trigger event is a syslog message, you can create the appropriate message manually by using the "send log" command. That way you can test your applet without waiting for the syslog message to appear.

R3#send log %SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 26%/40%, Top 3 processes(Pid/Util): 89/21%, 138/3%, 43/0%

R3#
%SYS-2-LOGMSG: Message from 0(): %SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 26%/40%, Top 3 processes(Pid/Util): 89/21%, 138/3%, 43/0%
R3#
%HA_EM-6-LOG: SHOW-TOP3-PROCS-APPLET: CPU utilization for five seconds: 8%/0%; one minute: 5%; five minutes: 6%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
138 126668 1493020 84 5.19% 3.35% 2.86% 0 HQF Shaper Backg
89 122052 8421 14493 0.95% 0.60% 1.80% 0 Exec
43 20368 6154 3309 0.15% 0.18% 0.28% 0 Per-Second Jobs
R3>
R3#

Of course, in such a case the CPU load numbers in your message do not agree with the ones in the command output, because the current CPU load is different from the one you put in your hand-made syslog message.

So, let's create a little bit of real load, in order to make the applet run automatically too:

R3# show tech
R3#
%SYS-1-CPURISINGTHRESHOLD: Threshold: Total CPU Utilization(Total/Intr): 42%/19%, Top 3 processes(Pid/Util): 89/38%, 138/2%, 28/0%
%HA_EM-6-LOG: SHOW-TOP3-PROCS-APPLET: CPU utilization for five seconds: 42%/19%; one minute: 12%; five minutes: 7%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
89 127716 8452 15110 38.00% 8.16% 2.59% 0 Exec
138 133132 1555879 85 2.95% 2.58% 3.02% 0 HQF Shaper Backg
28 8344 1922 4341 0.23% 0.10% 0.11% 0 HC Counter Timer
R3>
R3#

As you can see, PIDs 89 (Exec), 138 (HQF Shaper Backg), 28 (HC Counter Timer) are the top 3 in this case.

Please keep in mind that EEM 3.0 requires IOS 12.4(22)T or later.
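
If you are not sure what your image supports, the following exec commands should tell you. I show no output here, since it depends on the image. The first one reports the EEM version and the second one lists the registered policies, which is a quick way to confirm that the applet was accepted.

```
R3#show event manager version
R3#show event manager policy registered
```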

2 comments:

  1. great post... here's some food for thought

    If you want to be able to __manually trigger__ the EEM policy to grab the top 3 process PIDs, CPU and Process name, AND also do the same to __automatically trigger__ when you see a Syslog message, you can do that...

    You may need to (probably) write the policy in Tcl. Maybe not.. I generally use Tcl straight away due to limitations in applets in EEM 2.4 and earlier.

    What you would do is:

    - write one policy that uses Syslog ED for appropriate message that publishes an Application event

    - write another policy that uses None ED that publishes same Application event

    - write a 3rd policy that does the work you want done when it sees the Application event published.

    Cisco calls this multiple trigger, single action I believe. It is not very well documented, but there is a bit of information out there on it, if you look hard enough. I used a similar strategy for a few EEM Tcl scripts I wrote to automate the 12.4(22)T packet capture buffer export feature, if you'd like to look at them for an example usage.

    See http://www.cisco.com/go/ciscobeyond/

    I wrote the Packet Capture EEM policy (under Diagnostics). I don't have anything on this on my blog, dedicated to EEM and related scripting for Cisco IOS/IOS XE/NXOS, as I just started the blog, but I think I'll write this up in the near future.

    Regards
    Sam Crooks
    http://www.eemhints.info

  2. Here's a somewhat more detailed description of how "action regexp" works:

    http://wiki.nil.com/Regular_expressions_in_Embedded_Event_Manager_applets


 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Greece License.