HPE

INTERNAL USE ONLY


 

 

Analysis Code: 1020

Severity: Error

 

INEX has found evidence of a Stack Trace in a node’s /var/log/messages[.*] file.

 

When you review the Stack Trace that INEX has called out, you will often find that it is the stack trace from another node. A Stack Trace from one node appears in another node’s messages file so that the information is recorded elsewhere and is not lost; the OS takes advantage of the fact that the nodes are clustered together.

 

If a Stack Trace does appear, review the information on the UpDown tab of this workbook and compare it against the time stamps of the Stack Trace to determine whether the node in question actually crashed. Then check STaTS for any crash files (crashtxt and/or crashdmp). If crash-related files are found, process the crash as you normally would. Keep in mind that the Stack Trace information INEX has found may be very useful if it is not present in the crashtxt or analysis.* file of the crash dump. Refer to the INEX User’s Guide.

 

An example of a Stack Trace seen by INEX:

 

Node 1 panic stack trace:

(tpd_panic+0x10b)

(lckevt_clock_timeout+0x7d)

(run_timer_softirq+0x14c)

(__do_softirq+0xcf)

(call_softirq+0x1c)

(do_softirq+0x6d)

(irq_exit+0x75)

(smp_apic_timer_interrupt+0x45)

(apic_timer_interrupt+0x13)

 

 

As the first line shows, this Stack Trace is for Node 1, regardless of which node’s messages file it was found in.
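
When many messages files need checking, the node number and frames can be pulled out of a trace like the one above with a short script. The following is a minimal sketch only, not part of INEX: it assumes the header and frame formats match the example exactly, and the function name parse_panic_trace is hypothetical.

```python
import re

# Assumption: header and frame formats are taken from the example above;
# other layouts are not handled by this sketch.
HEADER_RE = re.compile(r"Node (\d+) panic stack trace:")
FRAME_RE = re.compile(r"\((\w+)\+0x([0-9a-fA-F]+)\)")


def parse_panic_trace(lines):
    """Return (node_id, frames) for the first panic trace found, else None.

    frames is a list of (symbol, offset) tuples in the order they appear.
    """
    node_id = None
    frames = []
    for line in lines:
        if node_id is None:
            # Still looking for the "Node N panic stack trace:" header.
            header = HEADER_RE.search(line)
            if header:
                node_id = int(header.group(1))
            continue
        if not line.strip():
            continue  # blank lines between frames are skipped
        frame = FRAME_RE.search(line)
        if frame is None:
            break  # first non-frame line ends the trace
        frames.append((frame.group(1), int(frame.group(2), 16)))
    if node_id is None:
        return None
    return node_id, frames
```

Comparing the returned node number with the node that owns the messages file shows whether the trace was recorded on behalf of another node.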