Linux System Hangup을 방지하라.

2012. 11. 26. 11:38

일반적인 Enterprise System에서는 System Hang-up을 방지하기 위한 NMI 스위치를 제공하기는하나
x86_64 시스템의 경우 NMI 신호처리를 운영체제에서 해주는 것이 일반적인다. 특히 리눅스의 경우
커널 파라메터에서 NMI 옵션을 적용함으로 인해서 System Hangup을 미연해 방지해주는 경우도 있다.

- Check Point 1 -

레드햇이나, 수세 리눅스 처럼 엔터프라이즈 환경에 운영되어야 하는 리눅스의 경우 NMI의 옵션이
기본적으로 Enable 되어 있다. 이 사실을 모르는 상태에서 H/W에서 NMI 신호를 받았을때 시스템이
재부팅 되었을 경우 당황해 하는 경우가 있다. 아래의 Virtual File system에서 NMI 신호에 대한 옵션을 확인해 보자.

cat /proc/sys/kernel/nmi_watchdog

위의 경우 처럼 /proc/sys/kernel/nmi_watchdog을 쿼리했을때 옵션 값이 1 로 나올경우 커널옵션과
상관없이 NMI 리눅스 상에서 동작하게 된다.

- Check Point 2 -

NMI 신호는 H/W에서도 옵션 설정이 가능하다, 벤더별로 조금씩 차이는 있을수 있겠지만. 일반적으로는 BIOS단에서 NMI 신호를 보내는 경우도 있다.

NMI Control Disable / Enable

다음은 Linux 상에서 NMI 처리 및 Interrupt 신호와 관련된 설정과 옵션처리는 아래와 같이 진행한다.

==============================

Using the NMI Watchdog to detect hangs

When the NMI watchdog is enabled, the system hardware is programmed to periodically generate an NMI. Each NMI invokes a handler in the Linux kernel to check the count of certain interrupts. If the handler detects that this count has not increased over a certain period of time, it assumes the system is hung. It then invokes the panic routine. If Kdump is enabled, the routine also saves a crash dump.

Determining whether the NMI Watchdog is enabled

To determine whether NMI watchdog is enabled, enter the following command. The included output indicates that there is no NMI count in all processors, thus NMI watchdog is disabled on this system:

# grep NMI /proc/interrupts 
NMI:    0    0    0    0

Enabling the NMI watchdog

To enable the NMI watchdog, add nmi_watchdog=1 or nmi_watchdog=2 to your boot entry.

Note: Not all hardware supports the nmi_watchdog=1 boot parameter. Some hardware supports the nmi_watchdog=2 parameter, and some hardware supports neither parameter.

Edit the /boot/grub/menu.lst file to add the nmi_watchdog=1 parameter or the nmi_watchdog=2parameter to your boot entry. This example shows the /boot/grub/menu.lst file in Red Hat Enterprise Linux:

title Red Hat Enterprise Linux Server (2.6.18-128.el5) 
        root (hd0,0) 
        kernel /vmlinuz-2.6.18-128.el5 ro root=/dev/sda nmi_watchdog=1 
        initrd /initrd-2.6.18-128.el5.img

Reboot the machine. Enter the grep command repeatedly to view the NMI count. The following output shows that the NMI count on each processor increases rapidly. Thus the NMI watchdog is enabled by thenmi_watchdog=1 boot parameter.

# grep NMI /proc/interrupts 
NMI:    2123797    2123681    2123608    2123535 
# grep NMI /proc/interrupts 
NMI:    2124855    2124739    2124666    2124593 
# grep NMI /proc/interrupts 
NMI:    2125981    2125865    2125792    2125719 
# grep NMI /proc/interrupts 
NMI:    2126692    2126576    2126503    2126430 
# grep NMI /proc/interrupts 
NMI:    2127406    2127290    2127217    2127144

The following output shows the NMI count on each processor when the NMI watchdog is enabled by thenmi_watchdog=2 kernel boot option. The NMI counts increase slowly because the count depends on processor utilization, and the system in this example is idle.

Note: There are exceptional cases where a small number of NMI appears in the /proc/interrupts file when NMI watchdog is disabled.

grep NMI /proc/interrupts


NMI:       187       107        293       199
grep NMI /proc/interrupts
NMI:       187       107        293       199
grep NMI /proc/interrupts
NMI:       187       107        293       200

Now your system is ready to generate a crash dump in case it becomes unresponsive, but does not go into a panic state.

'Linux 이야기. > LInux Article.' 카테고리의 다른 글

재부팅없이 SCSI를 인식시키는 방법 (0)	2012.12.14
SCSI 정보를 알아보자 (0)	2012.12.11
패킷 Overrun 으로 인한 Frame loss (0)	2012.11.23
Linux Bonding ARP option (0)	2012.11.06
dmidecode를 이용한 시스템 상태확인 (0)	2012.11.02

Rehoboth.. 이곳에서 부터

Linux System Hangup을 방지하라.

Using the NMI Watchdog to detect hangs

Determining whether the NMI Watchdog is enabled

Enabling the NMI watchdog

'Linux 이야기. > LInux Article.' 카테고리의 다른 글

+ Recent posts

티스토리툴바