| 1 | Introduction |
| 1.1 | Prerequisites |
| 1.2 | Related Information |
| 1.3 | Revision Information |
| 2 | Alarm Description |
| 3 | Procedure |
| | Reference List |
1 Introduction
This document is the Operating Instruction (OPI) for the alarm LOTC Time Synchronization.
Scope
This document covers the following topics:
- Alarm description
- Alarm handling procedure
Target Groups
This document is intended for personnel involved in alarm handling.
1.1 Prerequisites
This section describes the possible documents, tools, and conditions needed before performing steps to cease the alarm.
1.1.1 Documents
Not applicable.
1.1.2 Tools
Not applicable.
1.1.3 Conditions
Not applicable.
1.2 Related Information
The definition and explanation of acronyms and terminology, information about trademarks used, and typographic conventions can be found in the following documents:
- LDE Glossary of Terms and Acronyms, Reference [1]
- LDE Trademark Information, Reference [2]
- Typographic Conventions, Reference [3]
1.3 Revision Information
Other than editorial changes, this document has been revised from revision C to D according to the following:
- Updated Section 2 to include additional alarm condition details.
2 Alarm Description
The alarm is issued when the Network Time Protocol (NTP) server(s) cannot be contacted or if the local time is off by more than the threshold value of 10 seconds.
The following is a list of the alarm attributes:
Note: This view of the alarm attributes is presented to the user from Common Operation and Maintenance (COM) only when the LDE adaptations for Component Based Architecture (CBA) have been installed and the LDE alarm model has been registered to COM.
| Attribute Name | Attribute Value/Interpretation |
|---|---|
| Major Type | 193 |
| Minor Type | 3341942785 |
| Managed Object Class | SafNode |
| Specific Problem | LOTC Time Synchronization |
| Event Type | 6(1) |
| Additional Information | Not applicable. |
| Perceived Severity | Critical: The time difference between the blades in the cluster exceeds the threshold value. Major: The configured NTP servers are not accepted by ntpd (unusable), no NTP server is reachable (unreachable), or a peer cannot be selected (rejected). Minor: Some of the configured NTP servers are unreachable. |
(1) Environmental
1. The possible cause for the Critical severity alarm is:
- The time difference between the local system time and the remote time server is greater than 10 seconds.
However, if the time difference is too large (more than 1000 seconds), a Major alarm is raised instead of a Critical one, because the ntpd service refuses to select a remote time server with such a high time offset. Additionally, the node alarm may switch from Critical to Major and back to Critical again as the ntpd service attempts to reselect a valid remote server to synchronize against.
The additional text for this alarm is: "Time incorrect (off by X.X seconds)"
2. The possible causes for the Major severity alarm are:
- 2.1. Unusable: The NTP servers provided in cluster.conf cannot be used by the local NTP daemon, ntpd. The additional text for this case is:
"Time servers not reachable: 1.2.3.4 (unusable)"
- 2.2. Rejected: None of the configured NTP servers can be selected as the current time source. If this occurs right after the alarm daemon has started, the alarm is raised after 60 minutes. If it occurs after a peer has previously been selected, the alarm is raised after 90 seconds. The additional text for this case is one of the following:
"Time servers not reachable: 1.2.3.4 (rejected at initial selection)"
"Time servers not reachable: 1.2.3.4 (rejected at reselection)"
The ntpd service has a number of selection algorithms and may reject a server for several reasons. The most common reason is that the quality of the clock is not good enough because of high jitter, dispersion, or delay. Further details and troubleshooting advice can be found in the documentation on the ntp.org website.
- 2.3. Unreachable: All of the configured NTP servers are reported as unreachable by the local NTP daemon, ntpd. This usually means that the network connectivity to the servers is lost. The additional text for this case is:
"Time servers not reachable: 1.2.3.4 (unreachable)"
As a result of the fault, the time within the cluster might not be synchronized.
3. The possible cause for the Minor severity alarm is:
- Some, but not all, of the configured NTP servers are reported as unreachable by the local NTP daemon, ntpd. This may be a temporary network connectivity issue. The additional text is as follows:
"Time servers not reachable: 1.2.3.4 (unreachable)"
Note: The initial synchronization time for a newly started or rebooted cluster can be up to 20 minutes.
Note: There may be a long delay before the configured NTP servers are reported as unreachable by the local ntpd. This delay depends on the polling intervals of ntpd. Therefore, no alarm is raised until these polling periods have fully elapsed.
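The severity rules and additional-text formats described in this section can be summarized in a short Python sketch. It is illustrative only and is not part of LDE: the function names, the state labels ("ok", "unusable", "rejected", "unreachable"), and the order in which overlapping conditions are evaluated are assumptions made for the example. Only the 10-second and 1000-second thresholds and the text formats come from this document.

```python
import re

OFFSET_THRESHOLD_S = 10.0   # Critical threshold stated in this section
REJECT_OFFSET_S = 1000.0    # above this, ntpd refuses to select the server

# Patterns modeled on the additional-text examples quoted in this section.
_OFFSET_RE = re.compile(r"^Time incorrect \(off by (-?\d+(?:\.\d+)?) seconds\)$")
_SERVERS_RE = re.compile(r"^Time servers not reachable: (.+?) \(([^()]+)\)$")


def parse_additional_text(text):
    """Return a dict describing the additional text, or None if unrecognized."""
    text = text.strip().strip('"')
    match = _OFFSET_RE.match(text)
    if match:
        return {"kind": "offset", "offset_seconds": float(match.group(1))}
    match = _SERVERS_RE.match(text)
    if match:
        # Assumption: multiple servers, if listed, are separated by commas or spaces.
        servers = [s for s in re.split(r"[,\s]+", match.group(1)) if s]
        return {"kind": "servers", "servers": servers, "reason": match.group(2)}
    return None


def expected_severity(offset_s, server_states):
    """Return 'Critical', 'Major', 'Minor', or None (no alarm expected).

    offset_s: time offset in seconds, or None if unknown.
    server_states: dict of server -> 'ok', 'unusable', 'rejected', or
    'unreachable' (hypothetical labels chosen for this sketch).
    """
    bad = [state for state in server_states.values() if state != "ok"]
    too_far = offset_s is not None and abs(offset_s) > REJECT_OFFSET_S
    # Assumption: the "no usable server" case is checked before the offset.
    if too_far or (server_states and len(bad) == len(server_states)):
        return "Major"
    if offset_s is not None and abs(offset_s) > OFFSET_THRESHOLD_S:
        return "Critical"
    if "unreachable" in bad:
        return "Minor"
    return None
```

For example, expected_severity(12.0, {"1.2.3.4": "ok"}) falls in the Critical case, while an offset of 1500 seconds falls in the Major case because the server would be rejected.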
3 Procedure
To clear the alarm, perform the following steps:
1. If the affected node is a payload node, go to Step 3.
2. Check that the NTP server(s) listed in the cluster configuration are correct and have network connectivity from the cluster nodes.
If the name or address of any NTP server must be updated, see the following document for further information about how to configure an NTP server:
- LDE Management Guide, Reference [4]
The NTP service is restarted with the new servers when the lde-config --reload command is issued.
3. Wait 20 minutes.
If the alarm ceases, exit this procedure.
4. Reboot the affected node.
Warning!
As a consequence of a reboot, any application may lose sessions or traffic. Therefore, restart only one node at a time and only if the state of the cluster as a whole is stable and running.
5. Wait up to 20 minutes.
6. If the alarm does not cease, contact the next level of maintenance support. Further actions are outside the scope of this operating instruction.
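The procedure asks for a check of NTP server connectivity but does not name a tool. One possible aid, assuming the standard ntpq utility that accompanies ntpd is available on the node (this document does not confirm that), is to inspect the peer list printed by ntpq -pn. The Python sketch below parses such output, which is passed in as text. The function name and returned field names are hypothetical, and the column layout assumed is the classic ntpq peers billboard (remote, refid, st, t, when, poll, reach, delay, offset, jitter).

```python
def parse_ntpq_peers(output):
    """Parse the text printed by 'ntpq -pn' into a list of peer dicts.

    Column 0 of each peer line is the tally code ('*' marks the peer
    currently selected for synchronization), so lines must not be
    left-stripped before being passed in.
    """
    peers = []
    for line in output.splitlines():
        # Skip blank lines, the column header, and the '=====' separator.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        tally = line[0]
        fields = line[1:].split()
        if len(fields) < 10:
            continue  # not a peer line in the assumed layout
        reach = int(fields[6], 8)  # octal register of the last 8 polls
        peers.append({
            "remote": fields[0],
            "selected": tally == "*",
            "reachable": reach != 0,            # 0 means no recent replies
            "offset_s": float(fields[8]) / 1000.0,  # ntpq reports milliseconds
        })
    return peers
```

A peer with reachable set to False corresponds to the unreachable cases in the alarm description. If no peer has selected set to True, ntpd has not chosen a time source.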
Reference List
[1] LDE Glossary of Terms and Acronyms, TERMINOLOGY, 1/0033-APR 901 0551/4
[2] LDE Trademark Information, LIST, 1/006 51-APR 901 0551/4
[3] Typographic Conventions, DESCRIPTION, 1/1551-FCK 101 05
[4] LDE Management Guide, USER GUIDE, 1/1553-CAA 901 2978/4
