Monitoring High Availability using PAM360
In mission-critical environments, one of the crucial requirements is uninterrupted access to passwords. PAM360 provides the High Availability feature to ensure this. In general, the High Availability (HA) approach is to have a Primary server and a Secondary (Standby) server that takes over operations if the Primary server fails. PAM360 supports monitoring the high availability of your servers so that you can anticipate failures and avoid costly downtime. This document walks you through the following topics:
1. Why do you need High Availability Monitoring?
Continuous monitoring of your endpoints and the associated database operations ensures that problems are detected and resolved early, which in turn improves the user's experience. In addition, monitoring captures system metrics that can be used to analyze trends in server performance and recurring problems. For the database server, a reliable monitoring system is essential: it measures availability, detects events that can bring down the database server, and immediately notifies the concerned parties about critical failures. An ideal monitoring process is highly accessible and stable, captures diagnostic data, and alerts the administrator about the problems encountered.
2. How does High Availability work in PAM360?
Whenever the Primary server fails or goes down, the Secondary server takes over the functions that the Primary was performing. The HA setup in PAM360 provides a Secondary server that can be used to retrieve passwords from the PAM360 repository in case of a disaster, until the Primary server is fully functional and back in service. This is explained in detail below:
3. The High Availability Architecture in PAM360
The HA architecture in PAM360 is designed to support two different scenarios. See the table below for a detailed explanation:
4. What happens to Audit Trails?
In the high availability scenarios mentioned above, audit trails are recorded as usual. In scenario 2, as long as there is network connectivity between the two locations, the audit trails are recorded by the Primary. When users connect to the Secondary, it records operations such as 'password retrieval', 'login' and 'logout'. When network connectivity between the two locations is restored, the audit data is synchronized. In scenario 1, when the Primary crashes, the 'password retrieval', 'login' and 'logout' operations performed by users on the Secondary are audited. All other audit records will already be in sync on the Standby.
5. Synchronizing Primary and Secondary Servers
For a Secondary server to take over the operations of a failed Primary server, it must hold exactly the same data and perform database processing in the same way the Primary server would have if it had continued to work. Hence, synchronization means continuously updating the Secondary server database so that it is an exact replica of the Primary database. PAM360's HA functionality is designed to keep the data on both servers in sync at all times. In case of a Secondary server failure or a link failure, the changes made in one database are automatically synchronized with the other once the service or connection is restored. During such failures, the operations performed on the Secondary server are also audited as usual and synchronized automatically on restoration. Data replication happens over a secure, encrypted channel.
6. Monitoring High Availability for PostgreSQL Database Server
| Sl. No. | UI Element/Icon | Status | Definition |
|---|---|---|---|
| 1 | | Active | This blinking icon indicates that HA is actively running on the server (Primary/Secondary) you are currently viewing. |
| 2 | | Inactive | This blinking icon indicates that HA is down on the server (Primary/Secondary) you are currently viewing. |
| 3 | | Success | This icon indicates that HA has been configured successfully on your server. In the case of an HA configuration failure, this screen will be shown. |
| 4 | | | This icon denotes the Primary server. |
| 5 | | | This icon denotes the Secondary server. |
| 6 | Configuration Details | | This table lists the following details of the Primary and Secondary servers: Server Name, Server Port and Actions. You can modify the Secondary server details from here (note that you cannot edit the Primary server details). |
| 7 | Primary/Secondary Server | | This icon indicates that the Primary/Secondary server is up and running. |
| | | | This icon indicates that the Primary/Secondary server is down and has stopped running. |
| 8 | Primary/Secondary Server PostgreSQL | | This icon indicates that the PostgreSQL database of the Primary/Secondary server is up and running. |
| | | | This icon indicates that the PostgreSQL database of the Primary/Secondary server is down and has stopped running. |
| 9 | Replication Pending Count | | This indicates the total number of pending replications. A value of zero means that no replications are pending and the Primary and Secondary servers are in sync with each other. |
| 10 | Connection Lost Time | | This indicates the time when the connectivity between the Primary and Secondary servers was lost. |
| 11 | Connection Resumed Time | | This indicates the time when the connectivity between the Primary and Secondary servers was regained. |
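To illustrate how the last three rows of the table can be read together, here is a minimal Python sketch that turns the Replication Pending Count, Connection Lost Time and Connection Resumed Time values into a one-line summary. It assumes you copy these values from the HA page by hand; the helper name `ha_sync_summary` and the `YYYY-MM-DD HH:MM:SS` timestamp format are hypothetical and are not a PAM360 API or its actual display format.

```python
from datetime import datetime
from typing import Optional

# Hypothetical timestamp format; the format shown in the PAM360 UI may differ.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def ha_sync_summary(pending_count: int,
                    lost_time: Optional[str],
                    resumed_time: Optional[str]) -> str:
    """Summarize HA health from values read off the HA page (hypothetical helper)."""
    if pending_count < 0:
        raise ValueError("pending_count cannot be negative")

    # A pending count of zero means the Primary and Secondary are in sync.
    if pending_count == 0:
        sync = "in sync"
    else:
        sync = f"{pending_count} replication(s) pending"

    if lost_time is None:
        outage = "no connection loss recorded"
    else:
        lost = datetime.strptime(lost_time, TIME_FORMAT)
        resumed = (datetime.strptime(resumed_time, TIME_FORMAT)
                   if resumed_time is not None else None)
        if resumed is None or resumed < lost:
            # No resume after the latest loss: the link is still down.
            outage = f"connection lost since {lost_time}"
        else:
            # Subtracting datetimes gives a timedelta, printed as H:MM:SS.
            outage = f"connection was down for {resumed - lost}"

    return f"{sync}; {outage}"


print(ha_sync_summary(0, "2024-01-15 10:00:00", "2024-01-15 10:12:30"))
# prints: in sync; connection was down for 0:12:30
```

A resumed time earlier than the lost time is treated as an ongoing outage, because that resume would belong to an earlier disconnection.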
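The basic concept underlying HA is constant replication of data between the Primary and Secondary servers, where the Primary acts as the "Master" and the Secondary as the "Slave". The "Status" indicates the condition of the connection between the Primary and Secondary servers/databases. There are two types of HA status:
1. Active: The Primary and Secondary servers are connected and data is being replicated between them.
2. Inactive: The connection between the Primary and Secondary servers has been lost and replication is suspended.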
Once the connection is re-established, synchronization between the databases starts again. During the network disconnection, users connected to either the Primary or the Secondary do not face any disruption in service.
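The replication behavior described above can be illustrated with a toy model. The Python sketch below is not PAM360's implementation: the class `ReplicationLink` is hypothetical and simplifies HA to one-way replication from the Primary ("Master") to the Secondary ("Slave"), whereas PAM360 also synchronizes operations performed on the Secondary. It only shows why pending changes accumulate while the status is Inactive and drain back to zero, in their original order, once the connection is re-established.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class ReplicationLink:
    """Toy one-way replication model (hypothetical, not PAM360's design)."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.secondary: Dict[str, str] = {}
        self.pending: Deque[Tuple[str, str]] = deque()
        self.connected = True

    def write(self, key: str, value: str) -> None:
        # Every change is applied on the Primary and queued for the Secondary.
        self.primary[key] = value
        self.pending.append((key, value))
        if self.connected:
            self._drain()

    def disconnect(self) -> None:
        # Models the link going down (HA status "Inactive").
        self.connected = False

    def reconnect(self) -> None:
        # On reconnection, queued changes are replayed in their original order.
        self.connected = True
        self._drain()

    def pending_count(self) -> int:
        return len(self.pending)

    def _drain(self) -> None:
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value


link = ReplicationLink()
link.write("password-A", "v1")   # replicated immediately
link.disconnect()                # status turns Inactive
link.write("password-A", "v2")
link.write("password-B", "v1")
print(link.pending_count())      # prints 2
link.reconnect()                 # status turns Active again
print(link.pending_count(), link.secondary == link.primary)  # prints 0 True
```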
Since these two states (Active/Inactive) are critical to the HA setup, it is important to receive real-time alerts whenever the status changes from Active to Inactive or vice versa. To configure alerts, navigate to Audit >> Resource Audit >> Configure User Audit >> General Operations and select the alert mode (email, SNMP trap or Syslog message) for the events High Availability Alive and High Availability Failed.
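The alerts described above are tied to changes of status rather than to the status itself. If you also track the HA status in your own tooling, the same edge-triggered rule applies: raise one alert per transition, not one per check. The Python sketch below shows that rule; the helper `transition_events` and the idea of a list of observed status values are assumptions for illustration, not a PAM360 API.

```python
from typing import Iterable, List, Optional

# Event names as listed under General Operations in this document.
ALIVE_EVENT = "High Availability Alive"
FAILED_EVENT = "High Availability Failed"


def transition_events(statuses: Iterable[str]) -> List[str]:
    """Map a sequence of observed HA statuses to alert events (hypothetical helper).

    An event is produced only when the status changes, so a long outage
    yields one "Failed" alert rather than one per check.
    """
    events: List[str] = []
    previous: Optional[str] = None
    for status in statuses:
        if status not in ("Active", "Inactive"):
            raise ValueError(f"unexpected HA status: {status!r}")
        if previous is not None and status != previous:
            events.append(ALIVE_EVENT if status == "Active" else FAILED_EVENT)
        previous = status
    return events


print(transition_events(["Active", "Active", "Inactive", "Inactive", "Active"]))
# prints ['High Availability Failed', 'High Availability Alive']
```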
Notes:
1. After HA configuration: If you change the port of the Primary PAM360 server, the high availability setup will stop working, and you need to reconfigure it accordingly.
2. If you have configured TFA: If HA is configured, you need to restart the PAM360 Secondary server once whenever you enable TFA or change the TFA type (PhoneFactor, RSA SecurID or One-time password).
To edit the Secondary server details, click the icon under Actions beside the Secondary server. In the window that pops up, modify the details as required and click Update.
When the HA status becomes "Inactive", the PAM360 HA setup breaks down. In case of an HA failure, contact pam360-support@manageengine.com with the following log file:
<PAM360 Installation Folder>/pgsql/data/pg_log/pgsql_Mon.log
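Before contacting support, you may want to copy this log into a separate folder so it is easy to attach to your email. A minimal Python sketch, assuming you pass your actual PAM360 installation folder as `install_dir`; the helper name `collect_ha_log` and the destination folder are hypothetical.

```python
import shutil
from pathlib import Path


def collect_ha_log(install_dir: str, dest_dir: str) -> Path:
    """Copy the PostgreSQL log named above into dest_dir (hypothetical helper)."""
    # Relative path taken from this document: pgsql/data/pg_log/pgsql_Mon.log
    log_file = Path(install_dir, "pgsql", "data", "pg_log", "pgsql_Mon.log")
    if not log_file.is_file():
        raise FileNotFoundError(f"log file not found: {log_file}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's modification time.
    return Path(shutil.copy2(log_file, dest / log_file.name))
```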