ManageEngine Applications Manager provides out-of-the-box Linux server performance monitoring capabilities. It helps the operations team ensure that servers are up (ping) and running at peak performance by monitoring CPU usage, memory utilization, processes, disk utilization, and disk I/O statistics.
In this help document, you will learn how to get started with Linux performance monitoring along with the list of parameters that are monitored with Applications Manager's Linux monitoring tool.
Supported Distributions: We support monitoring most popular Linux distributions, including but not limited to Debian, Ubuntu, CentOS / CentOS Stream, RedHat, Oracle Linux, Mandriva, Fedora, SLES, OpenSUSE, Amazon Linux, IBM Cloud Linux, Microsoft Azure Linux, Google Cloud Platform (GCP) Linux, and more.
Prerequisites for monitoring Linux server performance metrics: Click here
Using the REST API to add a new Linux server monitor: Click here
Follow the steps given below to create a new Linux server monitor:
Note: To identify the Public/Private key, go to the command prompt and type cd .ssh/, then from the list of files, open <id_dsa.pub>/<id_rsa.pub> [Public] or <id_dsa>/<id_rsa> [Private] to get the keys.
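As an illustration of the note above, the following minimal sketch looks for the conventional key files in a given directory. The function name `find_key_files` is hypothetical, and the sketch assumes the keys use the default `id_rsa`/`id_dsa` file names; it is not part of Applications Manager.

```python
from pathlib import Path

# Conventional OpenSSH key file names mentioned in the note above.
KEY_NAMES = ("id_rsa", "id_dsa")


def find_key_files(ssh_dir):
    """Return the public and private key files found in ssh_dir."""
    ssh_dir = Path(ssh_dir)
    found = {"public": [], "private": []}
    for name in KEY_NAMES:
        private = ssh_dir / name
        public = ssh_dir / (name + ".pub")
        if private.is_file():
            found["private"].append(str(private))
        if public.is_file():
            found["public"].append(str(public))
    return found
```

Calling `find_key_files(Path.home() / ".ssh")` would list the key files of the current user, if any exist under those names.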
Applications Manager's Linux performance monitoring monitors the key performance indicators of Linux servers to detect any performance problems. These indicators include CPU, memory, disk, etc.
Click on the individual monitors listed to view detailed Linux server performance metrics. The performance metrics have been categorized into 7 different tabs:
This tab provides a high-level overview of the health and performance of the Linux server along with information pertaining to the processes running on the system.
Parameter | Description |
---|---|
Monitor Information | |
Name | The name of the Linux server monitor. |
System Health | Denotes the health status of the Linux server (clear, critical, warning). |
Type | Denotes the type you are monitoring. |
Host Name | The host name of the Linux system. |
Host OS | The main OS installed on the system. |
Last Polled at | Specifies the time at which the last poll was performed. |
Next Poll at | Specifies the time at which the next poll is scheduled. |
Today's Availability | Shows the overall availability status of the server for the day. You can also view 7/30 reports and the current availability status of the server. |
Parameter | Description |
---|---|
Thread count | The number of threads running in the Linux machine. |
Process Count | The number of processes. Too many open processes can degrade server performance. It is helpful to be warned when the process count is increasing so that users can remedy it before an issue arises. |
Zombie Process Count | The number of zombie processes. Zombie processes can hold ports open with no control. It is helpful to see when a zombie process is spawned so that it can be dealt with before any issues arise. |
Major Page Faults/s | Number of major faults the system has made per second, those which have required loading a memory page from disk. |
Context Switches/s | Total number of context switches per second. |
You can use the Custom Fields option in the 'Monitor Information' section to configure additional fields for the monitor.
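To make the process metrics above more concrete, here is a rough sketch of how process and zombie counts could be derived on a Linux host from the `/proc/<pid>/stat` files, and how a cumulative counter (such as major page faults or context switches) becomes a per-second rate. This is an assumption about one possible approach, not a description of how Applications Manager collects the data; all function names are hypothetical.

```python
import os
from collections import Counter


def process_state(stat_text):
    """Return the state letter (R, S, Z, ...) from /proc/<pid>/stat text.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split after the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def summarize(stat_texts):
    """Count all processes and those in the zombie ('Z') state."""
    states = Counter(process_state(text) for text in stat_texts)
    return {
        "process_count": sum(states.values()),
        "zombie_count": states.get("Z", 0),
    }


def read_stat_texts(proc_root="/proc"):
    """Read the stat file of every numeric (per-process) directory."""
    texts = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                texts.append(handle.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


def per_second(previous, current, interval_s):
    """Turn two readings of a cumulative counter into a per-second rate."""
    return (current - previous) / float(interval_s)
```

On a Linux machine, `summarize(read_stat_texts())` would return both counts for the moment of the call.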
This tab provides the CPU usage statistics of the Linux server. The tab includes two graphs - one that displays the CPU utilization by CPU Cores and another that shows the Breakup of CPU utilization - by CPU cores. You can view additional reports by clicking the graphs present in the Breakup of CPU Utilization - by CPU cores section. These reports include Break up of CPU Utilization (%) Vs Time, User Time (%) Vs Time, System Time (%) Vs Time, I/O Wait Time (%) Vs Time, Idle Time (%) Vs Time, Steal Time (%) vs Time, CPU Utilization (%) Vs Time and Interrupts/sec Vs Time for all the CPU cores.
The CPU tab also shows the following performance metrics:
Parameter | Description | Monitoring Mode | |
---|---|---|---|
Telnet/SSH | SNMP | ||
Core | The name of the CPU core | ||
User Time(%) | The percentage of time that the processor spends on User mode operations. This generally means application code. | ||
System Time(%) | The percentage of time that the processor spends on kernel (system) mode operations. | ||
I/O Wait Time(%) | The percentage of time that the processor spends waiting for I/O to complete. | ||
Idle Time(%) | The time when the CPU is idle (not being used by any program) | ||
Steal Time(%) | Amount of time a virtual CPU waits for a real CPU while the hypervisor is servicing another virtual processor. | ||
CPU Utilization(%) | Specifies the total CPU used by the system. | ||
Interrupts/sec | The number of interrupts from applications or hardware that the CPU handles each second. If the value for Interrupts/sec is high over a sustained period of time, there could be hardware issues. | ||
You can also view graphs for these attributes by selecting the necessary CPU core and then choosing the appropriate attribute.
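The percentages in the table above can be illustrated with a small sketch that compares two samples of a `cpu` line from `/proc/stat`. This is only one plausible way to derive such values, assuming the standard Linux field order; grouping `nice` with user time and `irq`/`softirq` with system time is a choice made for this example, and the function names are hypothetical.

```python
# First eight counters of a /proc/stat "cpu" line, in kernel order.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Split a line such as 'cpu0 100 0 50 ...' into (name, counters)."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return parts[0], dict(zip(FIELDS, values))


def cpu_breakdown(prev_line, curr_line):
    """Return the percentage breakup of CPU time between two samples."""
    name, prev = parse_cpu_line(prev_line)
    _, curr = parse_cpu_line(curr_line)
    delta = {key: curr[key] - prev[key] for key in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")

    def pct(ticks):
        return round(100.0 * ticks / total, 2)

    return {
        "core": name,
        "user_pct": pct(delta["user"] + delta["nice"]),
        "system_pct": pct(delta["system"] + delta["irq"] + delta["softirq"]),
        "iowait_pct": pct(delta["iowait"]),
        "idle_pct": pct(delta["idle"]),
        "steal_pct": pct(delta["steal"]),
        # Everything that is neither idle nor waiting for I/O.
        "utilization_pct": pct(total - delta["idle"] - delta["iowait"]),
    }
```

Because the counters are cumulative since boot, a single sample is not enough; the breakup is always computed over the interval between two polls.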
This tab displays disk usage and disk I/O statistics of the Linux server.
Parameters | Description |
---|---|
Disk Utilization | |
Disk | The name of the disk drive. |
Used (%) | Denotes how much disk space out of the total disk space has actually been used (in percentage) |
Used (MB) | The disk space used, in megabytes. |
Free (%) | The percentage of total usable space on the disk that is free. |
Free (MB) | The unallocated space on the disk, in megabytes. |
Disk I/O Statistics | |
Transfers/sec | The number of read/write operations on the disk that occur each second. |
Writes/sec | The number of write operations on the disk that occur each second. |
Reads/sec | The number of read operations on the disk that occur each second. |
% Busy Time | The percentage of time the disk was busy. |
Average Queue Length | The average number of both read and write requests that were queued for the disk during the sample interval. |
Avg. Disk Latency | The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. |
Read Wait Time | The average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. |
Write Wait Time | The average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them. |
Inode Usage | |
Inode | The name of the Inode. |
Total | The total number of Inodes available in that particular disk. |
Used | The number of Inodes used in that particular disk. |
Free | The remaining number of Inodes that are available in that particular disk. |
Used (%) | The number of Inodes used in that particular disk, in percentage. |
Free (%) | The remaining number of Inodes that are available in that particular disk, in percentage. |
You can also delete disks that have been physically removed using the Delete Orphaned Disk option.
Note: Data collection for Disk I/O statistics and Inode statistics can be enabled from 'Disk I/O Statistics Monitoring' and 'Inode Monitoring' options under Settings → Performance Polling → Servers tab.
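For readers who want to see how the disk I/O and Inode figures relate to each other, the sketch below derives them from two samples of `/proc/diskstats` and from Inode totals. It follows the definitions commonly used by tools such as iostat and is an assumption about one way to compute these values, not the product's implementation; the function names are hypothetical.

```python
def parse_diskstats(text):
    """Map device name to the cumulative counters used below."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "reads": int(f[3]),         # reads completed
            "read_ms": int(f[6]),       # time spent reading (ms)
            "writes": int(f[7]),        # writes completed
            "write_ms": int(f[10]),     # time spent writing (ms)
            "busy_ms": int(f[12]),      # time the device was doing I/O (ms)
            "weighted_ms": int(f[13]),  # weighted time doing I/O (ms)
        }
    return stats


def disk_io_metrics(prev, curr, interval_s):
    """Derive per-interval metrics from two counter snapshots of one disk."""
    d = {key: curr[key] - prev[key] for key in prev}
    ops = d["reads"] + d["writes"]
    interval_ms = interval_s * 1000.0
    return {
        "reads_per_sec": d["reads"] / float(interval_s),
        "writes_per_sec": d["writes"] / float(interval_s),
        "transfers_per_sec": ops / float(interval_s),
        "busy_pct": 100.0 * d["busy_ms"] / interval_ms,
        "avg_queue_length": d["weighted_ms"] / interval_ms,
        "avg_latency_ms": (d["read_ms"] + d["write_ms"]) / float(ops) if ops else 0.0,
        "read_wait_ms": d["read_ms"] / float(d["reads"]) if d["reads"] else 0.0,
        "write_wait_ms": d["write_ms"] / float(d["writes"]) if d["writes"] else 0.0,
    }


def inode_usage(total, free):
    """Compute used Inodes and the used/free percentages."""
    used = total - free
    used_pct = 100.0 * used / total if total else 0.0
    return {"used": used, "used_pct": used_pct, "free_pct": 100.0 - used_pct if total else 0.0}
```

On Linux, the Inode totals for a mount point could be taken from `os.statvfs(path)` (`f_files` for the total and `f_ffree` for the free count).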
Parameters | Description |
---|---|
Memory Usage Statistics | |
Active Memory | Memory that has been used more recently and is usually not reclaimed unless absolutely necessary (in MB). |
Active anonymous memory | Anonymous memory that has been used more recently and usually not swapped out (in MB). |
Active Files Memory | Pagecache memory that has been used more recently and usually not reclaimed until needed (in MB). |
Anonymous huge pages memory | Non-file backed huge pages mapped into userspace page tables (in MB). |
Anonymous pages memory | Non-file backed pages mapped into userspace page tables (in MB). |
Cached memory | Memory in the pagecache (Diskcache and Shared Memory) (in MB). |
Commit limit memory | Based on the overcommit ratio (vm.overcommit_ratio), this is the total amount of memory currently available to be allocated on the system (in MB). This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory). |
Committed Memory | The amount of memory presently allocated on the system (in MB). The committed memory is a sum of all of the memory which has been allocated by processes, even if it has not been "used" by them as of yet. |
Unevictable memory | Unevictable pages that cannot be swapped out for a variety of reasons (in MB). |
Unreclaimable memory | The part of the slab that cannot be reclaimed under memory pressure (in MB). |
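
The memory parameters above resemble fields reported by the Linux kernel in `/proc/meminfo`. The following sketch parses that format and converts the values from kB to MB. The mapping between the parameter names in the table and the `/proc/meminfo` field names is an assumption made for illustration, and the function name is hypothetical.

```python
# Assumed mapping from /proc/meminfo field names to the table's parameters.
FIELD_LABELS = {
    "Active": "Active Memory",
    "Active(anon)": "Active anonymous memory",
    "Active(file)": "Active Files Memory",
    "AnonHugePages": "Anonymous huge pages memory",
    "AnonPages": "Anonymous pages memory",
    "Cached": "Cached memory",
    "CommitLimit": "Commit limit memory",
    "Committed_AS": "Committed Memory",
    "Unevictable": "Unevictable memory",
    "SUnreclaim": "Unreclaimable memory",
}


def parse_meminfo(text):
    """Return the selected /proc/meminfo fields in MB, keyed by label."""
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        label = FIELD_LABELS.get(key.strip())
        parts = rest.split()
        if label is None or not parts:
            continue
        result[label] = int(parts[0]) / 1024.0  # the kernel reports kB
    return result
```

On a Linux host, the input text could be obtained with `open("/proc/meminfo").read()`.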
Parameter | Description | Monitoring Mode | |
---|---|---|---|
Telnet/SSH | SNMP | ||
NETWORK INTERFACE | |||
Name | The name of the network interface present in the Linux system. | ||
Speed (Mbps) | The estimate of the current bandwidth in Mbps. | ||
MTU | Maximum Transmission Unit (MTU) is a measurement of the largest data packet that a network-connected device can accept. | ||
Input Traffic (Kbps) | The rate at which packets are received on the interface, in kilo bytes per second. | ||
Output Traffic (Kbps) | The rate at which packets are sent on the interface, in kilo bytes per second. | ||
Errors | Number of packets that could not be sent or received. | ||
Connection Stats | |||
Socket State | The state in which the sockets are present. | ||
No. of Connections | Number of connections that are available for the particular socket state. | ||
NTP Stats | |||
NTP Status | Indicates whether the client is synchronized with the server or not. | ||
Server Name | Indicates the hostname of the server to which the client is synchronized. | ||
Stratum Level | Indicates the level of the strata at which the client is located. | ||
NTP Time correct to within | Indicates the time offset value (in milliseconds) displayed for 'time correct to within' after executing the ntpstat/chrony command. Time correct to within = (Root dispersion + Root Delay) / 2 | ||
Poll Interval | Indicates the polling time interval between each sync (in seconds). |
Note: You can also delete interfaces that have been physically removed using the Delete Orphaned Interface option.
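The NTP attributes above can be illustrated with a short sketch that parses ntpstat-style output and applies the formula given in the table. The sample output format is an assumption based on typical ntpstat output and may differ between versions; the function names are hypothetical.

```python
import re


def parse_ntpstat(output):
    """Extract NTP attributes from ntpstat-style text (format assumed)."""
    stats = {"synchronized": output.lstrip().startswith("synchronised")}
    match = re.search(r"NTP server \(([^)]+)\) at stratum (\d+)", output)
    if match:
        stats["server"] = match.group(1)
        stats["stratum"] = int(match.group(2))
    match = re.search(r"time correct to within (\d+) ms", output)
    if match:
        stats["correct_within_ms"] = int(match.group(1))
    match = re.search(r"polling server every (\d+) s", output)
    if match:
        stats["poll_interval_s"] = int(match.group(1))
    return stats


def time_correct_to_within(root_dispersion_ms, root_delay_ms):
    """Apply the formula stated in the table above."""
    return (root_dispersion_ms + root_delay_ms) / 2.0
```

For example, a root dispersion of 30 ms and a root delay of 20 ms give a value of 25 ms under this formula.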
Cron jobs are used for scheduling tasks like backups, emails, status checks, etc. in Linux and can have a major impact on the performance of your web servers and applications. Applications Manager continuously monitors them and helps you gain insight into the execution of important jobs in the back-end systems.
Prerequisites : Click here
The table below lists the details of the Cron jobs running on the Linux server.
Parameters | Description |
---|---|
Cron Job Details: | |
Cron Name | Name of the Cron job. |
Cron Expression | The Cron expression for the corresponding Cron job. |
Job Start Time | Time and date at which the Cron job started. |
Job End Time | Time and date at which the Cron job ended. |
Next Run Time | Time and date at which the next Cron job is scheduled to run. |
Elapsed Time | The amount of time elapsed since the Cron job started (in Minutes). |
Exit Code | Denotes the exit code of the Cron job. |
Missed Runs | The number of times the Cron job failed to start at the scheduled time. |
Status | Status of the Cron job. |
Note: Once the Cron job is added, it will be in discovery state until we receive the first response from the remote server.
To update a Cron job,
To delete Cron jobs,
Note: Addition, update, and deletion of Cron jobs will be possible only in managed servers by the administrator user.
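To show how a Cron Expression relates to the Next Run Time attribute, here is a minimal sketch that evaluates a standard five-field expression (minute, hour, day of month, month, day of week). It handles only numbers, `*`, lists, ranges, and steps; month or weekday names and `7` for Sunday are not supported. This is an illustrative simplification, not the logic used by Applications Manager, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# (low, high) bounds for minute, hour, day of month, month, day of week.
BOUNDS = ((0, 59), (0, 23), (1, 31), (1, 12), (0, 6))


def expand(field, low, high):
    """Expand one field ('*', '*/5', '1-5', '1,15', '0-30/10') into a set."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = int(base)
            end = high if "/" in part else start
        values.update(range(start, end + 1, step))
    return values


def parse(expression):
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    sets = [expand(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS)]
    both_days_restricted = fields[2] != "*" and fields[4] != "*"
    return sets, both_days_restricted


def matches(parsed, when):
    (minutes, hours, days, months, weekdays), both = parsed
    if when.minute not in minutes or when.hour not in hours or when.month not in months:
        return False
    day_ok = when.day in days
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    weekday_ok = (when.weekday() + 1) % 7 in weekdays
    # When both day fields are restricted, cron runs if either one matches.
    return (day_ok or weekday_ok) if both else (day_ok and weekday_ok)


def next_run(expression, after, horizon_days=366):
    """Return the first matching minute strictly after 'after', or None."""
    parsed = parse(expression)
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(horizon_days * 24 * 60):
        if matches(parsed, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    return None
```

Stepping minute by minute is slow for rare schedules but keeps the example short and easy to verify.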
This tab contains information about system configuration attributes.
Parameters | Description |
---|---|
System Information | |
Host Name | The name of the system. |
Domain | The name of the domain to which the system belongs. |
OS Information | |
OS Name | The name of the operating system instance. |
OS Version | Version number of the operating system. |
OS Release | The Linux distribution |
Memory Information | |
Total Physical Memory (MB) | Total amount of physical memory as available to the operating system. |
Total Swap Memory (MB) | Total amount of swap memory available. |
Processor Information | |
Id | Unique identifier of a processor on the system |
Model | The processor model type |
Implementation | The processor family type. |
Manufacturer | Name of the processor manufacturer |
Speed(MHz) | Current speed of the processor |
Cache (KB) | Size of the processor cache. A cache is an external memory area that has a faster access time than the main memory. |
Network Interface Settings | |
Name | The name of the network adapter. |
IP Address | The IP address configured for this network interface |
MTU | The Maximum Transmission Unit (MTU) configured for this network interface. |
Type | The type of network adapter. |
Mac Address | The Media access control address for this network adapter. A MAC address is a unique 48-bit number assigned to the network adapter by the manufacturer. It uniquely identifies this network adapter and is used for mapping TCP/IP network communications. |
Status | The current status of the network adapter. |
Broadcast Address | The IP address to which messages are broadcast. |
Printer Settings | |
Name | Name of the printer. |
Device | The name of the server that controls the printer. |
Default | Indicates whether the printer is the default one. Values are either True or False. |
Status | Current status of the printer. |
Note: The data present in the configuration tab is not updated during every poll. So if you make any changes to the server configuration, you need to restart Applications Manager for those changes to be reflected in the 'Configuration' tab.
The following are metrics pertaining to the hardware of Dell and HP servers:
Category | Attribute | Description | DELL (SNMP Mode) | DELL (WMI Mode) | HP (SNMP Mode) | HP (WMI Mode) |
---|---|---|---|---|---|---|
Temperature | Sensor | The name of the temperature sensor. | | | | |
Temperature | Temperature Reading (deg C) | The current temperature reading. | | | | |
Temperature | Status | The temperature status - Critical, Warning, Clear | | | | |
Fan | Sensor | Name of the fan sensor. | | | | |
Fan | Fan Speed (RPM) | The fan speed values displayed in RPM. | | | | |
Fan | Status | The fan status - Critical, Warning, Clear | | | | |
Power | Sensor | Name of the power supply. | | | | |
Power | Reading (Watts) | The power supply reading values displayed in Watts. | | | | |
Power | Status | The power status - Critical, Warning, Clear | | | | |
Voltages | Sensor | Name of the voltage supply. | | | | |
Voltages | Reading (Volts) | The voltage reading values displayed in Volts. | | | | |
Voltages | Status | The voltage status - Critical, Warning, Clear | | | | |
Battery | Sensor | Name of the battery sensor. | | | | |
Battery | Status | The battery status - Critical, Warning, Clear | | | | |
Memory | Sensor | Name of the memory sensor. | | | | |
Memory | Memory Device Type | The type of memory device. | | | | |
Memory | Size (MB) | The amount of memory currently installed in MB. | | | | |
Memory | Status | The memory status - Critical, Warning, Clear | | | | |
Disk | Sensor | Identifies the disk's label. | | | | |
Disk | Device Name | The device name configured for the disk. | | | | |
Disk | Size (MB) | The allocated size in MB. | | | | |
Disk | Status | The disk status - Critical, Warning, Clear | | | | |
Array | Sensor | The name of the array disk. | | | | |
Array | Bus protocol | The bus type of the array disk. | | | | |
Array | Size (MB) | The amount in MB of the used space on the array disk. | | | | |
Array | Status | The array status - Critical, Warning, Clear | | | | |
Chassis | Sensor | The user-assigned name of the chassis. | | | | |
Chassis | Model | The system model type for this chassis. | | | | |
Chassis | Status | The chassis status - Critical, Warning, Clear | | | | |
Processor | Sensor | The location name of the processor device status probe. | | | | |
Processor | Processor Brand | The brand of the processor device. | | | | |
Processor | Processor Current Speed | The current speed of the processor device in MHz. | | | | |
Processor | Processor Core Count | The number of processor cores detected for the processor device. | | | | |
Processor | Status | The processor status - Critical, Warning, Clear | | | | |
Note: Currently, hardware performance monitoring is supported in the SNMP and WMI monitoring modes.
Hardware Device-Level Configuration
The Hardware Configuration option, available under Host Details on the right-hand side of the details page, lets you choose the hardware components you want to monitor. You can also do this using the Performance Polling option under the Settings tab, which configures the hardware stats globally.
Advanced Settings
Clicking the Advanced Settings option, available under Host Details on the right-hand side of the details page, takes you to the Performance Data Collection page for Servers.
Here you can use the Hardware Health monitoring option to enable or disable hardware monitoring in servers. You can also select the hardware components (such as power, fan, and disk) to be monitored by checking the options given. This configures the hardware monitoring status globally. You can also configure the health status by defining values in the respective text boxes:
Note: If the status of the device does not match with any of the values defined in the severity text box, the device status is displayed as unknown. Status values defined within the severity text boxes are comma-separated and case-insensitive.
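The matching rule described in the note above can be sketched as follows. The severity names, the example status values, and the order in which severities are checked are assumptions made for illustration; the actual values depend on what is configured in the text boxes.

```python
def parse_values(text):
    """Split a comma-separated text-box value into a lowercase set."""
    return {value.strip().lower() for value in text.split(",") if value.strip()}


def device_status(reported, severity_values):
    """Map a reported device status to a severity, or 'unknown'.

    severity_values maps a severity name to the comma-separated string
    entered in the corresponding text box. Matching is case-insensitive.
    """
    needle = reported.strip().lower()
    for severity in ("critical", "warning", "clear"):  # assumed precedence
        if needle in parse_values(severity_values.get(severity, "")):
            return severity
    return "unknown"


# Hypothetical configuration, for illustration only.
EXAMPLE_VALUES = {
    "critical": "Failed, Non-Recoverable",
    "warning": "Degraded",
    "clear": "OK, Normal",
}
```

With this example configuration, `device_status("ok", EXAMPLE_VALUES)` resolves to `clear`, while a status that appears in none of the lists resolves to `unknown`.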