Elasticsearch Monitoring
Elasticsearch - An Overview
Elasticsearch is a highly scalable, distributed, open source RESTful search and analytics engine. It is multitenant-capable with an HTTP web interface and schema-free JSON documents. Based on Apache Lucene, Elasticsearch is one of the most popular enterprise search engines today and is capable of solving a growing number of use cases like log analytics, real-time application monitoring, and click stream analytics.
Monitoring Elasticsearch - What we do
This section covers what to watch when monitoring Elasticsearch, which performance metrics to gather, and how Applications Manager's Elasticsearch monitoring helps you ensure that your search server is up and operating as expected:
- Resource Utilization Details - Applications Manager automatically discovers Elasticsearch servers, monitors memory and CPU and notifies you of changes in resource consumption of thread pool queues.
- Real-Time Data - You get up-to-the-second insight into cluster runtime metrics, individual cluster nodes, real-time threads and configurations.
- Cluster and Node Monitoring - Stay on top of your cluster and node health in real time with fine-grained performance statistics, from disk I/O to Java memory usage metrics.
- Search and Indexing Performance - Gain complete control of your indexes and mappings. Monitor query latency, file system cache usage, and request rates, and take action if they surpass a threshold.
- Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.
Creating a new Elasticsearch monitor
Using the REST API to add a new Elasticsearch monitor: Click here
To create an Elasticsearch Monitor, follow the steps given below:
- Click the New Monitor link and choose ElasticsearchCluster.
- Specify the Display Name of the Elasticsearch monitor.
- Enter the HostName or IP Address of the host where the Elasticsearch cluster runs.
- Enter the Port of the Elasticsearch cluster. The default is 9200.
- Enter the polling interval time in minutes.
- Click the Test Credentials button if you want to test access to the Elasticsearch server.
- From the combo box, choose the Monitor Group with which you want to associate the Elasticsearch monitor (optional). You can choose multiple groups to associate your monitor.
- Click Add Monitor(s). This discovers Elasticsearch from the network and starts monitoring.
Note:
- Security/Firewall Requirements - The Elasticsearch cluster host and port should be accessible from the machine where Applications Manager is installed.
- User Privilege - The required user credentials should be provided.
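The accessibility requirement in the note above can be checked from the Applications Manager host before adding the monitor. The following is a minimal sketch and not part of Applications Manager: it assumes plain HTTP without authentication, and the helper names `base_url` and `is_reachable` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def base_url(host: str, port: int = 9200) -> str:
    """Build the root URL of the Elasticsearch HTTP interface (9200 is the default port)."""
    return f"http://{host}:{port}/"


def is_reachable(host: str, port: int = 9200, timeout: float = 5.0) -> bool:
    """Return True if the root endpoint answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(base_url(host, port), timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error, or non-JSON reply
        return False
```

If `is_reachable("es-host", 9200)` returns False from the Applications Manager machine, a firewall rule or a wrong port is a likely cause. A secured cluster would additionally need credentials, which this sketch does not handle.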
Monitored Parameters
Go to the Monitors Category View by clicking the Monitors tab. Click on the Elasticsearch and ElasticsearchCluster monitors under the Web Server/Services Table. The Elasticsearch or ElasticsearchCluster bulk configuration view is displayed, organized into three tabs:
- Availability tab displays the Availability history for the past 24 hours or 30 days.
- Performance tab displays the Health Status and events for the past 24 hours or 30 days.
- List view enables you to perform bulk admin configurations.
Click on the monitor name to see all the server details listed under the following tabs:
Elasticsearch Cluster
Overview
Parameter | Description |
Node Details |
Node Name | The name of the node. |
Node Type | The type of the node (Client, Data, Master-Eligible, or Data-Master). |
Avg Query Time | The average time taken by the query phase, the first phase of a search operation, in which the query is processed on all shards. |
Avg Fetch Time | The average time taken by the fetch phase, the second phase of a search operation, in which the query results are retrieved only from the shards that hold the requested data. |
CLUSTER OVERVIEW |
Cluster Status | The health status of the cluster (green, yellow, or red), which depends on whether its primary and replica shards are allocated. |
Total Nodes | The total number of nodes in the cluster. |
Total Indices | The total number of indices in the cluster. |
Total Shards | The total number of shards in the cluster. |
Total Docs | The total number of documents present in the cluster. |
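Elasticsearch reports cluster status through its cluster health API (`GET /_cluster/health`) as `green`, `yellow`, or `red`. The sketch below shows one way to turn such a response into a readable summary. The function name `summarize_health` is hypothetical, and the dictionary passed to it is assumed to be the parsed JSON body of a health response.

```python
def summarize_health(health: dict) -> str:
    """Describe a parsed /_cluster/health response in one line."""
    meanings = {
        "green": "all primary and replica shards are assigned",
        "yellow": "all primary shards are assigned, but some replicas are not",
        "red": "at least one primary shard is unassigned",
    }
    status = health["status"]
    nodes = health["number_of_nodes"]
    return f"{status}: {meanings.get(status, 'unknown status')} ({nodes} nodes)"
```

A single-node cluster with replicas configured typically stays `yellow`, because a replica is never placed on the same node as its primary.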
Cluster Details
Parameter | Description |
NODES SPLITUP |
Client Node | The total number of Client Nodes in the cluster. |
Data Node | The total number of Data Nodes in the cluster. |
Master Node | The total number of Master-Eligible Nodes in the cluster. |
Data-Master Node | The total number of Data Nodes that also act as Master-Eligible Nodes in the cluster. |
SHARDS COUNT |
Active Shards | The number of Active Shards present in the cluster. |
Active Primary Shards | The number of Primary Shards that are Active in the cluster. |
Relocating Shards | The number of Relocating Shards present in the cluster. |
Initializing Shards | The number of Initializing Shards present in the cluster. |
Unassigned Shards | The number of Unassigned Shards present in the cluster. |
Delayed Unassigned Shards | The number of Delayed Unassigned Shards present in the cluster. |
Total Shards | The number of Shards present in the cluster. |
Top 20 Pending Tasks by Priority |
Insert Order | The position at which the pending task was inserted into the queue. |
Priority | The priority assigned to the particular task. |
Source | The source of the pending task. |
Wait Time by Priority | The total waiting time of the pending task in the queue, listed by priority (in milliseconds). |
Top 20 Pending Tasks by Wait Time |
Insert Order | The position at which the pending task was inserted into the queue. |
Priority by Wait Time | The priority assigned to the particular task, listed by wait time. |
Source | The source of the pending task. |
Wait Time | The total waiting time of the pending task in the queue (in milliseconds). |
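The two "Top 20" lists above correspond to two orderings of the same pending-task data. As a rough sketch, assuming the tasks come from Elasticsearch's `GET /_cluster/pending_tasks` response (whose entries carry `insert_order`, `priority`, `source`, and `time_in_queue_millis`), the orderings could be produced as follows. The function names are hypothetical, and how Applications Manager builds its lists internally is not documented here.

```python
# Elasticsearch task priorities, from most to least urgent
PRIORITY_ORDER = ["IMMEDIATE", "URGENT", "HIGH", "NORMAL", "LOW", "LANGUID"]


def top_by_priority(tasks: list, limit: int = 20) -> list:
    """Most urgent tasks first; ties are broken by insertion order."""
    rank = {name: i for i, name in enumerate(PRIORITY_ORDER)}
    ordered = sorted(
        tasks,
        key=lambda t: (rank.get(t["priority"], len(rank)), t["insert_order"]),
    )
    return ordered[:limit]


def top_by_wait_time(tasks: list, limit: int = 20) -> list:
    """Longest-waiting tasks first."""
    ordered = sorted(tasks, key=lambda t: t["time_in_queue_millis"], reverse=True)
    return ordered[:limit]
```

A persistently long list in either view usually means the master node cannot apply cluster state updates as fast as they are submitted.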
Indices
PARAMETER | DESCRIPTION |
Indices Overview |
Index Name | The name of the index representing a collection of documents. |
Documents | The number of documents available in the particular index. |
Indexing Latency | Amount of time taken to index a document in the particular index (in milliseconds). |
Indexing Rate | The number of documents that are indexed per second. |
Query Latency | Amount of time taken to process the query in the particular index (in milliseconds). |
Query Rate | The number of queries that are processed by the index per second. |
Fetch Latency | Amount of time taken to retrieve the query results from the particular index (in milliseconds). |
Fetch Rate | The number of fetch operations performed by the index per second. |
Current Merges | The number of merges currently in progress in the particular index. |
Merge Time | Amount of time taken to merge segments in the particular index (in milliseconds). |
Flush Time | Amount of time taken to flush the index to disk (in milliseconds). |
Refresh Time | Amount of time taken to refresh the index (in milliseconds). |
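Elasticsearch's index stats expose cumulative counters such as `indexing.index_total` and `indexing.index_time_in_millis` rather than ready-made latencies and rates. The sketch below shows one common way to derive a latency and a per-second rate from such counters. It is an illustration only: whether Applications Manager computes its figures exactly this way is an assumption, and the function names are hypothetical.

```python
def latency_ms(total_ops: int, time_in_millis: int) -> float:
    """Average time per operation, e.g. index_time_in_millis / index_total."""
    return time_in_millis / total_ops if total_ops else 0.0


def rate_per_second(prev_total: int, curr_total: int, interval_seconds: float) -> float:
    """Operations per second between two polls of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # A counter that went backwards (for example after a node restart) counts as 0
    return max(curr_total - prev_total, 0) / interval_seconds
```

For example, if `index_total` grows from 1000 to 1600 across a 300-second polling interval, the indexing rate over that interval is 2 documents per second.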
Configuration
PARAMETER | DESCRIPTION |
CONFIGURATION DETAILS |
Cluster Name | The name of the cluster. |
Total Nodes | The total number of nodes in the cluster. |
Master Node Name | The name of the Master Node in the cluster. |
Master Node Port | The port on which the Master Node of Elasticsearch runs. |
Master Node IP | The IP address on which the Master Node runs. |
Publish Port | The publish port of the cluster. |
Elasticsearch
Overview
PARAMETER | DESCRIPTION |
AVERAGE SYSTEM LOAD |
Avg. System Load | The average load being processed by the system (over the last 1 minute, 5 minutes, and 15 minutes). |
CPU UTILIZATION |
CPU Utilization | Amount of CPU currently being utilized by the node (in %). |
SEARCH TIME |
Average Query Time | The average time taken by the query phase, the first phase of a search operation, in which the query is processed on all shards. |
Average Fetch Time | The average time taken by the fetch phase, the second phase of a search operation, in which the query results are retrieved only from the shards that hold the requested data. |
SEGMENT TIME |
Average Merge Time | The average time taken for segment merging in a node. (A shard in Elasticsearch is a Lucene index, broken down into segments. Segments are periodically merged into larger segments to keep the index size at bay and expunge deletes.) |
Average Refresh Time | The average time spent in refreshing an index. (Refresh time increases with the number of file operations for the Lucene index.) |
INDEXING TIME |
Average Index Time | The average time taken to index a document. (Documents are indexed, i.e., stored and made searchable.) |
Average Delete Time | The average time taken to delete a document. |
Indexed Count | The number of documents indexed. |
Deleted Count | The number of deleted documents. |
Indexing Rate | The number of documents that are indexed per second. |
GET TIME |
Average Get Time | The average time taken to retrieve a document with a get request. |
Existing Count | The number of get requests for documents that existed. |
Missing Count | The number of get requests for documents that did not exist. |
FLUSH TIME |
Average Flush Time | The average time taken to flush one or more indices to disk. (Flushing an index frees memory by writing data to the index storage and clearing the internal transaction log.) |
WARMER TIME |
Average Warmer Time | The average time taken to perform a warmup search on an index. (Index warming allows registered search requests to be run to warm up the index before it is available for search.) |
PERCOLATE TIME |
Average Percolate Time | The average time spent running percolator queries. (One of Elasticsearch's core features is the ability to search in reverse with the percolator. The percolator automatically indexes the query terms with the percolator queries, which allows it to percolate documents more quickly.) |
Memory Details
PARAMETER | DESCRIPTION |
HEAP MEMORY |
Used Heap Percent | The percentage of JVM heap currently in use. |
Free Heap Percent | The percentage of JVM heap currently free. |
NON-HEAP MEMORY |
Used Non-Heap Percent | The percentage of non-heap memory currently in use. |
Free Non-Heap Percent | The percentage of non-heap memory currently free. |
GARBAGE COLLECTION |
GC Time - Young | The total time spent on young-generation garbage collections. |
GC Time - Old | The total time spent on old-generation garbage collections. |
GC Count - Young | The total number of young-generation garbage collections. |
GC Count - Old | The total number of old-generation garbage collections. |
BUFFER POOLS |
Direct Buffer Space Used | The total space used in the Direct Buffer pool. |
Mapped Buffer Space Used | The total space used in the Mapped Buffer pool. |
Direct Buffer Connection Count | The total connections to Direct Buffer pool. |
Mapped Buffer Connection Count | The total connections to Mapped Buffer pool. |
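The used and free heap percentages are complementary. As a small sketch, assuming the input is the `jvm.mem` section of an Elasticsearch node stats response (which includes `heap_used_in_bytes` and `heap_max_in_bytes`), they could be derived as follows. The function name `heap_percentages` is hypothetical.

```python
def heap_percentages(jvm_mem: dict) -> tuple:
    """Return (used %, free %) of the JVM heap, rounded to one decimal place."""
    used = jvm_mem["heap_used_in_bytes"]
    maximum = jvm_mem["heap_max_in_bytes"]
    if maximum <= 0:
        raise ValueError("heap_max_in_bytes must be positive")
    used_pct = round(100.0 * used / maximum, 1)
    return used_pct, round(100.0 - used_pct, 1)
```

Heap usage that stays high even after old-generation collections is generally a stronger warning sign than a momentary peak, so it is worth reading this value together with the GC counts and times.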
I/O Details
PARAMETER | DESCRIPTION |
DISK I/O COUNT |
Disk Read Count | The number of read requests (from the disk) by Elasticsearch. |
Disk Write Count | The number of write requests (to the disk) by Elasticsearch. |
DISK I/O SIZE |
Disk Read Size | The total size of read requests (from the disk) by Elasticsearch. |
Disk Write Size | The total size of write requests (to the disk) by Elasticsearch. |
CACHE DETAILS |
Cache Name | The name of the cache. |
Total Size (MB) | The size of the cache. |
Evictions | The number of evictions from the cache. |
BREAKER DETAILS |
Breaker Name | The name of the circuit breaker. (Circuit breakers deal with situations in which request processing needs more memory than is available, which would otherwise lead to an OutOfMemoryError (OOM). It is often better to fail a query than to hit an OOM, because the JVM can become unresponsive when an OOM occurs.) |
Limit Size (MB) | The limit size of the particular breaker. |
Used Size (MB) | The used size of the particular breaker. |
Tripped | The total number of times the circuit breaker tripped. |
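Comparing a breaker's used size with its limit shows how close it is to tripping. The sketch below assumes its input is the `breakers` section of an Elasticsearch node stats response, where each breaker reports `limit_size_in_bytes` and `estimated_size_in_bytes`. The function name and the 80% threshold are illustrative choices, not Applications Manager defaults.

```python
def breakers_near_limit(breakers: dict, threshold: float = 0.8) -> list:
    """Return the names of breakers whose estimated usage is at or above threshold * limit."""
    flagged = []
    for name, stats in breakers.items():
        limit = stats["limit_size_in_bytes"]
        used = stats["estimated_size_in_bytes"]
        # Skip breakers without a positive limit to avoid dividing by zero
        if limit > 0 and used / limit >= threshold:
            flagged.append(name)
    return sorted(flagged)
```

A breaker that appears in this list and also shows a growing Tripped count suggests that requests are already being failed to protect the heap.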
Thread Pools
PARAMETER | DESCRIPTION |
THREAD DETAILS |
Thread Name | The name of the thread pool. |
Configured Threads | The number of threads configured for this pool. |
Queue | The number of tasks waiting in this pool's queue. |
Active | The number of active threads in this pool. |
Rejected | The number of tasks rejected by this pool. |
Largest | The highest number of threads that have been active in this pool. |
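The Rejected value is cumulative, so the change between two polls is usually more informative than the absolute number. The following sketch compares two snapshots of the `thread_pool` section of an Elasticsearch node stats response, in which each pool reports a `rejected` counter. The function name is hypothetical, and this is not a description of how Applications Manager raises its alerts.

```python
def pools_with_new_rejections(prev: dict, curr: dict) -> dict:
    """Map pool name to the number of rejections added since the previous snapshot."""
    increases = {}
    for name, stats in curr.items():
        before = prev.get(name, {}).get("rejected", 0)
        delta = stats["rejected"] - before
        if delta > 0:
            increases[name] = delta
    return increases
```

New rejections in a pool such as `write` or `search` mean its queue was full and requests were turned away during the interval.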
Network
PARAMETER | DESCRIPTION |
TRANSPORT |
Transmitted Bytes | The number of bytes sent by the network. (Transport metrics about cluster communication) |
Received Bytes | The number of bytes received by the network. (Transport metrics about cluster communication) |
Transmitted Packets | The number of data packets sent by the network. (Transport metrics about cluster communication) |
Received Packets | The number of data packets received by the network. (Transport metrics about cluster communication) |
TCP CONNECTOR |
Active Connections | The number of active TCP connections. |
Passive Connections | The number of passive TCP connections. |
HTTP CONNECTOR |
Current Connections | The number of HTTP connections currently active. |
Total Connections | The total number of HTTP connections. |
Configuration
PARAMETER | DESCRIPTION |
CONFIGURATION DETAILS |
Cluster Name | The name of the cluster. |
Node Name | The name of the node in the cluster. |
Node Type | The type of the node (Client/Data/Master-Eligible/Data-Master). |
Host | The IP address of the Host. |
ElasticSearch Version | The version of the installed Elasticsearch. |
Port | The port on which Elasticsearch runs. |
ElasticSearch Home | The home directory of Elasticsearch. |
Total Processors | The total number of processors in the current node. |
Java Version | The version of Java running in the node. |
Java Vendor | The Java vendor. |