Apache Spark Monitoring
Apache Spark - An Overview
Apache Spark is an open source big data processing framework built for speed, with built-in modules for streaming, SQL, machine learning and graph processing. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
Monitoring Apache Spark - What we do
Let's take a look at what you need to get real-time operational visibility into Spark applications, which performance metrics to gather, and how Applications Manager's Apache Spark Monitoring helps you ensure that your Spark cluster is up and operating as expected:
- Resource Utilization Details - Applications Manager automatically discovers your Spark components and shows key metrics of Apache Spark clusters (master and worker nodes). It monitors memory and CPU and notifies you of changes in memory pool resource consumption.
- Real-Time Data - Track garbage collection and memory usage on each component across the cluster, specifically the executors and the driver. Get useful information about applications and cores.
- Fix Performance Problems Faster - Get instant notifications when there are performance issues. Become aware of performance bottlenecks and take quick remedial actions before your end users experience issues.
Apache Spark - Adding a new monitor
Prerequisites for monitoring Apache Spark metrics: Click here
Using the REST API to add a new Apache Spark monitor: Click here
To create an Apache Spark monitor, follow the steps given below:
- Click the New Monitor link and choose Apache Spark.
- Specify the Display Name of the Apache Spark monitor.
- Enter the HostName or IP Address of the host where Apache Spark Master runs.
- Enter the Port of the Apache Spark Master. The default is 8080.
- Enter the polling interval time in minutes.
- Click the Test Credentials button if you want to test access to the Spark server.
- Optionally, choose the Monitor Group with which you want to associate the Spark monitor from the combo box. You can choose multiple groups.
- Click Add Monitor(s). This discovers Spark from the network and starts monitoring.
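The fields collected in the steps above can be sanity-checked before they are entered. The following is a minimal sketch, not part of Applications Manager: the `SparkMonitorSettings` class, the `is_reachable` helper, and the 5-minute polling value are all hypothetical names and values chosen for illustration. It only verifies that the inputs are well formed and that a TCP connection to the Spark Master port can be opened.

```python
import socket
from dataclasses import dataclass


@dataclass
class SparkMonitorSettings:
    """Hypothetical container mirroring the form fields in the steps above."""
    display_name: str
    host: str
    port: int = 8080               # default Spark Master port named in the steps
    polling_interval_min: int = 5  # assumed value; the form only asks for minutes

    def validate(self) -> None:
        """Raise ValueError if any field is obviously malformed."""
        if not self.display_name.strip():
            raise ValueError("Display Name must not be empty")
        if not self.host.strip():
            raise ValueError("HostName or IP Address must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError("Port must be between 1 and 65535")
        if self.polling_interval_min < 1:
            raise ValueError("Polling interval must be at least 1 minute")


def is_reachable(settings: SparkMonitorSettings, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((settings.host, settings.port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Calling `validate()` and then `is_reachable(...)` before clicking Test Credentials can help separate a typo in the host or port from a genuine connectivity problem.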
Note:
Uncomment the following lines in the file SPARK_HOME/conf/metrics.properties.template, save it as metrics.properties, and restart the Apache Spark instances to collect the metrics:
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
You can monitor the Worker Nodes under the given Apache Spark Master by checking the option Discover All Nodes.
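The metrics.properties change described in the note can also be scripted. The sketch below assumes the template stores the four JvmSource entries as lines commented out with a leading `#`; the function name `enable_jvm_sources` is hypothetical. It uncomments only those four keys and leaves every other line untouched.

```python
# The four keys listed in the note above.
JVM_SOURCE_KEYS = (
    "master.source.jvm.class",
    "worker.source.jvm.class",
    "driver.source.jvm.class",
    "executor.source.jvm.class",
)


def enable_jvm_sources(template_text: str) -> str:
    """Uncomment the four JvmSource lines; keep every other line unchanged."""
    out = []
    for line in template_text.splitlines():
        # Drop leading '#', spaces, and tabs to see what the line would enable.
        stripped = line.lstrip("# \t")
        key = stripped.split("=", 1)[0].strip()
        if line.lstrip().startswith("#") and key in JVM_SOURCE_KEYS:
            out.append(stripped)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

To apply it, read the text of SPARK_HOME/conf/metrics.properties.template, pass it through `enable_jvm_sources`, write the result to metrics.properties in the same directory, and then restart the Spark instances as the note says.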
Monitored Parameters
Go to the Monitors Category View by clicking the Monitors tab. Click the Apache Spark Master or Apache Spark Worker monitors under the Web Server/Services table. This displays the Apache Spark bulk configuration view, which is divided into three tabs:
- Availability tab displays the Availability history for the past 24 hours or 30 days.
- Performance tab displays the Health Status and events for the past 24 hours or 30 days.
- List view enables you to perform bulk admin configurations.
Click on the monitor name to see all the server details listed under the following tabs:
Apache Spark Master
Overview
Parameter | Description
NODE DETAILS
Node Name | The name of the Apache Spark worker node.
Used Memory (%) | The percentage of total memory that the Spark worker node uses on the machine.
Free Memory (%) | The percentage of total memory that is free on the machine.
MEMORY UTILIZATION
Used Memory | The percentage of total memory that the Spark Master node uses on the machine.
Free Memory | The percentage of the Spark Master node's total memory that is free.
Total Memory | The total amount of memory that Spark applications are allowed to use on the machine.
Used Memory | The total amount of memory used by Spark applications.
MASTER OVERVIEW
Alive Workers | The number of alive workers in the Spark cluster. A worker in the ALIVE state can accept applications.
Active Applications | The number of active applications running on the Spark infrastructure.
Waiting Applications | The number of waiting applications.
Completed Applications | The number of completed applications.
Used Cores | The number of used CPU cores on the Apache Spark Master.
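As an illustration of how the Master overview values relate to each other, the sketch below derives them from a cluster-state dictionary. It assumes the standalone Master serves its state as JSON (commonly at `/json` on the Master web UI port) with field names such as `aliveworkers`, `coresused`, `memory`, `memoryused`, `activeapps`, and `completedapps`; these names are assumptions that may differ between Spark versions, and this is not necessarily how Applications Manager collects the data. The sample payload is made up.

```python
import json


def summarize_master(state: dict) -> dict:
    """Derive the Master overview values from a cluster-state dictionary."""
    total_mem = state.get("memory", 0)
    used_mem = state.get("memoryused", 0)
    used_pct = round(100.0 * used_mem / total_mem, 2) if total_mem else 0.0
    active = state.get("activeapps", [])
    return {
        "alive_workers": state.get("aliveworkers", 0),
        "active_applications": len(active),
        # Waiting applications are active apps that have not started running.
        "waiting_applications": sum(1 for a in active if a.get("state") == "WAITING"),
        "completed_applications": len(state.get("completedapps", [])),
        "used_cores": state.get("coresused", 0),
        "used_memory_pct": used_pct,
        "free_memory_pct": round(100.0 - used_pct, 2) if total_mem else 0.0,
    }


# Made-up sample payload using the assumed field names.
SAMPLE_STATE = json.loads("""
{
  "aliveworkers": 2, "cores": 16, "coresused": 6,
  "memory": 16384, "memoryused": 4096,
  "activeapps": [{"name": "etl", "state": "RUNNING"},
                 {"name": "report", "state": "WAITING"}],
  "completedapps": [{"name": "old-job", "state": "FINISHED"}]
}
""")

if __name__ == "__main__":
    print(summarize_master(SAMPLE_STATE))
```

Used and free memory percentages always sum to 100 here because both are computed from the same total.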
Workers
In standalone mode, the workers are processes running on individual nodes that manage resource allocation requests for that node and also monitor the executors.
Parameter | Description
WORKER DETAILS
Web UI Address | The URL of the worker's Web UI. The Web UI is the web interface used to monitor and inspect Spark job executions in a web browser.
ID | The ID that uniquely identifies the particular worker node.
Cores Used | The number of CPU cores used by the particular Worker node.
Cores Free | The number of CPU cores that are free (unused).
Used Memory (GB) | The total memory used by the Worker node.
Free Memory (GB) | The total free memory in the Worker node.
Used Memory (%) | The percentage of memory used by the Worker node.
Time Since Last Heart Beat (seconds) | The time elapsed since the last heartbeat, that is, since the Worker node last contacted the Master node.
State | The current state of the Worker node, for example ALIVE or DEAD.
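The worker values above are simple derivations from per-worker figures. The following sketch assumes a worker record with `cores`, `coresused`, `memory` and `memoryused` in MB, `lastheartbeat` as epoch milliseconds, and `state`; these field names and units are assumptions for illustration, and `worker_row` is a hypothetical helper.

```python
import time
from typing import Optional


def worker_row(worker: dict, now_ms: Optional[int] = None) -> dict:
    """Compute the Worker Details columns from one worker record."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    total_mb = worker["memory"]
    used_mb = worker["memoryused"]
    return {
        "id": worker["id"],
        "cores_used": worker["coresused"],
        "cores_free": worker["cores"] - worker["coresused"],
        "used_memory_gb": round(used_mb / 1024, 2),
        "free_memory_gb": round((total_mb - used_mb) / 1024, 2),
        "used_memory_pct": round(100.0 * used_mb / total_mb, 2) if total_mb else 0.0,
        # Whole seconds since the worker last contacted the master.
        "secs_since_heartbeat": max(0, (now_ms - worker["lastheartbeat"]) // 1000),
        "state": worker["state"],
    }
```

A steadily growing heartbeat age on a worker that still reports ALIVE is usually worth checking before the state flips to DEAD.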
Applications
Parameter | Description
APPLICATION DETAILS
Application Name | The name of your application.
ID | The application ID by which the application is referenced.
User | The user associated with the particular application.
Memory Allocated Per Slave (GB) | The amount of memory allocated for each worker.
Running Duration (min) | The total time the application has been running since it was started.
State | The current state of the particular application, for example WAITING or RUNNING.
Memory
Parameter | Description
HEAP MEMORY
Used Heap | The amount of heap memory used, in percentage.
Free Heap | The amount of heap memory that is free, in percentage.
Max Heap Size | The maximum heap memory that Spark can use, in MB.
Init Heap Size | The minimum heap memory allocated, in MB.
Committed Heap Size | The total amount of committed heap memory, in MB.
Used Heap Size | The total used heap memory, in MB.
NON HEAP MEMORY
Used Non Heap | The amount of non-heap memory used, in percentage.
Free Non Heap | The amount of non-heap memory that is free, in percentage.
Max Non Heap Size | The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size | The minimum non-heap memory allocated, in MB.
Committed Non Heap Size | The total amount of committed non-heap memory, in MB.
Used Non Heap Size | The total used non-heap memory, in MB.
JVM
Used JVM | The amount of JVM memory used, in percentage.
Free JVM | The amount of JVM memory that is free, in percentage.
Max JVM Size | The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size | The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size | The total amount of committed JVM memory, in MB.
Used JVM Size | The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count | The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time | The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count | The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time | The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Memory Pool | The name of the memory pool.
Maximum (MB) | The maximum pool memory allocated, in MB.
Committed (MB) | The total amount of committed pool memory, in MB.
Initial (MB) | The amount of pool memory initially requested from the operating system, in MB.
Used (MB) | The total amount of used pool memory, in MB.
Utilization (%) | The percentage of used pool memory.
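The percentage and MB columns in the memory tables can be derived from the four raw JVM figures (init, used, committed, max). The sketch below is an illustration under assumptions: the inputs are byte counts as a JVM typically reports them, a max of -1 means the limit is undefined (which can happen for non-heap memory), and in that case the committed size is used as the denominator. The helper name `memory_row` is hypothetical.

```python
from typing import Optional

BYTES_PER_MB = 1024 * 1024


def to_mb(value_bytes: int) -> Optional[float]:
    """Convert bytes to MB; negative values mean 'undefined' and map to None."""
    return None if value_bytes < 0 else round(value_bytes / BYTES_PER_MB, 2)


def memory_row(init_b: int, used_b: int, committed_b: int, max_b: int) -> dict:
    """Build one memory table row (heap, non-heap, or a single pool)."""
    # Fall back to committed memory when the JVM reports no maximum (-1).
    limit = max_b if max_b > 0 else committed_b
    used_pct = round(100.0 * used_b / limit, 2) if limit > 0 else 0.0
    return {
        "used_pct": used_pct,
        "free_pct": round(100.0 - used_pct, 2) if limit > 0 else 0.0,
        "max_mb": to_mb(max_b),
        "init_mb": to_mb(init_b),
        "committed_mb": to_mb(committed_b),
        "used_mb": to_mb(used_b),
    }
```

For example, 512 MB used out of a 2048 MB maximum gives a utilization of 25%, while the same usage with an undefined maximum and 1024 MB committed gives 50%.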
RDD Details
Parameter | Description
COMPILATION DETAILS
Compilation Time (Mean) | The mean time taken to compile source code text.
Compilation Count | The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) | The mean size of the generated classes.
Generated Method Size (Mean) | The mean size of each method in the generated classes.
Source Code Size (Mean) | The size of the compiled source code text.
Generated Class Count | The number of classes generated.
Generated Method Count | The number of methods in the generated classes.
Source Code Count | The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits | The total number of file-level cache hits.
Files Discovered | The total number of files discovered.
Hive Client Calls | The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count | The total number of jobs running in parallel.
Partitions Fetched | The total number of partitions fetched.
Configuration
Parameter | Description
CONFIGURATION DETAILS
Master URL | The URL of the master node.
Total Workers | The total number of workers provisioned in the cluster.
Available Cores | The number of CPU cores that Spark applications are allowed to use on the machine.
Total Memory | The total memory allocated for the Spark Master node.
Apache Spark Worker
Overview
Parameter | Description
MEMORY UTILIZATION
Used Memory Percentage | The percentage of total memory that the Spark worker node uses on the machine.
Free Memory Percentage | The percentage of total memory that is free on the machine.
Used Memory | The total memory used by the Worker node, from the available memory.
Free Memory | The total free memory available for the Worker node.
WORKER OVERVIEW
Active Executors | The number of active executors.
Finished Executors | The number of finished executors. (A Spark executor exits either on failure or when the associated application has also exited.)
Free Cores | The total number of cores free and available for the particular Worker.
Used Cores | The total number of cores used by the particular Worker.
Executors
Parameter | Description
EXECUTOR DETAILS
Executor ID | The unique ID for the particular Executor.
Executor Memory (GB) | The total memory available for the particular Executor.
Application ID | The unique ID for the application associated with the Executor.
Application Name | The name of the particular Application.
User | The user associated with the particular Application.
Memory Allocated Per Slave (GB) | The amount of memory allocated for each worker.
Memory
Parameter | Description
HEAP MEMORY
Used Heap | The percentage of total used heap memory.
Free Heap | The percentage of free heap memory.
Max Heap Size | The maximum heap memory that Spark can use, in MB.
Init Heap Size | The minimum heap memory allocated, in MB.
Committed Heap Size | The total amount of committed heap memory, in MB.
Used Heap Size | The total used heap memory, in MB.
NON-HEAP MEMORY
Used Non Heap Memory | The percentage of total used non-heap memory.
Free Non Heap Memory | The percentage of free non-heap memory.
Max Non Heap Size | The maximum non-heap memory that Spark can use, in MB.
Initial Non Heap Size | The minimum non-heap memory allocated, in MB.
Committed Non Heap Size | The total amount of committed non-heap memory, in MB.
Used Non Heap Size | The total used non-heap memory, in MB.
JVM
Used JVM Memory | The amount of used JVM memory, in percentage.
Free JVM Memory | The amount of memory available for the JVM, in percentage.
Max JVM Size | The maximum amount of heap that can be used for memory management, in GB.
Initial JVM Size | The amount of heap that the Java virtual machine initially requests from the operating system, in MB.
Committed JVM Size | The total amount of committed JVM memory, in MB.
Used JVM Size | The total amount of used JVM memory, in MB.
MARKSWEEP AND SCAVENGE
MarkSweep Count | The number of garbage collections that have occurred in the MarkSweep GC.
MarkSweep Time | The time taken by the garbage collections that have occurred in the MarkSweep GC.
Scavenge Count | The number of garbage collections that have occurred in the Scavenge GC.
Scavenge Time | The time taken by the garbage collections that have occurred in the Scavenge GC.
MEMORY POOL DETAILS
Maximum (MB) | The maximum pool memory allocated, in MB.
Initial (MB) | The amount of pool memory initially requested from the operating system, in MB.
Committed (MB) | The total amount of committed pool memory, in MB.
Used (MB) | The total amount of used pool memory, in MB.
Utilization (%) | The percentage of used pool memory.
RDD Details
Parameter | Description
COMPILATION DETAILS
Compilation Time (Mean) | The mean time taken to compile source code text.
Compilation Count | The total number of compilations that occurred while loading the files.
COMPILATION DETAILS
Generated Class Size (Mean) | The mean size of the generated classes.
Generated Method Size (Mean) | The mean size of each method in the generated classes.
Source Code Size (Mean) | The size of the compiled source code text.
Generated Class Count | The number of classes generated.
Generated Method Count | The number of methods in the generated classes.
Source Code Count | The total number of source code files that were loaded into the node for compilation.
COUNTERS
File Cache Hits | The total number of file-level cache hits.
Files Discovered | The total number of files discovered.
Hive Client Calls | The total number of client calls sent to Hive for query processing.
Parallel Listing Job Count | The total number of jobs running in parallel.
Partitions Fetched | The total number of partitions fetched.
Configuration
Parameter | Description
CONFIGURATION DETAILS
Worker ID | The worker ID by which the worker is referenced.
Master URL | The URL of the master node.
Master Web UI URL | The URL of the master node's Web UI.
Total Memory | The total memory allocated and available for the particular Worker node.