Kubernetes Performance Monitoring



Kubernetes - An Overview

Kubernetes (or k8s) is an open-source container orchestration system for automating deployment, scaling and management of application containers across clusters of hosts. Kubernetes clusters can span hosts across public, private, or hybrid clouds. K8s orchestration allows users to build application services across multiple containers, schedule those containers across a cluster, scale those containers, and manage the health of those containers over time.

Monitor all your Kubernetes performance workloads using a single tool

Applications Manager's Kubernetes monitoring tool lets administrators adapt their monitoring strategy to the new infrastructure layers that containers and container orchestration introduce in a distributed Kubernetes environment.

  • Auto-discover cluster components and map relationships between objects in the cluster: Kubernetes nodes, namespaces, deployments, replica sets, pods, and containers.
  • Track the capacity and resource utilization of your cluster and drill into specific parts of the cluster.
  • Identify whether your cluster has enough nodes and whether the resources allocated to existing nodes are sufficient for the deployed applications.
  • Ensure all nodes on the cluster are healthy - monitor the CPU and memory for Kubernetes nodes (workers and masters).
  • Ensure all desired pods in a deployment are running and not in a restart loop.
  • Set up alerts for container restarts to identify issues with either a container or its host that affect application performance.
  • Monitor the performance outliers of the Kubernetes-hosted applications running inside your cluster and track down any individual errors.
  • View the status of Kubernetes Master and Node components – API Server, the Etcd key/value store, Scheduler and Controller.
  • Monitor crucial Kubernetes performance metrics to predict future resource requirements and ensure that your cluster has sufficient capacity to handle potential workload spikes.

Note: In the Kubernetes cluster architecture, it is sufficient to add only the primary master node to Applications Manager. Applications Manager automatically discovers all the other master and worker nodes within the cluster and monitors them. There is no need to add each node individually as a Kubernetes performance monitor, as doing so will lead to performance issues.

Discover more with a Kubernetes performance monitor

Prerequisites for setting up Kubernetes performance monitor: kubectl should be installed on the machine where Kubernetes is installed.
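
As a quick way to confirm this prerequisite, the following minimal Python sketch checks whether a `kubectl` executable can be found on the PATH of the machine it is run on. It is an illustration only and not part of Applications Manager; the helper name `tool_available` is hypothetical.

```python
import shutil


def tool_available(name: str = "kubectl") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    status = "found" if tool_available() else "not found"
    print(f"kubectl {status} on PATH")
```

Run it on the host where Kubernetes is installed, using the same user account that the monitor will log in with, since PATH can differ between accounts.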

Using the REST API to add a new Kubernetes performance monitor: Click here

Follow the steps given below to create a new Kubernetes monitor:

  1. Click the New Monitor link.
  2. Select Kubernetes under the Virtualization category.
  3. Specify the Display Name of the Kubernetes Server.
  4. Enter the cluster hostname/IP address of the server where Kubernetes is running.
  5. Enter the credential details like user name and password for authentication, or select the required credentials from the Credential Manager list after enabling the Select from Credential list option.
  6. Check the box to enable Public Key Authentication (supported for SSH2 only) and provide the SSH key for SSH authentication.
  7. Specify the command prompt value, which is the last character in your command prompt. Default value is $ and possible values are >, #, etc.
  8. Enter the SSH port. Default SSH port used is 22.
  9. Enable the Monitor Specific Namespace(s) option if you wish to monitor only specific namespace(s) in the Kubernetes environment. After enabling, specify the following details:
    • Filter Condition: Select the filtering condition to include or exclude monitoring of specific namespace(s) in the Kubernetes environment.
    • Namespace Name(s): Specify the name of the namespace(s) to be included/excluded while monitoring. You can enter multiple namespaces as comma-separated values.
  10. Check the Enable Event Log Monitoring box to enable the option to monitor Event Log details.
  11. Specify the Polling Interval in minutes.
  12. Choose the Monitor Group with which you want to associate the Kubernetes monitor from the combo box (optional). You can choose multiple groups to associate your monitor with.
  13. Click Add Monitor(s). This discovers the Kubernetes cluster from the network and starts monitoring it.
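
The include/exclude namespace filter described in step 9 can be pictured with the minimal Python sketch below. It is only an illustration of the idea: the function names and the `"include"`/`"exclude"` labels are hypothetical, and exact, case-sensitive name matching is an assumption, since the matching rules used by Applications Manager are not described here.

```python
def parse_namespaces(value):
    """Split a comma-separated namespace list, ignoring blanks and extra spaces."""
    return {name.strip() for name in value.split(",") if name.strip()}


def filter_namespaces(all_namespaces, names_value, condition="include"):
    """Keep or drop the listed namespaces, preserving the input order."""
    selected = parse_namespaces(names_value)
    if condition == "include":
        return [ns for ns in all_namespaces if ns in selected]
    if condition == "exclude":
        return [ns for ns in all_namespaces if ns not in selected]
    raise ValueError(f"unknown filter condition: {condition!r}")


if __name__ == "__main__":
    # Made-up namespace names, for illustration only.
    cluster = ["default", "kube-system", "payments", "staging"]
    print(filter_namespaces(cluster, "kube-system, staging", "exclude"))
    # prints ['default', 'payments']
```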

Monitored Parameters

Go to the Monitors Category View by clicking the Monitors tab. Click on Kubernetes under the Virtualization table. The Kubernetes bulk configuration view is displayed, organized into three tabs:

  • Availability tab gives the Availability history for the past 24 hours or 30 days.
  • Performance tab gives the Health Status and events for the past 24 hours or 30 days.
  • List view enables you to perform bulk admin configurations.

On clicking a monitor from the list, you'll be taken to the Kubernetes performance monitor dashboard. It has thirteen tabs:

Overview

Parameter | Description | Supported in Prometheus
CLUSTER USAGE DETAILS
Average Cluster CPU Usage | Average CPU used by the cluster. |
Average Cluster Memory Usage | Average memory used by the cluster. |
CLUSTER DETAILS
Control Plane | Control Plane URL. |
Git Version | Git version used in Kubernetes. |
Build Date | Date on which the Kubernetes version was built. |
Compiler | Name of the compiler. |
Platform | Version of the Platform. |
CLUSTER SUMMARY
Namespace Count | Total number of Namespaces. |
Service Count | Total number of Services. |
Deployment Count | Total number of Deployments. |
Daemonset Count | Total number of Daemonsets. |
Statefulset Count | Total number of Statefulsets. |
Total Jobs Count | Total number of Jobs. |
Replication Controller Count | Total number of Replication Controllers. |
Replica Set Count | Total number of Replica Sets. |
Ingress Count | Total number of Ingresses. |
CLUSTER NODE SUMMARY
Total Node Count | Total number of Nodes. |
Master Node Count | Total number of Master Nodes. |
Worker Node Count | Total number of Worker Nodes. |
CLUSTER PODS SUMMARY
Total Pods Count | Total number of Pods. |
Running Pods Count | Total number of Running Pods. |
Succeeded Pods Count | Total number of Succeeded Pods. |
Pending Pods Count | Total number of Pending Pods. |
Failed Pods Count | Total number of Failed Pods. |
Unknown Pods Count | Total number of Unknown Pods. |
CLUSTER CONTAINER SUMMARY
Total Containers Count | Total number of Containers. |
Running Containers Count | Total number of Running Containers. |
Completed Containers Count | Total number of Completed Containers. |
Terminated Containers Count | Total number of Terminated Containers. |
Waiting Containers Count | Total number of Waiting Containers. |
CLUSTER PODS USAGE DETAILS
Used Pod Count | Total number of Used Pods. |
Maximum Pod Count | Maximum number of Allocatable Pods. |
TOP 5 NODES BY USED POD COUNT
Used Pod Count | Total number of Used Pods. |
COMPONENT DETAILS
Component Name | Name of the component. |
Status | Status of the component. |
Component Message | Root cause message of the component. |

Namespace

Parameter | Description | Supported in Prometheus
NAMESPACE DETAILS
Namespace Name | Name of the Namespace. |
Resource Version | Resource version of the Namespace. |
Namespace Availability | Availability of the Namespace. |
Namespace Created Time | Time at which the Namespace was created. |
NAMESPACE PODS USAGE DETAILS
Namespace Name | Name of the Namespace. |
Total Pods Count | Total number of Pods present in the Namespace. |
Running Pods Count | Total number of Running Pods present in the Namespace. |
Succeeded Pods Count | Total number of Succeeded Pods present in the Namespace. |
Pending Pods Count | Total number of Pending Pods present in the Namespace. |
Failed Pods Count | Total number of Failed Pods present in the Namespace. |
Unknown Pods Count | Total number of Unknown Pods present in the Namespace. |
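
To make the per-namespace pod counts concrete, here is a minimal Python sketch that tallies pods by phase for each namespace. It assumes the input is a list of pod objects shaped like the `items` array returned by `kubectl get pods -A -o json`; the function name is hypothetical, and this is not necessarily how Applications Manager collects these values.

```python
from collections import Counter, defaultdict

# The five pod phases defined by Kubernetes.
PHASES = ("Running", "Succeeded", "Pending", "Failed", "Unknown")


def pods_by_namespace(pods):
    """Count pods per phase for each namespace.

    `pods` is a list of dicts with `metadata.namespace` and `status.phase`.
    """
    counts = defaultdict(Counter)
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        phase = pod.get("status", {}).get("phase", "Unknown")
        counts[namespace][phase] += 1

    summary = {}
    for namespace, by_phase in counts.items():
        row = {phase: by_phase.get(phase, 0) for phase in PHASES}
        row["Total"] = sum(by_phase.values())
        summary[namespace] = row
    return summary


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"metadata": {"namespace": "default"}, "status": {"phase": "Running"}},
        {"metadata": {"namespace": "default"}, "status": {"phase": "Pending"}},
        {"metadata": {"namespace": "kube-system"}, "status": {"phase": "Running"}},
    ]
    for namespace, row in pods_by_namespace(sample).items():
        print(namespace, row)
```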

Node

Parameter | Description | Supported in Prometheus
TOP 5 NODES BY MEMORY DETAILS
Memory Limit | Maximum limit of Node memory in GiB. |
Memory Requests | Total memory requested. |
TOP 5 NODES BY CPU DETAILS
CPU Limit | Maximum limit of CPU. |
CPU Request | Total CPU requested. |
NODE MEMORY AND CPU DETAILS
Name | Name of the node. |
Allocatable Memory(Gi) | The memory resources of a node that are available for scheduling, in Gi. |
Memory Limit(%) | The maximum limit of memory resource which can be used. |
Memory Request(%) | Total memory requests in %. |
Allocatable CPU Processor Count | Total number of CPU processors that are available. |
CPU Limit(%) | The maximum limit of CPU resource which can be used. |
CPU Request(%) | Total CPU requests in %. |
NODE POD DETAILS
Name | Name of the node. |
Pod Usage Details | Total number of pods available, split into used and free pods. |
Kube-system Pod Count | Total number of pods in the kube-system namespace. |
Non-Kube-system Pod Count | Total number of pods outside the kube-system namespace. |
Image Count | Total number of images in the node. |
Used Pod Count | Total number of pods present on the node. |
Allocatable Pod Count | Total number of pods that can be allocated on the node. |
NODE DETAILS
Name | Name of the node. |
OSImage | Name of the OS image. |
OS | Name of the OS in which the container is deployed. |
Architecture | Architecture details. |
Type | Type of node. |
Kubelet Version | The version of Kubelet used. |
Allocatable Ephemeral Storage(Gi) | Size of ephemeral (temporary) storage available, in Gi. |
Created Time | Time at which the node was created. |
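
The request and limit percentages above can be understood as a share of the node's allocatable resources. The Python sketch below illustrates that reading under stated assumptions: it handles only millicore CPU values (such as `500m`) and the binary memory suffixes Ki, Mi, Gi and Ti, and it assumes the percentage is the sum of requests divided by the allocatable amount. The exact formula Applications Manager uses is not documented here, and all function names are hypothetical.

```python
# Binary suffixes used by Kubernetes memory quantities (subset only).
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '4Gi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def percent_of_allocatable(requested: float, allocatable: float) -> float:
    """Express a request or limit as a percentage of the allocatable amount."""
    if allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return round(100 * requested / allocatable, 2)


if __name__ == "__main__":
    # Made-up values: three containers on a node with 4 CPUs and 8Gi of memory.
    cpu_requested = sum(cpu_cores(q) for q in ["500m", "250m", "1"])
    mem_requested = sum(memory_bytes(q) for q in ["512Mi", "1Gi"])
    print(percent_of_allocatable(cpu_requested, cpu_cores("4")))        # 43.75
    print(percent_of_allocatable(mem_requested, memory_bytes("8Gi")))   # 18.75
```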

Pods

Parameter | Description | Supported in Prometheus
POD DETAILS
Pod Name | Name of the pod. |
Pod Namespace | Namespace in which the pod resides. |
Pod Node Name | Name of the node on which the pod runs. |
Pod Application | Name of the pod application. |
Pod Type | Type of pod. |
Pod created | The means by which the pod was created. |
Pod Status | Status of the pod. |
Pod Start Time | The start time of the pod. |
Pod Created Time | Time at which the pod was created. |
TOP 10 PODS BY MEMORY DETAILS
Pods Memory Limit | Maximum limit of memory. |
Pods Memory Request | Total memory requested. |
TOP 10 PODS BY CPU DETAILS
Pods CPU Limit | Maximum limit of CPU. |
Pods CPU Request | Total CPU requested. |
POD MEMORY AND CPU DETAILS
Pod Name | Name of the pod. |
Pod Namespace | Namespace of the pod. |
Total number of Containers | Total number of containers run by the pod. |
Pod CPU Limit(%) | The maximum limit of CPU resource which can be used. |
Pod CPU Request(%) | Total CPU requests by pod in %. |
Pod Memory Limit(%) | The maximum limit of memory resource that can be used. |
Pod Memory Request(%) | Total memory requested in %. |
Pod created | The means by which the pod was created. |
Pod Persistent Volumes Claim | Name of the Claim through which a pod can access the persistent volume. |

Containers

Parameter | Description | Supported in Prometheus
TOP 5 CONTAINERS BY RESTART COUNT
Container Restart Count | Total number of times the container has been restarted. |
CONTAINER DETAILS
Container Name | Name of the container. |
Container Image | Name of the container image. |
Container Pod Name | Name of the pod that contains the container. |
Container Restart Count | Total number of times the container has been restarted. |
Container Status | Status of the container. |
Container Start Time | Start time of the container. |
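
As an illustration of how restart counts can be used to spot containers stuck in a restart loop, the following Python sketch ranks containers by restart count and flags those at or above a threshold. The record fields (`name`, `pod`, `restart_count`), the function names and the default threshold of 5 are all hypothetical choices for this example, not values taken from Applications Manager.

```python
import heapq


def top_by_restarts(containers, n=5):
    """Return the `n` containers with the highest restart counts."""
    return heapq.nlargest(n, containers, key=lambda c: c["restart_count"])


def restart_suspects(containers, threshold=5):
    """Return containers restarted at least `threshold` times, worst first."""
    flagged = [c for c in containers if c["restart_count"] >= threshold]
    return sorted(flagged, key=lambda c: c["restart_count"], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        {"name": "web", "pod": "web-1", "restart_count": 0},
        {"name": "worker", "pod": "worker-1", "restart_count": 12},
        {"name": "cache", "pod": "cache-1", "restart_count": 6},
    ]
    for container in restart_suspects(sample):
        print(container["pod"], container["name"], container["restart_count"])
```

A suitable threshold depends on the workload; a container that restarts a few times during a rolling update is not necessarily unhealthy.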

Services

Parameter | Description | Supported in Prometheus
SERVICE DETAILS
Services Name | Name of the service. |
Services Namespace | Name of the Namespace in which the service resides. |
Services Application | Name of the Service application. |
Service Type | Type of the service. |
Cluster IP | Cluster IP Address. |
Service Ports | Name of the port that connects with the service. |
Service Created Time | Creation time of the service. |
DEPLOYMENT DETAILS
Deployment Name | Name of the deployment. |
Deployment Namespace | Namespace where the deployment exists. |
Deployment Replica Count | Total number of replicas in a deployment. |
Running Replica | Total number of Running Pods in a deployment. |
Deployment Available Replica Count | Total number of available replicas in a deployment. |
Deployment Availability | Availability of the deployment. |

Daemonset

Parameter | Description | Supported in Prometheus
DAEMONSET DETAILS
Name | Name of the Daemonset. |
Namespace Name | Name of the Namespace where the Daemonset is present. |
Desired Replica | Total number of desired Pods. Default value is 1. |
Current Replica | Total number of Current Pods. |
Running Replica | Total number of Running Pods. |
Available Replica | Total number of Available Pods. |
Misscheduled Replica | Total number of Misscheduled Pods. |

Statefulset

Parameter | Description | Supported in Prometheus
STATEFULSET DETAILS
Name | Name of the Statefulset. |
Namespace Name | Name of the Namespace where the Statefulset is present. |
Desired Replica | Total number of desired Pods. Default value is 1. |
Running Replica | Total number of Running Pods. |
Available Replica | Total number of Available Pods. |

Replica

Parameter | Description | Supported in Prometheus
REPLICATION CONTROLLER DETAILS
Name | Name of the Replication Controller. |
Namespace Name | Name of the Namespace where the Replication Controller is present. |
Desired Replica | Total number of desired Pods. Default value is 1. |
Running Replica | Total number of Running Pods. |
Available Replica | Total number of Available Pods. |
REPLICA SET DETAILS
Name | Name of the ReplicaSet. |
Namespace Name | Name of the Namespace where the ReplicaSet is present. |
Desired Replica | Total number of desired Pods. Default value is 1. |
Running Replica | Total number of Running Pods. |
Available Replica | Total number of Available Pods. |

Jobs

Parameter | Description | Supported in Prometheus
CLUSTER JOBS SUMMARY
Total Jobs Count | Total number of Jobs. |
Running Jobs Count | Total number of Running Jobs. |
Completed Jobs Count | Total number of Completed Jobs. |
JOBS DETAILS
Name | Name of the Job.  |
Namespace Name | Name of the Namespace where the Job is present. |
Parallelism Replica | Total number of Pod replicas that a Job should run in parallel. |
Desired Replica | Total number of desired Pods. |
Successful Replica | Total number of Pods in successful state. |
Job Start Time | The start time of the Job. |
Job Completion(Min) | Time taken for job completion (in minutes). |
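
One plausible way to derive a completion time in minutes is to subtract a Job's start time from its completion time. The Python sketch below assumes both values are UTC timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form that Kubernetes reports in a Job's status; how Applications Manager actually computes this metric is not stated here, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# Timestamp layout assumed for both inputs (UTC, second precision).
_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def job_completion_minutes(start_time: str, completion_time: str) -> float:
    """Return the elapsed time between two UTC timestamps, in minutes."""
    start = datetime.strptime(start_time, _FORMAT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(completion_time, _FORMAT).replace(tzinfo=timezone.utc)
    return round((end - start).total_seconds() / 60, 2)


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    print(job_completion_minutes("2024-03-01T10:00:00Z", "2024-03-01T10:12:30Z"))
    # prints 12.5
```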

Persistent Volumes

Parameter | Description | Supported in Prometheus
PERSISTENT VOLUMES DETAILS
PV Name | Name of the Persistent Volume. |
PV Status | Status of the Persistent Volume. |
PV Claim | Name of the Persistent Volume Claim. |
PV Access Mode | The mode through which you can access the Persistent Volume. |
PV Storage Class | Name of the Persistent Volume storage class. |
PV Capacity(GiB) | The capacity of the Persistent Volume in GiB. |
PV Created Time | Creation time of the Persistent Volume. |
PERSISTENT VOLUMES CLAIM DETAILS
PVC Name | Name of the Persistent Volume Claim. |
PVC Namespace | Name of the Namespace in which the Claim exists. |
PVC Status | Status of the Persistent Volume Claim. |
PV Name | Name of the Persistent Volume. |
PVC Access Mode | The mode through which you can access the Persistent Volume. |
PVC Storage Class | Name of the Persistent Volume storage class. |
PVC Requests(GiB) | Storage requested by the Persistent Volume Claim, in GiB. |
PVC Created Time | Creation time of the Persistent Volume Claim. |

Events

Parameter | Description | Supported in Prometheus
CLUSTER EVENT SUMMARY
Total Event Count | Total number of Events. |
Failed Event Count | Total number of Failed Events. |
Normal Event Count | Total number of Normal Events. |
Warning Event Count | Total number of Warning Events. |
EVENT DETAILS
Event Name | Name of the Event. |
Event Created Time | The time at which the Event was created. |
Event Namespace | Name of the Namespace with which the Event is associated. |
Event Type | Type of the Event. Possible values: Warning, Normal, Failed. |
Event Kind | Module of the Event. Possible values: Pod, Node. |
Involved Object | The module object involved. |
Reason | Reason of the Event. |
Message | Message of the Event. |
Last Updated Time | The latest updated time of the Event. |

Service Map

  • Displays a graphical map view containing namespace and service details.
  • All namespaces, each with its status and pod count per phase, are shown inside the cluster circle.
  • Green color indicates that the namespace is UP and red color indicates it is DOWN.
  • The cluster services under a namespace can be seen branching as a tree.
  • Each service contains its host IP address and port details.