A leading bank achieves 3-minute MTTA with ManageEngine OpManager Plus

Established over 25 years ago in one of India's bustling metro cities, this financial institution provides essential banking services and retail products such as loans, credit and debit cards, and transaction accounts to communities across the nation. With a network comprising over 5,000 branches across 3,500+ cities and a robust presence of more than 15,000 ATMs/cash recycler machines nationwide, the bank caters to customers from the rural, semi-urban, and urban areas of the nation. Also, the bank has extended its reach internationally with branch offices in key locations, such as Singapore and the UK.

Industry type

Banking

Branches

5,000+

Customers

100 million+

The digital banking journey

Recognizing the shifting demands of its clientele and the surging interest in digital banking solutions, the institution's leadership prioritized digital banking along with the traditional banking approach.

Embracing digital transformation demanded a strong IT infrastructure. The bank invested in building a network ecosystem of 45,000+ devices and 1,000+ applications, a combination of customer-facing applications and internal-purpose apps. They also invested in building a server farm running IBM AIX operating systems to host business-critical apps, while ensuring security and compliance with SEBI and RBI regulations.

As its digital footprint grew, the bank deployed dedicated monitoring tools to monitor its multi-faceted, distributed network. However, using a separate monitoring tool for each function compartmentalized data, inflated costs, and offered poor insights. This made IT infrastructure monitoring cumbersome.

It was clear that they needed a solution that provided centralized visibility into all their data, enabling them to identify faults easily, track performance behavior, and receive timely, actionable insights to improve their network along the way.

The quest for an affordable, comprehensive monitoring solution led them to ManageEngine OpManager Plus. With our solution, they were able to monitor their IT infrastructure comprehensively and eliminate the need for disparate tools.

The SMS-alert feature played a vital role in reducing the fault resolution time. This feature enabled the bank to cut the mean time to acknowledge (MTTA) from four hours to under three minutes. As a result, they were able to avoid application and server outages and drastically reduce the mean time to resolve (MTTR).
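For readers unfamiliar with the metric, MTTA is simply the average delay between an alert being raised and someone acknowledging it. The sketch below shows that arithmetic in Python. The function name and the sample timestamps are hypothetical illustrations, not data from the bank or an OpManager Plus API.

```python
from datetime import datetime, timedelta


def mean_time_to_acknowledge(alerts):
    """Return the average delay between an alert being raised and acknowledged.

    `alerts` is a list of (raised_at, acknowledged_at) datetime pairs.
    """
    if not alerts:
        raise ValueError("need at least one acknowledged alert")
    # Sum the individual acknowledgement delays, starting from a zero timedelta.
    total = sum((ack - raised for raised, ack in alerts), timedelta())
    return total / len(alerts)


# Hypothetical sample data, not taken from the bank's systems.
sample_alerts = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 2)),
    (datetime(2024, 1, 1, 13, 30), datetime(2024, 1, 1, 13, 34)),
    (datetime(2024, 1, 2, 2, 15), datetime(2024, 1, 2, 2, 18)),
]
mtta = mean_time_to_acknowledge(sample_alerts)
print(f"MTTA: {mtta}")
```

MTTR can be computed the same way by pairing each alert's raised time with its resolved time instead of its acknowledged time.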

The challenges

With over 60 tools, each serving a particular purpose, the bank found itself drowning in a sea of challenges before implementing ManageEngine OpManager Plus.

1. Tool sprawl

The bank used a myriad of monitoring tools such as Cisco AppDynamics, CA Broadcom, Dynatrace, Oracle Enterprise Manager, and NetApp Active IQ. Each tool served its dedicated purpose well, but a lack of integration between the tools meant the IT teams had to view data in silos, manually collate it, and make sense of it. With the data scattered across disparate tools, the IT admins had to switch between multiple tabs before gaining useful insights.

This made even simple tasks, like analyzing a fault, complex. As a result, the overall productivity and efficiency of the monitoring process were heavily affected. Moreover, using several tools drove up costs and required dedicated personnel to manage the licenses.

2. The time challenge in tool adoption

The time consumed for tool adoption was a huge concern for the bank's IT leadership staff. Each tool presented a unique learning curve and consumed a significant amount of time for training the employees. Consolidating the tools and replacing the existing tools with an easy-to-use solution was the way forward for the bank to counter this challenge.

3. Prolonged decision-making process

Lack of clarity was another major downside of using tools from different vendors. During analysis or troubleshooting, members across teams need visibility into the same data. This ensures that one finding leads seamlessly to another and ultimately enables the IT teams to infer something useful. Separate tools, however, hampered the communication flow between teams and delayed the decision-making process, increasing the time required to narrow down the root cause.

Banking applications are critical, and any service outage, even for a short span of time, would prove too costly. So, they needed a solution that expedited the fault resolution process.

The bank leadership soon realized these limitations and solved them by transitioning to ManageEngine's one-stop, full-stack IT infrastructure monitoring solution OpManager Plus.

The visibility challenge: A critical migration operation

The bank was able to address their enterprise-wide concerns such as tool sprawl and tool adoption by investing in OpManager Plus. After the investment, the bank was able to counter another mounting challenge in the form of a lack of visibility during a critical migration operation.

When the bank decided to transition from the legacy Sun Solaris servers to Linux-based servers, it had to guard against the risk of unplanned downtime. Ensuring business continuity was important, as an outage of bank servers can affect service delivery to clients. The bank planned to perform the migration under a disaster recovery mechanism to ensure business continuity.

During the migration process, the bank relied on our solution to monitor their IT infrastructure and ensure critical services were not affected. The dashboard in OpManager Plus gave them a real-time overview of device performance during the migration. Given the distributed architecture of their network, the centralized NOC team required a unified dashboard for a holistic view of important parameters.

Through the specialized NOC view, which provides a CCTV-like view of multiple individual dashboards, the central admin team was able to view the performance of the entire network.

SMS-based alerts: A distinguishing feature

The notification profile feature enabled IT admins to receive alerts whenever a device began to underperform. Coupled with this, the bank leveraged the threshold monitoring capability, which let IT admins set threshold values for important metrics. Whenever a metric crossed its threshold, an alert was triggered, prompting quick action that could avert a potential problem.

While the previously used monitoring tools also had the alert option, what distinguished ManageEngine OpManager Plus was the SMS alert option. Unlike email alerts, which can sometimes be overlooked when the IT admins are not at their screens, an SMS alert on their phones grabs immediate attention. This simple, yet effective mechanism mitigated downtime risks significantly and helped maintain an operational network during the entire migration phase.
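The threshold-plus-notification pattern described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpManager Plus code: the `Threshold` class, the `check_thresholds` function, the metric names, and the `fake_sms` stand-in for an SMS gateway are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Threshold:
    metric: str   # hypothetical metric name, e.g. "cpu_percent"
    limit: float  # alert when the reading exceeds this value


def check_thresholds(
    device: str,
    readings: Dict[str, float],
    thresholds: List[Threshold],
    notify: Callable[[str], None],
) -> List[str]:
    """Call `notify` once for every metric whose reading exceeds its limit."""
    messages = []
    for threshold in thresholds:
        value = readings.get(threshold.metric)
        if value is not None and value > threshold.limit:
            message = f"{device}: {threshold.metric}={value} exceeds {threshold.limit}"
            notify(message)
            messages.append(message)
    return messages


# Stand-in for an SMS gateway call; a real deployment would send a text message.
sent = []


def fake_sms(message: str) -> None:
    sent.append(message)


alerts = check_thresholds(
    "core-router-1",
    {"cpu_percent": 93.0, "memory_percent": 61.0},
    [Threshold("cpu_percent", 85.0), Threshold("memory_percent", 90.0)],
    fake_sms,
)
print(alerts)
```

Passing the notifier in as a callable keeps the threshold check independent of the delivery channel, so the same rule could feed email, SMS, or both.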

The must-haves for the solution

The bank leadership formulated a checklist of the essential aspects they were looking for in the solution.

  • To implement a monitoring solution that would monitor and provide end-to-end visibility into every aspect of the network—app performance, transactions, and network health.
  • To invest in an affordable solution that has rich product capabilities and an easy-to-use UI.
  • To combine as many disparate monitoring systems as possible into one platform.
  • To use one system that monitored the geographically distributed network.

The solution

Aiming to select a solution that met their objectives, the bank explored various vendors in the market. After thorough consideration, ManageEngine OpManager Plus emerged as the clear choice. A key factor that worked in our favor was the great value proposition our solution offered at an affordable price. Moreover, the bank leadership was immensely impressed by the excellent and responsive service and support we offer. Once we earned their trust, the transition to the implementation phase was seamless.

Evaluation and implementation

During the evaluation and implementation phase, our team engaged with the bank's key stakeholders. Initial talks involved detailed technical discussions about their high-level requirements and security considerations. Subsequent discussions included conversations with individual teams where specific queries were addressed.

The bank's top brass wanted OpManager Plus to monitor key application performance metrics and provide in-depth visibility into granular details for the IT technicians, as well as provide an overall view of the distributed network for the IT leadership staff. Our skilled technical engineers understood their needs and implemented the solution end-to-end so that the solution was ready for monitoring straight out-of-the-box.

Simplified application performance monitoring

The bank primarily wanted to monitor their applications, which can be categorized into two buckets: customer-facing apps which provided services such as loan payments and net banking, and internal-purpose applications such as analytics, fraud detection, risk management, and transaction monitoring.

There were separate IT teams to monitor each application and its associated servers and databases; it was difficult to glean monitoring information when all the data was available without classification. Now, with OpManager Plus' monitor groups feature, the application metrics are grouped together based on the business use case they solve. For example, the net banking application is one monitor group, under which all associated performance metrics and server metrics will be available. This grouping provided better visibility for the IT teams into the application they monitored.
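Conceptually, a monitor group is a mapping from a business use case to the monitors that serve it. The Python sketch below illustrates that idea only. The monitor names, the group labels, and the `group_monitors` function are hypothetical and do not reflect how OpManager Plus stores its groups.

```python
from collections import defaultdict


def group_monitors(monitors):
    """Bucket monitor names by the business use case they belong to."""
    groups = defaultdict(list)
    for monitor in monitors:
        groups[monitor["group"]].append(monitor["name"])
    return dict(groups)


# Hypothetical monitors; names and groups are illustrative only.
monitors = [
    {"name": "netbanking-web response time", "group": "Net banking"},
    {"name": "netbanking-db server CPU", "group": "Net banking"},
    {"name": "loan-payments-api throughput", "group": "Loan payments"},
]
grouped = group_monitors(monitors)
for group, names in grouped.items():
    print(group, "->", names)
```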

Accelerated troubleshooting

Reducing MTTR was another huge challenge that the IT team members faced. When a performance degradation or an outage was noticed, it was difficult and time-consuming to look into multiple systems to identify the root cause. Streamlining the data and providing visibility on a unified console was the key to solving this issue, and OpManager Plus did that. The APM Insight agent was deployed on each application server, where it gathered monitoring data and presented it with end-to-end visibility.

The collected data provides in-depth analysis of important metrics such as application response time, throughput, and availability. With about 10,000 transactions per second, OpManager Plus enabled the bank's IT team to keep track of all the transactions associated with applications.

From the exhaustive list of transactions, the IT admins were able to identify slow transactions, drill down into granular details such as slow SQL queries, and narrow down the root cause to fine-tune performance.
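The drill-down described above amounts to two steps: filter transactions whose response time exceeds a limit, then find the most expensive query inside each one. A minimal Python sketch of that logic follows. The record layout, field names, and the 1,000 ms limit are assumptions made for illustration, not the format the APM agent produces.

```python
def slowest_transactions(transactions, threshold_ms, top_n=3):
    """Return up to `top_n` transactions slower than `threshold_ms`, slowest first."""
    slow = [t for t in transactions if t["response_ms"] > threshold_ms]
    return sorted(slow, key=lambda t: t["response_ms"], reverse=True)[:top_n]


def slowest_query(transaction):
    """Return the query that took the longest within one transaction."""
    return max(transaction["queries"], key=lambda q: q["duration_ms"])


# Hypothetical transaction traces; field names are illustrative only.
transactions = [
    {"name": "/login", "response_ms": 180,
     "queries": [{"sql": "SELECT ... FROM users", "duration_ms": 40}]},
    {"name": "/transfer", "response_ms": 2400,
     "queries": [{"sql": "SELECT ... FROM accounts", "duration_ms": 150},
                 {"sql": "SELECT ... FROM ledger", "duration_ms": 1900}]},
    {"name": "/statement", "response_ms": 1300,
     "queries": [{"sql": "SELECT ... FROM transactions", "duration_ms": 1100}]},
]

slow = slowest_transactions(transactions, threshold_ms=1000)
for txn in slow:
    worst = slowest_query(txn)
    print(txn["name"], txn["response_ms"], "ms; slowest query:", worst["sql"])
```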

Advanced monitoring of IBM AIX servers

By default, OpManager Plus categorizes the server performance metrics into six buckets: Overview, CPU, Disk, Network, Errors (Errpt), and Configuration. The IT team members were able to fetch more detailed insights under each category. For example, under CPU, associated metrics such as Break up of CPU Utilization (%) vs Time, CPU utilization by CPU Cores, and I/O Wait Time (%) vs Time are available.

With all the metrics integrated properly in a unified console, the bank's server monitoring team felt at ease. They were able to correlate metrics, understand actual performance behavior, and improve performance.

Other OpManager Plus features at work!

  • APDEX Score: Customer experience is a crucial determinant of customer retention and overall business growth. However, the bank did not have a proper mechanism to understand the end-user experience. This is where the APDEX score came into play. The APDEX score offered in OpManager Plus enabled the IT teams to understand the performance of applications from the perspective of an end user.

    Ranging from 0 to 1, the APDEX score helped the IT teams clearly understand whether end users were satisfied, tolerating, or frustrated with the services, and optimize performance accordingly. (A score of 1 indicates that all users are satisfied, whereas 0 indicates that none are.)

  • Capacity planning reports: The bank used the capacity planning reports bundled in our solution to optimize resource allocation and ensure that all services were right-sized at all times. The IT team gained access to reports highlighting overutilized or underutilized servers based on critical metrics including CPU utilization, disk utilization, and memory utilization. This enabled them to avoid resource wastage and adopt a data-driven decision-making process.

    Moreover, the CTO and managers required monthly reports to understand how their network performed over a given period and to make informed decisions. With the schedule reports option, the stakeholders were able to receive reports at regular intervals, which they used to understand performance trends, identify loopholes, and make strategic improvements.

  • Anomaly detection: The server monitoring teams leveraged the anomaly detection feature to address potential issues and avoid outages proactively. The server admins configured baseline values for metrics. When the actual values began to drift away from the baseline limits, an alert was generated, prompting the admins to take corrective action. This proved useful, as it prevented performance degradation issues before they impacted end users.
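Returning to the APDEX score from the first item in the list above, the standard Apdex calculation counts responses at or below a target time T as satisfied, those between T and 4T as tolerating, and the rest as frustrated, then computes (satisfied + tolerating / 2) / total. The Python sketch below applies that formula. The target of 0.5 seconds and the sample response times are assumptions for illustration, and OpManager Plus may configure or compute its score differently.

```python
def apdex(response_times, t):
    """Compute an Apdex score for response times (seconds) given target `t`.

    Satisfied:  response <= t
    Tolerating: t < response <= 4 * t
    Frustrated: response > 4 * t (contributes nothing to the score)
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical response times in seconds, with an assumed target of 0.5 s.
samples = [0.2, 0.4, 0.5, 1.1, 1.9, 2.5]
score = apdex(samples, t=0.5)
print(round(score, 2))
```

With these samples, three responses are satisfied, two are tolerating, and one is frustrated, so the score is (3 + 2 / 2) / 6, or roughly 0.67.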

The results

ManageEngine OpManager Plus complemented the bank's digital banking approach and enabled it to improve service delivery significantly. Here are some of the benefits the bank gained by investing in our solution.

Mitigated licensing complexities

Before transitioning to ManageEngine, the institution relied on over 65 tools to manage their IT. This added layers of complexity to license management and increased overhead costs significantly. With OpManager Plus, they were able to do more with less and were able to reduce the total tally of tools to about 13.

The ease-of-use nature of our solution meant the team did not need to spend a substantial amount of time training their members on how to use it. Once they began using the solution, it became second nature.

Offered comprehensive and detailed insights

OpManager Plus catered to the needs of the entire IT team, from the Chief Technology Officer (CTO) to the frontline IT technician. It offered C-suite executives a high-level, at-a-glance view of network performance, while giving IT admins in-depth analysis of specifics such as the performance of an app transaction.

Reduced operational silos

By moving from multiple single-purpose tools to the comprehensive ITOM suite offered by OpManager Plus, the bank was able to eliminate many of the operational silos that affected their network.

With a unified platform to manage their complete IT network, the bank bridged the gap between disparate teams, improving collaboration, and streamlining corrective measures and workflows necessary for troubleshooting.

About OpManager Plus

ManageEngine OpManager Plus is a full-stack observability solution that provides organizations with enhanced visibility across applications, infrastructure and networks, along with visibility on security for hybrid environments consisting of on-premises and cloud instances.

It enables organizations to provide a superior end-user experience and drive better business outcomes by helping them proactively manage their IT environments, automate fault remediation, and break down operational silos. Trusted by IT admins worldwide to streamline their IT operations, OpManager Plus is the go-to observability solution for digital-first enterprises of all sizes.

For more information, visit manageengine.com/it-operations-management.

Get started with a free trial today.
