Top 3 high-level ITOM challenges in 2024

August 09 · 14 min read

IT operational problems and solutions

Last March, Egypt woke up to the news that over $40 million had seemingly vanished from the accounts of one of its largest banks. Customers had discovered they could withdraw funds well beyond their account balances or transfer vast sums to other banks before transactions were abruptly halted, all due to a small operational glitch. The glitch not only left the bank with a significant financial challenge but also served as a stark reminder of the critical role of IT operations management (ITOM) in modern banking.

Beyond the financial sector, this incident serves as a vivid illustration of the challenges faced by ITOM across industries—especially as markets expand and complexities multiply. ITOM professionals play significant roles, from ensuring the security of digital systems and managing the complexities of hybrid cloud setups to optimizing performance despite an ever-growing amount of data.

Let's delve deep into the major challenges in ITOM in 2024 and explore strategies businesses can use to tackle these challenges and deliver uninterrupted operations.

1. Legacy system constraints

As organizations modernize and upgrade their technology, older devices and applications are often left unsupported. These legacy systems cannot integrate with modern software, consequently hampering workflows and impeding efforts to automate processes.

Problem: Integration complexity

Integrating legacy systems with newer technologies or third-party applications can be complex due to differences in technology standards, data formats, and communication protocols. This complexity necessitates custom development efforts or middleware solutions to facilitate data exchange and interoperability.

Imagine a scenario where an organization relies on training modules built on Adobe Flash. With Adobe Flash reaching its end of life in Dec. 2020, the latest versions of web browsers like Google Chrome, Mozilla Firefox, and Microsoft Edge do not support it. This means trying to get those old modules to play nice with the latest browsers is like fitting a square peg into a round hole.

Integrating Adobe Flash with newer browser versions can be complex and challenging due to differences in browser architectures, rendering engines, and the absence of Adobe Flash support. Additionally, it can trigger compatibility, security, and performance issues.

Solution: Proactive network monitoring and management

To address these challenges, organizations can implement network monitoring and management solutions to gain real-time visibility into the health and performance of their IT infrastructure. These tools assist in assessing the readiness of legacy systems for migration or modernization by monitoring performance metrics, identifying bottlenecks, and predicting potential failures.

In the above scenario, these tools would monitor the performance of Adobe Flash-based training modules and assess their compatibility with modern web browsers. They would track key metrics such as resource usage, load times, and user interactions to identify any issues with integration.

Additionally, organizations can consider modernization efforts such as re-platforming, re-hosting, or re-engineering legacy systems to align them with current IT standards and architectures. They can also migrate legacy applications to cloud-native environments and adopt microservices-based architectures. For example, organizations may consider migrating Adobe Flash-based training modules to HTML5 or other supported technologies.

For effective network monitoring and management, it is important to prioritize performance issues based on context, and striking the right balance is essential. Overly extensive monitoring can overwhelm teams with irrelevant alerts, while insufficient monitoring risks overlooking critical issues. Monitoring priorities should be set based on business needs, device types, and scalability requirements.

For example, imagine you are running a business website where both a rewards button and a QR code payment system encounter disruptions. The QR code payment system would naturally take precedence due to its business-critical nature.

At the same time, keep in mind that continuously monitoring irrelevant metrics can be counterproductive. The key lies in identifying what needs constant monitoring and at what frequency.

Setting up 24/7 monitoring mechanisms allows for proactive problem identification. For instance, during a crucial online ticket booking window, a glitch in the payment portal could lead to significant financial losses. Round-the-clock monitoring with well-configured alerts for issues such as slowdowns can help identify and rectify the problem before it escalates.
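
As a simple illustration, the sketch below polls a hypothetical payment portal health endpoint and raises an alert when a response fails or is slow. The URL, threshold, and check interval are placeholders, and a production setup would normally rely on a dedicated monitoring tool rather than a hand-rolled script.

```python
import time
import requests  # assumes the requests package is installed

# Placeholder endpoint and thresholds; substitute your own payment portal URL
PORTAL_URL = "https://example.com/payments/health"
SLOWDOWN_THRESHOLD_SECONDS = 2.0
CHECK_INTERVAL_SECONDS = 60


def check_portal_once() -> None:
    """Measure one health-check response and alert on failures or slowdowns."""
    start = time.monotonic()
    try:
        response = requests.get(PORTAL_URL, timeout=10)
        elapsed = time.monotonic() - start
        if response.status_code != 200:
            print(f"ALERT: portal returned HTTP {response.status_code}")
        elif elapsed > SLOWDOWN_THRESHOLD_SECONDS:
            print(f"ALERT: portal responded in {elapsed:.2f}s "
                  f"(threshold {SLOWDOWN_THRESHOLD_SECONDS}s)")
        else:
            print(f"OK: portal responded in {elapsed:.2f}s")
    except requests.RequestException as error:
        print(f"ALERT: portal unreachable ({error})")


if __name__ == "__main__":
    while True:  # round-the-clock polling loop
        check_portal_once()
        time.sleep(CHECK_INTERVAL_SECONDS)
```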

Problem: Security and compliance

Outdated technologies in legacy systems pose significant security risks, leaving them vulnerable to cyberattacks due to the absence of critical security patches. For instance, without timely updates, Adobe Flash-based training modules become prime targets for hackers seeking unauthorized access or compromising sensitive data. Moreover, reliance on outdated software undermines compliance with regulations such as the GDPR and standards such as ISO/IEC 27001, which require organizations to keep software secure and up to date to protect user data. Noncompliance not only risks legal consequences but also damages the organization's reputation and stakeholder trust.

Solution: Patch management

Patch management solutions facilitate the seamless integration of legacy systems with newer technologies or third-party applications. For instance, in the scenario where an organization relies on training modules built on Adobe Flash, patch management can automate the process of identifying and deploying updates or patches to ensure compatibility with modern web browsers. This helps alleviate compatibility issues and ensures a smoother transition to updated systems.

Patch management solutions also address legacy system constraints by ensuring that outdated software components receive timely updates and security patches. By automating the patching process, these solutions help organizations maintain the security and functionality of legacy systems, reducing the risk of cyberattacks and compliance violations. Furthermore, they enable organizations to create custom patching schedules and prioritize critical patches based on risk severity.

Patch management solutions with user access control features allow organizations to define who can deploy patches and to which systems. This ensures that only authorized personnel can make critical changes to the IT infrastructure, mitigating the risk of unauthorized modifications or accidental disruptions.
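
To make the prioritization idea above concrete, here is a minimal sketch that orders pending patches by risk severity before deployment. The Patch record, identifiers, and severity labels are hypothetical; a real patch management tool supplies this data and handles deployment and access control itself.

```python
from dataclasses import dataclass


# Hypothetical patch record; a real patch management tool supplies these fields
@dataclass
class Patch:
    patch_id: str
    component: str
    severity: str  # "critical", "important", "moderate", or "low"


SEVERITY_RANK = {"critical": 0, "important": 1, "moderate": 2, "low": 3}


def prioritize(patches: list[Patch]) -> list[Patch]:
    """Order patches so the highest-risk ones are deployed first."""
    return sorted(patches, key=lambda p: SEVERITY_RANK.get(p.severity, len(SEVERITY_RANK)))


pending = [
    Patch("KB-1001", "legacy-training-portal", "moderate"),
    Patch("KB-1002", "browser-runtime", "critical"),
    Patch("KB-1003", "reporting-service", "low"),
]

for patch in prioritize(pending):
    print(f"Deploy {patch.patch_id} ({patch.severity}) to {patch.component}")
```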

2. Unified observability

Observability in ITOM refers to the ability to understand, monitor, and analyze the internal state and behavior of a system based on its external outputs or signals. In simple terms, observability allows admins to observe and comprehend what is happening within a system, even if they cannot directly inspect its internal workings.

Let's take an IT organization that has a diverse portfolio of applications and services. As part of its ITOM strategy, it has invested in multiple monitoring and observability tools. Despite this variety of tools, the organization faces an observability challenge due to fragmented data and silos: each tool collects and analyzes data from a different layer of the IT environment, leading to fragmented visibility and siloed information.

For instance, the network operations team relies on a network monitoring tool to track bandwidth usage, latency, and network device health. Similarly, the server management team utilizes a server monitoring tool to monitor CPU, memory, disk usage, and application performance. Meanwhile, the application development team depends on an application performance monitoring tool to monitor transaction response times, error rates, and code-level performance metrics.

Despite all of this telemetry, the IT operations team may struggle to correlate data and gain a comprehensive view across all layers of the infrastructure. This fragmented approach to monitoring and observability results in the following challenge.

Problem: Difficulty in root cause analysis

In today's dynamic IT landscapes, there's a deliberate push towards creating highly interconnected environments where various components and systems rely on each other to function properly. An issue in one part of the infrastructure can have cascading effects across multiple layers. Without comprehensive data and correlation capabilities, it is challenging to trace the complex interactions and dependencies between components and identify the primary cause of an incident.

Say the server management team receives an alert indicating a sudden spike in CPU usage. It could be due to an application issue, a surge in user activity, or a network bottleneck. However, with fragmented and siloed data across the tools, pinpointing the root cause of the spike can be difficult.

Even if IT teams attempt to correlate data from different sources manually, the process can be time-consuming and error-prone. Navigating between multiple tools and analyzing disparate datasets requires significant effort and expertise, prolonging the time to identify the root cause and resolve the issue.

Solution: Unified monitoring platforms

Unified monitoring platforms serve as centralized solutions for overseeing the health, performance, and availability of an organization's entire IT infrastructure. Acting as a central hub, they consolidate data from all your tools and sources into a single interface, providing comprehensive visibility across applications, servers, networks, databases, and cloud resources.

This gives the IT team a single pane of glass, with features like:

Data consolidation

Bringing all data together in one place provides a holistic view of the entire IT infrastructure and enhances observability. By consolidating fragmented data, teams can easily correlate information across different components, enabling quicker identification of root causes.

With everything in one place, the ITOM team can easily see how a spike in CPU usage on a server might correlate with a recent application deployment or increased user activity. This lets them quickly identify the root cause, not just the symptom.
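
As a rough illustration of that correlation, the sketch below joins consolidated CPU samples with recent deployment events using pandas (assumed to be installed). The metric and event values are made up; a unified platform would export or query this data directly.

```python
import pandas as pd

# Hypothetical consolidated metrics; a unified platform would export these
cpu_metrics = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 10:00", "2024-05-01 10:05", "2024-05-01 10:10"]),
    "cpu_percent": [42.0, 91.5, 88.0],
})
deployments = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 10:03"]),
    "event": ["app-v2.4 deployment"],
})

# Attach the most recent deployment (within 15 minutes) to each CPU sample
correlated = pd.merge_asof(
    cpu_metrics.sort_values("timestamp"),
    deployments.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
    tolerance=pd.Timedelta("15min"),
)

# Flag samples where a spike follows a deployment
spikes = correlated[(correlated["cpu_percent"] > 85) & correlated["event"].notna()]
print(spikes)
```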

Comprehensive visibility

Customizable dashboards display real-time metrics and performance trends across applications, servers, networks, and other components. This comprehensive visibility allows teams to spot anomalies or performance degradation quickly.

For example, if the server management team sees a spike in CPU usage, they can immediately see whether it stems from a surge in user activity on the application performance dashboard or from a network bottleneck on the network monitoring dashboard.

Proactive alerting mechanisms

Strong alerting mechanisms ensure that IT teams are promptly notified of significant abnormalities or performance deviations. By receiving alerts in real time, teams can proactively address issues before they escalate.

For instance, instead of getting an alert every time CPU usage goes up slightly, the platform can be set to trigger an alert only when usage spikes beyond a defined threshold, allowing the team to focus on resolving the root cause of the significant spike.
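
Here is a minimal sketch of that kind of threshold-based alerting, assuming the psutil package is available. It raises an alert only when CPU usage stays above a configurable threshold for several consecutive samples, which keeps minor fluctuations from generating noise.

```python
import psutil  # assumes the psutil package is installed

# Hypothetical thresholds; tune them to the server's normal load profile
CPU_ALERT_THRESHOLD_PERCENT = 90.0
SUSTAINED_SAMPLES_REQUIRED = 3    # consecutive breaches before alerting
SAMPLE_INTERVAL_SECONDS = 10


def watch_cpu() -> None:
    """Alert only when CPU usage stays above the threshold for several samples."""
    consecutive_breaches = 0
    while True:
        # cpu_percent blocks for the interval and returns the average usage
        usage = psutil.cpu_percent(interval=SAMPLE_INTERVAL_SECONDS)
        if usage > CPU_ALERT_THRESHOLD_PERCENT:
            consecutive_breaches += 1
            if consecutive_breaches >= SUSTAINED_SAMPLES_REQUIRED:
                print(f"ALERT: CPU at {usage:.1f}% for {consecutive_breaches} consecutive samples")
                consecutive_breaches = 0  # reset after raising the alert
        else:
            consecutive_breaches = 0


if __name__ == "__main__":
    watch_cpu()
```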

Advanced analytics

Advanced analytics capabilities enable teams to perform in-depth performance analysis and capacity planning. By analyzing historical data and trends, teams can gain insights into the root causes of issues, helping to prevent similar incidents in the future and improving overall system stability.

For example, the team might discover that CPU spikes always occur after a specific application update. This knowledge empowers them to take proactive steps to prevent future issues, like delaying deployments during peak usage times.

3. Data migration and management

Data migration to the cloud represents a significant shift for organizations. A 2023 ManageEngine survey shows that 83% of large enterprises intend to focus their efforts on cloud migration.

Problem: Maintaining data integrity

The cornerstone of any data migration effort is ensuring data integrity. An organization's data center typically houses vast amounts of structured and unstructured data, including databases, files, and application logs. The migration process itself amplifies the risk of data corruption or loss, especially when dealing with large data volumes or complex transformations. Various factors, such as network latency, hardware failures, or software bugs, can contribute to data corruption incidents.

For example, an automotive manufacturer decides to transition their customer and vehicle data from on-premises servers to a cloud environment. This data encompasses diverse information, including customer profiles, vehicle configurations, maintenance records, and warranty details, each stored in different formats such as relational databases for customer profiles and maintenance records, JSON files for vehicle configurations, and XML files for warranty details. Additionally, within each data format, there are variations based on specific models and segments, further adding to the complexity of the migration process.

Without proper data integrity measures, even a single instance of data corruption could jeopardize the accuracy of customer and vehicle data, potentially leading to regulatory compliance issues or financial losses.

Solution: Thorough data profiling and cleansing

Before migration, conduct comprehensive data profiling and cleansing to identify and rectify inconsistencies, errors, or duplication within the data sets. Utilize automated tools and processes to streamline data quality assessment and cleansing efforts, ensuring data integrity from the outset.

Data profiling

Data profiling involves analyzing the content, structure, and quality of data sets to understand their characteristics thoroughly. Automated data profiling tools can scan through large volumes of data quickly, providing insights into data patterns, distributions, and anomalies. This step helps in identifying potential issues, such as missing values, outliers, duplicates, or inconsistencies, which could impact data integrity during migration.

The manufacturer's ITOM team uses data profiling tools to conduct a thorough examination of customer transaction records. Profiling surfaces records with missing service dates, duplicate entries, and inconsistent date formats; during cleansing, default values are inserted where a service date is absent, duplicates are merged or removed, and date formats are standardized across the dataset.
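
For illustration, here is a small profiling sketch using pandas (assumed to be installed). The transaction columns and values are hypothetical; the point is simply to surface missing values, duplicates, and inconsistent date formats before cleansing begins.

```python
import pandas as pd

# Hypothetical extract of customer transaction records; columns are illustrative
records = pd.DataFrame({
    "customer_id": ["C001", "C002", "C002", "C003"],
    "service_date": ["2024-01-15", None, "15/01/2024", "2024-02-03"],
    "amount": [120.0, 89.5, 89.5, None],
})

# Profile the data set: shape, missing values, and duplicate rows
print("Rows and columns:", records.shape)
print("Missing values per column:\n", records.isna().sum())
print("Duplicate rows:", records.duplicated().sum())

# Inspect dates that do not match the expected ISO format (or are missing)
parsed = pd.to_datetime(records["service_date"], format="%Y-%m-%d", errors="coerce")
print("Non-ISO or missing dates:\n", records.loc[parsed.isna(), "service_date"])
```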

Data cleansing

Data cleansing aims to rectify errors, inconsistencies, and duplicates within the data sets to ensure their accuracy and reliability. This process involves methods like standardization and data reconciliation to standardize data formats, correct errors, remove duplicates, and reconcile inconsistencies.

For example, if inconsistencies are found in vehicle configuration data, automated scripts are used to standardize attribute names, correct data formatting, and eliminate duplicate data entries.

Data cleansing may also involve enriching or augmenting existing data with additional information from external sources to enhance its completeness and accuracy. For example, appending exact coordinates such as latitude and longitude to customer addresses using trusted geocoding services during the cleansing stage makes downstream tasks, like running targeted marketing campaigns, easier.
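
A brief cleansing sketch along the same lines, again with hypothetical vehicle configuration data and pandas assumed to be installed: it standardizes attribute names, trims and normalizes text values, and removes exact duplicates.

```python
import pandas as pd

# Hypothetical vehicle configuration extract with inconsistent column names and duplicates
configs = pd.DataFrame({
    "VehicleModel": ["Sedan X", "Sedan X", "SUV Y"],
    "engine_type ": ["petrol", "petrol", "diesel"],
    "TrimLevel": ["LX", "LX", "EX"],
})

# Standardize attribute names: strip whitespace, split CamelCase, use snake_case
configs.columns = (
    configs.columns.str.strip()
    .str.replace(r"(?<!^)(?=[A-Z])", "_", regex=True)
    .str.lower()
)

# Normalize text values and remove exact duplicate entries
cleaned = configs.apply(lambda col: col.str.strip().str.lower()).drop_duplicates()
print(cleaned)
```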

Infographic: 5 steps to ensure data integrity in cloud migration.

Implement data validation mechanisms

Deploy data validation mechanisms throughout the migration process to verify data accuracy and completeness. Some data validation mechanisms are:

Checksum validation

Checksum validation involves generating unique checksum values for each piece of data before and after migration. For instance, the ITOM team for the automotive manufacturer can calculate checksum values for customer transaction records using algorithms like SHA-256 or MD5. These checksum values serve as fingerprints for the data, enabling the team to detect any alterations or discrepancies during transit.

After migration, the team recalculates checksum values at the cloud destination and compares them with the original values. If there are discrepancies, this indicates potential data corruption or loss, prompting further investigation and corrective action. This validation mechanism ensures the integrity of customer transaction records throughout the migration process.
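
A minimal sketch of checksum validation using Python's standard hashlib module follows. The file names are placeholders for the source export and the copy retrieved from the cloud destination after migration.

```python
import hashlib
from pathlib import Path


def sha256_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 fingerprint of a file by streaming it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder file names: the pre-migration export and the copy pulled back
# from the cloud destination for verification
source_checksum = sha256_checksum(Path("transactions_export.csv"))
migrated_checksum = sha256_checksum(Path("transactions_migrated.csv"))

if source_checksum == migrated_checksum:
    print("Checksums match: data arrived intact")
else:
    print("Checksum mismatch: investigate possible corruption or loss")
```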

Encryption

Encrypting data is similar to placing it in a secure box before transferring it to the cloud. Encryption scrambles the data using cryptographic algorithms, making it unreadable without the decryption key. Encrypting data during transit and storage adds an extra layer of protection against unauthorized access and tampering.

For example, the ITOM team can employ AES-256 encryption (that is, Advanced Encryption Standard with a key length of 256 bits) to encrypt the customer transaction records before transmitting them to the cloud.

AES-256 encryption process

  • Encryption key generation: The security of AES-256 depends on the strength and secrecy of the key. The organization generates a random 256-bit key using a cryptographically secure random number generator, or derives one from a passphrase using a key derivation function.
  • Data scrambling: For each customer transaction record, the AES-256 algorithm scrambles the data into unreadable ciphertext. Without the key, this ciphertext is practically impossible to decrypt.
  • Secure transmission: Once the customer transaction records are encrypted, the organization securely transmits them to the cloud server over a network. Since the data is encrypted, even if it's intercepted by unauthorized parties, they won't be able to make sense of it without the encryption key.
  • Decryption: When the encrypted data reaches the cloud server, the organization can decrypt it using the encryption key. The decryption process reverses the scrambling, turning the data back into its original form.
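
The sketch below walks through those steps using AES-256 in GCM mode, assuming the cryptography package is installed. The sample record, key handling, and transmission step are simplified for illustration; in practice, keys come from a key management service rather than local code.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # assumes the cryptography package is installed

# Key generation: a random 256-bit key (in production, use a key management service)
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

# Hypothetical customer transaction record serialized to bytes
record = b'{"customer_id": "C001", "amount": 120.0, "service_date": "2024-01-15"}'

# Data scrambling: encrypt with a unique 96-bit nonce per record
nonce = os.urandom(12)
ciphertext = aesgcm.encrypt(nonce, record, None)

# Secure transmission: ship nonce + ciphertext; neither reveals the plaintext
# without the key

# Decryption at the cloud destination reverses the process with the same key
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == record
print("Round trip successful")
```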

Wrapping it up

By implementing these solutions, your organization can gain better visibility into its IT infrastructure, ensuring the success of your digital transformation initiatives. Understanding the existing IT environment is crucial for your ITOM team, so prioritize defining the metrics that align with your business goals rather than measuring every data point on the dashboard. By focusing on relevant context, you can pinpoint how specific problems connect to critical business services, translating effort into tangible improvements. Remember, an effective ITOM strategy is the foundation for a resilient and reliable IT environment that can support business growth and innovation.

Sruthi K

Shivaram P R, Content writer
