In this section, we will focus on the differences between incident management and problem management and how problem management functions together with other supporting ITSM practices.
Incident management vs. problem management
The terms incident and problem might appear to be synonymous, but they play distinct roles in maintaining service quality. It's important to understand where incident management and problem management interact and how they differ, especially where an incident ends and a problem begins.
Incident management
An incident is an unplanned interruption of an entire service or just a component of one. Let's look at a scenario to understand it better. There's an important meeting in 15 minutes, and a report has to be printed, but the department printer isn't working. A ticket is quickly raised so that a workaround can be found and the report printed in time. This is an incident.
The incident management process is about handling incidents and restoring service as quickly as possible. In our scenario, the service desk staff quickly connect the user's laptop to the adjoining department's printer so the reports are ready in time for the meeting. Incident management's goal, then, is to resolve an interruption as quickly as possible, whether through a workaround or a full resolution.
Problem management
Problem management isn't about restoring services or troubleshooting individual interruptions; it's about identifying and removing their underlying cause. A problem is logged in the service desk when recurring incidents share a common issue, or when a major incident impacts many users. In our scenario, the department's only printer failed and every user in the department was affected, so the service desk staff logged a problem to find the cause and a solution. The incident can be closed once a workaround is provided, but the problem stays open until the printer is fixed permanently so the issue does not recur.
Referring back to our scenario, the printer issue will undergo root cause analysis (RCA) to find a permanent fix and will be tracked as a problem ticket while the business continues with the workaround in place. If the problem management team is unable to find a solution, the workaround is documented and the issue is added to the known error database (KEDB). Problem management is therefore not only about eliminating incidents by finding their root cause, but also about determining the most feasible solution that minimizes disruption. Sometimes, even when the root cause is known, the most feasible option is to keep the workaround and document the issue as a known error.
Despite their differences, incident management and problem management complement each other and are closely aligned. Incident management ensures continuity of business operations, while problem management addresses the underlying causes of incidents.
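The triggers and outcomes described above can be sketched as a small data model. This is a minimal illustration under assumptions: the `Incident`, `Problem`, and `KnownError` classes, their fields, and the recurrence threshold of three incidents are hypothetical and not taken from any particular service desk tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record types; field names are illustrative only.
@dataclass
class Incident:
    incident_id: str
    service: str                      # affected service or component
    is_major: bool = False            # major incident impacting many users
    workaround: Optional[str] = None  # temporary fix that lets the incident close

@dataclass
class Problem:
    problem_id: str
    service: str
    incident_ids: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None

@dataclass
class KnownError:
    problem_id: str
    root_cause: Optional[str]
    workaround: str

def should_raise_problem(incidents: List[Incident], service: str,
                         recurrence_threshold: int = 3) -> bool:
    """Raise a problem for recurring incidents on one service, or for any major incident.

    The default threshold of 3 is an assumed value for illustration.
    """
    related = [i for i in incidents if i.service == service]
    recurring = len(related) >= recurrence_threshold
    major = any(i.is_major for i in related)
    return recurring or major

def record_known_error(problem: Problem, workaround: str,
                       kedb: List[KnownError]) -> KnownError:
    """No feasible permanent fix: document the workaround as a known error in the KEDB."""
    entry = KnownError(problem.problem_id, problem.root_cause, workaround)
    kedb.append(entry)
    return entry
```

In the printer scenario, a single major incident on the department printer is enough to raise a problem under this rule, even though no incident has recurred yet.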
The relationship between the ITSM processes and problem management
An integrated system of service delivery best practices improves business services and IT service capabilities. An effective problem management process has interactions with several other ITSM processes.
The processes that interact with problem management are briefly discussed below:
Incident management
Incident management is the methodical process of logging, categorizing, prioritizing, assigning, and resolving issues in an organization. Its goal is to restore interrupted services as quickly as possible, which often means putting a workaround in place instead of a permanent solution. Each incident is documented in detail, and these records are passed to the problem management team, who use them to initiate RCA and develop a permanent solution. So although problem management is a process in its own right, it depends on a robust incident management process.
Change management
The objective of change management is to increase the success rate of any changes implemented in the organization. A change refers to any modification made to an organization's IT infrastructure, processes, services, products, applications, vendors, or anything else that implicitly or explicitly affects the organization's service delivery.
According to the ITSM framework, problem management's responsibility ends with identifying the root cause and a solution to the problem; actually implementing that solution is carried out through change control. Because implementing a change involves managing risk across multiple business units, it requires a process of its own. However, the problem management team should take part in the post-implementation review of a change to confirm that the implemented change matches the solution the team proposed.
IT asset management
IT asset management is the practice of governing the life cycle of an asset in an organization. Its activities include deriving maximum value from assets, controlling asset costs, and managing the risks of assets. These risks can be in terms of compliance, vendor selection, usage policies, and disposal practices.
Asset management and problem management cross paths when problems emerge from the hardware and software assets an organization uses. When the root cause of a problem appears to lie in a product or service, IT asset management's detailed inventory records speed up problem solving. IT asset management also helps problem management study the impact of an incident, examine the effects of implementing a solution, and supply information whenever it's needed during RCA.
Let's put things into perspective with a scenario.
Zylker is a fast-growing stock photography provider in India. A manager in Mumbai has been having trouble generating monthly reports from the SQL server in New Delhi. An incident has been raised, and the service desk staff has notified the technicians in New Delhi. As a temporary workaround, the reports are generated locally and sent to ensure business continuity.
Zylker's proactive problem management team decides to run a trend analysis on incidents from the past six months. They find multiple incidents related to the server in New Delhi, so they open a problem ticket and begin investigating using the data accumulated from all the documented incidents.
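A trend analysis like Zylker's can be as simple as counting incidents per configuration item over a rolling window. The sketch below assumes incidents are available as `(opened_on, configuration_item)` pairs, approximates six months as 183 days, and flags items with three or more incidents; the function name, input shape, item names, and threshold are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

def recurring_items(incidents: Iterable[Tuple[date, str]], as_of: date,
                    window_days: int = 183, min_count: int = 3) -> Dict[str, int]:
    """Count incidents per configuration item within the window ending at as_of
    and return only the items that meet min_count."""
    start = as_of - timedelta(days=window_days)
    counts = Counter(ci for opened_on, ci in incidents if start <= opened_on <= as_of)
    return {ci: n for ci, n in counts.items() if n >= min_count}

# Illustrative incident log; the configuration item names are made up.
log = [
    (date(2024, 1, 31), "sql-server-new-delhi"),
    (date(2024, 2, 29), "sql-server-new-delhi"),
    (date(2024, 3, 31), "sql-server-new-delhi"),
    (date(2024, 3, 5), "printer-3f"),
    (date(2023, 6, 30), "sql-server-new-delhi"),  # outside the six-month window
]
print(recurring_items(log, as_of=date(2024, 4, 1)))  # {'sql-server-new-delhi': 3}
```

A result like this is what would prompt the problem ticket; the dates of the matching incidents then feed the investigation.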
The technician in New Delhi sees that the SQL server uses multiple protocols, including iSCSI and Fibre Channel, to connect to its data storage. Because iSCSI storage traffic runs over the Ethernet network, the technician suspects that the local block switch may not be configured for large packet data transfer. Data from the IT asset management team lets the technician verify that the switch is not the culprit, which is consistent with the fact that generating reports locally was not a problem.
The wide area network (WAN) is next in line for analysis, since the trouble occurs only when the manager in Mumbai generates the monthly report remotely. Drawing on their experience with network issues, the technician suspects a traffic problem at the end of every month, so they install software on the company's routers and switches that analyzes the traffic passing through them and statistically aggregates the information.
The software generates graphs and charts that indicate the top protocols that were used, along with the bandwidth each protocol consumed over a month. This unveils significant bandwidth usage at the end of the month around the same time the monthly report is generated. After careful examination, it's revealed that full image backups were scheduled around the same time as the monthly report, and this caused a significant bottleneck in the WAN.
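The aggregation the traffic-analysis software performs can be approximated in a few lines: total the bytes per protocol to find the top protocols, then total the bytes per day to find days that stand out. This is a minimal sketch assuming flow records arrive as `(day, protocol, bytes)` tuples; the record format, the protocol labels, and the "more than twice the daily mean" spike rule are assumptions, not the behavior of any specific monitoring product.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, List, Tuple

Flow = Tuple[date, str, int]  # (day, protocol, bytes transferred)

def top_protocols(flows: Iterable[Flow], n: int = 3) -> List[Tuple[str, int]]:
    """Rank protocols by total bytes transferred, highest first."""
    totals = defaultdict(int)
    for _day, protocol, nbytes in flows:
        totals[protocol] += nbytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def spike_days(flows: Iterable[Flow], factor: float = 2.0) -> List[date]:
    """Return days whose total traffic exceeds `factor` times the mean daily total."""
    per_day = defaultdict(int)
    for day, _protocol, nbytes in flows:
        per_day[day] += nbytes
    if not per_day:
        return []
    mean = sum(per_day.values()) / len(per_day)
    return sorted(day for day, total in per_day.items() if total > factor * mean)

# Illustrative month (arbitrary units): steady HTTPS traffic every day,
# plus a backup and a report run on the 31st.
flows = [(date(2024, 3, d), "HTTPS", 10) for d in range(1, 32)]
flows += [(date(2024, 3, 31), "backup", 500), (date(2024, 3, 31), "SQL", 20)]
print(top_protocols(flows, n=2))  # [('backup', 500), ('HTTPS', 310)]
print(spike_days(flows))          # [datetime.date(2024, 3, 31)]
```

Bucketing by hour instead of by day would show the overlap between the backup window and report generation more precisely.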
Now that the root cause has been identified, the technician raises a change ticket to reschedule the image backup to the early morning hours before business begins, leveling out the traffic on the network.
Here's an overview of the steps performed in this scenario:
Activity | Practice involved |
---|---|
The manager in Mumbai had trouble generating monthly reports from the SQL server in New Delhi. An incident was raised and the reports were generated locally and sent to the manager. The ticket was closed. | Incident management |
The proactive problem management team ran trend analysis on incidents over the past six months. They found multiple incidents involving the server in New Delhi. | Problem management, incident management |
The technician in New Delhi examined the SQL server's network and storage protocols and was unsure whether the local block switch was configured for large packet data transfer. | Problem management, IT asset management |
The technician received data from the IT asset management team and verified that the switch was not the culprit. | Problem management, IT asset management |
The technician had suspicions about the traffic flow at the end of every month, and installed software on the routers and switches that analyzed traffic and statistically aggregated the information. | Problem management, IT asset management |
After careful examination, it was revealed that full image backups were scheduled for around the same time as the report generation, and this caused a significant bottleneck in the WAN. | Problem management |
The technician raised a change ticket to reschedule the image backup to the early hours of the morning before business begins. | Problem management, change management |
All ITSM practices are intricately related to other IT practices. As your problem management practice matures, keep improving the way it interacts with other practices to support healthy, business-oriented service delivery.
Up next:
With a clear distinction between incident management and problem management, and a clear view of how problem management relates to its supporting ITSM practices, it's time to move on to the different approaches to practicing problem management.
Assess your incident response readiness to kick-start your problem management journey
The zeroth step in the journey towards proactive problem management is establishing a robust incident management process in your IT environment. Discover how Zoho, our parent company, handles the spectrum of incidents thrown at it year over year and assess your incident management readiness at an enterprise scale.
Download a free copy of our incident management handbook and a best practice checklist to review your problem management solution.
- Problem management feature checklist
- IT incident management handbook
Frequently asked questions:
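1. How do problem management and incident management differ?
Incident management restores an interrupted service as quickly as possible, often with a workaround, so that business operations can continue. Problem management investigates the underlying root cause of one or more incidents and works toward a permanent solution, or documents a known error when no feasible fix exists, so that the incidents do not recur.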
2. How are incident, problem, and change management interlinked?
These three ITSM processes form a loop for continuous improvement. Incident management tackles individual disruptions (like a website outage). Problem management then investigates the root cause (such as faulty code in the website). Its findings can trigger a change request, for example to patch the faulty code. Change management ensures the fix is implemented smoothly and minimizes the risk of further disruptions. Working together effectively, these processes keep IT services reliable and improve them over time.
3. What is the difference between change management and problem management?
Problem management and change management are two important aspects of ITSM. Problem management is like a detective that investigates the root cause of recurring incidents. For example, if there is an ongoing issue with slow network performance, problem management might trace it to faulty network equipment. Change management then takes over and implements the solution based on problem management's findings. This may involve seeking approval to replace the faulty equipment, scheduling the change within a maintenance window, and ensuring a smooth transition to the new equipment. By working together, change management and problem management ensure that identified problems are addressed effectively and that the risk of future disruptions is minimized.