Three techniques to help IT teams handle security incidents better
April 29 · 07 min read
Cyberattacks hit businesses every day. In recent years, these attacks have increased tenfold as hackers exploit vulnerable IT systems. Managing these security threats can feel like an endless loop: you solve one problem, and another pops up.
At ManageEngine, we've faced many threats over the years and fine-tuned our approach to IT security. We focus on improving our methods and overall information security instead of constantly operating in firefighting mode. It's helped us learn from our mistakes and solidify our IT security framework.
As hackers keep finding new ways to infiltrate IT systems, here are three techniques you can use to defend those systems continually.
Establish comprehensive controls
We take charge of IT security events from our security operations center (SOC). Some of these events evolve into security incidents. Here's how we approach them:
In the drafting phase, we establish controls so that we can take charge of an incident and are in the best position to handle it. The SOC team collects information (like events and logs) from multiple sources. Our controls draw on this information to identify security-related issues, and our approach to an incident depends on the issues they surface.
Let's use an example of firewall access. The SOC team stores and monitors firewall logs regularly. When they're alerted to failed attempts to log in to the firewall console, they notify the incident management (IM) team. The IM team logs this incident and performs a preliminary assessment with the SOC team. Then, the IM team raises the issue with our network operations center (NOC) team. The NOC team resolves this issue.
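The alerting step in this example can be sketched in a few lines of Python. This is a minimal illustration, not our SOC tooling: the log line format, the `failed_login_sources` function, and the threshold of three attempts are all assumptions made for the example, and real firewall logs differ by vendor.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for illustration only.
FAILED_LOGIN = re.compile(
    r"console login failed .*src=(?P<src>\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_login_sources(log_lines, threshold=3):
    """Return source IPs with at least `threshold` failed console logins."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("src")] += 1
    return {src: n for src, n in counts.items() if n >= threshold}


sample_log = [
    "2024-04-29T10:00:01 fw01 console login failed user=admin src=203.0.113.7",
    "2024-04-29T10:00:05 fw01 console login failed user=admin src=203.0.113.7",
    "2024-04-29T10:00:09 fw01 console login failed user=root src=203.0.113.7",
    "2024-04-29T10:01:00 fw01 console login failed user=ops src=198.51.100.4",
    "2024-04-29T10:02:00 fw01 console login ok user=ops src=198.51.100.4",
]

# Sources returned here would be handed to the IM team for logging.
suspects = failed_login_sources(sample_log)
```

A single failed attempt is usually a typo, so the sketch only flags sources that cross the threshold. The right threshold depends on your environment.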
Likewise, the SOC team scans external IP addresses for vulnerabilities on a fixed schedule. When the scans reveal deviations, the SOC team works with other framework teams to fix them.
In summary, our SOC team implements the following security controls:
Controls from syslogs
Authentication and command history logs help us find who accessed our servers and executed problematic commands.
Controls from firewall logs
Unauthenticated access, rule changes, etc. are monitored and trigger alerts.
Controls for file integrity
The integrity of files across all production servers is monitored and triggers alerts.
Controls for applications
These are application-specific to help individual application teams. The SOC team performs a preliminary assessment and works with the application teams to resolve issues.
These controls improve our analysis phase significantly. This phase is more straightforward and reliable when our security controls are comprehensive.
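To make one of these controls concrete, here is a minimal sketch of a file integrity check: record a hash for each monitored file, then report files that have changed or disappeared. The function names and the choice of SHA-256 are assumptions for illustration. A production control would also protect the baseline itself and run on a schedule.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the known-good hash of every monitored file."""
    return {str(p): sha256_of(p) for p in paths}


def integrity_violations(baseline):
    """Compare current files with the baseline and list what changed."""
    violations = []
    for path, expected in baseline.items():
        if not Path(path).exists():
            violations.append((path, "missing"))
        elif sha256_of(path) != expected:
            violations.append((path, "modified"))
    return violations
```

Any non-empty result from `integrity_violations` would be the trigger for an alert.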
Create a blueprint to analyze incidents
Next, let's look at an example of an attack on one of our services. A particular user (in this case, the attacker) created multiple instances under their account from multiple IP addresses. This event registered multiple unique IDs in our database for the attacker. Over the following days, the instances doubled from 1,000 to 2,000 and continued to grow exponentially.
At one point, it went up to 14,000 instances per day. The problem was that each instance has a 10-digit key associated with it, and at this rate, our 10-digit keys would soon have been exhausted. Moreover, each instance invited 30-50 secondary users, and each user received emails. Overall, between 5,000 and 25,000 emails were triggered every hour. It became a massive attack on our services, and we had to intervene.
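A rough calculation shows why the growth pattern, rather than the daily count alone, made key exhaustion a real risk. The sketch below rests on two assumptions that are ours for illustration: the keys are decimal, giving 10^10 possible values, and the daily count keeps doubling from 14,000. The function name is hypothetical too.

```python
KEY_SPACE = 10 ** 10  # assumes 10 decimal digits per key


def days_until_exhausted(first_day_count, daily_growth=2.0, key_space=KEY_SPACE):
    """Count the days until cumulative instances use up the key space."""
    used, today, days = 0, first_day_count, 0
    while used < key_space:
        used += today
        today = int(today * daily_growth)
        days += 1
    return days


doubling = days_until_exhausted(14_000)                    # 20 days
steady = days_until_exhausted(14_000, daily_growth=1.0)    # 714,286 days
```

Under these assumptions, a steady 14,000 instances per day would take centuries to use up the keys, while daily doubling would do it in about three weeks.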
Here's how we analyzed the issue using a stage-by-stage blueprint:
Stage 1 (based on evidence)
The numbers were the first piece of evidence. We also noticed that the attacker created each instance using valid parameters. Each secondary user involved also received abusive content from valid email addresses. We gathered this evidence from our abuse and spam team, who received complaints from users.
Using this evidence, we understood that the attacker wanted to keep inflating the numbers using valid parameters. To combat this, we established a limit on the number of instances per day. Since the attacker also used valid email addresses, we contacted the corresponding email service provider to spot these email accounts.
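A daily limit like the one described here can be sketched as a simple counter. The class name, the per-account scope, and the in-memory storage are assumptions for illustration. A real service would keep these counts in shared storage and choose the scope (account, IP address, or both) to fit the attack.

```python
from collections import defaultdict
from datetime import date


class DailyInstanceLimiter:
    """Allow at most `max_per_day` instance creations per account per day."""

    def __init__(self, max_per_day):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)

    def allow(self, account_id, day):
        key = (account_id, day)
        if self._counts[key] >= self.max_per_day:
            return False  # over the cap: reject the creation request
        self._counts[key] += 1
        return True


limiter = DailyInstanceLimiter(max_per_day=2)
today = date(2024, 4, 29)
results = [limiter.allow("acct-1", today) for _ in range(3)]
```

Here the third request on the same day is rejected, while requests from other accounts or on the next day are unaffected.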
Stage 2 (based on motive)
As the attacker used our domains to send spam content, some of our genuine emails landed in spam. We understood that this could have been one of their motives. We examined further and noticed that the attacker found a work-around to create multiple email addresses using a pattern. We spotted that pattern and blocked the accounts containing that pattern.
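Pattern-based blocking of this kind can be sketched with a regular expression. The actual pattern the attacker used is not described here, so the one below (a fixed prefix followed by digits, such as `offer0001`) is purely hypothetical, as are the function names.

```python
import re


def build_abuse_pattern(prefix):
    """Hypothetical pattern: a known prefix followed only by digits."""
    return re.compile(rf"{re.escape(prefix)}\d+")


def matches_abuse_pattern(email, pattern):
    """Check the local part (before the @) against the abuse pattern."""
    local, sep, domain = email.lower().rpartition("@")
    if not sep or not local or not domain:
        return False  # not a usable address, nothing to match
    return pattern.fullmatch(local) is not None


pattern = build_abuse_pattern("offer")
flagged = [
    addr
    for addr in ["offer0001@example.com", "alice@example.com", "offerings@example.com"]
    if matches_abuse_pattern(addr, pattern)
]
```

Keeping the pattern narrow matters: a broad rule such as "any address ending in digits" would also block genuine users.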
Stage 3 (based on infrastructure)
The attacker used different IP addresses, so we blocked them and monitored the list closely. They also used valid email addresses for secondary users. Once we identified the pattern they used to achieve this, we established firewall controls and created programs to limit the number of instances. We introduced more checks at the creation stage and limited the number of emails.
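The IP blocking step can be sketched with Python's standard `ipaddress` module. The `IpBlocklist` class is a hypothetical helper for illustration, and the addresses come from ranges reserved for documentation. In practice, this logic usually lives in firewall rules rather than application code.

```python
import ipaddress


class IpBlocklist:
    """Keep a list of blocked networks and test addresses against it."""

    def __init__(self):
        self._networks = []

    def block(self, cidr):
        # strict=False accepts a host address such as 203.0.113.7/24.
        self._networks.append(ipaddress.ip_network(cidr, strict=False))

    def is_blocked(self, address):
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in self._networks)


blocklist = IpBlocklist()
blocklist.block("203.0.113.0/24")   # a whole range used by the attacker
blocklist.block("198.51.100.25")    # a single address
```

Blocking a range is faster than chasing individual addresses, but it needs the close monitoring mentioned above so that genuine users in the same range are not locked out for long.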
This stage-wise analysis helped us contain the damage the attacker tried to inflict. It also helped us establish measures to respond to the attacker while not affecting other genuine users. We resolved the incident, contained the damage, and placed controls so that a similar spam attack wouldn't occur again. We also shared this knowledge with other application teams.
We have similar blueprints to handle DDoS, brute-force, and other attacks. If it suits your organization, you can create such blueprints as well.
Use a formula to contain the impact of an incident
There have been cases where we've had to jump into damage control immediately due to a security incident. Consider a scenario in which one of our product teams executed a fragment of incorrect code. The team had just concluded a promotional event, yet a series of emails about the event was sent out automatically to a set of users.
Once our incident management team learned about this incident, they worked with the product teams to contain the impact. We discovered that the flawed code triggered the emails. We immediately deployed a temporary fix to stop the emails. We also found the root cause and implemented permanent fixes.
Fortunately, we noticed it early, so the emails had only gone out to employees who signed up with test accounts. If we hadn't contained the situation, it could have led to a privacy breach and a loss of trust.
We have a formula, or system, that helps us with such incidents. Here are some highlights of that system:
Create a temporary fix
We educate the engineers on developing a temporary quick fix. We assist them with checklists to ensure that the solution works accurately. Managers are alerted when their team members execute a temporary fix.
Revert the problematic build
A build is a fragment of code that developers push to production servers. If a build has problems, knowing how to revert it should be common knowledge. Our service delivery teams work with developers to help them revert problematic builds and troubleshoot them.
Patch systems with hotfix builds
Our developers use hotfix updates to address the problem immediately if needed. We provide enough resources to help developers exercise caution.
Disable features
Sometimes, we may not be able to spot the exact erroneous code or come up with a temporary fix fast enough. In this case, we disable certain features to stop the impact and buy ourselves time to devise solutions. We also ensure users are aware of the situation.
Disconnect systems from the network
When we need to stop the impact immediately, we remove the problematic systems from our network. We also notify the affected users and work with them until we contain the incident and restore services.
Include remarks on the problematic endpoints
We maintain a database of security aspects related to our systems. We insist that developers record technical details about problematic endpoints there, which speeds up troubleshooting when we look for issues.
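Of the steps above, disabling a feature is the easiest to show in code. The sketch below is a minimal in-memory kill switch, assuming a hypothetical `FeatureFlags` class and an `event_emails` feature name. A real system would persist the flags, audit changes, and notify users.

```python
class FeatureFlags:
    """In-memory feature switches, for illustration only."""

    def __init__(self, enabled):
        self._enabled = set(enabled)
        self.disabled_reasons = {}

    def disable(self, feature, reason):
        # Record why the feature was switched off so it can be reviewed later.
        self._enabled.discard(feature)
        self.disabled_reasons[feature] = reason

    def is_enabled(self, feature):
        return feature in self._enabled


def send_event_emails(flags, recipients, send):
    """Send emails only while the feature is enabled; return the count sent."""
    if not flags.is_enabled("event_emails"):
        return 0
    for recipient in recipients:
        send(recipient)
    return len(recipients)


outbox = []
flags = FeatureFlags({"event_emails"})
before = send_event_emails(flags, ["a@example.com"], outbox.append)
flags.disable("event_emails", "incident: unintended promotional emails")
after = send_event_emails(flags, ["b@example.com"], outbox.append)
```

Checking the flag at the point where the side effect happens means the switch takes effect without a new build, which buys time to find the root cause.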
Security events are bound to challenge you. However, they can also transform you for the better. Facing these challenges helped us craft these techniques. If you want to design and implement techniques like these, you can start by creating a well-trained information security team. To learn more about how we handle cybersecurity incidents and other incidents in general, check out our IM handbook.