Malware hunting with ELK

Give me a fulcrum and I’ll lift the world. Archimedes

ELK, or the Elastic Stack, is a set of open source tools that enables the collection and analysis of large amounts of data (some deployments handle data volumes on the order of a few petabytes). One of the most common uses of ELK is the collection and analysis of logs from various sources.

A slightly different use case for the ELK stack would be to collect logs and alerts from security and IT related devices and applications for the purpose of detecting and investigating cyber related attacks.

In this post I will not focus on the recommended installation procedure for these tools but rather on ways to detect common cyber attacks by analyzing the logs of various systems with the ELK stack. Although this article focuses on a number of specific attacks, this is not necessarily my preferred approach: an attempt to locate, for example, the WannaCry attack using ELK does not guarantee success in detecting the Petya attack. The approach I recommend is therefore to characterize, using as many indicators as possible, the normal and reasonable behavior of the organization, its users and its networks, and then to define an attack as a significant deviation from those measures. In my opinion, this is the only way to detect the next attack – even before it has a cool name like WannaCry.

I think it is important to emphasize that, in my opinion, the techniques presented in this post are effective in detecting the attacks presented but may also generate some false positives that make the investigation harder. Towards the end of the article I will propose a solution that I think can be successful in dealing with this problem. Also, not all of the techniques listed here were tested by me for a long period of time, so I can’t really determine how effective they will be.

Here are some common types of cyber attacks and their unique characteristics:

Ransomware
1. According to published research (1), some ransomware families such as CryptoLocker, Critroni, CTB Locker and others connect to C&C servers as part of their operation (some before encryption starts and some after it), either to report that another client has been infected or to request an encryption key for the infected system. In the first, naive versions of ransomware this communication ran over HTTP (sometimes with the HTTP body encrypted using algorithms like RC4 and AES, sometimes in clear text); more recent versions communicate over HTTPS or Tor to make detection harder.
2. According to research (2) by the Elastic company, some of the WannaCry derivatives make many connections via port 445 (SMB) to other computers on the organizational network (servers and workstations) as well as to public IPs. Also, according to the same research, WannaCry runs processes that aren’t usually found on a normal organization’s endpoints, such as “attrib +h” (using tasksche.exe) or “cmd.exe /c vssadmin delete shadows”. In addition, according to the same study, WannaCry looks up domains (i.e. it attempts to resolve them to an IP address and then sends an HTTP request to them) with names such as “www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com” or “www.ifferfsodp9ifjaposdfjhgosurijfaewrwergwea.com”. In the same context it’s worth mentioning that, according to Ian Thompson of The Register (7), Petya, NotPetya and many other malware families also use SMB requests on port 445 to spread throughout the organization’s network.
3. According to research by MalwareBytes (4) on the Locky ransomware, it usually arrives as an MS Office file (e.g. a .docx file) attached to an email as part of a phishing campaign. When the file is opened it downloads an .exe file, which runs as svchost.exe from the %TEMP% folder. The ransomware also communicates with a hardcoded list of public IPs; in some versions the list contains 3 IP addresses. In addition, it uses a DGA (Domain Generation Algorithm) to dynamically generate the domain names that the malware should connect to.
Phishing
1. According to research conducted by TrendMicro, apps for the Android platform are identified by a name that looks like a domain name. For example, the name that identifies the Facebook app is “com.facebook.katana”. This name is visible as part of the URL. For example, here is the complete URL for the same Facebook app:
  https://play.google.com/store/apps/details?id=com.facebook.katana
  The same research also concludes that this name doesn’t have to be universally unique, only unique on the device on which it is installed. For example, Adobe ceased development of the Flash app for Android (which was available under the name “com.adobe.flashplayer”) back in 2012, and since then the app no longer exists on the Google Play store. Various attackers have therefore created malicious apps under similar names like “con.adobe.flash.player”, “com.adobe.flash.player” and others.
Common attackers tools
1. WMImplant – According to the article (5) by FireEye and the article (6) written by Ionut Arghire and published by SecurityWeek, this tool, written in PowerShell, allows remote access to the machine it’s running on (a RAT) by using the WMI service, which is available in all recent versions of Windows. The tool allows the attacker to take full control of the victim’s computer: to launch processes, shut down the computer, etc. For that reason it is very popular among hackers who are looking for a way to gain control over Windows workstations or to write malware that uses it to propagate laterally throughout the organization. Commands run by this tool are obfuscated (like this: ‘Inv`oke-Ex`pression’), which might make them difficult to locate in the system log.
2. Remote Code Exec via Services.msc – According to an article on the CyberWarDog website (8) one of the common ways to start processes on remote computers is by adding, starting, stopping and removing Windows Services by using the ability to do so remotely. Tools such as PsExec use this behavior for their normal operation. According to articles (8) and (9) it seems that tools that are working like this would also connect to the ADMIN$ hidden administrative share and update the services list on the victim’s computer’s registry.
3. Mimikatz – This tool(9) allows the attacker to conduct several security tests for Windows such as extraction of passwords, PIN codes and Kerberos tickets from the system’s memory, execution of Pass-The-Hash attacks and others. This tool can be run both from the disk and directly from the system’s memory.

In my opinion, there’s a nice way (although not perfect…) to detect such threats in near real time, and of course also retroactively, by collecting enough accurate data about the way the servers, workstations, peripheral equipment and the organization in general work. In my opinion, the larger the organization, the better the chances of achieving a lower rate of false positives and higher detection accuracy.

In the next lines I will try to demonstrate my general approach for detecting such events. Of course, it is possible to implement such strategy in many ways which might be better than the way I demonstrate here but I think that the basic principles remain the same.

Traditional systems rely mostly on signature-based detection of known threats (for example, file hashes or IP addresses that are linked to known malware), and other systems rely on behavioral signatures of known threats (it downloads a file + tries to access IP X + creates a file Y + does Z = malware type A). Unlike both, my idea is to measure normal behavior using as many metrics as possible and to define a threat as a deviation from it in some of those metrics. It is important to clarify at this point that such a system will not provide named alerts. It won’t tell you “WannaCry detected” but rather something like this: “An abnormal number of connections between workstations has been detected, the number of outbound SMB connections is abnormally high, and there are signs of DGA-based domain names being used, all within 30 minutes.” It would be the responsibility of the user, the analyst, to categorize this threat as WannaCry. However, towards the end of this article I will suggest a way in which this process can also be automated (to some extent). In the next few lines I’ll try to demonstrate how to detect the attacks and tools mentioned above by using anomaly detection tools and relevant big-data tools.
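As a minimal illustration of defining a threat as a deviation from normal behavior, here is a sketch in Python that scores a series of per-interval counts against their own median. The metric (outbound port 445 connections per 30-minute window), the sample numbers and the threshold of 3.5 are all assumptions made for the example; in practice the counts would come from an aggregation over the data stored in Elasticsearch, and a real deployment would probably use a more capable anomaly detection tool.

```python
from statistics import median


def robust_zscores(counts):
    """Score each value by its distance from the median, in units of MAD."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    # 1.4826 makes the MAD comparable to a standard deviation for normally
    # distributed data; fall back to 1.0 when the series is completely flat.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(c - med) / scale for c in counts]


def find_anomalies(counts, threshold=3.5):
    """Return the indices of intervals that deviate strongly from normal."""
    return [i for i, z in enumerate(robust_zscores(counts)) if abs(z) > threshold]


# Hypothetical data: outbound port-445 connections per 30-minute window.
smb_outbound = [4, 6, 5, 3, 7, 5, 4, 6, 250, 5]
print(find_anomalies(smb_outbound))  # prints [8]
```

The median and MAD are used here instead of the mean and standard deviation so that the spike itself does not inflate the baseline it is compared against.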

Data collection:

Sysmon – The Sysmon tool is one of the tools from the Sysinternals website that was created by the developers Mark Russinovich and Bryce Cogswell(12). This tool can log events to the Windows event log based on some criteria. We’ll use this tool to gather information about the behavior of Windows systems.

Winlogbeat – Tool(13) developed by the Elastic company that can pull events from the Windows event log and send them to Logstash or Elasticsearch for search and analysis. This tool can compress the logs before they’re sent to Logstash to minimize bandwidth usage.

NXLog – Tool(14) for collecting events from log files or from the Windows event log to various destinations (and also enrich them or convert them from one format to another on-the-fly).

Logstash – Tool(15) developed by the Elastic company that can pull events from various sources, transform and enrich them in various ways and then send them to various destinations. One of those destinations is, as might be expected, Elasticsearch.

By using Sysmon we can get Windows to log events that are of interest to us to its event log. By using Winlogbeat we can send these logged events to Elasticsearch. Another option would be to use NXLog that would be installed on the Windows servers (instead of Winlogbeat) and will transfer the events via TCP or UDP in any format to Logstash which will send them to Elasticsearch.

It’s important to understand that the more detailed the information sent to Elasticsearch is, the better the chances of finding threats and cyber attacks. This means that if we can collect, in addition to the logs from the workstations and servers, the logs from firewalls and other security related devices and applications as well, all the better.

Analysis:

Ransomware:
1. According to the data I mentioned earlier in this post, many ransomware attacks use Tor as the platform for communicating with their C&C servers. Tor not only encrypts the traffic between nodes on the Tor network but also disguises it as normal HTTPS traffic to make it harder to detect. However, as mentioned in the article (16) listed below, there is a way, based on a statistical algorithm, to detect this type of communication. Also, as mentioned in an article published by Fortinet (17), the company’s firewall product (FortiGate) can detect and block Tor traffic. Even if your organization uses such systems to block these connection attempts, you can still use that algorithm to detect workstations that are trying to use the Tor network and therefore might have been compromised. In addition, there are sites like the one listed here (18) that publish a list of the IPs of known Tor relays, which you can check against the data in your Elasticsearch cluster. This is indeed a form of signature based detection, but it is a generic signature for detecting Tor traffic and not the signature of a specific malware.
2. As mentioned above, many types of malware such as WannaCry and NotPetya will try to spread throughout the organization by using the EternalBlue exploit, which targets the SMB v1 protocol. The amount of SMB traffic between clients and servers will therefore increase dramatically and, probably more importantly, so will the amount of outbound SMB (port 445) traffic. Measuring this metric regularly and detecting anomalies in its behavior may help in detecting such threats.
3. As mentioned above, malware in this category (but not only in this category) will start processes on the infected computers. Those processes are either run from unusual places such as “%TEMP%” or are unlikely to be run by a user, such as “cmd.exe /c vssadmin delete shadows”. Collecting information about the processes that are running on every workstation or server is therefore critical, because it lets us detect such threats by looking for command lines or executable files that are extremely rare throughout the organization.
4. As mentioned above, attacks of this type will often use dynamically generated domain names or very long and randomly generated domain names, either to contact their command and control servers or to test for the existence of a kill switch, for example “www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com”. Such addresses will mostly be longer than “normal” domain names and will include many letter combinations that are quite rare in “normal” addresses. Therefore, in my opinion, it is possible to detect access to such addresses by measuring the length or letter frequencies of the requested domains.
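The domain-name heuristic from the last item can be sketched in Python as follows. This is only a rough illustration: it uses Shannon entropy as a simple stand-in for a proper letter-frequency model, the function names are hypothetical, and the thresholds (`min_length=20`, `min_entropy=3.5`) are assumptions that would need to be tuned against your own DNS logs.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Entropy of the string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def longest_label(domain):
    """Return the longest dot-separated label of a domain name."""
    return max(domain.lower().rstrip(".").split("."), key=len)


def looks_generated(domain, min_length=20, min_entropy=3.5):
    """Flag a domain whose longest label is both very long and high-entropy.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    label = longest_label(domain)
    return len(label) > min_length and shannon_entropy(label) > min_entropy


print(looks_generated("www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com"))  # True
print(looks_generated("www.google.com"))                                     # False
```

Requiring both conditions keeps long but ordinary names (for example a long company name) from being flagged on length alone, at the cost of missing short generated names.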
Phishing:
1. As mentioned above, because apps are identified by names, attackers often publish apps with a similar name (for example “com.adobe.flash.player” instead of “com.adobe.flashplayer”). The fuzzy query feature of Elasticsearch can therefore help in detecting app names that are very similar to the names of other apps. If we have information about downloaded apps or, by extension, domain names (from DNS logs for example), it would be possible to generate fuzzy queries based on the top X most frequently accessed domain names (or app names) and look for similar values in the list of the top X most rarely accessed domain names (or app names). For example, if many users in your organization access the website www.artifex.co.il, such a mechanism will look for similar domain names in the list of the most rarely accessed domain names and will find (for example) access to the domain www.artifix.co.il. In my opinion, implementing such mechanisms will help in detecting many phishing attacks.
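To make the idea concrete, here is a small Python sketch of the same comparison done outside Elasticsearch. It computes a plain Levenshtein edit distance (the kind of similarity measure that fuzzy queries are based on) between every rarely accessed name and every popular name. The function names, the sample lists and the `max_distance=2` cut-off are assumptions for the example; in a real setup the two lists would come from aggregations over your DNS or app-download logs.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def find_lookalikes(popular, rare, max_distance=2):
    """Pair each rare name with any popular name it closely resembles."""
    hits = []
    for rare_name in rare:
        for popular_name in popular:
            distance = edit_distance(rare_name, popular_name)
            # A distance of 0 means the same name, which is not a lookalike.
            if 0 < distance <= max_distance:
                hits.append((rare_name, popular_name, distance))
    return hits


popular_domains = ["www.artifex.co.il"]
rare_domains = ["www.artifix.co.il", "www.example.org"]
print(find_lookalikes(popular_domains, rare_domains))
# prints [('www.artifix.co.il', 'www.artifex.co.il', 1)]
```

The same function works for app names: “com.adobe.flash.player” is one edit away from “com.adobe.flashplayer”.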
Frequently used PT/Hacking tools:
1. WMImplant
  1. As mentioned above, tools such as this will start processes on the infected computer that are not normally started by the user like ‘Inv`oke-Ex`pression’ (because of the obfuscation). Therefore, collecting information about processes that are running on every server/workstation is critical for detecting such attacks by looking for extremely rare files or command lines.
2. Remote Code Exec via services.msc
  1. As mentioned above, tools that try to add services to the system to gain persistence or to allow remote code execution will have to connect to the hidden share ADMIN$ and update the list of services in the system’s registry. If we have logs that describe the SMB activity in the organization, we can look for connections to that share. If we have logs from the Windows Security event log, we can look for events with event ID 4697, titled “A service was installed in the system” (the System event log records the same action as event ID 7045), and alert when it happens.
3. Mimikatz
  1. If we started Sysmon with the right configuration we can get an event (Sysmon event ID 10, ProcessAccess) whenever a process accesses another process’s memory. These events may indicate that a process is trying to read another process’s credentials – exactly Mimikatz’s MO.
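The "extremely rare command line" check that appears above for both ransomware and WMImplant can be sketched in Python like this. It is a simplified illustration: the `(host, command_line)` input shape, the function names and the `max_hosts=1` cut-off are assumptions, and stripping every backtick is a crude way of undoing obfuscation such as ‘Inv`oke-Ex`pression’ that only serves to group equivalent commands together.

```python
def normalize(command_line):
    """Lower-case, drop PowerShell backticks and collapse whitespace."""
    cleaned = command_line.replace("`", "").lower()
    return " ".join(cleaned.split())


def rare_command_lines(events, max_hosts=1):
    """Return normalized command lines seen on at most max_hosts hosts.

    events is an iterable of (host, command_line) pairs, e.g. taken from
    process-creation events collected by Sysmon.
    """
    hosts_per_command = {}
    for host, command_line in events:
        hosts_per_command.setdefault(normalize(command_line), set()).add(host)
    return sorted(cmd for cmd, hosts in hosts_per_command.items()
                  if len(hosts) <= max_hosts)


# Hypothetical process-creation events from three workstations.
events = [
    ("ws01", "C:\\Windows\\explorer.exe"),
    ("ws02", "C:\\Windows\\explorer.exe"),
    ("ws03", "C:\\Windows\\Explorer.exe"),
    ("ws02", "cmd.exe /c vssadmin delete shadows"),
    ("ws03", "powershell Inv`oke-Ex`pression $payload"),
]
for command in rare_command_lines(events):
    print(command)
```

Counting distinct hosts rather than raw occurrences keeps a single noisy machine from making its own unusual commands look common.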

At this point you might wonder how many false-positives that kind of a strategy might generate. Well, to be honest – you might be right and the amount of false positives would be too high and will make it hard to detect threats effectively.

After reading the book “On Intelligence” by Jeff Hawkins and listening to many of his lectures and presentations (19, 20), I came to the conclusion that the algorithm developed by his company, Numenta, would probably be capable of filtering out the “noise” and separating the wheat from the chaff (although, to be honest, I haven’t tried it yet). If you would like more info on this subject please leave me a comment below and I’ll go into more detail in one of the next articles.

Bibliography:
