Domain squatting, typosquatting, and IDN homograph attacks are commonplace in phishing and other forms of social engineering. Attackers use these techniques to trick users into providing their credentials, distribute malware, harm an organization’s reputation, or otherwise maliciously impersonate a legitimate domain. We’ve discussed this topic before and have developed a unique use case with Swimlane to detect this malicious activity automatically.
Recently, we began to monitor domains related to coronavirus (COVID-19), knowing there would be an increase in traffic to research the outbreak, which could be exploited by bad actors. Even though not all of these domains are necessarily malicious or focused on spoofing (or typosquatting) techniques, we decided to apply this use case to identify any registered domains related to “corona” and “covid.” Over the last two weeks, we have seen 5,054 corona-related domains being registered.
Many of the 5,054 domains identified using Swimlane’s security orchestration, automation and response (SOAR) solution are focused on selling vaccines, test kits, supplies, resources, or otherwise attempting to take advantage of unsuspecting people for financial gain.
Given this information, I urge you to use extreme caution when interacting with any COVID-19-related domain and to be suspicious of anything you might receive via email, SMS, social media, etc.
Even though these newly registered “corona” domains are not considered to be in the typical domain squatting categories (more below), I wanted to provide you with a general overview of the different “squatting” attacks. These various attacks—which will be referred to collectively as “squatting” in this article—are a family of attacks wherein a user is fooled into interacting with a legitimate-looking website with a legitimate-looking domain/URL. Any legitimate domain can be “squatted” with its clone disguised as a legitimate domain in several ways, including:
- Domain squatting: An actor simply registers a target’s predicted domain name before the target organization has a chance and holds onto it for a monetary or nefarious purpose.
- Typosquatting: An attacker registers a domain similar to the target domain in appearance, keyboard typo likelihood, or tweaked TLD, and skims traffic that people accidentally direct that way.
- IDN homograph attacks: Attackers register a domain that is visually similar or identical to a registered target domain by using Internationalized Domain Names (IDNs), which allow Chinese, Arabic, Korean, Amharic, and other non-Latin characters to be displayed in domain names. Some characters, like the Cyrillic “а,” appear identical to certain Latin letters, meaning “apple.com” (Latin “a”) and “аpple.com” (Cyrillic “а”) can resolve to entirely different servers, with end users none the wiser.
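To make the homograph example concrete, the following minimal sketch prints the first character of each name and the ASCII (Punycode) form that DNS actually resolves. It relies only on Python's standard library; the built-in `idna` codec follows the older IDNA 2003 rules and is used here purely for illustration, and the helper name `to_ascii` is an assumption of this example.

```python
import unicodedata

latin = "apple.com"        # every character is ASCII
spoof = "\u0430pple.com"   # first character is U+0430, CYRILLIC SMALL LETTER A

def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible (Punycode) form of a domain name."""
    return domain.encode("idna").decode("ascii")

for name in (latin, spoof):
    first = name[0]
    # Show the code point, its Unicode name, and the name as DNS sees it.
    print(f"U+{ord(first):04X} {unicodedata.name(first)} -> {to_ascii(name)}")
```

The two names look the same on screen, but the second one is encoded with an `xn--` prefix, so it is a different domain as far as DNS is concerned.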
The domains we will be looking at are specifically categorized as “scam” or “profiteering” domains. The sheer volume, more than 5,054 registrations in recent weeks, suggests that most of these domains are not official resources and should be considered untrustworthy.
The overall impact of these domains is unknown, but there have been multiple incidents of individuals receiving COVID-19-related phishing attacks in the past week. These incidents range from traditional malicious URLs to maldocs (malicious documents) to SMS attacks. The National Cyber Security Centre stated they have “seen an increase in the registration of web pages relating to the Coronavirus suggesting that cyber criminals are likely to be taking advantage of the outbreak,” which aligns with our findings.
On February 16, 2020, the World Health Organization (WHO) announced that it was seeing criminals disguising themselves as the WHO in attempts to steal money and/or sensitive information.
Again, the impact of these domains is still unknown, but it is likely that many of them are intended for, or will be used for, malicious purposes. Part of our domain squatting use case application is the ability to send automated take-down notifications to registrars and hosting providers alike. As we continue to monitor activity, we will report any malicious domains to their respective registrars.
To help with the detection and investigation of potential COVID-19-related domains, we are providing a GitHub repository that contains registered domains from most gTLDs (domain name extensions). Additionally, we are providing a second dataset that is specific to the terms we monitor and will be updated as needed.
For each of these terms (and their confusables), we are providing two JSON files that contain the same data structured in different ways, plus a plain-text blacklist:
- domains_by_ip.json: These JSON files are keyed by domain name, and each value is the list of IP addresses registered for that domain.
- ips_by_doman.json: These JSON files are keyed by IP address, and each value is a list of domains associated with that IP address.
- master_blacklist.txt: This file contains a blacklist of all terms and their identified domains, except for domains ending in .gov. More than likely you should blacklist all of these domains, but use the list at your own discretion.
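Because the two JSON layouts hold the same data, one can be derived from the other. The sketch below shows the idea with a made-up sample in the domain-to-IPs layout; the domains, the documentation-range IP addresses, and the function name `invert` are all hypothetical, and the real files in the repository may differ in detail.

```python
import json
from collections import defaultdict

# Hypothetical sample in the "domain -> IP addresses" layout described above.
sample = json.loads("""
{
  "corona-example-one.test": ["192.0.2.10"],
  "corona-example-two.test": ["192.0.2.10", "198.51.100.7"],
  "covid-example.test": ["198.51.100.7"]
}
""")

def invert(domain_to_ips: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the "IP -> domains" layout from the "domain -> IPs" layout."""
    ip_to_domains = defaultdict(list)
    for domain, ips in domain_to_ips.items():
        for ip in ips:
            ip_to_domains[ip].append(domain)
    return {ip: sorted(domains) for ip, domains in ip_to_domains.items()}

by_ip = invert(sample)

# IPs that host more than one flagged domain may deserve a closer look.
shared = {ip: domains for ip, domains in by_ip.items() if len(domains) > 1}
print(json.dumps(shared, indent=2))
```

The IP-keyed view is handy for pivoting from one suspicious domain to others on the same host, while the domain-keyed view suits lookups for a single name.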
You can find this dataset, which will be updated and archived daily, in the following GitHub repository: https://github.com/swimlane/deepdive-domain-data.
The first step in monitoring these potential domains is straightforward: we need to find domains containing the word “corona” or “covid.” The real challenge is finding domains that use IDN homograph names or simple character swaps, such as substituting 0 for O, in every possible combination. Our use case contains this logic out of the box.
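To give a feel for how quickly those combinations multiply, here is a minimal sketch that enumerates look-alike spellings of a term. The substitution table is a tiny, hand-picked assumption for illustration and is not the list Swimlane uses; real confusable data is far larger.

```python
from itertools import product

# Tiny, hand-picked substitution table (an assumption for illustration only).
LOOKALIKES = {
    "o": ["o", "0", "\u043e"],  # Latin o, digit zero, Cyrillic o
    "a": ["a", "\u0430"],       # Latin a, Cyrillic a
    "c": ["c", "\u0441"],       # Latin c, Cyrillic es
    "i": ["i", "1", "l"],       # Latin i, digit one, lowercase L
}

def variants(term: str) -> set[str]:
    """Return every spelling of `term` reachable through the table above."""
    choices = [LOOKALIKES.get(ch, [ch]) for ch in term.lower()]
    return {"".join(combo) for combo in product(*choices)}

corona = variants("corona")
print(len(corona))           # 2 * 3 * 1 * 3 * 1 * 2 = 36 spellings
print("c0r0na" in corona)    # True
```

Even with four table entries, a six-letter term yields 36 spellings, and the count grows multiplicatively with every additional substitutable letter.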
Many of the domains we have identified are not active, but the domain name has been purchased and assigned via an ICANN-accredited registrar. The problem is that, even as these registrations age, any one of them could be hosting malicious code today, tomorrow, or five months from now. Being proactive and checking these domains on a regular basis is therefore critical to preventing malicious attacks.
Automation to the rescue!
Swimlane can ingest the list of newly registered domains on a daily basis and compare them against a list of domains you wish to monitor. Three comparisons are made between each newly registered domain and each of the domains you wish to monitor. The comparisons are:
- CONTAINED_IN: The newly registered domain contains the monitored domain (e.g., “coronaid.net” contains the monitored term “corona”).
- CONFUSABLE: The newly registered domain resembles the monitored domain via an IDN homograph attack or via “confusable” characters, such as lowercase “L” for capital “i” or zeroes for Os.
- LEVENSHTEIN DISTANCE: The newly registered domain is very similar to the monitored domain, save that the text is transformed slightly. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another. If the strings are similar enough, Swimlane will register a hit.
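The three comparisons above can be sketched in a few lines of Python. This is not Swimlane's implementation: the confusable table `FOLD`, the `max_distance` threshold, the function names, and the choice to compare only the first label of the domain are all assumptions made for this example.

```python
# Hypothetical confusable map: each look-alike folds to one canonical letter.
FOLD = str.maketrans({
    "0": "o", "1": "l", "i": "l",                  # digits and i/l look-alikes
    "\u0430": "a", "\u043e": "o", "\u0441": "c",   # Cyrillic a, o, es
})

def skeleton(text: str) -> str:
    """Lowercase the text and fold confusable characters together."""
    return text.lower().translate(FOLD)

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def classify(new_domain: str, monitored: str, max_distance: int = 1) -> list[str]:
    """Return the names of the comparisons that the new domain triggers."""
    label = new_domain.lower().split(".")[0]  # compare the first label only
    hits = []
    if monitored in label:
        hits.append("CONTAINED_IN")
    elif skeleton(monitored) in skeleton(label):
        # Only reported when the plain substring check did not already match.
        hits.append("CONFUSABLE")
    if levenshtein(label, monitored) <= max_distance:
        hits.append("LEVENSHTEIN DISTANCE")
    return hits

for domain in ("coronaid.net", "c0r0na-test.com", "corono.org", "example.com"):
    print(domain, classify(domain, "corona"))
```

Folding both the monitored term and the candidate to the same skeleton avoids enumerating every possible spelling, and a small distance threshold keeps unrelated short names from matching.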
Once Swimlane has identified potential squatting domains, it begins attempting to take snapshots of those domains. Once a day Swimlane will retrieve the following information:
- SSL certificates
- Server information
- WHOIS information
- Screenshot and contents of the website
A note about the screenshot and contents of the website:
If the page is sufficiently similar to its previous snapshot, no additional action is needed. But if a web hosting domain parking page suddenly turns into a full webpage, or the page changes substantially in any other way, your analysts will be alerted to investigate for similarity to the monitored domain.
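One simple way to approximate this kind of change detection is to compare the text of consecutive snapshots. The sketch below uses `difflib.SequenceMatcher` from Python's standard library; the 0.8 threshold, the function names, and the sample pages are assumptions for illustration, and the source does not say how Swimlane measures similarity.

```python
from difflib import SequenceMatcher

def similarity(previous_html: str, current_html: str) -> float:
    """Return a ratio between 0.0 and 1.0 of how alike two snapshots are."""
    return SequenceMatcher(None, previous_html, current_html).ratio()

def needs_review(previous_html: str, current_html: str, threshold: float = 0.8) -> bool:
    """Flag the domain when the new snapshot differs substantially from the old one."""
    return similarity(previous_html, current_html) < threshold

# Made-up snapshots: a parking page, a trivial edit of it, and a new storefront.
parked = "<html><body><h1>This domain is parked</h1></body></html>"
parked_again = "<html><body><h1>This domain is parked.</h1></body></html>"
storefront = "<html><body><h1>Buy test kits now</h1><form>Card number</form></body></html>"

print(needs_review(parked, parked_again))  # False: a one-character change
print(needs_review(parked, storefront))    # True: the page changed substantially
```

A text ratio like this says nothing about visual similarity, so in practice it would complement, not replace, a comparison of the screenshots themselves.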
Hoping this helps. Stay safe, friends!
*** This is a Security Bloggers Network syndicated blog from Swimlane (en-US) authored by Josh Rickard. Read the original post at: https://swimlane.com/blog/identify-malicious-domains-using-soar/