
A View into Top Level Domain (TLD) Abuse
Data science and security research teams in the carrier organization at Akamai process massive volumes of DNS queries every day to detect and track malicious activity. The data is live-streamed from DNS resolvers deployed in diverse service provider networks in every region of the world. Providers who supply the data anonymize it using Cryptography-based Prefix-preserving Anonymization (Crypto-PAn), a well-known tool.
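To illustrate what "prefix-preserving" means, here is a minimal Python sketch of the general idea. It is not Crypto-PAn itself: the real tool derives its per-bit decisions from a block cipher, while this toy version swaps in HMAC-SHA256 from the standard library, and it assumes the anonymized values are IPv4 addresses. The function names, the key, and the sample addresses (taken from the documentation range 192.0.2.0/24) are all hypothetical.

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix_bits: str) -> int:
    """Derive one pseudorandom bit from the key and the bits seen so far."""
    digest = hmac.new(key, prefix_bits.encode("ascii"), hashlib.sha256).digest()
    return digest[0] & 1


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Toy prefix-preserving anonymization (HMAC stand-in, not real Crypto-PAn).

    Each output bit is the input bit XORed with a keyed function of the
    preceding input bits, so two addresses that share a k-bit prefix
    produce outputs that share a k-bit prefix as well.
    """
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = [str(int(bit) ^ _prf_bit(key, bits[:i])) for i, bit in enumerate(bits)]
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


key = b"hypothetical-secret-key"
a, b = "192.0.2.1", "192.0.2.200"
print(common_prefix_len(a, b))
print(common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)))
```

Both printed prefix lengths are the same, which is the property that lets researchers keep grouping queries by network even though the original addresses are hidden.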
The research team in the carrier division has implemented algorithms that generate threat intelligence in near real-time. An anomaly detection algorithm flags incoming DNS queries that don’t conform to normal patterns for further analysis. Threat-specific algorithms targeting domain generation algorithms (DGAs) include an LSTM (Long Short-Term Memory) model and an algorithm developed in-house that is based on word embeddings.
My colleague Paul O’Leary (also a Principal Data Scientist at Akamai) and I recently ran a small experiment to assess the prevalence of malicious domain names in each Top Level Domain (TLD) delegated in the internet root zone. TLDs are the backbone of the DNS; they’re the names installed in the root zone where navigation starts. At this point there are more than 1500 TLDs representing a wide variety of interests. Understanding underlying foundational trends like TLD abuse contributes to the process of evaluating raw DNS data to identify new threats.
To identify abuse in TLDs we matched a sample of two months of recursive DNS queries gathered and anonymized from ISP resolvers worldwide (~6 trillion queries) against threat intelligence in Akamai Security and Personalization Service (SPS) feeds. To understand the relative scale of abuse within each TLD, we compared the number of domain names identified as malicious or suspicious to the total number of unique domain names observed in the live query traffic.
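The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the pipeline we ran: the function names are hypothetical, the sample names use reserved TLDs (.example, .test, .invalid), and the TLD is taken naively as the rightmost label.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase a domain name and drop any trailing root dot."""
    return name.rstrip(".").lower()


def tld_of(name: str) -> str:
    """Return the rightmost label of a domain name."""
    return normalize(name).rsplit(".", 1)[-1]


def abuse_prevalence(observed, flagged):
    """Map each TLD to (unique names observed, names flagged, percent flagged).

    `observed` is an iterable of names seen in query traffic (repeats allowed);
    `flagged` is an iterable of names listed in a threat intelligence feed.
    """
    flagged_names = {normalize(n) for n in flagged}
    seen = defaultdict(set)
    for name in observed:
        seen[tld_of(name)].add(normalize(name))  # sets deduplicate repeats
    result = {}
    for tld, names in seen.items():
        bad = len(names & flagged_names)
        result[tld] = (len(names), bad, 100.0 * bad / len(names))
    return result


# Hypothetical sample data
observed = ["a.example", "A.example.", "b.example",
            "c.test", "d.test", "e.test", "f.test"]
flagged = ["b.example", "d.test", "zzz.invalid"]

for tld, (total, bad, pct) in sorted(abuse_prevalence(observed, flagged).items()):
    print(f".{tld}: {bad} of {total} unique names flagged ({pct:.1f}%)")
```

Note that the denominator is the number of unique names seen in traffic, not the number of queries and not the number of registrations, so a flagged name that never appears in traffic does not count.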
There are challenges in evaluating domain names. Different TLDs have different methods of managing their name space. Some TLDs retain two, or sometimes more, labels for their own use. For instance, the “.uk” TLD designates a number of second-level labels, like “co.uk”, “org.uk”, and “net.uk”. The “.jp” TLD and several others use as many as three labels. Most of this ambiguity is tabulated in the Public Suffix List, which is maintained as a community resource. The differences are important for security research because algorithms need to know where to look in a name to find the portions that belong to registrants rather than to the TLD.
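As a rough illustration of why the suffix matters, the Python sketch below finds the registrant-controlled part of a name by longest-suffix match. The suffix set is a tiny hand-picked sample, not the real Public Suffix List, which is far larger and also contains wildcard and exception rules that this sketch ignores. The function name is hypothetical.

```python
# Tiny illustrative sample of suffix rules (assumption: NOT the real list).
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "org.uk", "net.uk", "jp", "co.jp"}


def split_registrable(name, suffixes=SAMPLE_SUFFIXES):
    """Return (registrable_domain, public_suffix) using the longest match.

    Returns (None, suffix) when the name is itself a suffix, and
    (None, None) when no rule in the sample matches.
    """
    labels = name.rstrip(".").lower().split(".")
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None, candidate  # nothing left for a registrant
            # The registrable domain is the suffix plus one more label.
            return ".".join(labels[i - 1:]), candidate
    return None, None


for name in ["www.example.co.uk", "example.uk", "mail.example.com", "co.uk"]:
    print(name, "->", split_registrable(name))
```

With this sample, "www.example.co.uk" yields the registrable domain "example.co.uk" under the suffix "co.uk", while "example.uk" is registered directly under "uk". A naive rule such as "take the last two labels" would get the first case wrong.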
For each data set there are four or five columns. The first column shows the TLD. The second and third columns show the percentage of domains that matched against inclusive and conservative entries in our threat intelligence feed (described in Table 1 below).
The fourth column shows the number of names registered in each TLD. It’s provided to offer a sense of the scale of the TLD. Registration data sources are indicated below each table. For some TLDs the operator is contractually obligated to provide registration data to ICANN; for others, data is available from secondary sources. It is difficult to assess the accuracy of secondary sources, but even a rough sense of scale is useful. Some tables have a fifth column that shows which country a country code TLD represents. The Inclusive and Conservative data was gathered in April and May 2019. Registration data was gathered in early May 2019.
More than 40 networks covering more than 300 million subscribers use the Akamai SPS threat intelligence feed containing the “Conservative” entries. Incoming DNS queries from production resolvers serving customer traffic that match entries on this feed are blocked. Providers are sensitive to false positives since they often generate support calls, but only a few calls from provider abuse teams are generated per year from this feed, which suggests a very low false positive rate.
Abuse data from the original generic TLDs is shown in Table 2 below. Overall the original TLDs look pretty good, especially “.edu”, “.gov”, and “.mil,” which likely have such low rates because registrations are tightly controlled.
Registration sources
* ICANN Monthly Registry Reports transactions https://www.icann.org/resources/pages/registry-reports/#x
** DomainTools http://research.domaintools.com/statistics/tld-counts/
Here’s data for the follow-ons to the original TLDs. The “.biz” TLD shows relatively high levels of malicious activity and even higher levels of spam or other potentially unwanted activity.
Registrations source: ICANN Monthly Registry Reports transactions https://www.icann.org/resources/pages/registry-reports/#x
Below is a table of the Top Ten ccTLDs ranked by the number of registrations.
Registrations source: DomainTools http://research.domaintools.com/statistics/tld-counts/
Here’s a look at the ccTLDs ranked by prevalence of malicious activity:
Registrations source: DomainTools http://research.domaintools.com/statistics/tld-counts/
The ccTLDs show some interesting variation. In some, the prevalence of abusive names is extremely high, at slightly more than 90%. At the other extreme, 1% or less of the names observed in live traffic for the German, Dutch, Brazilian, and French TLDs are malicious by the conservative metric.
Finally, on to the new wave of generic TLDs that began to appear in the root zone in October 2013. Table 6 below shows abuse in the Top Ten new generic TLDs ranked by number of registrations.
Registrations source: ICANN Monthly Registry Reports https://www.icann.org/resources/pages/registry-reports/#x
Ranking new gTLDs based on the prevalence of malicious activity offers another view:
Registrations source: ICANN Monthly Registry Reports https://www.icann.org/resources/pages/registry-reports/#x
There is considerable variation in the prevalence of abuse among the Top Ten gTLDs ranked by registrations, ranging from 1.0% to 86.0%. The .vip, .shop, and .work gTLDs are relatively clear of more serious malicious activity, but .loan is one of the largest and has a higher prevalence of malicious names than any other new gTLD. The .loan TLD and some of the other highly abused TLDs have appeared in similar research from other organizations in the past.
Summary
Operators of TLDs have motivations to minimize registration of malicious names and to be responsive to abuse reports that come from researchers and the internet community. TLDs that want to attract well-known brands or high-value audiences have an incentive to maintain their reputation. Scrutiny on the front end to minimize malicious registrations can reduce the overhead on the back end of responding to complaints about malicious names, and reduce the possibility that traffic will be impacted by filters. TLDs may also have formal contractual obligations to deter abuse, with meaningful business consequences when violations occur.
There are many reasons malicious activity exists in TLDs. Even very large, popular, and/or long-established TLDs may inadvertently host malicious domains despite their best efforts to prevent malicious registrations and respond to abuse reports. The domain name industry is large and diverse, and the interests of all of the players may not always coincide with those of the rest of the internet ecosystem.
The Internet Corporation for Assigned Names and Numbers (ICANN) is the organization responsible for coordinating the operation of the DNS and leading the development of policy that guides its evolution. Although ICANN has a global oversight role, the country code TLDs (ccTLDs) pre-date ICANN and are largely self-regulated. They define their own registration policies and establish their own obligations to respond to abuse. For instance, some ccTLDs require that a legal entity be registered in-country in order to register a name, which creates a trail that may be useful for identifying registrants if it becomes necessary.
Although all operators of new generic TLDs have formal contractual obligations with ICANN that require them to deter abusive activity, there are notable differences in registration policies. Some of the new generic TLDs are closed, like brands (.chanel, .ericsson, .fedex) that buy TLDs and restrict registrations to their own legal entities. Some TLDs, such as .travel, .law, and .realtor, require registrants to have some kind of formal affiliation or credentials; however, the formality of the affiliation varies widely. Most of the new TLDs are completely open to anyone who wants to register a name, although that’s not to suggest they simply accept all registrations.
Another layer in the domain name ecosystem, registrars, may not fulfill their obligations to properly vet registrants and filter out miscreants. Lack of vigilance in policing registrations, such as limited verification of the identity of registrants, is another contributing factor. Batch registrations make it easier to obtain names, use them briefly, and leave a minimal trail, especially if verification is inadequate. Cheap registrations benefit abusers because they reduce operating costs and potentially allow abusers to hide abusive names amongst benign names.
Another reason abuse exists is that there can be delays in responding to requests to review and potentially take down domains that are flagged as problematic by researchers or the broader community. Misaligned incentives, resource constraints, deficient infrastructure or processes, and the sheer skill and determination of adversaries all get in the way.
Future posts will cover additional analysis of the underlying data and explore other factors, such as registration policies, registrars, and domain name pricing, to gain more insight into the presence of abuse in TLDs and the factors that may contribute to it.
Donghoon Shin and Paul O’Leary are Principal Data Scientists at Akamai.
*** This is a Security Bloggers Network syndicated blog from The Akamai Blog authored by Donghoon Shin. Read the original post at: http://feedproxy.google.com/~r/TheAkamaiBlog/~3/VS-C5ZwTeiA/a-view-into-top-level-domain-tld-abuse.html