Using Google Bots as an Attack Vector

by Netsparker Security Team on November 8, 2018

According to market share statistics, Google consistently holds more than 70% of the web search market. Many users treat their browser’s address bar as a Google search bar. Being visible on Google is therefore crucial for websites for as long as it continues to dominate the market.

In this article, we analyze a study from F5 Labs which brings our attention to a new attack vector using Google’s crawling servers, also known as Google Bots. These servers gather content from the web to create the searchable index of websites from which Google’s Search engine results are taken.

How Search Engines Use Bots to Index Websites

Each search engine has its own set of algorithms, but they all do the same basic thing: visit a given website, look at the content and links they find (known as ‘crawling’), then grade and list the resources. After one of these bots finds your website, it will visit and index it.

For a good ranking, you need to make sure that search engine bots can crawl your website without issues. Google specifically recommends that you avoid blocking search bots in order to achieve successful indexing. Attackers are aware of these permissions and have developed an interesting technique to exploit them: abusing Google Bots.

The Discovery Phase of a Google Bot Attack

In 2001, Michal Zalewski wrote about this trick in Phrack magazine. He also highlighted how difficult it is to prevent. Just how difficult became apparent 17 years later, when F5 Labs inspected the CroniX crypto miner. When F5 Labs’ researchers analyzed some malicious requests they had logged, they discovered that the requests originated from Google Bots.

Initially, the F5 Labs researchers assumed that an attacker used the Google Bot’s User-Agent header value. But when they investigated the source of the requests, they discovered that the requests were indeed sent from Google.
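The article does not say how F5 Labs confirmed the origin of the requests. One commonly used way to tell a genuine Google Bot from a spoofed User-Agent is a reverse DNS lookup followed by a forward lookup. The sketch below assumes that approach; the function name, the hostname suffix list and the injectable lookup parameters are illustrative choices, not something taken from the study.

```python
import socket

# Assumed hostname suffixes for Google crawlers; adjust to your own policy.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google crawler hostname
    and that hostname resolves back to the same IP (IPv4 only)."""
    # The lookups are injectable so the logic can be exercised offline.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
            return False
        # The forward lookup must point back at the address we started with.
        return ip in forward_lookup(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # a failed lookup means the origin is unverified.
        return False
```

Note that a request passing this check is still not safe. As the rest of the article shows, a genuine Google Bot can carry an attacker’s payload.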

There were different possible explanations for why Google servers would send these malicious requests. One was that Google’s servers had been hacked, but that idea was quickly discarded as unlikely. Instead, the researchers focused on the scenario laid out by Michal Zalewski, who described how Google Bots can be abused into behaving maliciously.

How Did the Google Bots Turn Evil?

Let’s take a look at how attackers can abuse Google Bots in order to use them as a tool for malicious intent.

First, let’s suppose that your website contains the following link:

<a href="http://victim-address.com/exploit-payload">malicious link</a>

When Google Bots encounter this URL, they’ll visit it in order to index it. The request that includes the payload will be made by a Google Bot. This image illustrates what’s happening:

[Diagram: using Google Bots as an attack vector]
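The flow can also be sketched in a few lines of code. The example below is a very naive stand-in for a crawler, not a model of how Google Bots actually work: it collects the links on the attacker’s page and lists the GET requests a bot following them would send. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags, like a very naive crawler."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def requests_a_crawler_would_send(page_html):
    """Return (host, request_line) pairs for every absolute link on the page."""
    collector = LinkCollector()
    collector.feed(page_html)
    planned = []
    for link in collector.links:
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip relative links and non-HTTP schemes
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        planned.append((parts.netloc, f"GET {path} HTTP/1.1"))
    return planned


attacker_page = '<a href="http://victim-address.com/exploit-payload">malicious link</a>'
print(requests_a_crawler_would_send(attacker_page))
# [('victim-address.com', 'GET /exploit-payload HTTP/1.1')]
```

The payload sits entirely in the path of the request, which is why the victim receives it from the bot’s address rather than the attacker’s.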

The Experiment Conducted to Prove the Attack

Researchers verified the theory that a Google Bot request would carry the payload by conducting an experiment in which they prepared two websites: one that acted as the attacker, and one that acted as the target. Links that carried the payload and pointed at the target website were added to the attacker’s website.

Once the researchers had configured the attacker’s website so that Google Bots would browse it, they waited for requests from the Google Bots to arrive at the target. When they analyzed those requests, they found that the requests from the Google Bot servers did indeed carry the payload.

The Limits of the Attack

This scenario is only possible with GET requests, where the payload can be sent through the URL. Another drawback is that the attacker won’t be able to read the victim server’s response, which means that this attack is only practical if it’s possible to send the response out of band, as with a command injection or an SQL injection.

The Combination of Apache Struts Remote Code Evaluation CVE-2018-11776 and Google Bots

Apache Struts is a Java-based framework released in 2001. The regular discovery of code evaluation vulnerabilities in the framework has generated many discussions about its security. For example, the Equifax breach, which led to losses of $439 million and the theft of a huge amount of personal data, was the result of CVE-2017-5638, a critical code execution vulnerability found in the Apache Struts framework.

A Quick Recap of Apache Struts Remote Code Evaluation CVE-2018-11776

Let’s recap the vulnerability that can be exploited on recent Apache Struts versions. The CVE-2018-11776 vulnerability (discovered in August 2018) is perfect for a Google Bot attack, since the payload is sent through the URL. Not surprisingly, this was the vulnerability that CroniX abused.

Example

Here are two examples:

The ‘hello’ in this URL is a namespace: http://www.example.com/hello/index.action.
Likewise, the ‘/’ that precedes ‘index.action’ is considered to be a namespace: http://www.example.com//index.action.

When a namespace is not set for a result, the configuration that leads to the vulnerability allows user-defined namespaces to be taken from the path. In this situation it’s possible to inject an OGNL (Object-Graph Navigation Language) expression. OGNL is an expression language for Java.
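To make the idea of a path-derived namespace concrete, here is a simplified model of the split, assuming that everything before the last slash is treated as the namespace. It is an illustration only and does not reproduce the Struts action mapper code; the function name is hypothetical.

```python
def split_namespace_and_action(path):
    """Simplified model: everything before the last '/' is the namespace,
    the remainder is the action name."""
    last_slash = path.rfind("/")
    if last_slash == -1:
        return "", path
    # An empty prefix (single leading slash) is treated as the root namespace.
    namespace = path[:last_slash] or "/"
    return namespace, path[last_slash + 1:]


print(split_namespace_and_action("/hello/index.action"))    # ('/hello', 'index.action')
print(split_namespace_and_action("//index.action"))         # ('/', 'index.action')
print(split_namespace_and_action("/${4*4}/help.action"))    # ('/${4*4}', 'help.action')
```

The last call shows why this matters: whatever the client puts before the final slash, including an OGNL expression, ends up in the namespace value.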

Here is an example of a configuration that is vulnerable to CVE-2018-11776:

<struts>
<constant name="struts.mapper.alwaysSelectFullNamespace" value="true" />

<package name="default" extends="struts-default">

<action name="help">
  <result type="redirectAction">
      <param name="actionName">date.action</param>
  </result>
</action>
..
..
.
</struts>

You can use the following sample payload to confirm the existence of CVE-2018-11776. If you open the URL http://your-struts-instance/${4*4}/help.action and you get redirected to http://your-struts-instance/16/date.action, you can confirm that the vulnerability exists.
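The check described above can be expressed as two small helpers, sketched here under the assumption that you probe an instance you own and read the Location header of the redirect yourself. The helper names are hypothetical and the code performs no network requests.

```python
from urllib.parse import urlsplit


def build_probe_path(left, right, action="help.action"):
    """Build a probe path such as /${4*4}/help.action for the given operands."""
    return f"/${{{left}*{right}}}/{action}"


def redirect_shows_evaluation(location_header, left, right):
    """True if the first path segment of the redirect target is the product,
    meaning the expression in the namespace was evaluated."""
    segments = [s for s in urlsplit(location_header).path.split("/") if s]
    return bool(segments) and segments[0] == str(left * right)


print(build_probe_path(4, 4))
# /${4*4}/help.action
print(redirect_shows_evaluation("http://your-struts-instance/16/date.action", 4, 4))
# True
```

Using operands other than 4 and 4 reduces the chance of a false positive from a path that happens to contain 16 for unrelated reasons.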

As mentioned before, this is the perfect context for a Google Bot attack. As CroniX shows, attackers can go as far as spreading cryptomining malware using a combination of Apache Struts CVE-2018-11776 and Google Bots.

Solutions to the Google Bots Attack

At this point, the possibility of Google Bots delivering malicious requests to your website should make you question which third parties you can really trust. Yet blocking Google Bot requests entirely would hurt your position in the search engine’s results: if Google Bots cannot browse your website, your ranking drops. In addition, if you automatically block the senders of malicious requests, or the IP addresses they come from, attackers could deliberately send payloads through Google Bots to get the bots blocked from your website, pulling your ranking in Google’s search results down even further.

Control the External Connections on Your Website

Attackers can use their own websites, or websites under their control, to conduct malicious activity using Google Bots. They might also place links on other websites, for example in comments under blog posts.

If you want an overview of the external links on your website, you can check the Out-of-Scope Links node in the Netsparker Knowledge Base following a scan.

The Correct Handling of Links Added by Users

Even though it won’t prevent attackers from abusing Google Bots to attack other websites, you might still be able to prevent damage to your own search engine ranking if you take certain precautions. For example, you can prevent search bots from following user-added links using the rel attribute. This is how it’s done:

<a rel="nofollow" href="http://www.functravel.com/">Cheap Flights</a>

Due to the ‘nofollow’ value of the rel attribute, the bots will not follow the link.
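If your application renders links submitted by users, for instance in blog comments, the attribute can be added at render time. The following is a minimal sketch assuming you build the anchor markup yourself; the function name is hypothetical, and a real application would also need to validate the URL scheme.

```python
from html import escape


def render_user_link(url, text):
    """Render a user-supplied link with rel="nofollow" and escaped values."""
    safe_url = escape(url, quote=True)   # also escapes quotes inside the attribute
    safe_text = escape(text)
    return f'<a rel="nofollow" href="{safe_url}">{safe_text}</a>'


print(render_user_link("http://www.functravel.com/", "Cheap Flights"))
# <a rel="nofollow" href="http://www.functravel.com/">Cheap Flights</a>
```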

Similarly, the meta tags you define between the <head></head> tags will help control the behavior of the search bots on all URLs found on the page.

<meta name="googlebot" content="nofollow" />
<meta name="robots" content="nofollow" />

You can give these commands using the X-Robots-Tag response header, too:

X-Robots-Tag: googlebot: nofollow

You should note that the commands given with X-Robots-Tag and meta tags apply to all internal and external links.
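For applications where editing every template is impractical, the header can be attached in one place. The sketch below assumes a Python WSGI application and wraps it in a small middleware; the wrapper name and its default value are illustrative choices, and the same effect is usually achievable in the web server configuration instead.

```python
def add_robots_header(app, value="googlebot: nofollow"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Drop any existing X-Robots-Tag so the value is not duplicated.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Tiny example application used only to show the wrapper in action."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>hello</p>"]


application = add_robots_header(demo_app)
```

Because the header applies to the whole response, it affects every link on the page, internal and external alike.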
