
How Do you Protect an API from Scraping?

by Neosec Team on September 19, 2022

What is screen scraping, and how does it relate to APIs?

Screen scraping is a common challenge for businesses with a significant online presence, such as financial services and e-commerce firms. It goes by many names, including web data extraction, web scraping, and web harvesting. While screen scraping was once thought of primarily as a front-end web application security challenge, the changing nature of business applications is extending the issue of scraping into the API security domain.

For example, business-to-consumer (B2C) architectures have evolved over time from monolithic web applications to new API-based front-end frameworks that can meet the needs of both web and mobile applications. Meanwhile, growing use of business-to-business (B2B) APIs by industry ecosystem partners is creating even more potential scenarios for scraping to occur.

B2B APIs serve different API consumers than B2C APIs, which broadens the range of potential data scraping scenarios. Some forms of scraping may be legitimate, but more often scraping is used to abuse APIs. Examples include:

Aggregating information such as product descriptions and product reviews for use in unsanctioned ways
Collecting pricing information from e-commerce sites to inform competitive pricing strategies and offers, particularly in sectors with constantly changing prices, such as travel, hotels, and car rentals
Accessing frequently changing information, such as interest rates from financial sites or betting odds from gambling sites, for competitive purposes

Beyond enabling undesirable data leakage, API scraping can place a heavy resource burden on application infrastructure. Unfortunately, mitigating it is not as simple as implementing rate limits or quotas. Many sophisticated actors are adept at scraping in a “low and slow” manner that stays below existing rate limits and quotas, which makes it difficult to stop without disrupting legitimate API usage.

Moreover, because API scraping often operates within these existing rate limit and quota parameters, most organizations have no visibility into the fact that it is happening at all.
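To make the “low and slow” problem concrete, here is a minimal sketch of a per-client sliding-window rate limiter, together with two simulated clients: one that bursts and one that paces itself just under the limit. The limit of 100 requests per 60 seconds, the class name `SlidingWindowLimiter`, the helper `simulate`, and the client IDs are all hypothetical values chosen for illustration, not taken from any particular product.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.history: dict = {}  # client_id -> deque of accepted request timestamps

    def allow(self, client_id: str, now: float) -> bool:
        q = self.history.setdefault(client_id, deque())
        # Forget accepted requests that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject
        q.append(now)
        return True


def simulate(limiter: SlidingWindowLimiter, client_id: str, interval: float, count: int):
    """Send `count` requests spaced `interval` seconds apart; return (allowed, blocked)."""
    allowed = blocked = 0
    for i in range(count):
        if limiter.allow(client_id, i * interval):
            allowed += 1
        else:
            blocked += 1
    return allowed, blocked


limiter = SlidingWindowLimiter(limit=100, window=60)  # hypothetical: 100 requests/minute
burst = simulate(limiter, "burst-bot", interval=0.1, count=600)   # 10 req/s for one minute
slow = simulate(limiter, "slow-bot", interval=1.0, count=86_400)  # 1 req/s for 24 hours
print("burst:", burst)         # burst: (100, 500)
print("low and slow:", slow)   # low and slow: (86400, 0)
```

The bursting client is cut off after its first 100 requests, while the client pacing itself at one request per second is never rejected and makes 86,400 calls in a day. From the limiter's point of view, nothing unusual happened.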

How do most organizations protect themselves against API scraping?

Most organizations rely on rate limits and quotas to curb scraping. While these are not a silver bullet, for the reasons described above, they are nonetheless an important first step. At the very least, they put an upper limit on the volume of scraping that can occur.
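The “upper limit” point can be sketched with a token bucket (which governs burst size and sustained rate) combined with a hard daily quota. This is a minimal illustration, assuming a single day with no quota reset; the class name `QuotaBucket` and the policy of 10-request bursts, 1 request per second, and 5,000 requests per day are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaBucket:
    """Token bucket for short-term rate plus a hard per-day request quota."""

    capacity: float        # largest burst the client may send at once
    refill_per_sec: float  # sustained requests per second
    daily_quota: int       # absolute cap on accepted requests per day
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)
    used_today: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.used_today >= self.daily_quota or self.tokens < 1:
            return False
        self.tokens -= 1
        self.used_today += 1
        return True


# A burst of 20 simultaneous requests: only the bucket's capacity gets through.
b = QuotaBucket(capacity=10, refill_per_sec=1.0, daily_quota=5_000)
burst = [b.allow(0.0) for _ in range(20)]
print(burst.count(True))  # 10

# One request every 0.5 s for 24 hours: the daily quota caps the total.
day = QuotaBucket(capacity=10, refill_per_sec=1.0, daily_quota=5_000)
accepted = sum(day.allow(i * 0.5) for i in range(172_800))
print(accepted)  # 5000
```

Without the quota, the 1 request/second refill rate alone would admit on the order of 86,000 requests that day; the quota is what turns “slow” into “bounded,” though a patient scraper can still collect up to the full quota every day.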

Another crucial best practice is to ensure that the clients connecting to APIs are valid. For example, if APIs are generally accessed from mobile devices, steps should be taken to ensure that the mobile app calling the API hasn’t been tampered with and that the device’s integrity hasn’t been compromised (through jailbreaking, for instance).
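One basic building block for client validation is HMAC request signing, sketched below as a swapped-in illustration rather than a description of any specific product. It only proves that the caller holds a shared key and that the request was not altered or replayed outside a time window; it does not detect jailbreaks or app tampering (that requires platform attestation services), and a key embedded in an app can be extracted. The function names, the 300-second window, the key, and the `/v1/prices` path are hypothetical.

```python
import hashlib
import hmac

MAX_SKEW_SECONDS = 300  # hypothetical replay window


def sign_request(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over the fields a replayed or altered request would change."""
    message = b"\n".join([method.encode(), path.encode(), str(timestamp).encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, timestamp: int,
                   body: bytes, signature: str, now: int) -> bool:
    # Reject requests whose timestamp is too old (or too far in the future).
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(secret, method, path, timestamp, body)
    # compare_digest compares in constant time to avoid leaking timing information.
    return hmac.compare_digest(expected, signature)


secret = b"per-client-secret"  # hypothetical key provisioned to one client
sig = sign_request(secret, "GET", "/v1/prices", 1_700_000_000, b"")
print(verify_request(secret, "GET", "/v1/prices", 1_700_000_000, b"", sig, now=1_700_000_060))  # True
print(verify_request(secret, "GET", "/v1/rates", 1_700_000_000, b"", sig, now=1_700_000_060))   # False: path changed
print(verify_request(secret, "GET", "/v1/prices", 1_700_000_000, b"", sig, now=1_700_003_600))  # False: stale
```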

Some organizations also use specialized bot mitigation tools to protect their web applications against automated scraping. These solutions provide value for B2C API traffic. But because they rely on instrumentation inside a specific browser or mobile application, they are ineffective against B2B API scraping, which generally originates from a programmatic client of some kind where no browser or mobile app exists. Similarly, compromised Internet of Things (IoT) or Internet of Everything (IoE) devices can be used to create “swarms” whose traffic does not originate from standard web or mobile application clients.

In summary, even if you have rate limits and quotas in place, you are still left with two major points of exposure:

You remain wide open to low and slow scraping on B2C APIs.
Authenticated B2B API traffic is completely unmonitored.

And these risks are more than theoretical. Earlier this year, a threat actor was able to exploit a Twitter API vulnerability to scrape account details for an estimated 5.4 million users.

How does Neosec’s approach close these critical protection gaps?

The most important advance that Neosec brings to API security is extending API monitoring and analysis to authenticated traffic. B2B APIs represent a much larger attack surface – and a potential pathway to higher-value corporate assets.

Behavioral analytics at the authenticated user level is the key to monitoring B2B APIs. It’s the only way to tell when a seemingly legitimate, authenticated API consumer who uses no known attack patterns is scraping your APIs. This requires context that can only come from analyzing the same user’s API requests over a long period of time, even if they’ve changed access tokens 100-plus times.
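The post does not describe how Neosec’s behavioral models work, so the following is only a hypothetical stand-in showing why resolving tokens to users matters. It attributes every request to the user its token was issued to, counts the distinct resources each user touched over a long window, and flags users far above the median. The function name `flag_scrapers`, the `token_owner` mapping, the 5× threshold, and the sample partners and SKUs are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def flag_scrapers(events, token_owner, threshold=5.0):
    """
    events: iterable of (access_token, resource_id) pairs collected over a long window.
    token_owner: mapping from each access token to the user it was issued to.
    Returns the users whose number of distinct resources exceeds
    `threshold` times the median across all users.
    """
    seen = defaultdict(set)
    for token, resource in events:
        user = token_owner.get(token)
        if user is None:
            continue  # unknown token: out of scope for this heuristic
        seen[user].add(resource)
    breadth = {user: len(resources) for user, resources in seen.items()}
    if not breadth:
        return set()
    baseline = median(breadth.values())
    return {user for user, n in breadth.items() if n > threshold * baseline}


# partner-a rotates through 100 tokens, fetching 10 different products with each.
token_owner = {f"a-tok-{i}": "partner-a" for i in range(100)}
token_owner.update({"b-tok": "partner-b", "c-tok": "partner-c", "d-tok": "partner-d"})
events = [(f"a-tok-{i}", f"sku-{i * 10 + j}") for i in range(100) for j in range(10)]
# Three ordinary partners each use one token to view the same 20 products.
events += [(tok, f"sku-{j}") for tok in ("b-tok", "c-tok", "d-tok") for j in range(20)]

print(flag_scrapers(events, token_owner))  # {'partner-a'}
```

Each of partner-a’s tokens touches only 10 products, fewer than any other client’s single token, so a per-token view would rank it as the quietest client. Only once its 100 tokens are resolved to one identity does the 1,000-product sweep stand out.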

Below is a summary of how Neosec’s approach can extend your API protection capabilities beyond traditional bot mitigation techniques.

Comparison of Bot Mitigation and Neosec for API Data Scraping

Criterion	Bot Mitigation	Neosec
What	UI-based API (B2C only)	Any API (B2C, B2B)
Where	In the browser	Through the API
How	Detects browser or mobile app and human user signals; assumes any human is good	Behavioral profiling of users and IPs
Impact on the user experience	High	Low
Robustness	Easier to bypass	Robust
Response	Immediate	Slower
Strengths	Blocking high-volume automated scraping on websites	Detects a wide range of abuse and misuse by malicious insiders and attackers masquerading as legitimate users
Common scraping use case	Scraping prices on websites (for example: airlines, PlayStation 5)	Scraping any API resource by any authenticated user, from resellers, partners, and suppliers to customers

*** This is a Security Bloggers Network syndicated blog from Blog authored by Neosec Team. Read the original post at: https://www.neosec.com/blog/how-do-you-protect-an-api-from-scraping

