After Van Buren, are Data Scraping Cases Barred? - Security Boulevard


The federal computer crime law makes it both a criminal offense and a civil offense (you can sue for damages or loss) for someone to “access a computer without authorization” or to “exceed authorized access” to a computer, and then do certain proscribed things. But the meaning of the terms “access without authorization” and “exceed authorized access” has been, to say the least, somewhat vague and ambiguous.

Specifically, when someone uses a computer, or data contained in that computer, in a way that the computer or data owner does not want—for example, when the user violates the owner’s Terms of Use or Terms of Service—is this “exceeding authorization” to access that computer?

This is significant because, even though the computer crime law, the Computer Fraud and Abuse Act (CFAA), is a criminal statute, the vast majority of cases brought under it have been lawsuits between private parties. In fact, they typically fall into one of two categories: an employer suing current or former employees for improperly accessing email accounts (or other electronic data) with the intent to use that data to compete with the employer, or competitors accessing data (sometimes on public-facing websites) and "scraping" it to compete against the data owner, often in violation of the website's Terms of Service or Terms of Use. In both kinds of cases, the "owner" of the data or website claims that the opposing party exceeded the scope of its authorization to access the website (computer) or email system in order to improperly obtain data (information) from it. Since the website operator gets to set the "rules of the road" for such access (through a contract), violation of those rules is, well, a criminal trespass. Companies like to use the CFAA because it provides a private right of action and gets them into federal court. It also carries the tacit threat that someone might go to jail.

On June 3, 2021, the U.S. Supreme Court at least partially addressed the question of what constitutes "exceeding authorization" to access a computer. The Court decided that a Georgia police officer did not "exceed authorized access" to a database of records maintained "for law enforcement purposes" when he retrieved records from it with his valid credentials and then provided them to others for purposes that had nothing to do with law enforcement.

Narrowly read, the case stands for the proposition that a person does not violate the CFAA when they use their own authorized credentials to log into a computer (or database) that they are authorized to access, to download or view files that they are authorized to download and view, and then to use that information for a purpose that the owner of the database did not sanction, and, in fact, prohibited. While the misuse of the data may be its own offense, it’s not “hacking” and it’s not “trespass.”

TOS’d and Turned

More broadly, however, the Van Buren case might stand for the proposition that, where access is controlled, limited or conditioned on agreement to the terms of a contract, whether Terms of Service (TOS), Terms of Use, a Software License Agreement, an End User Agreement, an employment agreement or an HR policy, mere violation of the terms of those agreements does not invalidate the authorization to access the computer, to use it and to view the data contained therein.

The Supreme Court majority seemed leery of turning a mere violation of a TOS into a federal crime (or a civil cause of action), noting:

"Many websites, services, and databases … authorize a user's access only upon his agreement to follow specified terms of service. If the 'exceeds authorized access' clause encompasses violations of circumstance-based access restrictions on employers' computers, it is difficult to see why it would not also encompass violations of such restrictions on website providers' computers. And indeed, numerous amici explain why the Government's reading of subsection (a)(2) would do just that—criminalize everything from embellishing an online dating profile to using a pseudonym on Facebook."

In the interest of limiting the scope of the criminal statute, the Court found that mere violations of contract terms and the like are not hacks, trespasses or violations of the CFAA.

“In sum,” the Court summarized, “an individual ‘exceeds authorized access’ when he accesses a computer with authorization but then obtains information located in particular areas of the computer—such as files, folders or databases—that are off limits to him.”

One unanswered question is how you tell someone that the computer, or particular files, folders or databases on it, are "off limits." The Court noted that liability under both the "without authorization" and "exceeds authorized access" clauses "stems from a gates-up-or-down inquiry—one either can or cannot access a computer system, and one either can or cannot access certain areas within the system."

But then they added a footnote—footnote 8—which said:

“For present purposes, we need not address whether this inquiry turns only on technological (or ‘code-based’) limitations on access, or instead also looks to limits contained in contracts or policies.”

Translation—the Court punted on whether, to establish a lack of authorization, you must show that you provided some technological barrier to entry (code based) or whether a mere contract or policy can establish the limitation on access.

Code vs. Permission Based

Think of "code-based" restrictions as fences, gates, locks and doors. In the real world, one way to enforce a "no trespassing" rule is to mark a boundary and physically restrict access across it. So if someone hops the fence into your yard, they have bypassed the physical equivalent of a code-based limitation on access. In the computer world, if you give someone a user ID and password, and the system will not let that user ID and password log into the HR department's or the finance department's systems, then you have a "code-based" restriction on access.

Permission-based (or contract- or agreement-based) restrictions are different. You give someone a user ID and password that actually opens any computer, network or database, but then tell them that they are not permitted to access the HR or finance department's systems (or publish a policy that restricts access to those sites or databases). If we adopt a permission- or contract-based model for "exceeds authorized access," then it's hard to draw the line between telling Officer Van Buren, "You may access the GCIC database only for law enforcement purposes" (authorized access, but improper use), and telling him, "If you do not intend to use the data for law enforcement purposes, you are not authorized to access the database or this computer at all" (a permission- or contract-based restriction). The distinction becomes unworkable.

Ultimately, every purpose-based limit can be rephrased as a contract-based restriction on access, and the problem the Court identified with Terms of Use and Terms of Service is not solved. What the Court is saying is that you can tell someone they may be in (virtual) place A but not (virtual) place B, and if they are granted permission to be in place A but not B, their access to place B is in "excess" of their authorization. But you can't tell them that they may be in place A only for specific purposes. In for a penny, in for a pound.
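To make the place A / place B distinction concrete, here is a minimal sketch of the two kinds of restriction. Everything in it is hypothetical and invented for illustration (the `AREA_PERMISSIONS` table, the role and area names, and both functions); it is not how the GCIC database or any real system works. The point it tries to show: a code-based check decides entry from what the system itself knows, while a purpose-based check can only compare the purpose a user *states*, which the system has no way to verify.

```python
# Hypothetical permission table: which areas of the system each role may enter.
AREA_PERMISSIONS = {
    "patrol_officer": {"license_plate_db"},
    "hr_manager": {"license_plate_db", "hr", "finance"},
}


def gate_is_up(role: str, area: str) -> bool:
    """Code-based, gates-up-or-down check: the system itself refuses entry
    to any area that has not been granted to the role."""
    return area in AREA_PERMISSIONS.get(role, set())


def purpose_restricted_access(role: str, area: str, stated_purpose: str) -> bool:
    """Purpose-based restriction layered on the same gate. The system can only
    compare the purpose the user states; it cannot observe the real one."""
    return gate_is_up(role, area) and stated_purpose == "law_enforcement"


if __name__ == "__main__":
    # Place A (granted) versus place B (not granted): the gate decides.
    print(gate_is_up("patrol_officer", "license_plate_db"))  # True
    print(gate_is_up("patrol_officer", "hr"))                # False
    # An officer who misstates his purpose passes the same check as an honest one.
    print(purpose_restricted_access("patrol_officer", "license_plate_db", "law_enforcement"))  # True
```

In this sketch the officer's real reason for running a search never enters the computation, so any purpose limit ends up being enforced by policy or contract rather than by the gate itself.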

Scraping By

It's not clear how the Van Buren case affects "scraping" litigation. In a scraping case, company A "scrapes" data from company B using a tool, a bot or a spider, without company B's consent and in violation of company B's terms of service. So, when you log onto company B's website, it might say something like, "The data on this website is the property of company B, and is proprietary data. You agree not to download, scrape … etc., this data. You are only authorized to access company B's computers and networks for the purposes stated, and can't scrape the data." Or something similar. Company B might also post a robots.txt file instructing automated tools not to scrape or spider its data, or impose a CAPTCHA to keep automated tools from downloading it. The Court has not made clear how (or whether) a company can use the CFAA to stop another company or a competitor from taking massive amounts of data the company collected for one purpose and using it for another. It is entirely possible that it can't, and that the remedy for a breach of terms of service or terms of use is a suit for breach of contract. But the stakes can be high.
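A robots.txt file is a request rather than a lock: it is a plain-text list of rules, and compliance happens entirely on the crawler's side, which is one reason it is debatable whether it counts as a "code-based" barrier at all. The sketch below uses Python's standard-library `urllib.robotparser` to show how a well-behaved crawler would consult company B's rules before fetching a page. The robots.txt contents, the `company-b.example` domain, the `/profiles/` path, the user-agent string and the `may_crawl` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for "company B": bar all crawlers from /profiles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, user_agent: str = "ExampleScraper/1.0") -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url.
    Nothing on the server enforces this answer; a scraper that never asks is not stopped."""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://company-b.example/about"))             # True
print(may_crawl("https://company-b.example/profiles/jane-doe"))  # False
```

A scraper that simply skips the `can_fetch` call would still receive the pages; only measures like CAPTCHAs, throttling or IP blocking actually stand in its way.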

When you provide data to companies like Facebook or LinkedIn, those companies make certain promises about how they will use (and protect) your data. When third parties scrape that data from the site, there's no promise about how they will use it. When you post a picture of a Las Vegas bachelor party on a Facebook page, all of the attendees' faces may be scraped by a company like Clearview AI, run through facial recognition and identified in a way that lets any of Clearview AI's customers match them against another picture. Not what you signed up for when you joined Facebook, and not what you signed up for even if you never joined Facebook but simply attended a bachelor party in Vegas. What happens in Vegas doesn't stay there … at least, not for long.

Similarly, data analytics company HiQ routinely scraped data from the professional social media site LinkedIn and used its own algorithm to identify employees on the site who were poachable (at higher risk of leaving their employer). An employer could use that information to offer incentives to keep the employee on board, or a competitor could use it to recruit. HiQ also offered skills-mapping intelligence, which again could be used to retain, retrain or recruit. But both services required access to LinkedIn's database, which LinkedIn tried to restrict. Although the LinkedIn data was "public" (that is, not behind a login), LinkedIn let its users set their privacy settings (public, private or semi-restricted) and restrict the "broadcast" of changes to their profiles.

LinkedIn not only prohibited accessing its computers or databases for the purpose of "scraping" or "spidering" in its terms of service, it also imposed technological barriers to the automated, massive downloading of its members' "public" data. The instructions in LinkedIn's "robots.txt" file prohibit access to LinkedIn servers via automated bots; LinkedIn's "Quicksand" system detects non-human activity indicative of scraping; its "Sentinel" system throttles (slows or limits) or even blocks activity from suspicious IP addresses; and its "Org Block" system generates a list of known "bad" IP addresses serving as large-scale scrapers. LinkedIn blocks approximately 95 million automated attempts to scrape data every day, and has restricted over 11 million accounts suspected of violating its User Agreement, including through scraping.
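The internals of LinkedIn's Sentinel and Org Block systems are not described here, so the following is only a generic sketch of the two techniques the paragraph above names: throttling requests per IP address and refusing addresses on a blocklist. It uses a sliding-window counter, one common way to rate-limit; the `PerIpThrottle` class, its parameters and the example addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Iterable, Optional


class PerIpThrottle:
    """Hypothetical sliding-window throttle: allow at most `max_requests`
    requests per `window_seconds` from each IP, and refuse blocklisted IPs outright."""

    def __init__(self, max_requests: int, window_seconds: float,
                 blocklist: Iterable[str] = ()) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.blocklist = set(blocklist)       # e.g. known large-scale scrapers
        self._history = defaultdict(deque)    # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if ip in self.blocklist:
            return False
        now = time.monotonic() if now is None else now
        history = self._history[ip]
        # Forget accepted requests that have slid out of the time window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_requests:
            return False                      # throttled: too many recent requests
        history.append(now)
        return True


if __name__ == "__main__":
    throttle = PerIpThrottle(max_requests=3, window_seconds=1.0,
                             blocklist={"203.0.113.7"})
    # Five requests within half a second from one address: the last two are refused.
    print([throttle.allow("198.51.100.1", now=0.1 * i) for i in range(5)])
    # [True, True, True, False, False]
    print(throttle.allow("198.51.100.1", now=1.5))  # True: the window has passed
    print(throttle.allow("203.0.113.7", now=0.0))   # False: blocklisted
```

Unlike a robots.txt rule, a check like this runs on the server and actually refuses the request, which makes it look much more like the "gate" the Court described.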

Cat, Meet Mouse

But LinkedIn raised the stakes by sending HiQ a cease-and-desist letter demanding that it stop scraping data from the website. In real-world terms, LinkedIn asked HiQ to leave. If HiQ kept scraping data from the LinkedIn site after LinkedIn had expressly revoked its permission to do so, was HiQ accessing LinkedIn's computers "without authorization"? The Ninth Circuit Court of Appeals held that HiQ's access to the publicly accessible portions of LinkedIn's website, even after permission had been expressly revoked, did not violate the CFAA.

In reaching that conclusion, the court relied on its earlier rejection of contract-based readings of the CFAA:

"In recognizing that the CFAA is best understood as an anti-intrusion statute and not as a 'misappropriation statute,' Nosal I, 676 F.3d at 857-58, we rejected the contract-based interpretation of the CFAA's 'without authorization' provision adopted by some of our sister circuits. Compare Facebook, Inc. v. Power Ventures, Inc., 844 F.3d 1058, 1067 (9th Cir. 2016), cert. denied, ___ U.S. ___, 138 S. Ct. 313, 199 L.Ed.2d 206 (2017) ('[A] violation of the terms of use of a website—without more—cannot establish liability under the CFAA.'); Nosal I, 676 F.3d at 862 ('We remain unpersuaded by the decisions of our sister circuits that interpret the CFAA broadly to cover violations of corporate computer use restrictions or violations of a duty of loyalty.'), with EF Cultural Travel BV v. Explorica, Inc., 274 F.3d 577, 583-84 (1st Cir. 2001) (holding that violations of a confidentiality agreement or other contractual restraints could give rise to a claim for unauthorized access under the CFAA); United States v. Rodriguez, 628 F.3d 1258, 1263 (11th Cir. 2010) (holding that a defendant 'exceeds authorized access' when violating policies governing authorized use of databases)."

Put simply, HiQ did not "break in" to LinkedIn's site. LinkedIn then petitioned the U.S. Supreme Court, which, on June 14, 2021, vacated the Ninth Circuit's judgment and remanded the case "for further consideration in light of Van Buren v. United States, 593 U. S. ___ (2021)." But it's not clear that the Ninth Circuit's judgment was inconsistent with Van Buren, and I'm not sure how the case would be decided differently "in light of" Van Buren.

Meanwhile, scraping cases continue. For example, in Southwest Airlines' lawsuit against online travel agency Kiwi.com, Southwest's website terms of service "expressly prohibit any attempts to 'page scrape' flight data and any use of the Southwest Website 'for any commercial purpose' without authorization from Southwest." That's why, when you search for flights on travel sites like Kayak.com, you don't find Southwest flights.

In its lawsuit, Southwest alleged that "Kiwi is accessing Southwest's computer systems … without authorization, bypassing Southwest's security systems intended to block automated traffic and bots from using the Southwest Website, and hacking the Southwest application programming interface (API) that is accessible only through the Southwest Website—all in violation of the Website Terms." At present, the issue in the case is whether the lawsuit can proceed in Texas, where Southwest (and its website) are located, but it is not clear whether the holding in Van Buren forecloses a CFAA remedy for the discount airline.

Ultimately, the Supreme Court itself may have to clarify how Van Buren affects scraping cases. The holding in Van Buren suggests that those suing over anticompetitive and unwanted access to public data may have a steep hill to climb to prove "unauthorized access" or "exceeding authorized access" in such cases. Van Buren, however, also raises legitimate questions about the ability of web page, computer and database owners to prevent malicious activity directed at their computers, data or users where "access" to the computer system, data or website is permitted for some purposes but prohibited for others.

This kind of "conditional access" is what permits companies like Facebook to enforce their terms of service, kick out malefactors, restrict phishing and malware, and block IP addresses. In a future article, I will address the impact of Van Buren on ongoing efforts to protect websites and data. For now, it appears that scraping lawsuits will have a difficult time in the courts. The HiQ case will go back to the Ninth Circuit and probably back to the Supreme Court. Until then, I guess we will all just have to keep scraping by.


Mark Rasch

Mark Rasch is a lawyer and computer security and privacy expert in Bethesda, Maryland, where he helps develop strategy and messaging for the Information Security team. Rasch's career spans more than 35 years of corporate and government cybersecurity, computer privacy, regulatory compliance, computer forensics and incident response. He was the Chief Security Evangelist for Verizon Enterprise Solutions (VES) and is a recognized author of numerous security- and privacy-related articles. Prior to joining Verizon, he taught courses in cybersecurity, law, policy and technology at various colleges and universities, including the University of Maryland, George Mason University, Georgetown University and the American University School of Law, and was active with the American Bar Association's Privacy and Cybersecurity Committees and the Computers, Freedom and Privacy Conference. Rasch has worked as cyberlaw editor for SecurityCurrent.com, as Chief Privacy Officer for SAIC, and as Director or Managing Director at various information security consulting companies, including CSC, FTI Consulting, Solutionary, Predictive Systems and Global Integrity Corp. Earlier in his career, Rasch was with the U.S. Department of Justice, where he led the department's efforts to investigate and prosecute cyber and high-technology crime, starting the computer crime unit within the Criminal Division's Fraud Section, efforts which eventually led to the creation of the Computer Crime and Intellectual Property Section of the Criminal Division. He was responsible for various high-profile computer crime prosecutions, including those of Kevin Mitnick, Kevin Poulsen and Robert Tappan Morris. Prior to joining Verizon, Mark was a frequent commentator in the media on issues related to information security, appearing on BBC, CBC, Fox News, CNN, NBC News, ABC News, the New York Times, the Wall Street Journal and many other outlets.
