The federal computer crime law makes it both a criminal offense and a civil offense (you can sue for damages or loss) for someone to “access a computer without authorization” or to “exceed authorized access” to a computer, and then do certain proscribed things. But the meaning of the terms “access without authorization” and “exceed authorized access” has been, to say the least, somewhat vague and ambiguous.
On June 3, 2021, the U.S. Supreme Court at least partially addressed the question of what constitutes “exceeding authorization” to access a computer. In Van Buren v. United States, the Court decided that a Georgia police officer did not “exceed authorized access” to a database of records maintained “for law enforcement purposes” when he retrieved records and then gave them to individuals for purposes unrelated to law enforcement.
Narrowly read, the case stands for the proposition that a person does not violate the Computer Fraud and Abuse Act (CFAA) when they use their own authorized credentials to log into a computer (or database) that they are authorized to access, download or view files that they are authorized to download and view, and then use that information for a purpose that the owner of the database did not sanction and, in fact, prohibited. While the misuse of the data may be its own offense, it’s not “hacking” and it’s not “trespass.”
TOS’d and Turned
The Supreme Court majority seemed leery of turning a mere violation of terms of service (TOS) into a federal crime (or grounds for a civil action), noting:
“Many websites, services, and databases … authorize a user’s access only upon his agreement to follow specified terms of service. If the ‘exceeds authorized access’ clause encompasses violations of circumstance-based access restrictions on employers’ computers, it is difficult to see why it would not also encompass violations of such restrictions on website providers’ computers.”
The Court continued: “And indeed, numerous amici explain why the Government’s reading of subsection (a)(2) would do just that—criminalize everything from embellishing an online dating profile to using a pseudonym on Facebook.” In the interests of limiting the scope of the criminal statute, the Court found that mere violations of contract terms and the like are not hacks, trespasses or violations of the CFAA.
“In sum,” the Court summarized, “an individual ‘exceeds authorized access’ when he accesses a computer with authorization but then obtains information located in particular areas of the computer—such as files, folders or databases—that are off limits to him.”
One unanswered question is how you tell someone that the computer, the files, the folders or the databases are “off limits.” The Court noted that liability under both the “unauthorized access” and “exceeding authorized access” clauses “stems from a gates-up-or-down inquiry—one either can or cannot access a computer system, and one either can or cannot access certain areas within the system.”
But then the Court added a footnote (footnote 8) that said:
“For present purposes, we need not address whether this inquiry turns only on technological (or ‘code-based’) limitations on access, or instead also looks to limits contained in contracts or policies.”
Translation: the Court punted on whether, to establish a lack of authorization, you must show that you put up some technological (code-based) barrier to entry, or whether a mere contract or policy can establish the limitation on access.
Code vs. Permission Based
Think of “code-based” restrictions as fences, gates, locks and doors. In the real world, one way to enforce “no trespassing” requirements is to mark the boundary and restrict access with a physical barrier. So, if someone hops the fence into your yard, they have bypassed the physical equivalent of a technological (code-based) limitation on access. If you give someone a user ID and password, and they can’t use that user ID and password to log into the HR department’s or the finance department’s systems, then you have a “code-based” restriction on access.
Permissions-based (or contract- or agreement-based) restrictions are different. You give someone a user ID and password that grants them access to any computer or network or database, but then tell them that they are not permitted to access the HR or finance department (or you have a published policy that restricts access to those sites or databases). If we adopt permissions- or contract-based models for “exceeds authorized access,” then it’s hard to draw the line between telling Officer Van Buren, “You may access the GCIC database only for law enforcement purposes” (authorized access, but improper use), and telling him, “If you do not intend to use the data for law enforcement purposes, you are not authorized to access the database or this computer at all” (a permissions-based or contract-based restriction on access). It becomes unworkable.
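To make the distinction concrete, here is a minimal sketch in Python. It is only an illustration of the two models, not a statement of how any court would apply them, and every name in it (the User class, the area names, the purposes, the policy table) is hypothetical. The first function is a gate the system itself enforces. The second is a rule that exists only in a policy document, so nothing technical stops the access.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # Areas of the system this account's credentials can technically open.
    permitted_areas: set = field(default_factory=set)


# Hypothetical written policy: the purposes the owner allows for each area.
POLICY_ALLOWED_PURPOSES = {"records": {"law_enforcement"}}


def code_based_gate(user: User, area: str) -> bool:
    """Gate up or down: the system refuses any area outside the account's permissions."""
    return area in user.permitted_areas


def policy_based_check(area: str, stated_purpose: str) -> bool:
    """Contract or policy limit: the login still works; only the written rule is broken."""
    return stated_purpose in POLICY_ALLOWED_PURPOSES.get(area, set())


officer = User("officer", permitted_areas={"records"})

print(code_based_gate(officer, "records"))               # True: the gate is up
print(code_based_gate(officer, "hr"))                    # False: the gate is down
print(policy_based_check("records", "personal_favor"))   # False: policy violated, gate still up
```

In the last call the credentials work and the records come back. The only thing violated is the purpose rule, which is the situation the Court said is not “exceeding authorized access.”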
When you provide data to companies like Facebook or LinkedIn, those companies make certain promises about how they will use (and protect) your data. When third parties scrape that data from the site, there’s no promise about how they will use the data. When you throw a picture of a Las Vegas bachelor party onto a Facebook page, all of the attendees’ pictures may be scraped by a company like Clearview AI, run through facial recognition and identified in a way that lets any of Clearview AI’s customers match them against another picture. Not what you signed up for when you signed up for Facebook, and not what you signed up for even if you didn’t sign up for Facebook, but, instead, just attended a bachelor party in Vegas. What happens in Vegas doesn’t stay there … at least, not for long.
Similarly, data analytics company HiQ routinely scraped data from professional social media site LinkedIn and used its own algorithm to identify employees on the site who were poachable (at higher risk of leaving their company). An employer could use this to offer incentives to keep the employee on board, and a competitor could use it to recruit. HiQ also offered skills-mapping intel, which, again, could be used to retain, retrain or recruit. But both of these products required access to LinkedIn’s database, which LinkedIn tried to restrict. Although the LinkedIn data was “public” (that is, not a separate site), LinkedIn enabled its users to set their privacy settings (public, private or semi-restricted) and to restrict the “broadcast” of changes to their profiles. LinkedIn not only had a restriction in its terms of service prohibiting access to its computers or databases for the purpose of “scraping” or “spidering,” it also imposed some technological barriers to the automated and massive downloading of its customers’ “public” data. The instructions in LinkedIn’s “robots.txt” file prohibit access to LinkedIn servers via automated bots; LinkedIn’s “Quicksand” system detects non-human activity indicative of scraping; its “Sentinel” system throttles (slows or limits) or even blocks activity from suspicious IP addresses; and its “Org Block” system generates a list of known “bad” IP addresses serving as large-scale scrapers. LinkedIn blocks approximately 95 million automated attempts to scrape data every day, and has restricted over 11 million accounts suspected of violating its User Agreement, including through scraping.
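For readers who want to see what barriers of this kind look like in practice, here is a rough sketch in Python. It is not LinkedIn’s implementation, which is not public. The robots.txt content, the example.com URLs, the IP addresses and the Throttle class are all assumptions made up for illustration. The first part uses Python’s standard urllib.robotparser module to show how a well-behaved bot reads a robots.txt file. The second part shows a simple per-IP throttle with a blocklist.

```python
import time
from collections import defaultdict, deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks all bots to stay out of /profiles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profiles/
"""


def bot_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return what robots.txt asks of this bot. It is a request; the server does not enforce it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Throttle:
    """Illustrative per-IP limiter: refuses blocklisted IPs and bursts above a threshold."""

    def __init__(self, max_requests: int, window_seconds: float, blocklist=None):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.blocklist = set(blocklist or [])
        self._hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str, now: float = None) -> bool:
        if ip in self.blocklist:
            return False  # known "bad" address: always refused
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: throttled
        hits.append(now)
        return True


print(bot_may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/profiles/jane"))  # False
print(bot_may_fetch(ROBOTS_TXT, "example-bot", "https://example.com/about"))          # True

throttle = Throttle(max_requests=2, window_seconds=10.0, blocklist={"203.0.113.9"})
print(throttle.allow("198.51.100.1", now=0.0))   # True
print(throttle.allow("198.51.100.1", now=1.0))   # True
print(throttle.allow("198.51.100.1", now=2.0))   # False: third request inside the window
print(throttle.allow("203.0.113.9", now=2.0))    # False: blocklisted
```

Note the difference between the two halves. The robots.txt check only works if the bot chooses to run it, which makes it closer to a posted sign than a fence. The throttle and blocklist run on the server and actually refuse requests, which is closer to the “code-based” barrier discussed above.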
Cat, Meet Mouse.
But LinkedIn elevated the stakes by sending a “cease and desist” letter to HiQ demanding that HiQ stop scraping data from the website. In a “real world” analogy, LinkedIn asked HiQ to leave. If HiQ continued to scrape data from the LinkedIn site after LinkedIn expressly revoked its permission to do so, was HiQ exceeding its authorization to access LinkedIn’s data? The Ninth Circuit Court of Appeals said no: HiQ’s accessing the publicly accessible portions of LinkedIn’s website, even after permission had been expressly revoked by the cease and desist letter, did not violate the CFAA.
Put simply, HiQ did not “break in” to LinkedIn’s site. LinkedIn appealed to the U.S. Supreme Court which, on June 14, 2021, vacated the 9th Circuit’s judgment and remanded the case “for further consideration in light of Van Buren v. United States, 593 U. S. ___ (2021).” But it’s not clear that the 9th Circuit’s judgment was inconsistent with Van Buren, and I’m not sure how the case would be decided differently “in light of” Van Buren.
Meanwhile, scraping cases continue. For example, in litigation between online travel agency Kiwi.com and Southwest Airlines, Southwest’s website terms of service “expressly prohibit any attempts to ‘page scrape’ flight data and any use of the Southwest Website ‘for any commercial purpose’ without authorization from Southwest.” That’s why, for example, when you search for flights on online travel agents like Kayak.com, you don’t find Southwest flights.
In its lawsuit, Southwest alleged that “Kiwi is accessing Southwest’s computer systems … without authorization, bypassing Southwest’s security systems intended to block automated traffic and bots from using the Southwest Website, and hacking the Southwest application programming interface (API) that is accessible only through the Southwest Website—all in violation of the Website Terms.” At present, the issue in the case is whether the lawsuit can continue in Texas, where Southwest (and its website) are located, but it’s not clear whether the holding in Van Buren forestalls a CFAA remedy for the discount airline.
Ultimately, the Supreme Court itself may have to clarify how the Van Buren case impacts scraping cases. The holding in Van Buren suggests that those suing over anticompetitive and unwanted access to public data may have a steep hill to climb to prove “unauthorized access” or “exceeding authorized access” in such cases. The Van Buren case, however, raises legitimate questions about the ability of website, computer and database owners to prevent malicious activity directed at their computers, data or users where “access” to the computer system, data or website is permitted for some purposes, but prohibited for other purposes.
This kind of “conditional access” is what permits companies like Facebook to enforce their terms of service, to kick out malefactors, to restrict phishing and malware, and to block IP addresses. In a future article, I will address the impact of Van Buren on ongoing efforts to protect websites and data. For now, it appears that scraping lawsuits will have a difficult time in the courts. The HiQ case will go back to the circuit court and probably back to the Supreme Court. Until then, I guess we will all just have to continue scraping by.