Is it Unlawful to Collect or Store TCP/IP Log Data for Security Purposes?
It is a common and accepted practice for entities to collect, store, process and analyze log data. This log data includes the IP addresses of every person (computer) that accessed or attempted to access the network, the website or the process. These logs can be used to determine the source of attacks, the pattern of attacks and to provide an early warning about such attacks. Data analytics on such log data can not only identify sources of attacks, but also can be used to prevent future attacks—both at that institution and others.
But this activity raises two significant questions: Are IP addresses “personal data” under the EU General Data Protection Regulation (GDPR), and if so, can I collect, store and process them with or without the consent of the “data subject”—presumably the owner of the IP address? The answer (and you knew this already) is “it depends” and “it depends.”
What is an IP Address Anyway?
“An Internet Protocol address (IP address) is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. An IP address serves two principal functions: host or network interface identification and location addressing.” (Thank you, Wikipedia.) The first thing to note is that an IP address identifies (or can identify) a connected device, not the actual identity of the individual using that device. Second, the extent to which the IP address actually identifies a device may depend on whether the IP address is “static” (that is, the device has a specific assigned IP address associated with it) or “dynamic” (that is, the device is assigned a new IP address every time it connects to the internet). Even if dynamic, it may depend on whether the IP address is reassigned for every session, and how long each session may last. The longer an IP address is associated with a device (even if dynamically assigned) the more that IP address is “associated” with that device.
Most corporate or other networks now use network address translation (NAT) in one form or another, so that the internal IP addresses associated with specific devices or activity are not viewable to the outside world. Think of it as a mailroom where you simply address mail to “ABC Company” and the mailroom figures out to whom within the company the mail is intended. (Remember mailrooms?)
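To make the mailroom analogy concrete, here is a minimal Python sketch of port-based address translation. It is a toy model, not how any particular router works: the class name `ToyNat`, the public address 203.0.113.10, the internal 10.0.0.x addresses and the starting port are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class ToyNat:
    """Hypothetical port-translating NAT: many internal hosts, one public IP."""

    public_ip: str
    next_port: int = 40000  # assumed first port handed out for translations
    outbound: Dict[Endpoint, int] = field(default_factory=dict)
    inbound: Dict[int, Endpoint] = field(default_factory=dict)

    def translate_out(self, internal: Endpoint) -> Endpoint:
        """Return the public (IP, port) that the outside world sees."""
        if internal not in self.outbound:
            self.outbound[internal] = self.next_port
            self.inbound[self.next_port] = internal
            self.next_port += 1
        return (self.public_ip, self.outbound[internal])

    def translate_in(self, public_port: int) -> Optional[Endpoint]:
        """The 'mailroom' step: work out which internal host a reply is for."""
        return self.inbound.get(public_port)


nat = ToyNat(public_ip="203.0.113.10")
alice = nat.translate_out(("10.0.0.5", 51000))
bob = nat.translate_out(("10.0.0.9", 51000))
print(alice, bob)                  # both show the same public IP address
print(nat.translate_in(alice[1]))  # only the NAT table maps back to 10.0.0.5
```

A remote server logging these two connections would record the same source IP address for both devices. Only whoever holds the translation table (the mailroom) can say which internal machine was behind each one.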
Finally, the information necessary to connect an IP address to either an individual or even a device is frequently not publicly available. The IP address 126.96.36.199 may be associated with the U.S. Senate, but not with any specific individual or device at the Senate. Typically, an IP address will identify a provider—an ISP who assigned the IP address to a subscriber. Armed with the IP address, the date, the time and possibly other information, the provider can determine the subscriber to whom that IP address was assigned at that time. If the address is NAT’d, or if the subscriber used a VPN or TOR router, or travelled through a proxy server of any kind, then we need some complicated forensics or other tools or devices just to connect an IP address to a device. Oh, and these providers won’t give out this data to just anyone—generally a subpoena, warrant or other legal process is required to be able to trace and track an IP address to a subscriber. Generally. So it’s not like you can find an IP address and know, “Hey! This is the IP address of Boris Badenov.” So, IP addresses are NOT personal data.
Not so fast.
Why exactly do we collect, store, analyze and log IP addresses? Obviously the machines we use—the computers, routers, hubs and firewalls—collect IP addresses to direct traffic to the right location. When a visitor comes to a website, the device associated with the IP address of that visitor makes a request to our web server to deliver the hypertext on our website to theirs. For that, we need the return address—the IP address. The internet doesn’t work without the collection and use of IP addresses. Emails don’t get sent, videos don’t get watched, websites don’t get visited, tweets don’t get tweeted, messages don’t get messaged—you get the idea. IP addresses are what make the internet work, and without collecting them and using them, there ain’t no internet.
So, irrespective of whether they are “personal data,” it’s clear that some collection, use and analysis of IP addresses is both permitted and necessary. Whether that is because it is a “lawful purpose” to use the IP address, or whether there is express or implied consent to such use we will put aside. For the moment.
Security logging goes somewhat beyond the “normal” use of IP addresses to deliver traffic. Think of FedEx or UPS. They need the address of the sender to know where to pick up the package. They need the address of the recipient to know where to deliver the package. While the package is in transit, they need the tracking data to, well, track the package. They may need aggregate package tracking data to decide—both for the present and the future—how many trucks, planes, warehouses, etc. they will require. But FedEx and UPS also have the ability to print out (and to be compelled to print out by subpoena) records of every package delivered to you (or your address) from the dawn of time (say, 1990) to the present. They can slice and dice this data to show how many letters, packages and boxes. And from whom. Are you getting packages from drug dealers? Chemical manufacturers? Adam and Eve? This is similar to what IP logging and traffic analysis does. It collects traffic data to determine patterns. Patterns of “good” traffic and patterns of “bad.” Comparing this data to those collected by others (e.g., threat intel) we can get a picture of the activities associated with the IP address. We can track the IP address over time. And, in certain circumstances, we can use this data with other data to determine the individual associated with the IP address (the threat actor). In some cases, such as employee misconduct, theft of intellectual property, denial of service attacks or data breaches, we collect, store and process IP addresses to actually identify (and then possibly sue or arrest) a specific identifiable person. So IP addresses ARE personal data, right?
Not so fast.
To understand whether an IP address is, or is not “personal data” under GDPR, you have to understand what “personal data” is and why it is protected. Even then, assuming that IP addresses are personal data, this does not necessarily preclude its collection, storage and use. On the other hand, it doesn’t mean you can do whatever you want with IP information.
What IS ‘Personal Data’?
Personal data is at the heart of GDPR, but it’s not clear whether an IP address fits the definition. GDPR states that “‘[P]ersonal data’ means any information relating to an identified or identifiable natural person (‘data subject’).” It clarifies this statement by noting that:
“[A]n identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”
So, does the number 188.8.131.52 relate to an identified or identifiable natural person?
The EU Court dealt with this issue with respect to the predecessor to GDPR—the EU Data Protection Directive—regarding German citizen Patrick Breyer’s claim that his IP address, logged by the German federal government (acting as the operator of public websites), constituted “personal data.” On Oct. 19, 2016, the EU Court agreed. Kinda. Sorta. The Court stated that:
“… a dynamic IP address registered by an online media services provider when a person accesses a website that the provider makes accessible to the public constitutes personal data within the meaning of that provision, in relation to that provider, where the latter has the legal means which enable it to identify the data subject with additional data which the internet service provider has about that person.”
OK, what does that mean, exactly? It means that an IP address is personal data when the provider has the additional data necessary to link that IP address to a specific individual. So, if your provider is, say Comcast or Verizon, and they can track your activities by IP address and know who you are because you are a subscriber of theirs, then with respect to them, your IP address is personal data. But the case is ambiguous. It makes the data “personal” if the provider has the “legal means” to enable it to identify the data subject—not just where it actually has the necessary data or even access to that data.
In the United States, if all you have is an IP address, a date and a time (the kind of stuff you typically log), do you have the “legal means” to enable you to identify the data subject? Depends what “legal means” means. Certainly you would have the ability to file a lawsuit against the “owner” of the IP address for—well, whatever bad thing you think they did. Then you could serve a subpoena on the ISP for it to pony up its own log and subscriber data. If a VPN or proxy is involved, you could then do the same thing for each IP address uncovered, then file a lawsuit in Armenia or Ukraine or Singapore, and get a court to order the Armenian ISP to deliver this data through letters rogatory to a U.S. court, or otherwise produce it pursuant to a treaty, and then, five years later, you might have the fake subscriber’s name associated with the proxy server three steps removed from the threat actor. Or you might get lucky, and find the threat actor in just one subpoena. Nevertheless, you have the “legal means” to get the data. Maybe.
The Breyer court simply didn’t take into account how difficult it is in most circumstances to convert a naked IP address into an identity of a living, breathing human being. The court is not wrong, it’s just not right.
An analogy can be made between IP addresses and license plate numbers. Neither of these is “personal data” per se. Standing alone, the Illinois license plate “BDR 529” tells you nothing about the activities of a specific individual. Add to that the ability to look up the registered owner of the vehicle and you have something much closer to personal data. As with the IP address, which at best identifies a machine or process, the license plate at best identifies a vehicle itself (and its registered owner) but not the operator. So the plate + lookup + observation of activity lets you infer a great deal about the activities of the owner (if you assume the owner is driving) and would constitute personal data.
In a recent case, Fairfax County, Virginia, police were using a sophisticated camera linked to a state database to collect, store and process license plates of vehicles as they drove down the street. The Automated License Plate Reader (ALPR) would capture and process an image of the license plates of cars, then run these license plates to see if the cars were reported stolen or if there were warrants issued against the registered owner of the cars. But the police were also storing this data for an unlimited period of time. So they had the ability to, for example, identify all the vehicles that might have been in a particular neighborhood at a particular time. They also had the ability to put in a person’s name as registered owner and see all the places that person had been (and had been captured by the ALPR) for days, weeks or months. There was essentially no limit to how the police could use the data.
So are the scans of the license plates, which are just a series of numbers and letters (and the plates themselves are issued by and belong to the government), and the data collected about what these license plates are doing “personal data?” The lower Virginia court, applying Virginia’s “Data Act,” which protects certain “personal data,” found “that a license plate number is not personal information,” noting that each example of “personal information” listed in [the Data Act’s] definition referred to “an individual person” but a license plate number refers to “a vehicle rather than a person.” Since the ALPR only identifies a car and not a person, it’s not protected. Just like an IP address identifies a computer or process, not a person.
The Virginia Supreme Court reversed. It found that the Data Act defines “personal information” broadly to “encapsulate ‘all information’ that … allows any inference about an individual’s ‘personal characteristics,’ activities, or associations.”
The Virginia Court took a nuanced approach to whether the license plate data is or is not “personal information.” It noted that, because it is an agency-issued identification number under the statute, it may be deemed “personal information” in certain contexts. However, the court noted “in other contexts, a license plate number would not be ‘personal information’ because there is nothing about a license plate number that inherently ‘describes, locates or indexes anything about an individual.’ Without something connecting the license plate number to an individual, it is just a combination of letters and numbers that does not describe, locate or index anything about anyone.”
Sure, just like IP addresses. The court noted that “the pictures and data associated with each license plate number constitute ‘personal information,’ …. The images of the vehicle, its license plate, and the vehicle’s immediate surroundings, along with the GPS location, time, and date when the image was captured ‘afford a basis for inferring personal characteristics, such as … things done by or to’ the individual who owns the vehicle, as well as a basis for inferring the presence of the individual who owns the vehicle in a certain location at a certain time.”
So, applying this rationale to an IP address, a “naked” IP address, like a “naked” license plate number, reveals and means nothing. Add to that the ability to link it to a person or a machine linked to a person, and you’re getting closer. Add to that the ability to track that “person” and their activity over time, and you’ve got personal data.
So whether a number is personal data often depends on what you do with it.
When a person uses a P2P network to download (pirate) movies, music or software, typically the only information that can be gleaned from the P2P network is the pirate’s IP address. Recently, the makers of the 2015 Adam Sandler movie, “The Cobbler,” pursued movie pirates by suing for copyright infringement and then attempting to identify the real human beings behind the IP addresses by compelling the ISPs to deliver subscriber information.
Now, The Cobbler got a rousing 9 percent on Rotten Tomatoes, and if you don’t remember the film, you can be excused. However, a recent California federal appeals court ruling related to a person alleged to have unlawfully downloaded this movie. According to the case, the movie production company “identified an IP address located in Portland, Oregon, that had downloaded and distributed The Cobbler multiple times without authorization. [The production company] filed suit against the unknown holder of the IP address.” The court went on to note that “Records subpoenaed from Comcast identified Thomas Gonzales as the subscriber of the internet service associated with the IP address.”
Further investigation, however, revealed that Gonzales was a resident of an adult care facility, and that “the internet service was accessible to both residents and visitors at an adult care home.” The studio’s investigator concluded that “it does not appear that [Gonzales] is a regular occupant of the residence or the likely infringer.” Gonzales refused to share the names or work schedules of the individuals living and working in the home without a court order. Nevertheless, the studio continued to prosecute its copyright infringement case (and a claim of contributory infringement) against Gonzales.
So the court had to decide whether an IP address was enough to establish conduct by the owner of that address. Of course, it was not. The court dismissed the infringement case against Gonzales, noting:
“Although copyright owners can often trace infringement of copyrighted material to an IP address, it is not always easy to pinpoint the particular individual or device engaged in the infringement. Internet providers, such as Comcast or AT&T, can go so far as to identify the individual who is registered to a particular IP address (i.e., an account holder) and the physical address associated with the account, but that connection does not mean that the internet subscriber is also the infringer. The reasons are obvious—simply establishing an account does not mean the subscriber is even accessing the internet, and multiple devices can access the internet under the same IP address. Identifying an infringer becomes even more difficult in instances like this one, where numerous people live in and visit a facility that uses the same internet service. While we recognize this obstacle to naming the correct defendant, this complication does not change the plaintiff’s burden to plead factual allegations that create a reasonable inference that the defendant is the infringer.”
The only connection between Gonzales and the infringement was that he was the registered internet subscriber and that he was sent infringement notices.
So, does this mean that an IP address is NOT linked with a specific person’s activities and therefore is NOT personal data?
Not so fast.
Context, Context, Context
What’s missing in each of these cases is context. An IP address can be personal data if you want to track what a specific person is doing and you have the ability to do so and you have reasonable access to the data necessary to make that happen. If you are collecting IP information to see what porn sites employee John Smythe is going to—then it’s personal data. If you are doing load balancing or general traffic analysis, then no.
IP logging for security purposes has characteristics of both. For the most part, the logs are used to get a general picture of the security posture of a company. They show trends and general information. Even the “specific” information does not generally identify a specific individual except by IP address. We may know that an IP address has been identified as the source of a particular attack, and we may—with other information—suspect that that IP address is associated with a threat actor in Belarus. That doesn’t yet make it personal data. If we then tie that information to a hacker group—say, Not So Fancy Bear—we are getting closer. Tie it to a person, it’s personal data.
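For a feel of what the “general picture” kind of analysis looks like, here is a rough Python sketch that counts failed requests per source IP address in a few web server log lines. The log format (a simplified Common Log Format), the sample lines, the documentation-range addresses and the threshold of three failures are all assumptions made up for this example, not data from any real system.

```python
import re
from collections import Counter
from typing import Iterable, List

# Simplified Common Log Format: client IP, timestamp, request line, status code.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3})'
)

# Invented sample data for illustration only.
SAMPLE_LOG = [
    '198.51.100.7 - - [10/Oct/2018:13:55:36 +0000] "POST /login HTTP/1.1" 401 128',
    '198.51.100.7 - - [10/Oct/2018:13:55:37 +0000] "POST /login HTTP/1.1" 401 128',
    '203.0.113.50 - - [10/Oct/2018:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2018:13:55:41 +0000] "POST /login HTTP/1.1" 401 128',
    '192.0.2.33 - - [10/Oct/2018:13:56:02 +0000] "GET /admin HTTP/1.1" 403 64',
]


def failed_requests_by_ip(lines: Iterable[str]) -> Counter:
    """Count 401/403 responses per client IP address; skip unparseable lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") in {"401", "403"}:
            counts[match.group("ip")] += 1
    return counts


def noisy_sources(lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return addresses with at least `threshold` failures (assumed cutoff)."""
    counts = failed_requests_by_ip(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


print(failed_requests_by_ip(SAMPLE_LOG))
print(noisy_sources(SAMPLE_LOG))
```

Notice what comes out the other end: a list of addresses and counts. Nothing in it names a human being, which is why this kind of trend analysis sits at the less personal end of the spectrum until someone joins it with other data.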
But we’re still not done. Personal data can still be collected provided that there is a “lawful basis” for such collection (and the collection, use and analysis is limited to the lawful purpose, and that the data is only kept for the time necessary for that use, and blah blah blah …). Securing a network or device is certainly a lawful basis, and IP information collection and processing is necessary for that purpose.
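One way to act on the “only kept for the time necessary” part is to purge log entries once they age past a retention window. The Python sketch below is a minimal illustration under stated assumptions: the 90-day window, the `LogEntry` type and the `purge_expired` function are hypothetical, and the right retention period is a legal and business question that no code can answer for you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

RETENTION = timedelta(days=90)  # assumed window, not a figure taken from GDPR


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical security log record keyed by source IP address."""

    ip: str
    seen_at: datetime
    event: str


def purge_expired(
    entries: Iterable[LogEntry],
    now: datetime,
    retention: timedelta = RETENTION,
) -> List[LogEntry]:
    """Keep only entries that are no older than the retention window."""
    cutoff = now - retention
    return [entry for entry in entries if entry.seen_at >= cutoff]


now = datetime(2019, 1, 1, tzinfo=timezone.utc)
entries = [
    LogEntry("198.51.100.7", now - timedelta(days=200), "failed login"),
    LogEntry("198.51.100.7", now - timedelta(days=10), "failed login"),
    LogEntry("192.0.2.33", now - timedelta(days=90), "port scan"),
]
kept = purge_expired(entries, now)
print([(entry.ip, entry.event) for entry in kept])
```

Running something like this on a schedule keeps the log from quietly turning into a permanent archive of who connected when.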
But the devil is in the details. If you are overcollecting, overstoring or even overanalyzing the data, you may run afoul of GDPR. Best advice is to get good advice.