The internet is vast and full of publicly available data that anyone with the time, or the technology, can mine for insights.

You can find everything from years of NYC taxi cab data and Uber information to more obscure datasets covering every Jeopardy question in history or every single Iowa liquor store receipt since 2014. The volume of available data is staggering, and it’s poised to keep growing with players like Amazon supporting publicly available AWS Datasets.

There is so much free data out there that thriving companies have built entire business models on farming, organizing, and selling insights drawn from it.

Hanging over all this available data, and lying at the core of a recent lawsuit filed by startup hiQ against LinkedIn, is the question that has plagued the internet since the first time my Gateway 2000 screamed out over the internet at 14.4k: who controls the data?

Why do you care?

The key question underpinning this legal case will see at least some resolution in March of 2018, when hiQ has its day in court against LinkedIn.

Central to the case are two sub-questions about data ownership and control: 1) May a hosting site prohibit third parties from scraping otherwise publicly available data? and 2) Does a hosting company have the right to control access to data that its users make publicly available?

hiQ v. LinkedIn – The Basics

hiQ is a company built upon scraping publicly available data on LinkedIn.

Specifically, hiQ tracks user-generated changes to profiles in areas like work history and skills. hiQ takes this data, does its magic, and offers two products: 1) Keeper, which helps companies identify employees who are at risk of being recruited away; and 2) Skill Mapper, which helps companies map ...