I can’t seem to free my infrequently-viewed email inbox from “you might like!” notices by the content-lock-in site Medium. This one made it to the iOS notification screen (otherwise I’d’ve been blissfully unaware of it and would have saved you the trouble of reading this).
Today, they sent me this gem by @JeromeDeveloper: Scrapy and Scrapyrt: how to create your own API from (almost) any website. Go ahead and click it. Give the Medium author the they so desperately crave (and to provide context for the rant below).
I have no issue with @JeromeDeveloper’s coding prowess, nor Scrapy/Scrapyrt. In fact, I’m a huge fan of the folks at ScrapingHub, so much so that I wrote
splashr to enable use of their Splash server from R.
My issue is with the example the author chose to use.
CoinMarketCap provides cryptocurrency prices and other cryptocurrency info. I use it to track cryptocurrency prices to see which currency attackers who pwn devices to install illegal cryptocurrency rigs might be switching to next and to get a feel for when they’ll stop mining and go back to just stealing data and breaking things.
CoinMarketCap has an API with a generous free tier with the following text in their Terms & Conditions (which, in the U.S. [soon] may stupidly be explicitly repeated & required on each page that scraping is prohibited on vs a universal site link):
There is only one reason (apart from complete oblivion) to use CoinMarketCap as an example: to show folks how clever you are at bypassing site restrictions and eventually avoiding paying for an API to get data that you did absolutely nothing to help gather, curate and setup infrastructure for. There is no mention of “be sure what you are doing is legal/ethical”, just a casual caution to not abuse the Scrapyrt technology since it may get you banned.
Ethics matter across every area of “data science” (of which, scraping is one component). Just because you can do something doesn’t mean you should and just because you don’t like Terms & Conditions and want to grift the work of others for fun, profit & also doesn’t mean you should; and, it definitely doesn’t mean you should be advocating others do it as well.
Ironically, Medium itself places restrictions on what you can do:
yet they advocated I read and heed a post which violates similar terms of another site. So I wonder how they’d feel if I did a riff of that post and showed how to setup a hackish-API to scrape all their content. O_o
*** This is a Security Bloggers Network syndicated blog from rud.is authored by hrbrmstr. Read the original post at: https://rud.is/b/2018/12/02/more-scraping-ethics-gone-awry-and-why-do-this-when-theres-a-free-api/