A Workaround For When Anti-DDoS Also Means Anti-Data

More sites are turning to services like Cloudflare due to just how stupid-easy it is to DDoS a site. Sometimes the DDoS is intentional (malicious). Sometimes it’s because your bot didn’t play nice (stop that, btw). Sadly, at some point, most of us with “vital” sites are going to have to pay protection money to one of these services unless law enforcement or ISPs do a better job stopping DDoS (killing the plethora of pwnd IoT devices that make up one of the largest for-rent DDoS services out there would be a good start).

Soapbox aside, sites like this one — https://www.bitmarket.pl/docs.php?file=api_public.html — (which was giving an SO poster trouble) have DDoS protection enabled but they also want you to be able to automate the downloads (this one even calls it an “API”). However, try to grab one of the files there with your browser and you’ll likely see a Cloudflare interstitial page which eventually gets you the data.

Try the same thing with download.file() or httr::GET() and you’ll run into trouble since neither of those two functions have a way to perform the javascript challenge execution which ultimately is posted (well, GETted in this case) to a checker endpoint which eventually redirects to the original URL with enough 🍪🍪🍪 to ensure you won’t be bothered again.

Cloudflare has captcha and other types of interstitials, but if you happen on the 503+javascript challenge one, have I got a package for you! Meet: cfhttr🔗.

The singular function (for now) — cf_GET() — does the following:

  • Makes an httr::GET() call with the initial URL
  • Checks to ensure it’s both on Cloudflare and is using the javascript challenge protection scheme
  • Slices the javascript and tweaks it enough to enable running it in V8
  • Retrieves the challenge computation from V8
  • Posts (well, httr::GET()s it since that’s what Cloudflare expects) the challenge form with the proper Referer header and hopefully passes the test so you get your content.
devtools::install_github("hrbrmstr/cfhttr")

library(cfhttr)

res <- cf_GET("https://www.bitmarket.pl/graphs/BTCPLN/90m.json")
## Waiting 5 seconds...

str(httr::content(res, as="parsed"))
## List of 90
##  $ :List of 6
##   ..$ time : int 1512908160
##   ..$ open : chr "48000.00000000"
##   ..$ high : chr "48100.00000000"
##   ..$ low  : chr "48000.00000000"
##   ..$ close: chr "48100.00000000"
##   ..$ vol  : chr "0.00124821"
##  $ :List of 6
##   ..$ time : int 1512908220
##   ..$ open : chr "48100.00000000"
##   ..$ high : chr "48100.00000000"
##   ..$ low  : chr "48100.00000000"
##   ..$ close: chr "48100.00000000"
##   ..$ vol  : chr "0.00000000"
##  $ :List of 6
##   ..$ time : int 1512908280
##   ..$ open : chr "48100.00000000"
##   ..$ high : chr "48100.00000000"
##   ..$ low  : chr "48100.00000000"
##   ..$ close: chr "48100.00000000"
##   ..$ vol  : chr "0.00000000"
## ...

FIN

If you end up using this in workflows and run into a problem, it likely means that Cloudflare changed the challenge code page. Please file an issue so I can update the code.

*** This is a Security Bloggers Network syndicated blog from rud.is authored by hrbrmstr. Read the original post at: https://rud.is/b/2017/12/10/a-workaround-for-when-anti-ddos-also-means-anti-data/