Talking Serverless And AWS Lambda Security With Jeff Forristal

Introduction

In my previous blog interview with Jeremiah Grossman, I mentioned that throughout the years I have befriended a small group of people with whom every discussion is intriguing, challenging and truly inspiring. Jeff Forristal is another old acquaintance, for whom I hold the utmost respect. Jeff is an accomplished thought leader and writer, having written multiple feature and cover-story articles for Network Computing and Secure Enterprise magazines; he is also a contributing author to multiple books.

For many years, Jeff was known under the pseudonym “Rain Forest Puppy”, one of the leading recognized industry experts in web application security. He was responsible for noted industry landmarks including the first documented discovery of SQL injection (you heard that right!), the Poison Null Byte, the first responsible security disclosure policy (RFPolicy), and the first intelligent web application scanner (Whisker). Jeff has presented his research in many forums, from established security events like RSA, Black Hat and CanSecWest to smaller regional conferences around the world.

I should probably prepare you – this discussion is going to be technical. Very technical, covering topics such as Serverless and AWS Lambda security, automated testing, etc. So, hold on tight.

==

Ory: Hi Jeff, have you already had a chance to develop on, or experiment with, any serverless platform? Which one? Why?

Jeff: I have used AWS Lambda for certain tasks. Honestly, my early attempts to wrangle AWS API Gateway to connect to a Lambda function for inbound HTTP requests left me a bit turned off from using API Gateway, simply because it felt too complicated and required explicit choices on exactly what I wanted to pass through. In the earlier days of serverless, I was also being hit with too much cold-start overhead for infrequently called APIs. Fortunately, things have become better as the technology has matured.

I have leveraged Lambda for mildly interactive websites, serving the majority of the static content from S3 (static website hosting) and saving a few dynamic functions for Lambda. Elsewhere, I have some small infrastructure housekeeping tasks that run out of Lambda. But I find a lot of my workloads are heavy, usually needing more than the five-minute Lambda execution allowance to get the job done. I wind up putting those into “run for as long as you need to, until done” container launches.

Ory: Can you elaborate a bit on the use cases that didn’t quite fit the current serverless model?

Jeff: In terms of scenarios not fitting serverless, it’s simply a matter of the “right tool for the job”. A lot of jobs can use serverless, but not all jobs. Five recent examples where serverless wasn’t the right tool for the job:

  • An associate of mine has been implementing COTS security tools into Lambdas, as a way to create one-shot on-demand security tool runs (for lower DevSecOps TCO). For some tools, it works great! For other tools (e.g. Truffle Hog and web site crawlers), the execution duration is unpredictable and often exceeds serverless limits.
  • I knew an organization that needed to run antivirus (AV) scans on documents being PUT into an S3 bucket. The preliminary architecture was to put AV command-line tool(s) into Lambda and trigger a Lambda scan upon the PUT event, since a single document scan is a relatively fast operation (<= 30 seconds). It turned out that getting the AV tools wrapped into Lambda proved really challenging, so they abandoned serverless in favor of the PUT event going into a queue consumed by a container worker. The DevSecOps TCO wound up lower that way, as all the AV tooling became generic, standard installs in the container environment.
  • I have an analytics batch job I run once per day that streams in a 300GB compressed file and processes it (i.e. it takes this huge haystack and finds the relevant needles within it). It’s streaming-based, so it actually uses zero disk and would fit well into serverless … except it takes 25-40 minutes for the job to run a single pass of that data. So instead I just use spot instances to run the job and terminate the instance when the job is done.
  • I knew an organization that needed to process transaction requests using operations from CloudHSM. In the CloudHSM v1 world, the method used to setup/authorize a machine to use the CloudHSM Luna client did not conceptually fit with Lambda. Maybe CloudHSM v2 is better, I haven’t checked.
  • For better or worse, justified or not, some organizations have policy/procedure requirements that require making assertions about the compute environment. For example, the compute may need to happen in a PCI certified environment (Lambda didn’t make PCI compliance until July 2017). Elsewhere, I know of organizations where all compute must occur in the presence of various intended security monitoring agents installed on the system/container (e.g. OSQuery, Tanium, McAfee, Cylance, Qualys, Encase, Eclypsium), and the organizations haven’t yet tackled how to fit serverless into their more traditional risk management SOP.
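The S3-event-to-scanner pattern from the AV example above can be sketched as a minimal Lambda handler. The bucket/key extraction follows the documented S3 event record shape; the actual AV scan is only indicated by a comment, since the scanner tooling is exactly the part that proved hard to wrap into Lambda.

```python
import json

def handler(event, context):
    """Minimal sketch of a Lambda triggered by an S3 PUT event.

    The event parsing follows the documented S3 event record shape;
    the AV scan itself is represented only by a placeholder comment.
    """
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # In a real deployment you would fetch the object (e.g. with
        # boto3 s3.get_object) and run the AV scanner against it here.
        results.append({"bucket": bucket, "key": key, "status": "scanned"})
    return {"statusCode": 200, "body": json.dumps(results)}
```

The queue-based alternative Jeff describes keeps the same event shape, but delivers it to a container worker via SQS instead of invoking this handler directly.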

Honestly, I would probably do more with serverless if given a 1-2 hour execution limit and a bit more temporary disk space. I totally agree that a lot of the world consists of small computing tasks (most of the web world in fact!), and serverless is well positioned to tackle all of those. But for the stuff I generally deal with, the runtime dynamics don’t always fit into Lambda.  So, grab the next best-fit tool/thing to get the job done.

Ory: Regarding some of the use cases you mentioned, which require specific environments – some serverless platforms allow you to bring your own container. For example – IBM cloud functions (OpenWhisk). Have you considered this approach?

Jeff: Yes, there are more and more cloud provider options showing up these days, especially as further needs emerge and the market works to answer them. Many of the big organizations I know still have strong preference towards the main three platforms: AWS, GCP, and Azure. One organization I worked with (Capital One) was significantly focused on AWS only (there are a lot of public articles regarding Capital One’s usage of AWS). As a regulated financial institution, they do not have the luxury to place parts of their compute workload into arbitrary vendor environments without a lot of environment vetting. And it’s not cost-effective to go vet every vendor just because that vendor may be incrementally better than AWS in some corner-case functionality…

Ory: You’re a natural-born security researcher (and hacker). Have you stopped to think about the security implications of serverless, or about the consequences (security-wise) of developing applications on serverless architectures?

Jeff: I think the natural serverless aspect of disposing of the compute environment after running a single task will significantly disrupt adversary TTPs that leverage any form of persistence.  Granted, multi-request persistence is already disrupted due to server clustering, but now an adversary’s foothold will be wiped away within minutes, leaving them to start over. Outside of simple and non-persistent attacks like SQL injection that could still be done manually, attacks will need significantly more automation even for simple things like recon, enumeration, and lateral scanning.

Ory: People have been developing serverless applications since the end of 2014, when AWS Lambda was released. We’re nearing the end of 2018, and there are no automated tools or even documented best practices for security testing serverless applications besides PureSec’s. Any idea why? When do you think we will start seeing tools, scripts, and pen-testing frameworks?

Jeff: I think the answer is similar to why lots of existing security scanner and DAST tools also do not accommodate microservices and binary transports (gRPC, Thrift, etc.) very well: these deployment scenarios are not pervasive enough (yet) to justify the investment.  If 70%+ of an organization’s compute is done as traditional HTTP services, tool vendors targeting traditional HTTP will cover the majority of compute across all server deployment methods (bare metal, virtual system, container, and HTTP-fronted serverless, regardless of on-site, datacenter, or cloud). Specialized protocols (Thrift, gRPC) and cloud serverless accommodation (which is cloud provider specific) are very narrow features/support that offer less ROI compared to making the traditional tool coverage more valuable.

Here is an interesting point to ponder: a lot of security testing tools tend to come out of offensive security camps (think: pen-testers, red teams, vulnerability assessors, etc.) as a method to automate and scale their offensive security agenda. Defensive security personnel and developers, in my encounters, do not tend to invest heavily in offensive approaches to their defensive goals/KPIs (in my experience they prefer SAST, analytics, etc.). Going back to my previous mention regarding TTPs, serverless is not an attractive target in a typical “land, persist, expand” campaign approach. I simply see the offensive folks de-prioritizing serverless in favor of richer server-full targets and things with useful infrastructure behind/under them. Otherwise their Cobalt Strike installation/license gathers dust. 🙂

Ory: Given your deep understanding of DAST / SAST / IAST technologies, I was wondering what are your thoughts regarding the ability to adapt (and adopt) these technologies for testing serverless applications. Personally, with regards to SAST, I see an inherent issue with being able to follow data flow between functions, when what glues them together are cloud-native events that are not really in your code. So, data can flow between two functions in the same application, through an event, and your data flow analysis will never pick that up.

Jeff: I think the SAST point you bring up applies to even server-full microservice meshes and discretely separate HTTP endpoints within the same server. It’s not a server-less problem, but a data-flow analysis problem when trying to analyze the entire system operation end-to-end in the presence of external resources and application boundary callouts. Second-order SQL injection is a great example of an equivalent server-full data analysis target — SAST needs to track data flow across disparate call entry points that happen to use common external resources.

Just like networks have lost their discrete perimeters, applications are losing their boundaries too. Tools like SAST and DAST will need to evolve to deal with service meshes, distributed workload compute, event-driven architectures with external stimuli, and the like, because that is where everything is going.  The real question is: when is that tipping point?  I sense commercial tool vendors do not yet feel their revenue is eroding from not having this coverage, so they may be investing elsewhere for the time being.  Once there is enough money in that space to be had, I’m sure the DAST vendors will turn an eye towards it. But at this time, I suspect they just don’t see enough dollars there to go first, and their current coverage gets them cloud + datacenter, the lion’s share of the coverage needs.  Even more likely, the existing DAST players will just wait for a smaller startup to go first/create a tool, then look to acquire it when their need is right (much like the CASB market). If no one is currently building serverless security tools (startup or established DAST players), is that indicative of a currently low TAM?

Setting tools aside, I believe the security testing methodologies (which are what a DAST should theoretically be implementing) still hold: look at the data inputs to a target (preferably with context), and start submitting input stimuli and witness response/feedback accordingly, to infer various classes of attack.

Ory: Regarding your last DAST remarks – let’s assume you have a Lambda function that only gets invoked via an SNS message event. It is not exposed through API gateway. How would you test it then? Will the DAST scanner start sending SNS messages to the SNS service? And what if that service is not exposing an HTTP endpoint? You see where I’m going with this. 

Jeff: The benefit of serverless is that it comes with a consistent, programmatic API related to discovering and configuring, and a deterministic invoke/event model tied to it. A serverless-aware DAST given read access to the AWS APIs can enumerate all Lambda functions and determine their event input triggers. From there, the DAST can fabricate the right event inputs for the given event sources/triggers, exactly like the AWS Lambda test button does in the AWS console (it provides you an event input template, and you just modify the specific data fields). 

So, in your proposed case, the DAST can be told (or programmatically discover) that the Lambda function receives events from SNS, so it can fabricate an SNS event template and directly invoke the Lambda with the SNS event data. Further, the DAST can be intelligent enough to know which event fields are infrastructure-deterministic (i.e. don’t test them), versus which fields are subject to dynamic influence (e.g. the SNS content, or the bucket & key name of an S3 event, etc.). It can then do the usual injection testing on those fields while ignoring the rest. If it were a super smart DAST, it could pull prior invoke log entries out of the CloudTrail logs to seed working input values. But maybe that starts to make it an IAST?
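That fabricate-and-invoke step might be sketched roughly as follows. The event envelope mirrors the standard SNS record shape used by the Lambda console’s test templates; the function name, topic ARN, and injected payload are hypothetical placeholders, not any specific tool’s behavior.

```python
import json

def make_sns_event(message, topic_arn="arn:aws:sns:us-east-1:123456789012:example"):
    """Fabricate an SNS event envelope, like the Lambda console's test templates.

    Only the dynamically influenced fields (Message, TopicArn) are
    parameterized; a serverless-aware DAST would inject payloads into
    Message while leaving the infrastructure-deterministic fields alone.
    """
    return {
        "Records": [{
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "Sns": {
                "Type": "Notification",
                "TopicArn": topic_arn,
                "Subject": "test",
                "Message": message,
            },
        }]
    }

# Direct-invoke sketch (requires boto3 and AWS credentials; not run here):
# import boto3
# lambda_client = boto3.client("lambda")
# lambda_client.invoke(
#     FunctionName="target-function",  # hypothetical function name
#     Payload=json.dumps(make_sns_event("' OR 1=1 --")),
# )
```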

Overall, though, it’s not much different from taking a Swagger or WSDL definition that lists the endpoint and parameter fields, eliminating the untestable fields, asking the DAST user to provide any specific or guiding values for select fields, and blindly injecting into the rest. The difference here is in the protocol/data delivery, as it needs to be delivered as a JSON event blob to a Lambda function invoke.

Note I am talking about directly testing the serverless function, not backing up and DAST’ing the whole infrastructure. In your case, SNS itself wasn’t tested…but does it need to be? Test the SNS event receiver directly, that’s the logic you are looking to have covered by your application development team.

Ory: I agree 100% – but as I noted, testing serverless with DAST requires heavy changes from the DAST vendors. Not sure if you’ve seen this, but I actually published a local proxy that transforms sqlmap requests into AWS Lambda invokes –

https://www.puresec.io/blog/automated-sql-injection-testing-of-serverless-functions-on-a-shoestring-budget-and-some-good-music
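The core idea behind such a proxy is a small translation layer: a local HTTP server accepts the scanner’s requests and repackages their parameters as a Lambda invoke payload. The sketch below is illustrative only (the event shape and function name are assumptions, not PureSec’s actual implementation); the translation function is shown runnable, with the server/invoke wiring as comments since it needs boto3 and AWS credentials.

```python
import json
from urllib.parse import urlparse, parse_qs

def http_request_to_lambda_event(path):
    """Translate an incoming HTTP request path (as a scanner like sqlmap
    would send it) into a Lambda invoke payload. The event shape here is
    an illustrative API-Gateway-like envelope, not a fixed standard."""
    parsed = urlparse(path)
    params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
    return {"path": parsed.path, "queryStringParameters": params}

# A local proxy would wrap this in an HTTP server and forward the event:
# import boto3
# from http.server import BaseHTTPRequestHandler, HTTPServer
# lambda_client = boto3.client("lambda")
# class Proxy(BaseHTTPRequestHandler):
#     def do_GET(self):
#         event = http_request_to_lambda_event(self.path)
#         resp = lambda_client.invoke(
#             FunctionName="target-function",  # hypothetical name
#             Payload=json.dumps(event))
#         self.send_response(200)
#         self.end_headers()
#         self.wfile.write(resp["Payload"].read())
# HTTPServer(("127.0.0.1", 8080), Proxy).serve_forever()
```

Pointed at `127.0.0.1:8080`, an unmodified HTTP scanner then tests the Lambda function without knowing Lambda is involved.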

Jeff: I hadn’t seen that – cool! But yes, COTS HTTP DAST will not be useful as-is; the DAST implementations need to change to become serverless-aware. Writing a serverless security scanner from scratch is also heavy work, though, with the addition of having to build all the DAST inference engines (shell command injection engine, SQL injection engine, etc.). And a lot of the work has nothing to do with the technical logic coverage… you need UIs, reporting capabilities, dashboards, remediation integration, etc. to really be viable in many organizations. Current DASTs have that basic infrastructure and the attack inference engines; they just need a transport adjustment.

I also consider DAST to be more than just HTTP. True, a lot of popular commercial DAST tools are centered on HTTP/HTML, because that’s a huge market. But there are other, less-popular DAST tools, particularly fuzzers and fuzzer-like things, that speak all manner of protocols beyond HTTP. DAST is a black-box testing methodology; the implementations simply choose to apply that methodology over (just) HTTP. IMHO, DAST tools being centered on HTTP is a limit imposed by the individual tool vendor, not by DAST methodologies/approaches overall.

==

We’d like to thank Jeff for taking the time to do this interview with us.



*** This is a Security Bloggers Network syndicated blog from PureSec Blog (Launch) authored by Ory Segal, PureSec CTO. Read the original post at: https://www.puresec.io/blog/talking-serverless-and-aws-lambda-security-with-jeff-forristal