An Oxymoron : Static Analysis of a Dynamic Language (Part 4)

by Chetan Conikee on June 24, 2020

Taint Flow challenges in a world of untyped and async event handling

From the previous post we concluded that type-checking at compile-time can help enforce better practices and reduce the likelihood of vulnerabilities.

Many security tools rely on static analysis to approximate a program’s behavior. One program representation that is commonly used in static analysis is the call graph, which associates with each call site in a program the set of functions that may be invoked from that site. For example, a tool for finding security vulnerabilities might use a call graph to detect possible data flows from tainted inputs to security-sensitive operations.

A (traditional) call graph is a directed graph that connects call sites with call targets (i.e., function declarations). Call graphs are useful for taint-flow analysis, debugging, refactoring, and many other applications. In languages with polymorphism and/or higher-order functions, the call graph is not immediately available from the source code, but must be statically approximated (fuzzy approach).

A call graph helps answer two key questions:
– Who are the callers of a function? (i.e., “who calls me?”)
– Who are the callees of a function? (i.e., “who do I call?”)
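
The two questions above can be sketched with a very small data structure. This is only an illustration, not the representation any particular analysis engine uses; the class name CallGraph and the function names (main, readInput, render, runQuery) are hypothetical.

```javascript
// Toy call graph: edges are stored in both directions so that
// "who calls me?" and "who do I call?" are both simple lookups.
// All function names below are hypothetical.
class CallGraph {
  constructor() {
    this.callees = new Map(); // caller -> Set of callees
    this.callers = new Map(); // callee -> Set of callers
  }

  addEdge(caller, callee) {
    if (!this.callees.has(caller)) this.callees.set(caller, new Set());
    if (!this.callers.has(callee)) this.callers.set(callee, new Set());
    this.callees.get(caller).add(callee);
    this.callers.get(callee).add(caller);
  }

  // "Who do I call?"
  calleesOf(fn) {
    return [...(this.callees.get(fn) || [])].sort();
  }

  // "Who calls me?"
  callersOf(fn) {
    return [...(this.callers.get(fn) || [])].sort();
  }
}

const graph = new CallGraph();
graph.addEdge("main", "readInput");
graph.addEdge("main", "render");
graph.addEdge("render", "runQuery");
graph.addEdge("readInput", "runQuery");

console.log(graph.callersOf("runQuery")); // [ 'readInput', 'render' ]
console.log(graph.calleesOf("main"));     // [ 'readInput', 'render' ]
```

If runQuery were a security-sensitive operation, the callers lookup is the starting point for walking backwards towards tainted inputs.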

Challenge #1: Values

A basic part of data-flow analysis is to model the values that can appear at runtime. JavaScript is dynamically typed. This means that an analysis must be able to handle the fact that a given variable can contain values of different types, string and number for instance, at different points during execution. JavaScript further complicates this by supporting a myriad of coercions between types.

The following JavaScript program illustrates a case where multiple types come into play for a single variable:

if (foo)
    var x = "I am a string"
else
    var x = 30

o.p = x

In the above example the value of x depends on the truthiness of foo. If the value of foo is not known, then x can be either a string or a number when it is assigned to the property o.p. For the analysis to be useful we must track both possibilities.
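
One common way to track both possibilities is to represent a variable by the set of types it may hold and to merge those sets wherever branches meet. The sketch below is a deliberately simplified illustration of that idea; the helper names abstractOf and join are hypothetical and real analyses use richer abstract domains.

```javascript
// An abstract value is modeled here as the set of runtime types
// a variable may hold at a given program point.
function abstractOf(value) {
  return new Set([typeof value]);
}

// Join merges the facts arriving from two control-flow branches.
function join(a, b) {
  return new Set([...a, ...b]);
}

// Branch taken when foo is truthy: x = "I am a string"
const thenBranch = abstractOf("I am a string");
// Branch taken when foo is falsy: x = 30
const elseBranch = abstractOf(30);

// foo is unknown statically, so o.p may receive either value.
const typesOfX = join(thenBranch, elseBranch);
console.log([...typesOfX].sort()); // [ 'number', 'string' ]
```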

Challenge #2: Control Flow

JavaScript has several features that complicate control flow analysis. Higher order functions mean that data-flow and control flow are entangled as functions can be passed as values. JavaScript also supports throwing and catching exceptions, which lead to alternative paths of execution that must also be tracked.

The following snippet illustrates code using higher order functions that an analysis must be able to handle.

1  function SpreadSheetCell(j) { 
2    this.c = j ;
3    this.get = function() { 
4        return this.c 
5    };
6    this.set = function(nc) { 
7        this.c = nc 
8    }; 
9  }

10  var ss_cell_1 = new SpreadSheetCell("string_value") 
11  var ss_cell_2 = new SpreadSheetCell({}) 
12  console.log(ss_cell_1.get())
13  console.log(ss_cell_2.get())

This code defines a mutable spreadsheet cell containing one value. The snippet uses no class syntax; instead the constructor function SpreadSheetCell acts as a class, or blueprint, for SpreadSheetCell objects. Each such object has two methods, get and set, and one property, c, storing the actual value.

To correctly analyze the above example the analysis must take the following into account:

– The functions created on lines 3 and 6 must be tracked to the get and set properties of the two objects created on lines 10 and 11. This is data-flow analysis.
– When the get or set method is invoked, the this identifier in the body of that function must be bound to the object the method was called on.
– When analyzing the body of the SpreadSheetCell function, the this identifier must be bound to a newly created object, since the function was invoked with the new keyword.

As this example demonstrates, it is not possible to analyze control flow and data flow separately.
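
The entanglement can be observed at runtime with a trimmed version of the constructor. The sketch below assumes the getter returns the stored value this.c, and the variable names cellA and cellB are hypothetical. It shows that the receiver bound to this is decided at the call site, and that each new invocation creates its own function object.

```javascript
// Trimmed SpreadSheetCell; the getter is assumed to return the stored value.
function SpreadSheetCell(j) {
  this.c = j;
  this.get = function () {
    return this.c;
  };
}

const cellA = new SpreadSheetCell("string_value");
const cellB = new SpreadSheetCell(42);

// Same function body, different receivers: `this` is decided at the call site.
const viaMethod = cellA.get();         // receiver is cellA
const viaCall = cellA.get.call(cellB); // receiver is cellB, although the function came from cellA

// Each `new` call creates a fresh closure, so the analysis has to track
// which function value flows into which object's `get` property.
const sameFunctionObject = cellA.get === cellB.get;

console.log(viaMethod);          // string_value
console.log(viaCall);            // 42
console.log(sameFunctionObject); // false
```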

Challenge #3: Asynchronous event handling

The traditional call-graph-based approach reflects only the inter-procedural flow of control due to function calls, and ignores the event-driven flow of control found in many JavaScript applications. Node.js applications are typically written in an event-driven style that relies heavily on callbacks, which are invoked when an asynchronously executed operation has completed. Several new types of bugs may arise in event-driven programs. For example, an application may emit events for which no listener has been registered yet, or an event listener may be unreachable code because an event name was misspelled or the listener was registered on the wrong object.

In languages with asynchronous callbacks and event listeners, a traditional call graph therefore provides incomplete information: it does not reflect precisely how events give rise to indirect calls.
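
A misspelled event name is easy to reproduce with Node's built-in events module. In the sketch below the server object is a hypothetical stand-in for a real event source; nothing in the syntax links the emit to the listener, which is why a call graph built from function calls alone does not reveal the problem.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for a real server object.
const server = new EventEmitter();
let handled = 0;

// The listener is registered for "connection" ...
server.on("connection", () => {
  handled += 1;
});

// ... but the emit uses the similar-sounding "connect".
// emit() returns false when no listener exists for that event name.
const delivered = server.emit("connect");

console.log(delivered);                          // false
console.log(handled);                            // 0
console.log(server.listenerCount("connection")); // 1
```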

The example below illustrates the challenges of tracking data flow across untyped async handlers.

1 var fs = require('fs');
2 var rest = require('restler');
3
4 var restlerHtmlFile = function(url) {
5  rest.get(url).on('complete', function(res) {
6   fs.writeFileSync('file.html',res);
7  });
8 };
9
10 if (require.main == module) {
11  restlerHtmlFile('http://ec2***.blah.aws.com/');
12  fs.readFileSync('file.html');
13 } else {
14  exports.checkHtmlFile = checkHtmlFile ;
15 }

This example shows a problematic Node.js code fragment taken from StackOverflow (https://stackoverflow.com/questions/19081270/why-my-fs-readfilesync-does-not-work), which relies on the restler library to facilitate interaction with HTTP servers.

Lines 4–8 assign a function to variable restlerHtmlFile.

The call rest.get(url) within this function creates a GET request to obtain the contents of a URL. The call .on('complete', …) on line 5 registers a listener that writes the page contents to file.html when the request completes.

Then, on line 11, the function bound to restlerHtmlFile is invoked to read the contents of the URL passed to it into the file file.html, and on line 12, this file is read by calling fs.readFileSync('file.html'). The programmer reports on StackOverflow that the program crashes with an error message “no such file or directory 'file.html' at Object.fs.openSync (fs.js:427:18)”.

There are various ways in which the code can be fixed.

One solution, which is suggested on StackOverflow, is to move the call fs.readFileSync('file.html') inside the definition of the asynchronous event listener, so that it will not execute before the writing of the file has completed.
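
A minimal sketch of that fix follows. To keep it self-contained it does not use restler: fakeGet is a hypothetical stand-in that emits 'complete' on a later tick, the URL is a placeholder, and the file is written to a temporary directory rather than the working directory.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { EventEmitter } = require("events");

// Hypothetical stand-in for rest.get(url): emits 'complete' on a later tick.
function fakeGet(url) {
  const request = new EventEmitter();
  setImmediate(() => request.emit("complete", "<html>" + url + "</html>"));
  return request;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "async-"));
const target = path.join(dir, "file.html");
let contentsRead = null;

fakeGet("http://example.invalid/").on("complete", (res) => {
  fs.writeFileSync(target, res);
  // The read now lives inside the listener, so it always runs after the write.
  contentsRead = fs.readFileSync(target, "utf8");
  console.log(contentsRead);
});

// This line still runs before the listener: the file does not exist yet,
// which is exactly why a read placed here would fail.
const existsBeforeCompletion = fs.existsSync(target);
console.log(existsBeforeCompletion); // false
```

The design point is that the ordering is enforced by where the read is placed, not by the order of statements in the file.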

The question at this point is how the programmer could have observed that the code is buggy. Here, the key issue is that the programmer implicitly assumed that the read operation on line 12 will always execute after the write operation on line 6.

A SAST engine that tracks data flow should be able to determine that no such ordering is guaranteed in the context of async handlers. In an event-based system the control flow of the program is not immediately available from the program syntax: events may cause the program to switch between event listeners at pre-determined points in the execution.

Some of the possible bugs related to event-based systems:

– Dead Emits: A dead emit occurs when an emit expression does not cause any event listener to be scheduled. A typical cause of this bug is emitting the wrong event or emitting it on the wrong object. The Node.js API makes these mistakes more common by using similar-sounding event names, e.g. connect vs. connection.
– Dead Listeners: A dead listener occurs when an event listener is registered on an object for some event, but that event is never emitted after the registration. As before, the event listener may be registered on the wrong object or for the wrong event. A program may have a dead listener independently of a dead emit.
– Data Race: A data race occurs when two event listeners communicate via shared state. For example, one event listener may write a piece of global state, which is then read by the second event listener. If the write is required to happen before the read can succeed, then the analysis must establish that the write may happen before the read and that no other ordering between the two is possible.
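
The first two definitions can be turned into a toy check. The sketch below is not how a production engine works: it assumes the registrations and emits have already been extracted as a flat list in straight-line program order, and it ignores branches, loops and aliasing. The function name findDeadEvents and the object and event names are hypothetical.

```javascript
// Each fact is { kind: "listen" | "emit", object, event }, listed in program order.
// Object and event names are hypothetical.
function findDeadEvents(facts) {
  const key = (f) => f.object + "#" + f.event;
  const deadEmits = [];
  const deadListeners = [];

  facts.forEach((fact, i) => {
    if (fact.kind === "emit") {
      // Dead emit: no listener for this object/event was registered earlier.
      const heard = facts
        .slice(0, i)
        .some((f) => f.kind === "listen" && key(f) === key(fact));
      if (!heard) deadEmits.push(key(fact));
    } else {
      // Dead listener: no matching emit follows the registration.
      const fired = facts
        .slice(i + 1)
        .some((f) => f.kind === "emit" && key(f) === key(fact));
      if (!fired) deadListeners.push(key(fact));
    }
  });

  return { deadEmits, deadListeners };
}

const report = findDeadEvents([
  { kind: "listen", object: "server", event: "connection" },
  { kind: "emit", object: "server", event: "connect" },
  { kind: "listen", object: "socket", event: "data" },
  { kind: "emit", object: "socket", event: "data" },
]);

console.log(report.deadEmits);     // [ 'server#connect' ]
console.log(report.deadListeners); // [ 'server#connection' ]
```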

A static analysis engine should be able to approximately reason about events and event listeners to detect bugs that can become security vulnerabilities.

In the next part of this series we will illustrate how these challenges can be overcome using Code Property Graphs.

An Oxymoron : Static Analysis of a Dynamic Language (Part 4) was originally published in ShiftLeft Blog on Medium.