HTTP is a ubiquitous protocol and is one of the cornerstones of the web. If you are a newcomer to web application security, a sound knowledge of the HTTP protocol will make your life easier when interpreting findings by automated security tools, and it’s a necessity if you want to take such findings further with manual testing. What follows is a web security-focused introduction to the HTTP protocol to help you get started.
HTTP is a message-based (request, response), stateless protocol comprised of headers (key-value pairs) and an optional body. Three versions of HTTP have been released so far – HTTP/1.0 (released in 1996, rare usage), HTTP/1.1 (released in 1997, wide usage), and HTTP/2 (released in 2015, increasing usage).
The HTTP protocol works over the Transmission Control Protocol (TCP). TCP is one of the core protocols within the Internet protocol suite and it provides a reliable, ordered, and error-checked delivery of a stream of data, making it ideal for HTTP. The default port for HTTP is 80, or 443 if you’re using HTTPS (an extension of HTTP over TLS).
HTTP is a line-based protocol, meaning that the first line and each header are represented on their own line, with each line ending in a Carriage Return Line Feed (CRLF). A blank line separates the head from the optional body of the request or response.
Up to HTTP/1.1, HTTP was a text-based protocol; with HTTP/2, this has changed. HTTP/2, unlike its predecessors, is a binary protocol, and most implementations require TLS encryption. It’s worth noting that for the vast majority of cases (and certainly, for this article) interacting with the HTTP/2 protocol won’t be any different. It’s also worth mentioning that HTTP/1.1 isn’t going away anytime soon, and it’s still early days for HTTP/2, even though it is supported by all major web servers such as Apache and NGINX, as well as modern browsers such as Google Chrome, Firefox, and Internet Explorer. As such, HTTP/1.1 will be referenced throughout this article.
In order to initiate an HTTP request, a client first establishes a TCP connection to a specified web server on a specified port (80 or 443 by default).
The request starts with an initial line known as the request line, which contains a method (GET in the following example, more on this later), a URL (/, indicating the root of the host in the example below), and the HTTP version (HTTP/1.1 in the example below). We must also include a Host header to tell the server which host the request is intended for.
GET / HTTP/1.1
Host: www.example.com
The above is what a web browser sends, in its simplest form, when you type http://www.example.com into its URL bar. If we wanted to get the contents of http://www.example.com/about.html, we would send the following request instead:
GET /about.html HTTP/1.1
Host: www.example.com
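To make the line-based layout concrete, the following is a minimal Python sketch that serializes a request like the ones above. The function name build_request is hypothetical and chosen for illustration only; the sketch covers requests without a body and is not a full HTTP client.

```python
def build_request(method: str, path: str, host: str) -> bytes:
    """Serialize a minimal HTTP/1.1 request that has no body."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, URL, HTTP version
        f"Host: {host}",              # the Host header is required in HTTP/1.1
    ]
    # Every line ends with CRLF, and an empty line marks the end of the head.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/about.html", "www.example.com")
print(request.decode("ascii"))
```

Writing these bytes to a TCP connection on port 80 of the target server would be the next step; that part is left out here so that the sketch stays self-contained.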
HTTP Request Methods
The HTTP protocol defines a number of HTTP request methods (sometimes also referred to as verbs), which are used within HTTP requests to indicate to the server the desired action for a particular resource.
|GET||The GET method is used to retrieve a resource from a server.|
|POST||The POST method is used to submit data to a resource.|
|TRACE||The TRACE method is used to echo back anything sent by the client. This HTTP method is typically abused for reflected Cross-site Scripting attacks.|
|PATCH||The PATCH method is used to apply partial updates to a resource.|
|PUT||The PUT method is used to replace a resource.|
|HEAD||The HEAD method is used to retrieve a resource identical to that of a GET request, but without the response body.|
|DELETE||The DELETE method is used to delete the specified resource.|
|OPTIONS||The OPTIONS method is used to describe the supported HTTP methods for a resource.|
|CONNECT||The CONNECT method is used to establish a tunnel to the server specified by the target resource (used by HTTP proxies and HTTPS).|
On the server-side, an HTTP server listening on port 80 sends back an HTTP response to the client for what it has requested.
The HTTP response contains a status line as the first line, followed by the response headers. The status line indicates the version of the protocol, the status code (200 in the example below), and, usually, a description of that status code.
Additionally, the server’s HTTP response will typically also include response headers (Content-Type in the example below) as well as an optional body (with a blank line separating it from the head of the response).
HTTP/1.1 200 OK
Content-Type: text/html

<html> ... </html>
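As an illustration of how a response is laid out on the wire, here is a small Python sketch that splits a raw response into its status line, header lines, and body. The helper name split_response is an assumption made for this example, and the sketch ignores details such as chunked transfer encoding.

```python
def split_response(raw: bytes):
    """Split a raw HTTP/1.1 response into status line parts, header lines, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the head
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)               # version, status code, optional description
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason, lines[1:], body


raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html> ... </html>"
version, code, reason, header_lines, body = split_response(raw)
print(version, code, reason)
print(header_lines)
print(body)
```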
Response Status Codes
HTTP response status codes are issued by the server within an HTTP response to let the client know what the status of the request is. Status codes are organized in the following categories.
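|Status code group||Description|
|1xx||Informational: the request was received and processing continues.|
|2xx||Success: the request was received, understood, and accepted.|
|3xx||Redirection: further action is needed to complete the request.|
|4xx||Client error: the request contains bad syntax or cannot be fulfilled.|
|5xx||Server error: the server failed to fulfil an apparently valid request.|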
The following are some of the most relevant HTTP status codes for web application security testing. The full list of status codes and their descriptions is maintained in the IANA HTTP Status Code Registry.
|Status code||Description|
|200 OK||Indicates that the request has succeeded.|
|301 Moved Permanently||Indicates that the resource requested has been permanently moved to the URL within the Location header.|
|302 Found (Temporary Redirect)||Indicates that the resource requested has been temporarily moved to the URL within the Location header.|
|400 Bad Request||Indicates that the server could not understand the request sent by the client, usually due to invalid syntax.|
|401 Unauthorized||Indicates that the request could not be served due to insufficient authentication.|
|403 Forbidden||Indicates that the server understood the request but refuses to authorize it.|
|404 Not Found||Indicates that the server cannot find the requested resource.|
|405 Method Not Allowed||Indicates that the request method is known by the server, but it is not allowed to be used with this resource.|
|500 Internal Server Error||Indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.|
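The first digit of a status code determines its group, which is handy when triaging scanner output. The following Python sketch maps a code to its group and looks up the standard description using the http.HTTPStatus enum from the standard library. The describe function and the GROUPS table are hypothetical names used for illustration.

```python
from http import HTTPStatus

# Status code groups, keyed by the first digit of the code.
GROUPS = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return a status code with its standard description and its group."""
    group = GROUPS.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered codes known to the standard library.
        phrase = "Unrecognized status code"
    return f"{code} {phrase} ({group})"


for code in (200, 302, 404, 500):
    print(describe(code))
```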
The query string is defined using the question mark (?) character after the path of the URL within an HTTP request. The query string defines a series of key-value parameters separated by the ampersand (&) character.
GET /search?query=example&lang=en_US HTTP/1.1
Host: www.example.com
Query string parameters are one of the primary mechanisms that web applications use as user input. It’s, therefore, no surprise that most web application vulnerabilities arise from poorly handled user input within query string parameters.
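To see how a server-side framework might break a query string into parameters, here is a short Python sketch using urllib.parse from the standard library. The variable names are illustrative; the request target is the one from the example above.

```python
from urllib.parse import parse_qs, urlsplit

target = "/search?query=example&lang=en_US"

parts = urlsplit(target)
print(parts.path)    # the part before the question mark
print(parts.query)   # the part after the question mark

# parse_qs returns a list per key because a key may appear more than once.
params = parse_qs(parts.query)
print(params)
```

Each value in params is attacker-controlled input from the application's point of view, which is why these parameters are a common place to test for injection flaws.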
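URL encoding is a way to safely represent characters that cannot (or should not) appear literally within a URL. Each such character is replaced by a percent sign followed by its two-digit hexadecimal value, which allows encoding and decoding of characters that would otherwise cause problems or conflicts. The following are some examples of URL encoded characters:
|Character||URL encoded|
|(space)||%20|
|"||%22|
|#||%23|
|%||%25|
|&||%26|
|<||%3C|
|>||%3E|
|?||%3F|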
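URL encoding and decoding can be tried out with Python's urllib.parse module. The sketch below is only an illustration; the sample strings are made up for this example.

```python
from urllib.parse import quote, unquote

raw_value = "rock & roll?"

# safe="" asks quote() to encode every reserved character, including "/".
encoded = quote(raw_value, safe="")
print(encoded)

# Decoding reverses the process; security tools often need to do this
# to see what a payload looks like once the server has processed it.
decoded = unquote("%3Cscript%3E")
print(decoded)
```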
Since HTTP is a stateless protocol, cookies provide a built-in mechanism for passing state data to the server. Cookies typically carry state information such as session identifiers and user preferences.
Cookies are crucial to security since they are widely used to store session information. This means that if an attacker can steal a user’s cookie (using attacks such as Cross-site Scripting, for example), in many web applications, this alone provides the attacker with all they need to impersonate that user.
Cookies are set by the server using the Set-Cookie HTTP response header. The browser then stores the cookie value and submits it with every subsequent request to that site. This automatic submission may also introduce vulnerabilities such as Cross-site Request Forgery. The cookie value may contain several values delimited by a semicolon (;).
Additional security features around cookies include the Secure, HttpOnly, and SameSite attributes.
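The following Python sketch parses a Set-Cookie value with http.cookies.SimpleCookie from the standard library and shows what the browser would send back. The cookie name sessionid and its value are assumptions made for this example.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie response header.
set_cookie_value = "sessionid=abc123; Path=/; Secure; HttpOnly"

cookie = SimpleCookie()
cookie.load(set_cookie_value)
morsel = cookie["sessionid"]

print(morsel.value)            # the session identifier itself
print(morsel["path"])          # attribute that has a value
print(bool(morsel["secure"]))  # flag: only send the cookie over HTTPS
print(bool(morsel["httponly"]))  # flag: hide the cookie from JavaScript

# On later requests the browser sends only the name and value back.
cookie_header = f"Cookie: {morsel.key}={morsel.value}"
print(cookie_header)
```

Note that the attributes travel only from server to browser; the Cookie request header carries just the name and value, which is all an attacker needs to steal in order to hijack a session.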
The HTTP protocol includes two built-in authentication mechanisms: Basic and Digest. While these two methods are built into HTTP, they are by no means the only authentication methods that can leverage HTTP; others include NTLM, IWA (Integrated Windows Authentication, which negotiates Kerberos or NTLM), and TLS client certificates. Additionally, form authentication, OAuth/OAuth2, SAML, JWT, and a whole host of other authentication options re-use features within HTTP such as form data or headers to authenticate a client.
Basic authentication is a built-in HTTP authentication method. When a client sends an HTTP request to a server that requires Basic authentication, the server will respond with a 401 HTTP response and a WWW-Authenticate header containing the value Basic. The client then submits a username and a password separated by a colon (:) and base64-encoded, within the Authorization request header.
It’s important to note that Basic authentication sends credentials in the clear (base64 is an encoding, not a form of encryption). This means that Basic authentication alone is not secure, is highly susceptible even to the simplest man-in-the-middle attacks, and must be paired with the use of SSL/TLS.
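The Python sketch below builds the header a client would send for Basic authentication and then decodes it again, which shows why base64 offers no protection. The function name basic_auth_header is hypothetical; the credentials are the well-known sample pair used in the HTTP authentication RFCs.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


header = basic_auth_header("Aladdin", "open sesame")
print(header)

# Anyone who can observe the header can reverse the encoding.
token = header.rsplit(" ", 1)[1]
recovered = base64.b64decode(token).decode("utf-8")
print(recovered)
```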
Digest authentication is also built into HTTP and, similarly to Basic authentication, the server returns a 401 HTTP response and a WWW-Authenticate header. In the case of Digest, the WWW-Authenticate header will contain the value Digest together with a nonce (a number used only once) and a realm (which identifies the protection space, that is, the set of resources that share the same credentials).
The HTTP client then concatenates the username, the realm, and the password and produces an MD5 hash (first hash). The HTTP client then concatenates the HTTP method and the URI and generates an MD5 hash (second hash). The HTTP client then sends an Authorization header containing the username, realm, nonce, URI, and the response. In the simplest form of the scheme, the response is an MD5 hash of the first hash, the nonce, and the second hash joined by colons.
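As a rough illustration of the Digest calculation, the Python sketch below computes the response value in the scheme's simplest form (no qop parameter): the first hash covers username, realm, and password, the second covers method and URI, and the response hashes both together with the nonce. The function names and all sample values are assumptions for this example; real servers commonly add qop, cnonce, and a nonce count, which this sketch leaves out.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, password, realm, nonce, method, uri) -> str:
    """Compute the Digest response value without the optional qop extension."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # first hash: credentials and realm
    ha2 = md5_hex(f"{method}:{uri}")                 # second hash: method and URI
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # response: both hashes plus the nonce


# Hypothetical values, as a server might supply them in WWW-Authenticate.
response = digest_response(
    username="myusername",
    password="mypassword",
    realm="example",
    nonce="0123456789abcdef",
    method="GET",
    uri="/about.html",
)
print(response)
```

Because the nonce is part of the final hash, a captured response cannot simply be replayed once the server issues a different nonce.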
While digest is a more secure alternative to Basic authentication, it is still highly advised for any authentication traffic to be transmitted over an HTTPS connection (SSL/TLS).
Form-based authentication is by far the most popular kind of authentication. It’s also not standard, in the sense that any application developer can dictate how an HTTP client should authenticate to an application.
Typically, the HTTP client would send a POST request to the server with the combination of a username and a password, after which, if successful, the server will respond with some kind of token. This token could be placed in a Set-Cookie HTTP header, which would set a cookie in the browser (meaning that this value will henceforth be passed with each request to the server).
Such POST requests can be made by the browser by using an HTML form such as the following:
<form name="login" action="https://login.example.com" method="post">
  <input name="username" type="text">
  <input name="password" type="password">
  <input value="Login" type="submit">
</form>
This would send data from the input fields username and password (field names are arbitrary) in a POST request to https://login.example.com. The POST request would be as follows:
POST / HTTP/1.1
Host: login.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 39

username=myusername&password=mypassword
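To show how the form fields end up in the request body, here is a Python sketch that encodes them with urllib.parse.urlencode and assembles the raw request. It assumes the default form encoding (application/x-www-form-urlencoded) and reuses the placeholder credentials from the example above; it only builds the bytes and does not send them.

```python
from urllib.parse import urlencode

fields = {"username": "myusername", "password": "mypassword"}

# urlencode joins key=value pairs with "&" and URL encodes special characters.
body = urlencode(fields).encode("ascii")

request = (
    b"POST / HTTP/1.1\r\n"
    b"Host: login.example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    + f"Content-Length: {len(body)}\r\n".encode("ascii")
    + b"\r\n"   # blank line between the head and the body
    + body
)
print(request.decode("ascii"))
```

The Content-Length header tells the server how many bytes of body to read, so it has to match the encoded body exactly.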
HTTP headers are a way for an HTTP client and server to pass additional information within requests and responses. HTTP headers consist of a case-insensitive key (may not contain spaces), followed by a colon (:), which is in turn followed by the header’s value (may contain spaces). The header is terminated by a CRLF (carriage return and line feed).
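The header format can be illustrated with a small Python sketch that parses header lines into a dictionary keyed by the lower-cased name, so that lookups behave case-insensitively. The function name parse_headers is hypothetical, and the sketch does not handle every edge case that a production parser must.

```python
def parse_headers(lines):
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in lines:
        name, sep, value = line.partition(":")
        # A header needs a colon and a non-empty name without spaces.
        if not sep or not name or " " in name:
            raise ValueError(f"malformed header line: {line!r}")
        # The same header may appear more than once, so keep a list of values.
        headers.setdefault(name.lower(), []).append(value.strip())
    return headers


headers = parse_headers([
    "Content-Type: text/html",
    "X-Frame-Options: DENY",
    "CONTENT-TYPE: text/plain",
])
print(headers)
```

Normalizing the name before lookup matters for security checks: a filter that only matches one capitalization of a header name can be bypassed by sending another.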
It’s worth noting that while there are a number of standard HTTP headers, the HTTP protocol allows custom headers. Typically, custom headers start with X- (for example, the X-Frame-Options header); however, this is simply a widely adopted convention and not something enforced by the HTTP protocol. Note that some headers were originally custom headers but have since been adopted as a standard; for example, X-Content-Security-Policy is now Content-Security-Policy.
The following are some examples of commonly seen HTTP headers. A complete list of standard HTTP headers is maintained in the IANA HTTP Field Name Registry.
|Header||Description||Type|
|Cache-Control||Used to specify directives for caching mechanisms in both requests and responses.||Request/Response|
|Connection||Indicates to the sender/receiver whether to keep the TCP connection open or close it.||Request/Response|
|Content-Type||Indicates the MIME type of the request/response body.||Request/Response|
|Content-Encoding||Indicates the type of encoding used for the request/response body.||Request/Response|
|Content-Length||Indicates the size in bytes of the request/response body.||Request/Response|
|Cookie||Contains cookie values, which were set using the Set-Cookie header.||Request|
|Host||Contains the hostname of the URL being requested. This is a required header in HTTP/1.1 and it is important since the same server may serve different sites based on this header. This header is also important to keep in mind when defending against host header attacks.||Request|
|Referer||Indicates to the server the page from which a link was clicked. The header was misspelled in the original specification and has remained so instead of being changed to referrer.||Request|
|Authorization||Contains the authentication method together with credentials.||Request|
|User-Agent||Contains a string to identify the client (browser or another tool) that is making the request.||Request|
|Accept||Indicates to the server which MIME types the client will accept.||Request|
|Accept-Encoding||Indicates to the server which types of encoding the client will accept.||Request|
|Set-Cookie||Sets a cookie in the browser that will be later submitted.||Response|
|Server||Indicates the type and version of the server. This information could be useful for attackers.||Response|
Conclusion and Further Reading
This sums up some useful basics of the HTTP protocol that should get you up to speed with the terminology surrounding HTTP. Now that you’ve covered the basics, you should be able to understand reports containing HTTP requests, responses, and other HTTP attributes, which means the next time you generate an Acunetix Developer Report, you should be sorted.
If you want to read more about HTTP security, we recommend having a look at our SSL/TLS basics series, which explains HTTPS, public keys, and SSL certificates, our article on HTTP Strict Transport Security and HSTS headers, our article about Sameorigin, as well as our article about clickjacking. You can also read the detailed description of the Content Security Policy (CSP) HTTP security header by the Mozilla Foundation.
*** This is a Security Bloggers Network syndicated blog from Web Security Blog – Acunetix authored by acunetix. Read the original post at: http://feedproxy.google.com/~r/acunetixwebapplicationsecurityblog/~3/MMM3WJ9BbX0/