Security Foundations: HTTP Basics Tutorial


HTTP Basics for Web Application Security

Discover the internals of HTTP with the following beginner's tutorial. If you are new to information security or getting started in web application security, this post will walk you through the HTTP basics and foundations needed.

Introduction to HTTP and its importance

HTTP, which stands for "Hypertext Transfer Protocol", is the foundation of data communication on the World Wide Web. In simple words, this protocol enables communication between our web browser and web servers.

Understanding HTTP is crucial for anyone interested in getting into security testing. It is the foundation for data communication on the web. Knowing about HTTP requests, responses, and status codes helps us understand a web application and its behaviour during security testing, and even helps us automate repetitive manual work.

What is HTTP?

HTTP is a protocol: a set of defined rules for communication between the client and the server. Think of it like this: we humans use a common language to speak and communicate with each other. Similarly, the protocol is the common language between the client (for example, a web browser) and the server (for example, Apache HTTP Server).

Modern web browsers hide this complexity by handling the protocol for us and rendering web pages that are easy to read and understand.

💡
On the web, all I see is HTTPS. Yes, you are right. The working principle is the same, with a few changes. We will cover those as well.

HTTP is a text-based protocol

HTTP is a simple message-based request/response protocol. When the user requests a web page, the browser sends an HTTP request to the server in the background, and the server replies with the requested resource. This reply is called the response.

HTTP acts as a medium through which information is exchanged. It allows the transfer of various types of data, including text, images, multimedia files, etc.

Structure of HTTP Requests and Responses

A sample of HTTP request-response can be seen as shown below:

Request:

Sample HTTP Request by Client

Response:

Sample HTTP Response from Server

This may look like a lot of data to digest, but don't worry about the overwhelming information in the response screenshot. You don't need to remember everything. The goal is to get you acclimated to the type of data you'll see from now on, so we can focus only on what we need the most.

We will come back to the technical details in a bit. Let's go ahead and learn more about HTTP.

HTTP is a Stateless Protocol

As the name suggests, a stateless protocol does not maintain the state of a transaction: each message is handled on its own, without reference to earlier ones. Stateless protocols are also common in low-level communication, where data packets can be sent without any acknowledgement. An example of such a protocol is UDP. (TCP, on which HTTP runs, is a stateful, connection-oriented protocol.)

For HTTP, this means the server answers each request independently. By itself, the protocol cannot link or relate a request to any previous request. In HTTP/1.0, the connection is also closed by default once the response is sent.

Example of HTTP/1.0

$ nc demo.testfire.net 80
GET / HTTP/1.0
Host: www.demo.testfire.net

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=DF622CC21A2727AC5DD745D1A5B007BF; Path=/; HttpOnly
Content-Type: text/html;charset=ISO-8859-1
Date: Wed, 27 Jul 2022 07:48:18 GMT
Connection: close

I am using a tool called netcat (nc) here to demonstrate this by sending a custom request and waiting for the response from the server. If you observe closely, the last header the server sends is Connection: close.

The Connection: close header above tells the client that the TCP connection will be closed once the HTTP response has been sent.

From HTTP/1.1 onward, the TCP connection is kept open by default and waits for the next request to be received and processed.

$ nc demo.testfire.net 80
GET / HTTP/1.1
Host: demo.testfire.net

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=D7190756165E5E1B4986094B7A17630F; Path=/; HttpOnly
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Date: Wed, 27 Jul 2022 07:54:31 GMT

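In the above response, notice that there is no Connection: close header, which means the connection is not closed and is waiting for the next request. You can also see Transfer-Encoding: chunked. Because the body is sent in chunks that each state their own size, the client can tell where the response ends without the server closing the connection.

The Transfer-Encoding: header specifies the encoding applied to the message body for transfer. The value chunked means the body is sent as a series of chunks.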
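To make the chunked format a little more concrete, here is a minimal sketch in Python of how a client might reassemble a chunked body. The function name decode_chunked and the sample bytes are illustrative assumptions, not something taken from the server above, and the sketch ignores trailers and malformed input.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble an HTTP/1.1 chunked message body (illustrative sketch)."""
    decoded = bytearray()
    pos = 0
    while True:
        # Each chunk starts with its size in hexadecimal, ended by CRLF.
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # ignore chunk extensions
        size = int(size_field.strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(decoded)


# Hypothetical chunked body: two chunks followed by the terminating chunk.
sample = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(sample))
```

Because every chunk announces its own length, the receiver knows exactly when the body is complete, which is why the connection can stay open for the next request.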

For now, remember that the above concepts will be helpful while discussing advanced HTTP attacks.

HTTP vs HTTPS


The underlying working concepts of HTTP (Hypertext Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure) are similar. HTTPS is the secure version of HTTP: the same messages are sent over a connection encrypted with TLS (formerly SSL), and the server proves its identity with a certificate.

Plain HTTP provides no encryption and is not secure. HTTPS encrypts the traffic to protect user information.

A few of the drawbacks of HTTP are detailed below:

  1. HTTP transmits messages in clear text, meaning that anyone on the network path can view all of your requests and responses, including sensitive information like usernames, passwords, and card information.
  2. In addition, if your data goes through a proxy, a copy of your request and response may also be stored on the proxy server. How the proxy admin uses that information is beyond our control, and misuse would be hard to trace.
  3. In browsers where the cache is enabled, the requested URL and its response may also be stored in the cache. This is a problem when a session token or other sensitive information is passed in the URL.

This is where HTTPS protects us, providing privacy and integrity against snooping and Man-in-the-Middle (MITM) attacks.


Lastly, the default port for HTTP is 80, and for HTTPS it is 443. Both can be changed based on requirements.
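As a small illustration of the default ports, the sketch below uses Python's standard urllib.parse to work out which port a client would connect to for a given URL. The helper name effective_port and the port 8443 in the last example are assumptions made for this example only.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # a custom port was given in the URL
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://demo.testfire.net/"))       # default HTTP port
print(effective_port("https://securityarray.io/"))       # default HTTPS port
print(effective_port("https://securityarray.io:8443/"))  # hypothetical custom port
```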

Breakdown of the HTTP components

Let's dig a bit deeper and learn more about HTTP syntax, methods, and headers. This section helps us review request-response messages and confirm whether a vulnerability exists or not.

Take some time to understand and get well acquainted.

HTTP Syntax

From yesterday's session, we came to understand that HTTP is a simple message-based protocol that contains request and response messages.

Let's try to understand more about the request.

A Typical Request Message:

HTTP Request Sections

An HTTP request contains two sections: headers and body. The headers and the body are separated by an empty line.

The body section can vary based on the HTTP method. For example, a GET request normally doesn't contain a body, but a POST request does.

  1. Headers:
POST /members/api/send-magic-link/ HTTP/2

In the above line, the first word, POST, is called a verb. It's one of the HTTP methods, commonly used for posting data to the server.

It is followed by the URL path /members/api/send-magic-link/, the resource which the client is trying to access or post to.

Lastly, HTTP/2 is the protocol and version used for communication.

Each header in the message is placed on a separate line. The next header is Host.

Host: securityarray.io

The Host header contains the domain name of the server to which the request must be sent.

The other headers follow after it.

  2. Body:

The HTTP body section contains the additional supporting information for the POST request line specified above. Below is an example of JavaScript Object Notation (JSON) data passed to the server in the message body.


{"name":"Raghu","email":"[email protected]","requestSrc":"portal"}

Body Section

The message body can also be plain text or HTML or XML data based on the type of content being transmitted.
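The request structure described above can be sketched in a few lines of Python. This is only an illustration that works on the textual form of a request as shown in this tutorial. The function name parse_request, the Content-Type header, and the shortened JSON body are assumptions added for the example.

```python
def parse_request(raw: str):
    """Split a textual HTTP request into request line, headers, and body."""
    # The headers and the body are separated by an empty line.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # First line: verb, URL path, protocol version.
    method, path, version = lines[0].split(" ", 2)

    # Remaining lines: one "Name: value" header per line.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, path, version, headers, body


raw = (
    "POST /members/api/send-magic-link/ HTTP/2\r\n"
    "Host: securityarray.io\r\n"
    "Content-Type: application/json\r\n"  # assumed header, for illustration
    "\r\n"
    '{"requestSrc":"portal"}'             # shortened version of the body above
)

method, path, version, headers, body = parse_request(raw)
print(method, path, version)
print(headers["host"])
print(body)
```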

A Typical Response Message

An HTTP response message looks similar to the above, containing headers and a body separated by an empty line.

HTTP/2 201 Created

The first line is the server's acknowledgement back to the client. HTTP/2 is the protocol version, and 201 is a status code specifying the type of action taken by the web server for the request sent.

Here, the server has created the content. Therefore, we received 201 Created.

From this, I can confirm that the content I posted was created on the server. The status line is followed by the other HTTP headers and the message body, if any.

The HTTP message body is later beautified and displayed back to the user on a web browser in a human-readable way.
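For the first line of a response, a minimal Python sketch of the same idea could look like this. The helper name parse_status_line is an assumption for the example, and it works on the textual status line exactly as shown above.

```python
def parse_status_line(line: str):
    """Split a status line into protocol version, numeric code, and reason text."""
    version, code, *reason = line.split(" ", 2)
    # The reason text may be absent, so fall back to an empty string.
    return version, int(code), reason[0] if reason else ""


print(parse_status_line("HTTP/2 201 Created"))
print(parse_status_line("HTTP/1.1 200 OK"))
```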


HTTP Methods

HTTP Methods and Their Purpose

The method indicates the purpose for which the client has initiated the request and what the client expects as a successful result.

Let's walk through the different HTTP methods available.

  1. GET is used to retrieve a resource from the web server, and it normally doesn't contain a message body. The URLs that you open from the browser URL bar are all sent as GET requests.
  2. HEAD is very similar to GET, but the server returns only the header section, without the body.
  3. POST performs a specific operation with the provided message body, like creating or updating. It is recommended to use POST for sending sensitive information like usernames, passwords, card info, etc., because the message body, unlike the URL, does not end up in browser history, Referer headers, or typical server and proxy access logs. Note that without HTTPS, intermediaries like proxies can still read the body.
  4. PUT is most commonly used for updating data and also for uploading content to the server.
  5. DELETE is used to remove a resource on the server, which should be allowed only for authorized users.
  6. CONNECT is used to establish a tunnel to the server, typically through a proxy.
  7. OPTIONS is used to request the list of HTTP methods supported for a resource on the server.
  8. TRACE is used for diagnostic purposes. Whenever you send a request using TRACE, you should be able to see the same contents in the response as were sent in the request.
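To make the difference between methods a bit more concrete, here is a minimal sketch using Python's standard urllib.request.Request. It only builds request objects and never sends them. The URL reuses the demo host from the earlier examples, and the JSON payload is a shortened, illustrative body.

```python
import json
from urllib.request import Request

url = "http://demo.testfire.net/"  # demo host from the earlier examples

# No body and no explicit method: urllib treats this as a GET request.
get_req = Request(url)

# Supplying a body makes urllib default to POST.
payload = json.dumps({"requestSrc": "portal"}).encode("utf-8")  # illustrative body
post_req = Request(url, data=payload, headers={"Content-Type": "application/json"})

# Any other method can be named explicitly.
options_req = Request(url, method="OPTIONS")

for req in (get_req, post_req, options_req):
    print(req.get_method(), req.data)
```

Actually sending any of these (for example with urllib.request.urlopen) should only be done against systems you are allowed to test.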

HTTP Headers

Understanding HTTP Headers

In HTTP, there are many headers, each intended for a specific purpose. We will be covering some headers, and if you would like to learn about the complete list of headers, the best source is the RFCs (HTTP/1.1, HTTP/2).

RFC 7540 - Hypertext Transfer Protocol Version 2 (HTTP/2)

Some headers you can observe in both Request and Response are given below.

Content-Length specifies the length of the message body in bytes.

Content-Type specifies the type of content present in the message body.
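As a rough illustration of these two headers, the Python sketch below builds them for a JSON body. The helper name body_headers and the sample payloads are assumptions for this example. The point it shows is that Content-Length counts bytes, not characters.

```python
import json


def body_headers(payload: dict):
    """Encode a JSON body and build the matching Content-* headers."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # The length is measured in bytes of the encoded body.
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = body_headers({"name": "Raghu"})
print(headers, body)

# A non-ASCII character takes more than one byte in UTF-8,
# so the byte count is larger than the character count.
headers, body = body_headers({"name": "é"})
print(headers["Content-Length"], len(body.decode("utf-8")))
```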

Request headers

User-Agent specifies information about the browser or other client from which you are accessing the web server.

Origin is used to specify from where the request originated.

Referer specifies the URL of the page from which the current request originated.

Authorization carries credentials, such as a token, passed to the server for accessing privileged resources.

Accept informs the server about the content types accepted by the client.

Accept-Encoding informs the server about the encoding types understood by the client.

Response Headers

Set-Cookie is used to set cookies, such as session cookies, on the client side. The server uses them to identify the sessions and user accounts accessing its resources.

Access-Control-Allow-Origin indicates which origins are allowed to read the resource via cross-origin requests (for example, Ajax requests from another domain).

Cache-Control passes the instructions to the browser about how the cache must be handled.
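As an example of reading a response header, the sketch below parses the Set-Cookie value from the netcat output shown earlier, using Python's standard http.cookies module. It only inspects the header text and makes no network request.

```python
from http.cookies import SimpleCookie

# Set-Cookie value copied from the HTTP/1.0 example response earlier.
header_value = "JSESSIONID=DF622CC21A2727AC5DD745D1A5B007BF; Path=/; HttpOnly"

cookie = SimpleCookie()
cookie.load(header_value)

session = cookie["JSESSIONID"]
print("value:   ", session.value)
print("path:    ", session["path"])
print("httponly:", bool(session["httponly"]))  # flag attribute, no value of its own
print("secure:  ", bool(session["secure"]))    # not set in this header
```

During an assessment, a quick check like this can help spot session cookies that are missing flags such as HttpOnly or Secure.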


Status Codes

Decoding HTTP Status Codes

Status codes inform the client about how the request was handled. The server responds with a three-digit numerical code indicating how the request was processed.

Getting familiar with status codes helps you to review and take action while performing security assessments.

HTTP Status Codes

Let's look at common status codes which we will be encountering day-to-day.

100 Continue is sent by the server to tell the client to continue sending the message body. Once the request is complete, the server responds with another status code.

200 OK means that the request was successful and that the response body contains the result of the request.

201 Created is returned in response to a PUT/POST request to indicate that the request was successful and a resource was created.

301 Moved Permanently redirects the browser permanently to a different URL, so the client should use the new URL for future requests.

302 Found redirects the browser temporarily to a different URL.

304 Not Modified tells the browser to use its cached copy, as the client already has the latest version of the resource.

400 Bad Request is a client error that indicates that the client submitted an invalid HTTP request.

401 Unauthorized indicates that you need to be authenticated to access the server resource.

403 Forbidden indicates that you do not have privileges to access the resource on the server.

404 Not Found means the requested resource is not present on the server.

405 Method Not Allowed means the specified HTTP method in the request is not supported for the given URL.

500 Internal Server Error indicates that the server is unable to process your request, which might be because of some unhandled error within the server application.

503 Service Unavailable is returned when the server is temporarily unable to handle the request, for example under heavy load, during maintenance or migration, or when the backend application has crashed.
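The first digit of a status code tells you its general class. The sketch below is a small Python helper that uses the standard http.HTTPStatus enum to look up the reason phrase and group the code by its first digit. The names CLASSES and describe are assumptions made for this example.

```python
from http import HTTPStatus

# Status code classes, keyed by the first digit of the code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    phrase = HTTPStatus(code).phrase
    return f"{code} {phrase} ({CLASSES[code // 100]})"


for code in (100, 201, 302, 404, 503):
    print(describe(code))
```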


Conclusion

I hope this short tutorial has provided the foundations for understanding the HTTP protocol and its significance in terms of application security. It's a lot of information to digest and remember. You can use this as a reference to get started, and it should help us make better decisions while carrying out security assessments.

I encourage you to continue exploring the HTTP RFCs (HTTP/1.1, HTTP/2, HTTP/3) for in-depth understanding.

Remember, learning web application security is an ongoing journey, and each step you take brings you closer to mastering this valuable skill. Keep exploring!