CPSC 441: Computer Networks

Fall 2021

Assignment 1: Web Censorship Proxy (40 marks)

Due: Friday, October 1, 2021 (11:59pm)

Learning Objectives

The purpose of this assignment is to learn about the HyperText Transfer Protocol (HTTP) used by the World Wide Web. In particular, you will design and implement a Web proxy using HTTP to demonstrate your understanding of this application-layer protocol. Along the way, you will also learn a lot about socket programming, TCP/IP, network debugging, and more.

Preamble

As you know, there is a LOT of undesirable content on the World Wide Web, with SpongeBob, Justin Bieber, and Britney Spears being just a few of the most glaring examples of this. In this assignment, you are going to develop a Web censorship proxy that blocks access to certain content, based on keywords that appear in the URL that is being accessed. To keep the assignment simple, we will restrict ourselves only to HTTP (not HTTPS).

Background

A Web proxy is a piece of software that functions as an intermediary between a Web client (browser) and a Web server. The Web proxy intercepts Web requests from clients and determines whether they should be transmitted to a Web server or not. If the request is blocked, the proxy informs the client directly. If the request is forwarded to the Web server, then any response that the proxy receives from the Web server is forwarded back to the client. From the server's point of view, the proxy is the client, since that is where the request comes from. Similarly, from the client's point of view, the proxy is the server, since that is where the response comes from. A Web proxy thus provides a single point of control to regulate Web access between clients and servers. A lot of Calgary schools use Web proxies to limit the types of Web sites that students are allowed to access. Net Nanny and Barracuda are examples of commercially available Web proxies.
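As one illustration of how a proxy might inform the client directly when a request is blocked, the sketch below builds a small HTTP error response in a caller-supplied buffer. The handout does not prescribe a particular response, so the choice of status 403, the HTML body text, and the function name build_blocked_response are all assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a complete "blocked" HTTP response into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the buffer is too small.
 */
int build_blocked_response(char *buf, size_t buflen)
{
    /* Assumed body text; any short explanatory page would do. */
    static const char body[] =
        "<html><body><h1>403 Forbidden</h1>"
        "<p>This page was blocked by the proxy.</p></body></html>\n";

    assert(buf != NULL);

    int n = snprintf(buf, buflen,
                     "HTTP/1.1 403 Forbidden\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"
                     "%s",
                     strlen(body), body);

    /* snprintf reports the length it wanted; detect truncation. */
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

The returned length can be passed straight to send() on the client socket. Redirecting the browser to an error page with a 302 response would be another reasonable design.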

Technical Requirements

In this assignment, you will implement your very own Web censorship proxy, in either C or C++. The goals of the assignment are to build a properly functioning Web proxy for simple Web pages, and then use your proxy to block undesirable Web content from being delivered to the browser.

There are three main pieces of functionality needed in your proxy. The first is the ability to handle HTTP requests and responses, while still forwarding them between client and server. This is called a transparent proxy. The second is the ability to parse (and possibly modify) HTTP requests, so that you can extract URL information, decide whether requests should be blocked or not, and block them in some reasonable way. This is called a censorship proxy. The third piece is to provide a way to update the list of blocked keywords for your censorship proxy while it is still running. This is called a dynamically configurable Web censorship proxy.
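The blocking decision described above can be sketched as a keyword search over the requested URL. This is a minimal example, not a required design: the function names contains_ci and url_is_blocked are hypothetical, and matching case-insensitively is an assumption (the handout only says that keywords appear in the URL).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring search: returns 1 if needle occurs in haystack. */
static int contains_ci(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);

    if (*needle == '\0')
        return 0;   /* assumption: an empty keyword never matches */

    for (const char *h = haystack; *h != '\0'; h++) {
        size_t i = 0;
        while (needle[i] != '\0' && h[i] != '\0' &&
               tolower((unsigned char)h[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (needle[i] == '\0')
            return 1;   /* matched the whole keyword starting at h */
    }
    return 0;
}

/* Returns 1 if any of the n keywords appears in url, 0 otherwise. */
int url_is_blocked(const char *url, const char *const keywords[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (contains_ci(url, keywords[i]))
            return 1;
    return 0;
}
```

Because the keyword array is passed in on every call, the list can be replaced while the proxy is running without changing this function. How the list gets updated is left to your design.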

The most important HTTP command for your Web proxy to handle is the "GET" request, which specifies the URL for an object to be retrieved. In the basic operation of your proxy, it should be able to parse, understand, and forward to the Web server a (possibly modified) version of the client HTTP request. Similarly, the proxy should be able to parse, understand, and return to the client a (possibly modified) version of the HTTP response that the Web server provided to the proxy. Please give some careful thought to how your proxy handles commonly occurring HTTP response codes, such as 200 (OK), 206 (Partial Content), 301 (Moved Permanently), 302 (Found), 304 (Not Modified), 403 (Forbidden), and 404 (Not Found).
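To make the parsing step concrete, here is a minimal sketch that splits the first line of a request (for example, "GET http://example.com/index.html HTTP/1.1") into its method, URL, and version. The struct request_line type, the buffer sizes, and the function name parse_request_line are assumptions for illustration. A real proxy would also need to read the headers that follow.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three fields of an HTTP request line. */
struct request_line {
    char method[16];
    char url[2048];
    char version[16];
};

/*
 * Parses "METHOD URL VERSION" from the start of line.
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_request_line(const char *line, struct request_line *out)
{
    assert(line != NULL && out != NULL);

    /* Field widths leave room for the terminating NUL in each buffer. */
    if (sscanf(line, "%15s %2047s %15s",
               out->method, out->url, out->version) != 3)
        return -1;

    /* The third token must look like an HTTP version, e.g. HTTP/1.1. */
    if (strncmp(out->version, "HTTP/", 5) != 0)
        return -1;

    return 0;
}
```

After a successful parse, out->url is the string to check against the blocked keywords, and out->method tells you whether the request is a GET.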

You will need at least one TCP socket (i.e., SOCK_STREAM) for client-proxy communication, and at least one additional TCP socket for each Web server that your proxy talks to during proxy-server communication. If you want your proxy to support multiple concurrent HTTP transactions, you may need to fork child processes or create threads for request handling. Each child process or thread will use its own socket instances for its communications with the client and with the server.
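A minimal sketch of the client-facing side, assuming a POSIX system: the hypothetical helper make_listener below creates a TCP socket, binds it to the given port on all local interfaces, and puts it into the listening state. The SO_REUSEADDR option is an optional convenience so that the proxy can be restarted quickly during testing.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Creates a listening TCP socket on the given port (0 lets the OS choose).
 * Returns the socket descriptor, or -1 on failure.
 */
int make_listener(unsigned short port, int backlog)
{
    assert(backlog > 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow quick restarts while the old socket is in TIME_WAIT. */
    int yes = 1;
    (void)setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The proxy would then call accept() on the returned descriptor in a loop. Each accepted connection yields a new socket for that client, which can be handed to a child process or thread if you choose to support concurrent transactions.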

When implementing your proxy, feel free to compile and run it on any suitable department machine, or even your home machine or laptop, but please be aware that you will ultimately have to demo your proxy to your TA on campus at some point. You should try to access your proxy from your favourite Web browser (e.g., Edge, Firefox, Chrome, Safari) on a computer of your choice (either on campus or at home). To test the proxy, you will have to configure your Web browser to use your specific Web proxy (e.g., look for menu selections like Tools, Internet Options, Proxies, Advanced, LAN Settings).

As you design and build your Web proxy, give careful consideration to how you will debug and test it. For example, you may want to print out information about requests and responses received, processed, forwarded, redirected, or altered. Once you become confident with the basic operation of your Web proxy, you can toggle off the verbose debugging output. If you are testing on your home network, you can also use tools like Wireshark to collect network packet traces. By studying the HTTP messages and TCP/IP packets going to and from your proxy, you might be able to figure out what is working, what isn't working, and why.
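One simple way to make debugging output easy to toggle is a logging helper guarded by a global flag, as in the sketch below. The names g_verbose and debug_log are hypothetical, and how the flag gets set (for example, from a command-line option) is left to you.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical global switch: nonzero enables debugging output. */
int g_verbose = 0;

/*
 * printf-style logger that writes to stderr only when g_verbose is set.
 * Returns the number of characters written, or 0 when logging is off.
 */
int debug_log(const char *fmt, ...)
{
    assert(fmt != NULL);

    if (!g_verbose)
        return 0;

    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as debug_log("forwarding %s\n", url) then costs almost nothing when verbose mode is off, and writing to stderr keeps the debugging output separate from anything the proxy prints on stdout.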

When you are finished, please submit your solution in electronic form to your TA via D2L. Your submission should include the source code for your Web proxy, a brief user manual describing how to compile and use your proxy, and a description of the testing done with your proxy. Please remember that assignments are to be done individually, and submitted to your assigned TA on time. You should also plan to give a brief demo of your proxy to your TA during a tutorial time slot just after the assignment deadline.

Testing

During your demo, your proxy will be tested on the following test cases:

Once you have these cases working, you can try your proxy on other pages, such as the Wikipedia page for Floppy Disks, to see what happens. Good luck, and have fun!

Grading Rubric

The grading scheme for the assignment is as follows:

Bonus (optional)

Up to 4 bonus marks will be given for a Web censorship proxy that can also block undesirable content based on keywords seen in the body of an HTTP response, rather than just based on the URL requested. Make sure to show this bonus feature to your TA during the demo.

Tips