Assignment 1: sloxy: A Slow Web Proxy (40 marks)
Due: Friday, February 2, 2018 (4:00pm)
Learning Objectives
The purpose of this assignment is to learn about the HyperText Transfer Protocol (HTTP) used by the World Wide Web. In particular, you will design and implement an HTTP proxy (i.e., Web proxy server) with functionality that demonstrates both the simplicity and the power of HTTP as an application-layer protocol. Along the way, you will also learn a lot about client-server socket programming, TCP/IP, network debugging, and more.
Preamble
April Fool's Day will be here soon, and it is time to get our next prank ready. This year we are going to make a Web proxy that slows down access to certain Web content of our choosing. We will call our software "sloxy" as a short form of "slow Web proxy", and we will use it very selectively to hinder content access. We will restrict our interest to HTTP (not HTTPS), and to HTML pages (not images, text files, scripts, movies, etc). We will make access to HTML pages painfully slow for the unsuspecting user. Hee hee hee!!!
Background
A Web proxy is a software entity that functions as an intermediary between a Web client (browser) and a Web server. The Web proxy intercepts Web requests from clients and reformulates the requests for transmission to a Web server. When a response is received from the Web server, the proxy sends the response back to the client. From the server's point of view, the proxy is the client. Similarly, from the client's point of view, the proxy is the server. A Web proxy thus provides a single point of control to regulate Internet access between clients and servers. A lot of Calgary schools use Web proxies to limit the types of Web sites that students are allowed to access. Commercially available Web proxies, such as Net Nanny or Barracuda, are some examples of this, as is the open-source proxy Squid, which also provides Web object caching.
Technical Requirements
In this assignment, you will implement and test your very own Web proxy, in either C or C++. The goals of the assignment are to build a properly functioning Web proxy for simple Web pages, and then use some special features of HTTP to throttle data rates selectively when accessing certain content. There is no requirement for Web object caching in your proxy at all.
There are two main pieces of functionality needed in your proxy. The first is the ability to parse an HTTP request to determine if the requested content should be treated normally (i.e., obtained from the server and returned to the client), or if the content should be restricted to be painfully slow. The second is the ability to restrict the downloading speed for Web content that is selected for "special treatment". As a mechanism to slow down a Web page, our proposed solution is the "Range Request" feature in HTTP. With this feature, you can request as few or as many bytes of data from a Web server as you wish, and in any order. Imagine how painful it is going to be when downloading one byte at a time from a Web server. Woot!
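To make the Range Request idea concrete, here is a minimal sketch of how a proxy might format one such request. The function name build_range_request and the choice of "Connection: close" are assumptions made for illustration; they are not prescribed by the assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a GET request for bytes first..last
 * (inclusive) of `path` on `host`.
 * Returns the request length, or -1 if the arguments are invalid or
 * the request does not fit in `out`. */
int build_range_request(char *out, size_t cap, const char *host,
                        const char *path, long first, long last)
{
    assert(out != NULL && host != NULL && path != NULL);
    if (first < 0 || last < first)
        return -1;

    int n = snprintf(out, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Range: bytes=%ld-%ld\r\n"
                     "Connection: close\r\n"   /* assumption: one request per connection */
                     "\r\n",
                     path, host, first, last);
    if (n < 0 || (size_t)n >= cap)
        return -1;                             /* truncated or failed */
    return n;
}
```

Byte ranges are inclusive, so "bytes=0-99" asks for the first 100 bytes. A server that honours the request replies with 206 (Partial Content); a server that ignores the Range header replies with 200 and the whole object.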
The most important HTTP command for your Web proxy to handle is the "GET" request, which specifies the URL for an object to be retrieved. In the basic operation of your proxy, it should be able to parse, understand, and forward to the Web server a (possibly modified) version of the client request. Similarly, the proxy should be able to parse, understand, and return to the client a (possibly modified) version of the response that the Web server provided to the proxy. Your proxy should be able to handle commonly occurring HTTP response codes such as:
- 200 (OK)
- 206 (Partial Content)
- 301 (Moved Permanently) (a.k.a. HTTP redirection)
- 302 (Found) (a.k.a. HTTP redirection)
- 304 (Not Modified)
- 404 (Not Found)
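The request and response parsing described above could start from something like the following sketch. It assumes the browser sends the proxy an absolute URL in the request line (e.g., "GET http://host/path HTTP/1.1"), which is the usual form for plain HTTP proxying. The names struct target, parse_get_line, and parse_status_code are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the pieces of a proxied GET request line. */
struct target {
    char host[256];
    int  port;
    char path[1024];
};

/* Parse "GET http://host[:port][/path] HTTP/x.y".
 * Returns 0 on success, -1 if the line is not a plain-HTTP GET. */
int parse_get_line(const char *line, struct target *t)
{
    assert(line != NULL && t != NULL);
    t->port = 80;                       /* default HTTP port */
    t->path[0] = '\0';

    /* First try the form with an explicit port. */
    int n = sscanf(line, "GET http://%255[^:/ ]:%d%1023s",
                   t->host, &t->port, t->path);
    if (n < 2) {
        /* No port given: fall back to host followed by an optional path. */
        t->port = 80;
        n = sscanf(line, "GET http://%255[^:/ ]%1023s", t->host, t->path);
        if (n < 1)
            return -1;
    }
    /* If no path was present, %s picked up the HTTP version instead. */
    if (t->path[0] != '/')
        strcpy(t->path, "/");
    return 0;
}

/* Extract the status code from a response status line.
 * Returns the code (100..599), or -1 if the line is malformed. */
int parse_status_code(const char *response)
{
    assert(response != NULL);
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return (code >= 100 && code <= 599) ? code : -1;
}
```

With the status code in hand, the proxy can branch on the values listed above, for example treating 301 and 302 as redirections and 206 as one piece of a range-requested object.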
You will need at least one TCP socket (i.e., SOCK_STREAM) for client-proxy communication, and at least one additional TCP socket for each Web server you are talking to for proxy-server communication. (If you want your proxy to support multiple concurrent HTTP transactions, you will also need to fork child processes or create threads for request handling. See the bonus part below.) Each child process or thread will use its own socket instances for its communications with the client and with the server.
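As a rough sketch of the two kinds of sockets mentioned above, the following shows one way to create the listening socket for the client side and an outgoing socket for the server side, using the standard POSIX socket calls. The function names open_listener and connect_to_server are hypothetical, and error handling is kept to a minimum.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on `port` on all interfaces.
 * Port 0 lets the OS pick a free port. Returns the fd, or -1 on error. */
int open_listener(unsigned short port, int backlog)
{
    assert(backlog > 0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;   /* allow quick restarts of the proxy on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Open a TCP connection to host:port. Returns the fd, or -1 on error. */
int connect_to_server(const char *host, int port)
{
    assert(host != NULL && port > 0 && port < 65536);
    char service[8];
    snprintf(service, sizeof service, "%d", port);

    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {   /* try each address in turn */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

The proxy would call accept() on the listener to obtain a per-client socket, and call connect_to_server() with the host and port taken from each client request.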
When implementing your proxy, feel free to compile and run your Web proxy on any suitable department machine, or even your home machine or laptop, but please be aware that you will ultimately have to demo your proxy to your TA on campus at some point. You should be able to use your proxy from a modern Web browser (e.g., Internet Explorer, Mozilla Firefox, Chrome, Safari), and from any machine (either on campus or at home). To test the proxy, you will have to configure your Web browser to use your specific Web proxy (e.g., look for menu selections like Tools, Internet Options, Proxies, Advanced, LAN Settings).
As you design and build your Web proxy, give careful consideration to how you will debug and test it. For example, you may want to print out information about requests and responses received, processed, forwarded, redirected, or altered. Once you become confident with the basic operation of your Web proxy, you can toggle off the verbose debugging output. If you are testing on your home network, you can also use tools like Wireshark or tcpdump to collect network packet traces. By studying the HTTP messages and TCP/IP packets going to and from your proxy, you should be able to figure out what is wrong, or convince yourself (and others) that it is working properly.
When you are finished, please submit your solution in electronic form to your TA via D2L. Your submission should include the source code for your Web proxy, a brief user manual describing how to compile and use your proxy, and a description of the testing done with your proxy. Please remember that assignments are to be done individually, and submitted to your assigned TA on time. You should also plan to give a brief demo of your proxy to your TA in early February, either just before or just after your submission.
Testing
The primary test of correctness for your proxy will be quite simple. That is, for a non-HTML page, such as this simple ASCII text file, the Web browsing experience should be almost the same regardless of whether you are using your Web proxy or retrieving content directly from the Web server. However, for an HTML page, sloxy will be doing its thing. You may not notice it much on a small HTML test page, but you might on a medium-size HTML test page, and most certainly on a large HTML test page. Once you have these simple cases working, you can be ambitious and try your proxy on real-world Web pages. Have fun!!
Grading Rubric
The grading scheme for the assignment is as follows:
- 15 marks for the design and implementation of a functional Web proxy that can handle simple HTTP GET interactions between client and server, using either HTTP/1.0 or HTTP/1.1. This basic proxy should be able to deliver Web pages in unaltered form, and be able to handle HTTP redirection when it occurs. Your implementation should include proper use of TCP/IP socket programming in C or C++, and reasonably commented code.
- 10 marks for that part of your Web proxy that can parse HTTP requests and responses, identify when HTML files are being requested, and make proper control decisions that limit the content delivery speed for such pages. This part might involve HTTP HEAD requests, verifying that a server supports HTTP range requests, and then suitably generating the (many) HTTP range requests required. Perhaps.
- 5 marks for a clear and concise user manual (at most 1 page) that describes how to compile, configure, and use your Web proxy. Make sure to indicate the required features and optional features (if any) that the proxy supports. Make sure to clarify where and how the testing was done (e.g., home, university, office), what works, and what does not. Be honest!
- 10 marks for a suitable demonstration of your proxy to your TA in your tutorial section, or to your professor at a mutually convenient time. A successful demo will include marks for the test cases above, as well as clear answers to questions asked during your code walk-through.
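For the control decisions mentioned in the second rubric item, one possible approach (an assumption, not the required design) is to issue a HEAD request first and inspect the response headers: Content-Type to see whether the object is HTML, Accept-Ranges to see whether the server supports byte ranges, and Content-Length to know how many range requests to generate. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Find header `name` in a block of response headers (lines ending in CRLF).
 * Returns a pointer to the start of its value, or NULL if absent.
 * Header names are compared case-insensitively. */
static const char *find_header(const char *headers, const char *name)
{
    assert(headers != NULL && name != NULL);
    size_t n = strlen(name);
    const char *line = headers;
    while (line != NULL && *line != '\0') {
        if (line[0] == '\r' && line[1] == '\n')
            break;                              /* blank line: end of headers */
        if (strncasecmp(line, name, n) == 0 && line[n] == ':') {
            const char *v = line + n + 1;
            while (*v == ' ' || *v == '\t')
                v++;                            /* skip optional whitespace */
            return v;
        }
        line = strstr(line, "\r\n");            /* advance to the next line */
        if (line != NULL)
            line += 2;
    }
    return NULL;
}

/* True if the response declares an HTML body. */
int is_html_response(const char *headers)
{
    const char *v = find_header(headers, "Content-Type");
    return v != NULL && strncasecmp(v, "text/html", 9) == 0;
}

/* True if the server advertises support for byte-range requests. */
int supports_byte_ranges(const char *headers)
{
    const char *v = find_header(headers, "Accept-Ranges");
    return v != NULL && strncasecmp(v, "bytes", 5) == 0;
}

/* Returns the declared body length, or -1 if absent or malformed. */
long content_length(const char *headers)
{
    const char *v = find_header(headers, "Content-Length");
    if (v == NULL)
        return -1;
    char *end;
    long n = strtol(v, &end, 10);
    return (end == v || n < 0) ? -1 : n;
}
```

If the object is HTML, the server accepts byte ranges, and the length is known, the proxy can fetch the page in small ranges; otherwise a reasonable fallback is to forward the request unchanged.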
Bonus (optional)
Up to 4 bonus marks will be given for proper design of a non-blocking (i.e., multi-threaded or multi-process) proxy that can handle complicated Web sites with ease. Make sure to mention (and show!) this during the demo.
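A multi-process design could look roughly like the sketch below, which forks one child per accepted connection. This is only one of several possible structures (threads would work as well). The names client_handler, handle_client_stub, spawn_handler, and serve_forever are hypothetical, and the stub handler is a placeholder for the real request handling.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*client_handler)(int client_fd);

/* Placeholder handler: a real proxy would read the request, contact the
 * server, and relay the response here. */
void handle_client_stub(int client_fd)
{
    const char *msg = "HTTP/1.0 501 Not Implemented\r\n\r\n";
    ssize_t n = write(client_fd, msg, strlen(msg));
    (void)n;                      /* result ignored in this stub */
}

/* Handle an already-accepted client in a forked child process.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_handler(int client_fd, client_handler handle)
{
    assert(handle != NULL);
    pid_t pid = fork();
    if (pid == 0) {               /* child: serve this one client, then exit */
        handle(client_fd);
        close(client_fd);
        _exit(0);
    }
    close(client_fd);             /* parent: the child owns the connection now */
    return pid;
}

/* Accept connections forever, one child process per client. */
void serve_forever(int listen_fd, client_handler handle)
{
    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd >= 0)
            spawn_handler(client_fd, handle);
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                     /* reap children that have finished */
    }
}
```

The parent closes its copy of the client socket right after forking, so the connection is fully closed once the child finishes. Reaping with waitpid() keeps finished children from piling up as zombies.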
Tips
- This is a rather challenging assignment, so please get started early. You will likely need 7-10 days of thinking/coding/debugging time to get it fully working.
- If you have never done socket programming in C/C++ before, you should make sure to get to your CPSC 441 tutorials during the week of January 15.
- If you don't speak HTTP already, make sure to get to your CPSC 441 tutorials during the week of January 22.
- Focus on the basic HTTP proxy functionality first, by simply forwarding everything that you receive from the client directly to the server, and everything you receive from the server directly back to the client. Then add more functionality, such as text parsing, content buffering, and/or HTTP redirection.
- Your proxy will need one socket for talking to the client, and another socket for talking to the server. Make sure to keep track of which one is which!
- Your proxy will likely need to dynamically create a socket for every new server that it talks to. Make sure to manage these properly.
- Start with very simple text-based HTML files, such as those given above. Once you have these working, then you can try more complicated Web pages with lots of embedded objects, possibly from multiple servers.
- You may find that network firewalls block certain ports, which may make configuration and use of your proxy tricky. For example, I found it easier to do all of my testing using machines within the CPSC network, rather than external ones. There is nothing like a good Wireshark trace to show you what is going on!
- Be careful not to annoy a Web server so much that it refuses your TCP connections or blacklists your IP address. With 100-byte transfers you should have no worries, but with 1-byte requests you might! Just be careful about where you do your testing, and how much of it you do. With the CPSC servers you should be fine, I think.
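The "forward everything" starting point suggested in the tips could be sketched as a simple copy loop between two descriptors. The names write_all and relay are hypothetical. The sketch assumes the sending side closes its connection when it is done (as with HTTP/1.0 or "Connection: close"), since the loop only stops at end-of-file.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write all `len` bytes of `buf`, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: try again */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy bytes from from_fd to to_fd until end-of-file on from_fd.
 * Returns the total number of bytes copied, or -1 on error. */
long relay(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(from_fd, buf, sizeof buf);
        if (n == 0)
            return total;         /* sender closed its side */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to_fd, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

A handy place for debugging output is inside relay(), where every chunk passing through the proxy is visible before it is forwarded.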