CPSC 601.08: Computer Systems Performance Evaluation

Professor Carey Williamson

Winter 2010

Assignment 3 (10 marks)
Due: March 16, 2010 (3:30pm)

The purpose of this assignment is to gain experience with experimental methods used in computer systems performance evaluation.

Please do any one of the following three questions. Each question is worth the same number of marks, but the questions may not be equally difficult.

Q1. Web Response Time Measurements (20 marks)

There are many choices available for Internet access, from Gigabit Ethernet in the workplace to WiFi access at Starbucks to dialup modems at your summer cottage. Your goal in this question is to conduct an empirical measurement study of the user-perceived Web response time for a small set (2 or 3) of different Internet access technologies of your own choosing. For example, you might choose your desktop environment in the department, wireless access at the U of C, residential Internet access via your ISP, or maybe even your Internet-enabled cell phone. You will download a simple set of Web objects of different sizes and compare the user-perceived performance observed in each environment.
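One possible way to collect such measurements is sketched below in Python. It is only a minimal sketch, not a required method: it times complete downloads with a wall-clock timer, which approximates (but does not equal) what a user perceives in a browser. The function names (fetch_url, time_fetches, summarize) and the default of 10 repetitions are illustrative assumptions, and caching or DNS lookups may affect the first sample in each run.

```python
import statistics
import time
import urllib.request


def fetch_url(url, timeout=30.0):
    """Download one Web object and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def time_fetches(fetch, repetitions=10):
    """Call fetch() repeatedly and return the elapsed wall-clock seconds."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        fetch()  # any zero-argument callable, e.g. lambda: fetch_url(url)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return basic summary statistics for a list of response times."""
    return {
        "n": len(samples),
        "min": min(samples),
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
    }
```

For a given object, a call such as summarize(time_fetches(lambda: fetch_url(url))) would be repeated in each network environment, where url is a placeholder for one of the Web objects you choose to measure. Reporting the median alongside the mean helps when a few samples are much slower than the rest.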

Bonus (up to 4 marks)

Collect similar measurements from a third, different network environment. Compare and contrast your results with the previous ones.

Q2. File System Workload Characterization (20 marks)

The data file sample.txt contains a small sample of a much larger data file files.txt (6 MB uncompressed text file, available upon request) that contains the output of the Unix command "ls -lR" in my home directory on the CPSC file servers in January 2010. The output shows information such as the name of each file and directory, the file permissions, the file size, the file modification date, and so on.

You will use this empirical data file in a workload characterization study of the Unix file system (albeit for only one user). Using data analysis tools of your own choosing (e.g., grep, awk, perl, gnuplot, Excel, MATLAB, C, C++, Java, Python), process this empirical data set to answer as many of the following questions as you can.
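As a starting point for processing the data, the following Python sketch shows one way to turn "ls -lR" output into records. It assumes the common long-listing layout (permissions, link count, owner, group, size, three date fields, name) with a "directory:" header line before each directory listing; the actual layout of files.txt may differ, so check it against sample.txt first. The Entry type and parse_ls_lr function are illustrative names, and lines that do not match the assumed layout (such as device files) are simply skipped.

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    directory: str    # directory header this entry appeared under
    permissions: str  # e.g. "-rw-r--r--"; first character gives the entry kind
    size: int         # size in bytes as reported by ls
    name: str         # file name (may contain spaces)


def parse_ls_lr(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per file or directory line of assumed `ls -lR` output."""
    directory = "."
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("total "):
            continue  # blank separators and per-directory block totals
        # Split into at most 9 fields so names containing spaces stay whole.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[4].isdigit():
            if line.endswith(":"):
                directory = line[:-1]  # header such as "./papers:"
            continue  # anything else does not match the assumed layout
        yield Entry(directory, fields[0], int(fields[4]), fields[8])
```

With the entries in hand, regular files can be selected with permissions.startswith("-") and directories with permissions.startswith("d"), after which counts, size totals, and size distributions follow from ordinary list processing.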

Bonus (up to 4 marks)

With a bit of effort, you should be able to analyze the file type distribution. On a Unix system, file types can be determined heuristically based on the (optional) suffix in the file name (e.g., foo.html, paper127.pdf, painful.doc). Produce a table showing the top 10 identifiable file types in the data, in sorted order from most prevalent to least prevalent. Within this table, show the number of files of each type, the percentage of files of each type, the number of bytes for each file type, and the percentage of bytes for each file type. If necessary, use a catch-all category "Unknown" for any file types that are not easily discernible from the file name suffix. In the table, add a category "Other" for those files not accounted for among the top 10 file types, so that the percentages in the table sum properly to 100%. Comment on your observations.
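The table described above could be assembled along the following lines. This Python sketch makes several assumptions that you should state in your own solution if you adopt them: the file type is taken to be the lowercased text after the last dot in the name, "most prevalent" is interpreted as the largest number of files (not bytes), and "Unknown" is kept out of the top-10 ranking and reported as its own row. The names file_type and type_table are illustrative.

```python
import os
from collections import Counter


def file_type(name):
    """Return the lowercased suffix without the dot, or 'Unknown' if none."""
    suffix = os.path.splitext(name)[1]
    return suffix[1:].lower() if len(suffix) > 1 else "Unknown"


def type_table(files, top_n=10):
    """Build rows of (type, n_files, pct_files, n_bytes, pct_bytes).

    files is an iterable of (name, size_in_bytes) pairs for regular files.
    Rows are the top_n types by file count, then 'Other', then 'Unknown'.
    """
    counts, sizes = Counter(), Counter()
    for name, size in files:
        kind = file_type(name)
        counts[kind] += 1
        sizes[kind] += size
    total_files = sum(counts.values())
    total_bytes = sum(sizes.values())

    def row(label, n, b):
        pct_files = 100.0 * n / total_files if total_files else 0.0
        pct_bytes = 100.0 * b / total_bytes if total_bytes else 0.0
        return (label, n, pct_files, b, pct_bytes)

    # Rank identifiable types by number of files, most prevalent first.
    ranked = [k for k, _ in counts.most_common() if k != "Unknown"]
    rows = [row(k, counts[k], sizes[k]) for k in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        rows.append(row("Other",
                        sum(counts[k] for k in rest),
                        sum(sizes[k] for k in rest)))
    if counts["Unknown"]:
        rows.append(row("Unknown", counts["Unknown"], sizes["Unknown"]))
    return rows
```

Because every file lands in exactly one row, both percentage columns sum to 100% (up to floating-point rounding). Note that a suffix-based heuristic treats names such as archive.tar.gz as type "gz" and dot-files such as .bashrc as "Unknown".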

Q3. Wireless Network Data Analysis (20 marks)

Choose any interesting data set from the CRAWDAD (Community Resource for Archiving Wireless Data at Dartmouth) Web site and analyze it.

Bonus (up to 4 marks)

Compare and contrast your wireless data analysis results with those from another environment, such as the U of C network.

Submitting Your Assignment

When you are finished, hand in a hardcopy version of your solution to your instructor, either in person or by sliding it under his office door. Provide proper citations for any literature or Internet sources used. Submissions must be received on or before the stated submission deadline; otherwise, a late penalty of 10% (2 marks) per day will apply.