CPSC 457 - Principles of Operating Systems - Assignment 2
This assignment comes in two parts. Part A is a programming assignment to familiarize you with the methods for modifying Linux and adding new features to it. It should be completed on your virtual machines.
Part B includes questions to be answered which will encourage you to think about some of the work you did in Part A and some of the topics we have discussed in lecture.
This assignment will be marked out of 100 and will count for 20% of your total grade in the course. 60 marks will be dedicated to Part A and 40 marks will be dedicated to Part B (four marks for each question).
You will hand in your assignment by sharing your git repository with your TA through the CPSC gitlab and Tagging the branch which includes all of your files for Part A and Part B, "CPSC457-Assignment2-Complete".
This assignment is due at 2:00pm on Friday June 10.
This assignment was updated to fix some errors and add some clarifications. If you would like to see the original, look here.
Changes:
- Fixed number of marks for Part B from 400 to 40
- Removed "if the file is a directory, the sum of the size of all files and directories contained in the directory (to an arbitrary depth)." from the functional requirements.
- Changed "Include a usage statement" to "Include usage statements for each of the user program".
- Added requirement to demonstrate your assignment
- Added a quantum of 2ms to the round robin in question 7. If you worked out the solution to the question using average running times, this should give you the same answer.
- Added some hints to help things get going.
- Added the demo procedure
Part A - Coding
Your CEO has trouble remembering program names and process ID numbers. He recently left a large search engine company whose motto was "Don't sort -- search!" and encouraged tagging of everything. One day, he asks you to modify the Linux kernel to augment job control by tagging processes with an arbitrary list of keywords. Since he once took an OS course, he dimly remembers a discussion of extended attributes from the filesystem lecture. He asks you to "implement extended attributes for processes" (rather than files). He gives you a few weeks to develop a working prototype.
Your CTO takes pity on you, and gives you a few hints. She says:
A) Create a user-level program called 'ptag' that takes a tag and a process ID as arguments and associates that tag with the process, e.g.,
(eye@mordor hw2)$ ptag 1234 -a "yellow"
ptag should also support removing a tag from a process.
(eye@mordor hw2)$ ptag 1234 -r "yellow"
(eye@mordor hw2)$ ptag 1234 -r "green"
ptag should fail with an appropriate error message if the process ID is not valid or you don't own it. Use getopt to process the arguments (see `man 3 getopt').
- B) Implementing the ptag program program probably requires a new system call to associate and remove tags from processes. Be careful about copying arbitrary strings from userspace into kernel memory.
- C) You will probably need a data structure in the kernel to keep track of tags on a per-process basis. You may want to look into keeping an instance of the kernel's doubly-linked list around to track all the keywords (if any) associated with the process. It is possible to have a global list of keywords, but then you'll have to think carefully about how you'll associate a subset of them with each task. You should also use suitable concurrency primitives to protect access to this list. You should make sure the list is duplicated if the process forks (e.g., just like a parent and child share open file descriptors).
You do as your CTO recommends. The CEO is impressed, but complains that he cannot see the status of all the tagged processes. He leaves in a hurry to go golfing. Your CTO says:
D) You should implement a utility called 'tagstat' that prints a table to stdout listing all processID-tag mappings, one per line, e.g.,
22 : green
22 : swordfish
22 : hackers
31 : yellow
31 : sneakers
56 : wargames
133 : leet
133 : lame
By default it does not print entries for processes that do not have tags associated with them. By default it only prints entries for processes you own. Processes should appear in numerical order. You should also print the process state as a third column.
You decide to have tagstat read this state out of a psuedo-device registered as /proc/ptags (so that tagstat is little more than a shell script that performs `cat /proc/ptags').
Your CEO is delighted, but laments the fact that he cannot do anything useful with these groups of processes, like kill all processes associated with the tag 'green'. After he goes to lunch, your CTO looks at you and says: "You're on your own."
- E) Implement a utility or shell script called tagkill. This utility takes one argument (a tag value) and kills each process (kill -9) associated with that tag.
Functional Requirements
Your assignment must:
- Allow users to tag processes with a String, and allow them to view and manage processes on the basis of the tags.
- To do this you must:
- create a user mode program to add and remove tags
- add a system call to Linux to add and remove tags to the kernel (safely)
- add a data structure to the kernel to manage tags
- add a psuedo device to list the tags and the processes associated with them in the file system.
- create a utility (possibly a bash script) that will allow you to kill all processes with a given tag.
Non-Functional Requirements
Your assignment must:
- Be written in C.
- Compile without warnings (using gcc's -Wall option)
- Be implemented as a patch to modify the Linux Kernel.
- Be well-commented.
- Must cite all out-of-class resources.
- Include usage statements for each of the user programs.
- Include a makefile which compiles your assignment.
- Include a README explaining the assumptions you've made, how to compile and run your program and any other notes we should know.
- Be your personal, original work.
- Include a typescript which demonstrates each of the user programs (and devices) working as specified.
- Demonstrate your assignment working to your TA.
Hints:
Thanks to Mike Clark for the hints.
- Kernel LinkedList data type already exists: http://www.makelinux.net/ldd3/chp-11-sect-5
- This structure might be a good candidate for modifying: http://lxr.cpsc.ucalgary.ca/lxr/#linux+v2.6.32/include/linux/sched.h#L1215
You will still need to use the LXR to figure out how to duplicate any modifications it if the process forks. There probably exists macros (that can be used in the kernelspace) for locating a task_struct based on a pid.
- This article does a good job of explaining the API http://www.ibm.com/developerworks/library/l-kernel-memory-access/
- Exposing kernelspace through a virtual filesystem http://www.linuxdevcenter.com/pub/a/linux/2007/07/05/devhelloworld-a-simple-introduction-to-device-drivers-under-linux.html?page=2
Part B
Part B includes five questions, which should be answered in one or two paragraphs and included in a text file along with your code, makefile and README when you hand in your assignment. Most questions have several parts, make sure you answer them all.
Question 1 - Why did we do this?
The CEO is fired, the company folds, and you go interview at the bank your former CTO now works for. She asks you one question in the interview: Do you think it would be better to implement process tagging completely in userspace (for example, with a file called /etc/tags) or in the kernel like you did? Why? Do you get the job at the bank?
Question 2 - Kernel routines vs User routines
What are the differences between printf and printk? What are the differences between malloc and kmalloc? Why do we need different libraries to support kernel operations compared to user operations?
Question 3 - Root vs kernel
Both the concepts of the root user (or an administrator account) and the kernel are similar and related, but they are not the same. How they are different and describe how they are similar?
Question 4 - LKMs and system calls
In some operating systems, such as Solaris, it's possible to add new system calls through loadable kernel modules. In others, such as Linux, it isn't possible. Why? What reasons are there for allowing LKMs to add system calls and what reasons are there for restricting this? Which approach do you favour?
Question 5 - Thread Implementation
Threads can be implemented in user space or kernel space. Each way has benefits and draw backs. List some of the benefits and drawbacks of each. Discuss one situation where user space threads are a better choice and one situation where kernel space threadsa are a better choice.
Question 6 - Thread Yielding
Why would a thread every voluntarily give up the CPU by calling thread_yield? Discuss the assumptions that allow us to think of threads as cooperative, but processes as antagonistic. (From Modern Operating Systems, Chapter 2)
Question 7 - Scheduling
Five batch jobs (A, B, C, D, E) arrive to be scheduled at almost the same time. They have estimated burst times of 10, 6, 2, 4 and 8 ms respectively. Their (externally determined) priorities are 2, 0, 3, 4 and 1, with 0 being the highest priority. For each of the following algorithms calculate the the turnaround time for each job.
- Round robin - with a quantum of 2ms
- Priority scheduling
- First-come, first served (assume they arrived in alphabetical order)
- Shortest Job First (assume you have perfect knowledge of the burst times)
Assume for 1. that the system is multiprogrammed and that each job gets its fare share of CPU. For 2. through 4. assume that once a job is started it will not be preempted until it finishes. Assume all jobs are completely CPU bound. (Modified from Modern Operating Systems, Chapter 2)
Question 8 - Context Switching and Round Robin Quanta
Explain how time quantum value and context switching time affect each other in a round robin scheduling algorithm. (From Modern Operating Systems, Chapter 2)
Question 9 - fork()
Consider the following piece of C code:
void main(int argc, char *argv[]) { fork(); fork(); exit(); }
How many processes are created? Justify your answer and draw a tree describing the process family (ascii trees are good). (From Modern Operating Systems, Chapter 2)
Question 10 - exec
The exec() family of functions wrap the function execve, which executes a program. What purposes do the wrapper functions serve? What information about a process does execve preserve and what information does it not preserve? Why does execve not have a return condition?
Note
This document is released under a Creative Commons Attribution, ShareAlike licence, excluding Questions 6 - 9 which are copyright Andrew Tanenbaum. This assingment is developed, based on an original developed by Michael Locasto.Page Updated: June 5, 2016 at 1830 MDT