Q: How do you get a hundred students in an operating systems class to work on real kernel code, using outdated machines and a lab barely big enough for a quarter of them?
A: Very carefully.
Like most university computer science programs, ours has a mandatory course on operating systems. It is a third-year (junior) course, with high enrollment -- 238 students over the last two semesters -- and assignments traditionally based on OS simulation or toy problems.
It would be nice to have students working with real OS code, though. Students get little experience studying large pieces of software, much less software that is well-written. There is also something to be said for working on the Real Thing rather than abstracted academic contrivances.
The kernel-hacking initiative was started here by a sessional instructor who taught operating systems over the summer using the Linux kernel. Because it was a summer term, enrollment was relatively low and, equally important, demand for student workstations was reduced -- a bank of about 30 SPARC-5 machines was allocated to the course, and students could modify the kernel on these machines and reboot them with impunity.
How could kernel work be done during a regular semester? There are three issues:
The other pieces of the puzzle, cost and equipment, were solved rather serendipitously. Our campus IT department gave us some dusty PCs they considered obsolete: P166 IBMs with 64 MB of RAM, 2 GB of disk, and floppy and CD-ROM drives. Free is good.
Our support staff set up 28 of the castoff machines in a separate lab, hidden behind a firewall which let nothing in, and allowed only outbound ssh and sftp connections.
I had two priorities for the OpenBSD configuration on these machines. First, students had to be able to rebuild a machine quickly from an unknown state. Second, kernel compiles had to be fast.
To rebuild the machines quickly, I created a configuration where as much of the filesystem as possible resided on CD-ROM. OpenBSD's caching of blocks from the CD-ROM gave good enough performance to make extensive use of the CD-ROM feasible. Tracking down all the programs that wanted to write to the filesystem took a while; I must confess that it took ten attempts to iron out all the details! Once I was done, the basic machine rebuilding sequence the students had to follow took a matter of minutes:
Setting up a fresh, writable copy of the kernel source as quickly as possible took quite a bit of experimentation. I also wanted to have a prebuilt set of object files for the kernel to reduce kernel build time -- building a kernel from scratch on these machines took almost 15 minutes!
Using the union filesystem for kernel source would have been perfect, but it proved far too unstable and was abandoned. I also tried a recursive cp, restore, and untarring trees of symlinks. The fastest and easiest method I found, however, was untarring a compressed archive of the kernel source plus the accompanying object files. Omitting kernel source for architectures other than x86 reduced the time further. Again, I incorporated this into a script, which students would run after rebuilding the machine. This script took about two minutes to complete.
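To make the packaging idea concrete, here is a minimal sketch in Python of building and unpacking such an archive. It is not the script used in the course: the function names, the archive's top-level directory name "sys", and the architecture directory name "i386" are assumptions for illustration, based on the usual layout in which per-architecture code lives under an arch/ directory at the top of the kernel source tree.

```python
import os
import tarfile

TOP = "sys"          # assumed top-level directory name inside the archive
KEEP_ARCH = "i386"   # assumed name of the x86 architecture directory


def keep_member(tarinfo):
    """Drop everything under sys/arch/<other architecture>; keep the rest."""
    parts = tarinfo.name.split("/")
    if len(parts) > 2 and parts[0] == TOP and parts[1] == "arch":
        if parts[2] != KEEP_ARCH:
            return None  # returning None excludes this entry and its children
    return tarinfo


def pack_tree(src_root, archive_path):
    """Create a gzip-compressed tar of src_root, minus other architectures."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_root, arcname=TOP, filter=keep_member)


def unpack_tree(archive_path, dest):
    """Extract the archive into dest, giving a fresh, writable copy."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support (and prefer) an extraction filter.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

With hypothetical paths, pack_tree("/usr/src/sys", "sys.tar.gz") would be run once when preparing the CD-ROM, and unpack_tree("sys.tar.gz", "/w") on each rebuilt machine. Prebuilt object files would simply be part of the tree being packed.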
Students could then modify the kernel source and build their own kernels. To test kernels, students would copy them into /w and simply boot from the hard drive.
I supplied students with a script to find files they had added or modified in the kernel source tree. As their kernel work could conceivably perturb the clock setting, detecting changes by file modification time would be unwise. Instead, I precomputed an MD5 hash for each file in the source tree and stored these hashes on the CD-ROM; my script would then compute new MD5 hashes and look for differences. The output was a list of added or modified files that could be used as input to tar.
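The hash-comparison approach can be sketched in a few lines of Python. This is an illustration of the technique rather than the original script; the function names and the in-memory dictionary used as the baseline manifest are assumptions (the real hashes were stored on the CD-ROM in a format the article does not describe).

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def changed_files(root, baseline):
    """List files under root that are new or whose content differs
    from the baseline manifest. Timestamps are never consulted."""
    current = build_manifest(root)
    return sorted(path for path, digest in current.items()
                  if baseline.get(path) != digest)
```

Because only file contents are compared, a clock that has been set backwards or forwards has no effect on the result. Printing the returned paths one per line gives a file list in a form that tar can read. The sketch does not report deleted files, which matches the article's description of listing only added or modified ones.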
Our operating systems course has four assignments, which students are to do themselves, i.e., no group work is permitted. To accommodate the sheer size of the class, I actually set up eight assignments, divided into two four-assignment "streams": an OpenBSD lab stream whose assignments must be done in the OpenBSD lab, and a non-lab stream whose (traditional simulation-based) assignments could be done on any of the more plentiful workstations. Each student had to do one OpenBSD-stream assignment, and three non-lab-stream assignments.
Each student could pick which OpenBSD assignment they wanted to do. I supplied a summary of the assignments at the beginning of the course to help them make an informed choice. In an ideal world, each student would have chosen an OpenBSD assignment whose topic interested them. I also made it clear to students in lectures that it was their responsibility to distribute themselves over the four assignments. Naive on my part, at best.
Traditionally, the first operating systems assignment is an easier introductory assignment; not all students know C at the start of the course, for instance. I followed this tradition, assuming that students with a steeper learning curve would avoid the first OpenBSD assignment. I also made the final OpenBSD assignment a challenging one, to encourage students not to procrastinate.
In hindsight, what happened next was predictable. Of 97 submissions for the first assignment, 89 were for the OpenBSD assignment -- word got out that it was an easier assignment. I don't want to dwell on how 89 students crammed themselves into a lab meant for 28, though! What I am pleased to mention is that five students waited and did the final OpenBSD assignment, despite knowing that it was going to be harder than the rest.
What did I learn from this? Based on my experience and the results from a survey I gave the students, several lessons are clear.
One approach I am considering as an alternative to the dual streams is to have a set of three traditional assignments, and make the OpenBSD work into a term project.
The other part of the problem was the students' fledgling code-reading skills. The TAs taught students how to use the OpenBSD machines, build kernels, and use tools like grep to search for things in the kernel source, but I asked them to give students limited guidance in the code itself. I had high expectations of the students' ability to read code, perhaps too high -- students are not exposed to a lot of C code now, and the OpenBSD kernel documentation, written by experts for experts, was of little help to students. A far better approach, which I plan to adopt, is to have the TAs walk through a series of small lab exercises with the students, to give them practice with the OpenBSD code structure and finding things of interest in the code.
According to the survey results, the majority of students liked working with OpenBSD kernel code, at least in principle. The real problems lay in the implementation of the idea, but with some refinements I expect using OpenBSD will be a very educational experience for the students.
Thanks to Jim Parker for suggesting the dual streams, Theo de Raadt for feedback on the assignments, and the technical support staff (especially Debbie Mazurek) for setting up the lab. Tim Williams taught the Linux version of the course. Shannon Jaeger proofread a draft of this article.