Research Profile: The Trellis Project

Erin Ottosen - 15 December 2014

In the world of research, we're trying to sell ideas. And the idea I'm trying to sell is that simplicity matters, that the social implications and human nature aspects of sharing are important. Any time you develop software that runs into those issues, you had better keep that foremost in mind, or else your software simply won't be accepted.

- Dr. Paul Lu, Associate Professor of Computing Science, University of Alberta

Suppose your friend wants to borrow your truck. That's a reasonable request, right? But after your friend asks and you say yes, the favour gets more and more burdensome.

"I need your trailer too, so do you mind helping me hitch it on?" your friend adds. You grudgingly comply. "And do you mind making room for my car seat? I need to take my kid along."

Now you start to wonder if your friend, with a screaming kid and scant experience driving with a trailer, is going to get your truck in an accident.

These are the hazards of sharing with other people: Will they create extra work for you? Can you trust them not to wreck your property? These hazards are why people sometimes hesitate to share, even scientists who would benefit by sharing their computer power.

Dr. Paul Lu, a professor of computing science at the University of Alberta (U of A), wants to make sharing computer power as easy, simple, and painless as possible.

In 2001, Lu was mulling this problem over with a colleague. "I said, 'We have to get people used to sharing. We should… try to get computers across Canada to work on a common problem...' The natural thing at that point," says Lu, "would have been grid computing."

Grid computing is a contemporary approach for linking computers together to share computing power, data, and access to networks. The goal is for the computers to function together so seamlessly that you could consider them a virtual supercomputer.

"But I concluded," adds Lu, "that it would never happen, to convince all these universities and groups across Canada to take the grid computing route, because it means installing X, Y, and Z on their systems."

So, Lu masterminded the Trellis project, which uses share-friendly methods for linking computers together to form a powerful virtual supercomputer. The key is that only a minimum amount of new software needs to be installed, and it's straightforward enough that your average user can install the software without the help of a system administrator.

"We make it easy for (researchers) to say yes, if we don't require them to install X, Y, and Z," says Lu. "We just ask for regular accounts, which they're used to giving out anyways."

Lu and the rest of the Trellis team want to help scientists in fields like chemistry, biology, and physics. These researchers need serious firepower to compute their large mathematical research problems.

Asking your average desktop computer to solve one of these problems would be like asking a single person to build one of the great pyramids of Egypt: you would have to wait eons for the job to get done, if it got done at all.

To test Trellis software, Lu and his team conduct CISS experiments. CISS stands for Canadian Internetworked Scientific Supercomputer; the acronym also echoes the engineering principle "keep it simple."

Three CISS experiments have been done so far. The most recent, in 2004, linked over 4,100 computer processors belonging to 19 universities all over Canada. At the peak of the 48-hour experiment, the processors worked on over 4,000 jobs at the same time.

In total, the experiment completed an amount of computation that would have taken a single computer 15 years.
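
As a rough check on that figure, the sketch below converts 15 years of single-computer work into the average number of such computers that would have to stay busy for the full 48 hours. The function name is ours, and so is the assumption that the reference "single computer" is comparable to one of the linked processors; neither comes from the Trellis team.

```python
# Back-of-the-envelope arithmetic for the 2004 CISS figures quoted above.
HOURS_PER_YEAR = 365.25 * 24  # average year, including leap days

def equivalent_computers(single_computer_years, wall_clock_hours):
    """Average number of reference computers kept busy for the whole run."""
    return single_computer_years * HOURS_PER_YEAR / wall_clock_hours

average_busy = equivalent_computers(15, 48)
print(f"{average_busy:.0f} computers' worth of work, on average")

# Assumption: the "single computer" is roughly one of the ~4,100 processors.
print(f"{average_busy / 4100:.0%} of the processors linked at peak")
```

Under those assumptions, the work amounts to roughly 2,700 computers running flat out for two days, which is consistent with a pool of just over 4,100 processors that were not all busy for the entire experiment.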

Virtual machines: A ghost (machine) in the machine

For future CISS experiments, the Trellis crew is exploring a new twist: a virtual machine.

Unlike a regular computer with physical parts, a virtual machine has no hardware, only software. When you install a virtual machine on a computer, it's like having two computers in the same physical machine. You can even run operating systems that don't normally jibe.

[Photo: Cam Macdonell working with virtual machines.]

"You can run Windows on top of Macintosh; you can run anything you want on top of anything else," says Cam Macdonell, a PhD student who works with virtual machines.

Virtual machines are in vogue, and it might seem like they're a new development in the computer world, but they're not. "Virtualization's been around since the 1960s," says Macdonell. "IBM's been doing it for decades."

So why ask CISS participants to install a virtual machine? One of the challenges in harnessing a bunch of computers together is that they are as diverse as their human owners. For example, some will have the application needed to compute a certain problem, and some won't.

A virtual machine package of all the applications required for CISS is a relatively simple way to make sure that heterogeneous computers have everything they need to work together.
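
To make the problem concrete, here is a minimal sketch of the check a coordinator would otherwise face on every participating machine. The host names and application names are hypothetical, chosen only for illustration; nothing here reflects the actual Trellis software.

```python
def missing_applications(required, installed_by_host):
    """Map each host to the sorted list of required applications it lacks."""
    gaps = {}
    for host, installed in installed_by_host.items():
        lacking = set(required) - set(installed)
        if lacking:
            gaps[host] = sorted(lacking)
    return gaps

# Hypothetical inventory of three participating machines.
required = {"chem_solver", "bio_aligner"}
inventory = {
    "host-a": {"chem_solver", "bio_aligner"},
    "host-b": {"chem_solver"},
    "host-c": set(),
}
print(missing_applications(required, inventory))
```

A virtual machine image that bundles every required application sidesteps this bookkeeping, since each host runs the same package regardless of what is installed on it natively.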

For the sake of keeping sharing simple, the Trellis team asks people to install minimal software on their computers. However, the virtual machine is a package of software, so it might seem that asking CISS participants to install it violates a principle of Trellis.

But Lu says that installing virtual machine software is still a lot easier than, say, setting up grid computing. "And once we accomplish (widespread installation of the virtual machine), then any researcher… can run (jobs) across (the whole) system a lot more easily," he adds.

Also, Lu suspects that in the near future, virtual machines will not be considered a hassle because they will be as commonplace as the computers they inhabit. Adds Macdonell, "Big chip designers like Intel and AMD have changed their hardware to support virtualization."

With challenges like convincing researchers of the value of installing a virtual machine, the Trellis team still has its work cut out when it comes to making sharing easy and simple. But Lu is thrilled with the progress they've made so far.

"For me… the biggest success wasn't the thousands of processors we used… it wasn't even necessarily all the computations we got done. It was being able to get all these groups from coast to coast to say yes, to participate. For me, that was the biggest achievement."