64 Penguins

That was the goal, at least.
The story starts with my first attempt to install Linux on SPARC hardware. I had purchased an old, retired Sun SPARCstation 20 workstation so I could tinker with Solaris at home. With 512 MB RAM and a quad-Ethernet card, it put my POS PC to shame. But I didn’t know it had two CPUs until I finally got bootp, tftp, and NFS working and was able to net-boot a Linux kernel on it. Twin penguins graced my 20″ Sun monitor (sounds small in the LCD era, but for a CRT this thing was massive!), one for each of the CPUs.
Years later, I was working with a company that used primarily Sun hardware, and I had the opportunity to install some four-way E450s for a new project. When they arrived, we still had a few weeks before we had to turn the machines over to the developers, so I decided to experiment again with SPARC Linux. The installation this time around was much easier, and, as expected, four handsome penguins saluted me at boot!
Around this time, the Sun Enterprise 10000 (or e10k) was considered a pretty beefy server: a fully-populated machine with 16 system boards sported 64 CPUs, 64 Ethernet interfaces, and 64 GB RAM. (Why anybody needed 64 Ethernet interfaces is beyond me.) Against our advice, our development organization purchased two e10ks for use as Java application servers. So, we carved out a chunk of space in the datacenter and Sun rolled these twin behemoths in.
I was helping the developers figure out how to carve the machine up into 4-CPU chunks (called domains), so I had a good idea of the project timelines. When I learned the machines would sit idle for a few weeks before they were needed, I realized this was the opportunity of a lifetime — a chance to build and boot my own army of 64 penguins!
Carl Raffa, one of the brightest and most interesting people I’ve ever had the pleasure of working with, was eager to help. I don’t remember now how long it took to get Linux booting on the e10k. We first tried booting a fully-populated machine with 16 domains, but when that failed, we decided to try it with Solaris and discovered some hardware problems. After swapping some system boards around, we were able to boot Solaris.
Then it was back to trying to boot Linux. I don’t remember now what all of the problems were — most were simple 64-bit or endianness issues — or how long it took before we could consistently boot a single-domain e10k. But once we got that working, we progressively configured the machine larger and larger until we approached a fully-populated 16 domains. When we got to 48 processors, CNET picked up the story. I don’t know for certain, but I suspect that at that time, our e10k represented the largest (and most expensive) Linux machine ever booted.
We can’t take any credit for the low-level device drivers that made any of this work possible. Most of that support was implemented by people who didn’t have access to the big iron, so our involvement was really as tinkerers and testers more than innovators. That said, it was still a fun and challenging project.
Sadly, we were never able to test with a fully-populated e10k. The Friday before we had to turn the machines over to the developers, while we were at lunch for a coworker’s retirement party, Sun Professional Services reconfigured the machine and blew away our playground.
As fate would have it, the e10k doesn’t have a framebuffer — its only console output is emulated via the service processor, a separate microcontroller in the e10k chassis. So even though I was never able to see my army of 48 penguins (much less 64), I know they were there, and that will have to do.