The various flavors of Ruby class attributes
Here’s a curious thing about Ruby: it’s got three flavors of class attributes. You can adorn your classes with class variables, class instance variables, and class constants. Not knowing the differences between them, and thinking that one of them might be useful for a project I was working on, I set out to figure out how they all worked, especially with respect to inheritance.
As a trivial example of the design scenario I was working with, consider the case of an object-oriented vegetable garden. Vegetables come in all shapes, sizes, and colors, but we might want to say that all vegetables should be green unless we’ve said otherwise. We might start modeling our vegetable garden with a Vegetable class, and we could set a color attribute on it with a default value of "green". Lettuce, which happens to be green, could inherit that attribute from Vegetable. Eggplant, however, should redefine color to be "purple".
While certainly a contrived and flawed example, it demonstrates the behavior I was looking for. Let’s see how Ruby’s various flavors of class attributes can help us solve this design problem — or not.
First up, class variables:
class Vegetable
  @@color = 'green'
  def color
    @@color
  end
end
class Eggplant < Vegetable
  @@color = 'purple'
end
Vegetable.new.color # => "purple"
Eggplant.new.color # => "purple"
I wasn’t expecting that! Apparently class variables are shared among subclasses, so you can’t redefine their value in subclasses without changing the value in the base class.
Next up, class instance variables:
class Vegetable
  @color = 'green'
  class << self
    attr_reader :color
  end
  def color
    self.class.color
  end
end
class Lettuce < Vegetable
  # no need to set @color here, since lettuce is green ... right?
end
Vegetable.new.color # => "green"
Lettuce.new.color # => nil
No love here, either: class instance variables belong to the class object itself and are not inherited, so Lettuce’s own @color is simply unset. Probably for the better, since the code needed to reach class instance variables from instances is even uglier than the code needed to reach class variables from instances.
Class constants are right out:
class Vegetable
  Color = 'green'
  def color
    Color
  end
end
class Eggplant < Vegetable
  Color = 'purple'
end
Vegetable.new.color # => "green"
Eggplant.new.color # => "green"
Constants are resolved lexically, so when Vegetable#color runs on behalf of an Eggplant instance, the Color inside that method still refers to the constant defined in Vegetable, not the one defined in Eggplant.
Giving up on the class attributes approach, I resorted to defining the attributes at the instance level. I considered setting a @color instance variable in the initialize method, but then the attribute wouldn’t be constant. Instead, the simplest implementation that does what I want seems to be methods that return constant values:
class Vegetable
  def color
    'green'
  end
end
class Lettuce < Vegetable
end
class Eggplant < Vegetable
  def color
    'purple'
  end
end
Vegetable.new.color # => "green"
Lettuce.new.color # => "green"
Eggplant.new.color # => "purple"
So as it turns out, each of Ruby’s class attribute mechanisms behaves differently in subclasses. I’m sure class variables, class instance variables, and class constants have their uses, but none of them is suited to defining a constant attribute that is shared by all instances of a class yet can be redefined in subclasses.
Simulating synchronous programming with Python generators
Robey’s recent article on naggati reminded me of something I’d been idly pondering for a while. Having recently written an SSH-based host discovery scanner on top of the Twisted asynchronous programming library, I too yearned for a way to write sequences of commands in plain-old imperative code, hiding the callback complexities of event-driven code from users.
Continuations fit the bill nicely. These are functions from which you can return multiple times, resuming right where you left off. With continuations, you could write a sequence of steps that make asynchronous calls, and the framework would resume your continuation where it left off whenever a call completed.
Python does not have first-class continuations, but it does have generators, and these behave almost identically (for my purposes, at least). A generator is a function that can yield multiple values. Well, actually, it returns an iterator, which then can be used to fetch multiple values from the generator. An example will probably make it clear:
>>> def finite_generator():
...     yield 'apple'
...     yield 'orange'
...     yield 'pear'
...
>>> iterator = finite_generator()
>>> for fruit in iterator:
...     print fruit
...
apple
orange
pear
Generators can also run forever:
>>> def infinite_generator():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
>>> iterator = infinite_generator()
>>> for i in iterator:
...     print i
...
0
1
2
3
4
5
... and on and on forever
I had been using iterators in my asynchronous host scanner whenever I needed to run asynchronous commands within a loop. The asynchronous programming model prevents you from writing something like:
for foo in bar:
    async_method(foo)
Instead, you would do something like this:
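def callback(response, iterator):
    do_something_with_response(response)
    schedule_next_task(iterator)

def schedule_next_task(iterator):
    try:
        foo = iterator.next()
    except StopIteration:
        return  # iterator exhausted: nothing left to schedule
    # only next() is inside the try, so a StopIteration raised by
    # async_method itself is not mistaken for the end of the loop
    deferred = async_method(foo)
    deferred.addCallback(callback, iterator)

iterator = iter(bar)
schedule_next_task(iterator)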
It works like this:
1. We get an iterator for our list, bar (a generator would work just as well)
2. We fetch the first value from the iterator and pass it to the asynchronous method
3. That method presumably makes some type of I/O request, and returns immediately with a Deferred instance
4. We add a callback function to the Deferred and ask that our iterator instance be passed to it when it is called
5. Control returns to the event loop, which might be busy scheduling other I/O requests
6. When the I/O completes, the event loop calls our callback function with the response and our iterator instance
7. The callback processes the response, and then goes back to step 2, fetching the next item from the iterator
8. When the iterator is exhausted, the cycle stops
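To make that cycle concrete without needing Twisted, here is a minimal, self-contained sketch. ToyDeferred and the queue-based "event loop" are hypothetical stand-ins for Twisted’s Deferred and reactor, and async_method just upper-cases its argument on a later loop turn to simulate I/O; only the shape of callback and schedule_next_task follows the code above.

```python
from collections import deque


class ToyDeferred:
    """Stand-in for Twisted's Deferred: collects callbacks, fires them later."""

    def __init__(self):
        self._callbacks = []

    def addCallback(self, fn, *args):
        self._callbacks.append((fn, args))
        return self

    def fire(self, result):
        # as with a Deferred, each callback's return value feeds the next one
        for fn, args in self._callbacks:
            result = fn(result, *args)


def run_demo(items):
    """Process `items` one asynchronous call at a time; return the responses."""
    pending = deque()  # toy event queue standing in for the reactor
    responses = []

    def async_method(foo):
        # pretend I/O: return at once, deliver the result on a later loop turn
        deferred = ToyDeferred()
        pending.append(lambda: deferred.fire(foo.upper()))
        return deferred

    def callback(response, iterator):
        responses.append(response)
        schedule_next_task(iterator)

    def schedule_next_task(iterator):
        try:
            foo = next(iterator)
        except StopIteration:
            return  # iterator exhausted: the cycle stops
        async_method(foo).addCallback(callback, iterator)

    schedule_next_task(iter(items))
    while pending:  # the "event loop": run queued completions in order
        pending.popleft()()
    return responses
```

Calling run_demo(["apple", "orange", "pear"]) walks the list one item per loop turn and returns ["APPLE", "ORANGE", "PEAR"]; at no point is more than one simulated request outstanding.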
It occurred to me that I might be able to extend this concept to use generators as a sort of continuation to emulate synchronous code. What if, instead of returning strings or numbers from a generator, you returned functions? Some wrapper code could initialize the iterator, and then loop over it using the technique above, calling each function returned from the generator.
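A dependency-free sketch of that idea follows. It is not the fido code: make_task, the fake host values, and the plain-callback style (instead of Deferreds) are assumptions made so the example runs on its own. The key point it shows is that the generator reads `context` after each yield, and by then the driver has already merged in that task’s result.

```python
from collections import deque


def make_task(queue, key, value):
    """Hypothetical async call: returns immediately and reports {key: value}
    through `on_done` on a later turn of the toy event loop."""
    def task(on_done):
        queue.append(lambda: on_done({key: value}))
    return task


def scanning_sequence(context, queue, uname_output):
    # Reads like synchronous code: each yield hands one async task to the
    # driver, and the generator resumes only after that task's result is
    # already in `context`.
    yield make_task(queue, "uname", uname_output)
    os_name = context["uname"].split()[0]
    if os_name == "SunOS":
        yield make_task(queue, "zonename", "global")
    elif os_name != "Linux":
        return  # unsupported host type
    yield make_task(queue, "hostid", "example-hostid")


def scan(uname_output):
    """Drive the generator: start each yielded task, resume on its callback."""
    queue = deque()
    context = {}
    steps = scanning_sequence(context, queue, uname_output)

    def schedule_next_task():
        try:
            task = next(steps)
        except StopIteration:
            return  # sequence finished
        task(callback)

    def callback(response):
        context.update(response)
        schedule_next_task()

    schedule_next_task()
    while queue:  # toy event loop
        queue.popleft()()
    return context
```

For example, scan("SunOS myhost 5.10") returns the uname, zonename, and hostid entries, while an unrecognized OS stops after uname. Twisted’s defer.inlineCallbacks decorator builds on the same generator trick, except that each result is sent back into the generator with send() as the value of the yield expression rather than read from a shared dict.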
Tonight I decided to give this a try. Forking off an experimental branch and making a few modifications to the underlying fido host discovery routines, I crafted the following plesiochronous host scanner:
#!/usr/bin/env python
#
# Use a generator to simulate synchronous execution on an asynchronous framework
#
from fido.common.command import RemoteCommandExecutor
from fido.common.host.unix import UnixHost
from fido.common.ssh import SSHCredentials
from contrib.host.software.sun.host import SolarisHost
from contrib.host.software.linux.host import LinuxHost
from twisted.internet import reactor
import pprint

class PlesiochronousHostScanner(object):
    """
    Scans a host over SSH, building a list of host attributes. Built on the Twisted asynchronous
    library, but uses a Python generator function to emulate garden variety synchronous code.
    """

    def __init__(self, address, credentials):
        """
        address: the IP address to scan
        credentials: a hash like: { 'username': '...' , 'password': '...', 'public_key': '<optional>' }
        """
        self.address = address
        self.credentials = credentials
        self.host = UnixHost(RemoteCommandExecutor(address, credentials))
        self.pp = pprint.PrettyPrinter()
        # create some scratch space for the discovery methods
        self.context = { }
        # get an iterator from the generator
        self.iterator = self.scanning_sequence()

    def scanning_sequence(self):
        """
        A typical nugget of synchronous code, with one important exception: asynchronous
        functions must be yielded instead of being called directly.
        """
        yield self.host.uname
        os = self.context['uname'].split()[0]
        if os == 'SunOS':
            self.host = SolarisHost.from_host(self.host)
            yield self.host.zonename
            yield self.host.zones
        elif os == 'Linux':
            self.host = LinuxHost.from_host(self.host)
        else:
            print "Unable to scan host type: %s" % os
            return
        yield self.host.hostid
        yield self.host.device
        yield self.host.bios
        yield self.host.installed_memory_in_MB
        yield self.host.interfaces

    def callback(self, response):
        self.context.update(response)
        self.schedule_next_task()

    def errback(self, error):
        print "scanning error: %s" % error

    def schedule_next_task(self):
        try:
            function = self.iterator.next()
        except StopIteration:
            # the generator has run to completion
            self.scan_complete()
            return
        # call the yielded method to start its asynchronous work
        deferred = function()
        deferred.addCallbacks(self.callback, self.errback)

    def start_scan(self):
        self.schedule_next_task()

    def scan_complete(self):
        print "Scan of %s is complete" % self.address
        self.pp.pprint(self.context)
        # In this contrived example, we'll stop the reactor when we've finished scanning a host
        reactor.stop()

if __name__ == '__main__':
    import sys
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option("-u", "--username", dest="username")
    parser.add_option("-p", "--password", dest="password")
    (options, args) = parser.parse_args()
    address = args.pop(0)
    credentials = iter([SSHCredentials(options.username, options.password, None)])
    scanner = PlesiochronousHostScanner(address, credentials)
    reactor.callWhenRunning(scanner.start_scan)
    reactor.run()
It works:
satellite:~ clay$ python pleisio.py -u username -p password 10.20.30.40
Scan of 10.20.30.40 is complete
{'bios': {'bios_date': '11/15/2007',
'bios_vendor': 'Sun Microsystems',
'bios_version': 'S39_3B25'},
'device': {'system_product': 'Sun Fire X2200 M2',
'system_serial': '0805QAT0EA',
'system_uuid': 'bd6529dc-fc79-0010-9e1b-001b245c1d4f',
'system_vendor': 'Sun Microsystems',
'system_version': 'Rev 50'},
'hostid': '0ec2daa6',
'installed_memory_in_MB': 32768,
'interfaces': {'bge0': {'ipv4_addresses': [10.20.30.40],
'ipv6_addresses': [],
'mac_address': 00:1B:24:5C:18:B5,
'zone': None},
'lo0': {'ipv4_addresses': [],
'ipv6_addresses': [],
'mac_address': None,
'zone': None}},
'uname': 'SunOS myhost.mydomain.com 5.10 Generic_127112-11 i86pc i386 i86pc',
'zonename': 'global',
'zones': {'myzone': {'brand': 'native',
'ip_mode': 'shared',
'root': '/zones/myzone',
'state': 'running',
'uuid': '09fbf9ba-c0c5-408f-c9e9-820471983f25',
'zonename': 'myzone'}}}
The beauty of this approach is that the PlesiochronousHostScanner#scanning_sequence method is pretty straightforward, and could actually be written by end users familiar with Python but not familiar with asynchronous programming. It also makes discovery logic much easier to understand than in the state-machine-based asynchronous discovery engine I had previously built.
Having just concocted this tonight, I’m not sure whether this is something I’ll pursue, but it has been a fun experiment. I’m curious what other asynchronous programmers think of this approach.
Ruby, why do you torment me?
I want to like Ruby, I really do. The language is expressive, powerful, and eminently readable. Moreover, it’s fun to write. But try as I might to be productive, I keep running into quirks and gotchas with Ruby libraries that make me wish I were using a language with a more mature standard library. Things that take five minutes in Perl or Python have taken me all day to get working in Ruby.
SOAP support, which ought to be fully baked in Ruby by now, is still somewhat painful to work with. In Perl, SOAP just works. When I wrote our release orchestration tool a year ago, it took way longer than it should have to get Ruby talking to the SOAP iControl interface on our BigIP load balancers. By contrast, it took all of five minutes to get the Perl sample working — and that includes time spent installing the SOAP::Lite CPAN module.
Using Rails for the first time in a recent project, I was immediately struck by how little work is required to get a web app off the ground. I almost felt guilty for writing so little code. But a lot of the clever Rails magic that’s supposed to make life easier didn’t. While error messages like “Expected foo.rb to define Foo” seem pretty straightforward, they are maddening when foo.rb does indeed define Foo. For their next trick, the Rails developers ought to use their meta-programming fu to produce intelligible error messages!
We recently ported a Rails app to JRuby, and straight away we ran into bugs. JRuby couldn’t call Java correctly, and it had a file descriptor leak in Net::SSH that caused the site crawler component of our application to go belly-up after a few hours. And we should have known better than to try talking to Oracle from JRuby on Rails. The activerecord-jdbc-adapter component had myriad issues — goofy things like "uninitialized constant ActiveRecord::VERSION", improper column name quoting, and incorrect integer datatype coercions. Finally we gave up and ported the database to MySQL.
I understand that Ruby and its libraries are open-source efforts written mostly by unpaid enthusiasts, so I try not to get too upset when things don’t work correctly. I wish I had the time to jump in and submit patches to fix issues when I run into them.
setuid() ate my CSS
We ran into an interesting problem while testing a new version of our code deployment tool tonight. By all appearances, the tool was happily deploying code and launching our Java applications, but one of our QA engineers noticed missing CSS on some pages in our test environment. Could that possibly be related to the code deployment tool, which essentially just untars an archive and forks off a little Ruby script to start the application?
Tracing the application’s system calls with truss revealed that the process was getting EPERM errors while trying to read the CSS files, which live on NFS. One of our more clever engineers started the application manually instead of via the code deployment tool, and found that the CSS loaded just fine when the Java process was invoked directly from the shell. He then compared the user and group ids that ps reported for JVMs started by our tool with those of JVMs started manually, and found no differences. Hmm.
When looking at the processes’ /proc/<pid>/cred files, however, some differences were apparent. The cred file contains binary data and is best viewed with od:
$ od -X /proc/$$/cred
0000000 00002716 00002716 00002716 0000000a
0000020 0000000a 0000000a 00000002 0000000a
0000040 0000000e
0000044
The file consists of a sequence of 32-bit values in the following order:
* effective uid
* real uid
* saved uid
* effective gid
* real gid
* saved gid
* number of supplemental groups
* supplemental group ids …
You can see how that maps to decimal ids by comparing with id output (0x2716 is 10006, 0xa is 10, and 0xe is 14):
$ id -a
uid=10006(clay) gid=10(staff) groups=10(staff),14(sysadmin)
[Solaris geek aside: remember when you wanted to be a member of the sysadmin group so you could run the handy-dandy admintool?]
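Reading those words by hand gets old quickly, so here is a small Python sketch that decodes a cred buffer. It assumes the prcred_t layout from the Solaris proc(4) man page (six 32-bit ids, an int group count, then the group ids) in the machine’s native byte order; read_prcred and differences are hypothetical helpers, and read_prcred only makes sense on Solaris.

```python
import struct

# Assumed prcred_t field order (Solaris proc(4)); every value is 32 bits.
ID_FIELDS = ("euid", "ruid", "suid", "egid", "rgid", "sgid")


def parse_prcred(raw):
    """Decode a /proc/<pid>/cred buffer read on the same machine (native byte order)."""
    ids = struct.unpack_from("=6I", raw, 0)
    (ngroups,) = struct.unpack_from("=i", raw, 24)
    groups = struct.unpack_from("=%dI" % ngroups, raw, 28)
    cred = dict(zip(ID_FIELDS, ids))
    cred["groups"] = list(groups)
    return cred


def read_prcred(pid="self"):
    """Hypothetical helper: only meaningful on Solaris, where /proc/<pid>/cred exists."""
    with open("/proc/%s/cred" % pid, "rb") as f:
        return parse_prcred(f.read())


def differences(a, b):
    """Return {field: (a_value, b_value)} for every field where two creds disagree."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


if __name__ == "__main__":
    # the nine words from the od -X dump above
    words = [0x2716, 0x2716, 0x2716, 0xA, 0xA, 0xA, 2, 0xA, 0xE]
    print(parse_prcred(struct.pack("=9I", *words)))
    # {'euid': 10006, 'ruid': 10006, 'suid': 10006, 'egid': 10, 'rgid': 10, 'sgid': 10, 'groups': [10, 14]}
```

Running differences() on the decoded creds of two processes would surface a mismatch confined to the groups field, which ps output never shows.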
So what we noticed was that while the manually started JVM and the JVM launched via our code deployment tool had identical uid/euid/suid and gid/egid/sgid values, they had different supplemental group id lists. Notably, the JVM running under the code deployment tool still had group id 0 in its supplemental group list. Letting our Java application servers traipse around the filesystem with elevated privileges is perhaps not the best “feature” we’ve ever implemented.
Trust but verify might be a good foreign policy, but our NFS server wasn’t having any of it. It thoroughly distrusted the Java app servers claiming to have elevated privileges, and rewarded them with EPERMs for their trouble. Root squash is, after all, a pretty common NFS security measure.
As it turns out, I had implemented a new feature in the code deployment agent to make it switch user id on startup. Previously we handled the user switch by launching the tool under su, but that approach prevented the tool from writing its pid file to the root-owned /var/run directory. The solution, I thought, was just to call setgid() followed by setuid(). We tested that code by verifying the user and group ids with ps, and it seemed to work just great.
Quick: what’s wrong with this?
def HostUtils.switch_user user
  pwent = Etc::getpwnam(user)
  Process::GID::change_privilege(pwent.gid)
  Process::UID::change_privilege(pwent.uid)
end
Maybe several things, but certainly one thing is that I’ve completely neglected supplemental group ids. I should have written:
def HostUtils.switch_user user
  pwent = Etc::getpwnam(user)
  Process::initgroups(user, pwent.gid)
  Process::GID::change_privilege(pwent.gid)
  Process::UID::change_privilege(pwent.uid)
end
That call to Process::initgroups makes all the difference. After making the change, the apps could access NFS and our test site looked all pretty again. Good thing we caught it when we did!
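The same fix translates directly to other languages. Below is a rough Python equivalent, assuming a Unix system and Python 3.3 or later (for os.getgrouplist); switch_user and expected_groups are hypothetical names. Order matters: only root can call initgroups() and setgid(), so both have to run before setuid() gives root away.

```python
import os
import pwd


def switch_user(username):
    """Permanently drop root privileges to `username` (must be called as root)."""
    pw = pwd.getpwnam(username)
    # Replace root's supplementary groups (which may include gid 0) with the
    # groups `username` belongs to; skipping this is exactly the bug above.
    os.initgroups(username, pw.pw_gid)
    os.setgid(pw.pw_gid)  # as root, sets real, effective, and saved gid
    os.setuid(pw.pw_uid)  # last: after this, the calls above would fail


def expected_groups(username):
    """The supplementary group list initgroups() would install for `username`."""
    pw = pwd.getpwnam(username)
    return os.getgrouplist(username, pw.pw_gid)
```

expected_groups() can be called without root, which makes it handy for checking what a freshly switched process ought to report in its group list.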
Turns out this is a fairly common problem, and I feel especially dumb for overlooking something so obvious. Live and learn.
Engineering and Operations: Bridging the Divide
A recent post by the folks over at Agile Web Operations discusses some common sources of tension between engineering and operations organizations in web companies: a mutual lack of experience in each other’s domains, conflicting departmental goals, and an us–against–them mentality drawn from social identity theory. Continuing the conversation, I suggest there is a subtler but more fundamental source of tension between engineers and operators that has to do with their different mindsets: developers think in terms of possibilities, while administrators think in terms of realities.
Developers tend to downplay—perhaps unconsciously—the significance of bugs because they understand how to fix them: just make a one-line change over here and tweak a unit test over there and we’re done. If she has a good idea how to fix a bug, a developer may file it away in the “solved” folder in her brain before she’s actually implemented the fix. I’m not saying developers aren’t concerned with quality—they are—or that they don’t fix bugs—they do. But how many times have you spotted a bug and dutifully reported it only to have the developer reassuringly tell you that, “yes, it’s a known issue, we’ll fix it sooner or later—probably later”?
Systems administrators, on the other hand, face the stark binary reality that the software either works or it doesn’t. It survives unanticipated load or it doesn’t. The pager goes off or it doesn’t. No amount of reassurance that the bug can be fixed easily will appease an administrator—if it’s broken, it’s broken. And during the first few iterations of a new product, frequently the software is, in fact, broken. Over time, administrators become conditioned to believe the software will always be broken. It is not uncommon for administrators to worry that bugs which brought the site down months ago might strike again on their next on-call shift, even though those bugs were fixed long ago.
I point to the difference in mindsets not to disparage one group or the other—I wore a sysadmin hat long before I wore my developer hat—but to expose a fundamental flaw with organizational structures that divide all site development and maintenance functions into just these two separate–but–equal groups. Despite the benefits afforded by the separation of responsibility that you get with distinct engineering and operations groups, such a structure breeds an inefficiency that can threaten a company’s ability to scale.
How well does your operations team understand your software components and how they interact? How well does your engineering team understand how your systems are built, or how they’re connected? When engineering and operations don’t understand each other’s domains, the result is a release process that is at best inefficient, and at worst dangerously fragile.
For example, even though engineering may write detailed release notes describing new features, systems administrators often don’t speak the same language—release notes are practically useless to operations. As a result, valuable time is wasted translating release notes into a language that operations understands: listings of the commands needed to deploy the software. Conversely, developers may not understand infrastructure dependencies (operating system versions, libraries, NFS mount points, firewall rules), leading to confusion (and possibly outages) when code is deployed to machines where it has no chance of working.
In shops that split all work on the production site between the false dichotomy of engineering and operations roles, most software releases will require the two teams to work closely together, and so releases become a significant source of tension between the groups. If your systems administrators cringe whenever a release is coming up, you know you’ve got a problem. Releasing software is how your company grows, both by adding new features and by fixing bugs in the existing features. Yet if the administrators had it their way, there’d be no releases.
Just about the time I had started thinking that what is needed is a third team responsible solely for releases and other aspects of the production site, a friend and colleague forwarded along a slide deck describing Google’s Site Reliability Engineering organization. This team is responsible for one thing: the production web site. Engineering is free to develop features and operations is free to think strategically about systems, storage, and network. What makes the SRE team so interesting is that it is staffed with (junior) engineers, so it’s got an engineering mindset, but at the same time it’s charged with an operations objective: keeping the web site up.
Using Google’s Site Reliability Engineering concept to frame my own thoughts, I tend to think of SRE as an internal customer of both the engineering and operations teams. SRE expects engineering to deliver working software, and they will file and track bugs when that is not the case. SRE should also make an effort to fix the bugs they have filed—something not possible when operations files all the bugs against production. Conversely, SRE expects operations to deliver the server, storage, and network infrastructure required to meet the demands of the production site. SRE leads capacity planning efforts, placing orders with operations for server, storage, and network expansion. SRE also constantly monitors the production site and is responsible for installing and configuring the monitoring software.
With the addition of an SRE team, the division of responsibilities starts to look like this:
- Operations delivers infrastructure
- Engineering delivers features
- Site Reliability delivers uptime
Despite the title, SRE should not report into the engineering organization. Rather, it should be its own, first-class, top-level organization, complete with executive representation at the VP level. I know what you’re saying: how much is it going to cost to staff yet another organization? Not as much as you think. Since SRE will off–load releases from operations, it may be possible to scale back the operations team. And since SRE removes the inefficiencies involved in translating release notes to deployment plans, engineers will have more time to work on features.
Operations managers may balk at the idea of scaling back their teams, arguing that they’re already so busy that they can’t complete all the work on their plates with the team they have. But look at what is consuming most of the time. It’s probably deployments, especially if they occur anywhere near the frequency of deployments at Flickr. Operations teams are also burdened with production incident response, a responsibility that rightly belongs in the SRE organization. By handing both releases and first–response duties off to SRE, the operations team workload will fall and the team can be restructured, eliminating some middle–tier systems administrator positions while retaining mostly the strategic thinkers (operations architects) and data center support engineers.
If you’ve been thinking “AUTOMATION!” while reading this, I hear you. I wholeheartedly agree that automation, when carefully conceived and conscientiously deployed, can improve efficiencies and ease the tensions stemming from a manual release process. But for all the advances in the current generation of automation tools, it may still be a while before automation tools can configure themselves. Until then, who should own the configuration? Engineering understands the intrinsic properties of the software—the proper sequence to start the various components, the proper settings for feature-related properties—but operations has the extrinsic knowledge necessary to make the site work—which databases are available, which load balancers to use, etc. It might be possible to arrive at a working configuration by merging the two teams’ knowledge, but I think it makes more sense if one group owns production and the associated automation configuration and workflows.
Ultimately, by freeing other teams to focus on their core competencies, Site Reliability Engineering can increase uptime and help the company scale, all while reducing tensions between engineering and operations—what more could you want from a three-letter acronym?
Dual-booting Windows XP and Mac OS X on Intel Macs
I had hoped to use this blog entry to post step-by-step instructions for installing Windows XP on shiny new MacIntels, but alas, it appears that someone has beaten me to it. The winners, narf2006 and blanka, have been working on the problem for quite a while and have been posting pictures of their progress over the past few weeks. Today they uploaded a video showing a fresh install of XP on an iMac and they submitted their solution to sud0n1m for testing. Assuming the testing goes well, they will be declared the winners and will share the $13k prize.
While I’m disappointed not to have won, I’m encouraged to see that our approaches were remarkably similar. We both wrote custom EFI CSM drivers to emulate the BIOS functions Windows requires to boot. I’m very curious how they managed to get VGA working, and I won’t be surprised if it doesn’t work on either the Mini or the MacBook Pro, as it looks like they did all their development on an iMac.
If nothing else, this was a tremendous learning experience for me, and the timing couldn’t have been better. I have recently become interested in Intel assembly and protected mode programming, topics I considered too challenging years ago when I was doing DOS programming, but concepts that make much more sense to me now. I had randomly dusted off some old assembly language references from my bookshelf and read some chapters on protected mode programming a few weeks prior to beginning work on this project, so I was able to grasp what needed to be done to provide a working solution almost immediately.
With the deadline fast approaching and narf’s Flickr images haunting me, I coded quickly and didn’t spend much time making the code pretty or maintainable, but I’m still fairly proud of the code I wrote. It’s fairly succinct, but does quite a bit. Anyone who’s interested can peruse the code here.
The main function is in OSXP.c, which contains code for reading the GPT partition table that Mac OS X uses, writing an MBR partition table that Windows would use, and loading a bootloader from an El Torito bootable CD-ROM.
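The MBR-writing half of that job is simple enough to sketch in a few lines of Python. This is an illustration of the classic MBR layout (446 bytes of boot code, four 16-byte partition entries, then the 0x55 0xAA signature), not the code in OSXP.c; the GPT-reading side is omitted, and the CHS fields are filled with the FE FF FF placeholder conventionally used when a partition is addressed by LBA only. Treat that choice, and the example numbers, as assumptions.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446            # partition table follows the boot code
SIGNATURE = b"\x55\xaa"       # bytes 510 and 511
CHS_PLACEHOLDER = b"\xfe\xff\xff"  # "use the LBA fields" (assumed convention)


def partition_entry(lba_start, sector_count, type_id, active=False):
    """Pack one 16-byte MBR partition entry (little-endian)."""
    return struct.pack(
        "<B3sB3sII",
        0x80 if active else 0x00,  # boot indicator
        CHS_PLACEHOLDER,           # CHS of first sector
        type_id,                   # e.g. 0x07 for NTFS
        CHS_PLACEHOLDER,           # CHS of last sector
        lba_start,                 # first sector, taken from the GPT entry
        sector_count,              # length in sectors
    )


def build_mbr(partitions, boot_code=b""):
    """partitions: up to four (lba_start, sector_count, type_id, active) tuples."""
    if len(partitions) > 4:
        raise ValueError("an MBR holds at most four primary partitions")
    if len(boot_code) > TABLE_OFFSET:
        raise ValueError("boot code would overwrite the partition table")
    mbr = bytearray(SECTOR_SIZE)
    mbr[:len(boot_code)] = boot_code
    for i, (start, count, type_id, active) in enumerate(partitions):
        offset = TABLE_OFFSET + 16 * i
        mbr[offset:offset + 16] = partition_entry(start, count, type_id, active)
    mbr[510:512] = SIGNATURE
    return bytes(mbr)
```

For instance, build_mbr([(2048, 4096, 0x07, True)]) yields a 512-byte sector whose first entry marks a made-up 4096-sector NTFS partition at LBA 2048 as active; unused entries stay zeroed.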
Code to switch from protected mode to real mode (called a thunk) is in thunk.c and asmthunk.s. It’s not very general, but it’s the first protected mode assembly code I’ve ever written, and, surprisingly enough, it works.
Code to set up a real-mode interrupt vector table and the real-mode interrupt service routine is in rmisr.s. For the most part, this duplicates the thunk code, but in reverse order: it switches from real mode to protected mode and then back again. This reverse thunk is necessary to emulate BIOS functions using the native EFI functions (read disk sector, print character, etc).
The protected-mode interrupt service routine, which does the actual BIOS emulation, is in pmisr.c. It reads and writes a saved register context on the stack that the real-mode code inherits upon return from the interrupt service routine.
Just writing those thunk and interrupt service routines is probably about 50% of a complete solution, and I’m very happy with how they came out. The first time I thunked into the MBR code, it worked better than I expected, and actually identified the active partition, loaded the boot sector from it, and jumped to it. Booting into the CD bootloader worked also, though it hung right after probing memory.
I’m pretty amazed that my code works, but to toot my own horn a little more, I’m pretty happy with some of the debug techniques I came up with along the way. Running EFI applications in the pre-boot environment leaves a lot to be desired. You can’t exactly fire up gdb and throw a bunch of watchpoints on your code to find out what’s going wrong (though I did toy with the idea of compiling GDB for EFI). And even if I had a debugger at my disposal, it’s hard to debug a protected mode/real mode transition.
At first, my debugging consisted of writing copious amounts of debug output to the console and waiting for my tester, Chris, to run the code and take a picture of the result. At that time I didn’t have an Intel Mac to play with, so needless to say, progress was slow. We did get fairly far with this method, though. I wrote all of the partition table (GPT and MBR) code without ever having seen my code run with my own eyes. Chris’ pictures showed me what I needed to know, then I’d make a change, recompile, and Chris would download the new file and reboot. Again and again. Thanks, Chris.
I was unsure of how to access the CD-ROM under EFI, because it didn’t show up when I listed all the block I/O devices in Chris’ Macbook Pro. Ryan was nice enough to lend me his shiny new Mac Mini, and I was pleased to find that the CD-ROM device showed up once there was a disk in the drive. I was even more pleased to see that my El Torito code worked almost flawlessly from the beginning.
Then came the hard part, thunking into real mode. I wrote code and pored over it for hours making sure it looked right, but when I ran it, the computer spontaneously rebooted. Unsure of which instruction(s) were causing the reboots, I added an infinite loop to a section of the code, recompiled, and ran it. The machine hung. That confirmed that none of the code above the loop was causing the reboot (at least not directly), so I moved the loop down a few instructions and tried again. Using this technique I was eventually able to find all of the bugs in my code, which were all stupid syntax problems and not logic problems (there’s a big difference between $0x10, an immediate value, and 0x10, a memory address, in AT&T assembly).
Once the thunk and interrupt handler code was working, I started looking into why NTLDR was hanging after probing memory. NTLDR is 233kb and its disassembly is 97k lines long. I knew roughly where the hang occurred, based on the output I had from the last BIOS interrupt it invoked, but I wanted to narrow it down to a specific routine.
It occurred to me that I could just write my own debugger of sorts. By handling interrupt 3, my code would get control any time the NTLDR code stumbled onto an INT3 instruction. So, using the disassembly listing as a guide, I made a list of instructions that I thought might be interesting stopping points, and wrote a routine to replace the first byte of each with 0xCC (the one-byte INT3 opcode). Then I wrote an INT3 handler that restored the original byte and decremented the return address by one (the saved return address points just past the INT3), so the original code would be run upon return from the interrupt service routine. And, to my surprise, it worked!
Earlier today I extended this a bit by automatically enabling trap mode in the INT3 handler so I could repatch the breakpoint instruction with 0xCC right after executing the original code. This change allowed a breakpoint to show up each time through a loop or each time a particular function was called. Then I went a step farther and added a breakpoint option that would leave trap mode enabled, so I could get a trace of every instruction executed between two points in the code. This would prove useful for figuring out which branches the program took as it made various tests and decisions.
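The breakpoint scheme in the last two paragraphs can be sketched with a toy interpreter. This is a Python simulation, not the real-mode handler: ToyCPU, its made-up opcodes, and the trap_flag attribute are hypothetical stand-ins for x86 memory, INT3 (0xCC), and the EFLAGS trap flag. As on x86, executing INT3 leaves the instruction pointer one byte past the breakpoint, so the handler backs it up, restores the original byte, and sets the trap flag; the single-step trap that follows re-patches 0xCC so the breakpoint fires on every pass through the loop.

```python
INT3, INC_A, DEC_B, JNZ, HLT = 0xCC, 0x01, 0x02, 0x03, 0xF4


class ToyCPU:
    """A tiny made-up machine: register a counts up, b counts down."""

    def __init__(self, program, b):
        self.mem = bytearray(program)
        self.ip = 0
        self.a = 0
        self.b = b
        self.trap_flag = False  # stands in for EFLAGS.TF

    def step(self):
        """Execute one instruction; return "int3", "halt", or None."""
        op = self.mem[self.ip]
        self.ip += 1
        if op == INT3:
            return "int3"  # as on x86, ip now points just past the 0xCC byte
        if op == INC_A:
            self.a += 1
        elif op == DEC_B:
            self.b -= 1
        elif op == JNZ:  # two bytes: opcode, absolute target address
            target = self.mem[self.ip]
            self.ip += 1
            if self.b != 0:
                self.ip = target
        elif op == HLT:
            return "halt"
        else:
            raise ValueError("bad opcode %#x at %d" % (op, self.ip - 1))
        return None


class Debugger:
    def __init__(self, cpu):
        self.cpu = cpu
        self.saved = {}    # breakpoint address -> original opcode byte
        self.rearm = None  # address to re-patch after the single step
        self.hits = []     # (address, a, b) recorded at each breakpoint hit

    def set_breakpoint(self, addr):
        self.saved[addr] = self.cpu.mem[addr]
        self.cpu.mem[addr] = INT3

    def on_int3(self):
        cpu = self.cpu
        cpu.ip -= 1  # back up over the one-byte INT3
        self.hits.append((cpu.ip, cpu.a, cpu.b))
        cpu.mem[cpu.ip] = self.saved[cpu.ip]  # run the real instruction next
        self.rearm = cpu.ip
        cpu.trap_flag = True  # trap again after that one instruction

    def on_single_step(self):
        self.cpu.mem[self.rearm] = INT3  # put the breakpoint back
        self.rearm = None
        self.cpu.trap_flag = False

    def run(self, max_steps=10000):
        for _ in range(max_steps):
            stepping = self.cpu.trap_flag
            event = self.cpu.step()
            if event == "int3":
                self.on_int3()
            elif event == "halt":
                return
            elif stepping:
                self.on_single_step()
        raise RuntimeError("step limit reached")


if __name__ == "__main__":
    cpu = ToyCPU(bytes([INC_A, DEC_B, JNZ, 0, HLT]), b=3)
    dbg = Debugger(cpu)
    dbg.set_breakpoint(0)
    dbg.run()
    print(dbg.hits)  # [(0, 0, 3), (0, 1, 2), (0, 2, 1)]
```

The breakpoint at address 0 fires once per loop iteration, which is the behavior the re-arming step is meant to provide; without the single-step re-patch, it would fire only the first time.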
The one thing that continues to elude me is how to enable VGA text mode. The standard graphics and text framebuffers (0xA0000 and 0xB8000) aren’t even mapped in memory, and reading from the VGA registers appears to return garbage. I suspect that if I knew more about PCI programming, I’d be able to map the framebuffer memory and configure the I/O ports, but I’m at a loss for how to do that. In the interim, I’ve been patching NTLDR at load time so that it writes to my own text framebuffer, which I then scan on every interrupt in order to paint a portion of the emulated text-mode screen. This is slow and hackish, and I know it’s possible to enable true VGA mode (narf and blanka did), but I don’t know where to begin.
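As a rough illustration of the scanning step, the Ruby sketch below decodes a buffer laid out like standard VGA text memory: 80x25 cells in row-major order, each a character byte followed by an attribute (color) byte. That the patched NTLDR writes exactly this layout into the substitute buffer is an assumption, as are the helper names `text_rows` and `dirty_rows`; the dirty-row check is an optional optimization, not something described in the post.

```ruby
COLS = 80
ROWS = 25

# Decode a text buffer laid out like VGA text memory: row-major cells, each
# a character byte followed by an attribute (color) byte. Codes outside
# printable ASCII are shown as spaces instead of code page 437 glyphs.
def text_rows(buffer, cols: COLS, rows: ROWS)
  bytes = buffer.bytes
  Array.new(rows) do |r|
    cells = bytes[r * cols * 2, cols * 2] || []
    cells.each_slice(2).map { |ch, _attr| (0x20..0x7E).cover?(ch) ? ch.chr : " " }.join.rstrip
  end
end

# Indices of rows that differ from the previous scan, so only those need
# repainting (hypothetical optimization, not necessarily what the original did).
def dirty_rows(previous, current)
  current.each_index.reject { |i| previous[i] == current[i] }
end

buf = ("\0\x07" * (COLS * ROWS)).b       # blank screen, light grey on black
"NTLDR".each_char.with_index { |c, i| buf.setbyte(i * 2, c.ord) }
rows = text_rows(buf)
p rows[0]                                # => "NTLDR"
p dirty_rows(Array.new(ROWS, ""), rows)  # => [0]
```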
All in all I’m pretty happy with the code I wrote, even though I didn’t win the contest. I’m looking forward to the next big challenge and an opportunity to use some of the techniques I learned on this project.
Getting past ptrace()
The holidays have given me a chance to relax and geek around with Mac OS X, and I’ve finally gotten around to installing the Developer Tools package, which includes the GNU C compiler (gcc) and the GNU debugger (gdb). Over the years I’ve gotten pretty comfortable using gdb to troubleshoot programs on SPARC and Intel platforms. Debugging requires some knowledge of assembly language. I had learned x86 assembly back in the day, coding fun little graphics toys for DOS, and had picked up some SPARC assembly while trying to, uhm, correct an annoying license issue in a piece of commercial software.
During the holiday break I figured I’d learn some PowerPC (PPC) assembly while I still had the chance, given Apple’s decision to move to x86 early next year. Debugging simple programs isn’t much fun, though, so I figured I’d start poking around with a big application. Every time I fired up the app under the debugger, however, it exited immediately with a strange error code:
% gdb /Applications/blah.app/Contents/MacOS/blah
(gdb) run
Program exited with code 055.
Bummer. I remembered the same problem happening with that commercial Solaris app years before, but I hadn’t paid much attention to it back then, because I could work around it by attaching to the program after it was already up and running. Apple seems to be a bit smarter than that, though: whenever I attached to a running copy of the application, gdb segfaulted:
(gdb) attach 17813
Attaching to program: `/Applications/blah.app/Contents/MacOS/blah', process 17813.
Segmentation fault
Since I wanted to know why the application was exiting, I figured I’d step through it one instruction at a time until I found the culprit.
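One detail of the exit message is easy to misread: gdb prints an inferior’s exit status in octal (hence the leading zero), so "exited with code 055" means decimal 45, not 55. A small Ruby helper (the name `gdb_exit_status` is made up for this sketch) that pulls the number out of such a line and converts it:

```ruby
# gdb prints an inferior's exit status in octal (hence the leading zero).
# Pull the number out of a gdb message and convert it to decimal.
def gdb_exit_status(line)
  digits = line[/exited with code (\d+)/, 1]
  digits && Integer(digits, 8)
end

status = gdb_exit_status("Program exited with code 055.")
puts status                     # prints 45
puts format("%#o", status)      # prints 055
```

The decimal value is the one to use when looking the code up in system headers or documentation.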
No trolls under the root bridge
A Proposed Extension to the IEEE 802.1D Spanning Tree Protocol
Network administrators who maintain complicated Layer 2 networks should be very familiar with the operation of the Spanning Tree Protocol. One common point of confusion among administrators is the use of (essentially) arbitrary numbers to identify the current root bridge. While the root bridge identifier is deterministic, composed of the bridge priority and MAC address, it is not information that network administrators can use without consulting a reference that maps MAC addresses to human-friendly names.
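To make the point concrete, here is a Ruby sketch of the 8-octet bridge identifier defined in 802.1D (a 2-octet priority followed by the bridge’s 6-octet MAC address) and of root selection, where the numerically lowest identifier wins. The helper names and MAC addresses are invented for illustration.

```ruby
# An 802.1D bridge identifier is 8 octets: a 2-octet priority followed by
# the bridge's 6-octet MAC address. The root is the lowest identifier.
def bridge_id(priority, mac)
  octets = mac.split(":").map { |h| Integer(h, 16) }
  raise ArgumentError, "MAC address needs 6 octets" unless octets.size == 6
  [priority, *octets].pack("nC6")  # network (big-endian) byte order
end

def format_bridge_id(id)
  priority, *mac = id.unpack("nC6")
  format("%d/%s", priority, mac.map { |o| format("%02x", o) }.join(":"))
end

def root_bridge(ids)
  ids.min_by { |id| id.unpack1("Q>") }  # compare as one 64-bit number
end

# Hypothetical switches: the IDs alone do not say which device is which.
ids = [
  bridge_id(32768, "00:10:7b:3a:1c:00"),
  bridge_id(32768, "00:0c:85:11:22:33"),
  bridge_id(4096,  "00:e0:1e:aa:bb:cc"),
]
puts format_bridge_id(root_bridge(ids))  # prints 4096/00:e0:1e:aa:bb:cc
```

The election is fully deterministic, yet the winner is reported only as a priority and a MAC address, which is exactly the gap the proposal targets.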
Bridges and switches are often given hostnames for administrative purposes. The extension to the Spanning Tree Protocol described below adds this user-friendly hostname information to the BPDU. Equipment vendors could then use the hostname for diagnostic purposes, e.g., in Cisco’s “show spantree” command.
The extensions described below are relative to ANSI/IEEE Std 802.1D, 1998 Edition. I believe these changes will not affect existing implementations of Spanning Tree Protocol, since the standard indicates that “any octets beyond Octet 35 are ignored”, and that “the Protocol Version Identifier is not checked on receipt, in order to allow the possibility of future specification of extensions to the Spanning Tree Protocol”. The extensions below would be detected by a compliant bridge implementation by checking the Protocol Version Identifier field of the BPDU.
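The receive side of that compatibility argument can be sketched in Ruby: a bridge that understands the extension reads octets 36-51 only when the Protocol Version Identifier in octet 3 is 1 and the BPDU Type in octet 4 is 0 (a Configuration BPDU). The function name `bridge_name_from` is hypothetical, and the sample BPDUs zero out octets 5-35 rather than carrying real flags, identifiers, and timers.

```ruby
CONFIG_OCTETS = 35  # octets 1-35 of an 802.1D Configuration BPDU
NAME_OCTETS   = 16  # proposed octets 36-51

# Receive side of the proposal: use octets 36-51 only when octet 3 (Protocol
# Version Identifier) is 1 and octet 4 (BPDU Type) is 0, a Configuration
# BPDU. Otherwise return nil; octets 1-35 are processed as before (not shown).
def bridge_name_from(bpdu)
  bytes = bpdu.b
  return nil if bytes.bytesize < CONFIG_OCTETS + NAME_OCTETS
  return nil unless bytes.getbyte(2) == 1 && bytes.getbyte(3) == 0
  bytes.byteslice(CONFIG_OCTETS, NAME_OCTETS).delete("\0")
end

header = ->(version) { [0x0000, version, 0x00].pack("nCC") }  # octets 1-4
fields = "\0".b * 31                  # octets 5-35, zeroed for this demo
v1 = header.(1) + fields + ["core-sw1"].pack("a16")
v0 = header.(0) + fields
p bridge_name_from(v1)  # => "core-sw1"
p bridge_name_from(v0)  # => nil
```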
This proposal was sent to the IEEE 802.1 committee on August 22, 2001. The initial response has been somewhat discouraging. The standards committee does not accept proposals from non-members; however, becoming a member requires attending periodic meetings in Washington, DC. Without the means to attend these meetings, it is unlikely that this proposal will ever be considered. Perhaps an existing member of the committee will volunteer to represent this proposal in my stead…
Proposal
9.2.X Encoding of Bridge Names
[New section between "9.2.5 Encoding of Bridge Identifiers" and "9.2.6 Encoding of Root Path Cost"]
A Bridge Name shall be encoded as sixteen octets, taken to represent a string of printable ASCII characters (octal 040 through 0176 [decimal 32 through 126]). The string shall contain the first sixteen characters of the bridge's hostname, if configured, or any other string that might uniquely identify the bridge. This parameter is intended for human consumption only. The most significant octet is the first character of the string. If the intended ASCII string is fewer than sixteen characters long, the unused octets in the BPDU shall be set to ASCII NUL (octal 0). If the intended string is longer than sixteen characters, any additional characters shall be truncated so that only the first sixteen characters appear in the BPDU.
9.3.1 Configuration BPDUs
Figure 9-1 will have the following data structure appended to the diagram:
+-------------+
|             | 36
|             | 37
|             | 38
|             | 39
|             | 40
|             | 41
|             | 42
|   Bridge    | 43
|    Name     | 44
|             | 45
|             | 46
|             | 47
|             | 48
|             | 49
|             | 50
|             | 51
+-------------+
b) The Protocol Version Identifier is encoded in Octet 3 of the BPDU. It takes the value 0000 0001.
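The encoding rules above fit in a few lines of Ruby, assuming a 35-octet Configuration BPDU as input. `Array#pack` with the `a16` directive already truncates to sixteen octets and pads with NUL, which matches the truncation and padding rules. The helper names are hypothetical, and the sketch accepts only octal 040-0176, since DEL (0177) is a control character rather than a printable one.

```ruby
NAME_OCTETS = 16
PRINTABLE   = 0x20..0x7E  # octal 040-0176; DEL (0177) is a control character

# 9.2.X: encode a bridge name as exactly sixteen octets, first character in
# the first octet, truncated if too long and NUL-padded if too short.
def encode_bridge_name(hostname)
  name = hostname.b.byteslice(0, NAME_OCTETS)
  unless name.bytes.all? { |b| PRINTABLE.cover?(b) }
    raise ArgumentError, "bridge name must be printable ASCII"
  end
  [name].pack("a#{NAME_OCTETS}")  # "a" pads with NUL (octal 0)
end

# Append the name to a 35-octet Configuration BPDU and set the Protocol
# Version Identifier (octet 3) to 0000 0001.
def add_bridge_name(config_bpdu, hostname)
  bpdu = config_bpdu.b  # String#b returns a copy, so the caller's BPDU is untouched
  raise ArgumentError, "expected 35 octets" unless bpdu.bytesize == 35
  bpdu.setbyte(2, 1)
  bpdu + encode_bridge_name(hostname)
end

bpdu = add_bridge_name("\0".b * 35, "distribution-switch-01")
p bpdu.bytesize            # => 51
p bpdu.getbyte(2)          # => 1
p bpdu.byteslice(35, 16)   # => "distribution-swi"
```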
64 Penguins

That was the goal, at least.
The story starts with my first attempt to install Linux on SPARC hardware. I had purchased an old, retired Sun Sparc 20 workstation so I could tinker with Solaris at home. With 512 MB RAM and a quad-Ethernet card, it put my POS PC to shame. But I didn’t know it had two CPUs until I finally got bootp, tftp, and NFS working and was able to net-boot a Linux kernel on it. Twin penguins graced my 20″ Sun monitor (sounds small in the LCD era, but for a CRT this thing was massive!) — one for each of the CPUs.
Years later, I was working with a company that used primarily Sun hardware, and I had the opportunity to install some four-way E450s for a new project. When they arrived, we still had a few weeks before we had to turn the machines over to the developers, so I decided to experiment again with SPARC Linux. The installation this time around was much easier, and, as expected, four handsome penguins saluted me at boot!
About this time, the Sun Enterprise 10000 (or e10k) was considered a pretty beefy server: a fully-populated machine with 16 system boards sported 64 CPUs, 64 Ethernet interfaces, and 64 GB RAM — why anybody needed 64 Ethernet interfaces is beyond me. Against our advice, our development organization purchased two e10ks for use as Java application servers. So, we carved out a chunk of space in the datacenter and Sun rolled these twin behemoths in.
I was helping the developers figure out how to carve the machine up into 4-CPU chunks (called domains), so I had a good idea of the project timelines. When I learned the machines would sit idle for a few weeks before they were needed, I realized this was the opportunity of a lifetime — a chance to build and boot my own army of 64 penguins!
Carl Raffa, one of the brightest and most interesting people I’ve ever had the pleasure of working with, was eager to help. I don’t remember now how long it took to get Linux booting on the e10k. We first tried booting a fully-populated machine with 16 domains, but when that failed, we decided to try it with Solaris and discovered some hardware problems. After swapping some system boards around, we were able to boot Solaris.
Then it was back to trying to boot Linux. I no longer remember all of the problems (most were simple 64-bit or endianness issues) or how long it took before we could consistently boot a single-domain e10k. But once we got that working, we progressively configured the machine larger and larger until we approached a fully populated machine. When we got to 48 processors, CNET picked up the story. I don’t know for certain, but I suspect that at that time, our e10k represented the largest (and most expensive) Linux machine ever booted.
We can’t take any credit for the low-level device drivers that made any of this work possible. Most of that support was implemented by people who didn’t have access to the big iron, so our involvement was really as tinkerers and testers more than innovators. That said, it was still a fun and challenging project.
Sadly, we were never able to test with a fully-populated e10k. While we were at lunch for a coworker’s retirement party the Friday before we had to turn the machines over to the developers, Sun Professional Services reconfigured the machine and blew away our playground.
As fate would have it, the e10k doesn’t have a framebuffer — its only console output is emulated via the service processor, a separate microcontroller in the e10k chassis. So even though I was never able to see my army of 48 penguins (much less 64), I know they were there, and that will have to do.