<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>~clay</title>
	<atom:link href="http://daemons.net/~clay/feed/" rel="self" type="application/rss+xml" />
	<link>http://daemons.net/~clay</link>
	<description>merely my musings</description>
	<lastBuildDate>Wed, 10 Feb 2010 10:04:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Jumping into the Atlanta start-up scene</title>
		<link>http://daemons.net/~clay/2010/02/09/jumping-into-the-atlanta-start-up-scene/</link>
		<comments>http://daemons.net/~clay/2010/02/09/jumping-into-the-atlanta-start-up-scene/#comments</comments>
		<pubDate>Wed, 10 Feb 2010 06:47:29 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[startups]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=348</guid>
		<description><![CDATA[There&#8217;s no better way to meet Atlanta entrepreneurs than to become one, and there&#8217;s no better place to do that than at Atlanta Start-Up Weekend. If you haven&#8217;t heard of that, let me break it down for you: 150 people pitch 50 ideas and then start 15 companies in 3 days. Now how long those [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://twitter.com/gomodo"><img alt="Gomodo say: GO MORE, DO MORE" src="http://a3.twimg.com/profile_images/528095979/Screen_shot_2009-11-15_at_12.00.07_AM.png" title="Gomodo" class="alignleft" width="200" height="216" /></a>There&#8217;s no better way to meet Atlanta entrepreneurs than to become one, and there&#8217;s no better place to do that than at <a href="http://atlanta.startupweekend.org/">Atlanta Start-Up Weekend</a>. If you haven&#8217;t heard of that, let me break it down for you: 150 people pitch 50 ideas and then start 15 companies in 3 days. Now how long those companies last is anybody&#8217;s guess, but where else can you take a product from concept to launch in only three days? It&#8217;s exciting stuff, but it&#8217;s also a great way to meet other like-minded folk, and with that goal in mind I signed up for this year&#8217;s Start-Up Weekend.</p>
<p>First rule of Start-Up Weekend: it&#8217;s not a conference! It&#8217;s not like you go in and get lectured by CEOs older and richer than you. Start-up Weekend is hands-on from day one, and rightly so, because you only have a couple hours to pitch and vote on ideas before breaking into teams and building products and companies. In that way, Start-Up Weekend is *nothing* like <a href="http://startupschool.org/">Start-Up School</a>—both were time well spent, but vastly different.</p>
<p>Here&#8217;s a very stream-of-consciousness play-by-play of how the weekend went for me:</p>
<h2>Day One: The Pitch</h2>
<p>Friday around 6:00 people started rolling in, and by 7:00 the room was at capacity and tropically hot. The first round of pitches lasted maybe an hour. One by one people stood and presented their ideas. Most of them rambled but there were some great presenters in the crowd. We raised our hands in support of each idea as it was presented. I pitched a crowd-sourced translations service for web apps; didn&#8217;t get many votes, which was sort of a bummer because I could really use that tool for my current project, <a href="http://tennismatch.com">TennisMatch</a>.</p>
<p>After all ideas had been pitched, those with the most votes were selected for a second round of discussion with some time allotted for Q&#038;A. Since my idea didn&#8217;t make it to the second round, I listened for a project that might have similarities with some other concepts I had been mulling over. My ears perked up when <a href="http://twitter.com/qthrul">Jay Cuthrell</a> pitched an idea for a location-aware mobile app that would show you what&#8217;s going on near you right now. Of all the product ideas I heard that night, Jay&#8217;s idea (originally called SPACE or PAGE or AGAPE or something similarly acronymious) had the right combination of real-time, location-aware, mobile, and fun that I was looking for.</p>
<p>Also in the second round, <a href="http://twitter.com/MikeSchinkel">Mike Schinkel</a> pitched an idea for an event aggregation service, something like an Eventful–MeetUp hybrid. His idea sounded similar enough to Jay&#8217;s that the crowd suggested the ideas be merged into a super event aggregation and discovery tool. So, we went with it.</p>
<p>With the pitching complete, teams started forming. I joined Jay and Mike; my role on the team was technical, as I had a couple of years of hands-on experience building web apps in Django. We were also joined by a graphic designer, a lawyer, a copywriter, a tester, a marketer, and some serial entrepreneurs that I&#8217;ll neglect to name here in the interest of time, with the exception of <a href="http://twitter.com/jdawkinsatl">Justin Dawkins</a> (no relation to Richard, by the way), whom I will mention because he&#8217;s awesome <img src='http://daemons.net/~clay/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
<p>Our team convened in an ATDC conference room and began brainstorming about how the product should work. Justin and I seemed to have the same vision for a web app with a very clean UI that simply showed you what was happening near you right now, with kayak.com–style filters. Mike was less interested in the &#8220;now&#8221; component and more interested in the ability to discover MeetUp–style planned events, so we explored a concept where the central UI metaphor consisted of two prominent tabs: &#8220;Now&#8221; and &#8220;Later&#8221;. Not arriving at any consensus that evening, we decided to sleep on it and come back in the morning with ideas and name suggestions.</p>
<h2>Day Two: The Rush</h2>
<p>Still no consensus. After a few hours of trying to design a product that appealed to both Mike and Jay, we decided to rethink the merger that had brought the two ideas together, and instead approach it from the perspective of Mike&#8217;s concept as an API upon which Jay&#8217;s idea could be built. Once we recast the problem like that, it was simply a matter of splitting into teams and building two pretty straight-forward products. Justin, Jay, and I branched off and started working on the mobile, location-aware, real-time event discovery tool, while Mike and the rest of the team started designing the ultimate event aggregation service, complete with handy API.</p>
<p>We decided an iPhone app was too ambitious for a weekend project, but that a mobile-friendly web site was doable. We were behind schedule, having only begun to design our product after lunch on Saturday. Fortunately, Jay is handy with Linux and got our Linode server instance up and running lickety-split, complete with Django and MySQL. And I was able to reuse the Django deployment process and workflow that I had designed for <a href="http://tennismatch.com">TennisMatch</a> (more details to come in a future post). So we were up and running with the necessary infrastructure by that afternoon.</p>
<p>I&#8217;m not usually very creative when it comes to naming things, but inspiration hit as I was playing with combinations of the words &#8220;go&#8221; and &#8220;mobile&#8221;: Gomodo! It&#8217;s short, rolls off the tongue, sounds interesting, is unique and brandable, and conjures a logo idea featuring a Komodo dragon. So we went with it. Jay doodled a dragon and our mascot was born—fortunately, he chose an obnoxious shade of orange to replace the dragon&#8217;s original obnoxious shade of brown, which had something of a poop patina.</p>
<p>Gomodo.com was taken, but not in use—have I mentioned how much I hate domain squatters?—so we opted for something fun and short: gomodo.me. We figured that if we were wildly successful and made it into the vernacular as a verb (a la &#8220;just google it&#8221;), that it&#8217;d be sort of catchy to say &#8220;gomodo me!&#8221;</p>
<p>In technical terms, the app is pretty simple: we have two models, Event and Venue, populated by an ingestor daemon that fetches event data from event aggregation sites like Mike&#8217;s <a href="http://eventtank.com/">EventTank</a> and <a href="http://eventful.com">Eventful</a>, and searched by a simple web front-end that figures out where you are using GeoIP or Javascript location services. I worked on the app all night, more out of excitement than necessity, and had it mostly working by 6:00 AM.</p>
<h2>Day Three: The Pitch, Part Deux</h2>
<p>The first real test of Gomodo&#8217;s utility was when I pulled out my iPhone early Sunday morning, and it told me about a running group meeting for breakfast and a run at Piedmont Park at 7:00 AM, so it successfully got the time and location and showed me something relevant! Much as a director must feel when he attends his film&#8217;s premiere, there&#8217;s an inescapable feeling of pride and accomplishment when you&#8217;re first able to actually use the software that you&#8217;ve written. If for nothing else, Start-Up Weekend was worth giving up my weekend for that reason alone.</p>
<p>The rest of the day we spent polishing details and working on the UI—and on our presentation. Justin took the lead on UI design and did a bang-up job making something simple and easy with the limited time and tools at our disposal. Jay took the lead on our presentation, and did a fantastic job of keeping it short and sweet.</p>
<p>Teams were feverishly racing to finish their products before presentations, and since ours was mostly done I took some time to stroll around and see what everyone was working on. Some folks had gotten wind of the fact that I worked at Twitter and that I knew Django, so I got to help a few teams with Twitter integration, Django arcana, and the vagaries of DNS.</p>
<p>As evening rolled around, all the teams reconvened and we launched into presentations. The energy in the room was electric, and it was really fun seeing what everyone had come up with. Almost all of the teams had working demos, and many had minimum viable products with which they could start attracting real customers. At least half of the companies had changed names, and a few had changed product ideas altogether.</p>
<p>Jay presented Gomodo; you can <a href="http://www.slideshare.net/qthrul/gomodo-helps-you-quickly-find-events-near-you">check out the slide deck here</a>. I was pretty nervous because the presentation included a live demo, and not just Jay demonstrating the app on his phone. No, he gave everyone the URL and let them try it themselves, so we had about 50 visits to the site during the presentation. It was super exciting watching the logs and seeing the app serve up events just like it was supposed to!</p>
<h2>The Aftermath</h2>
<p>Something like 15 companies and alpha-quality products came out of Start-Up Weekend 3. Many of them are still alive. Gomodo hummed along with occasional care-and-feeding for a few months. Sadly, our main source of event data (<a href="http://eventful.com">Eventful</a>) seems to have wised up and started prohibiting full event dumps, so Gomodo doesn&#8217;t return useful results any more. Neither Jay nor Justin nor I have had the time to invest in looking for other sources of event data because we&#8217;re all busy with our own start-ups and consulting gigs. I still think Gomodo is a great proof-of-concept for a real-time, location-aware, mobile event discovery app, and I&#8217;d love to pick it back up once things settle down with <a href="http://tennismatch.com">TennisMatch</a>.</p>
<p>Thanks to <a href="http://twitter.com/lance">Lance Weatherby</a> and <a href="http://twitter.com/atdc">ATDC</a> for organizing and hosting such a valuable event. It&#8217;s great to know that the Atlanta start-up scene is alive and well. With the connections I&#8217;ve made over the weekend, I have no doubt that Atlanta is the best place to bring <a href="http://tennismatch.com">TennisMatch</a> to life.</p>
<p>[Ed note: this post is rather late. But hey, I'm starting a company, so the blog's pretty low priority.]</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2010/02/09/jumping-into-the-atlanta-start-up-scene/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The various flavors of Ruby class attributes</title>
		<link>http://daemons.net/~clay/2009/05/16/the-various-flavors-of-ruby-class-attributes/</link>
		<comments>http://daemons.net/~clay/2009/05/16/the-various-flavors-of-ruby-class-attributes/#comments</comments>
		<pubDate>Sun, 17 May 2009 04:57:38 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Geek]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=292</guid>
		<description><![CDATA[Here&#8217;s a curious thing about Ruby: it&#8217;s got three flavors of class attributes. You can adorn your classes with class variables, class instance variables, and class constants. Not knowing the differences between them, and thinking that one of them might be useful for a project I was working on, I set out to figure out [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a curious thing about Ruby: it&#8217;s got three flavors of class attributes. You can adorn your classes with class variables, class instance variables, and class constants. Not knowing the differences between them, and thinking that one of them might be useful for a project I was working on, I set out to figure out how they all worked, especially with respect to inheritance.</p>
<p>As a trivial example of the design scenario I was working with, consider the case of an object-oriented vegetable garden. Vegetables come in all shapes, sizes, and colors, but we might want to say that all vegetables should be green unless we&#8217;ve said otherwise. We might start modeling our vegetable garden with a <code>Vegetable</code> class, and we could set a <code>color</code> attribute on it with a default value of <code>"green"</code>. <code>Lettuce</code>, which happens to be green, could inherit that attribute from <code>Vegetable</code>. <code>Eggplant</code>, however, should redefine <code>color</code> to be <code>"purple"</code>. </p>
<p>While certainly a contrived and flawed example, it demonstrates the behavior I was looking for. Let&#8217;s see how Ruby&#8217;s various flavors of class attributes can help us solve this design problem &#8212; or not.</p>
<p>First up, class variables:</p>
<pre class="brush: ruby;">
class Vegetable
  @@color = 'green'
  def color
    @@color
  end
end

class Eggplant &lt; Vegetable
  @@color = 'purple'
end

Vegetable.new.color  # =&gt; &quot;purple&quot;
Eggplant.new.color   # =&gt; &quot;purple&quot;
</pre>
<p>I wasn&#8217;t expecting that! Apparently class variables are shared among subclasses, so you can&#8217;t redefine their value in subclasses without changing the value in the base class.</p>
<p>Next up, class instance variables:</p>
<pre class="brush: ruby;">
class Vegetable
  @color = 'green'
  class &lt;&lt; self
    attr_reader :color
  end
  def color
    self.class.color
  end
end

class Lettuce &lt; Vegetable
  # no need to set @color here, since lettuce is green ... right?
end

Vegetable.new.color  # =&gt; &quot;green&quot;
Lettuce.new.color    # =&gt; nil
</pre>
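<p>No love here, either: class instance variables are not inherited by subclasses at all. Each class object has its own set, so <code>Lettuce.color</code> reads <code>Lettuce</code>&#8217;s own (unset) <code>@color</code> and returns <code>nil</code>. Probably for the better, since the code needed to access class instance variables from instances is even uglier than that needed to access class variables from instances.</p>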
<p>Class constants are right out:</p>
<pre class="brush: ruby;">
class Vegetable
  Color = 'green'
  def color
    Color
  end
end

class Eggplant &lt; Vegetable
  Color = 'purple'
end

Vegetable.new.color  # =&gt; &quot;green&quot;
Eggplant.new.color   # =&gt; &quot;green&quot;
</pre>
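<p>Constant references are resolved lexically, based on where the referring code is written, so <code>Vegetable#color</code> always finds the <code>Color</code> constant defined in <code>Vegetable</code>, even when it is called on an <code>Eggplant</code> instance. The <code>Color</code> defined in <code>Eggplant</code> is a separate constant that the inherited method never consults.</p>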
<p>Giving up on the class attributes approach, I resorted to defining the attributes at the instance level. I considered explicitly setting a <code>@color</code> instance variable in the class <code>initialize</code> method, but then the attribute wouldn&#8217;t be constant. Instead, the simplest implementation that does what I want seems to be to use methods that return constant values:</p>
<pre class="brush: ruby;">
class Vegetable
  def color
    'green'
  end
end

class Lettuce &lt; Vegetable
end

class Eggplant &lt; Vegetable
  def color
    'purple'
  end
end

Vegetable.new.color  # =&gt; &quot;green&quot;
Lettuce.new.color    # =&gt; &quot;green&quot;
Eggplant.new.color   # =&gt; &quot;purple&quot;
</pre>
<p>So as it turns out, each of Ruby&#8217;s class attribute mechanisms behaves differently in subclasses. I&#8217;m sure class variables, class instance variables, and class constants have their utility, but none of them is useful for defining a constant attribute that is shared by all instances of a class and can still be redefined in subclasses.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2009/05/16/the-various-flavors-of-ruby-class-attributes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Simulating synchronous programming with Python generators</title>
		<link>http://daemons.net/~clay/2009/05/15/simulating-synchronous-programming-with-python-generators/</link>
		<comments>http://daemons.net/~clay/2009/05/15/simulating-synchronous-programming-with-python-generators/#comments</comments>
		<pubDate>Sat, 16 May 2009 06:41:06 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Geek]]></category>
		<category><![CDATA[Systems Management]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[twisted]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=287</guid>
		<description><![CDATA[Robey&#8217;s recent article on naggati reminded me of something I&#8217;d been idly pondering for a while. Having recently written an SSH-based host discovery scanner on top of the Twisted asynchronous programming library, I too yearned for a way to write sequences of commands in plain-old imperative code, hiding the callback complexities of event-driven code from [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://robey.lag.net/">Robey</a>&#8217;s recent article on <a href="http://robey.lag.net/2009/03/02/actors-mina-and-naggati.html">naggati</a> reminded me of something I&#8217;d been idly pondering for a while. Having recently written an SSH-based host discovery scanner on top of the <a href="http://twistedmatrix.com/">Twisted</a> asynchronous programming library, I too yearned for a way to write sequences of commands in plain-old imperative code, hiding the callback complexities of event-driven code from users.</p>
<p><a href="http://en.wikipedia.org/wiki/Continuation">Continuations</a> fit the bill nicely. These are functions from which you can return multiple times, resuming right where you left off. With continuations, you could write a sequence of functions that might make asynchronous calls, but the framework would call your continuation back where it left off.</p>
<p>Python does not have first-class continuations, but it does have <a href="http://en.wikipedia.org/wiki/Generator_(computer_science)">generators</a>, and these behave almost identically (for my purposes, at least). A generator is a function that can yield multiple values. Well, actually, it returns an iterator, which then can be used to fetch multiple values from the generator. An example will probably make it clear:</p>
<pre class="brush: python;">
&gt;&gt;&gt; def finite_generator():
...     yield 'apple'
...     yield 'orange'
...     yield 'pear'
...
&gt;&gt;&gt; iterator = finite_generator()
&gt;&gt;&gt; for fruit in iterator:
...     print fruit
...
apple
orange
pear
</pre>
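<p>To make the pausing explicit, the same generator can also be stepped by hand. This is only an illustrative sketch; it uses the built-in <code>next()</code> function, which is available in Python 2.6 and later as well as Python 3, and the variable names are arbitrary:</p>

```python
def finite_generator():
    yield 'apple'
    yield 'orange'
    yield 'pear'

iterator = finite_generator()

# Each call to next() runs the generator body up to the next yield and pauses there.
first = next(iterator)
second = next(iterator)
third = next(iterator)

# Once the body runs off the end, next() raises StopIteration.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

print("%s %s %s %s" % (first, second, third, exhausted))
```

<p>A <code>for</code> loop does the same thing behind the scenes, swallowing the <code>StopIteration</code> to end the loop.</p>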
<p>Generators can also run forever:</p>
<pre class="brush: python;">
&gt;&gt;&gt; def infinite_generator():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
&gt;&gt;&gt; iterator = infinite_generator()
&gt;&gt;&gt; for i in iterator:
...     print i
...
0
1
2
3
4
5
... and on and on forever
</pre>
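<p>I had been using iterators in my asynchronous host scanner whenever I needed to run asynchronous commands within a loop. The asynchronous programming model prevents you from writing something like the following, because each call returns immediately, before its response is available, so the loop body never gets to process a result before moving on to the next item:</p>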
<pre class="brush: python;">
for foo in bar:
    async_method(foo)
</pre>
<p>Instead, you would do something like this:</p>
<pre class="brush: python;">
def callback(response, iterator):
    do_something_with_response(response)
    schedule_next_task(iterator)

def schedule_next_task(iterator):
    try:
        foo = iterator.next()
        deferred = async_method(foo)
        deferred.addCallback(callback, iterator)
    except StopIteration:
        pass

iterator = iter(bar)
schedule_next_task(iterator)
</pre>
<p>It works like this:</p>
<ol>
<li>We get an iterator for our list, bar &#8212; this could just as well be a generator function</li>
<li>We fetch the first value from the iterator and pass it to the asynchronous method</li>
<li>That method presumably makes some type of I/O request, and responds immediately with a Deferred instance</li>
<li>We add a callback function to the Deferred and request that our iterator instance be passed to it when it is called</li>
<li>Control returns to the event loop, which might be busy scheduling other I/O requests</li>
<li>When the I/O completes, the event loop calls our callback function with the response and our iterator instance</li>
<li>The callback processes the response, and then returns to step 2, fetching the next item from the iterator</li>
<li>When the iterator is exhausted, the cycle stops</li>
</ol>
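<p>The steps above can be tried without Twisted. The sketch below is a simplification under stated assumptions: <code>ToyDeferred</code>, <code>pending</code>, and <code>run_event_loop</code> are hypothetical stand-ins for a real Deferred and reactor, and <code>async_method</code> just upper-cases its argument instead of doing I/O:</p>

```python
from collections import deque

class ToyDeferred(object):
    """Hypothetical stand-in for a Twisted Deferred: holds one callback."""
    def __init__(self):
        self.callback = None
        self.args = ()

    def addCallback(self, callback, *args):
        self.callback = callback
        self.args = args

pending = deque()   # toy event loop queue of (deferred, response) pairs
responses = []      # responses in the order the callbacks saw them

def async_method(foo):
    # Pretend to start an I/O request; the "response" arrives on a later loop turn.
    deferred = ToyDeferred()
    pending.append((deferred, foo.upper()))
    return deferred

def callback(response, iterator):
    responses.append(response)
    schedule_next_task(iterator)

def schedule_next_task(iterator):
    try:
        foo = next(iterator)
    except StopIteration:
        return
    deferred = async_method(foo)
    deferred.addCallback(callback, iterator)

def run_event_loop():
    # Deliver each completed "I/O" to its callback until nothing is pending.
    while pending:
        deferred, response = pending.popleft()
        deferred.callback(response, *deferred.args)

bar = ['alpha', 'beta', 'gamma']
schedule_next_task(iter(bar))
run_event_loop()
print(responses)
```

<p>Only one request is ever outstanding at a time: the next item is not fetched from the iterator until the previous response has been handled.</p>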
<p>It occurred to me that I might be able to extend this concept to use generators as a sort of continuation to emulate synchronous code. What if, instead of returning strings or numbers from a generator, you returned functions? Some wrapper code could initialize the iterator, and then loop over it using the technique above, calling each function returned from the generator.</p>
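<p>In miniature, and assuming a toy event loop rather than Twisted, the idea might look like the following. Everything here is hypothetical scaffolding: <code>fake_async</code> builds a pretend asynchronous step, <code>loop</code> is a plain queue, and the values are placeholders:</p>

```python
from collections import deque

loop = deque()   # toy event loop: a queue of zero-argument callables
context = {}     # scratch space shared between the steps

def fake_async(key, value):
    """Build a hypothetical asynchronous step that 'completes' on a later loop turn."""
    def start(on_done):
        loop.append(lambda: on_done({key: value}))
    return start

def sequence():
    # Reads like ordinary imperative code, but each asynchronous step is yielded, not called.
    yield fake_async('os', 'SunOS')
    if context['os'] == 'SunOS':
        yield fake_async('zonename', 'global')
    yield fake_async('hostid', '0ec2daa6')

def drive(iterator):
    # Start the next yielded step and arrange to be called again when it completes.
    def on_done(result):
        context.update(result)
        drive(iterator)
    try:
        step = next(iterator)
    except StopIteration:
        return
    step(on_done)

drive(sequence())
while loop:
    loop.popleft()()
print(sorted(context.items()))
```

<p>The generator is only resumed after the previous step&#8217;s result has landed in <code>context</code>, which is why the <code>if</code> test can read it as though the call had been synchronous.</p>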
<p>Tonight I decided to give this a try. Forking off an experimental branch and making a few modifications to the underlying fido host discovery routines, I crafted the following plesiochronous host scanner:</p>
<pre class="brush: python;">
#!/usr/bin/env python
#
# Use a generator to simulate synchronous execution on an asynchronous framework
#

from fido.common.command import RemoteCommandExecutor
from fido.common.host.unix import UnixHost
from fido.common.ssh import SSHCredentials

from contrib.host.software.sun.host import SolarisHost
from contrib.host.software.linux.host import LinuxHost

from twisted.internet import reactor

import pprint

class PlesiochronousHostScanner(object):
    &quot;&quot;&quot;
    Scans a host over SSH, building a list of host attributes. Built on the Twisted asynchronous
    library, but uses a Python generator function to emulate garden variety synchronous code.
    &quot;&quot;&quot;

    def __init__(self, address, credentials):
        &quot;&quot;&quot;
        address: the IP address to scan
        credentials: a hash like: { 'username': '...' , 'password': '...', 'public_key': '&lt;optional&gt;' }
        &quot;&quot;&quot;

        self.address = address
        self.credentials = credentials
        self.host = UnixHost(RemoteCommandExecutor(address, credentials))
        self.pp = pprint.PrettyPrinter()

        # create some scratch space for the discovery methods
        self.context = { }

        # get an iterator from the generator
        self.iterator = self.scanning_sequence()

    def scanning_sequence(self):
        &quot;&quot;&quot;
        A typical nugget of synchronous code, with one important exception: asynchronous
        functions must be yielded instead of being called directly.
        &quot;&quot;&quot;
        yield self.host.uname

        os = self.context['uname'].split()[0]

        if os == 'SunOS':
            self.host = SolarisHost.from_host(self.host)
            yield self.host.zonename
            yield self.host.zones
        elif os == 'Linux':
            self.host = LinuxHost.from_host(self.host)
        else:
            print &quot;Unable to scan host type: %s&quot; % os
            return

        yield self.host.hostid
        yield self.host.device
        yield self.host.bios
        yield self.host.installed_memory_in_MB
        yield self.host.interfaces

    def callback(self, response):
        self.context.update(response)
        self.schedule_next_task()

    def errback(self, error):
        print &quot;scanning error: %s&quot; % error
        # stop the reactor so a failed scan doesn't leave the script hanging
        reactor.stop()

    def schedule_next_task(self):
        try:
            function = self.iterator.next()
            deferred = function()
            deferred.addCallbacks(self.callback, self.errback)
        except StopIteration:
            self.scan_complete()

    def start_scan(self):
        self.schedule_next_task()

    def scan_complete(self):
        print &quot;Scan of %s is complete&quot; % self.address
        self.pp.pprint(self.context)

        # In this contrived example, we'll stop the reactor when we've finished scanning a host
        reactor.stop()

if __name__ == '__main__':
    import sys
    from optparse import OptionParser
    parser = OptionParser()
    parser.add_option(&quot;-u&quot;, &quot;--username&quot;, dest=&quot;username&quot;)
    parser.add_option(&quot;-p&quot;, &quot;--password&quot;, dest=&quot;password&quot;)

    (options, args) = parser.parse_args()

    address = args.pop(0)
    credentials = iter([SSHCredentials(options.username, options.password, None)])

    scanner = PlesiochronousHostScanner(address, credentials)

    reactor.callWhenRunning(scanner.start_scan)

    reactor.run()
</pre>
<p>It works:</p>
<pre class="brush: plain;">
satellite:~ clay$ python pleisio.py -u username -p password 10.20.30.40
Scan of 10.20.30.40 is complete
{'bios': {'bios_date': '11/15/2007',
          'bios_vendor': 'Sun Microsystems',
          'bios_version': 'S39_3B25'},
 'device': {'system_product': 'Sun Fire X2200 M2',
            'system_serial': '0805QAT0EA',
            'system_uuid': 'bd6529dc-fc79-0010-9e1b-001b245c1d4f',
            'system_vendor': 'Sun Microsystems',
            'system_version': 'Rev 50'},
 'hostid': '0ec2daa6',
 'installed_memory_in_MB': 32768,
 'interfaces': {'bge0': {'ipv4_addresses': [10.20.30.40],
                         'ipv6_addresses': [],
                         'mac_address': 00:1B:24:5C:18:B5,
                         'zone': None},
                'lo0': {'ipv4_addresses': [],
                        'ipv6_addresses': [],
                        'mac_address': None,
                        'zone': None}},
 'uname': 'SunOS myhost.mydomain.com 5.10 Generic_127112-11 i86pc i386 i86pc',
 'zonename': 'global',
 'zones': {'myzone': {'brand': 'native',
                          'ip_mode': 'shared',
                          'root': '/zones/myzone',
                          'state': 'running',
                          'uuid': '09fbf9ba-c0c5-408f-c9e9-820471983f25',
                          'zonename': 'myzone'}}}
</pre>
<p>The beauty of this approach is that the <code>PlesiochronousHostScanner#scanning_sequence</code> method is pretty straightforward, and could actually be written by end users familiar with Python but not familiar with asynchronous programming. It also makes discovery logic much easier to understand than in the state-machine-based asynchronous discovery engine I had previously built.</p>
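<p>One possible variation, offered as an assumption rather than part of the experiment above: a generator&#8217;s <code>send()</code> method can hand each result back as the value of the <code>yield</code> expression, so the sequence no longer needs a shared <code>context</code> dictionary. The sketch below uses a toy event loop instead of Twisted; <code>fake_command</code>, <code>drive</code>, and <code>report</code> are hypothetical names:</p>

```python
from collections import deque

loop = deque()  # toy event loop: a queue of zero-argument callables

def fake_command(output):
    """Hypothetical asynchronous command that 'completes' on a later loop turn."""
    def start(on_done):
        loop.append(lambda: on_done(output))
    return start

def scanning_sequence(report):
    # The value of each yield expression is the result of the step that was yielded.
    uname = yield fake_command('SunOS myhost 5.10')
    report['os'] = uname.split()[0]
    if report['os'] == 'SunOS':
        report['zonename'] = yield fake_command('global')
    report['hostid'] = yield fake_command('0ec2daa6')

def drive(generator, result=None):
    # send(None) starts a fresh generator; later calls resume it with the last result.
    try:
        step = generator.send(result)
    except StopIteration:
        return
    step(lambda value: drive(generator, value))

report = {}
drive(scanning_sequence(report))
while loop:
    loop.popleft()()
print(sorted(report.items()))
```

<p>The trade-off is that the driver has to use <code>send()</code> rather than plain iteration, but the sequence itself reads even more like ordinary synchronous code.</p>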
<p>Having just concocted this tonight, I&#8217;m not sure whether this is something I&#8217;ll pursue, but it has been a fun experiment. I&#8217;m curious what other asynchronous programmers think of this approach.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2009/05/15/simulating-synchronous-programming-with-python-generators/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ruby, why do you torment me?</title>
		<link>http://daemons.net/~clay/2009/05/03/ruby-why-do-you-torment-me/</link>
		<comments>http://daemons.net/~clay/2009/05/03/ruby-why-do-you-torment-me/#comments</comments>
		<pubDate>Sun, 03 May 2009 17:39:03 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Geek]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=91</guid>
		<description><![CDATA[I want to like Ruby, I really do. The language is expressive, powerful, and eminently readable. Moreover, it&#8217;s fun to write. But try as I might to be productive, I keep running into quirks and gotchas with Ruby libraries that make me wish I was using a language with a more mature standard library. Things [...]]]></description>
			<content:encoded><![CDATA[<p>I want to like Ruby, I really do. The language is expressive, powerful, and eminently readable. Moreover, it&#8217;s fun to write. But try as I might to be productive, I keep running into quirks and gotchas with Ruby libraries that make me wish I was using a language with a more mature standard library. Things that take five minutes in Perl or Python have taken me all day to get working in Ruby.</p>
<p>SOAP support, which ought to be fully baked in Ruby by now, is still somewhat painful to work with. In Perl, SOAP just works. When I wrote our release orchestration tool a year ago, it took way longer than it should have to get Ruby talking to the SOAP iControl interface on our BigIP load balancers. By contrast, it took all of five minutes to get the Perl sample working &#8212; and that includes time spent installing the <code>SOAP::Lite</code> CPAN module.</p>
<p>Using Rails for the first time in a recent project, I was immediately struck by how little work is required to get a web app off the ground. I almost felt guilty for writing so little code. But a lot of the clever Rails magic that&#8217;s supposed to make life easier, didn&#8217;t. While error messages like, &#8220;<a href="http://blog.teksol.info/2007/03/09/expected-x-to-define-y-error">Expected foo.rb to define Foo</a>&#8221; seem pretty straight-forward, they are maddening when foo.rb does indeed define Foo. For their next trick, the Rails developers ought to use their meta-programming fu to produce intelligible error messages! <img src='http://daemons.net/~clay/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>We recently ported a Rails app to JRuby, and straight away we ran into bugs. JRuby couldn&#8217;t call Java correctly, and it had a file descriptor leak in Net::SSH that caused the site crawler component of our application to go belly-up after a few hours. And we should have known better than to try talking to Oracle from JRuby on Rails. The <code>activerecord-jdbc-adapter</code> component had myriad issues &#8212; goofy things like <code>"uninitialized constant ActiveRecord::VERSION"</code>, improper column name quoting, and incorrect integer datatype coercions. Finally we gave up and ported the database to MySQL.</p>
<p>I understand that Ruby and its libraries are open-source efforts written mostly by unpaid enthusiasts, so I try not to get too upset when things don&#8217;t work correctly. I wish I had the time to jump in and submit patches to fix issues when I run into them.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2009/05/03/ruby-why-do-you-torment-me/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>setuid() ate my CSS</title>
		<link>http://daemons.net/~clay/2009/05/02/setuid-ate-my-css/</link>
		<comments>http://daemons.net/~clay/2009/05/02/setuid-ate-my-css/#comments</comments>
		<pubDate>Sat, 02 May 2009 10:15:35 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Systems Management]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[setuid]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=244</guid>
		<description><![CDATA[We ran into an interesting problem while testing a new version of our code deployment tool tonight. By all appearances, the tool was happily deploying code and launching our Java applications, but one of our QA engineers noticed missing CSS on some pages in our test environment. Could that possibly be related to the code deployment tool, which essentially just untars an archive and forks off a little ruby script to start the application?]]></description>
			<content:encoded><![CDATA[<p>We ran into an interesting problem while testing a new version of our code deployment tool tonight. By all appearances, the tool was happily deploying code and launching our Java applications, but one of our QA engineers noticed missing CSS on some pages in our test environment. Could that possibly be related to the code deployment tool, which essentially just untars an archive and forks off a little ruby script to start the application?</p>
<p>Tracing the application&#8217;s system calls with truss revealed that the process was getting EPERM errors while trying to read the CSS files, which live on NFS. One of our more clever engineers decided to start up the application manually, not via the code deployment tool, and found that the CSS loaded just fine when the Java process was invoked directly from the shell. He compared user and group ids, as reported by ps, of JVMs started by our tool and those started manually and found no differences. Hmm.</p>
<p>When looking at the processes&#8217; <code>/proc/&lt;pid&gt;/cred</code> files, however, some differences were apparent. The <code>cred</code> file contains binary data and is best viewed with <code>od</code>:</p>
<p><code><br />
$ od -X /proc/$$/cred<br />
0000000 00002716 00002716 00002716 0000000a<br />
0000020 0000000a 0000000a 00000002 0000000a<br />
0000040 0000000e<br />
0000044<br />
</code></p>
<p>The file consists of a sequence of 32-bit id values in the following order:</p>
<p>* euid<br />
* uid<br />
* suid<br />
* egid<br />
* gid<br />
* sgid<br />
* number of supplemental group ids<br />
* supplemental group ids &#8230;</p>
<p>You can see how that maps to decimal ids by comparing with <code>id</code> output:</p>
<p><code><br />
$ id -a<br />
uid=10006(clay) gid=10(staff) groups=10(staff),14(sysadmin)<br />
</code></p>
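<p>To make that mapping concrete, here is a minimal Ruby sketch that decodes the words from the <code>od</code> output above. It assumes the Solaris <code>prcred_t</code> layout (effective, real, and saved ids for user and then group, a count of supplemental groups, and finally the groups themselves). The <code>decode_cred</code> helper and its field names are made up for illustration.</p>

```ruby
# Decode the 32-bit words shown by `od -X /proc/PID/cred`.
# Assumed layout (Solaris prcred_t): euid, ruid, suid, egid, rgid, sgid,
# ngroups, then ngroups supplemental gids.
CRED_FIELDS = %i[euid ruid suid egid rgid sgid].freeze

def decode_cred(words)
  cred = CRED_FIELDS.zip(words.first(6)).to_h
  ngroups = words[6]
  cred[:groups] = words[7, ngroups]
  cred
end

# Words copied from the od output above, with the offset column dropped.
dump = %w[00002716 00002716 00002716 0000000a
          0000000a 0000000a 00000002 0000000a
          0000000e]

cred = decode_cred(dump.map { |w| w.to_i(16) })
puts cred.inspect
```

<p>On a live system the same helper could presumably be fed <code>File.binread("/proc/#{pid}/cred").unpack("L*")</code>, assuming 32-bit ids in native byte order.</p>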
<p>[Solaris geek aside: remember when you wanted to be a member of the sysadmin group so you could run the handy-dandy admintool?]</p>
<p>So what we noticed was that while the manually started JVM and the JVM launched via our code deployment tool had identical uid/euid/suid and gid/egid/sgid values, they had different supplemental group id lists. Notably, the JVM running under the code deployment tool still had a gid of 0 in its supplemental group list. Letting our Java application servers traipse around the filesystem with elevated privileges is perhaps not the best &#8220;feature&#8221; we&#8217;ve ever implemented.</p>
<p>Trust but verify might be a good foreign policy, but our NFS server wasn&#8217;t having any of it. It thoroughly distrusted the Java app servers claiming to have elevated privileges, and rewarded them with EPERMs for their trouble. Root squash is, after all, a pretty common NFS security measure.</p>
<p>As it turns out, I had implemented a new feature in the code deployment agent to make it switch user id on startup. Previously we handled the user switch by launching the tool under <code>su</code>, but that approach prevented the tool from writing its pid file to the root-owned /var/run directory. The solution, I thought, was just to call <code>setgid()</code> followed by <code>setuid()</code>. We tested that code by verifying the user and group ids with <code>ps</code>, and it seemed to work just great.</p>
<p>Quick: what&#8217;s wrong with this?</p>
<pre class="brush: ruby;">
    def HostUtils.switch_user user
      pwent = Etc::getpwnam(user)
      Process::GID::change_privilege(pwent.gid)
      Process::UID::change_privilege(pwent.uid)
    end
</pre>
<p>Maybe several things, but certainly one thing is that I&#8217;ve completely neglected supplemental group ids. I should have written:</p>
<pre class="brush: ruby;">
    def HostUtils.switch_user user
      pwent = Etc::getpwnam(user)
      Process::initgroups(user, pwent.gid)
      Process::GID::change_privilege(pwent.gid)
      Process::UID::change_privilege(pwent.uid)
    end
</pre>
<p>That call to <a href="http://www.ruby-doc.org/core/classes/Process.html#M003208">Process::initgroups</a> makes all the difference. After making the change, the apps could access NFS and our test site looked all pretty again. Good thing we caught it when we did!</p>
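<p>Checking with <code>ps</code> missed the problem because it only showed the primary ids. A rough sketch of a check that would have caught it: compare the process&#8217;s supplemental groups against the groups the target user is actually entitled to. Both helper names here are hypothetical, and the example uses the ids from this post rather than a real process.</p>

```ruby
require 'etc'

# Return the gids present in `current` that the target user should not have.
def unexpected_groups(current, allowed)
  current.uniq - allowed
end

# Collect the gids a user is entitled to: the primary gid from the passwd
# entry plus every group that lists the user as a member.
def allowed_groups(user)
  pwent = Etc.getpwnam(user)
  gids = [pwent.gid]
  Etc.group { |gr| gids.push(gr.gid) if gr.mem.include?(user) }
  gids.uniq
end

# Example with the ids from this post: gid 0 was left over from root.
p unexpected_groups([0, 10, 14], [10, 14])
```

<p>After switching users, something like <code>unexpected_groups(Process.groups, allowed_groups(user))</code> should come back empty; anything left in the list is a group inherited from the parent process.</p>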
<p>Turns out this is a fairly <a href="http://www.ruby-forum.com/topic/110492">common problem</a>, and I feel especially dumb for overlooking something so obvious. Live and learn.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2009/05/02/setuid-ate-my-css/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Engineering and Operations: Bridging the Divide</title>
		<link>http://daemons.net/~clay/2009/04/02/engineering-and-operations-bridging-the-divide/</link>
		<comments>http://daemons.net/~clay/2009/04/02/engineering-and-operations-bridging-the-divide/#comments</comments>
		<pubDate>Fri, 03 Apr 2009 03:22:56 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Operations]]></category>
		<category><![CDATA[eng]]></category>
		<category><![CDATA[ops]]></category>
		<category><![CDATA[sre]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=97</guid>
		<description><![CDATA[A recent post by the folks over at Agile Web Operations discusses some common sources of tension between engineering and operations organizations in web companies: a mutual lack of experience in each other&#8217;s domains, conflicting departmental goals, and an us–against–them mentality drawn from social identity theory. Continuing the conversation, I suggest there is a subtler but more fundamental source [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.agileweboperations.com/partitions-and-warfare/">recent post</a> by the folks over at <a href="http://www.agileweboperations.com/">Agile Web Operations</a> discusses some common sources of tension between engineering and operations organizations in web companies: a mutual lack of experience in each other&#8217;s domains, conflicting departmental goals, and an us–against–them mentality drawn from social identity theory. Continuing the conversation, I suggest there is a subtler but more fundamental source of tension between engineers and operators that has to do with their different mindsets: developers think in terms of <em>possibilities</em>, while administrators think in terms of <em>realities</em>.</p>
<p>Developers tend to downplay—perhaps unconsciously—the significance of bugs because they understand how to fix them: just make a one-line change over here and tweak a unit test over there and we&#8217;re done. If she has a good idea how to fix a bug, a developer may file it away in the &#8220;solved&#8221; folder in her brain before she&#8217;s actually implemented the fix. I&#8217;m not saying developers aren&#8217;t concerned with quality—they are—or that they don&#8217;t fix bugs—they do. But how many times have you spotted a bug and dutifully reported it only to have the developer reassuringly tell you that, &#8220;yes, it&#8217;s a known issue, we&#8217;ll fix it sooner or later—probably later&#8221;?</p>
<p>Systems administrators, on the other hand, face the stark binary reality that the software either works or it doesn&#8217;t. It survives unanticipated load or it doesn&#8217;t. The pager goes off or it doesn&#8217;t. No amount of reassurance that the bug can be fixed easily will appease an administrator—if it&#8217;s broken, it&#8217;s broken. And during the first few iterations of a new product, frequently the software is, in fact, broken. Over time, administrators become conditioned to believe the software will always be broken. It is not uncommon for administrators to express concern about bugs that were known to bring the site down in months past as if they might strike again the next time they are on-call, despite having been fixed months ago.</p>
<p>I point to the difference in mindsets not to disparage one group or the other—I wore a sysadmin hat long before I wore my developer hat—but to expose a fundamental flaw with organizational structures that divide all site development and maintenance functions into just these two separate–but–equal groups. Despite the benefits afforded by the separation of responsibility that you get with distinct engineering and operations groups, such a structure breeds an inefficiency that can threaten a company&#8217;s ability to scale.</p>
<p>How well does your operations team understand your software components and how they interact? How well does your engineering team understand how your systems are built, or how they&#8217;re connected? When engineering and operations don&#8217;t understand each other&#8217;s domains, the result is a release process that is at best inefficient, and at worst dangerously fragile.</p>
<p>For example, even though engineering may write detailed release notes describing new features, systems administrators often don&#8217;t speak the same language—release notes are practically useless to operations. As a result, valuable time is wasted translating release notes into a language that operations understands: listings of the commands needed to deploy the software. Conversely, developers may not understand infrastructure dependencies (operating system versions, libraries, NFS mount points, firewall rules), leading to confusion (and possibly outages) when code is deployed to machines where it has no chance of working.</p>
<p>In shops that split all work on the production site between the false dichotomy of engineering and operations roles, most software releases will require the two teams to work closely together, and so releases become a significant source of tension between the groups. If your systems administrators cringe whenever a release is coming up, you know you&#8217;ve got a problem. Releasing software is how your company grows, both by adding new features and by fixing bugs in the existing features. Yet if the administrators had it their way, there&#8217;d be no releases.</p>
<p>Just about the time I had started thinking that what is needed is a third team responsible solely for releases and other aspects of the production site, a friend and colleague forwarded along a <a href="http://research.google.com/archive/LinuxWorld-07-describeSRE.pdf">slide deck describing Google&#8217;s Site Reliability Engineering</a> organization. This team is responsible for one thing: the production web site. Engineering is free to develop features and operations is free to think strategically about systems, storage, and network. What makes the SRE team so interesting is that it is staffed with (junior) engineers, so it&#8217;s got an engineering mindset, but at the same time it&#8217;s charged with an operations objective: keeping the web site up.</p>
<p>Using Google&#8217;s Site Reliability Engineering concept to frame my own thoughts, I tend to think of SRE as an internal customer of both the engineering and operations teams. SRE expects engineering to deliver working software, and they will file and track bugs when that is not the case. SRE should also make an effort to <em>fix</em> the bugs they have filed—something not possible when operations files all the bugs against production. Conversely, SRE expects operations to deliver the server, storage, and network infrastructure required to meet the demands of the production site. SRE leads capacity planning efforts, placing orders with operations for server, storage, and network expansion. SRE also constantly monitors the production site and is responsible for installing and configuring the monitoring software.</p>
<p>With the addition of an SRE team, the division of responsibilities starts to look like this:</p>
<ul>
<li>Operations delivers infrastructure</li>
<li>Engineering delivers features</li>
<li>Site Reliability delivers uptime</li>
</ul>
<p>Despite the title, SRE should not report into the engineering organization. Rather, it should be its own, first-class, top-level organization, complete with executive representation at the VP level. I know what you&#8217;re saying: how much is it going to cost to staff yet another organization? Not as much as you think. Since SRE will off–load releases from operations, it may be possible to scale back the operations team. And since SRE removes the inefficiencies involved in translating release notes to deployment plans, engineers will have more time to work on features.</p>
<p>Operations managers may balk at the idea of scaling back their teams, arguing that they&#8217;re already so busy that they can&#8217;t complete all the work on their plates with the team they have. But look at what is consuming most of the time. It&#8217;s probably deployments, especially if they occur anywhere near the <a href="http://en.oreilly.com/velocity2009/public/schedule/detail/7641">frequency of deployments at Flickr</a>. Operations teams are also burdened with production incident response, a responsibility that rightly belongs in the SRE organization. By handing both releases and first–response duties off to SRE, the operations team workload will fall and the team can be restructured, eliminating some middle–tier systems administrator positions while retaining mostly the strategic thinkers (operations architects) and data center support engineers.</p>
<p>If you&#8217;ve been thinking &#8220;AUTOMATION!&#8221; while reading this, I hear you. I wholeheartedly agree that automation, when carefully conceived and conscientiously deployed, can improve efficiencies and ease the tensions stemming from a manual release process. But for all the advances in the current generation of automation tools, it may still be a while before automation tools can configure themselves. Until then, who should own the configuration? Engineering understands the intrinsic properties of the software (the proper sequence to start the various components, the proper settings for feature-related properties), but operations has the extrinsic knowledge necessary to make the site work: which databases are available, which load balancers to use, etc. It might be possible to arrive at a working configuration by merging the two teams&#8217; knowledge, but I think it makes more sense if one group owns production and the associated automation configuration and workflows.</p>
<p>Ultimately, by freeing other teams to focus on their core competencies, Site Reliability Engineering can increase uptime and help the company scale, all while reducing tensions between engineering and operations. What more could you want from a three-letter acronym?</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2009/04/02/engineering-and-operations-bridging-the-divide/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Dual-booting Windows XP and Mac OS X on Intel Macs</title>
		<link>http://daemons.net/~clay/2006/03/15/dual-booting-windows-xp-and-mac-os-x-on-intel-macs/</link>
		<comments>http://daemons.net/~clay/2006/03/15/dual-booting-windows-xp-and-mac-os-x-on-intel-macs/#comments</comments>
		<pubDate>Wed, 15 Mar 2006 20:39:29 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[mac]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=50</guid>
		<description><![CDATA[I had hoped to use this blog entry to post step-by-step instructions for installing Windows XP on shiny new MacIntels, but alas, it appears that someone has beaten me to it. The winners, narf2006 and blanka, have been working on the problem for quite a while and have been posting pictures of their progress over [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/cwaidoh/4345926802/"><img alt="OSXP" src="http://farm3.static.flickr.com/2732/4345926802_8435b94027_m.jpg" title="OSXP" class="alignleft" width="240" height="180" /></a>I had hoped to use this blog entry to post step-by-step instructions for installing Windows XP on shiny new MacIntels, but alas, it appears that someone <a href="http://windowsxp.onmac.net/">has beaten me to it</a>. The winners, narf2006 and blanka, have been working on the problem for quite a while and have been posting <a href="http://www.flickr.com/photos/32436196@N00/110977744/in/photostream/">pictures of their progress</a> over the past few weeks. Today they uploaded a <a href="http://youtube.com/watch?v=nzH6OFpXgzI">video </a>showing a fresh install of XP on an iMac and they submitted their solution to sud0n1m for testing. Assuming the testing goes well, they will be declared the winners and will share the $13k prize.</p>
<p>While I&#8217;m disappointed not to have won, I&#8217;m encouraged to see that our approaches were remarkably similar. We both wrote custom EFI CSM drivers to emulate the BIOS functions Windows requires to boot. I&#8217;m very curious how they managed to get VGA working, and I won&#8217;t be surprised if it doesn&#8217;t work in either the Mini or the Macbook Pro, as it looks like they did all their development on an iMac.</p>
<p>If nothing else, this was a tremendous learning experience for me, and the timing couldn&#8217;t have been better. I have recently become interested in Intel assembly and protected mode programming, topics I considered too challenging years ago when I was doing DOS programming, but concepts that make much more sense to me now. I had randomly dusted off some old assembly language references from my bookshelf and read some chapters on protected mode programming a few weeks prior to beginning work on this project, so I was able to grasp what needed to be done to provide a working solution almost immediately.</p>
<p>With the deadline fast approaching and narf&#8217;s Flickr images haunting me, I coded quickly and didn&#8217;t spend much time making the code pretty or maintainable, but I&#8217;m still fairly proud of the code I wrote. It&#8217;s fairly succinct, but does quite a bit. Anyone who&#8217;s interested can <a href="http://github.com/claymation/osxp/tree/master">peruse the code here</a>.</p>
<p>The main function is in OSXP.c, which contains code for reading the GPT partition table that Mac OS X uses, writing an MBR partition table that Windows would use, and loading a bootloader from an El Torito bootable CD-ROM.</p>
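<p>For readers unfamiliar with the MBR side of that conversion, here is a small Ruby sketch of the on-disk format: a 512-byte sector with four 16-byte partition entries at offset 446 and the 0x55 0xAA signature at the end. This is not the code from OSXP.c. The helper names, the partition numbers in the example, and the choice of placeholder CHS bytes are all assumptions made for illustration.</p>

```ruby
# Build one 16-byte MBR partition entry. The CHS fields are filled with a
# commonly used placeholder, since the LBA fields carry the real location.
def mbr_entry(type:, start_lba:, sectors:, bootable: false)
  status = bootable ? 0x80 : 0x00
  chs = [0xFE, 0xFF, 0xFF]
  # Eight single bytes, then two little-endian 32-bit values.
  [status, *chs, type, *chs, start_lba, sectors].pack('C8V2')
end

# Assemble a 512-byte MBR: 446 bytes of (empty) boot code, four partition
# entries, and the 0x55 0xAA boot signature.
def build_mbr(entries)
  raise ArgumentError, 'an MBR holds at most four entries' if entries.size > 4

  table = entries.join.ljust(64, "\0")
  ("\0" * 446 + table + [0x55, 0xAA].pack('C2')).b
end

# Hypothetical bootable NTFS partition (type 0x07) starting at sector 2048.
mbr = build_mbr([mbr_entry(type: 0x07, start_lba: 2048, sectors: 409_600, bootable: true)])
puts mbr.bytesize
```

<p>In a GPT-to-MBR conversion, the <code>start_lba</code> and <code>sectors</code> values would come from the corresponding GPT entry instead of being hard-coded.</p>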
<p>Code to switch from protected mode to real mode (called a <a href="http://en.wikipedia.org/wiki/Thunk">thunk</a>) is in thunk.c and asmthunk.s. It&#8217;s not very general, but it&#8217;s the first protected mode assembly code I&#8217;ve ever written, and, surprisingly enough, it works.</p>
<p>Code to set up a real-mode interrupt vector table and the real-mode interrupt service routine is in rmisr.s. For the most part, this duplicates the thunk code, but in reverse order: it switches from real mode to protected mode and then back again. This reverse thunk is necessary to emulate BIOS functions using the native EFI functions (read disk sector, print character, etc).</p>
<p>The protected-mode interrupt service routine, which does the actual BIOS emulation, is in pmisr.c. It reads and writes a saved register context on the stack that the real-mode code inherits upon return from the interrupt service routine.</p>
<p>Just writing those thunk and interrupt service routines is probably about 50% of a complete solution, and I&#8217;m very happy with how they came out. The first time I thunked into the MBR code, it worked better than I expected, and actually identified the active partition, loaded the boot sector from it, and jumped to it. Booting into the CD bootloader worked also, though it hung right after probing memory.</p>
<p>I&#8217;m pretty amazed that my code works, but to toot my own horn a little more, I&#8217;m pretty happy with some of the debug techniques I came up with along the way. Running EFI applications in the pre-boot environment leaves a lot to be desired. You can&#8217;t exactly fire up gdb and throw a bunch of watchpoints on your code to find out what&#8217;s going wrong (though I did toy with the idea of compiling GDB for EFI). And even if I had a debugger at my disposal, it&#8217;s hard to debug a protected mode/real mode transition.</p>
<p><a href="http://www.flickr.com/photos/extraspecial/107310593/in/set-72057594074203534/"><img alt="Take a picture now!" src="http://farm1.static.flickr.com/40/107310593_31eff49e81.jpg" title="Take a picture now!" class="alignleft" width="406" height="500" /></a>At first, my debugging consisted of writing copious amounts of debug output to the console and waiting for my tester, Chris, to run the code and <a href="http://www.flickr.com/photos/extraspecial/107310668/in/set-72057594074203534/">take a picture of the result</a>. At that time I didn&#8217;t have an Intel Mac to play with, so needless to say, progress was slow. We did get fairly far with this method, though. I wrote all of the partition table (GPT and MBR) code without ever having seen my code run with my own eyes. <a href="http://www.flickr.com/photos/extraspecial/sets/72057594074203534/">Chris&#8217; pictures</a> showed me what I needed to know, then I&#8217;d make a change, recompile, and Chris would download the new file and reboot. Again and again. Thanks, Chris.</p>
<p>I was unsure of how to access the CD-ROM under EFI, because it didn&#8217;t show up when I listed all the block I/O devices in Chris&#8217; Macbook Pro. Ryan was nice enough to lend me his shiny new Mac Mini, and I was pleased to find that the CD-ROM device showed up once there was a disk in the drive. I was even more pleased to see that my El Torito code worked almost flawlessly from the beginning.</p>
<p>Then came the hard part, thunking into real mode. I wrote code and pored over it for hours making sure it looked right, but when I ran it, the computer spontaneously rebooted. Unsure of which instruction(s) were causing the reboots, I added an infinite loop to a section of the code, recompiled, and ran it. The machine hung. That validated that all of the code above the loop was not causing the reboot (at least not directly), so I moved the loop down a few instructions and tried again. Using this technique I was eventually able to find all of the bugs in my code, which were all stupid syntax problems and not logic problems (there&#8217;s a big difference between $0x10 and 0x10 in AT&amp;T assembly: the first is an immediate value, the second a memory address).</p>
<p>Once the thunk and interrupt handler code was working, I started looking into why NTLDR was hanging after probing memory. NTLDR is 233kb and its disassembly is 97k lines long. I knew roughly where the hang occurred, based on the output I had from the last BIOS interrupt it invoked, but I wanted to narrow it down to a specific routine.</p>
<p>It occurred to me that I could just write my own debugger of sorts. By handling interrupt 3, my code would get control anytime the NTLDR code stumbled onto an INT3 instruction. So, using the disassembly listing as a guide, I made a list of instructions that I thought might be interesting stopping points, and wrote a routine to replace the first byte of each of those instructions with 0xCC (the INT3 opcode). Then I wrote an INT3 handler that restored the original instruction and decremented the return address by one so the original code would be run upon return from the interrupt service routine. And, to my surprise, it worked!</p>
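<p>The bookkeeping behind that trick is easier to see in a toy model. The Ruby sketch below treats the code as an array of bytes and tracks which bytes have been swapped for 0xCC. It is only a simulation of the idea, not the real-mode handler itself, and the <code>BreakpointTable</code> class is invented for this example.</p>

```ruby
INT3 = 0xCC

# Toy model of software breakpoints over an array of code bytes.
class BreakpointTable
  def initialize(code)
    @code = code   # mutable array of byte values
    @saved = {}    # address => original byte
  end

  # Overwrite the byte at addr with INT3, remembering the original.
  def set(addr)
    @saved[addr] = @code[addr]
    @code[addr] = INT3
  end

  # Takes the return address pushed for INT3, which points one byte past
  # the breakpoint. Restores the original byte and returns the address at
  # which execution should resume so the original instruction runs.
  def handle(return_addr)
    addr = return_addr - 1
    @code[addr] = @saved.fetch(addr)
    @saved.delete(addr)
    addr
  end
end

code = [0x90, 0xFA, 0x90]          # nop, cli, nop
table = BreakpointTable.new(code)
table.set(1)
patched = code.dup                 # snapshot with the breakpoint in place
resume_at = table.handle(2)        # INT3 at address 1 returns to address 2
puts resume_at
```

<p>Because the breakpoint byte is removed once it fires, each breakpoint in this model triggers only once, which matches the first version of the handler described above.</p>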
<p>Earlier today I extended this a bit by automatically enabling trap mode in the INT3 handler so I could repatch the breakpoint instruction with 0xCC right after executing the original code. This change allowed a breakpoint to show up each time through a loop or each time a particular function was called. Then I went a step farther and added a breakpoint option that would leave trap mode enabled, so I could get a trace of every instruction executed between two points in the code. This would prove useful for figuring out which branches the program took as it made various tests and decisions.</p>
<p>The one thing that continues to elude me is how to enable VGA text mode. The standard graphics and text framebuffers (0xA0000 and 0xB8000) aren&#8217;t even mapped in memory, and reading from the VGA registers appears to return garbage. I suspect if I knew more about PCI programming, I&#8217;d be able to map the framebuffer memory and configure the I/O ports, but I&#8217;m at a loss for how to do that. In the interim, I&#8217;ve been patching NTLDR at load time so that it writes to my own text framebuffer, which I then scan on every interrupt in order to paint a portion of the emulated textmode screen. This is slow and hackish, and I know it&#8217;s possible to enable true VGA mode (narf and blanka did), but I don&#8217;t know where to begin.</p>
<p>All in all I&#8217;m pretty happy with the code I wrote, even though I didn&#8217;t win the contest. I&#8217;m looking forward to the next big challenge and an opportunity to use some of the techniques I learned on this project.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2006/03/15/dual-booting-windows-xp-and-mac-os-x-on-intel-macs/feed/</wfw:commentRss>
		<slash:comments>51</slash:comments>
		</item>
		<item>
		<title>Getting past ptrace()</title>
		<link>http://daemons.net/~clay/2005/12/26/pesky-ptrace/</link>
		<comments>http://daemons.net/~clay/2005/12/26/pesky-ptrace/#comments</comments>
		<pubDate>Mon, 26 Dec 2005 13:38:37 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[debug]]></category>
		<category><![CDATA[os x]]></category>
		<category><![CDATA[ppc]]></category>
		<category><![CDATA[ptrace]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=36</guid>
		<description><![CDATA[During the holiday break I figured I'd learn some PowerPC (PPC) assembly while I still had the chance, given Apple's decision to move to x86 early next year. Debugging simple programs isn't much fun, though, so I figured I'd start poking around with a big application. An annoying thing kept happening every time I fired up the app under the debugger, though; it exited immediately with a strange error code:
<code>
% gdb /Applications/blah.app/Contents/MacOS/blah
(gdb) run
Program exited with code 055.
</code>

Bummer. I remembered the same problem happening with that commercial Solaris app years before, but I never paid much attention to it back then, because it was possible to work around the problem by attaching to the program after it was already up and running. Apple seems to be a bit smarter than that, though, because whenever I attached to a running copy of the application, GDB seg faulted:
<code>
(gdb) attach 17813
Attaching to program: `/Applications/blah.app/Contents/MacOS/blah', process 17813.
Segmentation fault
</code>

Since I wanted to know why the application was exiting, I figured I'd step through it one instruction at a time until I found the culprit.]]></description>
			<content:encoded><![CDATA[<p>The holidays have given me a chance to relax and geek around with Mac OS X, and I&#8217;ve finally gotten around to installing the Developer Tools package, which includes the GNU C compiler (gcc) and the GNU debugger (gdb). Over the years I&#8217;ve gotten pretty comfortable using gdb to troubleshoot programs on SPARC and Intel platforms. Debugging requires that you know a bit about assembly language, and I had learned x86 assembly back in the day when I was coding fun little graphics toys for DOS, and had learned some SPARC assembly trying to, uhm, correct an annoying license issue in a piece of commercial software.</p>
<p>During the holiday break I figured I&#8217;d learn some PowerPC (PPC) assembly while I still had the chance, given Apple&#8217;s decision to move to x86 early next year. Debugging simple programs isn&#8217;t much fun, though, so I figured I&#8217;d start poking around with a big application. An annoying thing kept happening every time I fired up the app under the debugger, though; it exited immediately with a strange error code:<br />
<code><br />
% gdb /Applications/blah.app/Contents/MacOS/blah<br />
(gdb) run<br />
Program exited with code 055.<br />
</code></p>
<p>Bummer. I remembered the same problem happening with that commercial Solaris app years before, but I never paid much attention to it back then, because it was possible to work around the problem by attaching to the program after it was already up and running. Apple seems to be a bit smarter than that, though, because whenever I attached to a running copy of the application, GDB seg faulted:<br />
<code><br />
(gdb) attach 17813<br />
Attaching to program: `/Applications/blah.app/Contents/MacOS/blah', process 17813.<br />
Segmentation fault<br />
</code></p>
<p>Since I wanted to know why the application was exiting, I figured I&#8217;d step through it one instruction at a time until I found the culprit.<br />
<span id="more-36"></span><br />
In simple programs, it&#8217;s usually sufficient to set a breakpoint on the <code>main()</code> function, but that skips over the C run-time start-up code. Since I wanted to step through each and every instruction in the program, including the start-up code, I used Apple&#8217;s <code>otool</code> command to print the application&#8217;s text segment, which gave me the address of the first instruction. I&#8217;ll demonstrate with /bin/ls:<br />
<code><br />
[satellite:~] clay% otool -tvV /bin/ls | head -5<br />
/bin/ls:<br />
(__TEXT,__text) section<br />
00001ac4        or      r26,r1,r1<br />
00001ac8        addi    r1,r1,0xfffc<br />
00001acc        rlwinm  r1,r1,0,0,26<br />
00001ad0        li      r0,0x0<br />
</code></p>
<p>This shows that the first instruction in /bin/ls is located at address 0x1ac4, so I set a breakpoint for that and started running the program. Again, using /bin/ls to demonstrate:<br />
<code><br />
(gdb) break *0x1ac4<br />
Breakpoint 1 at 0x1ac4<br />
(gdb) run<br />
Starting program: /bin/ls</p>
<p>Breakpoint 1, 0x00001ac4 in ?? ()<br />
</code></p>
<p>Now it helps to see the instruction that&#8217;s about to be executed, so I set up a display that prints the instruction at $pc (the program counter) every time the program stops:<br />
<code><br />
(gdb) display/i $pc<br />
1: x/i $pc  0x1ac4:     mr      r26,r1<br />
</code></p>
<p>At first glance it may seem that GDB and <code>otool</code> print different instructions at address 0x1ac4:</p>
<pre>
otool:   or r26,r1,r1
GDB:     mr r26,r1
</pre>
<p>GDB is nice and prints user-friendly mnemonics like <code>mr</code> (for move register), which are easier to read and understand than the literal instructions that <code>otool</code> prints, but the instructions are identical.</p>
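<p>The equivalence comes from the instruction encoding: <code>mr rA,rS</code> is simply <code>or rA,rS,rS</code>, so both tools are looking at the same 32 bits. A minimal Ruby sketch of the decoding follows. The word 0x7c3a0b78 was assembled by hand from the PowerPC <code>or</code> field layout rather than taken from <code>otool</code> output, so treat it as an assumption, and <code>disassemble_or</code> is a hypothetical helper.</p>

```ruby
# Decode a 32-bit PowerPC X-form `or` instruction. Returns the literal
# form and the simplified form, which becomes `mr` when both source
# registers are the same. Returns nil for anything that is not a plain `or`.
def disassemble_or(word)
  opcode = (word >> 26) & 0x3f   # primary opcode, 31 for this family
  rs     = (word >> 21) & 0x1f   # source register
  ra     = (word >> 16) & 0x1f   # destination register
  rb     = (word >> 11) & 0x1f   # second source register
  xo     = (word >> 1) & 0x3ff   # extended opcode, 444 for `or`
  rc     = word & 1              # record bit; set for `or.`
  return nil unless opcode == 31 && xo == 444 && rc.zero?

  literal = "or r#{ra},r#{rs},r#{rb}"
  simplified = rs == rb ? "mr r#{ra},r#{rs}" : literal
  [literal, simplified]
end

p disassemble_or(0x7c3a0b78)
```

<p>A disassembler that prints simplified mnemonics makes this substitution whenever the two source registers match; one that prints literal instructions never does.</p>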
<p>Now the plan was to step over instruction after instruction, using <code>stepi</code>, until I found the offending bit of code. This was somewhat tedious, so I ended up using <code>nexti</code> quite a bit, which steps over function calls instead of stepping into them, speeding up the debugging. The only problem with that was that sometimes I would step over the function that caused the program to abort. Whenever that happened, I added a breakpoint on the address of the function call and then used <code>stepi</code> to step into the offending function. After several rounds of this (I think I was up to 11 breakpoints) I finally found where the program was stopping:<br />
<code><br />
0x90054204 in ptrace ()<br />
1: x/i $pc  0x90054204 &lt;ptrace +36&gt;:    sc<br />
(gdb) stepi<br />
Program exited with code 055.<br />
</code></p>
<p>&#8220;sc&#8221; is the PPC instruction that invokes a system call. Since GDB is nice enough to print symbols when it can, I decided to see if ptrace is a documented system call in Darwin. Sure enough, it is:</p>
<blockquote><p><code><br />
PTRACE(2)                   BSD System Calls Manual                  PTRACE(2)</p>
<p>NAME<br />
     ptrace -- process tracing and debugging</p>
<p>SYNOPSIS<br />
     #include &lt;sys/types.h&gt;<br />
     #include &lt;sys/ptrace.h&gt;</p>
<p>     int<br />
     ptrace(int request, pid_t pid, caddr_t addr, int data);</p>
<p>DESCRIPTION<br />
     ptrace() provides tracing and debugging facilities.<br />
</code></p></blockquote>
<p>Well that sounds interesting, since I was having trouble debugging the process, and the process is calling this system call that provides some type of debugging facility. The man page describes how a debugger (like GDB) might use ptrace() to control another process, but that wasn&#8217;t what I was interested in. I was looking for a way a debugged program might be able to control the debugger. Reading on, the man page hinted that this might be possible:</p>
<blockquote><p><code><br />
     [...] except for one special case noted below, all ptrace() calls are made by the tracing process [...]<br />
</code></p></blockquote>
<p>Hmm, does that one special case allow a traced process to invoke ptrace() to control a debugger? Sure enough, it does:</p>
<blockquote><p><code><br />
PT_DENY_ATTACH</p>
<p>This request is the other operation used by the traced process; it allows a process that is not currently being traced to deny future traces by its parent.  All other arguments are ignored.  If the process is currently being traced, it will exit with the exit status of ENOTSUP; otherwise, it sets a flag that denies future traces.  An attempt by the parent to trace a process which has set this flag will result in a segmentation violation in the parent.<br />
</code></p></blockquote>
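<p>The exit status lines up with this, too. GDB reported &#8220;exited with code 055&#8221;, and as far as I know GDB prints that status in octal. A quick check in Python, assuming that ENOTSUP is 45 on Darwin (the value is hard-coded here because other systems, Linux included, number it differently):</p>
<pre>
DARWIN_ENOTSUP = 45  # assumption: value of ENOTSUP in Darwin's errno.h

gdb_exit_code = int("055", 8)  # read GDB's "055" as an octal number
print(gdb_exit_code, gdb_exit_code == DARWIN_ENOTSUP)
</pre>
<p>Octal 055 is decimal 45, which matches the ENOTSUP exit that the man page describes for a process that is already being traced.</p>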
<p>Bingo! This explains both the seg fault and the exit immediately after startup. So now that I knew why the program wasn&#8217;t cooperating with GDB, I could coerce it into playing nicely. First, a disassembly of the ptrace() function:<br />
<code><br />
(gdb) disas<br />
Dump of assembler code for function ptrace:<br />
0x900541e0 &lt;ptrace +0&gt;:  li      r7,0<br />
0x900541e4 &lt;ptrace +4&gt;:  mflr    r0<br />
0x900541e8 &lt;ptrace +8&gt;:  bcl-    20,4*cr7+so,0x900541ec &lt;ptrace +12&gt;<br />
0x900541ec  &lt;ptrace +12&gt;: mflr    r8<br />
0x900541f0  &lt;ptrace +16&gt;: mtlr    r0<br />
0x900541f4  &lt;ptrace +20&gt;: addis   r8,r8,4091<br />
0x900541f8  &lt;ptrace +24&gt;: lwz     r8,7860(r8)<br />
0x900541fc  &lt;ptrace +28&gt;: stw     r7,0(r8)<br />
0x90054200  &lt;ptrace +32&gt;: li      r0,26<br />
0x90054204  &lt;ptrace +36&gt;: sc<br />
0x90054208  &lt;ptrace +40&gt;: b       0x90054210 &lt;ptrace +48&gt;<br />
0x9005420c  &lt;ptrace +44&gt;: b       0x90054230 &lt;ptrace +80&gt;<br />
0x90054210  &lt;ptrace +48&gt;: mflr    r0<br />
0x90054214  &lt;ptrace +52&gt;: bcl-    20,4*cr7+so,0x90054218 &lt;ptrace +56&gt;<br />
0x90054218 &lt;ptrace +56&gt;: mflr    r12<br />
0x9005421c  &lt;ptrace +60&gt;: mtlr    r0<br />
0x90054220  &lt;ptrace +64&gt;: addis   r12,r12,4091<br />
0x90054224  &lt;ptrace +68&gt;: lwz     r12,7792(r12)<br />
0x90054228  &lt;ptrace +72&gt;: mtctr   r12<br />
0x9005422c  &lt;ptrace +76&gt;: bctr<br />
0x90054230  &lt;ptrace +80&gt;: blr<br />
End of assembler dump.<br />
</code></p>
<p>Now I don&#8217;t know exactly what all of that means, but I saw the &#8220;sc&#8221; instruction in the middle of the function and I knew I wanted to avoid making that system call. The obvious thing to try was to set a breakpoint on function entry and then use the <code>jump</code> command to jump right to the last instruction in the function, effectively bypassing that pesky system call:<br />
<code><br />
(gdb) break ptrace<br />
Breakpoint 11 at 0x900541f4<br />
(gdb) run<br />
Starting program: /Applications/blah.app/Contents/MacOS/blah</p>
<p>Breakpoint 11, 0x900541f4 in ptrace ()<br />
(gdb) jump *0x90054230<br />
Continuing at 0x90054230.<br />
</code></p>
<p>When I did that, breakpoint 11 fired again, so I did the <code>jump</code> again, and breakpoint 11 fired again, so, well, the short story is that I had to do the <code>jump</code> 6 times before the program continued on. Maybe the developers were paranoid and called ptrace() six times. Whatever the reason, after finally getting past all six ptrace()s, the program started up just fine:<br />
<code><br />
(gdb) jump *0x90054230<br />
Continuing at 0x90054230.<br />
Reading symbols for shared libraries . done</p>
<p></code></p>
<p>The program ran as normal, under control of the debugger. Now that I know how to get it to run under GDB I can start poking around with some of the more interesting bits of code.</p>
<p>I hope everyone&#8217;s having a nice holiday, and I promise to post more often! <img src='http://daemons.net/~clay/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2005/12/26/pesky-ptrace/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>No trolls under the root bridge</title>
		<link>http://daemons.net/~clay/2001/08/22/no-trolls-under-the-root-bridge/</link>
		<comments>http://daemons.net/~clay/2001/08/22/no-trolls-under-the-root-bridge/#comments</comments>
		<pubDate>Wed, 22 Aug 2001 20:10:56 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[ethernet]]></category>
		<category><![CDATA[ieee]]></category>
		<category><![CDATA[spanning tree]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=148</guid>
		<description><![CDATA[A Proposed Extension to the IEEE 802.1D Spanning Tree Protocol
Network administrators who maintain complicated Layer 2 networks should be very familiar with the operation of the Spanning Tree Protocol. One common point of confusion among administrators is the use of (essentially) arbitrary numbers to identify the current root bridge. While the root bridge identifier is [...]]]></description>
			<content:encoded><![CDATA[<h2>A Proposed Extension to the IEEE 802.1D Spanning Tree Protocol</h2>
<p>Network administrators who maintain complicated Layer 2 networks should be very familiar with the operation of the Spanning Tree Protocol. One common point of confusion among administrators is the use of (essentially) arbitrary numbers to identify the current root bridge. While the root bridge identifier is deterministic, composed of bridge priority and MAC address, it is not information that network administrators can use without consulting a reference that maps MAC addresses to human-friendly names.</p>
<p>Bridges and switches are often given hostnames for administrative purposes. The extension to Spanning Tree Protocol described below adds this user-friendly hostname information to the BPDU. The hostname could then be used by equipment vendors for diagnostic purposes, e.g., in the output of Cisco&#8217;s &#8220;show spantree&#8221; command.</p>
<p>The extensions described below are relative to <a href="http://standards.ieee.org/getieee802/">ANSI/IEEE Std 802.1D, 1998 Edition</a>. I believe these changes will not affect existing implementations of Spanning Tree Protocol, since the standard indicates that &#8220;any octets beyond Octet 35 are ignored&#8221;, and that &#8220;the Protocol Version Identifier is not checked on receipt, in order to allow the possibility of future specification of extensions to the Spanning Tree Protocol&#8221;. The extensions below would be detected by a compliant bridge implementation by checking the Protocol Version Identifier field of the BPDU.</p>
<p>This proposal was sent to the IEEE 802.1 committee on August 22, 2001. Initial response has been somewhat discouraging. The standards committee does not accept proposals from non-members; however, becoming a member requires attending periodic meetings in Washington, DC. Without the means to attend these meetings, it is unlikely that this proposal will ever be considered. Perhaps an existing member of the committee will volunteer to represent this proposal in my stead&#8230;</p>
<h3>Proposal</h3>
<pre>
9.2.X Encoding of Bridge Names
[New section between "9.2.5 Encoding of Bridge Identifiers" and "9.2.6
Encoding of Root Path Cost"]

A Bridge Name shall be encoded as sixteen octets, taken to represent a
string of printable ASCII characters (octal 040 through 0176 [decimal 32
through 126]).  The string shall contain the first 16 characters of the
bridge's hostname, if configured, or any other string which might
uniquely identify the bridge.  This parameter is intended for human
consumption only.

The most significant octet is the first character of the string.  If the
intended ASCII string is less than sixteen characters in length, the
unused octets in the BPDU will be set to ASCII NUL (octal 0).  If the
intended string is greater than sixteen characters in length, any
additional characters will be truncated so that only the first sixteen
characters will appear in the BPDU.

9.3.1 Configuration BPDUs

Figure 9-1 will have the following data structure appended to the
diagram:

+-------------+
|             | 36
|             | 37
|             | 38
|             | 39
|             | 40
|             | 41
|             | 42
|   Bridge    | 43
|    Name     | 44
|             | 45
|             | 46
|             | 47
|             | 48
|             | 49
|             | 50
|             | 51
+-------------+

b) The Protocol Version Identifier is encoded in Octet 3 of the BPDU. It
takes the value 0000 0001.
</pre>
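<p>As a rough illustration of the encoding rules in 9.2.X, here is a minimal Python sketch. It is not part of the proposal as submitted: the function names, the example hostnames, and the all-zero 35-octet placeholder standing in for an existing Configuration BPDU are all made up for illustration.</p>
<pre>
BRIDGE_NAME_OCTETS = 16


def encode_bridge_name(hostname):
    """Return the 16-octet Bridge Name field: truncated, then NUL-padded."""
    raw = hostname.encode("ascii")[:BRIDGE_NAME_OCTETS]
    return raw.ljust(BRIDGE_NAME_OCTETS, b"\x00")


def decode_bridge_name(octets):
    """Recover the display string by dropping the NUL padding."""
    return octets.rstrip(b"\x00").decode("ascii")


short_name = encode_bridge_name("core-sw1")
long_name = encode_bridge_name("distribution-switch-07")
print(len(short_name), decode_bridge_name(short_name))
print(len(long_name), decode_bridge_name(long_name))

# Placeholder for octets 1 through 35; the name then occupies octets 36 through 51.
extended_bpdu = bytes(35) + short_name
print(len(extended_bpdu))
</pre>
<p>A short hostname is padded out to sixteen octets and a long one is cut to its first sixteen characters, so the field always has a fixed width. A hostname containing non-ASCII characters raises an error in this sketch; the proposal does not say how a bridge should handle that case.</p>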
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2001/08/22/no-trolls-under-the-root-bridge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>64 Penguins</title>
		<link>http://daemons.net/~clay/2000/10/21/64-penguins/</link>
		<comments>http://daemons.net/~clay/2000/10/21/64-penguins/#comments</comments>
		<pubDate>Sat, 21 Oct 2000 21:32:15 +0000</pubDate>
		<dc:creator>clay</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[e10k]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[sparc]]></category>

		<guid isPermaLink="false">http://daemons.net/~clay/?p=170</guid>
		<description><![CDATA[
That was the goal, at least.
The story starts with my first attempt to install Linux on SPARC hardware. I had purchased an old, retired Sun Sparc 20 workstation so I could tinker with Solaris at home. With 512 MB RAM and a quad-Ethernet card, it put my POS PC to shame. But I didn&#8217;t know [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-171" title="penguin" src="http://daemons.net/~clay/wp-content/uploads/2009/04/penguin.png" alt="penguin" width="80" height="80" /></p>
<p>That was the goal, at least.</p>
<p>The story starts with my first attempt to install Linux on SPARC hardware. I had purchased an old, retired Sun Sparc 20 workstation so I could tinker with Solaris at home. With 512 MB RAM and a quad-Ethernet card, it put my POS PC to shame. But I didn&#8217;t know it had two CPUs until I finally got bootp, tftp, and NFS working and was able to net-boot a Linux kernel on it. Twin penguins graced my 20&#8243; Sun monitor (sounds small in the LCD era, but for a CRT this thing was massive!) — one for each of the CPUs.</p>
<p>Years later, I was working with a company that used primarily Sun hardware, and I had the opportunity to install some four-way E450s for a new project. When they arrived, we still had a few weeks before we had to turn the machines over to the developers, so I decided to experiment again with SPARC Linux. The installation this time around was much easier, and, as expected, four handsome penguins saluted me at boot!</p>
<p>About this time, the Sun Enterprise 10000 (or e10k) was considered a pretty beefy server: a fully-populated machine with 16 system boards sported 64 CPUs, 64 Ethernet interfaces, and 64 GB RAM — why anybody needed 64 Ethernet interfaces is beyond me. Against our advice, our development organization purchased two e10ks for use as Java application servers. So, we carved out a chunk of space in the datacenter and Sun rolled these twin behemoths in.</p>
<p>I was helping the developers figure out how to carve the machine up into 4-CPU chunks (called domains), so I had a good idea of the project timelines. When I learned the machines would sit idle for a few weeks before they were needed, I realized this was the opportunity of a lifetime — a chance to build and boot my own army of 64 penguins!</p>
<p>Carl Raffa, one of the brightest and most interesting people I&#8217;ve ever had the pleasure of working with, was eager to help. I don&#8217;t remember now how long it took to get Linux booting on the e10k. We first tried booting a fully-populated machine with 16 domains, but when that failed, we decided to try it with Solaris and discovered some hardware problems. After swapping some system boards around, we were able to boot Solaris.</p>
<p>Then it was back to trying to boot Linux. I don&#8217;t remember now what all of the problems were — most were simple 64-bit or endianness issues — or how long it took before we could consistently boot a single-domain e10k. But once we got that working, we progressively configured the machine larger and larger until we approached a fully-populated 16 domains. When we got to 48 processors, CNET <a href="http://news.cnet.com/Test-version-of-new-Linux-kernel-available/2100-1001_3-247983.html">picked up the story</a>. I don&#8217;t know for certain, but I suspect that at that time, our e10k represented the largest (and most expensive) Linux machine ever booted.</p>
<p>We can&#8217;t take any credit for the low-level device drivers that made any of this work possible. Most of that support was implemented by people who didn&#8217;t have access to the big iron, so our involvement was really as tinkerers and testers more than innovators. That said, it was still a fun and challenging project.</p>
<p>Sadly, we were never able to test with a fully-populated e10k. While we were at lunch for a coworker&#8217;s retirement party the Friday before we had to turn the machines over to the developers, Sun Professional Services reconfigured the machine and blew away our playground.</p>
<p>As fate would have it, the e10k doesn&#8217;t have a framebuffer — its only console output is emulated via the service processor, a separate microcontroller in the e10k chassis. So even though I was never able to see my army of 48 penguins (much less 64), I know they were there, and that will have to do.</p>
]]></content:encoded>
			<wfw:commentRss>http://daemons.net/~clay/2000/10/21/64-penguins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
