A Beginner’s Guide to Practical Syntactic Magic: the tale of Hpricot’s sudo-constructor

I spent much of today working with Hpricot. And so, as when spending significant solo time with any of why the lucky stiff’s code, I found myself admiring all the neat little syntactic knickknacks strewn about to cozy up the place.

One of the best is the way you get started. Hpricot is a toolkit for parsing and manipulating XHTML. So, obviously enough, just about every time you invoke it, you’re going to want to pass it an XHTML document so it can, you know, prep it for parsing and manipulation. And how do you do that? What’s the syntax?

Hpricot(my_document)

That’s it. There’s no “Hpricot::Base.new(my_document).parse” nonsense, or any of the other more or less torturous common options. Not a single character of syntax is wasted.

But, if you’re a mere Ruby mortal, like me, you’re probably looking at that code and going: ‘Huh?’ Isn’t Hpricot a constant? It’s capitalized. But it’s taking an argument like a method. How is that even valid Ruby? How can the parser tell if it’s a constant or a method?

Well, it turns out that there’s no rule against having capitalized method names; the parser can tell it’s a method because it’s followed by an argument list (even an empty pair of parentheses will do). And that’s all that’s required for it to be sent off to method- instead of constant-dispatch (as Chris pointed out, this is one advantage of not having Ruby be “turtles all the way down”; Smalltalk couldn’t do this).
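To see the parser making that call, here’s a minimal sketch. Shout is a hypothetical method name invented for this illustration (it isn’t part of Hpricot or of Ruby itself), and no constant named Shout is defined anywhere, which is what makes the last case fail:

```ruby
# Shout is a hypothetical name, used only for illustration.
# Defining a capitalized method at the top level is perfectly legal.
def Shout(text = "hey")
  text.upcase + "!"
end

# With an argument or parentheses, Ruby parses Shout as a method call.
with_parens = Shout("hello")   # method call with parentheses
with_arg    = Shout "hello"    # method call, parentheses omitted
no_args     = Shout()          # empty parentheses still mean "method"

# A bare capitalized name is parsed as a constant lookup instead.
# No constant named Shout exists here, so the lookup raises NameError.
bare_lookup =
  begin
    Shout
  rescue NameError => e
    e.class
  end

puts with_parens
puts with_arg
puts no_args
puts bare_lookup
```

So a capitalized method only gets called when something about the call site says “method”; on its own, the name is always treated as a constant.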

Beyond providing fodder for a Language Nerd Attack, though, what’s the upshot? How’s this fact help the man on the street? Well: there’s nothing actually sophisticated going on here. So: you can do it too.

Here’s an admittedly contrived (and useless) example:

class Dogger
  def initialize
    puts "dog"
  end
end

def Dogger()
  Dogger.new
end

a simple class definition followed by a simple method invoking it.

Which leaves us with the ability to write two snippets of code that, while they may look nearly the same, do very different things:

>> Dogger
=> Dogger
>> Dogger()
dog
=> #<Dogger:0x15d2478>

And that is exactly where _why’s use of this little quirk gets its leverage. This trick makes you feel like you’re invoking a constructor or calling some other kind of class method when you are, in fact, doing nothing of the sort. Just as our Dogger() method above needn’t have done anything remotely related to the Dogger class, _why could have named his method Clown() or ChunkyBacon() while still calling Hpricot.parse(input, opts) inside it (which is exactly what Hpricot() does).
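Here’s a rough sketch of that delegation pattern. It is not Hpricot’s actual source: Markup, Markup::Document, and the options hash are all hypothetical stand-ins, and the “parsing” just stores its input. The only thing it borrows from Hpricot is the shape, a top-level method that shares the module’s name and forwards to a parse(input, opts) class method:

```ruby
# Markup is a hypothetical stand-in for a library's top-level module.
module Markup
  # A toy "document": it only records the source and options it was given.
  Document = Struct.new(:source, :options)

  # The real entry point, in the style of Hpricot.parse(input, opts).
  def self.parse(input, opts = {})
    Document.new(input.to_s, opts)
  end
end

# The pseudo-constructor: an ordinary top-level method that happens to
# share the module's name and simply delegates to Markup.parse.
def Markup(input, opts = {})
  Markup.parse(input, opts)
end

doc = Markup("<p>hello</p>", { strict: true })
puts doc.source
puts doc.options.inspect
puts Markup.class   # the bare name still refers to the module itself
```

Inside the method body, Markup.parse resolves to the module because a bare capitalized name followed by a dot is a constant lookup, so the method and the module never get in each other’s way.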

But his chosen usage is particularly inspired. In one fell swoop, he gives his whole complex feature-ful library a single welcoming point of entry. You need never concern yourself with the internal machinery; just heave a document over the transom and let the library figure out what to do with it. And this is the wider lesson of _why: real power comes from combining the playfulness (better: the insouciance) needed to probe, question, and even bend the limits of the language with the discipline and aesthetic sense required to use what you find not to obfuscate and confuse, but to write elegant and, above all, more humane code.

I mean, Hpricot would definitely not be a better library if that method were called ChunkyBacon(). Right?


0 Responses to A Beginner’s Guide to Practical Syntactic Magic: the tale of Hpricot’s sudo-constructor

  1. Dr Nic says:

    Damn, it never occurred to me to ask “how does ‘Hpricot(…)’ work?” Thanks for asking the question AND sussing out the solution.

  2. Cheap Reveal says:

    CHUNKY BACON! CHUNKY BACON! CHUNKY BACON! CHUNKY BACON! CHUNKY BACON! CHUNKY BACON! CHUNKY BACON! CHUNKY BACON!
    THANKS, GREAT LITTLE TUTORIAL

  3. In Smalltalk, methods can begin with capital letters; it’s just not usually done. However, all messages do need a receiver, so in Smalltalk you would have to write something like:
    Parser Hpricot: someXhtml.
    Seeing as the method named Hpricot is badly named – it doesn’t say anything about what the method does – I’d instead write something like:
    Parser parseXhtml: someXhtml.
    which is way, way more obvious for the poor follow-on developer who has to read the code.
    Which leaves me wondering why you think using clever syntax that obscures meaning is a good thing? I prefer to leave that to the C programmers, myself…

  4. I don’t think “turtles all the way down” means what you think it means.
    Turtles all the way down means the language implementation is written in, and fully accessible from, the language. If Ruby’s lexer, parser, and compiler were all written in Ruby and available from within Ruby, running on top of a tiny virtual machine, then Ruby would have turtles all the way down, just like Smalltalk or Common Lisp (but notably not most Schemes). In a Smalltalk environment, for example, I can go to the Compiler class and see how Smalltalk code is turned into bytecode. I can then tweak it if I want, giving it new capabilities, fixing bugs, or even replacing it entirely. This isn’t a hypothetical example, either; Squeak Smalltalk has modified its compiler in the past to add traits (similar to mixins) and change the way closures work without either changing the virtual machine one whit or touching a single line of C. *That’s* turtles-all-the-way-down.
    If I’m inferring correctly, you think “turtles all the way down” means a kind of single paradigm for syntax à la Smalltalk, Lisp, Self, or Io. That’s an entirely different issue. In all of those but Smalltalk, there’d be nothing prohibiting doing the “Hpricot(foo)” trick–verbatim in Io, in fact. Even in Smalltalk, though, you could, due to its turtles-all-the-way-down approach, modify the Compiler to handle your funky new syntax, if eschewing readability for terseness really floated your boat. You could even make it support the full Ruby syntax (and there are projects underway to do just that). You have more flexibility than Ruby, not less.
    The thing is, I strongly agree with James on this one: Hpricot(foo) conveys one hell of a lot less to me as a code reader than even Hpricot.parse(foo) would. I’d still have to look up what on Earth Hpricot *does*, but at least I’d know what’s going on.
    Don’t get me wrong; Hpricot::Base.new(foo).parse is a horrible API, but Hpricot(foo) is just as bad, because the reader can’t divine the intention of the writer. Sure, why’s method saves a few keystrokes, but at the expense of transparency and maintainability. There’s a perfectly reasonable middle ground, and James has it.

  5. Chris Carter says:

    I think that the Hpricot(foo) syntax is a good idea, if you realize that it is just an alternate constructor. Usually the short-hand alternate constructors use [], but I think making it a () method call reads better than Struct[:name, :age]…

  6. Kevin Teague says:

    This is an example of the factory method pattern, and if _why had been writing in Java, the culture of that language might have driven him to create:

    HpricotFactoryImpl(my_document)

    I imagine if you are going to break the rule of methods starting with lowercase, then a factory method would be the place to do it. Or perhaps _why was just wishing that he was writing in Python when he used the ‘ClassName(constructor_args)’ syntax.

  7. A couple things:
    Greg: thanks for writing stuff like this. It’s always interesting to me that more developers (particularly in the ruby community) don’t dig deep into stuff like this. I recall spending weeks last year poring over why’s camping library trying to figure out all the little tricks he used.
    Yes, these kinds of tricks can make for obtuse code in some cases, but then take a look at an actual camping example and it’s quite concise and legible. Even more so to my eyes than, say, a Rails or Django app.
    As for the “turtles all the way down” thing, I have to agree with Benjamin that it sounds like you might have mistaken the meaning of that statement. That said, I feel the need to point out Rubinius — a very clever attempt to get at Ruby implemented in Ruby (inspired by Smalltalk, of course). And, it *will* still support this kind of syntax.

  8. KeithB says:

    Which leaves us with the ability to write two snippets of code that, while they may look nearly the same, do very different things:

    Ouch! This is not a help to the reader of the code. In fact, this is the sort of thing which makes reading someone else’s Perl such an adventure.
    There’s a balance to be struck here: I dread working with those programmers who get wrapped up in an infinite regression of wanting to “know how it works” down to the metal before they’ll do anything with a line of code; on the other hand I don’t have much time for those who want to understand every line of code in front of them without ever doing any thinking either.
    It’s probably OK (although not desirable) to have to read one level down to see what a method means, so it’s not so bad that this technique produces opaque usage as such, but still: laying a trap for the reader (which is what “look nearly the same, do very different things” amounts to) seems very wrong.

  9. I’d just like to point out that doc = Hpricot.parse(xml) is also available. I prefer to use it over the doc = Hpricot(xml) syntax, myself. In fact, the former is called by the latter anyway.
    I’ve written up two examples of Hpricot use over on my blog, if you’d like to have a look and vote on which style you’d be more inclined to write, and would prefer to read.

  10. riffraff says:

    just for your information: some of these methods are in the core ruby too, I remember Integer(), Float(), String() and Array() 🙂
