Nick Sieger: Visualization of Ruby's Grammartag:blog.nicksieger.com,2005:TypoTypo2007-08-31T16:32:08+00:00Anonymous Cowardurn:uuid:4ce3aee3-2dfa-46b8-8743-5189dea39f962007-01-29T23:06:00+00:002007-08-31T16:32:08+00:00Comment on Visualization of Ruby's Grammar by Anonymous Coward<p>Yes, what Brendan said. The biggest difference between these graphs is that some of them, like ANSI C, have straight lines of nodes, and some, like JavaScript, have loopy-looking “chains” involving about twice as many nodes. However, this is totally an artifact of the particular grammar used by the artist, and it doesn’t have anything to do with the languages C and JavaScript themselves. (That is, you could easily rewrite the C grammar to produce loopy chains, or the JavaScript grammar to produce straight lines.)</p>
<p>The culprits are JavaScript’s foo, fooNoIn, fooTail, and fooNoTail productions, where C just has foo.
Rewriting the grammar would get rid of those productions, but it would add complications to the yacc file that don’t show up in the diagram.</p>
<p>In short, the diagrams may look pretty to some people (not me), but they don’t say anything about the underlying language standards, and they don’t even give much useful information about the artist’s particular grammar files.</p>Brendan Eichurn:uuid:db8fb158-a809-479f-943c-0e060cf65dbe2006-12-22T22:42:58+00:002007-08-31T16:32:08+00:00Comment on Visualization of Ruby's Grammar by Brendan Eich<p>occam: long paths simply mean precedence hierarchy, not “red tape” that obstructs programmers. Expressions using the minimal parenthesization work as in C (and Java, and …).</p>
<p>Another wrinkle that does not make red tape for users: the “noIn” parameterization of productions used by ECMA-262, to forbid “in” expressions in the head of “for” loops. This forks the expression grammar, and uglies up the graph considerably, but again it is no hardship for users.</p>
<p>I’ve noted in keynotes about JavaScript this year that I was under orders from Netscape management to “make it look like Java” in 1995. If I had my druthers, it would have looked more like Self, Logo, Smalltalk, or even HyperTalk. In that elseworld, the graph for JS’s grammar would be simpler, for sure.</p>
<p>Many languages claim C as an ancestor; lots of hackers know (most of) C’s syntax (and gcc nags us to overparenthesize when we mix bitwise and equality ops, since the precedence is misordered in that part of the grammar).</p>
<p>JS is one such C-derived language. Nothing much to do about it at this point. It has its pluses and minuses.</p>
<p>/be</p>Joeurn:uuid:d459a789-dc20-4b60-b479-f853242ebe4c2006-11-28T20:18:10+00:002007-08-31T16:32:07+00:00Comment on Visualization of Ruby's Grammar by Joe<p>And Tcl?</p>occamurn:uuid:9a5d68d3-7161-477c-999b-c590412f52552006-11-26T07:52:03+00:002007-08-31T16:32:07+00:00Comment on Visualization of Ruby's Grammar by occam<p>With the Ruby graph, it seems to me that the congested primary means that there’s less context to remember about using various items in the language. So, more of the language is a “first class citizen”. In other words, you can do almost anything from anywhere in Ruby which is probably why it seems so elegant and powerful (and concise).</p>
<p>Javascript seems very ordered graphically, but also very obstructionist since there are some very long paths to get to certain items. Lots of order, but lots of red tape too.</p>
<p>Java seems like somewhere in the middle.</p>
<p>That’s my first intuition from these graphs.</p>werturn:uuid:6f627a2f-2f69-4a48-af66-882ae10e4ab12006-11-26T00:24:17+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by wert<p>“But over time we seem to have become more accepting of this degree of complexity,”</p>
<p>No it is just that knowledgeable people are not heard anymore in all the noise that the fan-boys make.
Take for example ruby it is a total piece of shit but only because there is a lot of noise about it do people think it is good.</p>ericurn:uuid:e1fc46cd-2fb9-4788-a2d2-37b859c5f5fe2006-11-24T16:03:08+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by eric<p>NICK, could you also do ada 2005, D, and possibly a C++ to compare with your c??:) thanks man, hope to see D and Ada 2005 especially:D!!</p>Julesurn:uuid:57644998-18e3-4e13-8004-512c51d542d92006-11-24T15:20:31+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by Jules<p>jer wrote: The simple fact that they have quotes for string literals or whatnot is just sugar and not strictly required.</p>
<p>Doh, but it’s in the language so it counts as extra syntax. You cannot implement string literals from inside the Io language (not 100% sure though).</p>Harald Korneliussenurn:uuid:8cd90bec-026e-4339-a4d8-7c4ae6b580202006-11-24T07:04:54+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by Harald Korneliussen<p>I wonder what the Ada 2005 syntax tree would look like… or one of the newer Fortrans, or C++. Interesting that Ruby’s grammar isn’t LALR(1)/LL(1). That confirmed my suspicion that modern languages aren’t always. I think the first language I came across that made me wonder “Won’t this require a rather more complex parser?” was SML. But for all I know I was wrong there.</p>jerurn:uuid:0dfe68ad-8978-4f7b-a602-f691d19341fb2006-11-24T00:32:55+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by jer<p>Jules wrote: Io has more syntax than that (literals for example), and so has Lisp.</p>
<p>Literals in Io are messages just like any other, they’re not syntactically special. The simple fact that they have quotes for string literals or whatnot is just sugar and not strictly required.</p>tdphuc@centrum.czurn:uuid:639ca6d3-4b78-4ca9-86c4-dd6672162aeb2006-10-31T18:38:11+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by tdphuc@centrum.cz<p>download</p>Robert Feldturn:uuid:d260cb65-e074-4b8b-a2f0-3c3029f2be7a2006-10-30T09:45:19+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by Robert Feldt<p>Very nice visualisations!</p>
<p>One thing that should be noted about Ruby’s grammar is that the lexer is (mildly) context-dependent. So the full complexity of the grammar(s) is not captured in these diagrams. Also this is why the Ruby grammar is not LALR(1)/LL(1) or some other of the simple grammar classes.</p>rueurn:uuid:b0369521-e47b-4466-886e-c67da1b61d642006-10-30T04:47:41+00:002007-08-31T16:32:06+00:00Comment on Visualization of Ruby's Grammar by rue<p>Ed, take a peek at <a href='http://xruby.com/default.aspx' rel="nofollow">http://xruby.com/default.aspx</a>. He supposedly has a full Ruby grammar written for ANTLR.</p>Nickurn:uuid:c956690c-8814-4273-a971-ac53ab96bb772006-10-29T22:48:45+00:002007-08-31T16:32:05+00:00Comment on Visualization of Ruby's Grammar by Nick<p>Rather than spend more time generating however many formats, I’ve uploaded the .dot files, so you can use GraphViz to generate whatever format you’d like.</p>
<p><a href='/files/ruby.dot' rel="nofollow">ruby.dot</a> <a href='/files/c.dot' rel="nofollow">c.dot</a> <a href='/files/java.dot' rel="nofollow">java.dot</a> <a href='/files/ecmascript.dot' rel="nofollow">ecmascript.dot</a> <a href='/files/python.dot' rel="nofollow">python.dot</a></p>
<p>As Bil said, the originals are big enough to read the text, but possibly not hi-res enough for printing, if you want to do such a thing. Generating a pdf or svg from the dot file should serve you well in that respect.</p>Bil Kleburn:uuid:f6553d0b-65bb-4d19-a2e3-580f384ca4572006-10-29T21:00:29+00:002007-08-31T16:32:05+00:00Comment on Visualization of Ruby's Grammar by Bil Kleb<p>JD and Robin: follow the image links to Flickr, and you’ll find hi resolution originals. –Bil</p>Ed Boraskyurn:uuid:4d07659f-a65b-47d8-b05a-0e132a5340c92006-10-29T17:39:28+00:002007-08-31T16:32:04+00:00Comment on Visualization of Ruby's Grammar by Ed Borasky<p>That is interesting … it seems like the existence of an IDE for Antlr grammars would be a good reason to start using Antlr as the tool set of choice for Ruby implementations … anyone have a contrasting opinion?</p>
<p>You might want to post this to the ruby-core and YARV mailing lists.</p>JDurn:uuid:b5bb171f-9c84-4c6a-887f-9fb7a5b6fad42006-10-29T14:19:34+00:002007-08-31T16:32:03+00:00Comment on Visualization of Ruby's Grammar by JD<p>Oh yeah and chiming in with robin… making these already made graphs available in some format like SVG, PDF …etc – where zooming in to get a clear read on the text is possible – would be a great addition to already great work… </p>
<p>– JD</p>JDurn:uuid:3b28874b-34cc-4c16-93d8-c7fd48f7d07c2006-10-29T14:16:37+00:002007-08-31T16:32:03+00:00Comment on Visualization of Ruby's Grammar by JD<p>Interesting stuff. Being a comp.lang hobbyist, I’m curious at seeing the EBNF, C++, C#, and myriad of other language grammars that exist graphed out this way. (Yes I mean EBNF as language grammar itself)… Also I agree with the previous poster who indicated that “just because it’s complex doesn’t mean it’s inherently bad” (paraphrasing)</p>
<p>later</p>
<ul>
<li>JD</li>
</ul>Robinganemccalla@gmail.comurn:uuid:4d402fd7-d7a8-4af0-9c03-2d33d90d9ba22006-10-29T13:07:10+00:002007-08-31T16:32:03+00:00Comment on Visualization of Ruby's Grammar by Robinganemccalla@gmail.com<p>Is there any way I can get a higher resolution version of the image? It looks nice but I can’t actually see the text.</p>Michael Shigorinurn:uuid:dc7ffae3-5a74-4cf9-a274-54060d803bf32006-10-29T12:21:55+00:002007-08-31T16:32:03+00:00Comment on Visualization of Ruby's Grammar by Michael Shigorin<p>Thanks, rather interesting!</p>ACurn:uuid:cdd6bfe6-a332-446f-a613-c87ac079aa152006-10-29T09:08:27+00:002007-08-31T16:32:02+00:00Comment on Visualization of Ruby's Grammar by AC<p>Really FOOLISH.
Know and think about “pseudo-simplicity”</p>Julesurn:uuid:8a41fd4c-ce90-4fdf-9ed4-dbdeb4833f2c2006-10-28T20:42:55+00:002007-08-31T16:32:02+00:00Comment on Visualization of Ruby's Grammar by Jules<p>Io has more syntax than that (literals for example), and so has Lisp.</p>
<p>Forth’s syntax is the simplest: words. Literals are handled from inside the language.</p>quagurn:uuid:9ffb6f38-1a85-4b2f-9d71-71f1cb6fa8c22006-10-28T19:45:40+00:002007-08-31T16:32:02+00:00Comment on Visualization of Ruby's Grammar by quag<p>This is a little bit of a joke. I think the equivalent graph for Io is <a href='http://www.quag.geek.nz/io/message.png' rel="nofollow">http://www.quag.geek.nz/io/message.png</a> . I imagine that lisp, scheme and a few other languages would have similar graphs.</p>Bruce Perensurn:uuid:e802a5c8-a66b-4969-8a46-ca4fae6348cf2006-10-28T19:10:04+00:002007-08-31T16:32:01+00:00Comment on Visualization of Ruby's Grammar by Bruce Perens<p>The centrality of the primary node in the parser indicates that it is complex and multivalent. I think the main implication of that is that the lexical analyzer does a lot of work on that node that does not appear in the parser.</p>
<p>Language gurus used to prefer languages that parsed simply - for example ones that were pure LR(1). It surely makes it easier to explain how the language is parsed. Ruby’s 8000-line parser would offend language purists, if there were any left. But over time we seem to have become more accepting of this degree of complexity, and nobody seems to care greatly about explaining how Ruby is parsed as long as it works.</p>
<p>Then again, Ruby doesn’t seem to have confusion-producing elements like C’s declarations. Certainly the parser doesn’t get in the way of the programmer.</p>
<p>Bruce</p>philurn:uuid:76e6992b-67b2-4b9b-805d-54e1796c01ed2006-10-28T17:56:23+00:002007-08-31T16:32:00+00:00Comment on Visualization of Ruby's Grammar by phil<p>“Bison-to-ANTLR converter” - had no idea that such a beast existed. Seems that there was a project to convert Ruby’s yacc parser to ANTLR, does this mean that it’s trivial to convert the Ruby parser over to ANTLR?</p>
<p>The ultimate goal of many of these projects is to be able to parse Ruby in Ruby - really looking forward to that.</p>Giles Moranturn:uuid:99428e58-2df4-45d9-be78-9bb6311fb79a2006-10-28T10:53:00+00:002007-08-31T16:32:00+00:00Comment on Visualization of Ruby's Grammar by Giles Morant<p>The graphs for C and Python are very different to the ECMAscript, Ruby and Java.</p>
<p>To be honest though, I’m not sure what I’m looking at.</p>
<p>Is the ideal to be lots of many-linked items? Fewer items? Long lines of items?</p>Nick Siegerurn:uuid:0aaea16b-fda8-4754-a306-329aa9187d872006-10-27T16:48:00+00:002007-08-31T16:31:57+00:00Visualization of Ruby's Grammar<p>As part of the momentum surrounding the <a href="http://on-ruby.blogspot.com/2006/10/rubyconf-2006-implementers-summit.html">Ruby implementer’s summit</a>, I have decided to take on a pet project to understand Ruby’s grammar better, with the goal of contributing to an implementation-independent specification of the grammar. Matz mentioned during his keynote how <a href="/articles/2006/10/22/rubyconf-matz-keynote">parse.y was one of the uglier parts of Ruby</a>, but just how ugly?</p>
<p>Well, judge for yourself. Below is a grammar dependency graph generated using <a href="http://www.antlr.org/works/index.html">ANTLRWorks</a> and <a href="http://www.graphviz.org/">GraphViz</a>. The steps I took are as follows. I took parse.y and stripped all C definitions, code and actions from it to give a bare YACC definition. Next, I did the equivalent of <code>gsub(/[kt]([A-Z]+)/, '\1')</code> (since ANTLR’s convention is to have lexer tokens named starting with a capital letter). I then used the <a href="http://www.antlr.org/share/list">Bison-to-ANTLR converter</a> to generate an ANTLR 2.x grammar, which I hand-modified to produce a v3 grammar. Opening the resulting grammar in ANTLRWorks allows you to generate a DOT file from which GraphViz can then generate a jpeg image. I’ve also included visualizations of the Java 1.5 and Javascript (ECMAScript) grammars for comparison.</p>
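<p>To make the token-renaming step concrete, here is a minimal sketch of what that substitution does to a few rules. The method name <code>antlr_token_names</code> and the sample fragment are illustrative assumptions, not the actual script or an excerpt copied from parse.y.</p>

```ruby
# Strip the leading "k" (keyword) or "t" (token) prefix from token names
# so that they start with a capital letter, as ANTLR lexer rules do.
# Hypothetical helper name; the post only describes "the equivalent of" this gsub.
def antlr_token_names(yacc_source)
  yacc_source.gsub(/[kt]([A-Z]+)/, '\1')
end

# Illustrative fragment in the style of a bare YACC definition.
fragment = <<~YACC
  primary : literal
          | kBEGIN bodystmt kEND
          | tLPAREN compstmt ')'
          ;
YACC

puts antlr_token_names(fragment)
```

<p>Lowercase rule names such as <code>bodystmt</code> are left alone because the pattern only fires when the <code>k</code> or <code>t</code> is immediately followed by an uppercase letter.</p>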
<p>I haven’t even begun to absorb all the meanings from this picture, but one stark difference between Ruby and the other two is the node in the middle of the picture with a high concentration of outgoing edges. That node is called <code>primary</code> in the grammar definition, and it is probably one of the reasons that Ruby syntax is so flexible and forgiving. A primary node’s direct children apparently represent a large portion of the syntax, and explain why in Ruby a single statement can be a literal, a method invocation (or series of them), a standalone expression (such as <code>a &lt; b</code>), all the way up to larger syntactic groupings such as <code>if ... else ... end</code> and <code>begin ... rescue ... end</code>, among many others.</p>
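<p>One rough way to check the “high concentration of outgoing edges” observation without squinting at the image is to count edges per node in the DOT source. The sketch below assumes plain <code>source -&gt; target</code> edge statements with simple identifiers; the DOT that ANTLRWorks actually emits may be formatted differently, and both the <code>out_degrees</code> helper and the sample graph are hypothetical.</p>

```ruby
# Count outgoing edges per node in DOT source.
# Assumes one "source -> target" edge per line with word-like node names;
# real generated DOT may need a more careful parser.
def out_degrees(dot_source)
  counts = Hash.new(0)
  dot_source.scan(/^\s*"?(\w+)"?\s*->\s*"?(\w+)"?/) { |from, _to| counts[from] += 1 }
  counts
end

# Made-up sample graph, not taken from ruby.dot.
sample = <<~DOT
  digraph grammar {
    primary -> literal;
    primary -> method_call;
    primary -> var_ref;
    stmt -> expr;
  }
DOT

hub, degree = out_degrees(sample).max_by { |_node, n| n }
puts "#{hub}: #{degree} outgoing edges"
```

<p>Sorting the whole hash by count instead of taking only the maximum would show how far the top node stands apart from the rest of the rules.</p>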
<h2>Ruby</h2>
<p><a href="http://www.flickr.com/photos/nicksieger/280661836/" title="Photo Sharing"><img src="http://static.flickr.com/93/280661836_e477a01932.jpg" width="500" height="290" alt="Ruby 1.8.4 grammar dependency graph" /></a></p>
<h2>Java 1.5</h2>
<p><em>Generated from <a href="http://antlr.org/grammar/1152141644268/java.g">Java 1.5 grammar on antlr.org</a></em></p>
<p><a href="http://www.flickr.com/photos/nicksieger/280662707/" title="Photo Sharing"><img src="http://static.flickr.com/119/280662707_5d335ac808.jpg" width="500" height="462" alt="Java 1.5 grammar dependency graph" /></a></p>
<h2>Javascript</h2>
<p><em>Generated from <a href="http://antlr.org/grammar/1153976512034/ecmascriptA3.g">ECMAScript grammar on antlr.org</a></em></p>
<p><a href="http://www.flickr.com/photos/nicksieger/280662871/" title="Photo Sharing"><img src="http://static.flickr.com/109/280662871_a53a2680ce.jpg" width="367" height="500" alt="ECMAScript (Javascript) grammar dependency graph" /></a></p>