Merb Mind Maps

By Phlip Plumlee
February 15, 2009 | Comments: 3

All blogs correlate their posts with tags. This blog post shows how to use these tags to display a mind map, hooking the current post into a tree of related posts.

Our output will look like this — a mind-map in the upper right corner of a blog post, with links to other posts:

        (Scroll right and down to see the full map!)

jammin_blog.png

Clicking any node should navigate to that post. The mind map will pull related Posts closer together, and push unrelated ones farther away.

(For our sample data, I have simulated a hypothetical blog that discusses unremarkable details of pop music.)

Ingredient List

We will use Merb, for the blog engine, RSpec to drive its behavior, and GraphViz to render the mind maps. If your blog uses no Merb or RSpec, you could easily apply these techniques to its infrastructure.

GraphViz provides a light language to declare graphs. Read its dotguide.pdf to learn all about it. Another good resource is An Introduction to GraphViz and dot, by Michele Simionato.

To start this experiment, I found a simple and easy Merb blog: wolfmanblog, by Jim Morris. This project would work with any blog (that has Posts and Tags!) because it uses very little of the rest of the blogging architecture, so you should start with whatever system you find easiest to install and work with.

To get you over the Merb learning curve, my marginalia will frequently compare it to Rails. To get you over the Ruby on Rails learning curve ... read any of the billion books, blogs, and job listings about it!

Our RSpec specifications will showcase assert{ 2.0 }, and its new XPath matching system, xpath{}. They allow very lean Ruby expressions to test raw XML in arbitrarily complex ways.

Our ingredient list also includes a little graph theory. Here is all of it, in a nutshell:

node
(or vertex) the "city" or "intersection" in a graph
edge
a link between two nodes
path
a route that steps from node to edge to node...
connected
a path exists between any two nodes
cycle
more than one path between any two nodes
tree
a connected graph with no cycles.

Our project starts by building a class to manage that graph:

class MindMap
  def initialize(post = nil)
    @current_post = post
  end
end

Merb works best with RSpec tests. This project needs a batch of Posts and Tags to work with, so we use fixture_dependencies to supply them to each specification.

Here's all of spec/fixtures/posts.yml:

C30_C60_C90_Go:
  title:     'C30, C60, C90, Go'
  body:      'press my playback to make it last'
  author:    'Bow Wow Wow'
  tags:    [ :new_wave, :british, :surf, :downbeat ]

Diamond_Dogs:
  title:     'Diamond Dogs'
  body:      'Ten thousand peoploids'
  author:    'David Bowie'
  tags:    [ :new_wave, :british, :rock, :downbeat ]

Jammin:
  title:     "Jammin'"
  body:      'in the name of the Lord'
  author:    'Bob Marley'
  tags:    [ :roots, :reggae, :back_beat, :music ]

Joan_Crawford_Has_Risen:
  title:     'Joan Crawford has Risen'
  body:      'From the Grave'
  author:    'Blue Oyster Cult'
  tags:    [ :progressive, :rock, :downbeat, :music ]

Lithium:
  title:     'Lithium'
  body:      'I light candles'
  author:    'Nirvana'
  tags:    [ :grunge, :rock, :downbeat, :music ]

One_Step_Beyond:
  title:     'One Step Beyond'
  body:      'Rocksteady'
  author:    'Madness'
  tags:    [ :ska, :back_beat, :music ]

Oryx_vs_Crake:
  title:     'Oryx and Crake'
  body:      'corknut!'
  author:    'Margaret Atwood'
  tags:    [ :book ]

Sardonicus:
  title:     'Sardonicus'
  body:      "is everybody's friend"
  author:    'UB40'
  tags:    [ :progressive, :reggae, :back_beat, :music ]

Surfin_USA:
  title:     "Surfin' USA"
  body:      'San Onofree'
  author:    'The Beach Boys'
  tags:    [ :surf ]

And the crucial yet redundant spec/fixtures/tags.yml:

back_beat:   { name: back_beat   }
book:        { name: book        }
british:     { name: british     }
downbeat:    { name: downbeat    }
grunge:      { name: grunge      }
music:       { name: music       }
progressive: { name: progressive }
music:       { name: music       }
new_wave:    { name: new wave    }
reggae:      { name: reggae      }
rock:        { name: rock        }
rocksteady:  { name: rocksteady  }
roots:       { name: roots       }
ska:         { name: ska         }
surf:        { name: surf        }

They require this warm-up code, for a more Rails-like fixture experience:

require 'fixture_dependencies/test_unit/sequel'
require 'fixture_dependencies/sequel'
FixtureDependencies.fixture_path = Merb.root + '/spec/fixtures'
  # ^ from spec/spec_helper.rb

require File.join( File.dirname(__FILE__), '..', "spec_helper" )
require 'assert2/xpath'

describe MindMap do

  setup do
    FixtureDependencies.load(:posts)
    FixtureDependencies.load(:tags)
       #  note that all specs will still work
  end  #  even if we add new fixtures

  def load(model, symbols)
    symbols.map!{|symbol| :"#{model}__#{symbol}" }
    return FixtureDependencies.load(*symbols)
  end

  def posts(*symbols);  load(:post, symbols);  end
  def tags(*symbols);   load(:tag,  symbols);  end

Note that some fixture systems only load into each specification the exact set of database records that it uses, and some systems use the more primitive functionality that Ruby on Rails enjoys: They dump every fixture into every test case. This can cause fragility when adding new fixtures that old tests do not expect.

Our Merb Mind Maps project will use the Rails-like system, partly to avoid excessive clutter in each specification, and partly to illustrate how to write assertions that flex gracefully when new fixtures appear.

Minimum Spanning Tree

This project will draw a graph of Posts, with edges between them representing their tags in common. Posts with more tags in common will be closer together, so the most basic method is the one that takes two Posts and counts how many Tags they share:

  it 'should rate two posts by affinity' do
    map  = MindMap.new
    jam  = posts(:Jammin)
    sard = posts(:Sardonicus)
    lith = posts(:Lithium)
    oryx = posts(:Oryx_vs_Crake)
    map.affinity(jam, sard).should == 3  #  reggae, back_beat, & music
    map.affinity(jam, lith).should == 1  #  music
    map.affinity(jam, oryx).should == 0  #  a book!
  end

  class ::MindMap
    def affinity(from, to)
      (from.tags & to.tags).length
    end  #  the & is Ruby's set intersection operator
  end

The next specification selects the song "Jammin'", by Bob Marley, for the "current Post". Reggae songs will have higher affinity to "Jammin'" than Rock songs (with a different beat structure).

Our sample data have one Post with no affinity at all — a book! It has no tags in common with "Jammin'".

Now the fun starts. If we rendered this list of Posts as a graph, with an edge for each positive affinity, we would not get a mind map. We would get a "mind mesh", with too many edges!

We need to cull all those lines, so each remaining edge represents the highest value edge from one Post to another. The previous specifications have already started Prim's Algorithm for Minimum Spanning Tree, by sorting the Posts by affinity. The next step will make a list of the value of every edge.

We need an array that can sort correctly, so each element is a sub-array containing the sort criteria. We start with a specification for the method that converts two Posts into one element of that array:

  it 'should produce elements of arrays sortable by affinity' do
    jam  = posts(:Jammin)
    map  = MindMap.new(jam)
    one  = posts(:One_Step_Beyond)
    joan = posts(:Joan_Crawford_Has_Risen)
    oryx = posts(:Oryx_vs_Crake)
    jam_to_one  = map.affinity_edge(jam,  one)
    jam_to_joan = map.affinity_edge(jam, joan)
    joan_to_one = map.affinity_edge(joan, one)
    jam_to_oryx = map.affinity_edge(jam, oryx)

                       #   +----- 0 == this edge links to the current Post
                       #   |   +-- "cost" of this edge
                       #   |   |   +-- Posts in this edge
                       #   v   v   v
    jam_to_one .should == [0, -2, jam,  one ]
    jam_to_joan.should == [0, -1, jam,  joan]
    joan_to_one.should == [1, -1, joan, one ]
    jam_to_oryx.should == [0,  0, jam,  oryx]
  end  #  sorting an array of those will find the maximum value tree

  class ::MindMap
    def affinity_edge(from, to)
      return [ from == @current_post || to == @current_post ? 0 : 1,
              -affinity(from, to),
               from, to ]
    end
  end

(Parenthetically, this post is beginning to show the lower limits of the BDD concept. The verbiage "elements of arrays sortable by affinity" is not customer-facing, so painstakingly expressing it as a complete English sentence might be a Pyrrhic Victory!)

The next method must return an array of all of those affinity_edges. No edge must duplicate — even by reversing its [from, to] components. And we cull out edges with no affinity — Post pairs with no Tags in common.

To make this bookkeeping easier, we will query the Posts from the database in order by title. This is an artificial constraint that won't affect the output.

  it 'should produce elements of arrays sortable by affinity' do
    jam   = posts(:Jammin)
    map   = MindMap.new(jam)
    one   = posts(:One_Step_Beyond)
    joan  = posts(:Joan_Crawford_Has_Risen)
    oryx  = posts(:Oryx_vs_Crake)  #  the vs is an inside joke... (-:
    edges = map.affinity_edges
    edges.length.should == edges.uniq.length
    assert{ [0, -2, jam,  one ].in?(edges) }
    assert{ [0, -1, jam,  joan].in?(edges) }
    pairs = edges.map{|c,a,f,t| [f,t]}
    assert{  [joan, one ].in?(pairs) }
    assert{ ![jam,  oryx].in?(pairs) }  #  no edge!
    assert{ ![one,  jam ].in?(pairs) }  #  not unique!
    assert{ ![joan, jam ].in?(pairs) }
    assert{ ![one,  joan].in?(pairs) }
  end  #  note that an "assert not" must relax more than an assert!

  class ::Post
    def inspect;  "<#{title}>";  end
  end  #  this makes diagnostics easier to understand

  class ::MindMap
    def affinity_edges
      posts = Post.order(:title).all  
        #  like ActiveRecord Post.find(:all, :order => 'title')
        
      returning [] do |edges|
        posts.each_with_index do |from, index|
          posts[index + 1 .. -1].each do |to|
            a = affinity_edge(from, to)
            edges << a if a[1] != 0
          end
        end
      end
    end
  end

Now that we have an array ready for sorting, we just need to sort it! Riiight??

  it 'should produce elements of arrays sorted by affinity' do
    jam, sard, one, joan = posts( :Jammin, :Sardonicus, :One_Step_Beyond,
                                  :Joan_Crawford_Has_Risen )
    map   = MindMap.new(jam)
    edges = map.sorted_affinity_edges
    jam_to_sard  = edges.index([0, -3, jam,  sard])  #  both are reggae
    jam_to_one   = edges.index([1, -2, jam,  one ])  #  both have a back beat
    jam_to_joan  = edges.index([1, -1, jam,  joan])  #  both are music
    joan_to_sard = edges.index([1, -2, joan, sard])  #  both are progressive
    jam_to_sard .should == 0
    jam_to_sard .should < jam_to_one
    jam_to_one  .should < jam_to_joan
    joan_to_sard.should < jam_to_joan
    edges.should == edges.sort
  end

That specification grew a little bit more complex. When we use these edges to build a graph, the first edge must link from our current Post to its closest neighbor post. That's very simply because our "mind mesh" might contain many disconnected sub-graphs, with no edges between them. (Think of a blog that alternates between discussions of programming and cartooning, for example!) If we pick the first node from the wrong subgraph, then our output tree would not contain our current Node!

However, if we leave more than one 0 in the first column of that table, our tree would go star-shaped. Low-value edges would appear in the graph, merely because they connect to the current Post. And high-value edges would get culled when they form cycles with low-value paths to the current Post.

We need a tree that shows how to wander around in our blog; a tree that emphasizes the clusters of topics.

In our specific example, if the first column in our table remained 0, then a low-value edge from "Jammin'" to "Joan Crawford Has Risen", for example, would trump a high-value edge, such as from "Lithium" to "Joan...". We need a graph that shows the paths away from the current Post to others, so we are going for a tree, with a trunk and branches.

To satisfy this requirement, the sorter will sort once, then remove all the 0s in the table's left column after the first one, then sort again. This technique preserves the current Post's supremacy in the graph, while removing this bias from any other edge.

Got all that? (-:

  class ::Post
    def <=> other
      self.title <=> other.title
    end  #  a tie breaker...
  end

  class ::MindMap
    def sorted_affinity_edges
      edges = affinity_edges.sort
      edges[1..-1].each{|edge| edge[0] = 1 }
      return edges.sort
    end  #  that wasn't too heinous, now was it?
  end

The first sort found the "trunk" — the highest value edge from the current Post. The second sort disillusioned every other Post away from abject worship of the current Post.



  

That adjacency table now looks a little bit like this:

  [[0, -3, <Jammin'>, <Sardonicus>],
   [1, -3, <Joan Crawford has Risen>, <Lithium>],
   [1, -2, <Jammin'>, <One Step Beyond>],
   [1, -2, <Joan Crawford has Risen>, <Sardonicus>],
   [1, -2, <One Step Beyond>, <Sardonicus>],
   [1, -1, <Jammin'>, <Joan Crawford has Risen>],
   [1, -1, <Jammin'>, <Lithium>],
   ...
   [1, -1, <Joan Crawford has Risen>, <One Step Beyond>]]

Prim's algorithm now commands us to start with the highest value edge (the "minimum cost"), and then add every edge in descending order of value to a new tree, only if one end is in the new graph and the other end is outside. We need a function that takes an adjacency table and returns this culled list. Any Posts that find no path to "Jammin'" will drop out, and any low-value edges within the neighborhood of "Jammin'" will drop out.

The return value is a list of pairs of Posts, representing each edge that survived the culling. We will assert that our high-value edges remained, and low value ones disappeared.

As usual, for a convenience, each pair appears in collating order by title.

  it 'should cull low-value Posts and edges' do
    jam, sard, one, joan = posts( :Jammin, :Sardonicus, 
                                  :One_Step_Beyond,
                                  :Joan_Crawford_Has_Risen )
    map = MindMap.new(jam)
    pairs = map.cull
    assert{  [jam, sard].in?(pairs) }  #  essentially the trunk!
    assert{ ![jam, joan].in?(pairs) }  #  because joan is closer to other nodes
    assert{ ![one, sard].in?(pairs) }  #  to avoid a cycle with Jammin...
  end

  class ::MindMap
    def cull
      table = sorted_affinity_edges
      posts = {}

      returning [] do |edges|
        table.map do |prime, weight, from, to|
          if prime == 0 or posts[from.title] != posts[to.title]
            posts[from.title] = true
            posts[  to.title] = true
            edges << [from, to]
          end
        end  #  Prim's minimum spanning tree algorithm
      end
    end
  end

Our mind map has been processed all the way from a cloud of Posts and Tags into an array of objects. Inspecting them yields a structure like this:

  [[<Jammin'>, <Sardonicus>],
   [<Jammin'>, <One Step Beyond>],
   [<Joan Crawford has Risen>, <Sardonicus>],
   ...
   [<Jammin'>, <Lithium>]]

Now we will get to see that table rendered as a graph.

To the GraphViz-Mobile!

A healthy dot file looks a little bit like this:

  graph mind_map {
     graph [bgcolor = "transparent"];
     post_3767 [label = "Jammin'", shape = box];
     post_3768 [label = "Sardonicus", shape = box];
     ...
     post_3767 -- post_3768;
     post_3767 -- post_3770;
   }

The first line with post_3767 is the current Post, with its label and shape, and the lower ones are the edges -- between Posts.

We will start by generating the list of Posts:

  it "should create a graph containing the current Post's title" do
    post = posts(:Jammin)
    map = MindMap.new(post)
    dot = map.to_dot
    dot.should match(/graph mind_map \{/)
    dot.should match(/post_#{post.id}.*label = .#{post.title}/)
  end

That specification forces us to push the current Node into DOT notation. We will use the GraphvizR gem:

  class ::MindMap
    def to_dot
      require 'graphviz_r'
      gvr = GraphvizR.new('mind_map')
      post = @current_post
      gvr["post_#{ post.id }"][:label => post.title, :shape => :box]
      return gvr.to_dot
    end
  end

The line with gvr[...][...gvr] pushes one node in.

Close inspection reveals that code only emits one Post. Our new specifications should never demand more than a tiny increment of new behavior. The next specification will look at more than one Post. We must specify that the usual suspects show up, and the culled ones do not:

  it 'should push related Posts into graph nodes' do
    suspects = posts( :Jammin, :Sardonicus, :One_Step_Beyond,
                      :Joan_Crawford_Has_Risen, :Lithium )

    dot = MindMap.new(suspects.first).to_dot

    suspects.each do |post|
      dot.should match(/post_#{ post.id }.*label.*#{ post.title }/)
    end
    
    dot.should_not match(/Oryx and Crake/)
  end

That specification requires our code to figure out the difference between culled Posts ("Oryx and Crake") and the musical ones. The simplest way to do that is call our cull() method!

  class ::MindMap
    def to_dot
      require 'graphviz_r'
      gvr = GraphvizR.new('mind_map')
      edges = cull()

      edges.flatten.uniq.each do |post|
        gvr["post_#{ post.id }"][:label => post.title, :shape => :box]
      end

      return gvr.to_dot
    end
  end

Rendering that beast gives:

unlinked.png

The edges are missing! When we push them in, GraphViz will take care of the layout and formatting.

  it 'should link Posts by maximum affinity' do
    jam, sard = posts(:Jammin, :Sardonicus)
    dot = MindMap.new(jam).to_dot
    dot.should match(/post_#{ jam.id } -- post_#{ sard.id }/)
  end

Notice that both our specifications and our code force that pair's elements to appear in collating order. "Sardonicus" will never precede "Jammin'".

Without that constraint, adding more fixtures might affect the sorting algorithm — even if it still worked correctly. Our specifications would report a false negative.

Fixture Friction

By making our algorithm a little more strict, our specifications can remain more flexible. No matter how many fixtures we add to our specifications, for other features, these assertions will still pass.

Further, assert calls should be very strict, and deny calls (or assert not calls) should be very loose. A deny assertion that requires [1, -2, <foo>, <bar>] to never appear should not pass, and generate a false positive, if [1, -3, <foo>, <bar>] appears!

This post answers a very important FAQ in the TDD community: How do you test-first that your server is secure? or your algorithm is efficient? or your View is beautiful? The answer is you do not: You TDD the details of a published algorithm, and only specify that your implementation will remain on-track.

These techniques ensure that, as you change your program, your algorithm will resist bugs. Your server will remain secure, and your GUI beautiful, even if you add more data fixtures to your tests. If a new fixture makes a test fail, you have learned something valuable about your implementation!

This code passes the last specification:

  class ::MindMap
    def style_node(post)
      return { :fontsize => post == @current_post ? '12' : '9',
            :peripheries => post == @current_post ?  '2' : '0',
                  :color => :white, 
                  :label => post.title,
                  :shape => :box,
                  :style => :filled,
                  :width => '0',
                 :height => '0',
                  }
    end  #  BDD-ing this method directly would be easy... it decouples from GraphvizR

    def to_dot
      require 'graphviz_r'
      gvr = GraphvizR.new('mind_map')
      gvr.graph[:bgcolor => 'transparent']
      edges = cull()

      edges.flatten.uniq.each do |post|
        def post.dot_id; "post_#{ id }"; end
        gvr[post.dot_id][ style_node(post) ]
      end

      edges.each{ |from, to|  gvr[from.dot_id] - gvr[to.dot_id]  }
      return gvr.to_dot                #  that ^ declares a GraphViz edge!
    end
  end

That produces a graph very similar to the one at the top of this post. Now we must render it as a PNG file, and stick it into our View.

To specify our MindMap can produce PNG files, we will first configure the GraphViz dot command to produce SVG files. We specify these using XPath to query internal signatures, such as:

  <g id="edge2" class="edge"><title>post_4193--post_4196</title>

dot adds invisible metadata to the SVG, so if we can spot one of them, then dot is doing its job:

  it 'should generate SVG' do
    posts = posts(:C30_C60_C90_Go, :Lithium)
    image = MindMap.new(posts.first).to_image(:svg)
    path  = Pathname.new(Merb.root) + "public" + image
    assert_xhtml path.read
    
    assert do
      xpath :"g[ @class = 'edge' ]/title[ contains(., 'post_#{posts.first.id}') and contains(., 'post_#{posts.last.id}') ]"
    end
  end

(The fun stuff with contains() happened because dot's XML module mangled its innocent -- dashes, and I don't feel like researching a charismatic and blog-worthy system to unmangle them. contains() is a very good generic technique for XPath testing, so we will illustrate that instead.)

  class ::MindMap
    def to_image(format)
      image = "images/post_#{@current_post.id}.#{format}"
      path = Pathname.new(Merb.root) + "public" + image
      command = "dot -T#{format} -o #{path.to_s.inspect}"
      IO.popen(command, 'w'){|f| f.write(to_dot) }
      return image
    end
  end

View the Finished Mind-Map

The last mile of the marathon is in sight! Now all we need is a specification that forces our image to appear in an output page...

  it 'should decorate the Post page with a mind map' do
    jam_id = posts(:Jammin).id
    @response = request("/posts/#{jam_id}")
    assert_xhtml @response.body

    xpath :div, :class => :post do
      xpath :div, :style => 'clear:right; float:right;' do
        xpath :img, :src => "/images/post_#{jam_id}.png"
      end
    end
  end

That specification inspires us to add our mind map to the upper left corner of a post.

Forcing that PNG file to be a real mind map generated by MindMap is left as an exercize for the reader!

Here's the new action code in the controller:

class ::Posts < Application

  # GET /posts/:id
  def show
        provides :rss
    @post = Post[params[:id]]
    @image = MindMap.new(@post).to_image(:png)
    raise NotFound unless @post
    display @post
  end
  
end

And here's the new View code (in glorious HAML):

 .post
    %div{ :style => 'clear:right; float:right;' }
      %img{ :src => '/' + @image }
    %h2= @post.title

It looks more familiar as its HTML output:

        <div class='post'>
           <div style='clear:right; float:right;'>
             <img src='/images/post_3408.png' />
           </div>
          <h2>Jammin'</h2>
          <p class='auth'>Posted by Bob Marley <span class='typo_date'>on Fri Feb 13 20:21:27 -0800 2009</span></p>

Do List

The tools we invented should make these minor details very easy to specify and code:

  • No reusable helper class is complete without a gem project and an acts_as_mind_map directive...
  • MindMap#to_mind is overdue for a merciless Object Method Refactor.
  • The code currently rebuilds a map each time you look at a blog. They need the usual time-stamp system, so a busy website can efficiently re-use them.
  • Blogs with too many posts need a way to limit the total neighborhood of the current Post. (The alternative — blogs without too many posts — is the subject of an upcoming blog post;)
  • To put that drop-shadow on the graph, run it thru this jaw-breaker ImageMagick command: convert post_3408.png \( +clone -background black -shadow 75x3+4+5 \) +swap -background transparent -layers merge +repage post_3408_shadow.png
  • If we can build mind-maps by writing blog posts, then writing blog posts that author mind-maps is right around the corner. Consider delivering a map as SVG, and using its metadata in Ajax commands, to edit it directly in the surface of your website.

All blogs use tags (and sometimes tag clouds) to link their posts, so any blog could easily take advantage of these techniques, to turn their existing tags into a mind map!


You might also be interested in:

3 Comments

BTW I forgot the shim that connects RSpec to traditional Test::Unit::Assertions. It be:

Spec::Runner.configure do |c|
c.include Test::Unit::Assertions
end

I liked the rails, tags and mindmap combination. Would be cool if you put the code up on github or somewhere so readers could easily play with it.

The thing about productization is I don't have a blog, so I wouldn't know if the system installed correctly.

So consider me pinged!

News Topics

Recommended for You

Got a Question?