ttdot: Drawing Trees in Dot

1. Introduction

ttdot (tree-to-dot) is a preprocessor for the dot graph description and visualization language. It takes an input that describes a rooted tree in a form where node branching is represented by indentation, and produces a description of that tree in dot. The result can be transformed to a graphics file using graphviz, or it can be further developed as a dot file.

Binary trees are recognized as a special kind of rooted trees and are processed accordingly.

The dot language has no means of distinguishing a tree from a graph in general. Because of that, a description of a tree in the language is necessarily redundant, each node being repeated for each edge that links it to another node. Besides, not knowing, in general, that it draws a rooted tree means that graphviz will not necessarily produce a visually symmetric and otherwise specific diagram which we are used to with regard to trees.

Actually, the dot layout processor of graphviz is smart enough to automatically detect if a graph is indeed a tree and is very good at finding a space-optimal layout, which is indispensable for really large trees. However, in many practical cases one may be willing to trade space-minimality for legibility and other benefits of aesthetic character.

Binary trees are particularly problematic for dot, because a node with no siblings must be specifically identified as a left or right descendant of its parent, which dot as a language does not provide for, and which cannot be derived automatically.

The only way that graphviz can be guaranteed to produce a rooted tree drawing with a layout that abides with certain requirements is to compute the node coordinates oneself and include them in the dot file that is feeded to graphviz. Of course, computing coordinates must still be done automatically.

Generating a dot file from a simple tree description, along with ensuring specific tree layouts by arranging nodes independently of graphviz, is basically what ttdot does.

Compared to using graphviz alone, enhancing it with ttdot brings the following benefits:

It provides a simple and adequate language for describing trees that can be used as a handier alternative, or an addition, to the dot language. A tree description for ttdot is easy not only to generate, but also to edit, both by hand and programmatically.
For not too large trees, the tree structure is easily discernible by looking at the description file. Worth noticing is that the tree description format is almost identical to the output of the tree utility in the most popular operating systems [1,2,3,4].
The presence of nodes with the same labels does not require, as it does in dot, introducing formal node names solely for distinguishing purposes.
Directed or non-directed dot graphs can be produced from the same tree description.
ttdot implements simple but useful tree-specific layouts not available in graphviz. Whether to use these layouts or stick to graphviz's own node arrangement mechanisms is up to the user: one could choose to do the latter and still make use of ttdot's tree description language.
Tree diagrams can be produced that show a tree with the root oriented upwards, downwards, or leftwards — this is achieved through a tiny variation of the way ttdot is called.
One can set the exact horizontal and vertical spacing between nodes. (In the dot language this is possible only to a certain extent, in some cases, and rather indirectly.)

2. Rooted Trees

An input file for ttdot representing a rooted tree must be structured according to the following rules.

The tree representation consists of a sequence of non-blank lines, each corresponding to a node of the tree.
The tree representation begins on the first line of the file, unless that line is a heading, in which case the tree is represented on the lines immediately following the heading.
The first line of the file is considered a heading if the very first character on it is =. A heading may contain information pertaining to the tree as a whole.
The lines of the tree representation map to the nodes of the tree in what is known as prefix traversal order: a node is followed by the nodes of its leftmost sub-tree, then the nodes of its second leftmost sub-tree, etc., this recursively applying to every non-leaf node, starting from the root of the tree.
Descending between nodes is defined by indentation as follows: all nodes that immediately succeed a given node X must be equally indented, with an offset greater than the one of X. Indentation is measured by the offset of the first non-blank character with respect to the beginning of the line.
The tree representation extends to the end of the file or up to the first blank line if there is one. A blank line is an empty line or one that contains only blank characters.
All lines in the file, beginning with the first blank line, if any, are ignored by ttdot.
Each line of the tree representation consists of at most the following three parts, in this order, and possibly separated by blanks:
- a string of the kind cLc, where c is any non-blank character not encountered within L.
- a string of the kind [N], and
- a string of the kind {E}.
L, N, and E are, from ttdot's point of view, arbitrary, possibly empty strings. L is the string to be displayed as node contents (the node label). N contains one or more node attributes in the dot language. Similarly, E contains one or more attributes for the edge that links the given node to its parent.
Each of the [N] and [E] parts or both may be omitted. If present, their contents are not checked by ttdot, and simply copied to the dot output.
The remainder of the heading after the = character may contain parameter settings, possibly followed by a string of the kind [G]. A parameter is any of the following:
- the name diam, followed by the diameter of the circle (if it is a circle) designating the node geometrically;
- the name dx, followed by the horizontal distance between adjacent nodes;
- the name dy, followed by the vertical distance between nodes of adjacent levels.
These values are positive integers denoting millimeters. The parameters can be given in any order, and any or all of them can be omitted or repeated. The defaults are diam = 8, dx = 10, and dy = 12.
G is a string containing one or more graph attributes (including global attributes of nodes and edges) in the dot language, which are passed unchecked to the dot text generated by ttdot.

3. Binary Trees

The input files for binary trees are similar to those for general rooted trees, but the rule 4. must be modified as follows.
Each node has at most two descendants, and if a node has only one, the latter is assumed to be a left descendant. The case of a right descendant that has no left sibling is handled by using a dummy node in place of the missing left descendant: a line holding a single non-blank character, indented so as to align with the right descendant.

Lines with a single non-blank character are also allowed in rooted trees, where they are simply skipped. This is so that any binary tree can be processed by ttdot as an ordinary rooted tree.

4. Node Arrangement

ttdot arranges the nodes of rooted and binary trees according to simple principles that result in predictable, readable, and relatively compact layout.

An archetypal tree is one that links a node to one or several descending nodes. The tree can be pictured like this:, where the descending nodes are evenly spaced and the root is centred above them. Drawing rooted trees in ttdot generalizes on this basic idea and is summarized in the following rules.

The leaves of a rooted tree, if projected onto the same horizontal line, will always preserve the left-to-right order of traversing the tree, and two adjacent leaves on this line are always the same distance (dx) one from another. A non-leaf node is always centred with respect to the horizontal extent of the sub-tree rooted at this node. And of course, nodes of the same level (i.e. equally distant from the root) are placed on the same horizontal, any two adjacent levels being at a vertical distance dy from each other.

Somewhat different rules apply for a binary tree. Here, all the nodes of the tree, if projected onto the same horizontal line, will preserve the left-to-right order (a.k.a. inorder) of traversing the tree, and are evenly spaced at a distance dx. It follows that for the whole tree or any of its sub-trees the root is to the right of all its left descendants and to the left of all its right descendants. In other words, a vertical line from any node of a binary tree never crosses an edge or touches another node of the tree.

5. Usage

ttdot reads its input — a tree description — from the standard input, and writes the result in the dot language to the standard output. Thus, assuming that ttdot is in a visible directory and that a tree is defined in the file a.tree, a dot file, say a.dot, can be obtained by running

    ttdot <a.tree >a.dot

By default ttdot assumes that its input is an ordinary rooted tree. The b option is used to tell ttdot to consider the input as a binary tree:

    ttdot b <a.tree >a.dot

Another option, d, instructs ttdot to produce a directed graph, i.e., to put arrowheads at the ends of the edges. Yet another option, x, causes the \(x\) and \(y\) coordinates to be exchanged, so in the resulting diagram the tree’s root will be on the left. If two or all the three options are given in an invocation to ttdot, they must be written as one word, with no space between them. The order of the options is irrelevant.

ttdot assumes nothing of the input file names or their extensions — they are entirely at user’s choice.

Of the several graphviz layout engines, neato is the one that accepts nodes with fixed positions when such are given. So here is how to get a file with, say, an SVG drawing from the source file a.tree:

    (ttdot | neato -n -Tsvg) <a.tree >a.svg

The so obtained diagram shows the tree as Nature has her trees: with the root at the bottom. This is because ttdot assigns larger \(y\)-coordinates to nodes that are farther from the root, and in the coordinate system of the dot language \(y\)-coordinates increase upwards. Most often, however, graphical trees are drawn with the root at the top of the diagram. This can be achieved by telling neato to reverse the direction of \(y\)-coordinates, but the latter requires an intermediate transform before producing an actual graphical output, such as SVG or PDF. Thus, to do the job in this case, a call of ttdot and two calls of neato are needed, which is unwieldy.

In order to enable the above and other variants of producing graphical trees in an easy to use form, the scripts mktree for Linux and mktree.bat for MS Windows are provided along with ttdot, and it is suggested that ttdot is mostly used through these scripts rather than directly. The description below applies to both scripts.

The scripts assume that ttdot, neato, and dot reside in visible directories, and accept two or three arguments. The first argument is the name of the file that defines the tree as required by ttdot. The proper name can be prefixed with a path as needed. The second argument specifies the output format: svg, ps, pdf, etc. This is also the filename extension of the output file. The output file is created in the same directory, and is given the same name as the input file, but is given the new extension. If the input file name itself has one or more extensions, the last of them is dropped before appending the output-specific extension. Thus, given an input file, e.g. some/path/names/tree.txt, the command

    mktree some/path/names/tree.txt svg

will produce an SVG file some/path/names/tree.svg. The same will happen if the input file is some/path/names/tree and the command

    mktree some/path/names/tree svg

is issued.

A third argument to mktree (or mktree.bat), if present, is one to be passed to ttdot — b, d, x or any combination of these — to which any of the letters i and n, or both, can be added. (i and n are processed by the script itself, and the remainder of the third argument is passed to ttdot.)

The option i refers to inverting the vertical orientation of the tree, so that its root, which by default is placed at the top of the diagram, goes to the bottom. With the option n the script is instructed to use dot in place of neato to generate the diagram. This means that the coordinates assigned to the tree’s nodes by ttdot are ignored and dot's own layout mechanism is used. This also implies that the b and x options of ttdot have no effect in the presence of n. The i option still applies, so one can make use of dot for layout but invert the resulting diagram.

Note that using the i and x options alone or together has these effects on the layout of the tree:

none of `i` and `x`	the tree is drawn in ‘normal’ position, i.e. with the root at the top;
`i` and no `x`	the tree is ‘inversed’, i.e. with its root at the bottom;
`x` and no `i`	the tree’s root is on the left of the diagram, with tree’s own ‘left’ up;
both `x` and `i`	the tree’s root is on the left of the diagram, with tree’s own ‘left’ down.

6. Examples

Here are some examples of input files and the corresponding tree diagrams that result from applying ttdot and graphviz.

A binary tree with one empty node, a missing left node, and some same-labeled nodes

-49-
   --
       -?-
           -5-
           -28-
       -45-
   -?-
      -52-
         -
         -?-

A binary tree with non-enclosed nodes of varying sizes

= dx 22 dy 20 [fixedsize=false shape=none]
=Elizabeth Tudor\n(Elizabeth I)=
  1Henry Tudor\n(Henry VIII)1
    2Henry Tudor\n(Henry VII)2
    3Elizabeth Plantagenet\nof York3
  4Anne Boleyn4
    5Thomas Boleyn5
    6Elizabeth Howard6

A rooted tree

= dx 14 dy 21
'A'
     'B'
          'E'
               'M'
               'N'
          'F'
          'G'
               'O'
               'P'
               'Q'
          'H'
               'R'
               'S'
     'C'
          'I'
     'D'
          'J'
               'T'
          'K'
               'U'
               'V'
               'W'
               'X'
          'L'

The same tree layed out by `dot` (the option `n` of `mktree`)

Note that none of the nodes whose descendants are two or more leaves is centred with respect to them. Even the J node, whose only descendant is a leaf (T), is horizontally displaced rather than being straight above it.

A structural view of the arithmetic expression \(\,ab^n{+}cd^{z^k}\!e\)

Multiplication and exponentiation are represented by × and ^.
All the three diagrams are obtained from the same description, calling mktree with the n, b, and none options, respectively.

= diam 7 dy 15 [style=filled fillcolor="#ffc000"]
:+:
   :×:
      :a:           [shape=square fillcolor=yellow]
      :^:
        :b:         [shape=square fillcolor=yellow]
        :n:         [shape=square fillcolor=yellow]
   :×:
      :×:
         :c:        [shape=square fillcolor=yellow]
         :^:
            :d:     [shape=square fillcolor=yellow]
            :^:
               :z:  [shape=square fillcolor=yellow]
               :k:  [shape=square fillcolor=yellow]
      :e:           [shape=square fillcolor=yellow]

Rotated and reversed trees

Once again, all four pictures are produced from the same description. The tree at the top is produced by calling mktree with no options. The one at the bottom requires the i option. The two trees in the middle are produced with x and xi, respectively.

= diam 2 dx 5 dy 9 [style=filled]
>> [color="#b06000"]
  >> [color=red]
  >> [color=green]
  >> [color=blue]