SIGPIPE 13

Maintaining a Layout

August 13th, 2009

TextMate works with fixed-width fonts both because of the simplicity and because it is the immediate difference between a plain text editor and a word processor.

Though for version 2.0 I want it to do a richer layout, e.g. larger headings in markup languages, indented soft wrap, proper support for unicode, etc. So I had to bite the bullet and figure out how to allow this with reasonable performance, this article explains the problem and data structure I picked.

Problem

The main problem is variable height of lines. If we make a change to line 3,128 then what pixel position does that correspond to (for redrawing) when lines preceding it can have an arbitrary height?

Naive Solution Methods

Two solution methods exist with different characteristics:

  1. Use an array of lines where each line knows its Y position.
  2. Use a linked list of lines and let each line know its height.

If we pick the first we can find lines and line positions in constant time, but changing a line’s height or inserting/removing lines is linear in time.

Solution number two allows us to insert/remove lines in constant time (if we know where to insert) and also to update a line’s height in constant time. Finding the position of a line (or the nth line) is however linear in time.

Pick Your Poison

Both the solution methods are unusable as explained above, but unlike the first one, the latter one has potential. The reason for this is that the first solution method requires the entire data structure to be updated, the data structure is invalid if we do not perform this update, it is unlikely we will be able to improve on that (and updating is a common operation).

The latter solution method has expensive queries, but there is almost always a way to speedup a query, for example by using a cache.

Optimizing Queries

If we go with the second solution method our problem is now how to speedup queries in a linked list.

One way to improve it is by using an index, for example a skip lists. We have two different query keys, line number and Y position. I.e. given a layout, we may want to ask for the node corresponding to line number 32, or we may want to ask which node is at Y position 27,423.

For now, let us only focus on the first type of queries, so query key is the line number. The problem is the same because if we create an index for the line number and insert a new line, our line numbers change, and the index becomes void.

Rather than work with a skip list, let us use a binary search tree and let me draw how a (perfectly balanced) one looks for a layout with 7 lines:

     4
   /   \
  2     6
 / \   / \
1   3 5   7

So with this tree we can find the node for line 3 by first visiting node 4 and 2.

If we insert a new line after line 2 then it becomes:

     4
   /   \
  2     6
 / \   / \
1   3 5   7
   /
  ×

After the insertion all line numbers (after line 2) have changed, so we need to update the keys in our search tree:

     5
   /   \
  2     7
 / \   / \
1   4 6   8
   /
  3

So we are back to the original problem of having to update the entire data structure. But there is a way around this.

Instead of storing the actual line number in each node, we store number of nodes in the sub-tree rooted at the node in question (which I think is called a finger tree), so the initial tree instead becomes:

     7
   /   \
  3     3
 / \   / \
1   1 1   1

The way we use this tree is slightly different than when we had the actual line numbers.

We need to always start at the root and set: skipped_lines = 0.

Whenever we go right in the tree, we add the number of children in the left subtree plus one (for the node itself): skipped_lines += node.left.count + 1.

The line number for a node is then: skipped_lines + node.left.count + 1.

So the transformation has not made queries more expensive. Let’s now look at an insertion (again after line 2):

     7
   /   \
  3     3
 / \   / \
1   1 1   1
   /
  ×

The only nodes which need updating are those on the path from the root down to the inserted node, i.e. we never update more nodes than the height of the tree which is log(n) (assuming our binary search tree is self-balancing), the updated tree looks like this:

     8
   /   \
  4     3
 / \   / \
1   2 1   1
   /
  1

Back to Pixels

We have solved the problem of storing n items in a binary search tree and finding items based on their index (line number) in log(n) time (which is different than regular binary search trees where each item has a fixed search key).

This is no different than using Y positions. Instead of storing number of children in each tree, we store the (pixel) height of the (sub)tree which is node.height = node.lineHeight + node.left.height + node.right.height. When we traverse the tree (as loosely described above) we add node.lineHeight to skipped_lines instead of 1 (and probably rename the variable to skipped_pixels).

[by Allan Odgaard]


4 Responses to “Maintaining a Layout”

  1. Sébastien VARLET Says:
    August 14th, 2009 at 8:13

    Interesting approach but your tree will never stay perfectly balanced since the main root doesn't change. Therefore you'll get a tree looking like a linked list :(

    You should look at AVL trees ;) (http://en.wikipedia.org/wiki/AVL_tree)

  2. Allan Odgaard Says:
    August 14th, 2009 at 8:44

    Sébastien: That is why I wrote “assuming our binary search tree is self-balancing” (when I stated the log(n) time complexity).

  3. Daniel Yankowsky Says:
    September 13th, 2009 at 8:04

    It seems like you could use this structure to keep track of line widths as well. Each node could store the largest width of itself and its children. The overall width of the document is the width stored in the root node. Then, when a line is added, removed, or modified, you could quickly update just the affected subtree. That would be an efficient way to recalculate the extents of the horizontal scrollbar as the document is modified.

  4. Anonymous Says:
    November 19th, 2009 at 9:01

    I think performance will be a problem. When I use TM to open a latex file with over ten thousand lines(don't blame me not separate them into multiple files, sometime it just have to be done in this way), it takes a notable time to make the syntax highlighting to work. If the awesome feature in this post implemented, I can imagine it will take more time to generate the layout. I am not a programmer, so I am just giving some user side feedback(so forgive me if my idea is naive…). My mac is MacBook Pro 4,1, 2.4G Core 2 Duo + 4G Mem, with SL installed.

    I think AUCTeX in Emacs provide some similar features.


Leave a Reply