Maintaining a Layout

TextMate works with fixed-width fonts both because of the simplicity and because it is the immediate difference between a plain text editor and a word processor.

Though for version 2.0 I want it to do a richer layout, e.g. larger headings in markup languages, indented soft wrap, proper support for unicode, etc. So I had to bite the bullet and figure out how to allow this with reasonable performance, this article explains the problem and data structure I picked.

Problem

The main problem is variable height of lines. If we make a change to line 3,128 then what pixel position does that correspond to (for redrawing) when lines preceding it can have an arbitrary height?

Naive Solution Methods

Two solution methods exist with different characteristics:

Use an array of lines where each line knows its Y position.
Use a linked list of lines and let each line know its height.

If we pick the first we can find lines and line positions in constant time, but changing a line’s height or inserting/removing lines is linear in time.

Solution number two allows us to insert/remove lines in constant time (if we know where to insert) and also to update a line’s height in constant time. Finding the position of a line (or the nth line) is however linear in time.

Pick Your Poison

Both the solution methods are unusable as explained above, but unlike the first one, the latter one has potential. The reason for this is that the first solution method requires the entire data structure to be updated, the data structure is invalid if we do not perform this update, it is unlikely we will be able to improve on that (and updating is a common operation).

The latter solution method has expensive queries, but there is almost always a way to speedup a query, for example by using a cache.

Optimizing Queries

If we go with the second solution method our problem is now how to speedup queries in a linked list.

One way to improve it is by using an index, for example a skip lists. We have two different query keys, line number and Y position. I.e. given a layout, we may want to ask for the node corresponding to line number 32, or we may want to ask which node is at Y position 27,423.

For now, let us only focus on the first type of queries, so query key is the line number. The problem is the same because if we create an index for the line number and insert a new line, our line numbers change, and the index becomes void.

Rather than work with a skip list, let us use a binary search tree and let me draw how a (perfectly balanced) one looks for a layout with 7 lines:

     4
   /   \
  2     6
 / \   / \
1   3 5   7

So with this tree we can find the node for line 3 by first visiting node 4 and 2.

If we insert a new line after line 2 then it becomes:

     4
   /   \
  2     6
 / \   / \
1   3 5   7
   /
  ×

After the insertion all line numbers (after line 2) have changed, so we need to update the keys in our search tree:

     5
   /   \
  2     7
 / \   / \
1   4 6   8
   /
  3

So we are back to the original problem of having to update the entire data structure. But there is a way around this.

Instead of storing the actual line number in each node, we store number of nodes in the sub-tree rooted at the node in question (which I think is called a finger tree), so the initial tree instead becomes:

     7
   /   \
  3     3
 / \   / \
1   1 1   1

The way we use this tree is slightly different than when we had the actual line numbers.

We need to always start at the root and set: skipped_lines = 0.

Whenever we go right in the tree, we add the number of children in the left subtree plus one (for the node itself): skipped_lines += node.left.count + 1.

The line number for a node is then: skipped_lines + node.left.count + 1.

So the transformation has not made queries more expensive. Let’s now look at an insertion (again after line 2):

     7
   /   \
  3     3
 / \   / \
1   1 1   1
   /
  ×

The only nodes which need updating are those on the path from the root down to the inserted node, i.e. we never update more nodes than the height of the tree which is log(n) (assuming our binary search tree is self-balancing), the updated tree looks like this:

     8
   /   \
  4     3
 / \   / \
1   2 1   1
   /
  1

Back to Pixels

We have solved the problem of storing n items in a binary search tree and finding items based on their index (line number) in log(n) time (which is different than regular binary search trees where each item has a fixed search key).

This is no different than using Y positions. Instead of storing number of children in each tree, we store the (pixel) height of the (sub)tree which is node.height = node.lineHeight + node.left.height + node.right.height. When we traverse the tree (as loosely described above) we add node.lineHeight to skipped_lines instead of 1 (and probably rename the variable to skipped_pixels).

SIGPIPE 13