SIGPIPE 13

Programming, automation, algorithms, macOS, and more.

Worker Thread Protocol

When two components, let’s call them A and B, are used together, it is a good approach to figure out who is using whom: if A is using B, then B should not know about A, and vice versa.

This rule of thumb lowers complexity and makes both refactoring and re-use of code easier.

One scenario where it might be appealing to ignore this rule is when outsourcing computation to a worker thread, but here it is actually more important to stick with it.

Let us say we want to search folders recursively and provide the user with status about where we are in the process.

To provide this status we could have the worker thread send a message back to the main thread, telling it which folder is presently being searched, but this breaks the rule! The main thread sets up the worker thread and will also terminate it, should the user abort the search, so the main thread clearly knows about the worker thread (and needs to). If the worker thread sends back messages, then it also knows about the main thread.

Synchronous Message Deadlock

If we use synchronous message passing, this simple design can lead to a deadlock. For example, if the worker thread sends back a status update at the same time the main thread sends a terminate message to the worker, then both threads are stuck waiting for the other to acknowledge its message.

Asynchronous Message Race Condition

By using asynchronous message passing we avoid the deadlock but instead introduce potential race conditions. The main thread may send a terminate message to an already completed worker thread, because it hasn’t received a “did terminate” message yet, or the worker thread may send a status update to the main thread after the main thread sent a terminate message.

This may lead to messages being sent to disposed objects or to resources being leaked. It is not impossible to get right, but it is definitely not a simple problem.

The Solution: Polling

While polling in general should be avoided, it fits this problem very well. Our search code will look something like the following (C++):

#include <atomic>
#include <filesystem>
#include <mutex>
#include <string>
#include <system_error>
#include <vector>

class searcher
{
	std::atomic<bool> keep_running{true}, done{false};
	std::mutex results_mutex;
	std::vector<std::string> results;

	// placeholder criterion (a real searcher would match the user’s query)
	static bool matches_criterion (std::filesystem::path const& path)
	{
		return path.extension() == ".txt";
	}

public:
	void start_search (std::string const& src)
	{
		std::vector<std::string> toSearch(1, src);
		while(keep_running && !toSearch.empty())
		{
			std::vector<std::string> tmp;

			std::filesystem::path folder = toSearch.back();
			toSearch.pop_back();

			std::error_code ec; // skip unreadable folders instead of throwing
			for(auto const& entry : std::filesystem::directory_iterator(folder, ec))
			{
				if(entry.is_directory(ec))
					toSearch.push_back(entry.path().string());
				else if(matches_criterion(entry.path()))
					tmp.push_back(entry.path().string());
			}

			std::lock_guard<std::mutex> lock(results_mutex);
			results.insert(results.end(), tmp.begin(), tmp.end());
		}
		done = true;
	}

	void stop_search ()
	{
		keep_running = false;
	}

	std::vector<std::string> get_results ()
	{
		std::vector<std::string> res;
		std::lock_guard<std::mutex> lock(results_mutex);
		res.swap(results);
		return res;
	}

	bool is_done () const
	{
		return done;
	}
};

This encapsulates the searching but does not itself use a thread. The get_results member function, however, is thread-safe, so a user can spawn a thread and call start_search from that thread. In the main thread a timer is started, and get_results is called periodically (together with is_done).

When is_done returns true, the main thread knows that the search is done and can stop the timer (and delete the searcher object).

Advantages

In addition to avoiding the potential deadlock and race conditions, this approach has two other advantages:

  1. Separation of concerns. The search code is completely self-contained and does not need to incorporate knowledge about threads or message passing. This makes it easier to re-use it, e.g. if we are making unit tests we can test the code without needing to involve an actual worker thread.
  2. Free throttling! In a user interface we don’t want to refresh the progress more than a few times per second, so we simply set the timer to fire e.g. 5 times per second. Had we instead made the worker thread signal the main thread, it would be difficult to control the number of messages sent. For example, searching a 4600 RPM disk might only produce one new result per second, so there it might be ideal to signal the main thread whenever we get a new result; but if we are searching an SSD, the disk cache is hot, or there are hundreds of matches per folder, then we flood the main thread to the point where it may affect perceived performance.

Closing Remarks

I started by writing that if A knows about B, then B should not know about A. When deciding which of the two should know about the other, the component most likely to be re-used is the one that should not know about the other.

In the above example we made the search code the candidate for re-use by not giving it any dependencies (knowledge about other objects). In an MVC pattern it is the view and model we want to re-use, and so these do not know about any of the other parts.
