Erik Lenaerts

Do, or do not. There is no try. - Yoda

April 2005 - Posts

ThreadPool Class hazard

I'm writing an application that downloads an HTML file and parses its content using regular expressions. At first sight, nothing special. However, the app needed to perform this action several times in parallel, so I immediately thought of a threading approach that could launch each of these HTML screen-scraping actions on a separate thread.

To be honest, I started out with asynchronous method calls, but I got some strange, unexpected effects. So I went looking for a solution and stumbled onto the ThreadPool class: a pretty straightforward class for running a limited number of tasks in parallel on a set of worker threads. The ThreadPool class takes care of all the threading aspects; all I needed to do was push each screen-scraping task onto its queue using the static QueueUserWorkItem method and forget about it.
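The post is about .NET, but the "queue each task on a shared pool" pattern can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`, where `submit()` plays roughly the role of `QueueUserWorkItem`. This is an analogue, not the .NET API: unlike `QueueUserWorkItem`, `submit()` returns a future you can wait on. The page names, the canned HTML in `FAKE_PAGES`, and the `scrape`/`scrape_all` helpers are hypothetical; the download step is faked so the sketch runs without network access.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned pages standing in for the HTML a real scraper would download.
FAKE_PAGES = {
    "page1": "<html><head><title>First page</title></head></html>",
    "page2": "<html><head><title>Second page</title></head></html>",
    "page3": "<html><body>No title here</body></html>",
}

# Regular expression that pulls the <title> text out of an HTML document.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def scrape(name):
    """One screen-scraping task: 'download' a page and parse it with a regex."""
    html = FAKE_PAGES[name]  # hypothetical stand-in for the download step
    match = TITLE_RE.search(html)
    return name, (match.group(1) if match else None)


def scrape_all(names, max_workers=4):
    """Queue every task on a shared pool, then collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() is the rough analogue of QueueUserWorkItem: the pool
        # decides which worker thread runs each queued task.
        futures = [pool.submit(scrape, name) for name in names]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(scrape_all(["page1", "page2", "page3"]))
    # {'page1': 'First page', 'page2': 'Second page', 'page3': None}
```

The pool size is capped by `max_workers`, which mirrors the "limited number of threads" idea: queue more tasks than that and the extra ones simply wait their turn.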

This new solution looked clean, and I was happy to have learned yet another class of the .NET Framework. However, I still got the strange effects... :(

Digging into this, I found a post on Wallace B. McClure's blog that discussed an issue with the .NET Framework ThreadPool.

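Here's the basic problem I ran into: when you start a method on a thread from the .NET ThreadPool, and that method in turn uses the .NET ThreadPool again, you get into trouble. The pool only has a limited number of threads, so if your outer work items occupy them while waiting on inner work that also needs a pool thread, that inner work may never get one.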

I didn't see this one coming, because I wasn't aware that I was using the ThreadPool in my screen-scraping code. But apparently I was: the WebClient class internally uses the WebRequest class, which in turn relies on the ThreadPool class.

Since the methods of the ThreadPool class are all static, I figure there's only one ThreadPool per process. That's why it messed up my application: my scraping tasks and WebClient's internal requests were competing for the same threads.
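To make the failure mode concrete, here is a simplified Python model of thread-pool starvation, again using `concurrent.futures` as a stand-in for the .NET ThreadPool. It is not the actual WebClient/WebRequest code path, and the exact symptom in .NET may differ. The helpers `inner_request`, `outer_task` and `run_nested_on_shared_pool` are hypothetical: each outer task (the "scraper") submits nested work (the "web request") to the same pool and waits for it. A `threading.Barrier` makes sure every worker is busy with an outer task before any nested work is queued, so the starvation is deterministic; the timeout reports it instead of hanging forever.

```python
import threading
import concurrent.futures as cf


def inner_request():
    """Hypothetical stand-in for the pooled work a web request does internally."""
    return "response"


def outer_task(pool, barrier, wait_seconds):
    """A scraping task that needs the SAME pool for nested work."""
    # Round 1: wait until every worker thread is busy with an outer task.
    barrier.wait(timeout=5)
    inner = pool.submit(inner_request)  # nested use of the same pool
    try:
        outcome = inner.result(timeout=wait_seconds)
    except cf.TimeoutError:
        outcome = "starved"  # no free worker ever picked up the nested work
    # Round 2 (Barrier is reusable): keep holding this thread until every
    # outer task has recorded its outcome, so the result is deterministic.
    barrier.wait(timeout=5)
    return outcome


def run_nested_on_shared_pool(workers=2, wait_seconds=0.2):
    """Run `workers` outer tasks on a pool with exactly `workers` threads."""
    barrier = threading.Barrier(workers)
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(outer_task, pool, barrier, wait_seconds)
                   for _ in range(workers)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_nested_on_shared_pool())  # ['starved', 'starved']
```

Without the timeout, each outer task would block indefinitely: all threads are held by callers waiting for work that can only run once one of those same threads is released.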

The solution

I solved everything by using what is called a managed thread pool. Before building such a thing myself (if I ever could), I looked on the Net and found one on GotDotNet by Stephen Toub. He did a great job, and it was very easy to drop in because the class has the same method signatures as the .NET Framework's ThreadPool class (plus a few others).

So I let WebClient use the .NET ThreadPool and start my own tasks with the ManagedThreadPool class. Now everything works like a charm, because my tasks no longer tie up the threads WebClient needs.
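The same fix can be sketched in Python, again only as an analogue of the .NET setup: the outer scraping tasks run on their own pool (`task_pool`, playing the role of ManagedThreadPool), while nested work goes to a separate pool (`io_pool`, playing the role of the .NET ThreadPool that WebClient uses). The names `inner_request`, `outer_task` and `run_with_separate_pools` are hypothetical, and this is not Stephen Toub's ManagedThreadPool API.

```python
import concurrent.futures as cf


def inner_request():
    """Hypothetical stand-in for the pooled work a web request does internally."""
    return "response"


def outer_task(io_pool, wait_seconds):
    """A scraping task on its own pool; nested work goes to a DIFFERENT pool."""
    inner = io_pool.submit(inner_request)
    try:
        return inner.result(timeout=wait_seconds)
    except cf.TimeoutError:
        return "starved"


def run_with_separate_pools(tasks=2, wait_seconds=5):
    """Outer tasks can never hold the threads that the nested work needs."""
    # io_pool ~ the pool used internally by WebClient (hypothetical analogue);
    # task_pool ~ ManagedThreadPool, which runs our own scraping tasks.
    with cf.ThreadPoolExecutor(max_workers=1) as io_pool:
        with cf.ThreadPoolExecutor(max_workers=tasks) as task_pool:
            futures = [task_pool.submit(outer_task, io_pool, wait_seconds)
                       for _ in range(tasks)]
            return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_with_separate_pools(tasks=4))
```

Even with a single thread in `io_pool`, every outer task gets its response, because that thread is never occupied by an outer task waiting on itself; the nested requests just run one after another.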

- Erik