Inside the "Inside the Python GIL" Presentation

My comment: Thanks to Dave Beazley for doing some of the very best writing and research on Python's GIL. It removed much of my own stupidity on the subject.

What seems to be the answer:

*** I/O bound threaded code - no risk of context switches swamping the CPU:

For CPython's implementation, implementing the GIL using both a condition variable and a pthreads/OS mutex lock is the way to go, so that code can be developed on a single-core machine and not blow up on a multi-core machine (and vice versa, to a degree).

*** I/O bound threaded code - possibility of context switches swamping the CPU:

As the number of "concurrent" events grows, there is a possibility of the CPU being swamped by the context switches involved in OS threading. The solution, in pure Python, is event-driven multi-threading and deferred objects, using the Twisted library or rolling your own. The problem then is to avoid writing blocking code, such as loops. Code that could block must regularly check/pause to handle queued events.

*** CPU intensive threaded code:

1) event-driven and deferred objects (using Twisted or rolling your own) - being careful to avoid loops and other long running code
2) "green" threads, implemented in pure Python - maintain your own stack of tuples of ("functionname", arg0, arg1) and ("continuationname", statevar0, statevar1), and dispatch on name from that stack - being careful to avoid loops and other long running code - if you have some code with a loop, break it up into more than one continuation: "continuationname0", "continuationname1", etc.
3) message passing between Python processes
4) Python module multiprocessing - threading work-alike
5) combination of the above four
6) Why not modern shared mutable state threading like in Java and C#, with its implementation in the virtual machines and its language and library constructs? Sweep away all the complexities of the above 5 with a single broom?

I am prejudiced against general shared mutable state threading because it is brittle and non-deterministic. That makes it a non-starter: you are never able to make ANY guarantees about low latency and performance after ANY change in the code, no matter how small. And to regain adequate low latency and performance, your implementation could get very hairy very quickly. Of course, the penalty paid by my suggested approach is a hairy implementation right off the bat - I have to be honest about that.

It seems to me: any techniques adequate to handle CPU intensive multi-threaded code would be overkill for I/O bound multi-threaded code. So best to deal with the cases separately.

[ Use Google books to find out about "shared mutable state threading in Java" http://books.google.com/books?q=shared+mutable+state+threading+java&btnG=Search+Books ]

The biggest missing piece: code reloading on the fly in a long running high availability application (a candidate for multi-threaded code). Right now, a terrible solution is using Erlang as a thin layer of supervisor code, where the real work is farmed out to Python. The only advantage of this approach is avoiding predictable failure modes.

Armin Ronacher blogged about this problem: http://lucumr.pocoo.org/2009/7/24/singletons-and-their-problems-in-python
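
To make option 2 above (pure Python "green" threads) a bit more concrete, here is a minimal sketch of dispatching on name from a collection of tuples. Everything in it is an assumption for illustration: the names run, make_sum_handlers, start_sum and sum_step are hypothetical, the work (summing a range) is a stand-in for real CPU intensive code, and it uses a FIFO deque rather than a literal stack so that the tasks take turns. The loop is broken into bounded slices, and each slice hands back a continuation tuple carrying its state variables.

```python
from collections import deque


def run(tasks, handlers):
    """Dispatch (name, arg0, arg1, ...) tuples until no work is left."""
    while tasks:
        name, *args = tasks.popleft()
        # Each handler returns a list of follow-up tuples (possibly empty),
        # so no single handler holds the dispatcher for long.
        tasks.extend(handlers[name](*args))


def make_sum_handlers(results, chunk=1000):
    """Build the name -> function table (all names here are hypothetical)."""

    def start_sum(label, n):
        # Entry point: schedule the first continuation with its initial state.
        return [("sum_step", label, 0, n, 0)]

    def sum_step(label, i, n, acc):
        # Do one bounded slice of the loop instead of the whole loop.
        end = min(i + chunk, n)
        acc += sum(range(i, end))
        if end < n:
            # Not finished: hand back a continuation carrying the state.
            return [("sum_step", label, end, n, acc)]
        results[label] = acc
        return []

    return {"start_sum": start_sum, "sum_step": sum_step}


results = {}
handlers = make_sum_handlers(results)
tasks = deque([("start_sum", "a", 2500), ("start_sum", "b", 10)])
run(tasks, handlers)
print(results)
```

With the FIFO queue the short task "b" finishes before the long task "a", even though "a" was queued first, which is the point of slicing the loop. Swapping popleft for pop would give the literal stack described above, at the cost of running each task to completion before the next one starts.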