So I just finished watching this talk on the Python Global Interpreter Lock (GIL) http://blip.tv/file/2232410.
The gist of it is that the GIL is a pretty good design
Another solution is: http://docs.python.org/library/multiprocessing.html
Note 1: This is not a limitation of the Python language, but of CPython implementation.
Note 2: With regard to affinity, your OS shouldn't have a problem doing that itself.
This is a pretty old question but since everytime I search about information related to python and performance on multi-core systems this post is always on the result list, I would not let this past before me an do not share my thoughts.
You can use the multiprocessing module that rather than create threads for each task, it creates another process of cpython compier interpreting your code. It would make your application to take advantage of multicore systems. The only problem that I see on this approach is that you will have a considerable overhead by creating an entire new process stack on memory. (http://en.wikipedia.org/wiki/Thread_(computing)#How_threads_differ_from_processes)
Python Multiprocessing module: http://docs.python.org/dev/library/multiprocessing.html
"The reason I am not using the multiprocessing module is that (in this case) part of the program is heavily network I/O bound (HTTP requests) so having a pool of worker threads is a GREAT way to squeeze performance out of a box..."
About this, I guess that you can have also a pool of process too: http://docs.python.org/dev/library/multiprocessing.html#using-a-pool-of-workers
Att, Leo
The Python GIL is per Python interpreter. That means the only to avoid problems with it while doing multiprocessing is simply starting multiple interpreters (i.e. using seperate processes instead of threads for concurrency) and then using some other IPC primitive for communication between the processes (such as sockets). That being said, the GIL is not a problem when using threads with blocking I/O calls.
The main problem of the GIL as mentioned earlier is that you can't execute 2 different python code threads at the same time. A thread blocking on a blocking I/O call is blocked and hence not executin python code. This means it is not blocking the GIL. If you have two CPU intensive tasks in seperate python threads, that's where the GIL kills multi-processing in Python (only the CPython implementation, as pointed out earlier). Because the GIL stops CPU #1 from executing a python thread while CPU #0 is busy executing the other python thread.
I have never heard of anyone using taskset for a performance gain with Python. Doesn't mean it can't happen in your case, but definitely publish your results so others can critique your benchmarking methods and provide validation.
Personally though, I would decouple your I/O threads from the CPU bound threads using a message queue. That way your front end is now completely network I/O bound (some with HTTP interface, some with message queue interface) and ideal for your threading situation. Then the CPU intense processes can either use multiprocessing or just be individual processes waiting for work to arrive on the message queue.
In the longer term you might also want to consider replacing your threaded I/O front-end with Twisted or some thing like eventlets because, even if they won't help performance they should improve scalability. Your back-end is now already scalable because you can run your message queue over any number of machines+cpus as needed.
An interesting solution is the experiment reported by Ryan Kelly on his blog: http://www.rfk.id.au/blog/entry/a-gil-adventure-threading2/
The results seems very satisfactory.Until such time as the GIL is removed from Python, co-routines may be used in place of threads. I have it on good authority that this strategy has been implemented by two successful start-ups, using greenlets in at least one case.