Let's say I had a program in C# that did something computationally expensive, like encoding a list of WAV files into MP3s. Ordinarily I would encode the files one at a time.
It is the operating system's job to split threads across different cores, and it will do so automatically when your threads are using a lot of CPU time. Don't worry about that. As for finding out how many cores your user has, try Environment.ProcessorCount in C#.
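For reference, a minimal sketch that just prints this value (the class name and message text are my own, not part of the original answer):

using System;

class CoreCount
{
    static void Main()
    {
        // Environment.ProcessorCount reports the number of logical processors
        // visible to the process (physical cores times hardware threads).
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);
    }
}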
Although I agree with most of the answers here, I think it's worth adding a new consideration: SpeedStep technology.
When running a CPU-intensive, single-threaded job on a multi-core system, in my case a Xeon E5-2430 with 6 real cores (12 with HT) under Windows Server 2012, the job got spread out across all 12 cores, using around 8.33% of each core and never triggering a speed increase. The CPU remained at 1.2 GHz.
When I set the thread affinity to a specific core, it used ~100% of that core, causing the CPU to max out at 2.5 GHz, more than doubling the performance.
This is the program I used, which just loops, incrementing a variable. When called with -a, it sets the thread affinity to core 1. The affinity part was based on this post.
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;

namespace Esquenta
{
    class Program
    {
        private static int numThreads = 1;
        static bool affinity = false;

        static void Main(string[] args)
        {
            if (args.Contains("-a"))
            {
                affinity = true;
            }
            if (args.Length < 1 || !int.TryParse(args[0], out numThreads))
            {
                numThreads = 1;
            }
            Console.WriteLine("numThreads:" + numThreads);

            // Spin up the requested number of busy-loop threads.
            for (int j = 0; j < numThreads; j++)
            {
                var param = new ParameterizedThreadStart(EsquentaP);
                var thread = new Thread(param);
                thread.Start(j);
            }
        }

        static void EsquentaP(object numero_obj)
        {
            int i = 0;
            DateTime ultimo = DateTime.Now;
            if (affinity)
            {
                // Pin the managed thread to its native thread, then restrict that
                // native thread to core 1 (affinity mask 0x1).
                Thread.BeginThreadAffinity();
                CurrentThread.ProcessorAffinity = new IntPtr(1);
            }
            try
            {
                while (true)
                {
                    i++;
                    if (i == int.MaxValue)
                    {
                        // Report millions of loop iterations per second, then restart the count.
                        i = 0;
                        var lps = int.MaxValue / (DateTime.Now - ultimo).TotalSeconds / 1000000;
                        Console.WriteLine("Thread " + numero_obj + " " + lps.ToString("0.000") + " M loops/s");
                        ultimo = DateTime.Now;
                    }
                }
            }
            finally
            {
                if (affinity)
                {
                    Thread.EndThreadAffinity();
                }
            }
        }

        [DllImport("kernel32.dll")]
        public static extern int GetCurrentThreadId();

        [DllImport("kernel32.dll")]
        public static extern int GetCurrentProcessorNumber();

        // Finds the ProcessThread that corresponds to the currently executing native thread.
        private static ProcessThread CurrentThread
        {
            get
            {
                int id = GetCurrentThreadId();
                return Process.GetCurrentProcess().Threads.Cast<ProcessThread>().Single(x => x.Id == id);
            }
        }
    }
}
And the results: processor speed, as shown by Task Manager (similar to what CPU-Z reports), stays at 1.2 GHz without the affinity flag and reaches 2.5 GHz with the thread pinned to one core. (Screenshots omitted.)
It is not necessarily as simple as using the thread pool.
By default, the thread pool allocates multiple threads for each CPU. Since every thread that gets involved in the work you are doing has a cost (task-switching overhead, use of the CPU's very limited L1, L2 and maybe L3 caches, etc.), the optimal number of threads to use is <= the number of available CPUs, unless each thread is requesting services from other machines, such as a highly scalable web service. In some cases, particularly those which involve more hard disk reading and writing than CPU activity, you can actually be better off with 1 thread than with multiple threads.
For most applications, and certainly for WAV and MP3 encoding, you should limit the number of worker threads to the number of available CPUs. Here is some C# code to find the number of CPUs:
// Fall back to 1 if the environment variable is missing.
int processors = 1;
string processorsStr = System.Environment.GetEnvironmentVariable("NUMBER_OF_PROCESSORS");
if (processorsStr != null)
    processors = int.Parse(processorsStr);
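To show how that count might be applied to the WAV-to-MP3 scenario, here is a sketch (my own, not part of the original answer) that caps concurrent encodes at the processor count using Parallel.ForEach; the input directory and the EncodeToMp3 body are hypothetical placeholders:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class EncoderSketch
{
    // Hypothetical stand-in for whatever MP3 encoder you actually call.
    static void EncodeToMp3(string wavPath)
    {
        Console.WriteLine("Encoding " + wavPath + " on thread " +
                          Thread.CurrentThread.ManagedThreadId);
    }

    static void Main()
    {
        // Hypothetical input folder.
        string[] wavFiles = Directory.GetFiles(@"C:\audio", "*.wav");

        // Cap concurrent encodes at the number of logical processors,
        // as recommended above for CPU-bound work.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(wavFiles, options, EncodeToMp3);
    }
}

MaxDegreeOfParallelism simply keeps the thread pool from running more simultaneous encodes than there are logical processors; the caveats below about disks still apply.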
Unfortunately, it is not as simple as limiting yourself to the number of CPUs. You also have to take into account the performance of the hard disk controller(s) and disk(s).
The only way you can really find the optimal number of threads is trial and error. This is particularly true when you are using hard disks, web services and such. With hard disks, you might be better off not using all four processors on your quad-core CPU. On the other hand, with some web services, you might be better off making 10 or even 100 requests per CPU.
One of the reasons you should not (as has been said) try to allocate this sort of stuff yourself is that you just don't have enough information to do it properly, particularly going into the future with NUMA, etc.
If you have a thread ready to run and there's a core idle, the kernel will run your thread; don't worry.
In the case of managed threads, the complexity of doing this is a degree greater than that of native threads. This is because CLR threads are not directly tied to a native OS thread; the CLR can switch a managed thread from native thread to native thread as it sees fit. The function Thread.BeginThreadAffinity is provided to place a managed thread in lock-step with a native OS thread. At that point, you could experiment with using native APIs to give the underlying native thread processor affinity. As everyone suggests here, this isn't a very good idea. In fact there is documentation suggesting that threads can receive less processing time if they are restricted to a single processor or core.
You can also explore the System.Diagnostics.Process class. There you can enumerate a process's threads as a collection of ProcessThread objects. This class has properties to set ProcessorAffinity or even a preferred (ideal) processor -- I'm not sure exactly what effect the latter has.
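As a rough sketch of that experiment (assuming Windows, and reusing the same GetCurrentThreadId approach as the program earlier in this thread; illustrative only, not recommended for production):

using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;

class AffinitySketch
{
    [DllImport("kernel32.dll")]
    static extern int GetCurrentThreadId();

    static void PinCurrentThreadToCpu0()
    {
        // Keep the managed thread on its current native thread...
        Thread.BeginThreadAffinity();

        // ...then find the matching ProcessThread and restrict it to CPU 0 (mask 0x1).
        int nativeId = GetCurrentThreadId();
        ProcessThread pt = Process.GetCurrentProcess().Threads
            .Cast<ProcessThread>()
            .Single(t => t.Id == nativeId);
        pt.ProcessorAffinity = new IntPtr(1);
    }

    static void Main()
    {
        PinCurrentThreadToCpu0();
        // ... CPU-bound work would go here ...
        Thread.EndThreadAffinity();
    }
}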
Disclaimer: I've experienced a similar problem where I thought the CPU(s) were underutilized and researched a lot of this stuff; however, based on all that I read, it appeared that it wasn't a very good idea, as evidenced by the comments posted here as well. However, it's still interesting, and a learning experience, to experiment.
You can definitely do this by writing such a routine inside your program.
However, you should not try to do it, since the operating system is the best candidate to manage this sort of thing; a user-mode program should not try to do it.
That said, it can sometimes be done (by really advanced users) to achieve load balancing, or even to expose true multi-threaded, multi-core problems (data races, cache coherence, ...), since different threads would then truly be executing on different processors.
Having said that, if you still want to do it, it can be done in the following way. I am providing pseudocode for Windows, but the same could easily be done on Linux as well.
#define MAX_CORE 256

processor_mask[MAX_CORE] = {0};
core_number = 0;

// Query the topology; from the result we calculate core_number and populate
// the processor_mask[] array, which is used below to run different threads on different cores.
Call GetLogicalProcessorInformation();

for (j = 0; j < THREAD_POOL_SIZE; j++)
    Call SetThreadAffinityMask(hThread[j], processor_mask[j % core_number]);
// hThread is the array of thread handles.
// The modulo wraps the mask index once j reaches core_number, so a pool with more
// threads than cores cycles back to the first core.
After the above routine is called, the threads would always be executing in the following manner:
Thread1-> Core1
Thread2-> Core2
Thread3-> Core3
Thread4-> Core4
Thread5-> Core5
Thread6-> Core6
Thread7-> Core7
Thread8-> Core8
Thread9-> Core1
Thread10-> Core2
...............
For more information, please refer to the MSDN documentation for these functions.
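A C# approximation of that round-robin assignment, assuming Windows and calling SetThreadAffinityMask through P/Invoke from inside each thread (a sketch only; THREAD_POOL_SIZE and the work body are placeholders):

using System;
using System.Runtime.InteropServices;
using System.Threading;

class RoundRobinAffinity
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll")]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    const int THREAD_POOL_SIZE = 16; // illustrative value

    static void Main()
    {
        int cores = Environment.ProcessorCount;

        for (int j = 0; j < THREAD_POOL_SIZE; j++)
        {
            int core = j % cores; // wrap around once we run out of cores
            var thread = new Thread(() =>
            {
                // Pin the underlying native thread to a single core: bit N = core N.
                // (Assumes fewer than 64 logical processors.)
                Thread.BeginThreadAffinity();
                SetThreadAffinityMask(GetCurrentThread(), (UIntPtr)(1UL << core));

                // ... CPU-bound work for this thread goes here ...

                Thread.EndThreadAffinity();
            });
            thread.Start();
        }
    }
}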