On a single-core CPU, every process runs under the OS, and the CPU switches from one process to another to keep itself busy. A process can have many threads, in which cas
Cores (or CPUs) are the physical elements of your computer that execute code. Usually, each core has everything it needs to perform computations: register files, interrupt lines, etc.
Most operating systems represent applications as processes. This means that the application has its own address space (== view of memory), and the OS makes sure that this view and its contents are isolated from other applications.
A process consists of one or more threads, which carry out the real work of an application by executing machine code on a CPU. The operating system determines which thread executes on which CPU (using clever heuristics to improve load balance, energy consumption, etc.). If your application consists of only a single thread, then your whole multi-CPU system won't help you much, as it will still use only one CPU at a time for your application. (However, overall performance may still improve, because the OS will run other applications on the other CPUs so they don't interfere with the first one.)
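To make the process/thread relationship concrete, here is a minimal Python sketch (the function name `run_threads` is made up for illustration). It starts a few threads inside one process and records what each of them sees: the same process ID and the same shared list, because all threads live in the process's single address space. Note that in standard CPython the global interpreter lock lets only one thread execute Python bytecode at a time, but these are still real OS threads that the OS scheduler places on cores.

```python
import os
import threading


def run_threads(n_threads=4):
    """Start n_threads threads in this process; return what each one observed."""
    shared = []              # one list, visible to every thread of the process
    lock = threading.Lock()  # protects the shared list during appends

    def worker(index):
        # Each thread reports the PID of the process it belongs to
        # and its own thread identifier.
        with lock:
            shared.append((index, os.getpid(), threading.get_ident()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


if __name__ == "__main__":
    for index, pid, tid in sorted(run_threads()):
        print(f"worker {index}: pid={pid} thread id={tid}")
```

Every line of output should show the same `pid`, since the threads all belong to one process.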
Now to your specific questions:
1) The OS usually allows you to at least give hints about which core you want certain threads to execute on. What OpenMP does is generate code that spawns a certain number of threads and distributes the shared computational work from your program's loops across them. It can use the OS's hint mechanism (see: thread affinity) to do so. However, OpenMP applications still run concurrently with others, so the OS is free to interrupt one of the threads and schedule other (potentially unrelated) work on a CPU. In reality, there are many different scheduling schemes you might want to apply depending on your situation, but this is highly specific, and most of the time you should be able to trust your OS to do the right thing for you.
2) Even if you are running a single-threaded application on a multi-core CPU, you may notice other CPUs doing work as well. This comes a) from the OS doing its job in the meantime and b) from the fact that your application is never running alone: every running system consists of a whole bunch of concurrently executing tasks. Open Windows' task manager (or use ps/top on Linux) to see what is running.
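As a rough illustration of the affinity hint mechanism (this is not OpenMP itself, and the helper name `pin_current_thread_to_one_cpu` is made up), the sketch below uses Python's `os.sched_getaffinity` / `os.sched_setaffinity`, which are only available on some platforms such as Linux. It restricts the calling thread to a single CPU, reads the setting back, and then restores the original set.

```python
import os


def pin_current_thread_to_one_cpu():
    """Restrict the calling thread to one CPU and return the resulting CPU set.

    Returns None where the affinity calls are unavailable (e.g. macOS,
    Windows) or where the OS refuses the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # 0 refers to the calling thread
    target = min(allowed)              # arbitrary choice: lowest allowed CPU
    try:
        os.sched_setaffinity(0, {target})
    except OSError:
        return None
    try:
        return os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original CPU set


if __name__ == "__main__":
    print("pinned to:", pin_current_thread_to_one_cpu())
```

Even while pinned, the thread can still be preempted; affinity only limits where it may run, not when.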
Note also that the OS doesn't much care which process the threads are from. It will usually schedule threads to processors / cores regardless of which process each thread belongs to. This could lead to four threads from one process running at the same time just as easily as one thread from each of four processes running at the same time.
Yes, threads and processes can run concurrently on multi-core CPUs, so this works as you describe (regardless of how you create those threads and processes, OpenMP or otherwise). A single thread (and therefore a single-threaded process) only runs on a single core at a time; a multi-threaded process can occupy several cores at once. If there are more threads requesting CPU time than available cores (generally the case), the operating system scheduler will move threads on and off cores as needed.
The reason why single-threaded processes run on more than one CPU or core is related to your operating system, and not specifically to any feature of the hardware. Some operating systems have no sense of "thread affinity" - they don't care what processor a thread is running on - so when the time comes to re-evaluate what resources are being used (several times a second, at least), they'll move a thread/process from one core/CPU to another. Other than causing cache misses, this generally doesn't affect the performance of your process.
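One way to watch this migration on Linux is to sample the "processor" field (field 39) of `/proc/self/stat`, which holds the CPU the task last ran on. The sketch below is Linux-specific and the helper names `current_cpu` and `sample_cpus` are made up for illustration; on other systems `current_cpu` simply returns None.

```python
import time


def current_cpu():
    """Return the CPU the main thread last ran on, or None if unknown."""
    try:
        with open("/proc/self/stat") as f:
            stat = f.read()
    except OSError:
        return None  # no procfs here (e.g. not Linux)
    # The command name (field 2) may contain spaces, so split after its ')'.
    parts = stat.rsplit(")", 1)
    if len(parts) != 2:
        return None
    fields = parts[1].split()
    # fields[0] is field 3 (state), so field 39 (processor) is index 36.
    return int(fields[36]) if len(fields) > 36 else None


def sample_cpus(samples=20, delay=0.01):
    """Record the current CPU several times, sleeping between samples."""
    seen = []
    for _ in range(samples):
        seen.append(current_cpu())
        time.sleep(delay)  # sleeping gives the scheduler a chance to move us
    return seen


if __name__ == "__main__":
    print("CPUs observed:", sample_cpus())
```

On a busy multi-core machine the printed list will often contain more than one CPU number, although a lightly loaded system may keep the thread on the same core for the whole run.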
If there is one multi-threaded application with, say, 10 threads, it will initially start on the same CPU/core. Over time, the threads will be distributed to other cores/CPUs by the load balancer in Linux. If there are several such multi-threaded applications, I think each application's threads mostly run on the same core/CPU, because the threads' locals/globals are readily available in the L1/L2 cache of the core they were running on. Moving them off that core can cost more time than their execution takes. If the threads need to run on a different core, I think one has to supply affinity info for the thread.
@BjoernD, you mentioned that..
.. If your application consists only of a single thread, then your whole multi-CPU-system won't help you much as it will still only use one CPU for your application...
I think that even if it's a single-threaded application, that application's thread may be executed on different cores during its lifetime. After each preemption, the thread may be assigned to a different core the next time it is scheduled.