I think one could easily fill several chapters of a book on operating systems answering your question. I suggest you read Tanenbaum's Modern Operating Systems.
The main difference between heap and stack is that the heap is a per-process resource, while each thread gets its own stack. When a program starts, it gets a minimal heap and a stack segment per thread. The heap can grow; the stack size is fixed (for each thread). If you write a recursive function that never terminates (endless recursion) you will get a stack overflow ;) Every function call gets a stack frame on the stack segment; when the function returns, the stack is unwound and the frame is free to be reused by the next call. The stack is a contiguous, linear structure. On Linux you can configure the stack size for a process via ulimit (e.g. ulimit -s); on Windows (at least with MS Visual C++) you can pass a linker flag that sets the stack size. A stack overflow can also be produced by declaring a large local array on the stack:
    char test[1000000];
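And here is a minimal sketch of the endless-recursion case mentioned above (only run it if you want to see the crash): every call pushes another frame onto the thread's stack until the stack segment is exhausted. The function name is made up for illustration.

    int recurse(int depth) {
        volatile char frame[1024];              // volatile so the frame isn't optimized away
        frame[0] = static_cast<char>(depth);
        return recurse(depth + 1) + frame[0];   // not a tail call, so frames pile up
    }

    int main() {
        return recurse(0);                      // dies with a stack overflow
    }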
The heap is a different story. When a process starts up, the heap gets some default size, which can vary from OS to OS or with the configuration used on that OS (e.g. on Windows it is 2 MB by default, as far as I remember). Further, if the program needs more heap to allocate more space for variables etc., the heap will grow. If the program doesn't free heap memory, it eventually runs out of heap space. There are different data structures for implementing a heap; some of them are binary-tree derivatives, some are not, e.g. the Fibonacci heap (a forest of trees). You can read various articles on how to write a memory allocator. These data structures must be optimized for finding the block that belongs to a chunk being de-allocated, and for finding a free chunk when new heap space is requested.
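To make the "find a free chunk / give a chunk back" bookkeeping concrete, here is a small sketch of a first-fit free-list allocator over a fixed buffer. It is only an illustration, not how any real OS or runtime heap is implemented; the names my_alloc/my_free and the 64 KB pool size are made up, and a production allocator would also coalesce free blocks and grow by requesting pages from the OS.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    struct Block {
        std::size_t size;   // usable bytes in this block
        bool        free;   // is this block available?
        Block*      next;   // next block in the list
    };

    // The fixed region we manage; a real heap would grow by asking the OS for pages.
    alignas(alignof(std::max_align_t)) static std::uint8_t pool[64 * 1024];
    static Block* head = nullptr;

    void init_pool() {
        head = reinterpret_cast<Block*>(pool);
        head->size = sizeof(pool) - sizeof(Block);
        head->free = true;
        head->next = nullptr;
    }

    void* my_alloc(std::size_t n) {
        // Round the request up so block headers stay pointer-aligned.
        n = (n + alignof(Block) - 1) & ~(alignof(Block) - 1);
        for (Block* b = head; b != nullptr; b = b->next) {
            if (b->free && b->size >= n) {
                // Split the block if what is left can hold another header plus data.
                if (b->size > n + sizeof(Block)) {
                    Block* rest = reinterpret_cast<Block*>(
                        reinterpret_cast<std::uint8_t*>(b + 1) + n);
                    rest->size = b->size - n - sizeof(Block);
                    rest->free = true;
                    rest->next = b->next;
                    b->size = n;
                    b->next = rest;
                }
                b->free = false;
                return b + 1;       // usable memory starts right after the header
            }
        }
        return nullptr;             // pool exhausted
    }

    void my_free(void* p) {
        if (!p) return;
        Block* b = reinterpret_cast<Block*>(p) - 1;
        b->free = true;             // a real allocator would also coalesce neighbours
    }

    int main() {
        init_pool();
        void* a = my_alloc(100);
        void* b = my_alloc(200);
        my_free(a);
        void* c = my_alloc(50);     // first fit: reuses the chunk freed above
        std::cout << a << ' ' << b << ' ' << c << '\n';
    }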
Each process on a 32-bit OS has 4 GB of virtual address space. As you can imagine, there isn't enough RAM for every process to keep its full 4 GB of virtual address space resident. OS memory is organized in pages, which are swapped out to disk when they are no longer needed or have expired. This is where paging comes into play. Everything is mapped onto pages: a process's stack as well as its growing heap. Because the heap grows dynamically, it can be spread across many pages. That is why heap access can be very expensive: if a page is not in memory, a page fault occurs and the OS has to load the page from disk (which can be orders of magnitude slower). The stack frame of the currently executing thread, by contrast, is usually sitting in the processor cache, which is much faster than RAM.
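If you are curious about the page size your system actually uses, here is a tiny sketch that assumes a POSIX/Linux system (on Windows you would query GetSystemInfo instead):

    #include <unistd.h>
    #include <iostream>

    int main() {
        long page = sysconf(_SC_PAGESIZE);   // typically 4096 bytes on x86 Linux
        std::cout << "page size: " << page << " bytes\n";
    }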
Different heap types are possible: there are heaps that are very fast for small objects, and heaps that are very efficient in multi-threaded environments. Alexandrescu describes in "Modern C++ Design" how to develop a small-object allocator and a heap that manages small objects; this implementation is available in his Loki C++ library. Some embedded systems offer physically distinct memory regions, on top of which different heap types can be implemented. Writing your own allocator (heap manager etc.) is a hard job if you want to beat the one that ships with your compiler or runtime.
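To show the core idea behind such a small-object heap (this is my own hedged sketch, not Loki's actual implementation or API; FixedPool and Message are hypothetical names): hand out fixed-size slots for one type from a preallocated pool and keep freed slots on a free list, instead of going through the general-purpose heap for every tiny object.

    #include <cstddef>
    #include <new>
    #include <iostream>

    template <typename T, std::size_t N = 1024>
    class FixedPool {
        union Slot {
            Slot* next;                                  // link while the slot is free
            alignas(T) unsigned char storage[sizeof(T)]; // payload while in use
        };
        Slot  slots[N];
        Slot* freeList = nullptr;
        std::size_t used = 0;

    public:
        void* allocate() {
            if (freeList) {                 // reuse a slot that was freed earlier
                Slot* s = freeList;
                freeList = s->next;
                return s;
            }
            if (used < N)                   // otherwise take the next fresh slot
                return &slots[used++];
            throw std::bad_alloc();         // pool exhausted; a real one would grow
        }
        void deallocate(void* p) noexcept {
            Slot* s = static_cast<Slot*>(p);
            s->next = freeList;             // push the slot back on the free list
            freeList = s;
        }
    };

    struct Message { int id; char text[12]; };      // hypothetical small object

    int main() {
        static FixedPool<Message> pool;
        void* raw = pool.allocate();
        Message* m = new (raw) Message{1, "hi"};    // construct in the pooled slot
        m->~Message();
        pool.deallocate(m);
        void* again = pool.allocate();              // typically the same slot again
        std::cout << (again == raw) << '\n';
        pool.deallocate(again);
    }

The point of the design is that allocation and deallocation become a couple of pointer operations with no searching, which is exactly where a general-purpose heap tends to be slow for lots of tiny objects.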
Regards,
Ovanes