What is the normal way people writing network code in Delphi use Windows-style overlapped asynchronous socket I/O?
Here\'s my prior research into this question:
@Chris Miller - What you've stated in your answer is factually inaccurate.
Windows message-style async, as available through WSAAsyncSelect, is indeed largely a workaround for lack of a proper threading model in Win 3.x days.
.NET Begin/End, however, is not using extra threads. Instead, it is using overlapped I/O, using the extra argument on WSASend / WSARecv, specifically the overlapped completion routine, to specify the continuation.
This means that the .NET style harnesses the Windows OS's async I/O support to avoid burning a thread by blocking on a socket.
Since threads are generally speaking expensive (unless you specify a very small stack size to CreateThread), having threads blocking on sockets will stop you from scaling to 10,000s of concurrent connections.
This is why it's important that async I/O be used if you want to scale, and also why .NET is not, I repeat, is not, simply "using threads, [...] just managed by the Framework".
There is a free IOCP (completion ports) socket components : http://www.torry.net/authorsmore.php?id=7131 (source code included)
"By Naberegnyh Sergey N.. High performance socket server based on Windows Completion Port and with using Windows Socket Extensions. IPv6 supported. "
i've found it while looking better components/library to rearchitecture my little instant messaging server. I haven't tried it yet but it looks good coded as a first impression.
@Roddy - Synchronous sockets are not what I'm after. Burning a whole thread for the sake of a possibly long-lived connection means you limit the amount of concurrent connections to the number of threads that your process can contain. Since threads use a lot of resources - reserved stack address space, committed stack memory, and kernel transitions for context switches - they do not scale when you need to support hundreds of connections, much less thousands or more.
@Roddy - I've already read the links you point to, they are both referenced from Paul Tyma's presentation "Thousands of Threads and Blocking I/O - The old way to write Java Servers is New again".
Some of the things that don't necessarily jump out from Paul's presentation, however, are that he specified -Xss:48k to the JVM on startup, and that he's assuming that the JVM's NIO implementation is efficient in order for it to be a valid comparison.
Indy does not specify a similarly shrunken and tightly constrained stack size. There are no calls to BeginThread (the Delphi RTL thread creation routine, which you should use for such situations) or CreateThread (the raw WinAPI call) in the Indy codebase.
The default stack size is stored in the PE, and for the Delphi compiler it defaults to 1MB of reserved address space (space is committed page by page by the OS in 4K chunks; in fact, the compiler needs to generate code to touch pages if there are >4K of locals in a function, because the extension is controlled by page faults, but only for the lowest (guard) page in the stack). That means you're going to run out of address space after max 2,000 concurrent threads handling connections.
Now, you can change the default stack size in the PE using the {$M minStackSize [,maxStackSize]} directive, but that will affect all threads, including the main thread. I hope you don't do much recursion, because 48K or (similar) isn't a lot of space.
Now, whether Paul is right about non-performance of async I/O for Windows in particular, I'm not 100% sure - I'd have to measure it to be certain. What I do know, however, is that arguments about threaded programming being easier than async event-based programming, are presenting a false dichotomy.
Async code doesn't need to be event-based; it can be continuation-based, like it is in .NET, and if you specify a closure as your continuation, you get state maintained for you for free. Moreover, conversion from linear thread-style code to continuation-passing-style async code can be made mechanical by a compiler (CPS transform is mechanical), so there need be no cost in code clarity either.