Here's an example of a program where coroutines really help to simplify the algorithm; IMHO it's hardly possible to implement otherwise. I also tried to choose a useful ta
Implementing coroutines portably is a difficult task. Please consider using the Boost.Coroutine candidate library. Here are updates to the library.
I've used it on OS X and Linux quite a bit together with boost::asio, and it has proven to be very robustly implemented and a very useful abstraction: threads with the deterministic behavior of a sequential program.
I don't know why it hasn't yet been added to the main Boost distribution. My guess is that there is some political argument disguised as a technical one behind that fact, although you are encouraged to take my paranoia with a grain of salt.
EDIT: there is a new Boost candidate in the Boost vault called Boost.Context, and it's part of a larger library called Boost.Fiber. It doesn't have a webpage yet, so I won't link it here. It seems to have better support
Ok, here's what I actually asked about (see [1]): the trick to statically call a function from a child class. tl;dr is apparently a mighty power, so here's your readable standard coroutine Fibonacci generator this time. There's a small difference though: we don't really need coroutines to generate these numbers, but it's really hard (if possible at all) to make a faster implementation of my first program without coroutines.
#include <stdio.h>
#include <stddef.h>
#include <setjmp.h>

// without noinline some compilers tend to allocate the array before setjmp()
#ifdef __GNUC__
  #define NOINLINE __attribute__((noinline))
#else
  #define NOINLINE __declspec(noinline)
#endif

enum{ STKPAD=1<<16 }; // stack padding between the caller and the coroutine body

struct coroutine {
  volatile unsigned state; // 0 = not started yet, otherwise the last yielded value
  jmp_buf PointA, PointB;  // PointA: caller context, PointB: coroutine context

  // save the coroutine context, then jump back to the caller with the value
  void yield( int value ) { if( setjmp(PointB)==0 ) { state=value; longjmp(PointA,value); } }

  template <typename T> NOINLINE void call_do_process() {
    // storing the array address keeps the padding from being optimized out
    char stktmp[STKPAD]; state=ptrdiff_t(stktmp); ((T*)this)->do_process();
  }

  // start the coroutine on the first call, resume it on later calls
  template <typename T> unsigned coro_process( T* ) {
    if( setjmp(PointA)==0 ) {
      if( state ) longjmp(PointB,3); else call_do_process<T>();
    }
    return state;
  }
};

struct fibonacci : coroutine {
  void do_process( void ) {
    unsigned a=0,b=1;
    while(1) {
      yield( b );
      b = b + a;
      a = b - a;
    }
  }
  unsigned get( void ) {
    return coro_process(this);
  }
} F;

int main( int argc, char** argv ) {
  for( int i=0; i<20; i++ ) {
    printf( "%u ", F.get() ); // get() returns unsigned, so %u rather than %i
  } printf( "\n" );
  return 0;
}
And since Jerry Coffin's alternative version still fails to produce sensible results, here are some simpler stream benchmarks. It's a pity, as I'd expect it to be even slower with iterators.
In fact I've tested all kinds of approaches with arithmetic coders (plain getc/putc, virtual methods, plain function pointers, iterator-like classes), and it's clear that there's a large difference. For now, coroutines have proved to be the best way to do this: there's no complex logic encapsulated in the byte i/o calls (unlike with iterators), and the processing doesn't have to care about i/o details. Sure, there are further optimizations possible, but I really only tried to demonstrate the benefits of the coroutine approach here...
#define _CRT_SECURE_NO_DEPRECATE
#define _CRT_DISABLE_PERFCRIT_LOCKS
#include <stdio.h>
#include <time.h>
#include <fstream>

int main( int argc, char** argv ) {
  if( argc<3 ) return 1;

  { // 1. byte-by-byte copy via stdio
    clock_t start = clock();
    FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
    FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
    while(1) {
      int c = getc(f);
      if( c<0 ) break;
      putc(c,g);
    }
    fclose(f);
    fclose(g);
    clock_t stop = clock();
    printf( " File copy via stdio getc/putc - %7.3fs\n", float(stop-start)/CLOCKS_PER_SEC );
  }

  { // 2. block copy via stdio with a 64k buffer
    clock_t start = clock();
    FILE* f = fopen( argv[1], "rb" ); if( f==0 ) return 2;
    FILE* g = fopen( argv[2], "wb" ); if( g==0 ) return 3;
    while(1) {
      static char buf[1<<16];
      // fread() returns size_t, so keep the count unsigned
      size_t l = fread( buf, 1,sizeof(buf), f ); if( l==0 ) break;
      fwrite( buf, 1,l, g ); if( l<sizeof(buf) ) break;
    }
    fclose(f);
    fclose(g);
    clock_t stop = clock();
    printf( " File copy via stdio 64k fread/fwrite - %7.3fs\n", float(stop-start)/CLOCKS_PER_SEC );
  }

  { // 3. byte-by-byte copy via iostreams
    clock_t start = clock();
    std::ifstream f(argv[1],std::ios::in|std::ios::binary); if( !f.is_open() ) return 2;
    std::ofstream g(argv[2],std::ios::out|std::ios::binary); if( !g.is_open() ) return 3;
    while(1) {
      int c = f.get();
      if( c<0 ) break;
      g.put(c);
    }
    f.close();
    g.close();
    clock_t stop = clock();
    printf( "File copy via ifstream::get/ofstream::put - %.3fs\n", float(stop-start)/CLOCKS_PER_SEC );
  }

  return 0;
}
----- 100,000,000 byte file -----

[ GCC 4.5 ]
File copy via stdio getc/putc - 0.546s
File copy via stdio 64k fread/fwrite - 0.188s
File copy via ifstream::get/ofstream::put - 10.578s

[ IntelC 11.1 / VS 2005 ]
File copy via stdio getc/putc - 0.500s
File copy via stdio 64k fread/fwrite - 0.156s
File copy via ifstream::get/ofstream::put - 14.656s

[ MSC 14.0 / VS 2005 ]
File copy via stdio getc/putc - 0.609s
File copy via stdio 64k fread/fwrite - 0.156s
File copy via ifstream::get/ofstream::put - 19.063s

----- 1,000,000,000 byte file -----

[ GCC 4.5 ]
File copy via stdio getc/putc - 7.468s
File copy via stdio 64k fread/fwrite - 1.828s
File copy via ifstream::get/ofstream::put - 109.891s

[ IntelC 11.1 / VS 2005 ]
File copy via stdio getc/putc - 6.718s
File copy via stdio 64k fread/fwrite - 1.672s
File copy via ifstream::get/ofstream::put - 145.500s

[ MSC 14.0 / VS 2005 ]
File copy via stdio getc/putc - 6.453s
File copy via stdio 64k fread/fwrite - 1.609s
File copy via ifstream::get/ofstream::put - 191.031s
Just for grins, here's a rough idea of how I'd handle the part that just encodes to/decodes from an arbitrary alphabet. As promised, the actual encoding/decoding is around a dozen lines of code. The overall size is larger, mostly because I've used templates throughout, so the numbers can be an arbitrary integer type and the characters can be an arbitrary character type, and because it uses iterators for both, so it can read from/write to arbitrary collections (streams, stringstreams, vectors, etc.).
Edit: modified code to read input from a file and write output to a file (and fixed a minor error or two):
#include <iterator>
#include <iostream>
#include <string>
#include <limits>
#include <vector>
#include <fstream>
#include <time.h>
#include <math.h>
#include <stdlib.h>

template <class intT>
class coder {
    std::string alphabet;
    size_t range;    // number of symbols in the alphabet
    unsigned ratio;  // number of alphabet symbols emitted per input value
public:
    coder(std::string const &alpha) : alphabet(alpha), range(alpha.size()) {
        // ceil(bits in intT / bits per symbol), computed in floating point;
        // the integer division used before truncated the quotient before ceil()
        ratio = unsigned(ceil(std::numeric_limits<intT>::digits * log(2.0) / log(double(range))));
    }

    // write each value as `ratio` symbols, least significant digit first
    template <class inIt, class outIt>
    void encode(inIt begin, inIt end, outIt out) {
        while (begin != end) {
            intT val = *begin++;
            for (unsigned i=0; i<ratio; i++) {
                *out++ = alphabet[val % range];
                val /= range;
            }
        }
    }

    // inverse of encode(); expects a multiple of `ratio` symbols from the alphabet
    template <class inIt, class outIt>
    void decode(inIt begin, inIt end, outIt out) {
        while (begin != end) {
            intT temp = 0;  // intT rather than int, so wide values are not truncated
            for (unsigned i=0; i<ratio; i++)
                temp += intT(alphabet.find(*begin++) * pow((double)range, (double)i));
            *out++ = temp;
        }
    }
};

int main(int argc, char **argv) {
    if (argc != 3) {
        std::cerr << "Usage: encode <infile> <outfile>\n";
        return EXIT_FAILURE;
    }
    coder<unsigned> enc("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    std::ifstream in(argv[1], std::ios::binary);
    std::ofstream out(argv[2]);
    clock_t start = clock();
    // note: istream_iterator<char> skips whitespace unless skipws is cleared on the stream
    enc.encode(std::istream_iterator<char>(in),
               std::istream_iterator<char>(),
               std::ostream_iterator<char>(out, ""));
    clock_t stop = clock();
    std::cerr << "Processing time: " << double(stop-start)/CLOCKS_PER_SEC << "\n";
    return 0;
}
At least for the moment, I've ignored the arithmetic encoding part, but it should (at least IMO) follow a similar structure so you could pretty easily string things together more or less arbitrarily.
As far as comparing speed and size goes, keep in mind that this isn't doing any compression (at all) just the baseX encoding -- that being the case, attempting to compare to something that does compression makes no real sense (except, for example, to get an idea of how effective the compression is -- but if it's effective at all, it'll obviously produce smaller output).
As far as executable size goes, about all I can say is that gcc producing large executables never surprises me. Using MS VC++, I get an executable of 9,728 bytes for the code above.