I'm just playing around with FireMonkey to see if graphical painting is any faster than GDI or Graphics32 (my library of choice at the moment).
To see how fast it is, I
Summary: Antialiasing subpixel thickness lines is hard work and requires a number of dirty tricks to output what we intuitively expect to see.
The extra effort you're seeing is almost certainly due to antialiasing. When the line thickness is less than one pixel and the line doesn't sit squarely at the center of a row of device pixels, every pixel drawn for the line will be a partial brightness pixel. To make sure that those partial values are bright enough so that the line doesn't disappear, more work is required.
Since video signals operate on a horizontal sweep (think CRT, not LCD), graphics operations traditionally focus on processing things one horizontal scanline at a time.
Here's my guess:
To solve certain sticky problems, rasterizers sometimes "nudge" lines so that more of their virtual pixels align with device pixels. If a .25 pixel thick horizontal line is exactly half way between device scanline A and B, that line may completely disappear because it doesn't register strongly enough to light up any pixels in scanline A or B. So, the rasterizer might nudge the line "down" a tiny bit in virtual coordinates so that it will align with scanline B device pixels and produce a nice strongly lit horizontal line.
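To put numbers on the nudging idea, here is a minimal sketch in Python. It assumes exact area coverage and that pixel row r spans y in [r, r + 1); the function name row_coverage is made up for illustration and is not part of any rasterizer. "Half way between A and B" is read here as the line straddling the boundary between the two rows.

```python
def row_coverage(center_y, thickness, row):
    """Fraction of pixel row [row, row + 1) covered by a horizontal line.

    Assumption: coverage is the exact vertical overlap between the line's
    band and the pixel row, which is what an ideal antialiaser would report.
    """
    top = center_y - thickness / 2.0
    bottom = center_y + thickness / 2.0
    overlap = min(bottom, row + 1.0) - max(top, float(row))
    return max(0.0, overlap)


# A 0.25 px line straddling the boundary between row 0 (A) and row 1 (B).
straddling = [row_coverage(1.0, 0.25, r) for r in (0, 1)]

# The same line nudged down so that it lies entirely inside row 1 (B).
nudged = [row_coverage(1.5, 0.25, r) for r in (0, 1)]

print(straddling)  # [0.125, 0.125]
print(nudged)      # [0.0, 0.25]
```

Unnudged, each row gets only 12.5% brightness; nudged, one row gets the full 25% the line can contribute, and no blending across two rows is needed.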
The same can be done for vertical lines, but probably isn't if your graphics card/driver is hyperfocused on horizontal scanline operations (as many are).
So, in this scenario, a horizontal line would render very fast because there's no antialiasing to be performed at all, and it can all be done in one scanline.
A vertical line would require antialiasing analysis for every horizontal scanline that crosses the line. The rasterizer may have a special case for vertical lines to only consider the left and right pixels to calculate antialiasing values.
A diagonal line has no shortcuts. It has jaggies everywhere, so there is plenty of antialiasing work to do throughout. The antialias calculation must consider (subsample) a whole matrix of points (at least 4, probably 8) around the target point to decide how much of a partial value to give the device pixel. The matrix can be simplified or eliminated entirely for vertical or horizontal lines, but not for diagonals.
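The subsampling step can be sketched roughly like this. It is only an illustration of the general technique, not what Direct2D or any particular driver does; the function name pixel_coverage and the regular n x n sample grid are assumptions.

```python
import math


def pixel_coverage(px, py, x0, y0, angle_deg, thickness, n=4):
    """Estimate how much of pixel (px, py) is covered by an infinite line.

    The line passes through (x0, y0) at angle_deg and has the given
    thickness. The pixel spans [px, px + 1) x [py, py + 1) and is probed
    with an n x n grid of sample points (a hypothetical sampling pattern).
    """
    a = math.radians(angle_deg)
    nx, ny = -math.sin(a), math.cos(a)  # unit normal of the line
    hits = 0
    for i in range(n):
        for j in range(n):
            sx = px + (i + 0.5) / n
            sy = py + (j + 0.5) / n
            # Perpendicular distance from the sample to the line's center.
            dist = abs((sx - x0) * nx + (sy - y0) * ny)
            if dist <= thickness / 2.0:
                hits += 1
    return hits / (n * n)


# A 0.25 px thick 45 degree line through the origin, sampled in pixel (0, 0).
print(pixel_coverage(0, 0, 0.0, 0.0, 45, 0.25))
```

For a horizontal or vertical line the distance depends on only one coordinate, so the inner grid collapses to a single column or row of samples. For a diagonal, every sample has to be tested.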
There is an additional item that is really only a concern for sub-pixel thickness lines: how do we keep the subpixel thickness line from disappearing entirely or showing noticeable gaps where the line does not cross the center of a device pixel? It is likely that after the antialias values are calculated on a scanline, if there is no clear "signal" or sufficiently lit device pixel caused by the virtual line, the rasterizer has to go back and "try harder" or apply some boosting heuristics to get a stronger signal to floor ratio so that the device pixels representing the virtual line are tangible and continuous.
Two adjacent device pixels at 40% brightness is ok. If the only rasterizer output for the scanline is two adjacent pixels at 5%, the eye will perceive a gap in the line. Not ok.
When the line is more than 1.5 device pixels in thickness, you will always have at least one well lit device pixel on every scanline and don't need to go back and try harder.
Why is 1.5 the magic number for line thickness? Ask Pythagoras. If your device pixel is 1 unit in width and height, then the length of the diagonal of the square device pixel is sqrt(1^2 + 1^2) = sqrt(2) = 1.41ish. When your line thickness is greater than the length of the diagonal of a device pixel, you should always have at least one "well lit" pixel in the scanline output no matter what the angle of the line.
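The geometry can be checked with a few lines of Python. This is a sketch of the reasoning only: it measures how wide a unit pixel is when viewed across a line at a given angle, and pixel_extent_across_line is a name made up for this example.

```python
import math


def pixel_extent_across_line(angle_deg):
    """Width of a 1x1 pixel measured perpendicular to a line at angle_deg.

    This is the projection of the unit square onto the line's normal.
    """
    a = math.radians(angle_deg)
    return abs(math.sin(a)) + abs(math.cos(a))


# Scan whole-degree angles and take the worst case.
worst = max(pixel_extent_across_line(d) for d in range(0, 181))

print(round(pixel_extent_across_line(0), 4))  # 1.0
print(round(worst, 4))                        # 1.4142
```

The worst case is at 45 degrees, where the pixel is sqrt(2) wide across the line. A line thicker than that is wide enough to span a whole pixel at any angle.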
That's my theory, anyway.
In other libraries there seem to be fast algorithms for single lines, and thick lines are slower because a polygon is created first, so why is FireMonkey the other way around?
In Graphics32, Bresenham's line algorithm is used to speed up lines that are drawn with a 1px width, and that should definitely be fast. FireMonkey does not have its own native rasterizer; instead, it delegates painting operations to other APIs (on Windows, it will delegate to either Direct2D or GDI+).
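For reference, the textbook integer form of Bresenham's algorithm looks roughly like this in Python. It is a generic sketch, not Graphics32's actual implementation, and it draws aliased (non-antialiased) 1px lines only.

```python
def bresenham(x0, y0, x1, y1):
    """Return the integer pixel coordinates of a 1px line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1  # step direction in x
    sy = 1 if y0 < y1 else -1  # step direction in y
    err = dx + dy              # running error term, integers only
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:           # error says: step in x
            err += dy
            x0 += sx
        if e2 <= dx:           # error says: step in y
            err += dx
            y0 += sy
    return points


print(bresenham(0, 0, 4, 2))
```

It is fast because each pixel costs only a few integer additions and comparisons: no polygon is built and no coverage is computed.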
What you are observing is in fact the performance of the Direct2D rasterizer and I can confirm that I've made similar observations previously (I've benchmarked many different rasterizers.) Here's a post that talks specifically about the performance of the Direct2D rasterizer (btw, it's not a general rule that thin lines are drawn slower, especially not in my own rasterizer):
http://www.graphics32.org/news/newsgroups.php?article_id=10249
As you can see from the graph, Direct2D has very good performance for ellipses and thick lines, but much worse performance in the other benchmarks (where my own rasterizer is faster).
I mostly need single-pixel lines, so should I paint lines in a different way maybe?
I implemented a new FireMonkey backend (a new TCanvas descendant) that relies on my own rasterizer engine, VPR. It should be faster than Direct2D for thin lines and for text (even though it's using polygonal rasterization techniques). There may still be some caveats that need to be addressed in order to make it work 100% seamlessly as a FireMonkey backend. More info here:
http://graphics32.org/news/newsgroups.php?article_id=11565