Prologue
This subject comes up here on SO from time to time, but the questions are usually removed for being poorly written. I have seen many such questions, followed by silence from the OP (usually low rep) when additional info is requested. From time to time, if the input is good enough for me, I decide to respond with an answer; it usually gets a few up-votes per day while active, but after a few weeks the question gets removed/deleted and everything starts again from the beginning. So I decided to write this Q&A so I can reference it from such questions directly without rewriting the answer over and over again …
Another reason is this META thread targeted at me, so if you have additional input, feel free to comment.
Question
How do I convert a bitmap image to ASCII art using C++?
Some constraints:
- gray scale images
- using mono-spaced fonts
- keeping it simple (not using too advanced stuff for beginner level programmers)
Here is a related Wiki page ASCII art (thanks to @RogerRowland)
There are more approaches to image to ASCII art conversion, mostly based on mono-spaced fonts. For simplicity I stick only to the basics:
pixel/area intensity based (Shading)
This approach handles each pixel or area of pixels as a single dot. The idea is to compute the average gray scale intensity of this dot and then replace it with a character whose intensity is close enough to the computed one. For that we need a list of usable characters, each with a precomputed intensity; let us call it the character map. To choose quickly which character is the best for which intensity there are two ways:
linearly distributed intensity character map
So we use only characters whose intensities differ by the same step. In other words, when sorted ascending:
intensity_of(map[i])=intensity_of(map[i-1])+constant;
Also, when our character map is sorted, we can compute the character directly from the intensity (no search needed):
character=map[intensity_of(dot)/constant];
arbitrary distributed intensity character map
So we have an array of usable characters and their intensities. We need to find the intensity closest to intensity_of(dot). Again, if we sort map[] we can use binary search; otherwise we need an O(n) minimum-distance search loop or an O(1) dictionary. Sometimes, for simplicity, the character map[] can be handled as linearly distributed, causing a slight gamma distortion that is usually unseen in the result unless you know what to look for.
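The binary search variant can be sketched in portable C++ like this. It is only an illustration: the names Glyph and closest_char are mine, and the intensities you put into the map have to be measured for your own font.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One usable character with its precomputed intensity.
// (hypothetical type, the values stored in it depend on the font used)
struct Glyph { int intensity; char c; };

// Returns the character whose intensity is closest to `value`.
// `map` must be non-empty and sorted ascending by intensity.
char closest_char(const std::vector<Glyph>& map, int value)
{
    assert(!map.empty());
    // first entry with intensity >= value
    auto hi = std::lower_bound(map.begin(), map.end(), value,
        [](const Glyph& g, int v) { return g.intensity < v; });
    if (hi == map.begin()) return hi->c;        // value is below the first entry
    if (hi == map.end())   return (hi - 1)->c;  // value is above the last entry
    auto lo = hi - 1;
    // pick the nearer of the two neighbours (a tie goes to the lower one)
    return (value - lo->intensity <= hi->intensity - value) ? lo->c : hi->c;
}
```

The search costs O(log n) per dot, so it only pays off for bigger maps; for a dozen characters the plain O(n) loop is just as good.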
Intensity based conversion also works well for gray-scale images (not just black and white). If you select the dot as a single pixel the result gets large (1 pixel -> single character), so for larger images an area (a multiple of the font size) is selected instead, to preserve the aspect ratio and not enlarge the output too much.
How to do it:
- evenly divide the image into (gray-scale) pixels or (rectangular) areas (dots)
- compute the intensity of each pixel/area
- replace it with the character from the character map with the closest intensity
As the character map you can use any characters, but the result gets better if the character has its pixels dispersed evenly over the character area. For starters you can use:
char map[]=" .,:;ox%#@";
sorted descending (from the lightest glyph to the darkest) and pretending to be linearly distributed.
So if the intensity of the pixel/area is i = <0-255>, the replacement character will be
map[(255-i)*10/256];
If i==0 the pixel/area is black, if i==127 it is gray, and if i==255 it is white. You can experiment with different characters inside map[] ...
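The lookup formula above as a tiny stand-alone function. This is just a sketch; the names shade_map, shade_count and shade are mine and not part of the VCL example that follows.

```cpp
#include <cassert>

// Character map from the text, ordered from the lightest glyph to the darkest.
// The array size is deduced so the terminating zero fits too.
const char shade_map[] = " .,:;ox%#@";
const int  shade_count = sizeof(shade_map) - 1;   // 10 usable characters

// Maps a gray intensity i in <0,255> (0 = black, 255 = white) to a character.
char shade(int i)
{
    assert(i >= 0 && i <= 255);
    return shade_map[((255 - i) * shade_count) / 256];
}
```

With this map shade(0) gives '@' (black), shade(127) gives 'o' (gray) and shade(255) gives ' ' (white). Dividing by 256 instead of 255 keeps the index below shade_count even for the extreme input.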
Here is an ancient example of mine in C++ and VCL:
AnsiString m=" .,:;ox%#@";                      // character map, lightest to darkest
Graphics::TBitmap *bmp=new Graphics::TBitmap;
bmp->LoadFromFile("pic.bmp");
bmp->HandleType=bmDIB;
bmp->PixelFormat=pf24bit;                       // 3 bytes per pixel
int x,y,i,c,l;
BYTE *p;
AnsiString s,endl;
endl=char(13); endl+=char(10);                  // CR LF
l=m.Length();
s="";
for (y=0;y<bmp->Height;y++)
{
    p=(BYTE*)bmp->ScanLine[y];
    for (x=0;x<bmp->Width;x++)
    {
        i =p[x+x+x+0];                          // sum of the 3 color channels
        i+=p[x+x+x+1];
        i+=p[x+x+x+2];
        i=(i*l)/768;                            // scale <0,765> to <0,l-1>
        s+=m[l-i];                              // AnsiString is indexed from 1
    }
    s+=endl;
}
mm_log->Lines->Text=s;
mm_log->Lines->SaveToFile("pic.txt");
delete bmp;
You need to replace/ignore the VCL stuff unless you use the Borland/Embarcadero environment:
- mm_log is the memo where the text is output
- bmp is the input bitmap
- AnsiString is a VCL string type indexed from 1, not from 0 as char* !!!
This is the result: Slightly NSFW intensity example image
On the left is the ASCII art output (font size 5px), and on the right the input image zoomed a few times. As you can see, the output is larger (pixel -> character). If you use larger areas instead of pixels the zoom is smaller, but of course the output is less visually pleasing. This approach is very easy and fast to code/process.
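For those without VCL, here is a rough portable sketch of the same shading conversion that also supports sampling by areas instead of single pixels. It assumes the image is already available as a row-major 8-bit gray buffer (0 = black, 255 = white); the function name gray_to_ascii and its parameters are my own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Converts a row-major 8-bit gray image to text.
// Each cell of cw x ch pixels becomes one character; partial cells at the
// right and bottom edges are skipped.
std::string gray_to_ascii(const std::vector<std::uint8_t>& gray,
                          int width, int height, int cw, int ch)
{
    assert(cw > 0 && ch > 0);
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    const std::string map = " .,:;ox%#@";          // lightest to darkest
    const int n = static_cast<int>(map.size());
    std::string txt;
    for (int y = 0; y + ch <= height; y += ch)
    {
        for (int x = 0; x + cw <= width; x += cw)
        {
            long sum = 0;                          // sum of intensities in the cell
            for (int yy = 0; yy < ch; yy++)
                for (int xx = 0; xx < cw; xx++)
                    sum += gray[(y + yy) * width + (x + xx)];
            const int avg = static_cast<int>(sum / (cw * ch));   // <0,255>
            txt += map[((255 - avg) * n) / 256];
        }
        txt += '\n';
    }
    return txt;
}
```

With cw = ch = 1 this behaves like the per-pixel example above (one character per pixel); bigger cells shrink the output at the cost of detail.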
When you add more advanced things like:
- automated map computations
- automatic pixel/area size selection
- aspect ratio corrections
Then you can process more complex images with better results:
Here is the result in 1:1 ratio (zoom to see the characters):
Of course, with area sampling you lose the small details. This is an image of the same size as the first example, sampled with areas:
Slightly NSFW intensity advanced example image
As you can see, this is more suited to bigger images.
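One possible way to do the area size selection and aspect ratio correction mentioned above (not necessarily how my app does it): each output character occupies fw x fh pixels on screen, so the sampled area should have the same width to height ratio. The names Cell and cell_size and all parameters are assumptions for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Size of the sampled image area (in pixels) for one output character.
struct Cell { int w, h; };

// Picks the cell size for an image `width` pixels wide so that the output has
// about `columns` characters per line and keeps the image aspect ratio when
// rendered with a font whose character box is fw x fh pixels.
Cell cell_size(int width, int columns, int fw, int fh)
{
    assert(width > 0 && columns > 0 && fw > 0 && fh > 0);
    Cell c;
    c.w = std::max(1, width / columns);
    c.h = std::max(1, (c.w * fh + fw / 2) / fw);   // rounded c.w * fh / fw
    return c;
}
```

For example, a 640 pixel wide image converted to 80 columns with an 8x16 font gives cells of 8x16 pixels.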
Character fitting (hybrid between Shading and Solid ASCII Art)
This approach tries to replace an area (no more single pixel dots) with a character of similar intensity and shape. This leads to better results than the previous approach even with bigger fonts; on the other hand, it is a bit slower. There are more ways to do this, but the main idea is to compute the difference (distance) between the image area (dot) and the rendered character. You can start with a naive sum of absolute differences between pixels, but that will not give very good results because even a 1 pixel shift makes the distance big; instead you can use correlation or different metrics. The overall algorithm is almost the same as in the previous approach:
- evenly divide the image into (gray-scale) rectangular areas (dots)
  - ideally with the same aspect ratio as the rendered font characters (this preserves the aspect ratio; do not forget that characters usually overlap a bit on the x axis)
- compute the intensity of each area (dot)
- replace it with the character from the character map with the closest intensity/shape
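To see why the naive sum of absolute differences mentioned above is fragile, here is a small sketch of that metric for two equally sized gray patches (the function name sad is mine):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sum of absolute differences between two equally sized 8-bit gray patches.
int sad(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    int d = 0;
    for (std::size_t k = 0; k < a.size(); k++)
        d += std::abs(int(a[k]) - int(b[k]));
    return d;
}
```

Take a 4x4 white patch with a one pixel wide black vertical line. Compared with the same line shifted by one pixel the distance is 2040, while compared with a completely white patch it is only 1020, so the empty patch would win even though the shifted line looks much more alike.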
How can we compute the distance between a character and a dot? That is the hardest part of this approach. While experimenting I developed this compromise between speed, quality, and simplicity:
1. Divide the character area into zones
- compute a separate intensity for the left, right, up, down, and center zone of each character from your conversion alphabet (map)
- normalize all intensities so they are independent of the area size: i=(i*256)/(xs*ys)
2. Process the source image in rectangular areas
- (with the same aspect ratio as the target font)
- for each area compute the intensities in the same manner as in bullet 1
- find the closest match among the intensities in the conversion alphabet
- output the fitted character
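The two bullets above can be sketched without VCL roughly like this. It assumes a row-major 8-bit gray buffer for both the rendered characters and the source image; the names Zones, zones_of, zone_distance and best_match are mine, and rendering the characters into such a buffer is left to whatever graphics API you have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Five zone intensities of one rectangular area plus the character they belong to.
struct Zones { int il, ir, iu, id, ic; char c; };

// Zone intensities of the xs x ys area at (xx,yy) in an image `width` pixels wide.
Zones zones_of(const std::vector<std::uint8_t>& gray, int width,
               int xs, int ys, int xx, int yy)
{
    assert(xs > 0 && ys > 0);
    Zones z = {0, 0, 0, 0, 0, 0};
    const int x0 = xs >> 2, y0 = ys >> 2;          // zone borders at 1/4 and 3/4
    const int x1 = xs - x0, y1 = ys - y0;
    for (int y = 0; y < ys; y++)
        for (int x = 0; x < xs; x++)
        {
            const int i = gray[(yy + y) * width + (xx + x)];
            if (x <= x0) z.il += i;
            if (x >= x1) z.ir += i;
            if (y <= y0) z.iu += i;
            if (y >= y1) z.id += i;
            if (x >= x0 && x <= x1 && y >= y0 && y <= y1) z.ic += i;
        }
    const int n = xs * ys;                         // normalize by the area size
    z.il = (z.il << 8) / n; z.ir = (z.ir << 8) / n;
    z.iu = (z.iu << 8) / n; z.id = (z.id << 8) / n;
    z.ic = (z.ic << 8) / n;
    return z;
}

// Distance between two areas: sum of absolute zone differences.
int zone_distance(const Zones& a, const Zones& b)
{
    return std::abs(a.il - b.il) + std::abs(a.ir - b.ir) + std::abs(a.iu - b.iu)
         + std::abs(a.id - b.id) + std::abs(a.ic - b.ic);
}

// Character from the (non-empty) map whose zones are closest to the dot.
char best_match(const std::vector<Zones>& map, const Zones& dot)
{
    assert(!map.empty());
    std::size_t best = 0;
    int best_d = zone_distance(map[0], dot);
    for (std::size_t k = 1; k < map.size(); k++)
    {
        const int d = zone_distance(map[k], dot);
        if (d < best_d) { best_d = d; best = k; }
    }
    return map[best].c;
}
```

To use it, call zones_of once per rendered character to fill the map (setting c to that character), then once per source area, and pass the result to best_match.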
This is the result for font size = 7px:
As you can see, the output is visually pleasing even with a bigger font size (the previous approach example used a 5px font size). The output is roughly the same size as the input image (no zoom). The better results are achieved because the characters are closer to the original image not only by intensity but also by overall shape, and therefore you can use larger fonts and still preserve details (up to a point, of course).
Here is the complete code for the VCL based conversion app:
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "win_main.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
Graphics::TBitmap *bmp=new Graphics::TBitmap;
//---------------------------------------------------------------------------
class intensity
{
public:
    char c;                     // character
    int il,ir,iu,id,ic;         // intensity of part: left,right,up,down,center
    intensity() { c=0; reset(); }
    void reset() { il=0; ir=0; iu=0; id=0; ic=0; }
    void compute(DWORD **p,int xs,int ys,int xx,int yy) // p source image, (xs,ys) area size, (xx,yy) area position
    {
        int x0=xs>>2,y0=ys>>2;
        int x1=xs-x0,y1=ys-y0;
        int x,y,i;
        reset();
        for (y=0;y<ys;y++)
            for (x=0;x<xs;x++)
            {
                i=(p[yy+y][xx+x]&255);
                if (x<=x0) il+=i;
                if (x>=x1) ir+=i;
                if (y<=y0) iu+=i;
                if (y>=y1) id+=i;
                if ((x>=x0)&&(x<=x1)
                  &&(y>=y0)&&(y<=y1)) ic+=i;
            }
        // normalize
        i=xs*ys;
        il=(il<<8)/i;
        ir=(ir<<8)/i;
        iu=(iu<<8)/i;
        id=(id<<8)/i;
        ic=(ic<<8)/i;
    }
};
//---------------------------------------------------------------------------
AnsiString bmp2txt_big(Graphics::TBitmap *bmp,TFont *font) // character sized areas
{
    int i,i0,d,d0;
    int xs,ys,xf,yf,x,xx,y,yy;
    DWORD **p=NULL,**q=NULL;        // bitmap direct pixel access
    Graphics::TBitmap *tmp;         // temp bitmap for single character
    AnsiString txt="";              // output ASCII art text
    AnsiString eol="\r\n";          // end of line sequence
    intensity map[97];              // character map
    intensity gfx;
    // input image size
    xs=bmp->Width;
    ys=bmp->Height;
    // output font size
    xf=font->Size;   if (xf<0) xf=-xf;
    yf=font->Height; if (yf<0) yf=-yf;
    for (;;) // loop to simplify the dynamic allocation error handling
    {
        // allocate and init buffers
        tmp=new Graphics::TBitmap; if (tmp==NULL) break;
        // allow 32bit pixel access as DWORD/int pointer
        tmp->HandleType=bmDIB;    bmp->HandleType=bmDIB;
        tmp->PixelFormat=pf32bit; bmp->PixelFormat=pf32bit;
        // copy target font properties to tmp
        tmp->Canvas->Font->Assign(font);
        tmp->SetSize(xf,yf);
        tmp->Canvas->Font ->Color=clBlack;
        tmp->Canvas->Pen  ->Color=clWhite;
        tmp->Canvas->Brush->Color=clWhite;
        xf=tmp->Width;
        yf=tmp->Height;
        // direct pixel access to bitmaps
        p =new DWORD*[ys]; if (p ==NULL) break; for (y=0;y<ys;y++) p[y]=(DWORD*)bmp->ScanLine[y];
        q =new DWORD*[yf]; if (q ==NULL) break; for (y=0;y<yf;y++) q[y]=(DWORD*)tmp->ScanLine[y];
        // create character map
        for (x=0,d=32;d<128;d++,x++)
        {
            map[x].c=char(DWORD(d));
            // clear tmp
            tmp->Canvas->FillRect(TRect(0,0,xf,yf));
            // render tested character to tmp
            tmp->Canvas->TextOutA(0,0,map[x].c);
            // compute intensity
            map[x].compute(q,xf,yf,0,0);
        }
        map[x].c=0;     // zero terminator of the map
        // loop through image by zoomed character size step
        xf-=xf/3; // characters are usually overlapping by 1/3
        xs-=xs%xf;
        ys-=ys%yf;
        for (y=0;y<ys;y+=yf,txt+=eol)
            for (x=0;x<xs;x+=xf)
            {
                // compute intensity
                gfx.compute(p,xf,yf,x,y);
                // find closest match in map[]
                i0=0; d0=-1;
                for (i=0;map[i].c;i++)
                {
                    d=abs(map[i].il-gfx.il)
                     +abs(map[i].ir-gfx.ir)
                     +abs(map[i].iu-gfx.iu)
                     +abs(map[i].id-gfx.id)
                     +abs(map[i].ic-gfx.ic);
                    if ((d0<0)||(d0>d)) { d0=d; i0=i; }
                }
                // add fitted character to output
                txt+=map[i0].c;
            }
        break;
    }
    // free buffers
    if (tmp) delete tmp;
    if (p  ) delete[] p;
    if (q  ) delete[] q;
    return txt;
}
//---------------------------------------------------------------------------
AnsiString bmp2txt_small(Graphics::TBitmap *bmp) // pixel sized areas
{
    AnsiString m=" `'.,:;i+o*%&$#@"; // constant character map
    int x,y,i,c,l;
    BYTE *p;
    AnsiString txt="",eol="\r\n";
    l=m.Length();
    bmp->HandleType=bmDIB;
    bmp->PixelFormat=pf32bit;
    for (y=0;y<bmp->Height;y++)
    {
        p=(BYTE*)bmp->ScanLine[y];
        for (x=0;x<bmp->Width;x++)
        {
            i =p[(x<<2)+0];
            i+=p[(x<<2)+1];
            i+=p[(x<<2)+2];
            i=(i*l)/768;
            txt+=m[l-i];
        }
        txt+=eol;
    }
    return txt;
}
//---------------------------------------------------------------------------
void update()
{
    int x0,x1,y0,y1,i,l;
    x0=bmp->Width;
    y0=bmp->Height;
    // small images are converted per pixel, bigger ones per character sized area
    if ((x0<64)||(y0<64)) Form1->mm_txt->Text=bmp2txt_small(bmp);
    else                  Form1->mm_txt->Text=bmp2txt_big  (bmp,Form1->mm_txt->Font);
    Form1->mm_txt->Lines->SaveToFile("pic.txt");
    // x1 = characters per line, y1 = number of lines
    for (x1=0,i=1,l=Form1->mm_txt->Text.Length();i<=l;i++) if (Form1->mm_txt->Text[i]==13) { x1=i-1; break; }
    for (y1=0,i=1,l=Form1->mm_txt->Text.Length();i<=l;i++) if (Form1->mm_txt->Text[i]==13) y1++;
    x1*=abs(Form1->mm_txt->Font->Size);
    y1*=abs(Form1->mm_txt->Font->Height);
    if (y0<y1) y0=y1;
    x0+=x1+48;
    Form1->ClientWidth=x0;
    Form1->ClientHeight=y0;
    Form1->Caption=AnsiString().sprintf("Picture -> Text ( Font %ix%i )",abs(Form1->mm_txt->Font->Size),abs(Form1->mm_txt->Font->Height));
}
//---------------------------------------------------------------------------
void draw()
{
    Form1->ptb_gfx->Canvas->Draw(0,0,bmp);
}
//---------------------------------------------------------------------------
void load(AnsiString name)
{
    bmp->LoadFromFile(name);
    bmp->HandleType=bmDIB;
    bmp->PixelFormat=pf32bit;
    Form1->ptb_gfx->Width=bmp->Width;
    Form1->ClientHeight=bmp->Height;
    Form1->ClientWidth=(bmp->Width<<1)+32;
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
    load("pic.bmp");
    update();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender)
{
    delete bmp;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender)
{
    draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseWheel(TObject *Sender, TShiftState Shift,int WheelDelta, TPoint &MousePos, bool &Handled)
{
    int s=abs(mm_txt->Font->Size);
    if (WheelDelta<0) s--;
    if (WheelDelta>0) s++;
    mm_txt->Font->Size=s;
    update();
}
//---------------------------------------------------------------------------
It is a simple form app (Form1) with a single TMemo mm_txt in it. It loads the image "pic.bmp", then chooses according to resolution which approach to use for the conversion to text, which is saved to "pic.txt" and sent to the memo for visualization. For those without VCL: ignore the VCL stuff and replace AnsiString with any string type you have, and Graphics::TBitmap with any bitmap or image class you have at your disposal with pixel access capability.
A very important note is that this uses the settings of mm_txt->Font, so make sure you set:
- Font->Pitch=fpFixed
- Font->Charset=OEM_CHARSET
- Font->Name="System"
to make this work properly, otherwise the font will not be handled as mono-spaced. The mouse wheel just changes the font size up/down to see the results at different font sizes.
[Notes]
- see Word Portraits visualization
- use a language with bitmap/file access and text output capabilities
- I strongly recommend starting with the first approach, as it is very easy, straightforward and simple, and only then moving to the second (which can be done as a modification of the first, so most of the code stays as is anyway)
- It is a good idea to compute with inverted intensity (black pixels are the max value), because the standard text preview is on a white background, hence leading to much better results.
- you can experiment with the size, count and layout of the subdivision zones, or use some grid like 3x3 instead.
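As an illustration of the grid idea from the last note, here is an untested-in-the-app sketch that replaces the five zones with a 3x3 grid of average intensities. The name grid3x3 and the gray buffer layout (row-major, 8 bits per pixel) are my assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Average intensity of each cell of a 3x3 grid laid over the xs x ys area
// at (xx,yy) in an image `width` pixels wide.
// Cells are stored row by row: index = 3*row + column.
std::array<int, 9> grid3x3(const std::vector<std::uint8_t>& gray, int width,
                           int xs, int ys, int xx, int yy)
{
    assert(xs >= 3 && ys >= 3);         // every cell needs at least one pixel
    std::array<long, 9> sum{};          // intensity sum per cell
    std::array<int, 9>  cnt{};          // pixel count per cell
    for (int y = 0; y < ys; y++)
        for (int x = 0; x < xs; x++)
        {
            const int k = ((y * 3) / ys) * 3 + (x * 3) / xs;
            sum[k] += gray[(yy + y) * width + (xx + x)];
            cnt[k]++;
        }
    std::array<int, 9> avg{};
    for (int k = 0; k < 9; k++)
        avg[k] = static_cast<int>(sum[k] / cnt[k]);
    return avg;
}
```

The distance between a character and a dot can then be the sum of absolute differences of the 9 values, the same way as with the 5 zones. Unlike the zones, the grid cells do not overlap.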
[Edit1] comparison
Finally, here is a comparison between the two approaches on the same input:
The green dot marked images are done with approach #2 and the red ones with #1, all at 6 pixel font size. As you can see on the light bulb image, the shape sensitive approach is much better (even though #1 is done on a 2x zoomed source image).
[Edit2] cool app
While reading today's new questions I got an idea for a cool app that grabs a selected region of the desktop, continuously feeds it to the ASCIIart convertor and views the result. After an hour of coding it is done, and I am so satisfied with the result that I simply have to add it here.
OK, the app consists of just 2 windows. The first, master window is basically my old convertor window without the image selection and preview (all the stuff above is in it). It has just the ASCII preview and conversion settings. The second window is an empty form with a transparent inside for the grabbing area selection (no functionality whatsoever).
Now, on a timer, I just grab the area selected by the selection form, pass it to the conversion and preview the ASCIIart.
So you enclose the area you want to convert with the selection window and view the result in the master window. It can be a game, viewer, ... It looks like this:
So now I can watch even videos in ASCIIart for fun. Some are really nice :).
[Edit3]
If you want to try to implement this in GLSL take a look at this:
Source: https://stackoverflow.com/questions/32987103/image-to-ascii-art-conversion