I'm looking for deep understanding of how WebGL works. I'm wanting to gain knowledge at a level that most people care less about, because the knowledge isn't necessary us
I would read these articles
http://webglfundamentals.org/webgl/lessons/webgl-how-it-works.html
Assuming those articles are helpful, the rest of the picture is that WebGL runs in a browser. It renders to a canvas tag. You can think of a canvas tag like an img tag, except you use the WebGL API to generate an image instead of downloading one.
Like other HTML5 tags, the canvas tag can be styled with CSS and placed under or over other parts of the page. It is composited (blended) with other parts of the page, and it can be transformed, rotated, and scaled by CSS along with them. That's a big difference from OpenGL or OpenGL ES.
Hopefully this little write-up is helpful to you. It overviews a big chunk of what I've learned about WebGL and 3D in general. BTW, if I've gotten anything wrong, somebody please correct me -- because I'm still learning, too!
The browser is just that, a Web browser. All it does is expose the WebGL API (via JavaScript), which the programmer does everything else with.
As near as I can tell, the WebGL API is essentially just a set of (browser-supplied) JavaScript functions which wrap around the OpenGL ES specification. So if you know OpenGL ES, you can adopt WebGL pretty quickly. Don't confuse this with pure OpenGL, though. The "ES" is important.
The WebGL spec was intentionally left very low-level, leaving a lot to be re-implemented from one application to the next. It is up to the community to write frameworks for automation, and up to the developer to choose which framework to use (if any). It's not entirely difficult to roll your own, but it does mean a lot of overhead spent on reinventing the wheel. (FWIW, I've been working on my own WebGL framework called Jax for a while now.)
The graphics driver supplies the implementation of OpenGL ES that actually runs your code. At this point, it's running on the machine hardware, below even the C code. While this is what makes WebGL possible in the first place, it's also a double-edged sword, because bugs in the OpenGL ES driver (which I've noted quite a number of already) will show up in your Web application, and you won't necessarily know it unless you can count on your user base to file coherent bug reports including OS, video hardware and driver versions. Here's what the debug process for such issues ends up looking like.
On Windows, there's an extra layer which exists between the WebGL API and the hardware: ANGLE, or "Almost Native Graphics Layer Engine". Because the OpenGL ES drivers on Windows generally suck, ANGLE receives those calls and translates them into DirectX 9 calls instead.
Now that you know how the pieces come together, let's look at a lower level explanation of how everything comes together to produce a 3D image.
First, the JavaScript code gets a 3D context from an HTML5 canvas element. Then it registers a set of shaders, which are written in GLSL ([Open] GL Shading Language) and essentially resemble C code.
The rest of the process is very modular. You need to get vertex data and any other information you intend to use (such as vertex colors, texture coordinates, and so forth) down to the graphics pipeline using uniforms and attributes which are defined in the shader, but the exact layout and naming of this information is very much up to the developer.
JavaScript sets up the initial data structures and sends them to the WebGL API, which sends them to either ANGLE or OpenGL ES, which ultimately sends them off to the graphics hardware.
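As a rough sketch of the first steps described above (getting a context and registering shaders), something like the following could be used. The helper names compileShader and createProgram, the GLSL variable names a_position and u_mvp, and the hard-coded red color are all illustrative assumptions; only the gl.* calls are part of the WebGL API.

```javascript
// Minimal GLSL sources. The names a_position and u_mvp are made up for this sketch.
const VERTEX_SHADER_SOURCE = `
attribute vec4 a_position;
uniform mat4 u_mvp;
void main() {
  gl_Position = u_mvp * a_position;
}`;

const FRAGMENT_SHADER_SOURCE = `
precision mediump float;
void main() {
  gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); // hard-coded red
}`;

// Compile one shader stage and throw with the driver's log if compilation fails.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error("Shader compile failed: " + log);
  }
  return shader;
}

// Compile both stages, attach them to a program, and link it.
function createProgram(gl, vertexSource, fragmentSource) {
  const program = gl.createProgram();
  gl.attachShader(program, compileShader(gl, gl.VERTEX_SHADER, vertexSource));
  gl.attachShader(program, compileShader(gl, gl.FRAGMENT_SHADER, fragmentSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("Program link failed: " + gl.getProgramInfoLog(program));
  }
  return program;
}

// In a browser, usage would look roughly like this:
// const gl = document.querySelector("canvas").getContext("webgl");
// const program = createProgram(gl, VERTEX_SHADER_SOURCE, FRAGMENT_SHADER_SOURCE);
// gl.useProgram(program);
```

Throwing with the info log is one design choice among several; it at least surfaces driver-specific compile errors early, which helps with the kind of bug reports mentioned earlier.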
Once the information is available to the shader, the shader must transform the information in 2 phases to produce 3D objects. The first phase is the vertex shader, which sets up the mesh coordinates. (This stage runs entirely on the video card, below all of the APIs discussed above.) Most commonly, the process performed in the vertex shader looks something like this:
gl_Position = PROJECTION_MATRIX * VIEW_MATRIX * MODEL_MATRIX * VERTEX_POSITION
where VERTEX_POSITION is a 4D vector (x, y, z, and w, which is usually set to 1); VIEW_MATRIX is a 4x4 matrix representing the camera's view into the world; MODEL_MATRIX is a 4x4 matrix which transforms object-space coordinates (that is, coords local to the object before rotation or translation have been applied) into world-space coordinates; and PROJECTION_MATRIX is a 4x4 matrix which represents the camera's lens.
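To make that chain of multiplications concrete, here is a small CPU-side sketch in JavaScript of the same arithmetic the vertex shader performs. It assumes column-major 4x4 matrices stored as flat 16-element arrays (the layout WebGL expects for matrix uniforms). The helper names transformVec4, translation, and perspective, and all of the sample numbers, are assumptions made up for illustration.

```javascript
// Multiply a column-major 4x4 matrix (flat array of 16 numbers) by a 4D vector.
function transformVec4(m, v) {
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      out[row] += m[col * 4 + row] * v[col];
    }
  }
  return out;
}

// Build a translation matrix (each group of four numbers below is one column).
function translation(tx, ty, tz) {
  return [
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    tx, ty, tz, 1,
  ];
}

// Build an OpenGL-style perspective projection matrix.
// fovY is the vertical field of view in radians.
function perspective(fovY, aspect, zNear, zFar) {
  const f = 1 / Math.tan(fovY / 2);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (zFar + zNear) / (zNear - zFar), -1,
    0, 0, (2 * zFar * zNear) / (zNear - zFar), 0,
  ];
}

const MODEL_MATRIX = translation(1, 0, 0);   // object placed 1 unit to the right
const VIEW_MATRIX = translation(0, 0, -5);   // world pushed 5 units away from the camera
const PROJECTION_MATRIX = perspective(Math.PI / 2, 800 / 600, 1, 100);
const VERTEX_POSITION = [0, 0, 0, 1];        // the object's local origin

// Same order as PROJECTION_MATRIX * VIEW_MATRIX * MODEL_MATRIX * VERTEX_POSITION,
// applied right to left.
const worldPosition = transformVec4(MODEL_MATRIX, VERTEX_POSITION);
const eyePosition = transformVec4(VIEW_MATRIX, worldPosition);
const clipPosition = transformVec4(PROJECTION_MATRIX, eyePosition);

console.log(worldPosition, eyePosition, clipPosition);
```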
Most often, the VIEW_MATRIX and MODEL_MATRIX are precomputed and called MODELVIEW_MATRIX. Occasionally, all 3 are precomputed into MODELVIEW_PROJECTION_MATRIX or just MVP. These are generally meant as optimizations, though I'd like to find time to do some benchmarks. It's possible that precomputing is actually slower in JavaScript if it's done every frame, because JavaScript itself isn't all that fast. In this case, the hardware acceleration afforded by doing the math on the GPU might well be faster than doing it on the CPU in JavaScript. We can of course hope that future JS implementations will resolve this potential gotcha by simply being faster.
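For illustration, precomputing on the CPU might look like the sketch below. It assumes column-major flat arrays; multiplyMat4 is a hypothetical helper written for this example, and the projection matrix uses made-up toy values rather than a real camera setup. It says nothing about which approach is faster.

```javascript
// Multiply two column-major 4x4 matrices and return a * b.
function multiplyMat4(a, b) {
  const out = new Array(16).fill(0);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      for (let k = 0; k < 4; k++) {
        out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
      }
    }
  }
  return out;
}

// Each group of four numbers is one column.
const MODEL_MATRIX = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  1, 0, 0, 1];   // translate +1 on X
const VIEW_MATRIX = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, -5, 1];   // translate -5 on Z
const PROJECTION_MATRIX = [2, 0, 0, 0,  0, 2, 0, 0,  0, 0, -1, -1,  0, 0, -2, 0]; // toy values

// Computed once per object on the CPU instead of once per vertex on the GPU.
const MODELVIEW_MATRIX = multiplyMat4(VIEW_MATRIX, MODEL_MATRIX);
const MVP = multiplyMat4(PROJECTION_MATRIX, MODELVIEW_MATRIX);

console.log(MODELVIEW_MATRIX, MVP);

// In a browser, the result could then be uploaded as a single uniform, e.g.:
// gl.uniformMatrix4fv(mvpLocation, false, new Float32Array(MVP));
// where mvpLocation comes from gl.getUniformLocation(program, "u_mvp")
// and "u_mvp" is whatever name the shader happens to use.
```

With a precomputed MVP the vertex shader does one matrix-vector multiply per vertex instead of three, at the cost of two matrix-matrix multiplies in JavaScript per object per frame.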
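When all of these have been applied, the gl_Position variable will hold a set of XYZ coordinates and a W component. These are called clip coordinates. For a point inside the view volume, X, Y, and Z each lie within [-W, W]; they only land in [-1, 1] once they have been divided by W.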
It's worth noting that clip coordinates are the only thing the vertex shader really needs to produce. You can completely skip the matrix transformations performed above, as long as you produce a clip coordinate result. (I have even experimented with swapping out matrices for quaternions; it worked just fine but I scrapped the project because I didn't get the performance improvements I'd hoped for.)
After you supply clip coordinates to gl_Position, WebGL divides the result by gl_Position.w, producing what's called normalized device coordinates.
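The divide itself is simple enough to sketch in a few lines of JavaScript. The function names clipToNdc and isInsideClipVolume are made up for this example, and the real work happens in hardware, where clipping is applied to whole primitives rather than single points.

```javascript
// Perspective divide: clip coordinates [x, y, z, w] -> normalized device coordinates.
function clipToNdc(clip) {
  const [x, y, z, w] = clip;
  return [x / w, y / w, z / w];
}

// A point survives clipping when each of x, y, z lies within [-w, w],
// which is the same as saying its NDC values lie within [-1, 1].
function isInsideClipVolume(clip) {
  const [x, y, z, w] = clip;
  return Math.abs(x) <= w && Math.abs(y) <= w && Math.abs(z) <= w;
}

const ndc = clipToNdc([2, -1, 3, 4]);
console.log(ndc);                               // [0.5, -0.25, 0.75]
console.log(isInsideClipVolume([2, -1, 3, 4])); // true
console.log(isInsideClipVolume([5, 0, 0, 4]));  // false, x is beyond w
```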
From there, projecting a pixel onto the screen is a simple matter of multiplying by 1/2 the screen dimensions and then adding 1/2 the screen dimensions.[1] Here are some examples of normalized device coordinates translated into 2D coordinates on an 800x600 display:
ndc = [0, 0]
x = (0 * 800/2) + 800/2 = 400
y = (0 * 600/2) + 600/2 = 300
ndc = [0.5, 0.5]
x = (0.5 * 800/2) + 800/2 = 200 + 400 = 600
y = (0.5 * 600/2) + 600/2 = 150 + 300 = 450
ndc = [-0.5, -0.25]
x = (-0.5 * 800/2) + 800/2 = -200 + 400 = 200
y = (-0.25 * 600/2) + 600/2 = -75 + 300 = 225
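The same arithmetic can be written as a small JavaScript function. This is a sketch: ndcToWindow is a hypothetical helper, and its viewport argument simply mirrors the x, y, width, and height passed to gl.viewport. Note that the resulting Y is measured from the bottom of the viewport, as in OpenGL window coordinates.

```javascript
// Convert normalized device coordinates (each in [-1, 1]) to window coordinates.
// viewport mirrors the arguments of gl.viewport(x, y, width, height).
function ndcToWindow(ndcX, ndcY, viewport) {
  return [
    (ndcX + 1) * (viewport.width / 2) + viewport.x,
    (ndcY + 1) * (viewport.height / 2) + viewport.y,
  ];
}

const viewport = { x: 0, y: 0, width: 800, height: 600 };

console.log(ndcToWindow(0, 0, viewport));         // [400, 300]
console.log(ndcToWindow(0.5, 0.5, viewport));     // [600, 450]
console.log(ndcToWindow(-0.5, -0.25, viewport));  // [200, 225]
```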
Once it's been determined where a pixel should be drawn, the pixel is handed off to the pixel shader (called the fragment shader in WebGL), which chooses the actual color the pixel will be. This can be done in a myriad of ways, ranging from simply hard-coding a specific color to texture lookups to more advanced normal and parallax mapping (which are essentially ways of "cheating" texture lookups to produce different effects).
Now, so far we've ignored the Z component of the clip coordinates. Here's how that works out. When we multiplied by the projection matrix and then divided by W, the third component resulted in some number. If that number is greater than 1.0 or less than -1.0, then it is beyond the view range of the projection matrix, corresponding to the matrix's zFar and zNear values, respectively.
So if it's not in the range [-1, 1] then it's clipped entirely. If it is in that range, then the Z value is scaled to 0 to 1[2] and is compared to the depth buffer[3]. The depth buffer has the same dimensions as the screen, so that if a projection of 800x600 is used, the depth buffer is 800 pixels wide and 600 pixels high. We already have the pixel's X and Y coordinates, so they are used to look up the currently stored Z value in the depth buffer. If the stored Z value is greater than the new Z value, then the new Z value is closer than whatever was previously drawn, and replaces it[4]. At this point it's safe to light up the pixel in question (or in the case of WebGL, draw the pixel to the canvas), and store the new Z value as the depth value.
If the new Z value is greater than the stored depth value, then it is deemed to be "behind" whatever has already been drawn, and the pixel is discarded.
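A toy software version of that depth test, written in JavaScript, might look like this. Everything here is an assumption for illustration: createFramebuffer, ndcZToDepth, and writeFragment are made-up names, colors are plain strings, the depth buffer starts at 1.0 (the farthest value), and the comparison is fixed to "less than". The real test runs on the GPU.

```javascript
// A toy framebuffer: one depth value and one color per pixel.
function createFramebuffer(width, height) {
  return {
    width,
    height,
    depth: new Float32Array(width * height).fill(1.0), // start at the farthest depth
    color: new Array(width * height).fill(null),
  };
}

// Scale an NDC z value in [-1, 1] into the depth range (0 to 1 by default).
function ndcZToDepth(ndcZ, near = 0, far = 1) {
  return near + (ndcZ + 1) * 0.5 * (far - near);
}

// Try to draw one pixel. Returns true if it was drawn, false if it was rejected.
function writeFragment(fb, x, y, ndcZ, color) {
  if (ndcZ < -1 || ndcZ > 1) return false; // outside the view range: clipped
  const index = y * fb.width + x;
  const depth = ndcZToDepth(ndcZ);
  if (depth < fb.depth[index]) {           // closer than what is already stored
    fb.depth[index] = depth;
    fb.color[index] = color;
    return true;
  }
  return false;                            // behind what is already drawn: discarded
}

const fb = createFramebuffer(800, 600);
const drewFar = writeFragment(fb, 400, 300, 0.5, "red");      // depth 0.75, drawn
const drewNear = writeFragment(fb, 400, 300, -0.5, "blue");   // depth 0.25, replaces red
const drewBehind = writeFragment(fb, 400, 300, 0.0, "green"); // depth 0.5, discarded
console.log(drewFar, drewNear, drewBehind, fb.color[300 * 800 + 400]);
```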
[1] The actual conversion uses the gl.viewport settings to convert from normalized device coordinates to pixels.
[2] It's actually scaled to the gl.depthRange settings. They default to 0 to 1.
[3] Assuming you have a depth buffer and you've turned on depth testing with gl.enable(gl.DEPTH_TEST).
[4] You can set how Z values are compared with gl.depthFunc.