Chasing the Scanline

In a traditional rendering pipeline, each triangle is drawn individually and completely to a framebuffer. Once all triangles are drawn, the contents of the framebuffer are sent to the display and rendering of the next frame begins.

Unfortunately for QuickSilver, the Nexys 2 simply does not have the memory bandwidth required to render each and every triangle on-screen individually and then load the complete framebuffer for display.

Which is why we're not going to use a framebuffer.

Instead, we're going to send the rendered pixels straight to the display. Unlike a framebuffer, however, a typical display device does not allow random access; the pixels must be written in a fixed order: left-to-right, top-to-bottom. And the pixels we send to the display are final: no overwriting allowed. So before we can send a pixel to the screen, we must have sampled every single triangle that covers that pixel. Which gives us our first requirement:

The information of every single triangle that covers the rendered area must be present at the same time.

Although the Spartan-3E FPGA on the Nexys 2 doesn't have enough space for an entire framebuffer, it can easily fit a few scanlines. That way we don't have to render in exact left-to-right order, and we can render an entire line for each triangle instead of just a single pixel, which helps to reduce overhead. Triangles do tend to be taller than a single scanline, so we'll need to remember the triangle data until it has been fully rendered:

Triangle data must be buffered until the triangle is fully rendered.

With scenes easily containing many thousands of triangles, that buffer needs to be quite large if we store every triangle in the rendering pipeline in arbitrary order, and memory space and bandwidth are exactly the things the Nexys 2 lacks. Fortunately, we can optimise. If we sort the incoming triangles top-to-bottom, only the triangles that cover the current scanline need to be stored. Furthermore, if a triangle is found in the buffer that is completely below the current scanline, then we can safely skip all triangles that follow it. This saves the additional overhead of loading and checking each of those triangles.

Incoming triangles must be sorted top-to-bottom by their topmost Y-coordinate.
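To make the early-exit idea concrete, here is a minimal Python sketch (software only, not the FPGA implementation). The `Tri` type and the `sort_by_top` and `covering` helpers are hypothetical names made up for this illustration, and it assumes the usual screen convention where Y grows downwards, so "topmost" means the smallest Y.

```python
from typing import List, NamedTuple


class Tri(NamedTuple):
    """Hypothetical triangle record: only the vertical extent matters here."""
    top: int      # topmost Y-coordinate (smallest Y, assuming Y grows downwards)
    bottom: int   # bottommost Y-coordinate, inclusive
    name: str


def sort_by_top(tris: List[Tri]) -> List[Tri]:
    """Sort triangles top-to-bottom by their topmost Y-coordinate."""
    return sorted(tris, key=lambda t: t.top)


def covering(sorted_tris: List[Tri], y: int) -> List[Tri]:
    """Return the triangles that cover scanline y.

    Because the list is sorted by `top`, the first triangle that starts
    below y proves that every following triangle starts below y as well,
    so the scan can stop right there.
    """
    hits = []
    for t in sorted_tris:
        if t.top > y:
            break               # this one and all that follow are below y
        if t.bottom >= y:
            hits.append(t)      # starts at or above y and reaches down to y
    return hits


tris = sort_by_top([Tri(5, 9, "c"), Tri(0, 3, "a"), Tri(2, 6, "b")])
print([t.name for t in covering(tris, 4)])
```

On scanline 4 the loop checks "a" (already finished) and "b" (covers the line), then stops at "c" without looking at anything after it.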

The scanline rendering architecture is starting to take shape:

Triangles are sorted based on their topmost Y-coordinate.
Triangles are stored in a buffer.
Scanlines are drawn top-to-bottom.
Triangles are loaded from the buffer and a single scanline is rendered.
Triangles that have been completely rendered are removed from the buffer.
If a triangle is found in the buffer that is completely below the current scanline, it and all following triangles are skipped.
When the end of the triangle buffer has been reached, the current scanline is complete and is sent to the screen. The next scanline starts rendering.
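The steps above can be sketched as a small software model. This is only an illustration under several assumptions, not the actual hardware design: the tiny `WIDTH` and `HEIGHT`, the `Tri` type with one flat depth value per triangle, the inclusive pixel-coverage rule in `span_at`, and all function names are invented for the example. The span calculation is a plain edge-intersection stand-in, not the rasterization formula used by the project.

```python
import math
from typing import List, NamedTuple, Optional, Tuple

WIDTH, HEIGHT = 8, 6   # tiny hypothetical screen so the output is readable


class Tri(NamedTuple):
    verts: Tuple[Tuple[int, int], Tuple[int, int], Tuple[int, int]]
    z: int        # one flat depth per triangle (simplification); smaller is nearer
    colour: str   # single character standing in for a pixel colour


def top(t: Tri) -> int:
    return min(y for _, y in t.verts)


def bottom(t: Tri) -> int:
    return max(y for _, y in t.verts)


def span_at(t: Tri, y: int) -> Optional[Tuple[float, float]]:
    """X-range of the triangle on scanline y, or None if it misses the line."""
    xs: List[float] = []
    v = t.verts
    for (x0, y0), (x1, y1) in ((v[0], v[1]), (v[1], v[2]), (v[2], v[0])):
        if y0 == y1:
            if y == y0:
                xs += [x0, x1]                  # horizontal edge on this line
        elif min(y0, y1) <= y <= max(y0, y1):
            xs.append(x0 + (x1 - x0) * (y - y0) / (y1 - y0))
    return (min(xs), max(xs)) if xs else None


def render_frame(tris: List[Tri]) -> List[str]:
    buffer = sorted(tris, key=top)              # sort, then store in the buffer
    frame = []
    for y in range(HEIGHT):                     # scanlines top-to-bottom
        line = ["."] * WIDTH                    # one scanline of colour
        zline = [math.inf] * WIDTH              # one scanline of Z-buffer
        keep = []
        for i, t in enumerate(buffer):
            if top(t) > y:                      # completely below this line:
                keep.extend(buffer[i:])         # skip it and everything after
                break
            span = span_at(t, y)
            if span is not None:
                first = max(0, math.ceil(span[0]))
                last = min(WIDTH - 1, math.floor(span[1]))
                for x in range(first, last + 1):
                    if t.z < zline[x]:          # depth test within the line
                        zline[x] = t.z
                        line[x] = t.colour
            if bottom(t) > y:                   # still has lines left to render
                keep.append(t)
        buffer = keep                           # finished triangles are gone
        frame.append("".join(line))             # line complete: "send" it
    return frame


scene = [Tri(((2, 1), (7, 1), (7, 5)), 1, "B"),
         Tri(((0, 0), (4, 0), (0, 4)), 2, "A")]
print("\n".join(render_frame(scene)))
```

Note that only `line` and `zline` are ever allocated per scanline; the full picture in `frame` exists here purely so the result can be printed, standing in for the display.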

Scanline rendering differs greatly from traditional framebuffer rendering. This brings several advantages, and several disadvantages:

Advantages:

No Framebuffer. Memory footprint is greatly reduced because of this.
Guaranteed 60Hz. Everything is sent straight to the screen, so no dropped frames and no tearing.
Cheaper Z-buffering. We only need a single scanline of Z-buffer, instead of the entire frame.

Disadvantages:

No Framebuffer. Framebuffer effects, complex alpha, environment mapping and multi-pass rendering are all out.
Guaranteed 60Hz. Sounds great, but what happens if we go slightly over our render time? It's impossible to slightly delay the frame and take a bit more time to render it. Instead, triangles will be skipped and visual artefacts will be visible. Because everything is rendered per-scanline, triangle density becomes an important factor too.
Hard triangle limitations. In addition to the timing issues mentioned above, the triangle buffer itself poses a hard limit on the number of triangles that can be rendered on each scanline. In practice, the render time imposes the stricter limit.
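Since triangle density per scanline is what matters, a scene could be checked offline before it ever reaches the hardware. The sketch below is one hypothetical way to do that in Python: it counts how many triangles touch each scanline and reports the lines that exceed a budget. The function names and the `limit` value are assumptions for illustration; the real per-line limit depends on the buffer size and render time, neither of which is quantified here.

```python
from typing import List, Tuple


def triangles_per_scanline(extents: List[Tuple[int, int]], height: int) -> List[int]:
    """Count the triangles touching each scanline.

    `extents` holds one (top, bottom) pair per triangle, both inclusive,
    with Y growing downwards (an assumption of this sketch).
    """
    counts = [0] * height
    for top, bottom in extents:
        # Clip the triangle's vertical extent to the visible lines.
        for y in range(max(top, 0), min(bottom, height - 1) + 1):
            counts[y] += 1
    return counts


def overloaded_lines(extents: List[Tuple[int, int]], height: int, limit: int) -> List[int]:
    """Scanlines where the triangle count exceeds a (hypothetical) budget."""
    counts = triangles_per_scanline(extents, height)
    return [y for y, n in enumerate(counts) if n > limit]


extents = [(0, 3), (2, 6), (5, 9)]
print(triangles_per_scanline(extents, 8))
print(overloaded_lines(extents, 8, limit=1))
```

The total triangle count of the scene never appears in this check: a scene with few triangles can still fail if they all pile up on the same handful of lines.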

In the next post I'll talk a bit about the basic formula behind triangle rasterization.
