Chasing the Scanline

A project log for QuickSilver Neo: Open Source GPU

A 3D Graphics Accelerator for FPGAs

Ruud Schellekens, 05/21/2016 at 21:44

In a traditional rendering pipeline, each triangle is drawn individually and completely to a framebuffer. When all triangles are drawn the content of the framebuffer is sent to the display and the triangles for the next frame are rendered.

Unfortunately for QuickSilver, the Nexys 2 simply does not have the memory bandwidth required to render each and every triangle on-screen individually and then load the complete framebuffer for display.

Which is why we're not going to use a framebuffer.

Instead, we're going to send the rendered pixels straight to the display. Unlike a framebuffer, however, a typical display device does not allow random access; the pixels must be written in a certain order: left-to-right, top-to-bottom. And the pixels we send to the display are final, no overwriting allowed. So in order to send a pixel to the screen, we must have first sampled every single triangle that covers this pixel. Which gives us our first requirement:

The information of every single triangle that covers the rendered area must be present at the same time.

Although the Spartan-3E FPGA on the Nexys 2 doesn't have enough space for an entire framebuffer, it can easily fit a few scanlines. That way we don't have to render in exact left-to-right order, and we can render an entire line for each triangle instead of just a single pixel, which helps to reduce overhead. Triangles do tend to be taller than a single scanline, so we'll need to remember the triangle data until it has been fully rendered:

Triangle data must be buffered until the triangle is fully rendered.
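To make the "entire line for each triangle" idea concrete, here is a minimal software sketch of filling one triangle's span into a scanline buffer. It is only an illustration, not the QuickSilver hardware design: the names `triangle_span` and `fill_span` are made up for this example, triangles are plain lists of `(x, y)` points, and a later triangle simply overwrites an earlier one in the line buffer, which is an assumption since visibility handling isn't covered here.

```python
import math


def triangle_span(tri, y):
    """Return (x_min, x_max) where the horizontal line at height y
    crosses the triangle, or None if the line misses it.

    `tri` is a list of three (x, y) points (hypothetical layout)."""
    xs = []
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        if y0 == y1:
            continue  # horizontal edge: the other two edges cover its endpoints
        if min(y0, y1) <= y <= max(y0, y1):
            t = (y - y0) / (y1 - y0)       # position along the edge, 0..1
            xs.append(x0 + t * (x1 - x0))  # x where the edge crosses y
    if not xs:
        return None
    return min(xs), max(xs)


def fill_span(line, tri, y, color):
    """Write `color` into the scanline buffer `line` for every pixel
    of scanline y that the triangle covers (simple overwrite)."""
    span = triangle_span(tri, y)
    if span is None:
        return
    first = max(0, math.ceil(span[0]))
    last = min(len(line) - 1, math.floor(span[1]))
    for x in range(first, last + 1):
        line[x] = color


# One 8-pixel scanline, one triangle.
line = [0] * 8
tri = [(0, 0), (8, 0), (0, 8)]
fill_span(line, tri, y=4, color=1)
```

The line buffer only becomes final once every triangle covering that scanline has been written into it, which is why the triangle data has to stay around for as many scanlines as the triangle is tall.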

With scenes easily containing many thousands of triangles, that buffer needs to be quite large if we store every triangle in the rendering pipeline in arbitrary order, and memory space and bandwidth are exactly the things the Nexys 2 lacks. Fortunately, we can optimise. By sorting the incoming triangles top-to-bottom, only the triangles that cover the current scanline need to be stored. Furthermore, if a triangle is found in the buffer that is completely below the current scanline, then we can safely skip all triangles that follow. This saves additional overhead for loading and checking each triangle.

Incoming triangle data must be sorted top-to-bottom based on their topmost Y-coordinate.
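The sorting and early-exit rule can be sketched in software as follows. This is a rough model under stated assumptions, not the actual pipeline: `Tri` and `active_per_scanline` are hypothetical names, each triangle is reduced to the first and last scanline it touches, and the buffer is an ordinary Python list rather than on-chip memory.

```python
from typing import NamedTuple


class Tri(NamedTuple):
    top: int     # first scanline the triangle touches
    bottom: int  # last scanline the triangle touches
    name: str    # label used only for this illustration


def active_per_scanline(triangles, height):
    """For each scanline, list the triangles that must be rendered on it."""
    # Requirement: sort by topmost Y-coordinate.
    buffer = sorted(triangles, key=lambda t: t.top)
    lines = []
    for y in range(height):
        # Triangles that ended above this scanline are fully rendered
        # and no longer need to be buffered.
        buffer = [t for t in buffer if t.bottom >= y]
        active = []
        for t in buffer:
            if t.top > y:
                # Completely below the current scanline; because the
                # buffer is sorted, everything after it is too.
                break
            active.append(t.name)
        lines.append(active)
    return lines


tris = [Tri(2, 3, "b"), Tri(0, 1, "a"), Tri(1, 2, "c")]
schedule = active_per_scanline(tris, height=4)
```

With these three triangles, scanline 0 only ever looks at "a" and the first triangle after it, no matter how many more triangles follow further down the screen.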

The scanline rendering architecture is starting to take shape:

Scanline rendering differs greatly from traditional framebuffer rendering. This has several advantages and disadvantages:

Advantages:

Disadvantages:

In the next post I'll talk a bit about the basic formula behind the triangle rasterization.