
Need hardware rasterization... Architecture needs an upgrade

A project log for C70100 GPU

The Goal: 3D rendering on an FPGA

Dylan Brophy • 10/11/2020 at 02:30 • 3 Comments

At 640x480 resolution and 60 FPS, you have to write about 18 million pixels per second.

So each pixel needs to be rasterized and written to RAM in only about 5 clock cycles (if clk = 100 MHz).  There is absolutely no way I can do that in software.  Having 64 ALUs will help me calculate the positions of all the fragments in the needed time, but not actually write the fragments to RAM.
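For reference, a back-of-the-envelope check of that budget (a sketch in C; the 640x480 @ 60 FPS and 100 MHz figures are the ones above):

```c
#include <stdio.h>

/* Rough per-pixel clock budget at 640x480 @ 60 FPS with a 100 MHz clock. */
int main(void) {
    const double clk_hz       = 100e6;                  /* FPGA clock */
    const double pixels_per_s = 640.0 * 480.0 * 60.0;   /* ~18.4 Mpixels/s */
    printf("pixels/sec:   %.0f\n", pixels_per_s);
    printf("clocks/pixel: %.2f\n", clk_hz / pixels_per_s);  /* ~5.4 */
    return 0;
}
```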

Suppose hardware rasterization is supported.  The CPU + ALUs compute the next set of fragments while the rasterizer writes pixels to RAM.  That would barely be fast enough to handle two triangles covering the whole screen.  Of course, you could have more triangles covering the screen, or several triangles overlapping in a confined space.  But what if you have a 3D world with many overlapping triangles?  More fragments can be created than there are pixels on the screen - and every one of them needs to be compared against the depth buffer.  Only a subset gets written to RAM, but the comparison itself takes time...
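To make the cost concrete, here is a minimal sketch of that per-fragment depth test in C (the buffer names and 16-bit formats are just assumptions for illustration, not the actual hardware design):

```c
#include <stdint.h>

#define WIDTH  640
#define HEIGHT 480

static uint16_t depth_buf[WIDTH * HEIGHT];  /* smaller value = closer */
static uint16_t frame_buf[WIDTH * HEIGHT];

/* Every fragment pays for a depth-buffer read and compare,
 * even if it loses the test and is never written to the framebuffer. */
void write_fragment(int x, int y, uint16_t depth, uint16_t color) {
    int i = y * WIDTH + x;
    if (depth < depth_buf[i]) {   /* passes the depth test */
        depth_buf[i] = depth;
        frame_buf[i] = color;
    }
}
```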

There are a few options:

  1. Decrease framerate
  2. Decrease resolution
  3. Increase clock speed
    1. Probably requires new, faster external RAM (hard to add to my board)
  4. Highly parallel comparing and writing
    1. Potentially requires more memory to be used in parallel
  5. Increase clock speed, but increase memory bandwidth by adding external RAM
    1. Certainly requires more memory

I can add more external RAM to the Mercury board, but it may end up being slower or more expensive, because it would have to go through a 5V level shifter built into the board.

I am not interested in rolling my own FPGA board because I lack the production capabilities.  I could make a carrier board for the Mercury 2 though, and I probably will.

Discussions

agp.cooper wrote 10/12/2020 at 02:39

Perhaps I was not that clear:

* Sort the triangles by depth (yes, closest first).

* Cull the far ones.

* Plot what you can in the given time.

Okay, really it should have said: if you run over time, skip the frame and adjust the cull size for next time.
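In rough C, that loop looks something like this (a sketch only; the triangle type and the draw_triangle cost accounting are made up for illustration):

```c
#include <stdlib.h>

typedef struct { float depth; /* ... vertex data ... */ } Triangle;

/* stub: pretend each triangle costs a fixed number of cycles to plot */
static long draw_triangle(const Triangle *t) { (void)t; return 1000; }

static int cmp_depth(const void *a, const void *b) {
    float da = ((const Triangle *)a)->depth, db = ((const Triangle *)b)->depth;
    return (da > db) - (da < db);                 /* closest first */
}

void render_frame(Triangle *tris, int n, float cull_dist, long budget) {
    qsort(tris, n, sizeof *tris, cmp_depth);      /* sort by depth */
    long used = 0;
    for (int i = 0; i < n && used < budget; i++) {
        if (tris[i].depth > cull_dist) break;     /* cull the far ones */
        used += draw_triangle(&tris[i]);          /* plot what fits in the time */
    }
    /* ran out of budget? the rest of the frame is simply skipped */
}
```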

I will bet you did not notice that a frame was skipped! "C'mon, man," it was only 1/60 of a second ago (joke)!

The message here is that the concepts start off quite simple but get much more complicated with optimisation.

The two basic ways for graphics rendering are:
1) Use a Z-buffer and unsorted tiles/triangles. The Z-buffer is only overwritten if the Z is less.

2) The other (as I described) is the painter's method.

The trick is to avoid processing what is not visible.

For example, I prefer tiles to triangles as you only have half as many to process.

Another cheap optimisation is not to process the back of a tile/triangle.
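In its simplest 2D form that test is just a signed-area check, something like this sketch (assuming front faces wind counter-clockwise in screen space):

```c
typedef struct { float x, y; } Vec2;

/* Backface test: the 2D signed area tells you which way the triangle winds. */
int is_front_facing(Vec2 a, Vec2 b, Vec2 c) {
    float area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return area2 > 0.0f;   /* back-facing (or degenerate) triangles can be skipped */
}
```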

The list goes on.

---
With my code, all the calculations (constants) that are needed for transformation from image space to world space, from world space to local space, and the inverse transformation (back to image space) are pre-calculated and/or stored in the tile/triangle structure before 3D processing.
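Roughly this kind of structure, as a sketch (the field names here are invented for illustration, not the actual code):

```c
/* Per-tile/triangle record carrying pre-computed transform constants,
 * so the per-pixel loop does no redundant setup work. */
typedef struct {
    float to_world[3][3];   /* image space -> world space */
    float to_local[3][3];   /* world space -> local (texture) space */
    float to_image[3][3];   /* inverse: back to image space */
    float depth;            /* used for sorting and culling */
    /* ... vertices, texture pointer, etc. ... */
} Tile;
```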

Anyway, perhaps I am wandering off in the wrong direction?

Alan


agp.cooper wrote 10/11/2020 at 15:51

Okay, I will bite!

You are basically talking software graphics for the most part.
When I do this I have a working buffer in memory and process that until time runs out.
Yes, not everything gets plotted, but if the tiles/triangles are sorted by depth, in most cases that does not matter.

Then I bit-blit the working buffer to the display buffer.

So start with OpCodes to support BitBlit. BitBlit is basically a memcpy(...).
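In software terms that is nothing more than this kind of sketch (buffer layout and names assumed for illustration):

```c
#include <string.h>
#include <stdint.h>

#define WIDTH  640
#define HEIGHT 480

/* BitBlit as a plain memcpy: copy the finished working buffer
 * into the buffer the display hardware scans out. */
void bitblit(uint16_t *display, const uint16_t *working) {
    memcpy(display, working, (size_t)WIDTH * HEIGHT * sizeof *display);
}
```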

Processing the buffer means mapping the image/texture/colour into your 3D space. For true 3D you need three divisions per pixel. In the 90s, when I played with writing my own crappy software graphics, this was not practical. So I just did a linear approximation.
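The trade-off looks something like this sketch of a single scanline span (illustrative names; perspective-correct mapping divides per pixel, the linear approximation does not):

```c
/* Interpolate texture coordinates across a span of `len` pixels. */
void map_span(int len, float u0w, float v0w, float w0,
                       float u1w, float v1w, float w1,
              float *u_out, float *v_out, int perspective) {
    for (int i = 0; i < len; i++) {
        float t  = (len > 1) ? (float)i / (float)(len - 1) : 0.0f;
        float uw = u0w + t * (u1w - u0w);   /* linear in screen space */
        float vw = v0w + t * (v1w - v0w);
        float w  = w0  + t * (w1  - w0);
        if (perspective) {                  /* true 3D: divides every pixel */
            u_out[i] = uw / w;
            v_out[i] = vw / w;
        } else {                            /* linear approximation: no divide */
            u_out[i] = uw;
            v_out[i] = vw;
        }
    }
}
```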

Sure, it looks weird up close, so don't look up close!

Bullfrog's Magic Carpet (video game) basically did the same. 

If you need real 3D mapping then add hardware multiplication and division to your must-have OpCode list.

Not sure if the above helps.

I recently wrote a simple X11 graphics program that does the above. Nothing flash, just floating above water in rain/fog.

Alan


Dylan Brophy wrote 10/11/2020 at 19:08

Thanks Alan!  I will keep this in mind.  I typically assume the triangles come in unsorted, or have the possibility of intersecting.  I eventually want to drive this from a Teensy 4, which has plenty of processing power to sort the triangles for me.  In that case rasterization would be much simpler.

I don't think I need real 3D mapping, but if I end up wanting or needing it, I do have hardware floating point multiply/divide.  The division is an estimation though, so I'd worry about errors.  Similar to the Quake III hack.
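For context, the usual trick behind that kind of estimated divide is to refine a cheap first guess with a Newton-Raphson step, roughly doubling the number of correct bits each time. A sketch of the general technique, not the C70100's actual divider:

```c
/* Refine an approximate reciprocal of d: x' = x * (2 - d * x). */
float refine_recip(float d, float est) {
    est = est * (2.0f - d * est);   /* one refinement step */
    est = est * (2.0f - d * est);   /* a second step for more precision */
    return est;
}
```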

Wait, so you process the nearest triangles first, then the farther ones, so that at least the nearest ones render, and then the background is rendered if there's time left?
