• Rendering the World (Part 2)

    Dylan Brophy05/24/2024 at 14:27 2 comments

    I changed the screen resolution to 400x300 for now, but this significantly impacts the framerate.  This is fine, as I'm just running some tests for now.  First, I needed to fix the texture rendering.  The math I was using before was very monkey, and needed to be changed.  There is some code used to check if a given pixel is inside each rendered quad, and this generates some coefficients.  These coefficients apply to the edges, not vertices, however.  I found that it wasn't hard to do some linear transformations to extract the texture coordinates, and actually the new math is faster than the original code.  I also made grass have a different texture for the top; and now we have this rendering:
    This is much better.  The quads in this image are diamond shaped and rotated, but the textures are almost perfect!  The shape of the terrain isn't too hard to make out, and with a more interesting scene, it could probably look a lot cooler.  We do see nearby textures bleeding into the dirt texture though.  To fix this, we can add a simple if block to prevent the texture coordinates from going out of bounds.  Certainly, this can't affect our render time much.  Certainly... right?

    At this point I was doing some profiling to see what's eating up my CPU time.  The first thing I wanted to know was the overhead of the map floodfill algorithm; turns out it's really tiny.  I knew it wouldn't be more than half, but I had no idea it would be this small:
    Processed 1035 blocks of 16384 total blocks.
    Total time: 43.820000ms
    Render time: 43.351000ms
    Time taken for 100 frames: 4412ms
    Time per frame: 44120.000us
    FPS: 22.7

    Less than half a millisecond - ore just more than 1% of the render time.  About 98.5% of the render time is spent on the 3D rendering code, and of that probably about 99% is spent on computing and drawing individual pixels - not even matrix multiplication or any actual 3D projection.  Ok, so let's see how much time we add by just adding some if blocks to check the texture coordinates:

    // Almost all textures tile well.
    if (kx > 1) kx -= 1;
    else if (kx < 0) kx += 1;
    if (ky > 1) ky -= 1;
    else if (ky < 0) ky += 1
    Processed 1035 blocks of 16384 total blocks.
    Total time: 46.214000ms
    Render time: 45.750000ms
    Time taken for 100 frames: 4652ms
    Time per frame: 46520.000us
    FPS: 21.5

     That's a big increase, considering its a simple if block.  The reason is that the fragment code does an unbelievable number of iterations, so that if block is probably running millions of time per frame.  All together the if block adds about 2.39ms, or 5%.  That's a lot when you need less than 16ms for 60FPS.  If a simple if block like that adds that much time, then what about the other if blocks in the fragment code?

    inline void fragmentShaderRaw(DisplayBuffer* display, int x, int y, uint8_t z, Color textureColor) {
      uint32_t displayIndex = display->width * y + x;
    
      uint8_t* depthLoc = &(display->depthArray[displayIndex]);
      if (*depthLoc < z)
        // Discard the fragment
        return;
    
      if (textureColor >> 4 == 0)
        // The fragment has no color; discard.
        return;
    
      Color fragColor = textureColor;
    
      // Apply this fragment to the framebuffer
      Color* outColor = &display->colorArray[displayIndex];
      if (fragColor >> 4 == 15)
        *outColor = fragColor;
      else {
        *outColor |= fragColor;
        *outColor &= 0x7;
      }
      *depthLoc = z;
    }
    So now, looking at this simple and rather streamlined function, we can see that in reality it takes a lot of time.  Most of these checks cannot be avoided.  However, I was able to optimize a lot of my quad drawing function, which did a lot of the same math many times in a tight loop.  I found that each multiplication I removed saved me about half a millisecond!  Here is my timing now, under the same test:
    This is much better, especially for the higher resolution of 300x400.

    It is worth noting that, for these tests, the processor...
    Read more »

  • Rendering the World (Part 1)

    Dylan Brophy05/23/2024 at 21:52 0 comments

    Here's where I'm at now with the rendering:

    I can render some of the terrain's shape, but the texture coordinates are not handled properly, and I can currently only render one chunk.  This is the first part for getting the rendering to do what I want, after this I need to:
    1. Optimize the code more and fix the texture bugs, and ensure it works for more scenarios (Part 2)
    2. Expand it to handle multiple 16x16x64 chunks (Part 3)

    Block Iteration Speed - How to Render Voxels Fast

    I don't have screenshots, but originally my render for a whole chunk took over a second - so entirely unplayable.  This was for two reasons:

    • My naive loop that iterates over every block in the chunk, checks if it is covered by other blocks, and renders if not
    • Rendering the sides of chunks

    It was easy to make the chunk sides not render.  We don't want this because the player should never see the sides anyway - they would simply enter a new chunk.  I think this was the bulk of it, because when I switch to the loop again, it now only takes 82ms to render.  That's a 90%+ speed boost.

    The harder part was eliminating the loop.  How do you render the blocks without checking each one?  Well, all the blocks that need to be rendered have two things in common.  First, they all occur within the camera's view.  Second, they all touch the same connected region of air.  By using something akin to a floodfill algorithm, we can find only blocks visible to the player, and ignore about 90% of the blocks in the chunk - never even iterating over them.

    A floodfill algorithm has expensive overhead, especially when considering the need to check the camera's viewport, which involves two matrix multiplications for every block.  However, to iterate over every block in the chunk, it is 16x16x64 iterations, or 16384 iterations.  For many blocks that aren't visible, this may cause rendering, and even for non-rendered blocks, it involves a series of checks for rendering.  This is especially true of caves.  So, it is much better never to iterate over those blocks at all.  The overhead for the floodfill algorithm, as it turns out, is well worth it.

    My floodfill algorithm processed only 1140 blocks, or 7% of the 16384 blocks in the chunk.  It isn't perfect and probably has a bug or two, but still this is very pleasing.

    The algorithm is a modified Breadth-First-Search.  Here's the code:

    // This essentially checks if the block is in the camera's view
    bool isValidNode(int x, int y, int z) {
      Vector3 pos = { x, y, z };
    
      applyMat4fToVertex(&pos, &(settings.worldMatrix));
    
      pos.z += 0.5f;
      if (pos.z > 0 && pos.z < 0.1f)
        pos.z = 0.1f;
    
      float k = 1 / pos.z;
      pos.x = (pos.x + (pos.x < 0 ? 0.5f : -0.5f)) * k;
      pos.y = (pos.y + (pos.y < 0 ? 0.5f : -0.5f)) * k;
    
      applyMat4fToVertex(&pos, &settings.projectionMatrix);
    
      return pos.z >= 0 && pos.z < 260 && pos.x >= 0 && pos.y >= 0 && pos.x < width && pos.y < height;
    }
    
    void searchRender(uint8_t blocks[16][16][64], uint8_t camx, uint8_t camy, uint8_t camz) {
      // This function does a breadth-first search, where the graph's nodes are every block within the camera's viewport.
      // x, y, and z are the coordinates to start the search at.
    
      Vector3iQueue queue(128);
      uint16_t searchedBlocks[16][64];
      bzero(searchedBlocks, 16*64*2);
    
      queue.append(camx, camy, camz);
    
      int blocksProcessed = 0;
      uint8_t x, y, z;
      while (queue.get(x, y, z)) {
        // Fill it.
    
        for (uint8_t q = 1; q < 64; q*=2) {
          uint8_t i = x + ((q >> 0) & 1) - ((q >> 3) & 1);
          uint8_t j = y + ((q >> 1) & 1) - ((q >> 4) & 1);
          uint8_t k = z + ((q >> 2) & 1) - ((q >> 5) & 1);
    
          // Skip blocks outside this chunk
          if (i & 0xF0 || j & 0xF0 || k & 0x80)
            continue;
    
          // Mark the block as checked
          if ((searchedBlocks[j][k] >> i) & 1)
            continue;
          searchedBlocks[j][k] |= 1 << i;
          blocksProcessed++;
    
          // Check if the block is in view:
          if (!isValidNode(i, j, k))
            continue;
    
          // Should we render it, or fill it?
          uint8_t block = blocks[i][j][k];
    ...
    Read more »

  • Procedural World Gen with Plate Tectonics

    Dylan Brophy05/22/2024 at 13:30 0 comments

    I have the rendering pipeline mostly working, but I need something to actually render.  At some point world gen will be required, so I thought I should just implement it now.

    Most world gen uses perlin noise to create the heightmap, which is a critical feature of the terrain - perhaps the most important feature.  The problem, is that it isn't very realistic on a larger scale.  Here is a map of a rather large Minecraft world, generated with perlin noise like this, and it does not have realistic continents, islands, biomes, etc:
    In real life, there are island chains, separated continents, deep trenches under the oceans, mountain chains, etc.  All of these features are created by plate tectonics.  But how can be simulate plate tectonics in a procedural generation algorithm?

    Plate Tectonics in an Infinite Procedurally Generated World
    When I started working on this world generator, I started by thinking that maybe I could generate a list of nearby points to a chunk, and make faultlines as lines between them, maybe modified with some noise.  The issue, is that with a procedurally generated and infinite world, keeping track of and searching these points is not really feasible, and as far as I know, there is not an algorithm to compute the nearby points based on arbitrary given coordinates.  This is important, because when we generate the world, we generate it one chunk at a time, where a chunk is a 16x16x64 tower of blocks that forms one section of the world.  We cannot generate the entire world at once, because the world may be too large, even infinite.  In essence, it isn't feasible to generate tectonic plates themselves.  Thankfully, there is a great workaround.

    A procedural world generator like this is, in essence, a function that describes the characteristics of the world at point x, y given a seed value.  So, we need a function that describes the tectonic plate and/or nearby plates at the coordinate x, y.  And we must do this without having a list of tectonic plates to pull from, as the number of tectonic plates is infinite.

    Bent Space
    Instead of generating the tectonic plates, it works far better to start with a square grid of tectonic plates, then project the x and y coordinates onto this grid using a bunch of cool math.  Here is the same world as the one I showed in the first image, but without coordinate transformation.  Note that faultines are still simulated:As you can see, it is very blocky.  All I did to get from this to the world you saw in the first image, was apply changes to the x and y coordinates.  This makes the blocky grid disappear, and allows us to bypass that tectonic plate issue.

    Steps of Coordinate Alteration
    The sequence of steps I apply is as follows:

    1. Apply a sine function to the coordinates, with a changing frequency
    2. Perlin noise is added to the coordinates.  This seems to produce more realistic terrain than adding perlin noise to the heightmap.
    3. Some perlin noise is added again, but with different parameters and math
    4. Math is done to distort the corners of the grid.  This causes the corners to smooth and disappear into the terrain, as well as creates more interesting terrain features.
    5. Sometimes the corner of a square of the grid is removed, and sometimes the entire square is divided and given to the nearby squares.  This is chosen and controlled by value noise, and makes the features even less blocky.  At this point the grid pattern we started with is nearly imperceptible.

    For each of these features, a world config controls their strength, and the strength is then modulated with more perlin noise.  Each layer of perlin noise should have a different seed, although looking at my code, it seems I didn't do this.  Anyway, different regions of the world should have different coordinate-altering characteristics, leading to differences in the way the terrain looks and feels in different regions.

    This coordinate...

    Read more »

  • Teensy 3D Tests

    Dylan Brophy05/19/2024 at 17:54 0 comments

    Framebuffer produced by a test on the Teensy 4.1. Split image for raw hex data vs rendering.

    Background

    I found out about the Teensy 4.0 shortly after it came out, and it quickly became my favorite MCU due to the wide range of capabilities, incredible horsepower, and low price.  I think that MCU marked a huge change for the Arduino platform, because it was such a leap forward in processing power - it opens up some interesting avenues in the Arduino environment that just weren't open before.  Since then I've hacked together a few interesting devices with the Teensy 4.1, but I haven't done much on these devices.  They haven't really done anything cool.  Part of the issue, is that I didn't build any great devices to run anything on.  Everything sorta had problems or was hacked together.  So, in the latest iteration of the #Arduino Desktop project, I designed a new platform that I can actually build some cool stuff on.  We also have a new #uPD7220 Retro Graphics Card (and VGA hack) which I designed, which goes with the new Teensy system.  With these in mind, I can contemplate a lot more interesting software to load to the Teensy: Namely, a 3D game.

    Is the Teensy fast enough?

    This is the first big question.  With speeds approaching old phone processors, which ran Minecraft Pocket Edition, it feels like there's a chance; but I imagine those phones had hardware to accelerate graphics.  We don't have that.  So, I've translated some of my Java LWJGL code into something arduino compatible, and ran some speed tests:
    Note: 720Mhz overclock for these tests
    As you can see, we can draw a quad in less than 0.25ms.  This includes a basic fragment shader, which handles basic transparency and blending, as well as an 8-bit depth buffer.  These numbers should be realistic enough, as almost every part of the render pipeline is accounted for - the only important omitted part is sending pixel data to the GPU.  That is a big concern though, as it could eat up a lot of CPU time that could be used for rendering.  Another thing to note, is that each quad drawn in the quad test is 30x30 pixels, which is really quite large.

    Considering the above numbers, we can draw about 65 30x30 pixel quads every frame, if we want to hit 60hz.  That's not much, but it's certainly something to work with.  Each block is drawn with 3 quads, but at least one quad is almost always very small, so we can call it 2 quads - this gives us about 32 blocks we can render each frame at 60hz.  I'm sure there's some optimizations or hacks I can do to improve this - it's a work in progress.  Nonetheless, this is certainly enough to run a (small) 3D game, no?

    Other math: I want a display with about 192000 pixels.  If about 2/3 of the screen is to be rendered on average, then this gives us 128k pixels to render every frame.  Each pixel is rendered as one or more fragments, so we have a minimum of 128k fragments to render.  Each fragment takes about 0.3us to render when including quad processing, so the minimum render time should be approximately 38.4ms, or 26 FPS.  That's actually not that bad, all things considered; I expected it to be much lower.  I mean, I wouldn't notice that FPS; I used to play Minecraft at 14 FPS when I was a kid :)

    Display Limitations, Color, and Loading Minecraft's old terrain.png File

    To make things extra fun, I'm going to run this with a uPD7220 GDC to handle the video output.  Due to limitations of graphics memory size and bandwidth, we only get 16 colors.  We can't use 24 bit color anyway, as the Teensy 4.1 would run out of memory.  Also, for textures and the framebuffer, the Teensy 4.1 has to track transparency along with the color.  Thus:
    • The Teensy uses an 8-bit format for textures and framebuffers - 4 bits of alpha and 1 bit for each color channel, including a brightness channel.
    • To load terrain.png, it must be converted to that 8-bit...
    Read more »