• ### Rendering the World (Part 2)

Dylan Brophy05/24/2024 at 14:27 2 comments

```Processed 1035 blocks of 16384 total blocks.
Total time: 43.820000ms
Render time: 43.351000ms
Time taken for 100 frames: 4412ms
Time per frame: 44120.000us
FPS: 22.7```

Less than half a millisecond - ore just more than 1% of the render time.  About 98.5% of the render time is spent on the 3D rendering code, and of that probably about 99% is spent on computing and drawing individual pixels - not even matrix multiplication or any actual 3D projection.  Ok, so let's see how much time we add by just adding some if blocks to check the texture coordinates:

```// Almost all textures tile well.
if (kx > 1) kx -= 1;
else if (kx < 0) kx += 1;
if (ky > 1) ky -= 1;
else if (ky < 0) ky += 1; ```
```Processed 1035 blocks of 16384 total blocks.
Total time: 46.214000ms
Render time: 45.750000ms
Time taken for 100 frames: 4652ms
Time per frame: 46520.000us
FPS: 21.5```

That's a big increase, considering its a simple if block.  The reason is that the fragment code does an unbelievable number of iterations, so that if block is probably running millions of time per frame.  All together the if block adds about 2.39ms, or 5%.  That's a lot when you need less than 16ms for 60FPS.  If a simple if block like that adds that much time, then what about the other if blocks in the fragment code?

```inline void fragmentShaderRaw(DisplayBuffer* display, int x, int y, uint8_t z, Color textureColor) {
uint32_t displayIndex = display->width * y + x;

uint8_t* depthLoc = &(display->depthArray[displayIndex]);
if (*depthLoc < z)
return;

if (textureColor >> 4 == 0)
// The fragment has no color; discard.
return;

Color fragColor = textureColor;

// Apply this fragment to the framebuffer
Color* outColor = &display->colorArray[displayIndex];
if (fragColor >> 4 == 15)
*outColor = fragColor;
else {
*outColor |= fragColor;
*outColor &= 0x7;
}
*depthLoc = z;
}```

• ### Rendering the World (Part 1)

Dylan Brophy05/23/2024 at 21:52 0 comments

Here's where I'm at now with the rendering:

I can render some of the terrain's shape, but the texture coordinates are not handled properly, and I can currently only render one chunk.  This is the first part for getting the rendering to do what I want, after this I need to:
1. Optimize the code more and fix the texture bugs, and ensure it works for more scenarios (Part 2)
2. Expand it to handle multiple 16x16x64 chunks (Part 3)

Block Iteration Speed - How to Render Voxels Fast

I don't have screenshots, but originally my render for a whole chunk took over a second - so entirely unplayable.  This was for two reasons:

• My naive loop that iterates over every block in the chunk, checks if it is covered by other blocks, and renders if not
• Rendering the sides of chunks

It was easy to make the chunk sides not render.  We don't want this because the player should never see the sides anyway - they would simply enter a new chunk.  I think this was the bulk of it, because when I switch to the loop again, it now only takes 82ms to render.  That's a 90%+ speed boost.

The harder part was eliminating the loop.  How do you render the blocks without checking each one?  Well, all the blocks that need to be rendered have two things in common.  First, they all occur within the camera's view.  Second, they all touch the same connected region of air.  By using something akin to a floodfill algorithm, we can find only blocks visible to the player, and ignore about 90% of the blocks in the chunk - never even iterating over them.

A floodfill algorithm has expensive overhead, especially when considering the need to check the camera's viewport, which involves two matrix multiplications for every block.  However, to iterate over every block in the chunk, it is 16x16x64 iterations, or 16384 iterations.  For many blocks that aren't visible, this may cause rendering, and even for non-rendered blocks, it involves a series of checks for rendering.  This is especially true of caves.  So, it is much better never to iterate over those blocks at all.  The overhead for the floodfill algorithm, as it turns out, is well worth it.

My floodfill algorithm processed only 1140 blocks, or 7% of the 16384 blocks in the chunk.  It isn't perfect and probably has a bug or two, but still this is very pleasing.

The algorithm is a modified Breadth-First-Search.  Here's the code:

```// This essentially checks if the block is in the camera's view
bool isValidNode(int x, int y, int z) {
Vector3 pos = { x, y, z };

applyMat4fToVertex(&pos, &(settings.worldMatrix));

pos.z += 0.5f;
if (pos.z > 0 && pos.z < 0.1f)
pos.z = 0.1f;

float k = 1 / pos.z;
pos.x = (pos.x + (pos.x < 0 ? 0.5f : -0.5f)) * k;
pos.y = (pos.y + (pos.y < 0 ? 0.5f : -0.5f)) * k;

applyMat4fToVertex(&pos, &settings.projectionMatrix);

return pos.z >= 0 && pos.z < 260 && pos.x >= 0 && pos.y >= 0 && pos.x < width && pos.y < height;
}

void searchRender(uint8_t blocks[16][16][64], uint8_t camx, uint8_t camy, uint8_t camz) {
// This function does a breadth-first search, where the graph's nodes are every block within the camera's viewport.
// x, y, and z are the coordinates to start the search at.

Vector3iQueue queue(128);
uint16_t searchedBlocks[16][64];
bzero(searchedBlocks, 16*64*2);

queue.append(camx, camy, camz);

int blocksProcessed = 0;
uint8_t x, y, z;
while (queue.get(x, y, z)) {
// Fill it.

for (uint8_t q = 1; q < 64; q*=2) {
uint8_t i = x + ((q >> 0) & 1) - ((q >> 3) & 1);
uint8_t j = y + ((q >> 1) & 1) - ((q >> 4) & 1);
uint8_t k = z + ((q >> 2) & 1) - ((q >> 5) & 1);

// Skip blocks outside this chunk
if (i & 0xF0 || j & 0xF0 || k & 0x80)
continue;

// Mark the block as checked
if ((searchedBlocks[j][k] >> i) & 1)
continue;
searchedBlocks[j][k] |= 1 << i;
blocksProcessed++;

// Check if the block is in view:
if (!isValidNode(i, j, k))
continue;

// Should we render it, or fill it?
uint8_t block = blocks[i][j][k];
...```

• ### Procedural World Gen with Plate Tectonics

Dylan Brophy05/22/2024 at 13:30 0 comments

I have the rendering pipeline mostly working, but I need something to actually render.  At some point world gen will be required, so I thought I should just implement it now.

Most world gen uses perlin noise to create the heightmap, which is a critical feature of the terrain - perhaps the most important feature.  The problem, is that it isn't very realistic on a larger scale.  Here is a map of a rather large Minecraft world, generated with perlin noise like this, and it does not have realistic continents, islands, biomes, etc:
In real life, there are island chains, separated continents, deep trenches under the oceans, mountain chains, etc.  All of these features are created by plate tectonics.  But how can be simulate plate tectonics in a procedural generation algorithm?

Plate Tectonics in an Infinite Procedurally Generated World
When I started working on this world generator, I started by thinking that maybe I could generate a list of nearby points to a chunk, and make faultlines as lines between them, maybe modified with some noise.  The issue, is that with a procedurally generated and infinite world, keeping track of and searching these points is not really feasible, and as far as I know, there is not an algorithm to compute the nearby points based on arbitrary given coordinates.  This is important, because when we generate the world, we generate it one chunk at a time, where a chunk is a 16x16x64 tower of blocks that forms one section of the world.  We cannot generate the entire world at once, because the world may be too large, even infinite.  In essence, it isn't feasible to generate tectonic plates themselves.  Thankfully, there is a great workaround.

A procedural world generator like this is, in essence, a function that describes the characteristics of the world at point x, y given a seed value.  So, we need a function that describes the tectonic plate and/or nearby plates at the coordinate x, y.  And we must do this without having a list of tectonic plates to pull from, as the number of tectonic plates is infinite.

Bent Space
Instead of generating the tectonic plates, it works far better to start with a square grid of tectonic plates, then project the x and y coordinates onto this grid using a bunch of cool math.  Here is the same world as the one I showed in the first image, but without coordinate transformation.  Note that faultines are still simulated:As you can see, it is very blocky.  All I did to get from this to the world you saw in the first image, was apply changes to the x and y coordinates.  This makes the blocky grid disappear, and allows us to bypass that tectonic plate issue.

Steps of Coordinate Alteration
The sequence of steps I apply is as follows:

1. Apply a sine function to the coordinates, with a changing frequency
2. Perlin noise is added to the coordinates.  This seems to produce more realistic terrain than adding perlin noise to the heightmap.
3. Some perlin noise is added again, but with different parameters and math
4. Math is done to distort the corners of the grid.  This causes the corners to smooth and disappear into the terrain, as well as creates more interesting terrain features.
5. Sometimes the corner of a square of the grid is removed, and sometimes the entire square is divided and given to the nearby squares.  This is chosen and controlled by value noise, and makes the features even less blocky.  At this point the grid pattern we started with is nearly imperceptible.

For each of these features, a world config controls their strength, and the strength is then modulated with more perlin noise.  Each layer of perlin noise should have a different seed, although looking at my code, it seems I didn't do this.  Anyway, different regions of the world should have different coordinate-altering characteristics, leading to differences in the way the terrain looks and feels in different regions.

This coordinate...

• ### Teensy 3D Tests

Dylan Brophy05/19/2024 at 17:54 0 comments

Background

I found out about the Teensy 4.0 shortly after it came out, and it quickly became my favorite MCU due to the wide range of capabilities, incredible horsepower, and low price.  I think that MCU marked a huge change for the Arduino platform, because it was such a leap forward in processing power - it opens up some interesting avenues in the Arduino environment that just weren't open before.  Since then I've hacked together a few interesting devices with the Teensy 4.1, but I haven't done much on these devices.  They haven't really done anything cool.  Part of the issue, is that I didn't build any great devices to run anything on.  Everything sorta had problems or was hacked together.  So, in the latest iteration of the #Arduino Desktop project, I designed a new platform that I can actually build some cool stuff on.  We also have a new #uPD7220 Retro Graphics Card (and VGA hack) which I designed, which goes with the new Teensy system.  With these in mind, I can contemplate a lot more interesting software to load to the Teensy: Namely, a 3D game.

Is the Teensy fast enough?

This is the first big question.  With speeds approaching old phone processors, which ran Minecraft Pocket Edition, it feels like there's a chance; but I imagine those phones had hardware to accelerate graphics.  We don't have that.  So, I've translated some of my Java LWJGL code into something arduino compatible, and ran some speed tests:As you can see, we can draw a quad in less than 0.25ms.  This includes a basic fragment shader, which handles basic transparency and blending, as well as an 8-bit depth buffer.  These numbers should be realistic enough, as almost every part of the render pipeline is accounted for - the only important omitted part is sending pixel data to the GPU.  That is a big concern though, as it could eat up a lot of CPU time that could be used for rendering.  Another thing to note, is that each quad drawn in the quad test is 30x30 pixels, which is really quite large.

Considering the above numbers, we can draw about 65 30x30 pixel quads every frame, if we want to hit 60hz.  That's not much, but it's certainly something to work with.  Each block is drawn with 3 quads, but at least one quad is almost always very small, so we can call it 2 quads - this gives us about 32 blocks we can render each frame at 60hz.  I'm sure there's some optimizations or hacks I can do to improve this - it's a work in progress.  Nonetheless, this is certainly enough to run a (small) 3D game, no?

Other math: I want a display with about 192000 pixels.  If about 2/3 of the screen is to be rendered on average, then this gives us 128k pixels to render every frame.  Each pixel is rendered as one or more fragments, so we have a minimum of 128k fragments to render.  Each fragment takes about 0.3us to render when including quad processing, so the minimum render time should be approximately 38.4ms, or 26 FPS.  That's actually not that bad, all things considered; I expected it to be much lower.  I mean, I wouldn't notice that FPS; I used to play Minecraft at 14 FPS when I was a kid :)