
GLXgears on a commodore 64


This is a somewhat practical use of a commodore 64, if only for historical purposes.  The purpose would be just to rediscover the programming model & see how fast it could run a modern task.  Programming a C64 is a kind of sadistic game in itself.  If you already wrote GLXGears for an arduino, you might as well do it for a C64.

Another reason is to have what lions couldn't afford 40 years ago: a native development environment if only for an emulator.

  • Complete model

    lion mclionhead, 02/21/2023 at 08:17

    Banged out all the gears for the reproduction of the original demo.  Multicolors are sadly not possible.  The 160x200 multicolor mode would be illegible.  Maybe the color map could be set based on bounding boxes from the gears, but the overlapping areas wouldn't work.  The best trick might be drawing 2 bitmaps, 1 bitmap containing just the big gear & the other bitmap containing just the small gears since they usually don't overlap.  It could toggle between bitmaps in a raster interrupt, but it would have to be a static display.

    The cursor keys rotate it manually, to prove it's doing the 3D transformation.

    It runs at around 2 frames every 3 seconds.  It burned 12kb.

    1 thing we couldn't do 40 years ago was make crosseyed stereo pairs.  There's not enough resolution to get much depth.  There was an intriguing possibility of printing crosseyed stereo pairs on a VIC 1525.

    Sadly, the magic of learning to program the commodore 64 40 years ago wasn't rediscovered.  To a modern lion, it's just another embedded system with its own goofy architecture.  There were other aspects like sprites, character sets, & sound but the mane thing young lions couldn't master was fast bitmap drawing.

  • The gear function

    lion mclionhead, 02/20/2023 at 20:49

    In order to make any more than a cube fit, the memory mapping gets gnarly.  A big deal in the old days was locating programs around the bitmaps.  The default output from ld65 starts the executable at $800.  The bitmaps go from $2000 to $7b40.  Then $8000 to $d000 is free.

    You can dump the memory in VICE by entering the monitor (alt-h).  Type 'm 0 ffff' to dump the entire memory.  You can make a kind of debugging trace by having the program store a value at a known address & printing that address with 'm 2000'.  Type 'r' to dump the registers.

    CC65 has ways of creating memory holes but they're quite involved & specific to C.  The easiest way in assembly was just reserving a hole & jumping past it:

        jmp mane
        .res $7700
    mane:

    Another way is moving the bitmaps higher.  There are all kinds of restrictions on where color memory & bitmap memory can be.  The mane one is a bitmap can't span 2 VIC banks.  The highest useful bitmap is $a000, so moving the bitmaps to the $5c00 to $c000 range left 20kb for the program.  Anything higher overlaps the I/O registers or kernal.  Cc65 automatically swaps out the BASIC ROM but it uses the kernal.
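
    The bank restriction can be checked on the host before committing to a memory map.  This is only a sketch in Python, assuming the usual VIC-II rule that a hires bitmap starts at offset $0000 or $2000 inside the 16kb bank the chip is pointed at.  The function names are made up for illustration, & it ignores the character ROM shadow & the I/O area which make some of the slots useless in practice.

```python
# Sketch of the VIC-II bitmap placement rule. All names here are hypothetical.
BANK_SIZE = 0x4000      # the VIC-II sees one 16 KB bank at a time
BITMAP_STEP = 0x2000    # a bitmap starts at offset $0000 or $2000 in the bank
BITMAP_BYTES = 8000     # 320x200 pixels at 1 bit per pixel


def valid_bitmap_bases():
    """Every address a hires bitmap could start at in the 64 KB space."""
    return [bank + offset
            for bank in range(0, 0x10000, BANK_SIZE)
            for offset in (0, BITMAP_STEP)]


def spans_two_banks(base):
    """True if an 8000 byte bitmap starting at base crosses a bank boundary."""
    last = base + BITMAP_BYTES - 1
    return base // BANK_SIZE != last // BANK_SIZE


if __name__ == "__main__":
    print([hex(b) for b in valid_bitmap_bases()])
```

    Any base from valid_bitmap_bases() keeps the bitmap inside 1 bank, which is why $2000 & $a000 are legal but an arbitrary address like $3000 is not.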

    After freeing up enough memory, the procedural gear was drawing at roughly 2 frames per second.

  • Math optimization

    lion mclionhead, 02/16/2023 at 10:37

    When trying to manually set rotation angles in the cube demo you have to calculate both cos & sin for the X rotation & again for the Y rotation. There were a few more optimizations to be had in eliminating division branches & combining the unsigned projection tables into 1 signed table.  To increase the chance of the gears having enough resolution, the trig tables were increased to 256 entries with a range from -127 to 127.

    The fastest way to draw a gear is going to be procedurally drawing 1 side of a gear with the Z rotation applied, then drawing 1 point on the other side to calculate a fixed offset between the 2 sides, then applying XY rotation with the existing code.  Add the fixed offset to the one side to create the other side.

    The cube demo can similarly be optimized by only computing 4 points & using those 4 points to compute fixed offsets to create the other 4 points.  Technically a 4x4 transformation matrix does the same thing as adding a fixed offset to every point, but it also has a scaling step which is slow.  
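
    The fixed offset trick can be sanity checked in floating point on the host.  This Python sketch is 1 reading of the idea, not the demo's code: it rotates 1 corner & its 3 neighbors, takes the 3 edge vectors between them, & builds the other 4 corners by addition.  It only holds before the perspective projection, & rotate_xy & cube_from_four are names invented for the example.

```python
import math


def rotate_xy(point, ax, ay):
    """Rotate a 3D point about the X axis by ax, then the Y axis by ay."""
    x, y, z = point
    y, z = (y * math.cos(ax) - z * math.sin(ax),
            y * math.sin(ax) + z * math.cos(ax))
    x, z = (x * math.cos(ay) + z * math.sin(ay),
            -x * math.sin(ay) + z * math.cos(ay))
    return (x, y, z)


def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def cube_from_four(size, ax, ay):
    """Rotate only 4 corners, then derive the other 4 from edge offsets."""
    s = size
    p0, p1, p2, p3 = (rotate_xy(p, ax, ay) for p in
                      [(-s, -s, -s), (s, -s, -s), (-s, s, -s), (-s, -s, s)])
    e2 = sub(p2, p0)    # rotated edge along the model's Y axis
    e3 = sub(p3, p0)    # rotated edge along the model's Z axis
    return [p0, p1, p2, p3,
            add(p1, e2),             # ( s,  s, -s)
            add(p1, e3),             # ( s, -s,  s)
            add(p2, e3),             # (-s,  s,  s)
            add(add(p1, e2), e3)]    # ( s,  s,  s)
```

    The 4 derived corners cost 3 additions each instead of a full set of multiplies, which is the whole point on a CPU with no multiply instruction.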

    The gears would be baked in polar coordinates at compile time, then rotated & converted to XY coordinates for each frame.  A key optimization would be precalculating the polar to XY conversions for the 9 circles in the model.  That would use 512 * 9 or 4608 bytes.  By knowing cos is just sin with a phase offset of 64, this can be reduced to 320 bytes per circle or 2880 bytes.
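
    The table sizes above can be reproduced with a small host-side generator.  This is a Python sketch under assumptions: 256 angle units per turn, 1 table per circle holding radius * sin, & 64 extra entries on the end so cos can be read from the same table.  The names are hypothetical & the real tables in the program may be laid out differently.

```python
import math

TABLE_SIZE = 256    # angle units per full turn
QUARTER = 64        # cos(a) is sin(a + 64) in these units


def make_circle_table(radius):
    """radius * sin(angle) for 256 angles plus a 64 entry tail for cos."""
    return [round(radius * math.sin(2 * math.pi * i / TABLE_SIZE))
            for i in range(TABLE_SIZE + QUARTER)]


def polar_to_xy(table, angle):
    """Convert an angle on this circle to XY with 2 table reads."""
    angle &= 0xFF
    return table[angle + QUARTER], table[angle]


if __name__ == "__main__":
    table = make_circle_table(127)
    print(len(table), "bytes per circle,", 9 * len(table), "bytes for 9 circles")
```

    The 64 entry tail trades 64 bytes per circle for not having to wrap the index when reading cos.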

    The thought occurred of how fast glxgears would run on an arduino if it used the same methods, but the point of that demo was manely to show the REGIS protocol drawing over a serial port.

    The original cube demo hard coded which coordinates to use for all the line drawing commands.  A more general gear routine needs to convert a batch of polar coordinates into 2D points.  Another routine needs to draw lines from the set of points.  The biggest gear contains 200 points.  

  • Cube demo using cc65

    lion mclionhead, 02/15/2023 at 07:06

    Ported the assembly language demo from https://retro64.altervista.org/blog/an-introduction-to-vector-based-graphics-the-commodore-64-rotating-simple-3d-objects/ to ca65.

    Basic line drawing & pixel drawing kicked off this port.  Thus ran the lion kingdom's 1st commodore 64 program in 35 years.  Strangely more satisfying to rediscover commodore 64 programming & assembly language optimization 35 years later than it is to do something productive.  It's the activity lions couldn't afford to do 35 years ago.

    It might have been easier to find out what assembler the cube demo used.  There are many wrinkles: the line routine only supports 8 bit X but the plot routine supports 16 bit X, local variables are big endian while the 6502 instructions are little endian, & there are unnecessary fetches instead of constants.

    Fetching from memory was really slow.  As many opcodes as possible should have hard coded literals.  There's a table of instructions & cycle times on https://the-dreams.de/aay64.txt

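    You can get the byte codes for the instructions by passing -l with a listing file name to ca65.

    0001E9r 1  B9 rr rr         lda ytablehigh_BMP0,y

    The listing shows the bytes the instruction occupies.  B9 is the opcode for lda absolute,Y & the rr rr are placeholders for the address of ytablehigh_BMP0, which isn't known until link time.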

    The original compiler obviously didn't support ifdef.  Many functions come from https://codebase64.org/ which has a lot of numerical recipes in assembly.  Even a lion who hasn't programmed the 6502 in 40 years still finds a lot of bugs & waste.  After all those struggles 40 years ago, the conversion from coordinates to bitmap offsets was actually very simple.
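
    For reference, the coordinate to bitmap offset conversion fits in a few lines.  This Python sketch assumes the standard hires layout of 8x8 cells, 8 bytes per cell & 40 cells per row.  make_row_tables is a hypothetical stand-in for the kind of per-row lookup that names like ytablehigh_BMP0 suggest, not a copy of the demo's tables.

```python
def bitmap_offset(x, y):
    """Byte offset & bit mask of pixel (x, y) in a 320x200 hires bitmap."""
    cell_row = y >> 3               # which row of 8x8 cells
    cell_col = x >> 3               # which cell in that row
    offset = cell_row * 320 + cell_col * 8 + (y & 7)
    mask = 0x80 >> (x & 7)          # leftmost pixel is the high bit
    return offset, mask


def make_row_tables(base):
    """Low & high byte tables of the start address of each pixel row."""
    starts = [base + (y >> 3) * 320 + (y & 7) for y in range(200)]
    return [s & 0xFF for s in starts], [s >> 8 for s in starts]


if __name__ == "__main__":
    print(bitmap_offset(319, 199))
```

    With the row tables precomputed, plotting a pixel only needs 2 table reads, the column offset & an OR with the mask.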

    The tricky bit is the math library.  Lions won't pretend to know what's going on there.  10 year old lion was 7 years away from being exposed to even the basic trig functions so it was nowhere close to happening in those days.

    The demo gains a lot of speed by only drawing & clearing a small part of the screen.  By the time the 1st cube demo was animating, it was clearly going to be super slow by the time it was 3 gears.

    The original cube demo after porting drew a small cube in the center.

    Managed to maximize the cube size & draw the X border.  There's no clipping support so different angles create different limits on the dimensions.  The coordinates are signed.  Unlike most compilers, ca65 can't automatically convert negative numbers to unsigned.  You have to write 256 - the number.
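
    The 256 - the number rule is just 8 bit two's complement.  A tiny Python sketch, with made up helper names, shows the conversion both ways for anyone generating signed tables on the host:

```python
def to_byte(value):
    """Encode a signed value in -128..127 as the unsigned byte the 6502 stores."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in a signed byte")
    return value & 0xFF             # same as 256 + value for negatives


def from_byte(byte):
    """Decode an unsigned byte back to its signed meaning."""
    return byte - 256 if byte >= 128 else byte


if __name__ == "__main__":
    print(to_byte(-5), from_byte(251))
```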

    Compiled the various iterations of the cube demo into a vijeo.  The only thing affecting the speed is the size of the area being cleared.  The clear operation is a clockcycle buster.  The line drawing & math is negligible in a polygon this size.  The optimized 3D drawing goes a lot faster than lions remember from 40 years ago, but at maximum size it isn't fast enough to believe the old programs didn't also use some aggressive optimizations.

    It might be fastest to redraw the cube with a line erase function, but it wouldn't be fastest with a gear polygon.

  • Simple assembly language program with cc65

    lion mclionhead, 02/13/2023 at 23:26

    Lions remember very little about C64 development.  load "$",8 loads a directory listing into the program space.  PRG files are programs.  SEQ files are data.  Assembly language programs tended to be stored in SEQ files.  There was 1 PRG file for starting.  load with ,8,1 was required for assembly language.  The mane wrinkle was the many addressing modes enabled by the X Y index registers.  You'd load 8 bit offsets into those & use LDA STA variants which add those to the address.  There are no 16 bit registers, which lions tend to confuse with the 68HC11 which had a 16 bit X Y & accumulator.  

    Assembly language arguments in ca65 are:

    $01 dereference a value in a hex address

    1234 dereference a value in a decimal address

    #$34 a literal in hex format

    #%01010101 a literal in binary 

    #123 a literal in decimal format

    < provides the low byte of a literal

    > provides the high byte of a literal

    Enclosing the argument in various parentheses  POINTER,X  (POINTER) (POINTER),Y (POINTER,X) invokes different addressing modes described in http://www.emulator101.com/6502-addressing-modes.html.

    POINTER,X  Add X to pointer to get POINTER2.  Dereference the value in POINTER2.  POINTER is a 16 bit  or 8 bit address.  There are different opcodes for the 16 bit & 8 bit variant.  The 8 bit variant wraps at 256 (zero page memory).  This works with X or Y.

    (POINTER) Read the address stored in POINTER to get POINTER2.  Jump to POINTER2.  POINTER is a 16 bit address.  This only works with JMP, the 6502's GOTO.

    (POINTER,X) Add X to POINTER to get POINTER2.  Read the address stored in POINTER2 to get POINTER3.  Dereference the value in POINTER3.  POINTER is limited to zero page memory.  This works with X only.

    (POINTER),Y Read the address stored in POINTER to get POINTER2.  Add Y to POINTER2 to get POINTER3.  Dereference the value in POINTER3.  POINTER is limited to zero page memory.  This works with Y only.

    Double & triple pointers were the key to accessing large amounts of memory with 8 bit registers.   They had special instructions for accessing the 1st 256 bytes of memory (zero page).  You'd ideally have all your variables in that space.  16 bit POINTER,X was the only indexing mode lions could understand 40 years ago.
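
    The addressing modes are easier to keep straight when modeled on a flat 64k array.  This Python sketch is a reading aid under the usual 6502 rules, not an emulator.  The function names are invented, mem is any 65536 entry bytearray, & the indirect JMP includes the well known page wrap quirk of the original 6502 family.

```python
def read16_zp(mem, zp):
    """Read a little endian pointer from zero page, wrapping inside the page."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def absolute_x(mem, pointer, x):
    """lda POINTER,X with a 16 bit POINTER."""
    return mem[(pointer + x) & 0xFFFF]


def zero_page_x(mem, pointer, x):
    """lda POINTER,X with an 8 bit POINTER. The sum wraps at 256."""
    return mem[(pointer + x) & 0xFF]


def indirect(mem, pointer):
    """jmp (POINTER). Returns the jump target."""
    lo = mem[pointer]
    # The high byte is fetched from the same page as the low byte.
    hi = mem[(pointer & 0xFF00) | ((pointer + 1) & 0x00FF)]
    return lo | (hi << 8)


def indexed_indirect(mem, pointer, x):
    """lda (POINTER,X). X picks which zero page pointer to follow."""
    return mem[read16_zp(mem, pointer + x)]


def indirect_indexed(mem, pointer, y):
    """lda (POINTER),Y. Follow the zero page pointer, then add Y."""
    return mem[(read16_zp(mem, pointer) + y) & 0xFFFF]
```

    (POINTER),Y is the one that matters for bitmaps: park the row address in zero page & let Y walk along the row.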

    A starting point is to call into the C library to print something.

    .autoimport     on              ; imports _cprintf, pushax
    .forceimport    __STARTUP__     ; imports STARTUP, INIT, ONCE
    .export         _main           ; expose mane to the C library
    .segment        "RODATA"
    _Text:                          ; PETSCII text to print
        .byte   $C8,$45,$4C,$4C,$4F,$20,$57,$4F,$52,$4C,$44,$21,$00


    .segment        "CODE"
    .proc   _main: near
        lda     #<(_Text)       ; low byte function argument
        ldx     #>(_Text)       ; high byte function argument
        jsr     pushax          ; put function arguments on stack
        ldy     #$02            ; size of function arguments (2 bytes)
        jsr     _cprintf        ; C library function
        rts                     ; return to the startup code instead of running off the end
    .endproc
    

    The trick with this is it uses PETSCII instead of ASCII.  
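
    The _Text bytes in the listing above can be regenerated on the host.  This Python sketch only covers the letter swap that reproduces those bytes: lowercase ASCII moves down to $41-$5A & uppercase ASCII moves up to $C1-$DA, while space, digits & common punctuation stay put.  It assumes the C64 is in its lowercase/uppercase character set & is not a full PETSCII table.  ascii_to_petscii is a made up name.

```python
def ascii_to_petscii(text):
    """Convert plain ASCII letters, digits & punctuation to PETSCII bytes."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if "a" <= ch <= "z":
            code -= 0x20        # $61-$7A becomes $41-$5A
        elif "A" <= ch <= "Z":
            code += 0x80        # $41-$5A becomes $C1-$DA
        out.append(code)
    return bytes(out)


if __name__ == "__main__":
    data = ascii_to_petscii("Hello world!")
    print(",".join("$%02X" % b for b in data))
```

    Appending a $00 terminator to the output gives the same .byte line as the listing.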

    The pushax function is a beast in libsrc/runtime/pushax.s

    There's a command to assemble it.

    ca65 -t c64 hello.s

     Then link it with the standard C library.

    ld65 -t c64 hello.o -o hello c64.lib

    The executable doesn't end in .prg but is a PRG file anyway.  It's an utterly gigantic 2108 bytes for what it does, probably because cprintf has to parse formatting codes.  There is a simpler _puts function.  

    	lda     #<(string)
    	ldx     #>(string)
    	jsr     _puts
    

    The trick with the C library is compiling C programs to figure out the function arguments.  Cc65 generates a .s file with the assembly language function calls.  The mane C function of note is cprintf.

    For fast development on the emulator, the journey begins by creating a disk image

    c1541 -format "disk,00" d64 disk.d64

     Store the program in the disk image.

    c1541 -attach...

  • Introducing cc65

    lion mclionhead, 02/11/2023 at 23:21

    CC65 compiles on Linux with just a simple make command.    bin/ca65 is the assembler.  As expected, it can't compile the 3D demo.

    The samples directory has some sample programs which compile with make.  'make disk' creates a .d64 image with all the programs, but requires vice & its c1541 tool to be installed.  The cbm directory has more samples which must be compiled with separate make commands.  They all compile to machine language.  The most impressive one might be the plasma demo.  We didn't have fullscreen animations like that in 1985.

    It has its own graphics library tgi.h which draws 2D polygons in the 320x200 monochrome mode.  The commodore 64 routines are in libsrc/c64/tgi/c64-hi.s.  The platform independent bits are in libsrc/tgi.  It's all assembly & contains examples of the assembler syntax for a library.  There are no examples of a standalone program in assembly language.

    As was conventional in the old days, there are no register names.  They write the numbers for all the addresses.  

    It sort of makes sense to do it in C & use the TGI library.  Some benchmarks should be done to compare cc65 with paw coded math routines.  The SETPIXEL routine seems to be a lot slower than the 3D one.  It may be the current fascination with assembly language is just the preference of gootubers to watch vijeos about assembly language.

    Helas, this style of development is not much of a trip to the past.  The commodore is now just another embedded system.  Moreover, it can never be more than an emulation of hardware lions can't afford.  

    The big question is how much optimization can be done before it's not really 3D graphics anymore.  There are 854 vertices in the glxgears model.  If the animation was baked into 8 bit XY coordinates to be fed directly to the line drawer, it would fit 38 frames in all 64k of RAM.  It possibly could be animated with only 15 frames or 25620 bytes.  Hiding the hidden lines could save a lot of memory.  The mane loss is the interactive rotation.
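
    The frame budget above is easy to recheck.  This Python sketch just redoes the arithmetic, assuming 1 byte each for X & Y per vertex & pretending all 64k is available for frames.  The constant & function names are hypothetical.

```python
VERTICES = 854              # vertex count quoted for the glxgears model
BYTES_PER_VERTEX = 2        # 8 bit X plus 8 bit Y
RAM_BYTES = 64 * 1024       # every byte of RAM, with nothing left for code


def frame_bytes():
    """Size of 1 baked frame of 2D coordinates."""
    return VERTICES * BYTES_PER_VERTEX


def frames_that_fit(ram=RAM_BYTES):
    """Whole frames that fit in the given amount of RAM."""
    return ram // frame_bytes()


def animation_bytes(frames):
    """RAM needed for a baked animation of the given length."""
    return frames * frame_bytes()


if __name__ == "__main__":
    print(frame_bytes(), frames_that_fit(), animation_bytes(15))
```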

    The arduino port was manely to show how graphics could be sent over the serial port.  The C64 port is manely to show how fast a C64 could draw a 3D graphics demo from 25 years ago.  It's not really serving the purpose if it's baking it all in 2D.

    It's never going to be a real 3D plotter anyway, but the mane differentiator from a 2D plotter is the need for interactive rotation.  So the optimization goes only as far as still allows interactive rotation.

  • Previous work

    lion mclionhead, 02/10/2023 at 23:23

    In terms of historic re-enactment, there can never be a return to manually entering a new program, line by line, only referring to the programmer's reference guide, as was done in 1985.  The post 1985 world is always going to be modifying existing internet examples.

    While there were general purpose 3D drawing libraries, lions don't believe they were optimized.  They had to use floating point & transformation matrices to be general purpose.

    The goog spits out many previous 3D graphics programs.  

    https://retro64.altervista.org/blog/an-introduction-to-vector-based-graphics-the-commodore-64-rotating-simple-3d-objects/

    This contains a complete assembly language listing which does a 3D transformation & line drawing using all integers.  

    The initialization is impressively simple compared to even a modern game engine.

    It uses its own optimized multiply & divide routines instead of calling BASIC.  Trig routines are lookup tables.  The problem is reduced to just replacing the cube coordinates with gear coordinates.

    Instead of using a transformation matrix, it rotates the cube model using high school trig formulas.  Most of the code is the model rotation in rotate_loop.  2 variants rotate only 1 axis.  1 variant rotates 2 axes.  It might need a more general way of rotating & translating the model so the different gear models can be drawn simultaneously.

    The 3D projection is done with a simple lookup table that scales XY based on Z.

    There is a hidden line removal technique which is specific to a cube.

    https://retro64.altervista.org/blog/another-look-at-3d-graphics-fast-hidden-faces-removal-back-face-culling/

    A more general hidden line removal technique is shown.

  • Tools

    lion mclionhead, 02/10/2023 at 07:50

    The mane tools are a cross compiler, a disk image creator, & an emulator.  It's purely intended for the 320x200 bitmap mode.

    The preferred toolchain is CC65.  That has an assembler & C compiler.  Apparently C is hopeless on the 6502.

    https://cc65.github.io/

    Disk images are created with c1541, which is part of the emulator.

    https://codebase64.org/doku.php?id=base:tools_for_putting_files_into_a_.d64_image

    The lion kingdom must confess to being more fascinated with commodore emulation 25 years ago than now, when the rest of the world is.  All the demos seem to cut off after 2000, so retro computing might just be the gootube algorithm's own fabrication.
