
NS 32000 cross-assembler in C

By Richard Rodman, and published in Dr. Dobb's Journal.

Salvage and restoration of 1986 code, from ancient scans to modern C source code.
Alas it is too buggy and messy, not worth the time to fix or rewrite.

I started this salvage project to produce a 32000-series cross-assembler that could be compiled to run on my Linux PC, and indeed any machine with a C compiler. Re-typing listings is very tedious, but one can use OCR to do much of the work. It is still tedious though.

I thought this would be a good way to get my Linux PC assembling code, instead of running some old MSDOS binaries.

However, it has proven far more work than expected. It has bugs, crashes, and doesn't support features that are pretty standard these days, such as conditional assembly. So I intend to get the gcc toolchain working for the 32000.

https://wiki.sensi.org/dokuwiki/doku.php?id=ns32ktoolchain

It is not trivial, but should only need doing once.

test_nop.s

A test file with a single nop. Assembles.

Assembler Source File - 23.00 bytes - 02/16/2024 at 00:45


test.s

A bigger test file. Crashes.

Assembler Source File - 9.40 kB - 02/16/2024 at 00:45


A32000.c

C source code for a 32000 series cross-assembler. Tediously recovered from OCR'd scans of a magazine. Recovered enough to compile, but it crashes. Needs a better coder than me to fix it!

text/x-csrc - 38.00 kB - 02/11/2024 at 17:09


A32000.h

Header file with prototypes of the functions used in A32000.c

text/x-chdr - 1.06 kB - 02/11/2024 at 17:09


A32000_C_source_code_scans.zip

Original listing scans, cropped for OCR.

Zip Archive - 1.04 MB - 02/11/2024 at 17:09


  • Original listing scans

    Keith, 02/11/2024 at 16:22 (0 comments)

  • Original article OCR

    Keith, 02/11/2024 at 16:14 (0 comments)

    Dr. Dobb’s Journal, December 1986, page 48.

    A table-driven assembler that can be modified for other processors.

    Series 32000 Cross-Assembler

    by Richard Rodman, 1923 Anderson Rd., Falls Church, VA 22043

    The 32000 processor features generalized addressing modes available in almost all instructions.

    The National Semiconductor series 32000 microprocessor line includes the 32-bit 32032 and the 16-bit 32016 (formerly called 16032) microprocessors. As part of a project to build a board using a 32032, I wrote an assembler in Software Toolworks’ C/80; adaptation to any other variant of C should be easy.

    Although most people lump the 68000 and the 32016 together, these processors are radically different. The differences have been summed up as "the 68000 is PDP-11-like, whereas the 32000 is VAX-like." The 32000 includes bit-field, translate, procedure enter/return, and other high-level instructions in its instruction set.

    Basic Program Design

    This program works in a brute-force fashion, but it is easy to understand, modify, and debug. Each instruction’s binary equivalent is stored in a string, with xs where operands need to be inserted. A string matcher, match(), matches the opcodes against lines in the source file, keeping matches to wildcards in the buffer ambig_buffer. Each opcode has an option character, opopt, associated with it that controls special-case logic for some instructions. The data is output in Intel absolute hex format. Table 1, page 49, shows the definitions for the opopt characters and the instruction table format. Table 2, page 49, shows some examples of instruction formats defined using the structure in Table 1.
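
    The article names Intel absolute hex as the output format, but this excerpt does not show how the records are written. As a rough sketch only (this is not Rodman's code; the function name hex_data_record and its buffer-based interface are assumptions made for illustration), one data record could be formatted like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format one Intel HEX data record (record type 00) into out.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if out is too small. hex_data_record is a hypothetical name. */
int hex_data_record(char *out, size_t outsz, unsigned addr,
                    const unsigned char *data, size_t len)
{
    assert(len <= 255);                 /* the byte count is a single byte */

    /* ':' + count + address + type + data + checksum + NUL */
    size_t need = 1 + 2 + 4 + 2 + 2 * len + 2 + 1;
    if (outsz < need)
        return -1;

    /* The checksum covers count, both address bytes, type (00) and data. */
    unsigned sum = (unsigned)len + ((addr >> 8) & 0xFF) + (addr & 0xFF);

    int n = snprintf(out, outsz, ":%02X%04X00", (unsigned)len, addr & 0xFFFF);
    for (size_t i = 0; i < len; i++) {
        n += snprintf(out + n, outsz - (size_t)n, "%02X", (unsigned)data[i]);
        sum += data[i];
    }

    /* Checksum is the two's complement of the low byte of the sum. */
    n += snprintf(out + n, outsz - (size_t)n, "%02X",
                  (0x100 - (sum & 0xFF)) & 0xFF);
    return n;
}
```

    For the bytes 02 33 7A at address 0030h this produces :0300300002337A1E. A complete file would end with the end-of-file record :00000001FF.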

    The 32000 processor, although allowing absolute addressing, features generalized addressing modes available in almost all instructions. Two’s-complement offsets can be used in three different sizes — 7, 14, or 30 bits long — as needed. Because these offsets could refer to areas not yet defined, and the length of the code varies with the offset, three passes are necessary. The first pass gets a coarse value of all symbols, the second pass then makes the variable offsets the right length and corrects the symbol values, and the third pass actually generates the code. After the first pass, the symbol table is sorted; then in the second and third passes, a binary search is used to find entries more quickly.
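
    The reason the offset length forces extra passes can be seen from a size-selection routine. The following is a minimal sketch, not taken from the listing: the name disp_bytes is hypothetical, and the limits are plain two's-complement bounds for 7, 14, and 30 bits (the real hardware limits for the largest size may be slightly narrower).

```c
/* Number of bytes needed to hold a signed offset, using the three
 * sizes the article lists (7, 14, or 30 bits of two's complement).
 * Returns 0 if the value does not fit in 30 bits.
 * disp_bytes is a hypothetical name, not from the original listing. */
int disp_bytes(long v)
{
    if (v >= -64L && v <= 63L)
        return 1;                       /* fits in 7 bits */
    if (v >= -8192L && v <= 8191L)
        return 2;                       /* fits in 14 bits */
    if (v >= -536870912L && v <= 536870911L)
        return 4;                       /* fits in 30 bits */
    return 0;                           /* out of range */
}
```

    An offset to a label that is not yet defined has no known value on the first pass, so its size can only be guessed. Once the second pass picks the real size, every label after that instruction may move, which is why symbol values are corrected before code is generated.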

    Assembler Syntax

    • Symbols — This assembler limits line length to 128 characters; symbols can be up to a whole line long. Labels must be followed by a colon and cannot be reused. The colon must be omitted on equates. Values assigned with equ can be redefined, however.
    • Pseudo-ops — org must be followed by a value. Although the 32000 does not require word alignment of code or data, it does make some operations faster, so an even pseudo-op is provided to force the code address to an even boundary.

      Define byte, word, double (db, dw, dd) must be only one value per line. Currently, character-string constants are not supported.

      Numeric constants must begin with a digit. Default radix is decimal, or the value can be followed with an h, q, or b for hexadecimal, octal, or binary, respectively. The code address is known as ".", and the assembly address (which may be different) as "..".

    • Opcodes — All 32000 opcodes are supported. The assembly instructions must conform to the NS16000 Instruction Set Reference Manual — for example, arguments to the SAVE instruction must be enclosed in square brackets. You can include multiple instructions on a line as long as all operands to each instruction are provided.
    • Comments — Comments begin with a semicolon (;) and continue to the end of the line. Some programmers have the bad habit of omitting the beginning * or ;. That won't work here.
    • Assembly-time arithmetic — Only "+", "-", and "/" are supported. A look at the listing shows...
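
    The numeric-constant rule in the syntax list (leading digit, decimal by default, a trailing h, q, or b to select hexadecimal, octal, or binary) can be sketched as below. This is an illustration written for this page, not the routine from the listing; the name parse_const is hypothetical, and accepting upper-case letters is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse a constant such as "42", "0ffh", "17q" or "101b".
 * Returns 1 and stores the value on success, 0 on a syntax error.
 * Overflow is not checked in this sketch. */
int parse_const(const char *s, unsigned long *out)
{
    size_t len = strlen(s);
    if (len == 0 || !isdigit((unsigned char)s[0]))
        return 0;                       /* must begin with a digit */

    int radix = 10;                     /* default radix is decimal */
    switch (tolower((unsigned char)s[len - 1])) {
    case 'h': radix = 16; len--; break;
    case 'q': radix = 8;  len--; break;
    case 'b': radix = 2;  len--; break;
    }

    unsigned long v = 0;
    for (size_t i = 0; i < len; i++) {
        int c = tolower((unsigned char)s[i]);
        int d = isdigit(c) ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : 99;                     /* not a digit in any radix */
        if (d >= radix)
            return 0;                   /* digit not valid in this radix */
        v = v * (unsigned long)radix + (unsigned long)d;
    }
    *out = v;
    return 1;
}
```

    The leading-digit rule is what lets a hexadecimal value such as 0ffh be told apart from a symbol named ffh.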

  • Original article scans

    Keith, 02/11/2024 at 16:13 (0 comments)

  • Project journal

    Keith, 02/11/2024 at 15:50 (0 comments)

    Tasks done

    • Downloaded djvu OCR text from archive.org. The original article is okay but the listing is heavily mangled.
    • Downloaded djvu listing scans.
    • Cropped off adverts
    • Cut into single columns
    • OCR with Google Docs. 
      Results are worse than djvu, because Google looks for non-ASCII chars like bullets, Unicode symbols, etc.
    • Edited OCR into a C file (very tedious)
    • Edited C file until it would compile
    • Ran compiled code. It crashes horribly, with segmentation faults and a core dump

    That is about a weekend's work.

    I don't have the time to debug the code right now, but I have uploaded it for other people who might be interested in doing so.

    The code itself is very old, in K&R instead of ANSI style. I took the liberty of updating it to ANSI style and standard include files.

    The code uses short variable names like 'o' and 'l' which people used to do when disk space was limited. The OCR confused them with '0' and '1', requiring many corrections. Modern style is to use longer and self-explanatory names, but I shan't change variable names for now.

    The code uses pointers a lot, which is not as readable as arrays. Rogue pointers are apt to crash programs, which is likely the cause of the current crashing.

    The code uses alloc(), which my compiler didn't recognise, so I replaced it with malloc(). I have never written programs that required malloc, so I don't know if that is a big mistake or not.
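
    For anyone making the same substitution, here is a minimal sketch of a checked wrapper. It assumes that the old alloc(n) simply meant "return a pointer to n bytes", which has not been verified against the C/80 documentation; the name alloc_or_die is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the C/80-era alloc() call: same
 * "give me n bytes" shape, but it stops the program with a message
 * instead of returning a null pointer that later code might use. */
void *alloc_or_die(size_t n)
{
    void *p = malloc(n ? n : 1);        /* malloc(0) may legally return NULL */
    if (p == NULL) {
        fprintf(stderr, "A32000: out of memory (%lu bytes)\n",
                (unsigned long)n);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

    A wrapper like this only rules out one failure mode (using a null pointer). It does nothing about writes that run past the end of a block that was allocated successfully.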

    2024-02-14

    Alan Cox gets the code to a respectable state where it can do something before crashing.

    I notice that it does not assemble the file you tell it to.
    You have to give it the filename without extension.
    It will then look for filename.s and create filename.hex

    It assembles test_nop.s with the command

    ./A32000 test_nop

    but

    ./A32000 test

    produces 

    >>---> Error u at icu
    >>---> Error u at zvers
    >>---> Error u at realst
    >>---> Error u at memszx
    >>---> Error u at mem1
    Segmentation fault (core dumped)
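
    The naming rule described above (base name on the command line, with .s and .hex derived from it) can be sketched with a bounded formatter. This is an illustration, not the code in A32000.c; make_name is a hypothetical helper.

```c
#include <stdio.h>

/* Build "<base><ext>" into out. Returns 0 on success, or -1 if the
 * result would not fit in outsz bytes (including the NUL).
 * make_name is a hypothetical helper, not from A32000.c. */
int make_name(char *out, size_t outsz, const char *base, const char *ext)
{
    int n = snprintf(out, outsz, "%s%s", base, ext);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

    Called with the base name test_nop and the extensions ".s" and ".hex", this gives test_nop.s and test_nop.hex. An over-long base name is reported as an error instead of overflowing the buffer.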

    To do:

    • Fix crashing
    • Create a source code test file that will exercise all opcodes and features
    • Verify correct operation by comparing with output of a trusted 32000 assembler



Discussions

EtchedPixels wrote 02/18/2024 at 09:54

There are a bunch of unchecked copies in there. I gave up after getting it part working but you could certainly get arbitrary code execution from suitably hostile inputs. Would be easier to rewrite than fix I think


Keith wrote 02/18/2024 at 10:51

Agreed. Thanks for your time having a look at it.


Ken Yap wrote 02/18/2024 at 07:59

I only found the site of the buffer overflow by using valgrind, as it didn't crash under gdb. The cause was harder to find, I had to use a variable watch in gdb to discover when the overwriting of nearby variables happened. But when I fixed that by placing bounds on a string copy, it faulted somewhere else.

In my opinion the code as it stands is a hot mess. There is duplication of standard C library routines. Many of the calls to string functions and operations should be rewritten to use bounded versions from the C library. The huge isopcode() routine should be split up. It seems from a comment somewhere it was written within the limitations of the C/80 compiler, probably something that ran under CP/M or DOS.

I tinker with it now and then to verify theories of mine, but have no desire to take it to the conclusion because I really have no hardware target for it. Good luck to the one who gets it fixed.

IMO if one is serious about 32k development, there's an ancient gcc around. The assembler in that uses a different syntax though. Another possibility might be to write a 32k target for asxxxx. Then you get to use macros and includes, generate object files and use a linker.


Keith wrote 02/18/2024 at 10:45

I reached that conclusion too. 


Isaac Wingfield wrote 02/16/2024 at 04:32

The NS32000 came out about the same time as the Motorola 68000 and soon after the iNtel 8086, IIRC. At the time, I was working for the company who had bought MITS, and was looking for a processor for a "next-generation" system. It was clear to us that the 32000 had far and away the best architecture, with the 68000 running a close second, while the 8086 was almost not even in the running. Predictably, management intruded and we wasted a bunch of time trying to get an 8086 system running. Management's decision was based on a claim by iNtel that the 8086 was "code compatible" with the 8080 the company was currently shipping. This was actually a lie; it was true ONLY if your code had been written in PL/M (which of course, they didn't bother to mention to the managers).


Ken Yap wrote 02/16/2024 at 03:45

After A32000.c line 1645 add this debugging code:

       if (inpptr < 0 || inpptr >= 256)
               fprintf(stderr, "%s:%d: Buffer overrun, inpptr = %d\n", __FILE__, __LINE__, inpptr);

and you'll see it's a plain old buffer overrun as the first line in stderr shows:

A32000.c:1647: Buffer overrun, inpptr = 11274

However as inpptr jumps suddenly to 11274, it probably means that inpptr and inpcnt got overwritten by inpbuf just before them.

I'll leave it with you for a while, I need sleep.

