
"The Linker does just that!"

A project log for Z80 Reverse-Engineering And Hacking Adventures

What else can I say?

Eric Hertz 09/04/2022 at 17:43

Back in the early 2000's I had a project which went quite smoothly until, suddenly, one piece of it eluded the heck out of me for months, and that bit haunts me to this day.

Prior to that project, I'd been doing stuff with AVRs for years, fancied myself pretty good at it. This new project was just a minor extension from there, right? An ARM7, the only difference, really, was the external FLASH/RAM, right? No Big Deal. Right?

So I designed the whole PCB, 4 layers. Two static RAM chips, FLASH, a high-speed DAC and an ADC... even USB via one of those stupid flat chips with no pins. I'd basically done all this with barely the semblance of a prior prototype... The ARM7 evaluation-board was used for little more than a quick tutorial and then for its onboard JTAG dongle for flash programming.

Aside from one stupid oversight--wherein I used tiny vias coupled to the power-planes through thermal-reliefs, without realizing I needed to dig deeper into the via options to account for the "trace" size used to create the copperless spacing, and wound-up cutting the connection where the traces' round ends overlapped--aside from that (mind you, at the time the cheap PCB fabs offered 4-layer 4in x 4in boards for $60 apiece, if you bought three, as I recall, and free shipping was not a thing)...

So, aside from that $200 mistake, the board worked perfectly. I even figured out how to solder that stupid USB chip without a reflow oven.

Even my wacky idea about a strange use of DMA (which I'd never messed with, prior), enabling the system to sample both the DAC and ADC simultaneously at breakneck speeds and precision timing... even that all worked without a hitch. As-planned.

Amazing!

What *didn't* work, then?

The friggin FLASH chip was, of course, slowing the system. I hadn't considered instruction cache (who, coming from AVRs, would?), and, frankly, I don't even recall its mention in any of the devkit's docs, aside from maybe a bullet-point in the ARM's datasheet.

As far as I could tell, this thing was running every single instruction from the ROM, just like an AVR would... But, that meant accessing the *external* chip with a multiple-clock-cycle process (load address, strobe read, wait for data, unstrobe read), made worse by the number of wait-states necessary for the slow read-access of the Flash.

So, every single instruction took easily a half-dozen clock cycles, rendering my [was it 66MHz?] ARM *far slower* than the 16MHz AVRs I was used to.

.

Whew, I wasn't planning on this becoming a long story.

Long-story-short, I needed to move my code to RAM, and even after several months near full-time I never did figure it out. And, now, (well, a few weeks ago) I ran into the same problem again with this project.

This time it's not about speed, it's about the "ihexFlasher", which allows me to reprogram the firmware in the [now] Flash-ROM in-system. (Pretty sure I explained it in a previous log). Basically: set a jumper to boot into the ihexFlasher, upload an ihex file via serial, change the jumper back, reset, and you're in the new firmware.

Problem is, the flash-chip can't be read while it's being written, so where's the Z80 gonna get the instructions from that tell it how to write the new firmware *while* it's writing the new firmware? RAM, of course.

Somehow I need[ed] to burn the flash-writing-function into flash, then boot from flash, then load the flash-writing function into RAM, then run it from there.

Basically the same ordeal I never did figure out with what was probably my most ambitious project ever, and with the most weight riding on it, back in the early 2000's.

...

Well, I figured out A way to hack it, this time, but I still can't believe how much of an ordeal it was.

Back Then the internet was nothing like it is today, the likes of StackExchange were barely existent, and certainly not as informative, nor the answers as-scrutinized. Forums were the place to go... And the resounding sentiment from folk was that I needed to learn/use "The Linker."

So, I tried. For months, nearly fulltime. And I never did figure it out. After all these years, and countless similar projects under my belt, I still don't get it.

...

But I found some ideas on one of those Stack pages, this time, and worked-out a hack that works for this one function on this one system...

The key, in this case, was to just forget the linker, and TryAndSee whether this particular combination of C compiler [version!], optimization settings, and machine architecture worked as I needed. Oh, and a bit of the assembly-reading ability I've picked up since Back in the day (much thanks to @ziggurat29 ). And, thankfully, with this particular setup, it seems it worked like I needed, with only a couple hiccups.

The basic gist is to write a simple function, in C, that copies byte-by-byte starting from the function-pointer (in ROM) to a uint8_t array in RAM, then casts the array-pointer to a function-pointer, and calls that.

Simple-enough, yeah?
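A minimal sketch of that idea, heavily hedged: the names here (ram_buf, relocate, flash_write_sector) are made up for illustration, and on the real system the length came from subtracting the next function's address -- a trick the C standard does not bless. The byte-copy itself is plain C; the final cast-and-call is exactly the non-standard TryAndSee part, and it further assumes the copied code contains no absolute jumps.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static uint8_t ram_buf[256];        /* destination buffer, placed in RAM */

typedef void (*flashfn_t)(void);

/* Copy `len` bytes of a function's machine code (living in ROM) into
 * ram_buf and return the RAM address.  On the Z80 target this would be
 * called as relocate((const uint8_t *)&flash_write_sector, len), and
 * the result cast to flashfn_t and called -- which is undefined
 * behavior in standard C ("functions are not objects"), and also
 * assumes the code is position-independent, which is exactly where
 * the absolute-jump gotcha bites. */
uint8_t *relocate(const uint8_t *rom_fn, size_t len)
{
    memcpy(ram_buf, rom_fn, len);
    return ram_buf;
    /* then, target-only:  ((flashfn_t)ram_buf)();  */
}
```

The copy is verifiable anywhere; only the call through the cast pointer is target- and compiler-specific.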

Many hurdles: 

First: NO, according to The Stacks, this is not at all portable, and not at all within the standards. TryAndSee in *every* situation, even if all you did was change optimization settings, or upgrade your C compiler to the next version. "Functions are not objects."

Second: Apparently there's no way to tell my C compiler to not use absolute jumps. So, even though I got it running, initially, from RAM, as soon as it entered a loop, it jumped not to the top of the loop, but to the top of the loop *in ROM*.

There were a few other gotchas that required hand-editing of the assembly output. Which is fine for this.

...

But, it really gets me wondering how this can possibly not be a well-established standardized thing. I mean, when you run a program off a hard disk, it gets copied to and run from RAM... And the same was true for all those programs run from cassette tapes... Is it darn-near *always* the job of the OS, even in embedded systems?

Linux wasn't yet ported to ARM, as I recall, when I worked on that devkit... So, then, how would folk even use that thing if they were expected to hand-code an OS to those extents? And how would they even assure their code *could* be run from RAM (i.e. no absolute jumps )?

 SURELY, there must be some "normal" way of doing this, but here still, some twenty years later, I'm not finding anything other than "code it in assembly," or TryAndSees like mine, or the magickal handwavy "use The Linker!"

So, herein, as far as I understand The Linker, all it does is *link*... So even if I was smart enough to understand its syntax, it still would only result in my telling the linker to put the function at that [RAM] address in the hex-file. And when I burn that hexfile to flash, the portions that are addressed in RAM will, well, be outside the flash chip. [If I use my ihexFlasher it might get written to the RAM, but of course get lost after reboot. Though this gives me an interesting idea about loading test code without flashing... hmmm]

But, The Linker, surely, isn't inserting code into my file, before my code, to do the actual copying of the code from ROM to RAM. How could it? My boot code is in handwritten assembly; it's got .orgs, as necessary, for things like interrupt vectors, a jump at 0x0000 around those to the actual boot code, and so-forth.

So, even if I informed "The Linker" about all those, in its crazy-ass syntax, it'd somehow have to not only modify those to inject a function-call to copy my function to RAM, but then it would also have to add code to actually do the copying. That code's gotta come from *somewhere*... So, now, again, intuitively, it doesn't seem like the sort of thing "The *linker*" would do (generating code).

So, now, twenty years later I have a vague idea that that code might be something well-standardized, in a library, of sorts, not unlike printf, or stdio... (Oooh, maybe I should dig out my K&R). Maybe it's *so* standardized that we usually just pretend it doesn't exist... like... how the appropriate integer-multiplication *function* is *called* when we type the '*' character.

Similarly: I've a *vague* understanding, now, especially after having seen the assembly output so many times, that initialized global variables (int globalA = 42;) require some external routine, called *before* main(), to actually load those RAM locations from a table of values stored in ROM. Similar idea. But *something* has to actually *do* that copying... Something, code I didn't write, has to inject itself into my code, *before* my code. And so, I take it, the linker merely tells everything where to *find* my RAM function (e.g. in calls), and maybe even adds that function to a list of bytes that need to be copied to RAM locations... But, still, I *don't* gather that the linker actually *does* the copying. Something else, some library most folk seem to barely even acknowledge exists, must do that.
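As a rough sketch of what that pre-main() routine must be doing (the array names here are stand-ins I invented; real toolchains use linker-provided symbols for the section boundaries, with toolchain-specific names I won't vouch for):

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for the linker-provided section boundaries: */
uint8_t data_load_image[4] = {1, 2, 3, 42}; /* .data initializers, in ROM */
uint8_t data_ram[4];                        /* .data's home, in RAM       */
uint8_t bss_ram[4] = {9, 9, 9, 9};          /* .bss (pretend it's trash)  */

/* What a crt0-style startup does before ever reaching main(): */
void pre_main_init(void)
{
    /* copy initialized globals' values from the ROM table to RAM */
    for (size_t i = 0; i < sizeof data_ram; i++)
        data_ram[i] = data_load_image[i];

    /* zero the globals that had no explicit initializer */
    for (size_t i = 0; i < sizeof bss_ram; i++)
        bss_ram[i] = 0;

    /* ...only now would main() get called */
}
```

The linker's real contribution is clustering all the initialized data together so this can be one bulk copy rather than per-variable code.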

OK, I'll buy it. Though, it's a bit unnerving to think about... it means all *my* initializations, in main(), before the while(1) loop, come *long* after boot. Things like initializing chip-select pins that keep two devices from trying to drive the bus at the same time. Or the watchdog timer, etc. Sure, the former can be handled by pull resistors, and should, anyhow, during resets... But I've little doubt there's plenty of potential for unexpected consequences when one thinks main() is *the first* thing that runs in an embedded system. Heh.

Anyhow, Somewhere there must be a resource that covers the entire boot process step-by-step, and similarly the build-process... The tools involved, the purpose of each step we ignore in the background, the functions called, and where they're supplied, and what to write and how to name them, if you're working with a system that doesn't already have them... etc. 

But as it stands, I've yet to find the proper "M" to "RTF." So, this is the process, thus far, for me... Piecing stuff like this together gradually over 25 years.

I gather, thus far, that a "C-Runtime" must be responsible for e.g. initializing global variables, and eventually, later, calling main().

But, I can't imagine that to be anything but *highly* architecture-specific... It has to know the memory-map, at the very least. Maybe that's where the linker comes in. And *something* has to be responsible for things like the ISR tables, and jumping to the actual ISR functions, and, upon reset, jumping *around* that table to the *actual* boot-code... "The Compiler" fine. But, then, "The Compiler" must also know when to do all that, or when to output code that's compatible with an already-running OS, which, surely, would require an entirely different "C Runtime" (or whatever's responsible).

So, so far, this is where I'm at. Some weird intermediate state between having been "pretty good at" "low-level" embedded programming, which, apparently, was quite a bit higher-level than I gathered, and, now, having gotten somewhat-familiar with "the lowest level" assembly, ISR tables, jumping from the boot vector to actual boot code... And finding that this intermediate step between those seems to have a comparatively *huge* learning-curve of even higher-level languages and APIs and even more cryptic than raw assembly syntaxes, and things so deep behind the scenes that finding info about them is akin to looking up a word in the dictionary only to find that you have to look up another, only to find that *that* definition uses the word you were looking up in the first place. Hah!

...

I don't think this is where I intended on going with this... but, suffice it to say, for me learning to code in C was easier. Which could make sense, since it's apparently pretty high level (even if you only consider, again, function-calls injected in place of things like "*", as opposed to the even higher-level of hiding things like the boot process injecting things like global initializations! Almost like a BIOS injected in your hexfile).

Learning to code in assembly, too, was easier... at least, for me... And, I suppose that could make some sense for the opposite reason. There's *very little* hidden there.

But, now, that hidden part, between the two... That's where things get weird. At least, from my perspective. Considerations like whether a C compiler even *can* be told *not* to use absolute jumps in a particular function... And how to do-so (#pragmas?)... Or how to tell it to put a function at a specific memory location. Or whether a string can be stored in ROM, or whether it needs to be initialized into RAM, and whether code needs to access the two differently. Whole new and separate languages for such things that, frankly, are *easy* to do in raw assembly, and surely should've been concerns even way back in the K&R days, maybe even moreso then, that seemingly *aren't* standardized from one C compiler to the next(?!), nor from one architecture to the next(?!).

An aside: did you know that Two's-Complement is *not* guaranteed in the C Standard (at least, not prior to C23)? That SCHAR_MIN may have different values on different architectures (-127 on a ones'-complement or sign-magnitude machine, -128 on two's-complement), and that testing bits, or bit-shifting a negative value, might give entirely different results on different hardware, or even different compilers for the same hardware? (Amusingly, the exact-width types like int8_t *are* required to be two's-complement -- where they exist at all.)
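A tiny host-runnable demo of the shift gotcha; the function names are invented, and the negative-operand shift is the implementation-defined part:

```c
/* Right-shifting a negative signed value is implementation-defined in
 * C (arithmetic vs. logical shift), so this helper is NOT portable: */
int shady_halve(int x) { return x >> 1; }

/* A portable version: shift on an unsigned type, handling the sign
 * explicitly (rounds toward zero). */
int portable_halve(int x)
{
    if (x >= 0)
        return (int)((unsigned)x >> 1);
    /* 0u - (unsigned)x avoids the overflow that plain -x would risk */
    return -(int)((0u - (unsigned)x) >> 1);
}
```

On nearly every modern compiler shady_halve happens to do an arithmetic shift, which is precisely why the non-portability goes unnoticed until the hardware changes.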

Hah!

Anyhow, again, I think I've gotten *way* off-topic.

Presently (off topic again) my technique of merging the Assembly boot-code with my C code is to basically compile my C code into assembly and remove nearly everything related to the linker, and hand-code, or otherwise not use, everything the C compiler tries to call from whatever it is (C-Runtime?) that handles things like global inits or integer-multiplication functions. 

It's not *particularly* hard to do, and is surely easier for me than doing the same in raw assembly. It also has the benefit of portability (i.e., again, the integer-multiplication function), at, of course, the cost of not being hand-optimized...

Except, of course, in the odd case where things apparently *can't* be done in standard C, like inserting a "jp main" at a specific address, which has to be added to the assembly output by hand. Or copying a function to RAM. Or telling the compiler not to use absolute addressed jumps in loops. Or telling the RAM usage to start *after* that used by the assembly boot code.

Again, I've no doubt there *is* a way to do much of this ("The Linker!" "C-Runtime!" "#pragmas!") but my head just can't wrap around all that any time soon. 

And, maybe that's OK? I mean, I got my start a couple years before Arduinos came onto the scene... Arduino is even higher-level than avr-gcc... And then, the "retro"/"8-bit" (and even homebrew CPU) trend seems to be stemming from former Arduinites trying to grasp the lower-level... I'm betting there are many folk, now, finding themselves similarly "in the middle," trying to piece-together the hidden details, that seem so easy from their newly-learned assembly-perspective, that are far more complicated than one might guess... Plausibly leaving many feeling like Bare-Metal Assembly and C are two separate worlds. Maybe this hurdle can be overcome... Or maybe it's just like so many things in my life, I've fallen into some crack that most even experts aren't aware exists; the one weird mind that can understand both C and Assembly, but can't grasp the linker? I dunno.

...

Sheesh, I think I went way off topic. 

I have found *some* resources that at least confirm some of my assumptions... 

One specifically states that my C Compiler *does not* have the ability to output relocatable code (can't tell it not to use absolute jumps), which, frankly, is a bit of a surprise.

Another, I think I linked a few logs back, I think goes through the entire compilation process, from gcc to linker to assembler and surely more. It took *years* to stumble on one like that, and yes, I am planning to read it at some point. But, it also appears to be somewhat specific to one particular toolchain, which I again find a bit odd, as it seems like this process should be well-established. OTOH, if it *weren't* specific to a toolchain, it might be too high-level for me to try-out.

Who knows.

I am, however, still of the mindset, reinforced by much of this latest experience, that there needs to be an intermediate step between Assembly and what most folk think of as "C"...

C, itself, or at least its syntax, would do nicely, it just means whittling it down to those things that don't require libraries without our knowingly/exclusively including/calling them (again "*" comes to mind, and global inits). Oh, and some sort of standard for things like .org.

Assembly, too, would do, if there was a standard syntax across architectures. I've discussed this in other logs, and it's a bit more difficult than the C-stripdown idea, since different architectures have different numbers of registers, etc. But, I think, not impossible.

Neither, of course, would be nearly as optimized as hand-written architecture-specific Assembly... But that's not the point.

"What is the point?"

Heh, frankly, I dunno anymore. It had something to do with promoting good programming techniques through awareness of underlying functionality... Something to do with a core set of universal libraries which *aren't* mere autogenerated binary (or assembly) blobs, which are heavily-documented inside and out, portable to all systems, can be learned-from. 

Another idea including potential for what I'mma call "2D-Programming", wherein... Code itself is 1D: lines of text read left-to-right, line-by-line, executed pretty much the same... The 2nd dimension, then, being Time (or learning-process), wherein, say, a function evolved over time. Its first iteration might've been "brute force." E.g. integer multiplication might've been first implemented simply as a loop adding A to A, N times. So, using/viewing the integer-multiplication-library would show this to a newb, in all its well-explained glory. Content with that, they use it, sure why not... But later they're curious about what's next, or need some speed-up, so can dig into their used-libraries to learn more, about whichever topic/library interests them, and select version 2.

The second version might be the use of left-shifts... So forth...
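To make those two generations concrete, here's roughly what they might look like, written as the sort of helper a compiler library calls behind '*' on a CPU with no multiply instruction (function names invented for the sake of the example):

```c
#include <stdint.h>

/* v1: brute force -- add a to the accumulator, b times */
uint16_t mul_v1(uint8_t a, uint8_t b)
{
    uint16_t r = 0;
    while (b--)
        r += a;
    return r;
}

/* v2: shift-and-add -- one pass over b's bits, at most 8 iterations
 * instead of up to 255 */
uint16_t mul_v2(uint8_t a, uint8_t b)
{
    uint16_t r = 0, aa = a;
    while (b) {
        if (b & 1)
            r += aa;    /* this bit of b contributes a<<bitpos */
        aa <<= 1;
        b >>= 1;
    }
    return r;
}
```

Same contract, same results; only the inner machinery (and the runtime) evolves from version to version, which is the "2nd dimension" in question.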

It's a reference-manual/text *within* the *usable* code itself. History, links, links to other libraries/versions that inspired those progressions, etc... In order of the way these things progressed. Being that, frankly, most of these things *have* progressed in that order for a reason... 

It doesn't make sense to throw kids into calculus before they know algebra, but I feel that's *very much* an analog for how computing goes these days. And, frankly, I think there are consequences...

Granted, no one could wrap their head around *every* library and all its versions... even those merely at this low-level. But all the information is there, then, for whoever might take on a bugfix or see some obvious improvement along the way...

I dunno, it's probably crazy, nevermind the potential for near-infinite branches...

But, this sorta thinking is very much where my brain's been at, in this realm, since I started coding some 25 years ago. Which, frankly, is a bit ridiculous, because I tend to reinvent the wheel rather than learn from others' examples... So why would it make sense a person like that would in any way know how to reach folk of an entirely different learning-style?

Maybe I should go eat breakfast, then it'll all be clear.

Discussions

ziggurat29 wrote 09/05/2022 at 22:15 point

Is the gist of what you're trying to do here to run from RAM? but burn the code somewhere in ROM so that you can bulk copy it into RAM and then jump into it?  because you're trying to realize a flash loader, but you can't run from flash?
Assuming the answer is 'yes', it doesn't seem that SDCC has convenient features for that, but I wonder if:
1)  define a complete and minimal system org'ed out in RAM somewhere of your choosing.  It has the kit and kaboodle of ISRs and LCD and maybe a button and your flashing logic.  It has a well-defined entry point (much like a reset vector -- might as well make it at offset 0).  Just put the required features in this system, because you'd like to avoid fiddling with it once it's working.
2)  after you build that (and test it!), then use a ihex editing tool to 'rebase' that block down somewhere into your regular flash area.  Now it's just an opaque blob, and you can't execute it directly because all the internal addresses are for the RAM location.  I would suggest rebase to the end of flash.  I don't know the details of your flash chip though, like sector sizes, etc.
3)  your main application, when entering 'flash update' mode then block copies that blob to RAM and jumps into it.  Probably does shutdown of IRQs etc, because this is like a cold reboot.  Then the RAM system comes up and takes control of the hardware and does the uart and display stuff, and I'd suggest at least a button for 'abort'.  It burns the flash -- possibly protecting itself from overwrite.  When it's done, it jumps to the reset vector and voila.
If your flash supports sector erase then you should be able to avoid erasing your flash updater blob.  If you can only bulk erase, then you'll have a maintenance chore of being sure to include that blob in subsequent builds of your main application.
Apologies if I'm adding noise because this wasn't what you were trying to do.


Eric Hertz wrote 09/06/2022 at 00:27 point

The ihexFlasher has been working for weeks...

Your method sounds legit, and not too unlike what I did. I only, however, moved one function to RAM, the bit which writes a single sector and waits until the writing procedure completes. (Remember, we've only got 2K of RAM!)

The ihexFlasher resides at Flash Chip's 0x0000, and whatever I want to flash into it gets loaded with an offset of 0x4000, so the ihexFlasher *can't* overwrite itself. A jumper, then, selects which 0x4000-byte-long page to boot from by either leaving the appropriate address bit alone (for the ihexFlasher), or by inverting it (for the new firmware).

I'm intrigued that your method also involves hand-editing of the ihex output. And find it a bit of a relief that it sounds like I'm not too far off in thinking there's not really a "standard" way of relocating code like this without some extra elbow-grease.

...

darttest001a has been slightly modified so that rather than calling its original mainlike loop function "darttest" it instead calls 0x3ff0. I now call it "bootLoadMain", and it fits well under 0x0B00. (so the call is to space outside it).

As-is, then, I can flash this into the lower portion of the ROM.

I also made a script to generate a C header file and an assembly-include containing all the symbols, including EndOfDaveRAM and EndOfDaveCode.

When I compile code from C, it generates assembly which won't assemble without some minor hand-editing... I change its code section to ".org EndOfDaveCode" (actually, I have it round-up to a 256byte boundary, which puts it at 0x0B00) and similar for RAM. A few other such things. Then at the end of the autogenerated assembly code, I add ".org 0x3ff0 jp _main"

After assembly, then, it creates an ihex starting at 0x0B00.

For projects *other* than the ihexFlasher, then, I can write bootLoadMain *once* to the flash, then new experiments from C can be flashed from 0x0B00 each time.

For the ihexFlasher, though, it can't overwrite itself (like your example, it'd have to fit *completely* in, and run from, RAM). So, I combine the ihex files from bootLoadMain and ihexFlasher (simply 'cat bootLoadMain.ihx ihexFlasher.ihx > fullImage.ihx' then delete one line in the middle, the ihex "end of ihex file" from bootLoadMain.ihx). Then I flash that in a chip-programmer.

OK, so that's the hand-linking process, it's pretty easy and much has been scripted.

...

The process of loading a function from ROM into RAM and executing from there is, as I said, not very ideal, in that it's a non-standard use of C that just happens to work in this case. (And, now that I write this, I realize I could just as easily, and within standards, simply use an assembly .equ, sheesh). I take the function-pointer, in C, and cast it to a pointer to a uint8_t, then copy byte-by-byte to RAM up to the next function's pointer. (which, I guess, is wrong for many reasons). It works, though, for now. So, then I have to modify the assembly-output of that function, because it uses jp instead of jr *everywhere* (?! I think I heard it's faster, or saves a byte?). But, again, it simply means changing a handful of jp's to jr's... no big deal. Though, it seems surprising jr is not an option, and I found a forum message explicitly stating that, and it sounds like you're confirming it too. Weird.

Then I simply call a different function name, that isn't defined anywhere. Compiling complains it doesn't exist, but still outputs the assembly. I add a label with that name, in assembly, at the end of the RAM section, where the function was copied to.

Then it assembles.

Not hard, really. But, again, easy to mess up by hand, and apparently using some "tricks" that can't be relied-on to exist in the C standard... which is kinda the main concern, should it change in a later version of the compiler, or should I want to do similar on a different architecture.

...

So, yeah, I find it quite strange there doesn't seem to *be* a standard way of doing that copy-over and such. Surely I'm missing something, because C has existed for quite some time, and boot-ROM is still used even in PC's today... shadow-ROMs... weren't they even a thing back in 8-bitters, which had 64K of RAM, and swapped out the ROM to do-so?

I mean, I thought it was pretty common-knowledge that, of high-level languages, C is the closest to Assembly... Surely, C's been used for OS's, BIOS's and such for quite some time, if not practically designed for it?

...

Anyhow.

I think I see a big difference between your suggestion and mine: Actually tell it to be *in RAM-Space* when assembling... then its internal jp addresses will be correct, and there's no need to convert to jr. Hmmm...

Since the entirety of the code is *not* in RAM, it'd mean somehow putting that one function in a different section, within C... I do think I saw something like that in the compiler's pragmas, explicitly telling it a starting-address (which would become a .org when converted to assembly)

Then, here's the kicker, manually move that section in the generated ihex file to a free section in the ROM-space (and, of course, create a function to copy from there to the appropriate RAM space, to be called near boot-time).

I think I get it. The hand-work of moving it in the ihex is far less than modifying the call, and such-like in the assembly. And, maybe, could even make modifying the assembly unnecessary if I can figure out a few other things, like how to tell The Linker (I presume) to start its normal code section at EndOfDaveCode.

I could probably even script the ihex modification, since I already have the ihexParser and checksum stuff... Hmmm. 
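For what it's worth, the checksum part of such a script is simple: an ihex record's final byte is the two's-complement of the sum of all the other record bytes (length, address, type, data), which is why the end-of-file record is always ":00000001FF". A sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Intel HEX record checksum: two's complement of the low byte of the
 * sum of every preceding record byte (length, address hi/lo, type,
 * data).  A record is valid iff all its bytes, checksum included,
 * sum to 0 mod 256. */
uint8_t ihex_checksum(const uint8_t *rec, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rec[i];
    return (uint8_t)(0x100u - (sum & 0xFFu));
}
```

So rebasing a record is just rewriting its address field and recomputing this one byte.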

Thanks for the ideas

Still baffles my mind this process isn't well-defined!


ziggurat29 wrote 09/06/2022 at 02:02 point

Dude, you're defining it!  lol.

I think you're doing it appropriately insofar as this stuff is well outside the realm of language standards, and this is more platform-specific linkage and loader details. It's just we don't have tool support, so you're going to have to munge it a little.  Some other tools do have some support; e.g. STM32 toolchain has a thing called 'ram functions' that are notionally similar (though intended for performance -- not flash avoidance) and there's language /extensions/ to place stuff there to make easy.  crt0.s handles the init from rom to ram.  But it is a little surprise there's not something usable in SDCC because 'overlays' were quite common back in the day -- think your TI-86 banked rom.  Code was linked for a certain address space but was definitely placed in a different one in the image.  But SDCC is a community effort and features are catch-as-catch-can. (I tried extending 8080 support and quickly ran away screaming.  Ken Yap knows because we spoke and he tried and got much further than I did.)

I forgot about the 2k ram constraint, and it's amusing you mention the tiny function approach because I was going to edit my response to suggest similar and thought against it. Anyway, I'd be concerned that if you're flashing the rest of the system, then when tiny function finishes burning a sector, then where will it safely return to? because that location might have moved in the new firmware image?  I guess unless you have a protected area (that you can never update), or at least a well-known vector address.

Welp, if you can't fit an ihex parser in 2K ram, perhaps XMODEM of a binary file might be more to your tastes?


Eric Hertz wrote 09/06/2022 at 04:12 point

The ihexFlasher exists in a separate 16K page of Flash than the "firmware", the two get swapped via physical jumper/switch, so the system either boots directly from one or the other (from what the z80 sees as address 0, which may be the flash chip's physical address 0 or may've been externally swapped with the flash chip's physical address 0x4000).

Yes, I had considered more of a bootloader which could e.g. load the ihexFlasher if a pushbutton was pressed during reset. But, then all those concerns you mentioned would need to be considered. 

Using a bootloader would also mean that whatever program/"firmware" was being tested wouldn't be able to work *without* the bootloader, if, say, I decided to make a project permanent, in an EPROM instead of Flash. Which implies things like ISR vectors, etc. would be inaccessible to the new firmware, unless some sort of interface was designed for just that sort of thing (and each and every one, even if I don't plan to use them), or unless the bootloader didn't use them.

Later, for instance, I've thought about using C to do the whole shebang (like Ken suggested), including setting up ISR vectors, or drivers for the timer/uart... But, if the system was designed to use a bootloader that isn't physically/electrically capable of remapping those vector-table addresses (or the addresses they vector to) to a different program's address, via software control, (or, unless those addresses were in RAM) then there's no way to have two different bootloadable programs which use different functions for ISRs, etc. Thus, no way to test code-changes or apply bugfixes to them.

Instead, just treat it like there's a 16K EPROM at 0x000 containing the ihexFlasher, which writes to a 16K Flash chip at 0x4000, then swap their chip-selects depending on which I want to boot from. (except it's all in one chip).

...

Later, though, such a bootloader might be an idea, especially if I add page-remapping under software control, which is something I definitely have pondered in the TI-86ing era... Also, plausibly, remapping RAM over the ROM space, wherein maybe a small "OS" (or bootloader) could reside in ROM used just to select one from many completely separate and all-consuming 64K programs with custom int-handlers, etc. to be selected from, say, an SD card.... (and plausibly, soon thereafter, multitasking/context-switching!)

But I'm nowhere near there, yet. Therein be custom hardware I've not yet planned-out. And, well, use-cases I've yet to imagine (multitasking *what*?)

...

I'm not at all surprised there are language *extensions* for these things... I just find it rather hard to believe they didn't have a *standard* for it, or at the very least the necessary standards for it to be possible. My K&R book is allegedly in a box marked "Books" and scrawled less-neatly "K&R!", buried under many others; I've been meaning to dig it out. Though, I think I may've done that already many years ago and it's wound-up somewhere unknown like the "Broken Glass Everywhere" boxes. Unfortunately, I've tried reading other references like the C99 standard, and the way it's written just does not compute.

However, it is a very good point that it's hard to expect 100% standards-conformance from... well... most anything. And, well, I guess it makes sense things like sdcc or avr-gcc or stm32's would have their own ways of doing things specific to their devices.

And then, of course, even if one can write every piece of, say, an OS in standard C alone, there will still be many factors which are architecture-specific. Aside from the obvious, e.g., as it stands, the ihexFlasher will only work with one particular Flash chip, from one particular manufacturer. So, I guess the question of standards-conformance is a bit moot. One can always #ifdef/#elif around extensions, I guess.
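A minimal sketch of that #ifdef/#elif approach: the predefined macros (__SDCC for sdcc, __AVR__ for avr-gcc) are real, but the flash-write routine and its command sequences are entirely hypothetical; the fallback branch is what lets the rest of the program be tested on a host PC.

```c
/* Select a flash-write implementation per toolchain/target.
 * __SDCC and __AVR__ are predefined by sdcc and avr-gcc respectively;
 * the #else branch is a host-PC stub for testing. */
int flash_write_byte(unsigned addr, unsigned char val);

#if defined(__SDCC)
/* z80 target: poke the chip's command registers here
 * (hypothetical, chip-specific unlock/program sequence) */
#elif defined(__AVR__)
/* AVR target: self-programming via the SPM mechanism, etc. */
#else
/* host build: pretend-flash, so everything above it can be unit-tested */
static unsigned char fake_flash[65536];
int flash_write_byte(unsigned addr, unsigned char val)
{
    fake_flash[addr & 0xFFFF] = val;   /* record the "programmed" byte */
    return 0;                          /* 0 = success */
}
#endif
```

The point is the portable code above the #if never changes; only the small device-specific island does.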

...

Interesting the CRT handles the copyover. I guess that would make sense... Too bad I hadn't found out about its existence to even look into further, while looking through the linker docs all those years ago. :/

ziggurat29 wrote 09/06/2022 at 14:02 point

Yup, the crt0.s does a bunch of things depending on platform.  Notable common ones are:
*  copying initialization values from rom into ram (e.g. if you defined a global "int i = 1;")
*  zero-initing globals that do not have explicit initializers
*  setting up the arenas for malloc()
*  setting up handle tables for FILE* stuff
*  invoking constructors for globals if you're doing C++
*  getting environment vars, if that's meaningful
*  getting command line parameters, if that's meaningful
*  etc...
It's a long way to main()!
Not all of that is in asm (in some cases it can be none).  Some of that involves linker shenanigans to coalesce things appropriately.  E.g. all the initialized data is clustered together so the initialization can be done as a bulk copy.  Similarly, the uninitialized globals are zeroed in bulk.
The thing is, there /is no/ standard for this because how all this happens is outside the language specification.  It's 'an implementation detail' and systems can do what's convenient so long as the net result once you hit main() is conformant.
It's fun to see what's under the hood, and fun to sometimes replace the engine!
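The first two bullets (the bulk copy of initialized data and the bulk zeroing) can be sketched like this. A real crt0 gets the region bounds from linker-provided symbols whose names vary by toolchain, so this sketch takes them as parameters to stay host-testable:

```c
/* Sketch of the two bulk-initialization loops a crt0 performs.
 * In a real crt0 the bounds come from linker-provided symbols
 * (names differ between GNU ld, sdcc, etc.); here they are
 * parameters so the logic can be exercised on a host. */
static void crt0_copy_data(unsigned char *ram, const unsigned char *rom, unsigned len)
{
    while (len--)              /* copy initialized globals: ROM -> RAM */
        *ram++ = *rom++;
}

static void crt0_zero_bss(unsigned char *bss, unsigned len)
{
    while (len--)              /* zero the globals with no initializer */
        *bss++ = 0;
}

/* host-side check: simulate a ROM image and two RAM regions */
int crt0_selftest(void)
{
    static const unsigned char rom[4] = { 1, 2, 3, 4 };
    unsigned char ram[4] = { 0 };
    unsigned char bss[4] = { 9, 9, 9, 9 };
    crt0_copy_data(ram, rom, 4);
    crt0_zero_bss(bss, 4);
    return ram[0] == 1 && ram[3] == 4 && bss[0] == 0 && bss[3] == 0;
}
```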

Eric Hertz wrote 09/06/2022 at 19:15 point

GAH!

Yep, then... Looks like:

 "CRT0.s does just that!" 

I did take a brief look at @Ken Yap 's link to minos and looked at its CRT0.s.

It, in comparison to your list, is indeed very minimal.

Which is a bit of a relief to someone like me, as I'd've otherwise imagined many things on that list (e.g. malloc) might be required by the compiler, even though they may not get used in my program...

So, then, a homebrew crt may not be so daunting after all!

I noticed you didn't mention things like multi32()... so, I'm guessing those must be part of LibC, and my gut says there's probably a daunting number of minimal requirements for that.

Though, since I found it's possible to not use it at all, I suppose a minimal LibC could be within reason to homebrew as well. 

Maybe just not even call it libc, and just create (instead of avoid) the missing functions each time the compiler (linker?) complains.

Quite a bit less daunting than trying to start a new project like this, and doing a libc from the start, as a first-step before any code can even be tested on the unit.

As far as block-copying global initializers from code-space to RAM... I had pondered whether that same process could be used to do the copying of the function... Hmmm... 

NOTES TO PAST-ME: 

".s" files aren't some creepy compiled binary that'll destory your terminal-settings, and beep violently, and make you think you've unleashed some horrible script with 'rm -r /' burried amongst all the garbage on your screen if viewed. Nor will it set BASH to some unknown language, expecting you to type "exit" in Wing-dings.

They're friggin' text files containing assembly... The S stands for: "Stupidly chose 's' instead of 'asm'".

(Oh, and 'cat'ing a binary won't screw up your machine, just the terminal window. And there are easy means to fix it, which I'm forgetting. Or just close the terminal window and open another.)

But in the interest of promoting the idea of "digging under the hood", you could always just view unknown files with a hex editor first, to see whether that stuff at the side is ASCII. Also, vim, these days, syntax-highlights MANY file formats automatically, which is quite informative when you don't know anything about the syntax. (E.g. try vim with an Intel hex file [might need to be .hex, rather'n .ihx])

ziggurat29 wrote 09/07/2022 at 00:40 point

Yes, your crt0.s (and that is not a holy name -- it is just common with gcc-esque things) does whatever it needs to do to get the assumptions of the environment valid.

In getting there, C is 'generalized assembler', so as long as you don't use library routines, you can C your way to paradise with registers and stack, and you can probably implement most/all of your runtime bootstrap code in C.  Just don't touch a printf() or a 'c = a * b' until you have the full environment up and running.

You're transitioning the system from a chaotic world of assembly into an orderly world of C.

Do not worry about implementing multi32() or div() or those things -- the stdlib already has those implemented.  If you bootstrap C, you'll get all of stdlib (or at least as much as you want to have. FILE* and malloc() might be more than you care to use, but who cares? don't implement them if you're not going to use them in your product, anyway.)

Anecdote:  In the mid 90's I had to do some of this because my boss insisted on a ca. 1986 MicroFocus COBOL compiler.  He said that he was concerned about upgrading because 'it would introduce bugs'. (Lol. b@tsh!t crazy) But... In this ancient system we had to use 'overlays', which were basically a register-level ABI. There were no C bindings out-of-box. Yuck! I wasn't going to do any of that!  So I made a bootstrap C environment such that I could implement these 'extension' functions in C, compile them, link them with the bootstrap code, and then ultimately call them from the COBOL.  So boss is happy that we're using s4!77y COBOL, and I get to retain my sanity.  And be more productive!  But I did have to manually implement malloc() and free() and also ldiv().  That's the last time I implemented division by hand.  C++ was a little trickier, and I never got vector new working for reasons I won't bore you with.  But who does new[] anyway?

Yes, lol, on the 's' extension. It was a little foreign to me but I've since gotten used to it. One day I'll have to find the origin of that nomenclature.

You're digging into the plumbing of the system that most folks don't even know exists. Isn't that exciting? There's a whole world down here. But practically we do strive to get out of it as quickly as possible.

Now I guess I've made a long post.  Sorry!

Eric Hertz wrote 09/08/2022 at 20:45 point

LOL "We do strive to get out of it as quickly as possible"

I suppose that may be part of the reason it hasn't been easy for me to find "The Process."

At times like these I feel some sense of obligation to spend additional time "in it" to try to document it, for posterity. But Technical Communications was by-far one of my least favorite classes, and, again, I really don't know enough to document these things in any way other than "this is how I understand it" and "this is how it seems to work in this setup/system".

Your explanations to my ramblings are very insightful. Maybe it'll help a few others, too... if they can get past my ramblings.

...

That sounds like quite an adventure with the COBOL thing! But, yeah, I guess there's a lot to be said for working with the tools one's experienced with. That's why I tend toward C, and toward trying to write it in portable ways.

C++, Ugh. Heh. I recall being rather inspired taking that class right after the C class... But I eventually realized it wasn't so much an extension as an entirely different and higher-level language... which was kinda the opposite direction than I had use for. (C++ was the second and final class in the two-class series until literally the quarter after I took C, when they replaced it with Java, so I took C++ as an elective. I was none too happy with Java!)

I guess in being somewhat used to 1K or less of RAM, the ideas of malloc and new seemed exceptionally risky. And, indeed, it took me many years to feel OK with "bad practices" like making arrays (and strings!) that are only used in main() global, so that avr-gcc would report them in the RAM-usage section.

After I finally got over that, I'm somewhat-convinced many of the "good/bad practices" really were taught from the perspective of an application running under an OS, with a lot of RAM, etc. and should probably be evaluated on a case-by-case basis.

gotos, still, I avoid with deeply-ingrained passion... but lately, e.g., I've become very fond of replacing for() loops with while(1)...if()break. I've gotten into the habit of #define NOTBREAK 1, and then it's while(NOTBREAK), to make it clear the loop isn't intended to be infinite. The benefits, I've found, are many: no more trying to recall whether the for loop will *always* be entered, with the test run after, or what to do when two things iterate simultaneously, like a character counter *and* a character address. And it's *far* easier to have *multiple* different conditions that cause a break, and, further, many different handlings for such cases (does i get incremented in this case, but not the other?).
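A sketch of that pattern, with a hypothetical function tracking two "iterators" (a count and a pointer) and two separate break conditions, each handled on its own line:

```c
#define NOTBREAK 1   /* documents that the loop is NOT meant to be infinite */

/* Count characters up to a NUL or a caller-supplied limit. Two things
 * iterate together (the count and the pointer), and each stop condition
 * gets its own clearly-separated break -- the sort of thing that gets
 * gnarly crammed into a for(;;) header. */
unsigned bounded_strlen(const char *s, unsigned max)
{
    unsigned n = 0;
    while (NOTBREAK)
    {
        if (*s == '\0')
            break;          /* hit the terminator */
        if (n >= max)
            break;          /* hit the caller's limit */
        s++;
        n++;
    }
    return n;
}
```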

Looking back at old code, I've seen some really gnarly tests around those two semicolons... encouraged by peers boasting things like "good coders know details like those" (when the test is run, when/if the increment occurs, how to put actual code *in* the tests/increments, how to do multiple things within one of those slots, etc.), and trying to come up with, seemingly, the least-legible "code" they could, as some means of showing their expertise...

Encouraged, as well, by the idea that C is higher-level than it really is... It's not that C does all that math within if() before ultimately feeding 0 or nonzero into the 'if' instruction. It's that it, essentially, does whatever math is necessary to run an 'if' test, determines whether it needs to do more math to run another test, then another jump-if-nonzero to possibly another test, and so forth.

I've found, too many times, that I come back a year later with no clue how a complicated conditional works. Heh!

Break up those if conditionals into multiple if statements, nest them for &&, separate them for ||... That's far closer to how it would be compiled, either way, anyhow. Far closer to how the machine will actually run it. And far clearer to read.
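For example, a hypothetical range check written that way: nested ifs stand in for &&, and sequential ifs stand in for ||, which is close to the branch-by-branch code the compiler emits either way.

```c
/* "Is x in [lo, hi], or equal to zero when zero is allowed?"
 * written with the conditional broken into separate branches. */
int in_window(int x, int lo, int hi, int allow_zero)
{
    if (allow_zero)
        if (x == 0)         /* "allow_zero && x == 0" as nested ifs */
            return 1;
    if (x < lo)             /* "x < lo || x > hi" as separate ifs */
        return 0;
    if (x > hi)
        return 0;
    return 1;
}
```

Each branch now has an obvious place for a comment, or for per-case handling, which a one-line `return allow_zero && x==0 || x>=lo && x<=hi;` does not.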

I'm pretty fond of my recent NOTBREAK idea. Though I have a feeling it probably doesn't lend itself as well to compilation to, e.g., 'brne'. I'll take clarity over that, in most cases. [And, I've found, that instruction rarely gets used anyhow, because what's the compiler going to do, switch around its register contents every time there's another loop on a different iterator?]

I suppose that might be where RISC has begun to take over... A good assembly programmer would know which registers are best to use for what throughout a function, but a compiler's optimizer would have to be really sophisticated to do the same, so such CISC instructions might just go unused, regularly, anyhow. [Makes me wonder about e.g. x86's SSE extensions, etc. Were they called *explicitly* by the coders? Dragging higher-level languages back down to the assembly level? Hmmm.]

ziggurat29 wrote 09/08/2022 at 21:56 point

C++ was a miraculous discovery for me in the late 80s/early 90s.  "You mean I don't have to manually write and call and nest init/uninit routines for structs?" (constructors/destructors)  "You mean I don't have to manually make a struct of function pointers to abstract behaviour?" (virtual functions)  Those two things were the closer for me.  Sold.  But I think my device-driver background made the value of such more obvious to me than for some of my colleagues, who seemed to struggle with the 'why OO' thing, and the religion of how to do OO 'right'.  But fast-forward 20+ years to C++11 and things have gotten quite sophisticated.  Where once I was an expert, now I am a dilettante.
School curriculum with Java was in vogue in the late 90s ostensibly to teach hands-on stuff that would get students jobs, and I guess "maybe", but it seems that schools are doing less of that now and getting back to principles rather than specific platforms.  Which I think is good.
As for compiler optimizers, they are quite good now -- often better than hand optimising.  I remember disassembling some code (and this was 18 years ago) that looked kooky, but upon further reflection, I realized the optimizer knew how to keep the instruction pipeline full by re-ordering the assembly.  Things are so sophisticated now that you kind of need a machine to do that stuff for you.
As for things like SSE -- yes, the coders explicitly call them at some point in some form.  Hence 'libraries'.  Someone had to think the thought, and then you get to reuse it -- perhaps firewalled away from those gory details by things like DirectX, CUDA, OpenCL, etc.
We've come a long way!

Eric Hertz wrote 09/09/2022 at 03:25 point

I can see the benefits!

Ken Yap wrote 09/05/2022 at 06:49 point

There's another point I noted a while back. You wondered why, when you declared a const int for the address of a routine, you got an indirect jump. The reason is that const doesn't mean substitution with a literal integer. Rather, it declares an immutable datum. If that datum has external linkage, then another module could refer to it and expect to be able to treat it as a datum to be read, and in particular to take the address of with &. If the const datum is local to the module, the compiler could optimise it by replacing instances with a literal, but it doesn't have to. This is one case where you need to use a #define.

When const is applied to compound data types like arrays or structs, then there is no way the compiler can make a literal replacement, so it goes into the .rodata area.
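A small illustration of the difference, using a hypothetical ROM entry-point address (0x0008 here is made up, not any address from the project):

```c
/* A #define is pure text substitution: the literal can be folded
 * straight into a call/jump. A const int is a real object with
 * storage, so the compiler may load it from memory at the call
 * site, giving the indirect jump observed. */
#define ROM_PUTC_MACRO  0x0008           /* hypothetical ROM entry */

static const unsigned short rom_putc_addr = 0x0008;

const unsigned short *rom_putc_location(void)
{
    return &rom_putc_addr;   /* legal: a const object has an address */
    /* &ROM_PUTC_MACRO would not even compile: after preprocessing
     * it reads "&0x0008", and there is no object there to point at */
}
```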

Eric Hertz wrote 09/05/2022 at 07:10 point

Interesting. Thanks for the explanation. 

I wasn't exactly wondering why on the matter, since there were so many matters to consider at the time. I kinda just accepted it as-is and moved on, since the fix itself was rather easy in the assembly-output I was already modifying, and the consequences of not doing-so would've only resulted in a few extra instruction cycles, if I had the proper library to supply the trampoline function. 

But, indeed, your explanation makes quite a bit of sense, and could be helpful to know in future endeavors. The address-of explanation really makes it hit home.

Ken Yap wrote 09/05/2022 at 00:35 point

So much flailing around. I never understood why you home-brewed C/asm linkage when indeed "the linker does just that". Did you not suspect that software engineers (and embedded SEs in particular) would have developed standard solutions for combining object files?

Firstly, use C as the primary language. It is possible to do nearly everything in it. C proved it is possible to use a higher level language than assembly for systems programming. Have a look at https://github.com/hkzlab/minos-z80-monitor to see how practically everything is done in C, with a custom crt0.s to handle the memory layout of the target board.

Secondly, the standard executable sections. Unix defined .text (code), .data (mutable initialised data), and .bss (zeroed data). Later .rodata was added (for immutable data). The last is important for Harvard architectures where constant tables have to be stored in the ROM region. It's also important so that constants are not stored twice wasting memory, once in the .rodata section and again in the .data section, copied from .rodata at initialisation. Sometimes a judicious const will put the constant in .rodata, sometimes a keyword or pragma is needed, like PROGMEM on Arduino. ELF defines more sections: https://wiki.osdev.org/ELF Even when the toolchain isn't using ELF, e.g. SDCC, similar concepts will apply to sections.
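A sketch of where typical globals land under a common ELF-style toolchain (the section names below follow the Unix convention described above; sdcc uses its own analogous "area" names, and Harvard targets may additionally need a keyword like Arduino's PROGMEM):

```c
/* Which section each of these ends up in, per the usual rules: */
int        boot_count = 1;              /* .data   (initialized, mutable; copied ROM->RAM at startup) */
int        scratch[16];                 /* .bss    (no initializer; zeroed at startup, costs no ROM)  */
const char banner[]   = "ihexFlasher";  /* .rodata (immutable; can live in ROM, no RAM copy needed)   */

const char *get_banner(void) { return banner; }
```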

Thirdly, relocation. In the absence of a linking loader, memory management units, or instruction sets that support position independent code, chunks of binary code cannot be copied from one area to another and still work. If the intention is to execute out of RAM, then the linker must be directed to place code at the targeted execution address. It won't work to generate for ROM, and then do a block copy to RAM.

Finally in looking for answers, it helps to first clarify your doubts to a small set of questions you need to answer either by searching or asking. Otherwise it's just a stream of consciousness ramble that people find difficult to follow and give up.

Eric Hertz wrote 09/05/2022 at 21:02 point

I think the "I never did understand why" should be pretty well-explained in this writing. If not, I'll try to summarize.

I have a lot of history with C, itself. Specifically, mostly, avr-gcc and other microcontrollers. A *tiny* bit of history with Assembly, mostly through these comparatively recent z80 endeavors. I've never had reason to go deeper into "object files," than to look at an occasional assembly-output. As far as I knew, they were simply like precompiled libraries.

I *also* have a rather bad history with "The Linker," as I was long ago misguided to believe it was responsible for something it's not, (which you confirmed). And because of that I was unable to understand the reference materials I found about the linker, itself. I was never able to resolve this misbelief because, well, I lost that contract/gig/job with that project unfinished because I'd spent months going down the wrong path a group of "engineers" led me down. 

I had no need, nor time, for that level of coding in any of my projects since, so only "kept an eye out" in the years following, rather than, say, trying to research it further.

This brings us to a couple years ago: As a change of pace, I started learning assembly... I had a project, and situation, where coding in C was not an option. That project ran in RAM, atop a minimal OS stored in ROM, thus I wasn't coding anything at the lowest level of setting up ISRs, etc. And the OS was responsible for loading my executable into the proper RAM location to execute from.

You see, no connection, whatsoever, to a compiler-toolchain. Simply "assemble project.asm, execute project."

This brings us to this project:

This project was a reverse-engineering of a complete functioning system, which happened to have the same z80 processor. Disassembly of the ROM was a huge part of understanding its innards. @ziggurat29 did an amazing job with that, and wrote some bootable test code for its peripherals. 

Familiar, now, with z80 assembly, I did a few slight modifications to his test code and decided that if I really wanted to continue development for/on this machine, I'd need a way to write my own test code in the way I usually do, which means small iterations. Which meant I needed a way to write the new programs to the ROM quickly. Which meant that, basically, my first project was to write the ihexFlasher. Now, There Was No Way I was going to write all that in assembly, since I couldn't test it along the way. Instead I wrote it in C and tested the majority of it under linux (parsing the ihex strings, etc). So I had to figure out how to make my C code work with the "boot" and peripheral stuff we already had.
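A host-testable sketch of one such piece (not the actual ihexFlasher code): verifying an Intel-hex record's checksum. Every record ":LLAAAATTDD..CC" is built so that all of its bytes, checksum included, sum to zero mod 256, which is exactly the kind of logic that can be fully exercised under Linux before ever touching the target.

```c
/* Decode one hex digit; -1 if the character isn't hex. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Return 1 if 'line' is a well-formed Intel-hex record whose bytes
 * (length, address, type, data, checksum) sum to 0 mod 256. */
int ihex_record_ok(const char *line)
{
    unsigned char sum = 0;
    int hi, lo;
    if (*line++ != ':')
        return 0;                      /* records must start with ':' */
    while (line[0] != '\0' && line[0] != '\r' && line[0] != '\n')
    {
        hi = hexval(line[0]);
        lo = hexval(line[1]);
        if (hi < 0 || lo < 0)
            return 0;                  /* odd length or non-hex char */
        sum += (unsigned char)((hi << 4) | lo);
        line += 2;
    }
    return sum == 0;                   /* two's-complement checksum */
}
```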

It wasn't written to be compiler-chain-conformant... it was just bootable peripheral test-code. And, frankly, it wouldn't've mattered if it *was* somehow conformant to the compiler-toolchain, as I pointed-out, I've pretty much never had to do anything at that level before, so wouldn't've known where to start. (would it have been CRT? LibC?).

So, I lucked out when I found that the C compiler would emit raw assembly, even though it couldn't find many of the functions it needed. At that point, all I had to do was modify the calls to those functions to instead call the addresses of the functions in the boot code. It's not difficult to do, at all. Took maybe a day to figure out.

So, now, I think your question is answered. This system does not have toolchain-compliant object files or libraries or whatever you're referring to. And even if it did, I wouldn't know where to even start looking into that.

This is the path I'm on. 

Maybe it's not "the right way", but it clearly works. And, I'm learning more about "the right way" as I go... which is more than can be said for my earlier attempts at "*learning it* the right way," wherein, again, I was completely misled and wasted months, and was unable to finish the project.
