The STM8S architecture uses a 3-stage pipeline, and the length of
instructions varies from 1 to 5 bytes with an average size of 2
bytes. A fetch from Flash reads 32 bits (4 bytes) in one clock cycle, and because of this the instruction length has little impact on the execution time. However, when the pipeline needs to be flushed (e.g. after a branch), the time
for code execution depends heavily on the location of
instructions relative to a 32-bit boundary: in tight "spin loops" I
observed changes in execution time of more than 20% when I moved a routine by one byte.
That said, code in RAM *) doesn't perform as well as code in Flash, since it has to be fetched byte by byte instead of 32 bits at a time. On the bright side, its run time is independent of its location.
So, when testing Forth code in RAM, don't expect it to perform the same way in Flash memory. My advice is to always use a hardware timer for time-critical code on the STM8S!
*) Code execution from EEPROM leads to a hardware reset. An earlier version of this log wrongly assumed that RAM and EEPROM have the same execution properties!