reprint: modern printf

reprint is printf redone with decades of hindsight, revamping the semantics and syntax to make a function worthy of Hello World.

printf is the first function every C programmer encounters, but its specification is a chaotic mess and understanding its internals gives experienced programmers nightmares. reprint refactors the format string concept into a more versatile and portable format. The aim is for the internals to be accessible to the experienced and approachable to the neophyte.

-Less RAM: printf() requires preallocated memory to buffer before it even prints. reprint does not eagerly populate a buffer, but waits for free space to be sent to it. Thus, the RAM requirements scale from tiny MCU to supercomputer

-Structs and Arrays: printf() requires all parameters to be marshalled on the stack. reprint can print from just a pointer, so arrays and structures can be printed with only the pointer

-Consistent syntax: printf() is built on a hodge-podge of English abbreviations. reprint() is built on the structure of the ASCII table

reprint is a refactoring of printf in the following ways:

1) reprint and printf are both templating functions; reprint frames its operation as a virtual machine, complete with a rigorously defined set of 'registers' and modifier bits
2) printf defines a seemingly arbitrary syntax; reprint's syntax is structured by the ASCII table
3) reprint supports binary output
4) reprint does not require input data to be marshalled via variadic arguments; packed data or pointers to a struct work just as well
5) reprint not only supports normal printf operation, but allows the programmer to 'pull' output bytes as needed or as they can fit into output buffers (such as DMA or a small buffer for interrupt use)

6) reprint acknowledges that formatted output is the first thing a programmer learns and strives to set a rigorous example for neophyte programmers to follow.

  • Code: Exponential notation

    Analog Two | 04/12/2016 at 16:00 | 0 comments

    Exponential notation (aka 'scientific' notation) represents a numeric value as a mantissa and an exponent. The mantissa is a value in the interval [1, 10) in decimal notation. This notation is convenient for numbers with many digits (very small fractions or very large numbers). printf prints only floats or doubles in exponential notation, as in:

    /* Prints 4.200000e+01 */
    float x = 42.0;
    printf("%e", x);

    In reprint, exponential notation is not bound to the type of input; it is simply a method of output. Both integers and floating point values can be represented in exponential notation:

    /* Print 4.200000e1 */
    float x = 42.0;
    reprint("\", x);
    /* Print 4.200000e1 */
    int y = 42;
    reprintf("\f.r", y);

    Printing integers in exponential notation may seem silly, but it's perfectly valid from a mathematical standpoint. If you are dealing with very large noisy counters, then exponential notation could neaten their appearance.

  • Internals: the asset of GOTO

    Analog Two | 04/08/2016 at 12:06 | 3 comments

    TRIGGER WARNING: The following content may harm the sensibilities of those who hate GOTO.

    reprint_cb() is a state machine that interprets a format string and outputs characters based on its current state. Its control flow naturally has a single dispatch point, from which the correct code segment is executed to, say, output a numeric digit or a bitfield. Typically, state machines are implemented with an integer representing the state and a corresponding switch statement that performs the dispatch.

    On embedded systems this wastes space and time. switch() statements entail a lookup table at best, and a sequence of if statements at worst. Stepping through the generated code instruction by instruction makes one very aware of this waste. Consequently, I wanted to do what our forefather assembler programmers could do: JMP or BR to an arbitrary address. Unfortunately, these capabilities were banned from Standard C to protect the masses.

    However, gcc supports labels as values. Thus, instead of maintaining an integer state, I can store the starting address of the segment I want to execute next (hint: this is reprint's program counter). Instead of wading through if statements and jump table lookups, this is just a single-instruction branch. The code size and execution time savings become readily apparent on an MSP430, which is the first embedded processor to run reprint.


    Great write up by Eli Bendersky
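
    As a rough illustration of the labels-as-values idea, here is a minimal sketch of a two-state machine whose "program counter" is a label address. It is not reprint's actual code: run_machine() and its states are hypothetical, and it relies on the GCC/Clang extension, so it is not Standard C.

```c
/* Hypothetical two-state machine: the state is the address of the next
   code segment rather than an integer. Requires the GCC/Clang
   "labels as values" extension (&&label and goto *ptr). */
static int run_machine(const char *input, char *out)
{
    void *pc = &&copy_char;   /* plays the role of a program counter */
    int n = 0;

dispatch:
    if (*input == '\0') {
        out[n] = '\0';
        return n;             /* number of characters written */
    }
    goto *pc;                 /* one indirect branch, no switch */

copy_char:
    if (*input == '\\') {
        pc = &&escaped_char;  /* the next character is handled specially */
    } else {
        out[n++] = *input;
    }
    input++;
    goto dispatch;

escaped_char:
    out[n++] = '<';           /* wrap the escaped character in <> */
    out[n++] = *input;
    out[n++] = '>';
    input++;
    pc = &&copy_char;
    goto dispatch;
}
```

    Each segment decides where execution resumes by assigning pc, so the dispatch point never has to compare state numbers.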

  • Code: Fixed point output

    Analog Two | 04/06/2016 at 20:20 | 2 comments

    Fixed point is typically preferred to floating point on microprocessors. Naive programmers who do not know this can unknowingly encumber their firmware with printf, strtod, and associated functions. reprint supports fixed point output.

    /* Printing to the thousandths place -> "42.042" */
    int x = 42042;
    reprint("\f3<r", x); 
    /* Printing to the thousandths place with printf */
    printf("%d.%03d", x / 1000, x % 1000);

    In reprint, we simply load 3 into Register 3 (identified by the "<" character). This is the number of places the decimal point is shifted to the left. Printing this requires no extra calculation. Oh, and if the 3 is omitted, the shift factor is taken from the varargs.

    In printf, we must calculate 2 separate values, a division by 1000 and a remainder, in order to split our source value into the integral and fractional parts. We must also remember to zero pad the second number so the leading zeros show up.

    Which one do you think is simpler?
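
    The printf approach gets more awkward once negative values are involved, since both the quotient and the remainder carry the sign. The following is only a sketch of what a careful printf-based version might look like; format_fixed() is a hypothetical helper and not part of reprint.

```c
#include <stdio.h>

/* Hypothetical helper, not part of reprint: format `value` with the decimal
   point shifted `places` digits to the left, using only integer math. */
static int format_fixed(char *buf, size_t size, long value, unsigned places)
{
    unsigned long scale = 1;
    for (unsigned i = 0; i < places; i++)
        scale *= 10;

    /* Work on the magnitude so the sign is printed exactly once, even when
       the integral part is zero (e.g. -42 with 3 places -> "-0.042"). */
    unsigned long mag = (value < 0) ? 0UL - (unsigned long)value
                                    : (unsigned long)value;
    const char *sign = (value < 0) ? "-" : "";

    if (places == 0)
        return snprintf(buf, size, "%s%lu", sign, mag);

    /* Zero pad the fractional part to exactly `places` digits. */
    return snprintf(buf, size, "%s%lu.%0*lu",
                    sign, mag / scale, (int)places, mag % scale);
}
```

    Calling format_fixed(buf, sizeof buf, 42042, 3) fills buf with "42.042", matching the example above.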

  • Code: Printing bitfields

    Analog Two | 04/05/2016 at 03:48 | 1 comment

    Data that is tightly packed typically ignores 8 bit boundaries, which entails much shifting and masking. reprintf eliminates the need for programmers to write their own shifts and masks. The following code shows printing an IPv4 header.

    /* \fN;ncw prints N bits of data from the loaded bit pool (in decimal by default). */
    const char test_reprint_ipv4[] = 
        "\f\r0=cq"                     // Specify packed data
                                       // and load 16 bits; no printing
        "Version:           \f4;ncw\n" // Print  4 bits;  4 total
        "Header Words:      \f4;ncw\n" // Print  4 bits;  8 total
        "DSCP:              \f6;ncw\n" // Print  6 bits; 14 total
        "ECN:               \f2;cw\n"  // Print  2 bits; 16 total
        "Total Bytes:       \fcnq\n"   // Print 16 bit value
        "Identification:    \fcnq\n"   // Print 16 bit value
        "\f0=cq"                       // Load  16 bits; no printing
        "Flags:             \f&3;ncw\n" // Print  3 bits in binary;  3 total
        "Fragment Offset:   \f13;cw\n"  // Print 13 bits;           16 total
        "TTL:               \fcp\n"    // Print  8 bit value (byte 8 of the header)
        "Protocol:          \fcp\n"    // Print  8 bit value (byte 9 of the header)
        "Header Checksum:   \fcnq\n"   // Print 16 bit value
        "Source IP:         \fcp.\fcp.\fcp.\fcp\n"  // Print 4 1 byte values
        "Dest IP:           \fcp.\fcp.\fcp.\fcp\n"; // Print 4 1 byte values
    reprintf_ptr(test_reprint_ipv4, incoming_packet);

    The packing directive "\r" indicates the data format is tightly packed, rather than struct packed.

    The input specifier "cq" corresponds to "uint16_t", so exactly 16 bits are loaded into the Value register. The "0=" sequence specifically loads 0 into Register 4, which for formatted integer output governs the number of significant digits printed. Printing 0 significant digits is essentially a no-op but leaves the value loaded in the Value register. The input modifier "n" indicates the input datum is big endian formatted.

    The input specifier "cw" specifically calls out bitfields and assumes the bit data was already loaded into the Value Register. The ';' character identifies Register 3, which is the parameter governing how many bits are output.
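
    To make the "load 16 bits, then hand out N bits at a time" idea concrete, here is a minimal sketch of such a bit pool. The struct and function names are hypothetical and say nothing about how reprint implements its Value register internally.

```c
#include <stdint.h>

/* Hypothetical bit reader; names are illustrative, not reprint internals. */
struct bit_pool {
    const uint8_t *data;   /* tightly packed, big endian input */
    uint16_t value;        /* plays the role of the Value register */
    unsigned bits_left;    /* unread bits remaining in `value` */
};

/* Load the next 16 bits (big endian) without producing any output. */
static void pool_load16(struct bit_pool *p)
{
    p->value = (uint16_t)(((unsigned)p->data[0] << 8) | p->data[1]);
    p->data += 2;
    p->bits_left = 16;
}

/* Take the next n bits (1 <= n <= bits_left), most significant first. */
static unsigned pool_take(struct bit_pool *p, unsigned n)
{
    p->bits_left -= n;
    return (unsigned)(((unsigned long)p->value >> p->bits_left)
                      & ((1ul << n) - 1ul));
}
```

    With the first two header bytes 0x45 0xB9 loaded, successive calls taking 4, 4, 6, and 2 bits yield 4, 5, 46, and 1, which is the Version, Header Words, DSCP, and ECN sequence from the format string.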

    Using printf, the equivalent code (without the binary flag output of course) is:

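    /* Using printf. The format string may appear simpler, but correctly extracting
    the data from the packet just to put it on the stack is a painful task.
    I'm not even sure if that part is right... */
    const char test_printf_ipv4[] = 
        "Version:           %u\n"
        "Header Words:      %u\n"
        "DSCP:              %u\n"
        "ECN:               %u\n"
        "Total Bytes:       %u\n"
        "Identification:    %u\n"
        "Flags:             %x\n"
        "Fragment Offset:   %u\n"
        "TTL:               %u\n"
        "Protocol:          %u\n"
        "Header Checksum:   %u\n"
        "Source IP:         %u.%u.%u.%u\n"
        "Dest IP:           %u.%u.%u.%u\n";
    /* Assumes incoming_packet points to the raw header bytes in network order */
    const unsigned char *p = (const unsigned char *)incoming_packet;
    printf(test_printf_ipv4
        ,(unsigned)(p[0] >> 4)                        /* Version */
        ,(unsigned)(p[0] & 0xF)                       /* Header Words */
        ,(unsigned)(p[1] >> 2)                        /* DSCP */
        ,(unsigned)(p[1] & 0x3)                       /* ECN */
        ,(unsigned)((p[2] << 8) | p[3])               /* Total Bytes, big endian */
        ,(unsigned)((p[4] << 8) | p[5])               /* Identification */
        ,(unsigned)(p[6] >> 5)                        /* Flags: top 3 bits of byte 6 */
        ,(unsigned)(((p[6] << 8) | p[7]) & 0x1FFF)    /* Fragment Offset */
        ,(unsigned)p[8]                               /* TTL */
        ,(unsigned)p[9]                               /* Protocol */
        ,(unsigned)((p[10] << 8) | p[11])             /* Header Checksum */
        ,(unsigned)p[12], (unsigned)p[13], (unsigned)p[14], (unsigned)p[15]
        ,(unsigned)p[16], (unsigned)p[17], (unsigned)p[18], (unsigned)p[19]);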

  • Code: Printing values in "ones and zeros"

    Analog Two | 04/04/2016 at 01:10 | 1 comment

    Sometimes fixing a bit twiddling or other low level bug comes down to showing the individual bits. printf does not support printing an integer in radix 2, despite originating from times when code was closer to the metal. The code in reprint is as follows:

    /* Print 42 as binary */
    reprintf("\f&r", 42);
    Octal and hex of course are supported:
    /* Print 42 as hex */
    reprintf("\f$r", 42);
    /* Print 42 as octal */
    reprintf("\f%r", 42);

    The use of '$', '%', and '&' is not entirely arbitrary, as the three characters are sequential in value:

    1. '$' is 0x24 and selects hexadecimal
    2. '%' is 0x25 and selects octal
    3. '&' is 0x26 and selects radix 2 (binary)
    4. Default output is in decimal
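
    For comparison, since standard printf has no radix 2 conversion, C programmers usually end up writing something like the following by hand. This is only a sketch; to_binary() is a hypothetical helper and not part of reprint.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write `value` in radix 2 without leading zeros.
   Returns the number of digits written, or 0 if `buf` is too small. */
static size_t to_binary(unsigned value, char *buf, size_t size)
{
    char tmp[sizeof(unsigned) * CHAR_BIT];
    size_t n = 0;

    /* Collect digits least significant first. */
    do {
        tmp[n++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value != 0);

    if (n + 1 > size)
        return 0;             /* no room for the digits plus '\0' */

    /* Reverse into the caller's buffer. */
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

    Calling to_binary(42, buf, sizeof buf) produces "101010".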

  • Code: Indentation

    Analog Two | 04/03/2016 at 13:05 | 0 comments

    A common pattern in output is to indent a line based on its depth in a hierarchy (e.g., JSON or XML, or function depth when debugging). Though the output is the same, the method is different between reprint and printf:

    /* Indentation with N spaces on reprint and printf */
    int N = 10;
    reprintf("\f=ep", N, ' ');
    printf("%*s", N, "");
    • In printf, we use the left pad approach but pass ε (the empty string) as the value to be padded. In printf, string padding is hardcoded to the ' ' character, resulting in N spaces.
    • In reprint, the command is to repeatedly print a specific character N times. The syntax breaks down as follows:
    1. \f: Formatted output field header
    2. =: Store the corresponding integer in the '=' register (register 4).
    3. ep: The data input type is a character, 8 bits.

    In reprint, numeric parameters are specified by the user as loading register values (much like a microprocessor). The numeric value does not have any meaning without a corresponding input type. So when reprint parses 'e', the meaning of the register is understood to be character repetition.
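
    On the printf side, the "%*s" trick can be wrapped up for hierarchical output as in the sketch below. format_indented() and the choice of two spaces per level are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: format `text` indented by two spaces per level.
   Returns what snprintf returns (the length the full output needs). */
static int format_indented(char *buf, size_t size, int depth, const char *text)
{
    /* "%*s" with an empty string emits (depth * 2) pad spaces. */
    return snprintf(buf, size, "%*s%s\n", depth * 2, "", text);
}
```

    For example, a depth of 3 puts six spaces in front of the text, and a depth of 0 emits the text unindented.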

  • Code: No worries about left pad

    Analog Two | 04/02/2016 at 11:30 | 0 comments

    If you are not familiar with the Node.js leftpad debacle, essentially someone wrote a single function to format a string to a particular length, adding spaces to the left. He exported this function as a library and thousands of projects depended on it. After he removed his library, thousands of projects failed to build because they were missing this one simple function.

    Thankfully, there are no worries here as reprint supports left pad!

    reprintf("\f5r", 42);
    printf("%5i", 42);

    In this case, we are printing out 42 with a column width of 5. This pads 3 space ' ' characters before the 42. The pad character can be arbitrary (unlike printf), but that is another post.
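
    For contrast, printf itself only pads with spaces or, for numbers, with '0'. Getting any other pad character in plain C means doing the work by hand, roughly as in this sketch. left_pad_int() is a hypothetical helper and not part of reprint.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: right-align a decimal integer in `width` columns,
   padding on the left with `pad`. Returns the length written, or -1 if
   the result does not fit in `buf`. */
static int left_pad_int(char *buf, size_t size, int value, int width, char pad)
{
    char digits[24];
    int len = snprintf(digits, sizeof digits, "%d", value);
    if (len < 0)
        return -1;

    int fill = (width > len) ? width - len : 0;
    if ((size_t)(fill + len) >= size)
        return -1;                                  /* would not fit */

    memset(buf, pad, (size_t)fill);                 /* pad characters first */
    memcpy(buf + fill, digits, (size_t)len + 1);    /* digits plus '\0' */
    return fill + len;
}
```

    With a width of 5 and a pad of ' ', the value 42 comes out as three spaces followed by 42, the same as printf("%5i", 42).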

  • Code: Printing signed integers

    Analog Two | 04/01/2016 at 10:24 | 0 comments

    Here is a head to head comparison of reprintf to printf:

    /* char */
    char sc = -42;
    printf("The answer is %hhi", sc);
    reprintf("The answer is \fp", sc);
    /* short */
    short ss = -4242;
    printf("The answer is %hi", ss);
    reprintf("The answer is \fq", ss);
    /* int */
    int si = -424242;
    printf("The answer is %i", si);
    reprintf("The answer is \fr", si);

    Most C programmers (consciously or not) default to printing signed integers. In reprint, this is the easiest format to output as it requires only a single letter (p, q, or r) to follow \f. Namely,

    1. 'p' corresponds to "char"
    2. 'q' corresponds to "short int"
    3. 'r' corresponds to plain "int"
    4. 's' corresponds to "long int"

    The reason for starting at 'p' is simply that its corresponding hex value is 0x70, putting it at the top of its column in the ASCII table. Thus if we look at the lower 3 bits of each character:

    1. 'p' & 0x7 == 0
    2. 'q' & 0x7 == 1
    3. 'r' & 0x7 == 2
    4. 's' & 0x7 == 3

    There are even more integer types defined by C and referenced by characters beyond 's', but that is enough for now.
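
    One practical consequence of this layout is that the specifier character can index a table directly, with no chain of comparisons. The sketch below shows the idea for the four types listed above. size_for_specifier() is hypothetical, covers only 'p' through 's', and assumes an ASCII execution character set.

```c
#include <stddef.h>

/* Illustrative only: map 'p'..'s' to the size of the corresponding C type
   using the lower 3 bits of the character as a table index. */
static size_t size_for_specifier(char c)
{
    static const size_t sizes[4] = {
        sizeof(char),    /* 'p' & 0x7 == 0 */
        sizeof(short),   /* 'q' & 0x7 == 1 */
        sizeof(int),     /* 'r' & 0x7 == 2 */
        sizeof(long)     /* 's' & 0x7 == 3 */
    };

    if (c < 'p' || c > 's')
        return 0;                        /* not handled in this sketch */
    return sizes[(unsigned char)c & 0x7u];
}
```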

    Contrast this with printf, where 'hh' is the *smallest sized integer, a single 'h' is the second smallest, no modifier is "normal", and 'l' is the larger one. Arbitrary much?

    *(on some platforms char is 32 bits...and the same size as the other types.)

  • Design: Populating bitwise registers from the format string

    Analog Two | 03/31/2016 at 13:56 | 0 comments

    The Output Control Register, Input Control Register and Input Size Register are populated from the lower bits of the characters in the format string, streamlining the parsing procedure:

    1. Upon parsing a Field Header, set Output Control Register Bit 13
    2. A packing directive may immediately follow and set Input Control Register Bit 7
    3. In general, the Output Control and Input Control registers are set by the lower bits of the character data in the format field string.
    4. The Flag characters each correspond to a single bit in the Output Control Register.
    5. Output Control Register Bits 2, 5, 8 toggle to 1 if a corresponding Selector character appears in the conversion specifier. They are 0 otherwise.
    6. A final Input Size sets the lower 4 bits of the Input Size Register.
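
    A heavily simplified sketch of steps 1, 2, and 6 might look like the following. Only the bit positions named above (Output Control bit 13, Input Control bit 7, and the lower 4 bits of Input Size) come from the list. The struct layout, register widths, macro names, and parse_minimal() itself are assumptions, and flags and selectors are not handled at all.

```c
#include <stdint.h>

/* Hypothetical register file; field names and widths are assumptions. */
struct reprint_regs {
    uint16_t output_ctrl;
    uint8_t  input_ctrl;
    uint8_t  input_size;
};

#define OUT_FIELD_HEADER (1u << 13)  /* set when a field header is parsed */
#define IN_PACKED        (1u << 7)   /* set by the packing directive */

/* Parse a minimal specifier of the form "\f" ["\r"] <size char>.
   Returns the number of characters consumed, or 0 if `s` is not such
   a specifier. */
static int parse_minimal(const char *s, struct reprint_regs *r)
{
    const char *p = s;

    r->output_ctrl = 0;
    r->input_ctrl = 0;
    r->input_size = 0;

    if (*p != '\f')
        return 0;
    r->output_ctrl |= OUT_FIELD_HEADER;
    p++;

    if (*p == '\r') {                 /* optional packing directive */
        r->input_ctrl |= IN_PACKED;
        p++;
    }

    if (*p == '\0')
        return 0;                     /* no size character present */
    r->input_size = (uint8_t)(*p & 0x0F);   /* lower 4 bits of the char */
    p++;

    return (int)(p - s);
}
```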

  • Design: Conversion Specifiers

    Analog Two | 03/30/2016 at 16:40 | 0 comments

    Like printf, reprint has conversion specifiers to control data formatting. The conversion specifier breaks down into the following parts:

    1. Field Header: Indicates start of conversion specifier. The \f header starts a formatted specifier, while \b header starts a binary output specifier.
    2. Packing Directive: Specify whether source data is tightly packed or packed as a C struct.
    3. Output Control: Various parameters for controlling output.
    4. Input Specification: Various parameters for interpreting the input. At minimum there is a size specification, which terminates the conversion specifier.

    The exact characters comprising these sections are shown in the ASCII table breakdown of the conversion specifier syntax.




    mrhee2u wrote 11/08/2020 at 22:40

    Is this project still being developed? I've got a different implementation of a stateful printf that is much more powerful. It is a complete printf language that has gone through a few generations so it has all the functionality of this reprintf() but not constrained to the reprintf() limits. I'm looking to merge with this project for a new release.

    esot.eric wrote 03/29/2016 at 07:58

    Could you do a log-entry or something that's basically a 'hello-world'?

    Analog Two wrote 03/29/2016 at 11:02

    That's a good idea; coming up...
