This project documents the design and implementation of a string format that is
- flexible
- safe
- efficient in code space
- efficient in data space
- efficient in code execution
This is in part due to:
- a compact representation with only one cache line to hit,
- a flexible implementation that instantiates only the required features,
- a definition that is adapted to both static and dynamic processing,
- compatibility with POSIX's ASCIIZ format (trailing NUL bytes are preserved),
- a trailing NUL byte that is replaced internally by a canary,
- code and data that can be accessed at any level of abstraction.
ASCII is the basic character encoding but Unicode is supported:
- as UTF-8 byte sequences in type 3/Flex3 lists,
- as attributes in Flex1 and Flex2 byte arrays,
- and as integer code points in type UP.
The attributes make it easier to distinguish the appropriate "domains", which helps with ASCII-only processing without excluding Unicode. This makes the format suitable for basic, barebones utilities as well as more elaborate applications.
To define constant strings, the basic structure is defined as:
typedef struct {
    uint8_t len __attribute__ ((aligned (4)));
    char text[];
} aString_8;
Getting the length of the string is as simple as reading the aligned 32-bit word that holds the length field and keeping only its low-order bytes, their number being given by the pointer's two least significant bits:
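static inline int aStr_length(aStr_t p) {
    uintptr_t a = (uintptr_t)p;  /* works whether aStr_t is a pointer or an integer type */
    const uint32_t *q = (const uint32_t *)(a & ~(uintptr_t)3);  /* aligned word holding the length */
    unsigned LSB = (unsigned)(a & 3);  /* size of the length field, in bytes */
    return (int)(*q & ~(~UINT32_C(0) << (LSB << 3)));  /* keep the low LSB bytes (little-endian) */
}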
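As an illustration of the masking above, here is a minimal sketch that can be compiled on its own. The names demo_aString_8, demo_length and demo_hello_length are hypothetical, the text[] array is given a fixed size only so that the object can be initialized in portable C, and a little-endian CPU with a GCC/Clang-compatible compiler is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as aString_8, but with a fixed-size text[] (an assumption made
   here so that the object can be initialized in portable C). */
typedef struct {
    uint8_t len __attribute__ ((aligned (4)));   /* GCC/Clang attribute */
    char    text[7];
} demo_aString_8;                                 /* hypothetical name */

/* Hypothetical reader: takes the address of text[] and returns the length.
   Assumes a little-endian CPU: the length is the low part of the word. */
static uint32_t demo_length(const char *text)
{
    uintptr_t addr = (uintptr_t)text;
    unsigned  lsb  = (unsigned)(addr & 3u);       /* 1 for an 8-bit length */
    uint32_t  word;

    /* memcpy() reads the aligned word without breaking aliasing rules. */
    memcpy(&word, (const void *)(addr & ~(uintptr_t)3), sizeof word);
    return word & ~(~UINT32_C(0) << (lsb << 3));  /* keep the low lsb bytes */
}

/* Usage: builds a constant string and reads its length back. */
static uint32_t demo_hello_length(void)
{
    static const demo_aString_8 hello = { 5, "Hello" };

    assert(((uintptr_t)hello.text & 3u) == 1);    /* text[] follows len directly */
    return demo_length(hello.text);
}
```

Only the address of text[] is passed around: the length byte sits just before it, inside the same aligned word, so one load and one mask are enough.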
A variant (with bit 2 of the pointer set) adds another field to keep the allocated size, helping to perform variable-sized operations:
// Same as above but with an extra 32-bit size field:
typedef struct {
    uint32_t allocated __attribute__ ((aligned (8)));
    uint8_t len;
    char text[];
} Flex_aString_8;
Of course, the pointer that the program manages is always the address of .text[]. The actual type is given by the pointer's least significant bits; the same principle works for the longer 16-bit version (types 2 and F2) and the lists (types 3 and F3).
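The decoding described above could look like the following sketch. All demo_ names are hypothetical, the 16-bit structure is only a guess at what type 2 looks like, the meaning given to the allocated field (bytes reserved for the text) is an assumption, and a little-endian CPU with a GCC/Clang-compatible compiler is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed-size stand-ins for the layouts in the text (hypothetical names). */
typedef struct {
    uint32_t allocated __attribute__ ((aligned (8)));
    uint8_t  len;
    char     text[11];
} demo_Flex_aString_8;

typedef struct {                      /* assumed shape of the 16-bit type 2 */
    uint16_t len __attribute__ ((aligned (4)));
    char     text[302];
} demo_aString_16;

/* Load a 32-bit word from an aligned address without aliasing problems. */
static uint32_t demo_word_at(uintptr_t addr)
{
    uint32_t w;
    memcpy(&w, (const void *)addr, sizeof w);
    return w;
}

/* The two low bits of the pointer give the size of the length field. */
static unsigned demo_type(const char *text)
{
    return (unsigned)((uintptr_t)text & 3u);
}

/* Bit 2 of the pointer marks the Flex variant. */
static int demo_is_flex(const char *text)
{
    return (int)(((uintptr_t)text >> 2) & 1u);
}

/* Same masking as for type 1; little-endian assumed. */
static uint32_t demo_length(const char *text)
{
    uintptr_t addr = (uintptr_t)text;
    return demo_word_at(addr & ~(uintptr_t)3)
         & ~(~UINT32_C(0) << (demo_type(text) << 3));
}

/* Only meaningful when demo_is_flex() is true: the word 8-aligned below text[]. */
static uint32_t demo_allocated(const char *text)
{
    return demo_word_at((uintptr_t)text & ~(uintptr_t)7);
}

/* Usage: returns 1 when every field decodes as expected. */
static int demo_check(void)
{
    static const demo_Flex_aString_8 f = { 11, 3, "abc" };
    static const demo_aString_16     s = { 300, "" }; /* only the length field is exercised */

    return demo_type(f.text) == 1 && demo_is_flex(f.text)
        && demo_length(f.text) == 3 && demo_allocated(f.text) == 11
        && demo_type(s.text) == 2 && demo_length(s.text) == 300;
}
```

The sketch only tests the Flex bit on a string that is known to be a Flex one; how non-Flex strings are placed so that bit 2 stays clear is not covered here.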
Flex strings can also be declared and allocated dynamically on the stack, inside a function.
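One possible way to do this, shown here only as a sketch and not as the project's actual mechanism, is to reserve a local byte array and align the header by hand. The names demo_flex_init and demo_stack_example are hypothetical, the allocated field is assumed to hold the text capacity in bytes, and a little-endian CPU and a compiler with variable-length arrays are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    DEMO_ALIGN_SLACK = 7,   /* worst-case padding to reach an 8-byte boundary */
    DEMO_HEADER      = 8    /* 4-byte allocated + 1-byte len, rounded up to 8 */
};

/* Hypothetical helper: lays out an empty Flex type-1 string inside raw[] and
   returns the address of its text, which always ends in binary 101. */
static char *demo_flex_init(unsigned char *raw, uint32_t capacity)
{
    uintptr_t      start = (uintptr_t)raw;
    uintptr_t      base  = (start + 7u) & ~(uintptr_t)7;
    unsigned char *b     = raw + (base - start);

    memcpy(b, &capacity, sizeof capacity);  /* allocated field */
    b[4] = 0;                               /* len field */
    b[5] = '\0';                            /* empty text */
    return (char *)(b + 5);
}

/* Usage: copies src (truncated to 255 bytes) into a stack-allocated Flex
   string, then reads the length back through the pointer alone. */
static uint32_t demo_stack_example(const char *src)
{
    size_t n = strlen(src);
    if (n > 255)
        n = 255;

    unsigned char raw[DEMO_ALIGN_SLACK + DEMO_HEADER + n + 1];  /* VLA on the stack */
    char *s = demo_flex_init(raw, (uint32_t)n);

    memcpy(s, src, n);
    s[n] = '\0';
    ((unsigned char *)s)[-1] = (unsigned char)n;  /* len sits just before text */

    assert(((uintptr_t)s & 7u) == 5);             /* Flex bit set, type 1 */

    uint32_t word;
    memcpy(&word, (const void *)((uintptr_t)s & ~(uintptr_t)3), sizeof word);
    return word & 0xFFu;                          /* little-endian, 8-bit length */
}
```

The storage disappears when the function returns, so the pointer must not escape it; the extra bytes in raw[] pay for the manual alignment.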
-o-O-0-O-o-
Logs:
1. Dealing with re-alignment and 2 string types only
2. Context
3. 2023 : a new version
4. Evolution...
5. Merge works
6. Holding back a bit
7. More food for thoughts
8. Article !
9. Another possible extension
10. More than an error
11. Fuzzing and safety
12. Extension and type
13. Aligned Strings with Attributes
14. Attributes moved to Flex
15. It's not a liability, it's a feature!
16. The canary is singing.
17. The enhanced "aligned strings" format is named aStrA
https://hackaday.com/2023/02/10/modernizing-c-arrays-for-greater-memory-safety/
https://developers.redhat.com/articles/2022/09/29/benefits-limitations-flexible-array-members#nonconforming_compiler_extensions
https://people.kernel.org/kees/bounded-flexible-arrays-in-c