Hello 2024, here I am again !

The old log "5. Typing" had a few ideas, but a lot of water has flowed under the bridge since...

Concerning the "just an integer" idea, I have discarded it because it's just a bat5h1t crazy time bomb that's furiously ticking and eager to blow just like it did in C. So I have decided that all the types should be fully defined at declaration time.

But what if I want a "whatever" type, just to get the ball rolling and not care until later? Enter Mr Crockford with his insane DEC64 format. Look at https://www.crockford.com/dec64.html ... To say that I endorse it would be an exaggeration, but in the context of what I intend to use it for (prototyping algorithms before refining the implementation), it's "good enough" and provides some convenience for integer platforms. It is somewhat inspired by the JavaScript tradition and provides 56 bits of integerness, with some weird scaling rules, but it is not as inconvenient as IEEE754 and I guess I can use it for some DSP work, for example.
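To make the "56 bits of integerness" concrete, here is a minimal Python sketch of the DEC64 layout as I read it on the page linked above: a signed 56-bit coefficient in the high bits, a signed 8-bit exponent in the low bits, value = coefficient * 10^exponent, and exponent -128 meaning nan. The function names are made up for this illustration and no rounding or normalisation is attempted.

```python
from fractions import Fraction

def dec64_encode(coefficient: int, exponent: int = 0) -> int:
    """Pack a coefficient and a power-of-ten exponent into a 64-bit word."""
    if not -(1 << 55) <= coefficient < (1 << 55):
        raise OverflowError("coefficient does not fit in 56 signed bits")
    if not -127 <= exponent <= 127:
        raise OverflowError("exponent out of range (-128 is reserved for nan)")
    return ((coefficient & ((1 << 56) - 1)) << 8) | (exponent & 0xFF)

def dec64_decode(word: int) -> Fraction:
    """Unpack a 64-bit word into an exact rational value."""
    word &= (1 << 64) - 1
    exponent = word & 0xFF
    if exponent >= 0x80:            # sign-extend the 8-bit exponent
        exponent -= 0x100
    if exponent == -128:
        raise ValueError("nan")
    coefficient = word >> 8
    if coefficient >= 1 << 55:      # sign-extend the 56-bit coefficient
        coefficient -= 1 << 56
    return coefficient * Fraction(10) ** exponent
```

A plain integer such as 1 is simply the coefficient shifted left by 8 bits with a zero exponent, which is the convenience for integer platforms mentioned above.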

So here are the scalar types :

Padding (P) (no type, no read or write, no meaning)
Unsigned integer (U prefix) (not I, because "Integer" does not clearly say whether it is signed)
Signed integer (S prefix)
Floating point (IEEE754, F prefix)
Dec floating point (prefix D)
Boolean (B)
undefined ?

The upper-case type letter is followed by a size in bits, which is a power of 2 and at least 8. So valid sizes are :

P : P8, P16, P32, P64...
U : U8, U16, U32, U64, U128, U256, U512, U1024, U2048...
S : S8, S16, S32, S64, S128, S256, S512, S1024, S2048, S4096...
F : F8, F16, F32, F64, F128 (not sure F256 exists or is useful, F40 and F80 exist though)
D : only D16, D32 and D64 make sense so far.
B: whatever.
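
The naming rule above can be checked mechanically. This is only a sketch in Python: `parse_scalar_name` is a hypothetical helper, B is left out because its size is not settled, and the oddballs F40 and F80 are rejected since they are not powers of 2.

```python
import re

def parse_scalar_name(name: str) -> tuple[str, int]:
    """Split a name like 'U32' into ('U', 32), enforcing the size rule."""
    m = re.fullmatch(r"([PUSFD])(\d+)", name)
    if m is None:
        raise ValueError(f"not a scalar type name: {name!r}")
    letter, bits = m.group(1), int(m.group(2))
    # The size must be a power of 2 and at least 8.
    if bits < 8 or bits & (bits - 1):
        raise ValueError(f"size must be a power of 2, at least 8: {name!r}")
    # Only D16, D32 and D64 make sense so far.
    if letter == "D" and bits not in (16, 32, 64):
        raise ValueError(f"unsupported decimal size: {name!r}")
    return letter, bits
```

For example, `parse_scalar_name("S128")` gives `("S", 128)`, while `"U24"` and `"I32"` are refused.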

Modifiers:

pointer
SIMD flag (yes, a SIMD "vector" can be considered as a scalar because it can be held in a register)
"const" flag : read-only
write-only flag
volatile

padding, ro and wo can be combined into a 2-bit field:

00 : padding (no read, no write)
01 : read only (const flag)
10 : write only (could be a "sink" or dummy)
11 : normal variable
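
Read this way, the low bit is a "may read" flag and the high bit is a "may write" flag. A tiny Python sketch, with names chosen just for this example:

```python
# The four values of the 2-bit access field, as listed above.
PADDING, READ_ONLY, WRITE_ONLY, VARIABLE = 0b00, 0b01, 0b10, 0b11

def can_read(access: int) -> bool:
    """Bit 0 set means the value may be read."""
    return bool(access & 0b01)

def can_write(access: int) -> bool:
    """Bit 1 set means the value may be written."""
    return bool(access & 0b10)
```

Padding then falls out naturally as "neither readable nor writable", with no special case needed.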

Even that is not able to fully describe a scalar value. So a sort of "syntax" is needed...

I'm reinventing some sort of ASN.1 binary syntax in fact! But adapted and constrained to the types I expect to handle under the hood of my toy language.

Anyway, a scalar type descriptor will not fit into a byte.

The size field is 4 bits to accommodate 16 possible sizes in bits:

8 16 32 64 128 256 512 1024
2048 4096 8192 16384 32K 64K 128K 256K

256K is a lot... but you never know and the bits are there. You're free to set your own limit for the number of bits you want to support.
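
Assuming the 16 codes are assigned in the order listed (code 0 for 8 bits, code 15 for 256K bits), the conversion is a single shift. A Python sketch with hypothetical function names:

```python
def size_code_to_bits(code: int) -> int:
    """Map a 4-bit size code to a size in bits: 0 -> 8, 1 -> 16, ... 15 -> 256K."""
    if not 0 <= code <= 15:
        raise ValueError("size code must fit in 4 bits")
    return 8 << code

def bits_to_size_code(bits: int) -> int:
    """Inverse mapping; rejects sizes that are not 8 times a power of 2."""
    code = bits.bit_length() - 4
    if not 0 <= code <= 15 or bits != 8 << code:
        raise ValueError(f"unsupported size: {bits}")
    return code
```

An implementation that only supports up to, say, 64 bits can simply refuse codes above 3.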

I have also defined 5 types: U/S/F/D/B, so that's 3 bits with some margin. Unicode code points are just a subset of the unsigned integers, for example.

Then the modifiers:

volatile: 1 bit
pad/ro/wo/var: 2 bits
SIMD : 1 bit
pointer: 1 bit (excludes some other flags and types)

Another property I would like to add is overflow behaviour:

saturate
wraparound
... (forgot one)
trap

That fits in two more bits.
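
To show what the three remembered behaviours mean in practice, here is a Python sketch of an unsigned addition on a given width. The mode is passed as a string because the 2-bit codes are not assigned in this log, and the forgotten fourth behaviour is deliberately left out.

```python
def add_unsigned(a: int, b: int, bits: int, mode: str) -> int:
    """Add two unsigned values of the given width under an overflow policy."""
    limit = (1 << bits) - 1
    total = a + b
    if total <= limit:
        return total                  # no overflow, all modes agree
    if mode == "saturate":
        return limit                  # clamp to the largest value
    if mode == "wraparound":
        return total & limit          # keep only the low bits
    if mode == "trap":
        raise OverflowError(f"U{bits} overflow: {a} + {b}")
    raise ValueError(f"unknown overflow mode: {mode!r}")
```

With a U8, 200 + 100 saturates to 255, wraps around to 44, or traps, depending on the flag.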

The total is 5+3+4+2=14 bits (modifiers, type, size, overflow), fitting in a U16 scalar with 2 bits left for extensions, because I have not yet found a way to describe a fixed point integer.
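
Here is one way the 14 bits could be packed, as a Python sketch. The bit positions and the numeric type codes are assumptions made up for this example (size in bits 0-3, type in 4-6, access in 7-8, then volatile, SIMD, pointer, and overflow in 12-13); nothing in this log fixes them yet.

```python
from dataclasses import dataclass

TYPE_CODES = {"U": 0, "S": 1, "F": 2, "D": 3, "B": 4}   # assumed numbering
TYPE_LETTERS = {code: letter for letter, code in TYPE_CODES.items()}

@dataclass(frozen=True)
class Scalar:
    type_letter: str        # one of U/S/F/D/B
    size_code: int          # 0..15, size in bits = 8 << size_code
    access: int = 0b11      # 00 padding, 01 ro, 10 wo, 11 normal variable
    volatile: bool = False
    simd: bool = False
    pointer: bool = False
    overflow: int = 0       # 2-bit overflow behaviour, codes unassigned

def pack(s: Scalar) -> int:
    """Encode a Scalar into the low 14 bits of a 16-bit word."""
    if s.type_letter not in TYPE_CODES:
        raise ValueError("unknown type letter")
    if not (0 <= s.size_code <= 15 and 0 <= s.access <= 3 and 0 <= s.overflow <= 3):
        raise ValueError("field out of range")
    return (s.size_code
            | (TYPE_CODES[s.type_letter] << 4)
            | (s.access << 7)
            | (int(s.volatile) << 9)
            | (int(s.simd) << 10)
            | (int(s.pointer) << 11)
            | (s.overflow << 12))

def unpack(word: int) -> Scalar:
    """Decode a 14-bit descriptor; the 2 spare bits must be clear."""
    if not 0 <= word < (1 << 14):
        raise ValueError("spare bits set or value out of range")
    type_code = (word >> 4) & 0x7
    if type_code not in TYPE_LETTERS:
        raise ValueError("reserved type code")
    return Scalar(TYPE_LETTERS[type_code], word & 0xF, (word >> 7) & 0x3,
                  bool((word >> 9) & 1), bool((word >> 10) & 1),
                  bool((word >> 11) & 1), (word >> 12) & 0x3)
```

With this layout a plain U32 variable, `Scalar("U", 2)`, packs to 0x0182, and `unpack(pack(s)) == s` holds for any valid descriptor.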

So what's the purpose of this internal, binary, unambiguous representation ?

Oh, there are many reasons to do so.

First, it is a great way to compare function prototypes without crazy hassles later.

Imagine you describe a function with a list of parameters: the list gets encoded as a binary chain, and that chain is all you need to make sure an API matches between caller and callee. Just compare the chains for equality.
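
A sketch of that comparison in Python, assuming each parameter is already encoded as one U16 descriptor and the chain is just those words laid out little-endian. The descriptor values used below are placeholders, not a defined encoding.

```python
import struct

def encode_prototype(descriptors: list[int]) -> bytes:
    """Serialise a list of 16-bit type descriptors into a byte chain."""
    return struct.pack(f"<{len(descriptors)}H", *descriptors)

def api_matches(caller: list[int], callee: list[int]) -> bool:
    """The API matches when both sides produce the same chain."""
    return encode_prototype(caller) == encode_prototype(callee)
```

For instance `api_matches([0x0182, 0x0193], [0x0182, 0x0193])` is true, while changing a single flag bit in either list makes it false.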

It is the basis for a typing hierarchy and a bytecode-like version of a program, which can be later described unambiguously across languages.

The remaining 2 bits can encode what kind of thing the U16 describes:

00: scalar
01: array
10: struct
11 : union ? that's how you alias incompatible types and is potentially dangerous, so it is reserved in certain modes. Pointer casting may be another solution. That's a Pandora's box waiting to be opened.
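
As a sketch, and assuming the 2 spare bits are the top two of the U16 (the log does not say which ones they are), tagging a 14-bit descriptor looks like this in Python. The `allow_union` switch is a made-up stand-in for the "reserved in certain modes" idea.

```python
KINDS = ("scalar", "array", "struct", "union")   # codes 00, 01, 10, 11

def tag_kind(descriptor14: int, kind: str, allow_union: bool = False) -> int:
    """Put the 2-bit kind code above a 14-bit descriptor."""
    if not 0 <= descriptor14 < (1 << 14):
        raise ValueError("descriptor must fit in 14 bits")
    if kind == "union" and not allow_union:
        raise ValueError("union is reserved in this mode")
    return (KINDS.index(kind) << 14) | descriptor14

def kind_of(word: int) -> str:
    """Read the kind back from the top 2 bits of the U16."""
    return KINDS[(word >> 14) & 0x3]
```

A scalar keeps its descriptor value unchanged, since its kind code is 00.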

Aligned Strings are missing as well.

I also need to define a range, that would be in a struct probably.

But since it is a "bootstrap language", it is not required to support every bell and whistle, right?

Edit :

I have forgotten the "Function" type...
