Learn deep logic optimization by implementing the fastest ALU West of the Pecos River, using no specialized arithmetic chip

Similar projects worth following
"Give a man a fish and you feed him for a day; teach a man to fish and you feed him for a lifetime"

What a greater project than one that involves education? Only learning can drive people away from extremism, obscurantism, ignorance or just plain idiocy. This is my main ideal goal for this project.

And at times when large global companies take the opportunity of the increasing technology abstraction and complication to push you to buy more of their now almost magic gadgets, learning how these computer-based devices work provides a kind of refreshingly uninterested entertainment.

In this regard, the Arithmetic and Logic Unit (ALU) plays an important role: at the heart of the Central Processing Unit (CPU) which in turns is the main part of a computer, it is often overlooked or considered as a mystical element that escapes human understanding.

Thus, the second practical goal for this project will be to learn how to build the fastest ALU based on non-specialized chip

"What is a computer, and what makes it different from all other man-made machines?"

Answering these questions leads to a surprisingly long cascade of concepts and practical decisions,. But interestingly, despite that computers are actually physical world implementations, they still keep a a very close relationship to pure concepts. Mathematics can be used to demonstrate the suitability of an abstract digital computer to solve any kind of problems using algorithms, and the Boolean algebra and its applied scientific counterpart, logic design, are the tools that helped transform these abstract machines into real ones.

Unlike abstract computers, real-world computers necessarily contain some limitations and rely on some physical architectures., Nevertheless, besides their differences, all these computer architectures still rely on the concept of processor, that executes instructions realizing logic and arithmetic operations.

Each project log will introduce another step towards the final goal of an optimized Arithmetic and Logic Unir (ALU) at the heart of the processor, made out of pure logic elements. This way, one can cherry pick a subject in particular, skip it if it is already mastered, or follow each step in turn to learn logic design with almost no prerequisite but common sense.

Beware! This project will go far beyond the classical ALU design covered in academic courses. If you ever followed one of these courses in logic design and felt yourself proud of having acquired this knowledge, but with a mixed feeling of underachievement, read on, this project is for you!

By starting from the historical and almost philosophical perspective of a computer, the goal of this project is to dive quickly into the subject of logic design, although a minimum of Boolean algebra theory will necessarily be covered. The idea is to get used to the basic rules required to use the small Boolean logic elements as building blocks into circuits that progressively realize more and more complex operations.

As one of the most complex operation, the basic arithmetic adder will be covered first, pointing out its inherent performance problem. But clever pure logic optimizations will be carried out in order to overcome this problem, and further optimizations will enable us to integrate into the adder all required logic operations at no performance cost, and eventually another last useful digital operation will be added to provide a complete, extremely optimized ALU design.

These logs will use as little theory as possible to keep them readable by everybody, and its is hoped to have them as interactive as possible based on the user comments.

Hopefully in the end, this project will result in a hardware proof of concept in the form of a real-world circuit made out of discrete logic chips on a breadboard.


Logiisim file for the ALUMinimum project

circ - 126.62 kB - 04/25/2017 at 22:19


  • Part 6: Boolean Algebra

    Squonk4207/20/2017 at 11:05 0 comments

    If we try to summarize what we have seen so far:

    1. Arithmetic operations on arbitrary analog values are better performed using digital numbers to get a repeatable accuracy
    2. Unsigned integer digital numbers are represented as base 2 (binary) numbers using only 0 and 1 values, using a positional notation with enough binary digits (bits). Using only 2 values offers the simplest and most effective design to store digital numbers in a physical device
    3. Signed integer digital numbers are represented using a 2's complement notation but otherwise follow the same arithmetic rules as unsigned integer digital numbers
    4. Non-integer numbers can be represented using:
      1. a fixed-point notation where the decimal point position is implicit and a given number of bits right of it are used to hold the fractional part of the number, following the same arithmetic rules as integer digital numbers
      2. a floating point notation where the decimal point position is given by an integer exponent, and the remaining bits hold the fractional part of the number as the mantissa. Both the exponent and mantissa parts still follow the same arithmetic rules as integer digital numbers
    5. Arithmetic operations on digital numbers can be reduced to subtraction operations
    6. Subtraction operation can be implemented using the basic logic operations from the Boolean algebra, although adding the addition operation does not cost much, and having positional shift or rotation operations helps implementing multiplication and division operations more efficiently, which would otherwise require complex specific hardware
    7. Simple as well as bitwise (i.e., performed on a set of bits that can be used equivalently to store a digital number) logic conditions  can of course be implemented using the same basic logic operations from the Boolean algebra

    Thus, digital computers can perform all arithmetic and logic operations only using Boolean algebra values, operations and laws, which will be described below in details, using simple undergraduate maths.

    It is thus a formalism for describing logical relations in the same way that ordinary algebra describes numeric relations.


    Unlike elementary algebra which deals with an infinite set of numbers, Boolean algebra is using a finite set of only 2 elements. These 2 values can be indifferently called true and falseT and F0 and 1, or any other couple of symbols, but they are generally represented with the bits (binary digits) 0 and 1, as the XOR (exclusive-or) and AND (conjunction) operations (see below) plays the role of the addition (+) and multiplication (x) operations in integer arithmetic modulo 2, for which 1 + 1 = 0, but unlike common arithmetic, where 1 + 1 = 2. The OR operation (inclusive-or or disjunction) is then defined as x + y + xy.

    Manipulating only 2 values is very nice, since it enables using the method of proof of perfect induction or proof by exhaustion, i.e., the verification for all possible cases for demonstrating theorems, as the number of cases is very small.


    Boolean operations are functions having zero or more Boolean input values (in the set {0, 1}) to a single Boolean output value.

    The number of operands is the arity of the operation. An operation of arity zero, or 0-ary operation is a constant, and the Boolean algebra only deals with unary operations (with arity 1) and binary operations (with arity 2).

    Unary Operations

    If we consider all possible calculations from 1 binary input value p to an output value, there are only 2 ^ 2 = 4 unary operations:

    Unary Boolean Operations

    OperationName SymbolValue
    FALSELogicalfalse⊥, OpThe output value is always false, regardless of the input value of p.
    IDENTLogical identitypLogical identity is an operation on one input value p, for which the output value remains p.
    NOTLogical negation¬p, ~p, Np, FpLogical negation is an operation on one logical value p, or which the output value is true if its...
    Read more »

  • Part 5: The ALU

    Squonk4205/27/2017 at 09:52 0 comments

    The first motivation that predated the early computers was to automate arithmetic calculation algorithms, so it is no wonder if an arithmetic unit is at the heart of the processor and the computer itself.

    But besides calculation, algorithms can also perform data processing (including validation, sorting, summarization, aggregation, analysis, reporting, classification, etc.) and automated reasoning, which both require formal mathematical logic instead of ambiguous rhetoric, syllogism or philosophy in order to be fully automated.

    This is why the arithmetic unit is always completed with a logic unit to form the Arithmetic and Logic Unit (ALU) as the central part of the processor and computer.


    Formally, arithmetic is the branch of mathematics that studies numbers, and especially the properties of the traditional operations between them—addition, subtraction, multiplication and division. But of course, this greatly depends on how the number are represented. Even if today the main numeral system is the Hindu–Arabic one, this has not always been the case.

    There have been numeral systems using 2, 3, 4, 5, 6, 8, 10, 12, 16, 20 or 60 as a base. BTW, duodecimal (base 12) is still used today for counting hours and in music, whereas sexagesimal (base 60) is also used for measuring time, angles and geographic coordinates, mainly because of their superior highly composite number property, that makes them a more convenient number system for computing fractions than most other number systems.

    But only considering the common decimal (base 10) numeral system, the positional notation where each symbol represents an order of magnitude ("ones place", "tens place", "hundreds place", etc.) was only introduced in Europe by Leonardo Fibonacci in 1202 in his Liber Abaci, but is in wide use only since much later (since the middle of the 16th century), replacing the Roman numeral system (which is still used today in certain contexts such as years) because of its easier arithmetic rules.

    If handling digits having 10 possible symbols or values is still possible using mechanical devices with wheels, gears and teeth or cogs, this approach is difficult to transpose to electro-mechanical relays (their contact can only be in 2 different states: open or closed) or fully electronic vacuum tubes or solid-state transistors. Even if vacuum tubes and transistors can be used in analog circuits to represent an infinity of continuous values, we already saw in Part 1 that digital computing was a much better solution.

    The idea of using 2 bands of analog values as far apart as possible from one another to provide the best immunity against glitches leads to consider using a binary (base 2) numeral system with positional notation, as devised by Gottfried Wilhelm Leibniz in "Explication de l’arithmétique binaire, qui se sert des seuls caractères O et I avec des remarques sur son utilité et sur ce qu’elle donne le sens des anciennes figures chinoises de Fohy" in 1703 (translation into English here).

    Except for a few esoteric machines, all computers today are using binary numbers extensively, although not all arithmetic operations are supported in hardware, as the most complex ones (multiplication, division and square root) can be carried out by a software algorithm using additions/subtractions and left/right number positional shifts.

    The positional left shift on a binary number is equivalent to adding a number to itself, whereas the positional right shift on a binary number can (expensively) be obtained by repeatedly incrementing a counter once while another counter is incremented twice and matched against the binary number to shift positionally right. Thus, left/right number positional shifts can be obtained from additions only and are not strictly required to perform all arithmetic operations.

    And because of the general rules of arithmetic, all operations on numbers can be decomposed into a series of operations on 2 numbers called operands, producing...

    Read more »

  • Part 4: The Computer Architecture

    Squonk4205/22/2017 at 20:39 0 comments

    At the same time as defining the central role of the processor (CPU), the evolution of memory technologies provided a random-access capability that departed from the model of the Universal Turing Machine which is based on an infinite tape as its main storage unit.

    As Hao Wang wrote in 1954 in his paper "A Variant to Turing's Theory of Computing Machines" (unfortunately, behind a paywall):

    Turing's theory of computable functions antedated but has not much influenced the extensive actual construction of digital computers. These two aspects of theory and practice have been developed almost entirely independently of each other. The main reason is undoubtedly that logicians are interested in questions radically different from those with which the applied mathematicians and electrical engineers are primarily concerned. It cannot, however, fail to strike one as rather strange that often the same concepts are expressed by very different terms in the two developments

    Interestingly though, is the fact that Turing himself worked on both the theoretical and practical aspects of computers.

    As a matter of fact, the theory of computational complexity was founded in the early 1960s only, when scheming minimal universal Turing machines has been a mental sport for years (check Peter van Emde Boas's paper "Machine Models and Simulations" for a good enumeration and classification of abstract machines).

    Von Neumann (Princeton) Architecture

    Both von Neumann's "First Draft of a Report on the EDVAC" and Turing's own "Proposal for Development in the Mathematics Division of an Automatic Computing Engine (ACE)" early papers describe a computer architecture that seems completely unconnected to the concept of Universal Turing Machine at first sight.

    First, the fact that the machine is a serial computer that processes each instruction and data atomic components sequentially rather than all at once as in today's parallel-word computers makes it difficult to understand. But this choice does not influence the logical architecture per se, it is rather a compromise due to the technology available at that time with the sequential acoustic mercury delay line memories. With the practical availability of random-access memories of higher capacities such as the iconoscope and Williams tube, Arthur Burks, Herman H. Goldstine and John von Neumann published only one year later on 28 June 1946 the first edition of their famous paper "Preliminary discussion of the logical design of an electronic computing device" (followed on 2 September 1947 by a second edition), where a more modern design based on a parallel-word architecture is described.

    Second, is the more important substitution of the sequential tape by a random-access memory and registers.

    This architecture, known as the Von Neumann or Princeton architecture, was only categorized theoretically much later (in October 1964 by Calvin Elgot and Abraham Robinson in "Random-Access Stored-Program Machines, an Approach to Programming Languages", unfortunately behind a paywall) as an example implementation of a specific class of abstract register Turing-equivalent machine: the Random-Access Stored-Program (RASP) machine.

    Although attributed to von Neumann alone, there is a large controversy about the fact that all the ideas exposed in this paper are his own.

    There are at least some details that tend to prove that his ideas did not come ex nihilo: in section 5.3 of the Burks, Goldstine and von Neumann paper:

    Several of the digital computers being built or planned in this country and England are to contain a so-called "floating decimal point"

    This proves that they were aware of at least Bell labs's Mark V computer and National Physical Laboratory, UK's Pilot ACE development at that time.

    In fact, there were many direct contacts between the British and American laboratories, showing that exchanges between the concurrent projects was not uncommon at this time:

    • Newman from the Victoria University of Manchester (who later built...
    Read more »

  • Part 3: The Processor

    Squonk4205/01/2017 at 16:22 0 comments

    The Universal Turing Machine is a great and simple abstract structure, but is not really efficient to build a physical stored-program computer. The theoretical infinite tape model is not possible to implement in practice, and anyway, having a tape running back and forth is far from optimal for a physical storage medium.

    But the Turing machine was an abstract machine invented before the physical computer existed, and was intended to represent a virtual person executing a well known procedure, not as a physical implementation model.

    Clearly, the biggest challenge of the early computers was the memory capacity, and although Babbage's Analytical Engine was designed around 1835 to hold 1,000 numbers of 40 decimal digits each in a set of mechanical wheels, and flip-flops made up of purely electronic vacuum tubes were discovered back in 1920, tapes and punched cards were at that time (end of WWII) among the storage media with the best information density.

    Flip-flops on the other hand required to use 2 triodes to store a single binary digit, making them unpractical except for small amounts of storage, such as intermediate results, in what we would called today registers.

    In the "First Draft of a Report on the EDVAC" and attributed to Von Neumann was the first description of possible solutions to get a larger storage capacity using acoustic mercury delay line memories that were able to store 1,000 binary digits, and video camera tubes called iconoscopes, able to store up to an estimated 250,000 binary digits. At the same time, this paper introduced the concept of Random-Access Memory (RAM) for the first time. The first stored-program computer, the Manchester Baby, used a variant of the iconoscope, the Williams tube, to store 2,048 binary digits in a 64 x 32 array.

    This increase in storage memory capacities led to a clearer subdivision of the computer system as a whole. In the "First Draft of a Report on the EDVAC", Von Neumann distinguished:

    1. A Central Arithmetical part (CA)
    2. A Central Control (CC)
    3. Memory (M)
    4. An outside Recording medium (R)
    5. An Input (I)
    6. An Output (O)

    With (C) defined as being CA and CC (i.e. the Central Arithmetical and Central Control parts) put together.

    In "Proposal for Development in the Mathematics Division of an Automatic Computing Engine (ACE)", Turing details:

    1. A Dynamic Storage (DS)
    2. Temporary Storage (TS)
    3. An Input Organ (IO)
    4. An Output Organ (OO)
    5. A Logical Control (LC)
    6. A Central Arithmetic part (CA)
    7. Various 'trees" required in connection with LC and CA for the selection of the information required at any moment
    8. A Clock (CL)
    9. A Temperature control system for the delay lines
    10. Binary to decimal and decimal to binary converters
    11. A Starting device
    12. A power supply

    And although the Temporary Storage could be seen today as equivalent to RISC processor register banks, the explicit notion of processor is not clearly stated in his paper. But Turing himself admitted in 1947 that "[digital computing] Machines such as the ACE may be regarded as practical versions of this same type of [universal] machines. There is at least a very close analogy. Digital computing machines all have a central mechanism or control and some very extensive form of memory".

    Since then, the vast majority of modern computer feature a Central Processing Unit or CPU, also simply called processor, composed of a control unit and an arithmetic unit.

  • Part 2: The First Physical Digital Computer

    Squonk4204/30/2017 at 15:29 8 comments

    The second and last post on computer history...

    For a long time, the first physical digital computer was considered to be the ENIAC, but this is no longer considered true.

    Source:TexasDex at English Wikipedia GFDL or CC-BY-SA-3.0, via Wikimedia Commons

    It was not until the 70s that the fact that electronic computation had been used successfully during WWII by the Government Code and Cypher School (GC&CS) at Bletchley Park was acknowledged, where Turing played a major role by designing the electromechanical cryptologic bomb used for speeding up the codebreaking of the German Enigma machine.

    Source: Karsten Sperling,

    And it was not until 1983 that details on the hardware of the Colossus I machine used later during the war (February 1944) were declassified. Other details were held secret until 1986, and even today, some documents remain classified.

    Source: See page for author [Public domain], via Wikimedia Commons

    This machine was used to decipher messages from the Lorenz cipher machine used by the German High Command, and Turing was involved in it at least for some of the methods used to perform wheel-breaking, but also by recommending Tommy Flowers (with whom he worked on the bomb) to Max Newman, if not some other yet undocumented implications.

    To add to this complexity, Turing also had contacts as soon as in 1935-1938 during his Princeton years with John von Neumann who worked later on the ENIAC and the EDVAC, during his research for the first atomic bomb.

    Source: See page for author [Public domain], via Wikimedia Commons

    Because of this secrecy around Bletchley Park operation and later because of the revelation of Turing's homosexuality and death, it is very difficult to find out the truth regarding the first physical digital computer: having heroes and wise generals on the battlefield was considered for a long time to be a better image of victory during WWII than mathematicians in a dark room...

    F.H. Hinsley, official historian of GC&CS, has estimated that "the war in Europe was shortened by at least two years as a result of the signals intelligence operation carried out at Bletchley Park". But it may well be underestimated: if you consider that the Lorentz cipher was routinely cracked in July 1942 (the first battle of El Alamein was 1-27 July 1942) when for the first time the German troops were defeated, and the Mark I Colossus started its operation in February 1944 (just before D-Day), it may well be that it actually changed the war.

    Even worse, Turing's personal situation lead to strange rewriting of some part of History, amnesia, minimization of his role, if not complete credit re-attributions.

    But neither the Colossus nor the ENIAC could be actually considered as the first physical digital computer, both storing programs differently from numbers, and in this regards, were just massive electronic calculating machines.

    The first specification of an electronic stored-program general-purpose digital computer is von Neumann's "First Draft of a Report on the EDVAC" (dated May 1945) and contained little engineering detail, followed shortly by Turing's "Proposal for Development in the Mathematics Division of an Automatic Computing Engine (ACE)" (dated October-December 1945), which provided a relatively more complete specification.

    However, the EDVAC was only completed 6 years later (and not by von Neumann), and Turing left the National Physical Laboratory (NPL) in London in 1948, before a small pilot model of the Automatic Computing Engine was achieved in May 1950. He then joined Max Newman in September 1948 at the Royal Society Computing Machine Laboratory at Manchester University.

    This is where the Manchester Small-Scale Experimental Machine (SSEM), nicknamed Baby, ran its first program on the 21st June 1948 and became the world's first stored-program computer. It was built by Frederic C. Williams, Tom Kilburn and Geoff Tootill, who reported it in a letter to the Journal Nature published...

    Read more »

  • Part 1: The Digital Computer

    Squonk4204/14/2017 at 21:31 0 comments

    Today, digital computers are everywhere: not only on our desktops, but also in the palm of our hand as a smartphone, or behind the dashboard of our car. However, this situation is not very old: the first personal computers appeared in the late 70's, mobile phones at the turn of the century, and smartphones only 10 years ago with the first iPhone. Actually, this acceleration is rather unique, since computers only appeared with WW II.

    But as far as can be traced back in History, humankind used means to represent or count items: livestock, grains, etc. First using the fingers, then stones, rods, abacus and numerous other aids.

    Other specialized devices (such as astrolabe, the Antikythera mechanism or clock towers) were invented to compute astronomical positions or for navigation use.

    Eventually, more general purpose devices were built in order to perform arbitrary sets of arithmetic or logical operations, like slide rules or (electro-)mechanical calculators.

    And despite the fact that some of these machines (like Jacquard's loom, which used punched cards to control pattern being woven) introduced the notion of programmability, they cannot be considered as computers, since it applied to machine tools and not to general arithmetic or logical operations. The term of computer was used since the early 17th century to designate human computers performing long and tedious mathematical calculations.

    What would have been the first general-purpose computer is the Analytical Engine, proposed by Charles Babbage in 1834:

    Source: Bruno Barral (ByB) CC BY-SA 2.5, via Wikimedia Commons

    But unfortunately, this computer was never completed.

    Instead, at the turn of the 20th century, and because of their lower complexity, many computing machines were based on direct mapping of values to continuous physical phenomena such as electrical, mechanical or hydraulic quantities. However, because of the measurement accuracy and inherent random jitter added by various uncontrolled phenomenon, processes could not be reliably repeated with exact equivalence on these analog computers.

    In contrast, digital computers (from Latin digitus, finger) use sets of discrete symbols to represent values, and because of this indirect mapping, they are able to have repeatable results with no accuracy or random noise problem.

    And it was only in 1936 in his famous paper "On Computable Numbers, with an Application to the Entscheidungsproblem" that Alan Turing first described a class of digital computing machines now known as Turing machines. Being kind of odd by today's standards, a Turing machine can be considered having 3 distinct elements:

    1. an infinite tape, divided into a sequence of squares, each carrying a symbol from a finite alphabet
    2. a read/write head that can read the symbol contained in one single square of the tape at a time, write a new symbol there and move one square to the left or to the right
    3. a control element as a device with a finite number of internal states

    At any given time, the next operation of the Turing machine is determined by:

    1. the control element's current state
    2. the symbol read by the read head

    The operation consists in 3 parts:

    1. print a new symbol int the current square on the tape (it may be the same as the read symbol)
    2. put the control element into a new state (which may be the same as the previous state)
    3. move the read/write head one square to the left or to the right

    For its operation, a finite part of the tape is filled with a starting sequence of symbols, the remainder of the tape being left blank (containing a special blank symbol), and the read/write head is placed at a particular starting square on the tape. The Turing machine then starts to compute following its rules of operation.

    Turing Machine

    Source: wvbailey (Own work) GFDL or CC BY-SA 3.0, via Wikimedia Commons

    Despite its singular construction, the Turing machine model captures for the first time the notions of:

    1. algorithm as a self-contained sequence of actions to be performed...
    Read more »

View all 6 project logs

Enjoy this project?



Ken KD5ZXG wrote 07/05/2020 at 22:35 point

TI's SN74CBT3251 or NXP's CBT3251. Same pinout as your 74151, but with real bi-diectional 5ohm switches. Ditch lookhead in favor of relay style instant pass-through. 6nS to set up switches, but propagate through all slices at 0.25nS therafter.

Not to worry. Last I checked, Dallas was still East of the Pecos. Can't rotate right, but can A+B, A-B, B-A in 16.25nS. Conditional logics and test for A=B, A<>B in 10.25nS. I figure a rotate afterward could set switches in parallel time. +.25nS to propagate the barrel, so was no point gumming up the ALU with it.

  Are you sure? yes | no

Similar Projects

Does this project spark your interest?

Become a member to follow this project and never miss any updates