• Nearly Complete

    Matthew Pearce, 5 days ago

    LUT usage now fits comfortably on the Artix-7 100T, leaving room for a small processor alongside the FPU.


    Overview

    A VHDL-2008 implementation of a Motorola MC68881-compatible floating-point coprocessor targeting Xilinx 7-series FPGAs. The design implements the full MC68881 instruction set including all arithmetic, transcendental, program-control, system-control, and packed-decimal operations. It uses DSP-pipelined sequential FP units for the core arithmetic datapath with multi-cycle path constraints for timing closure.

    The current plan and progress tracking live in docs/fpu-progress-checklist.md.

    Features

    • Full instruction set: FADD, FSUB, FMUL, FDIV, FSQRT, FMOD, FREM, FSCALE, FSGLDIV, FSGLMUL, FABS, FNEG, FINT, FINTRZ, FGETEXP, FGETMAN, FTST, FCMP.
    • Transcendental engine: FSIN, FCOS, FTAN, FSINCOS, FASIN, FACOS, FATAN, FATANH, FSINH, FCOSH, FTANH, FETOX, FETOXM1, FTWOTOX, FTENTOX, FLOGN, FLOGNP1, FLOG2, FLOG10. BRAM-based seed tables with Taylor/CORDIC iteration.
    • Data movement: FMOVE (all formats including packed decimal .P), FMOVEM (register lists and control registers), FMOVECR (ROM constants).
    • Program control: FScc, FBcc, FDBcc, FTRAPcc, FNOP with BSUN trap gating.
    • System control: FSAVE/FRESTORE with Null/Idle/Busy frame support (45-word Busy frame with full sub-unit save/restore hierarchy).
    • IEEE 754 compliance: NaN propagation (SNaN/QNaN discrimination, payload preservation), infinity handling, signed zero, gradual underflow, all four rounding modes (nearest, zero, +inf, -inf), single/double/extended precision.
    • Exception handling: Per-operation FPSR exception policies, FPCR trap enable, accrued exception accumulation.
    • Peripheral interface: Register-mapped bus interface with DSACK handshake, suitable for M68000/M68010 peripheral-mode operation.

    Utilization (Xilinx Artix-7 200T, post-place)

    Resource      Used     Available  Util (%)
    Slice LUTs    52,361   133,800    39.13
    Registers     13,131   267,600    4.91
    Block RAM     5 tiles  365        1.37
    DSP48E1       33       740        4.46

    Non-incremental synthesis + implementation, Vivado 2025.2, xc7a200tfbg676-1. Date: 2026-03-05. Includes Section 7 CIR coprocessor interface with FSAVE/FRESTORE Busy frame support and full exception dialog paths; see "CIR feature gating" below.
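
    The utilization percentages follow directly from the used and available counts. A minimal sketch that recomputes them, with the counts copied from the table above; the function and dictionary names are illustrative and not part of the project:

```python
# Used / available counts copied from the utilization table.
# All names here are hypothetical helpers for illustration only.
UTILIZATION = {
    "Slice LUTs": (52_361, 133_800),
    "Registers": (13_131, 267_600),
    "Block RAM tiles": (5, 365),
    "DSP48E1": (33, 740),
}


def utilization_pct(used: int, available: int) -> float:
    """Return utilization as a percentage, rounded to two decimals."""
    return round(100.0 * used / available, 2)


def report(table=UTILIZATION) -> dict:
    """Map each resource name to its utilization percentage."""
    return {name: utilization_pct(u, a) for name, (u, a) in table.items()}


if __name__ == "__main__":
    for name, pct in report().items():
        print(f"{name:16s} {pct:6.2f}%")
```

    The same helper can be rerun against a fresh Vivado utilization report to confirm that the table has not gone stale.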

    Timing

    • Target clock: 10 MHz (100 ns period) — matches MC68881 bus timing.
    • Multi-cycle path constraints on sequential FP units (mul: 4 cycles, addsub: 6 cycles, div: 6 cycles) and trig engine hold states.
    • Post-route WNS: +16.631 ns (about 16.6% of the 100 ns period is slack; the critical path takes about 83.4 ns, giving an effective Fmax of ~12 MHz).
    • WHS (hold): no violations.
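
    The relationship between clock period, WNS, and effective Fmax can be sketched as follows. This is a rough single-cycle estimate: it assumes the worst path is constrained against one clock period, so it does not account for the relaxed requirements on the multi-cycle paths. The function names are illustrative.

```python
def critical_path_ns(period_ns: float, wns_ns: float) -> float:
    """Delay of the worst path: the period minus the worst negative slack."""
    return period_ns - wns_ns


def fmax_mhz(period_ns: float, wns_ns: float) -> float:
    """Highest clock rate at which the worst path would still just meet timing."""
    return 1000.0 / critical_path_ns(period_ns, wns_ns)


def slack_fraction(period_ns: float, wns_ns: float) -> float:
    """Share of the clock period left over as slack."""
    return wns_ns / period_ns


if __name__ == "__main__":
    period, wns = 100.0, 16.631  # values from the timing summary
    print(f"critical path: {critical_path_ns(period, wns):.3f} ns")
    print(f"slack share:   {100 * slack_fraction(period, wns):.1f} %")
    print(f"approx. Fmax:  {fmax_mhz(period, wns):.2f} MHz")
```

    With the reported numbers, the worst path comes out at about 83.4 ns, which is where the ~12 MHz figure comes from.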

    Target device compatibility

    The design fits on several FPGA families. With CIR disabled (ENABLE_CIR_g => false), the core drops to roughly 46K LUTs (per the no-CIR percentages in the table below) and fits comfortably on smaller devices:

    Device                          LUTs          DSPs  Fit (full)?  Fit (no CIR)?
    Xilinx Artix-7 200T             134,600       740   Yes (39%)    Yes (34%)
    Xilinx Artix-7 100T             63,400        240   Yes (~83%)   Yes (~72%)
    Xilinx Zynq UltraScale+ ZU3EG   ~71,000       360   Yes (~74%)   Yes (~64%)
    Intel Cyclone V 5CEBA7          150,720 ALMs  156   Yes          Yes
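
    The full-design fit percentages for the Xilinx parts can be reproduced from the post-place LUT count (52,361) and the device capacities listed above. The sketch below is only an estimate: it reuses the Artix-7 LUT count unchanged for every device, and the 90% headroom threshold and all names are assumptions made for illustration. The Cyclone V is left out because ALMs are not directly comparable to 6-input LUTs.

```python
FULL_DESIGN_LUTS = 52_361  # post-place Slice LUTs from the utilization table

# Device LUT capacities as listed in the compatibility table.
DEVICE_LUTS = {
    "Artix-7 200T": 134_600,
    "Artix-7 100T": 63_400,
    "Zynq UltraScale+ ZU3EG": 71_000,  # approximate figure
}


def fit_report(design_luts: int, devices: dict, max_util: float = 0.90) -> dict:
    """Map each device to (utilization in whole percent, fits under max_util).

    max_util is a hypothetical routability threshold, not a project setting.
    """
    result = {}
    for name, capacity in devices.items():
        util = design_luts / capacity
        result[name] = (round(100 * util), util <= max_util)
    return result


if __name__ == "__main__":
    for name, (pct, fits) in fit_report(FULL_DESIGN_LUTS, DEVICE_LUTS).items():
        print(f"{name:26s} {pct:3d}%  {'fits' if fits else 'too large'}")
```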

    All RTL is vendor-portable (inferred DSP/BRAM, no Xilinx IP cores). Porting to Intel/Quartus requires XDC-to-SDC constraint conversion and minor DSP inference adjustments.