Insanity 4004: First-Word Fall-Through FIFOs

[Figure: Basic AXI handshake protocol]

Sometimes an engineer will see a technique on a website or in a document and think, "Wow! That's a good way to do that!" That's what happened when I came across the AXI-style valid/ready handshake described in this Wikipedia article. I'd been looking for a good way for the M-32TL translator and printer modules to synchronize, and this seemed like a simple and well-tested mechanism.
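As a rough illustration of that handshake (this is not code from the M-32TL project), here is a minimal sketch of a one-word pipeline stage. The module name hs_stage and its internals are hypothetical; the port names just mirror the ones used by the FIFO later in this post. The rule it follows is that a word moves on any rising clock edge where valid and ready are both high, and that valid never waits on ready.

```verilog
// Hypothetical one-word register stage with a valid/ready handshake.
// A transfer happens on a rising clock edge where valid and ready are both high.
module hs_stage #(
    parameter WIDTH = 4             // Data width in bits
    ) (
    input  wire             sysclk,
    // Upstream side
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output wire             in_ready,
    // Downstream side
    output wire             out_valid,
    output wire [WIDTH-1:0] out_data,
    input  wire             out_ready
    );

    reg             full = 1'b0;    // 1 while 'data' holds an unconsumed word
    reg [WIDTH-1:0] data = 0;

    // Accept a new word when empty, or when the held word leaves this cycle
    assign in_ready  = !full || out_ready;
    // out_valid depends only on stored state, never on out_ready
    assign out_valid = full;
    assign out_data  = data;

    always @(posedge sysclk) begin
        if (in_valid & in_ready) begin
            data <= in_data;        // Load (or replace the word that just left)
            full <= 1'b1;
        end else if (out_valid & out_ready) begin
            full <= 1'b0;           // Word consumed, nothing new arrived
        end
    end

endmodule
```

Because both sides use the same two signals, stages like this can be chained by wiring one stage's out_* ports to the next stage's in_* ports.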

This led me to consider the interface on the other side of the translator module, which is intended to talk to a FIFO. When I'd originally coded the translator I used a standard FIFO generated with ISE Coregen, figuring that was the best way to make sure the FIFO worked properly and was implemented efficiently. After discovering the AXI handshake I decided to rework the translator to use a First-Word Fall-Through (FWFT) FIFO, because that let me use the AXI handshake on both sides. Again I used Coregen.

If I wanted to run simulations using Icarus Verilog, though, I needed an alternative to a Coregen-generated FIFO. Then I noticed that the Coregen FIFO used 27 slices for a 64-entry by 4-bit, common-clock FIFO, so I started looking at other FIFO implementations.

By profession I'm a software engineer with over 40 years of experience. I was once hired by a company solely to fix the massive mess they'd made of their kernel-level multiprocessor synchronization. So I'm pretty comfortable dealing with both software and hardware FIFOs. But I thought I should research hardware FIFO implementations before I coded my own.

What I found was that the majority of FWFT FIFO implementations are wrappers around standard, non-FWFT FIFOs. I guess that makes sense if that's what your hardware gives you, but I know that a Xilinx dual-port RAM implemented in distributed RAM can be read asynchronously through its second port, with no clock involved, so the word at the head of the queue appears on the output without a read cycle. Taking advantage of this, I coded my own FWFT FIFO:

module fifo_fwft #(
    parameter DEPTH     = 16,       // FIFO depth, must be power of 2
    parameter WIDTH     = 4         // FIFO width in bits
    ) (
    input  wire             sysclk,
    // FIFO input interface
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output wire             in_ready,
    // FIFO output interface
    output wire             out_valid,
    output wire [WIDTH-1:0] out_data,
    input  wire             out_ready
    );

    // Declare FIFO indexes (PW is the index width in bits)
    localparam PW = $clog2(DEPTH);
    reg  [PW-1:0]   head = 0;   // Data is dequeued from the head
    reg  [PW-1:0]   tail = 0;   // Data is enqueued at the tail

    // Define the FIFO buffer
    reg  [WIDTH-1:0] fifo [0:DEPTH-1];

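    // Control data input to the FIFO.
    // The sized 1'b1 keeps the increment at PW bits so it wraps; an unsized 1
    // widens the compare to 32 bits and the full check misses the wrap.
    // One slot is always left empty, so usable capacity is DEPTH-1 words.
    assign in_ready = (tail + 1'b1) != head;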
    always @(posedge sysclk) begin
        if (in_valid & in_ready) begin
            fifo[tail] <= in_data;
            tail <= tail + 1;
        end
    end

    // Control data output from the FIFO
    assign out_valid = tail != head;
    always @(posedge sysclk) begin
        if (out_valid & out_ready) begin
            head <= head + 1;
        end
    end
    assign out_data = fifo[head];

endmodule

When this module is instantiated in place of the Coregen FIFO, ISE reports that it uses 16 slices, compared with 27 for the Coregen version. Even doubling the depth from 64 to 128 only raises that to 21 slices.

Even better for readability and portability, this does away with the dependency on Xilinx's Coregen, so the design can run under Icarus Verilog or target non-Xilinx FPGAs. If I ever need a high-speed, dual-port FIFO with separate clocks I won't hesitate to use Coregen, but for this purpose it's just overkill.
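Since running under Icarus Verilog was the point, here is a minimal self-checking testbench sketch for the module above. It is not my original test code: the name fifo_fwft_tb, the DEPTH of 8, and the three phases are assumptions made for illustration. It also assumes the FIFO always keeps one slot empty, so that only DEPTH-1 words are accepted while the output is stalled.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for fifo_fwft; expects that module in the same build.
module fifo_fwft_tb;
    localparam DEPTH = 8;
    localparam WIDTH = 4;

    reg              sysclk    = 0;
    reg              in_valid  = 0;
    reg  [WIDTH-1:0] in_data   = 0;
    reg              out_ready = 0;
    wire             in_ready, out_valid;
    wire [WIDTH-1:0] out_data;

    reg  [31:0] accepted = 0;   // Words taken by the FIFO
    reg  [31:0] received = 0;   // Words delivered by the FIFO
    reg  [31:0] errors   = 0;

    fifo_fwft #(.DEPTH(DEPTH), .WIDTH(WIDTH)) dut (
        .sysclk(sysclk),
        .in_valid(in_valid), .in_data(in_data), .in_ready(in_ready),
        .out_valid(out_valid), .out_data(out_data), .out_ready(out_ready)
    );

    always #5 sysclk = ~sysclk;

    // Count transfers on both sides and check words arrive in order
    always @(posedge sysclk) begin
        if (in_valid & in_ready) accepted = accepted + 1;
        if (out_valid & out_ready) begin
            if (out_data !== received[WIDTH-1:0]) errors = errors + 1;
            received = received + 1;
        end
    end

    // Stimulus changes on the falling edge to stay clear of the sampling edge
    initial begin
        // Phase 1: offer words while the output is stalled
        repeat (2*DEPTH) begin
            @(negedge sysclk);
            in_valid = 1;
            in_data  = accepted[WIDTH-1:0];
        end
        @(negedge sysclk);
        in_valid = 0;
        if (accepted != DEPTH-1) errors = errors + 1;   // One slot stays empty
        // Fall-through: word 0 must be visible before any read request
        if (!out_valid || out_data !== 0) errors = errors + 1;

        // Phase 2: read and write at the same time to exercise index wrap
        out_ready = 1;
        repeat (4*DEPTH) begin
            in_valid = 1;
            in_data  = accepted[WIDTH-1:0];
            @(negedge sysclk);
        end
        in_valid = 0;

        // Phase 3: drain what is left, then compare the two counters
        repeat (2*DEPTH) @(negedge sysclk);
        if (out_valid || received != accepted) errors = errors + 1;

        if (errors == 0) $display("PASS: %0d words transferred", received);
        else             $display("FAIL: %0d errors", errors);
        $finish;
    end
endmodule
```

With the FIFO and testbench saved in two files (the file names here are made up), something like `iverilog -o fifo_tb fifo_fwft.v fifo_fwft_tb.v` followed by `vvp fifo_tb` should run it.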

Note: ISE's implementation of the $clog2 system function is FUBAR. I use my own "clog2" function instead, not shown here.
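For anyone who needs a replacement, a constant function along the following lines is one common way to do it. This is a sketch and not the function I actually use; the wrapper module clog2_demo exists only to make the example stand alone. In the FIFO, the function would sit inside the module ahead of the localparam that calls it.

```verilog
// Hypothetical stand-in for $clog2, wrapped in a demo module.
module clog2_demo;
    parameter DEPTH = 64;

    // Returns ceil(log2(value)) for value >= 1.
    // Written as a Verilog-2001 constant function so it can size a localparam.
    function integer clog2;
        input integer value;
        integer v;
        begin
            v = value - 1;
            // Count how many shifts it takes to clear every bit of value-1
            for (clog2 = 0; v > 0; clog2 = clog2 + 1)
                v = v >> 1;
        end
    endfunction

    localparam PW = clog2(DEPTH);   // Index width for a DEPTH-entry FIFO

    initial $display("DEPTH=%0d PW=%0d", DEPTH, PW);
endmodule
```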

Saturday, June 27, 2020
