Basic AXI Handshake protocol
This led me to consider the interface on the other side of the translator module, which is intended to talk to a FIFO. When I'd originally coded the translator I used a standard FIFO generated using ISE Coregen, figuring that was the best way to make sure the FIFO worked properly and was implemented efficiently. After discovering the AXI handshake I decided to re-implement it to use a First-Word Fall-Through FIFO because that allowed me to use the AXI handshake on both sides. Again I used Coregen.
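To make the point about First-Word Fall-Through concrete, here is a minimal sketch of the glue between a valid/ready handshake and a FWFT FIFO's control signals. The module name and the FIFO-side port names (`full`, `empty`, `wr_en`, `rd_en`) are assumptions based on common FIFO conventions, not taken from the actual translator. It only works because a FWFT FIFO presents its first word whenever it is not empty. A standard FIFO delivers data a cycle after the read strobe, so "not empty" could not serve directly as "valid".

```verilog
// Illustrative glue only: the module name and all port names are
// assumptions, not part of the original design.
module fwft_handshake_glue (
    // Upstream valid/ready side
    input  wire in_valid,
    output wire in_ready,

    // Downstream valid/ready side
    output wire out_valid,
    input  wire out_ready,

    // Control signals of a FWFT FIFO
    input  wire full,
    input  wire empty,
    output wire wr_en,
    output wire rd_en
);
    // Accept data whenever there is room; a write happens only when
    // both sides of the handshake agree in the same cycle.
    assign in_ready  = ~full;
    assign wr_en     = in_valid & in_ready;

    // With FWFT the head word is already on the data output when the
    // FIFO is not empty, so the read strobe acts as an acknowledge.
    assign out_valid = ~empty;
    assign rd_en     = out_valid & out_ready;
endmodule
```

The data buses need no logic at all in this arrangement: the input data goes straight to the FIFO's write port and the FIFO's read port drives the output data.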
If I wanted to run simulations using Icarus Verilog, though, I needed an alternative to a Coregen-constructed FIFO. Then I noticed that the Coregen FIFO used 27 slices for a 64 by 4-bit, common-clock FIFO. And I started looking at other FIFO implementations.
By profession I'm a software engineer with over 40 years of experience. I was once hired by a company solely to fix the massive mess they'd made of their kernel-level multiprocessor synchronization. So I'm pretty comfortable dealing with both software and hardware FIFOs. But I thought I should research hardware FIFO implementations before I coded my own.
What I found was that the majority of FWFT FIFO implementations are wrappers around standard, non-FWFT FIFOs. I guess that makes sense if that's what your hardware gives you, but I know that a Xilinx dual-port RAM implemented using distributed RAM allows reading the second port without a clock. Taking advantage of this, I coded my own FWFT FIFO:
```verilog
module fifo_fwft #(
    parameter DEPTH = 16,   // FIFO depth, must be power of 2
    parameter WIDTH = 4     // FIFO width in bits
) (
    input  wire             sysclk,

    // FIFO input interface
    input  wire             in_valid,
    input  wire [WIDTH-1:0] in_data,
    output wire             in_ready,

    // FIFO output interface
    output wire             out_valid,
    output wire [WIDTH-1:0] out_data,
    input  wire             out_ready
);

    // Instantiate FIFO indexes
    localparam PW = $clog2(DEPTH);
    reg [PW-1:0] head = 0;  // Data is dequeued from the head
    reg [PW-1:0] tail = 0;  // Data is enqueued at the tail

    // Define the FIFO buffer
    reg [WIDTH-1:0] fifo [0:DEPTH-1];

    // Control data input to the FIFO.
    // The sized constant 1'b1 keeps the addition PW bits wide so it wraps
    // at DEPTH. An unsized 1 would widen the comparison to 32 bits and
    // miss the full condition when tail == DEPTH-1 and head == 0.
    assign in_ready = (tail + 1'b1) != head;
    always @(posedge sysclk) begin
        if (in_valid & in_ready) begin
            fifo[tail] <= in_data;
            tail <= tail + 1;
        end
    end

    // Control data output from the FIFO
    assign out_valid = tail != head;
    always @(posedge sysclk) begin
        if (out_valid & out_ready) begin
            head <= head + 1;
        end
    end

    // Asynchronous read: the head word is visible without a read clock
    assign out_data = fifo[head];

endmodule
```
When instantiated in place of the Coregen FIFO, ISE reports this uses 16 slices. Even pushing the depth from 64 to 128 only uses 21 slices.
Even better from an understanding and portability view, this does away with the dependency on Xilinx's Coregen and allows it to run under Icarus Verilog or with non-Xilinx FPGAs. If I ever need a high-speed, dual-port FIFO with separate clocks I wouldn't hesitate to use Coregen, but for this purpose it's just overkill.
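As a rough illustration of what running it under Icarus Verilog might look like, here is a small self-checking testbench sketch. The testbench name, task names, and the test pattern are all hypothetical. It pushes bursts of eight words, pops them again, and checks that each word is visible on `out_data` before `out_ready` is asserted, which is the fall-through behaviour. It never fills the FIFO, so it does not exercise the full condition.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for fifo_fwft; names and stimulus are illustrative.
module tb_fifo_fwft;
    localparam DEPTH = 16;
    localparam WIDTH = 4;

    reg              sysclk    = 0;
    reg              in_valid  = 0;
    reg  [WIDTH-1:0] in_data   = 0;
    wire             in_ready;
    wire             out_valid;
    wire [WIDTH-1:0] out_data;
    reg              out_ready = 0;

    fifo_fwft #(.DEPTH(DEPTH), .WIDTH(WIDTH)) dut (
        .sysclk(sysclk),
        .in_valid(in_valid),   .in_data(in_data),   .in_ready(in_ready),
        .out_valid(out_valid), .out_data(out_data), .out_ready(out_ready)
    );

    always #5 sysclk = ~sysclk;   // 100 MHz clock

    integer i, round;
    integer errors = 0;

    // Present one word and hold it until the FIFO accepts it.
    task push(input [WIDTH-1:0] value);
        begin
            @(negedge sysclk);
            in_valid = 1;
            in_data  = value;
            while (!in_ready) @(negedge sysclk);
            @(posedge sysclk);    // transfer happens on this edge
            #1 in_valid = 0;
        end
    endtask

    // Check the head word before acknowledging it, then pop it.
    task pop_check(input [WIDTH-1:0] expected);
        begin
            @(negedge sysclk);
            if (!out_valid || out_data !== expected) begin
                $display("FAIL: expected %0d, got %0d (out_valid=%b)",
                         expected, out_data, out_valid);
                errors = errors + 1;
            end
            out_ready = 1;
            @(posedge sysclk);    // word is consumed on this edge
            #1 out_ready = 0;
        end
    endtask

    initial begin
        // Three bursts so that both indexes wrap around at least once.
        for (round = 0; round < 3; round = round + 1) begin
            for (i = 0; i < 8; i = i + 1) push(round * 8 + i);
            for (i = 0; i < 8; i = i + 1) pop_check(round * 8 + i);
        end

        @(negedge sysclk);
        if (out_valid) begin
            $display("FAIL: FIFO not empty at end of test");
            errors = errors + 1;
        end

        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d error(s)", errors);
        $finish;
    end
endmodule
```

Assuming the two modules are saved as `fifo_fwft.v` and `tb_fifo_fwft.v` (file names chosen here for illustration), something like `iverilog -o tb fifo_fwft.v tb_fifo_fwft.v` followed by `vvp tb` should run it.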
Note: ISE's implementation of the $clog2 system function is FUBAR. I use my own "clog2" function instead, not shown here.
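For readers who want a starting point, a replacement along these lines is a common approach. This is only a sketch of one way to write such a constant function, not the function actually used here, and the demo module around it is hypothetical.

```verilog
// Hypothetical stand-in for $clog2, wrapped in a demo module.
module clog2_demo #(parameter DEPTH = 16);

    // Returns ceil(log2(value)) for value >= 2, and 0 for value <= 1.
    function integer clog2;
        input integer value;
        integer v;
        begin
            v = value - 1;
            // Count how many right shifts it takes to clear v.
            for (clog2 = 0; v > 0; clog2 = clog2 + 1)
                v = v >> 1;
        end
    endfunction

    // Used as a constant function, so it can size the FIFO indexes.
    localparam PW = clog2(DEPTH);

    initial $display("DEPTH=%0d PW=%0d", DEPTH, PW);
endmodule
```

With DEPTH = 16 this gives PW = 4, matching what `$clog2(16)` is supposed to return. Declaring the function inside the module, ahead of the `localparam` that calls it, keeps it usable as a constant expression.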