[SystemVerilog] Common Design Patterns and Practices

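The ultimate goal of RTL design, especially for ASICs, is to produce the smallest and fastest circuit possible. To do so, we need to understand how synthesis tools analyze and optimize a design. We also care about simulation speed, since waiting for tests to run is effectively wasted engineering effort. Although synthesis and simulation tools have many optimization passes and transformations, a major factor in the final result is the design pattern, that is, whether the code follows the tool's design guidelines. Many optimizations target specific design patterns, which makes the code easier for the tools to understand and therefore to simplify. In addition, some design patterns simplify the code structure and make the code more readable and reusable.

In this chapter, we cover some common design practices, as well as how we should write logic and structure source code.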

4.1 Compiler Directives and Packages

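Like C/C++, SystemVerilog defines a preprocessing stage in which macros are expanded into the original source code. Compared with C/C++, SystemVerilog compiler directives are less expressive and less general, which means that even a recursive computation with a fixed bound is difficult to specify. Still, they allow a certain degree of preprocessing in SystemVerilog.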

4.1.1 Compiler Directives

The language defines a number of compiler directives. We cover some of the most commonly used ones here:

  1. `__FILE__
  2. `__LINE__
  3. `define
  4. `else
  5. `elsif
  6. `ifdef
  7. `ifndef
  8. `endif
  9. `undef
  10. `timescale
  11. `include

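`__FILE__ and `__LINE__ are used in the same way as __FILE__ and __LINE__ in C/C++. Users can use them for test bench debugging. During preprocessing, these two compiler directives are replaced with the actual file name and line number.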
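A minimal sketch of how the two directives might appear in a test bench message. The module name file_line_demo is only illustrative and is not part of the original examples.

```systemverilog
module file_line_demo;

initial begin
    // `__FILE__ expands to a string literal and `__LINE__ to a decimal number,
    // so they can be passed straight to $display as arguments
    $display("[%s:%0d] test started", `__FILE__, `__LINE__);
    $finish;
end

endmodule
```

The printed file name depends on how the file is passed to the simulator, so messages may show either a relative or an absolute path.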

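`define lets you define macros that can be used later in the code. We show two examples: the first defines a value, and the second defines a function-like code snippet that takes arguments. Note that, unlike C/C++, a macro must be prefixed with ` when it is used in the code.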

`define VALUE 10

module top (input logic clk);

logic [31:0] a;

always_ff @(posedge clk) a <= `VALUE;

endmodule

In the example above, we define `VALUE as 10 and use it as the register value. Although we cover this usage here, please avoid defining constant values as macros in this way, because:

  1. It is difficult to find where the macro is defined, e.g. in a file or in the command-line options.
  2. Macros have no namespace. If two macros share the same name, whichever gets parsed later is used. This may cause unexpected bugs that are difficult to debug, since the compiler may not issue a warning for macro redefinition.

We highly recommend defining constants in a package instead, which is covered later in this chapter.

Another way to use `define is to define some code snippets which can be re-used later, as shown in the example below (also in code/04/macros_arguments.sv):

`define REGISTER(NAME, WIDTH, VALUE, CLK) \
    logic [WIDTH-1:0] NAME;               \
    always_ff @(posedge CLK) begin        \
        NAME <= VALUE;                    \
    end

module top;

logic        clk;
logic [15:0] in;

// declare 3 registers that are pipelined to signal in, in sequence
`REGISTER(reg1, 16, in,   clk)
`REGISTER(reg2, 16, reg1, clk)
`REGISTER(reg3, 16, reg2, clk)

// set the clock to 0 at time = 0, then tick the clock every 2 unit of time
initial clk = 0;
always clk = #2 ~clk;

initial begin
    for (int i = 0; i < 3; i++) begin
        in = i;
        // wait for a cycle
        #4;
        // print out the register value
        $display("reg1: %d reg2: %d reg3: %d", reg1, reg2, reg3);
    end
    $finish;
end

endmodule

We will see the expected output, where x denotes uninitialized register value:

reg1:     0 reg2:     x reg3:     x
reg1:     1 reg2:     0 reg3:     x
reg1:     2 reg2:     1 reg3:     0

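In the code example above, we define three registers that pipeline the input signal in a chain. The REGISTER macro first declares the register with the given name (NAME) and width (WIDTH), then instantiates an always_ff block that assigns a value to the register on every clock cycle. Note that we have to put a backslash (\) at the end of each line to continue a macro definition across multiple lines.

Although macros can save time and make code easier to reuse, it is important to strike a balance between repeating code and using macros. Keep in mind that macros are replaced during preprocessing, which makes source-level debugging harder. Because all macros live in the global namespace, you also need to watch out for macro redefinition.

When working with macros, it is sometimes necessary to undefine a macro name so that it can be reused for a different purpose. As in C/C++, you can use `undef to undefine a macro.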

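`ifdef and `ifndef test whether a macro is defined (or not defined). The conditional must be closed with `endif. You can also add `else and `elsif to handle other cases. Note that in header files these directives can be combined with `define to provide an include guard, which allows the header to be included in multiple places. Their usage mirrors C/C++ (`elsif plays the role of #elif), so we do not go into detail here.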
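The conditional directives and the include guard can be sketched as follows. The header name my_defs.svh and the macros WIDE_BUS, NARROW_BUS, DATA_WIDTH, and TMP_HELPER are hypothetical names chosen for illustration.

```systemverilog
// my_defs.svh (hypothetical header file)
`ifndef MY_DEFS_SVH
`define MY_DEFS_SVH

// choose a data width depending on which macro (if any) was defined earlier,
// for example on the tool's command line
`ifdef WIDE_BUS
    `define DATA_WIDTH 64
`elsif NARROW_BUS
    `define DATA_WIDTH 16
`else
    `define DATA_WIDTH 32
`endif

// a helper macro that is only needed temporarily can be removed with `undef
`define TMP_HELPER 1
`undef TMP_HELPER

`endif // MY_DEFS_SVH
```

Because of the guard, a second `include of the same header expands to nothing, so DATA_WIDTH is defined only once per compilation.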

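`timescale is an important compiler directive for simulators. It specifies the unit of time measurement and the time precision for the design elements it applies to. There can be at most one time scale defined for any compilation-unit scope. In other words, it is illegal to define time scales in two different source files that are compiled together. The syntax of `timescale is shown below: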

// general syntax
`timescale time_unit / time_precision
// e.g.
`timescale 1ns / 1ps
`timescale 1ns / 1ns

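The time_unit argument specifies the unit of measurement for times and delays, and the time_precision argument specifies how delay values are rounded before being used in simulation. The units of time_unit and time_precision can be s, ms, us, ns, ps, or fs. The integer part specifies an order of magnitude for the value; in other words, the only valid numbers are 1, 10, and 100.

Time scales are essential for simulating jitter and timing violations. Any power-related analysis also needs them. We highly recommend including a time scale in the top-level test bench, even if it is not used.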
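To make the rounding rule concrete, here is a small sketch. The module name timescale_demo and the chosen delay are illustrative assumptions.

```systemverilog
`timescale 1ns / 100ps

module timescale_demo;

initial begin
    // the delay is 1.26 time units, i.e. 1.26 ns with the time scale above;
    // before it is used, it is rounded to the 100 ps precision, i.e. 1.3 ns
    #1.26;
    // $realtime returns the current time as a real number scaled to the time unit
    $display("current time: %f ns", $realtime);
    $finish;
end

endmodule
```

With a finer precision such as `timescale 1ns / 1ps, the same delay would be kept as 1.26 ns, at the cost of the simulator tracking time in smaller steps.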

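`include serves the same purpose as #include in C/C++: it pulls in the definitions from another file. We highly recommend giving every included file an include guard. If the file name is enclosed in quotes, e.g. `include "filename.svh", the compiler first searches the current working directory and then any user-specified locations. If the file name is enclosed in angle brackets, e.g. `include <filename.svh>, the file must be one defined by the language standard. This rule is similar to C/C++.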

4.1.2 Packages

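Although compiler directives such as `include give designers a way to share definitions, they essentially ask the compiler to copy the content of the included file into the source file, a legacy feature influenced by C. As modern programming languages started to use modules/packages to structure source code, e.g. modules in C++20, SystemVerilog introduced a construct called a package, which allows designers to reuse definitions, interfaces, and functions. Since packages are synthesizable, we highly recommend using them in both RTL and test benches. Here is an example of a package: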

package my_def_pkg;

// local parameters
localparam VALUE = 42;

// struct
typedef struct {
    logic a;
    logic b;
} my_struct_t;

// enum
typedef enum logic { RED, GREEN } color_t;

// function
function logic and_op(logic a, logic b);
    return a & b;
endfunction

endpackage: my_def_pkg

Here is an incomplete list of constructs that are allowed inside a package:

  1. parameter declaration, e.g. parameter and localparam
  2. function declaration, e.g. automatic function
  3. data declaration, e.g., struct and enum
  4. DPI import and export
  5. class declaration
  6. package import declaration

Since a parameter cannot be redefined inside a package, we highly recommend using localparam in lieu of parameter; the two are functionally identical in a package. In other words, localparam does not carry a visibility restriction in a package.
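As a small sketch of how a package-level localparam is consumed, the module below refers to package items by their fully qualified names using the scope resolution operator (::). The package is a trimmed copy of my_def_pkg, and the module name pkg_scope_demo is an illustrative assumption.

```systemverilog
package my_def_pkg;
    localparam int VALUE = 42;
    typedef enum logic { RED, GREEN } color_t;

    function automatic logic and_op(logic a, logic b);
        return a & b;
    endfunction
endpackage: my_def_pkg

module pkg_scope_demo;

// the localparam is visible to users of the package and sizes the signal here
logic [my_def_pkg::VALUE-1:0] data;
my_def_pkg::color_t           color;
logic                         c;

initial begin
    color = my_def_pkg::GREEN;
    c     = my_def_pkg::and_op(1'b1, 1'b0);
    $display("width=%0d color=%s c=%b", $bits(data), color.name(), c);
end

endmodule
```

Qualified names are more verbose than imports, but they make the origin of every identifier explicit.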

4.1.2.1 Package Import

To use the package definitions in other modules, we need the import keyword. There are several ways to import the contents of a package, and we cover two commonly used approaches here:

  1. wildcard import. This is similar to Python’s from pkg_name import *:

    import my_def_pkg::*;
  2. explicit import. This is similar to Python’s from pkg_name import class_name:

    import my_def_pkg::my_struct_t;

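Once imported, the identifiers (e.g. struct names or enum value names) can be used directly in the module. Note that the package import can be placed in more than one location. Depending on where the package content is used, there are two standard placements: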

  1. If the identifier is used in the module port definition, the import needs to be placed before the port list:

    module top
        import my_def_pkg::*;
        (input my_struct_t in);
    endmodule: top
  2. Otherwise, we put the import inside the module:

    module top;
        import my_def_pkg::*;
        my_struct_t a;
    endmodule: top

4.1.2.2 Import Packages within a Package

As in software programming languages, you can import the content of one package inside another package, but the "chained" imports are not automatically visible to the consumer. Here is an example (code/04/chained_packages.sv) that illustrates package imports:

package def1_pkg;
    typedef enum logic[1:0] {ADD, SUB, MULT, DIV} alu_opcode_t;
endpackage: def1_pkg

package def2_pkg;
    // import alu_opcode_t from def1_pkg
    import def1_pkg::alu_opcode_t;
    // define a new struct that includes alu_opcode_t
    typedef struct {
        alu_opcode_t alu_opcode;
        logic[7:0] addr;
    } opcode_t;
endpackage: def2_pkg

module top;
    // alu_opcode_t is NOT accessible from def2_pkg
    // the next line is ILLEGAL
    // import def2_pkg::alu_opcode_t;
    import def2_pkg::*;
    opcode_t opcode;

endmodule: top

Notice that, unlike some software programming languages such as Python, where an imported identifier is accessible as part of the new package, SystemVerilog prohibits such behavior. If you try to import alu_opcode_t from def2_pkg, you will get a recursive import error from the compiler.
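If re-exposing the imported type is actually desired, the language (since IEEE 1800-2009) offers the export declaration. The following is a sketch built on the same two packages; it is not part of the original example file, and tool support for export should be checked for your flow.

```systemverilog
package def1_pkg;
    typedef enum logic[1:0] {ADD, SUB, MULT, DIV} alu_opcode_t;
endpackage: def1_pkg

package def2_pkg;
    import def1_pkg::alu_opcode_t;
    // re-export the imported type so that it becomes visible through def2_pkg
    export def1_pkg::alu_opcode_t;

    typedef struct {
        alu_opcode_t alu_opcode;
        logic[7:0]   addr;
    } opcode_t;
endpackage: def2_pkg

module top;
    // legal now, because def2_pkg explicitly exports alu_opcode_t
    import def2_pkg::alu_opcode_t;
    import def2_pkg::opcode_t;

    alu_opcode_t op;
    opcode_t     opcode;
endmodule: top
```

Without the export line, the first import in module top is rejected, which is the behavior described above.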

4.1.2.3 Package Usage Caveats

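Because package contents are scoped, naming conflicts can occur when wildcard imports are used. The rule of thumb is to always use an explicit import when names conflict. Some coding styles ban wildcard imports altogether, which makes the code slightly more verbose but more readable and maintainable. The exact scoping rules are beyond the scope of this book; interested readers can refer to Table 26-1 in IEEE 1800-2017.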
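A minimal sketch of such a conflict and its resolution. The packages a_pkg and b_pkg and the module conflict_demo are hypothetical names used only for illustration.

```systemverilog
package a_pkg;
    localparam int WIDTH = 8;
endpackage

package b_pkg;
    localparam int WIDTH = 16;
    localparam int DEPTH = 4;
endpackage

module conflict_demo;

// the explicit import decides which WIDTH the bare name refers to
import a_pkg::WIDTH;
// the wildcard import still provides every other name from b_pkg, e.g. DEPTH;
// b_pkg::WIDTH is hidden by the explicit import above
import b_pkg::*;

logic [WIDTH-1:0]        a_data;  // 8 bits, from a_pkg
logic [b_pkg::WIDTH-1:0] b_data;  // 16 bits, selected by qualified name
logic [DEPTH-1:0]        idx;     // 4 bits, found through the wildcard import

endmodule
```

Had both packages been wildcard-imported, using the bare name WIDTH would be ambiguous and the compiler would reject it.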

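Another caveat is that a package must be compiled before the module files that depend on it. One systematic approach is to rely on a build tool such as make to guarantee the compilation order. A simpler approach is to list the package files before the other source files when passing file names to the tool.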

4.2 Finite State Machines

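Finite state machines (FSMs) are at the core of hardware control logic. How well an FSM is designed directly affects the synthesis and verification effort, because the tools place certain restrictions on how FSMs are written. Although FSM theory is beyond the scope of this book, we touch on as much of it as we can while covering the main FSM topics.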

4.2.1 Moore and Mealy FSM

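In general, two types of FSM are commonly used in hardware design: the Moore machine and the Mealy machine. The Moore machine, named after Edward F. Moore, is an FSM whose output values are determined solely by its current state. The Mealy machine, named after George H. Mealy, is an FSM whose output values are determined by both its current state and its current inputs. To distinguish Moore and Mealy machines formally, we can use the following mathematical notation.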

  • A finite set of states S
  • An initial state S0 such that S0∈S
  • A finite input set Σ
  • A finite output set Λ
  • A state transition function T:Σ×S → S
  • An output function G

For Moore machines, the output function is G: S → Λ, whereas for Mealy machines, the output function is G: Σ×S → Λ. Although Moore and Mealy machines are mathematically equivalent, there is a major difference when they are represented as state transition diagrams, as shown in Figures 4 and 5, where both diagrams describe logic that counts consecutive ones and outputs 1 once the count reaches 2. As a notation, in the Moore machine the label on an edge represents the input value and the label on a node represents the output value. In the Mealy machine, the label on an edge follows the input/output notation.

Figure 4: State transition diagram for Moore Machine.

Figure 5: State transition diagram for Mealy Machine.

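Because of this difference, we see differences in timing and area when designing Moore and Mealy machines in SystemVerilog:

  • To describe the same control logic, a Moore machine tends to have more states than a Mealy machine.
  • Compared with a Mealy machine, the output of a Moore machine tends to have one extra cycle of delay.

Which type of machine to use usually depends on the control logic you are modeling. A Mealy machine can also act as a Moore machine if the inputs are ignored when computing the output, so the Mealy machine is the more general of the two. Although nothing prevents you from mixing the two styles, we highly recommend sticking to one coding style so that the tools can easily recognize your design.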

4.2.2 FSM State Encoding

There are several different ways to encode the states S: one-hot encoding, Gray encoding, and binary encoding. Given |S| = N:

  • One-hot encoding means that exactly one bit is set to 1 for any particular state. The total number of bits required to represent the states is therefore N. The Hamming distance of this encoding is 2, meaning that we have to flip 2 bits for a state transition.
  • Gray encoding, named after Frank Gray, is a special encoding scheme that requires only ⌈log2(N)⌉ bits. In addition, its Hamming distance is designed to be 1, which means only one bit changes on a state transition.
  • Binary encoding means that the state value is assigned by its index in the set of states. As a result, it requires ⌈log2(N)⌉ bits. Since a state transition may require flipping all the bits, e.g. state 0 transitioning to state 3 with a 2-bit state, its Hamming distance can be as large as the number of state bits, i.e. O(log N).

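Each encoding has its own advantages. For example, one-hot encoding allows smaller multiplexing logic, since only one bit is needed to test the state variable, whereas Gray encoding allows lower switching power and is therefore favored in low-power designs. Choosing an encoding is more of an engineering question that depends on the design needs. For that reason, many synthesis tools can automatically re-encode FSM states during synthesis, so designers can write the FSM in one encoding scheme and synthesize it in another. However, this also means that the synthesized version of the RTL differs from the original RTL on which all the verification was done. As a result, corner-case bugs may appear when the tools re-encode the FSM. In general, we recommend that the design team settle on an encoding scheme early, based on engineering experiments. Doing so keeps synthesis and verification consistent.

In SystemVerilog, we typically use an enum to define states. Compared with old-school approaches such as `define and localparam, enums let the compiler perform type checking, which makes the code safer and easier to debug. Below are examples of one-hot encoding, Gray encoding, and binary encoding.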

// one-hot encoding
typedef enum logic[3:0] {
    IDLE  = 4'b0001,
    READY = 4'b0010,
    BUSY  = 4'b0100,
    ERROR = 4'b1000
} one_hot_state_t;

// Gray encoding
typedef enum logic[1:0] {
    RED    = 2'b00,
    GREEN  = 2'b01,
    BLUE   = 2'b11,
    YELLOW = 2'b10
} gray_state_t;

// binary encoding
typedef enum logic[1:0] {
    STAGE_0 = 2'd0,
    STAGE_1 = 2'd1,
    STAGE_2 = 2'd2,
    STAGE_3 = 2'd3
} binary_state_t;

4.2.3 General FSM Structure

As indicated by the formal definition of an FSM, we need to design two components: the state transition logic T and the output function G. However, since an FSM needs to hold its state, we need another component that sequentially updates the FSM state. As a result, a typical FSM always has three components, as shown in Figure 6.

Figure 6: General FSM structure for Moore and Mealy machine.

4.2.4 One-, Two-, and Three-Block FSM Coding Style

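Although an FSM has three necessary components, we can sometimes merge some of them into a single process. As a result, there are three popular FSM coding styles, commonly known as the one-block, two-block, and three-block FSM coding styles.

In the following subsections, we use the example of counting consecutive ones to show the different coding styles. All the states are defined in a SystemVerilog package, shown below.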

`ifndef COUNT_ONE_FSM_PKG
`define COUNT_ONE_FSM_PKG

package count_one_fsm_pkg;

typedef enum logic[1:0] {
    moore_state0,
    moore_state1,
    moore_state2
} moore_state_t;

typedef enum logic {
    mealy_state0,
    mealy_state1
} mealy_state_t;

endpackage
`endif // COUNT_ONE_FSM_PKG

4.2.4.1 Three-Block FSM Coding Style

Three-block FSM coding style is usually implemented as a Moore machine where:

  1. One block is used to update state with next_state.
  2. One block is used to determine next_state based on state and current inputs.
  3. One block is used to compute output based on state.

The complete example of three-block FSM is shown below (code/04/three_block_fsm_moore.sv):

module three_block_fsm_moore (
    input logic clk,
    input logic rst_n,
    input logic in,
    output logic out
);

import count_one_fsm_pkg::*;

moore_state_t state, next_state;

// block 1: state <- next_state
always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        state <= moore_state0;
    end
    else begin
        state <= next_state;
    end
end

// block 2: determine next_state based on the current state and input
always_comb begin
    case (state)
        moore_state0: begin
            if (in) next_state = moore_state1;
            else next_state = moore_state0;
        end
        moore_state1: begin
            if (in) next_state = moore_state2;
            else next_state = moore_state0;
        end
        moore_state2: begin
            if (in) next_state = moore_state2;
            else next_state = moore_state0;
        end
        default: begin
            next_state = moore_state0;
        end
    endcase
end

// block 3: determine output based on state
always_comb begin
    case (state)
        moore_state0: out = 0;
        moore_state1: out = 0;
        moore_state2: out = 1;
        default: out = 0;
    endcase
end

endmodule: three_block_fsm_moore

4.2.4.2 Two-Block FSM Coding Style

The two-block FSM coding style is usually implemented as a Mealy machine where:

  1. One block is used to update state with next_state.
  2. One block is used to determine next_state and the outputs, based on state and current inputs.

The complete example of two-block FSM is shown below (code/04/two_block_fsm_mealy.sv):

module two_block_fsm_mealy (
    input logic clk,
    input logic rst_n,
    input logic in,
    output logic out
);

import count_one_fsm_pkg::*;

mealy_state_t state, next_state;

// block 1: state <- next_state
always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        state <= mealy_state0;
    end
    else begin
        state <= next_state;
    end
end

// block 2: determine next_state and output
always_comb begin
    case (state)
        mealy_state0: begin
            if (in) begin
                next_state = mealy_state1;
                out = 0;
            end
            else begin
                next_state = mealy_state0;
                out = 0;
            end
        end
        mealy_state1: begin
            if (in) begin
                next_state = mealy_state1;
                out = 1;
            end
            else begin
                next_state = mealy_state0;
                out = 0;
            end
        end
    endcase
end

endmodule: two_block_fsm_mealy

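The benefit of a two-block FSM based on a Mealy machine is that the output updates as soon as the input changes, without waiting for the next cycle. However, this also makes maintenance harder. Because the next-state logic and the output are coded together, adjusting the FSM may require significant restructuring of the two-block code. Which style to use is up to the design team.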

4.2.4.3 One-Block FSM Coding Style

The one-block style merges all the blocks together. As a result, maintaining and debugging such an FSM is very challenging, and we highly discourage adopting this style unless absolutely necessary. For completeness, however, we show a code example so that readers can recognize the style in practice.

module one_block_fsm_mealy (
    input logic clk,
    input logic rst_n,
    input logic in,
    output logic out
);

import count_one_fsm_pkg::*;

mealy_state_t state;

// one block: state update, next state, and output are in the same always_ff block
always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        state <= mealy_state0;
        // out is a register in this style, so it needs a reset value as well
        out <= 0;
    end
    else begin
        case (state)
            mealy_state0: begin
                if (in) begin
                    state <= mealy_state1;
                    out <= 0;
                end
                else begin
                    state <= mealy_state0;
                    out <= 0;
                end
            end
            mealy_state1: begin
                if (in) begin
                    state <= mealy_state1;
                    out <= 1;
                end
                else begin
                    state <= mealy_state0;
                    out <= 0;
                end
            end
            default: begin
                state <= mealy_state0;
                out <= 0;
            end
        endcase
    end
end

endmodule: one_block_fsm_mealy

4.2.5 How to Write FSM Effectively

Designing an efficient FSM requires engineering work and experiments. A typical workflow is shown below:

  1. Identify states and state transition logic and turn it into a design specification.
  2. Implement FSM based on the specification
  3. (Optional) Optimize the FSM based on feedback.

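The first step of FSM design involves design exploration: how many states are needed, which coding style to use, which state encoding to use, and what the output logic is. A common way to visualize an FSM is a state transition diagram. Another way to represent an FSM is a table in which each row represents a state transition. Once all the states are identified, we can further optimize the FSM with techniques such as state reduction, which merges states that have exactly the same logic (same outputs and same transitions) into one state.

Once the specification is settled, turning it into an FSM is straightforward. Each transition arc can be expressed as a case item, as discussed earlier, and so can the output logic. Once the implementation is done, we need to test it thoroughly for common bugs such as deadlock or unreachable states. Some problems may be related to the implementation, others to the specification. In either case, we need to revise the design or the specification to meet the design requirements. We will discuss strategies for finding deadlocks and unreachable states when we cover formal verification later in the book.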

4.3 Ready/Valid Handshake

The ready/valid handshake is one of the most widely used design patterns for transferring data in a latency-insensitive manner. It consists of two components, the source and the sink, where data flows from the former to the latter. The source uses the valid signal to indicate whether the data is valid, and the sink uses the ready signal to indicate whether it is ready to receive data, as shown in the figure below.

Figure 7: Ready/Valid block diagram

Because ready/valid is latency-insensitive, each signal has precise semantics at the rising edge of the clock (we assume a synchronous circuit):

  • If the valid signal is high at @(posedge clk), the data is valid as well.
  • If the ready signal is high at @(posedge clk) and the valid signal is high as well, the data transfer completes. The size of one transfer is often referred to as one word.
  • If the system wishes to transfer more data, it needs to complete a series of one-word transfers until the entire packet is transferred.
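The transfer condition described above can be sketched as a small simulation-only monitor. The module name rv_monitor, its ports, and the num_words counter are illustrative assumptions, not part of any standard interface.

```systemverilog
module rv_monitor #(
    parameter int WIDTH = 8
) (
    input logic             clk,
    input logic             rst_n,
    input logic             valid,
    input logic             ready,
    input logic [WIDTH-1:0] data
);

// a word is transferred only when valid and ready are both high at the clock edge
logic transfer;
assign transfer = valid & ready;

// count the completed one-word transfers
int unsigned num_words;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        num_words <= 0;
    end
    else if (transfer) begin
        num_words <= num_words + 1;
        $display("word %0d transferred: data = 0x%0h", num_words, data);
    end
end

endmodule
```

Cycles in which only one of valid and ready is high leave the counter untouched, which corresponds to the "no data transfer" cases.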

The timing diagram below shows cases where a transfer should or should not occur.

Figure 8: No data transfer

Figure 9: No data transfer

Figure 10: One successful ready/valid data transfer

The ready/valid handshake has several design pitfalls that need to be avoided:

  1. If the source waits for the sink's ready before asserting valid, and vice versa, there is a chance of deadlock since both parties are waiting for each other. To avoid this, the control signals should be computed independently.
  2. If the ready/valid signals are computed purely with combinational logic, there will be a combinational loop between the source and the sink. To resolve this, either the source or the sink needs to register the control signals, or compute the signals from flopped state.
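One way to respect both rules is a single-entry pipeline stage that registers valid and data. The sketch below is an assumption-laden illustration (module name rv_stage and all port names are hypothetical), not a reference implementation.

```systemverilog
module rv_stage #(
    parameter int WIDTH = 8
) (
    input  logic             clk,
    input  logic             rst_n,
    // upstream side: this module acts as the sink
    input  logic             in_valid,
    output logic             in_ready,
    input  logic [WIDTH-1:0] in_data,
    // downstream side: this module acts as the source
    output logic             out_valid,
    input  logic             out_ready,
    output logic [WIDTH-1:0] out_data
);

// out_valid and out_data come straight from flops, so out_valid never
// depends combinationally on out_ready (no waiting on the sink, no loop)

// a new word can be accepted when the buffer is empty or is drained this cycle;
// note that in_ready does not depend on in_valid
assign in_ready = !out_valid || out_ready;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        out_valid <= 1'b0;
        out_data  <= '0;
    end
    else if (in_ready) begin
        out_valid <= in_valid;
        if (in_valid) out_data <= in_data;
    end
end

endmodule
```

In this sketch the ready path is still combinational from out_ready to in_ready; designs that also need to cut that path typically add a second buffer entry.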

4.4 Commonly Used Design Building Blocks

In this section we list some code examples of commonly used design building blocks. These circuits are commonly used in various circuit designs and are optimized for high synthesis quality.

4.4.1 Registers

There are various types of registers, such as synchronous and asynchronous reset registers. Each type has its own benefits. The design team should decide ahead of time which types of registers to use consistently throughout the design. All the code examples here use an active-low reset.

4.4.2 Asynchronous Reset Registers

An asynchronous reset register has the reset signal in its sensitivity list.

logic r, value;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else begin
        r <= value;
    end
end

4.4.2.1 Synchronous Reset Registers

Unlike an asynchronous reset register, a synchronous reset register only resets on a clock edge, hence the name "synchronous".

logic r, value;

always_ff @(posedge clk) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else begin
        r <= value;
    end
end

4.4.2.2 Chip-enable Registers

Chip-enable registers have an additional signal that enables or disables the value update (sometimes called clock gating). On ASICs, there are usually specially designed cells to handle such logic. As a result, if you follow the code example below you will get optimal synthesis results. We use an asynchronous reset register as an example.

logic r, value, c_en;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else if (c_en) begin
        r <= value;
    end
end

In general, we do not recommend using your own logic to control the register update, for instance, multiplexing the update value instead of using the syntax above, or creating your own clock based on the enable logic. These kinds of modifications are unlikely to be recognized by synthesis tools and hence reduce synthesis quality.

4.4.2.3 Power-up Values

Some FPGA tool chains allow initial values to be set along with the declaration, as shown below. Since this approach does not work for ASICs, we do not recommend it if you want your code to be portable.

logic a = 1'b0;
logic value;

always_ff @(posedge clk) begin
    a <= value;
end

4.4.3 Multiplexer

A multiplexer is a hardware circuit that selects its output signal from a list of input signals. There are many ways to implement a multiplexer, and we cover two common implementations.

4.4.3.1 case-based Multiplexer

The simplest way to implement a multiplexer is with a case statement. It is straightforward to implement and also allows synthesis tools to recognize the multiplexer and optimize the netlist. Here is an example of a multiplexer that takes 5 inputs. Notice that the number of inputs does not need to be a power of two.

module Mux5
#(
    parameter int WIDTH = 1
) (
    input  logic[WIDTH-1:0] I0,
    input  logic[WIDTH-1:0] I1,
    input  logic[WIDTH-1:0] I2,
    input  logic[WIDTH-1:0] I3,
    input  logic[WIDTH-1:0] I4,
    // $clog2(5) = 3 bits are enough to select one of 5 inputs
    input  logic[$clog2(5)-1:0] S,
    output logic[WIDTH-1:0] O
);

always_comb begin
    unique case (S)
        0: O = I0;
        1: O = I1;
        2: O = I2;
        3: O = I3;
        4: O = I4;
        default: O = I0;
    endcase
end

endmodule

Notice that default is used to handle edge cases where the select signal S is out of range or contains x.

A slightly shorter version merges all the input signals into an array and uses the index operator as the multiplexer, as shown below:

module Mux
#(
    parameter int WIDTH=1,
    parameter int NUM_INPUT=2
) (
    input  logic[NUM_INPUT-1:0][WIDTH-1:0] I,
    input  logic[$clog2(NUM_INPUT)-1:0] S,
    output logic[WIDTH-1:0] O
);

assign O = (S < NUM_INPUT)? I[S]: I[0];

endmodule

In the code example above, we implicitly ask the synthesis tool to create a multiplexer for us. There are several advantages to this approach:

  1. We let the synthesis tool do its job of optimizing the design.
  2. The module works with an arbitrary number of inputs (NUM_INPUT has to be larger than 1), as well as an arbitrary output width.
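As a usage sketch, the test bench below instantiates the parameterized Mux with three 8-bit inputs. It assumes the Mux module above is compiled together with it; the module name mux_demo and the stimulus values are illustrative.

```systemverilog
module mux_demo;

localparam int WIDTH     = 8;
localparam int NUM_INPUT = 3;

logic [NUM_INPUT-1:0][WIDTH-1:0] inputs;
logic [$clog2(NUM_INPUT)-1:0]    sel;
logic [WIDTH-1:0]                out;

// override both parameters by name
Mux #(.WIDTH(WIDTH), .NUM_INPUT(NUM_INPUT)) mux (.I(inputs), .S(sel), .O(out));

initial begin
    inputs[0] = 8'h11;
    inputs[1] = 8'h22;
    inputs[2] = 8'h33;
    // sel is 2 bits wide, so the value 3 is out of range for 3 inputs
    // and exercises the fallback to I[0]
    for (int i = 0; i < 4; i++) begin
        sel = i[1:0];
        #1;
        $display("S=%0d O=0x%0h", sel, out);
    end
end

endmodule
```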

4.4.3.2 AOI Multiplexer

In situations where hand-optimization is required, we can implement an AOI mux. AOI stands for AND-OR-Invert, which names the basic logic operations we are going to perform on the inputs. AOI gates are efficient in CMOS technology since we can use NAND and NOR logic gates to construct an AOI gate.

There are two components in an AOI mux, namely a precoder and the AOI logic. The precoder translates the select signal into one-hot encoding, and the AOI logic merges the inputs into the output based on the one-hot-encoded select signal. Here is the complete implementation of the AOI mux, parameterized by the number of inputs, e.g. 5 (code/04/aoi_mux.sv).

module aoi_mux
#(
    parameter int WIDTH=1,
    parameter int NUM_INPUT=2
) (
    input  logic[NUM_INPUT-1:0][WIDTH-1:0] I,
    input  logic[$clog2(NUM_INPUT)-1:0] S,
    output logic[WIDTH-1:0] O
);

// calculate the ceiling of num_input / 2
localparam NUM_OPS = (NUM_INPUT + 1) >> 1;
localparam MAX_RANGE = NUM_INPUT >> 1;

logic [NUM_INPUT-1:0] sel_one_hot;
// simplified one-hot precoder.
assign sel_one_hot = (S < NUM_INPUT)? 1 << S: 0;

// intermediate results
logic [NUM_OPS-1:0][WIDTH-1:0] inter_O;

// AOI logic part
always_comb begin
    // working on each bit
    for (int w = 0; w < WIDTH; w++) begin
        // half the tree
        for (int i = 0; i < MAX_RANGE; i++) begin
            inter_O[i][w] = (sel_one_hot[i * 2] & I[i * 2][w]) |
                            (sel_one_hot[i * 2 + 1] & I[i * 2 + 1][w]);
        end
        // need to take care of odd number of inputs
        if (NUM_INPUT % 2) begin
            inter_O[MAX_RANGE][w] = sel_one_hot[MAX_RANGE * 2] & I[MAX_RANGE * 2][w];
        end
    end
end

// compute the final result, i.e. OR the intermediate result together
// notice that |inter_O doesn't work here since it will reduce to 1-bit signal
always_comb begin
    O = 0;
    for (int i = 0; i < NUM_OPS; i++) begin
        O = O | inter_O[i];
    end
end

endmodule

The example above can be explained with matrix operations. After the one-hot encoding transformation, we create a matrix S where S[i] = sel_one_hot for i ∈ {0, 1, …, NUM_INPUT−1}. In other words, all entries in matrix S are zero except for the column indicated by the select signal, whose entries are all ones. The input signals can be expressed as a matrix I where each row of I is one input. We then compute the following result:

result = S × I

Notice that since S consists only of ones and zeros, the multiplication effectively performs an AND operation. Matrix result has characteristics similar to those of matrix S due to the property of one-hot encoding. To obtain the final result, we do a row-wise OR reduction. Since CMOS technology is more area-efficient when the AND and OR operations are fused, instead of computing one row at a time we compute two rows together; hence the variable NUM_OPS is computed as ⌈NUM_INPUT / 2⌉. Readers are encouraged to work through the process with some simple examples.
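As one such simple example, here is a hand trace of the aoi_mux code for NUM_INPUT = 3, WIDTH = 2, and S = 1. The input values I_0 = 01, I_1 = 10, I_2 = 11 are arbitrary choices for illustration, and s_i denotes bit i of sel_one_hot.

```latex
\begin{aligned}
\text{sel\_one\_hot} &= (s_2, s_1, s_0) = (0, 1, 0) \\
s_0 \wedge I_0 &= 0 \wedge 01 = 00 \\
s_1 \wedge I_1 &= 1 \wedge 10 = 10 \\
s_2 \wedge I_2 &= 0 \wedge 11 = 00 \\
\text{inter\_O}_0 &= (s_0 \wedge I_0) \vee (s_1 \wedge I_1) = 00 \vee 10 = 10 \\
\text{inter\_O}_1 &= s_2 \wedge I_2 = 00 \\
O &= \text{inter\_O}_0 \vee \text{inter\_O}_1 = 10 = I_1
\end{aligned}
```

Here NUM_OPS = ⌈3 / 2⌉ = 2, so two fused AND-OR terms are enough: the first covers inputs 0 and 1, and the second handles the leftover odd input 2.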

The AOI mux is an example of how we can express the same logic in a clever way that is optimized for CMOS technology. This kind of optimization requires keen insight into the logic as well as a deep understanding of logic synthesis. Unless required, we do not recommend hand-optimizing common logic such as adders or multiplexers, since doing so is error-prone and may not achieve a better result than the synthesis tools. Use the syntactic sugar offered by the SystemVerilog language and let the synthesis tools do the heavy lifting. If the code follows the recommended coding style, synthesis tools can recognize it easily and optimize it automatically.

4.5 Wishbone Protocol: A Case Study

A common place for bugs is the interface between components, where each component may have different design assumptions. One approach to limiting such bugs is to adhere to a well-specified protocol that every component follows, which reduces interface errors. In this section we take a look at a simple yet complete protocol, namely Wishbone, and how to write RTL code based on its specification.

Unlike protocols such as AXI4, Wishbone is an open-source hardware bus interface, which allows engineers and hobbyists to share public domain designs.

4.5.1 Wishbone Introduction

Wishbone bus consists of two channels: a request channel which can either be read or write, and an acknowledge (ACK) channel. These two channels connect the bus master and slave together, as shown in the figure below.

Figure 11: Wishbone channel diagram

The master has a list of signals defined by the specification. Although the specification explicitly allows IPs to rename the interface signals (PERMISSION 2.0.0), we use the names from the specification to make it easier to compare with the document. Notice that the specification follows the naming convention that the suffix _O indicates an output port and _I indicates an input port.

The following signals are shared between the master and slave interfaces:

Table 4: Interface signals shared between Wishbone master and slave.
Signal Name | Function
CLK_I | All Wishbone output signals are registered at the rising edge of CLK_I. All Wishbone input signals are stable before the rising edge of CLK_I
DAT_I | The data input array to pass binary data. Maximum 64-bit
DAT_O | The data output array to pass binary data. Maximum 64-bit
RST_I | Reset signal. This signal only resets the Wishbone interface; it is not required to reset the other parts of the IP.
TGD_I | Data tag type, which contains additional information about the data. Must be specified in the IP datasheet.
TGD_O | Data tag type, same as TGD_I

We’ll ignore TGD_I and TGD_O in this section, but keep in mind that they can transfer very useful metadata information such as error checking code to protect data.

The table below shows the complete interface ports for the master (excluding the shared ports).

Table 5: Wishbone master interface ports.
Signal Name | Function
ACK_I | The acknowledge input indicates the normal termination of a bus cycle
ADR_O | The address used for the read/write request
CYC_O | The cycle output. When asserted, indicates a valid bus cycle in progress
STALL_I | When asserted, indicates that the current slave is not able to accept the transfer
ERR_I | When asserted, indicates an abnormal cycle termination
LOCK_O | When asserted, indicates the current bus cycle is uninterruptible
RTY_I | When asserted, indicates that the interface is not ready to accept/send data and the cycle should be retried
SEL_O | Indicates where valid data is expected on the DAT_I signal array during read cycles, and where it is placed on the DAT_O signal array during write cycles
STB_O | The strobe output indicates a valid data transfer cycle. It is used to qualify other signals on the interface.
TGA_O | Address tag type, which contains information associated with the address lines and is qualified by STB_O.
TGC_O | Cycle tag type, which contains information associated with bus cycles and is qualified by CYC_O.
WE_O | Write enable output, which indicates whether the current local bus cycle is a read or write cycle.

Again, we will ignore tag information. Interested readers should check out the specification.

The slave interface is symmetric to the master interface: XX_I on the master has a corresponding port XX_O on the slave, and vice versa. In general, the Wishbone interface is simpler than other bus interfaces such as the Advanced Microcontroller Bus Architecture (AMBA), which is why we can explain the protocol here without lengthy details.

4.5.2 Wishbone Master Example

We present here a simplified version of the master module, where the read/write behavior is controlled via a simple interface. In real-world practice, we need to connect the master to an IP that directly controls the master's behavior. We also drop the tag, lock, and byte-select interfaces for simplicity, but keep in mind that a real IP interface needs to implement these as well! We focus on register reads and writes instead of block transfers, and we also drop corner-case handling such as error and retry. Interested readers should try to implement block transfer and the other missing features.

First, we need to define the IO ports, where the width of the data is parameterized by WIDTH. We also add parameterization for the control and data signals.


module wb_master #(
    parameter WIDTH=32,
    parameter ADDR_WIDTH=16
) (
    input  logic                 CLK_I,
    input  logic[WIDTH-1:0]      DAT_I,
    output logic[WIDTH-1:0]      DAT_O,
    input  logic                 RST_I,

    input  logic                 ACK_I,
    output logic[ADDR_WIDTH-1:0] ADR_O,
    output logic                 CYC_O,
    input  logic                 STALL_I,
    output logic                 STB_O,
    output logic                 WE_O,

    // external controls
    input  logic                 write,
    input  logic                 enable,
    input  logic[ADDR_WIDTH-1:0] addr,
    input  logic[WIDTH-1:0]      wdata,
    output logic[WIDTH-1:0]      rdata,
    output logic                 ready,
    output logic                 ack
);

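Notice that, following the naming convention, STALL_I essentially plays the role of the slave's ready signal (inverted: the slave is ready when STALL_I is low), and STB_O is the valid signal. With that in mind, we can quickly sketch the logic that sends commands based on the control signals. Notice that in Wishbone every output is registered. Because we need to wait for the slave to acknowledge the transfer, we need an FSM to track the transfer state (we will use a two-block FSM). Since we are only interested in single-register transfers, there is no need to keep track of the number of words transferred.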


typedef enum logic {
    IDLE,
    BUSY
} State;

State state;

Based on the state, we drive the cycle and strobe outputs:

always_comb begin
    unique case (state)
        IDLE: begin
            CYC_O = 0;
            STB_O = 0;
        end
        BUSY: begin
            CYC_O = 1;
            STB_O = 1;
        end
    endcase
end

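Next, we need to change the state based on the control signals. Since we are only interested in one-word transfers, we start the transaction when the external control signal enable is high and the slave is ready. Depending on whether it is a read or a write request, we set the Wishbone control and data signals differently. After initiating the transaction, the master enters the BUSY state and waits for the slave's acknowledgment. Afterwards, the master signals the end of the transaction to the external client and returns to the IDLE state.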

always_ff @(posedge CLK_I) begin
    // reset on high
    if (RST_I) begin
        state <= IDLE;

        // reset all registered outputs
        ADR_O <= 0;
        WE_O <= 0;
        DAT_O <= 0;
        rdata <= 0;

        // external control signal
        ack <= 0;
        ready <= 1;
    end
    else begin
        unique case (state)
            IDLE: begin
                // only when we're asked to send data
                // and the slave is ready
                if (enable && !STALL_I) begin
                    ADR_O <= addr;
                    // write request
                    if (write) begin
                        DAT_O <= wdata;
                        WE_O <= 1;
                    end
                    else begin
                        DAT_O <= 0;
                        WE_O <= 0;
                    end

                    // SEL_O is not driven: the byte-select interface is dropped in this example
                    state <= BUSY;
                    // external control signal
                    ready <= 0;
                    ack <= 0;
                end
                else begin
                    // external control signal
                    ready <= 1;
                    ack <= 0;
                end
            end
            BUSY: begin
                // wait for slave ack
                if (ACK_I) begin
                    // we're good
                    state <= IDLE;
                    DAT_O <= 0;

                    // we assume the control client holds enable and write
                    // until the response gets back
                    if (enable) begin
                        ack <= 1;
                        if (!write) begin
                            // if it's a read, return the data from the slave
                            rdata <= DAT_I;
                        end
                        else begin
                            rdata <= 0;
                        end
                    end
                end
            end
        endcase
    end
end

endmodule
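To exercise a master like the one above, a matching slave is needed. The following is a minimal sketch of a slave that holds one register, never stalls, and acknowledges one cycle after it sees a request. The module name wb_slave_reg and the storage register are assumptions made for illustration; it mirrors the simplified master and is not a fully compliant Wishbone slave (no tags, byte select, error, or retry).

```systemverilog
module wb_slave_reg #(
    parameter int WIDTH = 32,
    parameter int ADDR_WIDTH = 16
) (
    input  logic                  CLK_I,
    input  logic                  RST_I,
    input  logic [WIDTH-1:0]      DAT_I,
    output logic [WIDTH-1:0]      DAT_O,
    input  logic [ADDR_WIDTH-1:0] ADR_I,
    input  logic                  CYC_I,
    input  logic                  STB_I,
    input  logic                  WE_I,
    output logic                  ACK_O,
    output logic                  STALL_O
);

// a single storage register; ADR_I is ignored in this sketch
logic [WIDTH-1:0] storage;

// a request is accepted when the strobe is high within a valid cycle and
// we are not already acknowledging the previous request
logic request;
assign request = CYC_I && STB_I && !ACK_O;

// this slave can always accept a request
assign STALL_O = 1'b0;

always_ff @(posedge CLK_I) begin
    if (RST_I) begin
        ACK_O   <= 1'b0;
        DAT_O   <= '0;
        storage <= '0;
    end
    else begin
        // acknowledge for exactly one cycle per accepted request
        ACK_O <= request;
        if (request) begin
            if (WE_I) storage <= DAT_I;    // write cycle
            else      DAT_O   <= storage;  // read cycle: data is valid together with ACK_O
        end
    end
end

endmodule
```

When connected to the master, each XX_O port of one side drives the XX_I port of the other, e.g. the master's STB_O drives STB_I here and ACK_O drives the master's ACK_I.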
