Friday, April 18, 2014

Odd 'Gray Code' Sequences

Introduction

Digital design and verification engineers as well as electrical and computer engineering students are aware of the 'Gray code', a sequence of bits in which two successive values differ by only one bit. Strict gray codes are sequences that have 2n members and can be created using recursive functions (by taking advantage of the 'mirror image property') I will not go much into examples, as Wikipedia provides a thorough run-down. It is pretty easy to create n-bit gray code sequences. So, that will not be the topic of this post.

Background

Gray codes are used in digital design for clock domain crossing (CDC). Due to metastability issues when passing multiple bits of information from one clock domain to another, it is imperative that successive values being passed from one domain to another change at most by one bit. A detailed discussion of synchronizers used for CDC is a topic for another day (though we will give it cursory coverage further down in this post). For today's topic, we need to understand how gray codes are used in CDC. The typical organization is as below:



The binary counter on the left in the first clock domain may be the write pointer of a FIFO being used to share information between the two domains. This value needs to get to the second domain so that it may recognize the availability of information in the FIFO. For proper CDC, the binary counter needs to undergo a Gray code conversion (using the bin2gry circuit), pass through the synchronizer and undergo the reverse process (using the gry2bin circuit) to get the original counter value.

The Straightforward Case

Most FIFOs have 2n entries, which can make use of a (n+1) bit counter to keep track of the read and write pointers. For example, with a 3-bit counter, the 'Binary Counter Domain 1' in the above diagram would follow the sequence (000, 001, 010, 011, 100, 101, 110, 111). The bin2gry circuit would translate the values to the corresponding gray-code sequence (000, 001, 011, 010, 110, 111, 101, 100). The gry2bin circuit would do the reverse process. In most cases, we can use the Gray code counter to directly index into the FIFO and save the cost of the bin2gry conversion. The entries into the FIFO won't happen in sequential order, but that won't matter as long as the reading counter in the second domain is also of similar design and reads data from the FIFO in the same order. Unfortunately, this is where the textbooks stop, but real-world designs don't.

'Gray Code' Sequence for non-2n Counters

Consider a scenario where the binary counter on the left counts from 0 to 5 and back to 0, i.e, the count sequence is (000, 001, 010, 011, 100, 101, 000, 001 ...). The textbook method of taking a base sequence, mirroring it and adding it at the end to create a gray code sequence unfortunately doesn't work with non-2n counters.

Clive Maxfield presents a way to create non-2n gray sequences by pruning pairs from either the top or middle of the mirroring line in Part 3 of his series on Gray code fundamentals. This the advisable method for large counters. In the rest of the piece below, we cover an alternative strategy for sequences with low cardinalities (say, between 0 and 31).

Readers familiar with Johnson counters (twisted ring counters) can recognize that a 3-bit Johnson counter obeys 'gray-code' rules of changing only a single bit between successive values.

'Gray' Johnson Counter Sequence
(Even non-2n Cardinalities)
0 0 0 0
1 0 0 1
2 0 1 1
3 1 1 1
4 1 1 0
5 1 0 0

Note that the 0 to 5 sequence above has 6 values, which make it easy to create a 'gray' sequence (since each successive value will change by one bit only, they will have odd-even-odd-even.. number of 1s in them, which makes a loop-back possible).

How does one extend this to cases where the base counter loops back after an odd number of counts? Consider the case where we want to count from 0 to 6 and back (total of 7 values for the base counter). The trick in the odd-count case is to note that the CDC system under consideration doesn't impose any restriction on the number of bits actually getting across from one clock domain to the other. There is also no restriction on the mapping from the source counter to the 'gray' value. The latter aspect is important, as this allows us to do a one-to-many mapping between the source counter value and the 'gray' value.

A one-to-two mapping allows us to blow up the original problem of having an odd number of sequence values (7, in our example) to one having an even number of sequence values. In the previous case, we saw that a 3-bit Johnson counter can represent 6 values. In general, a n-bit Johnson counter, can generate 2n values that obey the 'gray' sequence rules for CDC. Consider the following table:

'Gray' Johnson Counter Sequence
(Odd non-2n Cardinalities)
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 1
2 0 0 0 0 0 1 1
3 0 0 0 0 1 1 1
4 0 0 0 1 1 1 1
5 0 0 1 1 1 1 1
6 0 1 1 1 1 1 1
0 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0
2 1 1 1 1 1 0 0
3 1 1 1 1 0 0 0
4 1 1 1 0 0 0 0
5 1 1 0 0 0 0 0
6 1 0 0 0 0 0 0


An important aspect to note here is that the next value in the 'gray' sequence depends on the current code value as well as the base counter value (i.e, the bin2gry circuit is not fully combinatorial). In order to minimize latency in transferring the counter value across, it is important to understand how the CDC synchronizer works.

CDC Synchronizer

Readers interested in learning about the basics of clock domain crossing techniques would do well to peruse Clifford.E.Cummings's paper (PDF) presented at SNUG 2008. It talks about how to successfully transfer across different types of information between circuits operating on different clocks. In this post, we are mainly interested in multi-bit CDC. The basic building block of the multi-bit CDC synchronizer is the two-flop synchronizer arrangement (in Figure 3 of the paper linked above, also reproduced below).

A Two-Flop Synchronizer for Clock Domain Crossing

The synchronizer involves flopping the data once in the source domain and twice in the receiver domain before using the values in the second domain.

Note that the bin2gry circuit is necessary for 2n counters only if the original count is not in Gray code. If the counters are implemented in Gray code, then, the 'adat' signal in the above diagram is the counter value itself. The synchronizer only demands that there be no combinational logic in the path between the three flops. The 'adat' signal can be the input to combinatorial logic which generates the 'dat' signal in the above two-flop synchronizer circuit.

Implementing the Odd non-2n 'Gray' Counter

Coming back to the problem discussed before the digression towards CDC synchronizers, we find that our our modified 'gray' sequence (Johnson counter-based) can be implemented with a brute force method (i.e, map each binary count value in the source domain along with the current value being sent across to generate the next count value). A more optimal way would be to use the increment signal for the source binary counter as the enable signal for the Johnson counter too.

On the receiver side, we can let the synthesis tool do the optimization by just coding in a case statement. An alternative to derive the original counter value would be to count the number of 1s (say, OnesCount) in the (N-1) LSB bits and using the MSB to choose either OnesCount or (N - OnesCount) as the the final output of the gry2bin circuit.

Johnson counters have a couple of advantages over traditional pruned Gray code sequences for non-2n sequences because of two reasons:
  • The only logic involved in the 'next-state' computation for a Johnson counter is an inverter, compared to a sea of gates for pruned gray code sequences
  • Designers don't need to worry about 'holes' in the sequence because a separate traditional binary counter can be used to index into the FIFO
Disadvantages include:
  • Hardware cost for large sequences (need n/2 flops to count up to n)
  • Logic to recognize the pointer on the receiver side (gry2bin circuit) involves a modified priority encoder or adders and multiplexers which could potentially impact the operating speed of the design
System Verilog code for the design and testbench are hosted on EDA Playground. Feel free to copy over and experiment (and, of course, please do report any bugs that you find in the comments section so that I can fix them).

The design code alone is reproduced below:

module GRAY_CODE_ODD_N #(parameter NUM_COUNT=7) (
  
  input logic incr,
  input logic clk,
  input logic rst_l,
  
  output logic[NUM_COUNT-1:0] gray_code_ctr_val 
  
);
  
  logic[(NUM_COUNT - 1):0] gray_code_ctr ;
  
  // This is the code for a NUM_COUNT bit Johnson (twisted ring) counter
  // The design is basically similar to a circular shift register, except that the 
  // MSB is inverted before feeding into the LSB
  always_ff @(posedge clk) begin
    
    if (~rst_l) begin
      gray_code_ctr <= 'd0 ;
    end
    else if (incr) begin
      gray_code_ctr <= {gray_code_ctr[NUM_COUNT-2:0], ~gray_code_ctr[NUM_COUNT-1]};      
    end
    
  end
  
  assign gray_code_ctr_val = gray_code_ctr ;
  
endmodule: GRAY_CODE_ODD_N


module FIFO_PTR_CDC_SYNCHRONIZER #(parameter MAX_BIN_VAL=6, MAX_BIN_VAL_LOG2=3) (
  input logic clk1,
  input logic rst_clk1_l,
  input logic incr_clk1,
  input logic clk2,
  input logic rst_clk2_l,
  output logic [MAX_BIN_VAL_LOG2-1:0] fifo_ptr_val_clk2
);

  logic [MAX_BIN_VAL:0] johnson_ctr_clk1 ;
  logic [MAX_BIN_VAL:0] johnson_ctr_1_clk2, johnson_ctr_2_clk2 ;
  logic [MAX_BIN_VAL_LOG2 - 1:0] i ;
  
  // Instantiating the Johnson counter
  // Note that johnson_ctr_clk1 is already a flop in the counter module  
  GRAY_CODE_ODD_N #(MAX_BIN_VAL+1) JCTR (
    .incr (incr_clk1),
    .clk (clk1),
    .rst_l (rst_clk1_l),
    .gray_code_ctr_val (johnson_ctr_clk1)
  );
  
  // Since johnson_ctr_clk1 is already registered in clk1, the CDC synchronizer
  // just needs to flop it twice in the clk2 domain
  always_ff @(posedge clk2) begin
    if (~rst_clk2_l) begin
      johnson_ctr_1_clk2 <= 'h0 ;
      johnson_ctr_2_clk2 <= 'h0 ;
    end
    else begin
      johnson_ctr_1_clk2 <= johnson_ctr_clk1 ;
      johnson_ctr_2_clk2 <= johnson_ctr_1_clk2 ;
    end
  end
  
  // The block below implements the gry2bin logic on the receiver side
  // Synthesis tools should be able to recognize the static loop counter limit
  // and automatically unroll the loop to generate the logic equivalent.
  // The two-to-one mapping is achieved by looking at the MSB and determining the
  // position of the first set or cleared bit as appropriate - a variant of a 
  // priority encoder circuit
  always_comb begin
    if (johnson_ctr_2_clk2[MAX_BIN_VAL]) begin
      fifo_ptr_val_clk2 = 0 ;
      for (i = 0; i < MAX_BIN_VAL; i++) begin
        if (~johnson_ctr_2_clk2[i]) begin
          fifo_ptr_val_clk2 = i + 1;
        end
      end
    end
    else begin
      fifo_ptr_val_clk2 = 0 ;
      for (i = 0; i < MAX_BIN_VAL; i++) begin
        if (johnson_ctr_2_clk2[i]) begin
          fifo_ptr_val_clk2 = i + 1;
        end
      end
    end
  end
  
  
endmodule: FIFO_PTR_CDC_SYNCHRONIZER