It is assumed that the student is familiar with VHDL simulation and synthesis basics as taught in the System-on-Chip Design (SoC) course. If you feel uncertain about the topic, please review (and redo) exercises VHD and SYN, whose descriptions are available from the web page of that course. The software environment (tool versions) is the same as for the current exercise.
The description below refers to various file names. Once you have logged in, execute the command:
get-module pow pow
to get them in the subdirectory pow, or copy them manually from the directory /home/practice/soc/exercise/modules/pow to a directory pow (do not forget the "invisible" file .synopsys_dc.setup).
The circuit for this exercise is almost the same as in exercise SYN mentioned above: it consists of the entity siso_gen (generic serial-in serial-out module) with architecture gcd (greatest common divisor, using Euclid's algorithm). For practical reasons, the scan signals related to design for testability have temporarily been removed (commented out in the VHDL code).
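When checking that the simulation indeed produces correct GCDs, a small software reference model can help. The sketch below assumes the subtractive form of Euclid's algorithm, a common choice for hardware because it needs only a comparator and a subtractor; the actual gcd architecture may differ in its details, and the function name gcd_subtractive is hypothetical.

```python
def gcd_subtractive(a: int, b: int) -> int:
    """Greatest common divisor using the subtractive form of Euclid's algorithm.

    Reference model only (hypothetical helper, not part of the exercise files).
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    # gcd(a, 0) = a and gcd(0, b) = b; this also avoids an endless loop below
    if a == 0 or b == 0:
        return a + b
    # Repeatedly subtract the smaller operand from the larger one
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a


if __name__ == "__main__":
    print(gcd_subtractive(48, 36))  # prints 12
```

Because gcd(127a, 127b) = 127 gcd(a, b), the same model also predicts the results for a stream whose operands are all scaled by 127.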
As you know from the theory, power consumption in digital CMOS circuits is strongly related to switching activity. To collect realistic switching-activity information, the design should be simulated with data streams that are representative of the actual use of the integrated circuit. In the design flow of this exercise, the Modelsim simulator is used for that purpose.
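The usual first-order model behind this statement is P = α·C·V²·f, where α is the average number of 0-to-1 transitions per clock cycle of a node, C its switched capacitance, V the supply voltage, and f the clock frequency. The Python sketch below only illustrates that relation; the function names are hypothetical and the numbers in the usage line are illustrative, not data for the cell library used in this exercise.

```python
def dynamic_power(alpha: float, c_load: float, vdd: float, f_clk: float) -> float:
    """First-order dynamic power of one CMOS node, P = alpha * C * Vdd^2 * f (watts).

    alpha  : average number of 0->1 transitions per clock cycle
    c_load : switched load capacitance (F)
    vdd    : supply voltage (V)
    f_clk  : clock frequency (Hz)
    """
    return alpha * c_load * vdd ** 2 * f_clk


def total_dynamic_power(nodes, vdd: float, f_clk: float) -> float:
    """Sum the node powers over (alpha, c_load) pairs, one pair per node."""
    return sum(dynamic_power(a, c, vdd, f_clk) for a, c in nodes)


if __name__ == "__main__":
    # Illustrative values only: 10 fF node, 1.8 V supply, 5 MHz clock
    print(dynamic_power(0.1, 10e-15, 1.8, 5e6))
```

For fixed capacitance, voltage, and frequency, the power scales linearly with α, which is why realistic switching activity from simulation matters.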
The flow uses simulations at two abstraction levels:
The communication between the Synopsys tools and Modelsim is based on files in the Switching Activity Interchange Format (SAIF). There are two kinds of such files. Forward SAIF files are generated by Synopsys and read by Modelsim; they list the signals whose switching activity the simulator should monitor. Backward SAIF files are used for the reverse communication. Producing them requires a Synopsys run-time library that is linked to Modelsim and collects the switching-activity information. The backward SAIF file will contain the following annotations for the signals that have been monitored:
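SAIF files typically record, per signal, quantities such as the time spent at logic 0 (T0), the time spent at logic 1 (T1), and the toggle count (TC). As a rough illustration of what such figures mean, the sketch below derives them from a 1-bit signal sampled at a fixed interval. It is a simplified model (hypothetical names, no X/Z states, changes only at sample instants), not how the Synopsys run-time library actually works.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Activity:
    t0: int  # total time the signal is at logic 0
    t1: int  # total time the signal is at logic 1
    tc: int  # toggle count: number of 0->1 plus 1->0 transitions


def activity_from_samples(samples: Sequence[int], dt: int) -> Activity:
    """Derive SAIF-style T0/T1/TC figures from a 1-bit signal sampled every dt time units.

    Simplification: the signal is assumed to change only at sample instants,
    and X/Z states are not modelled.
    """
    t0 = t1 = tc = 0
    prev = None
    for v in samples:
        if v not in (0, 1):
            raise ValueError("only logic 0/1 samples are supported")
        if v == 0:
            t0 += dt
        else:
            t1 += dt
        if prev is not None and v != prev:
            tc += 1
        prev = v
    return Activity(t0=t0, t1=t1, tc=tc)


if __name__ == "__main__":
    act = activity_from_samples([0, 1, 1, 0, 0, 0, 1, 1], dt=200)
    print(act)  # Activity(t0=800, t1=800, tc=3)
```

Dividing TC by the total simulated time gives a toggle rate, which is the kind of quantity a power estimator combines with capacitance information.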
Forward SAIF generation is only necessary for RT-level power simulations. In gate-level simulations, all signals on standard-cell boundaries are monitored by default.
The design flow set up for this exercise consists of a number of scripts that you do not need to touch and a central file power-config in which you declare all relevant information for your synthesis and power estimations. Modify this file carefully to ensure that all scripts keep functioning correctly. The first variable to set is RUN_ID: a unique name that identifies your current experiment. Its value is used in the names of log files, SAIF files, etc. The meaning of the other variables is explained in the accompanying comments.
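Before launching a long run, it can be worth sanity-checking the configuration. The sketch below assumes that power-config consists of shell-style NAME=value lines with # comments; that format, the helper names, and the value n for GATE_CLOCK are assumptions for illustration, so adapt them to the actual file.

```python
from typing import Dict, Iterable


def parse_power_config(lines: Iterable[str]) -> Dict[str, str]:
    """Parse shell-style NAME=value lines; '#' starts a comment (assumed format)."""
    config: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return config


def check_power_config(config: Dict[str, str]) -> None:
    """Fail early on mistakes that would break later steps of the flow."""
    if not config.get("RUN_ID"):
        raise ValueError("RUN_ID must be set to a unique experiment name")
    if any(ch.isspace() for ch in config["RUN_ID"]):
        raise ValueError("RUN_ID is used in file names; avoid whitespace")


if __name__ == "__main__":
    example = [
        "# hypothetical excerpt of power-config",
        "RUN_ID=nocg_small",
        "GATE_CLOCK=n   # y enables automatic clock gating",
    ]
    cfg = parse_power_config(example)
    check_power_config(cfg)
    print(cfg["RUN_ID"], cfg.get("GATE_CLOCK") == "y")  # nocg_small False
```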
Files that are explicitly generated by the scripts are located in the subdirectory synopsys_out.
The following scripts are available (in principle, you need to execute them all in the given order):
This script also adds automatic clock gating to the design if the variable GATE_CLOCK in the configuration file has the value y. Clock gating disables the clock of flipflops when it can be predicted that they will not change value in a given clock cycle (see the slides).
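The idea can be illustrated with a register that has a load enable. In the behavioral sketch below (plain Python with hypothetical names, not what the synthesis tool generates), both versions store exactly the same values, but with a gated clock the flipflops only receive a clock edge in the cycles where they are loaded.

```python
from typing import List, Sequence, Tuple


def simulate_register(data: Sequence[int], enable: Sequence[int],
                      gated: bool) -> Tuple[List[int], int]:
    """Cycle-based model of a register with a load enable.

    Returns the register contents after each cycle and the number of clock
    edges that actually reached the flipflops.
    """
    q = 0
    clock_edges = 0
    trace: List[int] = []
    for d, en in zip(data, enable):
        if gated:
            # Gated clock: the flipflops only receive an edge when en == 1
            if en:
                clock_edges += 1
                q = d
        else:
            # Free-running clock: an edge every cycle; a feedback path
            # (e.g. a multiplexer) keeps the old value when en == 0
            clock_edges += 1
            if en:
                q = d
        trace.append(q)
    return trace, clock_edges


if __name__ == "__main__":
    data = [5, 6, 7, 8, 9, 10]
    enable = [1, 0, 0, 1, 0, 0]
    print(simulate_register(data, enable, gated=False))  # ([5, 5, 5, 8, 8, 8], 6)
    print(simulate_register(data, enable, gated=True))   # ([5, 5, 5, 8, 8, 8], 2)
```

The fewer clock edges reach the flipflops, the less switching activity there is in the clock network and the registers, which is where the expected power saving comes from.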
Apart from the log file, this script generates hierarchical and flattened standard-cell netlists in VHDL and .db format as well as an SDF file of the flattened design with information on delays. All file names start with <run_id> and have appropriate suffixes and extensions. You should perform all simulations with the flattened netlist. Note that the clock-gating cell is not a single standard cell, but consists of multiple gates. Flattening preserves the hierarchy of the clock-gating cell.
The script generates two power reports: a hierarchical one based on the design before flattening and a later one based on the flattened design.
In general, VLSI design involves three performance parameters: area, speed, and power. To keep things simple, speed is not considered in this project: the clock frequency is fixed at the relatively low value of 5 MHz (a clock period of 200 ns). You can therefore concentrate on the trade-off between area and power. One would expect that reducing power consumption costs area, but this is not always the case.
As mentioned earlier, the 16-bit gcd architecture of entity siso_gen is used as a vehicle to explore the design flow for power estimation and low-power design. There are two input data streams for this design: gcd16_small.in contains small numbers and gcd16_large.in contains large numbers (actually the same numbers as in gcd16_small.in multiplied by 127). More power is expected to be consumed for the large numbers, as more of the significant bits are involved in the calculations.
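A quick way to see why scaling by 127 should raise the activity is to count the bit toggles on a bus that carries successive operands. The operand values below are hypothetical and not taken from gcd16_small.in; the helper names are also illustrative.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_toggles(values: Sequence[int]) -> int:
    """Total bit toggles on a bus that carries the given values in sequence."""
    return sum(hamming(prev, cur) for prev, cur in zip(values, values[1:]))


if __name__ == "__main__":
    small = [12, 18, 9, 6]              # hypothetical operands, not from gcd16_small.in
    large = [127 * v for v in small]    # scaled by 127 as described for gcd16_large.in
    print(bus_toggles(small), bus_toggles(large))  # 12 18
```

This only counts toggles on one bus for one short sequence; the power reports of the flow remain the reference for the real design.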
One of the goals of this exercise is to investigate the effects of automatic clock gating. This investigation should be performed for both the data stream with small numbers and the one with large numbers. This results in four design variants: with and without clock gating, for each of the two input streams. Four <run_id> values have already been predefined for these four cases in the configuration file.
First consider the case with small numbers and no clock gating, identified by RUN_ID=nocg_small. Compile all relevant VHDL files (design, testbenches, and configurations) and make sure that the GCDs are indeed computed. Then go through all steps of the design flow as explained above. Do not forget to compile the flattened gate-level netlist and to check that it is the netlist actually instantiated in the simulation. Write down the area as well as the RT-level and gate-level power estimates.
Repeat the steps above for the three other cases. Make sure that you modify the value of the RUN_ID variable when investigating a new case; this way, the log files of earlier runs are not overwritten.
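Since all generated file names start with the run ID and the generated files end up in synopsys_out, a small check before starting a run can catch a forgotten RUN_ID change. This is a hypothetical helper; it only looks in synopsys_out, so log files stored elsewhere are not covered.

```python
import os
from typing import Iterable, List


def outputs_for_run(run_id: str, names: Iterable[str]) -> List[str]:
    """File names that start with run_id (prefix match, as in the flow's naming)."""
    return sorted(name for name in names if name.startswith(run_id))


def run_id_in_use(run_id: str, out_dir: str = "synopsys_out") -> bool:
    """True if out_dir already holds files for run_id.

    Note: a plain prefix test also matches longer IDs that begin with run_id;
    this errs on the side of caution.
    """
    if not os.path.isdir(out_dir):
        return False
    return bool(outputs_for_run(run_id, os.listdir(out_dir)))


if __name__ == "__main__":
    rid = "nocg_small"
    if run_id_in_use(rid):
        print(f"RUN_ID {rid} already has outputs; choose a new RUN_ID")
```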
Now compare all power and area figures and comment on them. It turns out that the designs with clock gating have a smaller area than their counterparts without clock gating. One would expect the opposite as clock gating itself costs area. Explain what is going on (hint: check the standard cells in both designs and consult the data sheet (see exercise SYN) to understand the differences).
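When comparing the four runs, relative differences with respect to one baseline (for instance nocg_small) are often easier to discuss than absolute figures. The sketch below is a hypothetical helper; fill in the area and power values from your own reports.

```python
from typing import Dict, Tuple


def relative_change(reference: float, candidate: float) -> float:
    """Change of candidate w.r.t. reference in percent (negative = reduction)."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (candidate - reference) / reference


def compare_runs(runs: Dict[str, Tuple[float, float]],
                 baseline: str) -> Dict[str, Tuple[float, float]]:
    """runs maps RUN_ID -> (area, power); returns (area %, power %) vs baseline."""
    ref_area, ref_power = runs[baseline]
    return {rid: (relative_change(ref_area, area), relative_change(ref_power, power))
            for rid, (area, power) in runs.items()}
```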
The given design is a maximally parallel implementation of the filter in which 5 multipliers are instantiated. Think of ways to reduce the power consumption while keeping 5 multipliers. Motivate each modification that you propose and its expected effect on power consumption. Then implement the proposal and estimate its power at the gate level (RT-level estimates tend to be less accurate). Also keep track of the area of each solution that you implement.
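The filter structure is not described in this section. Assuming a 5-tap direct-form FIR filter, y[n] = Σ c_k·x[n−k], a behavioral reference model like the one below makes the "one multiplier per coefficient" picture concrete and can generate expected outputs for a testbench. The coefficient values and function name are hypothetical.

```python
from typing import List, Sequence


def fir_direct_form(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k].

    In a maximally parallel hardware version every product below has its own
    multiplier, and all of them are evaluated in the same clock cycle.
    """
    delay_line = [0] * len(coeffs)          # x[n], x[n-1], ..., x[n-N+1]
    out: List[int] = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift in the new sample
        out.append(sum(c * t for c, t in zip(coeffs, delay_line)))
    return out


if __name__ == "__main__":
    coeffs = [3, -1, 4, 1, -5]              # hypothetical 5-tap coefficient set
    print(fir_direct_form([1, 0, 0, 0, 0, 0], coeffs))  # [3, -1, 4, 1, -5, 0]
```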
Once you have sufficiently explored the maximally parallel implementation, investigate implementations with 3, 2, or 1 multipliers. Increase the clock frequency (both in the power-config file and in the VHDL testbench configuration, by assigning an appropriate value to the generic parameter half_clock_period) such that all implementations keep consuming an input sample and producing an output sample every 200 ns. While these solutions should show a decrease in area, an increase in power consumption is generally expected (e.g. due to multiplexing uncorrelated data streams).
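The arithmetic for the faster clock, and the effect of multiplexing, can be sketched as follows. The timing function assumes that the 5 products per sample are spread evenly over the available multipliers and that no extra control cycles are needed; the stream values are hypothetical. Note that for 2 multipliers the clock period of 200/3 ns is not a whole number of nanoseconds, so the testbench may need a finer time unit (or a different schedule).

```python
import math
from fractions import Fraction
from typing import List, Sequence, Tuple

SAMPLE_PERIOD_NS = 200     # one input and one output sample every 200 ns
PRODUCTS_PER_SAMPLE = 5    # multiplications per sample in the 5-multiplier design


def clock_timing(n_multipliers: int) -> Tuple[int, Fraction, Fraction]:
    """Cycles per sample, clock period and half clock period (ns).

    Assumes the products are spread evenly over the multipliers and that no
    extra control cycles are needed.
    """
    cycles = math.ceil(PRODUCTS_PER_SAMPLE / n_multipliers)
    period = Fraction(SAMPLE_PERIOD_NS, cycles)
    return cycles, period, period / 2


def bus_toggles(values: Sequence[int]) -> int:
    """Total bit toggles on a bus that carries the given values in sequence."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def interleave(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Time-multiplex two operand streams onto one shared multiplier input."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


if __name__ == "__main__":
    for n in (5, 3, 2, 1):
        cycles, period, half = clock_timing(n)
        print(n, cycles, float(period), float(half))
    a = [100, 101, 102, 103]         # hypothetical slowly varying stream
    b = [3000, 3001, 3002, 3003]     # hypothetical stream unrelated to a
    # Separate multiplier inputs versus one shared, time-multiplexed input
    print(bus_toggles(a) + bus_toggles(b), bus_toggles(interleave(a, b)))
```

Each stream on its own changes only a few low-order bits per sample, but alternating between the two streams on a shared input flips many bits every cycle, which illustrates why sharing a multiplier tends to increase switching activity.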
Keep investigating alternative architectures until you have spent some 40 to 45 hours on project POW. Clever design solutions that reduce power are, of course, the main goal. However, if you cannot improve a design, you may also consider modifications that are expected to increase the power consumption and verify your expectation with the power-estimation flow. Keep your designs realistic, though, and do not make designs that are extremely wasteful in power consumption.
Implement the CORDIC in accordance with the guidelines given above. Make your design decisions with power consumption in mind (and motivate them in your report). Due to the complexity of the design, it may not be feasible to make multiple implementations. Even if you have only a single design variant, you should estimate its power with the flow presented here and comment on the results.
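The guidelines for the CORDIC are not part of this section, so the sketch below is only a generic floating-point model of rotation-mode CORDIC (computing cosine and sine), which may be useful to produce expected values for a testbench. The function name and default iteration count are hypothetical; a hardware version would typically use fixed-point registers, shifts instead of the multiplications by 2^-i, and a small table of arctangent constants.

```python
import math
from typing import Tuple


def cordic_rotate(theta: float, iterations: int = 16) -> Tuple[float, float]:
    """Rotation-mode CORDIC: returns approximately (cos(theta), sin(theta)).

    Floating-point reference model; not a description of any specific
    hardware implementation.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    if abs(theta) > sum(angles):
        raise ValueError("theta outside the CORDIC convergence range")
    # Gain compensation: product of 1/sqrt(1 + 2^(-2i)) over all iterations
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0     # rotate towards a residual angle of zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * gain, y * gain
```

Each iteration only needs additions, subtractions, and shifts, which is one reason CORDIC is attractive in hardware; the number of iterations trades accuracy against latency, area, and switching activity.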