VLSI System Design

Project POW: Power Estimation and Design for Low Power

This project is part of the examination for the VLSI System Design course at the University of Twente. The goals of this project are:

To gain practical experience in performing power simulations by means of Modelsim and the Synopsys tools Design Compiler and Power Compiler.
To apply your knowledge on design for low power to the design-space exploration of a small design.

It is assumed that the student is familiar with VHDL simulation and synthesis basics as taught in the System-on-Chip Design (SoC) course. If you feel uncertain about the topic, please review (and redo) the exercises VHD and SYN, the descriptions of which are available from the web page of that course. The software environment (tool versions) is the same as for the current exercise.

The description below refers to various file names. Once you have logged in, execute the command:

get-module pow pow

to get them in a subdirectory pow, or copy them manually from the directory /home/practice/soc/exercise/modules/pow to directory pow (do not forget the "invisible" file .synopsys_dc.setup).

The circuit for this exercise is almost the same as in exercise SYN mentioned above: it consists of the entity siso_gen (generic serial-in serial-out module) with architecture gcd (greatest common divisor, using Euclid's algorithm). For practical reasons, the scan signals related to design for testability have temporarily been removed (commented out in the VHDL code).

Principles of Power Estimation

You are already familiar with the Synopsys Design Compiler. The Synopsys Power Compiler adds functionality related to power estimation and low-power optimizations to the Design Compiler. Both tools are available from the same command-line interface dc_shell-t. You do not need to deal with this interface directly as scripts are available that call this interface automatically. However, the scripts are not fully hidden from you as they are in the SoC course. If you know what you are doing, you can modify the scripts (at your own risk).

As you know from the theory, power consumption in digital CMOS circuits is strongly related to switching activity. In order to collect realistic information on switching activity, the design should be simulated using data streams that are representative of the actual use of the integrated circuit. The Modelsim simulator is used for that purpose in the design flow of this exercise.

The flow uses simulations at two abstraction levels:

At the register transfer (RT) level. This is the level of the VHDL source files in which memory elements and combinational logic can be identified. Information can be collected about the switching behavior of every memory element output and every primary input in a bit-by-bit fashion. This information can be used to guide the logic synthesis process in the Design Compiler as well as to obtain an estimation of the power consumption after logic synthesis.
At the gate level. This implies the simulation of the netlist as produced by the Design Compiler. In this case, the switching activity of every input and output of every gate in the design can be monitored. This information can then be used for a fairly accurate estimation of the power consumption (it is the most accurate method possible using logic simulation).

The communication between the Synopsys tools and Modelsim is based on files that use the Switching Activity Interchange Format (SAIF). There are two types of SAIF file. Forward SAIF files are generated by Synopsys and read by Modelsim. They list the signals of which the simulator should monitor the switching activity. Backward SAIF files are used for the reverse communication. They require a Synopsys run-time library that is linked to Modelsim and takes care of collecting the information on switching activity. The backward SAIF file will contain the following annotations for the signals that have been monitored:

T0: the time that the signal had logic value '0'.
T1: the time that the signal had logic value '1'.
TX: the time that the signal had an unknown logic value 'X'.
TC: the number of '0'-'1' and '1'-'0' transitions observed.
IG: the number of '0'-'X'-'0' and '1'-'X'-'1' transitions observed.

This information seems to be sufficient for performing a power estimation at the desired level of accuracy.
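As an illustration of how these annotations relate to power, the sketch below derives a static probability and a toggle rate from T0, T1, TX and TC and feeds them into the textbook first-order formula for switching power. It is not the computation performed by Power Compiler, which also relies on cell data from the technology library. The record layout, the load capacitance, the supply voltage and the time unit are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class SaifRecord:
    """Annotations of one monitored signal (times in SAIF time units)."""
    t0: float  # time at logic '0'
    t1: float  # time at logic '1'
    tx: float  # time at unknown 'X'
    tc: int    # number of 0->1 and 1->0 transitions

def static_probability(rec):
    """Fraction of the simulated time that the signal was at logic '1'."""
    return rec.t1 / (rec.t0 + rec.t1 + rec.tx)

def toggle_rate(rec, time_unit_s):
    """Transitions per second over the simulated interval."""
    duration_s = (rec.t0 + rec.t1 + rec.tx) * time_unit_s
    return rec.tc / duration_s

def switching_power(rec, time_unit_s, load_cap_f, vdd_v):
    """First-order estimate: each transition dissipates about C * Vdd^2 / 2."""
    return 0.5 * load_cap_f * vdd_v ** 2 * toggle_rate(rec, time_unit_s)

# Hypothetical net: 1000 time units of 1 ns, 50 transitions, 20 fF load, 1.8 V.
rec = SaifRecord(t0=600, t1=400, tx=0, tc=50)
print(static_probability(rec))                  # fraction of time at '1'
print(toggle_rate(rec, 1e-9))                   # transitions per second
print(switching_power(rec, 1e-9, 20e-15, 1.8))  # watts
```

The same toggle count spread over a longer simulation gives a lower toggle rate, which is why the input data stream and the simulated duration both matter for the estimate.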

Forward SAIF generation is only necessary for RT-level power simulations. In gate-level simulations all signals on standard-cell boundaries are automatically monitored by default.

Design Flow

Before you can do any type of power simulation or optimization, you should first make sure that your design is correct. You should work as you have learnt in the SoC Design course: write your VHDL code, perform interactive Modelsim simulations, etc. The check-design script from the SoC Design course that checks your coding style with LEDA is also available. Synthesis is performed by the scripts described below. However, you should verify your synthesized design with interactive simulations before going ahead with gate-level simulations for the purpose of power estimation.

The design flow that has been set up for this exercise consists of a number of scripts that you do not need to touch and a central file power-config in which you declare all relevant information for your synthesis and power estimations. This file should always be modified carefully in order to ensure that all scripts function well. The first variable to set is RUN_ID. It should be a unique name that identifies your current experiment. Its value is used in log files, SAIF files, etc. The meanings of other variables are explained in the accompanying comments.

Files that are explicitly generated by the scripts are located in the subdirectory synopsys_out.

The following scripts are available (in principle, you need to execute them all in the given order):

syn-fwd-saif. This script generates a forward SAIF file for the RTL VHDL code specified in the configuration file. This file has the name rtl_fwd_<run_id>.saif where <run_id> is the value of the variable RUN_ID mentioned above. The log file belonging to the execution of this script has the name log_rtl_fwd_<run_id>. You should always check your log files for error messages. Warnings in log files cannot always be avoided; it is left to your judgement whether something can be done to avoid the generation of warnings.
sim-rtl. This script will run an RT-level simulation with Modelsim in batch mode. In this process the forward SAIF file is read and the backward SAIF file with name rtl_bwd_<run_id>.saif is generated. It simulates the VHDL configuration that is the value of variable RTL_CONF in the configuration file. Make sure that you have already simulated the same configuration interactively and verified that the hardware behaves correctly before running sim-rtl.
syn-rtl. This script reads the backward SAIF file, performs logic synthesis, and generates a log file with information about timing, area, standard cells used and especially a power estimation based on the backward SAIF file. The name of the log file is: log_syn_rtl_<run_id>. You need to have run the script syn-fwd-saif before syn-rtl because this script does not read VHDL source files but the file <run_id>_rtl.db as created by syn-fwd-saif.
This script also adds automatic clock gating to the design if the variable GATE_CLOCK of the configuration file has value y. This amounts to disabling the clock of flipflops when it can be predicted that the flipflop will not change value in a given clock cycle (see the slides).
Apart from the log file, this script generates hierarchical and flattened standard-cell netlists in VHDL and .db format as well as an SDF file of the flattened design with information on delays. All file names start with <run_id> and have appropriate suffixes and extensions. You should perform all simulations with the flattened netlist. Note that the clock-gating cell is not a single standard cell, but consists of multiple gates. Flattening preserves the hierarchy of the clock-gating cell.
The script generates two power reports: a hierarchical one based on the design before flattening and a later one based on the flat design.
sim-gate. This script performs a gate-level simulation in batch mode using the VHDL configuration specified as the value of variable GATE_CONF in the configuration file. You are supposed to have verified the correctness of the design by simulating the same configuration interactively first (do not forget to include the SDF file in simulation as explained in the Modelsim manual available from the SoC Design web page). The result of this step is a backward gate-level SAIF file called gate_bwd_<run_id>.saif.
syn-gate. This script finally performs a gate-level power estimation using the backward SAIF file generated in the previous step. The result of the estimation can be found in the log file log_syn_gate_<run_id>.
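The file names mentioned in the script descriptions above can be collected in a small helper that reports which step of the flow has not yet produced its output for a given RUN_ID. This is only a bookkeeping sketch and not part of the provided scripts. It assumes that the log files are written to synopsys_out together with the SAIF and .db files, and the function names are made up for this example.

```python
import os

def flow_artifacts(run_id, out_dir="synopsys_out"):
    """Map each script, in flow order, to the files the text says it produces."""
    names = {
        "syn-fwd-saif": [f"rtl_fwd_{run_id}.saif",
                         f"log_rtl_fwd_{run_id}",
                         f"{run_id}_rtl.db"],
        "sim-rtl": [f"rtl_bwd_{run_id}.saif"],
        "syn-rtl": [f"log_syn_rtl_{run_id}"],
        "sim-gate": [f"gate_bwd_{run_id}.saif"],
        "syn-gate": [f"log_syn_gate_{run_id}"],
    }
    return {script: [os.path.join(out_dir, name) for name in files]
            for script, files in names.items()}

def first_incomplete_step(run_id, out_dir="synopsys_out"):
    """Return the first script whose output is missing, or None if all exist."""
    for script, paths in flow_artifacts(run_id, out_dir).items():
        if not all(os.path.exists(path) for path in paths):
            return script
    return None

# Example: which step should be run next for the first predefined case?
print(first_incomplete_step("nocg_small"))
```

A missing file only indicates that a step has not been run. A step that did run may still have failed, so the log files need to be read regardless.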

Exercise POW-1: Getting Started

There are, in general, three performance parameters involved in VLSI design: area, speed, and power. In order to keep things simple, speed is kept out of consideration in this project: the clock frequency is fixed to the relatively low frequency of 5 MHz (the clock period is 200 ns). This means that one can concentrate on the trade-off between area and power. One expects that improving the power consumption costs area, but this is not always the case.

As mentioned earlier, the 16-bit gcd architecture of entity siso_gen is used as a vehicle to explore the design flow for power estimations and low-power design. There are two input data streams for this design: gcd16_small.in contains small numbers and gcd16_large.in contains large numbers (actually the same numbers as gcd16_small.in multiplied by 127). It is expected that more power is consumed in the case of large numbers as more of the most significant bits are involved in the calculations.
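The effect of multiplying both operands by 127 can be checked with a few lines of code. The sketch below uses the subtractive form of Euclid's algorithm, which is an assumption: the given VHDL architecture may compute the GCD differently. The operand values are arbitrary examples and are not taken from the input files.

```python
def gcd_steps(a, b):
    """Subtractive Euclid for positive integers: return (gcd, subtraction count)."""
    steps = 0
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
        steps += 1
    return a, steps

small = (12, 18)                 # example operands, not from gcd16_small.in
large = (12 * 127, 18 * 127)     # the same operands scaled by 127

print(gcd_steps(*small))         # GCD and step count for the small pair
print(gcd_steps(*large))         # GCD is scaled by 127, step count is unchanged
print(max(small).bit_length(), max(large).bit_length())  # operand widths in bits
```

Scaling both operands by the same factor scales every intermediate value by that factor, so the sequence of steps stays the same and only the number of bits that carry data grows. Under this assumption, a difference in power between the two streams comes from wider data and not from longer computations.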

One of the goals of this exercise is to investigate the effects of automatic clock gating. This investigation should be performed both for the data streams with small and with large numbers. This results in four design variants, with and without clock gating for both input streams. Four <run_id> values have already been predefined for these four cases in the configuration file.
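The idea behind clock gating can be illustrated with a simple count of the clock edges that reach a register. This is a behavioral sketch only: it ignores the power and area of the gating cell itself, and the enable pattern is a made-up example that does not come from the gcd design.

```python
def clock_edges_at_register(enables, gated):
    """Count the rising clock edges that reach a register bank.

    enables[i] is True when the register loads a new value in cycle i.
    Without gating every cycle delivers an edge; with gating only the
    cycles in which the register is enabled do.
    """
    if gated:
        return sum(1 for enabled in enables if enabled)
    return len(enables)

# Hypothetical activity: a register that loads once every 8 cycles.
enables = [(cycle % 8) == 7 for cycle in range(64)]

print(clock_edges_at_register(enables, gated=False))  # edges without gating
print(clock_edges_at_register(enables, gated=True))   # edges with gating
```

The less often a register is enabled, the larger the fraction of clock edges that gating removes, so the benefit depends on the enable behavior of each register in the design.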

First consider the case with small numbers and no clock gating, identified by RUN_ID=nocg_small. Compile all relevant VHDL files (design, testbenches and configurations) and make sure that GCDs are indeed computed. Then go through all steps of the design flow as explained above. Do not forget to compile the flattened gate-level netlist and check that it is instantiated. Write down the area and the RT-level and gate-level estimations of power.

Repeat the step above for the three other cases. Make sure that you modify the value of the RUN_ID variable when investigating a new case. In this way, your log files are not overwritten.

Now compare all power and area figures and comment on them. It turns out that the designs with clock gating have a smaller area than their counterparts without clock gating. One would expect the opposite as clock gating itself costs area. Explain what is going on (hint: check the standard cells in both designs and consult the data sheet (see exercise SYN) to understand the differences).

Main Project

The main project comes in two variants: the standard version and the challenge version. You should choose ONLY one of the two!

Standard Version of Main Project: Power-Area Design-Space Exploration for a Second-Order IIR Filter

This exercise is a continuation of the exercise SEC of the System-on-Chip Design course. The files needed from that exercise are included in the bundle of files for the current project, POW. Please (re)familiarize yourself with that exercise: study the text until the end of exercise SEC-2 and perform some interactive simulations. Do not yet synthesize the design. Synthesis will be done as part of the power estimation flow.

Exercise POW-2

Apply the power-estimation flow to the given version of the filter, using a clock period of 200 ns and a word length of 10 bits. Assume that the given file sec.in is a representative input data stream for the purpose of power estimation. Do not introduce clock gating yet.

The given design is a maximally parallel implementation of the filter in which 5 multipliers are instantiated. Think of ways to improve the power consumption while having 5 multipliers. Motivate each modification that you propose and the expected effect on power consumption. Then implement the proposal and estimate its power at the gate level (RT-level estimations tend to be less accurate). Keep also track of the area of each solution that you implement.
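For reference, the sketch below shows where five multiplications come from in a second-order IIR section. It assumes a direct-form I structure with floating-point arithmetic. The filter of exercise SEC may use a different structure, and the hardware works with fixed-point words, so this is a behavioral reference only. The coefficient values are hypothetical.

```python
def biquad_step(x, state, coeffs):
    """One sample of a direct-form I second-order IIR section.

    state  = (x1, x2, y1, y2): the two previous inputs and outputs
    coeffs = (b0, b1, b2, a1, a2)
    Computes y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2.
    """
    x1, x2, y1, y2 = state
    b0, b1, b2, a1, a2 = coeffs
    # The five products that a maximally parallel design computes side by side.
    products = (b0 * x, b1 * x1, b2 * x2, a1 * y1, a2 * y2)
    y = products[0] + products[1] + products[2] - products[3] - products[4]
    return y, (x, x1, y, y1)

def run_filter(samples, coeffs):
    """Filter a list of samples, starting from an all-zero state."""
    state = (0, 0, 0, 0)
    outputs = []
    for x in samples:
        y, state = biquad_step(x, state, coeffs)
        outputs.append(y)
    return outputs

# Hypothetical coefficients, chosen as exact binary fractions.
print(run_filter([1, 0, 0, 0], (0.5, 0.25, 0.125, -0.5, 0.25)))
```

A model like this can serve as a golden reference when checking that an architectural change made for power reasons has not altered the filter's output.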

Once you have sufficiently explored the maximally parallel implementation, investigate implementations with 3, 2 or 1 multiplier. Increase the clock frequency (both in the power-config file and in the VHDL testbench configuration, by assigning an appropriate value to the generic parameter half_clock_period) such that all implementations keep consuming input samples and producing output samples every 200 ns. While these solutions should show a decrease in area, an increase in power consumption is expected in general (e.g. due to multiplexing uncorrelated data streams).
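The required clock settings follow from the number of clock cycles spent on each sample. The sketch below assumes that the schedule length is determined only by the five multiplications, with no extra cycles for control or input/output. Your own schedule may need more cycles, in which case the clock period shrinks accordingly.

```python
import math

SAMPLE_PERIOD_NS = 200   # one input and one output sample every 200 ns
MULTIPLICATIONS = 5      # multiplications per output sample in the given filter

def clock_settings(num_multipliers):
    """Return (cycles per sample, clock period in ns, half_clock_period in ns)."""
    cycles = math.ceil(MULTIPLICATIONS / num_multipliers)
    period_ns = SAMPLE_PERIOD_NS / cycles
    return cycles, period_ns, period_ns / 2

for n in (5, 3, 2, 1):
    print(n, clock_settings(n))
```

With 2 multipliers this assumption gives 3 cycles per sample and a clock period that is not a whole number of nanoseconds. Choosing a schedule of 4 cycles is one way to obtain round values at the cost of an idle slot.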

Keep investigating alternative architectures until you have spent some 40 to 45 hours on project POW. Clever design solutions that reduce power are of course the main goal. However, if you cannot improve a design, you may also consider design modifications that are expected to deteriorate the power consumption and verify your expectation by means of the power-estimation flow. Keep your designs realistic, however, and do not make designs that are extremely wasteful in power consumption.

Challenge Version of Main Project: Rotation-Mode CORDIC

Study the CORDIC algorithm as presented in the course VLSI Signal Processing (review paper by Andraka and slides). The goal is to build a rotation-mode CORDIC. Some guidelines:

The design should be implemented within the given SISO framework, which means that you can reuse the siso_gen entity and its testbench.
The system should consume an x and y value and output some scaled version of arctan(y/x).
The clock frequency should be such that a result is produced every 400 ns.
You are allowed to limit x and y to be positive values.
Accuracy, number of bits, a relevant input data stream and other specifications are left open. You are free to make reasonable choices.
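A software reference model helps when fixing the open specifications and when checking the VHDL later on. The sketch below computes a scaled arctan(y/x) for positive inputs using only shifts, additions and a small lookup table. In CORDIC literature, driving y towards zero while accumulating the angle is usually called vectoring mode. The iteration count, the output scaling and the number of guard bits are hypothetical choices and not requirements of the assignment.

```python
import math

ITERATIONS = 12
ANGLE_SCALE = 1 << 12    # hypothetical output scaling: radians times 4096
INPUT_GUARD_BITS = 4     # hypothetical extra fraction bits against truncation

# arctan(2^-i) as scaled integers, as a ROM in the hardware would hold them.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ANGLE_SCALE) for i in range(ITERATIONS)]

def cordic_arctan(x, y):
    """Return approximately arctan(y/x) * ANGLE_SCALE for integers x > 0, y >= 0."""
    x <<= INPUT_GUARD_BITS
    y <<= INPUT_GUARD_BITS
    z = 0
    for i in range(ITERATIONS):
        if y > 0:
            # Rotate clockwise by arctan(2^-i) to drive y towards zero.
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
        else:
            # Overshoot: rotate counterclockwise by the same angle.
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
    return z

# Compare the model with the floating-point result for a few input pairs.
for x, y in [(100, 100), (200, 50), (50, 200)]:
    print(x, y, cordic_arctan(x, y) / ANGLE_SCALE, math.atan2(y, x))
```

Varying ITERATIONS and the word widths in such a model shows how accuracy trades off against the number of clock cycles or hardware stages, both of which influence power.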

Exercise POW-3

Implement the CORDIC in accordance with the guidelines given above. Make your design decisions with power consumption in mind (motivate in your report). Due to the complexity of the design, it may not be feasible to make multiple implementations. Even if you have a single design variant, you should estimate its power with the flow presented here and comment on the results found.

Deliverables

The power and area figures obtained in POW-1 and a discussion of the results found.
For POW-2: motivation for the different design solutions implemented, tables containing all measurements (power and area), power-area plots and overall conclusions.
For POW-3: see POW-2.

Last update on: Sun Aug 27 22:25:49 CEST 2006 by Sabih Gerez.