VLSI System Design

Project POW: Power Estimation and Design for Low Power

This project is part of the examination for the VLSI System Design course at the University of Twente. The goals of this project are:

It is assumed that the student is familiar with the basics of VHDL simulation and synthesis as taught in the System-on-Chip Design (SoC) course. If you feel uncertain about the topic, please review (and redo) the exercises VHD and SYN, the descriptions of which are available from the web page of that course. The software environment (tool versions) is the same as for the current exercise.

The description below refers to various file names. Once you have logged in, execute the command:

get-module pow pow

to get them in a subdirectory pow, or copy them manually from the directory /home/practice/soc/exercise/modules/pow to directory pow (do not forget the "invisible" file .synopsys_dc.setup).

The circuit for this exercise is almost the same as in exercise SYN mentioned above: it consists of the entity siso_gen (generic serial-in serial-out module) with architecture gcd (greatest common divisor, using Euclid's algorithm). For practical reasons, the scan signals related to design for testability have temporarily been removed (commented out in the VHDL code).
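As a reminder of what the circuit computes, the sketch below gives Euclid's algorithm in Python. It uses the subtraction-based variant, which is common in hardware because it needs only a comparator and a subtractor. Whether the gcd architecture uses exactly this variant is an assumption, and the function name gcd_subtract is made up for this illustration.

```python
def gcd_subtract(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers by repeated subtraction."""
    if a <= 0 or b <= 0:
        raise ValueError("both operands must be positive")
    while a != b:
        # Replace the larger operand by the difference; the GCD is preserved.
        if a > b:
            a -= b
        else:
            b -= a
    return a


if __name__ == "__main__":
    print(gcd_subtract(48, 18))
```

The number of loop iterations depends on the operand values, so the amount of switching in the circuit depends on the data as well.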

Principles of Power Estimation

You are already familiar with the Synopsys Design Compiler. The Synopsys Power Compiler adds functionality related to power estimation and low-power optimizations to the Design Compiler. Both tools are available from the same command-line interface dc_shell-t. You do not need to deal with this interface directly as scripts are available that call this interface automatically. However, the scripts are not fully hidden from you as they are in the SoC course. If you know what you are doing, you can modify the scripts (at your own risk).

As you know from the theory, power consumption in digital CMOS circuits is strongly related to switching activity. In order to collect realistic information on switching activity, the design should be simulated using data streams that are representative of the actual use of the integrated circuit. The Modelsim simulator is used for that purpose in the design flow of this exercise.
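The relation between switching activity and power can be made concrete with the textbook first-order model for dynamic power, P = alpha * C * Vdd^2 * f, where alpha is the average number of 0-to-1 transitions per clock cycle. The sketch below is only an illustration: the bit sequence, load capacitance and supply voltage are made-up values, and the function names are hypothetical. It is not how the Synopsys tools compute power.

```python
def rising_edge_activity(bits):
    """Average number of 0->1 transitions per clock cycle in a sampled bit stream."""
    if len(bits) < 2:
        return 0.0
    rising = sum(1 for prev, cur in zip(bits, bits[1:]) if prev == 0 and cur == 1)
    return rising / (len(bits) - 1)


def dynamic_power(alpha, c_load, vdd, f_clk):
    """First-order dynamic power in watts: alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * f_clk


if __name__ == "__main__":
    # Made-up values of one signal, sampled once per clock cycle.
    node = [0, 1, 1, 0, 1, 0, 0, 1, 0]
    alpha = rising_edge_activity(node)
    # Assumed values: 10 fF load, 1.8 V supply, 5 MHz clock.
    p = dynamic_power(alpha, 10e-15, 1.8, 5e6)
    print(f"alpha = {alpha:.3f}, P = {p * 1e9:.2f} nW")
```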

The flow uses simulations at two abstraction levels:

The communication between the Synopsys tools and Modelsim is based on files that use the Switching Activity Interchange Format (SAIF). There are two kinds of SAIF file. Forward SAIF files are generated by Synopsys and read by Modelsim. They list the signals whose switching activity the simulator should monitor. Backward SAIF files are used for the reverse communication. They require a Synopsys run-time library that is linked to Modelsim and takes care of collecting the information on switching activity. The backward SAIF file will contain the following annotations for the signals that have been monitored:

This information seems to be sufficient for performing a power estimation at the desired level of accuracy.

Forward SAIF generation is only necessary for RT-level power simulations. In gate-level simulations all signals on standard-cell boundaries are automatically monitored by default.

Design Flow

Before you can do any type of power simulation or optimization, you should first make sure that your design is correct. You should work as you have learnt in the SoC Design course: write your VHDL code, perform interactive Modelsim simulations, etc. The check-design script from the SoC Design course that checks your coding style with LEDA is also available. Synthesis is performed by the scripts below. However, you should verify your synthesized design with interactive simulations before going ahead with gate-level simulations for the purpose of power estimation.

The design flow that has been set up for this exercise consists of a number of scripts that you do not need to touch and a central file power-config in which you declare all relevant information for your synthesis and power estimations. This file should always be modified carefully in order to ensure that all scripts function well. The first variable to set is RUN_ID. It should be a unique name that identifies your current experiment. Its value is used in log files, SAIF files, etc. The meanings of other variables are explained in the accompanying comments.

Files that are explicitly generated by the scripts are located in the subdirectory synopsys_out.

The following scripts are available (in principle, you need to execute them all in the given order):

Exercise POW-1: Getting Started

There are, in general, three performance parameters involved in VLSI design: area, speed, and power. In order to keep things simple, speed is kept out of consideration in this project: the clock frequency is fixed to the relatively low frequency of 5 MHz (the clock period is 200 ns). This means that one can concentrate on the trade-off between area and power. One expects that improving the power consumption costs area, but this is not always the case.

As mentioned earlier, the 16-bit gcd architecture of entity siso_gen is used as a vehicle to explore the design flow for power estimations and low-power design. There are two input data streams for this design: gcd16_small.in contains small numbers and gcd16_large.in contains large numbers (actually the same numbers as gcd16_small.in multiplied by 127). It is expected that more power is consumed in the case of large numbers as more of the most significant bits are involved in the calculations.
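This expectation can be checked informally before running any tool by counting how many bits toggle between consecutive 16-bit words in a stream and in the same stream multiplied by 127. The sketch below does this. The sample values are made up (they are not taken from gcd16_small.in), the function names are hypothetical, and the toggle count on an input bus is only a crude proxy for the power of the whole circuit.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def bus_toggles(words):
    """Total number of bit flips between consecutive words on a 16-bit bus."""
    return sum(bin((a ^ b) & MASK).count("1") for a, b in zip(words, words[1:]))


def highest_active_bit(words):
    """Index of the most significant bit that is ever set (-1 if none)."""
    return max(w & MASK for w in words).bit_length() - 1


if __name__ == "__main__":
    small = [12, 18, 35, 21, 100, 75]      # hypothetical operands
    large = [127 * w for w in small]       # same operands scaled by 127
    assert all(w <= MASK for w in large)   # scaled values still fit in 16 bits
    for name, stream in (("small", small), ("large", large)):
        print(name, bus_toggles(stream), highest_active_bit(stream))
```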

One of the goals of this exercise is to investigate the effects of automatic clock gating. This investigation should be performed for the data streams with small numbers as well as those with large numbers. This results in four design variants: with and without clock gating for both input streams. Four <run_id>'s have already been predefined for these four cases in the configuration file.

First consider the case with small numbers and no clock gating, identified by RUN_ID=nocg_small. Compile all relevant VHDL files (design, testbenches and configurations) and make sure that GCDs are indeed computed. Then go through all steps of the design flow as explained above. Do not forget to compile the flattened gate-level netlist and check that it is instantiated. Write down the area, and the RT-level and gate-level estimations of power.

Repeat the step above for the three other cases. Make sure that you modify the value of the RUN_ID variable when investigating a new case. In this way, your log files are not overwritten.

Now compare all power and area figures and comment on them. It turns out that the designs with clock gating have a smaller area than their counterparts without clock gating. One would expect the opposite as clock gating itself costs area. Explain what is going on (hint: check the standard cells in both designs and consult the data sheet (see exercise SYN) to understand the differences).

Main Project

The main project comes in two variants: the standard version and the challenge version. You should choose ONLY one of the two!

Standard Version of Main Project: Power-Area Design-Space Exploration for a Second-Order IIR Filter

This exercise is a continuation of the exercise SEC of the System-on-Chip Design course. The files needed from that exercise are included in the bundle of files for the current project, POW. Please (re)familiarize yourself with that exercise: study the text until the end of exercise SEC-2 and perform some interactive simulations. Do not yet synthesize the design. Synthesis will be done as part of the power estimation flow.

Exercise POW-2

Apply the power-estimation flow to the given version of the filter, using a clock period of 200 ns and a word length of 10 bits. Assume that the given file sec.in is a representative input data stream for the purpose of power estimation. Do not introduce clock gating yet.

The given design is a maximally parallel implementation of the filter in which 5 multipliers are instantiated. Think of ways to improve the power consumption while keeping 5 multipliers. Motivate each modification that you propose and state its expected effect on power consumption. Then implement the proposal and estimate its power at the gate level (RT-level estimations tend to be less accurate). Also keep track of the area of each solution that you implement.
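For orientation, a second-order IIR filter in direct form computes each output sample from five products, which is where the five multipliers come from. The Python sketch below is a behavioural floating-point model only. The filter structure used in exercise SEC, its coefficient names and its 10-bit fixed-point arithmetic may differ, and the coefficient values in the usage example are made up.

```python
def biquad(samples, b0, b1, b2, a1, a2):
    """Direct-form I second-order IIR filter.

    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        # Five multiplications per output sample.
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        # Shift the delay lines.
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    print(biquad(impulse, 0.25, 0.5, 0.25, -0.5, 0.25))
```

A model like this can serve as a reference when checking that a modified architecture still produces the same output stream.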

Once you have sufficiently explored the maximally parallel implementation, investigate implementations with 3, 2 or 1 multiplier. Increase the clock frequency (both in the power-config file and in the VHDL testbench configuration, by assigning an appropriate value to the generic parameter half_clock_period) such that all implementations keep consuming input samples and producing output samples every 200 ns. While these solutions should show a decrease in area, an increase in power consumption is expected in general (e.g. due to multiplexing uncorrelated data streams).
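The required clock settings follow from the fixed sample period of 200 ns: a design that needs N clock cycles per sample must run with a clock period of 200/N ns. The sketch below tabulates this under the assumption that the five multiplications are the only constraint, so that N = ceil(5/m) for m multipliers; a real schedule may need more cycles, in which case N can be passed explicitly. The function name is hypothetical, and the half-period is the kind of value that would be assigned to half_clock_period.

```python
import math

SAMPLE_PERIOD_NS = 200.0   # one input and one output sample every 200 ns
MULTIPLICATIONS = 5        # products per output sample


def clock_settings(multipliers, cycles_per_sample=None):
    """Return (cycles per sample, clock period in ns, half clock period in ns)."""
    if cycles_per_sample is None:
        # Lower bound: assumes the multiplications alone set the schedule length.
        cycles_per_sample = math.ceil(MULTIPLICATIONS / multipliers)
    period = SAMPLE_PERIOD_NS / cycles_per_sample
    return cycles_per_sample, period, period / 2


if __name__ == "__main__":
    for m in (5, 3, 2, 1):
        n, period, half = clock_settings(m)
        print(f"{m} multiplier(s): {n} cycle(s)/sample, "
              f"period {period:g} ns, half period {half:g} ns")
```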

Keep investigating alternative architectures until you have spent some 40 to 45 hours on project POW. Clever design solutions that reduce power are of course the main goal. However, if you cannot improve a design, you may also consider design modifications that are expected to deteriorate the power consumption and verify your expectation by means of the power-estimation flow. Keep your designs realistic, however, and do not make designs that are extremely wasteful in power consumption.

Challenge Version of Main Project: Rotation-Mode CORDIC

Study the CORDIC algorithm as presented in the course VLSI Signal Processing (review paper by Andraka and slides). The goal is to build a rotation-mode CORDIC. Some guidelines:

Exercise POW-3

Implement the CORDIC in accordance with the guidelines given above. Make your design decisions with power consumption in mind (motivate in your report). Due to the complexity of the design, it may not be feasible to make multiple implementations. Even if you have a single design variant, you should estimate its power with the flow presented here and comment on the results found.
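As a software reference against which a hardware implementation could be checked, the sketch below gives a floating-point model of the rotation-mode CORDIC iteration: the vector (x, y) is rotated by the angle z through a sequence of micro-rotations by atan(2^-i), with the direction of each micro-rotation chosen to drive z towards zero. The number of iterations, the function name and the choice to compensate the gain before iterating are assumptions of this sketch and not requirements of the project. A hardware version would use fixed-point shifts and additions and a lookup table for the angles.

```python
import math


def cordic_rotate(x, y, z, iterations=16):
    """Rotate (x, y) by angle z (radians, |z| below about 1.74) in rotation mode."""
    # Gain of the micro-rotations: product of sqrt(1 + 2^(-2i)).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Pre-scale so that the result has the same length as the input vector.
    x, y = x / gain, y / gain
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate in the direction that reduces z
        shift = 2.0 ** (-i)              # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # table lookup in hardware
    return x, y, z


if __name__ == "__main__":
    c, s, residual = cordic_rotate(1.0, 0.0, math.pi / 6)
    # c and s approximate cos(30 deg) and sin(30 deg); residual is close to zero.
    print(c, s, residual)
```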

Deliverables


Last update on: Sun Aug 27 22:25:49 CEST 2006 by Sabih Gerez.