- Sabih Gerez: Lectures.

- The course is scheduled in the third
quarter (2A). The series of lectures for the academic year 2021-2022
will start on
*Friday, February 11, 2022*at 13:45. As things are now, the lectures will take place on campus. Check MyTimetable for the locations; be aware that the classroom may change at the last moment. - A Canvas page for this course exists for organizational issues such as registering teams, uploading assignment results (maybe), and the distribution of part of the study material.
- It is the intention to stream and record the lectures by means of
*Canvas BigBlueButton conference*. Technical issues could, however, prevent this. - The current (open) page is the main source of information for this course. It contains the lecture schedule, links to slides and links to the study material, most of which is only accessible from the Univerity of Twente domain.

A yellow background color for the slide release date either
means that the version for the current academic year is available
(when the date mentions 2022) or that the slides have not been
updated for the current academic year. A pink background means
that only last year's slides are available for the moment.

Date |
Topic |
To study |
Slides |
Slides released on |

February 11, 2022 | Organization | - | Organization | February 11, 2022 |

February 11, 2022 | Introduction, Models of computation | [Par09] | Introduction | February 5, 2021 |

February 18, 2022 | Architecture synthesis and scheduling | [Ger99] | Architectural Synthesis | February 12, 2021 |

February 18, 2022 | Overlapped scheduling | [Ger98] | ||

February 25, 2022 |
No
lecture, holiday week |
|||

March 4, 2022 | Fixed-point design | [Bou08] | Fixed-Point Design | March 4, 2022 |

March 4, 2022 | The Arx RTL Language and Toolset |
[Hof07] |
Arx Version with audio |
March 4, 2022 |

March 11, 2022 | Algorithm transformations | [Par95] | Transformations
Addendum |
March 14, 2022 |

March 11 + March 18, 2022 | The CORDIC Algorithm | [And98] and [Loe00] | CORDIC | March 19, 2022 |

March 18, 2022 | Polyphase implementation of multirate filters | [Lan02] and [Vai90]
The part of the theory on downsampling is compulsory,
the part on upsampling is optional. |
Polyphase implementation | March 16, 2018 |

March 18 + March 25, 2022 | Multiplierless filter design | [Hew00], [Vor07], [Aks14] and [Kot03] | Multiplierless Filter Design | March 25, 2022 |

March 25, 2022 | Software synthesis | Sections I and II of [Bha00] | Software Synthesis | March 25, 2022 |

April 1, 2022 | Code generation | Sections III and IV of [Bha00] + [Goo05] + [Kes19] | Code Generation | March 26, 2021 |

April 1, 2022 | Modern DSP Architectures |
[Anj11] |
DSP Architectures | April 3, 2022 |

April 8, 2022 | No lecture |
- | - | - |

Most of the material can only be accessed through the collective
subscriptions of LISA, the library services of the University of
Twente. Such access is automatic when on campus. For off-campus
access, please consult the
information page of LISA on this topic. I can
recommend the *lean-library plug-in*.

- [And98]
- Andraka, R.,
*A Survey of CORDIC Algorithms for FPGA-Based Computers*, 6th International Symposium on Field Programmable Gate Arrays, Monterey, CA., pp 191-200, (1998). Online copy (only in UT domain). - [Aks14]
- Aksoy, L., P. Flores and J. Monteiro,
*A Tutorial on Multiplierless Design of FIR Filters: Algorithms and Architectures*, Circuits, Systems and Signal Processing, Vol. 33(6), pp 1689-1719, (2014). Online copy (only in UT domain). - [Anj11]
- Anjum, O, T. Ahonen, F. Garzia, J. Nurmi, C. Brunelli and H.
Berg,
*State-of-the-Art Baseband DSP Platforms for Software-Defined Radio: A Survey*, EURASIP Journal on Wireless Communication and Networking, Vol. 2011(5). Online copy (only in UT domain). - [Bha00]
- Bhattacharyya, S.S., R. Leupers and P. Marwedel,
*Software Synthesis and Code Generation for Signal Processing Systems*, IEEE Transactions on Circuits and Systems---II, Analog and Digital Signal Processing, Vol. 47(9), (September 2000). Online copy (only in UT domain).Caption of

*Figure 6*: last subscript of*y*should be*n-1*instead of*n*.Right column of

*Page 856*: Read*Figure 9(b)*where*9(a)*is mentioned and vice versa.Contents of

*Figure 17*: In order to be consistent with next figures, rewrite "x = a - b" and "y = a - b + c * d". - [Bou08]
- Bouganis, C.S. and G.A. Constantinides,
*Synthesis of DSP Algorithms from Infinite Precision Specifications*, In: P. Coussy and A. Morawiec (Eds.), High-Level Synthesis, From Algorithm to Digital Circuit, Springer, pp 197-214, (2008). Online copy (only in UT domain).Those interested in a detailed analysis of the

*probability density function*of the truncation error after multiplication can consult the followin non-compulsory paper:Ahmadi, A. and M. Zwolinski,

*Fixed-Point Multiplication: A Probabilistic Bit-Pattern View*, Microelectronics Reliability, Vol. 51(4), pp 790-796, (April 2011). Online copy (only in UT domain).You can skip Seciton 11.3 (2D FIR filters).

Page 203, halfway bottom paragraph: twice add a minus sign to 2's exponent (so 2**n should become 2**-n).

Page 204, Equation 11.10: the "close" parenthesis with exponent 2 should move to the end of the equation.

- [Chi12]
- Chiueh, T.D., P.Y. Tsai, I.W. Lai,
*Baseband Receiver Design for Wireless MIMO-OFDM Communications,*Second Edition, IEEE, Wiley and Sons Singapore, (2012). Online copy of Chapter 9 (only in UT domain). - [Ger98]
- Gerez, S.H., S.M. Heemstra de Groot, E.R. Bonsma and M.J.M.
Heijligers,
*Overlapped Scheduling Techniques for High-Level Synthesis and Multiprocessor Realizations of DSP Algorithms*, In: J.C. Lopez, R. Hermida and W. Geisselhardt (Eds.), Advanced Techniques for Embedded System Design and Test, Kluwer Academic Publishers, Boston, pp 125-150, (1998).You can skip Section 6.5.3 on the

*efficient computation of the iteration-period bound*. - [Ger99]
- Gerez, S.H.,
*High-Level Synthesis (Chapter 12)*in*Algorithms for VLSI Design Automation*, John Wiley and Sons, Chichester, (1999).You can skip Section 12.4.3 on

*force-directed scheduling*. - [Goo05]
- Goossens, G., D. Lanneer and P. Dytrych,
*Design of Low Power Processor Cores using a Retargetable Tool Flow*, In: C. Piguet (Ed.), Low-Power Electronics Design, CRC Press, Boca Raton, (2005). - [He98]
- He, S. and M. Torkelson,
*Designing Pipeline FFT Processor for OFDM (De)modulation*, URSI International Symposium on Signals, Systems and Electronics, ISSSE'98, pp. 257-262, (1998). Online copy (only in UT domain). - [Hew00]
- Hewlitt, R.M. and E.S. Swartzlander,
*Canonical Signed Digit Representation for FIR Digital Filters*, IEEE Workshop on Signal Processing Systems, SiPS 2000, Lafayette, LA, pp 416-426, (2000). Online copy (only in UT domain). - [Hof07]
- Hofstra, K.L. and S.H. Gerez,
*Arx: A Toolset for the Efficient Simulation and Direct Synthesis of High-Performance Signal Processing Algorithms*, International Conference on High Performance Embedded Architectures and Compilers, Ghent, Belgium, (January 2007). Online copy (only in UT domain). - [Kes19]
- Kessler, C.W.,
"Compiling for VLIW DSPs", In: S.S. Bhattacharyya et al. (Eds.),
*Handbook of Signal Processing Systems*, Springer Nature, pp. 979-1020, (2019). Online copy (only in UT domain).Sections 1 and 2 are compulsory, the remaining sections are optional.

- [Kot03]
- Kotteri, K.A., A.E. Bell and J.E. Carletta,
*Quantized FIR Filter Design Using Compensating Zeros*, IEEE Signal Processing Magazine, pp 60-67, (November 2003). Online copy (only in UT domain).Optional text!

- [Lan02]
- Langlois, J.M.P., D. Al-Khalili and R.J. Inkol,
*Polyphase Filter Approach for High Performance, FPGA-Based Quadrature Demodulation*, Journal of VLSI Signal Processing, Vol. 32, pp 237-254, (2002). Online copy (only in UT domain).*Correction for Equation 6*: there should not be a factor 2 in front of h_LP(m). - [Loe00]
- Loehning, M., T. Hentschel and G. Fettweis,
*Digital Down Conversion in Software Radio Terminals*, 10th European Signal Processing Conference, EUSIPCO 2000, pp 1517-1520, (2000). Online copy. - [Par95]
- Parhi, K.K.,
*High-Level Algorithm and Architecture Transformations for DSP Synthesis*, Journal of VLSI Signal Processing, Vol. 9, pp 121-143, (1995). Online copy (only in UT domain).You can skip Sections 6 (folding) and 8 (relaxed look-ahead).

*Comments on Figure 6.*The issue is that unfolding can improve the processor utilization. The explanation in the paper is not correct.The schedule shown in Figure 6(b) is

*rate optimal*i.e. it repeats at the iteration-period bound (T0min) value of 3. In this period, the total of the computations to be performed is 9 (4 operations of 2 and 1 of 1) time units. The lower bound on the number of processors is 3 (=9/3). However, this bound cannot be met. The reason is that the schedule needs to repeat every 3 time units. This means that a separate processor is necessary for each of the operations A to D that take two time units (a processor that would execute two of them would require an iteration period of 4). One has an average processor utilization of 75% (9/12).Figure 6(c) shows a schedule of the graph after 2-unfolding. The unfolded graph contains 2 iterations of the original graph. This schedule is also rate optimal which means that the 2 iterations are executed in 6 time units. The optimal number of processor in this situation would be again 3 (=18/6). There now exists a schedule that reaches 100% processor utilization (the available 6 time units per processor can now be filled optimally with operations of 2 time units).

In Figure 6(b), the operations A0, B0, C0, D0 and E0 belong to one iteration. The schedule has an

*iteration period*of 3 (A1 starts 3 time units after A0, etc.) a*latency*of 7 (output on E0) and a*span*of 8 (end of D0).In Figure 6(c), the operations A0/A1, B0/B1, C0/C1, D0/D1 and E0/E1 belong to one iteration. The schedule has an

*iteration period*of 6 (A2 starts 6 time units after A0, etc.) and a*latency*and*span*of 12 (output on E1).*Comments on Figure 9(a)*. According to me, two inequalities are incorrect: r(A2) - r(M1) <= 2 and r(M4) - r(A3) <= -1. - [Par09]
- Park, H.W., H. Oh and S. Ha,
*Multiprocessor SoC Design Methods and Tools,*IEEE Signal Processing Magazine, pp. 72-79, (November 2009). Online copy (only in UT domain). - [Vai90]
- Vaidyanathan, P.P.,
*Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial*, Proceedings of the IEEE, Vol. 78(1), pp 56-93, (January 1990). Online copy (only in UT domain).Optional text.

- [Vor07]
- Voronenko, Y. and M. Pueschel,
*Multiplierless Multiple Constant Multiplication*, ACM Transactions on Algorithms, Vol. 3(2), (May 2007). Online copy (only in UT domain).Only study Section 1 (until page 6); the rest is optional.

For the practical parts of the exercises, you will need to connect
to server `xoc2.ewi.utwente.nl` following the
infrastructure-guidelines
page.

The projects for the
*2021-2022 edition* of this course are as given in the table below:

Acronym |
Title |
Max. points |
Nominal
Load |
Start
after lecture of |

MAP | Mapping Data-Flow Graphs to RTL Designs | 30 |
30 hours | March 4, 2022 |

TRA | Data-Flow-Graph Transformations | 10 |
10 hours | March 11, 2022 |

GFS | The GFSK Receiver | 60 |
60 hours | March 18, 2022 |

When ready with all projects, you should send me (Sabih Gerez) your reports for the three projects by e-mail.

The mark will be based on the reports and the defense of your work in an oral examination session. The mark is basically the sum of points obtained for the projects divided by 10:

The performance at the oral exam can lead to a correction of at most one point up or down. In principle, the members of a project team will receive the same mark unless there are strong indications of differences in performance.

The course needs to be terminated within the quarter in which it is taught. The delivery deadlines are as follows:

- The deadline for an unlimited final grade (maximum of 10) is Sunday, April 24, 2022, 22:00.
- The fallback deadline for a final grade limited to a maximum of 7 is Sunday, May 1, 2021, 22:00. If you choose for this deadline, this means that your grade is computed as max(7, "your grade based on the points collected by you, after correction during the oral session").

- The basics of VHDL.
- Using Questasim (Modelsim) for VHDL simulation.
- Using the scripts for VHDL synthesis with Synopsys, understanding synthesis reports on performance measures on speed and area.
- Performing post-synthesis VHDL simulations.

The students concerned do not need to carry out all exercises. It is left up to them to spend as much time as needed to achieve the goals mentioned above. Depending on the students' background 10 to 20 hours are supposed to be sufficient. No reports need to be delivered. This activity does not contribute to the grading of this course.

Before doing any exercise, you need to set up a connection from
your PC to server `xoc2.ewi.utwente.nl` following the
infrastructure-guidelines
page.

- My list of books on different aspects of VLSI design including VLSI for signal processing.

Go (back) to | Sabih's Home Page. |