March 2009 - February 2011



# Center for Embedded Systems

An NSF Industry/University Cooperative Research Center

## **Results Report: Years 1 & 2**
















## A bit of history

Originally called the Consortium for Embedded Systems, this organization was established in 2001 with support from Intel, Motorola, and Arizona State University. The goal was to build an ecosystem of knowledge and expertise in embedded systems.

#### Fast forward a decade

Now known as the Center for Embedded Systems, the Center runs a successful internship program, conducts joint industry-university research projects, provides research support for graduate students and faculty, awards fellowships to students, and administers a relevant embedded systems curriculum in collaboration with industry partners. In 2009, the National Science Foundation designated the Center an "industry / university cooperative research center" (I/UCRC), one of more than 50 centers in the country, each covering a different technology area.

#### Vision for the future

To compete successfully in today's knowledge economy, industry needs a skilled workforce with post-baccalaureate education covering a range of topics across several engineering disciplines. To this end, we are embarking on a program encompassing education, industrial training, innovative research and entrepreneurship. This report summarizes results in these areas for the Center's first two years of operation as an NSF I/UCRC.



#### Academic Members:





## **Industry Partners:**







Raytheon









Toyota





## **Center for Embedded Systems Research Areas of Expertise**

#### Power, Energy and Thermal Aware Design

- Low power circuit architectures and design tools
- Dynamic performance, power, energy and thermal management for multicore embedded systems
- Statistical variation aware design of digital systems
- Energy efficient architectures and code optimization for embedded systems

#### Electronic System-level Design (ESL) and Technologies

- Modeling and simulation
- Hardware/software co-design and optimization
- Trusted, reliable, and secure design

#### **Embedded Multicore Architectures and Programming**

- Network-on-Chip design and optimization
- Compilation of stream applications on multicore processors
- Highly power-efficient programmable accelerators
- Soft error resilient system design
- Design and programming of low power embedded systems
- Embedded GPU computing
- Temperature- and variations-aware architectures and programming

#### **Embedded Software System**

- Real-time scheduling
- Embedded systems for smart grids
- Middleware and VM for embedded systems
- Embedded software instrumentation and tools

#### **Cyber-Physical Systems**

- Modeling and simulation
- Model based formal verification and semi-formal testing
- Model based synthesis from high-level specifications

#### Integrated Circuit Technologies, Design, and Test

- Semiconductors for hostile environments
- Device physics and modeling
- Microelectronic device and sensor design and manufacturing
- Analog/RF/mixed signal circuit design and test
- Testing and silicon debug of digital circuits









## **Technology Advances and Economic Impact**

## CES Metrics/Economic Impact

| Metric                           |
|----------------------------------|
| Universities                     |
| Industry Members                 |
| Research Projects                |
| Faculty Researchers              |
| Professional (staff)             |
| Research Associates (bachelors)  |
| Research Associates (masters)    |
| Research Associates (doctorates) |
| Industry Internships             |
| Graduates / Hires                |
| Students through Curriculum      |
| Presentations                    |
| Research Publications            |
| Products / Tech Transfer         |
| Patents                          |





## Technology Advance: Design Tool for Mobile Low Power Processors

Embedded smart devices such as cellular phones and tablets have emerged as the new technology drivers for the semiconductor industry. According to a recent study there are 5.2 billion cellular phone subscribers worldwide, and the market has grown by 15% over the previous year.\* The smart phone market has grown by 72% in the same period. It is expected that the number of low power mobile processors that are used by such devices will hit the 500 million mark by 2015.\*\* CES Researchers work with the market leader in mobile low power processors aimed at smart phones and tablets.

Current-day mobile low power processor chips have evolved from earlier single-core processors into multi-core architectures that integrate 10-20 processor cores, 40-60 customized hardware units or accelerators, and many memory blocks. In other words, state-of-the-art mobile processors integrate upwards of 100 fairly complex intellectual property blocks (processor cores, hardware accelerators or memory blocks), or "IP blocks", into a single chip. Such architectures are emerging because of an ever-increasing need for higher performance, stringent low power requirements and short time to market. Consequently, mobile processor designers integrate several pre-designed IP blocks into a complex system-level multi-core architecture that is able to deliver the required performance within the limited power budget of the battery pack.

Because the overall chip architecture is, in fact, an integration of many IP blocks, the on-chip interconnection architecture that connects these IP blocks together into a cohesive system has emerged as a key determinant of mobile processor performance and power consumption. The interconnection architecture is implemented as a Network-on-Chip, much like the Internet of today, and is built up of routers that are connected to each other and to the IP blocks.

As part of a Center-sponsored project, experts on Network-on-Chip (NoC) developed a computer-aided design (CAD) tool chain for developing the NoC architecture for future mobile processor chips. The NoC tool chain can automatically generate a high performance and low power on-chip interconnection architecture that addresses several design requirements, including multiple traffic classes (such as guaranteed throughput and best effort), multiple use-cases, deadlock avoidance, multiple clock islands, bit-width optimization and router arity constraints. The tool chain performs in a matter of minutes a design task that takes several weeks of manual effort. Consequently, the synthesized interconnection architecture and the mobile processor as a whole exhibit better performance and lower power consumption, and take much less time to design.

## **Economic Impact:**


Next generation smart phones will have much higher performance requirements with the same or incrementally longer battery lifetimes as current phones. Consequently, future generations of mobile processor chips will integrate an ever-increasing number of IP blocks (several hundreds) on the same chip. The NoC design tool developed as part of CES research will help make these technology advancements possible for companies competing in the mobile low power processor market.

<sup>\*</sup> Oct 2010, http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats

<sup>\*\*</sup> http://www.petrovgroup.com/pdfs/March%2016%20%20IM%20processors%20 DigiTimes.pdf



## Noc-Noc ESL Design Infrastructure



## **Center Work Products**

## **Publications**

- 1. Skoufis, M.N. and Tragoudas, S., "An online failure detection method for data buses using multithreshold receiving logic," *IEEE Trans. Computers*, vol. 61, no. 2, pp. 187-198, 2012.
- 2. Hanumaiah, V. and Vrudhula, S., "Performance optimal online DVFS and task migration techniques for thermally constrained multi-core processors," *IEEE Transactions on Computer-Aided Design of Integrated Circuits*, vol. 30, pp. 1677-1690, Nov. 2011.
- 3. S. Gangadhar and S. Tragoudas, "A probabilistic approach to diagnose SETs," in *Proceedings of the 2011 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)*, pp. 261-267, October 3-5 2011.
- 4. Karmarkar, K. and Tragoudas, S., "Error correction encoding for multi-threshold capture mechanism," in *Proceedings of the 17th IEEE International On-Line Test Symposium, (IOLTS 2011)*, pp. 157-162, July 2011.
- 5. Pierce, L. and Tragoudas, S., "Multi-level secure JTAG architecture," in *Proceedings of the 17th IEEE International On-Line Testing Symposium (IOLTS 2011)*, pp. 208-209, July 13-15, 2011.
- 6. Gangadhar, S. and Tragoudas, S., "An analytical method for estimating SET propagation," in *Proceedings of the 27th IEEE VLSI Test Symposium (VTS 2011)*, pp. 197-202, May 1-5 2011.
- 7. Chalivendra, G., Hanumaiah, V., and Vrudhula, S., "A New Balanced 4-moduli set {2^k, 2^n-1, 2^n+1, 2^(n+1)-1} and its reverse converter design for efficient FIR filter implementation," in *Proceedings of the Great Lakes Symposium on VLSI (GLSVLSI)*, (Lausanne, Switzerland), May 2011.
- 8. Hanumaiah, V. and Vrudhula, S., "Reliability-aware thermal management for hard real-time applications on multi-core processors," in *Proceedings of Design, Automation and Test in Europe (DATE)*, (Grenoble, France), March 14-18, 2011.
- 9. Annapureddy, Y.S.R., Liu, C., Fainekos, G., and Sankaranarayanan, S., "S-TaLiRo: A Tool for Temporal Logic Falsification for Hybrid Systems," in *Proc. of Tools and Algorithms for the Construction and Analysis of Systems* (book), vol. 6650/2011, Saarbrucken, Germany, March 2011.
- 10. Hanumaiah, V. and Vrudhula, S., "Temperature-aware DVFS for hard real-time applications on multi-core processors," *IEEE Transactions on Computers*, 2011 (to appear).
- 11. Abbas, H. and Fainekos, G., "Linear Hybrid System Falsification through Local Search," Automated Technology for Verification and Analysis, vol. 6996, pp. 503-510, 2011.
- 12. Fainekos, G., "Tools and Algorithms for the Construction and Analysis of Systems," in *Proceedings of the 17th International Conference,* (TACAS), 2011.
- 13. Zhang, C. and Wang, H., "Reduction of parasitic capacitance impact in low-power SAR ADC," *IEEE Transactions on Instrumentation and Measurement*, issue 99, pp. 1-8, November 10, 2010.
- 14. Annapureddy, Y.S.R. and Fainekos, G., "Ant colonies for temporal logic falsification of hybrid systems," in *Proceedings of the 36th Annual Conference of IEEE Industrial Electronics*, (Glendale, AZ), November 2010.
- 15. Mohamed, M., Muhammad, R. and Harackiewicz, F.J., "Ultra wideband hybrid dielectric resonator antenna (DRA) with parasitic ring", in *Proceedings of the International Conference on Wireless Information technology and Systems*, 2010.
- 16. Morsy, M.M., Khan, M.R., and Harackiewicz, F.J., "Ultra wideband hybrid dielectric resonator antenna (DRA) with parasitic ring," in *Proceedings of the 2010 IEEE International Conference on Wireless Information Technology and Systems (ICWITS)*, (Honolulu, Hawaii), pp. 1-4, August 28 - September 3, 2010.
- 17. Skoufis, M.N. and Tragoudas, S., "On-line detection of random voltage perturbations in buses with multiple-threshold receivers," in *Proceedings of the 16th IEEE International On-Line Test Symposium, (IOLTS 2010)*, pp. 249-254, July 2010.
- 18. Skoufis, M.N., Karmarkar, K., Tragoudas, S., and Haniotakis, T., "A data capturing method for buses on chip," *IEEE Trans. on Circuits and Systems*, vol. 57-I, no. 7, pp. 1631-1641, 2010.
- 19. Zhang, C., Wang, H., and Yen, M., "Power analog circuit design for RFID sensing circuits," in *Proceedings of the 2010 IEEE International Conference on RFID*, pp. 16-21, April 14-15, 2010.
- 20. Karmarkar, K. and Tragoudas, S., "Scalable codeword generation for coupled buses," *Design Automation and Test in Europe (DATE)*, pp. 729-734, February 2010.
- 21. Lee, Y-H, Song, Y.W., Girme, R., Zaveri, S., and Chen, Y., "Replay debugging for multi-threaded embedded software," in *Proceedings of the IEEE/IFIP International Conferences on Embedded and Ubiquitous Computing*, 2010.

Arizona State University

## Robust Testing for Networked Control Systems and Mixed-Signal Systems

Researcher: Georgios Fainekos | Student: Yashwanth Annapureddy

### **Project Overview**

- This project addresses the problem of the functional verification of complex cyber-physical systems
- The goal of the project is the development of randomized algorithms and software tools for the detection of operating conditions that generate undesirable system behaviors

## **Highlights/Technology Transfer**

- Development of an automated testing technique for complex models developed in Simulink/Stateflow
- Development of an improved testing methodology for systems that can be modeled as affine hybrid automata
- Our framework allows the formalization of functional requirements and the early detection of design errors at two different stages:
  - 1. at the stage of eliciting the functional requirements
  - 2. at the stage of building a formal model of the system

#### **Executive Summary**

- The verification of the functional correctness of safety critical cyber-physical systems is a challenging and extremely urgent problem
- We developed randomized tools and algorithms that provide a fully automatic framework for the detection of system behaviors that do not satisfy such functional requirements
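The idea of searching at random for operating conditions that violate a requirement can be illustrated with a small sketch. This is not S-TaLiRo itself: the plant model, the requirement "the output always stays below a limit", and all names (`simulate`, `robustness`, `falsify`) are assumptions chosen for illustration. The robustness value is the smallest margin by which a trace satisfies the requirement, so a negative value means a violating behavior was found.

```python
import random

def simulate(gain, steps=200, dt=0.05):
    """Hypothetical plant: a lightly damped oscillator driven by a step of size `gain`."""
    x, v = 0.0, 0.0
    trace = []
    for _ in range(steps):
        a = gain - 2.0 * 0.2 * v - x  # unit natural frequency, damping ratio 0.2
        v += a * dt
        x += v * dt
        trace.append(x)
    return trace

def robustness(trace, limit=1.5):
    """Margin of the requirement 'always x < limit'; negative means violated."""
    return min(limit - x for x in trace)

def falsify(trials=500, seed=0):
    """Randomly sample the input gain and keep the least robust behavior seen."""
    rng = random.Random(seed)
    best_gain, best_rob = None, float("inf")
    for _ in range(trials):
        gain = rng.uniform(0.0, 1.2)
        rob = robustness(simulate(gain))
        if rob < best_rob:
            best_gain, best_rob = gain, rob
        if best_rob < 0:  # a falsifying input has been found
            break
    return best_gain, best_rob

if __name__ == "__main__":
    gain, rob = falsify()
    print(f"least robust gain: {gain:.3f}, robustness: {rob:.3f}")
```

Plain uniform sampling is used here only to keep the sketch short; the project's deliverables describe guided searches such as ant colony optimization.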

http://www.public.asu.edu/~gfaineko/

#### Model Based Development (MBD) cycle using S-TALIRO



## **Project Tasks Deliverables**

Development of a heuristic optimization algorithm (EACO) based on the methodology of Ant Colony Optimization for the temporal logic falsification problem

Theoretical framework for combining stochastic and robust testing on affine hybrid automata

Tuning of the parameters of EACO on a number of benchmark problems

Development of a theoretical framework for modular falsification



## **Project Overview**

The main research focus is to investigate ultra low power ADC circuits for the target telemetry circuit and other low-power sensing applications.

Developed circuit techniques can be evaluated by design engineers of the member companies, and potentially benefit their commercial design.

Developed sensing devices that consist of sol-gel sensor and telemetry circuits can be tested with commercial products from member companies, potentially helping the companies find new niche markets for their products.

## **Highlights/Technology Transfer**

Completed design of the telemetry circuit: both schematic and layout (using a 0.13 µm CMOS technology).

Developed circuit techniques to implement ultra low-power ADC circuits and far-field telemetry circuits.

Fabricated telemetry circuits and complete sensing devices containing both telemetry IC and sol-gel sensor will be the deliverables of the next phase of the project.

## System Block Diagram

#### Components that have been designed:

- Charge-pump rectifier
- Voltage regulator
- Resistance detection
- ADC



Arizona State University

## Replay Debugging for Multi-threaded and Multi-core Embedded Systems

Researcher: Yann-Hang Lee | Student: Young Wn Song

## **Project Overview**

When we analyze and debug embedded software:

- Instrumentation adds significant overhead due to the probe effect
- Conventional debugging doesn't work well for embedded software that is multi-threaded, I/O and timing dependent, and nondeterministic

Use reproducible execution in debugging, profiling, and program analysis:

- To avoid instrumentation overhead in real-time execution
- To emulate program execution on different architectures (single and multi-core)

## **Highlights/Technology Transfer**

#### Reproducible execution

- Execution sequence → partial order of synchronous events
- Preserve the order and apply the same messages and I/O → reproducible execution
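A minimal sketch of this idea, assuming the only synchronization events are entries into a single shared critical section: the first run records the order in which threads enter, and a second run forces the same order. The class `OrderedSection` and the function `run` are hypothetical names for illustration and are not the project's record/replay library.

```python
import threading

class OrderedSection:
    """Records the order in which threads enter a critical section,
    or enforces a previously recorded order when replaying."""

    def __init__(self, replay_order=None):
        self._cond = threading.Condition()
        self._replay = list(replay_order) if replay_order is not None else None
        self._next = 0
        self.recorded = []

    def run_event(self, thread_name, action):
        with self._cond:
            if self._replay is not None:
                # Block until the recorded order says it is this thread's turn.
                self._cond.wait_for(lambda: self._replay[self._next] == thread_name)
            self.recorded.append(thread_name)
            action()
            self._next += 1
            self._cond.notify_all()

def run(replay_order=None, events_per_thread=3):
    section = OrderedSection(replay_order)
    output = []

    def worker(name):
        for i in range(events_per_thread):
            section.run_event(name, lambda: output.append((name, i)))

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B", "C")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return section.recorded, output

if __name__ == "__main__":
    order, first = run()                     # recording run
    _, second = run(replay_order=order)      # replay run
    print(first == second)
```

Only the thread identity is logged per event, which keeps the recording step cheap; the replay run can then carry heavier instrumentation without changing the event order.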

#### **Event Order Variation Analysis**

- Verify that the execution with recording is the same as the one without recording
- Get event ordering with no probe effect through simulating timed execution behavior and processor scheduler
- Incorporate the multi-core scheduler for analysis of software migration from single core to multicore processors

## **Project Tasks Deliverables**

A record/replay library for Linux and VxWorks environments

An Eclipse plug-in to invoke record/replay-based debugger functionality

A record/replay facility for Android applications

Simulation-based probe-effect analyzer

Replay-based execution analysis, profiling, and race detection

CORBI project – integration and support to collect static and dynamic metrics for the targeted risk analysis model

## **Executive Summary**

To develop a record/replay framework to support program execution analysis:

- A recording step of minimal overhead to enable a replay of thread events in their original logical order.
- A replay of the program to support comprehensive analysis.

The results:

- Avoid instrumentation overhead which may alter the execution behavior.
- Use correct execution semantics at thread level to approximate the details of execution behavior at architectural level.

#### http://rts.lab.asu.edu/ReplayDebugger









SIUC Research Report | Southern Illinois University Carbondale

## Alien Hardware Detection in Integrated Circuits Through Delay Measurements and Computations

Researcher: Spyros Tragoudas

## **Project Overview**

A scalable algorithmic approach to determine the positions of the injected alien hardware

- Safeguarding intellectual property, secure design
- The presence of malicious integrated circuits (ICs) in warfare tools or medical equipment may lead to catastrophic consequences

#### Scalable Algorithmic Approach involving

- Integer linear program (ILP) modeling
- Bounded-Satisfiability modeling

## **Highlights/Technology Transfer**

## **Executive Summary**

Each constraint of the ILP is modeled as a clause with non-negated literals.

Clauses are bounded based on the excessive delay

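Consider the following constraints:

$$
\begin{aligned}
G1 + G9 &= 1 \\
G3 + G7 + G12 &\le 2 \\
G3 + G6 + G10 &= 1 \\
G8 + G12 &= 1 \\
G5 + G10 &= 0
\end{aligned}
$$

The CNF is modeled as:

$$F = (G1 \lor G9) \land (G3 \lor G6 \lor G10) \land (G8 \lor G12) \land (G3 \lor G7 \lor G12)$$

with each clause bounded by the right-hand side of its constraint (1, 1, 1 and 2 respectively).

Bounds on clauses: $(G5 \lor G10) = 0$ implies $G5 = 0$ and $G10 = 0$.

Diagnosis flow (figure): netlist, test pattern generation, compute path, CUT path.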
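The example constraints are small enough to check by exhaustive enumeration. The sketch below is illustrative only: it stands in for the SAT/ILP solvers used in the project, and it assumes that each variable is a 0/1 indicator for a gate being a suspect location. The names `CONSTRAINTS`, `satisfies` and `solutions` are hypothetical.

```python
from itertools import product

VARS = ["G1", "G3", "G5", "G6", "G7", "G8", "G9", "G10", "G12"]

# (gates appearing in the constraint, comparison, bound)
CONSTRAINTS = [
    (("G1", "G9"), "==", 1),
    (("G3", "G7", "G12"), "<=", 2),
    (("G3", "G6", "G10"), "==", 1),
    (("G8", "G12"), "==", 1),
    (("G5", "G10"), "==", 0),
]

def satisfies(assign):
    """Return True if a 0/1 assignment meets every bounded constraint."""
    for gates, op, bound in CONSTRAINTS:
        total = sum(assign[g] for g in gates)
        if op == "==" and total != bound:
            return False
        if op == "<=" and total > bound:
            return False
    return True

def solutions():
    """Enumerate all 0/1 assignments that satisfy the constraints."""
    found = []
    for bits in product((0, 1), repeat=len(VARS)):
        assign = dict(zip(VARS, bits))
        if satisfies(assign):
            found.append(assign)
    return found

if __name__ == "__main__":
    sols = solutions()
    print(len(sols), "satisfying assignments")
    print("G5 and G10 always 0:", all(s["G5"] == 0 and s["G10"] == 0 for s in sols))
```

Enumeration grows exponentially with the number of gates, which is why the zero-bounded clause matters: fixing G5 and G10 to 0 up front prunes the search space before a solver is invoked.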

"SAT based identification of multiple delay defects in integrated Circuits", internal document submitted to member.


| Project Tasks Deliverables | Timeline | Status |
|----------------------------|----------|--------|
| A scalable algorithmic approach to identify paths with excessive delays (where alien hardware exists) | March 2011 | Complete |
| An integer linear program driven software tool to identify a set of positions of the alien hardware | March 2011 | Complete |
| Software tool to prune the search space of the ILP and help identify the location of alien hardware with utmost accuracy | March 2011 | Complete |
| ATPG assisted software tool for better diagnosis of alien hardware | March 2011 | Complete |
| Software tool to identify all strong robustly (SR) testable path delays | March 2011 | Complete |
| Software tool to identify all Advanced Measurable (AM) paths | March 2011 | Complete |

Diagnosis flow (figure, continued): compare, restricted clauses and CNF, SAT solver, collect suspects.

Arizona State University

## Memory-Aware Compilation for Modern Multi-core Processors

Researcher: Aviral Shrivastava | Student: Ke Bai

## **Project Overview**

#### Problem / Rationale

Embedded systems are moving to multicore processors, but programming them is difficult. Our compiler technology makes extremely power-efficient distributed-memory multicore processors easier to program.

#### **Project Description**

Manage all the code and data of the application in a constant amount of space in the local memory of each core. Toward this, we will develop compiler techniques to:

- Manage stack data in limited space
- Manage heap data in limited space
- Manage code in limited space
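To make the notion of managing stack data in a constant amount of local memory concrete, the sketch below simulates one possible policy: when a new frame does not fit, the oldest resident frames are evicted to global memory, and they are fetched back when control returns to them. The policy, the class `LocalStack` and its methods are assumptions for illustration and are not the compiler techniques delivered by the project.

```python
class LocalStack:
    """Simulates stack frames held in a fixed-size local memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.local = []      # resident frames as (name, size), oldest first
        self.spilled = []    # frames evicted to global memory, oldest first
        self.transfers = 0   # number of frame moves between local and global memory

    def _used(self):
        return sum(size for _, size in self.local)

    def call(self, name, size):
        """Push a frame, evicting the oldest resident frames if space runs out."""
        if size > self.capacity:
            raise ValueError("frame larger than local memory")
        while self._used() + size > self.capacity:
            self.spilled.append(self.local.pop(0))
            self.transfers += 1
        self.local.append((name, size))

    def ret(self):
        """Pop the current frame and fetch the caller's frame back if it was evicted."""
        name, _ = self.local.pop()
        if not self.local and self.spilled:
            self.local.append(self.spilled.pop())
            self.transfers += 1
        return name

if __name__ == "__main__":
    stack = LocalStack(capacity=100)
    for name, size in [("main", 40), ("f", 40), ("g", 40), ("h", 50)]:
        stack.call(name, size)
    print([stack.ret() for _ in range(4)], stack.transfers)
```

In a compiler-based scheme the equivalent of `call` and `ret` would be inserted around function calls automatically, so the programmer never manages the transfers by hand.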

## **Executive Summary**

#### Irreversible trend toward multicores

- Distributed memory multicores are extremely power-efficient, but very difficult to program

#### Objective of this proposal

- Compiler does automatic memory management
- Greatly simplifies programming
- Able to exploit power-efficient execution

#### Work

- Automatic memory management
- Integrate in GCC

#### http://aviral.lab.asu.edu/?p=95

## **Highlights/Technology Transfer**

#### Cell SPU Compiler ready for delivery and on-site testing

#### 6 conference papers, 2 journal papers, and 4 Master's theses

- SDRM: Simultaneous Determination of Regions and Function-to-Region Mapping for Scratchpad Memories, HIPC 2009
- A software solution for dynamic stack management on scratch pad memory, ASPDAC 2009
- A Software-only solution to use Scratch Pads for Stack Data, TCAD 2009
- Dynamic code mapping for limited local memory systems, ASAP 2010
- Heap Data Management for Limited Local Memory (LLM) Multi-core Processors, CODES+ISSS 2010
- Stack Data Management for Limited Local Memory (LLM) Multi-core Processors, ASAP 2011
- Vector class on limited local memory (LLM) multi-core processors, CASES 2011
- A Software-Only Scheme for Managing Heap Data on Limited Local Memory (LLM) Multi-core Processors, TECS (to be published)

## **Project Tasks Deliverables**

| Task | Timeline | Status |
|------|----------|--------|
| Develop technique to manage code in limited space | Y1-2 | Complete |
| Integrating code management in GCC | Y1-2 | Ongoing |
| Develop technique to manage stack data in limited space | Y1-2 | Complete |
| Integrating stack data management in GCC | Y1-2 | Complete |
| Develop technique to manage heap data in limited space | Y1-2 | Complete |
| Integrating heap data management in GCC | Y1-2 | Complete |



Extremely power-efficient cell processor has distributed memory

Arizona State University

## Automated Design and Evaluation of Network-on-Chip Architectures for Communication Centric System-on-Chip Devices

Researcher: Karam Chatha | Students: Glenn Leary, Jyothi Arlagaddha

## **Project Overview**

#### Problem / Rationale

- Advent of multi-processor system-on-chip (MPSoC) devices with hundreds of IP blocks (processors, hardware accelerators and memories)
- Project addresses the design and evaluation of network-on-chip (NoC) based on-chip communication architecture for MPSoC devices

#### **Project Description**

- Project will generate the NoC architecture and its detailed evaluation for a commercial grade SoC aimed at wireless communication market
- NoC architecture will consist of a graph based description of the topology and register transfer-level description of the architecture
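A graph-based topology description lends itself to simple automated checks. The sketch below is a hypothetical example, not the ASU NoC tool chain: it describes a four-router topology as a dictionary, flags routers whose port count exceeds an arity limit, and counts router hops between two routers with a breadth-first search. The topology, the names and the port limits are all assumptions.

```python
from collections import deque

# Hypothetical topology: each router lists its neighbouring routers and attached IP blocks.
TOPOLOGY = {
    "r0": {"routers": ["r1", "r2"], "ips": ["cpu0"]},
    "r1": {"routers": ["r0", "r3"], "ips": ["cpu1", "dsp"]},
    "r2": {"routers": ["r0", "r3"], "ips": ["mem0"]},
    "r3": {"routers": ["r1", "r2"], "ips": ["accel"]},
}

def arity(topology, router):
    """Number of ports a router needs: links to routers plus links to IP blocks."""
    node = topology[router]
    return len(node["routers"]) + len(node["ips"])

def arity_violations(topology, max_ports):
    """Routers that need more ports than the library allows."""
    return sorted(r for r in topology if arity(topology, r) > max_ports)

def hops(topology, src, dst):
    """Minimum number of router-to-router links between two routers, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        router, dist = queue.popleft()
        if router == dst:
            return dist
        for nxt in topology[router]["routers"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

if __name__ == "__main__":
    print(arity_violations(TOPOLOGY, max_ports=3))
    print(hops(TOPOLOGY, "r0", "r3"))
```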

## **Highlights/Technology Transfer**

- PI and students worked closely with the member's R&D team to expand the ASU NoC tool chain capabilities
- Tool chain can now address multiple clock islands and varying bit-widths
- The ASU NoC tool chain is now fully integrated with the member's R&D NoC design process
- Design and evaluation of two commercial grade SoCs was conducted as part of the project

| Project Tasks Deliverables | Timeline | Status |
|----------------------------|----------|--------|
| Characterizing the communication requirements of the SoC, and constraints of the NoC library. | Quarter 1 | Complete |
| Generating the first RTL design for the NoC and evaluating its power/performance/resource usage. | Quarter 2 | Complete |
| Customization of the tool chain to target the SoC and library constraints | Quarter 3 | Complete |
| Generating the final NoC design and documenting its performance/power/resource usage | Quarter 4 | Complete |

## **Executive Summary**

Advent of multi-processor system-on-chip (MPSoC)

- Hundreds of IP blocks on a chip
- Daunting on-chip communication challenges
- Network-on-Chip as solution

Project focuses on network-on-chip design tools

- Evaluation with commercial grade SoC designs
- Integration with member R&D NoC design flow

## http://chatha.faculty.asu.edu/lab\_website/noc\_noc.html



SIUC Research Report | Southern Illinois University Carbondale

## **Dielectric Resonator Antennas (DRAs)**

Researcher: Frances J. Harackiewicz | Students: Hemachandra Gorla, David Addison, Mohammed M. Morsy

## **Project Overview**

#### Problem Statement

- Design of a wideband DRA (e.g. 2-18 GHz)
- Design of a wideband DRA for a low frequency band (e.g. 50 MHz)

#### **Project Description**

- Resonator antennas are good by nature
- Dielectric antennas can be smaller in size than metal radiators
- High dielectric materials will offer small size

## Executive Summary

Design several DRAs with CST software

Parametric analysis

Far-field simulation and impedance matching

Building the prototype

Measure the results with vector network analyzer and anechoic chamber

Compare the simulated and measured results

## Highlights/Technology Transfer

- Exploring the electrically small DRAs for wideband and low frequency applications
- Extend bandwidth and low frequency with complex designs using spirals
- Select the best simulated designs for building prototypes
- Several parametric simulations on the rectangular DRA and cylindrical DRA
- Validate prototype with measurements

| Project Tasks Deliverables | Timeline | Status |
|----------------------------|----------|--------|
| Investigating the primary designs for wideband (Rectangular DRA) | May 2011 | Complete |
| Investigating the primary design for low frequency band (Rectangular DRA, Cylindrical DRA) | June 2011 | Complete |
| Applying the parametric analysis of different parameters of the wideband DRA | July 2011 | Complete |
| Measurements for the cylindrical DRA for wideband DRA | August 2011 | Ongoing |
| Low frequency DRA simulation | August 2011 | Complete |



SIUC Research Report | Southern Illinois University Carbondale

## **Distance Estimation to a Transmitter With a Secure Network of Receivers**

Researcher: Spyros Tragoudas

## **Project Overview**

Problem / Rationale (benefit to industry members)

- Finding a transmitter at an unknown location using an array of receivers at known locations
- Inverse of telecommunications localization problem

## **Executive Summary**

#### Location approximation for radio-frequency (RF) power source

- Networked, intelligent on-line analysis of signal characteristics
- GPS unavailable

#### Approach

- No transmitter cooperation required
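One way to see why transmitter cooperation is not strictly needed is a received-power sketch. The example below is an assumption-laden illustration, not the project's method: it assumes a log-distance path loss model with a known exponent and a grid search over candidate positions. Because the transmit power is unknown, a candidate is scored by how consistently all receivers imply the same transmit power. The names `predicted_loss` and `locate` are hypothetical.

```python
import math

def predicted_loss(pos, rx, exponent):
    """Path loss in dB at distance d under an assumed log-distance model."""
    d = max(math.dist(pos, rx), 1e-6)  # avoid log of zero at a receiver location
    return 10.0 * exponent * math.log10(d)

def locate(receivers, powers_dbm, exponent=2.0, extent=10.0, step=0.5):
    """Grid search for the position where all receivers agree on the transmit power."""
    best, best_err = None, float("inf")
    n = int(extent / step) + 1
    for i in range(n):
        for j in range(n):
            pos = (i * step, j * step)
            # Received power plus path loss should equal the unknown transmit power.
            implied = [p + predicted_loss(pos, rx, exponent)
                       for p, rx in zip(powers_dbm, receivers)]
            mean = sum(implied) / len(implied)
            err = sum((v - mean) ** 2 for v in implied)
            if err < best_err:
                best, best_err = pos, err
    return best

if __name__ == "__main__":
    receivers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 2.0)]
    true_pos, tx_power = (3.0, 7.0), 20.0
    powers = [tx_power - predicted_loss(true_pos, rx, 2.0) for rx in receivers]
    print(locate(receivers, powers))
```

The synthetic measurements here are noise-free; real environments with obstructions would need a more robust model, which is consistent with urban refinements being listed as ongoing work.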



| Project Tasks Deliverables | Status |
|----------------------------|--------|
| Specifications for system to estimate distance on straight-line path to Tx | Complete |
| Specifications for system to estimate distance in low-obstruction environment | Complete |
| Refinements to system to cope with urban environments | Ongoing |

Arizona State University

## Modeling and Optimization of Energy Efficient Multicore Processors

Researcher: Sarma Vrudhula | Student: Vinay Hanumaiah

## **Project Overview**

#### Dynamic thermal management (DTM) of multi-cores

A unified framework to automatically, and dynamically control the speeds and voltages of multi-core processors, move tasks among cores, control the fan speed, etc., targeting various objectives

**Objectives:** Maximize performance, Minimize Peak Temperature, Maximize Performance/Watt

**Constraints:** Maximum temperature, frequency-voltage relationship, target application BER (bit-error rate), tasks' start and end times (deadlines)

**Deliverable:** Build a practical solution to deploy on real processors

## **Executive Summary**

#### Unified DTM framework for multi-core processors

- Objectives: maximum performance, minimum peak temperature, maximum performance/Watt, min. power
- Controls: DVFS, task migration and active cooling
- Constraints: maximum temperature, task deadlines, memory BER, complex frequency-voltage-BER relation

#### A DTM simulator incorporating above

- Handle hundreds of cores
- Enable design space exploration

#### **Closed-loop DTM controller**

- Intel Sandy Bridge processor
- Real-time control for above objectives
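The closed-loop idea can be sketched with a toy model. Everything below is an assumption for illustration and is not the Center's controller or the Magma simulator: a first-order thermal model, a cubic power-versus-frequency relation, made-up constants, and a simple hysteresis rule that lowers the frequency level near the temperature limit and raises it again once the core has cooled.

```python
FREQS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical DVFS levels

def power_watts(freq_ghz):
    """Assumed dynamic power model: power grows with the cube of frequency."""
    return 2.0 * freq_ghz ** 3

def step_temperature(temp, power, ambient=45.0, r_th=0.8, tau=2.0, dt=0.1):
    """First-order thermal model: temperature relaxes toward ambient + R * P."""
    steady = ambient + r_th * power
    return temp + (steady - temp) * dt / tau

def run_controller(t_max=80.0, steps=600):
    """Hysteresis DVFS control: throttle near t_max, speed up after cooling."""
    temp, level = 45.0, len(FREQS_GHZ) - 1
    history = []
    for _ in range(steps):
        if temp > t_max - 1.0 and level > 0:
            level -= 1   # too hot: drop one frequency level
        elif temp < t_max - 5.0 and level < len(FREQS_GHZ) - 1:
            level += 1   # enough headroom: raise one frequency level
        temp = step_temperature(temp, power_watts(FREQS_GHZ[level]))
        history.append((FREQS_GHZ[level], temp))
    return history

if __name__ == "__main__":
    history = run_controller()
    print("peak temperature:", round(max(t for _, t in history), 2))
    print("frequencies used:", sorted({f for f, _ in history}))
```

A rule like this only respects the temperature constraint; the objectives listed above (maximum performance, performance per Watt) call for optimizing the frequency schedule rather than reacting to thresholds.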

## **Highlights/Technology Transfer**

Magma - a fast and accurate thermal-aware design architectural simulator that incorporates

- Design space exploration
- Dynamic thermal management: DVFS and task migration
- Leakage dependence on temperature

Built on Matlab<sup>™</sup>. Utilizes HotSpot and PTScalar simulators.

Released a major stable version 2.0 incorporating DVFS and task migration.

Source available for free download at http://vrudhula.lab.asu.edu/magma

| Project Tasks Deliverables | Timeline | Status |
|----------------------------|----------|--------|
| Performance optimal dynamic voltage and frequency scaling (DVFS) | Y1-2 | Complete |
| Performance optimal task migration | Y1-2 | Complete |
| Minimizing peak temperature using DVFS with constraints on start and end times | Y1-2 | Complete |
| Maximizing performance/Watt using DVFS, task migration and active cooling | Y1-2 | Complete |
| Minimizing power consumption using optimal voltage constraints on target Bit Error Rate in Memories | Y1-2 | Complete |
| DTM - I - I for a second allow for all second at the s                                              |      | Ongoing  |







## **Project Overview & Description**

**Problem / Rationale** (benefit to industry members)

- Faster, more reliable on-chip communication
- Improved performance of the network on chip.

#### **Project Description**

- Multi-threshold capture mechanism improves bus performance
- Inbuilt redundancy helps achieve error correction with less redundancy

## **Executive Summary**

Data to be transmitted is encoded using proposed technique and sent over the bus

Multi-threshold comparators determine the range of transient voltage on the bus line

Receiver FSM predicts the transmitted code-word based on information gathered from the comparators

Decoder recovers the original data from the code-word predicted by the Receiver FSM

Throughput is improved by using pipelined architecture

## **Highlights/Technology Transfer**

#### Multi-threshold latching technique proposed to improve bus performance

Based on prediction of behavior of the bus in presence of crosstalk

#### Error detection and correction capabilities

- Early latching technique is capable of error detection to a certain extent but cannot perform error correction.
- Proper bus encoding technique used with proposed method can extend the error detection capability to error correction.
- Such encoding will make the proposed method more robust

#### Scalability

Scalability of code-word selection algorithm is improved using binary decision diagrams

| Project Tasks Deliverables | Timeline | Status |
|----------------------------|----------|--------|
| Development of early clock on-chip mechanisms and architectures for buses | 2009 | Complete |
| Scalable code-word generation for coupled buses | March 2010 | Complete |
| Error correction encoding for single bit adjacent range errors | July 2011 | Complete |
| Scalability of code-word selection algorithm for error correction | Dec. 2011 | Complete |




Arizona State University

## Platform Level Dynamic Switching Between Loosely Timed/Approximately Timed SystemC Models

Researcher: Karam Chatha | Student: Haesung Lee

## **Project Overview**

#### Problem / Rationale (benefit to industry members)

- SystemC TLM 2.0 has emerged as the de facto standard for system-level performance modeling
- Typically there are two distinct SystemC models (loosely timed, LT, and approximately timed, AT) that are developed

#### **Project Description**

- Develop a standardized SystemC template to combine LT/AT coding style into one single model with dynamic LT/AT switching capability
- Develop platform level switching mechanisms that ensure the continuous functional correctness and facilitate architectural exploration.

## **Highlights/Technology Transfer**

- Developed a novel standardized template and co-modeling format for LT/AT dynamic switching
- Integrated functional models for ARM cores from Imperas OVP for evaluations
- Developed a NoC library that incorporates LT/AT dynamic switching
- Implemented 3 benchmarks to evaluate the model: FFT, DCT, and autocorrelation
- Evaluated designs with up to 16 cores

## **Project Tasks Deliverables**

| Task | Timeline | Status |
|------|----------|--------|
| Obtain and familiarize with the following models: processor (ISS and bus accurate), memory controller and memory (TLM-2.0), interconnect/buses (TLM-2.0), and peripherals (I/O and hardware accelerators in TLM-2.0). | Quarter 1 | Complete |
| Create a test platform with simple application | Quarter 2 | Complete |
| Investigate and create mechanism for dynamic switching between LT and AT models. | Quarter 3 | Complete |
| Evaluate the switching mechanisms with a sample application and create report | Quarter 4 | Complete |

## **Executive Summary**

SystemC TLM 2.0 widely used for system-level performance modeling

- Typically two distinct SystemC models (loosely timed, LT, and approximately timed, AT) are developed and maintained
- LT is fast/inaccurate, AT is slow/more accurate
- Concerns with functional equivalence, and simulation speed of AT

Developed a standard for integrated LT/AT SystemC modeling

- Can dynamically trade-off simulation speed for accuracy
- Evaluated with 16 core ARM design





## **Results Report: Years 1 & 2**

**Dr. Sarma Vrudhula**, Center Director, ASU, (480) 965-4748, Vrudhula@asu.edu

**Dr. Spyros Tragoudas**, Site Director, SIUC, (618) 453-7027, Spyros@engr.siu.edu