This project explores temperature sensor architectures for 3DIC System-on-Chips. Conventional temperature sensors employ carefully tuned voltage references to eliminate Vdd sensitivity, incurring significant area and power overhead. In this project, we employ multiple diverse, imperfect sensors and rely on sensor fusion to extract temperature. Rather than eliminating supply-voltage dependency with circuitry at each sensor, we remove it through statistical-learning-based characterization of the sensors. The result is an ultra-modular, low-footprint, mostly-digital temperature sensor that can connect directly (without a regulated supply voltage) to a wide range of digital supply voltages. Its low hardware footprint allows it to be placed ubiquitously across the chip stack to provide temperature information that is fine-grained in both time and location, bringing opportunities for future thermal and power management applications.
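As a minimal sketch of the sensor-fusion idea, suppose two imperfect sensors each respond linearly to both temperature and Vdd, with different sensitivities. The linear model, the function name `fuse_two_sensors`, and the coefficient layout are assumptions made for illustration only; the project's actual characterization is statistical-learning based and is not described here.

```python
def fuse_two_sensors(reading1, reading2, model1, model2):
    """Recover (temperature, vdd) from two Vdd-sensitive sensor readings.

    Each hypothetical model is a tuple (k_temp, k_vdd, offset) such that
        reading = k_temp * temperature + k_vdd * vdd + offset
    The two sensors must have different sensitivity ratios, otherwise the
    2x2 system is singular and temperature cannot be separated from Vdd.
    """
    a1, b1, c1 = model1
    a2, b2, c2 = model2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("sensors are not diverse enough to separate T from Vdd")
    y1 = reading1 - c1
    y2 = reading2 - c2
    # Cramer's rule on the 2x2 linear system
    temperature = (y1 * b2 - y2 * b1) / det
    vdd = (a1 * y2 - a2 * y1) / det
    return temperature, vdd


# Usage with made-up coefficients: both sensors see T = 50 and Vdd = 0.9
m1 = (2.0, 100.0, 5.0)
m2 = (-1.0, 150.0, 0.0)
r1 = 2.0 * 50 + 100.0 * 0.9 + 5.0
r2 = -1.0 * 50 + 150.0 * 0.9 + 0.0
temp_est, vdd_est = fuse_two_sensors(r1, r2, m1, m2)
```

With more than two sensors, the same idea would extend to a least-squares or learned model rather than an exact solve.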
Research
Energy-efficient, Robust True Random Number Generators (TRNGs)

The increase in the volume of private data communication between networked devices has created a demand for true random number generators (TRNGs), which are key building blocks in a variety of digital cryptographic systems. Hardware TRNG implementations typically exploit device noise as an entropy source for random bit generation [2]. TRNGs based on timing jitter are simpler to implement in scaled technologies but dissipate more power and achieve lower performance than metastability-based TRNGs [4]. However, metastability-based TRNGs typically require careful design and calibration to mitigate the impact of PVT variation. Moreover, both jitter-based and metastability-based TRNGs in standard CMOS exhibit weak randomness due to correlation arising from substrate coupling and 1/f noise, affecting both the quality and output rate of generated bits.
This project presents an energy-efficient, versatile, NIST-compliant TRNG architecture capable of generating random bits from an imperfect physical random-number generator (PRNG) with significant levels of bias and correlation. The key idea behind the proposed architecture is to exploit efficient hardware implementations of 1) a Markov chain-based de-correlator that removes the autocorrelation of an incoming PRNG bit-stream, and 2) a 4-level Iterative Von Neumann (IVN) corrector that removes bias from the de-correlated bits. The ASIC implementation of the TRNG, combined with a PRNG based on sense-amplifier metastability, achieves a peak energy efficiency of 2.58 pJ/bit while operating over a wide voltage and frequency range of 0.5–1 V and 4.4–200 MHz, respectively.
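The bias-removal step can be illustrated in software. The sketch below is a depth-limited, Peres-style iteration of the Von Neumann corrector, assuming the input bits are already independent (that is, already de-correlated). It is not the hardware implementation described above; the function name and the `levels` parameter are chosen for illustration.

```python
def iterated_von_neumann(bits, levels=4):
    """Depth-limited iterated Von Neumann debiasing (Peres-style sketch).

    For each non-overlapping pair (a, b):
      * if a != b, emit a (the classic Von Neumann rule);
      * a XOR b is appended to a secondary stream;
      * if a == b, a is appended to a residual stream.
    The secondary and residual streams still carry entropy that the classic
    corrector throws away, so they are processed recursively, up to `levels`.
    """
    if levels == 0 or len(bits) < 2:
        return []
    out, xor_stream, same_stream = [], [], []
    # zip drops a trailing unpaired bit when the length is odd
    for a, b in zip(bits[0::2], bits[1::2]):
        xor_stream.append(a ^ b)
        if a != b:
            out.append(a)
        else:
            same_stream.append(a)
    return (out
            + iterated_von_neumann(xor_stream, levels - 1)
            + iterated_von_neumann(same_stream, levels - 1))


raw = [0, 1, 1, 0, 0, 0, 1, 1]
classic = iterated_von_neumann(raw, levels=1)   # plain Von Neumann corrector
deeper = iterated_von_neumann(raw, levels=2)    # recovers one extra bit here
```

With `levels=1` this reduces to the classic Von Neumann corrector; each additional level extracts more output bits from the same input at the cost of more state.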
More details will be posted upon publication of this work.
MATIC
As a result of the increasing demand for deep neural network (DNN)-based services, efforts to develop dedicated hardware accelerators for DNNs are growing rapidly. However, while accelerators that achieve high performance and efficiency on convolutional deep neural networks (Conv-DNNs) have been developed, less progress has been made with regard to fully-connected DNNs (FC-DNNs), which are inherently memory-bound.
In this work, we propose MATIC (Memory Adaptive Training with In-situ Canaries), a methodology that enables aggressive voltage scaling of accelerator weight memories to improve the energy-efficiency of DNN accelerators. To enable accurate operation with voltage overscaling, MATIC combines the characteristics of destructive SRAM reads with the error resilience of neural networks in a memory-adaptive training process. Furthermore, PVT-related voltage margins are eliminated using bit-cells from synaptic weights as in-situ canaries to track runtime environmental variation. Demonstrated on a low-power DNN accelerator fabricated in 65 nm CMOS, MATIC enables up to 60-80 mV of voltage overscaling (3.3× total energy reduction versus the nominal voltage), or 18.6× application error reduction.
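One way to picture memory-adaptive training is to model the bit-cells that fail at the overscaled voltage as stuck bits and to apply that fault pattern to the quantized weights in the training forward pass, so the network learns around them. The sketch below shows only the masking step for 8-bit weights. The mask representation and the function name `apply_sram_faults` are assumptions for illustration, not details taken from the papers.

```python
def apply_sram_faults(weights, and_masks, or_masks):
    """Apply a per-word stuck-bit model to 8-bit quantized weights.

    and_masks: a 0 bit marks a cell assumed to read as 0 (stuck-at-0).
    or_masks:  a 1 bit marks a cell assumed to read as 1 (stuck-at-1).
    All three lists hold unsigned 8-bit integers of equal length.
    """
    return [((w & a) | o) & 0xFF
            for w, a, o in zip(weights, and_masks, or_masks)]


# Hypothetical profile: word 0 has bit 7 stuck at 0 and bit 0 stuck at 1,
# word 1 is fault-free.
stored = [0b10101010, 0b00001111]
and_masks = [0b01111111, 0b11111111]
or_masks = [0b00000001, 0b00000000]
as_read = apply_sram_faults(stored, and_masks, or_masks)
```

In a training loop, the masked weights would be used for inference and loss computation while the unmasked weights receive the updates. That split is also an assumption of this sketch.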
Further details about this work can be found in the DATE paper and in the expanded TCAS-I paper.
High-Density Neural Signal Recording

Chronic brain-computer interface (BCI) applications face several key engineering challenges. Future BCIs will require both high electrode density and large spatial coverage, resulting in thousands of electrodes. BCIs require closed-loop neuromodulation, which generates large stimulation artifacts that obscure important signals shortly after stimulation. Power density requirements due to tissue heating remain restrictive, particularly in monolithic solutions. Additionally, a single-chip solution with efficient operation for both electrocorticography (ECoG) (<500 Hz signals) and single-neuron recording (<10 kHz signals) is highly desirable.
In this project, we demonstrate a channel-, process-, and frequency-scalable recording system in standard TSMC 65 nm CMOS. Key contributions of this architecture to the state of the art are: 10x higher recording channel density through highly multiplexed recording channels; robust operation that builds high-precision recording from low-precision data conversion; and real-time common-mode and differential-mode artifact suppression at the amplifier inputs. The system scales gracefully in frequency and channel count without significantly affecting efficiency, making it useful for a variety of biopotential acquisition applications.
More details can be found in our VLSI Symposium paper.
Computationally locked PLLs

Multi-core server processors, heterogeneous mobile SoCs, and an increasing number of IoT applications can experience significant power and performance benefits from PLL lock-time reduction during wakeup (cold-start) and re-lock. Existing PLLs feature lock times of approximately 100 REFCLK cycles for re-lock. Fast-lock techniques have been proposed, but they assume no temperature variation, require prior knowledge of PVT gain, or incur significant steady-state performance degradation. In this project, we proposed and successfully demonstrated a technique that performs runtime computation of accurate phase-frequency PLL equations to robustly achieve phase lock 8x more rapidly than traditional all-digital PLL (ADPLL) architectures. To further support Computational Lock, we also developed a novel wide-dynamic-range, high-resolution, and fast-resolving TDC architecture. Computational Lock does not impact steady-state PLL operation and can be applied to a broad range of ADPLLs. We demonstrate the proposed technique on a 1-2 GHz ADPLL intended for system clocking applications in 65 nm CMOS.
More details can be found in our VLSI Symposium paper.
Computationally Enabled Minimum Energy Point Tracking

Integrated circuits for ultra-low power applications strive to minimize total system energy while satisfying performance requirements. The supply voltage (Vdd) can be set to a Minimum Energy Point (MEP), where leakage and dynamic energy are suitably balanced. However, controlling operating frequency (fclk) while concurrently tracking a MEP sensitive to PVT and switching activity is not possible. Meanwhile, the traditional approach of locking to the minimum required frequency (ftarg), and adjusting Vdd to maintain timing slack precludes the possibility of minimum energy computing. Therefore, there exists a need for a minimum energy computing architecture that meets performance requirements.
Prior work has demonstrated MEP tracking across PVT and switching activity variation. The approach relies on large capacitors for sample-and-hold operation at subthreshold frequencies, and cannot account for the significant regulator losses (often amounting to 10%–50% of total energy) necessary for total system energy minimization. Furthermore, clock generation using a free-running oscillator is a requirement in prior work, precluding any regulation of fclk.
This effort explores a digital architecture for total system energy minimization subject to performance requirements (see Figure). The design supports two modes of operation, MEP-lock and perf-lock, and seamless, uninterrupted execution during transitions between them. In MEP-lock, the design tunes Vdd to first search for and then track the minimum total Energy Per Cycle (tEPC) point inclusive of regulator losses.
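To see why regulator losses matter for the search, consider a toy energy model: dynamic energy grows as Vdd squared, leakage energy per cycle grows as the cycle time stretches at low Vdd, and the regulator is treated as an ideal linear regulator with efficiency Vdd/Vin. Every constant, the regulator model, and all function names below are hypothetical and chosen only to make the sketch runnable; they do not describe the fabricated design.

```python
import math


def core_energy_per_cycle(vdd):
    """Toy core model (all constants are illustrative assumptions)."""
    c_eff = 1e-12    # effective switched capacitance, farads
    i_leak = 1e-4    # leakage current, amperes
    t_min = 1e-9     # cycle time well above threshold, seconds
    # Cycle time stretches exponentially as vdd drops toward 0.3 V and below
    t_cycle = t_min * (1.0 + math.exp((0.3 - vdd) / 0.04))
    return c_eff * vdd ** 2 + i_leak * vdd * t_cycle


def regulator_efficiency(vdd, v_in=0.6):
    """Ideal linear regulator: efficiency equals Vout / Vin (assumption)."""
    return vdd / v_in


def total_energy_per_cycle(vdd):
    """tEPC: energy drawn from the regulator input per clock cycle."""
    return core_energy_per_cycle(vdd) / regulator_efficiency(vdd)


def find_mep(energy_fn, v_lo=0.20, v_hi=0.60, step=0.01):
    """Sweep a Vdd grid and return the voltage with the lowest energy."""
    n = int(round((v_hi - v_lo) / step))
    grid = [round(v_lo + i * step, 4) for i in range(n + 1)]
    return min(grid, key=energy_fn)


v_core_only = find_mep(core_energy_per_cycle)    # ignores the regulator
v_total = find_mep(total_energy_per_cycle)       # includes regulator loss
```

In this toy model the total-energy minimum sits at a higher Vdd than the core-only minimum, which illustrates why a tracker that ignores regulator losses can settle on the wrong operating point. A runtime tracker would replace the exhaustive sweep with incremental steps around the current point.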
More details can be found in our upcoming ISSCC Paper.