IMSE Publications

Found results matching for:

Author: Ricardo Carmona Galán
Year: Since 2002

Journal Papers


An Efficient TDC using a Dual-Mode Resource-Saving Method Evaluated in a 28-nm FPGA
M. Parsakordasiabi, I. Vornicu, A. Rodriguez-Vazquez and R. Carmona-Galan
Journal Paper · IEEE Transactions on Instrumentation and Measurement, vol. 71, article 2000413, 2022
abstract      doi      

FPGA-based time-to-digital converters (TDCs) are required to be accurate, linear, and fast, while at the same time employing a reduced number of resources. Pushing these requirements to the limit is challenging, although it is constantly required by many applications. This article presents a dual-mode tapped-delay-line (TDL)-propagating 1's and 0's in alternating measurement cycles-architecture for a field-programmable gate array (FPGA)-based TDC that complies with the mentioned specifications. The dead-time of the proposed TDC is reduced to one system clock cycle by using a toggling input stage and a dual-mode counter-based encoder. To improve the TDC linearity, the TDL sampling sequence is tuned separately for each operating mode. The presented architecture employs a low-resources dual-mode combinatory encoder of one- and zero-counters to remove the bubbles and cover both operating modes. A dual-mode bin-width calibration has been carried out to improve the TDC performance in each mode. The proposed architecture has been implemented on a Xilinx Artix-7 FPGA. Experimental results have shown a differential nonlinearity (DNL) within [-0.71 1.05] least significant bit (LSB) and an integral nonlinearity (INL) within [-0.85 0.86] LSB for the propagation of 1's. DNL and INL are within [-0.73 1.06] LSB and [-1.17 0.04] LSB, respectively, for the propagation of 0's. The LSB size is 22.1 ps and the TDC precision is 22.35 ps. A comparison with recently published state-of-the-art FPGA-based TDCs is provided at the end of the article.

Architecture-Level Optimization on Digital Silicon Photomultipliers for Medical Imaging
F. Bandi, V. Ilisie, I. Vornicu, R. Carmona-Galan, J.M. Benlloch and A. Rodriguez-Vazquez
Journal Paper · Sensors, vol. 22, no. 1, article 122, 2022
abstract      doi      

Silicon photomultipliers (SiPMs) are arrays of single-photon avalanche diodes (SPADs) connected in parallel. Analog silicon photomultipliers are built in custom technologies optimized for detection efficiency. Digital silicon photomultipliers are built in CMOS technology. Although CMOS SPADs are less sensitive, they can incorporate additional functionality at the sensor plane, which is required in some applications for an accurate detection in terms of energy, timestamp, and spatial location. This additional circuitry comprises active quenching and recharge circuits, pulse combining and counting logic, and a time-to-digital converter. This, together with the disconnection of defective SPADs, results in a reduction of the light-sensitive area. In addition, the pile-up of pulses, in space and in time, translates into additional efficiency losses that are inherent to digital SiPMs. The design of digital SiPMs must include some sort of optimization of the pixel architecture in order to maximize sensitivity. In this paper, we identify the most relevant variables that determine the influence of SPAD yield, fill factor loss, and spatial and temporal pile-up in the photon detection efficiency. An optimum of 8% is found for different pixel sizes. The potential benefits of molecular imaging of these optimized and small-sized pixels with independent timestamping capabilities are also analyzed.

Special Issue on Embedded Vision Architectures for Machine Learning
F. Berry, L. Maggiani and R. Carmona-Galan
Journal Paper · Journal of Signal Processing Systems for Signal Image and Video Technology, 2022
abstract      doi      

This special issue presents different architectural compromises involved in the execution of heavy computational loads related to machine learning on CPUs, GPUs, FPGAs and ASICs, and even their impact on memory hierarchies and storage. This subject has always been a central topic at the Workshop on Architecture of Smart Cameras (WASC), a workshop especially dedicated to bring together researchers and engineers covering the all aspects of the implementation of smart cameras. It has been held in different cities, from Clermont-Ferrand in 2012 to Ghent in 2020. Smart cameras are embedded vision systems that are required to produce a semantic understanding of the scene and to generate a response. The incorporation of deep neural networks represents a significant leap in performance, but an efficient implementation is needed not to compromise the scarce resources found in embedded platforms. This special issue focuses on addressing this issue.

Design of High-Efficiency SPADs for LiDAR Applications in 110nm CIS Technology
I. Vornicu, J.M. López-Martínez, F.N. Bandi, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Sensors Journal, vol. 21, no. 4, pp 4776-4785, 2021
abstract      doi      

Single photon avalanche diodes (SPADs) featuring a high detection rate of near-IR photons are much desired for outdoor LiDAR based on direct time-of-flight (ToF). This article presents the complete design flow of a SPAD detector for LiDAR. First, the selection of the emitter wavelength is discussed, considering the maximum allowed power underlying eye safety regulations, solar irradiance, and reflected signal power. Then, the choice of the SPAD structure is discussed based on the TCAD simulation of quantum efficiency and crosstalk. Next, the proposed P-well/Deep N-well SPAD is explained. The electro-optical characterization of the detectors is presented as well. The performance of the time-of-flight image sensors is determined by the characteristics of the individual SPADs. To fully characterize this technology, devices with various sizes, shapes, and guard ring widths have been fabricated and tested. The measured mean breakdown voltage is 18 V. The proposed structure has a 0.4 Hz/µ m2 dark count rate and 0.5% afterpulsing. The FWHM (total) jitter and photon detection probability at 850nm wavelength are of 92 ps and 10%. All figures have been measured at 3 V excess voltage. Finally, the performance of the SPAD detector is analyzed by evaluating the signal-to-noise ratio at different acquisition times. Distance ranging measurements have been performed, achieving a depth resolution of 1 cm up to 6.3 m range.

A Low-Resources TDC for Multi-Channel Direct ToF Readout based on a 28-nm FPGA
M. Parsakordasiabi, I. Vornicu, A. Rodríguez-Vázquez and R. Carmona-Galán
Journal Paper · Sensors, vol. 21, no. 1, article 308, 2021
abstract      doi      pdf

In this paper, we present a proposed field programmable gate array (FPGA)-based time-to-digital converter (TDC) architecture to achieve high performance with low usage of resources. This TDC can be employed for multi-channel direct Time-of-Flight (ToF) applications. The proposed architecture consists of a synchronizing input stage, a tuned tapped delay line (TDL), a combinatory encoder of ones and zeros counters, and an online calibration stage. The experimental results of the TDC in an Artix-7 FPGA show a differential non-linearity (DNL) in the range of [-0.953, 1.185] LSB, and an integral non-linearity (INL) within [-2.750, 1.238] LSB. The measured LSB size and precision are 22.2 ps and 26.04 ps, respectively. Moreover, the proposed architecture requires low FPGA resources.

Compact Macro-Cell with OR Pulse Combining for Low Power Digital-SiPM
I. Vornicu, F.N. Bandi, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 20, no. 21, pp 12817 - 12826, 2020
abstract      doi      

High-density digital-Silicon Photomultipliers call for high-performance Single Photon Avalanche Diodes (SPAD) front-ends. Power consumption and fill factor are significant concerns in this kind of sensors. This paper presents a compact and power-efficient macro-cell where several SPADs share the active recharge circuitry for increased fill factor. Integrated with 110nm technology for image sensors, the array of macro-cells has 30% fill factor. Also, following the first firing of any SPAD during the same macro-pixel dead-time, the other SPADs are disabled for power saving. Of course, subsequent triggers are lost. However, they would have been masked by the OR pulse combining scheme. Besides this event-driven disabling feature, the macro-cell includes circuitry to disable noisy devices - similar to other SiPM cells. Also, the macro-cell features control of the dead-time. This paper describes the macro-cell concept, its associated analysis and design equations. Key parameters of the design are discussed to optimize power consumption. Design scalability is contemplated as well. Experimental results proved that the power efficiency of the proposed scheme depends on the illumination power. Also, power efficiency is linked to the pulse overlapping probability. For example, power saving up to 30% is obtained with 4 sub-cells per macro-cell, when pulse overlapping is about 11% for correlated light or the pulse rate per sub-cell is about 100kHz for uncorrelated light.

Fixed Pattern Noise Analysis for Feature Descriptors in CMOS APS Images
J. Zapata-Pérez, G. Domenech-Asensi, R. Ruiz-Merino, J.J. Martínez-Álvarez, J. Fernández-Berni and R. Carmona-Galán
Journal Paper · Sensing and Imaging, vol. 21, article 14, 2020
abstract      doi      pdf

This paper provides a comparative performance evaluation of local features for images from CMOS APS sensors affected by fixed pattern noise for different combinations of common detectors and descriptors. Although numerous studies report comparisons of local features designed for ordinary visual images, their performance on images with fixed pattern noise is far less assessed. The goal of this work is to develop a tool that allows to evaluate the performance of computer vision algorithms and their implementations subject to deviations of the physical parameters of the CMOS sensor. This tool will facilitate the quantification of the high-level effects produced by circuit random noise, enabling the optimization of the sensor during the design flow with specifications much closer to the application scope. Likewise, this tool will provide the electronic designer with a relationship between high-level algorithm accuracy and maximum fixed pattern noise. Thus the contribution is double: (1) to evaluate the performance of both local float type and more recent binary type detectors and descriptors when combined under a variety of image transformations, and (2) to extract relevant information from circuit-level simulation and to develop a basic noise model to be employed in the design of the feature descriptor evaluation. The utility of this approach is illustrated by the evaluation of the effect of column-wise and pixel-wise fixed pattern noise at the sensor on the performance of different local feature descriptors.

Comparison between Digital Tone-Mapping Operators and a Focal-Plane Pixel-Parallel Circuit
G.M.S. Nunes, F.D.V.R. Oliveira, M.C.Q. Farias, J.G. Gomes, A. Petraglia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Signal Processing: Image Communication, vol. 88, article 115937, 2020
abstract      doi      pdf

In a previous work, we proposed a color extension of an existing focal-plane tone-mapping operator (FPTMO) and introduced circuit modifications that led to smaller sensing array area and simpler off-chip white-balance operations. The proposed FPTMO chip was fabricated and its experimental test setup is under development. In the present work, we systematically compare the previously proposed FPTMOs with conventional digital tone-mapping operators (DTMOs), both in terms of overall image quality and execution time. To assess the image quality, we use the Tone-Mapped image Quality Index (TMQI) metric, the Blind Tone-Mapped Quality Index (BTMQI) metric, and an image colorfulness metric. By statistically comparing the results given by both metrics, we verify that the proposed FPTMOs achieve results similar to those obtained with DTMOs, occasionally outperforming them. To compare the tone-mapping operators execution times, we perform a detailed complexity analysis. Results show that system-level software FPTMO descriptions (FPTMO functionality described in software) rank among the fastest DTMOs and that the hardware FPTMO (described in software) speed-up depends on the target frame rate. For 30-frames-per-second color image generation, the fastest FPTMO is 15 times faster than the DTMO with highest TMQI and BTMQI metric scores, and 170 times faster than the DTMO with best colorfulness metric score.

PreVIous: A Methodology for Prediction of Visual Inference Performance on IoT Devices
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Journal Paper · IEEE Internet of Things Journal, vol. 7, no. 10, pp 9227-9240, 2020
abstract      doi      pdf

This paper presents PreVIous, a methodology to predict the performance of convolutional neural networks (CNNs) in terms of throughput and energy consumption on vision-enabled devices for the Internet of Things. CNNs typically constitute a massive computational load for such devices, which are characterized by scarce hardware resources to be shared among multiple concurrent tasks. Therefore, it is critical to select the optimal CNN architecture for a particular hardware platform according to prescribed application requirements. However, the zoo of CNN models is already vast and rapidly growing. To facilitate a suitable selection, we introduce a prediction framework that allows to evaluate the performance of CNNs prior to their actual implementation. The proposed methodology is based on PreVIousNet, a neural network specifically designed to build accurate per-layer performance predictive models. PreVIousNet incorporates the most usual parameters found in state-of-the-art network architectures. The resulting predictive models for inference time and energy have been tested against comprehensive characterizations of seven well-known CNN models running on two different software frameworks and two different embedded platforms. To the best of our knowledge, this is the most extensive study in the literature concerning CNN performance prediction on low-power low-cost devices. The average deviation between predictions and real measurements is remarkably low, ranging from 3% to 10%. This means state-of-the-art modeling accuracy. As an additional asset, the fine-grained a priori analysis provided by PreVIous could also be exploited by neural architecture search engines.

Performance Assessment of Deep Learning Frameworks through Metrics of CPU Hardware Exploitation on an Embedded Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · International Journal of Electrical and Computer Engineering Systems, vol. 11, no. 1, pp 1-11, 2020
abstract      doi      pdf

In this paper, we analyze heterogeneous performance exhibited by some popular deep learning software frameworks for visual inference on a resource-constrained hardware platform. Benchmarking of Caffe, OpenCV, TensorFlow, and Caffe2 is performed on the same set of convolutional neural networks in terms of instantaneous throughput, power consumption, memory footprint, and CPU utilization. To understand the resulting dissimilar behavior, we thoroughly examine how the resources in the processor are differently exploited by these frameworks. We demonstrate that a strong correlation exists between hardware events occurring in the processor and inference performance. The proposed hardware-aware analysis aims to find limitations and bottlenecks emerging from the joint interaction of frameworks and networks on a particular CPU-based platform. This provides insight into introducing suitable modifications in both types of components to enhance their global performance. It also facilitates the selection of frameworks and networks among a large diversity of these components available these days for visual understanding.

Compressive Imaging using RIP-compliant CMOS Imager Architecture and Landweber Reconstruction
M. Trevisi, A. Akbari, M. Trocan, Á. Rodríguez-Vázquez and R. Carmona-Galán
Journal Paper · IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 2, pp 387-399, 2020
abstract      doi      pdf

In this paper we present a new image sensor architecture for fast and accurate compressive sensing (CS) of natural images. Measurement matrices usually employed in compressive sensing CMOS image sensors (CS-CIS) are recursive pseudo-random binary matrices. We have proved that the restricted isometry property (RIP) of these matrices is limited by a low sparsity constant. The quality of these matrices is also affected by the non-idealities of pseudo-random numbers generators (PRNG). To overcome these limitations, we propose a hardware-friendly pseudo-random ternary measurement matrix generated on-chip by means of class III elementary cellular automata (ECA). These ECA present a chaotic behaviour that emulates random CS measurement matrices better than other PRNG. We have combined this new architecture with a block-based CS smoothed-projected Landweber (BCS-SPL) reconstruction algorithm. By means of single value decomposition (SVD) we have adapted this algorithm to perform fast and precise reconstruction while operating with binary and ternary matrices. Simulations are provided to qualify the approach.

Compact Real-Time Inter-Frame Histogram Builder for 15-bits High-Speed ToF-Imagers based on Single-Photon Detection
I. Vornicu, A. Darie, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 19, no. 6, pp 2181-2190, 2019
abstract      doi      pdf

Time-of-flight image sensors based on single-photon detection, i.e. SPADs, require some filtering of pixel readings. Accurate depth measurements are only possible if the jitter of the detector is mitigated. Moreover, the time stamp needs to be effectively separated from uncorrelated noise such as dark counts and background illumination. A powerful tool for this is building a histogram of a number of pixel readings. Future generation of ToF imagers are seeking to increase spatial and temporal resolution along with the dynamic range and frame rate. Under these circumstances, storing the complete histogram for every pixel becomes practically impossible. Considering that most of the information contained by the histogram represents noise, we propose a highly efficient method to store just the relevant data required for ToF computation. This method makes use of the shifted inter-frame histogram (SifH). It requires a memory as low as 128 times smaller than storing the complete histogram if the pixel values are coded on up to 15 bits. Moreover, a fixed 28 words memory is enough to process histograms containing up to 215 bins. In exchange, the overall frame rate only decreases to one half. The hardware implementation of this algorithm is presented. Its remarkable robustness for a low SNR of the ToF estimation is demonstrated by Matlab simulations and FPGA implementation using input data from a SPAD camera prototype.

Optimum Selection of DNN Model and Framework for Edge Inference
D.Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · IEEE Access, vol. 6, pp 51680-51692, 2018
abstract      doi      pdf

This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared with the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Lecture Notes in Computer Science LNCS, vol. 11182, pp 369-379, 2018
abstract      doi      pdf

This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Guest editorial special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard and A. Dupret
Journal Paper · International Journal of Circuit Theory and Applications, Computational Image Sensors and Smart Camera Hardware, vol. 46, no. 9, pp 1577-1579, 2018
abstract      doi      

Recent advances in both software and hardware technologies are enabling the emergence of vision as a key sensorial modality in various application scenarios. Concerning hardware, all of the components along the signal chain play a significant role when it comes to implementing smart vision-enabled systems. At the front end, new circuit structures for sensing, processing, and signal conditioning are adding functionalities in CMOS imagers beyond the mere generation of 2-D intensity maps. Moreover, the development of vertical integration technologies is facilitating monolithic realizations of visual sensors where the incorporation of computational capabilities has no impact at all on image quality. Typically, the outcome of the front-end device in a smart camera will be a preprocessed flow of information ready for further efficient analysis. At this point, specific ICs known as vision processing units can be inserted to accelerate the processing flow according to the targeted application. On the other hand, reconfigurability is a valuable asset in the ever-changing field of vision. FPGAs leverage cutting-edge digital technologies to offer flexible hardware for exploration of different memory arrangements, data flows, and processing parallelization. It is precisely parallelization for which GPUs constitute an interesting alternative in smart cameras when massive pixel-level operation is required. This is the case of state-of-the-art vision algorithms based on convolutional neural networks. At higher level, DSPs and multicore CPUs make software development notably easier at the cost of losing hardware specificity. Overall, this special issue aims at covering some of the latest research works in the vast ecosystem of hardware for artificial vision.

Applications of event-based image sensors - Review and analysis
J.A. Leñero-Bardallo, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · International Journal of Circuit Theory and Applications, vol. 46, no. 9, pp 1620-1630, 2018
abstract      doi      pdf

The spread of event-driven asynchronous vision sensors during the last years has increased significantly the industrial interest and the application scenarios for them. This article reviews the main fields of application that event-based image sensors have found during the last 20 years. We focus in the description of applications where such devices can outperform conventional frame-based sensors. The practical functions of the three main families of asynchronous event-based sensors are analyzed. The article also studies what are the factors that increase nowadays the demand of sensors that minimize the power and bandwidth consumption. Moreover, the technological factors that have facilitated the development of asynchronous sensors are discussed.

Asynchronous spiking pixel with programmable sensitivity to illumination
J.A. Leñero-Bardallo, M. Delgado-Restituto, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 11, pp 3854-3863, 2018
abstract      doi      pdf

A spiking pixel to be used in image sensor arrays for asynchronous frame-based operation is presented. The pixel features both local and global adaptive sensitivity to the illumination level. Local adaptation is performed by adjusting the voltage stored in an embedded analog memory according to the average illumination within a neighborhood. Global adaptation to the overall illumination of the array is implemented by adjusting a voltage value common to all the pixels. These programming capabilities allow full control on the sensor sensitivity, pixel output data flow, and energy consumption, thus, overcoming the limitations observed in current image sensors based on spiking pixels. Experimental results validate the functionality of the proposal.

On the analysis and detection of flames with an asynchronous spiking image sensor
J.A. Leñero-Bardallo, J.M. Guerrero-Rodríguez, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Sensors Journal, vol. 18, no. 16, pp 6588-6595, 2018
abstract      doi      pdf

We have investigated the capabilities of a custom asynchronous spiking image sensor operating in the Near Infrared (NIR) band to study flame radiation emissions, monitor their transient activity, and detect their presence. Asynchronous sensors have inherent capabilities, i.e. good temporal resolution, high dynamic range, and low data redundancy. This makes them competitive against Infrared (IR) cameras and CMOS frame-based NIR imagers. In the article, we analyze, discuss and compare the experimental data measured with our sensor against results obtained with conventional devices. A set of measurements have been taken to study the flame emission levels and their transient variations. Moreover, a flame detection algorithm, adapted to our sensor asynchronous outputs, has been developed. Results show that asynchronous spiking sensors have an excellent potential for flame analysis and monitoring.

CMOS Vision Sensors: Embedding Computer Vision at Imaging Front-Ends
A. Rodríguez-Vázquez, J. Fernández-Berni, J.A. Leñero-Bardallo, I. Vornicu and R. Carmona-Galán
Journal Paper · IEEE Circuits and Systems Magazine, vol. 18, no. 2, pp 90-107, 2018
abstract      doi      pdf

CMOS Image Sensors (CIS) are key for imaging technologies. These chips are conceived for capturing optical scenes focused on their surface, and for delivering electrical images, commonly in digital format. CISs may incorporate intelligence; however, their smartness basically concerns calibration, error correction and other similar tasks. The term CVISs (CMOS VIsion Sensors) defines other class of sensor front-ends which are aimed at performing vision tasks right at the focal plane. They have been running under names such as computational image sensors, vision sensors and silicon retinas, among others. CVIS and CISs are similar regarding physical implementation. However, while inputs of both CIS and CVIS are images captured by photo-sensors placed at the focal-plane, CVISs primary outputs may not be images but either image features or even decisions based on the spatial-temporal analysis of the scenes. We may hence state that CVISs are more ‘intelligent’ than CISs as they focus on information instead of on raw data. Actually, CVIS architectures capable of extracting and interpreting the information contained in images, and prompting reaction commands thereof, have been explored for years in academia, and industrial applications are recently ramping up. One of the challenges of CVISs architects is incorporating computer vision concepts into the design flow. The endeavor is ambitious because imaging and computer vision communities are rather disjoint groups talking different languages. The Cellular Nonlinear Network Universal Machine (CNNUM) paradigm, proposed by Profs. Chua and Roska, defined an adequate framework for such conciliation as it is particularly well suited for hardware-software co-design. This paper overviews CVISs chips that were conceived and prototyped at IMS E Vision Lab over the past twenty years. Some of them fit the CNNUM paradigm while others are tangential to it. All of them employ per-pixel mixed-signal processing circuitry to achieve sensor-processing concurrency in the quest of fast operation with reduced energy budget.

Real-Time Inter-Frame Histogram Builder for SPAD Image Sensors
I. Vornicu, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 18, no. 4, pp 1576-1584, 2018
abstract      doi      pdf

CMOS image sensors based on single-photon avalanche-diodes (SPAD) are suitable for 2D and 3D vision. Limited by uncorrelated noise and/or low illumination conditions, image capturing becomes nearly impossible in a single-shot exposure time. Moreover the depth accuracy is affected by jitter. Therefore, many frames need to be taken to reconstruct the final accurate image. The proposed reconstruction algorithm is based on pixel-wise histogram building. Specifically, a histogram is built on the fly for each pixel of the array from the ongoing acquired frames. This paper presents the design and implementation on FPGA of a real-time pixel-wise inter-frame histogram builder at 1kfps. The design has been proven with a 64×64-pixels SPAD camera. Its remarkable robustness has been demonstrated in harsh conditions such as 42 kHz of dark count rate (DCR) and high background illumination up to 20 times larger than the DCR. The system has a graphic user interface for 2D/3D imager configuration, image streaming and pixel-wise histogram streaming.

Special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard A. Dupret
Journal Paper · International Journal of Circuit Theory and Applications, vol. 45, no. 6, pp 729-730, 2017
abstract      doi      pdf

Embedded computer vision is becoming a disruptive technological component for key market drivers of the semiconductor industry like smartphones, the Internet of Things or automotive. The recent incorporation of advanced artificial intelligence techniques into robust and precise inference schemes is underpinning this disruptiveness, along with the increase of on-chip computational power and the development of tools for rapid prototyping and experimentation. As a result, visual sensing is being embedded in all kinds of products and services, with different degrees of scene understanding. These products either did not exist before or are ousting existing ones. In this scenario, vision-enabled systems are expected to have ever-growing need for lower power consumption, lower cost, smaller form factor, higher image resolution and higher throughput. These burdensome requirements demand new approaches when it comes to dealing with the massive flow of information associated with the visual stimulus. In particular, early vision stands out as the critical stage where raw pixels are transformed into useful features for the targeted task. In order to effectively cope with this stage, front-end hardware resources play a major role. The incorporation of advanced sensing and computational capabilities in image sensors allows exploiting parallelism and distributed memory from the very beginning of the signal processing chain. Sensing structures can be designed to produce multiple sensorial modalities, e.g. 2D/depth. Circuit blocks can be tuned to adapt the response of the sensor to distinct specifications of the subsequent processing stage, where parallelization and memory management will be also critical. An adequate dataflow organization in FPGAs, GPUs, DSPs, etc. is of utmost importance in order to preserve the goodness of previous performance boosters and further increase the throughput. All in all, this special issue focuses on those aspects of embedded vision systems having an impact on their degree of integrated intelligence.

Arrayable Voltage-Controlled Ring-Oscillator for Direct Time-of-Flight Image Sensors
I. Vornicu, R. Carmona-Galan and Rodriguez-Vazquez
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 64, no. 11, pp 2821-2834, 2017
abstract      doi      pdf

Direct time-of-flight (d-ToF) estimation with high frame rate requires the incorporation of a time-to-digital converter (TDC) at pixel level. A feasible approach to a compact implementation of the TDC is to use the multiple phases of a voltage-controlled ring-oscillator (VCRO) for the finest bits. The VCRO becomes central in determining the performance parameters of a d-ToF image sensor. In this paper, we are covering the modeling, design, and measurement of a CMOS pseudo-differential VCRO. The oscillation frequency, the jitter due to mismatches and noise and the power consumption are analytically evaluated. This design has been incorporated into a 64x64-pixel array. It has been fabricated in a 0.18 mu m standard CMOS technology. Occupation area is 28x29 mu m(2) and power consumption is 1.17 mW at 850 MHz. The measured gain of the VCRO is of 477 MHz/V with a frequency tuning range of 53%. Moreover, it features a linearity of 99.4% over a wide range of control frequencies, namely, from 400 to 850 MHz. The phase noise is of -102 dBc/Hz at 2 MHz offset frequency from 850 MHz. The influence of these parameters in the performance of the TDC has been measured. The minimum time bin of the TDC is 147 ps with a rms DNL/INL of 0.13/1.7LSB.

Sun sensor based on a luminance spiking pixel array
J.A. Lenero-Bardallo, L. Farian, J.M. Guerrero-Rodriguez, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 17, no. 20, pp 6578-6588, 2017
abstract      doi      pdf

We present a novel sun sensor concept. It is the very first sun sensor built with an Address Event Representation (AER) spiking pixel matrix. Its pixels spike with a frequency proportional to illumination. It offers remarkable advantages over conventional digital sun sensors based on Active Pixel Sensor (APS) pixels. Its output data flow is quite reduced. It is possible to resolve the sun position just receiving one single event operating in Time-to-First-Spike (TFS) mode. It operates with a latency in the order of milliseconds. It has higher dynamic range than APS image sensors (higher than 100dB). A custom algorithm to compute the centroid of the illuminated pixels is presented. Experimental results are provided.

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 64, no. 9, pp 2308-2321, 2017
abstract      doi      

This paper addresses a comparison of architectures for the hardware implementation of Gaussian image pyramids. Main differences between architectural choices are in the sensor front-end. One side is for architectures consisting of a conventional sensor that delivers digital images and which is followed by digital processors. The other side is for architectures employing a non-conventional sensor with per-pixel embedded preprocessing structures for Gaussian spatial filtering. This later choice belongs to the general category of " artificial retina" sensors which have been for long claimed as potentially advantageous for enhancing throughput and reducing energy consumption of vision systems. These advantages are very important in the internet of things context, where imaging systems are constantly exchanging information. This paper attempts to quantify these potential advantages within a design space in which the degrees of freedom are the number and type of ADCs, single-slope, SAR, cyclic, Sigma Delta, and pipeline, and the number of digital processors. Results show that speed and energy advantages of preprocessing sensors are not granted by default and are only realized through proper architectural design. The methodology presented for the comparison between focal-plane and digital approaches is a useful tool for imager design, allowing for the assessment of focal-plane processing advantages.

Compensation of PVT Variations in ToF Imagers with In-Pixel TDC
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Sensors, vol. 17, no. 5, article 1072, 2017
abstract      doi      pdf

The design of a direct time-of-flight complementary metal-oxide-semiconductor (CMOS) image sensor (dToF-CIS) based on a single-photon avalanche-diode (SPAD) array with an in-pixel time-to-digital converter (TDC) must contemplate system-level aspects that affect its overall performance. This paper provides a detailed analysis of the impact of process parameters, voltage supply, and temperature (PVT) variations on the time bin of the TDC array. Moreover, the design and characterization of a global compensation loop is presented. It is based on a phase locked loop (PLL) that is integrated on-chip. The main building block of the PLL is a voltage-controlled ring-oscillator (VCRO) that is identical to the ones employed for the in-pixel TDCs. The reference voltage that drives the master VCRO is distributed to the voltage control inputs of the slave VCROs such that their multiphase outputs become invariant to PVT changes. These outputs act as time interpolators for the TDCs. Therefore the compensation scheme prevents the time bin of the TDCs from drifting over time due to the aforementioned factors. Moreover, the same scheme is used to program different time resolutions of the direct time-of-flight (ToF) imager aimed at 3D ranging or depth map imaging. Experimental results that validate the analysis are provided as well. The compensation loop proves to be remarkably effective. The spreading of the TDCs time bin is lowered from: (i) 20% down to 2.4% while the temperature ranges from 0 °C to 100 °C; (ii) 27% down to 0.27%, when the voltage supply changes within ±10% of the nominal value; (iii) 5.2 ps to 2 ps standard deviation over 30 sample chips, due to process parameters´ variation.

A Wide Linear Dynamic Range Image Sensor Based on Asynchronous Self-Reset and Tagging of Saturation Events
J.A. Leñero-Bardallo, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Journal of Solid-State Circuits, vol. 52, no. 6, pp 1605-1617, 2017
abstract      doi      pdf

We report a high dynamic range (HDR) image sensor with a linear response that overcomes some of the limitations of sensors with pixels with self-reset operation. It operates similar to an active pixel sensor, but its pixels have a novel asynchronous event-based overflow detection mechanism. Whenever the pixel voltages at the integration capacitance reach a programmable threshold, the pixels self-reset and send out asynchronously an event indicating this. At the end of the integration period, the voltage at the integration capacitance is digitized and readout. Combining this information with the number of events fired by each pixel, it is possible to render linear HDR images. Event operation is transparent to the final user. There is no limitation for the number of self-resets of each pixel. The output data format is compatible with frame-based devices. The sensor was fabricated in the AMS 0.18-μm HV technology. A detailed system description and experimental results are provided in this paper. The sensor can render images with an intra-scene dynamic range of up to 130 dB with linear outputs. The pixels' pitch is 25 μm and the sensor power consumption is 58.6 mW.

TFET-based Well Capacity Adjustment in Active Pixel Sensor for Enhanced High Dynamic Range
J. Fernández-Berni, M. Niemier, X.S. Hu, H. Lu, W. Li, P. Fay, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Electronics Letters, vol.53, no. 9, pp 622-624, 2017
abstract      doi      pdf

A tunnel field-effect transistor (TFET)-based pixel circuit for well capacity adjustment that does not require subthreshold operation on the part of the reset transistor is presented. In CMOS, this subthreshold operation leads to temporal noise, distortion and fixed pattern noise, becoming a primary limiting performance factor. In the proposed circuit, the asymmetric conduction associated with TFETs is exploited. This property, arising from the inherent physical structure of the device, provides the selective well adjustments during photo-integration which are demanded for achieving high dynamic range. A GaN-based heterojunction TFET has been designed according to the specific requirements for this application.

A CMOS Digital SiPM with Focal-Plane Light-Spot Statistics for DOI Computation
I. Vornicu, F.N. Bandi, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Sensors Journal, vol. 17, no. 3, pp 632-643, 2017
abstract      doi      pdf

Silicon photomultipliers can be used to infer the depth-of-interaction (DOI) in scintillator crystals. DOI can help to improve the quality of the positron emission tomography images affected by the parallax error. This paper contemplates the computation of DOI based on the standard deviation of the light distribution. The simulations have been carried out by GAMOS. The design of the proposed digital silicon photomultiplier (d-SiPM) with focal plane detection of the center of mass position and dispersion of the scintillation light is presented. The d-SiPM shares the same off-chip time-to-digital converter such that each pixel can be individually connected to it. A miniature d-SiPM 8×8 single-photon avalanche-diode (SPAD) array has been fabricated as a proof of concept. The SPADs along each row and column are connected through an OR combination technique. It has 256×256μm2 without peripherals circuits and pads. The fill factor is about 11%. The average dark count rate of the mini d-SiPM is of 240 kHz. The average photon detection efficiency is 5% at 480 nm wavelength, room temperature, and 0.9 V excess voltage. The dynamic range is of 96 dB. The sensor array features a time resolution of 212 ps. The photon-timing SNR is 81 dB. The focal plane statistics of the light-spot has been proved as well by measurements.

Low-Power CMOS Vision Sensor for Gaussian Pyramid Extraction
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper · IEEE Journal of Solid-State Circuits, vol. 52, no. 2, pp 483-495, 2017
abstract      doi      pdf

This paper introduces a CMOS vision sensor chip in a standard 0.18 μm CMOS technology for Gaussian pyramid extraction. The Gaussian pyramid provides computer vision algorithms with scale invariance, which permits having the same response regardless of the distance of the scene to the camera. The chip comprises 176 x 120 photosensors arranged into 88 x 60 processing elements (PEs). The Gaussian pyramid is generated with a double-Euler switched capacitor (SC) network. Every PE comprises four photodiodes, one 8 b single-slope analog-to-digital converter, one correlated double sampling circuit, and four state capacitors with their corresponding switches to implement the double-Euler SC network. Every PE occupies 44 x 44 μm^2. Measurements from the chip are presented to assess the accuracy of the generated Gaussian pyramid for visual tracking applications. Error levels are below 2% full-scale output, thus making the chip feasible for these applications. Also, energy cost is 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions of imager plus microprocessor unit.

Enhanced Sensitivity of CMOS Image Sensors by Stacked Diodes
J.A. Lenero-Bardallo, M. Delgado-Restituto, R. Carmona-Galan and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 16, no. 23, pp 8448-8455, 2016
abstract      doi      

We have investigated and compared the performance of photodiodes built with stacked p/n junctions operating in parallel versus conventional ones made with single p/n junctions. We propose a method to characterize and compare photodiodes sensitivity. For this purpose, a dedicated chip in the standard AMS 180-nm HV technology has been fabricated. Four different sensor structures were implemented and compared. Experimental results are provided. Measurements show sensitivity enhancement ranging from 55% to 70% within the 500-1100 nm spectral region. The larger increment is happening in the near infrared band (up to 62%). Such results make stacked photodiodes suitable candidates for the implementation of photosensors in vision chips designed for standard CMOS technologies.

Special issue on architectures of smart cameras for real-time applications
R. Carmona-Galán, F. Berry, R. Kleihorst and D. Ginhac
Journal Paper · Journal of Real-Time Image Processing, vol. 12, no. 4, pp 633-634, 2016
abstract      doi      

This special issue focuses on those topics having an incidence on the smart camera architecture, either emerging from the design of smart sensing and processing devices or imposed by the specifications for a particular distributed camera network and its applications. The collection of papers presented here starts by emphasizing the influence of the architecture in the performance of smart camera systems.

Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Sensors Journal, vol. 16, no. 13, pp. 5121-5122, 2016
abstract      doi      pdf

This letter presents new insights into a high dynamic range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. Experimental results attained from a prototype chip confirm the proposed theoretical framework. It opens the door to the realization of imagers providing HDR images free of artifacts without requiring any digital post-processing at all.

Single-Exposure HDR Technique Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 5, pp. 488-492, 2016
abstract      doi      pdf

This brief describes a high-dynamic-range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. An online analysis of the image histogram provides the sensor with the necessary feedback to dynamically accommodate changing illumination conditions. This adaptation is accomplished by properly weighing the influence of local and global illuminations on each pixel response. The main advantages of this technique with respect to similar approaches previously reported are as follows: 1) standard active-pixel-sensor circuitry can be used to render the pixel values and 2) the resulting compressed image representation is ready either for readout or for early vision processing at the very focal plane without requiring any additional peripheral circuit block. Experimental results from a prototype smart image sensor achieving a dynamic range of 102 dB are presented.

A Bio-Inspired Vision Sensor with Dual Operation and Readout Modes
J.A. Leñero-Bardallo, P. Häfliger, R. Carmona-Galán and A. Rodriguez-Vazquez
Journal Paper · IEEE Sensors Journal, vol. 16, no. 2, pp. 317-330, 2016
abstract      doi      

This paper presents a novel event-based vision sensor with two operation modes: 1) intensity mode and spatial contrast detection. They can be combined with two different readout approaches: 1) pulse density modulation and time-to-first spike. The sensor is conceived to be a node of an smart camera network made up of several independent an autonomous nodes that send information to a central one. The user can toggle the operation and the readout modes with two control bits. The sensor has low latency (below 1 ms under average illumination conditions), low power consumption (19 mA), and reduced data flow, when detecting spatial contrast. A new approach to compute the spatial contrast based on inter-pixel event communication less prone to mismatch effects than diffusive networks is proposed. The sensor was fabricated in the standard AMS4M2P 0.35-μm process. A detailed system-level description and experimental results are provided.

Compact CMOS active quenching/recharge circuit for SPAD arrays
I. Vornicu, R. Carmona-Galán, B. Pérez-Verdú and A. Rodríguez-Vázquez
Journal Paper · International Journal of Circuit Theory and Applications, vol. 44, no. 4, pp 917-928, 2016
abstract      doi      pdf

Avalanche diodes operating in Geiger mode are able to detect single photon events. They can be employed to photon counting and time-of-flight estimation. In order to ensure proper operation of these devices, the avalanche current must be rapidly quenched, and, later on, the initial equilibrium must be restored. In this paper, we present an active quenching/recharge circuit specially designed to be integrated in the form of an array of single-photon avalanche diode (SPAD) detectors. Active quenching and recherge provide benefits like an accurately controllable pulse width and afterpulsing reduction. In addition, this circuit yields one of the lowest reported area occupations and power consumptions. The quenching mechanism employed is based on a positive feedback loop that accelerates quenching right after sensing the avalanche current. We have employed a current starved inverter for the regulation of the hold-off time, which is more compact than other reported controllable delay implementations. This circuit has been fabricated in a standard 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. The SPAD has a quasi-circular shape of 12 μm diameter active area. The fill factor is about 11%. The measured time resolution of the detector is 187 ps. The photon-detection efficiency (PDE) at 540 nm wavelength is about 5% at an excess voltage of 900 mV. The break-down voltage is 10.3 V. A dark count rate of 19 kHz is measured at room temperature. Worst case post- layout simulations show a 117 ps quenching and 280 ps restoring times. The dead time can be accurately tuned from 5 to 500 ns. The pulse-width jitter is below 1.8 ns when dead time is set to 40 ns.

Time interval generator with 8 ps resolution and wide range for large TDC array characterization
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · Analog Integrated Circuits and Signal Processing, vol. 87, no. 2, pp 181-189, 2016
abstract      doi      

Accurate generation of picosecond-resolution wide-range time intervals gives rise to a new time-efficient method for the characterization of large arrays of time-to-digital converters involved in time resolved imaging. This paper presents the design and measurement of a time interval generator based on FPGA technology. Although it can be employed in different automatic test setups, it has been designed to characterize an array of time-to-digital converters. It can work as periodic pulse/frequency generator but also as a digital-to-time converter. The accuracy of periodic pulse generator is around 20 ps RMS jitter for a pulse width ranging from 600 ps to 33 μs. The incremental time resolution is 8 ps and the repetition rate is up to 2 MHz. The accuracy of the digital-to-time converter is less than 0.8LSB DNL and 2LSB INL, whilst the time resolution is 27 ps. Full characterization of the module is reported including a comparison with state-of-the-art instruments in this field. The measurement results of the time-to-digital converter array driven by the designed digital-to-time converter module are presented as well. The effectiveness of the proposed method is evaluated by comparing it with the statistical code density test.

On-chip time-of-flight estimation in standard CMOS technology
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · SPIE Newsroom, Published online Feb 2015
abstract      doi      

In the last decade, CMOS image sensors (CISs) have reached a considerable level of maturity and their performance is now comparable with CCD sensors, in terms of image quality. CISs have almost completely replaced CCDs in commercial photo cameras and mobile phones. The main advantage of using CMOS technology is the possibility of integrating additional intelligence at the sensor level. Complex image processing algorithms can be run on-chip at high frame rates. A possible future development for CIS technology is to capture 3D information from a scene. This, however, requires active illumination schemes.

Bottom-up performance analysis of focal-plane mixed-signal hardware for Viola-Jones early vision tasks
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper · International Journal of Circuit Theory and Applications, vol. 43, no. 8, pp 1063-1079, 2015
abstract      doi      pdf

Focal-plane mixed-signal arrays have traditionally been designed according to the general claim that moderate accuracy in processing is affordable. The performance of their circuitry has been analyzed in these terms without a comprehensive study of the ultimate consequences of such moderate accuracy. In this paper, for the first time to the best of our knowledge, we do carry out this study. We move expectable performance of mixed-signal image processing hardware directly into the vision algorithm making use of it. This permits to close a wider design loop, enabling a more aggressive design of this kind of hardware provided that the algorithm, at the highest level -semantic interpretation of the scene-, can afford it. Thus, we present a thorough analysis of the non-idealities associated with the implementation of a QVGA array tailored for the distinctive characteristics of the Viola-Jones processing framework. The resulting deviation models are then introduced in the processing flow of this framework provided by the OpenCV library. We have found, contrary to what could be expected, that these deviations do not necessarily degrade the performance of the Viola-Jones algorithm. They could be even beneficial for certain high-level specifications. Additionally, we demonstrate the architectural advantages of our approach: exploitation of focal-plane distributed memory and ultra-low-power operation.

A CMOS imager for time-of-flight and photon counting based on single photon avalanche diodes and in-pixel time-to-digital converters
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · Romanian Journal of Information Science and Technology, vol. 17, no. 4, pp 353-371, 2014
abstract      pdf

The design of a CMOS image sensor based on single-photon avalanche-diode (SPAD) array with in-pixel time-to-digital converter (TDC) is presented. The architecture of the imager is thoroughly described with emphasis on the characterization of the TDCs array. It is targeted for 3D image reconstruction. Several techniques as fast quenching/recharge circuit with tunable dead-time and time gated-operation are applied to reduce the noise and the power consumption. The chip was fabricated in a 0.18 μm standard CMOS process and implements a double functionality: time-of-flight (ToF) estimation and photon counting. The imager features a programmable time resolution of the array of TDCs down to 145 ps. The measured accuracy of the minimum time bin is lower than ± 1LSB DNL and ± 1.7 LSB INL. The TDC jitter over the full dynamic range is less than 1 LSB.

High dynamic range adaptation for ROI tracking based on reconfigurable concurrent dual-sensing
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper · Electronics Letters, vol. 50, no. 24, pp 1832-1834, 2014
abstract      doi      pdf

A single-exposure technique to extend the dynamic range of vision sensors is presented. It is particularly suitable for vision algorithms requiring region-of-interest (ROI) tracking under varying illumination conditions. The operation is supported by two intertwined photodiodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal plane into independent regions within which automatic concurrent adjustment of the integration time takes place for each frame. At pixel level, one of the photodiodes senses the pixel value itself, whereas the other, in collaboration with its counterparts in every prescribed ROI, senses the mean illumination of that specific ROI. An additional circuitry interconnecting both photodiodes asynchronously determines the integration period for each ROI according to its mean illumination. The experimental results for a quarter video graphics array prototype CMOS vision sensor are reported.

Detecting single-electron events in TEM using low-cost electronics and a silicon strip sensor
L.C. Gontard, G. Moldovan, R. Carmona-Galán, C. Lin and A.I. Kirkland
Journal Paper · Microscopy, vol. 63, no. 2, pp. 119-130, 2014
abstract      doi      

There is great interest in developing novel position-sensitive direct detectors for transmission electron microscopy (TEM) that do not rely in the conversion of electrons into photons. Direct imaging improves contrast and efficiency and allows the operation of the microscope at lower energies and at lower doses without loss in resolution, which is especially important for studying soft materials and biological samples.We investigate the feasibility of employing a silicon strip detector as an imaging detector for TEM. This device, routinely used in high-energy particle physics, can detect small variations in electric current associated with the impact of a single charged particle. The main advantages of using this type of sensor for direct imaging in TEM are its intrinsic radiation hardness and large detection area. Here, we detail design, simulation, fabrication and tests in a TEM of the front-end electronics developed using low-cost discrete components and discuss the limitations and applications of this technology for TEM.

Focal-plane sensing-processing: A power-efficient approach for the implementation of privacy-aware networked visual sensors
J. Fernandez-Berni, R. Carmona-Galan, R. del Rio, R. Kleihorst, W. Philips and A. Rodriguez-Vazquez
Journal Paper · Sensors, vol. 14, no. 8, pp. 15203-15226, 2014
abstract      doi      pdf

The capture, processing and distribution of visual information is one of the major challenges for the paradigm of the Internet of Things. Privacy emerges as a fundamental barrier to overcome. The idea of networked image sensors pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. Power consumption also constitutes a crucial aspect. Images contain a massive amount of data to be processed under strict timing requirements, demanding high-performance vision systems. In this paper, we describe a hardware-based strategy to concurrently address these two key issues. By conveying processing capabilities to the focal plane in addition to sensing, we can implement privacy protection measures just at the point where sensitive data are generated. Furthermore, such measures can be tailored for efficiently reducing the computational load of subsequent processing stages. As a proof of concept, a full-custom QVGA vision sensor chip is presented. It incorporates a mixed-signal focal-plane sensing-processing array providing programmable pixelation of multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise dynamic range adaptation, integral image computation and multi-resolution filtering. The proposed circuitry is also suitable to build a granular space, becoming the raw material for subsequent feature extraction and recognition of categorized objects.

Smart camera architecture
F. Berry, R. Kleihorst and R. Carmona-Galán
Journal Paper · Journal of Systems Architecture, vol. 59, no. 10 part A, pp 817, 2013
abstract      doi      

Abstract not available

A hierarchical vision processing architecture oriented to 3D integration of smart camera chips
R. Carmona-Galán, Á. Zarándy, C. Rekeczky, P. Földesy, A. Rodríguez-Pérez, C. Domínguez-Matas, J. Fernández-Berni, G. Liñán-Cembrano, B. Pérez-Verdú, Z. Kárász, M. Suárez-Cambre, V. Brea-Sánchez, T. Roska, Á. Rodríguez-Vázquez
Journal Paper · Journal of Systems Architecture, vol. 59, no. 10 part A, pp 908-919, 2013
abstract      doi      pdf

This paper introduces a vision processing architecture that is directly mappable on a 3D chip integration technology. Due to the aggregated nature of the information contained in the visual stimulus, adapted architectures are more efficient than conventional processing schemes. Given the relatively minor importance of the value of an isolated pixel, converting every one of them to digital prior to any processing is inefficient. Instead of this, our system relies on focal-plane image filtering and key point detection for feature extraction. The originally large amount of data representing the image is now reduced to a smaller number of abstracted entities, simplifying the operation of the subsequent digital processor. There are certain limitations to the implementation of such hierarchical scheme. The incorporation of processing elements close to the photo-sensing devices in a planar technology has a negative influence in the fill factor, pixel pitch and image size. It therefore affects the sensitivity and spatial resolution of the image sensor. A fundamental tradeoff needs to be solved. The larger the amount of processing conveyed to the sensor plane, the larger the pixel pitch. On the contrary, using a smaller pixel pitch sends more processing circuitry to the periphery of the sensor and tightens the data bottleneck between the sensor plane and the memory plane. 3D integration technologies with a high density of through-silicon-vias can help overcome these limitations. Vertical integration of the sensor plane and the processing and memory planes with a fully parallel connection eliminates data bottlenecks without compromising fill factor and pixel pitch. A case study is presented: a smart vision chip designed on a 3D integration technology provided by MIT Lincoln Labs, whose base process is 0.15 μm FD-SOI. Simulation results advance performance improvements with respect to the state-of-the-art in smart vision chips.

Ultralow-power processing array for image enhancement and edge detection
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 11, pp 751-755, 2012
abstract      doi      pdf

This paper presents a massively parallel processing array designed for the 0.13-μm 1.5-V standard CMOS base process of a commercial 3-D through-silicon via stack. The array, which will constitute one of the fundamental blocks of a smart CMOS imager currently under design, implements isotropic Gaussian filtering by means of a MOS-based RC network. Alternatively, this filtering can be turned into anisotropic by a very simple voltage comparator between neighboring nodes whose output controls the gate of the elementary MOS resistor. Anisotropic diffusion enables image enhancement by removing noise and small local variations while preserving edges. A binary edge image can also be attained by combining the output of the voltage comparators. In addition to these processing capabilities, the simulations have confirmed the robustness of the array against process variations and mismatch. The power consumption extrapolated for VGA-resolution array processing images at 30 fps is 570 μW.

IC-PCR 1000 Control Using a Wireless Sensor Network
M. Bakkali, C. Mascareñas-Pérez-Íñigo and R. Carmona-Galán
Journal Paper · International Journal of Computer and Communication Engineering, vol. 1, no. 3, pp 290-292, 2012
abstract      doi      pdf

We put forward in this paper a new methodbased on a Wireless Sensor Network from Crossbowtechnology for the IC-PCR1000 control and communication tolessen the impact of the unwanted EMI. This method is a newmethod that allows working in the band LF/LW/MW/FMwithout the EMI emitted by the power supply of the PC andIC-PCR1000

Early forest fire detection by vision-enabled wireless sensor networks
J. Fernández-Berni, R. Carmona-Galán, J.F. Martínez-Carmona and A. Rodríguez-Vázquez
Journal Paper · International Journal of Wildland Fire, vol. 21, no. 8, pp 938-949, 2012
abstract      doi      pdf

Wireless sensor networks constitute a powerful technology particularly suitable for environmental monitoring. With regard to wildfires, they enable low-cost fine-grained surveillance of hazardous locations like wildland-urban interfaces. This paper presents work developed during the last 4 years targeting a vision-enabled wireless sensor network node for the reliable, early on-site detection of forest fires. The tasks carried out ranged from devising a robust vision algorithm for smoke detection to the design and physical implementation of a power-efficient smart imager tailored to the characteristics of such an algorithm. By integrating this smart imager with a commercial wireless platform, we endowed the resulting system with vision capabilities and radio communication. Numerous tests were arranged in different natural scenarios in order to progressively tune all the parameters involved in the autonomous operation of this prototype node. The last test carried out, involving the prescribed burning of a 9520-m shrub plot, confirmed the high degree of reliability of our approach in terms of both successful early detection and a very low false-alarm rate. Journal compilation.

CMOS-3D Smart Imager Architectures for Feature Detection
V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, G. Liñán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper · IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 4, pp 723-736, 2012
abstract      doi      pdf

This paper reports a multi-layered smart image sensor architecture for feature extraction based on detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry aimed to perform Gaussian filtering and generate Gaussian pyramids in fully concurrent way. The circuitry in this tier operates in mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution; one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. Regarding the bottom tier, it embeds digital circuitry entitled for the calculation of Harris, Hessian, and difference-of-Gaussian detectors. The overall system can hence be configured by the user to detect interest points by using the algorithm out of these three better suited to practical applications. The paper describes the different kind of algorithms featured and the circuitry employed at top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions.

All-MOS implementation of RC networks for time-controlled Gaussian spatial filtering
J. Fernández-Berni and R. Carmona-Galán
Journal Paper · International Journal of Circuit Theory and Applications,vol. 40, no. 8, pp 859-876, 2012
abstract      doi      pdf

This paper addresses the design and VLSI implementation of MOS-based RC networks capable of performing time-controlled Gaussian filtering. In these networks, all the resistors are substituted one by one by a single MOS transistor biased in the ohmic region. The design of this elementary transistor is carefully realized according to the value of the ideal resistor to be emulated. For a prescribed signal range, the MOSFET in triode region delivers an interval of instantaneous resistance values. We demonstrate that, for the elementary 2-node network, establishing the design equation at a particular point within this interval guarantees minimum error. This equation is then corroborated for networks of arbitrary size by analyzing them from a stochastic point of view. Following the design methodology proposed, the error committed by an MOS-based grid when compared with its equivalent ideal RC network is, despite the intrinsic nonlinearities of the transistors, below 1% even under mismatch conditions of 10%. In terms of image processing, this error hardly affects the outcome, which is perceptually equivalent to that of the ideal network. These results, extracted from simulation, are verified in a prototype vision chip with QCIF resolution manufactured in the AMS 0.35Âμm CMOS-OPTO process. This prototype incorporates a focal-plane MOS-based RC network that performs fully programmable Gaussian filtering. Copyright © 2011 John Wiley & Sons, Ltd. This paper addresses the design and VLSI implementation of all-MOS RC networks capable of performing timecontrolled Gaussian filtering. Following the design methodology proposed, the error committed by a MOS-based grid when compared to its equivalent ideal RC networks is, despite the intrinsic nonlinearities of the transistors, below 1% even under mismatch conditions of 10%. These results, extracted from simulation, are verified in a prototype vision chip with QCIF resolution manufactured in the AMS 0.35Âμm CMOS-OPTO process. Copyright © 2011 John Wiley & Sons, Ltd. Copyright © 2011 John Wiley & Sons, Ltd.

FLIP-Q: a QCIF resolution focal-plane array for low-power image processing
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Journal Paper · IEEE Journal of Solid-State Circuits, vol. 46,  no. 3, pp 669-680, 2011
abstract      doi      pdf

This paper reports a 176x144-pixel smart image sensor designed and fabricated in a 0.35 mu m CMOS-OPTO process. The chip implements a massively parallel focal-plane processing array which can output different simplified representations of the scene at very low power. The array is composed of pixel-level processing elements which carry out analog image processing concurrently with photosensing. These processing elements can be grouped into fully-programmable rectangular-shape areas by loading the appropriate interconnection patterns into the registers at the edge of the array. The targeted processing can be thus performed block-wise. Readout is done pixel-by-pixel in a random access fashion. On-chip 8b ADC is provided. The image processing primitives implemented by the chip, experimentally tested and fully functional, are scale space and Gaussian pyramid generation, fully-programmable multiresolution scene representation-including foveation-and block-wise energy-based scene representation. The power consumption associated to the capture, processing and A/D conversion of an image flow at 30 fps, with full-frame processing but reduced frame size output, ranges from 2.7 mW to 5.6 mW, depending on the operation to be performed.

On the implementation of linear diffusion in transconductance-based cellular nonlinear networks
J. Fernández-Berni and R. Carmona-Galán
Journal Paper · International Journal of Circuit Theory and Applications, vol. 37, no. 4, pp 543-567, 2009
abstract      doi      

In theory, cellular nonlinear networks (CNN) are well capable of implementing discrete-space linear diffusion by means of the appropriate templates. In practice, good results have not been demonstrated with transconductance-based circuits. In this paper, we prove that inherent mismatch to very large scale integration implementation is the reason. Although previous works consider that the small perturbations of the network parameters lead to small deviations from the ideal behavior, we consider that this is over optimistic. When interactions between nodes are supported by unidirectional building blocks, originally balanced current paths are realized by mismatched elements. In the case of linear diffusion, the singular location of natural frequencies of the system implies that a small perturbation of balanced current paths renders qualitatively different network dynamics. We analyze and compare a set of linear templates performing unconstrained and constrained linear diffusion in transconductance-based CNN hardware. Several numerical examples are also presented to visualize the consequences of mismatch on the processing. Finally, in order to emphasize the importance of having balanced current paths, we tested the influence of mismatch in a grid built with MOS transistors. In spite of their nonlinearity, the resulting network is much more robust to mismatch. This last result coincides with previous studies. Copyright 2008 John Wiley & Sons, Ltd.

Performance evaluation and limitations of a vision system on a reconfigurable/programmable chip
J. Fernández-Pérez, F.J. Sánchez-Fernández and R. Carmona-Galán
Journal Paper · Journal of Universal Computer Science, vol. 13, no. 3, pp 440-453, 2007
abstract      doi      pdf

This paper presents a survey of the characteristics of a vision system implemented in a reconfigurable/programmable chip (FPGA). System limitations and performance have been evaluated in order to derive specifications and constraints for further vision system synthesis. The system hereby reported has a conventional architecture. It consists in a central microprocessor (CPU) and the necessary peripheral elements for data acquisition, data storage and communications. It has been designed to stand alone, but a link to the programming and debugging tools running in a digital host (PC) is provided. In order to alleviate the computational load of the central microprocessor, we have designed a visual co-processor in charge of the low-level image processing tasks. It operates autonomously, commanded by the CPU, as another system peripheral. The complete system, without the sensor, has been implemented in a single reconfigurable chip as a SOPC. The incorporation of a dedicated visual co-processor, with specific circuitry for low-level image processing acceleration, enhances the system throughput outperforming conventional processing schemes. However, time-multiplexing of the dedicated hardware remains a limiting factor for the achievable peak computing power. We have quantified this effect and sketched possible solutions, like replication of the specific image processing hardware.

ACE16k: The third generation of mixed-signal SIMD-CNN ACE chips toward VSoCs
A. Rodríguez-Vázquez, G. Liñán-Cembrano, L. Carranza, E. Roca-Moreno, R. Carmona-Galán, F. Jiménez-Garrido, R. Domínguez-Castro and S. Espejo-Meana
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 51, no. 5, pp 851-863, 2004
abstract      doi      pdf

Today, with 0.18-μm technologies mature and stable enough for mixed-signal design with a large variety of CMOS compatible optical sensors available and with 0.09-μm technologies knocking at the door of designers, we can face the design of integrated systems, instead of just integrated circuits. In fact, significant progress has been made in the last few years toward the realization of vision systems on chips (VSoCs). Such VSoCs are eventually targeted to integrate within a semiconductor substrate the functions of optical sensing, image processing in space and time, high-level processing, and the control of actuators. The consecutive generations of ACE chips define a roadmap toward flexible VSoCs. These chips consist of arrays of mixed-signal processing elements (PEs) which operate in accordance with single instruction multiple data (SIMD) computing architectures and exhibit the functional features of CNN Universal Machines. They have been conceived to cover the early stages of the visual processing path in a fully parallel manner, and hence more efficiently than DSP-based systems. Across the different generations, different improvements and modifications have been made looking to converge with the newest discoveries of neurobiologists regarding the behavior of natural retinas. This paper presents considerations pertaining to the design of a member of the third generation of ACE chips, namely to the so-called ACE16k chip. This chip, designed in a 0.35-μm standard CMOS technology, contains about 3.75 million transistors and exhibits peak computing figures of 330 GOPS, 3.6 GOPS/mm(2) and 82.5 GOPS/W. Each PE in the array contains a reconfigurable computing kernel capable of calculating linear convolutions on 3 x 3 neighborhoods in less than 1.5 μs, imagewise Boolean combinations in less than 200 ns, imagewise arithmetic operations in about 5 μs, and CNN-like temporal evolutions with a time constant of about 0.5 μs. Unfortunately, the many ideas underlying the design of this chip cannot be covered in a single paper; hence, this paper is focused on, first, placing the ACE16k in the ACE chip roadmap and, then, discussing the most significant modifications of ACE16K versus its predecessors in the family.

Second-order neural core for bioinspired focal-plane dynamic image processing in CMOS
R. Carmona-Galán, F. Jiménez-Garrido, C.M. Domínguez-Mata, R. Domínguez-Castro, S. Espejo-Meana, I. Petras and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 51, no. 5, pp 913-925, 2004
abstract      doi      

Based on studies of the mammalian retina, a bioinspired model for mixed-signal array processing has been implemented on silicon. This model mimics the way in which images are processed at the front-end of natural visual pathways, by means of programmable complex spatio-temporal dynamic. When embedded into a focal-plane processing chip, such a model allows for online parallel filtering of the captured image; the outcome of such processing can be used to develop control feedback actions to adapt the response of photoreceptors to local image features. Beyond simple, resistive grid filtering, it is possible to program other spatio-temporal processing operators into the model core, such as nonlinear and anisotropic diffusion, among others. This paper presents analog and mixed-signal very large-scale integration building blocks to implement this model, and illustrates their operation through experimental results taken from a prototype chip fabricated in a 0.5-mum CMOS technology.

Reaction-diffusion navigation robot control: From chemical to VLSI analogic processors
A. Adamatzky, P. Arena, A. Basile, R. Carmona-Galán, B. de Lacy-Costello, L. Fortuna, M. Frasca and A. Rodríguez-Vázquez
Journal Paper · IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 51, no. 5, pp 926-938, 2004
abstract      doi      pdf

We introduce a new methodology and experimental implementations for real-time wave-based robot navigation in a complex, dynamically changing environment. The main idea behind the approach is to consider the robot arena as an excitable medium, in which moving objects-obstacles and the target-are represented by sites of autowave generation: the target generates attractive waves, while the obstacles repulsive ones. The moving robot detects traveling and colliding wave fronts and uses the information about dynamics of the autowaves to adapt its direction of collision-free motion toward the target. This approach allows us to achieve a highly adaptive robot behavior and thus an optimal path along which the robot reaches the target while avoiding obstacles. At the computational and experimental levels, we adopt principles of computation in reaction-diffusion (RD) nonlinear active media. Nonlinear media where autowaves are used for information processing purposes can therefore be considered as RD computing devices. In this paper, we design and experiment with three types of RD processors: experimental and computational Belousov-Zhabotinsky chemical processor, computational CNN processor, and experimental RD-CNN very large-scale integration chip-the complex analog and logic computing engine (CACE1k). We demonstrate how to experimentally implement robot navigation using space-time snapshots of active chemical medium and how to overcome low-speed limitation of this "wetware" implementation in CNN-based silicon processors.

Exploration of spatial-temporal dynamic phenomena in a 32 × 32-Cell stored program two-layer CNN universal machine chip prototype
I. Petrás, C. Rekeczky, T. Roska, R. Carmona, F. Jiménez-Garrido and A. Rodríguez-Vázquez
Journal Paper · Journal of Circuits, Systems and Computers, vol. 12, no. 6, pp 691-710, 2003
abstract      doi      

This paper describes a full-custom mixed-signal chip that embeds digitally programmable analog parallel processing and distributed image memory on a common silicon substrate. The chip was designed and fabricated in a standard 0.5 μm CMOS technology and contains approximately 500 000 transistors. It consists of 1024 processing units arranged into a 32 × 32 grid. Each processing element contains two coupled CNN cores, thus, constituting two parallel layers of 32 × 32 nodes. The functional features of the chip are in accordance with the 2nd Order Complex Cell CNN-UM architecture. It is composed of two CNN layers with programmable inter- and intra-layer connections between cells. Other features are: cellular, spatial-invariant array architecture; randomly selectable memory of instructions; random storage and retrieval of intermediate images. The chip is capable of completing algorithmic image processing tasks controlled by the user-selected stored instructions. The internal analog circuitry is designed to operate with 7-bits equivalent accuracy. The physical implementation of a CNN containing second order cells allows real-time experiments of complex dynamics and active wave phenomena. Such well-known phenomena from the reaction-diffusion equations are traveling waves, autowaves, and spiral-waves. All of these active waves are demonstrated on-chip. Moreover this chip was specifically designed to be suitable for the computation of biologically inspired retina models. These computational experiments have been carried out in a developmental environment designed for testing and programming the analogic (analog-and-logic) programmable array processors.

Conferences


Accurate Face Recognition on Highly Compressed Samples
A. Khan, J. Fernández-Berni and R. Carmona-Galán
Conference · International Conference on Signal Image Technology and Internet Based Systems SITIS 2022
abstract     

Compressive sensing is an emerging field for lowdimensional data acquisition. Samples are acquired in the compressed domain and utilized for signal reconstruction or as input features for a classifier. In this work, hardware-aware face recognition using compressed samples was investigated. A linear support vector machine (SVM) classifier was exploited with compressed samples as input features; Faces can be reliably recognized with high average accuracy (up to 99%). To assess the robustness of the proposed scheme, three image datasets covering different facial and illumination conditions were analyzed. Random (binary) and structured (Haar-transform-based) measurement matrices were employed for generating compressed samples. For one of the datasets, Extended Yale B, and using a random binary measurement matrix, the proposed scheme achieved 82% accuracy from as few as 15 compressed samples, which means a 1/20480 sensing ratio. Accuracy and compression are also remarkably high with respect to the state-of-the-art for the other two datasets.

An Architecture for On-Chip Face Recognition in a Compressive Image Sensor
A. Khan, J. Fernandez-Berni and R. Carmona-Galan
Conference · IEEE International System-On-Chip Conference SOCC 2022
abstract     

Compressive sensing has been widely explored for image reconstruction; however, compressed samples themselves contain relevant signal information and, hence, could be exploited for inference purposes. Many previous studies investigated image recognition on compressed samples, but few of them considered on-chip realization. In this study, an architecture for face recognition that exploits compressed samples was investigated. We found that by using a linear support-vector-machine (SVM) classifier with such samples as input, faces can be recognized with higher performance than in previous works for the level of compression expected in a system-on-chip (SoC) implementation. To compare our results with those of existing works, a figure of merit is proposed. Three image datasets were analyzed to cover diverse aspects, such as variation in illumination, different poses, aging effect, and changing backgrounds. The compression scheme shows robustness under this variety of input signals. A pseudo-diagonal measurement matrix and an architecture suited for in-sensor on-chip implementation are proposed. The resulting inference framework is suitable for an SoC implementation encompassing an image sensor to perform CS acquisition and a DSP to run the SVM.

High-Level Inference On-Chip Enabled by Compressed Sensing
A. Khan, J. Fernández-Berni and R. Carmona-Galán
Conference · Conference on Design of Circuits and Integrated Systems DCIS 2021
abstract     

Compressed sensing (CS) proposes an alternative approach for image acquisition. The traditional method implies acquiring an image and then compressing it. In CS, the image is directly acquired in the compressed domain. Learning can be performed in the compressed domain, known as compressed learning. When it comes to conducting inference on a CS-based image, compressed learning eliminates the need of reconstructing the compressively acquired image. Thus, in this work, compressed learning is suggested for high-level inference. Our ultimate goal is to implement a CMOS smart sensor-processor chip exploiting CS for on-chip inference based on the hypotheses that working in the compressed domain allows implementing high-level inference under severe restrictions on computational and power resources.

Design of Readout Channels for Direct-ToF LiDAR
M. Parsakordasiabi, A. Rodríguez-Vázquez and R. Carmona-Galán
Conference · Conference on Design of Circuits and Integrated Systems DCIS 2021
abstract     

Direct-time-of-flight (d-ToF) readout channels are required to be precise, high speed, and linear while preserving low resources for multi-channel applications. Although meeting these requirements seems to be difficult, they are highly demanded in many applications like light detection and ranging (LiDAR) sensors. This thesis project is dedicated to the design of a high-linearity high-measurement throughput low-resources FPGA-based time-to-digital converters (TDCs). We are working on the reduction of the dead-time by using a toggling input stage and a dual-mode counter-based encoder. In addition, the linearity is improved by using a dual-mode bin-width calibrator and a robust encoder.

A Novel Approach for Measurement Throughput Maximization in FPGA-based TDCs
M. Parsakordasiabi, I. Vornicu, A. Rodríguez-Vázquez and R. Carmona-Galán
Conference · International Conference on Event-Based Control, Communication and Signal Processing EBCCSP 2021
abstract     

This paper presents a new approach for dead-time minimization while preserving low resource usage and high resolution in FPGA-based time-to-digital (TDC) converters. The proposed TDC architecture can be employed in applications in which many events need to be detected in a short time, such as time-of-flight positron emission tomography (ToF-PET) applications. The presented architecture consists of a toggling input stage, a tapped delay line (TDL), a dual-mode counter-based encoder, a coarse counter, and a bin width calibration stage. The minimum dead-time of TDL TDCs is two clock cycles. The proposed architecture reduced dead-time to one clock cycle. The measurement results of the proposed low-resources TDC in an Artix-7 FPGA show [-0.80, 1.34] LSB differential nonlinearity (DNL) and [-0.73, 1.97] LSB integral non-linearity (INL). The measured LSB size and single-shot precision (SSP) are 22.1 ps and 28.43 ps, respectively.

Photon-Detection Timing-Jitter Model in Verilog-A
J.M. López-Martínez, R. Carmona-Galán and A. Rodríguez-Váquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

Single-photon avalanche diodes can be employed to register the arrival of an individual photon. They are biased beyond breakdown voltage, and thus the electron-hole pairs generated by any incident photon is accelerated by the strong electric field triggering an avalanche current. In recent years, there have been attempts to model its characteristics in Verilog-A HDL. However, none of them have modelled its photon-detection timing jitter. This paper explains the mechanism of avalanche triggering and proposes a first approach to model it in Verilog-A. Comparison with experimental data and data reported in literature validates the model.

Limitation of SPADs quantum efficiency due to the dopants concentration gradient
J.M. López-Martínez, R. Carmona-Galán and A. Rodríguez-Váquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

Single-photon avalanche diodes are highly sensitive devices capable of registering the arrival of an individual photon. They are biased beyond breakdown voltage, and thus the electron-hole pairs generated by any incident photon is accelerated by the strong electric field triggering an avalanche current. This paper examines the role of the dopants concentration gradient in the gathering of photons in these devices, and how it can be engineered to maximize quantum efficiency and explain his role to minimize undesirable effects like crosstalk.

Cellular-Neural-Network Focal-Plane Processor as Pre-Processor for ConvNet Inference
L.C. Gontard, R. Carmona-Galán and A. Rodríguez-Váquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

Cellular Neural Networks (CNN) can be embodied in the form of a focal-plane image processor. They represent a computing paradigm with evident advantages in terms of energy and resources. Their operation relies in the strong parallelization of the processing chain thanks to a distributed allocation of computing resources. In this way, image sensing and ultra-fast processing can be embedded in a single chip. This makes them good candidates for portable and/or distributed applications in fields like autonomous robots or smart cities. With the irruption of visual features learning through convolutional neural networks (ConvNets), several works attempt to implement this functionality within the CNN framework. In this paper we carry out some experiments on the implementation of ConvNets with CNN hardware in the form of a focal-plane image processor. It is shown that ultra-fast inference can be implemented, using as an example a LeNet-based ConvNet architecture.

Vertically Stacked CMOS-Compatible Photodiodes for Scanning Electron Microscopy
L.C. Gontard, J.A. Leñero-Bardallo, F.M. Varela-Feria and R. Carmona-Galán
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

This paper reports the use of vertically stacked photodiodes as compact solid-state spectrometers for transmission scanning electron microscopy. SEM microscopes operate by illuminating the sample with accelerated electrons. They can have one or more solid-state sensors. In this work we have tested a set of stacked photodiodes fabricated in a standard 180nm HV-CMOS technology without process modifications. We have measured their sensitivity to electron irradiation in the energy range between 10keV and 30keV. We have also assessed their radiation hardness. The experiments are compared with Monte Carlo simulations to investigate their spectral sensitivity.

VersaTile Convolutional Neural Network Mapping on FPGAs
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2020
abstract      pdf

Convolutional Neural Networks (ConvNets) are directed acyclic graphs with node transitions determined by a set of configuration parameters. In this paper, we describe a dynamically configurable hardware architecture that enables data allocation strategy adjustment according to ConvNets layer characteristics. The proposed flexible scheduling solution allows the accelerator design to be portable across various scenarios of computation and memory resources availability. For instance, FPGA block-RAM resources can be properly balanced for optimization of data distribution and minimization of off-chip memory accesses. We explore the selection of tailored scheduling policies that translate into efficient on-chip data reuse and hence lower energy consumption. The system can autonomously adapt its behavior with no need of platform reconfiguration nor user supervision. Experimental results are presented and compared with state-of-the-art accelerators.

Demo: CNN Performance Prediction on a CPU-based Edge Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez Vázquez
Conference · International Conference on Sustainable Development in Civil Engineering ICSDC 2019
abstract     

The implementation of algorithms based on Dee p Learning at edge visual systems is currently a challenge. In addition to accuracy, the network architecture also has an impact on inference performance in terms of throughput and power consumption. This demo showcases per layer inference performance of various convolut ional neural networks running at a low cost edge platform . Furthermore, a n empirical model is applied to predict processing time and power consumption prior to actually running the networks A comparison between the prediction from our model and the actual inference performance is displayed in real time.

PhD Forum: A survey on FPGA-based high-resolution TDCs
M. Parsakordasiabi, I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICSDC 2019
abstract     

Time-to-digital converters based on Nutt method are especially suitable for FPGA implementation. They are able to provide high resolution, range and linearity with low resources usage. The core of this architecture consist in a coarse counter for long range, a fine time interpolator for high resolution and real-time calibration for high linearity. This paper reviews different time interpolation and real-time calibration techniques. Moreover, a comparison of state-of-the-art FPGA-based TDCs is presented as well.

PhD Forum: Impact of CNNs Pooling Layer Implementation on FPGAs Accelerator Design
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICSDC 2019
abstract     

Convolutional Neural Networks have demonstrated their competence in extracting information from data, especially in the field of computer vision. Their computational complexity prompts for hardware acceleration. The challenge in the design of hardware accelerators for CNNs is providing a sustained throughput with low power consumption, for what FPGAs have captured community attention. In CNNs pooling layers are introduced to reduce model spatial dimensions. This work explores the influence of pooling layers modification in some state-of-the-art CNNs, namely AlexNet and SqueezeNet. The objective is to optimize hardware resources utilization without negative impact on inference accuracy.

Low-Noise and High-Efficiency Near-IR SPADs in 110nm CIS Technology
I. Vornicu, F. Bandi, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Solid-State Device Research Conference ESSDERC 2019
abstract     

Photon detection at longer wavelengths is much desired for LiDAR applications. Silicon photodiodes with deeper junctions and larger multiplication regions are more sensitive to near-IR photons. This paper presents the complete electro-optical characterization of a P-well/Deep N-well single photon avalanche diode integrated in 110nm CIS technology. Devices with various sizes, shapes and guard ring widths have been fabricated and tested. The measured mean breakdown voltage is of 18V. The proposed structure has 0.4Hz/um2 dark count rate, 0.5% afterpulsing, 188ps FWHM (total) jitter and around 10% photon detection probability at 850nm wavelength. All figures have been measured at 3V excess voltage.

Evaluation of Architectures for FPGA-Implementation of High-Resolution TDCs
M. Parsakordasiabi, I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract     

Time-to-digital converters (TDCs) are a central component in systems based on time-delay assessment. The principal characteristics to be sought for in a TDC are high resolution, long time range, linearity and low power consumption. Besides, field-programmable gate arrays (FPGAs) represent an interesting option to explore fully-digital TDC architectures, because of their flexibility, shorter development time and lower implementation cost than ASICs. They are reconfigurable and usually built on the finest silicon technologies. The purpose of this work is to identify the different architectures that lead to high-resolution TDCs on FPGA, and to compare them in terms of the appropriate figures of merit. The most extended method to cover a long time interval while preserving a high time resolution is to combine a coarse counter with a fine time interpolator. Two techniques have been widely used to implement the interpolator, namely a tapped delay line (TDL) and a multiple-phase clock interpolator. Exploiting fast carry chains present in most modern FPGAs, sub-clock-period resolution have been achieved, down to tens of picoseconds. Other important aspects of the TDC design are the thermometer-to-binary encoder, the minimization of the clock skew, the analysis of the influence of voltage and temperature changes and bin-width calibration. Accordingly, we report an analysis of the different TDC architectures on FPGA based on their performance characteristics.

Towards a Simplified Procedure for CNN Performance Prediction on Embedded Platforms
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract     

Vision is arguably the technical field benefiting the most from the renaissance of artificial intelligence in the last few years. In particular, the convergence of massive datasets for training, boosted computational power, and enhanced machine learning techniques has given rise to highly accurate vision algorithms -even outperforming humans in certain tasks- based on convolutional neural networks (CNNs). The potential of these algorithms has attracted attention from many parties, both in academia and industry, spurring the development of a myriad of hardware platforms and software frameworks. The challenge now is how to efficiently leverage and integrate this variety of components in practical realizations, taking also into account that CNN models keep evolving at a rapid pace. With this scenario in mind, we have been working on a simplified procedure to predict the performance of CNNs running on embedded platforms in terms of throughput and power consumption. The objective is to facilitate the evaluation of the aforementioned components and CNN models prior to actually implementing them, thereby speeding up the deployment of optimal solutions. In this talk, we will describe key aspects of the proposed procedure. Specifically, we will elaborate on SweepNet, a deep neural network tailored for meaningful per-layer characterization. The performance models extracted from SweepNet for a hardware platform allow to accurately predict layer by layer the execution time and energy consumption of any other CNN running on that platform.

On the Balanced Allocation of Convolutional Neural Network Models on FPGAs
A. Muñío-Gracia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract     

Deep Learning (DL) algorithms have demonstrated their competence in accurately extracting information from data, especially in the field of computer vision. DL has emerged as an end-to-end approach based on learned multi-level scene representations. A number of open-source frameworks have been created to describe convolutional neural network (CNN) models -a class of the deep neural networks (DNNs) that support DL. Their computational complexity prompts for hardware acceleration. The challenge in the design of hardware accelerators for CNNs is providing a sustained throughput with low power consumption. In order to test our architectural proposals, we will be employing FPGAs. They are reconfigurable, efficient, and have adjustable precision. FPGAs permit architectural exploration with shorter development time and lower cost than ASICs. This work introduces an scalable, frameworkagnostic, architecture whose behavior self-adapts to the selected CNN configuration. A design space analysis is performed for some state-of-the-art CNNs, namely VGG-16, Tiny DarkNet, and SqueezeNet. The objective is a balanced allocation of resources. For this, tiling parameterization will be optimized attending to decisive performance criteria such as the number of memory accesses, data movement policy and throughput.

On the Correlation of CNN Performance and Hardware Metrics for Visual Inference on a Low-Cost CPU-based Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Systems, Signals and Image Processing IWSSIP 2019
abstract      pdf

While providing the same functionality, the various Deep Learning software frameworks available these days do not provide similar performance when running the same network model on a particular hardware platform. On the contrary, we show that the different coding techniques and underlying acceleration libraries have a great impact on the instantaneous throughput and CPU utilization when carrying out the same inference with Caffe, OpenCV, TensorFlow and Caffe2 on an ARM Cortex-A53 multi-core processor. Direct modelling of this dissimilar performance is not practical, mainly because of the complexity and rapid evolution of the toolchains. Alternatively, we examine how the hardware resources are distinctly exploited by the frameworks. We demonstrate that there is a strong correlation between inference performance - including power consumption - and critical parameters associated with memory usage and instruction flow control. This identified correlation is a preliminary step for the development of a simple empirical model. The objective is to facilitate selection and further performance tuning among the ever-growing zoo of deep neural networks and frameworks, as well as the exploration of new network architectures.

TOF estimation based on compressed real-time histogram builder for SPAD image sensors
I. Vornicu, A. Darie, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2019
abstract     

This paper presents a FPGA implementation of a novel depth map estimation algorithm for direct time-of-flight CMOS image sensors (dToF-CISs) based on single-photon avalanche-diodes (SPADs). Conventional ToF computation algorithms rely on complete ToF histograms. The next generation of high speed dToF-CIS is expected to have wide dynamic range and high depth resolution. Applications such as 3D imaging based on dToF-CISs require pixel-level ToF histograms which have to be stored by huge fully-random access memory (RAM) modules. The proposed shifted inter-frame histogram (SiFH) algorithm has the same accuracy but requires a memory footprint 128 times smaller than the conventional algorithm. Thus a much larger number of pixels can be resolved using limited block RAM resources of FPGAs. Moreover the overall frame rate is also remarkably improved compared to the scanning method. The proof of concept of the SiFH algorithm on 15 bits has been implemented on Spartan-3E. An automated testbench was developed to confirm that no ambiguity errors occur along the entire dynamic range.

On the implementation of asynchronous sun sensors
J.A. Leñero-Bardallo, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2019
abstract     

Abstract not avaliable

An Experimentally-Validated Verilog-A SPAD Model Extracted from TCAD Simulation
J.M. López-Martínez, I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Conference on Electronics Circuits and Systems ICECS 2018
abstract     

Single-photon avalanche diodes (SPAD) are photodetectors with exceptional characteristics. This paper proposes a new approach to model them in Verilog-A HDL with the help of a powerful tool: TCAD simulation. Besides, to the best of our knowledge, this is first model to incorporate a trap-assisted tunneling mechanism, a cross-section temperature dependence of the traps, and the self-heating effect. Comparison with experimental data establishes the validity of the model.

1D Cellular Automata for Pulse Width Modulated Compressive Sampling CMOS Image Sensors
M. Trevisi, R. Carmona-Galan and A. Rodriguez-Vazquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2018
abstract     

Compressive sensing (CS) is an alternative to the Shannon limit when the signal to be acquired is known to be sparse or compressible in some domain. Since compressed samples are non-hierarchical packages of information, this acquisition technique can be employed to overcome channel losses and restricted data rates. The quality of the compressed samples that a sensor can deliver is affected by the measurement matrix used to collect them. Measurement matrices usually employed in CS image sensors are recursive random-like binary matrices obtained using pseudo-random number generators (PRNG). In this paper we analyse the performance of these PRNGs in order to understand how their non-idealities affect the quality of the compressed samples. We present the architecture of a CMOS image sensor that uses class-III elementary cellular automata (ECA) and pixel pulse width modulation (PWM) to generate onchip a measurement matrix and high the quality compressed samples.

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract     

A comparison of architectures for hardware implementation of Gaussian image pyramids is addressed. Architectures consisting of a conventional sensor followed by digital processors are compared to architectures employing per-pixel embedded pre-processing structures. The later is potentially advantageous for enhancing throughput and reducing energy consumption, important features in the IoT context. These advantages are quantified considering different numbers of digital processors and ADCs, and different ADCs types. Results show that the advantages of pre-processing sensors are not granted by default, requiring proper architectural design. The methodology presented for comparing focal-plane and digital approaches allows for the assessment of focal-plane processing advantages.

Results of 'iCaVeats', a project on the integration of architectures and components for embedded vision
R. Carmona-Galán, J. Fernández-Berni, A. Rodríguez-Vázquez, P. López-Martínez, V.M. Brea-Sánchez, D. Cabello-Ferrer, G. Domenech-Asensi, R. Ruiz-Merino and J. Zapata-Pérez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract      pdf

iCaveats is a Project on the integration of components and architectures for embedded vision in transport and security applications. A compact and efficient implementation of autonomous vision systems is difficult to be accomplished by using the conventional image processing chain. In this project we have targeted alternative approaches, that exploit the inherent parallelism in the visual stimulus, and hierarchical multilevel optimization. A set of demos showcase the advances at sensor level, in adapted architectures for signal processing and in power management and energy harvesting.

On the characterization of light sources irradiation profiles with an HDR image sensor
J.A. Leñero-Bardallo, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract     

We demonstrate how light emissions of very bright light sources can be rendered with an HDR image sensor with linear operation. We showcase the device usefulness to study transient variations of very high illumination levels and to determine the irradiance profile of light sources. The sensor can track transient illumination changes at video rates, preserving details of darker regions within the visual scene.

CMOS-SPAD camera prototype for single-sensor 2D/3D imaging
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract      pdf

One of the research lines explored in project 'iCaveats' has been the combined capture of 2D and 3D visual information. With the objective of power-efficient feature learning/extraction, combined 2D/3D imaging is a useful tool to work on a lightweight but rich description of the scene. Single-sensor capture of both modalities is a potential improvement in cost and efficiency. In this demo, we present the performance and features of a CMOS-SPAD camera prototype that realizes photon counting and direct time-of-flight (d-ToF). The central elements of the camera module are a 64x64 SPAD imager and a FPGA board for real time histograming and image reconstruction at 1kfps.

On-The-Fly Deployment of Deep Neural Networks on Heterogeneous Hardware in a Low-Cost Smart Camera
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract     

This demo showcases a low-cost smart camera where different hardware configurations can be selected to perform image recognition on deep neural networks. Both the hardware configuration and the network model can be changed any time on the fly. Up to 24 hardware-model combinations are possible, enabling dynamic reconfiguration according to prescribed application requirements.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Advanced Concepts for Intelligent Vision Systems ACIVS 2018
abstract     

This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

Arrayable Voltage-Controlled Ring-Oscillator for Direct Time-of-Flight Image Sensors
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract     

Direct time-of-flight (d-ToF) estimation with high frame rate requires the incorporation of a time-todigital converter (TDC) at pixel level. A feasible approach to a compact implementation of the TDC is to use the multiple phases of a voltage-controlled ring-oscillator (VCRO) for the finest bits. The VCRO becomes central in determining the performance parameters of a d-ToF image sensor. In this paper we are covering the modeling, design and measurement of a CMOS pseudo-differential VCRO. The oscillation frequency, the jitter due to mismatches and noise and the power consumption are analytically evaluated. This design has been incorporated into a 64×64-pixel array. It has been fabricated in a 0.18μm standard CMOS technology. Occupation area is 28×29μm2 and power consumption is 1.17mW at 850MHz. The measured gain of the VCRO is of 477MHz/V with a frequency tuning range of 53%. Moreover, it features a linearity of 99.4% over a wide range of control frequencies, namely from 400MHz to 850MHz. The phase noise is of -102dBc/Hz at 2MHz offset frequency from 850MHz. The influence of these parameters in the performance of the TDC has been measured. The minimum time bin of the TDC is 147ps with a RMS DNL/ INL of 0.13/ 1.7LSB.

Live Demonstration: Low-Power Low-Cost Cyber-Physical System for Bird Monitoring
A. García-Rodríguez, R. Rodríguez-Sakamoto, J. Fernández-Berni, R. del Río, J. Marín, M. Baena, J. Bustamante, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract      pdf

This live demonstration showcases a cyber-physical system tailored for inexpensive remote bird monitoring. A comprehensive analysis of the application requirements along with a tight system integration have given rise to a smart autonomous nest-box ready for deployment. This nest-box includes radiofrequency identification (RFID), a weighing scale, two temperature sensors, passive infrared devices (PIR), massive data storage and internet connection via mobile infrastructure. It is powered through a solar panel. The bill of materials has been diminished 77% with respect to the previous version of the nest-box whereas the power consumption has been reduced 84%.

Color Tone-Mapping Circuit for a Focal-Plane Implementation
G.M.S. Nunes, F.D.V.R. Oliveira, J.G. Gomes, A. Petraglia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2018
abstract      pdf

In this article, we present a review of the driving principles and parameters of a previously reported focal-plane tone-mapping operator. We then extend it in order to include color information processing. The signal processing operations required for handling color images are white balance and demosaicing. Neither white balance nor demosaicing are carried out in the focal plane, in order to avoid increasing circuit size and complexity. Since, in this case, white balance is carried out after tone mapping, multiplication of red and blue channels by constant gains may lead to wrong color results. An alternative approach is proposed, in which different gains are assigned for every red and blue pixel of the matrix. Because of the introduction of color, a modification in the original circuit is proposed, which affects the integration time of red and blue pixels. This modification leads to a reduction in the number of photodiodes required in the pixel array, and hence to a reduction of the sensing circuit area. The results produced by the operator are compared to those obtained from two other digital tone-mapping operators.

Concurrent focal-plane generation of compressed samples from time-encoded pixel values
M. Trevisi, H.C. Bandala, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Design Automation and Test in Europe DATE 2018
abstract      pdf

Compressive sampling allows wrapping the relevant content of an image in a reduced set of data. It exploits the sparsity of natural images. This principle can be employed to deliver images over a network under a restricted data rate and still receive enough meaningful information. An efficient implementation of this principle lies in the generation of the compressed samples right at the imager. Otherwise, i. e. digitizing the complete image and then composing the compressed samples in the digital plane, the required memory and processing resources can seriously compromise the budget of an autonomous camera node. In this paper we present the design of a pixel architecture that encodes light intensity into time, followed by a global strategy to pseudo-randomly combine pixel values and generate, on-chip and on-line, the compressed samples.

Performance Analysis of Real-Time DNN Inference on Raspberry Pi
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · SPIE Real-Time Image and Video Processing 2018
abstract      pdf

Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, it comes at the cost of a substantial increase of computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Ca e, OpenCV, TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques resulting in di erent performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Networks (CNN) models. The benchmarking system is Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for suitable selection of a speci c tool according to prescribed application requirements.

Introduction to ACHIEVE: a European Training Network based on the Experience of EUNEVIS
R. Carmona-Galan
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Abstract not avaliable

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Abstract not avaliable

Compressed Sampling CMOS Imager based on Asynchronous Random Pixel Contributions
M. Trevisi, H.C. Bandala-Hernandez, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Abstract not avaliable

Characterization of Electrical Crosstalk in 4T-APS Arrays using TCAD Simulations
J.M. López-Martínez, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Conference on Ph.D Research in Microelectronics and Electronics PRIME 2017
abstract     

TCAD simulations have been conducted on a CMOS image sensor in order to characterize the electrical component of the crosstalk between pixels through the study of the electric field distribution. The image sensor consists on a linear array of five pinned photodiodes (PPD) with their transmission gates, floating diffusion and reset transistors. The effect of the variations of the thickness of the epitaxial layer has been addressed as well. In fact, the depth of the boundary of the epitaxial layer affects quantum efficiency (QE) so a correlation with crosstalk has been identified.

Design of a Compact and Low-Power TDC for an Array of SiPM´s in 110nm CIS Technology
F.N. Bandi, I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Conference on Ph.D Research in Microelectronics and Electronics PRIME 2017
abstract     

Silicon photomultipliers (SiPMs) are meant to substitute photomultiplier tubes in high-energy physics detectors and nuclear medicine. This is because of their -to name a few interesting Properties- compactness, lower bias voltage, tolerance to magnetic fields and finer spatial resolution. SiPMs can also be built in CMOS technology. This allows the incorporation of active quenching and recharge schemes at cell level and processing circuitry at pixel level. One of the elements that can lead to finer temporal resolutions is the time-to-digital converter (TDC). In this paper we describe the architecture of a compact TDC to be included at each pixel of an array of SiPMs. It is compact and consumes low power. It is based on a voltage controlled oscillator that generates multiple internal phases that are interpolated to provide time resolution below the time delay of a single gate. Simulation results of a 11b TDC based on a 4-stage VCRO in 110nm CIS technology yield a time resolution of 80.0ps, a DNL of ±0.28 LSB, a INL ±0.52 LSB, and a power consumption of 850μW.

Compressive Image Sensor Architecture with On-Chip Measurement Matrix Generation
M. Trevisi, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Conference on Ph.D Research in Microelectronics and Electronics PRIME 2017
abstract     

A CMOS image sensor architecture that uses a cellular automaton for the pseudo-random compressive sampling matrix generation is presented. The image sensor employs inpixel pulse-frequency modulation and column wise pulse counters to produce compressed samples. A common problem of compressive sampling applied to image sensors is that the size of a full-frame compressive strategy is too large to be stored in an on-chip memory. Since this matrix has to be transmitted to or from the reconstruction system its size would also prevent practical applications. A full-frame compressive strategy generated using a 1-D cellular automaton showing a class III behavior neither needs a storage memory nor needs to be continuously transmitted. In-pixel pulse frequency modulation and up-down counters allow the generation of differential compressed samples directly in the digital domain where it is easier to improve the required dynamic range. These solutions combined together improve the accuracy of the compressed samples thus improving the performance of any generic reconstruction algorithm.

TCAD Simulation of Electrical Crosstalk in 4T-Active Pixel Sensors
J.M. López-Martínez, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

CMOS image sensors (CIS) are widely used nowadays in consumer electronics as well as in high-end applications. This is mainly due to their advantages regarding low dark current and low noise characteristics of the pinned photodiode (PPD). Much effort has been put into better understanding key electrical properties of PPDs, like full well capacity, photodiode´s capacitance or pinning voltage. Another important source of sensitivity degradation is crosstalk (CTK). It has been assessed for CCDs and some CMOS devices. However, addressing CTK in CMOS 4T-APS pixels at the design phase is not easy, mainly due to the unavailability of CIS technology parameters.an additional problem is the computational cost of TCAD simulation; e.g., a five pixel linear array like the one shown in Fig. 1, already introduce long periods of computing due to the complexity of the structure. Crosstalk occurs when the charge generated by photon incident on a pixel are finally sensed by a neighboring pixel. CTK degrades performance, cutting down spatial resolution, reducing the overall sensitivity, degrading color separation, and increasing image noise. Crosstalk is defined as the percentage of the total charge generated by incident light that is diverted to non-illuminated pixels in the neighborhood. There are two components in CTK. Optical crosstalk is related to illumination, reflection, refraction and scattering of photons in the different layers of the material that cover the photodiode. This generates stray photons that are absorbed in the neighborhood. The second component is electrical, and it involves the diffusion of photo-generated carriers between adjacent devices. The characterization of electrical CTK in 4T-APS can be achieved using TCAD tools. Particularly, the relation between CKT and quantum efficiency (QE) can be explored and linked to the thickness of the epitaxial layer.

On the design of sun sensors with event-based operation
J.A. Leñero-Bardallo, L. Farian, J.M. Guerrero-Rodríguez, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Abstract not avaliable

A sun sensor implemented with an asynchronous luminance vision sensor
J.A. Leñero-Bardallo, L. Farian, J.M. Guerrero-Rodríguez, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Solid-State Circuits Conference ESSCIRC 2017
abstract     

A sun sensor implemented with a spiking pixel matrix is reported. It is the very first sun sensor based on an asynchronous event-based pixel array. A paradigm associted to classic digital sun sensors is solved with this approach. Only the pixels illuminated by the sun light are readout. Hence, the output data flow is quite reduced. The computational load to resolve the sun position is quite low, comparing to prior sensors. Sensor's latency is in the order of milliseconds. The advantages over implementations with APS pixels are more reduced data flow, less latency, and higher dynamic range.

SPAD-Sensor Camera Prototype for 2D/3D Imaging
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

This demo presents a camera prototype based on a SPAD (single-photon avalanche diode) image sensor for 2D/3D scene reconstruction. SPADs are intrinsically binary devices that fire at the detection of a single photon entering the capture volume. They can be employed in photon counting mode, which provides a pixel output that is proportional to the brightness of the corresponding object in the scene. They can be employed also in combination with a time-to-digital converter (TDC), in order to provide a timestamp from which the time-of-flight (ToF) of the light reflected by the objects can be inferred, and thus their distance to the objective. In the first case, illumination does not have a structure. In the second case, a pulsed laser with picosecond jitter is required to ensure the appropriate accuracy in the estimation of the distances. The prototype camera presented here employs a 64×64 SPAD array with active quenching and recharge and in-pixel TDC, allowing high frame rate acquisition. Highly efficient circuit design techniques are employed to ensure image capturing under a high level of uncorrelated noise such as dark count and background illumination. It has a depth resolution of 1cm at 6-0.1nW/mm2 illumination power.

VCRO-based TDCs in submicron CIS technology
F.N. Bandi, I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2017
abstract     

Time-to-Digital Converters (TDCs) based on Voltage Controlled Ring Oscillators (VCROs) provides a good trade off between area occupation, time resolution and power consumption. These specifications are determined by applications like nuclear medicine and high energy physics imaging, in which an accurate timestamp of the detected photons is needed and small area footprint to maximize fill factor is desired. This is specially true when the number of incident photons is low and oversampling is impossible. Other TDC architectures like pulse stretching, Vernier delay lines, time amplification or multi-path gated ring oscillator are able to provide finer time resolution at the price of higher area occupation and power consumption. If in sensor intregation is desired, these area and power increments are prohibitive. VCROs provides a large number of alternatives during the design phase, each one with their advantages and disadvantages. The first step is the selection of the stage topology, that is, single-ended, differential and pseudo-differential. In this application, pseudo-differential stages outperforms the other alternatives in terms of lower power consumption, lower jitter and better noise rejection. The second step consist in the selection of a pseudo-differential stage using a common metric. To this end, the two most used pseudo-differential stages were compared in terms of time resolution, by using the small signal model and the GBW product. Analytical expression points out that pseudo-differential stage with cross-coupled inverters have finer time resolution than pseudo-differential stage with cross-coupled PMOS. Pre-layout simulations support the analytical expression and shows a clear difference between the time resolution of each stage.

Pipeline AER Arbitration with Event Aging
J.A. Leñero-Bardallo, F. Pérez-Peña, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2017
abstract     

We present a simple circuit to handle communication between cells of neuromorphic arrays. It allows cells to operate continuously without waiting for acknowledgement signals back from the AER (Address Event Representation) arbitration circuitry. The module also implements aging of cell petitions i.e., old petitions to access to the AER bus are automatically discarded to give priority to the more recent ones and alleviate the bus congestion. The new arbitration scheme has been implemented and tested. A particular application scenario with an image sensor with spiking pixels that sense light continuously is explained. Experimental data obtained with real visual scenes are provided.

Live Demonstration: Photon Counting and Direct ToF Camera Prototype Based on CMOS SPADs
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2017
abstract      pdf

This demonstrator reveals the performance and features of a single photon avalanche diode (SPAD) camera prototype. It is aimed to 2D/3D vision by photon counting and direct time-of-flight (d-ToF), respectively. The imager is built on a standard CMOS technology without any opto flavor or high voltage option. The camera module consists of a 64×64 SPAD imager and a FPGA board for real time image reconstruction at 1kfps.

Photon Counting and Direct ToF Camera Prototype based on CMOS SPADs
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2017
abstract      pdf

This paper presents a camera prototype for 2D/3D image capture in low illumination conditions based on single-photon avalanche-diode (SPAD) image sensor for direct time-of-flight (d-ToF). The imager is a 64×64 array with in-pixel TDC for high frame rate acquisition. Circuit design techniques are combined to ensure successful 3D image capturing under low sensitivity conditions and high level of uncorrelated noise such as dark count and background illumination. Among them an innovative time gated front-end for the SPAD detector, a reverse start-stop scheme and real-time image reconstruction at 1kfps are incorporated by the imager. To the best of our knowledge, this is the first ToF camera based on a SPAD sensor fabricated and proved for 3D image reconstruction in a standard CMOS process without any opto-flavor or high voltage option. It has a depth resolution of 1cm at an illumination power from less than 6nW/mm2 down to 0.1nW/mm2.

In the quest of vision-sensors-on-chip: Pre-processing sensors for data reduction
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández-Berni, V. Brea, J.A. Leñero-Bardallo
Conference · IS&T International Symposium on Electronic Imaging 2017
abstract     

This paper shows that the implementation of vision systems benefits from the usage of sensing front-end chips with embedded pre-processing capabilities -called CVIS. Such embedded pre-processors reduce the number of data to be delivered for ulterior processing. This strategy, which is also adopted by natural vision systems, relaxes system-level requirements regarding data storage and communications and enables highly compact and fast vision systems. The paper includes several proof-o-concept CVIS chips with embedded pre-processing and illustrate their potential advantages.

A Compressive Domain Saliency-Based Adaptive Measurement Method for Image Recovery
H. Li, M. Trocan, R. Carmona-Galán and M. Trevisi
Conference · IEEE International Conference on Electronics Circuits and Systems ICECS 2016
abstract     

This paper proposes an image compressed sensing method by adaptive measurement matrix based on saliency detection in the compressive domain. The saliency mapping algorithm is simply the difference between adjacent compressive measurements of neighboring image patches. The adjustment algorithm of subsampling rate is a multi - group approximation by sorting and clustering. It is verified by simulation experiments that the proposed method has higher reconstruction quality than the classical methods with fixed measurement matrices.

Demo: Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2016
abstract     

This demo showcases a High Dynamic Range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102dB. It opens the door to the realization of smart cameras and vision sensors capable of rendering HDR images free of artifacts without requiring any digital post-processing at all.

Non-Recursive Method for Motion Detection from a Compressive-Sampled Video Stream
M. Trevisi, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2016
abstract     

This presentation introduces a non-recursive algorithm for motion detection directly from the analysis of compressed samples. The objective of this research is to create an algorithm able to detect, in real-time, the presence of moving objects over a fixed background from a compressive-sampled greyscale video stream. Many difficulties arise using this type of algorithm because it violates the fundamental principles of compressive sensing reconstruction that lie beneath traditional recursive methods. Recursive reconstruction methods even if accurate need large amounts of time and resources because they aim to retrieve all of the information contained within a scene. Our method is based on two key considerations. The first is that the targeted information of a moving element compared to a fixed background is really small. The second is an appropriate choice of a sub-Gaussian compressive sampling strategy. Our aim is to reduce the focus of general reconstruction in order to retrieve only objects of interest. This has resulted in a lightweight fast detection algorithm. It can be used to process compressed samples derived from a video stream with a speed of 100fps in order to detect the presence of moving objects over a fixed background. We have compared the performance of our algorithm with a highly efficient reconstruction algorithm, NESTA, to understand which benefits and limitations we are facing while trying to handle compressive-sampled information without recurring to standard techniques. While trying to target specific information within the compressed samples thus not following a conventional reconstruction technique may deliver worse reconstruction errors it is also true that it can benefit from faster processing times (Fig. 1) opening the possibility to new applications of CS.

Image dynamic range extension by using stacked (unmatched) photodiodes in CMOS
R. Carmona-Galán, J. A. Leñero-Bardallo, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2016
abstract     

Capturing images containing unevenly illuminated areas within the same frame is very useful in application fields like surveillance, assisted driving, intelligent transportation, or industrial applications with high intra-scene contrast. Without the appropriate dynamic range to allocate these diverse illumination values, obtaining a detailed view of the brightest zones can easily obscure other elements in the scene. In order to increase the image dynamic range within the same frame, different techniques have been developed: using a sensor with a companding scheme, providing the means to avoid saturation, or employing multiple image captures. The problem with multiple captures is that uncorrelation between the different integration times can generate inexistent edges and distort the interpretation of the scene. In order to realize multiple captures in parallel, we need to be simultaneously sensitive to different illumination ranges. CMOS technology offers a variety of devices to capture light in the visible and near infrared range. If a deep-n-well is available, these structures can be stacked so spatial alignment is obtained by construction (Fig. 1a). The conversion gain of the different photodiodes is defined by their capacitance per unit area (Fig. 1b); therefore each of them will render a different voltage for the same light intensity. This discrepancy in the response can be exploited to extract information from different illumination ranges simultaneously. In this way, light can be sensed in parallel with different conversion gains and the resulting output voltages can then be digitized and combined into a single digital word with a larger number of bits. This mechanism for dynamic range extension does not depend on the difference of exposure times, so artifacts related with unmatched dynamics in the sensor and the scene can be avoided.

Non-Recursive Method for Motion Detection from a Compressive-Sampled Video Stream
M. Trevisi, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Conference on Ph.D Research in Microelectronics and Electronics PRIME 2016
abstract     

This paper introduces a non-recursive algorithm for motion detection directly from the analysis of compressed samples. The objective of this research is to create an algorithm able to detect, in real-time, the presence of moving objects over a fixed background from a compressive-sampled greyscale video stream. Many difficulties arise using this type of algorithm because it violates the fundamental principles of compressive sensing reconstruction that lie beneath traditional recursive methods. Recursive reconstruction methods even if accurate need large amounts of time and resources because they aim to retrieve all of the information contained within a scene. Our method is based on two key considerations. The first is that the targeted information of a moving element compared to a fixed background is really small. The second is an appropriate choice of a sub-Gaussian compressive sampling strategy. Our aim is to reduce the focus of general reconstruction in order to retrieve only objects of interest. This algorithm can be used to process compressed samples derived from a video stream with a speed of 100fps. This makes possible to detect the presence of moving objects directly from compressed samples with limited resources.

Demo: HDR image sensor with linear response and asynchronous detection of saturation
J.A. Leñero-Bardallo, R. Carmona-Galan and A. Rodriguez-Vazquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2016
abstract     

Abstract not avaliable

Pixel-wise parameter adaptation for single-exposure extension of the image dynamic range
R. Carmona-Galán, J.A. Leñero-Bardallo, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2016
abstract      pdf

High dynamic range imaging is central in application fields like surveillance, intelligent transportation and advanced driving assistance systems. In some scenarios, methods for dynamic range extension based on multiple captures have shown limitations in apprehending the dynamics of the scene. Artifacts appear that can put at risk the correct segmentation of objects in the image. We have developed several techniques for the on-chip implementation of single-exposure extension of the dynamic range. We work on the upper extreme of the range, i. e. administering the available full-well capacity. Parameters are adapted pixel-wise in order to accommodate a high intra-scene range of illuminations.

Experimental Evidence of Power Efficiency due to Architecture in Cellular Processor Array Chips
R. Carmona-Galán, J. Fernández Berni and A. Rodríguez-Vázquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2016
abstract      pdf

Speeding up algorithm execution can be achieved by increasing the number of processing cores working in parallel. Of course, this speedup is limited by the degree to which the algorithm can be parallelized. Equivalently, by lowering the operating frequency of the elementary processors, the algorithm can be realized in the same amount of time but with measurable power savings. An additional result of parallelization is that using a larger number of processors results in a more efficient implementation in terms of GOPS/W. We have found experimental evidence for this in the study of massively parallel array processors, mainly dedicated to image processing. Their distributed architecture reduces the energy overhead dedicated to data handling, thus resulting in a power efficient implementation.

In-Pixel Voltage-Controlled Ring-Oscillator for Phase Interpolation in ToF Image Sensors
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems, ISCAS 2016
abstract      pdf

The design and measurements of a CMOS pseudo-differential voltage-controlled ring-oscillator (VCRO) are presented. It is aimed to act as time interpolator for arrayable picosecond time-to-digital convertors (TDC). This design is incorporated into a 64×64 array of TDCs for time-of-flight (ToF) measurement. It has been fabricated in a 0.18µm standard CMOS technology. Small occupation area of 28×29μm2 and low average power consumption of 1.17mW at 850MHz are promising figures for this application field. Embedded phase alignment and instantaneous start-up time are required to minimize the offset of time interval measurements. The measured gain of the VCRO is of 477MHz/V with a frequency tuning range of 53%. Moreover it features a linearity of 99.4% over a wide range of control frequencies, namely from 400MHz to 850MHz. The phase noise is of 102dBc/Hz at 2MHz offset frequency from 850MHz.

Hardware-Aware Performance Evaluation for the Co-Design of Image Sensors and Vision Algorithms
C. Villegas-Pachón, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · Int. Conf. on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design SMACD 2016
abstract      pdf

The top-down approach to system design allows obtaining separate specifications for each subsystem. In the case of vision systems, this means propagating system-level specifications down to particular specifications for e. g. the image sensor, the image processor, etc. This permits to adopt different design strategies for each one of them, as long as they meet their own specifications. This approach can lead to over-design, which is not always affordable. Conversely, if higher-level specifications are too tight, they can lead to impossible specifications at the lower levels. This is certainly the case for embedded vision systems in which high-performance needs to be paired with a very restricted power budget. In order to explore alternative architectures, we need tools that allow for simultaneous optimization of different blocks. However, the link between low-level non-idealities and high-level performance is missing. CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. Our approach is to extract relevant data from circuit-level simulation and to build an OpenCV model to be employed in the design of the algorithm. The utility of this approach is illustrated by the evaluation of the effect of column-wise and pixel-wise FPN at the sensor on the performance of Viola-Jones face detection.

Live Demonstration: Single-Exposure HDR Image Acquisition Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems, ISCAS 2016
abstract      pdf

This live demonstration showcases a high dynamic range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. In order to accomplish such compression, concurrent sensing-processing takes place at the focal plane, weighing the influence of local and global illumination on each pixel response during the image capture. This process is driven by an on-line analysis of the image histogram that also enables the dynamic accommodation of changing illumination conditions. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102dB.

Focal-plane scale space generation with a 6T pixel architecture
F. Oliveira, J.G. Gomes, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2016
abstract      pdf

Fill factor and focal-plane implementation of instrumental image processing steps, we propose a simple modification in a standard pixel architecture in order to allow for charge redistribution among neighboring pixels. As a result, averaging operations may be performed at the focal plane, and image smoothing based on Gaussian filtering may thus be implemented. By averaging neighboring pixel values, it is also possible to generate intermediate data structures that are required for the computation of Haar-like features. To show that the proposed hardware is suitable for computer vision applications, we present a systemlevel comparison in which the scale-invariant feature transform (SIFT) algorithm is executed twice: first, on data obtained with a classical Gaussian filtering approach, and then on data generated from the proposed approach. Preliminary schematic and extracted layout pixel simulations are also presented.

A high dynamic range linear vision sensor with event asynchronous and frame-based synchronous operation
J.A. Leñero-Bardallo, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2016
abstract     

We present a novel High-Dynamic-Range (HDR) image sensor with linear output. Photogenerated charge is continuously integrated at every pixel without saturating. Each time the photodiode voltage reaches a programmable threshold, the pixel resets and starts over integrating charge again. With an eventbased approach, it is possible to count the number of times (if any) that a pixel has saturated during exposure. Pixel illumination is represented with a 20-bit word. The most significant 12b represent the number of times that a pixel has saturated during exposure. The least significant 8b are the result of an analog-todigital conversion in the end of exposure. Thus, pixels provide linear outputs proportional to light intensity. A dynamic range of 120dB is expected. The maximum dynamic range that can be measured is limited by the maximum event rate that the chip peripheral circuitry can handle and by the space dedicated on memory to store the event information. Pixel pitch is 25μm. A prototype sensor with 128 x 96 pixels has been implemented in the AMS 180nm CMOS-HV technology. In this article, the pixel operation will be explained. Preliminary experimental results and snapshots will be also provided.

High-Level Performance Evaluation of Object Detection Based on Massively Parallel Focal-Plane Acceleration Requiring Minimum Pixel Area Overhead
E. Parra-Barrero, J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Computer Vision Theory and Applications VISAPP 2016
abstract      pdf

Smart CMOS image sensors can leverage the inherent data-level parallelism and regular computational flow of early vision by incorporating elementary processors at pixel level. However, it comes at the cost of extra area having a strong impact on the sensor sensitivity, resolution and image quality. In this scenario, the fundamental challenge is to devise new strategies capable of boosting the performance of the targeted vision pipeline while minimally affecting the sensing function itself. Such strategies must also feature enough flexibility to accommodate particular application requirements. From these high-level specifications, we propose a focal-plane processing architecture tailored to speed up object detection via the Viola-Jones algorithm. This architecture is supported by only two extra transistors per pixel and simple peripheral digital circuitry that jointly make up a massively parallel reconfigurable processing lattice. A performance evaluation of the proposed scheme in terms of accuracy and acceleration for face detection is reported.

Efficient (Low-Level) Feature Extraction in CMOS Vision Sensors
R. Carmona-Galán
Conference · Advancements in Circuits and Imaging, 2015
abstract     

Abstract not avaliable

On the Design of a Sparsifying Dictionary for Compressive Image Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · IEEE International Conference on Electronics Circuits and Systems ICECS 2015
abstract     

Although there are some works reported on feature extraction from compressed samples, none of them consider the implementation of the feature extractor as a part of the sensor itself. Our approach is to introduce a sparsifying dictionary, feasibly implementable at the focal plane, which describes the image in terms of features. This allows a standard reconstruction algorithm to directly recover the interesting image features, discarding the irrelevant information. In order to validate the approach, we have integrated a Harris-Stephens corner detector into the compressive sampling process. We have evaluated the accuracy of the reconstructed corners compared to applying the detector to a reconstructed image.

CMOS Image Sensor Architecture for Focal Plane Early Vision Processing
F.D.V.R. de Oliveira, J.G.R.C. Gomes, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2015
abstract     

This paper presents a pixel architecture that aims at validating the idea that with a small change in the pixel it is possible to perform important image processing computations at the focal-plane without significantly affecting the fill factor. An overview of two algorithms that may benefit from this new pixel architecture is given, namely the SIFT for object recognition and the Viola-Jones for face detection. A brief discussion of the limitations of the computations performed inside the pixel matrix and the future work is also presented.

Hardware-oriented feature extraction based on compressive sensing
M. Trevisi, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2015
abstract     

Feature extraction is used to reduce the amount of resources required to describe a large set of data. A given feature can be represented by a matrix having the same size as the original image but having relevant values only in some specific points. We can consider this sets as being sparse. Under this premise many algorithms have been generated to extract features from compressive samples. None of them though is easily described in hardware. We try to bridge the gap between compressive sensing and hardware design by presenting a sparsifying dictionary that allows compressive sensing reconstruction algorithms to recover features. The idea is to use this work as a starting point to the design of a smart imager capable of compressive feature extraction. To prove this concept we have devised a simulation by using the Harris corner detection and applied a standard reconstruction method, the Nesta algorithm, to retrieve corners instead of a full image.

Compressive Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2015
abstract     

Compressive sensing (CS) provides an alternative to Nyquist-Shannon sampling when the signal to acquire is known to be sparse or compressible. A sparse signal has a small number of nonzero components compared to its total length. This property can either exist in the sampling domain of the signal or with respect to other basis. Representing a signal in a transform basis involves the choice of a dictionary, a set of elementary signals, used to decompose the signal. When performing analysis of complex data one of the major problems stems from the number of variables involved. Feature extraction is used to reduce the amount of resources required to describe a large set of data. A given feature is often represented by a set of parameters to be evaluated. This set has relevant values only in correspondence of said features. We can consider sets derived this way as being sparse. This sparked the idea to merge a feature extraction algorithm with the compressive sensing theory. To do so we try to adapt one of the most practical CS reconstruction algorithms, the Nesterov algorithm applied to CS (NESTA) to extract the features of one of the simplest corner detection algorithms, the Harris and Stephens algorithm. Our aim is to compare the performance of the combined Harris-NESTA algorithm over the application of a Harris algorithm on a NESTA reconstructed image and to do so we devised a test that takes into account four different parameters.

Assessment of circuit non-idealities' effect on algorithm performance via OpenCV modeling
C. Villegas-Pachón, R. Carmona-Galán and J. Fernández-Berni
Conference · Workshop on the Architecture of Smart Cameras WASC 2015
abstract     

CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. The evaluation of the effect of circuit non-idealities on the algorithm performance is precluded for the image sensor design team. For the conventional model in computer vision, in which image capture and processing are completely separated tasks, this does not represent a problem. Chip designers will work for a particular set of specifications, i. e. spatial and temporal resolution, power consumption, etc. At the other end, computer scientists will take care of the algorithm once they receive their pictures fitting to the prescribed specifications. The result of this mindset is an architecture that is theoretically universal, although may not be capable of solving every problem when timing and power requirements are taken into consideration. In those applications fields in which smart camera chips can help overcoming these limitations, a different approach needs to be taken in order to come out with optimal solutions. Let us point out here that, in smart camera architectures, computational efficiency is generally provided by appropriate partition of algorithm tasks, parallelization of heavy loads and using distributed and close-to-sensor resources. Sometimes these actions will require the design of specific circuit blocks and ad-hoc image sensing strategies. All of this needs to be worked out at transistor level, but at the same time, their effect in the overall performance of the algorithm needs to be quickly and accurately evaluated in order to guide the design flow. Our proposal is to make use of the flexibility and versatility of an environment like OpenCV to incorporate hardware non-idealities to the evaluation of the algorithm performance. One of the major attractions of this approach is that computer vision experts will be able to consider lower-level deviations when designing and fine-tuning their vision algorithms without having to develop any expertise in chip design and IC CAD tools.

A high dynamic range image sensor with linear response based on asynchronous event detection
J.A. Leñero-Bardallo, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2015
abstract      pdf

This paper investigates the potential of an image sensor that combines event-based asynchronous outputs with conventional integration of photocurrents. Pixels voltages can be read out following a traditional approach with a source follower and analog-to-digital converter. Furthermore, pixels have circuitry to implement Pulse Density Modulation (PDM) sending out pulses with a frequency that is proportional to the photocurrent. Both read-out approaches operate simultaneously. Their information is combined to render high dynamic range images. In this paper, we explain the new vision sensor concept and we develop a theoretical analysis of the expected performance in standard AMS 0.18 µm HV technology. Moreover, we provide a description of the vision sensor architecture and its main blocks.

Live demonstration: Gaussian pyramid extraction with a CMOS vision sensor
M. Suarez, V.M. Brea, J. Fernandez-Berni, R. Carmona-Galan, D. Cabello and A. Rodriguez-Vazquez
Conference · IEEE International Symposium on Circuits and Systems, ISCAS 2015
abstract     

This live demonstration is related to ISCAS track 'Imagers and Vision Processing'. It showcases the Gaussian pyramid with a CMOS vision sensor with a 176 × 120 pixel array in standard 0.18 μm CMOS technology. The sensing elements are 3T-APS with in-pixel ADC and CDS. The Gaussian pyramid is extracted concurrently with a double-Euler switched-capacitor network on the same substrate, giving RMSE errors below 1.2% of FSO. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ/px at 2.64 Mpx/s.

Automatic DR and spatial sampling rate adaptation for secure and privacy-aware ROI tracking based on focal-plane image processing
R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · International Image Sensor Workshop IISW 2015
abstract     

Embedded camera systems for the consumer mobile and wearable application market need to operate in a tight power budget. They need to cope with a vast range of illumination conditions, and at the same time, they need to incorporate enough intelligence to implement security and privacy-protection directives. The incorporation of image signal processing at the focal-plane can help reducing the necessary resources to implement tasks like DR adaptation and privacy-aware ROI tracking. In this paper we present a vision sensor that is able to perform single-exposure HDR imaging and ROI obfuscation on-chip, with the help of a reduced set of focal-plane processing elements.

On the Calibration of a SPAD-Based 3D Imager with in-Pixel TDC Using a Time-Gated Technique
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2015
abstract     

The optical characterization of a CMOS 64×64 single-photon avalanche-diode (SPAD) array with in-pixel 11b time-to-digital converter (TDC) is presented. The overall full-width half-maximum (FWHM) of the detector ensemble SPAD plus TDC is 690ps. The sensor has been fabricated in a 0.18μm standard CMOS technology which features an average dark-count rate (DCR) of 42Khz at 1V excess voltage (Ve) and room temperature. The detector successfully uses its time-gating capability to mitigate this large amount of noise enabling the sensor for accurate time-of-flight (ToF) measurements. The effectiveness of the time-gating technique is experimentally demonstrated. According to measurements, a time window of 400ns is enough to ensure that the TDC is triggered by light rather than by spurious events.

A SPAD-based 3D imager with in-pixel TDC for 145ps-accuracy ToF measurement
I. Vornicu, R. Carmona-Galán and Á. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2015
abstract      pdf

The design and measurements of a CMOS 64×64 Single-Photon Avalanche-Diode (SPAD) array with in-pixel Time-to-Digital Converter (TDC) are presented. This paper thoroughly describes the imager at architectural and circuit level with particular emphasis on the characterization of the SPAD-detector ensemble. It is aimed to 2D imaging and 3D image reconstruction in low light environments. It has been fabricated in a standard 0.18μm CMOS process, i. e. without high voltage or low noise features. In these circumstances, we are facing a high number of dark counts and low photon detection efficiency. Several techniques have been applied to ensure proper functionality, namely: i) time-gated SPAD front-end with fast active-quenching/recharge circuit featuring tunable dead-time, ii) reversed start-stop scheme, iii) programmable time resolution of the TDC based on a novel pseudo-differential voltage controlled ring oscillator with fast start-up, iv) a global calibration scheme against temperature and process variation. Measurements results of individual SPAD-TDC ensemble jitter, array uniformity and time resolution programmability are also provided.

Real-time single-exposure ROI-driven HDR adaptation based on focal-plane reconfiguration
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and Á. Rodríguez-Vázquez
Conference · SPIE Real-Time Image and Video Processing 2015
abstract      pdf

This paper describes a prototype smart imager capable of adjusting the photo-integration time of multiple regions of interest concurrently, automatically and asynchronously with a single exposure period. The operation is supported by two intertwined photo-diodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal-plane into independent regions within which automatic concurrent adjustment of the integration time takes place. At pixel level, one of the photo-diodes senses the pixel value itself whereas the other, in collaboration with its counterparts in a particular ROI, senses the mean illumination of that ROI. Additional circuitry interconnecting both photo-diodes enables the asynchronous adjustment of the integration time for each ROI according to this sensed illumination. The sensor can be reconfigured on-the-fly according to the requirements of a vision algorithm.

A CMOS 0.18μm 64x64 Single Photon Image Sensor with in-Pixel 11b Time-to-Digital Converter
I. Vornicu, R. Carmona and Á. Rodríguez-Vázquez
Conference · International Semiconductor Conference CAS 2014
abstract     

Abstract not avaliable

Live Demo: Real-time Focal-plane Face Obfuscation through Programmable Pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, R. Kleihorsty, W. Philipsy and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Privacy concerns are hindering the introduction of smart camera networks in application scenarios like retailing analytics, factories or elderly care. Indeed, there is usually no need of dealing with sensitive data when it comes to carrying out a meaningful visual analysis in these scenarios. Time spent by customers in front of a showcase, trajectories of workers around a manufacturing site or fall detection in a nursing home are three examples where video analytics can be performed without compromising privacy. But still the idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be reconfigured to implement programmable pixelation of image regions at the focal plane. According to the literature, pixelation provides the best performance in terms of balance between privacy protection and intelligibility of the surveyed scene.

Wide Range 8ps Incremental Resolution Time Interval Generator based on FPGA technology
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Conference on Electronics, Circuits, and Systems ICECS 2014
abstract     

Accurate generation of picosecond-resolution wide-range time intervals has become a necessity for the characterization of time-to-digital converters involved in time resolved imaging. This paper presents the design and measurement of a time interval generator based on FPGA technology. Although it can be employed in different automatic test setups, it has been designed to characterize an array of timeto-digital converters. It can work as periodic pulse/ frequency generator but also as a digital-to-time converter. The accuracy of periodic pulse generator is around 20ps RMS jitter over a time range of 600ps to 33μs. The incremental time resolution is 8ps and the repetition rate is up to 2MHz.The accuracy of the digital-to-time converter is less than 0.8LSB DNL and 2LSB INL, whilst the time resolution is 27ps. Full characterization of the module is reported including a comparison with state-of-the-art instruments in this field.

Demo: A prototype vision sensor for real-time focal-plane obfuscation through tunable pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and A. Rodríguez-Vázquez
Conference · IEEE/ACM Int. Conference on Distributed Smart Cameras ICDSC 2014
abstract      pdf

Privacy concerns are hindering the introduction of smart camera networks in prospective application scenarios like retail analytics, factory monitoring or elderly care. The idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be recongured to implement programmable pixelation of image regions at the focal plane. In particular, we show on-the-fly focal-plane face obfuscation supported by the Viola-Jones frontal face detector provided by OpenCV.

A 26.5 nJ/px 2.64Mpx/s CMOS vision sensor for gaussian pyramid extraction
M. Suárez-Cambre, V. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference · European Solid-State Circuits Conference ESSCIRC 2014
abstract      pdf

This paper introduces a CMOS vision sensor to extract the Gaussian pyramid with an energy cost of 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions employing an imager and a separate digital processor. The chip, manufactured in a 0.18 μm CMOS technology, consists of an arrangement of 88×60 processing elements (PEs) which captures images of 176×120 resolution and performs concurrent parallel processing right at pixel level. The Gaussian pyramid is generated by using a switched-capacitor network. Every PE includes four photodiodes, four MiM capacitors, one 8-bit single-slope ADC and one CDS circuit, occupying 44x44 μm2 . Suitability of the chip is assessed by using metrics pertaining to visual tracking.

Fire detection with a frame-less vision sensor working in the NIR band
J.A. Leñero-Bardallo, J. Fernández-Berni, R. Carmona-Galán, P. Häfliger and Á. Rodríguez-Vázquez
Conference · International Conference on Forest Fire Research ICFFR 2014
abstract      doi      pdf

This paper draws the attention of the community about the capabilities of an emerging generation of bio-inspired vision sensors to be used in fire detection systems. Their principle of operation will be described. Moreover experimental results showing the performance of an event-based vision sensor will be provided. The sensor was intended to monitor flames activity without using optic filters. In this article, we will also extend this preliminary work and explore how its outputs can be processed to detect fire in the environment.

A QVGA Vision Sensor with Multi-functional Pixels for Focal-Plane Programmable Obfuscation
J. Fernández-Berni, R. Carmona Galán, R. del Río and Á. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2014
abstract      pdf

Privacy awareness constitutes a critical aspect for smart camera networks. An ideal awless protection of sensitive information would boost their application scenarios. However, it is still far from being achieved. Numerous challenges arise at diferent levels, from hardware security to subjective perception. Generally speaking, it can be stated that the closer to the image sensing device the protection measures take place, the higher the privacy and security attainable. Likewise, the integration of heterogeneous camera components becomes simpler since most of them will not require to consider privacy issues. The ultimate objective would be to incorporate complete protection directly into a smart image sensor in such a way that no sensitive data would be delivered off-chip while still permitting the targeted video analytics. This paper presents a 320x240-px prototype vision sensor embedding processing capabilities useful for accomplishing this objective. It is based on recongurable focal-plane sensing-processing that can provide programmable obfuscation. Pixelation of tunable granularity can be applied to multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise high dynamic range, integral image computation and Gaussian filtering. Its power consumption ranges from 42.6mW for high dynamic range operation to 55.2mW for integral image computation at 30fps. It has been fabricated in a standard 0.18μm CMOS process.

Towards an ultra-low-power low-cost wireless visual sensor node for fine-grain detection of forest fires
J. Fernández-Berni, R. Carmona-Galán, J.A. Leñero-Bardallo, R. Kleihorst and Á. Rodríguez-Vázquez
Conference · International Conference on Forest Fire Research ICFFR 2014
abstract      pdf

Advances in electronics, sensor technologies, embedded hardware and software are boosting the application scenarios of wireless sensor networks. Specifically, the incorporation of visual capabilities into the nodes means a milestone, and a challenge, in terms of the amount of information sensed and processed by these networks. The scarcity of resources -power, processing and memory- imposes strong restrictions on the vision hardware and algorithms suitable for implementation at the nodes. Both, hardware and algorithms must be adapted to the particular characteristics of the targeted application. This permits to achieve the required performance at lower energy and computational cost. We have followed this approach when addressing the detection of forest fires by means of wireless visual sensor networks. From the development of a smoke detection algorithm down to the design of a low-power smart imager, every step along the way has been influenced by the objective of reducing power consumption and computational resources as much as possible. Of course, reliability and robustness against false alarms have also been crucial requirements demanded by this specific application. All in all, we summarize in this paper our experience in this topic. In addition to a prototype vision system based on a full-custom smart imager, we also report results from a vision system based on ultra-low-power low-cost commercial imagers with a resolution of 30x30 pixels. Even for this small number of pixels, we have been able to detect smoke at around 100 meters away without false alarms. For such tiny images, smoke is simply a moving grey stain within a blurry scene, but it features a particular spatio-temporal dynamics. As described in the manuscript, the key point to succeed with so low resolution thus falls on the adequate encoding of that dynamics at algorithm level.

Gaussian pyramid extraction with a CMOS vision sensor
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2014
abstract      pdf

This paper addresses a CMOS vision sensor with 176x120 pixels in standard 0.18 μm CMOS technology that computes the Gaussian pyramid. The Gaussian pyramid is extracted with a double-Euler switched-capacitor network, giving RMSE errors below 1.2% of full-scale value. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ at 2.64 Mpx/s.

Comparative Analysis of Compressive Sensing Strategies for Smart Compressive Image Sensors
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Compressive sensing (CS) first appeared eight years ago as a new kind of signal processing theory. Since then, only a few steps have been taken to turn this theory into feasible practice. We have identified two important gaps that stand behind the lag on practical compressive image sensors. The first one is that, technologically speaking, there is not yet any imager based on CS that is able to overrule, at least in resolution/decryption-speed ratio, the capabilities of current standard imagers. The second one is that none of the published reports on compressive image sensors mention how the compressive sensing strategy is passed from the sensor, which is delivering image samples, to the image reconstruction algorithm at the reception side. In order to capture compressed image samples, it is necessary that the imagers implement a compressive strategy, which has the form of a matrix that convolves the original signal. There are two sets of methods that are primarily implemented nowadays to build a compressive strategy, the first one is to pick each element from a random distribution, preferably Gaussian, and the second is to arrange in random order the rows of an incoherent orthobasis matrix, preferably Fourier matrix. The selection of the compressive sensing strategy has an incidence on the physical implementation, for instance it is easier to implement a binary mask on each pixel than to multiply its value by a real number. In order to analyze the effect of this selection in the reconstruction from the samples delivered by the compressive image sensor, we have simulated the process with a MATLAB test bench and compared the reconstruction times and the RMSE vs. the number of samples delivered of three different sensing strategies using a 64×64 image of Lena as test image and a Total Variation NESTA algorithm as reconstruction algorithm.

Parallel Processing Architectures and Power Efficiency in Smart Camera Chips
R. Carmona-Galán, J. Fernández-Berni, M. Trevisi and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2014
abstract     

Because of the massive amount of data, image and video processing represents a huge computational demand. Providing the necessary resources in embedded systems is not an easy task. Providing them on-chip needs a serious reconsideration of the processing architecture. In order to speed up processing, the number of processors/cores operating in parallel can be increased. This is an intuitive result, but there are two drawbacks. First, this speedup is limited by the degree to which the algorithm can be parallelized, what is known as Amdahl's law. Second, the more hardware operating at the same time the higher the power consumption. However, image processing and, in general, the processing visual information that keeps a retinotopic topology, affords an inherent parallelism that can be exploited to a great extent. Furthermore, there is an additional and less intuitive result of parallelization, which is that using a larger number of processors renders a more efficient implementation in terms of GOPS/W. Evidence of this can be found in massively parallel array processors. Their distributed architecture is adapted to the nature of the visual stimulus to the point that the amount of energy dedicated to data transmission and memory operations is largely reduced. The result is a power efficient implementation, perfectly suited for autonomous embedded vision systems working on a restricted power budget. This trend is observed in multi- and many-core processor chips, GPUs and different types of single-instruction multiple-data (SIMD) arrays containing from tens to hundreds of processing elements (PE). In the case of analog array processors, a similar relation is observed despite the fact that the disparity between design techniques, signal representation and the computation of the effective number of OPS advises against any sort of comparison.

Form Factor Improvement of Smart-Pixels for Vision Sensors through 3-D Vertically-Integrated Technologies
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández Berni, S. Vargas, J.A. Leñero, M. Suárez, V. Brea and B. Pérez-Verdú
Conference · IEEE Latin American Symposium on Circuits and Systems LASCAS 2014
abstract      pdf

While conventional CMOS active pixel sensors embed only the circuitry required for photo-detection, pixel addressing and voltage buffering, smart pixels incorporate also circuitry for data processing, data storage and control of data interchange. This additional circuitry enables data processing be realized concurrently with the acquisition of images which is instrumental to reduce the number of data needed to carry to information contained into images. This way, more efficient vision systems can be built at the cost of larger pixel pitch. Vertically-integrated 3D technologies enable to keep the advnatges of smart pixels while improving the form factor of smart pixels.

Smart imaging for power-efficient extraction of Viola-Jones local descriptors
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, M. Suárez-Cambre and A. Rodríguez-Vázquez
Conference · IS&T International Symposium on Electronic Imaging 2014
abstract      pdf

In computer vision, local descriptors permit to summarize relevant visual cues through feature vectors. These vectors constitute inputs for trained classifiers which in turn enable diferent high-level vision tasks. While local descriptors certainly alleviate the computation load of subsequent processing stages by preventing them from handling raw images, they still have to deal with individual pixels. Feature vector extraction can thus become a major limitation for conventional embedded vision hardware. In this paper, we present a power-eficicient sensing-processing array conceived to provide the computation of integral images at diferent scales. These images are intermediate representations that speed up feature extraction. In particular, the mixed-signal array operation is tailored for extraction of Haar-like features. These features feed the cascade of classifiers at the core of the Viola-Jones framework. The processing lattice has been designed for the standard UMC 0.18μm 1P6M CMOS process. In addition to integral image computation, the array can be reprogrammed to deliver other early vision tasks: concurrent rectangular area sum, block-wise HDR imaging, Gaussian pyramids and image pre-warping for subsequent reduced kernel filtering.

A 176x120 Pixel CMOS Vision Chip for Gaussian Filtering with Massivelly Parallel CDS and A/D-Conversion
M. Suárez, V.M. Brea, D. Cabello, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2013
abstract      pdf

This paper conveys a proof-of-concept chip for Gaussian pyramid generation for image feature detectors. Gaussian filtering and image resizing are performed with a switched-capacitor (SC) network. The chip is conceived as the mapping of a CMOS-3D architecture for feature detectors onto a conventional technology, with some functionality removed, and the corresponding area overhead with respect to that of a CMOS-3D architecture, but preserving masivelly parallel Correlated Double Sampling (CDS) and A/D conversion. The chip has been fabricated on a die of 5×5 mm2 with 0.18 μm CMOS technology, achieving an array of 176×120 sensing elements (pixels). The pixels are arranged in Processing Elements (PEs). Every PE comprises four photodiodes, four SC nodes, one CDS circuit, and local circuitry for one ADC. Every PE occupies an area of 44×44 μm2. The chip senses an image and computes the Gaussian pyramid with an average power consumption lower than 75 nW/pixel at 30 frames/s.

An Ultra-Low-Power Voltage-Mode Asynchronous WTA-LTA Circuit
J. Fernández-Berni, R. Carmona-Galán and A. Rodríquez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2013
abstract      pdf

This paper presents an asynchronous mixed-signal WTA-LTA circuit conceived to carry out local minimum maximum indexing in massively parallel image processing arrays. The hardware is focused on energy-efficient operation. We describe a realization for the standard CMOS base process of a commercial 3-D TSV stack featuring a power consumption of only 20pW per elementary cell at 30fps. The proposed block is also capable of resolving small voltage differences without requiring any external reference. This leads to a hit percentage greater than 90% even when taking into account global process variations and mismatch conditions.

A CMOS 8×8 SPAD Array for Time-of-Flight Measurement and Light-Spot Statistics
I. Vornicu, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2013
abstract     

The design and simulation of a CMOS 8 x 8 single photon avalanche diode (SPAD) array is presented. The chip has been fabricated in a 0.18μm standard CMOS technology and implements a double functionality: measuring the Time-of-Flight with the help of a pulsed light source; or computing focal-plane statistics in biomedical imaging applications based on a concentrated light-spot. The incorporation of on-chip processing simplifies the interfacing of the array with the host system. The pixel pitch is 32μm, while the diameter of the quasi-circular active area of the SPADs is 12μm. The 113μm2 active area is surrounded by a T-well guard ring. The resulting breakdown voltage is 10V with a maximum excess voltage of 1.8V. The pixel incorporates a novel active quenching/reset circuit. The array has been designed to operate with a laser pulsed at 20Mhz. The overall time resolution is 115ps. Focal-plane statistics are obtained in digital format. The maximum throughput of the digital output buffers is 200Mbps.

Real-Time Remote Reporting of Motion Analysis with Wi-Flip
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Int. Workshop on Cellular Nanoscale Networks and their Applications CNNA 2012
abstract      pdf

This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a prototype mixed-signal focal-plane array processor, and Imote2, a commercial WSN platform. The application consists in scanning the whole scene by sequentially analyzing small regions. Within each region, motion is detected by background subtraction. Subsequently, information related to that motion - intensity and location - is radio-propagated in order to remotely account for it. By aggregating this information along time, a motion map of the scene is built. This map permits to visualize the different activity patterns taking place. It also provides an elaborated representation of the scene for further remote analysis, preventing raw images from being transmitted. In particular, the scene inspected in this demo corresponds to vehicular traffic in a motorway. The remote representation progressively built enables the assessment of the traffic density.

Low-Power Vision Chips based on Focal-Plane Feature Extraction for Visually-Assisted Autonomous Navigation
R. Carmona-Galán
Conference · Workshop on Smart Cameras for robotic applications, IEEE/RSJ Int. Conf. on Intelligent Robots and Systems IROS 2012
abstract      pdf

Avoiding obstacles and finding the way around are tasks that can greatly benefit from an efficient implementation of vision. While higher level vision can be performed by conventional microprocessors at an acceptable rate, lower level vision represents a heavy computational load to deal with. The usual sensor plus ADC plus microprocessor scheme either fails to meet the timing requirements or fails to operate under a low power budget. Since the information contained in the visual stimulus is highly redundant, converting every single pixel value to digital prior to any processing is inefficient. Instead, we are working in adapted architectures in which the parallelism that is inherent to lower level vision tasks is largely exploited. This hierarchical approach emulates the organizational principles of biological vision systems, by using an array of elementary and relatively coarse processors to achieve global computation, and also the operation of the elementary cells, by using analog and mixed-signal processing building blocks. Our chips are capable of efficiently extracting image features and salient points at the focal plane in order to facilitate the task of identifying objects and interpreting the scene.

A CMOS-3D Reconfigurable Architecture with In-pixel Processing for Feature Detectors
M. Suárez, V.M. Brea, F. Pardo, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International 3D System Integration Conference 3DIC 2012
abstract      pdf

This paper introduces a two-tier CMOS-3D architecture for generation of Gaussian pyramids, detection of extrema, and calculation of spatial derivatives in an image. Such tasks are included in modern feature detectors, which in turn can be used for operations like object detection, image registration or tracking. The top tier of the architecture contains the image acquisition circuits in an array of 320 × 240 active photodiode sensors (APS) driving a smaller array of 160 × 120 analog processors for low-level image processing. The top tier comprises in-pixel Correlated Double Sampling (CDS), a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel Analog to Digital Converter (ADC). The reuse of circuits for different functions permits to have a small area for every pixel. The bottom tier of the architecture contains a frame buffer with a set of registers acting as a frame-buffer with a one-to-one correspondence with the analog processors in the top tier, the digital circuitry necessary for the extrema detection and the calculation of the first and second spatial derivatives in the image, as well as Harris and Hessian point detectors. For the time being, a behavioral model of the first tier including mismatch and feedthrough and charge injection errors is discussed. Also, a VHDL model for the bottom tier is addressed. The two-tier architecture is conceived for its implementation on the 130 nm CMOS-3D technology from Tezzaron. A companion chip will perform the higher-level operations as well as communications. In this technology an area of 300 μm2 per analog processor has been estimated. The architecture proposed for pyramid generation lets a frame rate of 180 frames/s for an ADC conversion time of 120 μs. The architecture has been proved with object detection for a given feature detector.

Design of a smart camera SoC in a 3D-IC technology
R. Carmona-Galán, J. Fernández-Berni, S. Vargas-Sierra, G. Liñán-Cembrano, A. Rodríguez-Vázquez, V. Brea-Sánchez, M. Suárez-Cambre and D. Cabello-Ferrer
Conference · Workshop on Architecture of Smart Camera, 2012
abstract      pdf

Conventional digital signal processing architectures introduce data bottlenecks and are inefficient when dealing with multidimensional sensory signals; Architectures adapted to the nature of the stimulus are more efficient in terms of power consumption per operation but¿;Concurrent sensing, processing and memory in planar technologies introduces serious limitations to image resolution and image size via the penalties in fill factor and pixel pitch; 3D integrated circuit technologies with a dense TSV distribution permits eliminating data bottlenecks without degrading image resolution and size.

Power-efficient focal-plane image representation for extraction of enriched Viola-Jones features
J. Fernández-Berni, L. Acasandrei, R. Carmona-Galán, A. Barriga-Barrios and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2012
abstract      pdf

This paper describes the use of a reconfigurable focal-plane processing array in order to achieve an image representation which dramatically reduces the computational load of the Viola-Jones object detection framework. Additionally, such representation provides richer information than the simple sum of pixels within rectangular regions originally defined in this framework. As a result, more elaborated features could be devised to speed up the execution of the subsequent attentional cascade, boosting thus the performance of the whole algorithm. The proposed circuitry has been successfully implemented in a CMOS prototype smart imager. Experimental results are given, demonstrating the suitability of the approach presented to efficiently deliver enriched Viola-Jones features.

In-pixel generation of gaussian pyramid images by block reusing in 3D-CMOS
M. Suárez, V.M. Brea, D. Cabello, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2012
abstract      pdf

This paper introduces an architecture of a switched-capacitor network for Gaussian pyramid generation. Gaussian pyramids are used in modern scale-and rotation-invariant feature detectors or in visual attention. Our switched-capacitor architecture is conceived within the framework of a CMOS-3D-based vision system. As such, it is also used during the acquisition phase to perform analog storage and Correlated Double Sampling (CDS). The paper addresses mismatch, and switching errors like feedthrough and charge injection. The paper also gives an estimate of the area occupied by each pixel on the 130nm CMOS-3D technology by Tezzaron. The validity of our proposal is assessed through object detection in a scale-and rotation-invariant feature detector.

Switched-capacitor networks for scale-space generation
F. Pozas-Flores, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2011
abstract      pdf

In scale-space filtering signals are represented at several scales, each conveying different details of the original signal. Every new scale is the result of a smoothing operator on a former scale. In image processing, scale-space filtering is widely used in feature extractors as the Scale-Invariant Feature Transform (SIFT) algorithm. RC networks are posed as valid scale-space generators in focal-plane processing. Switched-capacitor networks are another alternative, as different topologies and switching rate offer a great flexibility. This work examines the parallel and the bilinear implementations as two different switched-capacitor network topologies for scale-space filtering. The paper assesses the validity of both topologies as scale-space generators in focal-plane processing through object detection with the SIFT algorithm.

Image filtering by reduced kernels exploiting kernel structure and focal-plane averaging
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · European Conference on Circuit Theory and Design ECCTD 2011
abstract      pdf

Incorporating multi-resolution capabilities into imagers renders additional power saving mechanisms in the subsequent image processing. In this paper, we show how, by exploiting a certain mask structure, 3 × 3 kernels can be reduced to 2 × 2 kernels if charge redistribution is provided at the focal plane of the imaging device. More precisely, by averaging and shifting a half-resolution pixel grid, we will have a pre-processed image, subsampled by a factor of 2 on each dimension, that can be filtered with a mask of a reduced size. Very useful image filtering kernels, like a 3 × 3 Gaussian kernel for image smoothing, or the well-known Sobel operators, fall into this category of reducible kernels. Operating onto the pre-processed image with one of these reduced kernels represents a smaller number of operations per pixel than realizing all the multiply-accumulate operations needed to apply a 3 × 3 kernel. Memory accesses are reduced in the same fraction. Concerning the difficulties of providing this pre-processed image representation, we propose a methodology for obtaining it at a very low power cost. It requires the implementation of user definable image subdivision and subsampling at the focal plane. Experimental results are given, obtained from measurements on a CMOS imager prototype chip incorporating these multi-resolution capabilities. © 2011 IEEE.

Demo: Real-time remote reporting of active regions with Wi-FLIP
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2011
abstract      pdf

This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a focal-plane low-power image processor, and Imote2, a commercial WSN platform. The application, though simple, shows the potentiality of the reduced scene representations achievable at FLIP-Q to speed up the processing. It consists of detecting the active regions within the scene being surveyed, that is, those regions undergoing thresholded variations with respect to the background. If an activity pattern is prescribed, FLIP-Q enables the reconfigurability of the image plane accordingly, making its detection and tracking easier. For each frame, the number of active regions is calculated and wirelessly reported in real time. A base station picks up the radio signal and sends the information to a PC via USB, also in real time. Frame rates up to around 10fps have been achieved, although it greatly depends on the light conditions and the image plane division grid. © 2011 IEEE.

Wi-FLIP: A wireless smart camera based on a focal-plane low-power image processor
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2011
abstract      pdf

This paper presents Wi-FLIP, a vision-enabled WSN node resulting from the integration of FLIP-Q, a prototype vision chip, and Imotel, a commercial WSN platform. In Wi-FLIP, image processing is not only constrained to the digital domain like in conventional architectures. Instead, its image sensor - the FLIP-Q prototype - incorporates pixel-level processing elements (PEs) implemented by analog circuitry. These PEs are interconnected, rendering a massively parallel SIMD-based focal-plane array. Low-level image processing tasks fit very well into this processing scheme. They feature a heavy computational load composed of pixel-wise repetitive operations which can be realized in parallel with moderate accuracy. In such circumstances, analog circuitry, not very precise but faster and more area- and power-efficient than its digital counterpart, has been extensively reported to achieve better performance. The Wi-FLIP's image sensor does not therefore output raw but pre-processed images that make the subsequent digital processing much lighter. The energy cost of such pre-processing is really low - 5.6mW for the worst-case scenario. As a result, for the configuration where the Imote2's processor works at minimum clock frequency, the maximum power consumed by our prototype represents only the 5.2% of the whole system power consumption. This percentage gets even lower as the clock frequency increases. We report experimental results for different algorithms, image resolutions and clock frequencies. The main drawback of this first version of Wi-FLIP is the low frame rate reachable due to the non-standard GPIO-based FLIPQ-to-Imote2 interface. © 2011 IEEE.

Multi-resolution low-power gaussian filtering by reconfigurable focal-plane binning
J. Fernández-Berni, R. Carmona-Galán, F. Pozas-Flores, A. Zarándy and A. Rodríguez-Vázquez
Conference · SPIE Microtechnologies for the New Millennium 2011
abstract      pdf

Gaussian filtering is a basic tool for image processing. Noise reduction, scale-space generation or edge detection are examples of tasks where different Gaussian filters can be successfully utilized. However, their implementation in a conventional digital processor by applying a convolution kernel throughout the image is quite inefficient. Not only the value of every single pixel is taken into consideration sucessively, but also contributions from their neighbors need to be taken into account. Processing of the frame is serialized and memory access is intensive and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency is specially remarkable for filters with large variance, as the kernel size increases significantly. In this paper, a different approach to achieve Gaussian filtering is proposed. It is oriented to applications with very low power budgets. The key point is a reconfigurable focal-plane binning. Pixels are grouped according to the targeted resolution by means of a division grid. Then, two consecutive shifts of this grid in opposite directions carry out the spread of information to the neighborhood of each pixel in parallel. The outcome is equivalent to the application of a 3x3 binomial filter kernel, which in turns is a good approximation of a Gaussian filter, on the original image. The variance of the closest Gaussian filter is around 0.5. By repeating the operation, Gaussian filters with larger variances can be achieved. A rough estimation of the necessary energy for each repetition until reaching the desired filter is below 20nJ for a QCIF-size array. Finally, experimental results of a QCIF proof-of-concept focal-plane array manufactured in 0.35 mu m CMOS technology are presented. A maximum RMSE of only 1.2% is obtained by the on-chip Gaussian filtering with respect to the corresponding equivalent ideal filter implemented off-chip.

Design of a smart SiPM based on focal-plane processing elements for improved spatial resolution in PET
F. Pozas-Flores, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference · SPIE Microtechnologies for the New Millennium 2011
abstract      pdf

Single-photon avalanche diodes are compatible with standard CMOS. It means that photo-multipliers for scintillation detectors in nuclear medicine (i. e. PET, SPECT) can be built in inexpensive technologies. These silicon photo-multipliers consist in arrays of, usually passively-quenched, SPADs whose output current is sensed by some analog readout circuitry. In addition to the implementation of photosensors that are sensitive to single-photon events, analog, digital and mixed-signal processing circuitry can be included in the same CMOS chip. For instance, the SPAD can be employed as an event detector, and with the help of some in-pixel circuitry, a digitized photo-multiplier can be built in which every single-photon detection event is summed up by a counter. Moreover, this concurrent processing circuitry can be employed to realize low level image processing tasks. They can be efficiently implemented by this architecture given their intrinsic parallelism. Our proposal is to operate onto the light-induced signal at the focal plane in order to obtain a more elaborated record of the detection. For instance, by providing some characterization of the light spot. Information about the depth-of-interaction, in scintillation detectors, can be derived from the position and shape of the scintillation light distribution. This will ultimately have an impact on the spatial resolution that can be achieved. We are presenting the design in CMOS of an array of detector cells. Each cell contains a SPAD, an MOS-based passive quenching circuit and drivers for the column and row detection lines.

Focal-plane generation of multi-resolution and multi-scale image representation for low-power vision applications
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Zarandy and A. Rodríguez-Vázquez
Conference · SPIE Infrared Technology and Applications XXXVII, 2011
abstract      pdf

Early vision stages represent a considerably heavy computational load. A huge amount of data needs to be processed under strict timing and power requirements. Conventional architectures usually fail to adhere to the specifications in many application fields, especially when autonomous vision-enabled devices are to be implemented, like in lightweight UAVs, robotics or wireless sensor networks. A bioinspired architectural approach can be employed consisting of a hierarchical division of the processing chain, conveying the highest computational demand to the focal plane. There, distributed processing elements, concurrent with the photosensitive devices, influence the image capture and generate a pre-processed representation of the scene where only the information of interest for subsequent stages remains. These focal-plane operators are implemented by analog building blocks, which may individually be a little imprecise, but as a whole render the appropriate image processing very efficiently. As a proof of concept, we have developed a 176x144-pixel smart CMOS imager that delivers lighter but enriched representations of the scene. Each pixel of the array contains a photosensor and some switches and weighted paths allowing reconfigurable resolution and spatial filtering. An energy-based image representation is also supported. These functionalities greatly simplify the operation of the subsequent digital processor implementing the high level logic of the vision algorithm. The resulting figures, 5.6mW@30fps, permit the integration of the smart image sensor with a wireless interface module (Imote2 from Memsic Corp.) for the development of vision-enabled

On-site forest fire smoke detection by low-power autonomous vision sensor
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Cano-Rojas, J. F. Martínez-Carmona, A. Rodríguez-Vázquez and S. Morillas-Castillo
Conference · International Conference on Forest Fire Research ICFFR 2010
abstract      pdf

Early detection plays a crucial role to prevent forest fires from spreading. Wireless vision sensor networks deployed throughout high-risk areas can perform fine-grained surveillance and thereby very early detection and precise location of forest fires. One of the fundamental requirements that need to be met at the network nodes is reliable low-power on-site image processing. It greatly simplifies the communication infrastructure of the network as only alarm signals instead of complete images are transmitted, anticipating thus a very competitive cost. As a first approximation to fulfill such a requirement, this paper reports the results achieved from field tests carried out in collaboration with the Andalusian Fire-Fighting Service (INFOCA). Two controlled burns of forest debris were realized (www.youtube.com/user/vmoteProject). Smoke was successfully detected on-site by the EyeRISTM v1.2, a general-purpose autonomous vision system, built by AnaFocus Ltd., in which a vision algorithm was programmed. No false alarm was triggered despite the significant motion other than smoke present in the scene. Finally, as a further step, we describe the preliminary laboratory results obtained from a prototype vision chip which implements, at very low energy cost, some image processing primitives oriented to environmental monitoring.

Offset-compensated comparator with full-input range in 150nm FDSOI CMOS-3D technology
M. Suárez, V.M. Brea, C. Domínguez-Matas, R. Carmona, G. Liñán and A. Rodríguez-Vázquez
Conference · IEEE Latin American Symposium on Circuits and Systems LASCAS 2010
abstract      pdf

This paper addresses an offset-compensated comparator with full-input range in the 150nm FDSOI CMOS-3D technology from MIT- Lincoln Laboratory. The comparator discussed here makes part of a vision system. Its architecture is that of a self-biased inverter with dynamic offset correction. At simulation level, the comparator can reach a resolution of 0.1mV in an area of approximately 220μm2 with a time response of less than 40ns and a static power dissipation of 1.125μW.

In-pixel ADC for a vision architecture on CMOS-3D technology
M. Suarez, V.M. Brea, C. Dominguez Matas, R. Carmona, G. Liñán and A. Rodríguez-Vázquez
Conference · IEEE 3D System Integration Conference 3DIC 2010
abstract      pdf

This paper addresses the design of an 8-bit single-slope in-pixel ADC for a 3D chip architecture intended for airborne surveillance and reconnaissance applications. The 3D chip architecture comprises a sensor layer with a resolution of 320 × 240 pixels bump-bonded to a three-tier chip on the 150 nm FDSOI CMOS-3D technology from MIT-Lincoln Laboratories. The top tier is a mixed-signal layer with 160 × 120 processing elements. The ADC is distributed between the top two tiers. The top tier contains both global and local circuitry. The ramp generation is implemented with global circuitry through an 8-bit unary current-steering DAC. The end of conversion at every pixel or processing element is triggered by a local comparator. The digital words are stored in a frame-buffer in an intermediate tier. The area of the local circuitry in the ADC is consumed by the comparator, capable of reaching less than 3 mV of resolution in less than 150 ns with less than 220 μm2, and by the memory cells, each one storing 6 8-bit words along with two additional bits in less than 50 μm × 50 μm. Every ADC conversion is performed in less than 120 μs.

Circuital and Architectural Challenges for the Design of PET Medical Imaging Systems using CMOS
A. Rodríguez-Vázquez, R. Carmona-Galán, G. Liñán, R. del Río and B. Pérez-Verdú
Conference · International Workshop on Biomedical Applications of Micro-PET, 2010
abstract     

Abstract not available

A prototype node for wireless vision sensor network applications development
M. Bakkali, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Symposium on I/V Communications and Mobile Networks ISIVC 2010
abstract      pdf

This paper presents a prototype vision-enabled sensor node based on a commercial vision system of reduced size and power consumption. The wireless infrastructure for the deployment of a distributed smart camera network based on these nodes is provided by commercial motes. The smart camera, based on a low-power bio-inspired processing scheme, enables in-node image processing and vision tools. This permits to elaborate a lighter representation of the scene, keeping the relevant information in terms of detected elements, features and events, alleviating the data transmission through the network. Therefore by passing only the relevant information to the neighboring sensor nodes, distributed and collaborative vision is possible with the limited data rates available in commercial wireless sensor networks. Communication between the different components of the system is supported by the available UARTs and GPIOs. Several examples of in-node image processing and feature detection has been tested in the prototype, and information at different abstraction levels has been broadcasted to the network. © 2010 IEEE.

A 3-D chip architecture for optical sensing and concurrent processing
A. Rodríguez-Vázquez, R. Carmona, C. Domínguez-Matas, M. Suárez-Cambre, V. Brea, F. Pozas, G. Liñán, P. Foldessy, A. Zarandy and C. Rekeczky
Conference · SPIE EUROPHOTONICS 2010
abstract      pdf

This paper presents an architecture for the implementation of vision chips in 3-D integration technologies. This architecture employs the multi-functional pixel concept to achieve full parallel processing of the information and hence high processing speed. The top layer includes an array of optical sensors which are parallel-connected to the second layer, consisting of an array of mixed-signal read-out and pre-processing cells. Multiplexing is employed so that each mixedsignal cell handles several optical sensors. The two remaining layer are respectively a memory (used to store different multi-scale images obtained at the mixed-signal layer) and an array of digital processors. A prototype of this architecture has been implemented in a FDSOI CMOS-3D technology with Through-Silicon-Vias of 5um x 5um pitch. © 2010 Copyright SPIE - The International Society for Optical Engineering.

Digital processor array implementation aspects of a 3D multi-layer vision architecture
P. Földesy, R. Carmona-Galán, Á. Zarándy, C. Rekeczky, A. Rodríguez-Vázquez and T. Roska
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2010
abstract      pdf

Technological aspects of the 3D integration of a multi-layer combined mixed-signal and digital sensor-processor array chip is described. The 3D integration raises the question of signal routing, power distribution, and heat dissipation, which aspects are considered systematically in the digital processor array layer as part of the multi layer structure. We have developed a linear programming based evaluation system to identify the proper architecture and its parameters. © 2010 IEEE.

Simplified state update calculation for fast and accurate digital emulation of CNN dynamics
F. Pozas-Flores, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2010
abstract      pdf

Compared to other one-step integration methods, the 4th-order Runge-Kutta is much more accurate while still consisting in a rather reduced algorithmic structure. However, in terms of the computing power, it is more expensive than others. While the Forward Euler's method updates the state variable with a single evaluation of the derivative, 4th-order Runge-Kutta's method requires four. This is the reason why, when simulation speed is a central matter, e. g. in the digital emulation of CNN dynamics, the speed-accuracy trade-off is resolved in favour of the simpler, though less accurate, methods. A workaround for the computationally intensive calculation of the state variable update can be found for certain CNN models. If a FSR CNN model is employed, where the state variable is not allowed to go beyond the limits of the linear region of the cell output characteristic, the output can be identified with the state. In these conditions, and having linear templates, the update of the state variable can be computed, for a 4th-order Runge-Kutta's method, with a single function evaluation. It means that a digital emulation of the CNN dynamics following this method is as light-weighted as a Forward Euler's integrator, but much more accurate. © 2010 IEEE.

Robust focal-plane analog processing hardware for dynamic texture segmentation
J. Fernández-Berni and R. Carmona-Galán
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2010
abstract      pdf

Cellular Nonlinear Networks (CNN) establish a theoretical framework in which programmable focal-plane image processing arrays can be developed. The conventional support for its analog programmability in VLSI is the implementation of transconductor-based multiplication of the input, output and state variables times the corresponding template elements. However, some distributions of weights can be greatly affected by the intrinsic nonidealities of the physical implementation. This is exactly the case when implementing linear diffusion within a transconductor-based CNN implementation. In this paper we propose an alternative implementation: a resistive grid based on MOSFETs operating in the triode region to realize linear diffusion of the input image, considered as the initial state of the network. In addition, these MOS-resistors can be employed as switches in order to sub-divide the image into bins, sized to track features on the appropriate scale. Thus, by simply controlling the size of the binning and for how long the pixel voltages will diffuse, it will be possible to segment and track dynamic textures along an image flow. Each frame of the flow is described by a smaller image in which each pixel represents the energy of the corresponding image bin, once the non-relevant spatial frequency components have been filtered out. We will demonstrate that the resulting low-resolution representation of the scene is very robust to the different sources of nonidealities in a standard CMOS technology. © 2010 IEEE.

3D multi-layer vision architecture for surveillance and reconnaissance applications
P. Földesy, R. Carmona-Galan, A. Zarándy, C. Rekeczky, A. Rodríguez-Vázquez and T. Roska
Conference · European Conference on Circuit Theory and Design ECCTD 2009
abstract      pdf

The architecture and the design details of a multilayer combined mixed-signal and digital sensor-processor array chip is shown. The processor layers are fabricated with 3D integration technology, and the sensor layer is integrated via bump bonding technology. The chip is constructed of a 320x240 sensor array layer, closely coupled with a 160x120 mixed-signal processor array layer, a digital frame buffer layer, and an 8x8 digital fovea processor array layer. The chip is designed to solve image registration and feature extraction above 1000FPS. ©2009 IEEE.

Low-power focal-plane dynamic texture segmentation based on programmable image binning and diffusion hardware
J. Ferández-Berni and R. Carmona-Galán
Conference · SPIE Microtechnologies: Bioengineered and Bioinspired Systems IV, 2009
abstract      pdf

Stand-alone applications of vision are severely constrained by their limited power budget. This is one of the main reasons why vision has not yet been widely incorporated into wireless sensor networks. For them, image processing should be suscribed to the sensor node in order to reduce network traffic and its associated power consumption. In this scenario, operating the conventional acquisition-digitization-processing chain is unfeasible under tight power limitations. A bio-inspired scheme can be followed to meet the timing requirements while maintaining a low power consumption. In our approach, part of the low-level image processing is conveyed to the focal-plane thus speeding up system operation. Moreover, if a moderate accuracy is permissible, signal processing is realized in the analog domain, resulting in a highly efficient implementation. In this paper we propose a circuit to realize dynamic texture segmentation based on focal-plane spatial bandpass filtering of image subdivisions. By the appropriate binning, we introduce some constrains into the spatial extent of the targeted texture. By running time-controlled linear diffusion within each bin, a specific band of spatial frequencies can be highlighted. Measuring the average energy of the components in that band at each image bin the presence of a targeted texture can be detected and quantified. The resulting low-resolution representation of the scene can be then employed to track the texture along an image flow. An application specific chip, based on this analysis, is being developed for natural spaces monitoring by means of a network of low-power vision systems. ©2009 SPIE.

Accurate design of a MOS-based resistive network for time-controlled diffusion filtering
J. Fernández-Berni and R. Carmona-Galán
Conference · European Conference on Circuit Theory and Design ECCTD 2009
abstract      pdf

This paper analyses a MOS-based resistive network suitable for massively parallel image processing. The inclusion of MOS transistors biased in the ohmic region instead of true resistors permits certain control over the underlying spatial filtering while reducing the required area for VLSI implementation. However, it also leads to nonlinearities and thereby to errors with respect to an ideal resistive grid. By studying an elementary network composed of only two nodes we determine the guidelines to be followed in order to minimize the error according to the selected signal range. These guidelines are then extrapolated to larger networks demonstrating that pretty accurate networks can be achieved even for relatively wide signal ranges. Simulations are employed to validate the extrapolated results. The numerical examples will also allow to visualize how the insertion of MOS transistors affects the spatial filtering carried out by the grid.

A VLSI-oriented and power-efficient approach for dynamic texture recognition applied to smoke detection
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference · International Conference on Computer Vision Theory and Applications VISAPP 2009
abstract      pdf

The recognition of dynamic textures is fundamental in processing image sequences as they are very common in natural scenes. The computation of the optic flow is the most popular method to detect, segment and analyse dynamic textures. For weak dynamic textures, this method is specially adequate. However, for strong dynamic textures, it implies heavy computational load and therefore an important energy consumption. In this paper, we propose a novel approach intented to be implemented by very low-power integrated vision devices. It is based on a simple and flexible computation at the focal plane implemented by power-efficient hardware. The first stages of the processing are dedicated to remove redundant spatial information in order to obtain a simplified representation of the original scene. This simplified representation can be used by subsequent digital processing stages to finally decide about the presence and evolution of a certain dynamic texture in the scene. As an application of the proposed approach, we present the preliminary results of smoke detection for the development of a forest fire detection system based on a wireless vision sensor network.

A vision-based monitoring system for very early automatic detection of forest fires
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference · International Conference on Modelling, Monitoring and Management of Forest Fires, 2008
abstract      doi      pdf

This paper describes a system capable of detecting smoke at the very beginning of a forest fire with a precise spatial resolution. The system is based on a wireless vision sensor network. Each sensor monitors a small area of vegetation by running on-site a tailored vision algorithm to detect the presence of smoke. This algorithm examines chromaticity changes and spatio-temporal patterns in the scene that are characteristic of the smoke dynamics at early stages of propagation. Processing takes place at the sensor nodes and, if that is the case, an alarm signal is transmitted through the network along with a reference to the location of the triggered zone - without requiring complex GIS systems. This method improves the spatial resolution on the surveilled area and reduces the rate of false alarms. An energy efficient implementation of the sensor/processor devices is crucial as it determines the autonomy of the network nodes. At this point, we have developed an ad hoc vision algorithm, adapted to the nature of the problem, to be integrated into a single-chip sensor/processor. As a first step to validate the feasibility of the system, we applied the algorithm to smoke sequences recorded with commercial cameras at real-world scenaRíos that simulate the working conditions of the network nodes. The results obtained point to a very high reliability and robustness in the detection process.

Practical limitations to the implementation of resistive grid filtering in cellular neural networks
J. Fernández-Berni and R. Carmona-Galán
Conference · European Conference on Circuit Theory and Design ECCTD 2007
abstract     

This paper analyzes the effect of mismatch in the realization of resistive grid filtering by VLSI CNN hardware. The study of the dynamic routes of a simple network consisting in only two CNN cells is enough to point out the dramatic consequences of the unavoidable mismatches between the circuit building blocks of the cells. Previous works consider that small perturbations in the network parameters lead to small deviations from the ideal behaviour, rendering their studies valid. We find, however, that these models are overly optimistic. The nonlinear nature of the cell output and the singular location of the roots in the ideal system imply that small perturbations in the parameters may render a qualitatively different, non-convergent and undesired dynamics. These effects are greatly noticeable when the network is driven by its initial conditions, and not through the B-template. The numerical study of these effects in a 64 x 64 network confirms the extrapolation of the results found in the evolution of the 2-cell assembly. In order to point out the importance of using the same paths for the transmission of symmetric contributions, we have tested the influence of mismatch in a non-ideal resistive grid built from n-type MOS transistors in ohmic region. In spite of the nonlinearity in the behavior of these resistors, the resulting network seems to be much more robust to mismatch. This last result coinciding with previous studies.

A focal-plane image processor for low power adaptive capture and analysis of the visual stimulus
C.M. Domínguez-Matas, R. Carmona-Galán, F.J. Sánchez-Fernández and A. Rodríguez-Vázquez
Conference · International Symposium on Circuits and Systems ISCAS 2007
abstract     

Portable applications of artificial vision are limited by the fact that conventional processing schemes fail to meet the specifications under a tight power budget. A bio-inspired approach, based in the goal-directed organization of sensory organs found in nature, has been employed to implement a focal-plane image processor for low power vision applications. The prototype contains a multi-layered CNN structure concurrent with 32x32 photosensors with locally programmable integration time for adaptive image capture with on-chip local and global adaptation mechanisms. A more robust and linear multiplier block has been employed to reduce irregular analog wave propagation ought to asymmetric synapses. The predicted computing power per power consumption, 142MOPS/mW, is orders of magnitude above what rendered by conventional architectures.

A bio-inspired vision front-end chip with spatio-temporal processing and adaptive image capture
C.M. Domínguez-Matas, R. Carmona-Galán, F.J. Sánchez-Fernández, J. Cuadri and A. Rodríguez-Vázquez
Conference · International Workshop on Computer Architecture for Machine Perception and Sensing CAMPS 2006
abstract     

This paper presents an advanced CMOS imager with concurrent parallel processing for early-vision tasks. The network is arranged in two layers of 32 x 32 programmable mixed-signal elementary processors with programmable linear feedback and control masks and inter-layer connections for continuous-time cellular neural network dynamics. The ratio of the time constants of these layers is user selectable. There are also feedforward connections to a faster third layer, intended to combine of the outputs of the other two in parallel. We have employed a restricted set of weights' trading flexibility for robustness. It results in a more linear multiplier block and, consequently, a significant reduction of irregularities in the propagation of analog waves ought to asymmetric synapses. Also global and local adaptation to illumination conditions, both throughout the scene and from frame to frame, are implemented on-chip, making use of the available focal-plane processing capabilities. The predicted computing power per area and power consumption is amongst the largest reported, what renders this kind of devices as an efficient front-end for portable applications of artificial vision.

Robust symmetric multiplication for programmable analog VLSI array processing
C. Domínguez-Matas, R. Carmona-Galán, F.J. Sánchez-Fernández and A. Rodríguez-Vázquez
Conference · IEEE International Conference on Electronics, Circuits and Systems ICECS 2006
abstract     

This paper presents an electrically programmable analog multiplier. The circuit performs the multiplication between an input variable and an electrically selectable scaling factor. The multiplier is divided in several blocks: a linearized transconductor, binary weighted current mirrors and a differential to single-ended current adder. This paper shows the advantages introduced using a linearized OTA-based multiplier. The circuit presented renders higher linearity and symmetry in the output current than a previously reported single-transistor multiplier. Its inclusion in an array processor based on CNN allows for a more accurate implementation of the processing model and a more robust weight distribution scheme than those found in previous designs.

Experiments on global and local adaptation to illumination conditions based on focal-plane average computation
C.M. Domínguez-Matas, F.J. Sánchez-Fernández, R. Carmona-Galán and E. Roca-Moreno
Conference · International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2006
abstract     

This paper presents some experiments with an integrating pixel with adjustable gain for adaptive image capture. The adaptation mechanism consists in regulating the exposure time according to both global and local illumination conditions. Local adaptation is based in the concurrent processing of local information in the present and previous frames. Global adaptation corrects the average integration time towards the optimum starting on the average pixel value in the previous frame. Experimental results from a prototype chip, which contains a 16x16 array of adaptive pixels, are presented. Tests results have shown that global adaptation enlarges the interframe dynamic range up to 4 decades (80dB). This image DR is limited by test setup, as part of the control scheme is implemented off-chip. Meanwhile, local adaptation produces some histogram equalization and reaches the theoretical intraframe DR enhancement of 6dB. Both adaptive mechanisms have been incorporated into an analog parallel array processor based on a multilayer CNN for image capture based on focal-plane processing.

3-layer CNN chip for focal-plane complex dynamics with adaptive image capture
C.M. Domínguez-Matas, R. Carmona-Galán, F.J. Sánchez-Fernández and A. Rodríguez-Vázquez
Conference · IEEE International Workshop on Cellular Neural Networks and their Applications CNNA 2006
abstract     

This paper presents a CMOS implementation of a layered CNN concurrent with 32x32 photosensors with locally programmable integration time for adaptive image capture. The network is arranged in two layers containing feedback and control templates, inter-layer connections and programmable ratio of time constants. There are also feedforward connections to a third layer, which is faster, and devoted exclusively for combining the outputs of the other two. A more robust and linear multiplier block has been employed to reduce irregular analog wave propagation ought to asymmetric synapses. Global and local adaptation circuits are included on-chip. The predicted computing power per power consumption, 240MOPS/mW, is amongst the largest reported, what renders this kind of devices as especially adequate for portable applications of artificial vision.

A CNN-driven locally adaptive CMOS image sensor
R. Carmona, C.M. Domínguez-Matas, J. Cuadri, F. Jiménez-Garrido and A. Rodríguez-Vázquez
Conference · International Symposium on Circuits and Systems ISCAS 2004
abstract     

A bioinspired model for mixed-signal array mimics the way in which images are processed in the visual pathway. Focal-plane processing of images permits local adaptation of photoreceptor structures in silicon. Beyond simple resistive grid filtering, nonlinear and anisotropic diffusion can be programmed in this CNN chip. This paper presents the local circuitry for sensors adaptation based on the mixed-signal VLSI parallel processing infrastructure in CMOS.

Programmable retinal dynamics in a CMOS mixed-signal array processor chip
R. Carmona, F. Jiménez-Garrido, R. Domiguez-Castro, S. Espejo and A. Rodríguez-Vázquez
Conference · Conference on Bioengineered and Bioinspired Systems 2003
abstract     

The low-level image processing that takes place in the retina is intended to compress the relevant visual information to a manageable size. The behavior of the external layers of the biological retina has been successfully modelled by a Cellular Neural Network, whose evolution can be described by a set of coupled nonlinear differential equations. A mixed-signal VLSI implementation of the focal-plane low-level image processing based upon this biological model constitutes a feasible and cost effective alternative to conventional digital processing in real-time applications. For these reasons, a programmable array processor prototype chip has been designed and fabricated in a standard 0.5mum CMOS technology. The integrated system consists of a network of two coupled layers, containing 32 x 32 elementary processors, running at different time constants. Involved image processing algorithms can be programmed on this chip by tuning the appropriate interconnections weights. Propagative, active wave phenomena and retina-like effects can be observed in this chip. Design challenges, trade-offs, the buildings blocks and some test results are presented in this paper.

Analog weight buffering strategy for CNN chips
G. Liñán-Cembrano, A. Rodríguez-Vázquez, R. Carmona, S. Espejo and R. Domínguez Castro
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2003
abstract     

Large, gray-scale CNN chips employ analog signals to achieve high-density in the internal distribution of the template parameters. Despite the design strategies adopted at the circuitry employed to implement the weights, accuracy is ultimately limited by the controlling signals. This paper presents a buffering strategy intended to achieve 8-bit equivalent accuracy in the distribution of the internal analog signals, as employed in the chips ACE4k [1], ACE16k [2], and CACE1k [3].

Bio-inspired analog parallel array processor chip with programmable spatio-temporal dynamics
R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez
Conference · IEEE International Symposium on Circuits and Systems ISCAS 2002
abstract     

A bio-inspired model for an analog parallel array processor (APAP), based on studies on the vertebrate retina, permits the realization of complex spatio-temporal dynamics in VLSI. This model mimics the way in which images are processed in the natural visual pathway what renders a feasible alternative for the implementation of early vision tasks in standard technologies. A prototype chip has been designed and fabricated in 0.5mum CMOS. Design challenges, trade-offs and the building blocks of such a high-complexity system (0.5 x 10(6) transistors, most of them operating in analog mode) are presented in this paper.

Bio-inspired analog VLSI design realizes programmable complex spatio-temporal dynamics on a single chip
R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez
Conference · Design, Automation and Test in Europe Conference and Exhibition DATE 2002
abstract     

A bio-inspired model for an analog parallel array processor (APAP), based on studies on the vertebrate retina, permits the realization of complex spatio-temporal dynamics in VLSI. This model mimics the way in which images are processed in the visual pathway what renders a feasible alternative for the implementation of early vision tasks in standard technologies. A prototype chip has been designed in 0.5 mum CMOS. Design challenges, trade-offs and the building blocks of such a high-complexity system (0.5 x 10(6) transistors, most of them operating in analog mode) are presented in this paper.

CMOS realization of a 2-layer CNN universal machine chip
R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez
Conference · IEEE International Workshop on Cellular Neural Networks and Their Applications CNNA 2002
abstract     

Some of the features of the biological retina can be modelled by a cellular neural network (CNN) composed of two dynamically coupled layers of locally connected elementary nonlinear processors. In order to explore the possibilities of these complex spatio-temporal dynamics in image processing, a prototype chip has been developed implementing this CNN model with analog signal processing blocks. This chip has been designed in a 0.5 mum CMOS technology. Design challenges, trade-offs and the building blocks of such a high-complexity system (0.5 x 10(6) transistors, most of them operating in analog mode) are presented in this paper(a).

Books


Low-power smart imagers for vision-enabled sensor networks
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Book · 156 p, 2012
abstract      link      pdf

This book presents a comprehensive, systematic approach to the development of vision system architectures that employ sensory-processing concurrency and parallel processing to meet the autonomy challenges posed by a variety of safety and surveillance applications. Coverage includes a thorough analysis of resistive diffusion networks embedded within an image sensor array. This analysis supports a systematic approach to the design of spatial image filters and their implementation as vision chips in CMOS technology. The book also addresses system-level considerations pertaining to the embedding of these vision chips into vision-enabled wireless sensor networks. Describes a system-level approach for designing of vision devices and embedding them into vision-enabled, wireless sensor networks; Surveys state-of-the-art, vision-enabled WSN nodes; Includes details of specifications and challenges of vision-enabled WSNs; Explains architectures for low-energy CMOS vision chips with embedded, programmable spatial filtering capabilities; Includes considerations pertaining to the integration of vision chips into off-the-shelf WSN platforms.

Book Chapters


Image feature extraction acceleration
J. Fernández-Berni, M. Suárez-Cambre, R. Carmona-Galán, V. Brea, R. del Río, D. Cabello and Á. Rodríguez-Vázquez
Book Chapter · Image Feature Detectors and Descriptors, SCI, vol. 630, pp 109-132, 2016
abstract      doi      pdf

Image feature extraction is instrumental for most of the best-performing algorithms in computer vision. However, it is also expensive in terms of computational and memory resources for embedded systems due to the need of dealing with individual pixels at the earliest processing levels. In this regard, conventional system architectures do not take advantage of potential exploitation of parallelism and distributed memory from the very beginning of the processing chain. Raw pixel values provided by the front-end image sensor are squeezed into a high-speed interface with the rest of system components. Only then, after deserializing this massive dataflow, parallelism, if any, is exploited. This chapter introduces a rather different approach from an architectural point of view. We present two Application-Specific Integrated Circuits (ASICs) where the 2-D array of photo-sensitive devices featured by regular imagers is combined with distributed memory supporting concurrent processing. Custom circuitry is added per pixel in order to accelerate image feature extraction right at the focal plane. Specifically, the proposed sensing-processing chips aim at the acceleration of two flagships algorithms within the computer vision community: the Viola-Jones face detection algorithm and the Scale Invariant Feature Transform (SIFT). Experimental results prove the feasibility and benefits of this architectural solution.

Focal-plane dynamic texture segmentation by programmable binning and scale extraction
J. Fernández-Berni and R. Carmona-Galán
Book Chapter · Focal-Plane Sensor-Processor Chips, pp 105-124, 2011
abstract      doi      pdf

Dynamic textures are spatially repetitive time-varying visual patterns that present, however, some temporal stationarity within their constituting elements. In addition, their spatial and temporal extents are a priori unknown. This kind of pattern is very common in nature; therefore, dynamic texture segmentation is an important task for surveillance and monitoring. Conventional methods employ optic flow computation, though it represents a heavy computational load. Here, we describe texture segmentation based on focal-plane space-scale generation. The programmable size of the subimages to be analysed and the scales to be extracted encode sufficient information from the texture signature to warn its presence. A prototype smart imager has been designed and fabricated in 0.35 μm CMOS, featuring a very low-power scale-space representation of used-defined subimages.

VISCUBE: A multi-layer vision chip
A. Zarándy, C. Rekeczky, P. Földesy, R. Carmona-Galán, G. Liñán-Cembrano, S. Gergely, A. Rodríguez-Vázquez and T. Roska
Book Chapter · Focal-Plane Sensor-Processor Chips, pp 181-208, 2011
abstract      doi      pdf

Vertically integrated focal-plane sensor-processor chip design, combining image sensor with mixed-signal and digital processor arrays on a four layer structure is introduced. The mixed-signal processor array is designed to perform early image processing, while the role of the digital processor array is to accomplish foveal processing. The architecture supports multiscale, multifovea processing. The chip has been designed on a 0.15um feature sized 3DM2 SOI technology provided by MIT Lincoln Laboratory.

Cellular multi-core processor carrier chip for nanoantenna integration and experiments
A. Zarandy, P. Foldesy, R. Carmona, C. Rekeczky, J.A. Bean and W. Porod
Book Chapter · Cellular Nanoscale Sensory Wave Computing, pp 147-162, 2010
abstract      doi      pdf

A new generation of IR and sub-millimeter wave radiation detector imager array with integrated per channel high-gain capacitive amplifiers and digital signal processing/enhancement circuitry was designed. The multi-core processor carrier chip, with the analog interface and the digital processor array were implemented in standard 0.18-μm CMOS technology and verified within a compact measurement system. Characterization with external photosensors has been completed and the associated measurement results are presented. A concept for nano antenna type detector array integration to the processor carrier was also developed, and some preliminary experiments have been conducted with metal-oxide-metal (MOM) diodes.

Vertebrate retina emulation using multi-layer array-processor mixed-signal chips
R. Carmona-Galán, A. Rodríguez-Vázquez, R. Domínguez-Castro and S. Espejo Meana
Book Chapter · Smart Adaptive Systems on Silicon, pp 85-101, 2004
abstract      doi      

A bio-inspired model for an analog programmable array processor (APAP), based on stu dies on the vertebrate retina, has permitted the realization of complex programmable spatio-temporal dynamics in VLSI. This model mimics the way in which images are processed in the visual pathway, what renders a feasible alternative for the implementation of early vision tasks in standard technologies. A prototype chip has been designed and fabricated in 0.5μm CMOS. It renders a computing power per silicon area and power consumption that is amongst the highest reported for a single chip. The details of the bio-inspired network model, the analog building block design challenges and trade-offs and some functional tests results are presented here.

Other publications


No results

  • Journals585
  • Conferences1171
  • Books30
  • Book chapters81
  • Others9
  • 20245
  • 202335
  • 202281
  • 202183
  • 2020103
  • 201977
  • 2018106
  • 2017111
  • 2016104
  • 2015111
  • 2014104
  • 201380
  • 2012108
  • 2011102
  • 2010120
  • 200977
  • 200867
  • 200770
  • 200665
  • 200578
  • 200468
  • 200362
  • 200259
RESEARCH