Spanish National Research Council · University of Seville
In all publications
Author: Fernández Berni , Jorge
Year: Since 2002
All publications
On-The-Fly Deployment of Deep Neural Networks on Heterogeneous Hardware in a Low-Cost Smart Camera
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - ACM International Conference on Distributed Smart Cameras ICDSC 2018
DOI: 10.1145/3243394.3243705    » doi
[abstract]
This demo showcases a low-cost smart camera where different hardware configurations can be selected to perform image recognition on deep neural networks. Both the hardware configuration and the network model can be changed any time on the fly. Up to 24 hardware-model combinations are possible, enabling dynamic reconfiguration according to prescribed application requirements.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - Lecture Notes in Computer Science LNCS, vol. 11182, pp 369-379, 2018
SPRINGER    DOI: 10.1007/978-3-030-01449-0_31    ISSN: 0302-9743    » doi
[abstract]
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.
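The weight sweep described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula of the Figure of Merit, so the weighted sum of min-max normalized metrics used here is an assumption, as are the function names, the candidate labels and every number in `CANDIDATES` (placeholders, not results from the paper).

```python
from itertools import product

# Hypothetical (accuracy, throughput in fps, power in W) per network/framework
# combination. Placeholder numbers, NOT measurements from the paper.
CANDIDATES = {
    "net_A/fw_1": (0.70, 2.0, 2.5),
    "net_B/fw_1": (0.58, 6.0, 2.0),
    "net_B/fw_2": (0.58, 4.0, 1.5),
    "net_C/fw_2": (0.55, 3.0, 2.4),
}

def normalize(values, higher_is_better=True):
    """Min-max scale to [0, 1]; invert when a lower value is preferable."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    scaled = [(v - lo) / span for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def best_per_weighting(candidates, steps=10):
    """Sweep the weights (summing to 1) and record the top-scoring candidate."""
    names = list(candidates)
    acc = normalize([candidates[n][0] for n in names])
    thr = normalize([candidates[n][1] for n in names])
    pwr = normalize([candidates[n][2] for n in names], higher_is_better=False)
    winners = {}
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        w_acc, w_thr = i / steps, j / steps
        w_pwr = 1.0 - w_acc - w_thr
        scores = [w_acc * a + w_thr * t + w_pwr * p
                  for a, t, p in zip(acc, thr, pwr)]
        winners[(i, j)] = names[scores.index(max(scores))]
    return winners

winners = best_per_weighting(CANDIDATES)
print(sorted(set(winners.values())))
```

With these placeholder values, a candidate that is worse than another one on all three metrics never wins for any weighting, which mirrors the abstract's point that only a reduced set of combinations needs to be considered.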

Guest editorial special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard and A. Dupret
Journal Paper - International Journal of Circuit Theory and Applications, Computational Image Sensors and Smart Camera Hardware, vol. 46, no. 9, pp 1577-1579, 2018
JOHN WILEY & SONS    DOI: 10.1002/cta.2551    ISSN: 0098-9886    » doi
[abstract]
Recent advances in both software and hardware technologies are enabling the emergence of vision as a key sensorial modality in various application scenarios. Concerning hardware, all of the components along the signal chain play a significant role when it comes to implementing smart vision-enabled systems. At the front end, new circuit structures for sensing, processing, and signal conditioning are adding functionalities in CMOS imagers beyond the mere generation of 2-D intensity maps. Moreover, the development of vertical integration technologies is facilitating monolithic realizations of visual sensors where the incorporation of computational capabilities has no impact at all on image quality. Typically, the outcome of the front-end device in a smart camera will be a preprocessed flow of information ready for further efficient analysis. At this point, specific ICs known as vision processing units can be inserted to accelerate the processing flow according to the targeted application. On the other hand, reconfigurability is a valuable asset in the ever-changing field of vision. FPGAs leverage cutting-edge digital technologies to offer flexible hardware for exploration of different memory arrangements, data flows, and processing parallelization. It is precisely parallelization for which GPUs constitute an interesting alternative in smart cameras when massive pixel-level operation is required. This is the case of state-of-the-art vision algorithms based on convolutional neural networks. At higher level, DSPs and multicore CPUs make software development notably easier at the cost of losing hardware specificity. Overall, this special issue aims at covering some of the latest research works in the vast ecosystem of hardware for artificial vision.

Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - Advanced Concepts for Intelligent Vision Systems ACIVS 2018
[abstract]
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.

CMOS Vision Sensors: Embedding Computer Vision at Imaging Front-Ends
A. Rodríguez-Vázquez, J. Fernández-Berni, J.A. Leñero-Bardallo, I. Vornicu and R. Carmona-Galán
Journal Paper - IEEE Circuits and Systems Magazine, vol. 18, no. 2, pp 90-107, 2018
IEEE    DOI: 10.1109/MCAS.2018.2821772    ISSN: 1531-636X    » doi
[abstract]
CMOS Image Sensors (CIS) are key for imaging technologies. These chips are conceived for capturing optical scenes focused on their surface, and for delivering electrical images, commonly in digital format. CISs may incorporate intelligence; however, their smartness basically concerns calibration, error correction and other similar tasks. The term CVISs (CMOS VIsion Sensors) defines another class of sensor front-ends, which are aimed at performing vision tasks right at the focal plane. They have been running under names such as computational image sensors, vision sensors and silicon retinas, among others. CVISs and CISs are similar regarding physical implementation. However, while the inputs of both CISs and CVISs are images captured by photo-sensors placed at the focal plane, CVISs' primary outputs may not be images but either image features or even decisions based on the spatial-temporal analysis of the scenes. We may hence state that CVISs are more ‘intelligent’ than CISs as they focus on information instead of on raw data. Actually, CVIS architectures capable of extracting and interpreting the information contained in images, and prompting reaction commands thereof, have been explored for years in academia, and industrial applications are recently ramping up. One of the challenges for CVIS architects is incorporating computer vision concepts into the design flow. The endeavor is ambitious because imaging and computer vision communities are rather disjoint groups talking different languages. The Cellular Nonlinear Network Universal Machine (CNNUM) paradigm, proposed by Profs. Chua and Roska, defined an adequate framework for such conciliation as it is particularly well suited for hardware-software co-design. This paper overviews CVIS chips that were conceived and prototyped at IMSE Vision Lab over the past twenty years. Some of them fit the CNNUM paradigm while others are tangential to it. All of them employ per-pixel mixed-signal processing circuitry to achieve sensor-processing concurrency in the quest of fast operation with reduced energy budget.

Live Demonstration: Low-Power Low-Cost Cyber-Physical System for Bird Monitoring
A. García-Rodríguez, J. Fernández-Berni, R. del Río, J. Marín, M. Baena, J. Bustamante, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - IEEE International Symposium on Circuits and Systems ISCAS 2018
[abstract]
This live demonstration showcases a cyber-physical system tailored for inexpensive remote bird monitoring. A comprehensive analysis of the application requirements along with a tight system integration has given rise to a smart autonomous nest-box ready for deployment. This nest-box includes radiofrequency identification (RFID), a weighing scale, two temperature sensors, passive infrared devices (PIR), massive data storage and internet connection via mobile infrastructure. It is powered through a solar panel. The bill of materials has been reduced by 77% with respect to the previous version of the nest-box, whereas the power consumption has been reduced by 84%.

Color Tone-Mapping Circuit for a Focal-Plane Implementation
G.M.S. Nunes, F.D.V.R. Oliveira, J.G. Gomes, A. Petraglia, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - IEEE International Symposium on Circuits and Systems ISCAS 2018
[abstract]
In this article, we present a review of the driving principles and parameters of a previously reported focal-plane tone-mapping operator. We then extend it in order to include color information processing. The signal processing operations required for handling color images are white balance and demosaicing. Neither white balance nor demosaicing is carried out in the focal plane, in order to avoid increasing circuit size and complexity. Since, in this case, white balance is carried out after tone mapping, multiplication of red and blue channels by constant gains may lead to wrong color results. An alternative approach is proposed, in which different gains are assigned for every red and blue pixel of the matrix. Because of the introduction of color, a modification in the original circuit is proposed, which affects the integration time of red and blue pixels. This modification leads to a reduction in the number of photodiodes required in the pixel array, and hence to a reduction of the sensing circuit area. The results produced by the operator are compared to those obtained from two other digital tone-mapping operators.

Concurrent focal-plane generation of compressed samples from time-encoded pixel values
M. Trevisi, H.C. Bandala, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - Design Automation and Test in Europe DATE 2018
[abstract]
Compressive sampling allows wrapping the relevant content of an image in a reduced set of data. It exploits the sparsity of natural images. This principle can be employed to deliver images over a network under a restricted data rate and still receive enough meaningful information. An efficient implementation of this principle lies in the generation of the compressed samples right at the imager. Otherwise, i.e. digitizing the complete image and then composing the compressed samples in the digital plane, the required memory and processing resources can seriously compromise the budget of an autonomous camera node. In this paper we present the design of a pixel architecture that encodes light intensity into time, followed by a global strategy to pseudo-randomly combine pixel values and generate, on-chip and on-line, the compressed samples.
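As a software analogy of pseudo-randomly combining pixel values into compressed samples, here is a minimal sketch. It assumes 0/1 selection masks drawn from a seeded generator and plain summation; the abstract does not detail the on-chip strategy, and the time encoding of the pixels is not modelled. The function name and the toy image are hypothetical.

```python
import random

def compressed_samples(pixels, n_samples, seed=0):
    """Return n_samples pseudo-random sums of the pixel values.

    Each sample adds up the pixels selected by a 0/1 mask drawn from a seeded
    generator, so a receiver that knows the seed can rebuild the same masks.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        mask = [rng.getrandbits(1) for _ in pixels]
        samples.append(sum(p for p, m in zip(pixels, mask) if m))
    return samples

# Flattened toy image with made-up values.
image = [12, 200, 34, 90, 7, 151, 66, 18]
y = compressed_samples(image, n_samples=3, seed=42)
print(len(image), "pixels ->", len(y), "compressed samples")
```

Only the samples and the seed need to be transmitted; reconstructing the image from them requires a sparse-recovery solver, which is outside the scope of this sketch.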

Performance Analysis of Real-Time DNN Inference on Raspberry Pi
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - SPIE Real-Time Image and Video Processing 2018
[abstract]
Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on shallow learning. However, it comes at the cost of a substantial increase of computational resources. This constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing. In such a demanding scenario, several open-source frameworks have been developed, e.g. Caffe, OpenCV, TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art DNN models for inference, though each one relies on particular optimization libraries and techniques resulting in different performance behavior. In this paper, we present a comparative study of some of these frameworks in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural Networks (CNN) models. The benchmarking system is Raspberry Pi 3 Model B, a low-cost embedded platform with limited resources. We highlight the advantages and limitations associated with the practical use of the analyzed frameworks. Some guidelines are provided for suitable selection of a specific tool according to prescribed application requirements.

Special issue on computational image sensors and smart camera hardware
J. Fernández-Berni, R. Carmona-Galán, G. Sicard and A. Dupret
Journal Paper - International Journal of Circuit Theory and Applications, vol. 45, no. 6, pp 729-730, 2017
JOHN WILEY & SONS    DOI: 10.1002/cta.2363    ISSN: 0098-9886    » doi
[abstract]
Abstract not available

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2017
[abstract]
Abstract not available

Gaussian Pyramid: Comparative Analysis of Hardware Architectures
F.D.V.R. Oliveira, J.G.R.C. Gomes, J. Fernandez-Berni, R. Carmona-Galan, R. del Rio and A. Rodriguez-Vazquez
Journal Paper - IEEE Transactions on Circuits and Systems I-Regular Papers, vol. 64, no. 9, pp 2308-2321, 2017
IEEE    DOI: 10.1109/TCSI.2017.2709280    ISSN: 1549-8328    » doi
[abstract]
This paper addresses a comparison of architectures for the hardware implementation of Gaussian image pyramids. Main differences between architectural choices are in the sensor front-end. One side is for architectures consisting of a conventional sensor that delivers digital images and which is followed by digital processors. The other side is for architectures employing a non-conventional sensor with per-pixel embedded preprocessing structures for Gaussian spatial filtering. This latter choice belongs to the general category of "artificial retina" sensors, which have long been claimed as potentially advantageous for enhancing throughput and reducing energy consumption of vision systems. These advantages are very important in the internet of things context, where imaging systems are constantly exchanging information. This paper attempts to quantify these potential advantages within a design space in which the degrees of freedom are the number and type of ADCs (single-slope, SAR, cyclic, Sigma Delta, and pipeline) and the number of digital processors. Results show that speed and energy advantages of preprocessing sensors are not granted by default and are only realized through proper architectural design. The methodology presented for the comparison between focal-plane and digital approaches is a useful tool for imager design, allowing for the assessment of focal-plane processing advantages.

TCAD Simulation of Electrical Crosstalk in 4T-Active Pixel Sensors
J.M. López-Martínez, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2017
[abstract]
CMOS image sensors (CIS) are widely used nowadays in consumer electronics as well as in high-end applications. This is mainly due to their advantages regarding low dark current and low noise characteristics of the pinned photodiode (PPD). Much effort has been put into better understanding key electrical properties of PPDs, like full well capacity, photodiode's capacitance or pinning voltage. Another important source of sensitivity degradation is crosstalk (CTK). It has been assessed for CCDs and some CMOS devices. However, addressing CTK in CMOS 4T-APS pixels at the design phase is not easy, mainly due to the unavailability of CIS technology parameters. An additional problem is the computational cost of TCAD simulation; e.g., a five-pixel linear array like the one shown in Fig. 1 already introduces long periods of computing due to the complexity of the structure. Crosstalk occurs when the charge generated by photons incident on a pixel is finally sensed by a neighboring pixel. CTK degrades performance, cutting down spatial resolution, reducing the overall sensitivity, degrading color separation, and increasing image noise. Crosstalk is defined as the percentage of the total charge generated by incident light that is diverted to non-illuminated pixels in the neighborhood. There are two components in CTK. Optical crosstalk is related to illumination, reflection, refraction and scattering of photons in the different layers of the material that cover the photodiode. This generates stray photons that are absorbed in the neighborhood. The second component is electrical, and it involves the diffusion of photo-generated carriers between adjacent devices. The characterization of electrical CTK in 4T-APS can be achieved using TCAD tools. Particularly, the relation between CTK and quantum efficiency (QE) can be explored and linked to the thickness of the epitaxial layer.
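The crosstalk definition given in this abstract (the percentage of the total generated charge that ends up in non-illuminated neighbors) translates directly into a short computation. The sketch below is illustrative only: the function name is hypothetical and the charge values are arbitrary placeholders, not TCAD results.

```python
def crosstalk_percent(charges, illuminated):
    """Percentage of total collected charge sensed by non-illuminated pixels.

    charges: charge collected by each pixel (any consistent unit).
    illuminated: set of indices of the pixels that receive light.
    """
    total = sum(charges)
    if total <= 0:
        raise ValueError("total collected charge must be positive")
    diverted = sum(q for i, q in enumerate(charges) if i not in illuminated)
    return 100.0 * diverted / total

# Five-pixel linear array with only the centre pixel illuminated.
# Charge values are arbitrary placeholders, not simulation results.
charges = [1.0, 4.0, 90.0, 4.0, 1.0]
ctk = crosstalk_percent(charges, illuminated={2})
print(f"CTK = {ctk:.1f} %")
```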

TFET-based Well Capacity Adjustment in Active Pixel Sensor for Enhanced High Dynamic Range
J. Fernández-Berni, M. Niemier, X.S. Hu, H. Lu, W. Li, P. Fay, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - Electronics Letters, vol. 53, no. 9, pp 622-624, 2017
IET    DOI: 10.1049/el.2016.4548    ISSN: 0013-5194    » doi
[abstract]
A tunnel field-effect transistor (TFET)-based pixel circuit for well capacity adjustment that does not require subthreshold operation on the part of the reset transistor is presented. In CMOS, this subthreshold operation leads to temporal noise, distortion and fixed pattern noise, becoming a primary limiting performance factor. In the proposed circuit, the asymmetric conduction associated with TFETs is exploited. This property, arising from the inherent physical structure of the device, provides the selective well adjustments during photo-integration which are demanded for achieving high dynamic range. A GaN-based heterojunction TFET has been designed according to the specific requirements for this application.

In the quest of vision-sensors-on-chip: Pre-processing sensors for data reduction
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández-Berni, V. Brea, J.A. Leñero-Bardallo
Conference - IS&T International Symposium on Electronic Imaging 2017
[abstract]
This paper shows that the implementation of vision systems benefits from the usage of sensing front-end chips with embedded pre-processing capabilities, called CVIS. Such embedded pre-processors reduce the amount of data to be delivered for subsequent processing. This strategy, which is also adopted by natural vision systems, relaxes system-level requirements regarding data storage and communications and enables highly compact and fast vision systems. The paper includes several proof-of-concept CVIS chips with embedded pre-processing and illustrates their potential advantages.

Low-Power CMOS Vision Sensor for Gaussian Pyramid Extraction
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper - IEEE Journal of Solid-State Circuits, vol. 52, no. 2, pp 483-495, 2017
IEEE    DOI: 10.1109/JSSC.2016.2610580    ISSN: 0018-9200    » doi
[abstract]
This paper introduces a CMOS vision sensor chip in a standard 0.18 μm CMOS technology for Gaussian pyramid extraction. The Gaussian pyramid provides computer vision algorithms with scale invariance, which permits having the same response regardless of the distance of the scene to the camera. The chip comprises 176 x 120 photosensors arranged into 88 x 60 processing elements (PEs). The Gaussian pyramid is generated with a double-Euler switched capacitor (SC) network. Every PE comprises four photodiodes, one 8 b single-slope analog-to-digital converter, one correlated double sampling circuit, and four state capacitors with their corresponding switches to implement the double-Euler SC network. Every PE occupies 44 x 44 μm^2. Measurements from the chip are presented to assess the accuracy of the generated Gaussian pyramid for visual tracking applications. Error levels are below 2% full-scale output, thus making the chip feasible for these applications. Also, energy cost is 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions of imager plus microprocessor unit.
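Two of the figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the average power implied by the energy figure is the energy per pixel multiplied by the pixel rate, and that each processing element groups four photodiodes as stated; the function names are hypothetical.

```python
ENERGY_PER_PIXEL_J = 26.5e-9   # 26.5 nJ/px, from the abstract
PIXEL_RATE_PX_S = 2.64e6       # 2.64 Mpx/s, from the abstract

def average_power_w(energy_per_pixel_j, pixel_rate_px_s):
    """Average power implied by an energy-per-pixel figure at a given rate."""
    return energy_per_pixel_j * pixel_rate_px_s

def processing_elements(cols, rows, photodiodes_per_pe=4):
    """Number of PEs when each PE groups a fixed number of photodiodes."""
    return (cols * rows) // photodiodes_per_pe

power_mw = 1e3 * average_power_w(ENERGY_PER_PIXEL_J, PIXEL_RATE_PX_S)
n_pe = processing_elements(176, 120)
print(f"{power_mw:.2f} mW, {n_pe} PEs")
```

Under these assumptions the implied power is about 70 mW, and 176 x 120 photosensors at four per PE give the 88 x 60 array of PEs mentioned in the abstract.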

Special Section Guest Editorial: Advances on Distributed Smart Cameras
J. Fernández-Berni, F. Berry and C. Micheloni
Journal Paper - Journal of Electronic Imaging, vol. 25, no. 4, pp 1-2, 2016
SPIE    DOI: 10.1117/1.JEI.25.4.041001    ISSN: 1017-9909    » doi
[abstract]
This special section was aimed at bringing together the latest contributions to the exciting multidisciplinary field of distributed smart cameras. An open call for papers was issued, with relevant topics ranging from vision chips and dedicated real-time image processing hardware to high-level information processing and smart camera networks. Invitations were also issued for extended versions of selected works from the 2015 edition of the flagship academic event in this field, the International Conference on Distributed Smart Cameras, where the guest editors served as technical program chairs. A total of 20 manuscripts were submitted, out of which 12 papers were finally accepted -60% acceptance rate- after peer review by at least two experts in the field.

Demo: Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - International Conference on Distributed Smart Cameras ICDSC 2016
[abstract]
This demo showcases a High Dynamic Range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102 dB. It opens the door to the realization of smart cameras and vision sensors capable of rendering HDR images free of artifacts without requiring any digital post-processing at all.

Image dynamic range extension by using stacked (unmatched) photodiodes in CMOS
R. Carmona-Galán, J. A. Leñero-Bardallo, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2016
[abstract]
Capturing images containing unevenly illuminated areas within the same frame is very useful in application fields like surveillance, assisted driving, intelligent transportation, or industrial applications with high intra-scene contrast. Without the appropriate dynamic range to allocate these diverse illumination values, obtaining a detailed view of the brightest zones can easily obscure other elements in the scene. In order to increase the image dynamic range within the same frame, different techniques have been developed: using a sensor with a companding scheme, providing the means to avoid saturation, or employing multiple image captures. The problem with multiple captures is that uncorrelation between the different integration times can generate nonexistent edges and distort the interpretation of the scene. In order to realize multiple captures in parallel, we need to be simultaneously sensitive to different illumination ranges. CMOS technology offers a variety of devices to capture light in the visible and near infrared range. If a deep-n-well is available, these structures can be stacked so spatial alignment is obtained by construction (Fig. 1a). The conversion gain of the different photodiodes is defined by their capacitance per unit area (Fig. 1b); therefore each of them will render a different voltage for the same light intensity. This discrepancy in the response can be exploited to extract information from different illumination ranges simultaneously. In this way, light can be sensed in parallel with different conversion gains and the resulting output voltages can then be digitized and combined into a single digital word with a larger number of bits. This mechanism for dynamic range extension does not depend on the difference of exposure times, so artifacts related to unmatched dynamics in the sensor and the scene can be avoided.

Pixel-wise parameter adaptation for single-exposure extension of the image dynamic range
R. Carmona-Galán, J.A. Leñero-Bardallo, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference - International Conference on Distributed Smart Cameras ICDSC 2016
[abstract]
High dynamic range imaging is central in application fields like surveillance, intelligent transportation and advanced driving assistance systems. In some scenarios, methods for dynamic range extension based on multiple captures have shown limitations in apprehending the dynamics of the scene. Artifacts appear that can put at risk the correct segmentation of objects in the image. We have developed several techniques for the on-chip implementation of single-exposure extension of the dynamic range. We work on the upper extreme of the range, i.e. administering the available full-well capacity. Parameters are adapted pixel-wise in order to accommodate a high intra-scene range of illuminations.

Experimental Evidence of Power Efficiency due to Architecture in Cellular Processor Array Chips
R. Carmona-Galán, J. Fernández Berni and A. Rodríguez-Vázquez
Conference - International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2016
[abstract]
Speeding up algorithm execution can be achieved by increasing the number of processing cores working in parallel. Of course, this speedup is limited by the degree to which the algorithm can be parallelized. Equivalently, by lowering the operating frequency of the elementary processors, the algorithm can be realized in the same amount of time but with measurable power savings. An additional result of parallelization is that using a larger number of processors results in a more efficient implementation in terms of GOPS/W. We have found experimental evidence for this in the study of massively parallel array processors, mainly dedicated to image processing. Their distributed architecture reduces the energy overhead dedicated to data handling, thus resulting in a power efficient implementation.
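The limit on speedup mentioned in this abstract is commonly modelled with Amdahl's law; the abstract does not name a model, so using it here is an assumption. The sketch also computes the lower clock frequency at which the parallel version would match the single-core execution time. Function names are hypothetical and no power model is included.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: speedup when a fraction of the work runs on n_cores."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_cores < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_cores >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

def equal_time_frequency(base_frequency_hz, parallel_fraction, n_cores):
    """Clock frequency at which n_cores match the single-core run time."""
    return base_frequency_hz / amdahl_speedup(parallel_fraction, n_cores)

for cores in (1, 4, 16, 64):
    s = amdahl_speedup(0.95, cores)
    f = equal_time_frequency(100e6, 0.95, cores)
    print(f"{cores:3d} cores: speedup {s:5.2f}, equal-time clock {f / 1e6:6.2f} MHz")
```

In this model the achievable power saving then depends on how far the supply voltage can be lowered at the reduced clock, which the sketch leaves open.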

Hardware-Aware Performance Evaluation for the Co-Design of Image Sensors and Vision Algorithms
C. Villegas-Pachón, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference - Int. Conf. on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design SMACD 2016
[abstract]
The top-down approach to system design allows obtaining separate specifications for each subsystem. In the case of vision systems, this means propagating system-level specifications down to particular specifications for e.g. the image sensor, the image processor, etc. This permits adopting different design strategies for each one of them, as long as they meet their own specifications. This approach can lead to over-design, which is not always affordable. Conversely, if higher-level specifications are too tight, they can lead to impossible specifications at the lower levels. This is certainly the case for embedded vision systems in which high-performance needs to be paired with a very restricted power budget. In order to explore alternative architectures, we need tools that allow for simultaneous optimization of different blocks. However, the link between low-level non-idealities and high-level performance is missing. CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. Our approach is to extract relevant data from circuit-level simulation and to build an OpenCV model to be employed in the design of the algorithm. The utility of this approach is illustrated by the evaluation of the effect of column-wise and pixel-wise FPN at the sensor on the performance of Viola-Jones face detection.
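To give an idea of how column-wise and pixel-wise FPN can be injected into test images before running a vision algorithm, here is a minimal sketch. It is not the OpenCV model from the paper: it uses plain Python lists and assumes additive, zero-mean Gaussian offsets, and the function name and parameters are hypothetical.

```python
import random

def add_fpn(image, col_sigma=0.0, pix_sigma=0.0, seed=0, full_scale=255):
    """Add fixed-pattern noise to a grayscale image (list of rows).

    Column-wise FPN: one offset shared by every pixel in a column.
    Pixel-wise FPN: an independent offset per pixel.
    A fixed seed keeps the pattern identical from frame to frame.
    """
    rng = random.Random(seed)
    width = len(image[0])
    col_offsets = [rng.gauss(0.0, col_sigma) for _ in range(width)]
    noisy = []
    for row in image:
        noisy_row = []
        for x, value in enumerate(row):
            v = value + col_offsets[x] + rng.gauss(0.0, pix_sigma)
            noisy_row.append(min(full_scale, max(0, round(v))))
        noisy.append(noisy_row)
    return noisy

flat = [[100] * 4 for _ in range(3)]          # uniform toy frame
striped = add_fpn(flat, col_sigma=5.0, seed=1)  # column-wise FPN only
```

Feeding frames degraded this way to a detector and comparing its output with the clean case is one way to relate a sensor non-ideality to algorithm-level performance.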

Image Sensing Scheme Enabling Fully-Programmable Light Adaptation and Tone Mapping with a Single Exposure
J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - IEEE Sensors Journal, vol. 16, no. 13, pp. 5121-5122, 2016
IEEE    DOI: 10.1109/JSEN.2016.2559158    ISSN: 1530-437X    » doi
[abstract]
This letter presents new insights into a high dynamic range (HDR) technique recently reported. We demonstrate that two intertwined photodiodes per pixel can perform tone mapping under unconstrained illumination conditions with a single exposure. Experimental results attained from a prototype chip confirm the proposed theoretical framework. It opens the door to the realization of imagers providing HDR images free of artifacts without requiring any digital post-processing at all.

Single-Exposure HDR Technique Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 5, pp. 488-492, 2016
IEEE    DOI: 10.1109/TCSII.2015.2505263    ISSN: 1549-7747    » doi
[abstract]
This brief describes a high-dynamic-range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. An online analysis of the image histogram provides the sensor with the necessary feedback to dynamically accommodate changing illumination conditions. This adaptation is accomplished by properly weighing the influence of local and global illuminations on each pixel response. The main advantages of this technique with respect to similar approaches previously reported are as follows: 1) standard active-pixel-sensor circuitry can be used to render the pixel values and 2) the resulting compressed image representation is ready either for readout or for early vision processing at the very focal plane without requiring any additional peripheral circuit block. Experimental results from a prototype smart image sensor achieving a dynamic range of 102 dB are presented.
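To put the 102 dB figure in perspective, the sketch below converts between decibels and a linear ratio. It assumes the 20·log10 convention commonly used for image sensor dynamic range; the abstract does not state how the figure was computed, and the function names are hypothetical.

```python
import math

def dynamic_range_db(max_signal, min_signal):
    """Dynamic range in dB under the 20*log10 convention."""
    return 20.0 * math.log10(max_signal / min_signal)

def illumination_ratio(dr_db):
    """Linear max/min ratio corresponding to a dynamic range in dB."""
    return 10.0 ** (dr_db / 20.0)

ratio = illumination_ratio(102.0)
print(f"102 dB corresponds to a ratio of about {ratio:,.0f}:1")
```

Under that convention, 102 dB corresponds to a ratio of roughly 126,000:1 between the largest and smallest resolvable illumination.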

Live Demonstration: Single-Exposure HDR Image Acquisition Based on Tunable Balance Between Local and Global Adaptation
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - IEEE International Symposium on Circuits and Systems, ISCAS 2016
[abstract]
This live demonstration showcases a high dynamic range technique that compresses wide ranges of illuminations into the available signal range with a single exposure. In order to accomplish such compression, concurrent sensing-processing takes place at the focal plane, weighing the influence of local and global illumination on each pixel response during the image capture. This process is driven by an on-line analysis of the image histogram that also enables the dynamic accommodation of changing illumination conditions. The proposed technique has been implemented on a prototype smart image sensor achieving a dynamic range of 102 dB.

Focal-plane scale space generation with a 6T pixel architecture
F. Oliveira, J.G. Gomes, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference - IS&T International Symposium on Electronic Imaging 2016
[abstract]
To combine a high fill factor with the focal-plane implementation of instrumental image processing steps, we propose a simple modification in a standard pixel architecture in order to allow for charge redistribution among neighboring pixels. As a result, averaging operations may be performed at the focal plane, and image smoothing based on Gaussian filtering may thus be implemented. By averaging neighboring pixel values, it is also possible to generate intermediate data structures that are required for the computation of Haar-like features. To show that the proposed hardware is suitable for computer vision applications, we present a system-level comparison in which the scale-invariant feature transform (SIFT) algorithm is executed twice: first, on data obtained with a classical Gaussian filtering approach, and then on data generated from the proposed approach. Preliminary schematic and extracted layout pixel simulations are also presented.

High-Level Performance Evaluation of Object Detection Based on Massively Parallel Focal-Plane Acceleration Requiring Minimum Pixel Area Overhead
E. Parra-Barrero, J. Fernández-Berni, F.D.V.R. Oliveira, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - International Conference on Computer Vision Theory and Applications VISAPP 2016
[abstract]
Smart CMOS image sensors can leverage the inherent data-level parallelism and regular computational flow of early vision by incorporating elementary processors at pixel level. However, it comes at the cost of extra area having a strong impact on the sensor sensitivity, resolution and image quality. In this scenario, the fundamental challenge is to devise new strategies capable of boosting the performance of the targeted vision pipeline while minimally affecting the sensing function itself. Such strategies must also feature enough flexibility to accommodate particular application requirements. From these high-level specifications, we propose a focal-plane processing architecture tailored to speed up object detection via the Viola-Jones algorithm. This architecture is supported by only two extra transistors per pixel and simple peripheral digital circuitry that jointly make up a massively parallel reconfigurable processing lattice. A performance evaluation of the proposed scheme in terms of accuracy and acceleration for face detection is reported.

Image feature extraction acceleration
J. Fernández-Berni, M. Suárez-Cambre, R. Carmona-Galán, V. Brea, R. del Río, D. Cabello and Á. Rodríguez-Vázquez
Book Chapter - Image Feature Detectors and Descriptors, SCI, vol. 630, pp 109-132, 2016
SPRINGER    DOI: 10.1007/978-3-319-28854-3_5    ISBN: 978-3-319-28852-9    » doi
[abstract]
Image feature extraction is instrumental for most of the best-performing algorithms in computer vision. However, it is also expensive in terms of computational and memory resources for embedded systems due to the need to deal with individual pixels at the earliest processing levels. In this regard, conventional system architectures do not take advantage of potential exploitation of parallelism and distributed memory from the very beginning of the processing chain. Raw pixel values provided by the front-end image sensor are squeezed into a high-speed interface with the rest of system components. Only then, after deserializing this massive dataflow, parallelism, if any, is exploited. This chapter introduces a rather different approach from an architectural point of view. We present two Application-Specific Integrated Circuits (ASICs) where the 2-D array of photo-sensitive devices featured by regular imagers is combined with distributed memory supporting concurrent processing. Custom circuitry is added per pixel in order to accelerate image feature extraction right at the focal plane. Specifically, the proposed sensing-processing chips aim at the acceleration of two flagship algorithms within the computer vision community: the Viola-Jones face detection algorithm and the Scale Invariant Feature Transform (SIFT). Experimental results prove the feasibility and benefits of this architectural solution.

On the Design of a Sparsifying Dictionary for Compressive Image Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - IEEE International Conference on Electronics Circuits and Systems ICECS 2015
[abstract]
Although there are some works reported on feature extraction from compressed samples, none of them consider the implementation of the feature extractor as a part of the sensor itself. Our approach is to introduce a sparsifying dictionary, feasibly implementable at the focal plane, which describes the image in terms of features. This allows a standard reconstruction algorithm to directly recover the interesting image features, discarding the irrelevant information. In order to validate the approach, we have integrated a Harris-Stephens corner detector into the compressive sampling process. We have evaluated the accuracy of the reconstructed corners compared to applying the detector to a reconstructed image.

CMOS Image Sensor Architecture for Focal Plane Early Vision Processing
F.D.V.R. de Oliveira, J.G.R.C. Gomes, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - International Conference on Distributed Smart Cameras ICDSC 2015
[abstract]
This paper presents a pixel architecture that aims at validating the idea that with a small change in the pixel it is possible to perform important image processing computations at the focal-plane without significantly affecting the fill factor. An overview of two algorithms that may benefit from this new pixel architecture is given, namely the SIFT for object recognition and the Viola-Jones for face detection. A brief discussion of the limitations of the computations performed inside the pixel matrix and the future work is also presented.

Compressive Feature Extraction
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2015
[abstract]
Compressive sensing (CS) provides an alternative to Nyquist-Shannon sampling when the signal to acquire is known to be sparse or compressible. A sparse signal has a small number of nonzero components compared to its total length. This property can either exist in the sampling domain of the signal or with respect to another basis. Representing a signal in a transform basis involves the choice of a dictionary, a set of elementary signals, used to decompose the signal. When performing analysis of complex data, one of the major problems stems from the number of variables involved. Feature extraction is used to reduce the amount of resources required to describe a large set of data. A given feature is often represented by a set of parameters to be evaluated. This set has relevant values only in correspondence of said features. We can consider sets derived this way as being sparse. This sparked the idea to merge a feature extraction algorithm with the compressive sensing theory. To do so, we try to adapt one of the most practical CS reconstruction algorithms, the Nesterov algorithm applied to CS (NESTA), to extract the features of one of the simplest corner detection algorithms, the Harris and Stephens algorithm. Our aim is to compare the performance of the combined Harris-NESTA algorithm with the application of a Harris algorithm on a NESTA reconstructed image, and to do so we devised a test that takes into account four different parameters.

Assessment of circuit non-idealities' effect on algorithm performance via OpenCV modeling
C. Villegas-Pachón, R. Carmona-Galán and J. Fernández-Berni
Conference - Workshop on the Architecture of Smart Cameras WASC 2015
[abstract]
CAD tools for the design and verification of analog and mixed-signal integrated circuits are not well suited for the simulation of higher-level functionalities. The evaluation of the effect of circuit non-idealities on the algorithm performance is precluded for the image sensor design team. For the conventional model in computer vision, in which image capture and processing are completely separated tasks, this does not represent a problem. Chip designers will work for a particular set of specifications, i.e. spatial and temporal resolution, power consumption, etc. At the other end, computer scientists will take care of the algorithm once they receive their pictures fitting to the prescribed specifications. The result of this mindset is an architecture that is theoretically universal, although it may not be capable of solving every problem when timing and power requirements are taken into consideration. In those application fields in which smart camera chips can help overcome these limitations, a different approach needs to be taken in order to come up with optimal solutions. Let us point out here that, in smart camera architectures, computational efficiency is generally provided by appropriate partition of algorithm tasks, parallelization of heavy loads and the use of distributed and close-to-sensor resources. Sometimes these actions will require the design of specific circuit blocks and ad-hoc image sensing strategies. All of this needs to be worked out at transistor level, but at the same time, their effect on the overall performance of the algorithm needs to be quickly and accurately evaluated in order to guide the design flow. Our proposal is to make use of the flexibility and versatility of an environment like OpenCV to incorporate hardware non-idealities into the evaluation of the algorithm performance.
One of the major attractions of this approach is that computer vision experts will be able to consider lower-level deviations when designing and fine-tuning their vision algorithms without having to develop any expertise in chip design and IC CAD tools.

Live demonstration: Gaussian pyramid extraction with a CMOS vision sensor
M. Suarez, V.M. Brea, J. Fernandez-Berni, R. Carmona-Galan, D. Cabello and A. Rodriguez-Vazquez
Conference - IEEE International Symposium on Circuits and Systems, ISCAS 2015
[abstract]
This live demonstration is related to ISCAS track 'Imagers and Vision Processing'. It showcases the Gaussian pyramid with a CMOS vision sensor with a 176 × 120 pixel array in standard 0.18 μm CMOS technology. The sensing elements are 3T-APS with in-pixel ADC and CDS. The Gaussian pyramid is extracted concurrently with a double-Euler switched-capacitor network on the same substrate, giving RMSE errors below 1.2% of FSO. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ/px at 2.64 Mpx/s.

Automatic DR and spatial sampling rate adaptation for secure and privacy-aware ROI tracking based on focal-plane image processing
R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - International Image Sensor Workshop IISW 2015
[abstract]
Embedded camera systems for the consumer mobile and wearable application market need to operate in a tight power budget. They need to cope with a vast range of illumination conditions, and at the same time, they need to incorporate enough intelligence to implement security and privacy-protection directives. The incorporation of image signal processing at the focal plane can help reduce the resources needed to implement tasks like DR adaptation and privacy-aware ROI tracking. In this paper we present a vision sensor that is able to perform single-exposure HDR imaging and ROI obfuscation on-chip, with the help of a reduced set of focal-plane processing elements.

Real-time single-exposure ROI-driven HDR adaptation based on focal-plane reconfiguration
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and Á. Rodríguez-Vázquez
Conference - SPIE Real-Time Image and Video Processing 2015
[abstract]
This paper describes a prototype smart imager capable of adjusting the photo-integration time of multiple regions of interest concurrently, automatically and asynchronously with a single exposure period. The operation is supported by two intertwined photo-diodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal-plane into independent regions within which automatic concurrent adjustment of the integration time takes place. At pixel level, one of the photo-diodes senses the pixel value itself whereas the other, in collaboration with its counterparts in a particular ROI, senses the mean illumination of that ROI. Additional circuitry interconnecting both photo-diodes enables the asynchronous adjustment of the integration time for each ROI according to this sensed illumination. The sensor can be reconfigured on-the-fly according to the requirements of a vision algorithm.

Bottom-up performance analysis of focal-plane mixed-signal hardware for Viola-Jones early vision tasks
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper - International Journal of Circuit Theory and Applications, vol. 43, no. 8, pp 1063-1079, 2015
JOHN WILEY & SONS    DOI: 10.1002/cta.1996    ISSN: 0098-9886    » doi
[abstract]
Focal-plane mixed-signal arrays have traditionally been designed according to the general claim that moderate accuracy in processing is affordable. The performance of their circuitry has been analyzed in these terms without a comprehensive study of the ultimate consequences of such moderate accuracy. In this paper, for the first time to the best of our knowledge, we do carry out this study. We move expectable performance of mixed-signal image processing hardware directly into the vision algorithm making use of it. This makes it possible to close a wider design loop, enabling a more aggressive design of this kind of hardware provided that the algorithm, at the highest level -semantic interpretation of the scene-, can afford it. Thus, we present a thorough analysis of the non-idealities associated with the implementation of a QVGA array tailored for the distinctive characteristics of the Viola-Jones processing framework. The resulting deviation models are then introduced in the processing flow of this framework provided by the OpenCV library. We have found, contrary to what could be expected, that these deviations do not necessarily degrade the performance of the Viola-Jones algorithm. They could even be beneficial for certain high-level specifications. Additionally, we demonstrate the architectural advantages of our approach: exploitation of focal-plane distributed memory and ultra-low-power operation.

Using 3-D Technologies for Form Factor Improvement of Low-Power Vision Sensors
J. Fernández-Berni, S. Vargas, J.A. Leñero and B. Pérez-Verdú
Conference - IEEE Latin American Symposium on Circuits and Systems LASCAS 2014
[abstract]
While conventional CMOS active pixel sensors embed only the circuitry required for photo-detection, pixel addressing and voltage buffering, smart pixels also incorporate circuitry for data processing, data storage and control of data interchange. This additional circuitry enables data processing to be realized concurrently with the acquisition of images, which is instrumental in reducing the amount of data needed to carry the information contained in images. This way, more efficient vision systems can be built at the cost of a larger pixel pitch. Vertically-integrated 3D technologies make it possible to keep the advantages of smart pixels while improving their form factor.

Live Demo: Real-time Focal-plane Face Obfuscation through Programmable Pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, R. Kleihorst, W. Philips and Á. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2014
[abstract]
Privacy concerns are hindering the introduction of smart camera networks in application scenarios like retailing analytics, factories or elderly care. Indeed, there is usually no need of dealing with sensitive data when it comes to carrying out a meaningful visual analysis in these scenarios. Time spent by customers in front of a showcase, trajectories of workers around a manufacturing site or fall detection in a nursing home are three examples where video analytics can be performed without compromising privacy. But still the idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be reconfigured to implement programmable pixelation of image regions at the focal plane. According to the literature, pixelation provides the best performance in terms of balance between privacy protection and intelligibility of the surveyed scene.

High dynamic range adaptation for ROI tracking based on reconfigurable concurrent dual-sensing
J. Fernández-Berni, R. Carmona-Galán, R. del Río and A. Rodríguez-Vázquez
Journal Paper - Electronics Letters, vol. 50, no. 24, pp 1832-1834, 2014
IET    DOI: 10.1049/el.2014.3136    ISSN: 0013-5194    » doi
[abstract]
A single-exposure technique to extend the dynamic range of vision sensors is presented. It is particularly suitable for vision algorithms requiring region-of-interest (ROI) tracking under varying illumination conditions. The operation is supported by two intertwined photodiodes at pixel level and two digital registers at the periphery of the pixel matrix. These registers divide the focal plane into independent regions within which automatic concurrent adjustment of the integration time takes place for each frame. At pixel level, one of the photodiodes senses the pixel value itself, whereas the other, in collaboration with its counterparts in every prescribed ROI, senses the mean illumination of that specific ROI. An additional circuitry interconnecting both photodiodes asynchronously determines the integration period for each ROI according to its mean illumination. The experimental results for a quarter video graphics array prototype CMOS vision sensor are reported.

Demo: A prototype vision sensor for real-time focal-plane obfuscation through tunable pixelation
J. Fernández-Berni, R. Carmona-Galán, R. del Río, R. Kleihorst, W. Philips and A. Rodríguez-Vázquez
Conference - IEEE/ACM Int. Conference on Distributed Smart Cameras ICDSC 2014
[abstract]
Privacy concerns are hindering the introduction of smart camera networks in prospective application scenarios like retail analytics, factory monitoring or elderly care. The idea of networked cameras pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. New strategies must be developed in order to ensure privacy from the very point where sensitive data are generated: the sensors. Protection measures embedded on-chip at the front-end sensor of each network node significantly reduce the number of trusted system components as well as the impact of potential software flaws. In this demonstration, we present a full-custom QVGA vision sensor that can be reconfigured to implement programmable pixelation of image regions at the focal plane. In particular, we show on-the-fly focal-plane face obfuscation supported by the Viola-Jones frontal face detector provided by OpenCV.

A 26.5 nJ/px 2.64Mpx/s CMOS vision sensor for gaussian pyramid extraction
M. Suárez-Cambre, V. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference - European Solid-State Circuits Conference ESSCIRC 2014
[abstract]
This paper introduces a CMOS vision sensor to extract the Gaussian pyramid with an energy cost of 26.5 nJ/px at 2.64 Mpx/s, thus outperforming conventional solutions employing an imager and a separate digital processor. The chip, manufactured in a 0.18 μm CMOS technology, consists of an arrangement of 88×60 processing elements (PEs) which captures images of 176×120 resolution and performs concurrent parallel processing right at pixel level. The Gaussian pyramid is generated by using a switched-capacitor network. Every PE includes four photodiodes, four MiM capacitors, one 8-bit single-slope ADC and one CDS circuit, occupying 44×44 μm². Suitability of the chip is assessed by using metrics pertaining to visual tracking.

Focal-plane sensing-processing: A power-efficient approach for the implementation of privacy-aware networked visual sensors
J. Fernandez-Berni, R. Carmona-Galan, R. del Rio, R. Kleihorst, W. Philips and A. Rodriguez-Vazquez
Journal Paper - Sensors, vol. 14, no. 8, pp. 15203-15226, 2014
MDPI AG    DOI: 10.3390/s140815203    ISSN: 1424-8220    » doi
[abstract]
The capture, processing and distribution of visual information is one of the major challenges for the paradigm of the Internet of Things. Privacy emerges as a fundamental barrier to overcome. The idea of networked image sensors pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. Power consumption also constitutes a crucial aspect. Images contain a massive amount of data to be processed under strict timing requirements, demanding high-performance vision systems. In this paper, we describe a hardware-based strategy to concurrently address these two key issues. By conveying processing capabilities to the focal plane in addition to sensing, we can implement privacy protection measures just at the point where sensitive data are generated. Furthermore, such measures can be tailored for efficiently reducing the computational load of subsequent processing stages. As a proof of concept, a full-custom QVGA vision sensor chip is presented. It incorporates a mixed-signal focal-plane sensing-processing array providing programmable pixelation of multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise dynamic range adaptation, integral image computation and multi-resolution filtering. The proposed circuitry is also suitable to build a granular space, becoming the raw material for subsequent feature extraction and recognition of categorized objects.

Fire detection with a frame-less vision sensor working in the NIR band
J.A. Leñero-Bardallo, J. Fernández-Berni, R. Carmona-Galán, P. Häfliger and Á. Rodríguez-Vázquez
Conference - International Conference on Forest Fire Research ICFFR 2014
DOI: 10.14195/978-989-26-0884-6_151    » doi
[abstract]
This paper draws the attention of the community to the capabilities of an emerging generation of bio-inspired vision sensors to be used in fire detection systems. Their principle of operation will be described. Moreover, experimental results showing the performance of an event-based vision sensor will be provided. The sensor was intended to monitor flame activity without using optical filters. In this article, we will also extend this preliminary work and explore how its outputs can be processed to detect fire in the environment.

Review of ADCs for imaging
J.A. Leñero-Bardallo, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - IS&T International Symposium on Electronic Imaging 2014
[abstract]
The aim of this article is to guide image sensor designers in optimizing the analog-to-digital conversion of pixel outputs. The most common ADC topologies for image sensors are presented and discussed. The ADC-specific requirements for these sensors are analyzed and quantified. Finally, we present relevant recent contributions of specific ADCs for image sensors and we compare them using a novel FOM.

A QVGA Vision Sensor with Multi-functional Pixels for Focal-Plane Programmable Obfuscation
J. Fernández-Berni, R. Carmona-Galán, R. del Río and Á. Rodríguez-Vázquez
Conference - ACM/IEEE International Conference on Distributed Smart Cameras ICDSC 2014
[abstract]
Privacy awareness constitutes a critical aspect for smart camera networks. An ideal flawless protection of sensitive information would boost their application scenarios. However, it is still far from being achieved. Numerous challenges arise at different levels, from hardware security to subjective perception. Generally speaking, it can be stated that the closer to the image sensing device the protection measures take place, the higher the privacy and security attainable. Likewise, the integration of heterogeneous camera components becomes simpler since most of them will not require to consider privacy issues. The ultimate objective would be to incorporate complete protection directly into a smart image sensor in such a way that no sensitive data would be delivered off-chip while still permitting the targeted video analytics. This paper presents a 320×240-px prototype vision sensor embedding processing capabilities useful for accomplishing this objective. It is based on reconfigurable focal-plane sensing-processing that can provide programmable obfuscation. Pixelation of tunable granularity can be applied to multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise high dynamic range, integral image computation and Gaussian filtering. Its power consumption ranges from 42.6 mW for high dynamic range operation to 55.2 mW for integral image computation at 30 fps. It has been fabricated in a standard 0.18 μm CMOS process.
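As a software illustration of the pixelation of tunable granularity mentioned in this abstract, the following Python sketch replaces every block inside a rectangular region by its mean value. It assumes that pixelation means block averaging; the function name, the parameters and the list-of-lists image format are hypothetical and do not describe the mixed-signal circuitry of the chip.

```python
def pixelate_region(img, top, left, height, width, block):
    """Return a copy of img whose given region is replaced by block averages.

    img is a list of rows of integer grey levels; block is the side of the
    square blocks (the tunable granularity). Blocks are clipped at the
    region border.
    """
    out = [row[:] for row in img]  # leave the input image untouched
    for r0 in range(top, top + height, block):
        for c0 in range(left, left + width, block):
            r1 = min(r0 + block, top + height)
            c1 = min(c0 + block, left + width)
            values = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(values) // len(values)  # integer mean of the block
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = mean
    return out
```

Calling the function once per region of interest, each with its own block size, mimics in sequence what the abstract describes as being done for multiple regions in parallel.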

Towards an ultra-low-power low-cost wireless visual sensor node for fine-grain detection of forest fires
J. Fernández-Berni, R. Carmona-Galán, J.A. Leñero-Bardallo, R. Kleihorst and Á. Rodríguez-Vázquez
Conference - International Conference on Forest Fire Research ICFFR 2014
[abstract]
Advances in electronics, sensor technologies, embedded hardware and software are boosting the application scenarios of wireless sensor networks. Specifically, the incorporation of visual capabilities into the nodes represents a milestone, and a challenge, in terms of the amount of information sensed and processed by these networks. The scarcity of resources -power, processing and memory- imposes strong restrictions on the vision hardware and algorithms suitable for implementation at the nodes. Both hardware and algorithms must be adapted to the particular characteristics of the targeted application. This makes it possible to achieve the required performance at lower energy and computational cost. We have followed this approach when addressing the detection of forest fires by means of wireless visual sensor networks. From the development of a smoke detection algorithm down to the design of a low-power smart imager, every step along the way has been influenced by the objective of reducing power consumption and computational resources as much as possible. Of course, reliability and robustness against false alarms have also been crucial requirements demanded by this specific application. All in all, we summarize in this paper our experience in this topic. In addition to a prototype vision system based on a full-custom smart imager, we also report results from a vision system based on ultra-low-power low-cost commercial imagers with a resolution of 30×30 pixels. Even for this small number of pixels, we have been able to detect smoke at around 100 meters away without false alarms. For such tiny images, smoke is simply a moving grey stain within a blurry scene, but it features a particular spatio-temporal dynamics. As described in the manuscript, the key point to succeed with such a low resolution thus falls on the adequate encoding of that dynamics at algorithm level.

Gaussian pyramid extraction with a CMOS vision sensor
M. Suárez, V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello and A. Rodríguez-Vázquez
Conference - International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2014
[abstract]
This paper addresses a CMOS vision sensor with 176×120 pixels in standard 0.18 μm CMOS technology that computes the Gaussian pyramid. The Gaussian pyramid is extracted with a double-Euler switched-capacitor network, giving RMSE errors below 1.2% of full-scale value. The chip provides a Gaussian pyramid of 3 octaves with 6 scales each with an energy cost of 26.5 nJ/px at 2.64 Mpx/s.

Comparative Analysis of Compressive Sensing Strategies for Smart Compressive Image Sensors
M. Trevisi, R. Carmona-Galán, J. Fernández-Berni and Á. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2014
[abstract]
Compressive sensing (CS) first appeared eight years ago as a new kind of signal processing theory. Since then, only a few steps have been taken to turn this theory into feasible practice. We have identified two important gaps that stand behind the lag on practical compressive image sensors. The first one is that, technologically speaking, there is not yet any imager based on CS that is able to overrule, at least in resolution/decryption-speed ratio, the capabilities of current standard imagers. The second one is that none of the published reports on compressive image sensors mention how the compressive sensing strategy is passed from the sensor, which is delivering image samples, to the image reconstruction algorithm at the reception side. In order to capture compressed image samples, it is necessary that the imagers implement a compressive strategy, which has the form of a matrix that convolves the original signal. There are two sets of methods that are primarily implemented nowadays to build a compressive strategy, the first one is to pick each element from a random distribution, preferably Gaussian, and the second is to arrange in random order the rows of an incoherent orthobasis matrix, preferably Fourier matrix. The selection of the compressive sensing strategy has an incidence on the physical implementation, for instance it is easier to implement a binary mask on each pixel than to multiply its value by a real number. In order to analyze the effect of this selection in the reconstruction from the samples delivered by the compressive image sensor, we have simulated the process with a MATLAB test bench and compared the reconstruction times and the RMSE vs. the number of samples delivered of three different sensing strategies using a 64×64 image of Lena as test image and a Total Variation NESTA algorithm as reconstruction algorithm.
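The difference between a real-valued sensing matrix and a per-pixel binary mask, which this abstract relates to ease of physical implementation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 1/sqrt(m) scaling of the Gaussian entries and the toy sizes are hypothetical, the orthobasis-based strategy is not shown, and none of this reproduces the MATLAB test bench used in the paper.

```python
import random


def gaussian_matrix(m, n, rng):
    """m x n sensing matrix with zero-mean Gaussian entries (assumed 1/sqrt(m) scale)."""
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def binary_mask_matrix(m, n, rng):
    """m x n sensing matrix of 0/1 entries: each sample adds a random subset of pixels."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compressed samples y = phi * x for a flattened image x of length n."""
    return [sum(a * b for a, b in zip(row, x)) for row in phi]


if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed so the sketch is repeatable
    x = [rng.random() for _ in range(64)]   # hypothetical flattened 8x8 image
    y = measure(binary_mask_matrix(16, 64, rng), x)
    print(len(y), "samples taken from", len(x), "pixels")
```

With the binary mask, every sample is a plain sum of selected pixel values, whereas the Gaussian matrix requires one real-valued multiplication per pixel and per sample.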

Parallel Processing Architectures and Power Efficiency in Smart Camera Chips
R. Carmona-Galán, J. Fernández-Berni, M. Trevisi and A. Rodríguez-Vázquez
Conference - Workshop on the Architecture of Smart Cameras WASC 2014
[abstract]
Because of the massive amount of data, image and video processing represents a huge computational demand. Providing the necessary resources in embedded systems is not an easy task. Providing them on-chip needs a serious reconsideration of the processing architecture. In order to speed up processing, the number of processors/cores operating in parallel can be increased. This is an intuitive result, but there are two drawbacks. First, this speedup is limited by the degree to which the algorithm can be parallelized, which is known as Amdahl's law. Second, the more hardware operating at the same time, the higher the power consumption. However, image processing and, in general, the processing of visual information that keeps a retinotopic topology, affords an inherent parallelism that can be exploited to a great extent. Furthermore, there is an additional and less intuitive result of parallelization, which is that using a larger number of processors renders a more efficient implementation in terms of GOPS/W. Evidence of this can be found in massively parallel array processors. Their distributed architecture is adapted to the nature of the visual stimulus to the point that the amount of energy dedicated to data transmission and memory operations is largely reduced. The result is a power efficient implementation, perfectly suited for autonomous embedded vision systems working on a restricted power budget. This trend is observed in multi- and many-core processor chips, GPUs and different types of single-instruction multiple-data (SIMD) arrays containing from tens to hundreds of processing elements (PE). In the case of analog array processors, a similar relation is observed despite the fact that the disparity between design techniques, signal representation and the computation of the effective number of OPS advises against any sort of comparison.
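The limit that this abstract attributes to Amdahl's law can be made concrete with its standard form, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the workload that can be parallelized and n is the number of processors. The Python sketch below is illustrative only; the function name and the sample values of p and n are assumptions, not figures taken from the paper.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup when a fraction p of the work is spread over n processors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The serial part (1 - p) is unaffected; only the parallel part shrinks with n.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # Hypothetical workload in which 95% of the processing parallelizes.
    for n in (1, 16, 256, 4096):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

However large n becomes, the speedup stays below 1 / (1 - p), which is why the inherent parallelism of retinotopic processing highlighted in the abstract matters.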

Form Factor Improvement of Smart-Pixels for Vision Sensors through 3-D Vertically-Integrated Technologies
A. Rodríguez-Vázquez, R. Carmona-Galán, J. Fernández Berni, S. Vargas, J.A. Leñero, M. Suárez, V. Brea and B. Pérez-Verdú
Conference - IEEE Latin American Symposium on Circuits and Systems LASCAS 2014
[abstract]
While conventional CMOS active pixel sensors embed only the circuitry required for photo-detection, pixel addressing and voltage buffering, smart pixels also incorporate circuitry for data processing, data storage and control of data interchange. This additional circuitry enables data processing to be realized concurrently with the acquisition of images, which is instrumental in reducing the amount of data needed to carry the information contained in images. This way, more efficient vision systems can be built at the cost of a larger pixel pitch. Vertically-integrated 3D technologies make it possible to keep the advantages of smart pixels while improving their form factor.

Smart imaging for power-efficient extraction of Viola-Jones local descriptors
J. Fernández-Berni, R. Carmona-Galán, R. del Río, J.A. Leñero-Bardallo, M. Suárez-Cambre and A. Rodríguez-Vázquez
Conference - IS&T International Symposium on Electronic Imaging 2014
[abstract]
In computer vision, local descriptors make it possible to summarize relevant visual cues through feature vectors. These vectors constitute inputs for trained classifiers which in turn enable different high-level vision tasks. While local descriptors certainly alleviate the computation load of subsequent processing stages by preventing them from handling raw images, they still have to deal with individual pixels. Feature vector extraction can thus become a major limitation for conventional embedded vision hardware. In this paper, we present a power-efficient sensing-processing array conceived to provide the computation of integral images at different scales. These images are intermediate representations that speed up feature extraction. In particular, the mixed-signal array operation is tailored for extraction of Haar-like features. These features feed the cascade of classifiers at the core of the Viola-Jones framework. The processing lattice has been designed for the standard UMC 0.18 μm 1P6M CMOS process. In addition to integral image computation, the array can be reprogrammed to deliver other early vision tasks: concurrent rectangular area sum, block-wise HDR imaging, Gaussian pyramids and image pre-warping for subsequent reduced kernel filtering.
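
To make the role of integral images concrete, here is a minimal software sketch of the standard definition: once the integral image is available, the sum over any rectangle costs four look-ups, and a two-rectangle Haar-like feature is a difference of two such sums. This is ordinary digital code written for illustration; it does not model the mixed-signal array described in the abstract, and all function names are assumptions.

```python
def integral_image(img):
    """Return the integral image of a 2-D list, padded with a zero row/column.

    ii[r][c] holds the sum of img[0..r-1][0..c-1].
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: right half minus left half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    ii = integral_image(image)
    print(rect_sum(ii, 1, 1, 2, 2))
    print(haar_two_rect_horizontal(ii, 0, 0, 4, 4))
```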

Special Issue on Advances in sensing and communication circuits (ICECS 2012)
A. Rodríguez-Vázquez, J. Fernández-Berni and J.M. de la Rosa
Journal Paper - Analog Integrated Circuits and Signal Processing, vol. 77, no. 3, pp 315-317, 2013
SPRINGER    DOI: 10.1007/s10470-013-0218-4    ISSN: 0925-1030    » doi
[abstract]
Abstract not available

A hierarchical vision processing architecture oriented to 3D integration of smart camera chips
R. Carmona-Galán, Á. Zarándy, C. Rekeczky, P. Földesy, A. Rodríguez-Pérez, C. Domínguez-Matas, J. Fernández-Berni, G. Liñán-Cembrano, B. Pérez-Verdú, Z. Kárász, M. Suárez-Cambre, V. Brea-Sánchez, T. Roska, Á. Rodríguez-Vázquez
Journal Paper - Journal of Systems Architecture, vol. 59, no. 10 part A, pp 908-919, 2013
ELSEVIER    DOI: 10.1016/j.sysarc.2013.03.002    ISSN: 1383-7621    » doi
[abstract]
This paper introduces a vision processing architecture that is directly mappable on a 3D chip integration technology. Due to the aggregated nature of the information contained in the visual stimulus, adapted architectures are more efficient than conventional processing schemes. Given the relatively minor importance of the value of an isolated pixel, converting every one of them to digital prior to any processing is inefficient. Instead of this, our system relies on focal-plane image filtering and key point detection for feature extraction. The originally large amount of data representing the image is now reduced to a smaller number of abstracted entities, simplifying the operation of the subsequent digital processor. There are certain limitations to the implementation of such hierarchical scheme. The incorporation of processing elements close to the photo-sensing devices in a planar technology has a negative influence in the fill factor, pixel pitch and image size. It therefore affects the sensitivity and spatial resolution of the image sensor. A fundamental tradeoff needs to be solved. The larger the amount of processing conveyed to the sensor plane, the larger the pixel pitch. On the contrary, using a smaller pixel pitch sends more processing circuitry to the periphery of the sensor and tightens the data bottleneck between the sensor plane and the memory plane. 3D integration technologies with a high density of through-silicon-vias can help overcome these limitations. Vertical integration of the sensor plane and the processing and memory planes with a fully parallel connection eliminates data bottlenecks without compromising fill factor and pixel pitch. A case study is presented: a smart vision chip designed on a 3D integration technology provided by MIT Lincoln Labs, whose base process is 0.15 μm FD-SOI. Simulation results advance performance improvements with respect to the state-of-the-art in smart vision chips.

A 176x120 Pixel CMOS Vision Chip for Gaussian Filtering with Massivelly Parallel CDS and A/D-Conversion
M. Suárez, V.M. Brea, D. Cabello, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - European Conference on Circuit Theory and Design ECCTD 2013
[abstract]
This paper presents a proof-of-concept chip for Gaussian pyramid generation for image feature detectors. Gaussian filtering and image resizing are performed with a switched-capacitor (SC) network. The chip is conceived as the mapping of a CMOS-3D architecture for feature detectors onto a conventional technology, with some functionality removed, and the corresponding area overhead with respect to that of a CMOS-3D architecture, but preserving massively parallel Correlated Double Sampling (CDS) and A/D conversion. The chip has been fabricated on a die of 5×5 mm² with 0.18 μm CMOS technology, achieving an array of 176×120 sensing elements (pixels). The pixels are arranged in Processing Elements (PEs). Every PE comprises four photodiodes, four SC nodes, one CDS circuit, and local circuitry for one ADC. Every PE occupies an area of 44×44 μm². The chip senses an image and computes the Gaussian pyramid with an average power consumption lower than 75 nW/pixel at 30 frames/s.

An Ultra-Low-Power Voltage-Mode Asynchronous WTA-LTA Circuit
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - IEEE International Symposium on Circuits and Systems ISCAS 2013
[abstract]
This paper presents an asynchronous mixed-signal WTA-LTA circuit conceived to carry out local minimum/maximum indexing in massively parallel image processing arrays. The hardware is focused on energy-efficient operation. We describe a realization for the standard CMOS base process of a commercial 3-D TSV stack featuring a power consumption of only 20 pW per elementary cell at 30 fps. The proposed block is also capable of resolving small voltage differences without requiring any external reference. This leads to a hit percentage greater than 90% even when taking into account global process variations and mismatch conditions.

Ultralow-power processing array for image enhancement and edge detection
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper - IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 11, pp 751-755, 2012
IEEE    DOI: 10.1109/TCSII.2012.2228394    ISSN: 1549-7747    » doi
[abstract]
This paper presents a massively parallel processing array designed for the 0.13-μm 1.5-V standard CMOS base process of a commercial 3-D through-silicon via stack. The array, which will constitute one of the fundamental blocks of a smart CMOS imager currently under design, implements isotropic Gaussian filtering by means of a MOS-based RC network. Alternatively, this filtering can be turned into anisotropic by a very simple voltage comparator between neighboring nodes whose output controls the gate of the elementary MOS resistor. Anisotropic diffusion enables image enhancement by removing noise and small local variations while preserving edges. A binary edge image can also be attained by combining the output of the voltage comparators. In addition to these processing capabilities, the simulations have confirmed the robustness of the array against process variations and mismatch. The power consumption extrapolated for VGA-resolution array processing images at 30 fps is 570 μW.

Real-Time Remote Reporting of Motion Analysis with Wi-FLIP
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - Int. Workshop on Cellular Nanoscale Networks and their Applications CNNA 2012
[abstract]
This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a prototype mixed-signal focal-plane array processor, and Imote2, a commercial WSN platform. The application consists of scanning the whole scene by sequentially analyzing small regions. Within each region, motion is detected by background subtraction. Subsequently, information related to that motion - intensity and location - is transmitted by radio so that it can be accounted for remotely. By aggregating this information over time, a motion map of the scene is built. This map makes it possible to visualize the different activity patterns taking place. It also provides an elaborated representation of the scene for further remote analysis, preventing raw images from being transmitted. In particular, the scene inspected in this demo corresponds to vehicular traffic on a motorway. The remote representation progressively built enables the assessment of the traffic density.

Early forest fire detection by vision-enabled wireless sensor networks
J. Fernández-Berni, R. Carmona-Galán, J.F. Martínez-Carmona and A. Rodríguez-Vázquez
Journal Paper - International Journal of Wildland Fire, vol. 21, no. 8, pp 938-949, 2012
Commonwealth Scientific and Industrial Research Organization Publishing    DOI: 10.1071/WF11168    ISSN: 1049-8001    » doi
[abstract]
Wireless sensor networks constitute a powerful technology particularly suitable for environmental monitoring. With regard to wildfires, they enable low-cost fine-grained surveillance of hazardous locations like wildland-urban interfaces. This paper presents work developed during the last 4 years targeting a vision-enabled wireless sensor network node for the reliable, early on-site detection of forest fires. The tasks carried out ranged from devising a robust vision algorithm for smoke detection to the design and physical implementation of a power-efficient smart imager tailored to the characteristics of such an algorithm. By integrating this smart imager with a commercial wireless platform, we endowed the resulting system with vision capabilities and radio communication. Numerous tests were arranged in different natural scenarios in order to progressively tune all the parameters involved in the autonomous operation of this prototype node. The last test carried out, involving the prescribed burning of a 95 × 20-m shrub plot, confirmed the high degree of reliability of our approach in terms of both successful early detection and a very low false-alarm rate.

CMOS-3D Smart Imager Architectures for Feature Detection
V.M. Brea, J. Fernández-Berni, R. Carmona-Galán, G. Liñán, D. Cabello and A. Rodríguez-Vázquez
Journal Paper - IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 4, pp 723-736, 2012
IEEE    DOI: 10.1109/JETCAS.2012.2223552    ISSN: 2156-3357    » doi
[abstract]
This paper reports a multi-layered smart image sensor architecture for feature extraction based on detection of interest points. The architecture is conceived for 3-D integrated circuit technologies consisting of two layers (tiers) plus memory. The top tier includes sensing and processing circuitry aimed at performing Gaussian filtering and generating Gaussian pyramids in a fully concurrent way. The circuitry in this tier operates in the mixed-signal domain. It embeds in-pixel correlated double sampling, a switched-capacitor network for Gaussian pyramid generation, analog memories and a comparator for in-pixel analog-to-digital conversion. This tier can be further split into two for improved resolution: one containing the sensors and another containing a capacitor per sensor plus the mixed-signal processing circuitry. The bottom tier embeds digital circuitry intended for the calculation of Harris, Hessian, and difference-of-Gaussian detectors. The overall system can hence be configured by the user to detect interest points by using whichever of these three algorithms is better suited to the practical application. The paper describes the different kinds of algorithms featured and the circuitry employed at the top and bottom tiers. The Gaussian pyramid is implemented with a switched-capacitor network in less than 50 μs, outperforming more conventional solutions.

Design of a smart camera SoC in a 3D-IC technology
R. Carmona-Galán, J. Fernández-Berni, S. Vargas-Sierra, G. Liñán-Cembrano, A. Rodríguez-Vázquez, V. Brea-Sánchez, M. Suárez-Cambre and D. Cabello-Ferrer
Conference - Workshop on the Architecture of Smart Cameras WASC 2012
[abstract]
Conventional digital signal processing architectures introduce data bottlenecks and are inefficient when dealing with multidimensional sensory signals; Architectures adapted to the nature of the stimulus are more efficient in terms of power consumption per operation but…; Concurrent sensing, processing and memory in planar technologies introduces serious limitations to image resolution and image size via the penalties in fill factor and pixel pitch; 3D integrated circuit technologies with a dense TSV distribution permit eliminating data bottlenecks without degrading image resolution and size.

Power-efficient focal-plane image representation for extraction of enriched Viola-Jones features
J. Fernández-Berni, L. Acasandrei, R. Carmona-Galán, A. Barriga-Barrios and A. Rodríguez-Vázquez
Conference - IEEE International Symposium on Circuits and Systems ISCAS 2012
[abstract]
This paper describes the use of a reconfigurable focal-plane processing array in order to achieve an image representation which dramatically reduces the computational load of the Viola-Jones object detection framework. Additionally, such representation provides richer information than the simple sum of pixels within rectangular regions originally defined in this framework. As a result, more elaborated features could be devised to speed up the execution of the subsequent attentional cascade, boosting thus the performance of the whole algorithm. The proposed circuitry has been successfully implemented in a CMOS prototype smart imager. Experimental results are given, demonstrating the suitability of the approach presented to efficiently deliver enriched Viola-Jones features.

All-MOS implementation of RC networks for time-controlled Gaussian spatial filtering
J. Fernández-Berni and R. Carmona-Galán
Journal Paper - International Journal of Circuit Theory and Applications, vol. 40, no. 8, pp 859-876, 2012
JOHN WILEY & SONS    DOI: 10.1002/cta.759    ISSN: 0098-9886    » doi
[abstract]
This paper addresses the design and VLSI implementation of MOS-based RC networks capable of performing time-controlled Gaussian filtering. In these networks, all the resistors are substituted one by one by a single MOS transistor biased in the ohmic region. The design of this elementary transistor is carefully realized according to the value of the ideal resistor to be emulated. For a prescribed signal range, the MOSFET in triode region delivers an interval of instantaneous resistance values. We demonstrate that, for the elementary 2-node network, establishing the design equation at a particular point within this interval guarantees minimum error. This equation is then corroborated for networks of arbitrary size by analyzing them from a stochastic point of view. Following the design methodology proposed, the error committed by an MOS-based grid when compared with its equivalent ideal RC network is, despite the intrinsic nonlinearities of the transistors, below 1% even under mismatch conditions of 10%. In terms of image processing, this error hardly affects the outcome, which is perceptually equivalent to that of the ideal network. These results, extracted from simulation, are verified in a prototype vision chip with QCIF resolution manufactured in the AMS 0.35 μm CMOS-OPTO process. This prototype incorporates a focal-plane MOS-based RC network that performs fully programmable Gaussian filtering. Copyright © 2011 John Wiley & Sons, Ltd.

Low-power smart imagers for vision-enabled sensor networks
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Book - 156 p, 2012
SPRINGER    ISBN: 978-1-4614-2391-1    » link
[abstract]
This book presents a comprehensive, systematic approach to the development of vision system architectures that employ sensory-processing concurrency and parallel processing to meet the autonomy challenges posed by a variety of safety and surveillance applications. Coverage includes a thorough analysis of resistive diffusion networks embedded within an image sensor array. This analysis supports a systematic approach to the design of spatial image filters and their implementation as vision chips in CMOS technology. The book also addresses system-level considerations pertaining to the embedding of these vision chips into vision-enabled wireless sensor networks. Describes a system-level approach for designing vision devices and embedding them into vision-enabled wireless sensor networks; Surveys state-of-the-art, vision-enabled WSN nodes; Includes details of specifications and challenges of vision-enabled WSNs; Explains architectures for low-energy CMOS vision chips with embedded, programmable spatial filtering capabilities; Includes considerations pertaining to the integration of vision chips into off-the-shelf WSN platforms.

Sensores de visión de bajo consumo de potencia para vigilancia y monitorización en red
J. Fernández-Berni
Thesis - Date of defense: 20/06/2011
UNIVERSIDAD DE SEVILLA, IMSE-CNM    » link
[abstract]
This thesis is a contribution to the efficient incorporation of vision hardware into embedded systems. Specifically, we focus on the integration of vision chips into the nodes of a wireless sensor network. This type of network of smart nodes enables the implementation of what is known as ubiquitous computing. In this new era for computers, sensing, processing and actuation devices are integrated into our everyday environment in a completely transparent way. The development of this ambient intelligence relies fundamentally on the energy efficiency of its constituent elements. In the context of wireless sensor networks, the goal is to extend the lifetime of the nodes as much as possible, avoiding frequent maintenance cycles for hundreds or thousands of geographically scattered devices. Under these circumstances, adding vision to the catalog of sensing capabilities of the nodes is by no means a trivial matter. New strategies are required to process the massive flow of information associated with an image sequence at an energy cost in keeping with the scenario described.
Our approach starts from a very low level, that is, from the design of simple circuits made up of MOS transistors. These circuits form processing elements that are of little use in isolation but that, when suitably organized and interconnected, can process an enormous amount of data in parallel with very low power consumption. The possibility offered by current CMOS technologies of integrating circuits of this kind with photosensing devices makes sensor-processor arrays a fundamental tool for implementing image processing at a reduced energy cost. Building on this conceptual framework, we have designed a prototype vision chip in which different processing primitives can be programmed. These primitives make use of the efficient VLSI implementation of a time-controllable diffusion process, as well as of a focal plane that can be reconfigured into user-defined regions. The results obtained place our prototype among the most competitive of those reported in the literature that perform similar processing. Particularly noteworthy is the low power consumption, which has allowed its integration into a commercial wireless sensor network node, where it accounts for only 5.2%, in the worst case, of the total power consumption of the resulting system. Finally, we explore in depth an application that especially benefits from the 'ubiquitous' availability of image processing: the early detection of forest fires. We thus present a detection system concept in which, unlike current systems based on cameras that watch over large areas, each sensor of the network monitors a small area of vegetation.
These sensors must be constantly running a detection algorithm, so efficient processing becomes essential. In fact, we present a detection algorithm that makes use of the primitives implemented in the prototype. This algorithm, tested on different platforms, has been able to carry out on-site smoke detection on the wireless vision platform designed, with a very high degree of reliability and robustness against false alarms.

Focal-plane dynamic texture segmentation by programmable binning and scale extraction
J. Fernández-Berni and R. Carmona-Galán
Book Chapter - Focal-Plane Sensor-Processor Chips, pp 105-124, 2011
SPRINGER    DOI: 10.1007/978-1-4419-6475-5_5    ISBN: 978-1-4419-6474-8    » doi
[abstract]
Dynamic textures are spatially repetitive time-varying visual patterns that present, however, some temporal stationarity within their constituting elements. In addition, their spatial and temporal extents are a priori unknown. This kind of pattern is very common in nature; therefore, dynamic texture segmentation is an important task for surveillance and monitoring. Conventional methods employ optic flow computation, though it represents a heavy computational load. Here, we describe texture segmentation based on focal-plane space-scale generation. The programmable size of the subimages to be analysed and the scales to be extracted encode sufficient information from the texture signature to warn of its presence. A prototype smart imager has been designed and fabricated in 0.35 μm CMOS, featuring a very low-power scale-space representation of user-defined subimages.

Image filtering by reduced kernels exploiting kernel structure and focal-plane averaging
J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference - European Conference on Circuit Theory and Design ECCTD 2011
[abstract]
Incorporating multi-resolution capabilities into imagers renders additional power saving mechanisms in the subsequent image processing. In this paper, we show how, by exploiting a certain mask structure, 3 × 3 kernels can be reduced to 2 × 2 kernels if charge redistribution is provided at the focal plane of the imaging device. More precisely, by averaging and shifting a half-resolution pixel grid, we will have a pre-processed image, subsampled by a factor of 2 on each dimension, that can be filtered with a mask of a reduced size. Very useful image filtering kernels, like a 3 × 3 Gaussian kernel for image smoothing, or the well-known Sobel operators, fall into this category of reducible kernels. Operating on the pre-processed image with one of these reduced kernels represents a smaller number of operations per pixel than realizing all the multiply-accumulate operations needed to apply a 3 × 3 kernel. Memory accesses are reduced in the same fraction. Concerning the difficulties of providing this pre-processed image representation, we propose a methodology for obtaining it at a very low power cost. It requires the implementation of user-definable image subdivision and subsampling at the focal plane. Experimental results are given, obtained from measurements on a CMOS imager prototype chip incorporating these multi-resolution capabilities. © 2011 IEEE.
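
One way to see why such kernels are reducible is that some 3 × 3 masks factor into a 2 × 2 averaging step followed by a 2 × 2 mask. The sketch below checks this numerically for an unnormalized 3 × 3 binomial (Gaussian-like) kernel and for a horizontal Sobel kernel. It is a plain arithmetic illustration under that assumption; the exact decomposition, subsampling and normalization used in the paper may differ, and the names are hypothetical.

```python
def convolve_2d(a, b):
    """Full 2-D convolution of two small kernels given as lists of lists."""
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, row_a in enumerate(a):
        for j, x in enumerate(row_a):
            for k, row_b in enumerate(b):
                for l, y in enumerate(row_b):
                    out[i + k][j + l] += x * y
    return out


# 2 x 2 averaging (unnormalized), standing in for focal-plane averaging.
BOX_2X2 = [[1, 1],
           [1, 1]]

# Candidate reduced 2 x 2 kernels (assumed for illustration).
REDUCED_GAUSSIAN = [[1, 1],
                    [1, 1]]
REDUCED_SOBEL_X = [[-1, 1],
                   [-1, 1]]

if __name__ == "__main__":
    # Averaging followed by the reduced mask reproduces the 3 x 3 mask.
    for row in convolve_2d(BOX_2X2, REDUCED_GAUSSIAN):
        print(row)
    for row in convolve_2d(BOX_2X2, REDUCED_SOBEL_X):
        print(row)
```

Applying a 2 × 2 mask takes four multiply-accumulate operations per output value instead of nine, which is consistent with the reduction in operations and memory accesses that the abstract mentions.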

Demo: Real-time remote reporting of active regions with Wi-FLIP
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference - ACM/IEEE International Conference on Distributed Smart Cameras ICDSC 2011
[abstract]
This paper describes a real-time application programmed into Wi-FLIP, a wireless smart camera resulting from the integration of FLIP-Q, a focal-plane low-power image processor, and Imote2, a commercial WSN platform. The application, though simple, shows the potentiality of the reduced scene representations achievable at FLIP-Q to speed up the processing. It consists of detecting the active regions within the scene being surveyed, that is, those regions undergoing thresholded variations with respect to the background. If an activity pattern is prescribed, FLIP-Q enables the reconfigurability of the image plane accordingly, making its detection and tracking easier. For each frame, the number of active regions is calculated and wirelessly reported in real time. A base station picks up the radio signal and sends the information to a PC via USB, also in real time. Frame rates up to around 10fps have been achieved, although it greatly depends on the light conditions and the image plane division grid. © 2011 IEEE.

Wi-FLIP: A wireless smart camera based on a focal-plane low-power image processor
J. Fernández-Berni, R. Carmona-Galán, G. Liñán-Cembrano, A. Zarándy and A. Rodríguez-Vázquez
Conference - ACM/IEEE International Conference on Distributed Smart Cameras ICDSC 2011
[abstract]
This paper presents Wi-FLIP, a vision-enabled WSN node resulting from the integration of FLIP-Q, a prototype vision chip, and Imote2, a commercial WSN platform. In Wi-FLIP, image processing is not constrained to the digital domain as in conventional architectures. Instead, its image sensor - the FLIP-Q prototype - incorporates pixel-level processing elements (PEs) implemented by analog circuitry. These PEs are interconnected, rendering a massively parallel SIMD-based focal-plane array. Low-level image processing tasks fit very well into this processing scheme. They feature a heavy computational load composed of pixel-wise repetitive operations which can be realized in parallel with moderate accuracy. In such circumstances, analog circuitry, not very precise but faster and more area- and power-efficient than its digital counterpart, has been extensively reported to achieve better performance. The Wi-FLIP's image sensor therefore does not output raw images but pre-processed images that make the subsequent digital processing much lighter. The energy cost of such pre-processing is really low - 5.6 mW for the worst-case scenario. As a result, for the configuration where the Imote2's processor works at minimum clock frequency, the maximum power consumed by our prototype represents only 5.2% of the whole system power consumption. This percentage gets even lower as the clock frequency increases. We report experimental results for different algorithms, image resolutions and clock frequencies. The main drawback of this first version of Wi-FLIP is the low frame rate reachable due to the non-standard GPIO-based FLIP-Q-to-Imote2 interface. © 2011 IEEE.

Multi-resolution low-power gaussian filtering by reconfigurable focal-plane binning
J. Fernández-Berni, R. Carmona-Galán, F. Pozas-Flores, A. Zarándy and A. Rodríguez-Vázquez
Conference - SPIE Microtechnologies for the New Millennium 2011
[abstract]
Gaussian filtering is a basic tool for image processing. Noise reduction, scale-space generation or edge detection are examples of tasks where different Gaussian filters can be successfully utilized. However, their implementation in a conventional digital processor by applying a convolution kernel throughout the image is quite inefficient. Not only is the value of every single pixel taken into consideration successively, but contributions from its neighbors also need to be taken into account. Processing of the frame is serialized and memory access is intensive and recurrent. The result is a low operation speed or, alternatively, a high power consumption. This inefficiency is especially remarkable for filters with large variance, as the kernel size increases significantly. In this paper, a different approach to achieve Gaussian filtering is proposed. It is oriented to applications with very low power budgets. The key point is a reconfigurable focal-plane binning. Pixels are grouped according to the targeted resolution by means of a division grid. Then, two consecutive shifts of this grid in opposite directions carry out the spread of information to the neighborhood of each pixel in parallel. The outcome is equivalent to the application of a 3x3 binomial filter kernel, which in turn is a good approximation of a Gaussian filter, on the original image. The variance of the closest Gaussian filter is around 0.5. By repeating the operation, Gaussian filters with larger variances can be achieved. A rough estimation of the necessary energy for each repetition until reaching the desired filter is below 20 nJ for a QCIF-size array. Finally, experimental results of a QCIF proof-of-concept focal-plane array manufactured in 0.35 μm CMOS technology are presented. A maximum RMSE of only 1.2% is obtained by the on-chip Gaussian filtering with respect to the corresponding equivalent ideal filter implemented off-chip.
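
The figures quoted in this abstract can be checked with a few lines of arithmetic: the 1-D binomial mask [1, 2, 1]/4 has variance 0.5, the 3x3 binomial kernel is its outer product, and applying the mask n times yields a kernel whose variance is 0.5·n. The sketch below is a software illustration of that property only; it does not model the focal-plane binning circuit, and the helper names are assumptions.

```python
def convolve_1d(a, b):
    """Full 1-D convolution of two sequences of weights."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def variance(kernel):
    """Variance of a 1-D kernel, treating its weights as a distribution."""
    total = sum(kernel)
    mean = sum(i * w for i, w in enumerate(kernel)) / total
    return sum(w * (i - mean) ** 2 for i, w in enumerate(kernel)) / total


BINOMIAL_1D = [0.25, 0.5, 0.25]

# The 3x3 binomial kernel is separable: outer product of the 1-D mask.
BINOMIAL_3X3 = [[r * c for c in BINOMIAL_1D] for r in BINOMIAL_1D]


def repeated_kernel(n):
    """Equivalent 1-D kernel after applying the binomial mask n times."""
    kernel = [1.0]
    for _ in range(n):
        kernel = convolve_1d(kernel, BINOMIAL_1D)
    return kernel


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, variance(repeated_kernel(n)))
```

Because variances add under convolution, the number of repetitions needed for a target variance grows linearly with that variance.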

Design of a smart SiPM based on focal-plane processing elements for improved spatial resolution in PET
F. Pozas-Flores, R. Carmona-Galán, J. Fernández-Berni and A. Rodríguez-Vázquez
Conference - SPIE Microtechnologies for the New Millennium 2011
[abstract]
Single-photon avalanche diodes (SPADs) are compatible with standard CMOS. This means that photo-multipliers for scintillation detectors in nuclear medicine (i.e. PET, SPECT) can be built in inexpensive technologies. These silicon photo-multipliers consist of arrays of SPADs, usually passively quenched, whose output current is sensed by some analog readout circuitry. In addition to the implementation of photosensors that are sensitive to single-photon events, analog, digital and mixed-signal processing circuitry can be included in the same CMOS chip. For instance, the SPAD can be employed as an event detector and, with the help of some in-pixel circuitry, a digitized photo-multiplier can be built in which every single-photon detection event is summed up by a counter. Moreover, this concurrent processing circuitry can be employed to realize low-level image processing tasks. They can be efficiently implemented by this architecture given their intrinsic parallelism. Our proposal is to operate on the light-induced signal at the focal plane in order to obtain a more elaborated record of the detection, for instance, by providing some characterization of the light spot. Information about the depth-of-interaction, in scintillation detectors, can be derived from the position and shape of the scintillation light distribution. This will ultimately have an impact on the spatial resolution that can be achieved. We present the CMOS design of an array of detector cells. Each cell contains a SPAD, a MOS-based passive quenching circuit and drivers for the column and row detection lines.

Focal-plane generation of multi-resolution and multi-scale image representation for low-power vision applications
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Zarandy and A. Rodríguez-Vázquez
Conference - SPIE Infrared Technology and Applications XXXVII, 2011
[abstract]
Early vision stages represent a considerably heavy computational load. A huge amount of data needs to be processed under strict timing and power requirements. Conventional architectures usually fail to adhere to the specifications in many application fields, especially when autonomous vision-enabled devices are to be implemented, like in lightweight UAVs, robotics or wireless sensor networks. A bioinspired architectural approach can be employed consisting of a hierarchical division of the processing chain, conveying the highest computational demand to the focal plane. There, distributed processing elements, concurrent with the photosensitive devices, influence the image capture and generate a pre-processed representation of the scene where only the information of interest for subsequent stages remains. These focal-plane operators are implemented by analog building blocks, which may individually be a little imprecise, but as a whole render the appropriate image processing very efficiently. As a proof of concept, we have developed a 176x144-pixel smart CMOS imager that delivers lighter but enriched representations of the scene. Each pixel of the array contains a photosensor and some switches and weighted paths allowing reconfigurable resolution and spatial filtering. An energy-based image representation is also supported. These functionalities greatly simplify the operation of the subsequent digital processor implementing the high level logic of the vision algorithm. The resulting figures, 5.6mW@30fps, permit the integration of the smart image sensor with a wireless interface module (Imote2 from Memsic Corp.) for the development of vision-enabled

FLIP-Q: a QCIF resolution focal-plane array for low-power image processing
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Journal Paper - IEEE Journal of Solid-State Circuits, vol. 46,  no. 3, pp 669-680, 2011
IEEE    DOI: 10.1109/JSSC.2010.2102591    ISSN: 0018-9200    » doi
[abstract]
This paper reports a 176x144-pixel smart image sensor designed and fabricated in a 0.35 µm CMOS-OPTO process. The chip implements a massively parallel focal-plane processing array which can output different simplified representations of the scene at very low power. The array is composed of pixel-level processing elements which carry out analog image processing concurrently with photosensing. These processing elements can be grouped into fully-programmable rectangular-shape areas by loading the appropriate interconnection patterns into the registers at the edge of the array. The targeted processing can thus be performed block-wise. Readout is done pixel-by-pixel in a random access fashion. An on-chip 8-bit ADC is provided. The image processing primitives implemented by the chip, experimentally tested and fully functional, are scale space and Gaussian pyramid generation, fully-programmable multiresolution scene representation (including foveation) and block-wise energy-based scene representation. The power consumption associated with the capture, processing and A/D conversion of an image flow at 30 fps, with full-frame processing but reduced frame size output, ranges from 2.7 mW to 5.6 mW, depending on the operation to be performed.

On-site forest fire smoke detection by low-power autonomous vision sensor
J. Fernández-Berni, R. Carmona-Galán, L. Carranza-González, A. Cano-Rojas, J. F. Martínez-Carmona, A. Rodríguez-Vázquez and S. Morillas-Castillo
Conference - International Conference on Forest Fire Research ICFFR 2010
[abstract]
Early detection plays a crucial role in preventing forest fires from spreading. Wireless vision sensor networks deployed throughout high-risk areas can perform fine-grained surveillance and thereby very early detection and precise location of forest fires. One of the fundamental requirements that need to be met at the network nodes is reliable low-power on-site image processing. It greatly simplifies the communication infrastructure of the network as only alarm signals instead of complete images are transmitted, thus anticipating a very competitive cost. As a first approximation to fulfill such a requirement, this paper reports the results achieved from field tests carried out in collaboration with the Andalusian Fire-Fighting Service (INFOCA). Two controlled burns of forest debris were carried out (www.youtube.com/user/vmoteProject). Smoke was successfully detected on-site by the EyeRIS™ v1.2, a general-purpose autonomous vision system, built by AnaFocus Ltd., in which a vision algorithm was programmed. No false alarm was triggered despite the significant motion other than smoke present in the scene. Finally, as a further step, we describe the preliminary laboratory results obtained from a prototype vision chip which implements, at very low energy cost, some image processing primitives oriented to environmental monitoring.

Robust focal-plane analog processing hardware for dynamic texture segmentation
J. Fernández-Berni and R. Carmona-Galán
Conference - International Workshop on Cellular Nanoscale Networks and their Applications CNNA 2010
[abstract]
Cellular Nonlinear Networks (CNN) establish a theoretical framework in which programmable focal-plane image processing arrays can be developed. The conventional support for its analog programmability in VLSI is the implementation of transconductor-based multiplication of the input, output and state variables times the corresponding template elements. However, some distributions of weights can be greatly affected by the intrinsic nonidealities of the physical implementation. This is exactly the case when implementing linear diffusion within a transconductor-based CNN implementation. In this paper we propose an alternative implementation: a resistive grid based on MOSFETs operating in the triode region to realize linear diffusion of the input image, considered as the initial state of the network. In addition, these MOS-resistors can be employed as switches in order to sub-divide the image into bins, sized to track features on the appropriate scale. Thus, by simply controlling the size of the binning and for how long the pixel voltages will diffuse, it will be possible to segment and track dynamic textures along an image flow. Each frame of the flow is described by a smaller image in which each pixel represents the energy of the corresponding image bin, once the non-relevant spatial frequency components have been filtered out. We will demonstrate that the resulting low-resolution representation of the scene is very robust to the different sources of nonidealities in a standard CMOS technology. © 2010 IEEE.

Low-power focal-plane dynamic texture segmentation based on programmable image binning and diffusion hardware
J. Fernández-Berni and R. Carmona-Galán
Conference - SPIE Microtechnologies: Bioengineered and Bioinspired Systems IV, 2009
[abstract]
Stand-alone applications of vision are severely constrained by their limited power budget. This is one of the main reasons why vision has not yet been widely incorporated into wireless sensor networks. For them, image processing should be confined to the sensor node in order to reduce network traffic and its associated power consumption. In this scenario, operating the conventional acquisition-digitization-processing chain is unfeasible under tight power limitations. A bio-inspired scheme can be followed to meet the timing requirements while maintaining a low power consumption. In our approach, part of the low-level image processing is conveyed to the focal plane, thus speeding up system operation. Moreover, if a moderate accuracy is permissible, signal processing is realized in the analog domain, resulting in a highly efficient implementation. In this paper we propose a circuit to realize dynamic texture segmentation based on focal-plane spatial bandpass filtering of image subdivisions. By the appropriate binning, we introduce some constraints on the spatial extent of the targeted texture. By running time-controlled linear diffusion within each bin, a specific band of spatial frequencies can be highlighted. By measuring the average energy of the components in that band at each image bin, the presence of a targeted texture can be detected and quantified. The resulting low-resolution representation of the scene can then be employed to track the texture along an image flow. An application-specific chip, based on this analysis, is being developed for natural spaces monitoring by means of a network of low-power vision systems. ©2009 SPIE.
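
The binning, diffusion and energy steps described in this abstract can be mimicked in software. The sketch below is only an illustration under stated assumptions, not the proposed circuit: diffusion is modeled with an explicit-Euler discrete heat equation and zero-flux bin borders, the bandpass component is taken as the difference between a short and a long diffusion, and all names and parameter values (`rate`, `t_short`, `t_long`) are hypothetical.

```python
import numpy as np

def diffuse_in_bin(block, steps, rate=0.2):
    """Linear diffusion inside one isolated bin (explicit Euler).
    Edge replication means no charge flows through the bin border."""
    v = np.asarray(block, dtype=float).copy()
    for _ in range(steps):
        p = np.pad(v, 1, mode="edge")
        laplacian = (p[:-2, 1:-1] + p[2:, 1:-1]
                     + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * v)
        v = v + rate * laplacian  # rate <= 0.25 keeps the scheme stable
    return v

def bin_energy_map(image, bin_size, t_short, t_long):
    """One value per bin: mean energy of the band kept between two diffusion times."""
    h, w = image.shape
    rows, cols = h // bin_size, w // bin_size
    energy = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = image[i * bin_size:(i + 1) * bin_size,
                          j * bin_size:(j + 1) * bin_size]
            # Short diffusion removes the highest frequencies, long diffusion
            # keeps only the lowest ones; their difference is a band-pass signal.
            band = diffuse_in_bin(block, t_short) - diffuse_in_bin(block, t_long)
            energy[i, j] = np.mean(band ** 2)
    return energy

if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[:4, :4] = np.indices((4, 4)).sum(axis=0) % 2  # textured bin
    print(bin_energy_map(img, bin_size=4, t_short=1, t_long=10))
```

In this toy example only the textured bin yields a non-zero energy, so the low-resolution map marks where the texture lies.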

Accurate design of a MOS-based resistive network for time-controlled diffusion filtering
J. Fernández-Berni and R. Carmona-Galán
Conference - European Conference on Circuit Theory and Design ECCTD 2009
[abstract]
This paper analyses a MOS-based resistive network suitable for massively parallel image processing. The inclusion of MOS transistors biased in the ohmic region instead of true resistors permits certain control over the underlying spatial filtering while reducing the required area for VLSI implementation. However, it also leads to nonlinearities and thereby to errors with respect to an ideal resistive grid. By studying an elementary network composed of only two nodes we determine the guidelines to be followed in order to minimize the error according to the selected signal range. These guidelines are then extrapolated to larger networks, demonstrating that fairly accurate networks can be achieved even for relatively wide signal ranges. Simulations are employed to validate the extrapolated results. The numerical examples also make it possible to visualize how the insertion of MOS transistors affects the spatial filtering carried out by the grid.

A VLSI-oriented and power-efficient approach for dynamic texture recognition applied to smoke detection
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference - International Conference on Computer Vision Theory and Applications VISAPP 2009
[abstract]
The recognition of dynamic textures is fundamental in processing image sequences as they are very common in natural scenes. The computation of the optic flow is the most popular method to detect, segment and analyse dynamic textures. For weak dynamic textures, this method is especially adequate. However, for strong dynamic textures, it implies a heavy computational load and therefore a significant energy consumption. In this paper, we propose a novel approach intended to be implemented by very low-power integrated vision devices. It is based on a simple and flexible computation at the focal plane implemented by power-efficient hardware. The first stages of the processing are dedicated to removing redundant spatial information in order to obtain a simplified representation of the original scene. This simplified representation can be used by subsequent digital processing stages to finally decide about the presence and evolution of a certain dynamic texture in the scene. As an application of the proposed approach, we present the preliminary results of smoke detection for the development of a forest fire detection system based on a wireless vision sensor network.

On the implementation of linear diffusion in transconductance-based cellular nonlinear networks
J. Fernández-Berni and R. Carmona-Galán
Journal Paper - International Journal of Circuit Theory and Applications, vol. 37, no. 4, pp 543-567, 2009
JOHN WILEY & SONS    DOI: 10.1002/cta.564    ISSN: 0098-9886    » doi
[abstract]
In theory, cellular nonlinear networks (CNN) are well capable of implementing discrete-space linear diffusion by means of the appropriate templates. In practice, good results have not been demonstrated with transconductance-based circuits. In this paper, we prove that the mismatch inherent to very-large-scale integration implementation is the reason. Although previous works consider that small perturbations of the network parameters lead to small deviations from the ideal behavior, we consider that this is overoptimistic. When interactions between nodes are supported by unidirectional building blocks, originally balanced current paths are realized by mismatched elements. In the case of linear diffusion, the singular location of natural frequencies of the system implies that a small perturbation of balanced current paths renders qualitatively different network dynamics. We analyze and compare a set of linear templates performing unconstrained and constrained linear diffusion in transconductance-based CNN hardware. Several numerical examples are also presented to visualize the consequences of mismatch on the processing. Finally, in order to emphasize the importance of having balanced current paths, we tested the influence of mismatch in a grid built with MOS transistors. In spite of their nonlinearity, the resulting network is much more robust to mismatch. This last result coincides with previous studies. Copyright 2008 John Wiley & Sons, Ltd.
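
The sensitivity of the zero natural frequency to mismatch can be illustrated numerically. The sketch below is a simplified linear model and not the analysis of the paper: it assumes a 1D ring of nodes, Gaussian relative mismatch of spread `sigma`, and ideal linear conductances standing in for the MOS grid; the function names are hypothetical. With independent unidirectional elements the row sums of the system matrix no longer cancel and the eigenvalue at zero moves, whereas a single bidirectional element per node pair keeps it at zero.

```python
import numpy as np

def transconductor_matrix(n, sigma, rng):
    """Ring diffusion where every template weight is a separate,
    independently mismatched unidirectional element."""
    a = np.zeros((n, n))
    for i in range(n):
        a[i, i] = -2.0 * (1.0 + sigma * rng.standard_normal())
        a[i, (i + 1) % n] = 1.0 + sigma * rng.standard_normal()
        a[i, (i - 1) % n] = 1.0 + sigma * rng.standard_normal()
    return a

def resistive_matrix(n, sigma, rng):
    """Ring diffusion where each pair of neighbors shares one mismatched
    bidirectional conductance, so current paths stay balanced."""
    a = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        g = 1.0 + sigma * rng.standard_normal()
        a[i, j] += g
        a[j, i] += g
        a[i, i] -= g
        a[j, j] -= g
    return a

def largest_real_eigenvalue(a):
    """Natural frequency closest to instability (0 for ideal diffusion)."""
    return float(np.max(np.linalg.eigvals(a).real))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    print(largest_real_eigenvalue(transconductor_matrix(16, 0.05, rng)))
    print(largest_real_eigenvalue(resistive_matrix(16, 0.05, rng)))
```

A positive value for the first matrix means the average level grows without bound, and a negative one means it decays, so in neither case is the mean of the image preserved as ideal diffusion requires.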

A vision-based monitoring system for very early automatic detection of forest fires
J. Fernández-Berni, R. Carmona-Galán and L. Carranza-González
Conference - International Conference on Modelling, Monitoring and Management of Forest Fires, 2008
[abstract]
This paper describes a system capable of detecting smoke at the very beginning of a forest fire with a precise spatial resolution. The system is based on a wireless vision sensor network. Each sensor monitors a small area of vegetation by running on-site a tailored vision algorithm to detect the presence of smoke. This algorithm examines chromaticity changes and spatio-temporal patterns in the scene that are characteristic of the smoke dynamics at early stages of propagation. Processing takes place at the sensor nodes and, if smoke is detected, an alarm signal is transmitted through the network along with a reference to the location of the triggered zone, without requiring complex GIS systems. This method improves the spatial resolution on the surveilled area and reduces the rate of false alarms. An energy-efficient implementation of the sensor/processor devices is crucial as it determines the autonomy of the network nodes. At this point, we have developed an ad hoc vision algorithm, adapted to the nature of the problem, to be integrated into a single-chip sensor/processor. As a first step to validate the feasibility of the system, we applied the algorithm to smoke sequences recorded with commercial cameras in real-world scenarios that simulate the working conditions of the network nodes. The results obtained point to a very high reliability and robustness in the detection process.
