Found results matching for:
Author: Delia Velasco Montero
Year: Since 2002
Journal Papers
A Pipelining-Based Heterogeneous Scheduling and Energy-Throughput Optimization Scheme for CNNs Leveraging Apache TVM
D. Velasco-Montero, B. Goossens, J. Fernández-Berni, Á. Rodríguez-Vázquez and W. Philips
Journal Paper · IEEE Access, 2023
abstract
doi
Extracting information of interest from continuous video streams is a strongly demanded computer vision task. For the realization of this task at the edge using the current de-facto standard approach, i.e., deep learning, it is critical to optimize key performance metrics such as throughput and energy consumption according to prescribed application requirements. This allows achieving timely decision-making while extending the battery lifetime as much as possible. In this context, we propose a method to boost neural-network performance based on a co-execution strategy that exploits hardware heterogeneity on edge platforms. The enabling tool is Apache TVM, a highly efficient machine-learning compiler compatible with a diversity of hardware back-ends. The proposed approach solves the problem of network partitioning and distributes the workloads to make concurrent use of all the processors available on the board following a pipeline scheme. We conducted experiments on various popular CNNs compiled with TVM on the Jetson TX2 platform. The experimental results based on measurements show a significant improvement in throughput with respect to a single-processor execution, ranging from 14% to 150% over all tested networks. Power-efficient configurations were also identified, accomplishing energy reductions above 10%.
Impact of Thermal Throttling on Long-Term Visual Inference in a CPU-Based Edge Device
T. Benoit-Cattin, D. Velasco-Montero and J. Fernández-Berni
Journal Paper · Electronics, vol. 9, no. 12, article 2106, 2020
abstract
doi pdf
Many application scenarios of edge visual inference, e.g., robotics or environmental monitoring, eventually require long periods of continuous operation. In such periods, the processor temperature plays a critical role to keep a prescribed frame rate. Particularly, the heavy computational load of convolutional neural networks (CNNs) may lead to thermal throttling and hence performance degradation in few seconds. In this paper, we report and analyze the long-term performance of 80 different cases resulting from running five CNN models on four software frameworks and two operating systems without and with active cooling. This comprehensive study was conducted on a low-cost edge platform, namely Raspberry Pi 4B (RPi4B), under stable indoor conditions. The results show that hysteresis-based active cooling prevented thermal throttling in all cases, thereby improving the throughput up to approximately 90% versus no cooling. Interestingly, the range of fan usage during active cooling varied from 33% to 65%. Given the impact of the fan on the power consumption of the system as a whole, these results stress the importance of a suitable selection of CNN model and software components. To assess the performance in outdoor applications, we integrated an external temperature sensor with the RPi4B and conducted a set of experiments with no active cooling in a wide interval of ambient temperature, ranging from 22 °C to 36 °C. Variations up to 27.7% were measured with respect to the maximum throughput achieved in that interval. This demonstrates that ambient temperature is a critical parameter in case active cooling cannot be applied.
PreVIous: A Methodology for Prediction of Visual Inference Performance on IoT Devices
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Journal Paper · IEEE Internet of Things Journal, vol. 7, no. 10, pp 9227-9240, 2020
abstract
doi pdf
This paper presents PreVIous, a methodology to predict the performance of convolutional neural networks (CNNs) in terms of throughput and energy consumption on vision-enabled devices for the Internet of Things. CNNs typically constitute a massive computational load for such devices, which are characterized by scarce hardware resources to be shared among multiple concurrent tasks. Therefore, it is critical to select the optimal CNN architecture for a particular hardware platform according to prescribed application requirements. However, the zoo of CNN models is already vast and rapidly growing. To facilitate a suitable selection, we introduce a prediction framework that allows to evaluate the performance of CNNs prior to their actual implementation. The proposed methodology is based on PreVIousNet, a neural network specifically designed to build accurate per-layer performance predictive models. PreVIousNet incorporates the most usual parameters found in state-of-the-art network architectures. The resulting predictive models for inference time and energy have been tested against comprehensive characterizations of seven well-known CNN models running on two different software frameworks and two different embedded platforms. To the best of our knowledge, this is the most extensive study in the literature concerning CNN performance prediction on low-power low-cost devices. The average deviation between predictions and real measurements is remarkably low, ranging from 3% to 10%. This means state-of-the-art modeling accuracy. As an additional asset, the fine-grained a priori analysis provided by PreVIous could also be exploited by neural architecture search engines.
Performance Assessment of Deep Learning Frameworks through Metrics of CPU Hardware Exploitation on an Embedded Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · International Journal of Electrical and Computer Engineering Systems, vol. 11, no. 1, pp 1-11, 2020
abstract
doi pdf
In this paper, we analyze heterogeneous performance exhibited by some popular deep learning software frameworks for visual inference on a resource-constrained hardware platform. Benchmarking of Caffe, OpenCV, TensorFlow, and Caffe2 is performed on the same set of convolutional neural networks in terms of instantaneous throughput, power consumption, memory footprint, and CPU utilization. To understand the resulting dissimilar behavior, we thoroughly examine how the resources in the processor are differently exploited by these frameworks. We demonstrate that a strong correlation exists between hardware events occurring in the processor and inference performance. The proposed hardware-aware analysis aims to find limitations and bottlenecks emerging from the joint interaction of frameworks and networks on a particular CPU-based platform. This provides insight into introducing suitable modifications in both types of components to enhance their global performance. It also facilitates the selection of frameworks and networks among a large diversity of these components available these days for visual understanding.
Optimum Selection of DNN Model and Framework for Edge Inference
D.Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and Á. Rodríguez-Vázquez
Journal Paper · IEEE Access, vol. 6, pp 51680-51692, 2018
abstract
doi pdf
This paper describes a methodology to select the optimum combination of deep neural network and software framework for visual inference on embedded systems. As a first step, benchmarking is required. In particular, we have benchmarked six popular network models running on four deep learning frameworks implemented on a low-cost embedded platform. Three key performance metrics have been measured and compared with the resulting 24 combinations: accuracy, throughput, and power consumption. Then, application-level specifications come into play. We propose a figure of merit enabling the evaluation of each network/framework pair in terms of relative importance of the aforementioned metrics for a targeted application. We prove through numerical analysis and meaningful graphical representations that only a reduced subset of the combinations must actually be considered for real deployment. Our approach can be extended to other networks, frameworks, and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of embedded deep learning technology.
Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Journal Paper · Lecture Notes in Computer Science LNCS, vol. 11182, pp 369-379, 2018
abstract
doi pdf
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.
Conferences
Demo: CNN Performance Prediction on a CPU-based Edge Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez Vázquez
Conference · International Conference on Sustainable Development in Civil Engineering ICSDC 2019
abstract
The implementation of algorithms based on Dee p Learning at edge visual systems is currently a challenge. In addition to accuracy, the network architecture also has an impact on inference performance in terms of throughput and power consumption. This demo showcases per layer inference performance of various convolut ional neural networks running at a low cost edge platform . Furthermore, a n empirical model is applied to predict processing time and power consumption prior to actually running the networks A comparison between the prediction from our model and the actual inference performance is displayed in real time.
Towards a Simplified Procedure for CNN Performance Prediction on Embedded Platforms
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Workshop on the Architecture of Smart Cameras WASC 2019
abstract
Vision is arguably the technical field benefiting the most from the renaissance of artificial intelligence in the last few years. In particular, the convergence of massive datasets for training, boosted computational power, and enhanced machine learning techniques has given rise to highly accurate vision algorithms -even outperforming humans in certain tasks- based on convolutional neural networks (CNNs). The potential of these algorithms has attracted attention from many parties, both in academia and industry, spurring the development of a myriad of hardware platforms and software frameworks. The challenge now is how to efficiently leverage and integrate this variety of components in practical realizations, taking also into account that CNN models keep evolving at a rapid pace. With this scenario in mind, we have been working on a simplified procedure to predict the performance of CNNs running on embedded platforms in terms of throughput and power consumption. The objective is to facilitate the evaluation of the aforementioned components and CNN models prior to actually implementing them, thereby speeding up the deployment of optimal solutions. In this talk, we will describe key aspects of the proposed procedure. Specifically, we will elaborate on SweepNet, a deep neural network tailored for meaningful per-layer characterization. The performance models extracted from SweepNet for a hardware platform allow to accurately predict layer by layer the execution time and energy consumption of any other CNN running on that platform.
On the Correlation of CNN Performance and Hardware Metrics for Visual Inference on a Low-Cost CPU-based Platform
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · International Conference on Systems, Signals and Image Processing IWSSIP 2019
abstract
pdf
While providing the same functionality, the various Deep Learning software frameworks available these days do not provide similar performance when running the same network model on a particular hardware platform. On the contrary, we show that the different coding techniques and underlying acceleration libraries have a great impact on the instantaneous throughput and CPU utilization when carrying out the same inference with Caffe, OpenCV, TensorFlow and Caffe2 on an ARM Cortex-A53 multi-core processor. Direct modelling of this dissimilar performance is not practical, mainly because of the complexity and rapid evolution of the toolchains. Alternatively, we examine how the hardware resources are distinctly exploited by the frameworks. We demonstrate that there is a strong correlation between inference performance - including power consumption - and critical parameters associated with memory usage and instruction flow control. This identified correlation is a preliminary step for the development of a simple empirical model. The objective is to facilitate selection and further performance tuning among the ever-growing zoo of deep neural networks and frameworks, as well as the exploration of new network architectures.
On-The-Fly Deployment of Deep Neural Networks on Heterogeneous Hardware in a Low-Cost Smart Camera
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galan and A. Rodríguez-Vázquez
Conference · International Conference on Distributed Smart Cameras ICDSC 2018
abstract
This demo showcases a low-cost smart camera where different hardware configurations can be selected to perform image recognition on deep neural networks. Both the hardware configuration and the network model can be changed any time on the fly. Up to 24 hardware-model combinations are possible, enabling dynamic reconfiguration according to prescribed application requirements.
Optimum Network/Framework Selection from High-Level Specifications in Embedded Deep Learning Vision Applications
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · Advanced Concepts for Intelligent Vision Systems ACIVS 2018
abstract
This paper benchmarks 16 combinations of popular Deep Neural Networks and Deep Learning frameworks on an embedded platform. A Figure of Merit based on high-level specifications is introduced. By sweeping the relative weight of accuracy, throughput and power consumption on global performance, we demonstrate that only a reduced set of the analyzed combinations must actually be considered for real deployment. We also report the optimum network/framework selection for all possible application scenarios defined in those terms, i.e. weighted balance of the aforementioned parameters. Our approach can be extended to other networks, frameworks and performance parameters, thus supporting system-level design decisions in the ever-changing ecosystem of Deep Learning technology.
Performance Analysis of Real-Time DNN Inference on Raspberry Pi
D. Velasco-Montero, J. Fernández-Berni, R. Carmona-Galán and A. Rodríguez-Vázquez
Conference · SPIE Real-Time Image and Video Processing 2018
abstract
pdf
Deep Neural Networks (DNNs) have emerged as the reference processing architecture for the implementation
of multiple computer vision tasks. They achieve much higher accuracy than traditional algorithms based on
shallow learning. However, it comes at the cost of a substantial increase of computational resources. This
constitutes a challenge for embedded vision systems performing edge inference as opposed to cloud processing.
In such a demanding scenario, several open-source frameworks have been developed, e.g. Cae, OpenCV,
TensorFlow, Theano, Torch or MXNet. All of these tools enable the deployment of various state-of-the-art
DNN models for inference, though each one relies on particular optimization libraries and techniques resulting
in dierent performance behavior. In this paper, we present a comparative study of some of these frameworks
in terms of power consumption, throughput and precision for some of the most popular Convolutional Neural
Networks (CNN) models. The benchmarking system is Raspberry Pi 3 Model B, a low-cost embedded platform
with limited resources. We highlight the advantages and limitations associated with the practical use of the
analyzed frameworks. Some guidelines are provided for suitable selection of a specic tool according to prescribed
application requirements.
Books
Visual Inference for IoT Systems: A Practical Approach
D. Velasco-Montero, J. Fernández-Berni and A. Rodríguez-Vázquez
Book · 145 p, 2022
abstract
link
This book presents a systematic approach to the implementation of Internet of Things (IoT) devices achieving visual inference through deep neural networks. Practical aspects are covered, with a focus on providing guidelines to optimally select hardware and software components as well as network architectures according to prescribed application requirements.
The monograph includes a remarkable set of experimental results and functional procedures supporting the theoretical concepts and methodologies introduced. A case study on animal recognition based on smart camera traps is also presented and thoroughly analyzed. In this case study, different system alternatives are explored and a particular realization is completely developed.
Illustrations, numerous plots from simulations and experiments, and supporting information in the form of charts and tables make Visual Inference and IoT Systems: A Practical Approach a clear and detailed guide to the topic. It will be of interest to researchers, industrial practitioners, and graduate students in the fields of computer vision and IoT.
Book Chapters
No results
Other publications
No results