EMANUELE PRINCIPI

Publications

63 publications, classified as follows:

No. of docs   Classification
30            4 Conference proceedings paper
17            1 Journal article
15            2 Book chapter
1             8 Doctoral thesis

Year  Resource
2019 A Non-Intrusive Load Monitoring Algorithm Based on Non-Uniform Sampling of Power Data and Deep Neural Networks
ENERGIES
Author(s): Fagiani, Marco; Bonfigli, Roberto; Principi, Emanuele; Squartini, Stefano; Mandolini, Luigi
Classification: 1 Journal article
Abstract: Nowadays, measurement systems strongly rely on the Internet of Things paradigm, and typically involve miniaturized devices on purpose. In these devices, the computational resources and signal acquisition rates are limited in order to preserve battery life. In addition, the amount of streamed data is affected by the network capacity strictly related to the transmission protocol constraints and the environmental conditions. All those limitations are in contrast with the need to exploit all possible signal details for the task under study. In the specific application of interest, i.e., Non-Intrusive Load Monitoring (NILM), they could lead to low performance in the energy disaggregation process. To overcome these issues, an ad hoc data reduction policy needs to be adopted, in order to reduce the acquisition and elaboration burden of the device, and, at the same time, to ensure compliance with network bandwidth limits while maintaining a reliable signal representation. Moved by these motivations, an extended evaluation study concerning the application of a data reduction strategy to the aggregate signal is presented in this work. In particular, a non-uniform subsampling (NUS) scheme is defined together with a uniform subsampling (US) strategy and compared, in terms of disaggregation performance, with the use of data at the original sampling (OS) rate. A Deep Learning based technique is used for disaggregation, having as input the aggregate active power signal sampled according to the sampling schemes mentioned above. The approaches are tested on the UK-DALE and REDD datasets, and the combination of US+NUS configurations allows for achieving a good performance in terms of F1-score, even superior to the one obtained with the OS rate, and a remarkable data reduction at the same time.
Publication record: https://iris.univpm.it/handle/11566/266461

2019 Automatic Detection of Cry Sounds in Neonatal Intensive Care Units by Using Deep Learning and Acoustic Scene Simulation
IEEE ACCESS
Author(s): Severini, Marco; Ferretti, Daniele; Principi, Emanuele; Squartini, Stefano
Classification: 1 Journal article
Abstract: Cry detection is an important facility in both residential and public environments, which can answer to different needs of both private and professional users. In this paper, we investigate the problem of cry detection in professional environments, such as Neonatal Intensive Care Units (NICUs). The aim of our work is to propose a cry detection method based on deep neural networks (DNNs) and also to evaluate whether a properly designed synthetic dataset can replace on-field acquired data for training the DNN-based cry detector. In this way, a massive data collection campaign in NICUs can be avoided, and the cry detector can be easily retargeted to different NICUs. The paper presents different solutions based on single-channel and multi-channel DNNs. The experimental evaluation is conducted on the synthetic dataset created by simulating the acoustic scene of a real NICU, and on a real dataset containing audio acquired on the same NICU. The evaluation revealed that using real data in the training phase allows achieving the overall highest performance, with an Area Under Precision-Recall Curve (PRC-AUC) equal to 87.28%, when signals are processed with a beamformer and a post-filter and a single-channel DNN is used. The same method, however, reduces the performance to 70.61% when training is performed on the synthetic dataset. On the contrary, under the same conditions, the new single-channel architecture introduced in this paper achieves the highest performance with a PRC-AUC equal to 80.48%, thus proving that the acoustic scene simulation strategy can be used to train a cry detection method with positive results.
Publication record: https://iris.univpm.it/handle/11566/266460

2019 Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation
EXPERT SYSTEMS WITH APPLICATIONS
Author(s): Vecchiotti, P.; Pepe, G.; Principi, E.; Squartini, S.
Classification: 1 Journal article
Abstract: The task of Speaker LOCalization (SLOC) has been the focus of numerous works in the research field, where SLOC is performed on pure speech data, requiring the presence of an Oracle Voice Activity Detection (VAD) algorithm. Nevertheless, this perfect working condition is not satisfied in a real-world scenario, where employed VADs do commit errors. This work addresses this issue with an extensive analysis focusing on the relationship between several data-driven VAD and SLOC models, finally proposing a reliable framework for VAD and SLOC. The effectiveness of the approach discussed here is assessed against a multi-room scenario, which is close to a real-world environment. Furthermore, to the best of the authors’ knowledge, only one contribution proposes a unique framework for VAD and SLOC acting in this addressed scenario; however, this solution does not rely on data-driven approaches. This work comes as an extension of the authors’ previous research addressing the VAD and SLOC tasks, by proposing numerous advancements to the original neural network architectures. In detail, four different models based on convolutional neural networks (CNNs) are tested here, in order to easily highlight the advantages of the introduced novelties. In addition, two different CNN models are studied for SLOC. Furthermore, training of data-driven models is improved here through a specific data augmentation technique. During this procedure, the room impulse responses (RIRs) of two virtual rooms are generated from the knowledge of the room size, reverberation time and microphones and sources placement. Finally, the only other framework for simultaneous detection and localization in a multi-room scenario is taken into account here to fairly compare the proposed method. As a result, the proposed method is more accurate than the baseline framework, and remarkable improvements are especially observed when the data augmentation techniques are applied for both the VAD and SLOC tasks.
Publication record: https://iris.univpm.it/handle/11566/267205

2018 Deep Learning for Timbre Modification and Transfer: An Evaluation Study
Audio Engineering Society Convention 144
Author(s): Gabrielli, Leonardo; Cella, Carmine Emanuel; Vesperini, Fabio; Droghini, Diego; Principi, Emanuele; Squartini, Stefano
Classification: 4 Conference proceedings paper
Abstract: In the past years, several hybridization techniques have been proposed to synthesize novel audio content deriving its properties from two audio sources. These algorithms, however, usually provide no feature learning, leaving the user, often intentionally, exploring parameters by trial-and-error. The introduction of machine learning algorithms in the music processing field calls for an investigation into the possible exploitation of their properties, such as the ability to learn semantically meaningful features. In this first work we adopt a Neural Network Autoencoder architecture, and we enhance it to exploit temporal dependencies. In our experiments the architecture was able to modify the original timbre, resembling what it learned during the training phase, while preserving the pitch envelope from the input.
Publication record: https://iris.univpm.it/handle/11566/258785

2017 Energy management with support of PV partial shading modelling in Micro Grid environments
ENERGIES
Author(s): Severini, Marco; Principi, Emanuele; Fagiani, Marco; Squartini, Stefano; Piazza, Francesco
Classification: 1 Journal article
Abstract: Although photovoltaic power plants are suitable local energy sources in Micro Grid environments, when large plants are involved, partial shading and inaccurate modelling of the plant can affect both the design of the Micro Grid as well as the energy management process that allows for lowering the overall Micro Grid demand towards the main grid. To investigate the issue, a Photovoltaic Plant simulation model, based on a real life power plant, and an energy management system, based on a real life Micro Grid environment, have been integrated to evaluate the performance of a Micro Grid under partial shading conditions. Using a baseline energy production model as a reference, the energy demand of the Micro Grid has been computed in sunny and partial shading conditions. The experiments reveal that an estimation based on a simplified PV model can exceed by 65% the actual production. With regards to Micro Grid design, on sunny days, the expected costs, based on a simplified PV model, can be 5.5% lower than the cost based on the double inverter model. In single cloud scenarios, the underrating can reach 28.3%. With regard to the management process, if the energy yield is estimated by means of a simplified PV model, the actual cost can be from 17.1% to 21.5% higher than the theoretical cost expected at design time.
Publication record: https://iris.univpm.it/handle/11566/249946

2017 User-aided Footprint Extraction for Appliance Modelling in Non-Intrusive Load Monitoring
SSCI 2016, Proceedings on
Author(s): Bonfigli, Roberto; Principi, Emanuele; Squartini, Stefano; Fagiani, Marco; Severini, Marco; Piazza, Francesco
Classification: 4 Conference proceedings paper
Abstract: In the area of Non-Intrusive Load Monitoring (NILM), many approaches need a supervised procedure of appliance modelling, in order to provide information about the appliances to the disaggregation algorithm and to obtain the disaggregated consumption related to each one of them. In many approaches, the appliance modelling relies on the consumption footprint, which is a typical working cycle of the appliance. Since the NILM system has only the aggregated power consumption available, the recorded footprint might be corrupted by other appliances that cannot be turned off during this period, e.g., the fridge and freezer in the household. Furthermore, the user needs a facilitated procedure in order to obtain a clean footprint from the aggregated power signal in a real scenario. Therefore, a user-aided footprint extraction procedure is needed. In this work, this procedure is defined as a NILM problem with two sources, i.e., the desired appliance and the fridge-freezer combination. One of the resulting disaggregated profiles of the algorithm corresponds to the extracted footprint. Then, this is used for the appliance modelling stage to create the corresponding Hidden Markov Model (HMM), suitable for the Additive Factorial Approximate Maximum a Posteriori (AFAMAP) algorithm. The effectiveness of the footprint extraction procedure is evaluated through the confidence of the disaggregation output of a real problem, using 30 days of data taken from two different datasets (AMPds, ECO). The experiments are conducted using the HMM from the extracted footprint, compared to the confidence of the same problem using the HMM from the true footprint, as appliance level consumption. The results show that the performance is comparable, with the worst relative F1 loss of 3.83%, demonstrating the effectiveness of the footprint extraction procedure.
Publication record: https://iris.univpm.it/handle/11566/240732

2017 Localizing speakers in multiple rooms by using Deep Neural Networks
COMPUTER SPEECH AND LANGUAGE
Author(s): Vesperini, Fabio; Vecchiotti, Paolo; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco
Classification: 1 Journal article
Abstract: In the field of human speech capturing systems, a fundamental role is played by the source localization algorithms. In this paper a Speaker Localization algorithm (SLOC) based on Deep Neural Networks (DNN) is evaluated and compared with state-of-the-art approaches. The speaker position in the room under analysis is directly determined by the DNN, leading the proposed algorithm to be fully data-driven. Two different neural network architectures are investigated: the Multi Layer Perceptron (MLP) and Convolutional Neural Networks (CNN). GCC-PHAT (Generalized Cross Correlation-PHAse Transform) Patterns, computed from the audio signals captured by the microphones, are used as input features for the DNN. In particular, a multi-room case study is dealt with, where the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested by means of the home recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In detail, the focus is on the speaker localization task in two distinct neighboring rooms. As a term of comparison, two algorithms proposed in literature for the addressed applicative context are evaluated, the Crosspower Spectrum Phase Speaker Localization (CSP-SLOC) and the Steered Response Power using the Phase Transform speaker localization (SRP-SLOC). Besides providing an extensive analysis of the proposed method, the article shows how the DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error, expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm, respectively, for the Simulated and the Real subsets.
Publication record: https://iris.univpm.it/handle/11566/252452
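
As an illustration of the GCC-PHAT features named in the abstract above, the following is a minimal Python sketch of the textbook GCC-PHAT computation for a single microphone pair. It is not the authors' feature pipeline: the function name `gcc_phat`, the `max_lag` parameter, and the synthetic test signals are assumptions introduced here for illustration only.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """Textbook GCC-PHAT between two signals (illustrative sketch).

    Returns the cross-correlation restricted to lags -max_lag..max_lag
    and the lag (in samples) of its peak. A positive lag means that
    x is delayed with respect to y.
    """
    n = len(x) + len(y)  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = int(np.argmax(cc)) - max_lag
    return cc, lag


# Hypothetical usage: x is a copy of y delayed by 3 samples.
rng = np.random.default_rng(0)
y = rng.standard_normal(64)
x = np.concatenate((np.zeros(3), y))
pattern, delay = gcc_phat(x, y, max_lag=10)
print(delay)
```

In a localization setting, one such correlation vector would typically be computed per microphone pair and per frame; how the vectors are arranged into network inputs is specific to the paper and is not reproduced here.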

2017 Acoustic novelty detection with adversarial autoencoders
Proceedings of the International Joint Conference on Neural Networks
Author(s): Principi, Emanuele; Vesperini, Fabio; Squartini, Stefano; Piazza, Francesco
Publisher: Institute of Electrical and Electronics Engineers Inc.
Classification: 4 Conference proceedings paper
Abstract: Novelty detection is the task of recognising events that differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of 'normal' acoustic signals, is employed to detect whether a segment contains an abnormal event or not. A novelty is detected if the Euclidean distance between the input and the output of the autoencoder exceeds a certain threshold. The innovative contribution of the proposed approach resides in the training procedure of the autoencoder network: instead of using the conventional training procedure that minimises only the Minimum Mean Squared Error loss function, here we adopt an adversarial strategy, where a discriminator network is trained to distinguish between the output of the autoencoder and data sampled from the training corpus. The autoencoder, then, is trained also by using the binary cross-entropy loss calculated at the output of the discriminator network. The performance of the algorithm has been assessed on a corpus derived from the PASCAL CHiME dataset. The results showed that the proposed approach provides a relative performance improvement equal to 0.26% compared to the standard autoencoder. The significance of the improvement has been evaluated with a one-tailed z-test and was found to be significant with p < 0.001. The presented approach thus showed promising results on this task and it could be extended as a general training strategy for autoencoders if confirmed by additional experiments.
Publication record: https://iris.univpm.it/handle/11566/252454
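
The decision rule stated in the abstract above (a novelty is flagged when the Euclidean distance between the autoencoder input and output exceeds a threshold) can be sketched in a few lines of Python. This is only an illustration of that rule under stated assumptions: the function name `detect_novelty`, the array layout (one feature vector per row), and the threshold value are hypothetical, and the reconstructions are supplied directly rather than produced by a trained network.

```python
import numpy as np


def detect_novelty(inputs, reconstructions, threshold):
    """Flag rows whose reconstruction error exceeds a threshold.

    inputs, reconstructions: arrays of shape (n_segments, n_features).
    Returns a boolean array of novelty flags and the per-segment
    Euclidean distances.
    """
    inputs = np.asarray(inputs, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    distances = np.linalg.norm(inputs - reconstructions, axis=1)
    return distances > threshold, distances


# Hypothetical usage: the second segment is poorly reconstructed.
features = [[0.0, 0.0], [3.0, 4.0]]
decoded = [[0.0, 0.0], [0.0, 0.0]]
flags, errors = detect_novelty(features, decoded, threshold=1.0)
print(flags, errors)
```

How the threshold is chosen and how the autoencoder is trained (including the adversarial strategy) are left out of this sketch.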

2017 Non-intrusive load monitoring by using active and reactive power in additive Factorial Hidden Markov Models
APPLIED ENERGY
Author(s): Bonfigli, Roberto; Principi, Emanuele; Fagiani, Marco; Severini, Marco; Squartini, Stefano; Piazza, Francesco
Classification: 1 Journal article
Abstract: Non-intrusive load monitoring (NILM) is the task of determining the appliances' individual contributions to the aggregate power consumption by using a set of electrical parameters measured at a single metering point. NILM makes it possible to provide detailed consumption information to users, which induces them to modify their habits towards a wiser use of electrical energy. This paper proposes a NILM algorithm based on the joint use of active and reactive power in the Additive Factorial Hidden Markov Models framework. In particular, in the proposed approach, the appliance model is represented by a bivariate Hidden Markov Model whose emitted symbols are the joint active-reactive power signals. The disaggregation is performed by means of an alternative formulation of the Additive Factorial Approximate Maximum a Posteriori (AFAMAP) algorithm for dealing with the bivariate HMM models. The proposed solution has been compared to the original AFAMAP algorithm based on the active power only and to the seminal approach proposed by Hart (1992), which is based on finite state machine appliance models and employs both the active and reactive power. Hart's algorithm has been improved for handling the occurrence of multiple solutions by means of a Maximum A Posteriori technique (MAP). The experiments have been conducted on the AMPds dataset in noised and denoised conditions and the performance evaluated by using the F1-Measure and the normalized disaggregation metrics. In terms of F1-Measure, the results showed that the proposed approach outperforms AFAMAP, Hart's algorithm, and Hart's with MAP respectively by +14.9%, +21.8%, and +2.5% in the 6 appliances denoised case study. In the 6 appliances noised case study, the relative performance improvement is +25.5%, +51.1%, and +6.7%.
Publication record: https://iris.univpm.it/handle/11566/252450

2017 Denoising autoencoders for Non-Intrusive Load Monitoring: Improvements and comparative evaluation
ENERGY AND BUILDINGS
Author(s): Bonfigli, Roberto; Felicetti, Andrea; Principi, Emanuele; Fagiani, Marco; Squartini, Stefano; Piazza, Francesco
Classification: 1 Journal article
Abstract: Non-Intrusive Load Monitoring (NILM) is the task of determining the appliances' individual contributions to the aggregate power consumption by using a set of electrical parameters measured at a single metering point. NILM makes it possible to provide detailed consumption information to users, which induces them to modify their habits towards a wiser use of electrical energy. This paper proposes a NILM algorithm based on Deep Neural Networks. In particular, the NILM task is treated as a noise reduction problem addressed by using a denoising autoencoder (dAE) architecture, i.e., a neural network trained to reconstruct a signal from its noisy version. This architecture was initially proposed by Kelly and Knottenbelt (2015), and here it is extended and improved by conducting a detailed study on the topology of the network, and by intelligently recombining the disaggregated output with a median filter. An additional contribution of this paper is an exhaustive comparative evaluation conducted with respect to one of the reference works in the field of Hidden Markov Models (HMM) for NILM, i.e., the Additive Factorial Approximate Maximum a Posteriori (AFAMAP) algorithm. The experiments have been conducted on the AMPds, UK-DALE, and REDD datasets in seen and unseen scenarios, both in the presence and in the absence of noise. In order to be able to evaluate AFAMAP in the presence of noise, an HMM model representing the noise contribution has been introduced. The results showed that the dAE approach outperforms the AFAMAP algorithm in both seen and unseen conditions, and that it exhibits a significant robustness in the presence of noise.
Publication record: https://iris.univpm.it/handle/11566/252451

2017 A neural network approach for sound event detection in real life audio
Signal Processing Conference (EUSIPCO), 2017 25th European
Author(s): Valenti, Michele; Tonelli, Dario; Vesperini, Fabio; Principi, Emanuele; Squartini, Stefano
Classification: 4 Conference proceedings paper
Abstract: This paper presents and compares two algorithms based on artificial neural networks (ANNs) for sound event detection in real life audio. Both systems have been developed and evaluated with the material provided for the third task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. For the first algorithm, we make use of an ANN trained on different features extracted from the down-mixed mono channel audio. Secondly, we analyse a binaural algorithm where the same feature extraction is performed on four different channels: the two binaural channels, the averaged monaural signal and the difference between the binaural channels. The proposed feature set comprehends, along with mel-frequency cepstral coefficients and log-mel energies, also activity information extracted with two different voice activity detection (VAD) algorithms. Moreover, we will present results obtained with two different neural architectures, namely multi-layer perceptrons (MLPs) and recurrent neural networks. The highest scores obtained on the DCASE 2016 evaluation dataset are achieved by a MLP trained on binaural features and adaptive energy VAD; they consist of an averaged error rate of 0.79 and an averaged F1 score of 48.1%, thus marking an improvement over the best score registered in the DCASE 2016 challenge.
Publication record: https://iris.univpm.it/handle/11566/252459

2017 Convolutional Neural Networks with 3-D Kernels for Voice Activity Detection in a Multiroom Environment
Multidisciplinary Approaches to Neural Computing
Author(s): Vecchiotti, P.; Vesperini, F.; Principi, E.; Squartini, S.; Piazza, F.
Publisher: Springer, Cham
Classification: 2 Book chapter
Abstract: This paper focuses on employing Convolutional Neural Networks (CNN) with 3-D kernels for Voice Activity Detectors in multi-room domestic scenarios (mVAD). This technology is compared with the Multi Layer Perceptron (MLP) and interesting advancements are observed with respect to previous works of the authors. In order to approximate real-life scenarios, the DIRHA dataset is exploited. It has been recorded in a home environment by means of several microphones arranged in various rooms. Our study is composed of a multi-stage analysis focusing on the selection of the network size and the input microphones in relation to their number and position. Results are evaluated in terms of Speech Activity Detection error rate (SAD). The CNN-mVAD outperforms the other method with a significant solidity in terms of performance statistics, achieving in the best overall case a SAD equal to 7.0%.
Publication record: https://iris.univpm.it/handle/11566/241540

2017 A Combined One-Class SVM and Template-Matching Approach for User-Aided Human Fall Detection by Means of Floor Acoustic Features
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE
Author(s): Droghini, Diego; Ferretti, Daniele; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco
Classification: 1 Journal article
Abstract: The primary cause of injury-related death for the elderly is represented by falls. The scientific community has devoted particular attention to them, since injuries can be limited by an early detection of the event. The solution proposed in this paper is based on a combined One-Class SVM (OCSVM) and template-matching classifier that discriminates human falls from nonfalls in a semisupervised framework. Acoustic signals are captured by means of a Floor Acoustic Sensor; then Mel-Frequency Cepstral Coefficients and Gaussian Mean Supervectors (GMSs) are extracted for the fall/nonfall discrimination. Here we propose a single-sensor two-stage user-aided approach: in the first stage, the OCSVM detects abnormal acoustic events. In the second, the template-matching classifier produces the final decision exploiting a set of template GMSs related to the events marked as false positives by the user. The performance of the algorithm has been evaluated on a corpus containing human falls and nonfall sounds. Compared to the OCSVM only approach, the proposed algorithm improves the performance by 10.14% in clean conditions and 4.84% in noisy conditions. Compared to Popescu and Mahnot (2009) the performance improvement is 19.96% in clean conditions and 8.08% in noisy conditions.
Publication record: https://iris.univpm.it/handle/11566/249952

2017 Human Fall Detection by Using an Innovative Floor Acoustic Sensor
Multidisciplinary Approaches to Neural Computing
Author(s): Droghini, D.; Principi, E.; Squartini, S.; Piazza, F.
Publisher: Springer, Cham
Classification: 2 Book chapter
Abstract: Supporting people in their homes is an important issue both for ethical and practical reasons. Indeed, in recent years, the scientific community has devoted particular attention to detecting human falls, since the consequences of a fall are the leading cause of death for elderly people. In this paper, we propose a human fall classification system based on an innovative floor acoustic sensor able to capture the acoustic waves transmitted through the floor. The algorithm employed is able to discriminate human falls from non falls and it is based on Mel-Frequency Cepstral Coefficients and a two class Support Vector Machine. The dataset employed for performance evaluation is composed of falls of a human mimicking doll, everyday objects and everyday noises. The obtained results show that the proposed solution is suitable for human fall detection in realistic scenarios, guaranteeing a 0% miss probability at very low false positive rates.
Publication record: https://iris.univpm.it/handle/11566/241539

2016 Acoustic cues from the floor: A new approach for fall classification
EXPERT SYSTEMS WITH APPLICATIONS
Author(s): Principi, Emanuele; Droghini, Diego; Squartini, Stefano; Olivetti, Paolo; Piazza, Francesco
Classification: 1 Journal article
Abstract: The interest in assistive technologies for supporting people at home is constantly increasing, both in academia and industry. In this context, the authors propose a fall classification system based on an innovative acoustic sensor that operates similarly to stethoscopes and captures the acoustic waves transmitted through the floor. The sensor is designed to minimize the impact of aerial sounds in recordings, thus allowing a more focused acoustic description of fall events. The audio signals acquired by means of the sensor are processed by a fall recognition algorithm based on Mel-Frequency Cepstral Coefficients, Supervectors and Support Vector Machines to discriminate among different types of fall events. The performance of the algorithm has been evaluated against a specific audio corpus comprising falls of a human mimicking doll and of everyday objects. The results showed that the floor sensor significantly improves the performance with respect to an aerial microphone: in particular, the F1-Measure is 6.50% higher in clean conditions and 8.76% higher in mismatched noisy conditions. The proposed approach, thus, has a considerable advantage over aerial solutions since it is able to achieve higher fall classification performance using a simpler algorithmic pipeline and hardware setup.
Publication record: https://iris.univpm.it/handle/11566/236004

2016 Deep neural networks for Multi-Room Voice Activity Detection: Advancements and comparative evaluation
Neural Networks (IJCNN), 2016 International Joint Conference on
Author(s): Vesperini, Fabio; Vecchiotti, Paolo; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco
Publisher: IEEE
Classification: 4 Conference proceedings paper
Abstract: This paper focuses on Voice Activity Detectors (VAD) for multi-room domestic scenarios based on deep neural network architectures. Interesting advancements are observed with respect to a previous work. A comparative and extensive analysis is conducted among four different neural networks (NN). In particular, we exploit Deep Belief Network (DBN), Multi-Layer Perceptron (MLP), Bidirectional Long Short-Term Memory recurrent neural network (BLSTM) and Convolutional Neural Network (CNN). The latter has recently encountered a large success in the computational audio processing field and it has been successfully employed in our task. Two home recorded datasets are used in order to approximate real-life scenarios. They contain audio files from several microphones arranged in various rooms, from which six features are extracted and used as input for the deep neural classifiers. The output stage has been redesigned compared to the authors' previous contribution, in order to take advantage of the networks' discriminative ability. Our study is composed of a multi-stage analysis focusing on the selection of the features, the network size and the input microphones. Results are evaluated in terms of Speech Activity Detection error rate (SAD). As a result, a best SAD equal to 5.8% and 2.6% is reached respectively in the two considered datasets. In addition, significant robustness with respect to microphone positioning is observed in the case of CNN.
Publication record: https://iris.univpm.it/handle/11566/239799

2016 An experimental study on new features for activity of daily living recognition
Neural Networks (IJCNN), 2016 International Joint Conference on
Author(s): Ferretti, Daniele; Principi, Emanuele; Squartini, Stefano; Mandolini, Luigi
Classification: 4 Conference proceedings paper
Abstract: In the last few years, researchers have devoted considerable effort to developing advanced systems for activity of daily living (ADL) recognition in diverse applicative contexts, such as home automation and ambient assisted living. Some of these need to know in real time the actions performed by a user, and this involves a number of additional issues to be taken into account during the recognition. In this paper, we present some improvements of a sliding window based approach to perform ADL recognition in an online fashion, i.e., recognizing activities as and when new sensor events are recorded. We describe seven methods used to extract features from the sequence of sensor events. The first four relate to previous works regarding the system of ADL recognition described, while the last three represent the original contribution of this work. Support Vector Machine (SVM) has been used as classifier. Several experiments have been carried out by using a public smart home dataset and the obtained results show that two of the three novel approaches improve the recognition performance of the conventional methods, up to an increment of 5% with respect to the baseline feature extraction approach.
Publication record: https://iris.univpm.it/handle/11566/239796

2016 A neural network based algorithm for speaker localization in a multi-room environment
Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on
Author(s): Vesperini, Fabio; Vecchiotti, Paolo; Principi, Emanuele; Squartini, Stefano; Piazza, Francesco
Publisher: IEEE
Classification: 4 Conference proceedings paper
Abstract: A Speaker Localization algorithm based on Neural Networks for multi-room domestic scenarios is proposed in this paper. The approach is fully data-driven and employs a Neural Network fed by GCC-PHAT (Generalized Cross Correlation Phase Transform) Patterns, calculated by means of the microphone signals, to determine the speaker position in the room under analysis. In particular, we deal with a multi-room case study, in which the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is tested against the home recorded DIRHA dataset, characterized by multiple wall and ceiling microphone signals for each room. In particular, we focused on the speaker localization problem in two distinct neighbouring rooms. We assumed the presence of an Oracle multi-room Voice Activity Detector (VAD) in our experiments. A three-stage optimization procedure has been adopted to find the best network configuration and GCC-PHAT Patterns combination. Moreover, an algorithm based on Time Difference of Arrival (TDOA), recently proposed in literature for the addressed applicative context, has been considered as term of comparison. As result, the proposed algorithm outperforms the reference one, providing an average localization error, expressed in terms of RMSE, equal to 525 mm against 1465 mm. Concluding, we also assessed the algorithm performance when a real VAD, recently proposed by some of the authors, is used. Even though a degradation of localization capability is registered (an average RMSE equal to 770 mm), still a remarkable improvement with respect to the state of the art performance is obtained.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/239800 Collegamento a IRIS

2016 ELM Based Algorithms for Acoustic Template Matching in Home Automation Scenarios: Advancements and Performance Analysis
Recent Advances in Nonlinear Speech Processing
Author(s): Della Porta, G.; Principi, E.; Ferroni, G.; Squartini, S.; Hussain, A.; Piazza, F.
Classification: 2 Book Chapter
Abstract: Speech and sound recognition in home automation scenarios has been gaining increasing interest in the last decade. One interesting approach addressed in the literature is based on the template matching paradigm, which is characterized by ease of implementation and independence from large datasets for system training. Moving from a recent contribution of some of the authors, where an Extreme Learning Machine algorithm was proposed and evaluated, a wider performance analysis in diverse operating conditions is provided here, together with some relevant improvements. These are enabled by the use of supervector features as input, used with ELMs for the first time, to the best of the authors’ knowledge. As already verified in other application contexts and with different learning systems, this ensures a more robust characterization of the speech segment to be classified, also in the presence of mismatch between training and testing data. The computer simulations carried out confirm the effectiveness of the approach, with F1-Measure performance up to 99% in the multicondition case, and a computational time reduction factor close to 4 with respect to the SVM counterpart.
Publication record: https://iris.univpm.it/handle/11566/230583

2015 A floor acoustic sensor for fall classification
138th Audio Engineering Society Convention 2015
Author(s): Principi, E.; Olivetti, P.; Squartini, S.; Bonfigli, R.; Piazza, F.
Classification: 4 Conference Proceedings Contribution
Abstract: The interest in assistive technologies for supporting people at home is constantly increasing, both in academia and industry. In this context, the authors propose a fall classification system based on an innovative acoustic sensor that operates similarly to stethoscopes and captures the acoustic waves transmitted through the floor. The sensor is designed to minimize the impact of aerial sounds in recordings, thus allowing a more focused acoustic description of fall events. In this preliminary work, the audio signals acquired by means of the sensor are processed by a fall recognition algorithm based on Mel-Frequency Cepstral Coefficients, Supervectors and Support Vector Machines, to discriminate among different types of fall events. The performance of the algorithm has been evaluated against a specific audio corpus comprising falls of persons and of common objects. The results show the effectiveness of the approach.
Publication record: https://iris.univpm.it/handle/11566/230581

2015 Acoustic Template-Matching for Automatic Emergency State Detection: an ELM based algorithm
NEUROCOMPUTING
Author(s): E. Principi; S. Squartini; E. Cambria; F. Piazza
Classification: 1 Journal Article
Abstract: Extreme Learning Machine (ELM) represents a popular paradigm for training feedforward neural networks due to its fast learning time. This paper applies the technique to the automatic classification of speech utterances. Power Normalized Cepstral Coefficients (PNCC) are employed as feature vectors and ELM performs the final classification. Both the baseline ELM algorithm and ELM with kernel have been employed and tested. Due to the fixed number of input neurons in the ELM, a length normalization algorithm is employed to transform the PNCC sequence into a vector of fixed length. Length normalization has been performed using two techniques: the first is based on Dynamic Time Warping (DTW) distances, the second on the vectorized outer product of the trajectory matrix. Experiments have been conducted on the TIDIGITS corpus, to assess the performance on an isolated speech recognition task, and on ITAAL, to validate the system in an emergency detection task in realistic acoustic conditions. The ELM approach has been compared to template matching based on Dynamic Time Warping and to a Support Vector Machine based speech recognizer. The obtained results demonstrated the effectiveness of the approach in terms of both recognition performance and execution times. In particular, classification based on PNCCs, DTW distances and ELM kernel resulted in the best performing algorithm in terms of both recognition accuracy and execution times.
Publication record: https://iris.univpm.it/handle/11566/153902

2015 Signer Independent Isolated Italian Sign Recognition Based on Hidden Markov Models
PATTERN ANALYSIS AND APPLICATIONS
Author(s): M. Fagiani; E. Principi; S. Squartini; F. Piazza
Classification: 1 Journal Article
Abstract: Sign languages represent the most natural way to communicate for deaf and hard-of-hearing people. However, there are often barriers between people using this kind of language and hearing people, who typically express themselves by means of oral languages. In order to facilitate social inclusiveness in everyday life for deaf minorities, technology can play an important role. Indeed, many attempts have recently been made by the scientific community to develop automatic translation tools. Unfortunately, not many solutions are actually available for the Italian Sign Language (Lingua Italiana dei Segni - LIS) case study, especially for what concerns the recognition task. In this paper the authors address this gap, in particular the signer-independent case study, i.e., when the signers in the testing set are not included in the training set. From this perspective, the proposed algorithm represents the first real attempt in the LIS case. The automatic recognizer is based on Hidden Markov Models (HMMs) and video features have been extracted by using the OpenCV open source library. The effectiveness of the HMM system is validated by a comparative evaluation with a Support Vector Machine approach. The video material used to train the recognizer and test its performance consists of a database that the authors have purposely created by involving ten signers and 147 isolated-sign videos for each signer. The database is publicly available. Computer simulations have shown the effectiveness of the adopted methodology, with recognition accuracies comparable to those obtained by the automatic tools developed for other sign languages.
Publication record: https://iris.univpm.it/handle/11566/179303

2015 An integrated system for voice command recognition and emergency detection based on audio signals
EXPERT SYSTEMS WITH APPLICATIONS
Author(s): Principi, E.; Squartini, S.; Ferroni, G.; Bonfigli, R.; Piazza, F.
Classification: 1 Journal Article
Abstract: The recent reports on population ageing in the most advanced countries are driving governments and the scientific community to focus on technologies for providing assistance to people in their own homes. Particular attention has been devoted to solutions based on acoustic signals since they provide a convenient way to monitor people's activities and they enable hands-free human–machine interfaces. In this context, this paper presents a complete solution for voice command recognition and emergency detection based on audio signals entirely integrated in a low-consuming embedded platform. The system combines an active operation mode, where distress calls are captured and a vocal interface is enabled for controlling the home automation subsystem, and a pro-active mode, where a novelty detection algorithm detects abnormal acoustic events to alert the user of a possible emergency. In the first operation mode, a Voice Activity Detector captures voice segments of the audio signal, and a speech recogniser detects commands and distress calls. In the pro-active mode, an acoustic novelty detector is employed in order to be able to deal with unknown sounds, thus not requiring an explicit modelling of emergency sounds. In addition, the system integrates a VoIP infrastructure so that emergencies can be communicated to relatives or care centres. The monitoring unit is equipped with multiple microphones and it is connected to the home local area network to communicate with the home automation subsystem. The algorithms have been implemented in a low-consuming embedded platform based on an ARM Cortex-A8 CPU. The effectiveness of the adopted algorithms has been tested on two different databases: ITAAL and A3Novelty. The obtained results show that the adopted solutions are suitable for speech and audio event monitoring in a realistic scenario.
Publication record: https://iris.univpm.it/handle/11566/224934

2014 BAR LIS: a web application for Italian Sign Language based interaction
AMBIENT ASSISTED LIVING
Author(s): Luca Nardi; Matteo Rubini; Stefano Squartini; Emanuele Principi; Francesco Piazza
Publisher: Springer Verlag
Classification: 2 Book Chapter
Abstract: This work presents the development and testing of BAR LIS (BAR in Italian Sign Language), a web application created in collaboration with the Ancona division of Ente Nazionale Sordi (ENS) and presented during the X Masters Awards 2012 event in Senigallia, Italy. BAR LIS was structured as a small dictionary of words and related signs in LIS, Italian Sign Language (presented as 3D rendered animations), in order to ease communication between people attending the X Masters event and the hearing-impaired personnel of the ENS stall, which included a small coffee shop. A more extensive set of words was later created and tested with ENS members in order to study the impact of the resolution of 3D models on the comprehensibility and quality of the signs.
Publication record: https://iris.univpm.it/handle/11566/127262

2014 Improving the performance of a in-home acoustic monitoring system by integrating a vocal effort classification algorithm
Proceedings of AES 136th Convention
Author(s): E. Principi; S. Squartini; F. Piazza
Classification: 4 Conference Proceedings Contribution
Publication record: https://iris.univpm.it/handle/11566/153910

2014 A Real-Time Implementation of an Acoustic Novelty Detector on the BeagleBoard-xM
Proceedings of EDERC2014
Author(s): R. Bonfigli; G. Ferroni; E. Principi; S. Squartini; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: Novelty detection consists in recognising events that deviate from normality. This paper presents the implementation of a real-time statistical novelty detector on the BeagleBoard-xM. The application processes an incoming audio signal, extracts Power Normalized Cepstral Coefficients and determines whether a novelty sound is present or not based on a statistical model of normality. The novelty detector has been implemented as a standalone graphical application capable of running in real-time on the BeagleBoard-xM platform. Experiments have been conducted to assess the performance of the solution in terms of both detection performance and of real-time capabilities. The results demonstrate that the system is able to operate in real-time on the BeagleBoard-xM with a real-time factor equal to 8.10%, and an F-Measure equal to 77.41%.
Publication record: https://iris.univpm.it/handle/11566/179903

2014 Power Normalized Cepstral Coefficients based supervectors and i-vectors for small vocabulary speech recognition
Proceedings of IJCNN2014
Author(s): E. Principi; S. Squartini; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: Template-matching and discriminative techniques, like support vector machines (SVMs), have been widely used for automatic speech recognition. Both methods require that varying-length sequences are mapped to vectors of fixed length: in template-matching, the problem is solved by means of dynamic time warping (DTW), while in SVMs it is solved with dynamic kernels. The supervector and i-vector paradigms seem to represent a valid solution to such a problem when SVMs are employed for classification. In this work, Gaussian mean supervectors (GMS), Gaussian posterior probability supervectors (GPPS) and i-vectors are evaluated as features both for template-matching and for SVM-based speech recognition in a comparative fashion. All these features are based on Power Normalized Cepstral Coefficients (PNCCs) directly extracted from speech utterances. The different methods are assessed in small vocabulary speech recognition tasks using two distinct corpora, and they have been compared to DTW, dynamic time alignment kernel (DTAK), outer product of the trajectory matrix, and PocketSphinx as further recognition techniques. Experimental results showed the appropriateness of the supervector and i-vector based solutions with respect to the other state-of-the-art techniques addressed here.
Publication record: https://iris.univpm.it/handle/11566/153906

2014 Advanced Integration of Multimedia Assistive Technologies: a prospective outlook
Proceedings of MESA 2014
Author(s): D. Liciotti; G. Ferroni; E. Frontoni; S. Squartini; E. Principi; R. Bonfigli; P. Zingaretti; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: In recent years several studies on population ageing in the most advanced countries have argued that the share of people older than 65 years is steadily increasing. In order to tackle this phenomenon, a significant effort has been devoted to the development of advanced technologies for supervising domestic environments and their inhabitants to provide them assistance in their own home. In this context, the present paper aims to delineate a novel, highly-integrated system for advanced analysis of human behaviours. It is based on the fusion of the audio and vision frameworks, developed at the Multimedia Assistive Technology Laboratory (MATeLab) of the Università Politecnica delle Marche, in order to operate in the ambient assisted living context exploiting audio-visual domain features. The existing video framework exploits vertical RGB-D sensors for people tracking, interaction analysis and detection of users' activities in domestic scenarios. The depth information has been used to remove the effect of the appearance variation and to evaluate users' activities inside the home and in front of the fixtures. In addition, group interactions are monitored and analysed. On the other side, the audio framework recognises voice commands by continuously monitoring the acoustic home environment. In addition, a hands-free communication to a relative or to a healthcare centre is automatically triggered when a distress call is detected. Echo and interference cancellation algorithms guarantee high-quality communication and reliable speech recognition, respectively. The system we intend to delineate, thus, exploits multi-domain information, gathered from both the audio and video frameworks, and stores it in a remote cloud for instant processing and analysis of the scene. Related actions are consequently performed.
Publication record: https://iris.univpm.it/handle/11566/179906

2014 Neural Networks Based Methods for Voice Activity Detection in a Multi-room Domestic Environment
Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014 & the Fourth International Workshop EVALITA 2014
Author(s): G. Ferroni; R. Bonfigli; E. Principi; S. Squartini; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: A plethora of Voice or Speaker Activity Detection systems exist in the literature. They are indeed a fundamental part of complex systems that deal with speech processing. In this work the authors exploit a neural network based VAD to address speaker activity detection in a multi-room domestic scenario. The goal is to detect the voice activity in each of the two target rooms in the presence of other sounds and speech occurring in other rooms and outside. A large dataset recorded in a smart home is provided and the results obtained are acceptable.
Publication record: https://iris.univpm.it/handle/11566/205118

2014 Riconoscimento Automatico di Richieste di Aiuto e Comandi di Domotica per Ambient Assisted Living
X Convegno Nazionale dell'Associazione Italiana di Scienze della Voce
Author(s): E. Principi; S. Squartini; F. Piazza
Classification: 4 Conference Proceedings Contribution
Publication record: https://iris.univpm.it/handle/11566/205117

2013 A distributed system for recognizing home automation commands and distress calls in the Italian language
Proceedings of Interspeech 2013
Author(s): E. Principi; S. Squartini; F. Piazza; D. Fuselli; M. Bonifazi
Classification: 4 Conference Proceedings Contribution
Abstract: This paper describes a system for recognizing distress calls and home automation voice commands in a smart home. Distress calls are recognized with the purpose of assisting people in their own homes: when they are detected, a phone call is automatically established with a contact in an address book and the person can request assistance. The voice call is established through a Voice over IP stack, with hands-free communication guaranteed by an acoustic echo canceller. The acoustic environment is constantly monitored by several low-consuming devices distributed throughout the home. In each device, a voice activity detector detects speech segments, and a speech recognition engine recognizes commands and distress calls. Robustness to environmental disturbances has been increased by employing Power Normalized Cepstral Coefficients and by using an adaptive algorithm for interference cancellation. An Italian speech corpus of home automation commands and distress calls has been developed for evaluation purposes. The corpus has been recorded in a real room using multiple microphones, and each sentence has been uttered both in normal and shouted speaking styles. The system performance has been assessed in terms of commands/distress recognition accuracy in order to prove the effectiveness of the approach.
Publication record: https://iris.univpm.it/handle/11566/112686

2013 An Embedded-processor driven Test Bench for Acoustic Feedback Cancellation in real environments
AES 134th Convention
Author(s): F. Faccenda; S. Squartini; E. Principi; L. Gabrielli; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: In order to facilitate communication among speakers, speech reinforcement systems equipped with microphones and loudspeakers are employed. Due to the acoustic coupling between them, speech intelligibility may be degraded and, moreover, high channel gains could drive the system to instability. Acoustic Feedback Cancellation (AFC) methods need to be applied to keep the system stable. In this work, a new Test Bench for testing AFC algorithms in real environments is proposed. It is based on the TMS320C6748 processor, running the Suppressor-PEM algorithm, a recent technique based on the PEM-AFROW paradigm. The partitioned block frequency domain adaptive filter (PB-FDAF) paradigm has been adopted to keep the computational complexity low. A professional sound card and a PC, where an automatic gain controller has been implemented to prevent signal clipping, complete the framework. Several experimental tests confirmed the suitability of the framework to operate under diverse acoustic conditions.
Publication record: https://iris.univpm.it/handle/11566/91064

2013 A Speech-Based System for In-Home Emergency Detection and Remote Assistance
AES 134th Convention
Author(s): E. Principi; D. Fuselli; S. Squartini; M. Bonifazi; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: This paper describes a system for the detection of emergency states and for the remote assistance of people in their own homes. Emergencies are detected by recognizing distress calls by means of a speech recognition engine. When an emergency is detected, a phone call is automatically established with a relative or friend by means of a VoIP stack and an Acoustic Echo Canceller. Several low-consuming embedded units are distributed throughout the house to monitor the acoustic environment, and one central unit coordinates the system operation. This unit also integrates multimedia content delivery services and home automation functionalities. Since this is an ongoing project, the paper describes the entire system and then focuses on the algorithms implemented for the acoustic monitoring and the hands-free communication services. Preliminary experiments have been conducted to assess the performance of the recognition module in noisy and reverberated environments, and its out-of-grammar rejection capabilities. Results showed that the implemented Power Normalized Cepstral Coefficients extraction pipeline improves the word recognition accuracy in noisy and reverberated conditions, and that introducing a "garbage phone" in the acoustic model makes it possible to effectively reject out-of-grammar words and sentences.
Publication record: https://iris.univpm.it/handle/11566/91067

2013 A New System for Automatic Recognition of Italian Sign Language
Neural Nets and Surroundings
Author(s): M. Fagiani; E. Principi; S. Squartini; F. Piazza
Publisher: Springer Verlag
Classification: 2 Book Chapter
Abstract: This work proposes a preliminary study of an automatic recognition system for the Italian Sign Language (Lingua Italiana dei Segni - LIS). Several other attempts have been made in the literature, but they are typically oriented to international languages. The system is composed of a feature extraction stage and a sign recognition stage. Each sign is represented by a single Hidden Markov Model, with parameters estimated through the resubstitution method. Then, starting from a set of features related to the position and the shape of head and hands, the Sequential Forward Selection technique has been applied to obtain feature vectors with the minimum dimension and the best recognition performance. Experiments have been performed using the cross-validation method on the Italian Sign Language Database A3LIS-147, maintaining the orthogonality between training and test sets. The obtained recognition accuracy averaged across all signers is 47.24%, which represents an encouraging result and demonstrates the effectiveness of the idea.
Publication record: https://iris.univpm.it/handle/11566/83970

2013 A Real-time Dual-Channel Speech Reinforcement System for Intra-Cabin Communication
AES
Author(s): F. Faccenda; S. Squartini; E. Principi; L. Gabrielli; F. Piazza
Classification: 1 Journal Article
Abstract: To facilitate communications among passengers in a large vehicle, an appropriate system with microphones, loudspeakers, and amplifiers is needed. However, a signal processing algorithm is required to avoid feedback and instability. Borrowing from speech-reinforcement research, the authors use a room-modeling adaptive feedback-cancellation approach that combines the Prediction Error Method and adaptive filtering. And, by including a suppressor filter, the system can be extended to a dual-channel scenario that supports bidirectional communications, where additional feedback paths must be considered with respect to the single-channel case study. In order to achieve low latencies and real-time processing, the partitioned block frequency domain adaptive filter algorithm has been adopted. Voice-activity and double-talk detectors have been included as well. Computer simulations in various acoustic conditions have shown the effectiveness of this approach.
Publication record: https://iris.univpm.it/handle/11566/136662

2013 A Real-Time Speech Enhancement Framework in Noisy and Reverberated Acoustic Scenarios
COGNITIVE COMPUTATION
Author(s): Rudy Rotili; Emanuele Principi; Stefano Squartini; Bjoern Schuller
Classification: 1 Journal Article
Abstract: This paper deals with speech enhancement in noisy reverberated environments where multiple speakers are active. The authors propose an advanced real-time speech processing front-end aimed at automatically reducing the distortions introduced by room reverberation in distant speech signals, also considering the presence of background noise, and thus at achieving a significant improvement in speech quality for each speaker. The overall framework is composed of three cooperating blocks, each one fulfilling a specific task: speaker diarization, room impulse responses identification and speech dereverberation. In particular, the speaker diarization algorithm pilots the operations performed in the other two algorithmic stages, which have been suitably designed and parametrized to operate with noisy speech observations. Extensive computer simulations have been performed by using a subset of the AMI database under different realistic noisy and reverberated conditions. The obtained results show the effectiveness of the approach.
Publication record: https://iris.univpm.it/handle/11566/81571

2012 Environmental Robust Speech and Speaker Recognition through Multi-channel Histogram Equalization
NEUROCOMPUTING
Author(s): Squartini S.; Principi E.; Rotili R.; Piazza F.
Classification: 1 Journal Article
Abstract: Feature statistics normalization in the cepstral domain is one of the best-performing approaches for robust automatic speech and speaker recognition in noisy acoustic scenarios: feature coefficients are normalized by using suitable linear or nonlinear transformations in order to match the noisy speech statistics to the clean speech ones. Histogram equalization (HEQ) belongs to this category of algorithms, has proved to be effective for the purpose, and is therefore taken here as reference. In this paper the presence of multiple acoustic channels is used to enhance the statistics modeling capabilities of the HEQ algorithm, by exploiting the availability of multiple noisy speech occurrences, with the aim of maximizing the effectiveness of the cepstra normalization process. Computer simulations based on the Aurora 2 database in speech and speaker recognition scenarios have shown that a significant recognition improvement with respect to the single-channel counterpart and other multi-channel techniques can be achieved, confirming the effectiveness of the idea. The proposed algorithmic configuration has also been combined with the kernel estimation technique in order to further improve the speech recognition performance.
Publication record: https://iris.univpm.it/handle/11566/62532

2012 Dominance Detection in A Reverberated Acoustic Scenario
Advances in Neural Networks - ISNN2012
Author(s): E. Principi; R. Rotili; M. Woellmer; S. Squartini; B. Schuller
Publisher: Springer Verlag
Classification: 2 Book Chapter
Abstract: This work proposes a dominance detection framework operating in reverberated environments. The framework is composed of a speech enhancement front-end, which automatically reduces the distortions introduced by room reverberation in the speech signals, and a dominance detector, which processes the enhanced signals and estimates the most and least dominant person in a segment. The front-end is composed of three cooperating blocks: speaker diarization, room impulse responses identification and speech dereverberation. The dominance estimation algorithm is based on bidirectional Long Short-Term Memory networks which allow for context-sensitive activity classification from audio feature functionals extracted via the real-time speech feature extraction toolkit openSMILE. Experiments have been performed by suitably reverberating the DOME dataset: the absolute accuracy improvement averaged over the addressed reverberated conditions is 32.68% in the most dominant person estimation task and 36.56% in the least dominant person estimation one, both with full agreement among annotators.
Publication record: https://iris.univpm.it/handle/11566/74300

2012 Networked BeagleBoards for Wireless Music Applications
EDERC2012 Proceedings
Author(s): L. Gabrielli; S. Squartini; E. Principi; F. Piazza
Classification: 4 Conference Proceedings Contribution
Abstract: One of the most demanding challenges in the field of audio engineering is the transmission of low-latency high quality audio streams over networks. While several protocols nowadays allow wired local network streaming, much effort is still required to achieve similar goals over existing wireless LAN technologies. While the challenge is still far from being solved, several design issues can be highlighted and future scenarios can be outlined. This paper proposes the setup of a wireless music production system based on open hardware and open software which requires relatively low setup effort while allowing for a high flexibility of use. The hardware platform is the Beagleboard, based on Texas Instruments DM3730, running a GNU/Linux OS and the computer music language Pure Data. Such a device can capture electric instrument audio, generate sound, send MIDI or OSC control data, and stream to PCs and other embedded devices operating as mixers, effect racks and so on, enabling an ecosystem of flexible and open devices. Tests conducted on a home wireless network show acceptable latency for many applications.
Publication record: https://iris.univpm.it/handle/11566/80043

2012 Real-Time Speech Recognition in a Multi-Talker Reverberated Acoustic Scenario
Advanced Intelligent Computing
Author(s): Rotili R.; Principi E.; Squartini S.; Schuller B.
Publisher: Springer - LNCS
Classification: 2 Book Chapter
Abstract: This paper proposes a real-time algorithmic framework for Automatic Speech Recognition (ASR) in the presence of multiple sources in a reverberated environment. The addressed real-life acoustic scenario calls for a robust signal processing solution to reduce the impact of source mixing and reverberation on ASR performance. Here the authors show how the implemented approach makes it possible to improve recognition accuracy under real-time processing constraints and with overlapping distant-talking speakers. A suitable database has been generated for this purpose by adapting an existing large vocabulary continuous speech recognition (LVCSR) corpus to deal with the acoustic conditions under study.
Publication record: https://iris.univpm.it/handle/11566/63381

2012 A Real-Time Speech Enhancement Front-End for Multi-Talker Reverberated Scenarios
Speech Enhancement, Modeling and Recognition- Algorithms and Applications
Author(s): Rotili R.; Principi E.; Squartini S.; Piazza F.
Publisher: Intech Open
Classification: 2 Book Chapter
Publication record: https://iris.univpm.it/handle/11566/66153

2012 Low Power High-Performance Computing on the BeagleBoard Platform
EDERC2012 Proceedings
Autore/i: E. Principi; V. Colagiacomo; S. Squartini; F. Piazza
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Abstract: The ever-increasing energy requirements of supercomputers and server farms are driving the scientific and industrial communities to give deeper consideration to the energy efficiency of computing equipment. This contribution addresses the issue by proposing a cluster of ARM processors for high-performance computing. The cluster is composed of five BeagleBoard-xM boards, with one board managing the cluster and the other boards executing the actual processing. The software platform is based on the Angstrom GNU/Linux distribution and is equipped with a distributed file system to ease the sharing of data and code among the nodes of the cluster, and with tools for managing tasks and monitoring the status of each node. The computational capabilities of the cluster have been assessed through High-Performance Linpack and a cluster-wide speaker diarization algorithm, while power consumption has been measured using a clamp meter. Experimental results obtained in the speaker diarization task showed that the energy efficiency of the BeagleBoard-xM cluster is comparable to that of a laptop computer equipped with an Intel Core2 Duo T8300 running at 2.4 GHz. Furthermore, once the bottleneck due to the Ethernet interface is removed, the BeagleBoard-xM cluster is able to achieve superior energy efficiency.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/80042 Collegamento a IRIS

2012 Real-Time Activity Detection in a Multi-Talker Reverberated Environment
COGNITIVE COMPUTATION
Autore/i: Emanuele Principi; Rudy Rotili; Martin Woellmer; Florian Eyben; Stefano Squartini; Bjoern Schuller
Classificazione: 1 Contributo su Rivista
Abstract: This paper proposes a real-time person activity detection framework operating in the presence of multiple sources in reverberated environments. The framework is composed of two main parts: the speech enhancement front-end and the activity detector. The aim of the former is to automatically reduce the distortions introduced by room reverberation in the available distant speech signals and thus to achieve a significant improvement of speech quality for each speaker. The overall front-end is composed of three cooperating blocks, each one fulfilling a specific task: speaker diarization, room impulse response identification, and speech dereverberation. In particular, the speaker diarization algorithm is essential to pilot the operations performed in the other two stages in accordance with the speakers' activity in the room. The activity estimation algorithm is based on bidirectional Long Short-Term Memory networks, which allow for context-sensitive activity classification from audio feature functionals extracted via the real-time speech feature extraction toolkit openSMILE. Extensive computer simulations have been performed using a subset of the AMI database for activity evaluation in meetings: the results obtained confirm the effectiveness of the approach.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/66823 Collegamento a IRIS

2012 Conversational Speech Recognition In Non-Stationary Reverberated Environment
Behavioural Cognitive Systems
Autore/i: Rotili R.; Principi E.; Woellmer M.; Squartini S.; Schuller B.
Editore: Springer Verlag
Classificazione: 2 Contributo in Volume
Abstract: This paper presents a conversational speech recognition system able to operate in non-stationary reverberated environments. The system is composed of a dereverberation front-end exploiting multiple distant microphones, and a speech recognition engine. The dereverberation front-end identifies a room impulse response by means of a blind channel identification stage based on the Unconstrained Normalized Multi-Channel Frequency Domain Least Mean Square algorithm. The dereverberation stage is based on the adaptive inverse filter theory and uses the identified responses to obtain a set of inverse filters which are then exploited to estimate the clean speech. The speech recognizer is based on tied-state cross-word triphone models and decodes features computed from the dereverberated speech signal. Experiments conducted on the Buckeye corpus of conversational speech report a relative word accuracy improvement of 17.48% in the stationary case and of 11.16% in the non-stationary one.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/66725 Collegamento a IRIS

2012 A New Italian Sign Language Database
Advances in Brain Inspired Cognitive Systems
Autore/i: M. Fagiani; E. Principi; S. Squartini; F. Piazza
Editore: Springer Verlag
Classificazione: 2 Contributo in Volume
Abstract: In this work a new video database of Italian Sign Language (Lingua Italiana dei Segni - LIS) is proposed. Several other attempts have been made in the literature, but they are typically oriented to international languages (like the American Sign Language - ASL). As with speech, sign language has peculiarities that depend strictly on the geographical location where it is used. The authors first observed that a specific database for LIS was missing, and this prompted them to develop the one presented here. It has been conceived for use in Automatic Sign Recognition and Synthesis (often referred to as Automatic Translation into Sign Languages) applications, which represent an important technological opportunity to increase the social inclusion of people with severe hearing impairments. The database, named A3LIS-147, is free and available for download.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/74228 Collegamento a IRIS

2011 Multichannel Feature Enhancement for Robust Speech Recognition
Speech Technologies / Book 1
Autore/i: Rotili R.; Principi E.; Cifani S.; Piazza F.; Squartini S.
Editore: INTECH
Classificazione: 2 Contributo in Volume
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/54618 Collegamento a IRIS

2011 An Evaluation Study on Speech Feature Densities for Bayesian Estimation in Robust ASR
Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues
Autore/i: S. CIFANI; E. PRINCIPI; R. ROTILI; S. SQUARTINI; F. PIAZZA
Editore: Springer Verlag
Classificazione: 2 Contributo in Volume
Abstract: Bayesian estimators, especially the Minimum Mean Square Error (MMSE) and the Maximum A Posteriori (MAP), are very popular for estimating the clean speech STFT coefficients. Recently, a similar approach has been successfully applied to speech feature enhancement for robust Automatic Speech/Speaker Recognition (ASR) applications in the Mel, log-Mel or cepstral domain. The quality of the estimate depends directly on the assumptions made about the densities of the noise and speech coefficients. Nevertheless, while the latter have been exhaustively studied in the case of STFT coefficients, comparable attention has not been paid to the case of speech features. In this paper, we study the distribution of Mel, log-Mel and MFCC coefficients obtained from speech segments. The histograms of the speech features are first fitted to several pdf models by means of the Chi-Square Goodness-of-Fit test, then they are modeled using a Gaussian Mixture Model (GMM). Computer simulations show that, from this perspective, log-Mel and MFCC coefficients are a more convenient choice than Mel coefficients.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/42013 Collegamento a IRIS

2011 Efficient SNR driven SPLICE implementation for robust speech recognition
Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issue
Autore/i: Squartini S.; Principi E.; Cifani S.; Rotili R.; Piazza F.
Editore: Springer Verlag
Classificazione: 2 Contributo in Volume
Abstract: The SPLICE algorithm has recently been proposed in the literature to address the robustness issue in Automatic Speech Recognition (ASR). Several variants have also been proposed to address some drawbacks of the original technique. Here an innovative and efficient solution is discussed: it is based on SNR estimation in the frequency or mel domain and investigates the possibility of using different noise types for GMM training in order to maximize the generalization capabilities of the tool and therefore the recognition performance in the presence of unknown noise sources. Computer simulations, conducted on the AURORA2 database, seem to confirm the effectiveness of the idea: the proposed approach yields accuracy similar to that of the reference approach, even though it employs a simpler mismatch compensation paradigm that does not need any a priori knowledge of the noises used in the training phase.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/58353 Collegamento a IRIS

2011 Enhanced Multichannel Histogram Equalization for Speech Recognition in noisy acoustic conditions
Proceedings of the 21st Italian Workshop on Neural Nets - Frontiers in Artificial Intelligence and Applications
Autore/i: E. Principi; R. Rotili; S. Squartini
Editore: IOS Press
Classificazione: 2 Contributo in Volume
Abstract: Feature statistics normalization in the cepstral domain is one of the best-performing approaches for robust Automatic Speech Recognition (ASR) in noisy acoustic scenarios. According to this approach, feature coefficients are normalized by using suitable linear or nonlinear transformations in order to match the noisy speech statistics to those of clean speech. Histogram Equalization (HEQ) is an effective algorithm belonging to this category. Recently some of the authors proposed an extension to the original HEQ algorithm to deal with the multichannel audio information coming from multiple microphones in far-field acoustic scenarios. In this paper the feature normalization capabilities of the multichannel HEQ technique are further enhanced by introducing the kernel estimation technique and employing multi-condition training for ASR system parametrization. Computer simulations based on the Aurora 2 database have shown that a significant recognition improvement with respect to the single-channel counterpart and other multi-channel techniques can be achieved, confirming the effectiveness of the idea.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/65522 Collegamento a IRIS

2011 A Real-Time Speech Enhancement Framework for Multi-party Meetings
NoLISP2011 Proceedings, LNAI
Autore/i: R. Rotili; E. Principi; S. Squartini; B. Schuller
Editore: Springer
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Abstract: This paper proposes a real-time speech enhancement framework working in the presence of multiple sources in reverberated environments. The aim is to automatically reduce the distortions introduced by room reverberation in the available distant speech signals and thus to achieve a significant improvement of speech quality for each speaker. The overall framework is composed of three cooperating blocks, each one fulfilling a specific task: speaker diarization, room impulse response identification and speech dereverberation. In particular, the speaker diarization algorithm is essential to pilot the operations performed in the other two stages in accordance with the speakers' activity in the room. Extensive computer simulations have been performed using a subset of the AMI database: the results obtained show the effectiveness of the approach.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/65524 Collegamento a IRIS

2011 Real-Time Joint Blind Speech Separation and Dereverberation in Presence of Overlapping Speakers
Advances in Neural Networks - ISNN 2011
Autore/i: Rotili R.; Principi E.; Squartini S.; Piazza F.
Editore: Springer Verlag
Classificazione: 2 Contributo in Volume
Abstract: Blind source separation and speech dereverberation are two important and common issues in the field of audio processing, especially in the context of real meetings. In this paper a real-time framework implementing a sequential source separation and speech dereverberation algorithm based on blind channel identification (BCI) is taken as the starting point. The major drawback of this approach is the inability of the BCI stage to estimate the room impulse responses when two or more sources are concurrently active. To overcome this disadvantage, a speaker diarization system has been successfully inserted in the reference framework to pilot the BCI stage. In this way the identification task can be accomplished by using the microphone mixture directly, making the overall structure well suited for real-time applications. The proposed solution works in the frequency domain, and the NU-Tech software platform has been used for real-time simulations.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/54620 Collegamento a IRIS

2010 Robust Speech Recognition Using Feature-Domain Multi-Channel Bayesian Estimators
Proceedings of ISCAS 2010
Autore/i: PRINCIPI E; CIFANI S; ROTILI R; SQUARTINI S; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/52324 Collegamento a IRIS

2010 Real-Time Simulation for Acoustic Feedback Cancellation algorithms: an hybrid PC/C6713-DSK based implementation
Proceedings of EDERC2010
Autore/i: S. CIFANI; E. PRINCIPI; R. ROTILI; F. PIAZZA; S. SQUARTINI
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/48087 Collegamento a IRIS

2010 Comparative evaluation of single-channel MMSE based noise reduction schemes for speech recognition
JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING
Autore/i: E. PRINCIPI; S. CIFANI; R. ROTILI; S. SQUARTINI; F. PIAZZA
Classificazione: 1 Contributo su Rivista
Abstract: One of the big challenges in the field of Automatic Speech Recognition (ASR) is developing solutions that also work properly in adverse acoustic conditions, such as in the presence of additive noise and/or in reverberant rooms. Recently, attention has been paid to deeply integrating the noise suppressor in the feature extraction pipeline. In this paper, different single-channel MMSE-based noise reduction schemes have been implemented in both the frequency and cepstral domains, and the related recognition performance has been evaluated on the AURORA2 and AURORA4 databases, thus providing a useful reference for the scientific community.
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/53364 Collegamento a IRIS

2009 Real-time implementation of robust PEM-AFROW based solutions for acoustic feedback control
Proceedings of the AES 127th Convention
Autore/i: CIFANI S; ROTILI R; PRINCIPI E; SQUARTINI S; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/52281 Collegamento a IRIS

2009 Robust Speech Recognition Using MAP Based Noise Suppression Rules in the Feature Domain
19th Czech-German Workshop on Speech Processing
Autore/i: ROTILI R; PRINCIPI E; CIFANI S; SQUARTINI S; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/53554 Collegamento a IRIS

2009 A real-time Speech-interfaced System for Group Conversation Modeling
Proceedings of WIRN09
Autore/i: ROCCHI C; PRINCIPI E; CIFANI S; ROTILI R; SQUARTINI S; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/50519 Collegamento a IRIS

2009 Pre-processing techniques for automatic speech recognition
Editore: Università Politecnica delle Marche
Classificazione: 8 Tesi di dottorato
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/242152 Collegamento a IRIS

2009 Keyword spotting based system for conversation fostering in tabletop scenarios: preliminary evaluation
HSI'09 - the 2nd IES International Conference on Human System Interaction
Autore/i: PRINCIPI E; CIFANI S; ROCCHI C; SQUARTINI S; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/50874 Collegamento a IRIS

2009 A PEM-AFROW based algorithm for Acoustic Feedback Control in Automotive Speech Reinforcement Systems
Proceedings of ISPA09
Autore/i: S. CIFANI; L. CASAGRANDE MONTESI; R. ROTILI; E. PRINCIPI; S. SQUARTINI; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/52320 Collegamento a IRIS

2008 A Robust Iterative Inverse Filtering Approach for Speech Dereverberation in Presence of Disturbances
APCCAS 2008
Autore/i: R. ROTILI; S. CIFANI; E. PRINCIPI; S. SQUARTINI; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/52923 Collegamento a IRIS

2008 A Multichannel Noise Reduction Front-end based on Psychoacoustics for robust Speech Recognition in highly noisy environments
HSCMA 2008
Autore/i: S. CIFANI; E. PRINCIPI; C. ROCCHI; S. SQUARTINI; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/50859 Collegamento a IRIS

2005 An ICA based approach for Blind Deconvolution of three-dimensional signals
Proceedings of the 2005 International Symposium on Circuits and Systems
Autore/i: E. PRINCIPI; S. SQUARTINI; F. PIAZZA
Classificazione: 4 Contributo in Atti di Convegno (Proceeding)
Scheda della pubblicazione: https://iris.univpm.it/handle/11566/53010 Collegamento a IRIS


Università Politecnica delle Marche
