Below are several projects and demos in which I am currently involved or have been involved in the past:
FEMVoQ: Three-dimensional finite element simulation of voice quality: influence of phonation types and vocal tract shaping
In this project, we aim to significantly increase the number and types of 3D-generated spoken utterances
reported to date in the literature, surpassing the current state of the art in 3D
vocal tract (VT) acoustics without resorting to supercomputer facilities. Moreover, we also wish to endow these utterances
with some voice qualities (VoQ), which arise from variations in the phonation type and in the VT shape, ranging from the well-known
Lombard effect, to the singing formant that allows for better voice projection, to speaking in sad or aggressive styles, to name a few.
Project granted by the Spanish Ministry of Science and Innovation (PID2020-120441GB-I00). Period: 2021-09-01 - 2023-08-31
GENIOVOX: Computational generation of expressive voice
In this project, we aim at the computational generation of expressive voice by following a hybrid approach.
We will avoid the inherent limitations of voice corpora while benefiting from the voice transformation techniques developed in that area.
The key idea is to map the parameter modifications in recorded voice that are responsible for expressive effects onto
glottal pulse models and vocal tract geometries. The former are used as boundary conditions at the glottis in numerical simulations
of vocal tract acoustics. Vocal tract geometries will vary in time to achieve the desired expressive effects.
Project granted by the Spanish Ministry of Economy, Industry and Competitiveness (TEC2016-81107-P). Period: 2016-12-30 - 2019-12-29
DYNAMAP: DYnamic Acoustic MAPping - Development of low cost sensors networks for real time noise mapping
DYNAMAP is a LIFE+ project aimed at developing a dynamic noise mapping system able to detect and represent in real
time the acoustic impact of road infrastructure. The main objective of this project is to ease and reduce the cost of
periodically updating noise maps, as required by the European Directive 2002/49/EC on environmental noise. To that end,
an automatic monitoring system, based on customised low-cost sensors and a software tool implemented on a general-purpose
GIS platform, will be developed and built in two pilot areas located along the A90 motorway that surrounds the city of Rome (Italy),
and inside the agglomeration of Milan (Italy). A one-year survey will then be undertaken to check the reliability, effectiveness and
efficiency of the DYNAMAP system.
In this project, we will develop an Anomalous Noise Events Detection Algorithm (ANED) to identify and discard the events that are
not representative of road traffic noise (denoted as anomalous events) and that, as a consequence, distort the noise levels measured by the sensors.
LIFE+ project granted by the European Commission (LIFE13 ENV/IT/001254). Period: 2014-07-01 - 2019-06-30
EUNISON: Extensive UNIfied-domain SimulatiON of the human voice
In the EUNISON project, we seek to build a new voice simulator that is based on physical first principles to an unprecedented degree.
From given inputs, representing topology or muscle activations or phonemes, it will render the 3-D physics of the voice, including of
course its acoustic output. This will give important insights into how the voice works, and how it fails. The goal is not a speech synthesis
system, but rather a voice simulation engine, with many applications; given the right controls and enough computer time, it could be made to speak in any language, or sing in any style.
KTH coordinates this project, with a budget of 2.96 million euros.
In this project, Dr. Oriol Guasch is the scientific coordinator, and our team leads
work packages 5 (Simulations of the Vocal Tract) and 8 (Dissemination).
Future and Emerging Technologies (FET) project granted by the European Commission (308874). Period: 2013-03-01 - 2016-02-29
THOFU: Technologies for the HOtel of the FUture
The main objective of this project is to design the hotel of the future, from its construction and components
to its interaction with users, security, and integration with its surroundings and the Internet.
Gesfor Group leads this project with a budget of 23 million euros.
In this project, our team will be involved in the work package on intelligent and adaptive interfaces within
a high-tech hotel, researching new paradigms of interaction and studying usability and user experience.
National Consortium (CENIT) granted by the Spanish Ministry of Science and Innovation (CEN-2010-1019). Period: September 2010 - December 2013
CreaVeu: From any text to any voice
The main objective of this project is to integrate text-to-speech and voice conversion technologies, and to provide
an intuitive graphical user interface that allows a non-expert user to adapt the main expressive characteristics of the synthesized
voice to a given target (timing, character, etc.). The main application domains of this technology include entertainment and videogames.
Agència de Gestió d'Ajuts Universitaris i de Recerca (AGAUR) (Catalan Government) (2010-VALOR-00164). Period: 2011 - 2013
EmoLib: Emotion identification from text
EmoLib is a library that extracts affect and emotions from an incoming text by tagging the text according to the feeling being expressed or
conveyed. EmoLib has been coded in the Java programming language.
This demo was developed by Alexandre Trilla within his ABD Thesis.
evMIC: Multimodal, Immersive and Collaborative Virtual Environments
The main objective of this project is to create an interoperable, user-centric platform for building virtual
learning environments, overcoming current limitations and aligning with the current definition of what will be "The Future Internet."
Besides contributing to state-of-the-art documents on speech technologies, multimodal processing, and graphics and virtual reality, our team is going to participate in developing
interfaces to interact with the virtual environment involving expressive text-to-speech synthesis, multimodal affect analysis, and 3D avatar modelling and synthesis.
Singular Strategic Project granted by the Spanish Ministry of Industry, Tourism and Trade (TSI-020301-2009-25). Period: 2009 - 2011
INREDIS: INterfaces for RElations between Environment and people with DISabilities
The main objective of this project is to develop foundational technologies for creating communication and interaction channels between people with disabilities and their environment.
Technosite leads this project with a budget of 23.6 million euros.
Besides contributing to detailed state-of-the-art documents on speech technologies, multimodal processing, and graphics and virtual reality, our team is going to participate in developing applications
involving expressive text-to-speech synthesis, multimodal affect analysis, and 3D avatar modelling and synthesis.
National Consortium (CENIT) granted by the Spanish Ministry of Industry, Tourism and Trade (CEN-2007-2011). Period: 2007 - 2010
MAGNUS: Mouse Advanced GNU Speech
MAGNUS is a mouse pointer application controlled through Catalan voice commands.
This application aims to provide voice-based accessibility for people with reduced mobility.
This project was developed by Alexandre Trilla as part of his Master's thesis.
Project members:
Project Coordinator: The Generalitat de Catalunya's Education Department
The Acoustics Section of Enginyeria i Arquitectura La Salle
SAVE: Expressive AudioVisual Synthesis
The project focuses on research into a highly expressive multimodal output
interface, which gives the end user a perception of high
naturalness. The project proposes the study and development of a novel
expressive audiovisual synthesis system based on a photorealistic talking head.
Project granted by the Spanish Ministry of Science and Technology (TEC2006-08043/TCM).
Period: 2007 - 2009
SALERO: Semantic AudiovisuaL Entertainment Reusable Objects
Our group is involved in this project to develop innovative multilingual text-to-speech
techniques to achieve expressive speech synthesis within a cross-media production framework
(e.g. movies, games, broadcast).
Project supported by the European Commission (IST-FP6-027122). Period: 2006 - 2009 (I was involved in the project from 2006 to 2007)
Sam, the Virtual Weatherman
Automatic service for weather forecast on demand (TV, Internet and mobile devices) by means of a virtual speaker called Sam.
Our group has developed the corpus-based Text-to-Speech system embedded in the forecast application.
Project members:
The Catalan Broadcasting Corporation (CCRTV)
The Interactive Technology Group (Pompeu Fabra University)
The Speech Technologies Area of Enginyeria i Arquitectura La Salle
Project supported by CCRTV and CIDEM (RDITSCON04-0005). Period: May 2004 - April 2005
IntegraTV-4all
Adapted leisure, information and remote assistance services via a television set, with advanced natural language voice
communication functionalities for people with sensory disabilities and the aged.
Our group has developed an audio-visual alarm clock, integrated in the hotel TV menu, as a result of improving our
previous virtual speaker (see virtual speaker section).
Project members:
Project Coordinator: TMT Factory
Fundación ONCE
Universidad Politécnica de Madrid (UPM)
Universidad Carlos III de Madrid
The Speech Technologies Area of Enginyeria i Arquitectura La Salle
Project supported by the Spanish Ministry of Science and Technology (FIT-350301-2004-2).
Period: September 2004 - December 2005
VIRTUAL SPEAKER
In this project, we developed an expressive, realistic 2D virtual speaker based on image processing and text-to-speech synthesis in Catalan, Spanish and English.
Project granted by the Spanish Ministry of Science and Technology (FIT-150500-2002-410). Period: 2002
WEIGHT TUNING INTERFACE FOR SPEECH SYNTHESIS (CATALAN)
This is a web platform, based on evolutionary computation, which has been designed to find the optimal weight configuration of the cost function
for unit selection text-to-speech synthesis.
ITP - Speech Processing Interface
This is an interface for speech labelling (automatic and/or manual). Pitch
marks, phoneme boundaries, pitch curve, spectrogram and prosodic features are some of the speech parameters that can be extracted by means of this interface.
It runs on Windows platforms.
ALGTEC (ALGEBRA & TECHNOLOGY)
ALGTEC is a multimedia application that helps and motivates engineering students
to learn algebra concepts. It presents some algebra concepts applied to
technological situations by means of a virtual teacher.
First version: based on Microsoft Agents and Text-to-Speech
In the future it will incorporate our Virtual Speaker