Teflon at Fonetik 2024

We presented our data collection at Fonetik 2024 in Stockholm, Sweden. The conference also hosted a symposium in celebration of Björn Lindblom's 90th birthday.


The abstract can be found here: https://zenodo.org/records/11396068

Third face-to-face TEFLON meeting in Stockholm, March 2024

Our third face-to-face meeting within the TEFLON project took place in Stockholm, March 20-22. Sofia hosted the meeting at Karolinska Institutet, and the research team got to try out both the Huddinge and the Solna campus of the university.

This time, a total of 13 participants from all project partners – Aalto University, Tampere University, Oslo University, KI and NTNU – were present. The topics for discussion covered updates on the learning experiments in Finland, Norway and Sweden, a shared view on user experience from the different experiments, and ideas regarding the use of the collected data. At this stage, most of the data have been collected, and the teams at KI, Tampere and Oslo are now working to process the data and prepare for analysis. This involves a lot of manual perceptual assessment, which will be used to examine potential learning effects for the users of the app. In addition, the manual assessments will be used for re-training the acoustic models that underlie the automatic rating in the app, which we hope will improve the quality of automatic ratings for future users.

In between discussions, we had time for food and drinks.

Another engaging topic during the meeting was ideas for future collaboration, both concerning coming shared publications, and for potential continuation projects. It turns out that we share interests that we were not aware of before! Hopefully, these ideas can turn into future projects!

Challenges collecting and sharing speech data from children

The research and development that we carry out in the Teflon project relies on collections of speech recordings from foreign children who learn to speak the Nordic languages and from Swedish children with speech sound disorder. This kind of data is unique for several reasons. Firstly, no data sets of children speaking Nordic languages are publicly available, in contrast to adult data, which is far from scarce. Secondly, we are interested in second language (L2) speakers of Nordic languages, because we want to study how to detect and characterize their mispronunciations. Thirdly, there is no publicly available data from Nordic children with speech sound disorder. Consequently, substantial effort during the project was devoted to collecting such data. This was done both by simulating the situation where children play the Pop2Talk game and, later, by recording the game sessions with the students.

The intention when collecting this data was to make all the corpora publicly available, in the spirit of the open science principles. However, we soon realized that the rules defined by the General Data Protection Regulation (GDPR) are interpreted very differently in the different Nordic countries (and possibly in other European countries). Those rules have the goal of protecting the privacy of European citizens and refer to any information that may identify an individual. In collections of text, complying with these rules may just mean removing any personal information such as names, telephone numbers, user IDs, or addresses. For speech, the situation is more complicated. The main issue is whether the voice alone is sufficient to identify the speaker and whether this, in turn, constitutes sufficient grounds to forbid public sharing of the recordings.

Our experience in Sweden is that different lawyers may interpret the law differently. The lawyers at KI, for example, think it is fine to share recordings of isolated words, but others disagree: when we consulted the lawyers at Språkbanken Tal (the main channel for sharing speech data in Sweden), their interpretation of the law was much stricter. A similarly strict interpretation is now used in Finland (although publication of speech data from individuals over 16 years has been permitted in the past). In contrast, in Norway, the rules are interpreted to apply only to what is spoken, and not to the voice that speaks. As a result, of the many corpora that we recorded for Finnish, Swedish and Norwegian as target languages, we will only be able to publish the Norwegian corpus.

The different responses to our requests to the ethical committees in the different countries illustrate the complexity of speech data sharing. It is useful to stress that the L2 data sets for Swedish, Norwegian and Finnish were completely equivalent in all respects: age of participants, characteristics of the participants, content of the recordings, anonymization of the participants in the metadata, and sharing conditions requested, to name a few. It is also important to point out that the characteristics of a child's voice that may allow the identification of an individual change very quickly with age. This means that the participants will no longer be identifiable by their voice just a few months or years after the recordings. We hope that, in the future, uniform interpretations of regulations and practices in handling speech data will be introduced. We believe that the availability of open-access corpora of child speech is of essential value for scientific research and speech technology development, and that it will bring great advantages for society.

We have submitted a paper describing the data collected in the Teflon project in more detail to the LREC2024 conference that will take place in Turin, Italy, on May 20-25. Please visit the LREC2024 website for more information. 

Aalto’s participation in the ACM Multimedia 2023 Computational Paralinguistics and Multimodal Sentiment Analysis Challenge

The Computational Paralinguistics Challenge (ComParE) was organized as part of the ACM Multimedia 2023 conference. The annual challenge deals with states and traits of individuals as manifested in their speech and further signals’ properties. Each year, the organizers introduce tasks, along with data, for the participants to develop solutions.

Our team, consisting of PhD students Dejan Porjazovski and Yaroslav Getman, Research Fellow Tamás Grósz and Professor Mikko Kurimo, participated in the two sub-challenges provided by the organizers: Requests and Emotion Share.

The Requests sub-challenge involves real interactions in French between call centre agents and customers calling to resolve an issue. The task is further divided into two sub-tasks: Determine whether the customer call concerns a complaint or not, and whether the call concerns membership issues or a process, such as affiliation.

The multilingual Emotion Share task, involving speakers from the USA, South Africa, and Venezuela, tackles a regression problem of recognising the intensity of 9 emotions present in the dataset. The emotions whose intensities need to be recognized are anger, boredom, calmness, concentration, determination, excitement, interest, sadness, and tiredness.

Our team tackled these tasks by utilizing the state-of-the-art wav2vec2 model, along with a Bayesian linear layer. The choice of an appropriate wav2vec2 model, the best-performing transformer layer, and the Bayesian linear layer led to our team winning the Emotion Share sub-challenge:

http://www.compare.openaudio.eu/winners/

For more technical details about the work, see our paper:

https://dl.acm.org/doi/10.1145/3581783.3612848
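
As a rough illustration of what a Bayesian linear layer is, the sketch below uses the generic textbook formulation in which every weight is a Gaussian with its own mean and standard deviation. It is not taken from the paper: the class name, the plain-Python implementation and the toy numbers are all assumptions made for this example, and the real system would sit on top of wav2vec2 features rather than a two-element input.

```python
import random


class BayesianLinear:
    """Linear layer whose weights are Gaussians instead of fixed numbers (illustrative)."""

    def __init__(self, weight_mean, weight_std, bias):
        # weight_mean and weight_std are lists of rows, one row per output unit.
        self.weight_mean = weight_mean
        self.weight_std = weight_std
        self.bias = bias

    def forward(self, x, rng=random):
        """One stochastic pass: sample w = mean + std * eps, then compute w.x + b."""
        outputs = []
        for means, stds, b in zip(self.weight_mean, self.weight_std, self.bias):
            total = b
            for m, s, xi in zip(means, stds, x):
                total += (m + s * rng.gauss(0.0, 1.0)) * xi
            outputs.append(total)
        return outputs

    def predict(self, x, n_samples=50, rng=random):
        """Average several stochastic passes; the spread is a rough uncertainty estimate."""
        samples = [self.forward(x, rng) for _ in range(n_samples)]
        columns = list(zip(*samples))
        means = [sum(col) / n_samples for col in columns]
        stds = [
            (sum((v - m) ** 2 for v in col) / n_samples) ** 0.5
            for col, m in zip(columns, means)
        ]
        return means, stds
```

For a regression task such as Emotion Share, a layer like this would have one output per emotion, and the per-output spread gives a hint of how certain the model is about each predicted intensity.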

Simultaneously, we also participated in the 4th Multimodal Sentiment Analysis Challenge. The team consisted of PhD students Anja Virkkunen and Dejan Porjazovski, together with Research Fellow Tamás Grósz and Professor Mikko Kurimo.

We chose to tackle two extremely hard problems: Humour and Mimicked Emotions detection using videos. Our solution focused on large, pre-trained models, and we developed a method that could identify the relevant parts of the outputs of these foundation models, making them a bit more transparent. The empirical results demonstrate that these large models have smaller, specialized outputs that contain relevant information for detecting jokes and emotions of the speakers based on visual and audible cues.

For more details, check out our paper:

https://dl.acm.org/doi/10.1145/3606039.3613102

PARTICIPANTS WANTED FOR A STUDY IN FINLAND (RESEARCH IN FINLAND)

Hello,

We are looking for participants for a joint Nordic scientific study that investigates children's language learning with the help of learning games. Tampere University and Aalto University are studying game applications that can be used to practise a foreign language or mathematics. This study continues the research on children's digital learning that began in 2016, which has previously examined, among other things, how children learn English with the help of audio-based applications.

The measurements for the study take place in Helsinki.

We invite first-graders and preschool children whose native language is Finnish and who do not yet speak Swedish or English fluently to take part in the study.
Some of the children play a language learning game, while others play the DragonBox Numbers maths game.

We ask those interested to get in touch by email: anna.2.smolander@tuni.fi

A warm welcome to join us!

Data collection started in Oslo schools

The TEFLON studies in Oslo, Norway, focus on investigating how gaming interventions affect immigrant children’s development and language learning. We have recently launched data collection at schools in Oslo and surrounding communities. The study is being coordinated by the doctoral researcher Anne Marte Haug Olstad.

We are inviting pupils at so-called Welcome Classes (Velkomstklasser / Mottaksklasser) to participate in the study. Pupils in these classes have recently arrived in Norway, have varying language backgrounds, and are in the early phases of learning Norwegian.

Within this study, the kids get a chance to play educational games, such as our recently developed language learning game “Pop2Talk Norwegian” and the “Dragonbox” maths game (https://dragonbox.com/). They get to play the games 4-5 times per week for 4 weeks during school hours.

Photo credit: Anna Smolander

For the research, we are testing children’s language and cognitive skills and how playing the games may help their learning and skill development. The testing is currently being conducted at the schools. We are also planning to invite some pupils to take part in an EEG experiment to measure how children’s brain responses are affected by the learning. This study will take place at the Socio-Cognitive Laboratory at the University of Oslo, Blindern.

We are thankful to all the participants as well as the engaged teachers in these classes for making this research possible!

Interspeech 2023

Hi, my name is Xinwei. I am a second-year PhD student within the TEFLON project. I was very happy and excited to attend INTERSPEECH 2023 in Dublin from 20 to 24 August.

It was my first time joining such an international and well-known conference. At the conference I met a lot of senior researchers from all over the world in the fields of speech and signal processing, including my supervisor Prof. Torbjørn Karl Svendsen. I also met Prof. Mikko Kurimo from Aalto University and many of the other members of our project.

There were so many interesting ideas and techniques being proposed this year. I wished I could clone myself several times to attend the parallel presentations, instead of running from one room to another all over the building. I also noticed many posters that inspired me in my own research. In addition, I came across posters on topics with which I was not familiar, e.g., speech synthesis and speech enhancement. It was really amazing to talk to those presenters and grasp the main concepts and problems of those fields in half an hour’s time.

I also presented our work with the title “An Analysis of Goodness of Pronunciation for Child Speech”. Thanks to the many questions and pieces of advice from the audience, I now have a clearer picture of my future research work.

Really amazing conference. Can’t wait to meet all of you next year!

The Pop2Talk Nordic game in autumn 2023

Hi, I’m Nhan Phan, a PhD student at Aalto University. I’m part of the Teflon project, and my role is to maintain the Pop2Talk Nordic mobile application. Pop2Talk is a language learning game in which children imitate words that are spoken during the game and get a rating of their pronunciation. Based on the original game version intended for English, the Pop2Talk Nordic version has been expanded to Nordic languages. As I write this blog post, we have successfully developed four different versions of the game. These include:

  • A Finnish version for children who find it challenging to pronounce the phoneme “R”.
  • A Swedish version for children with Speech Sound Disorder.
  • And two other versions for children who want to learn Norwegian or English.

While much of the project’s foundational work was led by experienced partners in children’s speech-language pathology and in automatic speech recognition technology, I was fortunate to be involved in the project, being responsible for the design and maintenance of both the server and the mobile game.

With our need for illustrations in multiple languages, we turned to the latest AI technology for automatic image generation. I provided our partners with technical instructions on crafting text prompts to ensure the resulting images were both captivating and child-friendly. To my surprise, the images they produced were outstanding, surpassing all my expectations. Check out this cool picture from Magdalena at Karolinska Institutet. Funny thing, the girl is the spitting image of my 2-year-old daughter. I had to ask right away to use it as my daughter’s avatar.

When I showed the game to my daughter, she was super into it—even though she’s just starting to learn Finnish. She got so hooked that I had to yank the phone from her! I really hope other kids get just as excited when they play (and fingers crossed, they’ll want to take a break after a while).

As we move into October, we’re preparing for numerous experiments. These will study how children acquire vocabulary and pronunciation, and evaluate the game’s effectiveness in helping them. Those experiments will be conducted in various schools across Finland, Sweden and Norway. My primary focus at the moment is to ensure the server runs without any serious problems.

NTNU (Teflon) at NNL2P

On the 12th of October 2023, Giampiero Salvi presented the work carried out in Teflon at the Nordic Network for L2 Pronunciation (NNL2P) in Trondheim. The workshop had a number of very interesting keynote speakers who brought long experience and insights in the field of pronunciation assessment. The Teflon presentation was very well received. Please see the workshop website for more information.

TEFLON at SLaTE 2023

Speech and Language Technology in Education (SLaTE) is a ‘Special Interest Group’ (SIG) of the ‘International Speech Communication Association’ (ISCA).

After 4 years without meetings, SLaTE met again at Trinity College in Dublin in August 2023, just before ISCA’s main event, INTERSPEECH 2023. TEFLON partners had three presentations at SLaTE and one at its sister workshop SIGUL 2023, which ran in parallel in a nearby meeting room.

In SLaTE 2023:

In SIGUL 2023:

Second face-to-face Teflon meeting in Trondheim, June 2023

On the 20th-21st of June, 12 researchers from Aalto University, Tampere University, Karolinska Institutet, University of Oslo and NTNU gathered at the Campus of NTNU for the second face-to-face meeting of the Teflon project.

As in the first physical meeting, we had two full days including discussion sessions about the data, evaluations, automatic speech recognition, game design, automatic and human pronunciation assessment, experiment design, publications, dissemination and project management. The game was already in a mature state of development, so much of the discussion was devoted to planning the experiments with school children in the three countries. On Tuesday evening we continued after the meeting with dinner in downtown Trondheim, enjoying the good company and delicious food in the restaurant To Rom og Kjøkken, where we had a room all to ourselves. We also took a long walk through the city in the light of the night sun.

The next steps in the project include running the experiments with the children, collecting new data from the experiments and analyzing it.

Listening evaluations

How does a “correct” utterance sound?

Written by: Sofia Strömbergsson, Karolinska Institutet

An important feature of the speech training app developed in TEFLON is to provide children with feedback concerning their speech production. This feedback will be presented as stars, with 5 stars representing “correct” pronunciation, and the fewer stars, the further away from “correct”. But how does the app know how many stars to present for a given utterance? In fact, this is something that the app (or rather, the acoustic model within the app) has to learn from how humans have evaluated children’s utterances.

But even for humans, the task of evaluating the correctness of children’s utterances is not trivial. For one, there are very many ways for utterances to be “correct”. (And that is a good thing when it comes to human perception in daily life! It means that we can “tolerate” a lot of variation in speech production without communication being too easily disrupted.) But also, there are even more ways that utterances can be “incorrect”. For example, is the utterance still intelligible as the intended word? And for intelligible utterances – are one or more speech sounds affected? Are different types of speech errors more severe than others? In TEFLON, the different research teams have tackled this challenge in slightly different ways.

At Karolinska Institutet, the human evaluators have used the same 1-5 scale that will be used in the app. In an effort to ensure consistency in the evaluations, the following rating key was specified: 

  1. not at all identifiable as the target word
  2. not identifiable as the target word
  3. slight phonemic error (e.g., the target word “kollision” is pronounced “kolliton”)
  4. subphonemic error/“unexpected variant” (e.g., the /r/-sound in “ros” is not quite produced as you’d expect)
  5. prototypical/adult-like/correct
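
To make the link between the rating key and the star feedback concrete, here is a minimal sketch of how such a key could be represented in code. The labels are copied from the list above; the `Rating` record, its field names and the star rendering are assumptions made for illustration and do not describe the actual app.

```python
from dataclasses import dataclass

# Labels copied from the rating key above; everything else is illustrative.
RATING_KEY = {
    1: "not at all identifiable as the target word",
    2: "not identifiable as the target word",
    3: "slight phonemic error",
    4: "subphonemic error / unexpected variant",
    5: "prototypical / adult-like / correct",
}


@dataclass(frozen=True)
class Rating:
    """One human evaluation of one recorded utterance (hypothetical record type)."""

    target_word: str
    score: int  # 1-5, the same scale as the stars shown in the app

    def __post_init__(self):
        # Reject anything outside the 1-5 scale defined by the key.
        if self.score not in RATING_KEY:
            raise ValueError(f"score must be 1-5, got {self.score}")

    def stars(self) -> str:
        """Render the score as filled and empty stars."""
        return "★" * self.score + "☆" * (5 - self.score)

    def description(self) -> str:
        """Return the rating-key label for this score."""
        return RATING_KEY[self.score]
```

For example, `Rating("ros", 4).stars()` gives `★★★★☆`, and `Rating("ros", 4).description()` gives the label for a subphonemic error.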

But even with this rating key, the evaluation decisions are not always easy. The researchers are currently running listening experiments with more listeners – experts (in this context: speech-language pathologists) and non-experts – to explore listener behaviors more systematically. For this task, the listeners are instructed to rate the utterances from 1 to 5 as they think the app should rate the utterance (i.e., without being provided with a rating key). Through these experiments, the researchers hope to learn more about whether experts and non-experts differ in their ratings, and whether listener ratings are more consistent for listeners who have access to “reference samples” when conducting their evaluations. The researchers aim to present their findings later during 2023.
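
One simple way to compare two listeners' ratings is sketched below. The post does not say which agreement measures the researchers use, so the two measures here (exact agreement and mean absolute difference) are only assumed examples, and the ratings are made up purely for illustration.

```python
from statistics import mean


def _check(a, b):
    """Both listeners must have rated the same, non-empty set of utterances."""
    if not a or len(a) != len(b):
        raise ValueError("rating lists must be non-empty and of equal length")


def exact_agreement(a, b):
    """Share of utterances that two listeners gave the same 1-5 rating."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_absolute_difference(a, b):
    """Average distance, in stars, between the two listeners' ratings."""
    _check(a, b)
    return mean(abs(x - y) for x, y in zip(a, b))


# Made-up ratings of five utterances, for illustration only.
expert = [5, 4, 2, 3, 1]
non_expert = [5, 3, 2, 4, 1]

print(exact_agreement(expert, non_expert))           # 0.6
print(mean_absolute_difference(expert, non_expert))  # 0.4
```

In this toy example the listeners agree exactly on three of the five utterances and are one star apart on the other two.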

Aalto’s team participates in the ACM Multimedia 2022 Computational Paralinguistics ChallengE

The ACM Multimedia 2022 Computational Paralinguistics ChallengE (ComParE) is an open Grand Challenge dealing with states and traits of speakers as manifested in their speech signal’s properties and beyond. At the start of the competition, the data is provided by the organizers, and the Sub-Challenges are generally open for participation.

This year our team, consisting of PhD students (Yaroslav Getman and Dejan Porjazovski), Research Fellows (Tamás Grósz and Sudarsana Reddy Kadiri), and Professor Mikko Kurimo, embarked on tackling two Sub-Challenges: the Vocalisations and the Stuttering one. 

In the Stuttering Sub-Challenge, participants were tasked to develop a system that can recognize different kinds of stuttering (e.g. word/phrase repetition, prolongation, sound repetition and others). Stuttering is a complex speech disorder with a crude prevalence of about 1 % of the population. Monitoring of stuttering would allow objective feedback to persons who stutter (PWS) and speech therapists, thus facilitating tailored speech therapy, with the automatic detection of different stuttering phenomena as a necessary prerequisite. As training data, we could use the Kassel State of Fluency corpus containing approximately 5600 annotated samples. 

In the Vocalisations Sub-Challenge, non-verbal vocal expressions (such as laughter, cries, moans, and screams) from the Variably Intense Vocalizations of Affect and Emotion Corpus are used for classifying the expression of six different emotions. Such human non-verbals are still understudied but are ubiquitous in human communication. This task was extremely challenging because the training data contained only female voices, while the developed systems were evaluated on male sounds.

Our team developed solutions for both tasks using state-of-the-art models like wav2vec 2.0, data augmentation and other simple tricks based on the distributed training data. For technical details, see our paper:

Tamás Grósz, Dejan Porjazovski, Yaroslav Getman, Sudarsana Kadiri, and Mikko Kurimo. 2022. Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7026–7029. https://doi.org/10.1145/3503161.3551572

In total, 23 teams from all around the world registered for the competition, of which 8 submitted solutions for the Stuttering, and 11 for the Vocalisations Sub-Challenge. 

Aalto’s team won both competitions, earning two places in the hall of fame:

 http://www.compare.openaudio.eu/winners/

Teflon team presents at Interspeech

The 23rd INTERSPEECH Conference took place from September 18 to 22, 2022, at Songdo ConvensiA, in Incheon, Korea, under the theme Human and Humanizing Speech Technology. INTERSPEECH is the world’s largest and most comprehensive conference on the science and technology of spoken language processing. INTERSPEECH conferences emphasize interdisciplinary approaches addressing all aspects of speech science and technology, ranging from basic theory to advanced applications.

Truly a city of the future, Songdo sits adjacent to Seoul, regarded as one of the technology capitals of the world. The city’s underground railway already offers high-speed WiFi, and electronic panels at the exits show the waiting time for connecting buses or trains, while companies like Samsung Electronics are already working on linking household devices to mobile phones. On the technological front, Songdo is a brand-new city that offers the chance to truly integrate innovation into daily life.

This year, the Teflon team submitted a paper titled “wav2vec2-based Speech Rating System for Children with Speech Sound Disorder” to Interspeech. The article described our initial systems developed using Sofia Strömbergsson’s corpus of children suffering from speech sound disorder. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. Our solutions could be the basis for software tools that would enable home therapy and lessen the burden on speech-language pathologists. Our submission was accepted with very positive reviews and selected for a poster presentation.

We (Tamás & Mikko) presented our poster on Wednesday, September 21, 13:30-15:30 (KST). We were lucky enough to be placed right in front of the main entrance, resulting in many people stopping at our stand to check the poster.

We had several very intriguing conversations and gained some valuable ideas and suggestions from our colleagues, which we will explore in the future. After a fruitful poster session, we let off some steam during the gala banquet, where we had the chance to sample Korean cuisine and listen to some authentic K-POP music.

Sources:

https://www.interspeech2022.org/general

Getman, Y., Al-Ghezi, R., Voskoboinik, K., Grósz, T., Kurimo, M., Salvi, G., Svendsen, T., Strömbergsson, S. (2022) wav2vec2-based Speech Rating System for Children with Speech Sound Disorder. Proc. Interspeech 2022, 3618-3622, doi: 10.21437/Interspeech.2022-10103

The first face-to-face Teflon meeting in Helsinki, September 2022

On 5-6 September, 15 researchers from Aalto University, Tampere University, Karolinska Institutet, University of Oslo and NTNU (Trondheim) gathered on the campus of Aalto University for the first face-to-face meeting of the Teflon project. The project has already been running for almost 1.5 years, but due to the pandemic, our kick-off and all other meetings have been only virtual. In fact, 4 of us still had to participate remotely, due to sudden Covid-19 cases in NTNU’s team among other reasons, but for the rest of us it was a truly delightful experience to meet and have in-depth discussions about the project, science, technology and everything else.

We had two full days including discussion sessions about the data, evaluations, automatic speech recognition, game design, automatic and human pronunciation assessment, experiment design, publications, dissemination and project management. Because we were still in the early stages of building the children’s pronunciation game, collecting and annotating the data and training the automatic assessment, the focus was clearly on planning the next steps of the project. On Monday evening we continued after the meeting with dinner in downtown Helsinki, enjoying the good company and delicious food in the restaurant Emo.

The next steps in the project include finishing the game code, developing multitask systems and faster speech processing servers, repeating the previously run tests on the new Finnish, Swedish and Norwegian data, finishing the human assessments for these data, fixing word lists and other specs for the game for each language, and recruiting speakers for the remaining training data.

Greetings from Aalto University, Otaniemi!

Mikko Kurimo is the coordinator of Teflon and leads the research group at Aalto University. Prof. Kurimo has been the head of the automatic speech recognition (ASR) group at Aalto since 2000. He has led the group in many national and international machine learning and ASR projects. Kurimo’s work is internationally best known for unsupervised subword language modeling for morphologically complex languages such as Finnish, Estonian, Turkish and Arabic. His research interests include deep learning methods for automatic speech recognition and spoken language modeling.

Aalto’s team is in charge of the ASR technology that is required in Teflon and will also provide the game platform for Teflon. The platform is based on the previous projects of Mikko Kurimo and Sari Ylinen where English was the target language. Now the goal in Teflon is to apply it to Finnish, Swedish and Norwegian.

The Aalto team consists of:

  • Tamás Grósz is a post-doctoral researcher whose area of specialization is ASR and computational paralinguistics. His background lies in developing machine learning based ASR systems.
  • Ragheb Al-Ghezi is a doctoral researcher whose main research focus lies in applying ASR and machine learning in language assessment. He has a versatile background in natural language processing and second language (L2) learning.
  • Ekaterina Voskoboinik is a doctoral researcher whose main research focus lies in applying statistical language modeling and machine learning in ASR and spoken L2 assessment. Her background is in learning and analysing the representations of words and subwords for morphologically rich languages.
  • Yaroslav Getman is a doctoral researcher whose main research focus lies in applying self-supervised learning and pre-trained models for ASR in low-resource tasks like L2 learners’ ASR for spoken language assessment.
  • Aku Rouhe is a doctoral researcher whose main research focus is ASR. He has a versatile background in speech and spoken language modeling, including speaker adaptation, voice activity detection, subword language modeling using statistical morphemes, speech translation, decoding algorithms and miscue-tolerant ASR in L2 reading tutoring.
  • Nhan Phan is a master’s student whose research topic is developing a mobile app to give feedback to beginner-level L2 learners in read-aloud tasks. His skills also include game programming using Unity, and his task is to modify the game platform for Teflon tasks.

We are excited and looking forward to the experiment with the Nordic languages.

Aalto University is the coordinating partner of the Teflon project. Aalto University describes itself as a meeting place for the fields of science, art, technology, and business. Today the university is one of the most prestigious in Finland. It started operating in 2010, with the goal of merging the old Helsinki School of Economics, Helsinki University of Technology and the University of Art and Design Helsinki into a new multidisciplinary university. The Aalto University campus in Otaniemi is situated in Greater Helsinki in the city of Espoo, just a metro ride away from central Helsinki.

The campus was originally built almost in the middle of the forest in Otaniemi. The rebuilding after the Second World War in Finland led to a need to educate more engineers, and hence to a need to build a new campus area and laboratory spaces for the Helsinki University of Technology and VTT Technical Research Centre of Finland Ltd. The Otaniemi campus is a park campus designed in the 1950s. The city plan of the area is the work of the Finnish architect Alvar Aalto, who also designed multiple buildings on the campus, along with other esteemed Finnish architects such as Reima and Raili Pietilä and Heikki and Kaija Sirén. The name of the university is a tribute to Alvar Aalto, who himself graduated from the former Helsinki University of Technology.

List of recent publications: 

  • Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Ragheb AlGhezi, Mietta Lennes, Tamás Grósz, Krister Lindén and Mikko Kurimo. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources & Evaluation (2022).
  • Kathania, Hemant; Kadiri, Sudarsana; Kadyan, Virender; Kurimo, Mikko. Data augmentation using spectral warping for low resource children ASR. Journal of Signal Processing Systems. Accepted for publication.
  • Kathania, H., Kadiri, S., Alku, P. & Kurimo, M. A Formant Modification Method for Improved ASR of Children’s Speech. Speech Communication (2022), vol. 136, pp. 98–106.
  • Kathania, Hemant; Kadiri, Sudarsana; Alku, Paavo; Kurimo, Mikko. Using data augmentation and time-scale modification to improve ASR of children’s speech in noisy environments. Applied Sciences (Springer International Publishing AG), 2021.
  • Ylinen Sari, Smolander Anna-Riikka, Karhila Reima, Kakouros Sofoklis, Lipsanen Jari, Huotilainen Minna, Kurimo Mikko. The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language. Frontiers in Education, 2021 ; Vol 6.
  • Ragheb Al-Ghezi, Yaroslav Getman, Ekaterina Voskoboinik, Mittul Singh, Mikko Kurimo. Automatic Rating of Spontaneous Speech for Low-Resource Languages. IEEE Spoken Language Technology Workshop (SLT 2022), 2022.
  • Grósz, T., Porjazovski, D., Getman, Y., Kadiri, S., Kurimo, M. Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering. ACM Multimedia 2022 Conference : Grand Challenges.
  • Yaroslav Getman, Ragheb Al-Ghezi, Katja Voskoboinik, Tamás Grósz, Mikko Kurimo, Giampiero Salvi, Torbjørn Svendsen, and Sofia Strömbergsson. Wav2vec2-based Speech Rating System for Children with Speech Sound Disorder. Proc. Interspeech 2022. 
  • Ragheb Al-Ghezi, Yaroslav Getman, Aku Rouhe, Raili Hildén and Mikko Kurimo. Self-supervised End-to-End ASR for Low Resource L2 Swedish. Proc. Interspeech 2021.

https://www.aalto.fi/fi/aalto-yliopisto

https://research.aalto.fi/fi/persons/mikko-kurimo

Photo: Unto Rautio

Researchers of the Teflon project: Sofia Strömbergsson

Author: trainee Elina Aittokallio, Tampere University

Greetings from Tampere! In this blog post we briefly introduce one of the researchers working on the Teflon project: Sofia Strömbergsson from Karolinska Institutet (KI), the largest center of medical academic research in Sweden.

Strömbergsson is an associate professor at Karolinska Institutet, where she teaches and conducts research on children’s speech and language disorders. She earned her doctoral degree in April 2014 at the Royal Institute of Technology (KTH), Stockholm. She also holds a medical science degree in SLP (Lund, 2007) and a degree in computational linguistics (Uppsala, 2000).

We had the honor of interviewing Sofia about her work as a researcher:

What sparked your interest in the field of logopedics? How did your career path form?

I’ve always been interested in language, and particularly in its spoken form – different ways of speaking, and how that’s perceived by listeners. I first started out in the field of speech technology, and I worked a few years in the industry developing text-to-speech for Swedish. I really enjoyed the task of trying to “teach” the computer how to read. After some time, though, I wanted to make use of my interest in speech and language in ways that would benefit others more directly. That’s when I decided to embark on a study program in logopedics. When I finished, an opportunity opened up for pursuing PhD studies at KTH, where I had the chance to combine my two interests into a thesis project on how children perceive automatically “corrected” versions of their own recorded misarticulation. In 2014, after I finished, I joined the Division of Speech-Language Pathology at KI, where I still am today.  

How do you find Karolinska Institutet as a place to work and do research?

When I joined KI and the Division of Speech and Language Pathology in 2014, it felt a little bit like coming home. Although I had really enjoyed my time at KTH, I had been a bit of an odd bird, being the only SLP among engineers (and, admittedly, also some phoneticians). At KI, I found that I could contribute my background in computational linguistics as a new perspective, but to a research core that lay closer to my own interests. I thrive here, where I can focus on research centered around clinical relevance, and on increasing our understanding of if, how and why certain intervention methods work, and others don’t.

Regarding the ongoing Teflon project:

What is your role in the project?  

I lead the part of TEFLON that focuses on trying out the speech training app in a clinical population. 

What type of joys and challenges have you faced during the project? 

Up until now, I’ve been working alone at my site on the project. Although we’ve had regular meetings in the bigger group, it hasn’t always been easy to manage the KI part of the project single-handedly. Therefore, I’m thrilled to have a project assistant, Magdalena Pettersson, joining me now in October! 

Has the project raised any new questions for future research? 

As we are still at the beginning of the project, I can’t say it has. (But I might have more to say about it in 2024!)

Aside from Teflon, are you involved in other research projects at the moment?  

Yes, I run a project called SPETS (http://ki.se/spets). In SPETS, we explore and compare different types of intervention for preschool-aged children with Developmental Language Disorder (DLD). It’s a longitudinal study, following children with DLD for 2 years, to explore what intervention they receive, and how their language skills, communicative participation, and quality of life change over time.

Lastly, as you have been very active in research and must have a lot of other work besides it, we’d like to know how you cool down from work.

Do you have any non-work-related passion projects, hobbies, or goals to achieve? 

Although I try to guard my limits, research has its way of intruding on one’s free time. And indeed, I also let it, and often enjoy it! But that said, cooling down and doing other things is important. And there are a lot of things I enjoy doing to relax! Spending time with my family and friends is of course one, and seeking wildlife experiences is another. I love to go hiking and cross-country skiing in the Norwegian mountains, and I run a lot, and most of all enjoy running in new places.

We wish Strömbergsson every success with the project and her future research. Below we list some of the research she has been part of, as well as her PhD work from 2014.

Recent research:

Simulating Speech Error Patterns Across Languages and Different Datasets

Strömbergsson S, Götze J, Edlund J, Nilsson Björkenstam K 

Language and speech 2022;65(1):105-142

A survey of Swedish speech-language pathologists’ practices regarding assessment of speech sound disorders

Wikse Barrow C, Körner K, Strömbergsson S 

Logopedics, phoniatrics, vocology 2021:1-12

Canonical babbling ratio - Concurrent and predictive evaluation of the 0.15 criterion

Nyman A, Strömbergsson S, Lohmander A 

Journal of communication disorders 2021:106164

Sofia Strömbergsson’s PhD from 2014:

Children’s perception of their synthetically corrected speech production

Strömbergsson S, Wengelin A, House D 

Clinical linguistics & phonetics 2014;28(6):373-95

Sources:

https://staff.ki.se/people/sofia-strombergsson

https://nyheter.ki.se/ny-docent-i-logopedi-vid-clintec-0

TEFLON in Tampere

Author: trainee Kamilla Hyytiäinen, Tampere University 

Greetings from Tampere! 

One of the Finnish research partners of the Nordic Teflon project is based at Tampere University. In this blogpost you will get a brief introduction to the city of Tampere, the university and its various fields of science as well as the local research participants of the Teflon project and their academic achievements. 

Nordic cooperation between Sweden and Finland dates back to the Swedish era in the 18th century, when Tampere, among other Finnish cities, was established on the banks of the Tammerkoski rapids by King Gustav III of Sweden in 1779. Today, Tampere is the third largest city in Finland and the largest inland centre in the Nordic countries, with roughly 238,000 inhabitants. Tampere has been an industrial pioneer in Finland since the very beginning and is still the centre of Finnish industry today. Versatile research, education and cooperation between companies and universities have maintained and developed the competitiveness of the region’s industry. 

At Tampere University, over 2,800 researchers conduct multidisciplinary research across the boundaries of fields of science. The focal areas of research lie in the fields of health, technology and society. In addition to basic research, new fields of research are emerging at the university, including gamification, augmented reality and sustainable cities. Multidisciplinarity is well represented in the Teflon project group: we are specialists in the fields of logopedics, neuroscience, speech technology, language pedagogy and linguistics. 

Sari Ylinen has been working as an Associate Professor at Tampere University since August 2021 and currently leads the Teflon project group in Tampere. Ylinen has comprehensive experience in project management: she has previously coordinated various research projects funded by the Academy of Finland, Business Finland and the University of Helsinki. In addition to managing research projects, Ylinen has worked as a neuroscientist for over a decade, specialising in brain plasticity in language learning, among other things. In recent years, she has studied articulation with the help of speech technology and gamification, which she aims to apply to children’s language learning and to correcting their pronunciation errors. 

Other Teflon project members in Tampere include a language technologist and research assistants. Anna Smolander has worked on Ylinen’s projects for several years as a language technologist, developing digital platforms for various learning difficulties. The Teflon Tampere research team has two research assistants, Kamilla Hyytiäinen and Saara Telinkangas, who are eager to familiarise themselves with how international research projects operate. Telinkangas studies Scandinavian languages and politics and is finishing her Bachelor’s degree, while Hyytiäinen is a student teacher of English and Swedish in the Master’s programme. The main tasks of the assistants include analysing Swedish speech samples produced by Finnish children in order to develop speech recognition for language learning applications. 

We are all looking forward to diving deep into the world of digital language learning with our fellow Nordic partners and aim at scientific breakthroughs together! 

Sources: 

https://www.tampere.fi/en/city-of-tampere/information-on-tampere.html

https://www.tuni.fi/en/research/research-tampere-university