Interspeech 2023

Hi, my name is Xinwei. I am a second-year PhD student in the TEFLON project. I was very happy and excited to attend INTERSPEECH 2023, held from 20 to 24 August in Dublin.

It was my first time attending such an international and well-known conference. There I met many senior researchers in speech and signal processing from all over the world, including my supervisor, Prof. Torbjørn Karl Svendsen. I also met Prof. Mikko Kurimo from Aalto University and many of the other members of our project.

So many interesting ideas and techniques were proposed this year. I wished I could clone myself to attend several parallel presentations at once, instead of running from one room to another all over the building. I also saw many posters that gave me inspiration for my own research. In addition, I came across posters from fields I was not familiar with, e.g., speech synthesis and speech enhancement. It was really amazing to talk to the presenters and grasp the main concepts and problems of those fields in half an hour.

I also presented our work, titled “An Analysis of Goodness of Pronunciation for Child Speech”. Thanks to the many questions and pieces of advice from the audience, I now have a clearer picture of my future research.

Really amazing conference. Can’t wait to meet all of you next year!

The Pop2Talk Nordic game in autumn 2023

Hi, I’m Nhan Phan, a PhD student at Aalto University. I’m part of the Teflon project, and my role is to maintain the Pop2Talk Nordic mobile application. Pop2Talk is a language learning game in which children imitate words spoken during the game and get a rating of their pronunciation. Building on the original game, which was intended for English, Pop2Talk Nordic extends it to Nordic languages. As I write this blog post, we have successfully developed four different versions of the game. These include:

  • A Finnish version for children who find it challenging to pronounce the phoneme “R”.
  • A Swedish version for children with Speech Sound Disorder.
  • And two other versions for children who want to learn Norwegian or English.

While much of the project’s foundational work was led by experienced partners in children’s speech-language pathology and in automatic speech recognition technology, I was fortunate to be involved in the project, being responsible for the design and maintenance of both the server and the mobile game.

With our need for illustrations in multiple languages, we turned to the latest AI technology for automatic image generation. I provided our partners with technical instructions on crafting text prompts to ensure the resulting images were both captivating and child-friendly. To my surprise, the images they produced were outstanding, surpassing all my expectations. Check out this cool picture from Magdalena at Karolinska Institutet. Funny thing, the girl is the spitting image of my 2-year-old daughter. I had to ask right away to use it as my daughter’s avatar.

When I showed the game to my daughter, she was super into it—even though she’s just starting to learn Finnish. She got so hooked that I had to yank the phone from her! I really hope other kids get just as excited when they play (and fingers crossed, they’ll want to take a break after a while).

As we move into October, we’re preparing for numerous experiments. These will study how children acquire vocabulary and pronunciation, and evaluate the game’s effectiveness in helping them. Those experiments will be conducted in various schools across Finland, Sweden and Norway. My primary focus at the moment is to ensure the server runs without any serious problems.

Listening evaluations

How does a “correct” utterance sound?

Written by: Sofia Strömbergsson, Karolinska Institutet

An important feature of the speech training app developed in TEFLON is to provide children with feedback concerning their speech production. This feedback will be presented as stars, with 5 stars representing “correct” pronunciation, and the fewer stars, the further away from “correct”. But how does the app know how many stars to present for a given utterance? In fact, this is something that the app (or rather, the acoustic model within the app) has to learn from how humans have evaluated children’s utterances.

But even for humans, the task of evaluating the correctness of children’s utterances is not trivial. For one, there are very many ways for utterances to be “correct”. (And that is a good thing when it comes to human perception in daily life! It means that we can “tolerate” a lot of variation in speech production without communication being too easily disrupted.) But also, there are even more ways that utterances can be “incorrect”. For example, is the utterance still intelligible as the intended word? And for intelligible utterances – are one or more speech sounds affected? Are different types of speech errors more severe than others? In TEFLON, the different research teams have tackled this challenge in slightly different ways.

At Karolinska Institutet, the human evaluators have used the same 1-5 scale that will be used in the app. In an effort to ensure consistency in the evaluations, the following rating key was specified: 

  1. not at all identifiable as the target word
  2. not identifiable as the target word
  3. slight phonemic error (e.g., the target word “kollision” is pronounced “kolliton”)
  4. subphonemic error/“unexpected variant” (e.g., the /r/-sound in “ros” is not quite produced as you’d expect)
  5. prototypical/adult-like/correct
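
As a minimal sketch, the rating key above could be represented as a simple lookup table. The names `RATING_KEY` and `describe_rating` are hypothetical and are not taken from the TEFLON app; only the descriptions come from the key itself.

```python
# Hypothetical representation of the 1-5 rating key (not the app's actual code).
RATING_KEY = {
    1: "not at all identifiable as the target word",
    2: "not identifiable as the target word",
    3: "slight phonemic error",
    4: "subphonemic error / unexpected variant",
    5: "prototypical / adult-like / correct",
}


def describe_rating(score: int) -> str:
    """Return a five-character star display plus the key description."""
    if score not in RATING_KEY:
        raise ValueError(f"rating must be an integer from 1 to 5, got {score!r}")
    # Filled stars are shown as '*', missing stars as '.'.
    stars = "*" * score + "." * (5 - score)
    return f"{stars} {RATING_KEY[score]}"


print(describe_rating(4))
```

Keeping the descriptions next to the scores in one table makes it easy to show evaluators exactly what each rating is meant to express.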

But even with this rating key, the evaluation decisions are not always easy. The researchers are currently running listening experiments with more listeners, both experts (in this context: speech-language pathologists) and non-experts, to explore listener behaviors more systematically. For this task, the listeners are instructed to rate the utterances from 1 to 5 as they think the app should rate them (i.e., without being provided with a rating key). Through these experiments, the researchers hope to learn whether experts and non-experts differ in their ratings, and whether ratings are more consistent for listeners who have access to “reference samples” when conducting their evaluations. The researchers aim to present their findings later in 2023.
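
To illustrate what comparing two listeners’ ratings could look like, here is a small sketch using two simple measures: the share of utterances given identical ratings, and the mean absolute difference between the ratings. This is only an assumption about one possible way to look at such data, not the researchers’ actual analysis; the function names and the ratings below are invented for illustration.

```python
from statistics import mean


def exact_agreement(a, b):
    """Share of utterances that both listeners gave the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_abs_difference(a, b):
    """Average distance, in stars, between the two listeners' ratings."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return mean(abs(x - y) for x, y in zip(a, b))


# Invented 1-5 ratings for six utterances (illustrative only, not project data).
expert = [5, 4, 2, 3, 5, 1]
non_expert = [5, 3, 2, 4, 5, 2]

print(exact_agreement(expert, non_expert))
print(mean_abs_difference(expert, non_expert))
```

Exact agreement treats a one-star disagreement the same as a four-star one, which is why the mean absolute difference is a useful companion measure on an ordered scale like this.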

Aalto’s team participates in the ACM Multimedia 2022 Computational Paralinguistics ChallengE

The ACM Multimedia 2022 Computational Paralinguistics ChallengE (ComParE) is an open Grand Challenge dealing with states and traits of speakers as manifested in their speech signal’s properties and beyond. At the start of the competition, the data is provided by the organizers, and the Sub-Challenges are generally open for participation.

This year our team, consisting of PhD students (Yaroslav Getman and Dejan Porjazovski), Research Fellows (Tamás Grósz and Sudarsana Reddy Kadiri), and Professor Mikko Kurimo, embarked on tackling two Sub-Challenges: the Vocalisations and the Stuttering one. 

In the Stuttering Sub-Challenge, participants were tasked with developing a system that can recognize different kinds of stuttering (e.g. word/phrase repetition, prolongation, sound repetition and others). Stuttering is a complex speech disorder with a crude prevalence of about 1% of the population. Monitoring of stuttering would allow objective feedback to persons who stutter (PWS) and speech therapists, thus facilitating tailored speech therapy, with the automatic detection of different stuttering phenomena as a necessary prerequisite. As training data, we could use the Kassel State of Fluency corpus containing approximately 5600 annotated samples.

In the Vocalisations Sub-Challenge, non-verbal vocal expressions (such as laughter, cries, moans, and screams) from the Variably Intense Vocalizations of Affect and Emotion Corpus are used for classifying the expression of six different emotions. Such human non-verbals are still understudied but are ubiquitous in human communication. This task was extremely challenging because the training data contained only female voices, while the developed systems were evaluated on male sounds.

Our team developed solutions for both tasks using state-of-the-art models like wav2vec 2.0, data augmentation and other simple tricks based on the distributed training data. For technical details, see our paper:

Tamás Grósz, Dejan Porjazovski, Yaroslav Getman, Sudarsana Kadiri, and Mikko Kurimo. 2022. Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering. In Proceedings of the 30th ACM International Conference on Multimedia (MM ’22). Association for Computing Machinery, New York, NY, USA, 7026–7029.

In total, 23 teams from all around the world registered for the competition, of which 8 submitted solutions for the Stuttering, and 11 for the Vocalisations Sub-Challenge. 

Aalto’s team won both competitions, earning two places in the hall of fame.

Teflon team presents at Interspeech

The 23rd INTERSPEECH Conference took place from September 18 to 22, 2022, at Songdo ConvensiA, in Incheon, Korea, under the theme Human and Humanizing Speech Technology. INTERSPEECH is the world’s largest and most comprehensive conference on the science and technology of spoken language processing. INTERSPEECH conferences emphasize interdisciplinary approaches addressing all aspects of speech science and technology, ranging from basic theory to advanced applications.

Truly a city of the future, Songdo sits adjacent to Seoul, which is regarded as one of the technology capitals of the world. The city’s underground railway already offers high-speed WiFi, and electronic panels at the exits show the waiting time for connecting buses and trains, while companies like Samsung Electronics are already working on linking household devices to mobile phones. On the technological front, Songdo is a brand-new city that offers the chance to truly integrate innovation into daily life.

This year, the Teflon team submitted a paper titled “wav2vec2-based Speech Rating System for Children with Speech Sound Disorder” to Interspeech. The article described our initial systems developed using Sofia Strömbergsson’s corpus of children suffering from speech sound disorder. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. Our solutions could be the basis for software tools that would enable home therapy and lessen the burden on speech-language pathologists. Our submission was accepted with very positive reviews and selected for a poster presentation.

We (Tamás & Mikko) presented our poster on Wednesday, September 21, 13:30-15:30 (KST). We were lucky enough to be placed right in front of the main entrance, resulting in many people stopping at our stand to check the poster.

We had several very intriguing conversations and gained some valuable ideas and suggestions from our colleagues, which we will explore in the future. After a fruitful poster session, we let off some steam at the gala banquet, where we had the chance to sample Korean cuisine and listen to some authentic K-POP music.


Getman, Y., Al-Ghezi, R., Voskoboinik, K., Grósz, T., Kurimo, M., Salvi, G., Svendsen, T., Strömbergsson, S. (2022) wav2vec2-based Speech Rating System for Children with Speech Sound Disorder. Proc. Interspeech 2022, 3618-3622, doi: 10.21437/Interspeech.2022-10103

The first face-to-face Teflon meeting in Helsinki, September 2022

On 5-6 September, 15 researchers from Aalto University, Tampere University, Karolinska Institutet, University of Oslo and NTNU (Trondheim) gathered on the Aalto University campus for the first face-to-face meeting of the Teflon project. The project had already been running for almost 1.5 years, but due to the pandemic, our kick-off and all other meetings had been virtual only. In fact, 4 of us still had to participate remotely, due to sudden Covid-19 cases in NTNU’s team among other reasons, but for the rest of us it was a really delightful experience to meet and have in-depth discussions about the project, science, technology and everything else.

We had two full days including discussion sessions about the data, evaluations, automatic speech recognition, game design, automatic and human pronunciation assessment, experiment design, publications, dissemination and project management. Because we were still in the early stages of building the children’s pronunciation game, collecting and annotating the data and training the automatic assessment, the focus was clearly on planning the next steps of the project. On Monday evening, after the meeting, we continued with dinner in downtown Helsinki, enjoying the good company and delicious food at the restaurant Emo.

The next steps in the project include finishing the game code, developing multitask systems and faster speech processing servers, repeating the previously run tests on the new Finnish, Swedish and Norwegian data, finishing the human assessments for these data, fixing word lists and other specs for the game for each language, and recruiting speakers for the remaining training data.

Greetings from Aalto University, Otaniemi!

Mikko Kurimo is the coordinator of Teflon and leads the research group at Aalto University. Prof. Kurimo has been the head of the automatic speech recognition (ASR) group at Aalto since 2000. He has led the group in many national and international machine learning and ASR projects. Kurimo’s work is internationally best known for unsupervised subword language modeling for morphologically complex languages such as Finnish, Estonian, Turkish and Arabic. His research interests include deep learning methods for automatic speech recognition and spoken language modeling.

Aalto’s team is in charge of the ASR technology that is required in Teflon and will also provide the game platform for Teflon. The platform is based on the previous projects of Mikko Kurimo and Sari Ylinen where English was the target language. Now the goal in Teflon is to apply it to Finnish, Swedish and Norwegian.

The Aalto team consists of:

  • Tamás Grósz is a post-doctoral researcher whose area of specialisation is ASR and computational paralinguistics. His background lies in developing machine learning based ASR systems.
  • Ragheb Al-Ghezi is a doctoral researcher whose main research focus lies in applying ASR and machine learning in language assessment. He has a versatile background in natural language processing and second language (L2) learning.
  • Ekaterina Voskoboinik is a doctoral researcher whose main research focus lies in applying statistical language modeling and machine learning in ASR and spoken L2 assessment. Her background is in learning and analysing the representations of words and subwords for morphologically rich languages.
  • Yaroslav Getman is a doctoral researcher whose main research focus lies in applying self-supervised learning and pre-trained models for ASR in low-resource tasks like L2 learners’ ASR for spoken language assessment.
  • Aku Rouhe is a doctoral researcher whose main research focus is in ASR. He has a versatile background in speech and spoken language modeling, including speaker adaptation, voice activity detection, subword language modeling using statistical morphemes, speech translation and decoding algorithms, and miscue-tolerant ASR in L2 reading tutoring.
  • Nhan Phan is a master’s student whose research topic is developing a mobile app to give feedback to beginner-level L2 learners in read-aloud tasks. His skills also include game programming using Unity, and his task is to modify the game platform for Teflon tasks.

We are excited and looking forward to the experiment with the Nordic languages.

Aalto University is the coordinating partner of the Teflon project. Aalto University describes itself as a meeting place for the fields of science, art, technology, and business. Today the university is one of the most prestigious in Finland. It started operating in 2010, with the goal of merging the old Helsinki School of Economics, Helsinki University of Technology and the University of Art and Design Helsinki into a new multidisciplinary university. The Aalto University campus in Otaniemi is situated in Greater Helsinki in the city of Espoo, just a metro ride away from central Helsinki.

The campus was originally built almost in the middle of the forest in Otaniemi. The rebuilding of Finland after the Second World War led to a need to educate more engineers, and hence to a need for a new campus area and laboratory spaces for the Helsinki University of Technology and VTT Technical Research Centre of Finland Ltd. The Otaniemi campus is a park campus design from the 1950s, and the city plan of the area is the work of the Finnish architect Alvar Aalto. Along with other esteemed Finnish architects, such as Reima and Raili Pietilä and Heikki and Kaija Sirén, he also designed multiple buildings on the campus. The name of the university is a tribute to Alvar Aalto, who himself graduated from the former Helsinki University of Technology.

List of recent publications: 

  • Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Ragheb AlGhezi, Mietta Lennes, Tamás Grósz, Krister Lindén and Mikko Kurimo. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources & Evaluation (2022).
  • Kathania, Hemant; Kadiri, Sudarsana; Kadyan, Virender; Kurimo, Mikko. Data augmentation using spectral warping for low resource children ASR. Journal of Signal Processing Systems. Accepted for publication.
  • Kathania, H., Kadiri, S., Alku, P. & Kurimo, M. A Formant Modification Method for Improved ASR of Children’s Speech. Speech Communication (2022), vol. 136, pp. 98–106.
  • Kathania, Hemant; Kadiri, Sudarsana; Alku, Paavo; Kurimo, Mikko. Using data augmentation and time-scale modification to improve ASR of children’s speech in noisy environments. Applied Sciences (Springer International Publishing AG), 2021.
  • Ylinen Sari, Smolander Anna-Riikka, Karhila Reima, Kakouros Sofoklis, Lipsanen Jari, Huotilainen Minna, Kurimo Mikko. The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language. Frontiers in Education, 2021; Vol 6.
  • Ragheb Al-Ghezi, Yaroslav Getman, Ekaterina Voskoboinik, Mittul Singh, Mikko Kurimo. Automatic Rating of Spontaneous Speech for Low-Resource Languages. IEEE Spoken Language Technology Workshop (SLT 2022), 2022.
  • Grósz, T., Porjazovski, D., Getman, Y., Kadiri, S., Kurimo, M. Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering. ACM Multimedia 2022 Conference: Grand Challenges.
  • Yaroslav Getman, Ragheb Al-Ghezi, Katja Voskoboinik, Tamás Grósz, Mikko Kurimo, Giampiero Salvi, Torbjørn Svendsen, and Sofia Strömbergsson. Wav2vec2-based Speech Rating System for Children with Speech Sound Disorder. Proc. Interspeech 2022. 
  • Ragheb Al-Ghezi, Yaroslav Getman, Aku Rouhe, Raili Hildén and Mikko Kurimo. Self-supervised End-to-End ASR for Low Resource L2 Swedish. Proc. Interspeech 2021.

Photo: Unto Rautio

Researchers of Teflon project: Sofia Strömbergsson

Author: trainee Elina Aittokallio, Tampere University

Greetings from Tampere! In this blog post we briefly introduce one of the researchers working on the Teflon project, Sofia Strömbergsson from Karolinska Institutet (KI), the largest centre of medical academic research in Sweden.

Strömbergsson is an associate professor at Karolinska Institutet. She teaches and conducts research on children’s speech and language disorders. Strömbergsson earned her doctoral degree in April 2014 at the Royal Institute of Technology (KTH), Stockholm. She also has a medical science degree in SLP (Lund, 2007) and a degree in computational linguistics (Uppsala, 2000).

We had the honor of interviewing Sofia about her work as a researcher:

What sparked your interest in the field of logopedics? How did your career path form?

I’ve always been interested in language, and particularly in its spoken form – different ways of speaking, and how that’s perceived by listeners. I first started out in the field of speech technology, and I worked a few years in the industry developing text-to-speech for Swedish. I really enjoyed the task of trying to “teach” the computer how to read. After some time, though, I wanted to make use of my interest in speech and language in ways that would benefit others more directly. That’s when I decided to embark on a study program in logopedics. When I finished, an opportunity opened up for pursuing PhD studies at KTH, where I had the chance to combine my two interests into a thesis project on how children perceive automatically “corrected” versions of their own recorded misarticulation. In 2014, after I finished, I joined the Division of Speech-Language Pathology at KI, where I still am today.  

How do you find Karolinska Institutet as a place to work and do research at? 

When I joined KI and the Division of Speech and Language Pathology in 2014, it felt a little bit like coming home. Although I had really enjoyed my time at KTH, I had been a bit of an odd bird, being the only SLP among engineers (and, admittedly, also some phoneticians). At KI, I experienced that I could contribute with my background in computational linguistics as a new perspective, but to a research core that lay closer to my own interests. I thrive here, where I can focus on research centered around clinical relevance, and on increasing our understanding of if, how and why certain intervention methods work, and others don’t.

Regarding the ongoing Teflon project:

What is your role in the project?  

I lead the part of TEFLON that focuses on trying out the speech training app in a clinical population. 

What type of joys and challenges have you faced during the project? 

Up until now, I’ve been working alone at my site on the project. Although we’ve had regular meetings in the bigger group, it hasn’t always been easy to manage the KI part of the project single-handedly. Therefore, I’m thrilled to have a project assistant, Magdalena Pettersson, joining me now in October! 

Has the project raised any new questions for future research? 

As we are still in the beginning of the project, I can’t say it has. (But I might have more to say about it in 2024!)

Aside from Teflon, are you involved in other research projects at the moment?  

Yes, I run a project called SPETS. In SPETS, we explore and compare different types of intervention for preschool-aged children with Developmental Language Disorder (DLD). It’s a longitudinal study, following children with DLD for 2 years, to explore what intervention they receive, and how their language skills, communicative participation, and quality of life change over time.

Lastly, as you have been very active in research and must have a lot of other work besides it, we’d like to know how you cool down from work.

Do you have any non-work-related passion projects, hobbies, or goals to achieve? 

Although I try to guard my limits, research has its way of intruding on one’s free time. And indeed, I also let it, and often enjoy it! But that said, cooling down and doing other things is important. And there are a lot of things I enjoy doing to relax! Spending time with my family and friends is of course one, and seeking wildlife experiences is another. I love to go hiking and cross-country skiing in the Norwegian mountains, and I run a lot, and most of all enjoy running in new places.

We wish Strömbergsson every success with the project and her future research. Below we have listed some of the research Strömbergsson has been part of, as well as her PhD work from 2014.

Recent research:

Simulating Speech Error Patterns Across Languages and Different Datasets

Strömbergsson S, Götze J, Edlund J, Nilsson Björkenstam K 

Language and speech 2022;65(1):105-142

A survey of Swedish speech-language pathologists’ practices regarding assessment of speech sound disorders

Wikse Barrow C, Körner K, Strömbergsson S 

Logopedics, phoniatrics, vocology 2021;():1-12

Canonical babbling ratio-Concurrent and predictive evaluation of the 0.15 criterion

Nyman A, Strömbergsson S, Lohmander A

Journal of communication disorders 2021;:106164-

Sofia Strömbergsson’s PhD from 2014:

Children’s perception of their synthetically corrected speech production

Strömbergsson S, Wengelin A, House D

Clinical linguistics & phonetics 2014;28(6):373-95


TEFLON in Tampere

Author: trainee Kamilla Hyytiäinen, Tampere University 

Greetings from Tampere! 

One of the Finnish research partners of the Nordic Teflon project is based at Tampere University. In this blogpost you will get a brief introduction to the city of Tampere, the university and its various fields of science as well as the local research participants of the Teflon project and their academic achievements. 

Nordic cooperation between Sweden and Finland dates back to the Swedish era in the 18th century, when King Gustav III of Sweden established the city of Tampere, among other cities in Finland, in 1779 on the banks of the Tammerkoski rapids. Today, Tampere is the third largest city in Finland and the largest inland centre in the Nordic countries, with roughly 238 000 inhabitants. Tampere has been an industrial pioneer in Finland since the very beginning and is still the centre of Finnish industry today. Versatile research, education and cooperation between companies and universities have maintained and developed the competitiveness of the region’s industry.

At Tampere University, over 2,800 researchers conduct multidisciplinary research across the boundaries of fields of science. The focal areas of research lie in the fields of health, technology and society. In addition to basic research, new fields of research are emerging at the university, including gamification, augmented reality and sustainable cities. Multidisciplinarity is well represented in the Teflon project group: we consist of specialists in the fields of logopedics, neuroscience, speech technology, language pedagogy and linguistics.

Sari Ylinen has been working as an Associate Professor at Tampere University since August 2021, and she is currently leading the Teflon project group in Tampere. Ylinen has comprehensive experience in project management: she has previously coordinated various research projects funded by Academy of Finland, Business Finland and University of Helsinki. In addition to managing research projects, Ylinen has worked as a neuroscientist for over a decade, specialising in, among other things, brain plasticity in language learning. In recent years, she has studied articulation with the help of speech technology and gamification, which she aims to apply to children’s language learning and to correcting their pronunciation errors.

Other Teflon project members in Tampere include a language technologist and research assistants. Anna Smolander has worked in Ylinen’s projects for several years as a language technologist developing digital platforms for various learning difficulties. The Teflon Tampere research team has two research assistants, Kamilla Hyytiäinen and Saara Telinkangas, who are eager to familiarise themselves with how international research projects operate. Telinkangas studies Scandinavian languages and politics and is finishing her Bachelor’s degree, while Hyytiäinen is a student teacher of English and Swedish in the Master’s programme. The main tasks of the assistants include analysing Swedish speech samples produced by Finnish children in order to develop the speech recognition for language learning applications.

We are all looking forward to diving deep into the world of digital language learning with our fellow Nordic partners and aim at scientific breakthroughs together!