Third face-to-face TEFLON meeting in Stockholm, March 2024

Our third face-to-face meeting within the TEFLON project took place in Stockholm on March 20-22. Sofia hosted the meeting at Karolinska Institutet, and the research team got to experience both the Huddinge and Solna campuses of the university.

This time, a total of 13 participants from all project partners – Aalto University, Tampere University, Oslo University, KI and NTNU – were present. The topics for discussion covered updates on the learning experiments in Finland, Norway and Sweden, a shared view on user experience across the different experiments, and ideas for how to use the collected data. At this stage, most of the data have been collected, and the teams at KI, Tampere and Oslo are now processing the data and preparing for analysis. This involves a lot of manual perceptual assessment, which will be used to examine potential learning effects for the users of the app. In addition, the manual assessments will be used to re-train the acoustic models that underlie the automatic rating in the app, which we hope will improve the quality of automatic ratings for future users.

In between discussions, we had time for food and drinks.

Another engaging topic during the meeting was ideas for future collaboration, both concerning upcoming joint publications and potential continuation projects. It turns out that we share interests that we were not aware of before! Hopefully, these ideas can turn into future projects!

Listening evaluations

How does a “correct” utterance sound?

Written by: Sofia Strömbergsson, Karolinska Institutet


An important feature of the speech training app developed in TEFLON is to provide children with feedback on their speech production. This feedback will be presented as stars, with 5 stars representing “correct” pronunciation and fewer stars indicating a greater distance from “correct”. But how does the app know how many stars to present for a given utterance? In fact, this is something that the app (or rather, the acoustic model within the app) has to learn from how humans have evaluated children’s utterances.
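To make that learning step concrete, here is a minimal sketch of the general idea: each utterance is reduced to a fixed-length acoustic feature vector, and a standard classifier is fitted to human star ratings. Everything below (the features, the model choice, and the synthetic data) is an illustrative assumption, not the project's actual model.

```python
# Minimal sketch, not the TEFLON model: training a 1-5 star rater from
# human-labelled utterances. Features, labels, and model choice are all
# illustrative assumptions on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for per-utterance acoustic features (e.g., averaged MFCCs).
X = rng.normal(size=(500, 13))
# Stand-in for human star ratings (1-5), one per utterance.
y = rng.integers(1, 6, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def rate(features: np.ndarray) -> int:
    """Predict a 1-5 star rating for one utterance's feature vector."""
    return int(model.predict(features.reshape(1, -1))[0])

print("stars:", rate(X_test[0]))
```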

But even for humans, the task of evaluating the correctness of children’s utterances is not trivial. For one thing, there are many ways for utterances to be “correct”. (And that is a good thing when it comes to human perception in daily life! It means that we can “tolerate” a lot of variation in speech production without communication being too easily disrupted.) But there are even more ways for utterances to be “incorrect”. For example, is the utterance still intelligible as the intended word? And for intelligible utterances – are one or more speech sounds affected? Are some types of speech errors more severe than others? In TEFLON, the different research teams have tackled this challenge in slightly different ways.

At Karolinska Institutet, the human evaluators have used the same 1-5 scale that will be used in the app. In an effort to ensure consistency in the evaluations, the following rating key was specified: 

  1. not at all identifiable as the target word
  2. barely identifiable as the target word
  3. slight phonemic error (e.g., the target word “kollision” is pronounced “kolliton”)
  4. subphonemic error/“unexpected variant” (e.g., the /r/-sound in “ros” is not quite produced as you’d expect)
  5. prototypical/adult-like/correct
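
As an illustration, a rating key like this can be encoded directly in an annotation tool, so that every evaluator sees the same definitions. The sketch below is hypothetical and not the project's tooling:

```python
# Hypothetical encoding of the KI rating key for an annotation tool;
# the structure and function names are illustrative, not from the project.
RATING_KEY = {
    1: "not at all identifiable as the target word",
    2: "barely identifiable as the target word",
    3: "slight phonemic error",
    4: "subphonemic error / unexpected variant",
    5: "prototypical / adult-like / correct",
}

def describe(stars: int) -> str:
    """Return the rating-key description for a 1-5 star rating."""
    return f"{stars} star(s): {RATING_KEY[stars]}"

print(describe(4))
```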

But even with this rating key, the evaluation decisions are not always easy. The researchers are currently running listening experiments with more listeners – experts (in this context: speech-language pathologists) and non-experts – to explore listener behaviors more systematically. For this task, the listeners are instructed to rate the utterances from 1 to 5 as they think the app should rate them (i.e., without being provided with a rating key). Through these experiments, the researchers hope to learn more about whether experts and non-experts differ in their ratings, and whether listener ratings are more consistent for listeners who have access to “reference samples” when conducting their evaluations. The researchers aim to present their findings later in 2023.
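
Consistency questions like these are commonly analysed with chance-corrected agreement statistics. As one illustration (not the researchers' actual analysis), the sketch below uses quadratic-weighted Cohen's kappa, which suits ordinal 1-5 scales, to compare mean pairwise agreement within two simulated listener groups:

```python
# Illustrative sketch on simulated data: comparing within-group rating
# consistency via quadratic-weighted Cohen's kappa (suited to ordinal
# scales). Group sizes and noise levels are assumptions for illustration.
from itertools import combinations

import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)

def simulate_group(true_quality: np.ndarray, n_listeners: int, sd: float) -> np.ndarray:
    """Simulate noisy 1-5 ratings with shape (n_listeners, n_utterances)."""
    noisy = true_quality + rng.normal(0.0, sd, size=(n_listeners, true_quality.size))
    return np.clip(np.round(noisy), 1, 5).astype(int)

def mean_pairwise_kappa(ratings: np.ndarray) -> float:
    """Mean quadratic-weighted kappa over all pairs of listeners."""
    kappas = [
        cohen_kappa_score(ratings[a], ratings[b], weights="quadratic")
        for a, b in combinations(range(len(ratings)), 2)
    ]
    return float(np.mean(kappas))

true_quality = rng.integers(1, 6, size=200)            # latent utterance quality
experts = simulate_group(true_quality, 6, sd=0.5)      # less rating noise
non_experts = simulate_group(true_quality, 6, sd=1.0)  # more rating noise

print("experts:    ", round(mean_pairwise_kappa(experts), 2))
print("non-experts:", round(mean_pairwise_kappa(non_experts), 2))
```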