Challenges collecting and sharing speech data from children

The research and development that we carry out in the Teflon project relies on collections of speech recordings from children who are learning a Nordic language as a second language, and from Swedish children with speech sound disorder. This kind of data is unique for several reasons. Firstly, no data sets of children speaking Nordic languages are publicly available, whereas adult data is far from scarce. Secondly, we are interested in second language (L2) speakers of Nordic languages, because we want to study how to detect and characterize their mispronunciations. Thirdly, there is no publicly available data of Nordic children with speech sound disorder. Consequently, a substantial part of the project's effort was devoted to collecting such data. This was done both by simulating the situation in which children play the Pop2Talk game and, later, by recording the game sessions with the students.

Our intention when collecting this data was to make all the corpora publicly available, in the spirit of open science. However, we soon realized that the rules defined by the General Data Protection Regulation (GDPR) are interpreted very differently in the different Nordic countries (and possibly in other European countries). These rules aim to protect the privacy of European citizens and apply to any information that may identify an individual. For text collections, complying with them may simply mean removing personal information such as names, telephone numbers, user IDs or addresses. For speech, the situation is more complicated. The main question is whether the voice alone is sufficient to identify the speaker and, if so, whether this constitutes sufficient grounds to forbid public sharing of the recordings.

Our experience in Sweden is that different lawyers may interpret the law differently. The lawyers at Karolinska Institutet (KI), for example, consider it acceptable to share recordings of isolated words, but others disagree. The lawyers at Språkbanken Tal (the main channel for sharing speech data in Sweden), whom we also consulted, interpreted the law much more strictly. A similarly strict interpretation is now applied in Finland (although publication of speech data from individuals over 16 years of age has been permitted in the past). In Norway, by contrast, the rules are interpreted to apply only to what is spoken, and not to the voice that speaks. As a result, of the many corpora that we recorded with Finnish, Swedish and Norwegian as target languages, we will be able to publish only the Norwegian corpus.

The different responses to our requests to the ethics committees in the different countries illustrate the complexity of sharing speech data. It is worth stressing that the L2 data sets for Swedish, Norwegian and Finnish were equivalent in all respects: the age and characteristics of the participants, the content of the recordings, the anonymization of the participants in the metadata and the sharing conditions requested, to name a few. It is also important to point out that the characteristics of a child's voice that may allow an individual to be identified change very quickly with age. This means that the participants will no longer be identifiable by their voice just a few months or years after the recordings were made. We hope that uniform interpretations of the regulations, and uniform practices for handling speech data, will be introduced in the future. We believe that open-access corpora of child speech are of essential value for scientific research and speech technology development, and that they will bring great benefits to society.

We have submitted a paper describing the data collected in the Teflon project in more detail to the LREC2024 conference, which will take place in Turin, Italy, on May 20-25. Please visit the LREC2024 website for more information.

NTNU (Teflon) at NNL2P

On the 12th of October 2023, Giampiero Salvi presented the work carried out in Teflon at the Nordic Network for L2 Pronunciation (NNL2P) workshop in Trondheim. The workshop featured a number of very interesting keynote speakers, who brought long experience and insight into the field of pronunciation assessment. The Teflon presentation was very well received. Please visit the workshop website for more information.

TEFLON at SLaTE 2023

Speech and Language Technology in Education (SLaTE) is a ‘Special Interest Group’ (SIG) of the ‘International Speech Communication Association’ (ISCA).

After four years without meetings, SLaTE met again at Trinity College Dublin in August 2023, just before ISCA's main event, INTERSPEECH 2023. TEFLON partners gave three presentations at SLaTE and one at its sister workshop, SIGUL 2023, which ran in parallel in a nearby meeting room.

In SLaTE 2023:

In SIGUL 2023:

Second face-to-face Teflon meeting in Trondheim, June 2023

On the 20th-21st of June, twelve researchers from Aalto University, Tampere University, Karolinska Institutet, the University of Oslo and NTNU gathered on the NTNU campus for the second face-to-face meeting of the Teflon project.

As at the first in-person meeting, we had two full days of discussion sessions on the data, evaluations, automatic speech recognition, game design, automatic and human pronunciation assessment, experiment design, publications, dissemination and project management. The game was already at a mature stage of development, so much of the discussion was devoted to planning the experiments with school children in the three countries. On Tuesday evening, after the meeting, we had dinner in downtown Trondheim and enjoyed good company and delicious food at the restaurant To Rom og Kjøkken, where we had a room all to ourselves. We also took a long walk through the city in the light of the Nordic summer night.

The next steps in the project include running the experiments with the children, collecting new data from the experiments and analyzing it.