Non-native Children’s Automatic Speech Assessment Challenge (NOCASA)

For the MLSP special session on the Automatic Assessment of Atypical Speech (AAAS), also organised by us at MLSP 2025, see https://teflon.aalto.fi/mlsp-aaas-2025/

Brief introduction to the challenge

Learning the pronunciation of a foreign or second language (L2) requires a lot of practice and accurate feedback. Mobile apps with automatic pronunciation assessment (APA) technology let learners practise pronunciation at their own time, place and pace.

Challenges: Developing and implementing APA involves several challenges. First and foremost is the lack of L2 learner speech data annotated for pronunciation accuracy. This is particularly true for children and for learners of low-resource target languages. Second, when such data are available, they are usually heavily unbalanced across skill levels, and the provided reference scores suffer from noise and inter-annotator disagreement. Finally, to be useful in an app, scoring has to happen within seconds, so that real-time feedback with minimal delay encourages frequent repetition.

Data: Recordings of children aged 5–12 years repeating Norwegian words that were played to them. The children include L1 speakers, beginner learners of L2 Norwegian and children with no previous exposure to Norwegian. For each word in the data we provide the correct orthographic transcription and a speech accuracy score of 1–5 assigned by human experts. For more details, see [1].

Task: Develop an automatic speech assessment system and use it to predict the score of each utterance in the given test data.
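To make the task concrete, the following is a minimal sketch of one possible end-to-end approach: treating assessment as five-class utterance classification with a pretrained wav2vec 2.0 encoder. It is not the official baseline (see the repository linked below); the checkpoint name, audio path and the mapping from class indices 0–4 to scores 1–5 are assumptions for illustration, and the classification head would still need fine-tuning on the released training data.

# Minimal sketch: score one utterance with a wav2vec 2.0 classifier.
# Not the official baseline; the checkpoint, file path and the 0-4 -> 1-5
# label mapping are illustrative assumptions, and the classification head
# is randomly initialised until it is fine-tuned on the training split.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

MODEL = "facebook/wav2vec2-base"                      # assumed pretrained checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL)
model = Wav2Vec2ForSequenceClassification.from_pretrained(MODEL, num_labels=5)
model.eval()

waveform, sr = torchaudio.load("utterance.wav")       # hypothetical test file
if sr != 16_000:                                      # wav2vec 2.0 expects 16 kHz audio
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = extractor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                   # shape (1, 5)
print("predicted score:", logits.argmax(dim=-1).item() + 1)   # class 0-4 -> score 1-5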

Please contact mikko.kurimo@aalto.fi for more information.

To get started:

Please fill out the dataset agreement (PDF) electronically, print, sign, scan, and email it to the organizers: mikko.kurimo@aalto.fi, giampiero.salvi@ntnu.no and tamas.grosz@aalto.fi. The agreement must be signed by a permanent staff member.

Data sets to be used, evaluation system, baseline systems

Dataset: https://zenodo.org/records/14018511 (access granted after the EULA is filled, signed and sent to the organizers)

Submission system link: https://www.codabench.org/competitions/6965/?secret_key=1be3fbdf-8fc9-4376-bda0-947bc9d32054

Baselines: see https://github.com/aalto-speech/nocasa-baselines/

Rules for participation

Participants must adhere to the predefined training and test splits as given. Additional training data may be used, but only if it is clearly described in the report and is publicly available. The developed solutions can be tested in at most five trials by uploading the model predictions for the test set, whose labels are withheld from the participants. Each submission must be accompanied by an article presenting the results, which undergoes a standard double-blind peer-review process; only authors of accepted papers will be added to the final leaderboard.
Participants may engineer their own features and use their own machine learning algorithms. However, to enable faster development, a standard feature set for traditional ML models (SVM) and a state-of-the-art end-to-end solution (wav2vec 2.0) will be provided; a rough illustration of the traditional route is sketched after these rules.
The organisers reserve the right to re-evaluate the findings but will not participate in the Challenge themselves. 
We encourage both contributions aiming at the highest performance, surpassing our baselines, and contributions offering new and interesting insights with respect to these data.
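As mentioned in the rules above, here is a rough sketch of the traditional route: fixed-length acoustic features plus an SVM classifier. The mean-MFCC features, file paths and in-line labels are placeholders for illustration only; they are not the standard feature set or metadata format distributed with the official baselines.

# Rough sketch of the "traditional ML" route: fixed-length acoustic
# features + SVM. Mean MFCCs are a placeholder, not the standard
# feature set distributed with the official baselines.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(path, sr=16_000, n_mfcc=13):
    """Mean MFCC vector over the whole utterance (placeholder features)."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)

# File lists and scores are assumed to come from the released training
# split; the exact metadata format is defined by the dataset itself.
train_files = ["train/0001.wav", "train/0002.wav"]            # hypothetical paths
train_scores = [3, 5]                                          # expert scores 1-5

X = np.stack([utterance_features(f) for f in train_files])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, train_scores)

test_feat = utterance_features("test/0001.wav").reshape(1, -1)
print("predicted score:", clf.predict(test_feat)[0])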

The results of the Challenge will be presented at the annual IEEE MLSP conference.

Paper templates can be found at https://2025.ieeemlsp.org/en/PAPER-SUBMISSION-GUIDELINES.html

Relevant dates

All deadlines are 23:59:59 AoE (Anywhere on Earth).

  • Team registration starts: March 19.
  • Training and test data released: March 19 (see above).
  • Baseline codes and models released: March 28.
  • Evaluation system opens: April 2.
  • Baseline paper released: April 27.
  • Paper submission deadline: May 20.
  • Evaluation system closes, and the leaderboard is public: May 20.
  • Post-Challenge evaluation phase: from May 20 until July 15.
  • Notification of paper acceptance: June 24.
  • Camera-ready upload: July 15.

References

[1] Anne Marte Haug Olstad, Anna Smolander, Sofia Strömbergsson, Sari Ylinen, Minna Lehtonen, Mikko Kurimo, Yaroslav Getman, Tamás Grósz, Xinwei Cao, Torbjørn Svendsen, and Giampiero Salvi. 2024. Collecting Linguistic Resources for Assessing Children’s Pronunciation of Nordic Languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3529–3537, Torino, Italy. ELRA and ICCL.