I work in the Speech and Natural Language Technologies department at Vicomtech, specializing in Natural Language Generation and Machine Translation, while pursuing my PhD. My research bridges computational linguistics and sustainability. I focus on optimizing language model training to enhance performance for low-resource languages while reducing energy consumption, contributing to the broader Green AI initiative. My work aims to make NLP technologies accessible, equitable, and environmentally sustainable.

🔥 News

2024.11: 🎉🎉 Published “Automating Easy Read Text Segmentation” at EMNLP 2024 Findings (Read Paper).
2024.08: 🎉🎉 Published “Split and Rephrase with Large Language Models” at ACL 2024 (Read Paper).
2023.07: 🎉🎉 Published “Unsupervised Subtitle Segmentation with Masked Language Models” at ACL 2023 (Read Paper).

📖 Education

Ph.D. in Natural Language Processing (2022–Present)
University of the Basque Country (UPV/EHU)
- Thesis: Optimization, Adaptation, and Applications of Large Language Models
M.Sc. in Artificial Intelligence (2019–2020)
University of Leeds
- Graduated with Distinction
B.Sc. in Computer Science (2015–2019)
University of Malaga

🏢 Employment History

Vicomtech – Researcher in Speech and Natural Language Technologies

San Sebastián, Spain
2020 – Present

Conduct research and development in Natural Language Generation, Machine Translation, and Text Simplification for underrepresented languages.
Lead and contribute to research projects, including ADAGIO, ADAPT-IA, and IRAZ, focused on optimizing language models and advancing Green AI methodologies.
Collaborate on national and international initiatives to improve accessibility and sustainability in NLP.
Supervise and mentor interns and junior researchers in the Speech and Natural Language Technologies department.

🛠 Skills

Research Skills

Large Language Model Optimization
Neural Machine Translation (NMT)
Text Simplification
Machine Translation Quality Estimation
Green AI methodologies for sustainable model training

Technical Skills

Programming Languages: Python, Golang, C++, TypeScript, HTML, CSS, SQL
Frameworks & Tools: PyTorch, llama.cpp, LlamaIndex, MarianNMT, COMET, FastAPI, Angular
Specialized Tools: Energy usage profiling tools (e.g., CodeCarbon), lightweight deployment frameworks

Languages

Spanish: Native
English: Full Professional Proficiency
Basque: Professional Proficiency

📝 Publications

2024

Split and Rephrase with Large Language Models
David Ponce, Thierry Etchegoyhen, Jesús Calleja, Harritxu Gete
ACL 2024 (Paper)
Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain
David Ponce, Harritxu Gete, Thierry Etchegoyhen
WMT 2024 (Paper)
Automating Easy Read Text Segmentation
Jesús Calleja, Thierry Etchegoyhen, Antonio David Ponce Martínez
EMNLP 2024 Findings (Paper)

2023

Unsupervised Subtitle Segmentation with Masked Language Models
David Ponce, Thierry Etchegoyhen, Víctor Ruiz
ACL 2023 (Paper)
Learning from Past Mistakes: Quality Estimation from Monolingual Corpora and Machine Translation Learning Stages
Thierry Etchegoyhen, David Ponce
MT Summit XIX (Paper)
IRAZ: Easy-to-Read Content Generation via Automated Text Simplification
Thierry Etchegoyhen, Jesús Calleja, David Ponce
SEPLN 2023 (Paper)

2022

TANDO: A Corpus for Document-level Machine Translation
Harritxu Gete, Thierry Etchegoyhen, David Ponce, et al.
LREC 2022 (Paper)

2021

Online Learning over Time in Adaptive Neural Machine Translation
Thierry Etchegoyhen, David Ponce, Harritxu Gete, Víctor Ruiz
RANLP 2021 (Paper)
ITAI: Adaptive Neural Machine Translation Platform
Thierry Etchegoyhen, David Ponce, Harritxu Gete, Víctor Ruiz
SEPLN 2021 (Paper)

📂 Projects

2024

BIKAIN
Research and development of a high-reliability automatic quality estimation system to identify translation errors at the sentence, word, terminology, and context levels in an integrated manner.
IKUN
Research and development of Large Multimodal Models for industrial domain adaptation, facilitating the generation of synthetic images and time series to enhance quality assurance processes. Includes multimodal conversational interfaces for industrial dashboards and knowledge bases.
LiveAI
Development of AI services for accessibility and audiovisual translation, enabling real-time transcription, translation, and spoken interpretation (dubbing) of live events.

2023

ADAGIO
Research and development of a system for automatic text generation adaptable to specific domains using Artificial Intelligence technologies.
ADAPT-IA
Research and development of adaptive AI technologies and MLOps applied to Basque language technologies, focusing on industrial integration, continuous deployment of neural models, and exploring methodologies to optimize maintenance and adaptation.
IACODE
Development of a system that generates standardized code from existing code in line with the MISRA programming guidelines, automating the process with generative AI models specialized in code generation.

2022

LIDO
Research and development of a system for the optimization of multilingual linguistic data using Artificial Intelligence technologies.
IRAZ
Development of an easy-to-read solution through automated text simplification, aimed at improving accessibility for people with reading difficulties.

2021

STREAMS
Development of a cloud-based platform integrating AI-powered transcription, translation, automatic subtitling, and speech synthesis services in multiple languages (Basque, Spanish, French, and English). The platform enhances business processes across various sectors.
IKA
Design, development, validation, and integration of an automatic translation quality estimation system to address challenges in the translation market and multilingual content generation.

2020

TANDO
Research and development of document-level neural machine translation systems for Basque-Spanish, including fine-grained evaluations of gender and contextual phenomena.
ITAI
Design, development, validation, and integration of a continuous learning system for neural machine translation, aimed at addressing challenges in multilingual content generation.

David Ponce
