TL;DR
Portugal launched AMÁLIA, a €5.5M LLM focusing on European Portuguese. While it outperforms many models, critical questions about its openness, native data, and objectives are still unresolved, raising policy concerns.
Portugal’s €5.5 million investment in the AMÁLIA large language model has resulted in a functional, Portuguese-specific AI system now accessible to hundreds of thousands of academic users, but fundamental questions about its openness, data sources, and strategic objectives remain unresolved, raising concerns about the broader European sovereign-LLM movement.
AMÁLIA is a consortium project involving about 60 researchers from Portugal’s leading institutions, including NOVA, IST, and IT. The model was announced in December 2024 and completed by September 2025. It was built by continuing training from the EuroLLM multilingual foundation rather than training from scratch. It currently handles Portuguese text, with multimodal capabilities planned for future versions.
Technically, AMÁLIA outperforms previous open models on European Portuguese benchmarks and surpasses Qwen 3-8B on most Portuguese tasks, although it still trails Qwen on the ALBA benchmark, the team’s primary Portuguese test. The model’s training involved approximately 107 billion tokens, with only about 5.8 billion tokens from Portugal’s web archive, Arquivo.pt, representing roughly 5.5% of the total pre-training data. The supervised fine-tuning phase included about 17-18% Portuguese data, so native-language material remained a minority in both phases.
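The native-data share can be checked with a few lines of arithmetic. The sketch below uses only the two rounded figures quoted above; the variable and function names are illustrative and do not come from the AMÁLIA team.

```python
# Figures quoted in the article; both are approximate.
TOTAL_TOKENS_B = 107.0      # extended pre-training tokens, in billions
ARQUIVO_PT_TOKENS_B = 5.8   # tokens sourced from Arquivo.pt, in billions


def share_percent(part: float, total: float) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


native_share = share_percent(ARQUIVO_PT_TOKENS_B, TOTAL_TOKENS_B)
print(f"Arquivo.pt share of pre-training tokens: {native_share:.1f}%")
```

With these rounded inputs the result is about 5.4%, in line with the roughly 5.5% quoted; the small gap presumably reflects rounding in the published token counts.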
While the technical results are promising, questions about how open the model truly is, whether its native-language data is sufficient, and what strategic goals it aims to serve remain open, especially given the broader context of European efforts to develop sovereign language models.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto this argument, and each question applies directly to AMÁLIA.
The three questions are linked. Q3 (the optimization target) determines Q2 (how much native data is needed), which in turn conditions Q1 (whether the release is open enough for community contribution). The European sovereign-LLM movement as a whole benefits when these questions become standard methodology disclosure rather than exceptional critique.

107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose stated purpose is to prioritize European Portuguese, the native-language share of extended pre-training is roughly 5.5%. That figure shapes the answer to every other question.

The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in practice. Olmo doesn’t lead frontier benchmarks, and that is not the point. The point is to serve as the reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
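One way to make such a standard checkable is a simple release checklist. The sketch below is illustrative only: the artifact list reflects what is commonly associated with Olmo-style releases (weights, training data, training and evaluation code, intermediate checkpoints, training logs) and is an assumption rather than a list taken from the source. The example project is hypothetical and says nothing about AMÁLIA’s actual release status.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OpennessChecklist:
    """Release artifacts; None means 'not publicly confirmed'."""
    model_weights: Optional[bool] = None
    training_data: Optional[bool] = None
    training_code: Optional[bool] = None
    evaluation_code: Optional[bool] = None
    intermediate_checkpoints: Optional[bool] = None
    training_logs: Optional[bool] = None


def unresolved(checklist: OpennessChecklist) -> list[str]:
    """Names of artifacts that are withheld or unconfirmed."""
    return [f.name for f in fields(checklist)
            if getattr(checklist, f.name) is not True]


def is_fully_open(checklist: OpennessChecklist) -> bool:
    """True only when every artifact is confirmed as released."""
    return not unresolved(checklist)


# Hypothetical project that has published weights and nothing else.
example = OpennessChecklist(model_weights=True)
print(is_fully_open(example))
print(unresolved(example))
```

Treating "unconfirmed" the same as "withheld" is a deliberate choice in this sketch: a claim of full openness only holds once each artifact is verifiably available.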

Four strategic positions. AMÁLIA between two and three.
More than €100M in publicly disclosed European sovereign-LLM funding is spread across the major initiatives. The question every project faces: what competitive position is it actually staking? There are four options. They are not mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign-Language AI Development
The development of AMÁLIA highlights critical issues facing European efforts to create independent language models. Its performance demonstrates that significant progress is possible within national budgets, but the questions of transparency, data adequacy, and strategic purpose are central to ensuring these models serve public interests and national sovereignty. The debate over openness—how accessible and modifiable the model truly is—remains unresolved, impacting trust and future collaboration across Europe.
Furthermore, the reliance on a continuation of an existing multilingual foundation rather than training from scratch raises questions about the long-term sustainability and uniqueness of these models. The broader European community is watching Portugal’s approach as a case study for balancing technical achievement with strategic independence.
European Sovereign-Language Model Initiatives and Challenges
Across Europe, several large language model efforts are under way, including Italy’s Minerva, Germany’s Aleph Alpha, and France’s Mistral; the latter two are private companies rather than state-run projects. These efforts share common questions about how open their models are, how much native-language data is enough, and what the models should optimize for. Many of these initiatives are still in early stages, with models either trained from scratch or built upon multilingual foundations.
Portugal’s AMÁLIA is notable for its public funding, institutional backing, and the transparency of its development process. However, the broader landscape reveals a pattern: models are often released with limited clarity on data sources, openness, and strategic goals, making it difficult to assess their true independence and societal impact. The upcoming months will be critical in determining whether these models can evolve to meet both technical and policy expectations.
“AMÁLIA is an impressive piece of work, but it raises fundamental questions about openness and native data sufficiency that need addressing.”
— Duarte O.Carmo
Unanswered Questions About Model Openness and Goals
It remains unclear how open AMÁLIA truly is—whether its code and training data will be accessible for external review or modification. The long-term strategic objectives are also still evolving, with the final version due in June 2026. Additionally, the sufficiency of native Portuguese data for robust language understanding remains a debated point, and the impact of these choices on model performance and independence is yet to be fully assessed.
Upcoming Milestones and Evaluation Phases
The final version of AMÁLIA is scheduled for release in June 2026, which will include a comprehensive evaluation of its capabilities, openness, and strategic alignment. Over the next 12-24 months, researchers and policymakers will scrutinize its deployment, data transparency, and alignment with national AI policies. The broader European community will observe whether Portugal’s approach influences other national projects to address the key questions of openness, native data sufficiency, and strategic purpose.
Key Questions
What are the main technical achievements of AMÁLIA so far?
AMÁLIA has demonstrated superior performance on European Portuguese benchmarks compared to previous open models and exceeds Qwen 3-8B on most Portuguese tasks, showing significant progress in language-specific AI capabilities.
Why are questions about openness and native data important?
Openness determines whether the model can be reviewed, modified, and trusted by external parties, while native data sufficiency affects the model’s ability to understand and generate high-quality Portuguese language content, impacting national sovereignty and AI independence.
What are the broader implications for European AI policy?
Portugal’s AMÁLIA exemplifies the challenges and opportunities of developing sovereign-language models. The unresolved questions about transparency and purpose could influence future policy decisions across Europe, emphasizing the need for clear standards and strategic clarity.
When will the final version of AMÁLIA be available?
The final version is expected in June 2026, after which comprehensive evaluations of its capabilities, openness, and strategic goals will be conducted.
Source: ThorstenMeyerAI.com