Europe just took a big step toward multilingual, sovereign AI. Tilde has released TildeOpen LLM, a 30B open-source foundation model optimized for European languages and funded by the EU.

Why this matters:
- Multilingual equity: An “equitable” tokenizer and training curriculum reduce English-first bias, improving grammar, fluency, and token efficiency in smaller European languages.
- Sovereignty and security: Open weights, self-hosting (on-prem or EU cloud), and alignment with EU governance and privacy expectations.
- Transparent and research-friendly: CC-BY-4.0 license, a detailed training recipe (EuroHPC LUMI supercomputer), and multilingual benchmarks.

How it compares to many LLMs:

Pros:
- Full compliance with the EU AI Act.
- Stronger accuracy and fluency for European languages that are typically underserved.
- Lower token counts for those languages → faster, cheaper inference for EU use cases.
- Open source with practical self-hosting to keep sensitive data within EU jurisdiction.

Cons:
- It’s a base model (not instruction-tuned yet), so you’ll want fine-tuning for assistants/agents.
- 30B parameters are heavier to deploy than 7B–13B models; quantization and solid GPUs help.
- It requires attribution (CC-BY-4.0), and you’ll need to handle safety/alignment yourself.

If you work in government, regulated industries, or any org serving multilingual European audiences, TildeOpen is a serious foundation to consider, especially when data residency, cultural accuracy, and trust are non-negotiable.

Explore:
- Overview + FAQ: https://lnkd.in/gQEEkxxH
- Model + benchmarks: https://lnkd.in/eZPcShwg
- EU success story: https://lnkd.in/epEcDDXK

#AI #LLM #MultilingualAI #DigitalSovereignty #OpenSource #NLP #Europe #GenAI #LanguageTechnology
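The “lower token counts” claim is usually measured as tokenizer fertility: the average number of tokens emitted per whitespace word. Here is a minimal sketch of that metric. The character-bigram tokenizer is a stand-in so the snippet runs offline; with a real model you would pass something like `AutoTokenizer.from_pretrained(...).tokenize` from Hugging Face instead (model IDs not shown here, since they are release-specific).

```python
def fertility(tokenize, text):
    """Average number of tokens the tokenizer emits per whitespace word.

    Lower fertility for a language means shorter sequences, hence
    faster and cheaper inference for text in that language.
    """
    words = text.split()
    tokens = tokenize(text)
    return len(tokens) / len(words)

def char_bigrams(text):
    """Toy tokenizer: non-overlapping character bigrams (stand-in for BPE)."""
    s = text.replace(" ", "_")
    return [s[i:i + 2] for i in range(0, len(s), 2)]

# Compare fertility of the same toy tokenizer on two languages:
lv = "Latviešu valodai ir bagāta morfoloģija"
en = "Latvian has rich morphology"
print(fertility(char_bigrams, lv), fertility(char_bigrams, en))
```

A tokenizer tuned for European languages should push fertility for, say, Latvian or Slovenian down toward what English enjoys in English-centric vocabularies.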
-
Latvian AI company Tilde has launched a multilingual open-source LLM, trained on the LUMI supercomputer. The model supports 34 languages, including all 24 official EU languages as well as Ukrainian, Norwegian, Icelandic, Turkish, and several Balkan languages. In addition to its exceptionally broad language coverage, TildeOpen LLM also includes built-in safeguards against Russian disinformation. Recent investigations by VIGINUM, DFRLab (Atlantic Council), and GLOBSEC revealed how Kremlin-aligned narratives have infiltrated global AI models. In response, Tilde worked with media monitoring authorities to filter disinformation from training data and used topic modeling to block politically sensitive content from Kremlin-controlled sources. Unlike many global models hosted abroad, TildeOpen LLM can be deployed locally or within European cloud environments. This is a significant step for digital sovereignty and secure AI development in Europe. The European Gazette 🔗 https://lnkd.in/duq9qM-5 TildeOpen LLM website 🔗 https://lnkd.in/dp6AZXPf
-
🇪🇺 Europe’s betting big on open large language models, designed to be transparent, multilingual, and built around shared European values. In our latest piece, we explore the current landscape of Europe’s open LLMs — from EU-backed initiatives to national projects that reflect different ambitions and constraints. We interviewed Edwin Rijgersberg (founder of AI Studio Delta and the creator of GEITje, the first open Dutch LLM), who shared his insights on how these efforts are progressing and what’s ahead for Europe’s open AI ecosystem. Key topics include: 🔸 The EU’s LLM approach 🔸 National initiatives and how they differ 🔸 Where Europe stands against US and Chinese models 🔸 Steps towards a pan-European infrastructure 👇 Read the full article below: https://lnkd.in/euRsqtyC #OpenLLMs #EuropeLLMs #EUAIstrategy
-
The EU-Funded TildeOpen LLM: Can Efficient and Responsible AI Compete with Giants?

The European Commission has just shown that the AI race doesn’t have to be about size alone; it can also be about efficiency, sustainability, and responsibility. Through the Large AI Grand Challenge, the EU funded four startups, including Tilde (Latvia), granting €1 million and 8 million GPU hours on Europe’s LUMI and LEONARDO supercomputers to train new large-scale models. In less than a year, Tilde delivered TildeOpen LLM, a 30-billion-parameter, open-source model optimized for European languages. It’s smaller, faster, and fully compliant with the EU AI Act, yet achieves state-of-the-art multilingual performance, even outperforming general-purpose models in languages like Lithuanian, Latvian, or Slovenian.

⚖️ Are TildeOpen and ChatGPT-5 Comparable? In some ways, yes, and that’s what makes it interesting.

✅ Comparable:
- Linguistic quality: TildeOpen rivals or exceeds global models in Europe’s underrepresented languages, and it also covers Spanish, which is a milestone for inclusion and cultural accuracy.
- Efficiency: It reached high performance with only a fraction of GPT-5’s compute power: Formula 1 performance on hybrid-car fuel.
- Ethics and infrastructure: Trained entirely on European supercomputers, it ensures data security and regulatory compliance while reducing dependency on foreign tech.

🚫 Not directly comparable:
- Scale and scope: GPT-5 is a universal, multimodal model (text, image, voice, reasoning); TildeOpen focuses on multilingual text.
- Data volume: GPT-5 learned from hundreds of terabytes of global data; TildeOpen uses curated European datasets, smaller but more transparent and aligned with EU values.
- Update cycles: OpenAI can retrain continuously; TildeOpen moves through public funding and academic collaboration, with slower cycles.

Now, the real question is:
👉 Can the market reward this kind of innovation?
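The “fraction of the compute” claim can be made concrete with the standard training-cost estimate FLOPs ≈ 6·N·D (N parameters, D training tokens). A back-of-envelope sketch: only the 30B parameter count and 8M GPU-hour grant come from the post; per-GPU throughput and utilization below are assumptions, not published figures.

```python
params = 30e9       # TildeOpen parameter count (from the post)
gpu_hours = 8e6     # EU compute grant (from the post)

# Assumptions for illustration only:
peak_flops = 60e12  # ~60 TFLOP/s bf16 per accelerator, rough LUMI-class figure
utilization = 0.4   # typical sustained efficiency for large-scale training

total_flops = gpu_hours * 3600 * peak_flops * utilization
tokens = total_flops / (6 * params)  # invert FLOPs ≈ 6 * N * D
print(f"budget ≈ {total_flops:.2e} FLOPs → room for ≈ {tokens:.2e} tokens")
```

Under these assumptions the grant supports a few trillion training tokens, orders of magnitude below what frontier labs are believed to spend, which is the sense in which efficiency, not scale, is the selling point.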
#AI #ResponsibleAI #Europe #TildeOpen #Innovation #AIAct #DigitalSovereignty #Sustainability #Efficiency #OpenSource
-
🚨 New Paper at #EMNLP25 Findings

If we ask a multilingual language model a factual question written in different languages, do the answers always refer to the same entity? Well... not quite. We evaluated this phenomenon, conducted a representational analysis of it, and examined several approaches that could potentially mitigate the issue. Work conducted by @Mahardika Krisna Ihsani, @Xi Ai, and Min-Yen Kan.

Key Contributions and Findings:
- A code-mixed coreferential task to observe implicit consistency across languages within a sentence.
- We discovered a consistency bottleneck that can hamper a multilingual language model’s ability to recall knowledge consistently across different languages. This issue is tied to language characteristics, with language script showing the strongest effect, as well as to the training objective.
- Cross-lingual supervision can alleviate the consistency bottleneck and enhance alignment between coreferential entities.
- Shared language scripts contribute to cross-lingual consistency, especially for encoder and decoder models, but they are not a necessary condition for achieving it.

Welcome to our poster presentation at #EMNLP2025! We will present our poster at Hall C on Nov 7 at 12:30-13:30. See you there!

📄 arXiv: https://lnkd.in/gv2gb6zh
⌨️ Repo: https://lnkd.in/gSvQNuBV
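The consistency notion being probed can be illustrated with a simple pairwise-agreement score over per-language answers to the same factual question. This is an illustration of the idea, not the paper’s exact metric.

```python
from itertools import combinations

def consistency(answers):
    """Fraction of language pairs whose answers refer to the same entity.

    `answers` maps a language code to a normalized entity string
    (in practice you would link surface forms to entity IDs first).
    """
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 1.0  # a single language is trivially self-consistent
    return sum(a == b for a, b in pairs) / len(pairs)

# "What is the capital of Latvia?" asked in four languages (toy data):
answers = {"en": "riga", "lv": "riga", "de": "riga", "ru": "moscow"}
print(consistency(answers))  # 0.5: 3 of 6 pairs agree
```

A score below 1.0 flags exactly the failure mode in the post: the model “knows” a fact in some languages but retrieves a different entity in others.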
-
Release of the massive HPLT v3.0 multilingual dataset! 🚀

October is back, and so are HPLT datasets (we've been doing this for three consecutive years now!). This time it is my honour, on behalf of the HPLT team, to announce the release of the massive HPLT v3.0 multilingual dataset, which can be considered a major upgrade for large-scale multilingual corpora. Accounting for 29 billion documents, 198 language-script combinations, and 112 trillion characters, v3.0 shows significant gains over v2, driven by several improvements, including a new global deduplication process:
✅ Unique content boosted from 52% to 73% on average.
✅ Data substance and robustness remain high, with better extraction and improved language identification.
✅ Increased variety and better representativity of natural web content.

This release provides a cleaner, more robust dataset for building powerful LLMs and machine translation systems, covering a myriad of low- to medium-resourced languages. And we have not said our last word: expect more data soon, because we are already working on it. Special thanks to all the collaborators and funding bodies, including the European Union's Horizon Europe programme and UK Research and Innovation.

🔗 Explore and download the data: https://lnkd.in/dv5mqVP3
🔎 [NEW] See the analysis and evaluation highlights in our website post: https://lnkd.in/duGAeMTu

#HPLT #NLProc #AI #Datasets #MachineTranslation #MultilingualNLP #LanguageTechnology #OpenData #Data4LLMs
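The headline improvement is global deduplication. A minimal sketch of the exact-dedup variant, keying each document by a hash of its normalized text, looks like this (HPLT’s pipeline additionally does near-duplicate detection, and at web scale the seen-set lives in a distributed store rather than in memory):

```python
import hashlib

def dedup(docs):
    """Keep the first occurrence of each document; later copies whose
    whitespace-normalized text hashes to the same SHA-1 are dropped."""
    seen, unique = set(), []
    for doc in docs:
        key = hashlib.sha1(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

docs = ["Hello  world", "Hello world", "Labdien pasaule"]
print(dedup(docs))  # the second doc is a whitespace-only duplicate
```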
-
AI isn’t only about the models—it’s also about the evaluators who judge them. The NIST report on DeepSeek reveals how evaluation can define whether a model is seen as powerful, risky, or responsible. At Root Signals, we enable you to take full control of evaluating the LLM applications/agents you built. #LLM #AI #DeepSeek #AIevals
A US government-backed evaluation by the National Institute of Standards and Technology (NIST) deemed the open-source DeepSeek large language models “dangerous and shortsighted” in a recent report. The main reason is that they refuse fewer requests; in other words, they actually obey user instructions. It is rather obvious that Chinese models shouldn’t be used by anyone near the US government, but it is genuinely unclear whether AI alignment researchers consider “high instruction-following capability” a good thing or not.
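The “less refusals” finding boils down to a refusal-rate metric over a prompt set. A toy sketch using keyword matching is below; real evaluations (NIST’s included) use human or LLM grading rather than fixed phrases, so the marker list here is purely illustrative.

```python
# Heuristic refusal phrases; an illustration, not an evaluation standard.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")

def refusal_rate(responses):
    """Share of responses that pattern-match a canned refusal."""
    hits = sum(any(m in r.lower() for m in REFUSAL_MARKERS) for r in responses)
    return hits / len(responses)

responses = [
    "I'm sorry, but I can't help with that.",
    "Sure - here are the steps you asked for...",
    "As an AI model, I cannot assist with this request.",
    "The answer is 42.",
]
print(refusal_rate(responses))  # 0.5
```

Whether a low number here is a safety failure or a capability win is exactly the ambiguity the post points at.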
-
Fantastic to see the release of #TildeLLM, an open-source LLM from Europe, supporting 34 European languages, including Slavic, Baltic, and Balkan languages, and fully compliant with the EU AI Act, with all data security maintained within the EU. I love that the priority for European AI seems to be open, legally compliant, and multilingual. https://lnkd.in/eWm-3gnH
-
Building Trust in Large Language Models: The Key to Responsible AI In an era where artificial intelligence is becoming increasingly integrated into our daily lives, the question of trustworthiness in Large Language Models (LLMs) has never been more critical. As we harness the power of these advanced systems, understanding what makes them reliable is essential for developers, businesses, and users alike. First and foremost, transparency is a cornerstone of trust. Users need to understand how LLMs generate responses, including the data sources and algorithms that underpin their functionality. Clear documentation and open communication about model training processes can demystify these technologies and foster confidence among users. Another vital aspect is robustness. A trustworthy LLM should be resilient against adversarial inputs and capable of handling a wide range of queries without producing harmful or misleading information. Rigorous testing and continuous updates are necessary to ensure that these models can adapt to new challenges and maintain high standards of accuracy. Ethical considerations also play a significant role in establishing trust. Developers must prioritize fairness and inclusivity, ensuring that LLMs do not perpetuate biases or reinforce stereotypes. Implementing diverse training datasets and conducting regular audits can help mitigate these risks and promote equitable outcomes. Finally, user feedback is invaluable. Engaging with users to gather insights on their experiences can guide improvements and enhance the overall reliability of LLMs. By fostering a collaborative relationship between developers and users, we can create systems that not only meet expectations but exceed them. As we continue to explore the potential of LLMs, let’s prioritize trustworthiness as a fundamental principle. Together, we can build AI systems that empower individuals and organizations while upholding ethical standards. 
#artificialintelligenceschool #aischool #superintelligenceschool
-
“We think not in words but in shadows of words,” V. Nabokov once said. The relentless effort to tame AI #Hallucinations is a perfect illustration. Let’s open the box:

✅ The definition is slippery
Because hallucinations are multifaceted, it’s increasingly hard to agree on how to define, evaluate, and reduce them. Taxonomies often separate intrinsic hallucinations (those that contradict the user’s prompt) from extrinsic ones (those that contradict training data or external reality). In both cases, confusion still abounds.
👉 https://lnkd.in/e8rxfQEK

✅ There’s a built-in trade-off
Several papers demonstrate an inherent trade-off between consistency (avoiding invalid outputs) and breadth (producing diverse, linguistically rich content) in LLMs. For broad language classes, any model that generalizes beyond its training data will either hallucinate or collapse.
👉 https://lnkd.in/eEG7aUgx

✅ Hallucinations are inevitable
A January study from the National University of Singapore proves there will always be solvable problems that remain beyond a given model’s capabilities, so some hallucination is unavoidable.
👉 https://lnkd.in/eg48wJcZ

✅ What about teaching models to say “I don’t know”?
It surely helps, but it isn’t a panacea. Recent work from OpenAI Research and Georgia Tech emphasizes post-training this “IDK” skill: teaching models to abstain when uncertain. Yet even with this card, hallucinations persist, because the world is rarely binary. Many benchmarks mirror standardized exams and rely on pass/fail metrics; optimizing for binary accuracy can actually encourage confident mistakes. The authors suggest making confidence thresholds explicit in instructions and adding confidence targets to mainstream evaluations (e.g., SWE-bench).
👉 https://lnkd.in/eK3h-FWK

✅ Realign the incentives.
To truly reduce hallucinations, we should reward appropriate expressions of uncertainty instead of penalizing them. The future points toward nuanced language models with stronger pragmatic competence, the direction “world models” are heading, though their horizon still feels distant.
👉 https://lnkd.in/eVd8nsdc

This leads to a central conclusion: LLMs currently master language, but not thought. And that is more than a nuance! Mastering thought will be a long journey. It will make us rediscover an inconvenient truth: ambiguity is a core feature of language. It lets us use context to convey complex meanings and connect with others in more sensitive, human ways. Ambiguity is our key human mastery. So it is time for you to open any of Nabokov’s novels. You will find the best proof that artificial superintelligence still has a long way to go.
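The explicit-confidence-threshold idea can be made concrete with a small scoring sketch: under a rubric that scores +1 for a correct answer, −t/(1−t) for a wrong one, and 0 for abstaining, the expected value of answering with confidence p beats abstaining exactly when p > t, so guessing below the stated threshold is penalized rather than rewarded (the rubric shape follows the abstention-scoring idea in the cited work; the specific numbers are illustrative).

```python
def expected_score(p, t):
    """Expected score for answering with confidence p under a rubric of
    +1 for correct, -t/(1-t) for wrong, 0 for abstaining."""
    penalty = t / (1 - t)
    return p * 1 - (1 - p) * penalty

# With t = 0.75 the wrong-answer penalty is 3, so a 60%-confident guess
# has negative expected score and abstaining (score 0) is the better policy,
# while a 90%-confident answer is still worth giving:
print(expected_score(0.9, 0.75), expected_score(0.6, 0.75))
```

Binary pass/fail grading is the special case t → 0: the wrong-answer penalty vanishes, so even a 1%-confident guess has positive expected value, which is exactly how benchmarks end up rewarding confident mistakes.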
-
Want to know how large language models deal with learning over long time horizons? How modern hate-checkers deal with content from different points in time? Then take a look at our newest work, "Chronoberg: Capturing Language Evolution and Temporal Awareness in Foundation Models": https://lnkd.in/eDcmDUJj, spearheaded jointly by Niharika H. (Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)) & Subarnaduti Paul (University of Bremen)!

Chronoberg is an open-source dataset consisting of chronologically sorted books that span over 250 years. We've annotated this dataset with affect at different points in time, including valence, arousal, and dominance for words and sentences. In addition to the paper, the dataset is publicly available on Hugging Face: https://lnkd.in/eX9t6GGA

On the basis of the dataset, our paper contains various analyses: looking at shifts in meaning, investigating how different modern hate-checkers (Google's Perspective API, Facebook's RoBERTa-Hate, OpenAI's content moderation) perform on content from various points in time, and (the part I am particularly proud of as a lifelong ML researcher) training 1.4B models from scratch on different time intervals and comparing them to LLMs trained continually over the 250 years. The latter emulates lifelong learning, highlighting that our current LLMs and lifelong-learning algorithms still have a long way to go in realistic real-world scenarios. Chronoberg will help us explore these new ways, develop better algorithms, benchmark models over long time horizons, and look at more in-depth applications; think identification of harmful language, historical contextualization, and continual learning and unlearning over time.
These large-scale efforts can only happen through collaboration, so thanks to all our further collaborators: Kristian Kersting, Patrick Schramowski, Manuel Brack, Lars-Joel Frey.
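The kind of analysis time-stamped valence annotations enable can be sketched in a few lines: track the mean valence of sentences containing a word, bucketed by era. The record format below is illustrative, not Chronoberg’s actual schema, and the toy valence values are invented for the example.

```python
from collections import defaultdict

def valence_by_era(records, word, bucket=50):
    """Mean valence of sentences containing `word`, grouped into
    `bucket`-year eras. `records` is an iterable of
    (year, sentence, valence) triples."""
    sums = defaultdict(lambda: [0.0, 0])
    for year, sentence, valence in records:
        if word in sentence.lower().split():
            era = (year // bucket) * bucket
            sums[era][0] += valence
            sums[era][1] += 1
    return {era: s / n for era, (s, n) in sorted(sums.items())}

# Toy records showing semantic/affective drift of "gay" over two centuries:
records = [
    (1810, "a gay and cheerful gathering", 0.8),
    (1840, "the gay procession moved on", 0.7),
    (1990, "a gay rights march", 0.1),
]
print(valence_by_era(records, "gay"))
```

A drop or rise in this curve is the kind of shift in meaning the paper analyzes, and the reason hate-checkers trained on modern data misjudge historical text.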