We have something to share with you on Monday. 👀
Tinybird
Software Development
Tinybird is a managed ClickHouse® service for AI-native software teams. Get ClickHouse performance without complexity.
About us
The analytics backend for your app. Ship software with big data requirements faster and more intuitively than you ever thought possible.
- Website
- https://tbrd.co/home
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- New York
- Type
- Privately Held
- Founded
- 2019
- Specialties
- ClickHouse, Data analytics, Visualization, Real-time Analytics, Data APIs, and Data Products
Products
Tinybird
Big Data Analytics Software
Tinybird is the data platform for user-facing analytics. Ingest batch and streaming data. Query using SQL. Publish as APIs. Build fast data products, faster.
Locations
- Primary: New York, US
- Calle de Moreno Nieto, 2, Madrid, Community of Madrid 28005, ES
Updates
-
Tinybird reposted this
We're back! New edition of Kfund's Beyond the Prompt meetup. Tuesday, Sept 16th in Madrid with David Villalon of Maisa and David Zafra of heydiga. Due to limited space, attendance will be reserved for fellow founders and builders, with one representative per company. Requests will be manually reviewed and approved. Thanks for understanding! Requests to attend can be submitted via the link in the comments 👇🏼 Thanks to Tinybird for hosting us!
-
-
ClickHouse is a powerful and scalable database. We wanted to see if we could consistently ingest at least 1 billion rows per second into our ClickHouse database. Come join us as we build a live ClickHouse ingestion service capable of billion-row-per-second ingestion, and we'll discuss all the gotchas and corner cases you'd need to handle when operating ClickHouse at this scale.
Ingest 1 Billion Rows per Second in ClickHouse (with Javi Santana)
www.linkedin.com
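For a flavor of the ingestion side, here is a minimal sketch of high-throughput batched inserts using the clickhouse-connect Python client. The table, columns, and batch size are illustrative assumptions, not the setup from the livestream; the point is that ClickHouse strongly prefers a few large inserts over many small ones.

import clickhouse_connect

# Minimal sketch: batched inserts into ClickHouse with clickhouse-connect.
# Table name, columns, and batch size are hypothetical placeholders.
client = clickhouse_connect.get_client(host="localhost", username="default")

BATCH_SIZE = 500_000  # flush in large blocks to amortize per-insert overhead
buffer = []

def ingest(event: dict) -> None:
    # Accumulate events in memory and flush once the batch is full.
    buffer.append((event["ts"], event["user_id"], event["value"]))
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush() -> None:
    if not buffer:
        return
    client.insert(
        "events",                                  # hypothetical target table
        buffer,
        column_names=["ts", "user_id", "value"],
    )
    buffer.clear()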
-
Large language models excel at text generation but struggle with analytical reasoning over structured data. Their token-prediction architecture leads to numerical imprecision and logical inconsistencies that make them unreliable for data analysis without proper tooling integration.

The core problem: LLMs are trained to predict likely next tokens based on patterns in text, not to perform accurate calculations or logical operations. When you ask an LLM to analyze data, it's essentially trying to guess what the analysis should look like based on similar examples it's seen, rather than actually performing the analysis.

This manifests in several ways. LLMs struggle with numerical precision - they might calculate 23.7% growth when the actual number is 23.2%, compounding errors across multi-step calculations. They have trouble with logical consistency - they'll report both "sales increased 15%" and "Q4 sales were down significantly" in the same analysis. And they're poor at catching data quality issues like duplicate records, timezone inconsistencies, or outliers that would make any experienced analyst immediately suspicious.

The traditional solution has been to build elaborate prompt engineering systems, trying to guide LLMs through analytical reasoning step by step. But this approach hits fundamental limits because you're asking a text generation system to perform operations it wasn't designed for.

You must use a different approach: instead of trying to make LLMs better at analysis, make it easier for them to leverage tools that are good at analysis. Rather than asking an LLM to calculate metrics, we give it access to systems that can perform those calculations reliably and then help it interpret the results.

This hybrid approach combines the strengths of both systems. LLMs are excellent at understanding natural language queries, translating business requirements into technical specifications, and communicating results in human-friendly formats. Analytical systems are excellent at performing calculations, handling edge cases, and maintaining logical consistency.

Don't build LLMs that replace analytical tools - build systems where LLMs orchestrate analytical workflows. They become intelligent interfaces to powerful analytical capabilities rather than trying to replicate those capabilities themselves.

This approach has broader implications for AI system design. Instead of building monolithic models that try to do everything, we should focus on building AI systems that can effectively coordinate specialized tools. The intelligence is in the orchestration, not in trying to replicate every capability within the model itself.

You can read about our hybrid approach to combining LLMs with analytical tools in the Tinybird MCP Server in the blog post linked in the comments.
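To make the orchestration pattern concrete, here is a minimal sketch of an LLM acting as an interface to an analytical backend rather than doing the math itself. The pipe name, parameters, and the llm helper methods are illustrative assumptions, not the actual Tinybird MCP Server interface.

import os
import requests

# Minimal sketch: the LLM orchestrates, the analytical system calculates.
# The pipe name and the llm helper methods below are hypothetical.
PIPE_URL = "https://api.tinybird.co/v0/pipes/revenue_by_month.json"  # hypothetical pipe
TOKEN = os.environ["TINYBIRD_TOKEN"]

def run_analysis(start_date: str, end_date: str) -> list[dict]:
    # The analytical system performs the aggregation in SQL, not the LLM.
    resp = requests.get(
        PIPE_URL,
        params={"start_date": start_date, "end_date": end_date},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]

def answer(question: str, llm) -> str:
    # The LLM translates the question into parameters, then narrates exact results.
    params = llm.extract_parameters(question)           # hypothetical helper
    rows = run_analysis(params["start_date"], params["end_date"])
    return llm.summarize(question=question, rows=rows)  # hypothetical helper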
-
-
Join us as we build a full, end-to-end observability pipeline - including a real-time dashboard and alerts - for an existing application. You can use what we build and apply it to your own application, service, or open source project. We'll instrument an existing Python backend with OpenTelemetry and send metrics, logs, and traces to Tinybird via the Tinybird OpenTelemetry Exporter and template. We'll then connect Grafana to Tinybird to construct a real-time observability dashboard with different views and set up alerts to monitor our application in real time.
Build End-to-End Observability with OpenTelemetry, Tinybird, and Grafana
www.linkedin.com
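For reference, here is a minimal sketch of what the OpenTelemetry instrumentation side of a Python backend might look like, exporting spans over OTLP/HTTP. The collector endpoint and auth header are placeholders, not the exact configuration the Tinybird OpenTelemetry template expects.

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Minimal sketch: instrument a Python service and export traces over OTLP/HTTP.
# The endpoint URL and Authorization header below are hypothetical placeholders.
provider = TracerProvider(resource=Resource.create({"service.name": "my-python-backend"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://collector.example.com/v1/traces",  # placeholder exporter URL
            headers={"Authorization": "Bearer <TOKEN>"},        # placeholder auth
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle_request"):
    pass  # your request handling code goes here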
-
Today on Tinybird Builds: Javi Santana ingests 1 Billion Rows / Second into ClickHouse, and answers your questions about scaling a ClickHouse cluster. 🗓️ Today - 11:30 AM ET / 17:30 CET 📍Tinybird Builds YouTube (link in comments) 👨🏻‍💻 Javi Santana Subscribe to the Tinybird Builds YouTube channel to get notified of future builds.
-
-
Processing 1 billion rows per second requires more than just choosing ClickHouse as your database. The difference between benchmark performance and production performance comes down to data modeling, memory configuration, and query optimization decisions that can make or break performance at scale.

Here's the reality from our experience with ClickHouse: raw performance numbers mean nothing without understanding the complete system design. Achieving true billion-row-per-second performance isn't just about your database engine - it's about designing denormalized schemas that avoid expensive joins, using appropriate data types (UInt32 vs String matters), partitioning by date ranges that match your query patterns, and configuring memory settings that prevent OOM kills during complex aggregations.

A common misconception is that columnar databases magically solve all performance problems. They don't. Column-oriented storage gives you advantages for analytical workloads, but you still need sparse indexes on high-cardinality columns, partition pruning strategies that eliminate 90%+ of data from scans, and queries that avoid SELECT * (which defeats the columnar advantage entirely).

We've learned that the most critical factor isn't the theoretical maximum throughput - it's sustained performance under real-world conditions. Can your system handle rolling 30-day averages with WINDOW functions while ingesting 100K events/second? Can it maintain sub-second latency when someone runs an unoptimized GROUP BY on a billion-row table? Can it recover gracefully when a node fails mid-query without losing 2 hours of work?

In our experience, performance bottlenecks often happen at the intersection of components: network serialization, memory management, disk I/O patterns, and query plan optimization. A system that can theoretically process a billion rows per second might struggle with 100 million rows if the query requires complex joins or window functions.

What we've found most valuable is focusing on performance characteristics that matter for real applications: P99 latency for user-facing queries, ingestion consistency during high-load periods, and operational simplicity when things go wrong. Raw throughput numbers matter, but operational reliability matters more.

The key insight is that true performance at scale requires treating your data infrastructure as a system, not a collection of components. Every layer - from data ingestion to query execution to result caching - needs to be optimized together.

If you want to learn more about ClickHouse performance in practice, come join Javi's livestream tomorrow. He'll build out a ClickHouse cluster ingesting 1B rows/s and answer your questions about database perf in real-world scenarios. Link is in the comments.
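As a rough illustration of the modeling decisions described above, here is a minimal sketch using the clickhouse-connect Python client: a denormalized MergeTree table partitioned by month, typed columns instead of strings, and a query that selects only the columns it needs so partition pruning and the columnar layout can do their work. Table and column names are made up for the example.

import clickhouse_connect

# Minimal sketch of the schema and query choices discussed above.
# Table and column names are hypothetical examples.
client = clickhouse_connect.get_client(host="localhost", username="default")

# Denormalized, typed schema; partitioned by month so date-range queries
# can skip whole partitions instead of scanning everything.
client.command("""
    CREATE TABLE IF NOT EXISTS events
    (
        event_date Date,
        user_id    UInt32,          -- UInt32 instead of String: smaller, faster
        country    LowCardinality(String),
        value      Float64
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_date)
    ORDER BY (country, event_date, user_id)
""")

# Select only the columns you need (no SELECT *) and filter on the
# partition key so most parts are never read.
result = client.query("""
    SELECT country, count() AS events, sum(value) AS total
    FROM events
    WHERE event_date >= today() - 30
    GROUP BY country
    ORDER BY total DESC
""")
print(result.result_rows)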
-
-
This Thursday on Tinybird Builds: Javi Santana will spin up a ClickHouse cluster with 1B rows/s streaming ingestion while discussing ClickHouse scaling, perf, and (probably) naturally-aspirated engines. 🏎️ You can get notified by subscribing to Tinybird Builds on YouTube -> https://tbrd.co/builds-yt Want to study up before the build? Check out Javi's blog post -> https://tbrd.co/ch-1b-rows
-
-
Come learn how to build a powerful, flexible, and fast filtering system for your real-time dashboard. We'll build a fully functional real-time web analytics dashboard with a powerful click-to-filter feature, allowing the dashboard to show metrics, time series charts, and tables filtered by any combination of selected dimensions. It's the ultimate dashboard filter, implemented live. You'll learn how to:
- Dynamically query your database based on any filter, directly from the application
- Optimize database table schemas and SQL for high-performance reads regardless of filtering dimension (and without needing many indexes)
- Integrate your dashboard frontend with a real-time analytics backend
Build a Real-Time Dashboard that can Filter on Any Dimension
www.linkedin.com
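As a rough sketch of the "query from the application with any filter combination" idea, here is one way an application might pass optional filter parameters through to a published analytics endpoint. The pipe name, parameter names, and token handling are assumptions for illustration, not the exact setup from the livestream.

import os
import requests

# Minimal sketch: forward optional dashboard filters to a published
# analytics API endpoint. Pipe and parameter names are hypothetical.
PIPE_URL = "https://api.tinybird.co/v0/pipes/web_analytics.json"  # hypothetical pipe
TOKEN = os.environ["TINYBIRD_TOKEN"]

def fetch_metrics(filters: dict[str, str]) -> list[dict]:
    # Only the filters the user actually clicked are sent; unset dimensions
    # are omitted so the backend SQL can skip those predicates entirely.
    params = {k: v for k, v in filters.items() if v}
    resp = requests.get(
        PIPE_URL,
        params=params,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]

# Example: the user clicked "country = US" and "browser = Chrome" on the dashboard.
rows = fetch_metrics({"country": "US", "browser": "Chrome", "device": ""})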