The document traces how Celtra's data pipeline evolved as business needs and data volume grew. Key steps included:
- Moving from MySQL to Spark/Hive/S3 to handle larger volumes and enable complex ETL like sessionization
- Storing raw events in S3 and aggregating them into cubes for reporting, while also enabling exploratory analysis
- Evaluating technologies like Vertica and eventually settling on Snowflake for its managed services, nested data support, and ability to evolve schemas
- Moving cubes from MySQL to Snowflake for faster queries, easier schema changes, and computing aggregates directly from sessions with SQL
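
To make the sessionization and cube-aggregation steps above concrete, here is a minimal Python sketch. It is not Celtra's implementation: the `Event` fields, the 30-minute inactivity timeout, and the `sessionize` and `aggregate_cube` helpers are all hypothetical, chosen only to illustrate the general technique of splitting a user's events on inactivity gaps and then rolling sessions up into per-dimension counts.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed inactivity timeout; the source does not state the actual rule.
SESSION_GAP_SECONDS = 30 * 60


@dataclass
class Event:
    # Hypothetical event shape, for illustration only.
    user_id: str
    campaign: str
    timestamp: int  # seconds since epoch
    name: str


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group each user's events into sessions, splitting on inactivity gaps."""
    by_user = defaultdict(list)
    for event in events:
        by_user[event.user_id].append(event)

    sessions = []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e.timestamp)
        current = [user_events[0]]
        for event in user_events[1:]:
            # Start a new session when the pause exceeds the timeout.
            if event.timestamp - current[-1].timestamp > gap:
                sessions.append(current)
                current = [event]
            else:
                current.append(event)
        sessions.append(current)
    return sessions


def aggregate_cube(sessions):
    """Roll sessions up into per-campaign counts (a one-dimensional cube)."""
    cube = defaultdict(lambda: {"sessions": 0, "events": 0})
    for session in sessions:
        # Assumes a session belongs to the campaign of its first event.
        cell = cube[session[0].campaign]
        cell["sessions"] += 1
        cell["events"] += len(session)
    return dict(cube)
```

In the pipeline described above, the same roll-up would be expressed in SQL over a sessions table rather than in application code; the sketch only shows the shape of the computation.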