Showing 173 open source projects for "etl"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Orchestrate Your AI Agents with Zenflow Icon
    Orchestrate Your AI Agents with Zenflow

    The multi-agent workflow engine for modern teams. Zenflow executes coding, testing, and verification with deep repo awareness

    Zenflow orchestrates AI agents like a real engineering system. With parallel execution, spec-driven workflows, and deep multi-repo understanding, agents plan, implement, test, and verify end-to-end. Upgrade to AI workflows that work the way your team does.
    Try free now
  • 1
    Ethereum ETL

    Ethereum ETL

    Python scripts for ETL (extract, transform and load) jobs for Ethereum

    Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery. Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Embedded Template Library (ETL)

    Embedded Template Library (ETL)

    Embedded Template Library

    C++ is a great language to use for embedded applications and templates are a powerful aspect. The standard library can offer a great deal of well-tested functionality, but there are some parts of the standard library that do not fit well with deterministic behavior and limited resource requirements. These limitations usually preclude the use of dynamically allocated memory and containers with open-ended sizes. What is needed is a template library where the user can declare the size, or...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Addax

    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Steampipe

    Steampipe

    Zero-ETL, infinite possibilities. Live query APIs, code & more

    ...Your cloud is a live database that changes fast. Don't wait on ETL to sync, or rely on old data. Crunch it where it's born, fueling new use cases and swift decisions.
    Downloads: 4 This Week
    Last Update:
    See Project
  • G-P - Global EOR Solution Icon
    G-P - Global EOR Solution

    Companies searching for an Employer of Record solution to mitigate risk and manage compliance, taxes, benefits, and payroll anywhere in the world

    With G-P's industry-leading Employer of Record (EOR) and Contractor solutions, you can hire, onboard and manage teams in 180+ countries — quickly and compliantly — without setting up entities.
    Learn More
  • 5
    Logstash

    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    omniparser

    omniparser

    Native Golang ETL streaming parser and transform library

    Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    ...Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses, and Databases. Convert the column name to be compatible with Amazon Athena and the AWS Glue Catalog. Run a query against AWS CloudWatchLogs Insights and convert the results to Pandas DataFrame. Get QuickSight dashboard ID given a name and fails if there is more than 1 ID associated with this name. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    TiDB

    TiDB

    Open Source NewSQL Database

    TiDB is an open source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is currently the most actively developed open source NewSQL database, and has a rich set of features including horizontal scalability, strong consistency, and high availability.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    SyncLite

    SyncLite

    Build Anything Sync Anywhere

    ...SyncLite enables real-time, transactional data replication and consolidation from various sources including edge/desktop applications using popular embedded databases (SQLite, DuckDB, Apache Derby, H2, HyperSQL), data streaming applications, IoT message brokers, traditional database systems(ETL) and more into a diverse array of databases, data warehouses, and data lakes, enabling AI and ML use-cases at all three levels: Edge, Fog and Cloud. SyncLite's novel CDC replication framework for embedded databases, is designed to assist developers in rapidly building general-purpose data-intensive applications, Gen AI Search/RAG applications for edge, desktop, and mobile environments. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • AI-First Supply Chain Management Icon
    AI-First Supply Chain Management

    Supply chain managers, executives, and businesses seeking AI-powered solutions to optimize planning, operations, and decision-making across the supply

    Logility is a market-leading provider of AI-first supply chain management solutions engineered to help organizations build sustainable digital supply chains that improve people’s lives and the world we live in. The company’s approach is designed to reimagine supply chain planning by shifting away from traditional “what happened” processes to an AI-driven strategy that combines the power of humans and machines to predict and be ready for what’s coming. Logility’s fully integrated, end-to-end platform helps clients know faster, turn uncertainty into opportunity, and transform the supply chain from a cost center to an engine for growth.
    Learn More
  • 10
    InfluxDB

    InfluxDB

    The open source time series database

    ...InfluxDB provides infrastructure and application monitoring, IoT monitoring and analytics and more. It has APIs for storing and querying data, processing it in the background for ETL or monitoring and alerting purposes. This data can also be visualized, explored and more to help businesses seize opportunities and make the best decisions. InfluxDB is easy to start and easy to scale. Learn more about it on https://www.influxdata.com/
    Downloads: 47 This Week
    Last Update:
    See Project
  • 11
    Prefect

    Prefect

    Prefect is a workflow orchestration framework

    Prefect is an open-source modern workflow orchestration tool for scheduling, monitoring, and managing data workflows and tasks. It enables Python-native pipeline definitions with robust retries, caching, observability, and a powerful UI—ideal for data engineering and ETL processes.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    Rubix ML

    Rubix ML

    A high-level machine learning and deep learning library for PHP

    Rubix ML is a free open-source machine learning (ML) library that allows you to build programs that learn from your data using the PHP language. We provide tools for the entire machine learning life cycle from ETL to training, cross-validation, and production with over 40 supervised and unsupervised learning algorithms. In addition, we provide tutorials and other educational content to help you get started using ML in your projects. Our intuitive interface is quick to grasp while hiding alot of power and complexity. Write less code and iterate faster leaving the hard stuff to us. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    NVIDIA Merlin

    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    ...Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory. Deploy data transformations and trained models to production with only a few lines of code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Dungbeetle

    Dungbeetle

    A distributed job server

    Dungbeetle is a metadata and data lineage tracking tool developed by Zerodha to map and visualize how data flows across systems. It helps teams maintain data transparency by tracking dependencies between databases, tables, and reports, offering a centralized view of data pipelines. Dungbeetle is designed to enhance observability and trust in analytics ecosystems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Daft

    Daft

    Distributed DataFrame for Python designed for the cloud

    Daft is a framework for ETL, analytics and ML/AI at scale. Its familiar Python Dataframe API is built to outperform Spark in performance and ease of use. Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as Pytorch and Ray. It also allows requesting GPUs as a resource for running models. Daft runs locally with a lightweight multithreaded backend.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    Erigon

    Erigon

    Ethereum implementation on the efficiency frontier

    Erigon is an implementation of Ethereum (execution client), on the efficiency frontier, written in Go. For an Archive node of Ethereum Mainnet we recommend >=3TB storage space: 1.8TB state (as of March 2022), 200GB temp files (can symlink or mount folder <datadir>/etl-tmp to another disk). Ethereum Mainnet Full node ( see --prune* flags): 400Gb. Erigon by default is "all in one binary" solution, but it's possible start TxPool as separated processes. Same true about: JSON RPC layer (RPCDaemon), p2p layer (Sentry), history download layer (Downloader), consensus. Don't start services as separated processes unless you have clear reason for it: resource limiting, scale, replace by your own implementation, security. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 17
    CocoIndex

    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    AlaSQL

    AlaSQL

    JavaScript SQL database for browser and Node.js for relational tables

    AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel. We focus on speed by taking advantage of the dynamic nature of JavaScript when building up queries. Real-world solutions demand flexibility regarding where data comes from and where it is to be stored. We focus on flexibility by making sure you can import/export and query directly on data...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    Pyper

    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Hamilton DAGWorks

    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    ...As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you. Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, BI dashboards, and the Hamilton UI allows you to automatically visualize, catalog, and monitor execution.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    UIforETW

    UIforETW

    User interface for recording and managing ETW traces

    UIforETW is a Windows performance tracing companion that wraps the Event Tracing for Windows (ETW) toolchain in an approachable GUI. It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line incantations. It also manages symbol settings and capture templates, making it much easier to get actionable call stacks on developer machines and CI bots alike. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Pathway

    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Superduper

    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ...Developers may leverage Superduper by building compositional and declarative objects that out-source the details of deployment, orchestration versioning, and more to the Superduper engine. This allows developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    ImportExcel

    ImportExcel

    PowerShell module to import/export Excel spreadsheets, without Excel

    ...Advanced features include adding and formatting tables, setting number/date formats, creating charts, and applying styling or conditional formatting programmatically. The module is optimized for performance (streaming where possible) and supports large datasets, making it useful for ETL tasks, automated reporting, and data analysis in pure PowerShell environments. It integrates well with scheduled jobs and CI pipelines where generating or consuming spreadsheets is part of an automated workflow.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    AWS SDK for pandas

    AWS SDK for pandas

    Easy integration with Athena, Glue, Redshift, Timestream, Neptune

    ...The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. It also supports Redshift, OpenSearch, and other services, enabling ETL tasks that blend SQL engines and Python transformations. Operational helpers handle IAM, sessions, and concurrency while exposing knobs for encryption, versioning, and catalog consistency. The result is a productive workflow that keeps your analytics in Python while leveraging AWS-native storage and query engines at scale.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next