Showing 62 open source projects for "mapreduce"

  • 1
    PowerJob


    Enterprise job scheduling middleware with distributed computing

    ...Four timing strategies are supported: CRON expression, fixed rate, fixed delay, and OpenAPI, which lets you define your own scheduling policies, such as delayed execution. Four execution modes are supported: stand-alone, broadcast, Map, and MapReduce. MapReduce mode can draw on distributed computing resources. Both job dependency management and data communication between jobs are supported. Developers can write their processors in Java, Shell, or Python, and multilingual scheduling via HTTP is planned.
    Downloads: 0 This Week
  • 2
    Apache HBase


    Get random, realtime read/write access to your Big Data

    ...Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options. Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX. Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
    Downloads: 3 This Week
  • 3
    Koog


    Koog is the official Kotlin framework for building AI agents

    Koog is a Kotlin‑based framework for building and running AI agents entirely in idiomatic Kotlin, supporting both single‑run agents that process individual inputs and complex workflow agents with custom strategies and configurations. It features a pure Kotlin implementation, Model Context Protocol (MCP) integration for enhanced model management, vector embeddings for semantic search, and a flexible system for creating and extending tools that access external systems and APIs...
    Downloads: 3 This Week
  • 4
    DTail


    DTail is a distributed DevOps tool for tailing, grepping, catting logs

    DTail (a distributed tail program) is a DevOps tool for engineers, written in Go, for following (tailing), catting, and grepping log files on many machines concurrently, including gzip and zstd decompression support. An advanced feature of DTail is the execution of distributed MapReduce aggregations across many devices. The SSH protocol is used for secure authorization and transport encryption. Furthermore, DTail respects the UNIX file system permission model (traditional permissions on all Linux/UNIX variants, plus ACLs on Linux-based operating systems). The DTail binary operates in either client or server mode. The DTail server must be installed on all server boxes involved. ...
    Downloads: 1 This Week
  • 5
    Luigi


    Python module that helps you build complex pipelines of batch jobs

    ...You can build pretty much any task you want, but Luigi also comes with a toolbox of several common task templates that you can use. It includes support for running Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs. It also comes with file system abstractions for HDFS and local files that ensure all file system operations are atomic.
    Downloads: 0 This Week
  • 6

    JRecord

    Read Cobol data files in Java

    ...The source is now available at https://github.com/bmTas/JRecord
    Projects using JRecord include:
    * https://github.com/thospfuller/rcoboldi - Cobol File in R
    * https://github.com/tmalaska/CopybookInputFormat - Cobol files in Hadoop
    * https://github.com/gss2002/copybook_formatter
    * https://github.com/gss2002/ftp2hdfs - has some code that allows FTPing RDW files directly from the mainframe into Hadoop/HDFS as a MapReduce job or standalone client.
    Downloads: 9 This Week
  • 7

    ParDRe

    Parallel tool to remove duplicate DNA reads

    ...It is faster than its multithreaded counterparts (as of the end of 2015) for the same number of cores and, thanks to its message-passing technology, it can be executed on clusters. There is also a MapReduce counterpart of ParDRe, called MarDRe. UPDATE: From version 2.0.5, ParDRe can also remove only optical duplicates (leaving biologically interesting duplicates in place) and work with compressed input/output in .gz format.
    Downloads: 0 This Week
  • 8

    SkePi

    Data-parallel and stream-parallel skeletons implemented in Erlang.

    Downloads: 0 This Week
  • 9
    spatial-framework-for-hadoop


    The Spatial Framework for Hadoop allows developers

    ...At the root level of this repository, you can build a single jar with everything in the framework using Apache Ant. Alternatively, you can build a jar at the root level of each framework component. Custom MapReduce jobs that use the Esri Geometry API require that the developer author the job (referencing the com.esri.geometry.* classes) and deploy the job jar file to the Hadoop system before the ArcGIS user submits the workflow file.
    Downloads: 0 This Week
  • 10
    geometry-api-java


    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in third-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for third-party applications such as Cassandra, HBase, Storm, and many other Java-based “big data” applications.
    Downloads: 0 This Week
  • 11
    Gizmo Microservice Toolkit


    A Microservice Toolkit from The New York Times

    At The New York Times, our development teams have been adopting the Go programming language over the last three years to build better back-end services. In the past I’ve written about using Go for Elastic MapReduce streaming. I’ve also talked about using Go at GothamGo for news analysis and to improve our email and alert systems at the Golang NYC Meetup. We use Go for a wide variety of tasks, but the most common use throughout the company is for building JSON APIs. When we first began building APIs with Go, we didn’t use any frameworks or shared interfaces. ...
    Downloads: 0 This Week
  • 12
    rq


    A tool for doing record analysis and transformation

    ...It's a tool that's used for performing queries on streams of records in various formats. The goal is to make ad-hoc exploration of data sets easy without having to use more heavy-weight tools like SQL/MapReduce/custom programs. rq fills a similar niche as tools like awk or sed, but works with structured (record) data instead of text. It was created with love out of the best parts of Rust, and is distributed as a dependency-free binary on many operating systems and architectures.
    Downloads: 1 This Week
  • 13

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    ...Instead, MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.
    Downloads: 0 This Week
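The MapReduce programming model that MarDRe builds on can be sketched for duplicate removal in a few lines. This is only an illustrative, single-process sketch and not MarDRe's actual algorithm: the function names (`map_phase`, `shuffle`, `reduce_phase`, `dedupe`) and the rule "keep the first read seen for each exact sequence" are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape for this sketch: (read_id, sequence).
Read = Tuple[str, str]


def map_phase(reads: Iterable[Read]) -> Iterator[Tuple[str, str]]:
    """Emit (key, value) pairs keyed by sequence, so duplicates share a key."""
    for read_id, sequence in reads:
        yield sequence, read_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key, as the framework would do between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[str]]) -> Iterator[Read]:
    """Keep one representative per key (assumption: the first read seen)."""
    for sequence, read_ids in groups.items():
        yield read_ids[0], sequence


def dedupe(reads: Iterable[Read]) -> List[Read]:
    """Run the three phases in-process; a real job distributes them."""
    return list(reduce_phase(shuffle(map_phase(reads))))


if __name__ == "__main__":
    sample = [("r1", "ACGT"), ("r2", "TTGA"), ("r3", "ACGT")]
    print(dedupe(sample))
```

On a Hadoop cluster the map and reduce functions would run on different nodes and the grouping step would be handled by the framework; here all three run in one process purely to show the data flow.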
  • 14

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    HSRA is a MapReduce-based parallel tool for mapping reads from RNA sequencing (RNA-seq) experiments. RNA-seq analyses typically begin by mapping reads to a reference genome to determine the locations from which the reads originated, which is a very time-consuming step. This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, a distributed computing framework for scalable Big Data processing. ...
    Downloads: 0 This Week
  • 15

    spark-msna

    Algorithm on Spark for aligning multiple similar DNA/RNA sequences

    The algorithm uses a suffix tree to identify common substrings and a modified Needleman-Wunsch algorithm for pairwise alignments. To improve the efficiency of the pairwise alignments, an unsupervised, clustering-based learning technique is used to build a knowledge base that guides them.
    Downloads: 0 This Week
  • 16
    Scalding


    A Scala API for Cascading

    Scalding is a Scala DSL built on Cascading that simplifies writing Hadoop MapReduce jobs. It lets users describe data transformations using Scala’s functional abstractions, while abstracting away low-level Hadoop boilerplate. It enables expressive and testable pipeline definitions and integrates with various input/output formats.
    Downloads: 0 This Week
  • 17
    EventQL


    Distributed "massively parallel" SQL query engine

    EventQL is a distributed, column-oriented database built for large-scale event collection and analytics. It runs super-fast SQL and MapReduce queries. ...
    Downloads: 0 This Week
  • 18

    RSS Atom Feed Analytics With MapReduce

    This is a data analytics project for RSS feeds using Hadoop MapReduce

    This project takes the output of the jatomrss project as its input and applies MapReduce logic to it to perform the analytics.
    Downloads: 0 This Week
  • 19
    Summingbird


    Streaming MapReduce with Scalding and Storm

    Summingbird is a streaming + batch hybrid computation framework developed by Twitter. Its aim is to let developers express data aggregation pipelines in a unified way, where the same logic can run either in real time (stream) or in batch mode, and the results can be merged or reconciled. In effect, Summingbird abstracts over multiple execution engines (such as Storm, Scalding, etc.) to provide one high-level program that composes transformations and aggregations, and then executes them in...
    Downloads: 0 This Week
  • 20
    MIREX
    MIREX (MapReduce Information Retrieval Experiments) provides solutions to easily and quickly run large-scale information retrieval experiments on a cluster of machines using Hadoop. Version 0.3 has tools for the TREC ClueWeb09 and ClueWeb12 collections.
    Downloads: 0 This Week
  • 21

    owl reasoning over big biomedical data

    An OWL reasoning framework for the analysis of big biomedical data

    This project provides a general OWL reasoning framework for the analysis of big biomedical data and implements a MapReduce-based property chain reasoning prototype system. OWL reasoning is well suited to problems involving complex semantic associations because it can infer logical consequences from a set of asserted rules or axioms. The MapReduce framework is used to solve the problem of scalability. In our experiment, we focus on the discovery of associations between Traditional Chinese Medicine (TCM) and Western Medicine (WM).
    Downloads: 0 This Week
  • 22
    ankus


    Data Mining and Machine Learning Algorithms based on MapReduce

    [Features of ankus]
    * ankus is a 'web-based big data mining project and tool'.
      - MapReduce-based data mining/machine learning algorithms library
      - Hadoop-based distributed big data system
      - web-based GUI for easy use
    [The ankus project & License]
    * The ankus project consists of three as an open source.
    * ankus is dual-licensed under community and commercial licenses.
    * The community license follows GPLv3.
      - Some algorithms in the Core Project are not under the OSS license.
    [Demonstration Site] http://www.openankus.org:18080
    [Official website & E-mail] www.openankus.org ankus@openankus.org
    [ankus video list] http://bit.ly/ankus_video
    [Community]
    * http://www.facebook.com/groups/openankus (Korean group)
    * http://www.facebook.com/openankus (English group)
    * http://bit.ly/ankus_forum (Google Groups user forum)
    Downloads: 0 This Week
  • 23
    hmrjp-maven-plugin


    Hadoop MapReduce Maven plugin

    hmrjp-maven-plugin is a Maven plugin that helps create, run, and verify Hadoop MapReduce jobs remotely, just like any other Java project built with Maven.
    Downloads: 0 This Week
  • 24
    CliqueSquare


    Distributed RDF Processing over Hadoop

    CliqueSquare is a system for storing and querying large RDF graphs relying on Hadoop’s distributed file system (HDFS) and Hadoop’s MapReduce open-source implementation. It provides a novel partitioning and storage scheme that permits 1-level joins to be evaluated locally using efficient map-only joins. In addition, CliqueSquare is equipped with a unique optimization algorithm based on graphs and cliques capable of generating highly parallelizable flat query plans relying on n-ary equality joins.
    Downloads: 0 This Week
  • 25
    Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework with the Picard SAM JDK, and command line tools similar to SAMtools. The file formats currently supported are BAM, SAM, FASTQ, FASTA, QSEQ, BCF, and VCF. For a longer high-level description of Hadoop-BAM, refer to the article "Hadoop-BAM: directly manipulating next generation sequencing data in the cloud" in Bioinformatics Volume 28 Issue 6 pp. 876-877, available online at: http://dx.doi.org/10.1093/bioinformatics/bts054 Note that the library part of Hadoop-BAM is mainly for developers with experience in using Hadoop. ...
    Downloads: 0 This Week