Open Source Windows Text Processing Software

Browse free open source Text Processing software and projects for Windows below.

  • 1
    Stanford CoreNLP

    Stanford CoreNLP, a Java suite of core NLP tools

    CoreNLP is a one-stop shop for natural language processing in Java. It lets users derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports six languages: Arabic, Chinese, English, French, German, and Spanish. The centerpiece of CoreNLP is the pipeline. Pipelines take in raw text, run a series of NLP annotators on the text, and produce a final set of annotations. Pipelines produce CoreDocuments, data objects that contain all of the annotation information, accessible with a simple API and serializable to a Google Protocol Buffer.
    Downloads: 3 This Week
  • 2
    compromise

    Modest natural-language processing

    Language is complicated and there are a gazillion words. Compromise is a JavaScript library that interprets and pre-parses text and makes some reasonable decisions so things are way easier. Compromise tries its best to parse text. It is small, quick, and often good enough. It is not as smart as you'd think. Conjugate and negate verbs in any tense. Switch between plural, singular, and possessive forms. Interpret plain-text numbers. Handle implicit terms. Use it on the client side or as an ES module. Compromise is 180 kB (minified). It's pretty fast and can run on keypress. It works mainly by conjugating all forms of a basic word list. Decide how words get interpreted, or make heavier changes with a compromise plugin. Parse text without running POS tagging. Pre-parse any match statements for faster lookups. It is not the most accurate or clever NLP library, but it has found its niche as an easy, small library that can run everywhere.
    Downloads: 3 This Week
  • 3
    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, pip, and Easy Install. To use this library, you need Java 7+ installed on your system, because tika-python starts the Tika REST server in the background. To get this working in a disconnected environment, download a Tika server file (both tika-server.jar and tika-server.jar.md5) and set the TIKA_SERVER_JAR environment variable to TIKA_SERVER_JAR="file:////tika-server.jar". This tells tika-python to "download" the file, move it to /tmp/tika-server.jar, and run it as a background process. This is the only way to run tika-python without internet access. Without this variable set, the default is to check the Tika version and pull the latest from Apache every time.
    Downloads: 1 This Week
  • 4
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK is a leap forward: it promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. It is dedicated to streamlining data processing, reducing development time and costs, and harnessing the potential of AI. Its advancements promise a superior experience compared to its predecessor, DocToText.
    Downloads: 5 This Week
  • 5
    ArabicDiacritizer

    An automatic restoration of Arabic diacritic marks

    This software restores Arabic diacritical marks. It is based mainly on deep neural network architectures. The algorithm generates diacritized text with determined end case. The algorithm is described in detail in: Ilyes Rebai and Yassine BenAyed, 'Text-to-speech synthesis system with Arabic diacritic recognition system', Computer Speech & Language, 2015. We would appreciate it if you cite our related work. Installation: (1) Extract the archive "ArabicDiacritizer Setup.rar". (2) Install the application using "Setup.exe". (3) Put an Arabic text in the text box. (4) Start the diacritization process. If the following problem occurs: <Access to the path '..\ArabicDiacritizer v1.0\text.data' is denied>, go to the path "Program Files\ArabicDiacritizer\ArabicDiacritizer v1.0\", right-click "ArabicDiacritizer", and choose "Run as administrator". For further information, please contact: rebai_ily
    Downloads: 0 This Week
  • 6
    Auvai is a Java API and Java Swing-based application for text-to-speech conversion of Unicode Tamil. The planned direction for this API and application is to support text-to-speech conversion for all "Indic" languages.
    Downloads: 0 This Week
  • 7
    Bigram applications based on language models produced by SRILM from a Chinese Wikipedia corpus, including a Chinese word segmenter, a word-based (not character-based) Traditional-Simplified Chinese converter, and a Chinese syllable-to-word converter.
    Downloads: 0 This Week
  • 8
    Concrete Voice is a text-to-speech program. It can read the time, announce the weather, read text files aloud, save text files as audio files, open any text file (all text encodings are supported), and offers many more advanced features.
    Downloads: 0 This Week
  • 9
    Consilium Sentence Suggestions Tools

    Consilium: a user-defined sentence suggestion tool.

    Many tools on the market offer spelling or grammar correction while you write documents, but very few offer sentence completion based on previously entered text. Those that do suggest completions for sentences that everyone uses regularly in the same way. In reality, writing style varies from person to person. Our aim is to provide a sentence suggestion tool that completes a sentence according to the data the user has previously entered. The suggestion for the same sentence or word therefore changes from person to person, depending on what that person has typed before. This makes it easier to type any document, SMS, email, chat message, and so on.
    Downloads: 0 This Week
  • 10
    A Java application for statistical analysis and systematic manipulation of natural language texts.
    Downloads: 0 This Week
  • 11
    A simple intelligent editor.
    Downloads: 0 This Week
  • 12
    "Java Artificial Intelligence Markup Language PAD" is a tool that manages ProgramD AI (on local or remote machines) and AIML files with real-time previews. It also provides network support for testing AI capabilities over many network protocols.
    Downloads: 0 This Week
  • 13
    When translating becomes a game! Text to translate can be selected graphically. Several dictionaries can be sorted according to the context. A large choice of matching strategies is available. The OCR engine is tunable.
    Downloads: 0 This Week
  • 14
    Leseratte is a Java parser for written German. Currently, it contains a German lexicon (based on Wiktionary), inflection rules, a grammar, and a parser. (A semantics component is planned.) It is usable as a Java library and also provides a graphical UI.
    Downloads: 0 This Week
  • 15
    This project is devoted to the development of natural language processing tools and resources for the Lingala language, which is spoken by tens of millions of people in central Africa.
    Downloads: 0 This Week
  • 16
    Pylero
    Pylero is an open-source Python-based text generator.
    Downloads: 0 This Week
  • 17
    The Information Extraction Plugin allows the use of information extraction (IE) techniques within RapidMiner. It can be seen as an interface between natural language and IE or data mining methods, extracting interesting information from documents.
    Downloads: 0 This Week
  • 18
    Voice is a text-to-speech program with many features, including: reading Text, Rich Text, and Word documents aloud; a custom greeting; a professional document editor; clipboard monitoring and processing; and a good-looking animated character.
    Downloads: 0 This Week
  • 19
    TF-IDF.jar is a Java Archive file that measures the TF-IDF of each document in a document collection (corpus). The jar can be used to (a) get all the terms in the corpus; (b) get the document frequency (DF) and inverse document frequency (IDF) of all the terms in the corpus; (c) get the TF-IDF of each document in the corpus; and (d) get each term with its frequency (number of occurrences), term frequency (TF), and TF-IDF in every document.
    Downloads: 0 This Week
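    The quantities listed for TF-IDF.jar (DF, IDF, TF, and TF-IDF) can be illustrated with a minimal Python sketch. This is not the jar's code or API: the function name `tf_idf` is hypothetical, and the weighting is an assumption (TF as count divided by document length, IDF as the natural log of N / DF), since the listing does not say which variant the jar uses.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return (df, idf, scores) for a corpus given as a list of token lists.

    Hypothetical helper for illustration only; the weighting scheme is assumed.
    """
    n_docs = len(corpus)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in corpus for term in set(doc))
    # Inverse document frequency, using the plain log(N / DF) variant.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    scores = []
    for doc in corpus:
        counts = Counter(doc)  # raw number of occurrences per term
        total = len(doc)
        # TF is the raw count normalized by document length.
        scores.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return df, idf, scores


corpus = [["a", "b", "a"], ["b", "c"]]
df, idf, scores = tf_idf(corpus)
```

    With this variant, a term that appears in every document (here "b") gets an IDF of zero, so its TF-IDF is zero in every document.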
  • 20
    TextBlob

    TextBlob is a Python library for processing textual data

    Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. TextBlob provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation. It stands on the giant shoulders of NLTK and pattern, and plays nicely with both. It supports word inflection (pluralization and singularization), lemmatization, and spelling correction, and it comes with WordNet integration. Add new models or languages through extensions. If you only intend to use TextBlob's default models (no model overrides), you can pass the lite argument, which downloads only the corpora needed for basic functionality. TextBlob is also available as a conda package.
    Downloads: 0 This Week
  • 21
    TextMarker
    TextMarker is now developed and hosted at Apache UIMA (http://uima.apache.org/textmarker.html). TextMarker is a UIMA-based tool for information extraction and more. The full-featured editor for the rule language and the build process for UIMA descriptors are complemented by components for visualization, explanation, testing, and rule learning.
    Downloads: 0 This Week
  • 22
    Transcription Aid

    Transcription Aid helps you type text from recordings.

    This software helps you type text from speech recordings. It has several functions proven to help with this type of work. It is fully manual (aside from auto-completion), so it offers no speech recognition if that is what you are looking for, but it is a great tool for the job.
    Downloads: 0 This Week
  • 23
    WebDjVuTextEd

    Edit the OCR text layer of DjVu documents in a web browser

    WebDjVuTextEd lets you edit the text layer of OCR'ed DjVu documents in a web browser. You can modify the structure (paragraphs, lines, words...), create, delete, and edit text nodes, adjust their container boxes with the mouse, and run a spellchecker. The program does not read DjVu files directly; it requires exported XML text data and images. When used without a web server, you can open and save local files, but cannot take advantage of auto-save and spell checking. Note that the current SVN version has many more features than V1.0!
    Downloads: 0 This Week
  • 24
    OCR C++ library. Includes: contour recognition; vectorisation; matrix letter feature recognition; automatic page segmentation and rotation detection; SS3 ASM core; XML base; web-based GUI; 99.6% printed Unicode text recognition; letter base of up to 1200 letters.
    Downloads: 0 This Week
  • 25
    crf decoder
    CRF decoder is a simplified version of CRF++ that only decodes sequential data. It removes the training component and its corresponding code from CRF++, which makes CRF decoder more readable and understandable for newcomers.
    Downloads: 0 This Week