252 release candidate #929

Merged: maziyarpanahi merged 49 commits into master from 252-release-candidate on Jun 11, 2020
@maziyarpanahi maziyarpanahi commented Jun 11, 2020

2.5.2


New Features

  • Introducing a new LanguageDetectorDL state-of-the-art annotator to detect and identify languages in documents and sentences
  • Add a new param, entityValue, to TextMatcher for storing a custom value in the annotation metadata. This is useful in post-processing when there are multiple TextMatcher annotators, each with its own dictionary (see "Add custom metadata to annotators", #920)
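
To illustrate why a per-matcher value helps in post-processing, here is a minimal sketch in plain Python. It assumes each TextMatcher was given its own entityValue and that the value shows up under an `entity` key in each chunk's metadata. The key name, the sample chunks, and the helper `group_by_entity` are assumptions made for illustration, not taken from this release.

```python
from collections import defaultdict

# Hypothetical chunks, shaped like result/metadata pairs one might collect
# from two TextMatcher annotators, each loaded with a different dictionary.
# The "entity" metadata key is an assumption made for this sketch.
chunks = [
    {"result": "aspirin", "metadata": {"entity": "DRUG"}},
    {"result": "headache", "metadata": {"entity": "SYMPTOM"}},
    {"result": "ibuprofen", "metadata": {"entity": "DRUG"}},
]


def group_by_entity(chunks, key="entity", default="UNKNOWN"):
    """Group matched chunk texts by the custom value stored in their metadata."""
    grouped = defaultdict(list)
    for chunk in chunks:
        # Fall back to a default label when a chunk carries no custom value.
        label = chunk.get("metadata", {}).get(key, default)
        grouped[label].append(chunk["result"])
    return dict(grouped)


grouped = group_by_entity(chunks)
print(grouped)
```

Without a distinguishing value in the metadata, chunks from different dictionaries would have to be told apart by output column alone, which gets awkward once the columns are merged.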

Bugfixes


Enhancements

  • Improve TF backend in ContextSpellChecker annotator

Pipelines and Models

We have added 4 new LanguageDetectorDL models and pipelines (2 of each) to detect and identify up to 20 languages:

  • The model with 7 languages: Czech, German, English, Spanish, French, Italian, and Slovak
  • The model with 20 languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian

| Model | Name | Build | Lang | Offline |
|---|---|---|---|---|
| LanguageDetectorDL | ld_wiki_7 | 2.5.2 | xx | Download |
| LanguageDetectorDL | ld_wiki_20 | 2.5.2 | xx | Download |

| Pipeline | Name | Build | Lang | Offline |
|---|---|---|---|---|
| LanguageDetectorDL | detect_language_7 | 2.5.2 | xx | Download |
| LanguageDetectorDL | detect_language_20 | 2.5.2 | xx | Download |
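
As a rough sketch of how one might pick between the two pretrained pipelines, the helper below chooses the smaller one whenever it covers every language of interest. The language sets mirror the lists in these notes. The function names `choose_pipeline` and `detect` are hypothetical, and `detect` assumes Spark NLP 2.5.2 is installed and the pipeline can be downloaded. It returns whatever `annotate` produces, without assuming any output column names.

```python
# Language coverage as listed in these release notes.
LANGS_7 = {"Czech", "German", "English", "Spanish", "French", "Italian", "Slovak"}
LANGS_20 = LANGS_7 | {
    "Bulgarian", "Greek", "Finnish", "Croatian", "Hungarian", "Norwegian",
    "Polish", "Portuguese", "Romanian", "Russian", "Swedish", "Turkish",
    "Ukrainian",
}


def choose_pipeline(required):
    """Return the smallest pretrained pipeline covering all required languages."""
    required = set(required)
    if required <= LANGS_7:
        return "detect_language_7"
    if required <= LANGS_20:
        return "detect_language_20"
    missing = sorted(required - LANGS_20)
    raise ValueError(f"Not covered by either model: {missing}")


def detect(text, required):
    """Run the chosen pipeline on one string (needs Spark NLP and network access)."""
    # Imported lazily so choose_pipeline works without Spark installed.
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()
    # "xx" is the multi-language code used for these models in the tables above.
    pipeline = PretrainedPipeline(choose_pipeline(required), lang="xx")
    return pipeline.annotate(text)


print(choose_pipeline(["English", "French"]))
print(choose_pipeline(["English", "Russian"]))
```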

Documentation

  • Update documentation for release of Spark NLP 2.5.x
  • Update the entire spark-nlp-workshop notebooks for Spark NLP 2.5.x
  • Update the entire spark-nlp-models repository with new pre-trained models and pipelines

Installation

Python

#PyPI

pip install spark-nlp==2.5.2

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.2

Spark

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2

PySpark

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2
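
The same package coordinate can also be supplied from inside a Python script instead of on the command line. The sketch below sets it through Spark's standard `spark.jars.packages` setting. The helper names, the application name, and the local master are assumptions for illustration, and `build_session` needs PySpark installed.

```python
SPARK_NLP_VERSION = "2.5.2"
SCALA_VERSION = "2.11"


def maven_coordinate(version=SPARK_NLP_VERSION, scala=SCALA_VERSION):
    """Build the group:artifact:version string used by --packages."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala}:{version}"


def build_session(app_name="spark-nlp-example"):
    """Create a local SparkSession that pulls in the Spark NLP package."""
    # Imported lazily so maven_coordinate works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master("local[*]")
        .config("spark.jars.packages", maven_coordinate())
        .getOrCreate()
    )


print(maven_coordinate())
```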

Maven

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.2</version>
</dependency>

FAT JARs

albertoandreottiATgmail and others added 30 commits May 29, 2020 15:27
Also, catch exception in case the result of Character.getName is illegal exception
Caution: Even if we save the ordered ListMap Apache Spark still reads them back from the fields without any order. It's best to order before any use and not before saving.
Fix bugs in includeConfidence and BasicTokenizer/BertEmbeddings
This is a multi-lingual annotator and the default lang should be xx which represents multiple languages
WIP: Introducing new LanguageDetectorDL annotator to identify languages
@maziyarpanahi maziyarpanahi merged commit 2248cf4 into master Jun 11, 2020
@maziyarpanahi maziyarpanahi deleted the 252-release-candidate branch March 29, 2021 15:39

4 participants