252 release candidate by maziyarpanahi · Pull Request #929 · JohnSnowLabs/spark-nlp

maziyarpanahi · 2020-06-11T10:42:42Z

2.5.2

New Features

Introduce LanguageDetectorDL, a new state-of-the-art annotator that detects and identifies languages in documents and sentences
Add a new param entityValue to TextMatcher to add a custom value inside metadata. Useful in post-processing when there are multiple TextMatcher annotators with multiple dictionaries (Add custom metadata to annotators, #920)
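
The post-processing use case for entityValue can be sketched in plain Python. This is a minimal sketch, not the library's API: it assumes the TextMatcher outputs have already been collected into dicts with `result` and `metadata` fields, and that the custom value sits under an `entity` metadata key (the commit log mentions an "entity field in metadata"). The helper `group_by_entity` and the sample values are hypothetical.

```python
from collections import defaultdict


def group_by_entity(annotations, default="unknown"):
    """Group matched chunks by the custom value stored in their metadata.

    Assumption: each annotation is a dict shaped like
    {"result": <matched text>, "metadata": {"entity": <entityValue>}}.
    """
    grouped = defaultdict(list)
    for ann in annotations:
        # Fall back to a default label when a matcher set no custom value.
        label = ann.get("metadata", {}).get("entity", default)
        grouped[label].append(ann["result"])
    return dict(grouped)


# Hypothetical output of two TextMatcher annotators with different dictionaries.
matches = [
    {"result": "aspirin", "metadata": {"entity": "DRUG"}},
    {"result": "Berlin", "metadata": {"entity": "CITY"}},
    {"result": "ibuprofen", "metadata": {"entity": "DRUG"}},
]

print(group_by_entity(matches))
```

With one distinct value per dictionary, downstream code can tell which matcher produced each chunk without inspecting the dictionaries themselves.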

Bugfixes

Add missing TensorFlow graphs to train ContextSpellChecker annotator (Pre-defined graphs for ContextSpellChecker are missing, #912)
Fix the misspelled classThreshold param name in ContextSpellChecker annotator (Misspelled param name in ContextSpellChecker, #911)
Fix a bug where setGraphFolder in NerDLApproach annotator couldn't find a graph on Databricks (DBFS) (NerDLApproach does not accept HDFS path for graph folder, #739)
Fix a bug in NerDLApproach when includeConfidence was set to true (java.lang.NumberFormatException BigDecimal in NerDLApproach regarding includeConfidence, #917)
Fix a bug in BertEmbeddings (BertEmbeddings NullPointerException in Python, #906; NullPointerException using BertEmbeddings, #918)

Enhancements

Improve TF backend in ContextSpellChecker annotator

Pipelines and Models

We have added 4 new LanguageDetectorDL models and pipelines to detect and identify up to 20 languages:

The model with 7 languages: Czech, German, English, Spanish, French, Italian, and Slovak
The model with 20 languages: Bulgarian, Czech, German, Greek, English, Spanish, Finnish, French, Croatian, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Swedish, Turkish, and Ukrainian

| Model | Name | Build | Lang | Offline |
|---|---|---|---|---|
| LanguageDetectorDL | `ld_wiki_7` | 2.5.2 | `xx` | Download |
| LanguageDetectorDL | `ld_wiki_20` | 2.5.2 | `xx` | Download |

| Pipeline | Name | Build | Lang | Offline |
|---|---|---|---|---|
| LanguageDetectorDL | `detect_language_7` | 2.5.2 | `xx` | Download |
| LanguageDetectorDL | `detect_language_20` | 2.5.2 | `xx` | Download |
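
A minimal sketch of how one of these pipelines might be called from Python. It assumes the `sparknlp` package is installed and that `detect_language_20` can be downloaded with `lang="xx"`. It also assumes the detected language appears under a `language` key in the result of `annotate`; that key name is not confirmed by these notes. The helpers `load_pipeline` and `detect_languages` are hypothetical.

```python
def load_pipeline(name="detect_language_20", lang="xx"):
    """Start Spark NLP and fetch a pretrained pipeline by name.

    Imports are done lazily so the helper below can be used (and tested)
    with any object that offers an annotate(text) method.
    """
    import sparknlp
    from sparknlp.pretrained import PretrainedPipeline

    sparknlp.start()
    return PretrainedPipeline(name, lang=lang)


def detect_languages(pipeline, text, output_key="language"):
    """Return the language labels the pipeline reports for a text.

    Assumption: annotate() returns a dict of lists and the labels are
    stored under output_key; an empty list is returned if the key is absent.
    """
    result = pipeline.annotate(text)
    return list(result.get(output_key, []))


# Usage (requires Spark NLP and network access, so it is left commented out):
#   pipeline = load_pipeline("detect_language_20")
#   print(detect_languages(pipeline, "Bonjour tout le monde"))
```

Swapping the name for `detect_language_7` would select the smaller model under the same assumptions.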

Documentation

Update documentation for the Spark NLP 2.5.x release
Update all spark-nlp-workshop notebooks for Spark NLP 2.5.x
Update the entire spark-nlp-models repository with new pre-trained models and pipelines

Installation

Python

#PyPI

pip install spark-nlp==2.5.2

#Conda

conda install -c johnsnowlabs spark-nlp==2.5.2

Spark

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2

PySpark

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2
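
The same coordinate can also be supplied programmatically through Spark's `spark.jars.packages` setting instead of the `--packages` flag. A minimal sketch, assuming the Scala 2.11 build shown above; the helper names `spark_nlp_coordinate` and `session_options` are hypothetical.

```python
def spark_nlp_coordinate(version="2.5.2", scala_version="2.11"):
    """Build the Maven coordinate used by --packages / spark.jars.packages."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala_version}:{version}"


def session_options(version="2.5.2"):
    """Spark configuration entries that pull in the Spark NLP package."""
    return {"spark.jars.packages": spark_nlp_coordinate(version)}


print(spark_nlp_coordinate())
# prints com.johnsnowlabs.nlp:spark-nlp_2.11:2.5.2

# Applying the options (requires pyspark, so it is left commented out):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("Spark NLP")
#   for key, value in session_options().items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```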

Maven

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.11</artifactId>
    <version>2.5.2</version>
</dependency>

FAT JARs


albertoandreottiATgmail and others added 30 commits May 29, 2020 15:27

added missing graphs

ad8f947

improved call to TF legibility

9ae6b51

fixed typo

780be4c

Fix Spell Mistakes

6e09815

rollback

13a27cd

Fix Spell Mistakes

90d828b

Implementing entity field in metadata in scala

9c235aa

update models md

958582a

[skip travis]Python wrapper for the entityValue param in TextMatcher

4cb777f

[skip travis]Including entity field in the expected fixtures in tests

dd8585c

Fixed multi language selection tabs

78acf52

[skip travis] Add maxResultSize config to sparknlp start

b21905f

[skip travis] Add LANGUAGE to AnnotatorType

1bbacc9

[skip travis] Fix misspelling in param description

6fe757a

[skip travis] New LanguageDetectorDL annotator

c26feaa

[skip travis] Add LanguageDetectorDL to ResourceDownloader

38fe9e4

[skip travis] Add LanguageDetectorDL TensorFlow backend

8490783

[skip travis] Add threshold, thresholdLabel, and coalesceSentences

4f5a3f5

[skip travis] Add coalesceSentences concept

664d953

[skip travis] Update Scaladocs

f290e64

[skip travis] Fix link to workshop

7cdeb10

[skip travis] update coalesceSentences default to true

6df022e

fixed databricks setGraph Path issue.

74f173b

fixed databricks setGraph Path issue.

8a1ea1d

[skip travis] Include try catch to avoid exception in confidence scores

82ce630

[skip travis] Make sure String is initialized before the search

6889011

Also, catch exception in case the result of Character.getName is illegal exception

[skip travis] Move ordering logic to TensorFlow

2a918a2

Caution: Even if we save the ordered ListMap, Apache Spark still reads them back from the fields without any order. It's best to order before any use and not before saving.

[skip travis] Fix Scaladoc formatting

656fb39

[skip travis] Add test dataset for LanguageDetectorDL

611ad8b

[skip travis] Refine the cleaning characters

e455172

Merge pull request #913 from JohnSnowLabs/spell_fixes

b7e7140

Spell fixes

maziyarpanahi added the enhancement, documentation, bug-fix, and new-feature labels Jun 11, 2020

maziyarpanahi self-assigned this Jun 11, 2020

maziyarpanahi added 18 commits June 11, 2020 12:43

Merge pull request #914 from mehta-sandip/spell_correction

0feb325

Spell correction

Merge pull request #915 from JohnSnowLabs/text-matcher-entity-metadat…

2ad7a91

…a-252 Text matcher entity metadata 252

Merge pull request #919 from JohnSnowLabs/evaluationDocsMultiLanguageFix

ed8238f

Fixed multi language selection tabs

Merge pull request #925 from JohnSnowLabs/setGraphError_issue_739

7a2e3c9

fixed databricks setGraph Path issue.

Merge pull request #927 from JohnSnowLabs/bug-fixes-252

21d0c83

Fix bugs in includeConfidence and BasicTokenizer/BertEmbeddings

[skip travis] Make default lang xx

f624727

This is a multi-lingual annotator and the default lang should be xx which represents multiple languages

Merge branch '252-release-candidate' into language-detection-init

2243d0b

[skip travis] Fix misspelling

d176247

[skip travis] Add Python APIs for LanguageDetectorDL

9e3815b

[skip travis] Use regular session instead of the one with table

4aacae4

Add LanguageDetectorDLTestSpec for Scala

5a6232e

Merge pull request #921 from JohnSnowLabs/language-detection-init

9bd38a2

WIP: Introducing new LanguageDetectorDL annotator to identify languages

[skip travis] Bump version to 2.5.2

db2f7ff

[skip travis] Rollback changes in create_models.ipynb

a71bf87

[skip travis] Update CHANGELOG

026e3e7

[skip travis] Update README for 2.5.2

36d6854

Update the website for 2.5.2 release

917e680

[skip travis] Update Scaladoc

366cde7

maziyarpanahi merged commit 2248cf4 into master Jun 11, 2020

maziyarpanahi deleted the 252-release-candidate branch March 29, 2021 15:39

maziyarpanahi merged 49 commits into master from 252-release-candidate