Spark NLP is an open-source library for advanced natural language processing in Python, Java, and Scala. It has delivered the best-performing academic peer-reviewed results for two years in a row and has a large, growing community (2.5M downloads and 9x growth in 2020).

Photo by Florian Olivo on Unsplash

Some more impressive numbers from the latest 2.7.x release:

  • More accurate, faster, and supports up to 375 languages.
  • Support for state-of-the-art Seq2Seq and Text2Text transformers. This includes new annotators for Google T5 (Text-To-Text Transfer Transformer) and MarianMT for Neural Machine Translation, with over 646 new pre-trained models and pipelines.
  • 720+…

The Big Data scenario

Imagine a new requirement suddenly arrives: you need to re-process your last two years of data to produce specific reports for the BI team. But wait: your production database takes 15 hours to compute the reports, and the clients need the results tomorrow? And, in fact, every week?

One possible solution is to build a data warehouse on S3 that ingests data from your database, and to scale the data processing with a powerful parallel, distributed engine such as Apache Spark.
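As a rough illustration of that idea, the sketch below shows one way a table could be copied from a database into Parquet files on S3 with PySpark. It is a minimal sketch under stated assumptions, not the setup used in the article: the helper names `jdbc_read_options` and `export_table_to_s3`, the connection URL, table name, and bucket path are all hypothetical, and the cluster is assumed to have a suitable JDBC driver and the `hadoop-aws` connector for `s3a://` paths.

```python
from typing import Dict


def jdbc_read_options(
    url: str,
    table: str,
    user: str,
    password: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int = 16,
) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper).

    partitionColumn, lowerBound, upperBound and numPartitions let Spark split
    the read into several parallel range queries instead of one big scan.
    """
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def export_table_to_s3(spark, options: Dict[str, str], s3_path: str, partition_by: str) -> None:
    """Read a table over JDBC and write it to S3 as partitioned Parquet.

    `spark` is an existing pyspark.sql.SparkSession; it is passed in so that
    this module can be imported without PySpark installed.
    """
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").partitionBy(partition_by).parquet(s3_path)
```

With a live session, a call might look like `export_table_to_s3(spark, opts, "s3a://my-warehouse/transactions/", "year")`, where the bucket and the `year` column are placeholders. Once the data sits on S3 in a columnar format, the weekly reports can be computed by Spark against the warehouse rather than against the production database.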

Delta Lake

In April 2019, Databricks open-sourced…

Stefano Lori

Tech Lead Data and AI, Senior Data Scientist in Fintech and Spark NLP contributor.
