Spark NLP is an open-source text processing library for advanced natural language processing in Python, Java, and Scala. The library has achieved the best-performing peer-reviewed academic results for two years in a row and has a large, fast-growing community (2.5M downloads and 9x growth in 2020).
Some more impressive numbers from the latest 2.7.x release:
The Big Data scenario
Imagine a new requirement suddenly comes up: you need to re-process your last 2 years of data to produce some specific reports for the BI team. But wait, your production database takes 15 hours to compute the reports, and the clients need the results tomorrow? And, in fact, every week?
One possible solution is to build a data warehouse on S3 that ingests data from your database, and to scale the data processing with one of the most powerful parallel and distributed engines, such as Apache Spark.
In April 2019, Databricks open sourced…
Tech Lead Data and AI, Senior Data Scientist in Fintech and Spark NLP contributor.