Stefano Lori
Published in spark-nlp · Jan 13, 2021

Cleaning and extracting text from HTML/XML documents by using Spark NLP

Spark NLP is an open-source text processing library for advanced natural language processing in the Python, Java, and Scala programming languages. The library has obtained the best-performing academic peer-reviewed results for two years in a row, and its community is growing fast (2.5M downloads and 9x growth in 2020). Some…

NLP

5 min read

Jul 21, 2020

Reliable and serverless data ingestion using Delta Lake on AWS Glue

The Big Data scenario: imagine a new requirement suddenly comes up, and you must re-process your last 2 years of data to produce some specific reports for the BI team. Wait, your production database is taking 15 hours to compute the reports and the clients need…

Delta Lake

5 min read

Stefano Lori

Tech Lead Data and AI, Senior Data Scientist in Fintech and Spark NLP contributor.
