Stefano Lori – Medium

Stefano Lori

Ranking document similarity at scale with Spark NLP

Combining the power of Spark NLP sentence embeddings and LSH approximate nearest neighbors search pipelines to catch contextual and…

9 min read · Jul 2, 2023

--


Stefano Lori

Spark NLP Document Similarity Ranker as-retriever for RAG tasks

Breaking news! Spark NLP (https://sparknlp.org/) gets enhanced with a new DocumentSimilarityRanker as-retriever interface for your RAG…

4 min read · Mar 18, 2024

--


Stefano Lori

Polars vs DuckDB for Delta Lake ops

Introduction

3 min read · Sep 13, 2023

--


Stefano Lori

Polars is all you need: SQL chapter

I just found a powerful Python SQL API for my data analysis

6 min read · May 20, 2023

--


Stefano Lori in spark-nlp

Cleaning and extracting content from HTML/XML documents using Spark NLP

Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming…

5 min read · Jan 13, 2021

--


Stefano Lori

Reliable and serverless data ingestion using Delta Lake on AWS Glue

The Big Data scenario

5 min read · Jul 21, 2020

--


Stefano Lori

Lead Big Data and AI, Senior Data Scientist in Fintech, ESG and Spark NLP contributor.
