Explore state-of-the-art natural language understanding at scale in this 50-minute conference talk from ODSC West 2018. Dive into the challenges of processing language and learn about the NLP library for Apache Spark, which extends Spark ML pipeline APIs for distributed, optimized NLP and ML pipelines. Discover core NLP algorithms including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, and sentiment detection. Follow along with demonstrations of building common pipelines using PySpark in notebooks. Gain insights into benchmarks, design best practices, and performance optimizations for NLP, ML, and deep learning pipelines on Spark. Understand the latest improvements in Spark NLP, including native Spark extensions, embedded TensorFlow, and advanced word embeddings. Learn about practical applications like e-discovery and domain-specific sentiment analysis models. Get guidance on starting your first NLP project and setting realistic expectations for working with natural language processing at scale.
State of the Art Natural Language Understanding at Scale - David Talby