Intro

Big data vs Large-scale?

KISS principle

Identify the scope of the problem

Avoid scope creep

Know your audience

Finding the right tool for the job

Define a scalable architecture

Iterative development

Why social media?

Cautions about social-media data

Why Twitter?

Benefits of using Twitter

How do we harness such data?

The need for a specific tool

Beginning story (1)

So we needed to standardize this! (2)

In the end - lessons learned

For instant NLP uses

Defining a framework for data collection

Our COVID-19 infrastructure - under the hood (2)

Acknowledgments

Description:

Explore the process of building tools and frameworks for large-scale social media mining in this 46-minute talk by Dr. Juan M. Banda. Learn about the Social Media Mining Toolkit (SMMT) and its application in creating a massive COVID-19 Twitter dataset. Discover the challenges, lessons learned, and key decisions involved in developing and maintaining large-scale social media data gathering projects for NLP and machine learning research. Gain insights into the importance of standardization, scalable architecture, and iterative development in handling big data from social media platforms. Understand the benefits and cautions of using Twitter data, and explore the framework used for the COVID-19 data collection infrastructure.

Building Tools and Frameworks for Large-Scale Social Media Mining

Elvis Saravia

#Data Science #Data Mining #Computer Science #Machine Learning #Artificial Intelligence #Natural Language Processing (NLP) #Software Engineering #Software Architecture #Scalable Architectures