Introduction

Feature Vectors in the Iris Data Set

Good Pet Data Set

Possible Decision Trees

Interpreting Models

Building a Decision Tree in MLlib

Evaluating a Decision Tree

Better Than Random Guessing?

Decisions Should Make Lower Impurity Subsets

Tuning Hyperparameters

How to Create a Crowd?

Trees See Subsets of Examples

Or Subsets of Features

Diversity of Opinion

Random Decision Forests

Description:

This 53-minute conference talk from GOTO Amsterdam 2015 introduces Random Decision Forests on Apache Spark. Sean Owen, Director of Data Science at Cloudera, covers feature vectors, decision trees, and model interpretation. Learn how to build and evaluate a decision tree using MLlib, why good decisions split the data into lower-impurity subsets, and how to tune hyperparameters. The talk then turns to creating a diversity of opinion by training trees on subsets of examples or subsets of features, which leads to Random Decision Forests. It also touches on what Apache Spark offers data scientists: its distributed nature, a REPL environment, and Python APIs alongside native Scala support.

A Taste of Random Decision Forests on Apache Spark

GOTO Conferences

#Conference Talks #GOTO Conferences #Data Science #Data Analysis #Computer Science #Machine Learning #Big Data #Apache Spark #Decision Trees #Hyperparameter Tuning #MLlib
