1. Intro
2. Workshop Outline
3. Workshop Agenda
4. Workshop Goals
5. Data Preparation
6. Faulty Sensor Situation
7. Systematically missing variables
8. Building missing variables
9. Missing values
10. Pragmatic solution
11. Novel categorical levels
12. New data
13. Wyoming
14. Chemical categorical variables
15. Dealing with new levels
16. VTreat solution
17. Categorical variables
18. Compact coding
19. Indicator vs numerical variables
20. Treatment Plan
21. User Interface
22. Treatment Example
23. Linear Regression
24. Calibration
25. Interpretation
26. Operational Issues
27. Overfitting
28. Data fussing
29. John Mount
Description:
This conference talk from ODSC West 2015 covers data preparation techniques for analysis in R. It introduces the fundamentals of data quality, shows how to automate routine preparation steps in a principled manner, and demonstrates how to detect and fix common pitfalls through interactive demonstrations in the open-source R analysis environment. Materials are available from the provided GitHub repository for following along or practicing later. Topics include faulty sensor situations, missing variables and values, novel categorical levels, compact coding of categorical variables, treatment plans, the user interface, and operational issues, along with linear regression, calibration, interpretation, and avoiding overfitting. The talk is led by John Mount and Nina Zumel, experienced data scientists and authors.

Prepping Data for Analysis Using R

Open Data Science