Preface
1. Introduction: What Is Data Science?
Big Data and Data Science Hype
Getting Past the Hype
Why Now?
Datafication
The Current Landscape with a Little History
Data Science lobs
A Data Science Profile
Thought Experiment: Meta-Definition
OK, So What Is a Data Scientist, Really?
In Academia
In Industry
2. Statistical Inference, Exploratory Data Analysis, and the Data Science
Process
Statistic.a1 Thinking in the Age of Big Data
Statistical Inference
Populations and Samples
Populations and Samples of Big Data
Big Data Can Mean Big Assumptions
Modeling
Exploratory Data Analysis
Philosophy of Exploratory Data Analysis
Exercise: EDA
The Data Science Process
A Data Scientist''s Role in This Process
Thought Experiment: How Would You Simulate Chaos?
Case Study: RealDirect
How Does RealDirect Make Money?
Exercise: RealDirect Data Strategy
3. Algorithms
Machine Learning Algorithms
Three Basic Algorithms
Linear Regression
k-Nearest Neighbors k-NN
k-means
Exercise: Basic Machine Learning Algorithms
Solutions
Summing It All Up
Thought Experiment: Automated Statistician
4. Spare Filters, Naive Bayes, and Wrangling
Thought Experiment: Learning by Example
Why Won''t Linear Regression Work for Filtering Spare?
How About k-nearest Neighbors?
Naive Bayes
Bayes Law
A Spare Filter for Individual Words
A Spam Filter That Combines Words: Naive Bayes
Fancy It Up: Laplace Smoothing
Comparing Naive Bayes to k-NN
Sample Code in bash
Scraping the Web: APIs and Other Tools
Jake''s Exercise: Naive Bayes for Article Classification
Sample R Code for Dealing with the NYT API
5. Logistic Regression
Thought Experiments
Classifiers
Runtime
You
Interpretability
Scalability
M6D Logistic Regression Case Study
Chck Models
The Underlying Math
6.1ime Stamps and Financial Modeling
7.Extracting Meaning from Data
8.Recommendation Engines:Building a User-Facing Data Product at Scale
9.Data Visualization and Fraud Detection
10.SociaI Networks and Data Journalism
11.Causality
12.Epidemiology
13.Lessons Learned from Data Competitions:Data Leakage and Model Evaluation
14.Data Engineering:MapReduce,Pregel,and Hadoop
15.The Students Speak
16.Next-Generation Data Scientists,Hubris,and Ethics
Index