Data Science (Reprint Edition, English)
Authors: Schutt and O'Neil (USA)
Publication date: 2014
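People now recognize that data can make a difference in elections and in business models, and data science as a profession keeps growing. But how do you go about working in such a broad and intricate interdisciplinary field? Data Science (Reprint Edition), by Schutt and O'Neil, tells you everything you need to know. It is insightful and was compiled from the lecture notes of Columbia University's data science course.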
Contents
- Preface
- 1. Introduction: What Is Data Science
- Big Data and Data Science Hype
- Getting Past the Hype
- Why Now
- Datafication
- The Current Landscape (with a Little History)
- Data Science Jobs
- A Data Science Profile
- Thought Experiment: Meta-Definition
- OK, So What Is a Data Scientist, Really
- In Academia
- In Industry
- 2. Statistical Inference, Exploratory Data Analysis, and the Data Science Process
- Statistical Thinking in the Age of Big Data
- Statistical Inference
- Populations and Samples
- Populations and Samples of Big Data
- Big Data Can Mean Big Assumptions
- Modeling
- Exploratory Data Analysis
- Philosophy of Exploratory Data Analysis
- Exercise: EDA
- The Data Science Process
- A Data Scientist's Role in This Process
- Thought Experiment: How Would You Simulate Chaos
- Case Study: RealDirect
- How Does RealDirect Make Money
- Exercise: RealDirect Data Strategy
- 3. Algorithms
- Machine Learning Algorithms
- Three Basic Algorithms
- Linear Regression
- k-Nearest Neighbors (k-NN)
- k-means
- Exercise: Basic Machine Learning Algorithms
- Solutions
- Summing It All Up
- Thought Experiment: Automated Statistician
- 4. Spam Filters, Naive Bayes, and Wrangling
- Thought Experiment: Learning by Example
- Why Won't Linear Regression Work for Filtering Spam
- How About k-nearest Neighbors
- Naive Bayes
- Bayes Law
- A Spam Filter for Individual Words
- A Spam Filter That Combines Words: Naive Bayes
- Fancy It Up: Laplace Smoothing
- Comparing Naive Bayes to k-NN
- Sample Code in bash
- Scraping the Web: APIs and Other Tools
- Jake's Exercise: Naive Bayes for Article Classification
- Sample R Code for Dealing with the NYT API
- 5. Logistic Regression
- Thought Experiments
- Classifiers
- Runtime
- You
- Interpretability
- Scalability
- M6D Logistic Regression Case Study
- Click Models
- The Underlying Math
- 6. Time Stamps and Financial Modeling
- 7. Extracting Meaning from Data
- 8. Recommendation Engines: Building a User-Facing Data Product at Scale
- 9. Data Visualization and Fraud Detection
- 10. Social Networks and Data Journalism
- 11. Causality
- 12. Epidemiology
- 13. Lessons Learned from Data Competitions: Data Leakage and Model Evaluation
- 14. Data Engineering: MapReduce, Pregel, and Hadoop
- 15. The Students Speak
- 16. Next-Generation Data Scientists, Hubris, and Ethics
- Index