Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd Edition

Authors: Peter Bruce, Andrew Bruce, Peter Gedeck

Publisher: O'Reilly Media


Publish Date: May 26, 2020

ISBN-10: 149207294X

Pages: 368

File Type: PDF

Language: English

Book Preface

This book is aimed at the data scientist with some familiarity with the R and/or Python programming languages, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Two of the authors came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a discipline is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia of an ocean liner. All the methods in this book have some connection—historical or methodological—to the discipline of statistics. Methods that evolved mainly out of computer science, such as neural nets, are not included.

Two goals underlie this book:

• To lay out, in digestible, navigable, and easily referenced form, key concepts from statistics that are relevant to data science.
• To explain which concepts are important and useful from a data science perspective, which are less so, and why.

Conventions Used in This Book

The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.


