Best Mathematical & Statistical Software Books
1. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
Author: Hadley Wickham
O'Reilly Media
English
520 pages
Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun.
Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results.
You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.
You’ll learn how to:
Wrangle: transform your datasets into a form convenient for analysis
Program: learn powerful R tools for solving data problems with greater clarity and ease
Explore: examine your data, generate hypotheses, and quickly test them
Model: provide a low-dimensional summary that captures true “signals” in your dataset
Communicate: learn R Markdown for integrating prose, code, and results
2. An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
Author: Gareth James
Springer
English
440 pages
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years.
This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented.
Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform.
Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience.
3. Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
Author: Peter Bruce
O'Reilly Media
English
368 pages
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you’ll learn:
Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher-quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that “learn” from data
Unsupervised learning methods for extracting meaning from unlabeled data
4. A Common-Sense Guide to Data Structures and Algorithms, Second Edition: Level Up Your Core Programming Skills
Author: Jay Wengrow
Pragmatic Bookshelf
English
508 pages
Algorithms and data structures are much more than abstract concepts. Mastering them enables you to write code that runs faster and more efficiently, which is particularly important for today’s web and mobile apps. Take a practical approach to data structures and algorithms, with techniques and real-world scenarios that you can use in your daily production code, with examples in JavaScript, Python, and Ruby.
This new and revised second edition features new chapters on recursion, dynamic programming, and using Big O in your daily work. Use Big O notation to measure and articulate the efficiency of your code, and modify your algorithm to make it faster.
Find out how your choice of arrays, linked lists, and hash tables can dramatically affect the code you write. Use recursion to solve tricky problems and create algorithms that run exponentially faster than the alternatives. Dig into advanced data structures such as binary trees and graphs to help scale specialized applications such as social networks and mapping software.
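As a rough illustration of the Big O idea this blurb describes (a sketch written for this list, not an example taken from the book), the Python snippet below counts how many elements a linear search and a binary search inspect on the same sorted list. The function names are hypothetical.

```python
def linear_search_steps(items, target):
    """Count the elements inspected by a left-to-right scan: O(n)."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the elements inspected by binary search on a sorted list: O(log n)."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return steps


if __name__ == "__main__":
    data = list(range(1_000_000))  # already sorted
    target = 999_999               # worst case for the linear scan
    print("linear search steps:", linear_search_steps(data, target))
    print("binary search steps:", binary_search_steps(data, target))
```

Counting steps rather than timing the code keeps the comparison independent of the machine it runs on: the linear scan inspects every element, while binary search needs at most about twenty inspections for a million items.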
5. Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines
Author: Chris Fregly
O'Reilly Media
English
524 pages
With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills.
This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.
Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
Tie everything together into a repeatable machine learning operations pipeline
Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more
6. Discovering Statistics Using IBM SPSS Statistics: North American Edition
Author: Andy Field
SAGE Publications Ltd
English
816 pages
With an exciting new look, math diagnostic tool, and a research roadmap to navigate projects, this new edition of Andy Field’s award-winning text offers a unique combination of humor and step-by-step instruction to make learning statistics compelling and accessible to even the most anxious of students.
The Fifth Edition takes students from initial theory to regression, factor analysis, and multilevel modeling, fully incorporating IBM SPSS Statistics version 25 and fascinating examples throughout.
7. The Book of R: A First Course in Programming and Statistics
Author: Tilman M. Davies
1593276516
English
832 pages
The Book of R is a comprehensive, beginner-friendly guide to R, the world’s most popular programming language for statistical analysis. Even if you have no programming experience and little more than a grounding in the basics of mathematics, you’ll find everything you need to begin using R effectively for statistical analysis.
You’ll start with the basics, like how to handle data and write simple programs, before moving on to more advanced topics, like producing statistical summaries of your data and performing statistical tests and modeling. You’ll even learn how to create impressive data visualizations with R’s basic graphics tools and contributed packages, like ggplot2 and ggvis, as well as interactive 3D visualizations using the rgl package.
Dozens of hands-on exercises (with downloadable solutions) take you from theory to practice, as you learn:
The fundamentals of programming in R, including how to write data frames, create functions, and use variables, statements, and loops
Statistical concepts like exploratory data analysis, probabilities, hypothesis tests, and regression modeling, and how to execute them in R
How to access R’s thousands of functions, libraries, and data sets
How to draw valid and useful conclusions from your data
How to create publication-quality graphics of your results
Combining detailed explanations with real-world examples and exercises, this book will provide you with a solid understanding of both statistics and the depth of R’s functionality.
8. Mastering Shiny: Build Interactive Apps, Reports, and Dashboards Powered by R
Author: Hadley Wickham
O'Reilly Media
English
372 pages
Master the Shiny web framework and take your R skills to a whole new level. By letting you move beyond static reports, Shiny helps you create fully interactive web apps for data analyses. Users will be able to jump between datasets, explore different subsets or facets of the data, run models with parameter values of their choosing, customize visualizations, and much more.
Hadley Wickham from RStudio shows data scientists, data analysts, statisticians, and scientific researchers with no knowledge of HTML, CSS, or JavaScript how to create rich web apps from R. This in-depth guide provides a learning path that you can follow with confidence, as you go from a Shiny beginner to an expert developer who can write large, complex apps that are maintainable and performant.
Get started: Discover how the major pieces of a Shiny app fit together
Put Shiny in action: Explore Shiny functionality with a focus on code samples, example apps, and useful techniques
Master reactivity: Go deep into the theory and practice of reactive programming and examine reactive graph components
Apply best practices: Examine useful techniques for making your Shiny apps work well in production
9. Learning Spark: Lightning-Fast Data Analytics
Author: Jules S. Damji
O'Reilly Media
English
400 pages
Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.
Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to:
Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
10. R For Dummies
Author: Andrie de Vries
For Dummies
English
432 pages
Mastering R has never been easier. Picking up R can be tough, even for seasoned statisticians and data analysts. R For Dummies, 2nd Edition provides a quick and painless way to master all the R you’ll ever need. Requiring no prior programming experience and packed with tons of practical examples, step-by-step exercises, and sample code, this friendly and accessible guide shows you how to know your way around lists, data frames, and other R data structures, while learning to interact with other programs, such as Microsoft Excel.
You’ll learn how to reshape and manipulate data, merge data sets, split and combine data, perform calculations on vectors and arrays, and so much more. R is an open source statistical environment and programming language that has become very popular in varied fields for the management and analysis of data.
R provides a wide array of statistical and graphical techniques, and has become the standard among statisticians for software development and data analysis. R For Dummies, 2nd Edition takes the intimidation out of working with R and arms you with the knowledge and know-how to master the programming language of choice among statisticians and data analysts worldwide.
11. Applied Predictive Modeling
Author: Max Kuhn
Springer
English
613 pages
Winner of the 2014 Technometrics Ziegel Prize for Outstanding Book.
Applied Predictive Modeling covers the overall predictive modeling process, beginning with the crucial steps of data preprocessing, data splitting and foundations of model tuning. The text then provides intuitive explanations of numerous common and modern regression and classification techniques, always with an emphasis on illustrating and solving real data problems.
Addressing practical concerns extends beyond model fitting to topics such as handling class imbalance, selecting predictors, and pinpointing causes of poor model performance, all of which are problems that occur frequently in practice. The text illustrates all parts of the modeling process through many hands-on, real-life examples.
And every chapter contains extensive R code for each step of the process. The data sets and corresponding code are available in the book’s companion AppliedPredictiveModeling R package, which is freely available on the CRAN archive. This multi-purpose text can be used as an introduction to predictive models and the overall modeling process, a practitioner’s reference handbook, or as a text for advanced undergraduate or graduate level predictive modeling courses.
12. The Little SAS Book: A Primer, Sixth Edition
Author: Lora D. Delwiche
SAS Institute
English
364 pages
A classic that just keeps getting better, The Little SAS Book is essential for anyone learning SAS programming. Lora Delwiche and Susan Slaughter offer a user-friendly approach so that readers can quickly and easily learn the most commonly used features of the SAS language.
Each topic is presented in a self-contained, two-page layout complete with examples and graphics. Nearly every section has been revised to ensure that the sixth edition is fully up-to-date. This edition is also interface-independent, written for all SAS programmers whether they use SAS Studio, SAS Enterprise Guide, or the SAS windowing environment.
New sections have been added covering PROC SQL, iterative DO loops, DO WHILE and DO UNTIL statements, %DO statements, using variable names with special characters, the ODS EXCEL destination, and the XLSX LIBNAME engine. This title belongs on every SAS programmer’s bookshelf.
It’s a resource not just to get you started, but one you will return to as you continue to improve your programming skills.
13. Discovering Statistics Using R
Author: Andy Field
SAGE Publications Ltd
English
992 pages
The R version of Andy Field’s hugely popular Discovering Statistics Using SPSS takes students on a journey of statistical discovery using the freeware R. Like its sister textbook, Discovering Statistics Using R is written in an irreverent style and follows the same groundbreaking structure and pedagogical approach.
The core material is enhanced by a cast of characters to help the reader on their way, hundreds of examples, self-assessment tests to consolidate knowledge, and additional website material for those wanting to learn more.
14. All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics)
Author: Larry Wasserman
English
462 pages
1441923225
Taken literally, the title “All of Statistics” is an exaggeration. But in spirit, the title is apt, as the book does cover a much broader range of topics than a typical introductory book on mathematical statistics. This book is for people who want to learn probability and statistics quickly.
It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines. The book includes modern topics like non-parametric curve estimation, bootstrapping, and classification, topics that are usually relegated to follow-up courses. The reader is presumed to know calculus and a little linear algebra.
No previous knowledge of probability and statistics is required. Statistics, data mining, and machine learning are all concerned with collecting and analysing data.