Best Data Modeling & Design Books
Businesses have come a long way. In the past, conducting market research, understanding the needs of customers, and choosing the best marketing strategies were big challenges.
1. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Author: Martin Kleppmann
O'Reilly Media
English
616 pages
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers.
What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data.
Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Peer under the hood of the systems you already use, and learn how to use and operate them more effectively.
Make informed decisions by identifying the strengths and weaknesses of different tools.
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity.
Understand the distributed systems research upon which modern databases are built.
Peek behind the scenes of major online services, and learn from their architectures.
2. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Author: Wes McKinney
O'Reilly Media
English
550 pages
Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively.
You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing.
Data files and related material are available on GitHub.
Use the IPython shell and Jupyter notebook for exploratory computing.
Learn basic and advanced features in NumPy (Numerical Python).
Get started with data analysis tools in the pandas library.
Use flexible tools to load, clean, transform, merge, and reshape data.
Create informative visualizations with matplotlib.
Apply the pandas groupby facility to slice, dice, and summarize datasets.
Analyze and manipulate regular and irregular time series data.
Learn how to solve real-world data analysis problems with thorough, detailed examples.
3. SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Author: Walter Shields
ClydeBank Media LLC
English
251 pages
“THE BEST SQL BOOK FOR BEGINNERS IN 2021 – HANDS DOWN!” INCLUDES FREE ACCESS TO A SAMPLE DATABASE, SQL BROWSER APP, COMPREHENSION QUIZZES & SEVERAL OTHER DIGITAL RESOURCES! Not sure how to prepare for the data-driven future? This book shows you EXACTLY what you need to know to successfully use the SQL programming language to enhance your career!
#1 NEW RELEASE & #1 BEST SELLER. Are you a developer who wants to expand your mastery to database management? Then you NEED this book. Buy now and start reading today! Are you a project manager who needs to better understand your development team’s needs?
Or a decision maker who needs to perform deeper data-driven analysis? Everything you need to know is included in these pages! The ubiquity of big data means that now more than ever there is a burning need to warehouse, access, and understand the contents of massive databases quickly and efficiently.
That’s where SQL comes in. SQL is the workhorse programming language that forms the backbone of modern data management and interpretation. Any database management professional will tell you that despite trendy data management languages that come and go, SQL remains the most widely used and most reliable to date, with no signs of stopping.
4. Data Science from Scratch: First Principles with Python
Author: Joel Grus
O'Reilly Media
English
406 pages
To really learn data science, you should not only master the tools (data science libraries, frameworks, modules, and toolkits) but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist.
Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today’s messy glut of data.
Get a crash course in Python.
Learn the basics of linear algebra, statistics, and probability, and how and when they’re used in data science.
Collect, explore, clean, munge, and manipulate data.
Dive into the fundamentals of machine learning.
Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering.
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases.
5. Learning SQL: Generate, Manipulate, and Retrieve Data
Author: Alan Beaulieu
O'Reilly Media
English
384 pages
As data floods into your company, you need to put it to work right away, and SQL is the best tool for the job. With the latest edition of this introductory guide, author Alan Beaulieu helps developers get up to speed with SQL fundamentals for writing database applications, performing administrative tasks, and generating reports.
You’ll find new chapters on SQL and big data, analytic functions, and working with very large databases. Each chapter presents a self-contained lesson on a key SQL concept or technique using numerous illustrations and annotated examples. Exercises let you practice the skills you learn.
Knowledge of SQL is a must for interacting with data. With Learning SQL, you’ll quickly discover how to put the power and flexibility of this language to work.
Move quickly through SQL basics and several advanced features.
Use SQL data statements to generate, manipulate, and retrieve data.
Create database objects, such as tables, indexes, and constraints with SQL schema statements.
Learn how datasets interact with queries; understand the importance of subqueries.
Convert and manipulate data with SQL’s built-in functions and use conditional logic in data statements.
6. The Art of Statistics: How to Learn from Data
Author: David Spiegelhalter
Basic Books
English
448 pages
In this “important and comprehensive” guide to statistical thinking (New Yorker), discover how data literacy is changing the world and giving you a better understanding of life’s biggest problems. Statistics are everywhere, as integral to science as they are to business, and in the popular media hundreds of times a day.
In this age of big data, a basic grasp of statistical literacy is more important than ever if we want to separate the fact from the fiction, the ostentatious embellishments from the raw evidence – and even more so if we hope to participate in the future, rather than being simple bystanders.
In The Art of Statistics, world-renowned statistician David Spiegelhalter shows readers how to derive knowledge from raw data by focusing on the concepts and connections behind the math. Drawing on real world examples to introduce complex issues, he shows us how statistics can help us determine the luckiest passenger on the Titanic, whether a notorious serial killer could have been caught earlier, and if screening for ovarian cancer is beneficial.
7. Python Data Science Handbook: Essential Tools for Working with Data
Author: Jake VanderPlas
O'Reilly Media
English
548 pages
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models.
Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you’ll learn how to use:
IPython and Jupyter: provide computational environments for data scientists using Python
NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
Matplotlib: includes capabilities for a flexible range of data visualizations in Python
Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms
8. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking
Author: Foster Provost
O'Reilly Media
English
414 pages
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect.
This guide also helps you understand the many data-mining techniques in use today. Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles.
You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
Understand how data science fits in your organization, and how you can use it for competitive advantage.
Treat data as a business asset that requires careful investment if you’re to gain real value.
Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way.
Learn general concepts for actually extracting knowledge from data.
Apply data science principles when interviewing data science job candidates.
9. Deep Learning for Coders with Fastai and PyTorch: AI Applications Without a PhD
Author: Jeremy Howard
O'Reilly Media
English
624 pages
Deep learning is often viewed as the exclusive domain of math PhDs and big tech companies. But as this hands-on guide demonstrates, programmers comfortable with Python can achieve impressive results in deep learning with little math background, small amounts of data, and minimal code. How?
With fastai, the first library to provide a consistent interface to the most frequently used deep learning applications. Authors Jeremy Howard and Sylvain Gugger, the creators of fastai, show you how to train a model on a wide range of tasks using fastai and PyTorch.
You’ll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes.
Train models in computer vision, natural language processing, tabular data, and collaborative filtering.
Learn the latest deep learning techniques that matter most in practice.
Improve accuracy, speed, and reliability by understanding how deep learning models work.
Discover how to turn your models into web applications.
Implement deep learning algorithms from scratch.
Consider the ethical implications of your work.
Gain insight from the foreword by PyTorch cofounder, Soumith Chintala.
10. Mastering Shiny: Build Interactive Apps, Reports, and Dashboards Powered by R
Author: Hadley Wickham
O'Reilly Media
English
372 pages
Master the Shiny web framework, and take your R skills to a whole new level. By letting you move beyond static reports, Shiny helps you create fully interactive web apps for data analyses. Users will be able to jump between datasets, explore different subsets or facets of the data, run models with parameter values of their choosing, customize visualizations, and much more.
Hadley Wickham from RStudio shows data scientists, data analysts, statisticians, and scientific researchers with no knowledge of HTML, CSS, or JavaScript how to create rich web apps from R. This in-depth guide provides a learning path that you can follow with confidence, as you go from a Shiny beginner to an expert developer who can write large, complex apps that are maintainable and performant.
Get started: Discover how the major pieces of a Shiny app fit together
Put Shiny in action: Explore Shiny functionality with a focus on code samples, example apps, and useful techniques
Master reactivity: Go deep into the theory and practice of reactive programming and examine reactive graph components
Apply best practices: Examine useful techniques for making your Shiny apps work well in production
11. Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
Author: Rob Collie
Holy Macro! Books
English
308 pages
Microsoft Power BI, including Power Pivot and Power Query, is a set of free add-ons to Excel that allow users to produce new kinds of reports and analyses that were simply impossible before. This book, printed in gorgeous full color, gives you an overview of Power BI, Power Pivot and Power Query, and then dives into DAX formulas, the core capability of Power Pivot, always from the perspective of the Excel audience.
Written by the world’s foremost Power BI bloggers and practitioners, the book’s concepts and approach are introduced in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce.
This book includes lessons on:
the difference between calculated columns and measures
how formulas can be reused across reports of completely different shapes
how to merge disjointed sets of data into unified reports
how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not
how to create time-intelligent calculations in pivot tables such as “Year over Year” and “Moving Averages” whether they use a standard, fiscal, or completely custom calendar
12. Python for Finance: Mastering Data-Driven Finance
Author: Yves Hilpisch
O'Reilly Media
English
720 pages
The financial industry has recently adopted Python at a tremendous rate, with some of the largest investment banks and hedge funds using it to build core trading and risk management systems. Updated for Python 3, the second edition of this hands-on book helps you get started with the language, guiding developers and quantitative analysts through Python libraries and tools for building financial applications and interactive financial analytics.
Using practical examples throughout the book, author Yves Hilpisch also shows you how to develop a full-fledged framework for Monte Carlo simulation-based derivatives and risk analytics, based on a large, realistic case study. Much of the book uses interactive IPython Notebooks.
13. Mastering Tableau 2021: Implement advanced business intelligence techniques and analytics with Tableau, 3rd Edition
Author: Marleen Meier
English
792 pages
ISBN: 1800561644
Build, design, and improve advanced business intelligence solutions using Tableau’s latest features, including Tableau Prep Builder, Tableau Hyper, and Tableau Server.
Key Features
Master new features in Tableau 2021 to solve real-world analytics challenges.
Perform geo-spatial, time series, and self-service analytics using real-life examples.
Build and publish dashboards and explore storytelling using Python and R integration support.
Book Description
Tableau is one of the leading business intelligence (BI) tools used to solve data analysis challenges.
With this book, you will master Tableau’s features and offerings in various paradigms of the BI domain. Updated with fresh topics including Quick Level of Detail expressions, the newest Tableau Server features, Einstein Discovery, and more, this book covers essential Tableau concepts and advanced functionalities.
Leveraging Tableau Hyper files and using Prep Builder, you’ll be able to perform data preparation and handling easily. You’ll gear up to perform complex joins, spatial joins, unions, and data blending tasks using practical examples. Following this, you’ll learn how to execute data densification and further explore expert-level examples to help you with calculations, mapping, and visual design using Tableau extensions.
14. Living in Data: A Citizen's Guide to a Better Information Future
Author: Jer Thorp
MCD (May 4, 2021)
English
320 pages
Jer Thorp’s analysis of the word data in 10,325 New York Times stories written between 1984 and 2018 shows a distinct trend: among the words most closely associated with data, we find not only its classic companions information and digital, but also a variety of new neighbors, from scandal and misinformation to ethics, friends, and play.
To live in data in the twenty-first century is to be incessantly extracted from, classified and categorized, statisti-fied, sold, and surveilled. Data, our data, is mined and processed for profit, power, and political gain. In Living in Data, Thorp asks a crucial question of our time: How do we stop passively inhabiting data, and instead become active citizens of it?
Threading a data story through hippo attacks, glaciers, and school gymnasiums, around colossal rice piles, and over active minefields, Living in Data reminds us that the future of data is still wide open, that there are ways to transcend facts and figures and to find more visceral ways to engage with data, that there are always new stories to be told about how data can be used.
15. A History of Data Visualization and Graphic Communication
Author: Michael Friendly
Harvard University Press
English
320 pages
A comprehensive history of data visualization: its origins, rise, and effects on the ways we think about and solve problems. With complex information everywhere, graphics have become indispensable to our daily lives. Navigation apps show real-time, interactive traffic data. A color-coded map of exit polls details election balloting down to the county level.
Charts communicate stock market trends, government spending, and the dangers of epidemics. A History of Data Visualization and Graphic Communication tells the story of how graphics left the exclusive confines of scientific research and became ubiquitous. As data visualization spread, it changed the way we think.
Michael Friendly and Howard Wainer take us back to the beginnings of graphic communication in the mid-seventeenth century, when the Dutch cartographer Michael Florent van Langren created the first chart of statistical data, which showed estimates of the distance from Rome to Toledo.
By 1786 William Playfair had invented the line graph and bar chart to explain trade imports and exports. In the nineteenth century, the golden age of data display, graphics found new uses in tracking disease outbreaks and understanding social issues. Friendly and Wainer make the case that the explosion in graphical communication both reinforced and was advanced by a cognitive revolution: visual thinking.
16. Spark: The Definitive Guide: Big Data Processing Made Simple
Author: Bill Chambers
O'Reilly Media
English
606 pages
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.
You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.
Get a gentle overview of big data and Spark.
Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples.
Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames.
Understand how Spark runs on a cluster.
Debug, monitor, and tune Spark clusters and applications.
Learn the power of Structured Streaming, Spark’s stream-processing engine.
Learn how you can apply MLlib to a variety of problems, including classification or recommendation.