Best Data Warehousing Books

There are hundreds of data warehousing books on the market today.

1. SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL

Author: Walter Shields
ClydeBank Media LLC
251 pages

“The best SQL book for beginners in 2021, hands down!” Includes free access to a sample database, a SQL browser app, comprehension quizzes, and several other digital resources. Not sure how to prepare for the data-driven future? This book shows you exactly what you need to know to use the SQL programming language successfully to enhance your career.

#1 New Release and #1 Best Seller. Are you a developer who wants to extend your mastery to database management? Then you need this book. Are you a project manager who needs to better understand your development team’s needs? Or a decision maker who needs deeper data-driven analysis? Everything you need to know is included in these pages.

The ubiquity of big data means that, now more than ever, there is a pressing need to warehouse, access, and understand the contents of massive databases quickly and efficiently.

That’s where SQL comes in. SQL is the workhorse programming language that forms the backbone of modern data management and interpretation. Any database management professional will tell you that, despite trendy data management languages that come and go, SQL remains the most widely used and most reliable to date, with no signs of slowing down.

2. Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

Author: Peter Bruce
O'Reilly Media
368 pages

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher-quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that “learn” from data
- Unsupervised learning methods for extracting meaning from unlabeled data

3. Learning SQL: Generate, Manipulate, and Retrieve Data

Author: Alan Beaulieu
O'Reilly Media
384 pages

As data floods into your company, you need to put it to work right away, and SQL is the best tool for the job. With the latest edition of this introductory guide, author Alan Beaulieu helps developers get up to speed with SQL fundamentals for writing database applications, performing administrative tasks, and generating reports.

You’ll find new chapters on SQL and big data, analytic functions, and working with very large databases. Each chapter presents a self-contained lesson on a key SQL concept or technique using numerous illustrations and annotated examples. Exercises let you practice the skills you learn.

Knowledge of SQL is a must for interacting with data. With Learning SQL, you’ll quickly discover how to put the power and flexibility of this language to work:
- Move quickly through SQL basics and several advanced features
- Use SQL data statements to generate, manipulate, and retrieve data
- Create database objects, such as tables, indexes, and constraints, with SQL schema statements
- Learn how datasets interact with queries, and understand the importance of subqueries
- Convert and manipulate data with SQL’s built-in functions, and use conditional logic in data statements

4. Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines

Author: Chris Fregly
O'Reilly Media
524 pages

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills.

This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

- Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
- Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
- Dive deep into the complete model development lifecycle for a BERT-based NLP use case, including data ingestion, analysis, model training, and deployment
- Tie everything together into a repeatable machine learning operations pipeline
- Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
- Learn security best practices for data science projects and workflows, including identity and access management, authentication, authorization, and more

5. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition

Author: Ralph Kimball
600 pages

This is an updated edition of Ralph Kimball’s groundbreaking book on dimensional modeling for data warehousing and business intelligence. The first edition of The Data Warehouse Toolkit introduced the industry to dimensional modeling, and Kimball’s books are now considered the most authoritative guides in this space.

This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more.

- Authored by Ralph Kimball and Margy Ross, known worldwide as educators, consultants, and influential thought leaders in data warehousing and business intelligence
- Begins with fundamental design recommendations and progresses through increasingly complex scenarios
- Presents unique modeling techniques for business applications such as inventory management, procurement, invoicing, accounting, customer relationship management, big data analytics, and more
- Draws real-world case studies from a variety of industries, including retail sales, financial services, telecommunications, education, health care, insurance, e-commerce, and more

Design dimensional databases that are easy to understand and provide fast query response with The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition.

6. Database Internals: A Deep Dive into How Distributed Data Systems Work

Author: Alex Petrov
O'Reilly Media
376 pages

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.

Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.

This book examines:
- Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log-Structured storage engines, with the differences and use cases for each
- Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as the Page Cache, Buffer Pool, and Write-Ahead Log
- Distributed systems: Learn step by step how nodes and processes connect and build complex communication patterns
- Database clusters: Discover which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency

7. Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016

Author: Rob Collie
Holy Macro! Books
308 pages

Microsoft Power BI, including Power Pivot and Power Query, is a set of free add-ons to Excel that allow users to produce new kinds of reports and analyses that were simply impossible before. This book, printed in full color, gives you an overview of Power BI, Power Pivot, and Power Query, and then dives into DAX formulas, the core capability of Power Pivot.

Everything is presented from the perspective of the Excel audience. Written by the world’s foremost Power BI bloggers and practitioners, the book introduces its concepts and approach in a simple, step-by-step manner tailored to the learning style of Excel users everywhere. The techniques presented allow users to produce, in hours or even minutes, results that formerly would have taken entire teams weeks or months to produce.

This book includes lessons on:
- the difference between calculated columns and measures
- how formulas can be reused across reports of completely different shapes
- how to merge disjointed sets of data into unified reports
- how to make certain columns in a pivot behave as if the pivot were filtered while other columns do not
- how to create time-intelligent calculations in pivot tables, such as “Year over Year” and “Moving Averages”, whether they use a standard, fiscal, or completely custom calendar

8. SQL Cookbook: Query Solutions and Techniques for All SQL Users

Author: Anthony Molinaro
O'Reilly Media
572 pages

You may know SQL basics, but are you taking advantage of its expressive power? This second edition applies a highly practical approach to Structured Query Language (SQL) so you can create and manipulate large stores of data. Based on real-world examples, this updated cookbook provides a framework to help you construct solutions and executable examples in several flavors of SQL, including Oracle, DB2, SQL Server, MySQL, and PostgreSQL.

SQL programmers, analysts, data scientists, database administrators, and even relatively casual SQL users will find SQL Cookbook to be a valuable problem-solving guide for everyday issues. No other resource offers recipes in this unique format to help you tackle nagging day-to-day conundrums with SQL.

The second edition includes:
- Fully revised recipes that recognize the greater adoption of window functions in SQL implementations
- Additional recipes that reflect the widespread adoption of common table expressions (CTEs) for more readable, easier-to-implement solutions
- New recipes to make SQL more useful for people who aren’t database experts, including data scientists
- Expanded solutions for working with numbers and strings
- Up-to-date SQL recipes throughout the book to guide you through the basics

9. Data Pipelines Pocket Reference: Moving and Processing Data for Analytics

Author: James Densmore
O'Reilly Media
276 pages

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in the modern data stack.

You’ll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.

You’ll learn:
- What a data pipeline is and how it works
- How data is moved and processed on modern data infrastructure, including cloud platforms
- Common tools and products used by data engineers to build pipelines
- How pipelines support analytics and reporting needs
- Considerations for pipeline maintenance, testing, and alerting

10. Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema

Author: Lawrence Corr
DecisionOne Press
328 pages

Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing / business intelligence (DW/BI) requirements and turning them into high-performance dimensional models in the most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders. The book describes BEAM, an agile approach to dimensional modeling, for improving communication between data warehouse designers, BI stakeholders, and the whole DW/BI development team.

BEAM provides tools and techniques that encourage DW/BI designers and developers to move away from their keyboards and entity-relationship-based tools and model interactively with their colleagues. The result is that everyone thinks dimensionally from the outset. Developers understand how to implement dimensional modeling solutions efficiently, and business stakeholders feel ownership of the data warehouse they have created and can already imagine how they will use it to answer their business questions.

Within this book, you will learn:
- Agile dimensional modeling using Business Event Analysis & Modeling (BEAM)
- Modelstorming: data modeling that is quicker, more inclusive, more productive, and frankly more fun!

11. Excel 2021: A Step-By-Step Guide to Learning the Basics of Excel and Easy Excel Tips for Beginners

Author: Albion Jensen
230 pages

Do you have little or no experience with Microsoft Excel? Are you looking for a way to make charts, tables, graphs, and formulas? Do you need to increase your marketability in an increasingly competitive job market? If the answer is yes, keep reading!

Stop struggling with Excel formulas that are not working! It’s time to start working smarter, not harder. If you like learning by doing, and if you’re looking to maximize your efficiency and supercharge your productivity using Excel, this is the book for you.

You will:
- Start entering, editing, and managing data in the simplest way
- Learn how to speed up your work with Excel spreadsheets
- Discover the 5 proven time-saving Excel data insertion methods
- Understand the 7 most common Excel formulas for a better workflow
- Know the causes of the 6 most common Excel errors and how to get rid of them
- Learn the top 5 Excel charts and graphs to present your work
- Become able to use Excel for data analysis
- Learn how to prepare your work for printing
- Impress employees and coworkers with your Excel skills
- Have a first look at the highlights of Excel 2021

12. Introduction to Data Mining (2nd Edition) (What's New in Computer Science)

Author: Pang-Ning Tan
864 pages

Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each concept is explored thoroughly and supported with numerous examples.

KEY TOPICS: Provides both theoretical and practical coverage of all data mining topics, and includes an extensive number of integrated examples and figures. Topics covered include predictive modeling, association analysis, clustering, anomaly detection, and visualization.

MARKET: Suitable for individuals seeking an introduction to data mining. The text assumes only a modest statistics or mathematics background, and no database knowledge is needed.

13. Learn Excel 365 Expert Skills with The Smart Method: Fifth Edition: updated for the Jan 2021 Semi-Annual version 2008

Author: Mike Smart
The Smart Method Ltd
643 pages

IMPORTANT: There are three (very different) versions of Excel. Make sure you are buying the right book for your version:
- Excel 365: If you pay for Microsoft Office by subscription, this is the right book for you. Excel 365 is the most powerful version of Excel (it is very different from Excel 2019).
- Excel 2019: This is the “pay once, use forever” version of Excel. It has fewer features than Excel 365. If you are using this version, you need our book Learn Excel 2019 Expert Skills with The Smart Method.
- Excel 2019 for Apple Mac: If you have an Apple Mac computer, your Excel version is very different from Excel for Windows. Apple Mac users need our book Learn Excel 2019 Expert Skills for Mac.

Excel 365 has taken Excel into a new era. It is now far more powerful than the older Excel 2019 version and supports dynamic arrays, a game-changing feature that changes best practice for many common data analysis tasks.

Dynamic arrays are covered in a comprehensive 54-page session. This book also covers Excel 365’s new XLOOKUP function (the modern replacement for Excel 2019’s old VLOOKUP function). You’ll also find lessons that cover the XMATCH, UNIQUE, FILTER, SORT, SORTBY, SEQUENCE, and RANDARRAY dynamic array functions in depth (none of which exist in Excel 2019).

14. Collect, Combine, and Transform Data Using Power Query in Excel and Power BI (Business Skills)

Author: Gil Raviv
Microsoft Press
432 pages

December 4, 2019

Take advantage of today’s sky-high demand for data engineers. With this in-depth book, current and aspiring engineers will learn powerful real-world best practices for managing data big and small. Contributors from notable companies including Twitter, Google, Stitch Fix, Microsoft, Capital One, and LinkedIn share their experiences and lessons learned for overcoming a variety of specific and often nagging challenges.

Edited by Tobias Macey, host of the popular Data Engineering Podcast, this book presents 97 concise and useful tips for cleaning, prepping, wrangling, storing, processing, and ingesting data. Data engineers, data architects, data team managers, data scientists, machine learning engineers, and software engineers will greatly benefit from the wisdom and experience of their peers.

Topics include:
- The Importance of Data Lineage – Julien Le Dem
- Data Security for Data Engineers – Katharine Jarmul
- The Two Types of Data Engineering and Data Engineers – Jesse Anderson
- Six Dimensions for Picking an Analytical Data Warehouse – Gleb Mezhanskiy
- The End of ETL as We Know It – Paul Singman
- Building a Career as a Data Engineer – Vijay Kiran
- Modern Metadata for the Modern Data Stack – Prukalpa Sankar
- Your Data Tests Failed! Now What? – Sam Bail