Master the most agile and resilient design for building analytics applications: the Unified Star Schema (USS) approach. The USS has many benefits over traditional dimensional modeling. Witness the power of the USS as a single star schema that serves as a foundation for all present and future business requirements of your organization.
Data warehouse legend Bill Inmon and business intelligence innovator Francesco Puppini explain step by step why the Unified Star Schema is the recommended approach for business intelligence designs today, and show through many examples how to build and use this new solution.
This book contains two parts. Part I, Architecture, explains the benefits of data marts and data warehouses, covering how organizations progressed to their current state of analytics and the challenges that result from current business intelligence architectures. Chapter 1 covers the drivers behind and the characteristics of the data warehouse and data mart. Chapter 2 introduces dimensional modeling concepts, including fact tables, dimensions, star joins, and snowflakes. Chapter 3 recalls the evolution of the data mart. Chapter 4 explains Extract, Transform, and Load (ETL) and the value ETL brings to reporting. Chapter 5 explores the Integrated Data Mart Approach, and Chapter 6 explains how to monitor this environment. Chapter 7 describes the different types of metadata within the data warehouse environment. Chapter 8 progresses through the evolution to our current modern data warehouse environment.
Part II, the Unified Star Schema, covers the Unified Star Schema (USS) approach and how it solves the challenges introduced in Part I. There are eight chapters within Part II.
The data lakehouse is the next generation of the data warehouse and data lake, designed to meet today's complex and ever-changing analytics, machine learning, and data science requirements.
Learn about the features and architecture of the data lakehouse, along with its powerful analytical infrastructure. Appreciate how the universal common connector blends structured, textual, analog, and IoT data. Maintain the lakehouse for future generations through Data Lakehouse Housekeeping and Data Future-proofing. Know how to incorporate the lakehouse into an existing data governance strategy. Incorporate data catalogs, data lineage tools, and open source software into your architecture to ensure your data scientists, analysts, and end users live happily ever after.
Organizations invest incredible amounts of time and money obtaining and then storing big data in data stores called data lakes. But how many of these organizations can actually get the data back out in a useable form? Very few can turn the data lake into an information gold mine. Most wind up with garbage dumps.
Data Lake Architecture will explain how to build a useful data lake, where data scientists and data analysts can solve business challenges and identify new business opportunities. Learn how to structure data lakes as well as analog, application, and text-based data ponds to provide maximum business value. Understand the role of the raw data pond and when to use an archival data pond. Leverage the four key ingredients for data lake success: metadata, integration mapping, context, and metaprocess.
Bill Inmon opened our eyes to the architecture and benefits of a data warehouse, and now he takes us to the next level of data lake architecture.
The data lakehouse is the next generation of the data warehouse and data lake, designed to meet the demands of today's complex and ever-changing modern information systems. This book shows you how to construct your data lakehouse as the foundation for your artificial intelligence (AI), machine learning (ML), and data mesh initiatives. Know the pitfalls and techniques for maximizing the business value of your data lakehouse.
In addition, be able to explain the core characteristics and critical success factors of a data lakehouse. By reviewing entry errors and key incompatibilities, and by ensuring good documentation, you can improve the data quality and believability of your lakehouse. Evaluate criteria for data quality, including accuracy, completeness, reliability, relevance, and timeliness. Understand the different types of storage for the lakehouse, including the under-utilized yet extremely valuable bulk storage.
There are three data types in the data lakehouse (structured, textual, and analog/IoT), and for each, learn how to build a robust foundation for artificial intelligence (AI), machine learning (ML), and data mesh. Leverage data models for structured data, ontologies and taxonomies for textual data, and distillation algorithms for analog/IoT data. Learn how to abstract these data types to accommodate future requirements and simplify data lineage. Apply Extract, Transform, and Load (ETL) to create a structure that returns the answers to business problems. The end result is a data lakehouse that meets your needs.
Speaking of human needs, learn Maslow's Hierarchy of Data Lakehouse Needs. Next, explore data integration geared for AI, ML, and data mesh. Then deep dive with us into all of the varieties of analytics within the lakehouse, including structured, textual, and analog analytics. Witness how descriptive data, data catalogs, and metadata can increase the value of the lakehouse.
We conclude with a detailed evolution of data architecture, from magnetic tape to the data lakehouse as a bedrock foundation for AI, ML, and data mesh.
Learn how the data lakehouse is designed and architected to meet today's complex and ever-changing analytics, machine learning, and data science requirements.
In the bestseller Building the Data Lakehouse, you learned about the features of the data lakehouse, along with its powerful analytical infrastructure. This book is the architectural companion to Building the Data Lakehouse. Appreciate the strategic approaches to, and challenges of, including structured data, text, and IoT/analog readings within the same analytical environment. Know the steps to create the data lakehouse canonical model, and the dynamic processing necessary to satisfy the most demanding business analysts and data scientists. Understand the modern cloud data storage cost-saving methodology through Data Future-proofing. Experience the new Micro Repository paradigm in microservices architecture and advanced security to ensure your data lakehouse delivers business value for generations.
The data lakehouse is the next generation of the data warehouse and data lake, designed to meet today's complex and ever-changing analytics, machine learning, and data science requirements. This book covers the essential topics prior to building the full methodology for the data lakehouse.
Learn about the features and architecture of the data lakehouse, along with its powerful analytical infrastructure. Appreciate how the universal common connector blends structured, textual, analog, and IoT data. Maintain the lakehouse for future generations through Data Lakehouse Housekeeping and Data Future-proofing.
Incorporate data catalogs, data lineage tools, and open source software into your architecture to ensure your data scientists, analysts, and end users live happily ever after. Deep dive into one specific implementation of a data lakehouse: the Databricks Lakehouse Platform.
Increase your awareness of your customers' behavior to survive and excel within your industry.
One hundred years ago, the voice of the customer was easily and routinely heard by the shopkeeper. In small towns, the shopkeeper knew everyone. Today's world has gotten much bigger and much more complex. No longer does the store owner personally know everyone who comes into the store. Yet today's technologies offer three important abilities that make it possible to listen to the voice of the customer:
This book answers important questions such as:
After reading this book, you will be able to manage, build, and operate a corporate infrastructure that listens to the voice of the customer.
This book will introduce you to the world of taxonomies and textual analytics.
In our distant past, we attempted to create wealth by turning everyday substances into gold. This was early alchemy, and ultimately it did not work. But the world has changed. Today we have a type of modern alchemy that really can create gold. We can transform voluminous text into a wealth of knowledge.
Text is a common fabric of society, yet it is still challenging for our technology to make sense of text. This is where taxonomies can help. In this book, the legendary Bill Inmon will introduce you to the concept of taxonomies and how they are used to simplify and understand text. We emphasize the practical aspects of taxonomies, and the subsequent usage of taxonomies as a basis for textual analytics.
This book is for managers who have to deal with text, students of computer science, programmers who need to understand taxonomies, systems analysts who hope to draw business value out of a body of text, and especially those who are struggling to decode data lakes. Hopefully for those individuals (and many more), this book will serve as both an introduction to taxonomies and a guide to how taxonomies can be used to bring text into the realm of corporate decision-making.
This book will introduce you to the world of taxonomies, as well as explore:
- Simple and complex taxonomies
- Ontologies
- Obtaining taxonomies
- Changing taxonomies
- Taxonomies and data models
- Types of textual data
- Textual analytics
In addition, several case studies are presented from industries as diverse as banking, call centers, and travel.
Learn essential techniques from data warehouse legend Bill Inmon on how to build the reporting environment your business needs now.
Answers for many valuable business questions hide in text. How well can your existing reporting environment extract the necessary text from email, spreadsheets, and documents, and put it in a useful format for analytics and reporting? Transforming the traditional data warehouse into an efficient unstructured data warehouse requires additional skills from the analyst, architect, designer, and developer. This book will prepare you to successfully implement an unstructured data warehouse and, through clear explanations, examples, and case studies, you will learn new techniques and tips to successfully obtain and analyze text.
Master these ten objectives:
Overcome the challenges, appreciate the varieties, and apply the process of data integration.
Learn all about data integration and become a data integration hero instead of following the masses and running in the opposite direction at the mere mention of the word integration. Understand why organizations avoid data integration and often wind up with spider web environments containing siloed applications instead of an enterprise database which excites analysts and data scientists. Distinguish the different types of integration: database, attribute, key, index, encoding, measurement, format, definition, KPI, calculations, summarization, selection criteria, data exclusion, lineage, and timing.

Apply identification, equivocation, and physical conversion levels of integration for both structured and textual data. Leverage deidentification, proximity analysis, alternate spelling, stop word resolution, homographic resolution, stemming, taxonomical resolution, inline contextualization, classification, and acronym resolution. Learn how to combine structured and textual data in the context of three levels of interaction. Follow the steps of scope, model, and map in integrating structured data. Follow the steps of scope, connect taxonomies, ingest raw text, and determine analytical processes in integrating textual data.

Apply integration best practices, including identifying integration roles, developing a reusable data integration process, and documenting the integration benefits. Compare taxonomies with data models. Know how data integration helps data science.
To reinforce all of the concepts within the book, we include a detailed case study on data integration.
For years, business users have leveraged spreadsheets for storing and communicating data. Although spreadsheets may be easy to create and update, making important corporate decisions based on spreadsheets is risky due to the lack of data credibility. Whether you are a manager, developer, end user, or student, this book will help you turn spreadsheet data into credible, useful, reliable data that can be trusted for important decisions.
A chapter is dedicated to each of the following topics:
Build a Textual Warehouse to help your organization understand and analyze documents through text analytics (both sentiment and non-sentiment analysis) and make better business decisions.
Learn the important role of documents and text within your organization, the difference between identifying and qualifying text, and when you need document preprocessing. Appreciate the power of taxonomies and the necessity of textual ETL. Know how the textual warehouse architecture differs from the conventional data warehouse architecture and when to apply contextualization and textual disambiguation.
About Bill
Bill Inmon, the father of the data warehouse, has written 60 books published in nine languages. Computerworld named Bill one of the ten most influential people in the history of the computer profession. Bill's latest adventure is the building of technology known as textual disambiguation.
About Ranjeet
Ranjeet Srivastava is a data management professional and an enterprise architect with more than 20 years in enterprise product research, development, and design of data-intensive mission-critical applications.