Use R to turn data into insight, knowledge, and understanding. With this practical book, aspiring data scientists will learn how to do data science with R and RStudio, along with the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Even if you have no programming experience, this updated edition will have you doing data science quickly.
You'll learn how to import, transform, and visualize your data and communicate the results. And you'll get a complete, big-picture understanding of the data science cycle and the basic tools you need to manage the details. Updated for the latest tidyverse features and best practices, new chapters show you how to get data from spreadsheets, databases, and websites. Exercises help you practice what you've learned along the way.
You'll understand how to:
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the data-analytic thinking necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You'll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company's data science projects. You'll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
Feel confident navigating the fundamentals of data science
Data Science Essentials For Dummies is a quick reference on the core concepts of the exploding and in-demand data science field, which involves data collection and working on dataset cleaning, processing, and visualization. This direct and accessible resource helps you brush up on key topics and is right to the point--eliminating review material, wordy explanations, and fluff--so you get what you need, fast.
Perfect for supplementing classroom learning, reviewing for a certification, or staying knowledgeable on the job, Data Science Essentials For Dummies is a reliable guide that's great to keep on hand as an everyday desk reference.
To really learn data science, you should not only master the tools--data science libraries, frameworks, modules, and toolkits--but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today's messy glut of data.
Enterprises have made amazing advances by taking advantage of data about their business to provide predictions and understanding of their customers, markets, and products. But as the world of business becomes more interconnected and global, enterprise data is no longer a monolith; it is just a part of a vast web of data. Managing data on a world-wide scale is a key capability for any business today.
The Semantic Web treats data as a distributed resource on the scale of the World Wide Web, and incorporates features to address the challenges of massive data distribution as part of its basic design. The aim of the first two editions was to motivate the Semantic Web technology stack from end-to-end; to describe not only what the Semantic Web standards are and how they work, but also what their goals are and why they were designed as they are. It tells a coherent story from beginning to end of how the standards work to manage a world-wide distributed web of knowledge in a meaningful way.
The third edition builds on this foundation to bring Semantic Web practice to enterprise. Fabien Gandon joins Dean Allemang and Jim Hendler, bringing with him years of experience in global linked data, to open up the story to a modern view of global linked data. While the overall story is the same, the examples have been brought up to date and applied in a modern setting, where enterprise and global data come together as a living, linked network of data. Also included with the third edition, all of the data sets and queries are available online for study and experimentation at data.world/swwo.
Free from product discussions, this book will serve as a timeless resource for years to come.
As data management continues to evolve rapidly, managing all of your data in a central place, such as a data warehouse, is no longer scalable. Today's world is about quickly turning data into value. This requires a paradigm shift in the way we federate responsibilities, manage data, and make it available to others. With this practical book, you'll learn how to design a next-gen data architecture that takes into account the scale you need for your organization.
Executives, architects and engineers, analytics teams, and compliance and governance staff will learn how to build a next-gen data landscape. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.
Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool--a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way.
Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.
With this book, you'll learn:
Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.
Businesses own more data than ever before, but it's of no value if you don't know how to use it. Data governance manages the people, processes, and strategy needed for deploying data projects to production. But doing it well is far from easy: fewer than one-fourth of business leaders say their organizations are data driven. In Designing Data Governance from the Ground Up, you'll build a cross-functional strategy to create roadmaps and stewardship for data-focused projects, embed data governance into your engineering practice, and put processes in place to monitor data after deployment.
In the last decade, the amount of data people produced grew 3,000 percent. Most organizations lack the strategy to clean, collect, organize, and automate data for production-ready projects. Without effective data governance, most businesses will keep failing to gain value from the mountain of data that's available to them.
There's a plethora of content intended to help DataOps and DevOps teams reach production, but 90 percent of projects trained with big data fail to reach production because they lack governance.
This book shares six steps you can take to build a data governance strategy from scratch. You'll find a data framework, pull together a team of data stewards, build a data governance team, define your roadmap, weave data governance into your development process, and monitor your data in production.
Whether you're a chief data officer or individual contributor, this book will show you how to manage up, get the buy-in you need to build data governance, find the right colleagues to co-create data governance, and keep them engaged for the long haul.
Let's step back to the year 1978. Sony introduces hip portable music with the Walkman, Illinois Bell Company releases the first mobile phone, Space Invaders kicks off the video game craze, and William Kent writes Data and Reality. We have made amazing progress in the last four decades in terms of portable music, mobile communication, and entertainment, making devices such as the original Sony Walkman and suitcase-sized mobile phones museum pieces today. Yet remarkably, the book Data and Reality is just as relevant to the field of data management today as it was in 1978.
Data and Reality gracefully weaves the disciplines of psychology and philosophy with data management to create timeless takeaways on how we perceive and manage information. Although databases and related technology have come a long way since 1978, the process of eliciting business requirements and how we think about information remains constant. This book will provide valuable insights whether you are a 1970s data-processing expert or a modern-day business analyst, data modeler, database administrator, or data architect.
This third edition of Data and Reality differs substantially from the first and second editions. Data modeling thought leader Steve Hoberman has updated many of the original examples and references and added his commentary throughout the book, including key points at the end of each chapter.
The important takeaways in this book are rich with insight yet presented in a conversational and easy-to-grasp writing style. Here are just a few of the issues this book tackles:
DMN is the standard for model-based decision automation. Using standardized diagrams and tables together with the Low-Code expression language FEEL, DMN empowers both business and technical users to decompose complex decision logic into transparent and easily maintained models that can be instantly deployed as executable REST services. Moreover, it is a vendor-neutral standard maintained by the Object Management Group, so models behave the same in any compliant tool.
Using a combination of logic decomposition diagrams (DRDs), standard tabular formats (boxed expressions), and the Low-Code expression language FEEL, DMN allows subject matter experts to automate the operational decisions that drive the business. More powerful and business-friendly than Microsoft Power Fx, DMN is used by both business and technical modelers to create, test, and deploy cloud-based decision services.
This book provides a comprehensive guide to the language, completely revised from the 2nd edition, and updated to the current draft DMN 1.6 version. It includes many practical examples, with 271 diagrams and tables. Part I is the business-oriented Guide to Decision Modeling, explaining the creation and use of Decision Requirements Diagrams, decision tables, and all the tabular boxed expression types, as well as a deep dive into all the FEEL functions and operators. Part II is the more technically oriented DMN Cookbook (formerly a separate book), updated to DMN 1.5/1.6, with solutions to over 40 modeling challenges.
Learn how to use the Power Query M formula language and its functions effectively for better data modeling and impactful business intelligence reports.
Purchase of the print or Kindle book includes a free PDF eBook
Data transformation is a critical step in building data models and business intelligence reports. Power Query is an invaluable tool for anyone who wants to master data transformation, and this book will equip you with the knowledge and skills to make the most of it.
The Definitive Guide to Power Query (M) will help you build a solid foundation in the Power Query M language. As you progress through the chapters, you'll learn how to use that knowledge to implement advanced concepts and data transformations before going on a no-compromise 'deep dive' into the Power Query M language.
You'll also get to grips with optimizing performance, handling errors, and implementing efficient data processing techniques. As this is a hands-on guide, the practical examples in the chapters will help you gain the skills to apply Power Query to real-world problems and improve your data analysis capabilities.
By the end of this book, you will be able to leverage all of Power Query's remarkable capabilities for data transformation.
This book is for business analysts, business intelligence professionals, and power business users working with data who want to add Power Query mastery to their resume. This book will be beneficial for anyone who wants to automate their process of data cleaning and save a huge amount of time. Having some basic experience in Power Query is recommended.
Data modeling is the single most overlooked feature in Power BI Desktop, yet it's what sets Power BI apart from other tools on the market. This practical book serves as your fast-forward button for data modeling with Power BI, Analysis Services tabular, and SQL databases. It serves as a starting point for data modeling, as well as a handy refresher.
Author Markus Ehrenmueller-Jensen, founder of Savory Data, shows you the basic concepts of Power BI's semantic model with hands-on examples in DAX, Power Query, and T-SQL. If you're looking to build a data warehouse layer, chapters with T-SQL examples will get you started. You'll begin with simple steps and gradually solve more complex problems.
This book shows you how to:
Between major privacy regulations like the GDPR and CCPA and expensive and notorious data breaches, there has never been so much pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide will give you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, this book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems.
Practical Data Privacy answers important questions such as:
Deep learning is often viewed as the exclusive domain of math PhDs and big tech companies. But as this hands-on guide demonstrates, programmers comfortable with Python can achieve impressive results in deep learning with little math background, small amounts of data, and minimal code. How? With fastai, the first library to provide a consistent interface to the most frequently used deep learning applications.
Authors Jeremy Howard and Sylvain Gugger, the creators of fastai, show you how to train a model on a wide range of tasks using fastai and PyTorch. You'll also dive progressively further into deep learning theory to gain a complete understanding of the algorithms behind the scenes.
Data Modeling Made Simple will provide the business or IT professional with a practical working knowledge of data modeling concepts and best practices. This book is written in a conversational style that encourages you to read it from start to finish and master these ten objectives:
Book Review by Johnny Gay
In this book review, I address each section in the book and provide what I found most valuable as a data modeler. As I go, I describe how the book's structure eases the new data modeler into the subject, much like an instructor might ease a beginning swimmer into the pool.
This book begins like a Dan Brown novel. It even starts out with the protagonist, our favorite data modeler, lost on a dark road somewhere in France. In this case, what saves him isn't a cipher, but of all things, something that's very much like a data model in the form of a map. The author deems both to be way-finding tools.