Data Science & AI Books
Seeing Theory
Statistics is quickly becoming the most important and multi-disciplinary field of mathematics. According to the American Statistical Association, statistician is one of the top ten fastest-growing occupations and statistics is one of the fastest-growing bachelor's degrees. Statistical literacy is essential to our data-driven society. Despite the incr
Understanding Metadata
One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn't the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context thro
Robotic Process Automation For Dummies
RPA is the use of computer software "robots" to handle repetitive, rule-based digital tasks, interacting with applications and information sources the same way humans do now. At its most basic level, it's pretty impressive technology, and it gets more powerful all the time. The newest advances have robots not only handling more complex functions,
Operating Systems and Infrastructure in Data Science
In data science, mastering a system environment with its tools and processes is essential to achieving even minimal productivity. Feeling alien to an environment, using the wrong tools, or combining the right tools in the wrong order can not only limit effectiveness but also yield wrong results. Hence, in this book, besides basic computer knowl
An Introduction to Python Jupyter Notebooks
This book is an introduction to the use of Python Jupyter Notebooks (JNBs) for college math teachers and their students. Each chapter contains a JNB lab with solutions. Experienced teachers can modify these labs and create new labs tailored to their courses. The chapters were written by different authors/authorship teams, and as such, vary in style
Automating Data Transformations
The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow, but it's also the most time-consuming, expensive, an
Notes on Randomized Algorithms
Lecture notes for the Yale Computer Science course CPSC 469/569 Randomized Algorithms. Suitable for use as a supplementary text for an introductory graduate or advanced undergraduate course on randomized algorithms. Discusses tools from probability theory, including random variables and expectations, union bound arguments, concentration bounds, app
An Introduction to Machine Learning Interpretability
Innovation and competition are driving analysts and data scientists toward increasingly complex predictive modeling and machine learning algorithms. This complexity makes these models accurate but also makes their predictions difficult to understand. When accuracy outpaces interpretability, human trust suffers, affecting business adoption, regulato
Entity-Oriented Search
This open access book covers all facets of entity-oriented search (where "search" can be interpreted in the broadest sense of information access) from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selecte
Data Mesh For Dummies
Data Mesh is a relatively new approach to data management. It combines several important trends in data management, including domain-driven design and data as a product, to decentralize the ownership of ingestion, processing, and serving of data. Zhamak Dehghani defined the term in 2019 as "a decentralized sociotechnical approach to share, access,