Understanding Metadata

Create the Foundation for a Scalable Data Architecture


Compliments of Zaloni

Book Details

Authors Federico Castanedo, Scott Gidley
Publisher O'Reilly Media
Published 2017
Edition 1st
Paperback 23 pages
Language English
ISBN-13 9781491988992, 9781491974889
ISBN-10 1491988991, 1491974885
License Compliments of Zaloni

Book Description

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn't the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging.

This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata to keep track of your data (where it comes from, where it's located, and how it's being used) so you can provide safeguards and reduce risk. In the process, you'll learn about methods for automating metadata capture.

This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include:
- Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab
- Tooling from open source projects, including Teradata Kylo and Informatica
- Startups such as Trifacta and Zaloni that provide best-of-breed technology


This book is published as open-access, which means it is freely available to read, download, and share without restrictions.

If you enjoyed the book and would like to support the authors, you can purchase a printed copy (hardcover or paperback) from official retailers.


Similar Books


Intel Xeon Phi Coprocessor Architecture and Tools

Intel Xeon

Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers provides developers a comprehensive introduction and in-depth look at the Intel Xeon Phi coprocessor architecture and the corresponding parallel data structure tools and algorithms used in the various technical computing applications for which it is suitable. It

Python Data Science Handbook

Python Pandas

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all - IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other relate

The Nature of Code

Java

How can we capture the unpredictable evolutionary and emergent properties of nature in software? How can understanding the mathematical principles behind our physical world help us to create digital worlds? This book focuses on a range of programming strategies and techniques behind computer simulations of natural systems, from elementary concepts

Rethinking the Internet of Things

IoT

Apress is proud to announce that Rethinking the Internet of Things was a 2014 Jolt Award Finalist, the highest honor for a programming book. And the amazing part is that there is no code in the book. Over the next decade, most devices connected to the Internet will not be used by people in the familiar way that personal computers, tablets and smart

Learn Programming

C / C++ Python JavaScript Unix

This book is aimed at readers who are interested in software development but have very little to no prior experience. The book focuses on teaching the core principles around software development. It uses several technologies to this goal (e.g. C, Python, JavaScript, HTML, etc.) but is not a book about the technologies themselves. The reader will le

OpenIntro Statistics, 4th Edition

Statistics

OpenIntro Statistics provides a traditional college-level introduction to the field of statistics. This widely adopted textbook offers an exceptional and accessible foundation for a diverse range of students, from those at community colleges to attendees of Ivy League institutions. It is estimated that approximately 20,000 students use this thoroug