Authors | Garrett Grolemund, Hadley Wickham |
Publisher | O'Reilly Media |
Published | 2016 |
Edition | 1 |
Paperback | 520 pages |
Language | English |
ISBN-13 | 9781491910399 |
ISBN-10 | 1491910399 |
License | Creative Commons Attribution-NonCommercial-NoDerivatives |
Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.
Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along with the basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you've learned along the way.
You'll learn how to:
- Wrangle: transform your datasets into a form convenient for analysis;
- Program: learn powerful R tools for solving data problems with greater clarity and ease;
- Explore: examine your data, generate hypotheses, and quickly test them;
- Model: provide a low-dimensional summary that captures true "signals" in your dataset;
- Communicate: learn R Markdown for integrating prose, code, and results.
This book is available under a Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND), which means that you are free to copy and distribute it, as long as you attribute the source, don't use it commercially, and don't create modified versions.
If you enjoyed the book and would like to support the authors, you can purchase a printed copy (hardcover or paperback) from official retailers.
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed wit
Learn how to accelerate C++ programs using data parallelism. This open book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables
Intel Galileo and Intel Galileo Gen 2: API Features and Arduino Projects for Linux Programmers provides detailed information about Intel Galileo and Intel Galileo Gen 2 boards for all software developers interested in Arduino and the Linux platform. The book covers the new Arduino APIs and is an introduction for developers on natively using Linux.
NoSQL is a modern data storage paradigm that provides data persistence for environments where high performance is a primary requirement. Within NoSQL, data is stored in such a way as to make both writing and reading quite fast, even under heavy load. Redis and Redis Enterprise are market-leading, multi-model NoSQL databases that bring N
Creative Scala is designed for developers with no prior experience in Scala, offering a fun and gentle introduction to functional programming. The book assumes only basic familiarity with another programming language and little to no exposure to Scala or functional programming concepts. The authors have three main objectives with this book: 1. Intr
Energy Efficient Servers: Blueprints for Data Center Optimization introduces engineers and IT professionals to the power management technologies and techniques used in energy efficient servers. The book includes a deep examination of different features used in processors, memory, interconnects, I/O devices, and other platform components. It outline