Author | Jeroen Janssens |
Publisher | O'Reilly Media |
Published | 2021 |
Edition | 2 |
Paperback | 282 pages |
Language | English |
ISBN-13 | 9781492087915, 9781492087908 |
ISBN-10 | 1492087912, 1492087904 |
License | Creative Commons Attribution-NonCommercial-NoDerivatives |
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools - useful whether you work with Windows, macOS, or Linux.You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.
- Obtain data from websites, APIs, databases, and spreadsheets;
- Perform scrub operations on text, CSV, HTML, XML, and JSON files;
- Explore data, compute descriptive statistics, and create visualizations;
- Manage your data science workflow;
- Create your own tools from one-liners and existing Python or R code;
- Parallelize and distribute data-intensive pipelines;
- Model data with dimensionality reduction, regression, and classification algorithms;
- Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark.
This book is available under a Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND), which means that you are free to copy and distribute it, as long as you attribute the source, don't use it commercially, and don't create modified versions.
If you enjoyed the book and would like to support the author, you can purchase a printed copy (hardcover or paperback) from official retailers.
The home computer boom of the 1980s brought with it now iconic machines such as the ZX Spectrum, BBC Micro, and Commodore 64. Those machines would inspire a generation. Written by Tim Danton. The Computers That Made Britain (300 pages, hardback) tells the story of 19 of those computers - and what happened behind the scenes. With dozens of new inter
The Linux Command Line takes you from your very first terminal keystrokes to writing full programs in Bash, the most popular Linux shell (or command line). Along the way you'll learn the timeless skills handed down by generations of experienced, mouse-shunning gurus: file navigation, environment configuration, command chaining, pattern matching wit
Three of CouchDB's creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications. With its simple model for storing, processing, and accessing data, CouchDB is ideal for web applications that handle huge amounts of loosely structured data. That alone would stretch th
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all - IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other relate
Learn how to accelerate C++ programs using data parallelism. This open book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables
Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers provides developers a comprehensive introduction and in-depth look at the Intel Xeon Phi coprocessor architecture and the corresponding parallel data structure tools and algorithms used in the various technical computing applications for which it is suitable. It