Data Science at the Command Line, 2nd Edition

Obtain, Scrub, Explore, and Model Data with Unix Power Tools



Book Details

Author Jeroen Janssens
Publisher O'Reilly Media
Published 2021
Edition 2
Paperback 282 pages
Language English
ISBN-13 9781492087915, 9781492087908
ISBN-10 1492087912, 1492087904
License Creative Commons Attribution-NonCommercial-NoDerivatives

Book Description

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux.

You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

- Obtain data from websites, APIs, databases, and spreadsheets;
- Perform scrub operations on text, CSV, HTML, XML, and JSON files;
- Explore data, compute descriptive statistics, and create visualizations;
- Manage your data science workflow;
- Create your own tools from one-liners and existing Python or R code;
- Parallelize and distribute data-intensive pipelines;
- Model data with dimensionality reduction, regression, and classification algorithms;
- Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark.


This book is available under a Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND), which means that you are free to copy and distribute it, as long as you attribute the source, don't use it commercially, and don't create modified versions.

If you enjoyed the book and would like to support the author, you can purchase a printed copy (hardcover or paperback) from official retailers.

Similar Books


The Computers That Made Britain

The home computer boom of the 1980s brought with it now-iconic machines such as the ZX Spectrum, BBC Micro, and Commodore 64. Those machines would inspire a generation. Written by Tim Danton, The Computers That Made Britain (300 pages, hardback) tells the story of 19 of those computers - and what happened behind the scenes. With dozens of new inter

The Linux Command Line, 5th Edition

Linux

The Linux Command Line takes you from your very first terminal keystrokes to writing full programs in Bash, the most popular Linux shell (or command line). Along the way you'll learn the timeless skills handed down by generations of experienced, mouse-shunning gurus: file navigation, environment configuration, command chaining, pattern matching wit

CouchDB: The Definitive Guide

CouchDB

Three of CouchDB's creators show you how to use this document-oriented database as a standalone application framework or with high-volume, distributed applications. With its simple model for storing, processing, and accessing data, CouchDB is ideal for web applications that handle huge amounts of loosely structured data. That alone would stretch th

Python Data Science Handbook

Python Pandas

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all - IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other relate

Data Parallel C++

C / C++

Learn how to accelerate C++ programs using data parallelism. This open book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables

Intel Xeon Phi Coprocessor Architecture and Tools

Intel Xeon

Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Application Developers provides developers a comprehensive introduction and in-depth look at the Intel Xeon Phi coprocessor architecture and the corresponding parallel data structure tools and algorithms used in the various technical computing applications for which it is suitable. It