Everybody’s talking about machine learning. It’s moved from an academic discipline to one of the most exciting technologies around. From understanding video feeds in self-driving cars to personalizing medications, it’s becoming important in every industry. While the model architectures and concepts have received a lot of attention, machine learning has yet to go through the standardization of processes that the software industry experienced in the last two decades. In this book, we’d like to show you how to build a standardized machine learning system that is automated and results in models that are reproducible.
Who Is This Book For?
The primary audience for the book is data scientists and machine learning engineers who want to go beyond training a one-off machine learning model and who want to successfully productize their data science projects.
You should be comfortable with basic machine learning concepts and familiar with at least one machine learning framework (e.g., PyTorch, TensorFlow, Keras). The machine learning examples in this book are based on TensorFlow and Keras, but the core concepts can be applied to any framework.
A secondary audience for this book is managers of data science projects, software developers, or DevOps engineers who want to enable their organization to accelerate their data science projects. If you are interested in better understanding automated machine learning life cycles and how they can benefit your organization, the book will introduce a toolchain to achieve exactly that.
What Are Machine Learning Pipelines?
During the last few years, the developments in the field of machine learning have been astonishing. With the broad availability of graphics processing units (GPUs) and the rise of new deep learning concepts such as Transformer models like BERT and generative adversarial networks (GANs) like deep convolutional GANs, the number of AI projects has skyrocketed. The number of AI startups is enormous, and organizations are increasingly applying the latest machine learning concepts to all kinds of business problems. In this rush for the most performant machine learning solution, we have observed a few things that have received less attention. We have seen that data scientists and machine learning engineers lack good sources of information on concepts and tools to accelerate, reuse, manage, and deploy their developments. What is needed is the standardization of machine learning pipelines.
Machine learning pipelines implement and formalize processes to accelerate, reuse, manage, and deploy machine learning models. Software engineering went through the same changes a decade or so ago with the introduction of continuous integration (CI) and continuous deployment (CD). Back in the day, it was a lengthy process to test and deploy a web app. These days, these processes have been greatly simplified by a few tools and concepts. Previously, the deployment of web apps required collaboration between a DevOps engineer and the software developer. Today, the app can be tested and deployed reliably in a matter of minutes. Data scientists and machine learning engineers can learn a lot about workflows from software engineering. Our intention with this book is to contribute to the standardization of machine learning projects by walking readers through an entire machine learning pipeline, end to end.
From our personal experience, most data science projects that aim to deploy models into production do not have the luxury of a large team. This makes it difficult to build an entire pipeline in-house from scratch. It may mean that machine learning projects turn into one-off efforts where performance degrades after time, the data scientist spends much of their time fixing errors when the underlying data changes, or the model is not used widely. An automated, reproducible pipeline reduces the effort required to deploy a model. The pipeline should include steps that:
- Version your data effectively and kick off a new model training run
- Validate the received data and check against data drift
- Efficiently preprocess data for your model training and validation
- Effectively train your machine learning models
- Track your model training
- Analyze and validate your trained and tuned models
- Deploy the validated model
- Scale the deployed model
- Capture new training data and model performance metrics with feedback loops
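The steps above can be sketched as a sequence of plain Python functions. This is a minimal, illustrative outline only, not the book's TFX-based implementation: every function name, the toy data, and the majority-class "model" are hypothetical stand-ins chosen to keep the example self-contained.

```python
def ingest_data():
    # Data versioning/ingestion step (stubbed with toy records; hypothetical).
    return [{"feature": x, "label": x % 2} for x in range(10)]

def validate_data(records):
    # Data validation step: check the received records for schema problems.
    assert all("feature" in r and "label" in r for r in records)
    return records

def preprocess(records):
    # Preprocessing step: transform features for training (simple scaling here).
    return [{"feature": r["feature"] / 10.0, "label": r["label"]} for r in records]

def train_model(records):
    # Training step: a trivial majority-class baseline stands in for a real model.
    labels = [r["label"] for r in records]
    majority = max(set(labels), key=labels.count)
    return {"predict": lambda _feature: majority}

def evaluate(model, records):
    # Analysis/validation step: score the trained model before deployment.
    correct = sum(model["predict"](r["feature"]) == r["label"] for r in records)
    return correct / len(records)

def run_pipeline():
    # Chain the steps; a real pipeline framework would orchestrate,
    # cache, and track each of these stages instead of a direct call chain.
    records = validate_data(ingest_data())
    features = preprocess(records)
    model = train_model(features)
    accuracy = evaluate(model, features)
    return model, accuracy

if __name__ == "__main__":
    model, accuracy = run_pipeline()
    print(f"validation accuracy: {accuracy:.2f}")
```

The point of a pipeline framework is that each stage in `run_pipeline` becomes a tracked, reusable, automatically triggered component rather than a hand-invoked function.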
This list leaves out one important point: choosing the model architecture. We assume that you already have a good working knowledge of this step. If you are just getting started with machine learning or deep learning, these resources are a great starting point:
Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms, 1st edition by Nikhil Buduma and Nicholas Locascio (O’Reilly)
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd edition by Aurélien Géron (O’Reilly)
O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 40 years, we’ve inspired companies and individuals to do new things (and do them better) by providing the skills and understanding that are necessary for success.
At the heart of our business is a unique network of expert pioneers and practitioners who share their knowledge through the O’Reilly learning platform and our books—which have been heralded for decades as the definitive way to learn the technologies that are shaping the future. So individuals, teams, and organizations learn the tools, best practices, and emerging trends that will transform their industries.
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
Publisher : O’Reilly Media; 1st edition (August 4, 2020)
Language : English
Paperback : 366 pages
ISBN-10 : 1492053198
ISBN-13 : 978-1492053194
Item Weight : 1.28 pounds
Dimensions : 7 x 0.76 x 9.19 inches