Launch, monitor and maintain the system.
This section of the chapter outlines how a trained model can be deployed to a production environment.
Any trained scikit-learn model, including its full preprocessing and prediction pipeline, can be saved using the joblib library. The trained model can then be loaded within the production environment and predictions made by calling its predict method.
If the model is used within a website, a user sends a query containing the data to the web server, which forwards it to the web application; the application code then simply calls the model’s predict method. The model should be loaded only once, during server startup, and not every time it is used.
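The save-and-load round trip described above can be sketched as follows. This is a minimal illustration, not the chapter's own code: the toy data, the choice of StandardScaler plus LinearRegression, and the file name my_model.pkl are all assumptions made for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data, made up for illustration: y = 3*x0 + 2*x1.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 2))
y_train = 3.0 * X_train[:, 0] + 2.0 * X_train[:, 1]

# Preprocessing and predictor live in one pipeline, so both are saved together.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# Save the fitted pipeline to disk (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "my_model.pkl")
joblib.dump(pipeline, model_path)

# In production: load once at startup, then reuse the object for every request.
loaded_model = joblib.load(model_path)
X_new = np.array([[0.5, -1.0], [1.5, 2.0]])
predictions = loaded_model.predict(X_new)
```

Because the scaler is part of the saved pipeline, the production code passes raw inputs to predict and does not need to repeat any preprocessing by hand.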
Alternatively, the model can be wrapped within a dedicated web service that your web application queries through a REST API. This makes it easier to upgrade the model to a new version without interrupting the main application. It also simplifies scaling, since you can start as many web services as needed and load-balance the requests coming from the web application across them.
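A dedicated prediction service of this kind might look roughly like the sketch below, which uses only Python's standard library. Everything specific here is an assumption for illustration: the /predict path, the "instances" and "predictions" JSON keys, the port, and the MeanModel stand-in (a real service would load a trained model with joblib instead).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class MeanModel:
    """Stand-in for a trained model; a real service would load one with joblib."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


# Created once when the service starts, not on every request.
MODEL = MeanModel()


def handle_predict(body: bytes, model=MODEL) -> bytes:
    """Turn a JSON request body into a JSON response body."""
    payload = json.loads(body)
    predictions = model.predict(payload["instances"])
    # A NumPy array returned by a real model would need .tolist() here.
    return json.dumps({"predictions": list(predictions)}).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        response = handle_predict(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8000) -> None:
    """Start the service; call this from the process entry point."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```

Keeping the request handling in a plain function (handle_predict) separate from the HTTP plumbing makes it easy to test without starting a server, and several copies of the service can run behind a load balancer because each one holds its own copy of the model.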
Another option is to deploy the model to the cloud, for example on Google Cloud AI Platform, by saving the model using joblib and uploading it to Google Cloud Storage (GCS). This gives a simple web service that looks after the load balancing and scaling. It takes JSON requests containing the input data and returns JSON responses containing the predictions. This web service can then be used in your website or production environment.
Once the system has been deployed it needs to be monitored to check the system’s live performance at regular intervals with alerts triggered when the performance drops. The model’s live performance can also be monitored through downstream metrics.
Models tend to rot over time. A model may decay gradually as the world changes and the live data drifts away from the data it was trained on. The datasets might therefore need to be updated regularly and the model retrained on the updated datasets. This whole process should be automated as much as possible: collecting and labelling new data, training the model and fine-tuning the hyperparameters, and evaluating both the new and previous models on the updated test sets, with the new model deployed only if it performs better than the existing model.
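The final step of that automated process, comparing the new and previous models on the updated test set, can be sketched as below. The names select_model and ConstantModel, the toy data, and the choice of RMSE as the metric are assumptions for illustration; any object with a predict method and any suitable metric would fit the same pattern.

```python
import math


def rmse(model, X, y):
    """Root mean squared error of model.predict(X) against the labels y."""
    predictions = model.predict(X)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y))


def select_model(current_model, candidate_model, X_test, y_test):
    """Return the model to deploy, plus both scores for logging."""
    current_rmse = rmse(current_model, X_test, y_test)
    candidate_rmse = rmse(candidate_model, X_test, y_test)
    # Deploy the candidate only if it is strictly better on the updated test set.
    winner = candidate_model if candidate_rmse < current_rmse else current_model
    return winner, current_rmse, candidate_rmse


class ConstantModel:
    """Toy stand-in for a trained model (hypothetical)."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


# Updated test set (made-up numbers).
X_test = [[0], [1], [2], [3]]
y_test = [10.0, 12.0, 11.0, 11.0]

old_model = ConstantModel(8.0)
new_model = ConstantModel(11.0)
deployed, old_rmse, new_rmse = select_model(old_model, new_model, X_test, y_test)
```

Returning both scores alongside the chosen model makes it straightforward to log each retraining run and to see later why a given model was or was not promoted.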
The quality of the input data should also be monitored, with alerts triggered, for example, if more inputs are missing features, if the mean or standard deviation of a feature drifts too far from its value in the training set, or if new categories appear in a categorical variable.
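The three checks mentioned above could be sketched as a single function run over each batch of live inputs. The function name check_inputs, the thresholds, the feature names and the layout of the reference statistics are all assumptions chosen for the example, not part of any library.

```python
import statistics


def check_inputs(batch, reference, max_missing_rate=0.05, max_shift=0.5):
    """Return a list of alert messages for one batch of live inputs.

    batch: list of dicts (one per input); None marks a missing value.
    reference: per-feature statistics computed on the training set.
    """
    alerts = []
    for name, ref in reference.items():
        values = [row.get(name) for row in batch]
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing_rate:
            alerts.append(f"{name}: missing rate {missing_rate:.0%}")
        if "categories" in ref:
            unseen = set(present) - ref["categories"]
            if unseen:
                alerts.append(f"{name}: new categories {sorted(unseen)}")
        elif len(present) >= 2:
            # Express drift in units of the training standard deviation.
            mean_shift = abs(statistics.mean(present) - ref["mean"]) / ref["std"]
            std_ratio = statistics.stdev(present) / ref["std"]
            if mean_shift > max_shift:
                alerts.append(f"{name}: mean shifted by {mean_shift:.1f} std")
            if abs(std_ratio - 1) > max_shift:
                alerts.append(f"{name}: std ratio {std_ratio:.1f}")
    return alerts


# Reference statistics from the training set (made-up values).
reference = {
    "income": {"mean": 4.0, "std": 1.5},
    "region": {"categories": {"inland", "coastal"}},
}
batch = [
    {"income": 9.0, "region": "inland"},
    {"income": 10.0, "region": "island"},
    {"income": 11.0, "region": None},
    {"income": None, "region": "coastal"},
]
alerts = check_inputs(batch, reference)
```

In this toy batch both features have missing values, the income mean has moved well away from the training mean, and "island" is a category the training set never contained, so each of those conditions produces its own alert message.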
There should be a backup of every model, with the process and tools in place to roll back to a previous model if the new model begins to fail badly. There should also be backups of every version of the datasets, which allows you to roll back to a previous dataset if a newer one turns out to contain many more outliers, or to evaluate any new model against previous datasets. Several subsets of the test set can be created to evaluate how well the model performs on specific parts of the data, such as a subset containing only the most recent data or a subset for a particular type of input.
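Evaluating a model on several subsets of the test set might be sketched as follows. The helper names, the ThresholdModel stand-in, the toy rows and the use of accuracy as the metric are assumptions for illustration only.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    predictions = model.predict(rows)
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


def evaluate_by_subset(model, rows, labels, subsets):
    """Score the model on each named subset.

    subsets maps a name to a predicate that selects the rows in that subset.
    """
    scores = {}
    for name, keep in subsets.items():
        picked = [(r, t) for r, t in zip(rows, labels) if keep(r)]
        if picked:  # skip subsets that select no rows
            sub_rows, sub_labels = zip(*picked)
            scores[name] = accuracy(model, sub_rows, sub_labels)
    return scores


class ThresholdModel:
    """Toy stand-in for a trained classifier (hypothetical)."""

    def predict(self, rows):
        return [1 if row["value"] > 5 else 0 for row in rows]


# Made-up test set with a "year" field used to define the recent subset.
rows = [
    {"year": 2021, "value": 7},
    {"year": 2021, "value": 3},
    {"year": 2023, "value": 6},
    {"year": 2023, "value": 2},
]
labels = [1, 0, 0, 0]
subsets = {
    "all": lambda r: True,
    "recent": lambda r: r["year"] >= 2023,
}
scores = evaluate_by_subset(ThresholdModel(), rows, labels, subsets)
```

Here the overall score hides the fact that the model does worse on the most recent rows, which is the kind of weakness a per-subset evaluation is meant to surface.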
Much of the work in machine learning is in the data preparation step, along with:
- building monitoring tools
- setting up human evaluation pipelines
- automating regular model training