Bulk email delivery is a tricky business. Many common email service providers (ESPs), such as Gmail, Yahoo, and Hotmail, apply different sets of delivery and spam rules. Violating those rules results in rejected messages or messages sent straight to potential customers' junk mail folders. Successful bulk email marketers pick up on signals from day-to-day variations in email delivery metrics and take corrective action.
At RevolutionCredit we use the bulk email channel extensively to invite customers to our platform. The number of email batches we send is quite large because we run different customer segments, A/B tests, different languages, and so on. Monitoring these batches to ensure proper delivery and optimal user engagement is very important. We track a wide range of metrics that provide the data needed to monitor performance: sent rate, delivered rate, open rate, click rate, signup rate, completion rate, number of email batches sent to each customer, and more. Keeping track of these metrics manually consumes a great deal of time, and no one enjoys doing it.
In this article, we outline our implementation of time series analysis for tracking email metrics and detecting anomalies. A time series is a set of observations generated sequentially over a certain time interval. Modeling it lets us extrapolate patterns learned in one observation period and apply them to the next. We use a powerful forecasting technique, the autoregressive integrated moving average (ARIMA) model, which can represent a wide range of time series data.
The following sections outline the steps we took to build the model.
Assembling the right data set is critical for a successful implementation. We used delivered rate, open rate, and click rate in the model. Including open rate was essential because it provides visibility into user engagement: an ESP will often return a "delivered" status even though the email landed in a prospect's spam or junk folder, and open rate catches those cases.
Email metrics collected over a 9-week period were used to build the model. Each week has four delivery days: Tuesday, Thursday, Friday, and Saturday. Each delivery yields three data points (delivered rate, open rate, and click rate), as shown below. We chose a frequency of 12 points per cycle to capture the seasonality that comes from rate differences between weekdays and weekends.
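As a concrete (and entirely synthetic) illustration, the per-delivery triples can be flattened into a single series with a seasonal frequency of 12. The numbers below are made-up placeholders, not our campaign data:

```python
# Each week has 4 delivery days (Tue, Thu, Fri, Sat), and each delivery
# yields 3 metrics, so one seasonal cycle (a week) holds 12 observations.
weeks = [
    # (delivered_rate, open_rate, click_rate) for each delivery day
    [(0.97, 0.22, 0.031), (0.96, 0.21, 0.029),
     (0.98, 0.24, 0.035), (0.95, 0.18, 0.022)],
    [(0.96, 0.23, 0.030), (0.97, 0.20, 0.028),
     (0.97, 0.25, 0.034), (0.94, 0.17, 0.021)],
]

def to_series(weeks):
    """Flatten weekly (delivered, open, click) triples into one series."""
    series = []
    for week in weeks:
        for delivered, opened, clicked in week:
            series.extend([delivered, opened, clicked])
    return series

series = to_series(weeks)
print(len(series) // len(weeks))  # observations per seasonal cycle
```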
Exploratory Data Analysis
We started with basic summary/descriptive statistics in R to examine the data range and check for outliers. Decomposing the time series is a good way to gain insight into trend and seasonal effects. Below is our data when decomposed.
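Our decomposition was done in R, but the mechanics are simple enough to sketch. Here is a minimal pure-Python version of classical additive decomposition (trend via a centered moving average, seasonal indices via per-position means of the detrended series), run on synthetic data with a known trend and period-12 pattern:

```python
def decompose(series, freq):
    """Classical additive decomposition: series = trend + seasonal + residual.
    Trend uses a centered moving average (2 x freq MA, assuming even freq);
    seasonal indices are per-position means of the detrended series."""
    n = len(series)
    half = freq // 2
    weights = [0.5] + [1.0] * (freq - 1) + [0.5]
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(w * v for w, v in zip(weights, window)) / freq
    detrended = [s - t if t is not None else None
                 for s, t in zip(series, trend)]
    means = []
    for pos in range(freq):
        vals = [detrended[i] for i in range(pos, n, freq)
                if detrended[i] is not None]
        means.append(sum(vals) / len(vals))
    grand = sum(means) / freq
    seasonal = [means[i % freq] - grand for i in range(n)]  # centered at 0
    residual = [s - t - c if t is not None else None
                for s, t, c in zip(series, trend, seasonal)]
    return trend, seasonal, residual

# Synthetic check: a linear trend plus a known period-12 pattern.
freq = 12
pattern = [1, -1, 2, -2, 0.5, -0.5, 1.5, -1.5, 0, 0, 0.25, -0.25]
series = [0.1 * i + pattern[i % freq] for i in range(9 * freq)]
trend, seasonal, residual = decompose(series, freq)
```

On this constructed series the trend and seasonal components are recovered almost exactly, which is a useful sanity check before trusting the decomposition on noisy real data.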
Model Parameter Selection
An important aspect of building an ARIMA model is arriving at parameters that appropriately reflect the patterns in the data. These parameters tell the model how to make the time series stationary and how to handle seasonality and trend.
Notation for an ARIMA model is defined as:
ARIMA(p, d, q) × (P, D, Q) S, where:
- p = non-seasonal autoregressive order
- d = non-seasonal differencing
- q = non-seasonal moving average order
- P = seasonal autoregressive order
- D = seasonal differencing
- Q = seasonal moving average order
- S = time span of repeating seasonal pattern.
Many parameter combinations are possible for an ARIMA model. We fit a range of candidate models to the data and selected the best-fitting one using an appropriate criterion, Akaike's information criterion (AIC).
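We won't reproduce our full selection process here, but to make the AIC idea concrete, below is a hypothetical pure-Python sketch for the simplest case: choosing the autoregressive order p of a plain AR(p) model (no differencing or seasonal terms) by fitting via the Yule-Walker equations and picking the order with the lowest AIC. The data is synthetic AR(2), so the "right" answer is known in advance:

```python
import math
import random

def autocov(x, lag):
    """Sample autocovariance at the given lag (biased, divides by n)."""
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n

def fit_ar(x, p):
    """Fit AR(p) via Levinson-Durbin on the Yule-Walker equations;
    returns (coefficients, innovation variance)."""
    r = [autocov(x, k) for k in range(p + 1)]
    phi = [0.0] * (p + 1)
    sigma2 = r[0]
    for k in range(1, p + 1):
        refl = (r[k] - sum(phi[j] * r[k - j] for j in range(1, k))) / sigma2
        new = phi[:]
        new[k] = refl
        for j in range(1, k):
            new[j] = phi[j] - refl * phi[k - j]
        phi = new
        sigma2 *= 1.0 - refl * refl
    return phi[1:], sigma2

def aic(n, sigma2, k):
    """AIC under Gaussian innovations, up to an additive constant."""
    return n * math.log(sigma2) + 2 * (k + 1)

# Simulate a stationary AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + e_t
rng = random.Random(42)
x = [0.0, 0.0]
for _ in range(500):
    x.append(0.6 * x[-1] - 0.3 * x[-2] + rng.gauss(0, 1))
x = x[2:]

aics = {p: aic(len(x), fit_ar(x, p)[1], p) for p in range(5)}
best = min(aics, key=aics.get)
print("order selected by AIC:", best)
```

A real seasonal ARIMA search works the same way in spirit: fit each candidate (p, d, q) x (P, D, Q) combination and keep the one with the lowest AIC.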
We'll refrain from going into how the model parameters were selected, as that is a topic deserving a separate discussion.
Model Evaluation and Validation
Model evaluation included plotting and examining the original, fitted, and residual series. In addition, we applied the Ljung-Box test and other statistical checks.
To validate the model, we performed out-of-sample testing, comparing forecasted values against actuals. The data was split into two parts: the first was used to fit the model, and the second to evaluate its forecasts.
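The forecasting itself lived in R; purely to illustrate the holdout procedure, here is a sketch that splits a synthetic series, forecasts the held-out week with a simple seasonal-naive rule (repeat the last full seasonal cycle, standing in for the real fitted model), and measures the error:

```python
def seasonal_naive_forecast(train, freq, horizon):
    """Forecast by repeating the last full seasonal cycle of the
    training data -- a simple stand-in for a fitted ARIMA model."""
    last_cycle = train[-freq:]
    return [last_cycle[h % freq] for h in range(horizon)]

def mape(actual, forecast):
    """Mean absolute percentage error of the forecast."""
    return sum(abs((a - f) / a)
               for a, f in zip(actual, forecast)) / len(actual) * 100

# Synthetic stand-in for 9 weeks of (delivered, open, click) rates
# with a slight upward drift.
freq = 12
pattern = [0.97, 0.22, 0.03] * 4
series = [v * (1 + 0.001 * i) for i, v in enumerate(pattern * 9)]

train, test = series[:-freq], series[-freq:]  # hold out the last week
forecast = seasonal_naive_forecast(train, freq, len(test))
print(round(mape(test, forecast), 2))  # holdout error, in percent
```

The same split-fit-forecast-score loop applies unchanged when the forecaster is the fitted ARIMA model instead of the naive baseline.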
The chart below visualizes such a forecast. It shows that the model does a decent job of predicting the range of future values.
We host the model in a centralized monitoring service. The service runs on a schedule, generates the necessary forecasts, and compares them against actual values using a chosen confidence interval. Any value that falls outside the interval is treated as a candidate for anomaly analysis, and the service notifies operations personnel when this happens.
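The flagging step itself is straightforward. A hypothetical sketch, where the forecasts and standard errors would come from the fitted model and the actuals from the ESP (all values below are invented):

```python
def flag_anomalies(actual, forecast, stderr, z=1.96):
    """Return indices where the actual value falls outside the
    forecast confidence interval (z = 1.96 gives roughly 95%)."""
    return [i for i, (a, f, s) in enumerate(zip(actual, forecast, stderr))
            if abs(a - f) > z * s]

# Hypothetical open rates for four email batches.
forecast = [0.22, 0.21, 0.24, 0.18]
stderr   = [0.02, 0.02, 0.02, 0.02]
actual   = [0.23, 0.12, 0.25, 0.19]   # the second batch tanked

for i in flag_anomalies(actual, forecast, stderr):
    print(f"batch {i}: actual {actual[i]} outside forecast "
          f"{forecast[i]} +/- {1.96 * stderr[i]:.3f}")
```

Widening or narrowing the interval (the z value) trades off sensitivity against false alarms, which is exactly the knob an operations team wants to tune.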
The goal of this implementation is to identify email batches that deviate from the normal sending pattern. Manually tracking that many batches is beyond what a person can reliably do; this hands-off, algorithmic approach lets us monitor large numbers of email batches automatically. We continually refine the model as we accumulate more training data.
In future models we plan to incorporate ranking and scoring of anomalies to minimize false positives. We also plan to add a feature that correlates anomalies with other application events, such as email server issues, that affect the accuracy of statistical predictions.