Machine Learning

A machine learning configuration is a set of features that control the behavior of the forecasting engine during a volume run. You must define a machine learning configuration before you specify Machine Learning as the method for a category property set. You can then apply the category property set to any category or hierarchy of categories for use when running a volume forecast.

Overview

Machine Learning is a type of predictive analysis that creates a computer program, or model, by uncovering patterns in data. For example, to predict the selling price of your house, Machine Learning would look at the prices of houses that have sold and at their characteristics, such as location, number of rooms, living area, and land area. From this data, Machine Learning would build a model by finding patterns that relate a house's selling price to its characteristics. It would then use that model to predict the selling price of a given house.

Machine Learning uses this approach to predict volumes for different drivers, such as sales and items sold, based on characteristics like the day-of-week average.

There are three computational steps in the machine learning method:

Step 1 - Feature calculation

Feature calculation collects the values of predictive characteristics, such as a category’s recent day-of-week average, for use in the next step, training. Feature calculation runs when you first import data that includes a category with a machine learning category property set defined. Afterward, the features are automatically updated whenever volume data is imported to that category.

A feature is a relevant element of the input data (for instance, day of week), a value derived from one or more of these inputs (for instance, average volume on a given day of week), or a value derived from other features (for instance, a ratio of two averages).
The Machine Learning forecast considers many features, including some that can be configured by the user. The training process will determine which features are the best predictors of volume.
Some features are calculated during the Daily POS Volume Import, but we recommend configuring a weekly batch task to ensure all features’ values are calculated and current, which improves the accuracy of the forecast.
Set the feature calculation to run once per week, covering the previous two weeks.
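To make the three kinds of feature concrete, here is a minimal Python sketch. The function names (`day_of_week`, `recent_dow_average`, `feature_row`) and the history data are hypothetical; this illustrates the idea and is not how the forecasting engine computes its features.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical helpers illustrating raw, derived, and feature-of-feature values.
# They are not the forecasting engine's own code.

def day_of_week(d: date) -> int:
    """Raw input feature: weekday of the date (0 = Monday ... 6 = Sunday)."""
    return d.weekday()

def recent_dow_average(history: dict, target: date, weeks: int) -> float:
    """Derived feature: mean volume on the same weekday over the `weeks`
    weeks before `target`. Dates missing from history are skipped."""
    same_weekday = [target - timedelta(weeks=w) for w in range(1, weeks + 1)]
    values = [history[d] for d in same_weekday if d in history]
    if not values:
        raise ValueError("no history for this weekday")
    return mean(values)

def feature_row(history: dict, target: date) -> dict:
    """Builds one row of features for a single category and date."""
    short_avg = recent_dow_average(history, target, 4)
    long_avg = recent_dow_average(history, target, 8)
    return {
        "day_of_week": day_of_week(target),
        "dow_avg_4w": short_avg,
        "dow_avg_8w": long_avg,
        # A feature derived from other features: the ratio of two averages.
        "dow_avg_ratio": short_avg / long_avg,
    }

# Hypothetical history: eight consecutive Mondays with volumes 10, 20, ..., 80.
history = {date(2024, 1, 1) + timedelta(weeks=i): 10.0 * (i + 1) for i in range(8)}
row = feature_row(history, date(2024, 2, 26))  # the following Monday
print(row)
```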

To create a batch task:
1. Go to Application Setup > Common Setup > Batch Processing.
2. Click New.
3. Add a name and, optionally, a description.
4. In Sequence, enter “1”.
5. From the Action Type drop-down, select Feature Calculation.
6. To search for an action name, click the magnifying glass. Select the top root entry for forecasting locations.
7. Click Select and Return.
8. In Parameter String, enter the offset and duration.
9. Base the offset and duration on when the Schedule Event will run. We recommend an offset that points to the start day of the forecast week and a duration covering two weeks in the past (14 days). For example, if the week runs Sunday through Saturday and the event runs on Mondays, use “-1” as the offset. For the parameter string, enter:
  /offset:-1/duration:14
10. Click Create a batch event.
11. On the Batch Events page, use the arrows to move the newly created batch task from Available to Selected.
12. Click Save and Return.
13. Click Schedule and schedule the batch task to run once per week on a non-peak day and time.
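The offset arithmetic in step 9 can be sketched in Python. This assumes, based only on the example above, that the offset counts the days from the day the scheduled event runs back to the start day of the forecast week, written as a negative number. The function `feature_calc_parameters` is hypothetical, so verify its result against your own system before scheduling.

```python
import calendar

def feature_calc_parameters(week_start: int, run_day: int, duration_days: int = 14) -> str:
    """Builds a hypothetical Feature Calculation parameter string.

    ASSUMPTION: the offset is the number of days from the run day back to
    the most recent start of the forecast week, expressed as a negative
    number. Days use Python's numbering (calendar.MONDAY = 0 ... SUNDAY = 6).
    """
    days_back = (run_day - week_start) % 7
    return f"/offset:{-days_back}/duration:{duration_days}"

# Week runs Sunday through Saturday; the event runs on Mondays.
print(feature_calc_parameters(calendar.SUNDAY, calendar.MONDAY))  # prints /offset:-1/duration:14
```

Under the same assumption, an event that runs on the first day of the forecast week would get an offset of 0.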

Step 2 - Training

Training compares the calculated features with the existing volume data to devise the best model for mapping business conditions to volume predictions.

In the training process, a machine learning algorithm builds a (possibly complex) function that maps feature values to recorded volumes as accurately as possible.
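As an illustration only, the following Python sketch "trains" the simplest possible model: an ordinary least-squares line mapping one feature (a day-of-week average) to recorded volume. The engine's actual algorithm is not described here and is likely far more complex; the function name `fit_linear_model` and the data are hypothetical.

```python
def fit_linear_model(xs: list, ys: list) -> tuple:
    """Ordinary least-squares fit of ys = intercept + slope * xs.
    Returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variation; cannot fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: a day-of-week average feature and recorded volumes.
dow_avg = [10.0, 20.0, 30.0, 40.0]
volumes = [25.0, 45.0, 65.0, 85.0]
slope, intercept = fit_linear_model(dow_avg, volumes)
print(slope, intercept)  # prints 2.0 5.0
```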

Note: The training step demands enormous computational resources and must be coordinated with product support.

Step 3 - Run Volume

Run Volume applies the generated model to the selected business unit and timeframe to predict business volume.
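Using a hypothetical linear model with a known slope and intercept, this sketch shows what applying a model to a selected business unit and timeframe means: it filters feature values to one unit and date range, then evaluates the model for each day. The names, model, and data are illustrative assumptions, not the product's interface.

```python
from datetime import date, timedelta

def run_volume(model: tuple, features: dict, unit: str, start: date, days: int) -> dict:
    """Applies a (slope, intercept) model to one business unit's feature
    values for the `days` days beginning at `start`.
    `features` maps (unit, date) -> feature value."""
    slope, intercept = model
    end = start + timedelta(days=days)
    return {
        d: intercept + slope * x
        for (u, d), x in features.items()
        if u == unit and start <= d < end
    }

# Hypothetical feature values for two stores.
features = {
    ("Store 1", date(2024, 3, 4)): 10.0,
    ("Store 1", date(2024, 3, 5)): 20.0,
    ("Store 2", date(2024, 3, 4)): 30.0,   # different business unit
    ("Store 1", date(2024, 3, 20)): 40.0,  # outside the selected week
}
forecast = run_volume((2.0, 5.0), features, "Store 1", date(2024, 3, 4), 7)
print(forecast)
```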

Create, edit or delete a machine learning configuration

Warning: When you create or edit a machine learning entity, the system must be retrained. Contact UKG Pro Workforce Management™ Support for advice.

Go to Administration > Application Setup > Forecaster Setup > Machine Learning.
Note: If the table contains many machine learning configurations, you can find the one you are looking for more easily by clicking Filter and typing a keyword in the field at the top of the Name column or the Description column.
Do one of the following:
- Click Create and enter a Name (and optionally a description).
- Select an existing configuration and click Edit.
  Note:
  Choose one of two purposes for the edit:
  - To modify the features of the selected configuration everywhere it is assigned, select Save changes everywhere that the named entity is used.
  - To use an existing configuration as a template for a new configuration, select Save as a new named entity and give the entity a new name and, optionally, a description.
- Select an existing configuration and click Delete.
  Note: You cannot delete a machine learning configuration if it is selected in a category property set.
On the Recent Average - Day of Week tab, select one or more values; Current Assigned displays a list of your current selections. Higher values produce more stable but less responsive results. The default setting (4 weeks) works well for most implementations. By adjusting the Recent Average - Day of Week setting, you can reduce the impact of anomalies in individual values and capture longer- or shorter-term trends.
In a weather-sensitive business, if unusual weather has changed volume over the past month, a 4-week period may produce an inaccurate forecast. Yet normal seasonal variation means that a 16-week average would include too much of the previous season's weather, which also introduces inaccuracy. In this case, set Recent Average - Day of Week to 8 or 12 weeks for best results.
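The effect of the window length in this example can be checked with a short Python sketch. The Saturday volumes are hypothetical: twelve normal weeks followed by four weeks depressed by bad weather. A 4-week window reflects only the anomaly, while longer windows dilute it. The helper `dow_window_average` is an illustrative assumption, not the engine's calculation.

```python
from statistics import mean

def dow_window_average(same_weekday_volumes: list, weeks: int) -> float:
    """Mean of the most recent `weeks` volumes for one weekday.
    The list is ordered oldest first."""
    if len(same_weekday_volumes) < weeks:
        raise ValueError("not enough history for the requested window")
    return mean(same_weekday_volumes[-weeks:])

# Hypothetical Saturdays: 12 normal weeks at 100, then 4 bad-weather weeks at 60.
saturdays = [100.0] * 12 + [60.0] * 4
averages = {w: dow_window_average(saturdays, w) for w in (4, 8, 12, 16)}
for w, avg in averages.items():
    print(f"{w:>2} weeks: {avg:.1f}")
```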
On the Recent Average tab, select one or more values; Current Assigned displays a list of your current selections. Higher values produce more stable but less responsive results. We recommend choosing at least two of these features to capture both recent and longer-term trends. The default settings of 60 and 90 days work well for most implementations. By adjusting the Recent Average setting, you can reduce the impact of anomalies in individual values.
A store with seasonal trends may benefit from a 30-day Recent Average that quickly captures a surge in volume. However, sporadic weather events may cause this 30-day average to fluctuate too quickly, so it can be tempered by also including a more stable 90-day Recent Average. Combining the two, as recommended, captures both short-term and medium-term seasonal trends.
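A similar sketch for the Recent Average features shows how a short window reacts to a surge faster than a long one. The daily data and the helper `recent_average` are hypothetical; the engine's own calculation may differ.

```python
from statistics import mean

def recent_average(daily_volumes: list, days: int) -> float:
    """Mean of the last `days` daily volumes (list ordered oldest first)."""
    if len(daily_volumes) < days:
        raise ValueError("not enough history for the requested window")
    return mean(daily_volumes[-days:])

# Hypothetical history: 80 ordinary days at 100, then a 10-day surge at 160.
daily = [100.0] * 80 + [160.0] * 10
features = {f"recent_avg_{d}d": recent_average(daily, d) for d in (30, 90)}
print(features)
```

The 30-day value moves to 120 while the 90-day value only reaches about 107, which is why pairing a short and a long window captures both the surge and the underlying level.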
Under Organization:
- Specify which levels of the business structure you want to use as features for predicting volume. If the volume trends differ between entities at a level, it should be checked. For instance, if different districts have different patterns, “District” should be checked. The default selection of all levels is recommended since, in most cases, Machine Learning will determine if these features are not needed.
- Map two levels in your business structure to the levels designated here as District and Region. For District, choose the lowest level that contains multiple sites. For Region, choose the lowest level that contains multiple instances of the level you mapped to District.
Under Special Event, specify the level of the business structure where you want to apply the volume multiplier. A lower level provides more granularity, while a higher level lessens the effects of outliers. If special events tend to affect different categories in the same store differently, use Category; otherwise, Site may introduce less noise into the calculation of special event effects and therefore provide more accurate forecasts.
Under Other Configuration, define the following features.
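The trade-off between Category and Site can be illustrated by assuming, for the sake of the sketch, that a multiplier is event-day volume divided by baseline volume. That formula is an assumption, not the documented calculation, and the categories and volumes are hypothetical. A small, noisy category produces an extreme per-category multiplier, while pooling at Site dampens it.

```python
def event_multipliers(event: dict, baseline: dict) -> tuple:
    """ASSUMPTION: multiplier = event-day volume / baseline volume.
    Returns (per-category multipliers, one pooled site-level multiplier)."""
    per_category = {c: event[c] / baseline[c] for c in baseline}
    site_level = sum(event.values()) / sum(baseline.values())
    return per_category, site_level

# Hypothetical single-store volumes on a normal day and on a special event day.
baseline = {"Grocery": 1000.0, "Deli": 200.0, "Floral": 10.0}
event = {"Grocery": 1300.0, "Deli": 250.0, "Floral": 40.0}
per_category, site_level = event_multipliers(event, baseline)
print(per_category)           # Floral's small volume yields an outlier of 4.0
print(round(site_level, 3))   # pooled site multiplier is far less extreme
```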
- Training Period specifies the number of weeks in the past that the method uses to build the model. A minimum of two years of data plus the forecast horizon is required, and a full three years of data is generally recommended. A longer period generally provides more accuracy at the expense of computational time, but returns diminish after several years. If business conditions or activities changed radically at some point, including data from before the change may skew the model. If so, consider starting the training period after the change.
- Pooling Strategies specifies the business structure level where the engine pools volume data and features to create the machine learning model. Individual predictions are still made at the category level, but the model uses trends in historical data throughout the pool. The default setting, By Driver, generally provides the best results.
- Generic Department to Exclude allows you to specify one or more department types to exclude from the machine learning model. Excluding departments with known, significant data cleanliness problems, such as missing, negative, or fractional values, or data that is duplicated in other categories, will improve model accuracy considerably. Matching departments are excluded regardless of where they reside in the business structure being forecasted. Current Assigned displays a list of the excluded generic departments.
- Category to Exclude allows you to specify one or more categories which you want to exclude from the volume run. The categories must be selected as specific locations in the business structure; the exclusion is not inherited. Current Assigned displays a list of categories excluded.
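The two exclusion rules differ in scope, which this hypothetical Python sketch makes explicit: a generic department is excluded wherever it appears, while a category is excluded only at the exact location selected. The `Category` type, the path representation, and `modeled_categories` are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Category:
    # Location in the business structure, top level first (hypothetical format).
    path: tuple
    generic_department: str

def modeled_categories(categories: list, excluded_departments: set, excluded_paths: set) -> list:
    """Keeps categories that match neither exclusion rule."""
    return [
        c for c in categories
        # Generic department exclusion applies wherever the department resides.
        if c.generic_department not in excluded_departments
        # Category exclusion matches only the exact selected location (not inherited).
        and c.path not in excluded_paths
    ]

cats = [
    Category(("North", "Store 1", "Bakery", "Sales"), "Bakery"),
    Category(("North", "Store 2", "Bakery", "Sales"), "Bakery"),
    Category(("North", "Store 1", "Fuel", "Sales"), "Fuel"),
    Category(("South", "Store 9", "Fuel", "Sales"), "Fuel"),
    Category(("North", "Store 1", "Deli", "Items"), "Deli"),
    Category(("North", "Store 2", "Deli", "Items"), "Deli"),
]
kept = modeled_categories(cats, {"Fuel"}, {("North", "Store 1", "Deli", "Items")})
for c in kept:
    print(c.path)
```

Both Fuel categories drop out, but only the Store 1 Deli category is excluded; the Store 2 Deli category is still modeled.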
Click Save.
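Before saving, you can sanity-check a Training Period value against the guidance given for that field. This sketch assumes 52 weeks per year and treats three years as the recommended target, raised to the minimum if the forecast horizon is long; the function `training_period_status` and its return labels are hypothetical.

```python
WEEKS_PER_YEAR = 52  # simplification; a calendar year is slightly longer

def training_period_status(training_weeks: int, horizon_weeks: int) -> str:
    """Classifies a Training Period (in weeks) against the stated guidance:
    minimum = two years plus the forecast horizon, recommended = three years."""
    minimum = 2 * WEEKS_PER_YEAR + horizon_weeks
    recommended = max(3 * WEEKS_PER_YEAR, minimum)
    if training_weeks < minimum:
        return "below minimum"
    if training_weeks < recommended:
        return "meets minimum"
    return "recommended"

for weeks in (100, 120, 156):
    print(weeks, training_period_status(weeks, horizon_weeks=6))
```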