Consider a log sequence “54→57”, and suppose the predicted probability distribution is “{18: 0.8, 56: 0.2}”, which means that the next step could be either “18” or “56”. This ambiguity could be caused by using an insufficient history sequence length. For example, if two tasks share the same workow segment “54→57”, the first task has a pattern “18→54→57→18” that is executed 80% of the time, and the second task has a pattern “31→54→57→56” that is executed 20% of the time. This will lead to a model that predicts “{18: 0.8, 56: 0.2}” given the sequence “54→57”.