To investigate whether training on data with similar dynamics improves performance, we train a second LSTM model on one third of the frames of OTB-30, together with their ground-truth labels, and test on the full sequences. The OPE result is shown in Fig. 7(b). We find that once trained on auxiliary frames with similar dynamics, ROLO performs better on the test sequences. This property makes ROLO especially useful in surveillance environments, where models can be trained offline on pre-captured data. Motivated by this, we experiment with increasing the number of training frames, expecting further improvement. We train a third LSTM model with only one third of the ground-truth labels, but with all of the sequence frames. The results in Fig. 7(c) show that even without additional ground-truth boxes, performance increases dramatically when more frames are used to learn the dynamics. This also suggests that the training data in the benchmark is quite limited for tracking [19]. The SRE and TRE results of this model are shown in Fig. 8 for robustness evaluation. The average overlap score (AOS) for each video sequence is reported in Table 1. Our method achieves the best performance on most test sequences, often outperforming the second-best method by a large margin.
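As a concrete illustration, the following minimal Python sketch shows the two training splits described above (every third frame with its label, versus all frames with sparse supervision) and the per-sequence average overlap score. This is not the ROLO implementation: the (x, y, w, h) box format and the helper names (iou, average_overlap_score, split_frames_and_labels, split_sparse_labels) are our own assumptions for illustration.

import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def average_overlap_score(pred_boxes, gt_boxes):
    """AOS for one sequence: mean IoU between predictions and ground truth."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]))

def split_frames_and_labels(frames, labels, step=3):
    """Split for the second model: every third frame with its label."""
    return frames[::step], labels[::step]

def split_sparse_labels(frames, labels, step=3):
    """Split for the third model: all frames, but a boolean mask marks
    the 1/3 of frames where the training loss is actually supervised."""
    mask = np.zeros(len(frames), dtype=bool)
    mask[::step] = True  # loss is computed only where mask is True
    return frames, labels, mask

Under these assumptions, calling average_overlap_score(pred, gt) on one test sequence would produce a per-sequence AOS entry analogous to those in Table 1, while the two split functions reproduce the 1/3-frames and 1/3-labels training regimes compared in Fig. 7(b) and Fig. 7(c).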