Training was performed with stochastic gradient descent for 60 epochs, using an initial learning rate of 0.001 and a momentum of 0.9. The learning rate was reduced by a factor of 10 after one-third and two-thirds of the training had completed (i.e., after epochs 20 and 40). The network weights were initialized with He normal initialization [19]. The dataset was augmented with rotations and horizontal flips, both to balance the ratio of positive to negative examples and to improve the generalizability of the model. Hyperparameters were tuned to give the best performance on the validation set. Training took 40 hours on a single Nvidia GTX 1080Ti GPU.
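For illustration, a minimal PyTorch-style sketch of this training configuration is given below. The network, data loader, loss function, and augmentation pipeline are placeholders and assumptions, not specified by the description above; only the optimizer, learning-rate schedule, and weight initialization follow the stated setup.

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

EPOCHS = 60  # total number of training epochs

def he_init(module):
    # He normal (Kaiming) initialization for conv/linear weights [19]
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def train(model, train_loader, device='cuda'):
    # 'model' and 'train_loader' are hypothetical; the actual network and the
    # rotation / horizontal-flip augmentation pipeline are not reproduced here.
    model.to(device)
    model.apply(he_init)

    # Cross-entropy loss is an assumption; the loss is not stated above.
    criterion = nn.CrossEntropyLoss()

    # SGD with the stated initial learning rate and momentum
    optimizer = SGD(model.parameters(), lr=1e-3, momentum=0.9)

    # Reduce the learning rate by 10x after 1/3 and 2/3 of training
    # (epochs 20 and 40 out of 60)
    scheduler = MultiStepLR(optimizer,
                            milestones=[EPOCHS // 3, 2 * EPOCHS // 3],
                            gamma=0.1)

    for epoch in range(EPOCHS):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # step the schedule once per epoch
```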