Within a single LSTM block, the input (e.g., m_{t-i}) and the previous output (H_{t-i-1}) are used to decide (1) how much of the previous cell state C_{t-i-1} to retain in state C_{t-i}, (2) how the current input and the previous output should influence the state, and (3) how to construct the output H_{t-i}. This is accomplished using a set of gating functions that determine the state dynamics by controlling how much information to keep from the input and the previous output, and how much information flows to the next step. Each gating function is parameterized by a set of weights to be learned. The expressive capacity of an LSTM block is determined by its number of memory units (i.e., the dimensionality of the hidden state vector H). Due to space constraints, we refer the reader to NLP primers (e.g., [12]) for a formal characterization of LSTMs.
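The three decisions above correspond to the standard LSTM gating equations. As a minimal sketch (not the paper's implementation), the following NumPy function computes one LSTM step; the weight-matrix names W, U, b are illustrative placeholders:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM block.

    W, U, b are dicts of input weights, recurrent weights, and biases,
    keyed by gate name ('f', 'i', 'g', 'o'); names are illustrative.
    """
    # (1) forget gate: how much of the previous cell state to retain
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])
    # (2) input gate and candidate: how input and previous output influence the state
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])
    c_t = f * c_prev + i * g
    # (3) output gate: how to construct the output from the new state
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

The number of memory units is the dimensionality of h_t (and c_t); each gate's weights are learned during training.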