Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network.
A Sherstinsky.
Physica D: Nonlinear Phenomena, 404, 2020.

Neural networks and deep learning.
CC Aggarwal.
Springer, 2018.

Adam: A method for stochastic optimization.
DP Kingma and J Ba.
arXiv preprint arXiv:1412.6980, 2014.

Learning precise timing with LSTM recurrent networks.
FA Gers, NN Schraudolph and J Schmidhuber.
Journal of Machine Learning Research, 3(1):115–143, 2003.

Neural networks for pattern recognition.
CM Bishop.
Oxford University Press, 1995.

Training feedforward networks with the Marquardt algorithm.
MT Hagan and M Menhaj.
IEEE Transactions on Neural Networks, 5(6):989–993, 1994.

First and second order methods for learning: Between steepest descent and Newton’s method.
R Battiti.
Neural Computation, 4(2):141–166, 1992.

Multilayer feedforward networks are universal approximators.
K Hornik, M Stinchcombe and H White.
Neural Networks, 2(5):359–366, 1989.

Learning representations by back-propagating errors.
DE Rumelhart, GE Hinton and RJ Williams.
Nature, 323:533–536, 1986.

Function minimization by conjugate gradients.
R Fletcher and CM Reeves.
Computer Journal, 7:149–154, 1964.

Principles of Neurodynamics.
F Rosenblatt.
Washington D.C.: Spartan Press, 1961.

The Organization of Behavior.
DO Hebb.
New York: Wiley, 1949.

A logical calculus of the ideas immanent in nervous activity.
WS McCulloch and WH Pitts.
Bulletin of Mathematical Biophysics, 5:115–133, 1943.
