What is an AR Model?
An Auto Regressive Model, or simply an AR-Model, is a statistical model for time series.
The main assumption is that the value in time $t+1$ is a linear combination of the $p$ previous values.
The $p$ parameter is called the order of the model. Hence an auto regressive model of order $p$ can be concisily referred to as an $AR(p)$ model.
Given $\{x_t\}_{t=0}^{n}$ data points the main assumption of an AR($p$) model is expresed through the following equation:
$$ x_{t+1} = \alpha_0x_t + \cdots \alpha_{p-1}x_{t-p-1}. $$
In order to callibrate this model one needs to find the coefficients $alpha_i$, $i=0,\cdots,p-1$ that minimize a given loss function.
How Can It Be Written Using Matrices?
This section is aimed at expressing the same ideas using matrices and vectors. $$ X_t := \big( x_{t+p-1}, x_{t+p-2}, \cdots, x_{t+1}, x_t \big)^T, $$
$$ W_{xy} := (\alpha_0, \alpha_1, \cdots, \alpha_{p-2}, \alpha_{p-1}), $$
$$ Y_t := ( x_{t+p} ). $$ Hence, the same $AR(p)$ model is given by $$ Y_t = W_{xy}X_t. $$
What if a Hidden Unit is Added?
Having in a mind a classic neural network architecture, one could add a hidden unit.
Now instead of having a weight that takes $X_t$ straight up to $Y_t$, one decomposes it into $W_{xh}$ and $W_{hy}$ in such a way that $W_{xy} = W_{hy} W_{xh} $.
The equation then becomes $$ Y_t = W_{xy}X_t = W_{hy} W_{xh} X_t = W_{hy} h_t. $$
Now one can add recurrence between the the hidden units. That is, one can add a term $W_{hh}$ that integrates the value of $h_{t-1}$ into $h_t$: $$ Y_t = W_{hy} h_t = W_{hy} \big( W_{xh} X_t + W_{hh} h_{t-1} \big). $$
What if Activation Functions and Biases are Added?
Finally, one can add activation functions and biases. $$ \begin{split} Y_t &= \sigma_y \big( W_{hy} h_t + b_y \big) \\\ &= \sigma_y \Bigg( W_{hy} \sigma_h \Big( W_{xh} X_t + W_{hh} h_{t-1} + b_h \Big) + b_y \Bigg). \end{split} $$
Now note that $$ \begin{split} Y_t &= \sigma_y\big( W_{hy} h_t + b_y \big), \\\ h_t &= \sigma_h \Big( W_{xh} X_t + W_{hh} h_{t-1} + b_h \Big), \end{split} $$ is precisily the architecture of an Elman Network, introduced by Jeffrey L. Elman in $1990$.