Fully recurrent networks differ from simple recurrent networks in that fully recurrent networks use concurrent updates and propagate error derivatives backwards through time. A feed-forward or simple recurrent network will transfer activity from the input layer to the output layer in a single tick: each group in order updates its inputs and immediately updates its outputs. In a fully recurrent network, on the other hand, all groups first update their inputs and then all groups update their outputs. Therefore, information can propagate across just one set of connections per tick.
What are typically called recurrent-backprop-through-time (RBPTT) networks use a single tick per time interval. Therefore, the unit activations will change completely on each tick. Continuous networks are a more general version of fully recurrent networks which use more than one tick per interval and integrate the unit inputs or outputs, causing them to change gradually. Because these are really part of a continuum, it's not always meaningful to draw a distinction between RBPTTs and continuous networks and they will collectively be called "fully recurrent" to distinguish them from simple recurrent networks.
Example events are the same in continuous, fully recurrent, and standard networks, each having an optional minTime, maxTime, and graceTime, with the example set defaults used when a value is not specified. These are specified in terms of time intervals, not in ticks, so the scale at which a continuous network is simulated can be changed without altering the example files. In the case of continuous networks, it becomes useful to have event time values that are not integers.
With continuous networks, the first tick of each example is used to set the initial outputs of the units. This tick therefore has no event associated with it. This is not necessary in standard networks because updates are sequential.
A continuous network must be given the CONTINUOUS type when it is created. An existing network will not become continuous if the setTime command is used to set the ticksPerInterval to something other than 1, as it would have in previous versions of Lens. There is no way to change the type of a network once it is created.
Hidden and output groups in continuous networks will typically have an IN_INTEGR or OUT_INTEGR function appended to the input or output pipeline. These integrate the inputs or outputs over time, forcing them to change gradually. OUT_INTEGR is the default. If a hidden or output group is created in a continuous network, OUT_INTEGR will be added automatically unless IN_INTEGR is explicitly specified. If a unit is using a logistic (sigmoidal) output function, IN_INTEGR tends to allow the unit to move towards the extremes (away from an output of 0.5) more easily but resists movement away from the extremes, relative to OUT_INTEGR.
The following creates a continuous network with a maximum of 10 time intervals per example, 5 ticks per time interval, an output-integrating hidden layer, and an input-integrating output layer:
addNet cont1 -i 10 -t 5 addGroup input 10 INPUT addGroup hidden 20 addGroup output 10 IN_INTEGR connectGroups {input output} hidden outputOr, equivalently:
addNet cont1 -i 10 -t 5 10 20 10 IN_INTEGR connectGroups output hidden
To create a RBPTT network, which has one tick per time interval, you
will need to use the CONTINUOUS
network type, as in:
addNet myRBPTT -i 5 CONTINUOUS
When training continuous networks, the network is run in the forward mode through all of the ticks in the example. Then there is a single backward pass through all of the ticks in the example and error derivatives are injected at the appropriate time.
The dt parameter determines how quickly the input and output integrators will respond to a change in value. By default, dt is equal to 1/ticksPerInterval. Whenever the ticksPerInterval is changed using setTime the dt will be recalculated, unless you explicitly prevent it using the "-dtfixed" flag.
However, it is also possible to change dt to something other than the default value. You might view dt as the product of the small increment of time and the network's time constant. It could be increased to reflect a shorter network-wide time constant, causing units to change their states more rapidly. Each group and unit also have their own dtScale parameters. These are multiplied by the network's dt to produce the effective dt for each unit. These function as the groups' and the units' individual time constants.
When training continuous networks, the error and unit output costs assessed on each tick are scaled by 1/ticksPerInterval. Therefore, if you double the ticksPerInterval, the error won't double as well just because it is being calculated twice as often.
Often when running continuous networks, you do not want the targets to come on at the same time as the inputs, but only after a delay. The event's graceTime parameter controls this. The graceTime is specified in terms of time intervals, not ticks. In general, the event's minTime should be longer than the graceTime. Otherwise the event will stop at the end of the graceTime because there is no error.
An event will stop, and the next event begin under several conditions. One is if the event has already lasted for its maxTime. Another is if each group's criterion function is satisfied, meaning that they adequately matched their targets. OUTPUT groups in a continuous network are given the STANDARD_CRIT by default.
In some networks, you do not want the example to continue unless the network performs adequately on each event. If the network's groupCritRequired flag is true, the next event will only begin if the group criterion was reached on the previous event. This is not the default behavior.