Special Topics: Group Cost Types

There are two very different kinds of cost functions: error functions and unit output cost functions. Error functions measure how far the outputs are from the targets. Unit output cost functions simply charge a unit for producing certain outputs, such as non-binary ones. Error functions assess no error on a unit whose target is NaN.

Error Types

SUM_SQUARED: This simply takes the sum over all units of the squared difference between the output and target. It is the default only for LINEAR output groups.
CROSS_ENTROPY: This is the sum over all units of:
t log(t/o) + (1-t) log((1-t)/(1-o)),
where t is the target and o is the output. This can become infinite if an output reaches 0.0 or 1.0 when its target does not equal that value, which may happen if the training parameters are too aggressive. Lens therefore caps the error at a very large value. CROSS_ENTROPY is the default error type for most output groups.
DIVERGENCE: This is the sum over all units of:
t log(t/o),
where t is the target and o is the output. This is only stable if the target vector and output vector are each normalized to sum to 1.0. This is the default error type for SOFT_MAX output groups.
COSINE: This calculates 1.0 minus the cosine of the angle between the output and target vectors. It can be used for training as well as evaluation. However, training can be tricky because there is only pressure for the angle of the output vector to be correct, not the absolute values of the outputs. You could use a unit cost function (such as LOGISTIC_COST) on the output units to encourage them to be binary if that is desired.
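
As a rough illustration of the four error types above, here is a minimal Python sketch. It is not Lens code: the function names, the ERROR_CAP value, the convention that 0 log 0 is 0, and the handling of zero-length vectors are all assumptions made for the example. Units with a NaN target are skipped, as described at the top of this page.

```python
import math

ERROR_CAP = 1e10  # hypothetical cap; the value Lens actually uses is not given here


def _active(outputs, targets):
    """Yield (output, target) pairs, skipping units whose target is NaN."""
    for o, t in zip(outputs, targets):
        if not math.isnan(t):
            yield o, t


def _t_log_ratio(t, o):
    """t * log(t / o), taking 0 * log(0 / o) as 0 and capping the result."""
    if t == 0.0:
        return 0.0
    if o <= 0.0:
        return ERROR_CAP  # the true value would be infinite
    return min(t * math.log(t / o), ERROR_CAP)


def sum_squared(outputs, targets):
    return sum((o - t) ** 2 for o, t in _active(outputs, targets))


def cross_entropy(outputs, targets):
    return sum(_t_log_ratio(t, o) + _t_log_ratio(1.0 - t, 1.0 - o)
               for o, t in _active(outputs, targets))


def divergence(outputs, targets):
    return sum(_t_log_ratio(t, o) for o, t in _active(outputs, targets))


def cosine_error(outputs, targets):
    pairs = list(_active(outputs, targets))
    dot = sum(o * t for o, t in pairs)
    norm_o = math.sqrt(sum(o * o for o, _ in pairs))
    norm_t = math.sqrt(sum(t * t for _, t in pairs))
    if norm_o == 0.0 or norm_t == 0.0:
        return 1.0  # assumption: treat a zero vector as orthogonal to anything
    return 1.0 - dot / (norm_o * norm_t)
```

Note that cosine_error gives the same result for outputs [1.0, 0.0] and [0.1, 0.0] against the same target, which is why COSINE alone puts no pressure on the absolute values of the outputs.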

Target Types

TARGET_COPY: The units in a group with a TARGET_COPY cost function will copy their targets from some field in the corresponding units of another group. The copyConnect command must be used to specify which group and which field will be the source of the copying. The TARGET_COPY type should be specified prior to the main error type.

Output Cost Types

Unit output costs are error terms that penalize units for having certain outputs. For bounded units (ones whose outputs are limited to a finite range), there are five unit cost functions, all of which encourage the unit to have binary output. Non-bounded units can have one of two cost functions that encourage the unit to be silent. Output costs would typically only be applied to hidden layers, although they may be useful on output layers as well. They can be used with simple and continuous networks, but not with Boltzmann machines.

When used on a bounded group, the cost functions will be low at the extremes and will have a maximum cost of 1.0 at the outputCostPeak, which is typically at 0.5.

LINEAR_COST: For a bounded unit this changes linearly from 1.0 at the peak to 0.0 at the min and max output. For an unbounded unit, this is simply equal to the absolute value of the output.
QUADRATIC_COST: For a bounded unit, this has a derivative of 0 at the extremes and slopes up concavely to the peak. For unbounded units this is equal to the output squared.
CONV_QUAD_COST: This can only be used on bounded units. It is shaped like a downward-facing parabola. The derivative is 0 at the peak.
LOGISTIC_COST: This can only be used on bounded units. It is similar in shape to the CONV_QUAD_COST but the derivative goes to infinity as it approaches the extremes. However, the derivative is capped as if the output could not get closer than 1e-6 of the min or max.
COSINE_COST: This can only be used on bounded units. It has zero derivative at the min, max, and the peak.
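
The descriptions of LINEAR_COST and of the unbounded form of QUADRATIC_COST are specific enough to sketch directly. The Python below is illustrative only and is not taken from Lens: the function names and the lo, hi, and peak parameters are hypothetical, with peak standing in for outputCostPeak. The bounded forms of the other functions are omitted because their exact formulas are not given here.

```python
def linear_cost(output, lo=None, hi=None, peak=0.5):
    """LINEAR_COST as described above.

    Unbounded unit (lo and hi both None): absolute value of the output.
    Bounded unit: 1.0 at `peak`, falling linearly to 0.0 at `lo` and `hi`.
    Assumes lo < peak < hi for bounded units.
    """
    if lo is None and hi is None:
        return abs(output)
    if output <= peak:
        return (output - lo) / (peak - lo)
    return (hi - output) / (hi - peak)


def quadratic_cost_unbounded(output):
    """QUADRATIC_COST for an unbounded unit: the output squared."""
    return output * output
```

For a unit bounded in [0, 1] with the peak moved to 0.25, linear_cost(0.25, 0.0, 1.0, peak=0.25) is 1.0 and the cost still falls to 0.0 at both extremes.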

The following figure shows the derivatives of the above functions:

Here are the functions as they would appear with an outputCostPeak of 0.25. Note that CONV_QUAD_COST and LOGISTIC_COST are not necessarily 0.0 at the extremes, although no function will become negative:

And the derivatives:

The network's outputCostStrength scales the derivatives when they are injected into the units' outputDeriv fields. Generally, a value of about the same order of magnitude as the learning rate should be reasonable. However, you may not want to activate unit costs too early in training, or the units will get pinned. The network's outputCostStrength does not affect the outputCost as calculated for the whole network; it affects only the derivatives.

Groups can be given their own outputCostStrength and outputCostPeak to override the network defaults. If the group's unit cost strength is different from the network's, the group's contribution to the network's unit cost will be scaled by their ratio. In this way, if the cost of some groups is more important than that of others, it will be reflected in the outputCost.
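
The scaling rules can be sketched as follows. This is an illustration under stated assumptions, not the Lens implementation: the function names and data layout are hypothetical, and the ratio is assumed to be the group's strength divided by the network's, so that a group with a larger strength contributes more to the outputCost.

```python
def injected_derivative(cost_derivative, strength):
    """Cost derivative as scaled before being added to a unit's outputDeriv."""
    return strength * cost_derivative


def network_output_cost(group_costs, network_strength):
    """Sum group costs, weighting each by group strength / network strength.

    `group_costs` is a list of (raw_cost, group_strength) pairs, where
    group_strength is None for a group that uses the network default.
    Assumes network_strength is non-zero.
    """
    total = 0.0
    for raw_cost, group_strength in group_costs:
        if group_strength is None:
            group_strength = network_strength
        total += raw_cost * group_strength / network_strength
    return total
```

Under these assumptions, a group whose strength matches the network's contributes its raw cost unchanged, so changing the network's outputCostStrength alone leaves the outputCost as it was.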

Douglas Rohde

Last modified: Fri Nov 10 23:02:30 EST 2000