API - Cost

To keep TensorLayer simple, we minimize the number of cost functions as much as possible, and we encourage you to use TensorFlow's own functions instead. For example, you can implement L1, L2 and sum regularization with tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer; see the TensorFlow API.
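
For instance, a minimal sketch (assuming TensorFlow 1.x with tf.contrib available, and a hypothetical weight tensor W) of how these regularizers can be combined:

import tensorflow as tf

W = tf.get_variable('W_example', shape=[784, 800])  # hypothetical weight matrix
l1 = tf.contrib.layers.l1_regularizer(1e-4)(W)      # L1 penalty on W
l2 = tf.contrib.layers.l2_regularizer(1e-4)(W)      # L2 penalty on W
# sum_regularizer combines several regularizer functions into a single one
l1_l2 = tf.contrib.layers.sum_regularizer(
    [tf.contrib.layers.l1_regularizer(1e-4),
     tf.contrib.layers.l2_regularizer(1e-4)])(W)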

Your cost function

TensorLayer provides a simple way to create your own cost function. Take the MLP below as an example.

# x is a placeholder for the input data, e.g. tf.placeholder(tf.float32, shape=[None, 784])
network = tl.layers.InputLayer(x, name='input')
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu1')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop2')
network = tl.layers.DenseLayer(network, n_units=800, act=tf.nn.relu, name='relu2')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop3')
network = tl.layers.DenseLayer(network, n_units=10, act=tf.identity, name='output')

The network parameters will be [W1, b1, W2, b2, W_out, b_out], so you can apply L2 regularization to the weight matrices of the first two layers as follows.

# y is the network output (network.outputs) and y_ is the placeholder of target labels
cost = tl.cost.cross_entropy(y, y_)
cost = (cost
        + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[0])
        + tf.contrib.layers.l2_regularizer(0.001)(network.all_params[2]))

Besides, TensorLayer provides an easy way to get all variables by a given name, so you can also apply L2 regularization to selected weights as follows.

l2 = 0
for w in tl.layers.get_variables_with_name('W_conv2d', train_only=True, printable=False):
    l2 += tf.contrib.layers.l2_regularizer(1e-4)(w)
cost = tl.cost.cross_entropy(y, y_) + l2

Regularization of Weights

After initializing the variables, the information about the network parameters can be inspected with network.print_params().

tl.layers.initialize_global_variables(sess)
network.print_params()
param 0: (784, 800) (mean: -0.000000, median: 0.000004 std: 0.035524)
param 1: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 2: (800, 800) (mean: 0.000029, median: 0.000031 std: 0.035378)
param 3: (800,) (mean: 0.000000, median: 0.000000 std: 0.000000)
param 4: (800, 10) (mean: 0.000673, median: 0.000763 std: 0.049373)
param 5: (10,) (mean: 0.000000, median: 0.000000 std: 0.000000)
num of params: 1276810

The output of the network is network.outputs, so the cross entropy can be defined as follows. To regularize the weights, note that network.all_params contains all parameters of the network; in this case, network.all_params = [W1, b1, W2, b2, W_out, b_out], matching param 0, 1, …, 5 shown by network.print_params(). Max-norm regularization on W1 and W2 can then be applied as follows.

y = network.outputs
# Alternatively, you can use tl.cost.cross_entropy(y, y_) instead.
cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=y_))
cost = cross_entropy
cost = (cost
        + tl.cost.maxnorm_regularizer(1.0)(network.all_params[0])
        + tl.cost.maxnorm_regularizer(1.0)(network.all_params[2]))

In addition, all of TensorFlow's regularizers, such as tf.contrib.layers.l2_regularizer, can be used with TensorLayer.

Regularization of Activation outputs

The instance method network.print_layers() prints the outputs of all layers in order. To regularize the activation outputs, you can use network.all_layers, which contains the output of every layer. For example, if you want to apply an L1 penalty to the activations of the first hidden layer, simply add tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1]) to the cost function, as sketched after the layer list below.

network.print_layers()
layer 0: Tensor("dropout/mul_1:0", shape=(?, 784), dtype=float32)
layer 1: Tensor("Relu:0", shape=(?, 800), dtype=float32)
layer 2: Tensor("dropout_1/mul_1:0", shape=(?, 800), dtype=float32)
layer 3: Tensor("Relu_1:0", shape=(?, 800), dtype=float32)
layer 4: Tensor("dropout_2/mul_1:0", shape=(?, 800), dtype=float32)
layer 5: Tensor("add_2:0", shape=(?, 10), dtype=float32)
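
Following the note above, a minimal sketch (assuming lambda_l1 is a small scalar such as 0.001, and cost already holds the cross-entropy defined earlier) of an L1 activation penalty on the first hidden layer:

lambda_l1 = 0.001  # hypothetical penalty strength
# layer 1 above is the output of the first hidden layer (the first ReLU)
cost = cost + tf.contrib.layers.l1_regularizer(lambda_l1)(network.all_layers[1])
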
The tensorlayer.cost module provides the following cost and regularization functions:

  • cross_entropy(output, target[, name]): Softmax cross-entropy operation; returns the TensorFlow expression of the cross-entropy between two distributions, applying softmax internally.
  • sigmoid_cross_entropy(output, target[, name]): Sigmoid cross-entropy operation; see tf.nn.sigmoid_cross_entropy_with_logits.
  • binary_cross_entropy(output, target[, …]): Computes the binary cross entropy given the output.
  • mean_squared_error(output, target[, is_mean]): Returns the TensorFlow expression of the mean squared error (L2) between two batches of data.
  • normalized_mean_square_error(output, target): Returns the TensorFlow expression of the normalized mean squared error between two distributions.
  • absolute_difference_error(output, target[, …]): Returns the TensorFlow expression of the absolute difference error (L1) between two batches of data.
  • dice_coe(output, target[, loss_type, axis, …]): Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation.
  • dice_hard_coe(output, target[, threshold, …]): Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation.
  • iou_coe(output, target[, threshold, axis, …]): Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data; usually used for evaluating binary image segmentation.
  • cross_entropy_seq(logits, target_seqs[, …]): Returns the cross-entropy of two sequences, applying softmax internally; normally used for fixed-length RNN outputs.
  • cross_entropy_seq_with_mask(logits, …[, …]): Returns the cross-entropy of two sequences with a mask, applying softmax internally; normally used for dynamic RNN outputs.
  • cosine_similarity(v1, v2): Cosine similarity in the range [-1, 1]; see wiki.
  • li_regularizer(scale[, scope]): Li regularization removes the neurons of the previous layer ('i' stands for inputs).
  • lo_regularizer(scale[, scope]): Lo regularization removes the neurons of the current layer ('o' stands for outputs).
  • maxnorm_regularizer([scale, scope]): Returns a function that can be used to apply max-norm regularization to weights.
  • maxnorm_o_regularizer(scale, scope): Max-norm output regularization removes the neurons of the current layer.
  • maxnorm_i_regularizer(scale[, scope]): Max-norm input regularization removes the neurons of the previous layer.

Softmax cross entropy

tensorlayer.cost.cross_entropy(output, target, name=None)[source]

Softmax cross-entropy operation. Returns the TensorFlow expression of the cross-entropy between two distributions, applying softmax internally. See tf.nn.sparse_softmax_cross_entropy_with_logits.

Parameters:
output : TensorFlow variable

A distribution with shape: [batch_size, n_feature].

target : TensorFlow variable

A batch of indices with shape: [batch_size, ].

name : string

Name of this loss.

References

  • About cross-entropy: wiki.
  • The code is borrowed from: here.

Examples

>>> ce = tl.cost.cross_entropy(y_logits, y_target_logits, 'my_loss')

Sigmoid cross entropy

tensorlayer.cost.sigmoid_cross_entropy(output, target, name=None)[source]

Sigmoid cross-entropy operation; see tf.nn.sigmoid_cross_entropy_with_logits.
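
A minimal usage sketch (assuming network.outputs are raw logits and y_ is a placeholder of binary targets with the same shape):

>>> loss = tl.cost.sigmoid_cross_entropy(network.outputs, y_, name='sigmoid_loss')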

Binary cross entropy

tensorlayer.cost.binary_cross_entropy(output, target, epsilon=1e-08, name='bce_loss')[source]

Computes binary cross entropy given output.

For brevity, let x = output, z = target. The binary cross entropy loss is

loss(x, z) = - sum_i (z[i] * log(x[i]) + (1 - z[i]) * log(1 - x[i]))

Parameters:
output : tensor of type float32 or float64.
target : tensor of the same type and shape as output.
epsilon : float

A small value added to avoid taking the log of zero.

name : string

An optional name to attach to this layer.
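
A minimal usage sketch (assuming network.outputs are raw logits, squashed to (0, 1) with a sigmoid, and y_ holds binary targets of the same shape):

>>> outputs = tf.nn.sigmoid(network.outputs)
>>> bce = tl.cost.binary_cross_entropy(outputs, y_)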

Mean squared error (L2)

tensorlayer.cost.mean_squared_error(output, target, is_mean=False)[source]

Returns the TensorFlow expression of the mean squared error (L2) between two batches of data.

Parameters:
output : 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, w, h] or [batch_size, w, h, c].
target : 2D, 3D or 4D tensor.
is_mean : boolean. If True, use tf.reduce_mean to compute the loss for each sample; otherwise use tf.reduce_sum (default).
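
A minimal usage sketch (assuming y_ is a placeholder with the same shape as network.outputs):

>>> mse = tl.cost.mean_squared_error(network.outputs, y_, is_mean=True)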

Normalized mean square error

tensorlayer.cost.normalized_mean_square_error(output, target)[source]

Returns the TensorFlow expression of the normalized mean squared error between two distributions.

Parameters:
output : 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, w, h] or [batch_size, w, h, c].
target : 2D, 3D or 4D tensor.
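
A minimal usage sketch (same assumptions as for mean_squared_error above):

>>> nmse = tl.cost.normalized_mean_square_error(network.outputs, y_)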

Absolute difference error (L1)

tensorlayer.cost.absolute_difference_error(output, target, is_mean=False)[source]

Returns the TensorFlow expression of the absolute difference error (L1) between two batches of data.

Parameters:
output : 2D, 3D or 4D tensor i.e. [batch_size, n_feature], [batch_size, w, h] or [batch_size, w, h, c].
target : 2D, 3D or 4D tensor.
is_mean : boolean. If True, use tf.reduce_mean to compute the loss for each sample; otherwise use tf.reduce_sum (default).
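
A minimal usage sketch (same assumptions as for mean_squared_error above):

>>> mae = tl.cost.absolute_difference_error(network.outputs, y_, is_mean=True)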

Dice coefficient

tensorlayer.cost.dice_coe(output, target, loss_type='jaccard', axis=[1, 2, 3], smooth=1e-05)[source]

Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. the labels are binary. The coefficient ranges from 0 to 1, where 1 means a perfect match.

Parameters:
output : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

target : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

loss_type : string

jaccard or sorensen, default is jaccard.

axis : list of integer

All dimensions are reduced, default [1,2,3].

smooth : float

This small value is added to the numerator and denominator. If both output and target are empty, it ensures that the dice score is 1. If either output or target is empty (all pixels are background), dice = smooth / (small_value + smooth); if smooth is very small, the dice score will be close to 0 (even when the image values are below the threshold), so in this case a larger smooth gives a higher dice score.

Examples

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)

Hard Dice coefficient

tensorlayer.cost.dice_hard_coe(output, target, threshold=0.5, axis=[1, 2, 3], smooth=1e-05)[source]

Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. the labels are binary. The coefficient ranges from 0 to 1, where 1 means a perfect match.

Parameters:
output : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

target : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

threshold : float

The threshold above which a value is counted as true (foreground).

axis : list of integer

All dimensions are reduced, default [1,2,3].

smooth : float

This small value will be added to the numerator and denominator, see dice_coe.
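
A minimal usage sketch, mirroring the dice_coe example above (assuming y_ holds binary target maps of the same shape as outputs):

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)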

IOU coefficient

tensorlayer.cost.iou_coe(output, target, threshold=0.5, axis=[1, 2, 3], smooth=1e-05)[source]

Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation. The coefficient ranges from 0 to 1, where 1 means a perfect match.

Parameters:
output : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

target : tensor

A distribution with shape: [batch_size, ….], (any dimensions).

threshold : float

The threshold above which a value is counted as true (foreground).

axis : list of integer

All dimensions are reduced, default [1,2,3].

smooth : float

This small value will be added to the numerator and denominator, see dice_coe.

Notes

  • IoU cannot be used as a training loss; people usually use the dice coefficient for training, and IoU and hard dice for evaluation.
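
A minimal usage sketch for evaluation, under the same assumptions as the dice examples above:

>>> outputs = tl.act.pixel_wise_softmax(network.outputs)
>>> iou = tl.cost.iou_coe(outputs, y_, threshold=0.5)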

Cross entropy for sequence

tensorlayer.cost.cross_entropy_seq(logits, target_seqs, batch_size=None)[source]

Returns the expression of the cross-entropy of two sequences, applying softmax internally. Normally used for fixed-length RNN outputs.

Parameters:
logits : TensorFlow variable

2D tensor, network.outputs, of shape [batch_size * n_steps (n_examples), number of output units].

target_seqs : TensorFlow variable

A 2D tensor of shape [batch_size, n_steps]. If the number of steps is dynamic, use cross_entropy_seq_with_mask instead.

batch_size : None or int.

If not None, the returned cost will be divided by batch_size.

Examples

>>> # see the PTB tutorial for more details
>>> input_data = tf.placeholder(tf.int32, [batch_size, num_steps])
>>> targets = tf.placeholder(tf.int32, [batch_size, num_steps])
>>> cost = tl.cost.cross_entropy_seq(network.outputs, targets)

Cross entropy with mask for sequence

tensorlayer.cost.cross_entropy_seq_with_mask(logits, target_seqs, input_mask, return_details=False, name=None)[source]

Returns the expression of the cross-entropy of two sequences, applying softmax internally. Normally used for dynamic RNN outputs.

Parameters:
logits : network identity outputs

2D tensor, network.outputs, [batch_size, number of output units].

target_seqs : int tensor, e.g. word IDs.

[batch_size, ?]

input_mask : the mask used to compute the loss

The same shape as target_seqs, normally containing 0s and 1s.

return_details : boolean
  • If False (default), only returns the loss.
  • If True, returns the loss, losses, weights and targets (reshaped to one vector).

Examples

  • see Image Captioning Example.
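
In addition, a minimal sketch (assuming batch_size and n_steps are Python integers, target_seqs and input_mask are int32 placeholders of shape [batch_size, n_steps], and network.outputs are the logits described above):

>>> target_seqs = tf.placeholder(tf.int32, [batch_size, n_steps])
>>> input_mask = tf.placeholder(tf.int32, [batch_size, n_steps])
>>> loss = tl.cost.cross_entropy_seq_with_mask(network.outputs, target_seqs, input_mask)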

Cosine similarity

tensorlayer.cost.cosine_similarity(v1, v2)[source]

Cosine similarity, in the range [-1, 1]; see wiki.

Parameters:
v1, v2 : tensor of [batch_size, n_feature], with the same number of features.
Returns:
a tensor of [batch_size, ]
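
A minimal usage sketch (assuming embed_a and embed_b are hypothetical tensors of shape [batch_size, n_feature]):

>>> sim = tl.cost.cosine_similarity(embed_a, embed_b)  # tensor of shape [batch_size, ]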

Regularization functions

For tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer, see TensorFlow API.

Maxnorm

tensorlayer.cost.maxnorm_regularizer(scale=1.0, scope=None)[source]

Max-norm regularization returns a function that can be used to apply max-norm regularization to weights. About max-norm: wiki.

The implementation follows TensorFlow contrib.

Parameters:
scale : float

A scalar multiplier Tensor. 0.0 disables the regularizer.

scope: An optional scope name.
Returns:
A function with signature `mn(weights, name=None)` that applies max-norm regularization to the weights.
Raises:
ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
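
A minimal usage sketch, reusing the MLP and cost from the beginning of this page (the returned function is applied to a weight matrix):

>>> mn = tl.cost.maxnorm_regularizer(1.0)
>>> cost = cost + mn(network.all_params[0])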

Special

tensorlayer.cost.li_regularizer(scale, scope=None)[source]

Li regularization removes the neurons of the previous layer; 'i' stands for inputs.

Returns a function that can be used to apply group li regularization to weights.

The implementation follows TensorFlow contrib.

Parameters:
scale : float

A scalar multiplier Tensor. 0.0 disables the regularizer.

scope: An optional scope name for TF12+.
Returns:
A function with signature `li(weights, name=None)` that applies Li regularization.
Raises:
ValueError : if scale is outside of the range [0.0, 1.0] or if scale is not a float.
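
A minimal usage sketch (assuming the MLP and cost defined at the beginning of this page):

>>> cost = cost + tl.cost.li_regularizer(0.001)(network.all_params[0])
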
tensorlayer.cost.lo_regularizer(scale, scope=None)[source]

Lo regularization removes the neurons of the current layer; 'o' stands for outputs.

Returns a function that can be used to apply group lo regularization to weights.

The implementation follows TensorFlow contrib.

Parameters:
scale : float

A scalar multiplier Tensor. 0.0 disables the regularizer.

scope: An optional scope name for TF12+.
Returns:
A function with signature `lo(weights, name=None)` that applies Lo regularization.
Raises:
ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_o_regularizer(scale, scope)[source]

Max-norm output regularization removes the neurons of current layer.

Returns a function that can be used to apply max-norm regularization to each column of weight matrix.

The implementation follows TensorFlow contrib.

Parameters:
scale : float

A scalar multiplier Tensor. 0.0 disables the regularizer.

scope: An optional scope name.
Returns:
A function with signature `mn_o(weights, name=None)` that applies max-norm output regularization.
Raises:
ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_i_regularizer(scale, scope=None)[source]

Max-norm input regularization removes the neurons of previous layer.

Returns a function that can be used to apply max-norm regularization to each row of weight matrix.

The implementation follows TensorFlow contrib.

Parameters:
scale : float

A scalar multiplier Tensor. 0.0 disables the regularizer.

scope: An optional scope name.
Returns:
A function with signature `mn_i(weights, name=None)` that applies max-norm input regularization.
Raises:
ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
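
For the max-norm output/input variants, a minimal usage sketch (assuming the MLP and cost defined at the beginning of this page; note that maxnorm_o_regularizer takes an explicit scope name):

>>> cost = cost + tl.cost.maxnorm_o_regularizer(0.001, scope='maxnorm_o')(network.all_params[0])
>>> cost = cost + tl.cost.maxnorm_i_regularizer(0.001)(network.all_params[0])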