# API - Cost

To keep TensorLayer simple, we minimize the number of cost functions as much as we can. We therefore encourage you to use TensorFlow's own functions where possible; see the TensorFlow API.

Note

Please refer to Getting Started to learn how to retrieve specific weights for weight regularization.

| Function | Description |
| --- | --- |
| `cross_entropy(output, target[, name])` | Softmax cross-entropy operation; returns the TensorFlow expression of cross-entropy between two distributions, implementing softmax internally. |
| `sigmoid_cross_entropy(output, target[, name])` | Sigmoid cross-entropy operation; see `tf.nn.sigmoid_cross_entropy_with_logits`. |
| `binary_cross_entropy(output, target[, …])` | Binary cross-entropy operation. |
| `mean_squared_error(output, target[, …])` | Returns the TensorFlow expression of the mean squared error (L2) between two batches of data. |
| `normalized_mean_square_error(output, target)` | Returns the TensorFlow expression of the normalized mean squared error of two distributions. |
| `absolute_difference_error(output, target[, …])` | Returns the TensorFlow expression of the absolute difference error (L1) between two batches of data. |
| `dice_coe(output, target[, loss_type, axis, …])` | Soft Dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation, i.e. binary labels. |
| `dice_hard_coe(output, target[, threshold, …])` | Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation, i.e. binary labels. |
| `iou_coe(output, target[, threshold, axis, …])` | Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data; usually used for evaluating binary image segmentation. |
| `cross_entropy_seq(logits, target_seqs[, …])` | Returns the expression of cross-entropy of two sequences, implementing softmax internally. |
| `cross_entropy_seq_with_mask(logits, …[, …])` | Returns the expression of cross-entropy of two sequences with a mask, implementing softmax internally. |
| `cosine_similarity(v1, v2)` | Cosine similarity, in the range [-1, 1]. |
| `li_regularizer(scale[, scope])` | Li regularization removes the neurons of the previous layer. |
| `lo_regularizer(scale)` | Lo regularization removes the neurons of the current layer. |
| `maxnorm_regularizer([scale])` | Max-norm regularization; returns a function that can be used to apply max-norm regularization to weights. |
| `maxnorm_o_regularizer(scale)` | Max-norm output regularization removes the neurons of the current layer. |
| `maxnorm_i_regularizer(scale)` | Max-norm input regularization removes the neurons of the previous layer. |
| `huber_loss(output, target[, is_mean, delta, …])` | Huber loss operation; see https://en.wikipedia.org/wiki/Huber_loss. |

## Softmax cross entropy

`tensorlayer.cost.cross_entropy(output, target, name=None)`

Softmax cross-entropy operation. Returns the TensorFlow expression of cross-entropy between two distributions; it implements softmax internally. See `tf.nn.sparse_softmax_cross_entropy_with_logits`.

Parameters
• output (Tensor) – A batch of distributions with shape [batch_size, num_classes].

• target (Tensor) – A batch of class indices with shape [batch_size].

• name (string) – Name of this loss.

Examples

```>>> import tensorlayer as tl
>>> # y_logits: [batch_size, num_classes]; y_labels: class indices with shape [batch_size]
>>> ce = tl.cost.cross_entropy(y_logits, y_labels, name='my_loss')
```

## Sigmoid cross entropy

`tensorlayer.cost.sigmoid_cross_entropy(output, target, name=None)`

Sigmoid cross-entropy operation; see `tf.nn.sigmoid_cross_entropy_with_logits`.

Parameters
• output (Tensor) – A batch of distributions with shape [batch_size, num_classes].

• target (Tensor) – A batch of labels with the same shape as output.

• name (string) – Name of this loss.
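
Examples

A minimal usage sketch (tensor names and values are illustrative; targets for the sigmoid loss are 0/1 labels with the same shape as the logits):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> logits = tf.constant([[2.0, -1.0], [0.5, 1.5]])  # raw model outputs
>>> labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])   # binary targets, same shape
>>> loss = tl.cost.sigmoid_cross_entropy(logits, labels, name='sigmoid_loss')
```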

## Binary cross entropy

`tensorlayer.cost.binary_cross_entropy(output, target, epsilon=1e-08, name='bce_loss')`

Binary cross-entropy operation.

Parameters
• output (Tensor) – Tensor with type of float32 or float64.

• target (Tensor) – The target distribution, in the same format as output.

• epsilon (float) – A small value added to avoid taking the logarithm of zero.

• name (str) – An optional name to attach to this function.
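
Examples

A minimal usage sketch (values illustrative; output is assumed to hold probabilities in (0, 1), e.g. after a sigmoid):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> probs = tf.constant([[0.9, 0.1], [0.4, 0.6]])    # predicted probabilities
>>> labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])   # binary targets, same shape
>>> bce = tl.cost.binary_cross_entropy(probs, labels)
```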

## Mean squared error (L2)

`tensorlayer.cost.mean_squared_error(output, target, is_mean=False, axis=-1, name='mean_squared_error')`

Return the TensorFlow expression of the mean squared error (L2) between two batches of data.

Parameters
• output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].

• target (Tensor) – The target distribution, in the same format as output.

• is_mean (boolean) –

Whether to compute the mean or the sum for each example.
• If True, use `tf.reduce_mean` to compute the loss between one target and predicted data.

• If False, use `tf.reduce_sum` (default).

• axis (int or list of int) – The dimensions to reduce.

• name (str) – An optional name to attach to this function.
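
Examples

A minimal usage sketch (values illustrative):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> y_true = tf.constant([[1.5, 2.0], [3.0, 3.0]])
>>> mse = tl.cost.mean_squared_error(y_pred, y_true, is_mean=True)
```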

## Normalized mean square error

`tensorlayer.cost.normalized_mean_square_error(output, target, axis=-1, name='normalized_mean_squared_error_loss')`

Return the TensorFlow expression of the normalized mean squared error of two distributions.

Parameters
• output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].

• target (Tensor) – The target distribution, in the same format as output.

• axis (int or list of int) – The dimensions to reduce.

• name (str) – An optional name to attach to this function.
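
Examples

A minimal usage sketch (values illustrative; assuming the usual definition of NMSE, the error is scaled by the magnitude of the target):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> y_true = tf.constant([[1.5, 2.0], [3.0, 3.0]])
>>> nmse = tl.cost.normalized_mean_square_error(y_pred, y_true)
```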

## Absolute difference error (L1)

`tensorlayer.cost.absolute_difference_error(output, target, is_mean=False, axis=-1, name='absolute_difference_error_loss')`

Return the TensorFlow expression of the absolute difference error (L1) between two batches of data.

Parameters
• output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].

• target (Tensor) – The target distribution, in the same format as output.

• is_mean (boolean) –

Whether to compute the mean or the sum for each example.
• If True, use `tf.reduce_mean` to compute the loss between one target and predicted data.

• If False, use `tf.reduce_sum` (default).

• axis (int or list of int) – The dimensions to reduce.

• name (str) – An optional name to attach to this function.
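
Examples

A minimal usage sketch (values illustrative):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> y_true = tf.constant([[1.5, 2.0], [3.0, 3.0]])
>>> l1 = tl.cost.absolute_difference_error(y_pred, y_true, is_mean=True)
```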

## Dice coefficient

`tensorlayer.cost.dice_coe(output, target, loss_type='jaccard', axis=(1, 2, 3), smooth=1e-05)`

Soft Dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. binary labels. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters
• output (Tensor) – A distribution with shape [batch_size, ….] (any dimensions).

• target (Tensor) – The target distribution, in the same format as output.

• loss_type (str) – `jaccard` or `sorensen`, default is `jaccard`.

• axis (tuple of int) – All dimensions are reduced, default `(1, 2, 3)`.

• smooth (float) –

This small value is added to both the numerator and the denominator.
• If both output and target are empty, it ensures that the Dice coefficient is 1.

• If either output or target is empty (all pixels are background), then dice = `smooth / (small_value + smooth)`; when smooth is very small, the Dice coefficient is close to 0 (even if the predicted values are all below the threshold), so in this case a higher smooth gives a higher Dice coefficient.

Examples

```>>> import tensorlayer as tl
>>> outputs = tl.act.pixel_wise_softmax(outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)
```

## Hard Dice coefficient

`tensorlayer.cost.dice_hard_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)`

Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. binary labels. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters
• output (Tensor) – A distribution with shape [batch_size, ….] (any dimensions).

• target (Tensor) – The target distribution, in the same format as output.

• threshold (float) – Values above this threshold are treated as true.

• axis (tuple of int) – All dimensions are reduced, default `(1, 2, 3)`.

• smooth (float) – This small value is added to the numerator and denominator, see `dice_coe`.
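
Examples

A minimal usage sketch (assuming `outputs` holds per-pixel foreground probabilities and `y_` is a binary mask, as in the `dice_coe` example above):

```>>> import tensorlayer as tl
>>> hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)
```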

## IOU coefficient

`tensorlayer.cost.iou_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)`

Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation. The coefficient ranges from 0 to 1; 1 means a perfect match.

Parameters
• output (Tensor) – A batch of distributions with shape [batch_size, ….] (any dimensions).

• target (Tensor) – The target distribution, in the same format as output.

• threshold (float) – Values above this threshold are treated as true.

• axis (tuple of int) – All dimensions are reduced, default `(1, 2, 3)`.

• smooth (float) – This small value is added to the numerator and denominator, see `dice_coe`.

Notes

• IoU cannot be used as a training loss; people usually use the soft Dice coefficient for training, and IoU and hard Dice for evaluation.
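
Examples

A minimal evaluation sketch (assuming `outputs` holds per-pixel foreground probabilities and `y_` is a binary mask):

```>>> import tensorlayer as tl
>>> iou = tl.cost.iou_coe(outputs, y_, threshold=0.5)
```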

## Cross entropy for sequence

`tensorlayer.cost.cross_entropy_seq(logits, target_seqs, batch_size=None)`

Returns the expression of cross-entropy of two sequences, implementing softmax internally. Normally used for fixed-length RNN outputs; see the PTB example.

Parameters
• logits (Tensor) – 2D tensor with shape [batch_size * n_steps, n_classes].

• target_seqs (Tensor) – The target sequence, a 2D tensor of shape [batch_size, n_steps]. If the number of steps is dynamic, use `tl.cost.cross_entropy_seq_with_mask` instead.

• batch_size (None or int) –

Whether to divide the cost by the batch size.
• If an integer, the returned cost is divided by batch_size.

• If None (default), the returned cost is not divided by anything.

Examples

```>>> import tensorlayer as tl
>>> # see the PTB example (https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_ptb_lstm.py) for more details
>>> # outputs shape : (batch_size * n_steps, n_classes)
>>> # targets shape : (batch_size, n_steps)
>>> cost = tl.cost.cross_entropy_seq(outputs, targets)
```

## Cross entropy with mask for sequence

`tensorlayer.cost.cross_entropy_seq_with_mask(logits, target_seqs, input_mask, return_details=False, name=None)`

Returns the expression of cross-entropy of two sequences, implementing softmax internally. Normally used for dynamic RNNs with synced sequence input and output.

Parameters
• logits (Tensor) – 2D tensor with shape [batch_size * ?, n_classes], where ? means a dynamic number of steps for each example. It can be obtained from DynamicRNNLayer by setting `return_seq_2d` to True.

• target_seqs (Tensor) – Tensor of int, e.g. word IDs, with shape [batch_size, ?], where ? means a dynamic number of steps for each example.

• input_mask (Tensor) – The mask used to compute the loss; it has the same shape as target_seqs, normally 0 or 1.

• return_details (boolean) –

Whether to return detailed losses.
• If False (default), only returns the loss.

• If True, returns the loss, losses, weights and targets (see source code).

Examples

```>>> import tensorlayer as tl
>>> import tensorflow as tf
>>> import numpy as np
>>> batch_size = 64
>>> vocab_size = 10000
>>> embedding_size = 256
>>> ni = tl.layers.Input([batch_size, None], dtype=tf.int64)
>>> net = tl.layers.Embedding(
...         vocabulary_size = vocab_size,
...         embedding_size = embedding_size,
...         name = 'seq_embedding')(ni)
>>> net = tl.layers.RNN(
...         cell = tf.keras.layers.LSTMCell(units=embedding_size, dropout=0.1),
...         return_seq_2d = True,
...         name = 'dynamicrnn')(net)
>>> net = tl.layers.Dense(n_units=vocab_size, name="output")(net)
>>> model = tl.models.Model(inputs=ni, outputs=net)
>>> input_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> target_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> input_mask = np.random.randint(0, 2, size=(batch_size, 10), dtype=np.int64)
>>> outputs = model(input_seqs, is_train=True)
>>> loss = tl.cost.cross_entropy_seq_with_mask(outputs, target_seqs, input_mask)
```

## Cosine similarity

`tensorlayer.cost.cosine_similarity(v1, v2)`

Cosine similarity, in the range [-1, 1].

Parameters

• v1, v2 (Tensor) – Tensors with the same shape [batch_size, n_feature].

Returns

A tensor with shape [batch_size], one similarity score per example.
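
Examples

A minimal usage sketch (values illustrative); each row is compared independently:

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> v1 = tf.constant([[1.0, 0.0], [0.5, 0.5]])
>>> v2 = tf.constant([[1.0, 0.0], [0.0, 1.0]])
>>> sim = tl.cost.cosine_similarity(v1, v2)  # one score per row
```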

## Regularization functions

For `tf.nn.l2_loss`, `tf.contrib.layers.l1_regularizer`, `tf.contrib.layers.l2_regularizer` and `tf.contrib.layers.sum_regularizer`, see the TensorFlow API.

### Maxnorm

`tensorlayer.cost.maxnorm_regularizer(scale=1.0)`

Max-norm regularization returns a function that can be used to apply max-norm regularization to weights.

Parameters

scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns

A function with signature mn(weights, name=None) that applies max-norm regularization.

### Special

`tensorlayer.cost.li_regularizer(scale, scope=None)`

Li regularization removes the neurons of the previous layer; the "i" stands for "inputs". Returns a function that can be used to apply group Li regularization to weights. The implementation follows TensorFlow contrib.

Parameters
• scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.

• scope (str) – An optional scope name for this function.

Returns

A function with signature li(weights, name=None) that applies Li regularization.

Raises

ValueError – If scale is outside the range [0.0, 1.0] or if scale is not a float.
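
Examples

A minimal usage sketch (the weight matrix `W` is illustrative):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> W = tf.Variable(tf.random.normal([784, 256]))  # a layer's weight matrix
>>> li = tl.cost.li_regularizer(scale=0.001)
>>> reg_loss = li(W)  # add this term to the training loss
```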

`tensorlayer.cost.lo_regularizer(scale)`

Lo regularization removes the neurons of the current layer; the "o" stands for "outputs". Returns a function that can be used to apply group Lo regularization to weights. The implementation follows TensorFlow contrib.

Parameters

scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns

A function with signature lo(weights, name=None) that applies Lo regularization.

Raises

ValueError – If scale is outside the range [0.0, 1.0] or if scale is not a float.

`tensorlayer.cost.maxnorm_o_regularizer(scale)`

Max-norm output regularization removes the neurons of the current layer. Returns a function that can be used to apply max-norm regularization to each column of the weight matrix. The implementation follows TensorFlow contrib.

Parameters

scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns

A function with signature mn_o(weights, name=None) that applies max-norm output regularization.

Raises

ValueError – If scale is outside the range [0.0, 1.0] or if scale is not a float.

`tensorlayer.cost.maxnorm_i_regularizer(scale)`

Max-norm input regularization removes the neurons of the previous layer. Returns a function that can be used to apply max-norm regularization to each row of the weight matrix. The implementation follows TensorFlow contrib.

Parameters

scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.

Returns

A function with signature mn_i(weights, name=None) that applies max-norm input regularization.

Raises

ValueError – If scale is outside the range [0.0, 1.0] or if scale is not a float.

### Huber Loss

`tensorlayer.cost.huber_loss(output, target, is_mean=True, delta=1.0, dynamichuber=False, reverse=False, axis=-1, epsilon=1e-05, name=None)`

Huber loss operation; see https://en.wikipedia.org/wiki/Huber_loss. Reverse Huber loss operation; see https://statweb.stanford.edu/~owen/reports/hhu.pdf. Dynamic reverse Huber loss operation; see https://arxiv.org/pdf/1606.00373.pdf.

Parameters
• output (Tensor) – A distribution with shape [batch_size, ….] (any dimensions).

• target (Tensor) – The target distribution, in the same format as output.

• is_mean (boolean) –

Whether to compute the mean or the sum for each example.
• If True, use `tf.reduce_mean` to compute the loss between one target and predicted data (default).

• If False, use `tf.reduce_sum`.

• delta (float) – The point where the Huber loss function changes from quadratic to linear.

• dynamichuber (boolean) –

Whether to compute the coefficient c for each batch.
• If True, c is 20% of the maximal per-batch error.

• If False, c is delta.

• reverse (boolean) – Whether to compute the reverse Huber loss.

• axis (int or list of int) – The dimensions to reduce.

• epsilon (float) – A small value for numerical stability.

• name (string) – Name of this loss.
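
Examples

A minimal usage sketch (values illustrative):

```>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> y_true = tf.constant([[1.5, 2.0], [3.0, 2.0]])
>>> loss = tl.cost.huber_loss(y_pred, y_true, delta=1.0)
>>> rev = tl.cost.huber_loss(y_pred, y_true, reverse=True)  # reverse Huber (berHu) loss
```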