API - Cost¶
To keep TensorLayer simple, we minimize the number of cost functions as much as possible, so we encourage you to use TensorFlow's own functions where they suffice; see the TensorFlow API.
Note
Please refer to Getting Started to learn how to retrieve specific weights for weight regularization.
cross_entropy(output, target[, name]) | Softmax cross-entropy operation; returns the TensorFlow expression of cross-entropy for two distributions, implementing softmax internally.
sigmoid_cross_entropy(output, target[, name]) | Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.
binary_cross_entropy(output, target[, …]) | Binary cross-entropy operation.
mean_squared_error(output, target[, …]) | Return the TensorFlow expression of mean-squared-error (L2) of two batches of data.
normalized_mean_square_error(output, target) | Return the TensorFlow expression of normalized mean-squared-error of two distributions.
absolute_difference_error(output, target[, …]) | Return the TensorFlow expression of absolute difference error (L1) of two batches of data.
dice_coe(output, target[, loss_type, axis, …]) | Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation, i.e. labels are binary.
dice_hard_coe(output, target[, threshold, …]) | Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data; usually used for binary image segmentation, i.e. labels are binary.
iou_coe(output, target[, threshold, axis, …]) | Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data; usually used for evaluating binary image segmentation.
cross_entropy_seq(logits, target_seqs[, …]) | Returns the expression of cross-entropy of two sequences, implementing softmax internally.
cross_entropy_seq_with_mask(logits, …[, …]) | Returns the expression of cross-entropy of two sequences, implementing softmax internally.
cosine_similarity(v1, v2) | Cosine similarity [-1, 1].
li_regularizer(scale[, scope]) | Li regularization removes the neurons of the previous layer.
lo_regularizer(scale) | Lo regularization removes the neurons of the current layer.
maxnorm_regularizer([scale]) | Max-norm regularization; returns a function that can be used to apply max-norm regularization to weights.
maxnorm_o_regularizer(scale) | Max-norm output regularization removes the neurons of the current layer.
maxnorm_i_regularizer(scale) | Max-norm input regularization removes the neurons of the previous layer.
huber_loss(output, target[, is_mean, delta, …]) | Huber loss operation, see https://en.wikipedia.org/wiki/Huber_loss.
Softmax cross entropy¶
tensorlayer.cost.cross_entropy(output, target, name=None)[source]¶
Softmax cross-entropy operation; returns the TensorFlow expression of cross-entropy for two distributions, implementing softmax internally. See tf.nn.sparse_softmax_cross_entropy_with_logits.
Parameters: - output (Tensor) – A batch of distributions with shape: [batch_size, num of classes].
- target (Tensor) – A batch of indices with shape: [batch_size, ].
- name (string) – Name of this loss.
Examples
>>> import tensorlayer as tl
>>> ce = tl.cost.cross_entropy(y_logits, y_target_logits, 'my_loss')
References
- About cross-entropy: https://en.wikipedia.org/wiki/Cross_entropy.
Sigmoid cross entropy¶
tensorlayer.cost.sigmoid_cross_entropy(output, target, name=None)[source]¶
Sigmoid cross-entropy operation, see tf.nn.sigmoid_cross_entropy_with_logits.
Parameters: - output (Tensor) – A batch of distributions with shape: [batch_size, num of classes].
- target (Tensor) – A batch of targets with the same shape as output.
- name (string) – Name of this loss.
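Examples
A minimal usage sketch (the tensor names are hypothetical); target must have the same shape and a float dtype matching output, as required by tf.nn.sigmoid_cross_entropy_with_logits.
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_logits = tf.random.normal([32, 10])  # hypothetical raw logits
>>> y_labels = tf.cast(tf.random.uniform([32, 10], 0, 2, dtype=tf.int32), tf.float32)  # hypothetical multi-label targets
>>> loss = tl.cost.sigmoid_cross_entropy(y_logits, y_labels, name='sigmoid_loss')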
Binary cross entropy¶
tensorlayer.cost.binary_cross_entropy(output, target, epsilon=1e-08, name='bce_loss')[source]¶
Binary cross-entropy operation.
Parameters: - output (Tensor) – Tensor with type of float32 or float64.
- target (Tensor) – The target distribution, in the same format as output.
- epsilon (float) – A small value added inside the logarithms to avoid taking the log of zero.
- name (str) – An optional name to attach to this function.
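Examples
Unlike sigmoid_cross_entropy, this function expects probabilities rather than logits (hence the epsilon guard inside the logarithm), so apply a sigmoid first. A minimal sketch with hypothetical tensor names:
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_probs = tf.sigmoid(tf.random.normal([32, 1]))  # hypothetical predicted probabilities in (0, 1)
>>> y_true = tf.round(tf.random.uniform([32, 1]))    # hypothetical binary targets (0.0 or 1.0)
>>> bce = tl.cost.binary_cross_entropy(y_probs, y_true)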
Mean squared error (L2)¶
tensorlayer.cost.mean_squared_error(output, target, is_mean=False, axis=-1, name='mean_squared_error')[source]¶
Return the TensorFlow expression of mean-squared-error (L2) of two batches of data.
Parameters: - output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
- target (Tensor) – The target distribution, in the same format as output.
- is_mean (boolean) – Whether to compute the mean or the sum for each example. If True, use tf.reduce_mean to compute the loss between one target and predicted data; if False, use tf.reduce_sum (default).
- axis (int or list of int) – The dimensions to reduce.
- name (str) – An optional name to attach to this function.
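Examples
A minimal sketch with hypothetical tensor names:
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.random.normal([32, 10])  # hypothetical predictions
>>> y_true = tf.random.normal([32, 10])  # hypothetical targets
>>> mse = tl.cost.mean_squared_error(y_pred, y_true, is_mean=True)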
Normalized mean square error¶
tensorlayer.cost.normalized_mean_square_error(output, target, axis=-1, name='normalized_mean_squared_error_loss')[source]¶
Return the TensorFlow expression of normalized mean-square-error of two distributions.
Parameters: - output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
- target (Tensor) – The target distribution, in the same format as output.
- axis (int or list of int) – The dimensions to reduce.
- name (str) – An optional name to attach to this function.
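Examples
A minimal sketch with hypothetical tensor names; normalizing the squared error by the magnitude of the target makes the loss insensitive to the target's scale.
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.random.normal([32, 10])  # hypothetical predictions
>>> y_true = tf.random.normal([32, 10])  # hypothetical targets
>>> nmse = tl.cost.normalized_mean_square_error(y_pred, y_true)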
Absolute difference error (L1)¶
tensorlayer.cost.absolute_difference_error(output, target, is_mean=False, axis=-1, name='absolute_difference_error_loss')[source]¶
Return the TensorFlow expression of absolute difference error (L1) of two batches of data.
Parameters: - output (Tensor) – 2D, 3D or 4D tensor, i.e. [batch_size, n_feature], [batch_size, height, width] or [batch_size, height, width, channel].
- target (Tensor) – The target distribution, in the same format as output.
- is_mean (boolean) – Whether to compute the mean or the sum for each example. If True, use tf.reduce_mean to compute the loss between one target and predicted data; if False, use tf.reduce_sum (default).
- axis (int or list of int) – The dimensions to reduce.
- name (str) – An optional name to attach to this function.
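Examples
A minimal sketch with hypothetical tensor names; the L1 error penalizes outliers less severely than the L2 error.
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.random.normal([32, 10])  # hypothetical predictions
>>> y_true = tf.random.normal([32, 10])  # hypothetical targets
>>> l1 = tl.cost.absolute_difference_error(y_pred, y_true, is_mean=True)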
Dice coefficient¶
tensorlayer.cost.dice_coe(output, target, loss_type='jaccard', axis=(1, 2, 3), smooth=1e-05)[source]¶
Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. labels are binary. The coefficient is between 0 and 1; 1 means a total match.
Parameters: - output (Tensor) – A distribution with shape: [batch_size, ….] (any dimensions).
- target (Tensor) – The target distribution, in the same format as output.
- loss_type (str) – jaccard or sorensen, default is jaccard.
- axis (tuple of int) – All dimensions are reduced, default (1, 2, 3).
- smooth (float) – This small value is added to the numerator and denominator. If both output and target are empty, it makes sure the dice is 1. If either output or target is empty (all pixels are background), dice = smooth/(small_value + smooth); if smooth is very small, the dice is close to 0 (even when the image values are below the threshold), so in this case a higher smooth gives a higher dice.
Examples
>>> import tensorlayer as tl
>>> outputs = tl.act.pixel_wise_softmax(outputs)
>>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)
Hard Dice coefficient¶
tensorlayer.cost.dice_hard_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]¶
Non-differentiable Sørensen–Dice coefficient for comparing the similarity of two batches of data, usually used for binary image segmentation, i.e. labels are binary. The coefficient is between 0 and 1; it is 1 for a total match.
Parameters: - output (tensor) – A distribution with shape: [batch_size, ….] (any dimensions).
- target (tensor) – The target distribution, in the same format as output.
- threshold (float) – The threshold value above which a pixel counts as true.
- axis (tuple of integer) – All dimensions are reduced, default (1, 2, 3).
- smooth (float) – This small value will be added to the numerator and denominator, see dice_coe.
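Examples
Since it is non-differentiable, dice_hard_coe is normally used as an evaluation metric rather than a training loss. A minimal sketch, assuming outputs and y_ are [batch_size, height, width, 1] probability maps and binary masks (hypothetical names):
>>> import tensorlayer as tl
>>> hard_dice = tl.cost.dice_hard_coe(outputs, y_, threshold=0.5)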
IOU coefficient¶
tensorlayer.cost.iou_coe(output, target, threshold=0.5, axis=(1, 2, 3), smooth=1e-05)[source]¶
Non-differentiable Intersection over Union (IoU) for comparing the similarity of two batches of data, usually used for evaluating binary image segmentation. The coefficient is between 0 and 1; 1 means a total match.
Parameters: - output (tensor) – A batch of distributions with shape: [batch_size, ….] (any dimensions).
- target (tensor) – The target distribution, in the same format as output.
- threshold (float) – The threshold value above which a pixel counts as true.
- axis (tuple of integer) – All dimensions are reduced, default (1, 2, 3).
- smooth (float) – This small value will be added to the numerator and denominator, see dice_coe.
Notes
- IoU cannot be used as a training loss; people usually use the dice coefficient for training, and IoU and hard dice for evaluation, as in the sketch below.
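Examples
A minimal sketch following the note above, assuming outputs and y_ are [batch_size, height, width, 1] probability maps and binary masks (hypothetical names):
>>> import tensorlayer as tl
>>> train_loss = 1 - tl.cost.dice_coe(outputs, y_)      # differentiable, for training
>>> iou = tl.cost.iou_coe(outputs, y_, threshold=0.5)   # non-differentiable, for evaluation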
Cross entropy for sequence¶
tensorlayer.cost.cross_entropy_seq(logits, target_seqs, batch_size=None)[source]¶
Returns the expression of cross-entropy of two sequences, implementing softmax internally. Normally used for fixed-length RNN outputs, see the PTB example.
Parameters: - logits (Tensor) – 2D tensor with shape of [batch_size * n_steps, n_classes].
- target_seqs (Tensor) – The target sequence, 2D tensor [batch_size, n_steps]; if the number of steps is dynamic, please use tl.cost.cross_entropy_seq_with_mask instead.
- batch_size (None or int) – Whether to divide the cost by the batch size. If an integer, the returned cost will be divided by batch_size; if None (default), the returned cost will not be divided by anything.
Examples
>>> import tensorlayer as tl
>>> # see the PTB example <https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_ptb_lstm_state_is_tuple.py> for more details
>>> # outputs shape : (batch_size * n_steps, n_classes)
>>> # targets shape : (batch_size, n_steps)
>>> cost = tl.cost.cross_entropy_seq(outputs, targets)
Cross entropy with mask for sequence¶
tensorlayer.cost.cross_entropy_seq_with_mask(logits, target_seqs, input_mask, return_details=False, name=None)[source]¶
Returns the expression of cross-entropy of two sequences, implementing softmax internally. Normally used for dynamic RNNs with synced sequence input and output.
Parameters: - logits (Tensor) – 2D tensor with shape of [batch_size * ?, n_classes], where ? means a dynamic number of steps for each example. Can be obtained from DynamicRNNLayer by setting return_seq_2d to True.
- target_seqs (Tensor) – Tensor of int, like word IDs, with shape [batch_size, ?], where ? means a dynamic number of steps for each example.
- input_mask (Tensor) – The mask to compute the loss; it has the same size as target_seqs, normally 0 or 1.
- return_details (boolean) – Whether to return detailed losses. If False (default), only returns the loss; if True, returns the loss, losses, weights and targets (see the source code).
Examples
>>> import tensorlayer as tl
>>> import tensorflow as tf
>>> import numpy as np
>>> batch_size = 64
>>> vocab_size = 10000
>>> embedding_size = 256
>>> ni = tl.layers.Input([batch_size, None], dtype=tf.int64)
>>> net = tl.layers.Embedding(
...     vocabulary_size=vocab_size,
...     embedding_size=embedding_size,
...     name='seq_embedding')(ni)
>>> net = tl.layers.RNN(
...     cell=tf.keras.layers.LSTMCell(units=embedding_size, dropout=0.1),
...     return_seq_2d=True,
...     name='dynamicrnn')(net)
>>> net = tl.layers.Dense(n_units=vocab_size, name="output")(net)
>>> model = tl.models.Model(inputs=ni, outputs=net)
>>> input_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> target_seqs = np.random.randint(0, 10, size=(batch_size, 10), dtype=np.int64)
>>> input_mask = np.random.randint(0, 2, size=(batch_size, 10), dtype=np.int64)
>>> outputs = model(input_seqs, is_train=True)
>>> loss = tl.cost.cross_entropy_seq_with_mask(outputs, target_seqs, input_mask)
Cosine similarity¶
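From the summary table above, cosine_similarity(v1, v2) returns the cosine similarity in [-1, 1]. A minimal sketch, assuming v1 and v2 are batches of feature vectors of shape [batch_size, n_feature] (hypothetical names):
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> v1 = tf.random.normal([32, 128])  # hypothetical batch of feature vectors
>>> v2 = tf.random.normal([32, 128])
>>> sim = tl.cost.cosine_similarity(v1, v2)  # values in [-1, 1]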
Regularization functions¶
For tf.nn.l2_loss, tf.contrib.layers.l1_regularizer, tf.contrib.layers.l2_regularizer and tf.contrib.layers.sum_regularizer, see the TensorFlow API.
Maxnorm¶
tensorlayer.cost.maxnorm_regularizer(scale=1.0)[source]¶
Max-norm regularization. Returns a function that can be used to apply max-norm regularization to weights.
Parameters: scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns: Return type: A function with signature mn(weights, name=None) that applies max-norm regularization.
Special¶
tensorlayer.cost.li_regularizer(scale, scope=None)[source]¶
Li regularization removes the neurons of the previous layer; the "i" represents inputs. Returns a function that can be used to apply group Li regularization to weights. The implementation follows TensorFlow contrib.
Parameters: - scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.
- scope (str) – An optional scope name for this function.
Returns: Return type: A function with signature li(weights, name=None) that applies Li regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.lo_regularizer(scale)[source]¶
Lo regularization removes the neurons of the current layer; the "o" represents outputs. Returns a function that can be used to apply group Lo regularization to weights. The implementation follows TensorFlow contrib.
Parameters: scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns: Return type: A function with signature lo(weights, name=None) that applies Lo regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_o_regularizer(scale)[source]¶
Max-norm output regularization removes the neurons of the current layer. Returns a function that can be used to apply max-norm regularization to each column of the weight matrix. The implementation follows TensorFlow contrib.
Parameters: scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns: Return type: A function with signature mn_o(weights, name=None) that applies max-norm output regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
tensorlayer.cost.maxnorm_i_regularizer(scale)[source]¶
Max-norm input regularization removes the neurons of the previous layer. Returns a function that can be used to apply max-norm regularization to each row of the weight matrix. The implementation follows TensorFlow contrib.
Parameters: scale (float) – A scalar multiplier Tensor. 0.0 disables the regularizer.
Returns: Return type: A function with signature mn_i(weights, name=None) that applies max-norm input regularization.
Raises: ValueError : If scale is outside of the range [0.0, 1.0] or if scale is not a float.
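Examples
All of these regularizers share the same pattern: the factory returns a function that maps a weight tensor to a scalar regularization term, which you add to your main cost. A minimal sketch (the weight matrix W is hypothetical):
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> W = tf.Variable(tf.random.normal([784, 100]))  # hypothetical weight matrix of a layer
>>> mn = tl.cost.maxnorm_regularizer(1.0)          # returns mn(weights, name=None)
>>> reg_loss = mn(W)                               # scalar regularization term
>>> # total_loss = ce + reg_loss  (add to the main cost, e.g. the ce from cross_entropy above)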
Huber Loss¶
tensorlayer.cost.huber_loss(output, target, is_mean=True, delta=1.0, dynamichuber=False, reverse=False, axis=-1, epsilon=1e-05, name=None)[source]¶
Huber loss operation, see https://en.wikipedia.org/wiki/Huber_loss. Reverse Huber loss operation, see https://statweb.stanford.edu/~owen/reports/hhu.pdf. Dynamic reverse Huber loss operation, see https://arxiv.org/pdf/1606.00373.pdf.
Parameters: - output (Tensor) – A distribution with shape: [batch_size, ….] (any dimensions).
- target (Tensor) – The target distribution, in the same format as output.
- is_mean (boolean) – Whether to compute the mean or the sum for each example. If True, use tf.reduce_mean to compute the loss between one target and predicted data (default); if False, use tf.reduce_sum.
- delta (float) – The point where the Huber loss function changes from quadratic to linear.
- dynamichuber (boolean) – Whether to compute the coefficient c for each batch. If True, c is 20% of the maximal per-batch error; if False, c is delta.
- reverse (boolean) – Whether to compute the reverse Huber loss.
- axis (int or list of int) – The dimensions to reduce.
- epsilon (float) – Epsilon.
- name (string) – Name of this loss.
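Examples
A minimal sketch with hypothetical tensor names; with the defaults this is the standard Huber loss, quadratic for errors smaller than delta and linear beyond it.
>>> import tensorflow as tf
>>> import tensorlayer as tl
>>> y_pred = tf.random.normal([32, 10])  # hypothetical predictions
>>> y_true = tf.random.normal([32, 10])  # hypothetical targets
>>> loss = tl.cost.huber_loss(y_pred, y_true, delta=1.0)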