API - Iteration

Data iteration.

minibatches([inputs, targets, batch_size, …])

Create a generator that takes a group of examples in a numpy.array together with their labels, and yields the examples and labels in batches of the given batch size.

seq_minibatches(inputs, targets, batch_size, …)

Create a generator that yields batches of sequence inputs and targets.

seq_minibatches2(inputs, targets, …)

Create a generator that iterates over two lists of words.

ptb_iterator(raw_data, batch_size, num_steps)

Create a generator that iterates over a list of words; see the PTB example.

Non-time series

tensorlayer.iterate.minibatches(inputs=None, targets=None, batch_size=None, allow_dynamic_batch_size=False, shuffle=False)[source]

Create a generator that takes a group of examples in a numpy.array together with their labels, and yields the examples and labels in batches of the given batch size.

Parameters
  • inputs (numpy.array) – The input features; every row is an example.

  • targets (numpy.array) – The labels of the inputs; every row is an example.

  • batch_size (int) – The batch size.

  • allow_dynamic_batch_size (boolean) – Whether to yield the last batch when the number of examples is not a multiple of batch_size. That batch is smaller than batch_size, which may cause unexpected behaviour if other functions expect a fixed batch size.

  • shuffle (boolean) – Whether to shuffle the dataset before the batches are returned.

Examples

>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> y = np.asarray([0,1,2,3,4,5])
>>> for batch in tl.iterate.minibatches(inputs=X, targets=y, batch_size=2, shuffle=False):
...     print(batch)
(array([['a', 'a'], ['b', 'b']], dtype='<U1'), array([0, 1]))
(array([['c', 'c'], ['d', 'd']], dtype='<U1'), array([2, 3]))
(array([['e', 'e'], ['f', 'f']], dtype='<U1'), array([4, 5]))
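The effect of allow_dynamic_batch_size is easiest to see when the number of examples is not a multiple of batch_size. The following is a minimal sketch of the behaviour described in the parameter list, not the library's source. `minibatches_sketch` is a hypothetical stand-in written only for this illustration, and it leaves out shuffling.

```python
import numpy as np

def minibatches_sketch(inputs, targets, batch_size, allow_dynamic_batch_size=False):
    """Hypothetical stand-in that mimics the documented batching rule (no shuffling)."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets must have the same length")
    for start in range(0, len(inputs), batch_size):
        end = start + batch_size
        if end > len(inputs):
            if not allow_dynamic_batch_size:
                break  # drop the incomplete final batch
            end = len(inputs)  # keep the smaller final batch
        yield inputs[start:end], targets[start:end]

X = np.arange(10).reshape(5, 2)  # 5 examples, 2 features each
y = np.arange(5)

fixed_sizes = [len(t) for _, t in minibatches_sketch(X, y, batch_size=2)]
dynamic_sizes = [len(t) for _, t in
                 minibatches_sketch(X, y, batch_size=2, allow_dynamic_batch_size=True)]
print(fixed_sizes)    # [2, 2]
print(dynamic_sizes)  # [2, 2, 1]
```

With five examples and batch_size=2, the fifth example is only seen when the dynamic last batch is allowed.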

Notes

If you have two inputs and one label and want to shuffle them together, e.g. X1 (1000, 100), X2 (1000, 80) and Y (1000, 1), you can stack them together (np.hstack((X1, X2))) into (1000, 180) and feed to inputs. After getting a batch, you can split it back into X1 and X2.
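The stacking and splitting described in this note can be sketched as follows. The random data, the batch size of 32, and the permutation-based batch are assumptions made for illustration; in practice the batch would come from tl.iterate.minibatches with shuffle=True.

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(size=(1000, 100))
X2 = rng.normal(size=(1000, 80))
Y = rng.integers(0, 2, size=(1000, 1))

# Stack the two inputs column-wise so that one shuffle reorders them together.
X = np.hstack((X1, X2))  # shape (1000, 180)

# Stand-in for one shuffled batch (assumption: 32 randomly chosen rows).
idx = rng.permutation(len(X))[:32]
x_batch, y_batch = X[idx], Y[idx]

# Split the batch back into its two parts at column 100.
x1_batch, x2_batch = np.hsplit(x_batch, [X1.shape[1]])
print(x1_batch.shape, x2_batch.shape, y_batch.shape)  # (32, 100) (32, 80) (32, 1)
```

Because both inputs live in the same rows of X, any row permutation keeps X1, X2 and Y aligned.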

Time series

Sequence iteration 1

tensorlayer.iterate.seq_minibatches(inputs, targets, batch_size, seq_length, stride=1)[source]

Create a generator that yields batches of sequence inputs and targets. Each batch is flattened: if batch_size=100 and seq_length=5, one batch has 500 rows (examples).

Parameters
  • inputs (numpy.array) – The input features; every row is an example.

  • targets (numpy.array) – The labels of the inputs; every element is an example.

  • batch_size (int) – The batch size.

  • seq_length (int) – The sequence length.

  • stride (int) – The stride step, default is 1.

Examples

Synced sequence input and output.

>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> y = np.asarray([0, 1, 2, 3, 4, 5])
>>> for batch in tl.iterate.seq_minibatches(inputs=X, targets=y, batch_size=2, seq_length=2, stride=1):
...     print(batch)
(array([['a', 'a'], ['b', 'b'], ['b', 'b'], ['c', 'c']], dtype='<U1'), array([0, 1, 1, 2]))
(array([['c', 'c'], ['d', 'd'], ['d', 'd'], ['e', 'e']], dtype='<U1'), array([2, 3, 3, 4]))

Many to One

>>> return_last = True
>>> num_steps = 2
>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> Y = np.asarray([0,1,2,3,4,5])
>>> for batch in tl.iterate.seq_minibatches(inputs=X, targets=Y, batch_size=2, seq_length=num_steps, stride=1):
...     x, y = batch
...     if return_last:
...         # Regroup the flattened targets by sequence and keep the last step of each.
...         tmp_y = y.reshape((-1, num_steps) + y.shape[1:])
...         y = tmp_y[:, -1]
...     print(x, y)
[['a' 'a']
 ['b' 'b']
 ['b' 'b']
 ['c' 'c']] [1 2]
[['c' 'c']
 ['d' 'd']
 ['d' 'd']
 ['e' 'e']] [3 4]

Sequence iteration 2

tensorlayer.iterate.seq_minibatches2(inputs, targets, batch_size, num_steps)[source]

Create a generator that iterates over two lists of words. It yields the source contexts and the target contexts according to the given batch_size and num_steps (sequence length). As in TensorFlow’s tutorial, this generates batch_size pointers into the raw PTB data and allows minibatch iteration along these pointers.

Parameters
  • inputs (list of data) – The context in list format; the context is usually represented by splitting the text on spaces and then converting the words to unique word IDs.

  • targets (list of data) – The context in list format; the context is usually represented by splitting the text on spaces and then converting the words to unique word IDs.

  • batch_size (int) – The batch size.

  • num_steps (int) – The number of unrolled steps, i.e. the sequence length.

Yields

Pairs of the batched data, each a matrix of shape [batch_size, num_steps].

Raises

ValueError – If batch_size or num_steps are too high.

Examples

>>> X = [i for i in range(20)]
>>> Y = [i for i in range(20,40)]
>>> for batch in tl.iterate.seq_minibatches2(X, Y, batch_size=2, num_steps=3):
...     x, y = batch
...     print(x, y)

[[ 0. 1. 2.] [ 10. 11. 12.]] [[ 20. 21. 22.] [ 30. 31. 32.]]
[[ 3. 4. 5.] [ 13. 14. 15.]] [[ 23. 24. 25.] [ 33. 34. 35.]]
[[ 6. 7. 8.] [ 16. 17. 18.]] [[ 26. 27. 28.] [ 36. 37. 38.]]

Notes

  • Hint: if the input data are images, you can modify the source code from data = np.zeros([batch_size, batch_len]) to data = np.zeros([batch_size, batch_len, inputs.shape[1], inputs.shape[2], inputs.shape[3]]).
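The hint can be illustrated with a small sketch of the resulting array layout. The 4x4 single-channel frames are hypothetical data. The chunk layout (row i holds the i-th contiguous slice of the input) is an assumption inferred from the example output above, not taken from the library's source.

```python
import numpy as np

batch_size, num_steps = 2, 3
# Hypothetical input: 20 frames, each a 4x4 image with 1 channel.
inputs = np.arange(20 * 4 * 4 * 1, dtype=np.float32).reshape(20, 4, 4, 1)

batch_len = len(inputs) // batch_size
data = np.zeros([batch_size, batch_len,
                 inputs.shape[1], inputs.shape[2], inputs.shape[3]],
                dtype=inputs.dtype)
for i in range(batch_size):
    # Row i holds the i-th contiguous chunk of frames.
    data[i] = inputs[batch_len * i:batch_len * (i + 1)]

# First batch: num_steps consecutive frames from each of the batch_size chunks.
x = data[:, 0:num_steps]
print(x.shape)  # (2, 3, 4, 4, 1)
```

Each batch then has shape [batch_size, num_steps, height, width, channels] instead of [batch_size, num_steps].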

PTB dataset iteration

tensorlayer.iterate.ptb_iterator(raw_data, batch_size, num_steps)[source]

Create a generator that iterates over a list of words; see the PTB example. It yields the source contexts and the target contexts according to the given batch_size and num_steps (sequence length).

In TensorFlow’s tutorial, this generates batch_size pointers into the raw PTB data, and allows minibatch iteration along these pointers.

Parameters
  • raw_data (a list) – The context in list format; the context is usually represented by splitting the text on spaces and then converting the words to unique word IDs.

  • batch_size (int) – The batch size.

  • num_steps (int) – The number of unrolled steps, i.e. the sequence length.

Yields

Pairs of the batched data, each a matrix of shape [batch_size, num_steps]. The second element of the tuple is the same data time-shifted to the right by one.

Raises

ValueError – If batch_size or num_steps are too high.

Examples

>>> train_data = [i for i in range(20)]
>>> for batch in tl.iterate.ptb_iterator(train_data, batch_size=2, num_steps=3):
...     x, y = batch
...     print(x, y)
[[ 0  1  2] <---x                       1st subset/ iteration
 [10 11 12]]
[[ 1  2  3] <---y
 [11 12 13]]

[[ 3  4  5]  <--- 1st batch input       2nd subset/ iteration
 [13 14 15]] <--- 2nd batch input
[[ 4  5  6]  <--- 1st batch target
 [14 15 16]] <--- 2nd batch target

[[ 6  7  8]                             3rd subset/ iteration
 [16 17 18]]
[[ 7  8  9]
 [17 18 19]]
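The pointer idea and the one-step shift between inputs and targets can be sketched as below. `ptb_iterator_sketch` is a hypothetical stand-in written for illustration, not the library's source. The condition that raises ValueError (no full batch fits) and its message are assumptions about what "too high" means.

```python
import numpy as np

def ptb_iterator_sketch(raw_data, batch_size, num_steps):
    """Hypothetical stand-in for the pointer-based iteration described above."""
    raw_data = np.asarray(raw_data, dtype=np.int32)
    batch_len = len(raw_data) // batch_size
    # Row i is the i-th contiguous chunk, so each row acts as one pointer into the data.
    data = raw_data[:batch_size * batch_len].reshape(batch_size, batch_len)
    epoch_size = (batch_len - 1) // num_steps  # -1 leaves room for the shifted targets
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for i in range(epoch_size):
        x = data[:, i * num_steps:(i + 1) * num_steps]
        y = data[:, i * num_steps + 1:(i + 1) * num_steps + 1]  # shifted right by one
        yield x, y

batches = list(ptb_iterator_sketch(range(20), batch_size=2, num_steps=3))
print(len(batches))            # 3
print(batches[0][0].tolist())  # [[0, 1, 2], [10, 11, 12]]
print(batches[0][1].tolist())  # [[1, 2, 3], [11, 12, 13]]
```

In this sketch, 20 words with batch_size=10 leave only two words per row, so no batch of three steps plus a shifted target fits and the ValueError is raised.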