API - Iteration

Data iteration.

minibatches([inputs, targets, batch_size, ...]) Generate a generator that input a group of example in numpy.array and their labels, return the examples and labels by the given batchsize.
seq_minibatches(inputs, targets, batch_size, ...) Generate a generator that return a batch of sequence inputs and targets.
seq_minibatches2(inputs, targets, ...) Generate a generator that iterates on two list of words.
ptb_iterator(raw_data, batch_size, num_steps) Generate a generator that iterates on a list of words, see PTB tutorial.

Non-time series

tensorlayer.iterate.minibatches(inputs=None, targets=None, batch_size=None, shuffle=False)[source]

Generate a generator that input a group of example in numpy.array and their labels, return the examples and labels by the given batchsize.

Parameters:

inputs : numpy.array

  1. The input features, every row is a example.

targets : numpy.array

  1. The labels of inputs, every row is a example.

batch_size : int

The batch size.

shuffle : boolean

Indicating whether to use a shuffling queue, shuffle the dataset before return.

Notes

  • If you have two inputs, e.g. X1 (1000, 100) and X2 (1000, 80), you can ``np.hstack((X1, X2))

into (1000, 180) and feed into inputs, then you can split a batch of X1 and X2.

Examples

>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> y = np.asarray([0,1,2,3,4,5])
>>> for batch in tl.iterate.minibatches(inputs=X, targets=y, batch_size=2, shuffle=False):
>>>     print(batch)
... (array([['a', 'a'],
...        ['b', 'b']],
...         dtype='<U1'), array([0, 1]))
... (array([['c', 'c'],
...        ['d', 'd']],
...         dtype='<U1'), array([2, 3]))
... (array([['e', 'e'],
...        ['f', 'f']],
...         dtype='<U1'), array([4, 5]))

Time series

Sequence iteration 1

tensorlayer.iterate.seq_minibatches(inputs, targets, batch_size, seq_length, stride=1)[source]

Generate a generator that return a batch of sequence inputs and targets. If batch_size = 100, seq_length = 5, one return will have 500 rows (examples).

Examples

  • Synced sequence input and output.
>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> y = np.asarray([0, 1, 2, 3, 4, 5])
>>> for batch in tl.iterate.seq_minibatches(inputs=X, targets=y, batch_size=2, seq_length=2, stride=1):
>>>     print(batch)
... (array([['a', 'a'],
...        ['b', 'b'],
...         ['b', 'b'],
...         ['c', 'c']],
...         dtype='<U1'), array([0, 1, 1, 2]))
... (array([['c', 'c'],
...         ['d', 'd'],
...         ['d', 'd'],
...         ['e', 'e']],
...         dtype='<U1'), array([2, 3, 3, 4]))
...
...
  • Many to One
>>> return_last = True
>>> num_steps = 2
>>> X = np.asarray([['a','a'], ['b','b'], ['c','c'], ['d','d'], ['e','e'], ['f','f']])
>>> Y = np.asarray([0,1,2,3,4,5])
>>> for batch in tl.iterate.seq_minibatches(inputs=X, targets=Y, batch_size=2, seq_length=num_steps, stride=1):
>>>     x, y = batch
>>>     if return_last:
>>>         tmp_y = y.reshape((-1, num_steps) + y.shape[1:])
>>>     y = tmp_y[:, -1]
>>>     print(x, y)
... [['a' 'a']
... ['b' 'b']
... ['b' 'b']
... ['c' 'c']] [1 2]
... [['c' 'c']
... ['d' 'd']
... ['d' 'd']
... ['e' 'e']] [3 4]

Sequence iteration 2

tensorlayer.iterate.seq_minibatches2(inputs, targets, batch_size, num_steps)[source]

Generate a generator that iterates on two list of words. Yields (Returns) the source contexts and the target context by the given batch_size and num_steps (sequence_length), see PTB tutorial. In TensorFlow’s tutorial, this generates the batch_size pointers into the raw PTB data, and allows minibatch iteration along these pointers.

  • Hint, if the input data are images, you can modify the code as follow.
from
data = np.zeros([batch_size, batch_len)
to
data = np.zeros([batch_size, batch_len, inputs.shape[1], inputs.shape[2], inputs.shape[3]])
Parameters:

inputs : a list

the context in list format; note that context usually be represented by splitting by space, and then convert to unique word IDs.

targets : a list

the context in list format; note that context usually be represented by splitting by space, and then convert to unique word IDs.

batch_size : int

the batch size.

num_steps : int

the number of unrolls. i.e. sequence_length

Yields:

Pairs of the batched data, each a matrix of shape [batch_size, num_steps].

Raises:

ValueError : if batch_size or num_steps are too high.

Examples

>>> X = [i for i in range(20)]
>>> Y = [i for i in range(20,40)]
>>> for batch in tl.iterate.seq_minibatches2(X, Y, batch_size=2, num_steps=3):
...     x, y = batch
...     print(x, y)
...
... [[  0.   1.   2.]
... [ 10.  11.  12.]]
... [[ 20.  21.  22.]
... [ 30.  31.  32.]]
...
... [[  3.   4.   5.]
... [ 13.  14.  15.]]
... [[ 23.  24.  25.]
... [ 33.  34.  35.]]
...
... [[  6.   7.   8.]
... [ 16.  17.  18.]]
... [[ 26.  27.  28.]
... [ 36.  37.  38.]]

PTB dataset iteration

tensorlayer.iterate.ptb_iterator(raw_data, batch_size, num_steps)[source]

Generate a generator that iterates on a list of words, see PTB tutorial. Yields (Returns) the source contexts and the target context by the given batch_size and num_steps (sequence_length).

see PTB tutorial.

e.g. x = [0, 1, 2] y = [1, 2, 3] , when batch_size = 1, num_steps = 3, raw_data = [i for i in range(100)]

In TensorFlow’s tutorial, this generates batch_size pointers into the raw PTB data, and allows minibatch iteration along these pointers.

Parameters:

raw_data : a list

the context in list format; note that context usually be represented by splitting by space, and then convert to unique word IDs.

batch_size : int

the batch size.

num_steps : int

the number of unrolls. i.e. sequence_length

Yields:

Pairs of the batched data, each a matrix of shape [batch_size, num_steps].

The second element of the tuple is the same data time-shifted to the

right by one.

Raises:

ValueError : if batch_size or num_steps are too high.

Examples

>>> train_data = [i for i in range(20)]
>>> for batch in tl.iterate.ptb_iterator(train_data, batch_size=2, num_steps=3):
>>>     x, y = batch
>>>     print(x, y)
... [[ 0  1  2] <---x                       1st subset/ iteration
...  [10 11 12]]
... [[ 1  2  3] <---y
...  [11 12 13]]
...
... [[ 3  4  5]  <--- 1st batch input       2nd subset/ iteration
...  [13 14 15]] <--- 2nd batch input
... [[ 4  5  6]  <--- 1st batch target
...  [14 15 16]] <--- 2nd batch target
...
... [[ 6  7  8]                             3rd subset/ iteration
...  [16 17 18]]
... [[ 7  8  9]
...  [17 18 19]]