API - Database

This is the alpha version of the database management system. If you run into any trouble, please ask for help at tensorlayer@gmail.com.

Why Database

TensorLayer is designed for real-world production and is capable of supporting large-scale machine learning applications. The TensorLayer database is introduced to address the data management challenges of large-scale machine learning projects, such as:

  1. Finding training data from an enterprise data warehouse.

  2. Loading large datasets that are beyond the storage limitation of one computer.

  3. Managing different models with version control, and comparing them (e.g. by accuracy).

  4. Automating the process of training, evaluating and deploying machine learning models.

The TensorLayer database addresses these challenges. It is designed with the following principles in mind.

Everything is Data

Data warehouses can store and capture the entire machine learning development process. The data can be categorized as:

  1. Dataset: This includes all the data used for training, validation and prediction. The labels can be manually specified or generated by model prediction.

  2. Model architecture: The database includes a table that stores different model architectures, enabling users to reuse earlier model development work.

  3. Model parameters: The database stores the model parameters of each epoch of training.

  4. Tasks: A project usually includes many small tasks. Each task contains the necessary information, such as the hyper-parameters for training or validation. For a training task, typical information includes the training data, the model parameters, the model architecture and the number of training epochs. Validation, testing and inference are also supported by the task system.

  5. Logs: The logs store all the metrics of each machine learning model, such as the timestamp, loss and accuracy of each batch or epoch.

In principle, the TensorLayer database is a keyword-based search engine. Each model, parameter set or training dataset is assigned a number of tags. The storage system organizes data into two layers: the index layer and the blob layer. The index layer stores all the tags together with references to the blob storage, and is implemented on a NoSQL document database. The blob layer stores large objects such as videos, medical images or label masks in large chunks, and is usually implemented on a file system. The current implementation is based on MongoDB: the indexes are stored as documents, and the blob layer is built on GridFS.
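As a rough illustration of the two-layer idea, the following sketch keeps small tag documents in one structure and the raw payloads in another. It is a toy in-memory model, not the MongoDB/GridFS implementation; the class name TwoLayerStore and its methods are hypothetical.

```python
import uuid


class TwoLayerStore:
    """Toy in-memory stand-in for an index layer plus a blob layer (hypothetical)."""

    def __init__(self):
        self.index = []  # index layer: small tag documents with a blob reference
        self.blobs = {}  # blob layer: blob_id -> raw bytes

    def save(self, payload, **tags):
        """Store the payload in the blob layer and its tags in the index layer."""
        blob_id = uuid.uuid4().hex
        self.blobs[blob_id] = payload
        self.index.append({**tags, "blob_id": blob_id})
        return blob_id

    def find(self, **tags):
        """Return the payloads of all index entries whose tags match."""
        return [
            self.blobs[doc["blob_id"]]
            for doc in self.index
            if all(doc.get(key) == value for key, value in tags.items())
        ]


store = TwoLayerStore()
store.save(b"mnist-bytes", kind="dataset", name="mnist", version="1.0")
store.save(b"weights-bytes", kind="params", name="mlp", epoch=3)
mnist_payloads = store.find(kind="dataset", name="mnist")
```

Only the index is searched; the potentially large payloads are touched only after a match, which is the point of separating the two layers.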

Everything is identified by Query

Within the database framework, any entity in the data warehouse, such as data, models or tasks, is specified by a query in the database query language. As a reference, a query is compact to store and can specify multiple objects concisely. Another advantage of this design is a highly flexible software system: many new applications can be implemented simply by updating the query, without modifying any application code.
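The idea can be sketched in a few lines of plain Python. In this minimal example, which assumes simple equality matching and uses hypothetical helper names (matches, select, evaluate), the application function stays the same while the query alone decides which models it operates on.

```python
def matches(doc, query):
    """True if every key/value pair in the query appears in the document."""
    return all(doc.get(key) == value for key, value in query.items())


def select(collection, query):
    """Resolve a query into the list of documents it refers to."""
    return [doc for doc in collection if matches(doc, query)]


def evaluate(collection, query):
    """Application code: unchanged no matter which models the query selects."""
    return sorted(doc["accuracy"] for doc in select(collection, query))


# Made-up records for illustration only.
models = [
    {"name": "mlp", "version": "1.0", "accuracy": 0.91},
    {"name": "mlp", "version": "2.0", "accuracy": 0.95},
    {"name": "cnn", "version": "1.0", "accuracy": 0.98},
]

all_mlps = evaluate(models, {"name": "mlp"})
only_v1 = evaluate(models, {"version": "1.0"})
```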

Preparation

In principle, the database can be implemented on any document-oriented NoSQL database system. The existing implementation is based on MongoDB. Implementations on other databases will be released as development progresses. It will be straightforward to port our database system to Google Cloud, AWS and Azure. The following tutorials are based on the MongoDB implementation.

Installing and running MongoDB

Installation instructions for MongoDB can be found in the MongoDB Docs. There are also many hosted MongoDB services, for example on Amazon or GCP, such as MongoDB Atlas from MongoDB. Users can also use Docker, which is a powerful tool for deploying software. After installing MongoDB, a management tool with a graphical user interface will be extremely useful. For example, users can install Studio3T (MongoChef), a powerful user interface tool for MongoDB that is free for non-commercial use.

Tutorials

Connect to the database

As with MongoDB management tools, an IP address and port number are required to connect to the database. To distinguish between projects, each database instance takes a project_name argument. In the following example, we connect to MongoDB on the local machine using the host localhost and port 27017 (the default MongoDB port).

import tensorlayer as tl

db = tl.db.TensorHub(ip='localhost', port=27017, dbname='temp',
                     username=None, password='password', project_name='tutorial')
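TensorHub's internal connection logic is not shown here. Purely as an illustration of how the ip, port, dbname, username and password arguments relate to a standard MongoDB connection string, the hypothetical helper below builds such a URI. Placing dbname in the path and the function name build_mongo_uri are assumptions of this sketch.

```python
from urllib.parse import quote_plus


def build_mongo_uri(ip="localhost", port=27017, dbname="temp",
                    username=None, password=None):
    """Build a MongoDB connection string (hypothetical helper, not TensorHub code)."""
    if username is None:
        # No authentication: host, port and database only.
        return "mongodb://{}:{}/{}".format(ip, port, dbname)
    # Credentials must be percent-escaped so that characters such as
    # '@' or '/' do not break the URI.
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(username), quote_plus(password or ""), ip, port, dbname
    )


local_uri = build_mongo_uri()
auth_uri = build_mongo_uri(username="alice", password="p@ss/word")
```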

Dataset management

You can save a dataset into the database so that all machines can access it. Apart from the dataset key, you can also pass custom arguments, such as version and description, to manage the datasets better. Note that all saving functions automatically store a timestamp, allowing you to load items (data, models, tasks) by timestamp.

db.save_dataset(dataset=[X_train, y_train, X_test, y_test], dataset_name='mnist', description='this is a tutorial')

After saving the dataset, others can access it as follows:

dataset = db.find_top_dataset('mnist')
dataset = db.find_top_dataset('mnist', version='1.0')

If you have multiple datasets that use the same dataset key, you can get all of them as follows:

datasets = db.find_datasets('mnist')

Model management

Save the model architecture and parameters into the database. The model architecture is represented by a TL graph, and the parameters are stored as a list of arrays.

db.save_model(net, accuracy=0.8, loss=2.3, name='second_model')

After saving the model into the database, we can load it as follows:

net = db.find_top_model(accuracy=0.8, loss=2.3)

If there are many models, you can use MongoDB's sort option to find the model you want. To get the newest or oldest model, sort by time:

import pymongo

## newest model

net = db.find_top_model(sort=[("time", pymongo.DESCENDING)])
net = db.find_top_model(sort=[("time", -1)])

## oldest model

net = db.find_top_model(sort=[("time", pymongo.ASCENDING)])
net = db.find_top_model(sort=[("time", 1)])

If you saved the model along with a test_accuracy value, you can get the model with the best accuracy as follows:

net = db.find_top_model(sort=[("test_accuracy", -1)])
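To make the meaning of the sort argument concrete, here is a minimal pure-Python sketch of "filter, sort, take the first". The function find_top and the sample records are hypothetical; the sketch only mimics the (key, direction) convention, where 1 means ascending and -1 means descending, and does not reproduce the actual query execution.

```python
def find_top(documents, sort=None, **filters):
    """Return the first matching document after sorting, or False if none match."""
    candidates = [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in filters.items())
    ]
    if not candidates:
        return False
    # Sort by the last key first; Python's sort is stable, so the first
    # key in the list ends up with the highest priority.
    for key, direction in reversed(sort or []):
        candidates.sort(key=lambda doc: doc[key], reverse=(direction == -1))
    return candidates[0]


# Made-up records for illustration only.
saved_models = [
    {"name": "a", "time": 1, "test_accuracy": 0.90},
    {"name": "b", "time": 2, "test_accuracy": 0.97},
    {"name": "c", "time": 3, "test_accuracy": 0.93},
]

newest = find_top(saved_models, sort=[("time", -1)])
oldest = find_top(saved_models, sort=[("time", 1)])
best = find_top(saved_models, sort=[("test_accuracy", -1)])
```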

To delete all models in a project:

db.delete_model()

To delete only specific models, pass keyword arguments that identify them; only the models matching those arguments are deleted.

Event / Logging management

Save training log:

db.save_training_log(accuracy=0.33)
db.save_training_log(accuracy=0.44)

Delete logs that match the requirement:

db.delete_training_log(accuracy=0.33)

Delete all logs of this project:

db.delete_training_log()
db.delete_validation_log()
db.delete_testing_log()
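The matching rule used by the delete calls above (delete what matches the keyword arguments, or everything when none are given) can be sketched with a toy in-memory table. LogTable is a hypothetical class written for this illustration and does not reflect how the logs are stored in MongoDB.

```python
class LogTable:
    """Toy in-memory log table (hypothetical)."""

    def __init__(self):
        self.rows = []

    def save(self, **fields):
        """Append one log record."""
        self.rows.append(dict(fields))

    def delete(self, **filters):
        """Delete rows matching all filters; with no filters, delete every row.

        Returns the number of rows removed.
        """
        kept = [
            row for row in self.rows
            if not all(row.get(key) == value for key, value in filters.items())
        ]
        removed = len(self.rows) - len(kept)
        self.rows = kept
        return removed


training_log = LogTable()
training_log.save(accuracy=0.33)
training_log.save(accuracy=0.44)

removed_matching = training_log.delete(accuracy=0.33)
remaining = list(training_log.rows)
removed_rest = training_log.delete()
```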

Task distribution

A project usually consists of many tasks such as hyper parameter selection. To make it easier, we can distribute these tasks to several GPU servers. A task consists of a task script, hyper parameters, desired result and a status.

A task distributor can push both dataset and tasks into a database, allowing task runners on GPU servers to pull and run. The following is an example that pushes 3 tasks with different hyper parameters.

import time
import tensorlayer as tl

## save dataset into database, then allow other servers to use it
X_train, y_train, X_val, y_val, X_test, y_test = tl.files.load_mnist_dataset(shape=(-1, 784))
db.save_dataset((X_train, y_train, X_val, y_val, X_test, y_test), 'mnist', description='handwriting digit')

## push tasks into database, then allow other servers to pull tasks to run
db.create_task(
    task_name='mnist', script='task_script.py', hyper_parameters=dict(n_units1=800, n_units2=800),
    saved_result_keys=['test_accuracy'], description='800-800'
)

db.create_task(
    task_name='mnist', script='task_script.py', hyper_parameters=dict(n_units1=600, n_units2=600),
    saved_result_keys=['test_accuracy'], description='600-600'
)

db.create_task(
    task_name='mnist', script='task_script.py', hyper_parameters=dict(n_units1=400, n_units2=400),
    saved_result_keys=['test_accuracy'], description='400-400'
)

## wait for tasks to finish
while db.check_unfinished_task(task_name='mnist'):
    print("waiting runners to finish the tasks")
    time.sleep(1)

## you can get the model and result from database and do some analysis at the end

The task runners on GPU servers can monitor the database and run tasks as soon as they become available. In the task script, we can save the final model and results to the database, which allows task distributors to retrieve the desired model and results.

## monitor the database and pull tasks to run
while True:
    print("waiting task from distributor")
    db.run_top_task(task_name='mnist', sort=[("time", -1)])
    time.sleep(1)
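The distributor/runner interplay can be simulated without a database. The sketch below is a simplified in-memory model under stated assumptions: TaskBoard, its status values ("pending", "running", "finished") and the fake_training function are all hypothetical, and a counter stands in for the timestamp.

```python
class TaskBoard:
    """Toy in-memory task table; a hypothetical stand-in for the task collection."""

    def __init__(self):
        self.tasks = []
        self._clock = 0  # increasing counter used in place of a timestamp

    def create_task(self, task_name, hyper_parameters, **extra):
        self._clock += 1
        self.tasks.append({
            "task_name": task_name,
            "hyper_parameters": hyper_parameters,
            "status": "pending",
            "time": self._clock,
            **extra,
        })

    def check_unfinished_task(self, task_name):
        """True while any task with this name is not finished."""
        return any(
            t["task_name"] == task_name and t["status"] != "finished"
            for t in self.tasks
        )

    def run_top_task(self, task_name, runner, newest_first=True):
        """Run the first pending task in time order; False if none is pending."""
        pending = [
            t for t in self.tasks
            if t["task_name"] == task_name and t["status"] == "pending"
        ]
        if not pending:
            return False
        task = sorted(pending, key=lambda t: t["time"], reverse=newest_first)[0]
        task["status"] = "running"
        task["result"] = runner(**task["hyper_parameters"])
        task["status"] = "finished"
        return True


executed = []


def fake_training(n_units1, n_units2):
    # Placeholder for a real training script; records the call and
    # returns a made-up result.
    executed.append(n_units1)
    return {"n_weights": n_units1 * n_units2}


board = TaskBoard()
for units in (800, 600, 400):
    board.create_task("mnist", dict(n_units1=units, n_units2=units),
                      description="{0}-{0}".format(units))

while board.check_unfinished_task("mnist"):
    board.run_top_task("mnist", fake_training)
```

Because the newest task is taken first, the tasks here run in the reverse of the order in which they were created.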

Example code

See here.

TensorHub API

class tensorlayer.db.TensorHub(ip='localhost', port=27017, dbname='dbname', username='None', password='password', project_name=None)

A MongoDB-based manager that helps you manage data, network architectures, parameters and logs.

Parameters
  • ip (str) – Localhost or IP address.

  • port (int) – Port number.

  • dbname (str) – Database name.

  • username (str or None) – User name, set to None if you do not need authentication.

  • password (str) – Password.

  • project_name (str or None) – Experiment key for the entire project, similar to a repository name on GitHub.

ip, port, dbname and other input parameters

See above.

Type

see above

project_name

The given project name; if none is given, it is set to the script name.

Type

str

db

See pymongo.MongoClient.

Type

mongodb client

check_unfinished_task(task_name=None, **kwargs)

Checks whether any task is still unfinished.

Parameters
  • task_name (str) – The task name.

  • kwargs (other parameters) – User-defined parameters such as description and version number.

Examples

Wait until all tasks finish in the user's local console

>>> while db.check_unfinished_task():
...     time.sleep(1)
>>> print("all tasks finished")
>>> net = db.find_top_model(sort=[("test_accuracy", -1)])
>>> print("the best accuracy {} is from model {}".format(net._test_accuracy, net._name))

Returns

True if unfinished tasks remain, False otherwise.

Return type

boolean
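When polling from a script, it can be convenient to bound the wait. The helper below is a minimal sketch and not part of TensorHub: wait_until_finished and its parameters are hypothetical, and it assumes the check returns True while unfinished tasks remain, as in the tutorial loop `while db.check_unfinished_task(task_name='mnist')`.

```python
import time


def wait_until_finished(has_unfinished, poll_interval=1.0, timeout=None):
    """Poll has_unfinished() until it returns False.

    Returns True once nothing is unfinished, or False if the timeout
    (in seconds) expires first. With timeout=None it waits indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while has_unfinished():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


# Stand-in for the database check: reports unfinished work twice, then none.
pending_answers = [True, True, False]


def fake_check():
    return pending_answers.pop(0)


finished = wait_until_finished(fake_check, poll_interval=0.01, timeout=5)

# A check that never clears makes the helper give up after the timeout.
result_when_stuck = wait_until_finished(lambda: True, poll_interval=0.01, timeout=0.05)
```

With a real connection, the callable could be something like `lambda: db.check_unfinished_task(task_name='mnist')`.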

create_task(task_name=None, script=None, hyper_parameters=None, saved_result_keys=None, **kwargs)

Uploads a task to the database; a timestamp is added automatically.

Parameters
  • task_name (str) – The task name.

  • script (str) – File name of the Python script.

  • hyper_parameters (dictionary) – The hyper parameters passed into the script.

  • saved_result_keys (list of str) – The keys of the task results to keep in the database when the task finishes.

  • kwargs (other parameters) – User-defined parameters such as description and version number.

Examples

Uploads a task

>>> db.create_task(task_name='mnist', script='example/tutorial_mnist_simple.py', description='simple tutorial')

Finds and runs the latest task

>>> db.run_top_task(sort=[("time", pymongo.DESCENDING)])
>>> db.run_top_task(sort=[("time", -1)])

Finds and runs the oldest task

>>> db.run_top_task(sort=[("time", pymongo.ASCENDING)])
>>> db.run_top_task(sort=[("time", 1)])

delete_datasets(**kwargs)

Deletes datasets.

Parameters

kwargs (logging information) – Fields identifying the datasets to delete; leave empty to delete all datasets.

delete_model(**kwargs)

Deletes models.

Parameters

kwargs (logging information) – Fields identifying the models to delete; leave empty to delete all models.

delete_tasks(**kwargs)

Deletes tasks.

Parameters

kwargs (logging information) – Fields identifying the tasks to delete; leave empty to delete all tasks.

Examples

>>> db.delete_tasks()

delete_testing_log(**kwargs)

Deletes testing log.

Parameters

kwargs (logging information) – Fields identifying the logs to delete; leave empty to delete all logs.

Examples

  • see save_training_log.

delete_training_log(**kwargs)

Deletes training log.

Parameters

kwargs (logging information) – Fields identifying the logs to delete; leave empty to delete all logs.

Examples

Save training log

>>> db.save_training_log(accuracy=0.33)
>>> db.save_training_log(accuracy=0.44)

Delete logs that match the requirement

>>> db.delete_training_log(accuracy=0.33)

Delete all logs

>>> db.delete_training_log()

delete_validation_log(**kwargs)

Deletes validation log.

Parameters

kwargs (logging information) – Fields identifying the logs to delete; leave empty to delete all logs.

Examples

  • see save_training_log.

find_datasets(dataset_name=None, **kwargs)

Finds and returns all datasets from the database that match the requirements. In some cases, the data in a dataset can be stored separately for better management.

Parameters
  • dataset_name (str) – The name/key of the dataset.

  • kwargs (other events) – Other events, such as description, author, etc. (optional).

Returns

datasets – Returns False if nothing is found.

Return type

the datasets or False

find_top_dataset(dataset_name=None, sort=None, **kwargs)

Finds and returns a dataset from the database that matches the requirements.

Parameters
  • dataset_name (str) – The name of the dataset.

  • sort (List of tuple) – PyMongo sort argument; search "PyMongo find one sorting" and collection-level operations for more details.

  • kwargs (other events) – Other events, such as description, author, etc. (optional).

Examples

Save dataset

>>> db.save_dataset([X_train, y_train, X_test, y_test], 'mnist', description='this is a tutorial')

Get dataset

>>> dataset = db.find_top_dataset('mnist')
>>> datasets = db.find_datasets('mnist')

Returns

dataset – Returns False if nothing is found.

Return type

the dataset or False

find_top_model(sort=None, model_name='model', **kwargs)

Finds and returns a model architecture and its parameters from the database that match the requirements.

Parameters
  • sort (List of tuple) – PyMongo sort argument; search "PyMongo find one sorting" and collection-level operations for more details.

  • model_name (str or None) – The name/key of the model.

  • kwargs (other events) – Other events, such as name, accuracy, loss, step number, etc. (optional).

Examples

  • see save_model.

Returns

network – Note that, the returned network contains all information of the document (record), e.g. if you saved accuracy in the document, you can get the accuracy by using net._accuracy.

Return type

TensorLayer Model
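The underscore-prefixed attributes such as net._accuracy can be pictured as the fields of the stored document copied onto the returned object. The sketch below only illustrates that naming convention: attach_document_fields is a hypothetical helper, SimpleNamespace stands in for a TensorLayer Model, and skipping the _id field is an assumption of this example.

```python
from types import SimpleNamespace


def attach_document_fields(network, document, skip=("_id",)):
    """Copy document fields onto the object as underscore-prefixed attributes."""
    for key, value in document.items():
        if key in skip:
            continue
        setattr(network, "_" + key, value)
    return network


# Made-up record resembling what save_model(net, accuracy=0.8, loss=2.3,
# name='second_model') might store.
record = {"_id": 1, "name": "second_model", "accuracy": 0.8, "loss": 2.3}
net = attach_document_fields(SimpleNamespace(), record)
model_summary = (net._name, net._accuracy, net._loss)
```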

run_top_task(task_name=None, sort=None, **kwargs)

Finds and runs the pending task that comes first in the sorted list.

Parameters
  • task_name (str) – The task name.

  • sort (List of tuple) – PyMongo sort argument; search "PyMongo find one sorting" and collection-level operations for more details.

  • kwargs (other parameters) – User-defined parameters such as description and version number.

Examples

Monitors the database and pulls tasks to run

>>> while True:
...     print("waiting task from distributor")
...     db.run_top_task(task_name='mnist', sort=[("time", -1)])
...     time.sleep(1)

Returns

True for success, False for fail.

Return type

boolean

save_dataset(dataset=None, dataset_name=None, **kwargs)

Saves one dataset into the database; a timestamp is added automatically.

Parameters
  • dataset (any type) – The dataset you want to store.

  • dataset_name (str) – The name of the dataset.

  • kwargs (other events) – Other events, such as description, author, etc. (optional).

Examples

Save dataset

>>> db.save_dataset([X_train, y_train, X_test, y_test], 'mnist', description='this is a tutorial')

Get dataset

>>> dataset = db.find_top_dataset('mnist')

Returns

True if the save succeeds, otherwise False.

Return type

boolean

save_model(network=None, model_name='model', **kwargs)

Saves the model architecture and parameters into the database; a timestamp is added automatically.

Parameters
  • network (TensorLayer Model) – TensorLayer Model instance.

  • model_name (str) – The name/key of the model.

  • kwargs (other events) – Other events, such as name, accuracy, loss, step number, etc. (optional).

Examples

Save model architecture and parameters into database.

>>> db.save_model(net, accuracy=0.8, loss=2.3, name='second_model')

Load one model with parameters from database (run this in another script)

>>> net = db.find_top_model(accuracy=0.8, loss=2.3)

Find and load the latest model.

>>> net = db.find_top_model(sort=[("time", pymongo.DESCENDING)])
>>> net = db.find_top_model(sort=[("time", -1)])

Find and load the oldest model.

>>> net = db.find_top_model(sort=[("time", pymongo.ASCENDING)])
>>> net = db.find_top_model(sort=[("time", 1)])

Get model information

>>> net._accuracy
0.8

Returns

True for success, False for fail.

Return type

boolean

save_testing_log(**kwargs)

Saves the testing log; a timestamp is added automatically.

Parameters

kwargs (logging information) – Events, such as accuracy, loss, step number, etc.

Examples

>>> db.save_testing_log(accuracy=0.33, loss=0.98)

save_training_log(**kwargs)

Saves the training log; a timestamp is added automatically.

Parameters

kwargs (logging information) – Events, such as accuracy, loss, step number, etc.

Examples

>>> db.save_training_log(accuracy=0.33, loss=0.98)

save_validation_log(**kwargs)

Saves the validation log; a timestamp is added automatically.

Parameters

kwargs (logging information) – Events, such as accuracy, loss, step number, etc.

Examples

>>> db.save_validation_log(accuracy=0.33, loss=0.98)