How to debug TensorFlow
This Python tutorial is a part of our series of Python packages related tutorials.
Try a faster and easier way to manage your Python packages and resolve dependencies. Use Python 3.9 by ActiveState and build your own runtime with the packages and dependencies you need. The ActiveState Platform’s command line interface, the State Tool will automatically resolve dependencies for you to ensure your Python environment won’t be corrupted by installing incompatible dependencies.
Get started for free by creating an account on the ActiveState Platform or logging in with your GitHub account.
TensorFlow is an open source Python library for numerical computation, best known for building and training neural networks. Its models are expressed as computational graphs, and like any code they do not always behave as intended, so they need debugging and error correction.
TensorFlow's graph-based execution, combined with the stochastic nature of many deep learning algorithms, makes debugging challenging. Some traditional debugging techniques do not carry over to TensorFlow.
For example, inserting print statements helps when debugging imperative code, but inside a TensorFlow graph a Python print statement runs only while the graph is being built, so it shows a symbolic tensor rather than the values computed at run time.
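The behavior of print inside a graph can be seen in a minimal sketch. It assumes TensorFlow 2.x; the function name `double` and the list `trace_log` are hypothetical names chosen for illustration. The Python body runs once, when the function is traced, and not on the later calls that reuse the traced graph:

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: records each time the Python body runs


@tf.function
def double(x):
    # Plain Python statements execute only while TensorFlow traces the function.
    trace_log.append("traced")
    print("Python print sees:", x)  # prints a symbolic tensor, not a value
    return x * 2


# Three calls with the same input signature reuse one traced graph.
results = [float(double(tf.constant(v))) for v in (1.0, 2.0, 3.0)]

print("results:", results)
print("times the Python body ran:", len(trace_log))
```

The graph still computes a result for every call; only the Python-side statements are skipped after tracing.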
Different Types of Bugs
Debugging in TensorFlow can refer to bugs in code, as well as to models that are not converging or not producing target values.
This post provides recommendations and best practices for avoiding both kinds of problems by using:
- Methods that reduce the need for debugging wherever possible
- Functions and debugging tools included in the TensorFlow library
TensorFlow Debug Methods
There are four principal methods for debugging TensorFlow:
- Print values within Session.run
- Use the tf.print operation
- Increase logging
- Use the TensorFlow debugger and API functions
Debug TensorFlow by Printing Values within Session.run
Build the TensorFlow objects you want to inspect, evaluate them with Session.run, and pass the result to Python's print. This is an easy, safe method for getting tensor values.
A Session object encapsulates the environment in which other objects are executed and evaluated. Because the session is encapsulated, errors that occur inside it do not affect functionality outside of it, which simplifies debugging. For example:
    import tensorflow as tf

    # Eager execution is automatic in TensorFlow 2.x
    # and needs to be disabled in this instance:
    tf.compat.v1.disable_eager_execution()

    # Create a graph:
    a = tf.constant(4.0)
    b = tf.constant(5.0)
    c = a * b

    # Launch the graph in a session:
    sess = tf.compat.v1.Session()

    # Evaluate tensor 'c' in this session:
    print(sess.run(c))
Figure 1. Print session.run() values.
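Session.run also accepts a list of tensors, which makes it possible to inspect intermediate values alongside the final result. The following is a sketch only, assuming TensorFlow 2.x with the tf.compat.v1 API; the tensor names `product` and `total` are hypothetical. It builds the graph inside an explicit tf.Graph, so eager execution does not have to be disabled for the whole program:

```python
import tensorflow as tf

# Build the graph inside an explicit tf.Graph instead of disabling
# eager execution globally.
graph = tf.Graph()
with graph.as_default():
    a = tf.compat.v1.placeholder(tf.float32, shape=(), name="a")
    b = tf.constant(5.0, name="b")
    product = a * b          # intermediate value
    total = product + 1.0    # final value

with tf.compat.v1.Session(graph=graph) as sess:
    # Fetch the intermediate and final tensors in a single run call,
    # feeding a concrete value for the placeholder.
    product_val, total_val = sess.run([product, total], feed_dict={a: 4.0})

print("product:", product_val)
print("total:", total_val)
```

Fetching both tensors in one call guarantees that the printed values come from the same execution of the graph.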
Debug TensorFlow Using the tf.print Operation
Replace Python's print with tf.print to print tensor values during graph execution.
tf.print() is a TensorFlow operator that prints the specified inputs to a specified output stream (sys.stderr by default). Because it runs as part of the graph, it can track values during execution, which is useful when hunting for errors. For example:
    import tensorflow as tf
    import sys

    tensor = tf.range(8)

    # Print "[0 1 2 ... 7]" to sys.stderr:
    tf.print(tensor, output_stream=sys.stderr)
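tf.print is most useful inside a function decorated with @tf.function, where Python's print would run only once. The sketch below assumes TensorFlow 2.x; the function `center` and its inputs are hypothetical. It prints intermediate values on every call:

```python
import sys

import tensorflow as tf


@tf.function
def center(x):
    mean = tf.reduce_mean(x)
    # tf.print is part of the graph, so it runs on every call.
    tf.print("mean:", mean, output_stream=sys.stdout)
    centered = x - mean
    tf.print("centered:", centered, output_stream=sys.stdout)
    return centered


first = center(tf.constant([1.0, 2.0, 3.0, 6.0]))
second = center(tf.constant([10.0, 20.0, 30.0, 60.0]))
```

Both calls produce printed output, even though the second call reuses the graph traced for the first.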
Debug TensorFlow by Increasing Logging
Log files can be a useful source of debugging information. Begin by importing Python's standard logging module:
    import logging
Most TensorFlow logging facilities support different levels of severity. TensorFlow uses five standard levels, listed here in order of increasing severity: DEBUG, INFO, WARN, ERROR, FATAL
To get more out of the logs, increase their verbosity. For TensorFlow's C++ modules this is done with the TF_CPP_VMODULE environment variable, which is set in the shell that launches Python (not in a Python console) and takes a comma-separated list of module=level pairs. The module names below belong to the TensorFlow-TensorRT converter:
    export TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine_op=2
To return the TensorFlow logger instance, call tf.get_logger().
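The Python-side logger can be adjusted with the standard logging module. This is a minimal sketch, assuming TensorFlow 2.x, where tf.get_logger() returns an ordinary logging.Logger; the log message text is made up for illustration:

```python
import logging

import tensorflow as tf

# tf.get_logger() returns the logging.Logger that TensorFlow's Python code uses.
logger = tf.get_logger()
previous_level = logger.level

# Lower the threshold so that DEBUG and INFO messages are emitted as well.
logger.setLevel(logging.DEBUG)
logger.debug("Debug-level TensorFlow logging is now enabled")

print("logger name:", logger.name)
print("DEBUG enabled:", logger.isEnabledFor(logging.DEBUG))
```

Restoring the earlier threshold later is a matter of calling logger.setLevel(previous_level).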
Debug TensorFlow Using the TensorFlow Debugger and API Functions
The TensorFlow debugger dumps debugging information like:
- TensorFlow Function construction, such as the compilation of Python functions decorated with @tf.function, including:
  - Op types
  - Names (if available)
  - Input and output tensors, and the associated stack traces
- Execution of TensorFlow operations (ops) and Functions and their stack traces, op types, names (if available) and contexts. Depending on the value of the tensor_debug_mode argument, the value(s) of the output tensors or more concise summaries of the tensor values will be dumped.
- A snapshot of Python source files involved in the execution of the TensorFlow program.
You can enable this dumping and send the information to a specified location by adding a single line of code at the top of your TensorFlow application, right after the import:
    import tensorflow as tf

    tf.debugging.experimental.enable_dump_debug_info('/tmp/my-tfdbg-dumps')

    # TensorFlow application code:
    # ...
Figure 2: TensorFlow debugging:
TensorFlow debugging can be disabled by calling tf.debugging.experimental.disable_dump_debug_info().
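A sketch of the full cycle (enable the dump, run some code, flush, and disable with tf.debugging.experimental.disable_dump_debug_info()) might look like the following. It assumes TensorFlow 2.x; the temporary directory and the function `scaled_sum` are hypothetical, and "FULL_HEALTH" is one of the documented tensor_debug_mode values, chosen here only as an example:

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical dump location; any writable directory would do.
dump_dir = tempfile.mkdtemp(prefix="tfdbg-")

# enable_dump_debug_info returns the writer used for the dump files.
writer = tf.debugging.experimental.enable_dump_debug_info(
    dump_dir,
    tensor_debug_mode="FULL_HEALTH",  # summarize NaN/Inf counts per tensor
    circular_buffer_size=-1,          # keep all events instead of the latest ones
)


@tf.function
def scaled_sum(x):
    return tf.reduce_sum(x) * 0.5


result = scaled_sum(tf.constant([1.0, 2.0, 3.0]))

# Make sure everything is written to disk, then turn dumping off again.
writer.FlushNonExecutionFiles()
writer.FlushExecutionFiles()
tf.debugging.experimental.disable_dump_debug_info()

print("result:", float(result))
print("dump files:", sorted(os.listdir(dump_dir)))
```

The dump directory can then be opened with a viewer such as the Debugger V2 dashboard in TensorBoard.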
Enable TensorFlow Numerics Checking
When tf.debugging.enable_check_numerics is active, TensorFlow raises an error as soon as an op outputs a tensor containing Infinity or NaN, which stops execution at the point where the bad value first appears. The setting takes effect only on the thread in which it is called.
    import tensorflow as tf
    import numpy as np

    # Enable numeric checking:
    tf.debugging.enable_check_numerics()

    x = np.array([[0.0, -1.0], [7.0, 8.0]])

    # Negative element in the input array will generate a NaN error. Because numeric checking
    # is enabled, the program will throw an error and print an error message:
    y = tf.math.sqrt(x)
Figure 3. Numeric checking output:
The numerics checking mechanism can be disabled with tf.debugging.disable_check_numerics().
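In a longer program it can be convenient to scope the check to a suspicious region and turn it off afterwards with tf.debugging.disable_check_numerics(). The sketch below assumes TensorFlow 2.x running eagerly and assumes the failure surfaces as tf.errors.InvalidArgumentError; the variable names are illustrative:

```python
import tensorflow as tf

caught = None

tf.debugging.enable_check_numerics()
try:
    # The square root of a negative number is NaN, which trips the check.
    tf.math.sqrt(tf.constant([[0.0, -1.0], [7.0, 8.0]]))
except tf.errors.InvalidArgumentError as err:
    caught = err
finally:
    # Always switch the check off again, even if something else fails.
    tf.debugging.disable_check_numerics()

# With the check disabled, the same kind of op returns NaN silently.
silent = tf.math.sqrt(tf.constant([-1.0]))

print("error raised while checking:", caught is not None)
print("NaN passes silently afterwards:", bool(tf.math.is_nan(silent)[0]))
```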
Decorate Python Functions with @tf.function for Error Detection
tf.function is a decorator that converts a Python function into a callable TensorFlow graph function. In this example, the tf.function decorator converts ‘add’ into a callable graph function:
    import tensorflow as tf

    @tf.function
    def add(a, b):
        return a + b

    add(tf.ones([2, 2]), tf.ones([2, 2]))
Figure 4. ‘add’ is called as a function by tf.function and returns a + b:
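When a decorated function misbehaves, one option is to run it eagerly for a while so that ordinary Python tools can see concrete values. The sketch below assumes TensorFlow 2.3 or later, where tf.config.run_functions_eagerly is available; the list `seen` is a hypothetical helper used only to show what the function body observes:

```python
import tensorflow as tf

seen = []  # hypothetical helper: values observed inside the function


@tf.function
def add(a, b):
    result = a + b
    # While functions run eagerly, `result` is a concrete tensor, so
    # .numpy() works here and a debugger breakpoint would show real values.
    seen.append(result.numpy().tolist())
    return result


tf.config.run_functions_eagerly(True)
try:
    add(tf.ones([2, 2]), tf.ones([2, 2]))
finally:
    # Restore graph execution once the investigation is done.
    tf.config.run_functions_eagerly(False)

print(seen)
```

Eager execution is slower than graph execution, so this switch is meant for debugging sessions, not for production runs.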
Utilize TensorFlow tf.debugging API Functions
Use the API to check for bugs, errors, and True/False conditions depending on specifications within each function.
For example, to check a tensor for NaN and Infinity values, you can use the tf.debugging.check_numerics function. It passes the tensor through unchanged if all values are finite, and raises an InvalidArgument error if the tensor contains any NaN or Infinity values:
    tf.debugging.check_numerics(
        tensor, message, name=None
    )
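As an illustration of the signature above, the following sketch wraps a computation in a check. It assumes TensorFlow 2.x running eagerly; `checked_log` and its message text are hypothetical:

```python
import tensorflow as tf


def checked_log(x):
    """Hypothetical helper: log(x) that fails loudly on NaN or Infinity."""
    y = tf.math.log(x)
    return tf.debugging.check_numerics(y, message="checked_log produced NaN/Inf")


# All values are finite, so the tensor is passed through unchanged.
ok = checked_log(tf.constant([1.0, 2.0]))

error_text = None
try:
    checked_log(tf.constant([1.0, 0.0]))  # log(0) is -Infinity
except tf.errors.InvalidArgumentError as err:
    error_text = str(err)

print("finite result:", ok.numpy())
print("error text:", error_text)
```

Placing such checks after the operations most likely to overflow or divide by zero narrows down where a bad value first appears.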
For a complete list of tf.debugging API functions, refer to the API docs.
A modern solution to Python package management – Try ActiveState’s Platform
The ActiveState Platform is a cloud-based build tool for Python. It provides build automation and vulnerability remediation for:
- Python language cores, including Python 2.7 and Python 3.5+
- Python packages and their dependencies, including:
  - Transitive dependencies (i.e., dependencies of dependencies)
  - Linked C and Fortran libraries, so you can build data science packages
  - Operating system-level dependencies for Windows, Linux, and macOS
  - Shared dependencies (e.g., OpenSSL)
- Find, fix and automatically rebuild a secure version of Python packages like Django and environments in minutes
Python Package Management In Action
Get a hands-on appreciation for how the ActiveState Platform can help you manage your dependencies for Python environments. Just run the command for your platform to install Python 3.9 and our package manager, the State Tool. The first command below is for Windows (PowerShell) and the second is for Linux:
powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.activestate.com/dl/cli/install.ps1'))) -activate-default ActiveState-Labs/Python-3.9Beta"
sh <(curl -q https://platform.activestate.com/dl/cli/install.sh) --activate-default ActiveState-Labs/Python-3.9Beta
Now you can run state install <packagename>. Learn more about how to use the State Tool to manage your Python environment.
Let us know your experience in the ActiveState Community forum.