TensorFlow: a solution to memory exhaustion caused by repeatedly loading models

I had a few saved deep learning models. From the test data, I wanted to build subsets in a specific way and evaluate them. Each subset was a combination of a few randomly chosen samples from the test data, and I wanted to evaluate thousands of such subsets on every model.

For this, the program needed to load the models for each subset. I could have loaded all the models into one workspace, but my system did not have enough memory for that.

As the evaluation proceeded, memory usage kept climbing even though I cleared memory after each evaluation as follows:

Python
import gc
import tensorflow as tf

for _ in range(n):

    evaluation_function(*args, **kwargs)

    # delete the model and flush memory
    del model
    tf.compat.v1.reset_default_graph()
    tf.keras.backend.clear_session()
    tf.config.experimental.reset_memory_stats(sel_gpu)  # sel_gpu e.g. 'GPU:0'
    gc.collect()

The following is the memory usage log:

log text
[After dataset 0] Memory usage: 605.10 MB
[After dataset 220] Memory usage: 19494.14 MB
[After dataset 230] Memory usage: 20301.51 MB
[After dataset 240] Memory usage: 21134.62 MB
[After dataset 250] Memory usage: 21952.45 MB
[After dataset 260] Memory usage: 22774.45 MB
[After dataset 270] Memory usage: 23605.70 MB
[After dataset 280] Memory usage: 24416.38 MB
[After dataset 290] Memory usage: 25248.49 MB
[After dataset 300] Memory usage: 26067.98 MB
[After dataset 310] Memory usage: 26876.53 MB
[After dataset 320] Memory usage: 27712.44 MB
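The figures above appear to be the resident memory of the running Python process. A minimal sketch of how such a log line can be produced, assuming psutil is used to read the resident set size (psutil and the exact message text are my assumptions, not part of the original script):

Python
import os
import psutil

def log_memory_usage(dataset_idx):
    # report the resident memory of the current process in MB
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)
    print(f"[After dataset {dataset_idx}] Memory usage: {rss_mb:.2f} MB")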

The only thing that worked was to run the evaluation function in a separate Python process. The following is the format I used:

Python
# imports

# helper functions

# helper statements

def evaluation_function(*args, **kwargs):
    # statements
    # call evaluation helper functions
    ...

# import Process
from multiprocessing import Process

desired_iters = 1000
c = 0
while c < desired_iters:
    # run prediction in a separate process
    p = Process(target=evaluation_function,
                args=args, kwargs=kwargs)
    p.start()
    p.join()  # wait for the subprocess to finish

    c += 1

In each Process iteration, I evaluated only as many subsets as my system memory could handle. Each process stored its result in a separate file, and the files were later combined and analyzed, as sketched below.
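A hypothetical sketch of that pattern follows; the file naming, the result structure, and the use of pandas are assumptions for illustration, not the original code:

Python
import glob
import pandas as pd

def evaluation_function(subset_ids, result_path):
    results = []
    for subset_id in subset_ids:
        # load the model, evaluate the subset, and record the metrics here
        results.append({"subset_id": subset_id, "score": 0.0})  # placeholder score
    pd.DataFrame(results).to_csv(result_path, index=False)

# after all subprocesses have finished, combine the per-process result files
combined = pd.concat(
    (pd.read_csv(f) for f in sorted(glob.glob("results_*.csv"))),
    ignore_index=True,
)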

The following memory usage log shows how the memory usage drops back after each Process finishes:

log text
[After dataset 430] Memory usage: 36763.20 MB
[After dataset 440] Memory usage: 37616.81 MB
[After dataset 450] Memory usage: 38419.14 MB
[After dataset 460] Memory usage: 39266.66 MB
[After dataset 470] Memory usage: 40063.92 MB
[After dataset 480] Memory usage: 40867.15 MB
[After dataset 490] Memory usage: 41704.53 MB
[After dataset 500] Memory usage: 42522.88 MB
[After dataset 0] Memory usage: 605.57 MB
[After dataset 10] Memory usage: 2210.55 MB
[After dataset 20] Memory usage: 3034.72 MB
[After dataset 30] Memory usage: 3849.43 MB
[After dataset 40] Memory usage: 4684.55 MB
[After dataset 50] Memory usage: 5503.93 MB
[After dataset 60] Memory usage: 6324.65 MB

The evaluation_function was a complex function that called several other helper functions. All of those helper functions were automatically available in the subprocess, and the global variables defined in the script could also be used.
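How the helper functions and globals become visible in the subprocess depends on the multiprocessing start method: with fork (the default on Linux) the child inherits the parent's memory, while with spawn (the default on Windows and macOS) the module is re-imported, so the driver loop must sit under a main guard. A minimal sketch of that guard, with a placeholder argument of my own:

Python
from multiprocessing import Process

def evaluation_function(*args, **kwargs):
    ...  # helper functions and module-level globals are reachable here as well

if __name__ == "__main__":  # required when the start method is spawn
    p = Process(target=evaluation_function, args=("subset_0",))
    p.start()
    p.join()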