In-memory caching for precomputed weights

Note

This caching applies only to the precomputed backends of regrid().

Purpose

earthkit-geo provides an in-memory cache for precomputed interpolation weights (i.e. sparse interpolation matrices). When it is enabled, weights loaded from disk are kept in memory, so subsequent calls to regrid() with the same grids do not have to load them from disk again. The cache can be configured with a maximum size and an eviction policy.

Note

Please note that the earthkit-geo in-memory cache configuration is managed through the Configuration.

In-memory cache policies

The primary config option controlling the in-memory cache is regrid-precomputed-weights-memory-cache-policy, which can take the following values:

LRU cache policy

When regrid-precomputed-weights-memory-cache-policy is "lru" (the default), the least recently used matrices are evicted from the in-memory cache first. The eviction policy is applied before the weights are loaded, to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the regrid-precomputed-weights-memory-cache-strict-mode option. The maximum size of the in-memory cache is defined by the regrid-precomputed-weights-maximum-memory-cache-size option; the default is 500 MB.

>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "lru")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'lru'
>>> config.get("regrid-precomputed-weights-maximum-memory-cache-size")
524288000
>>> config.get("regrid-precomputed-weights-memory-cache-strict-mode")
False
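
The eviction order described above can be illustrated with a small standalone sketch. This is a toy model under stated assumptions, not earthkit-geo's implementation: the class name LRUByteCache, its methods and the byte sizes are all hypothetical and exist only to show how a byte-budgeted LRU cache evicts entries before storing a new one.

```python
from collections import OrderedDict


class LRUByteCache:
    """Toy byte-budgeted LRU cache (hypothetical, not the library's code)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.currsize = 0
        self._items = OrderedDict()  # key -> (value, size), least recent first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, size):
        if size > self.maxsize:
            return False  # can never fit; the caller decides what to do
        if key in self._items:
            self.currsize -= self._items.pop(key)[1]
        # evict before storing, so the new entry is guaranteed to fit
        while self.currsize + size > self.maxsize:
            _, (_, evicted_size) = self._items.popitem(last=False)
            self.currsize -= evicted_size
        self._items[key] = (value, size)
        self.currsize += size
        return True

    def keys(self):
        return list(self._items)


cache = LRUByteCache(maxsize=100)
cache.put("a", "weights-a", 40)
cache.put("b", "weights-b", 40)
cache.get("a")                   # "a" becomes the most recently used entry
cache.put("c", "weights-c", 40)  # evicts "b", the least recently used entry
print(cache.keys())
```

In this sketch, reading "a" protects it from eviction, so adding "c" removes "b" instead.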

Largest cache policy

When regrid-precomputed-weights-memory-cache-policy is "largest", the largest matrices are evicted from the in-memory cache first. The eviction policy is applied before the weights are loaded, to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the regrid-precomputed-weights-memory-cache-strict-mode option. The maximum size of the in-memory cache is defined by the regrid-precomputed-weights-maximum-memory-cache-size option; the default is 500 MB.

>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "largest")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'largest'
>>> config.get("regrid-precomputed-weights-maximum-memory-cache-size")
524288000
>>> config.get("regrid-precomputed-weights-memory-cache-strict-mode")
False
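
As a rough illustration of largest-first eviction, the sketch below picks which entries to drop so that an incoming matrix fits. It is only a model of the policy described above: the function evict_largest_first, the entry labels and the sizes are hypothetical, and the real cache may select entries differently.

```python
def evict_largest_first(entries, maxsize, incoming_size):
    """Return the keys to evict, largest first, so that incoming_size fits.

    entries maps a key to its size in bytes. Returns None when the incoming
    item cannot fit even into an empty cache.
    """
    if incoming_size > maxsize:
        return None
    currsize = sum(entries.values())
    evicted = []
    # visit entries from the largest to the smallest
    for key, size in sorted(entries.items(), key=lambda kv: kv[1], reverse=True):
        if currsize + incoming_size <= maxsize:
            break
        evicted.append(key)
        currsize -= size
    return evicted


# hypothetical cache content: label -> size in bytes
entries = {"small": 10, "large": 70, "medium": 15}
print(evict_largest_first(entries, maxsize=100, incoming_size=20))
```

Here a single eviction of the largest entry frees enough room, whereas an LRU policy might have to drop several smaller entries to make the same space.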

Unlimited cache policy

When regrid-precomputed-weights-memory-cache-policy is "unlimited", all matrices are kept in memory.

>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "unlimited")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'unlimited'

Off cache policy

When regrid-precomputed-weights-memory-cache-policy is "off", there is no cache and the matrices are always loaded from disk.

>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "off")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'off'

Getting the state of the in-memory cache

The current status of the in-memory cache can be retrieved using the precomputed_memory_cache_info() function. It returns a namedtuple with fields hits, misses, maxsize, currsize, count and policy.

>>> from earthkit.geo.regrid import precomputed_memory_cache_info
>>> precomputed_memory_cache_info()
CacheInfo(hits=9, misses=1, maxsize=524288000, currsize=259170724, count=1, policy='largest')
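
The fields of the returned tuple can be combined into simple diagnostics. The sketch below rebuilds a CacheInfo namedtuple locally with the field names listed above, so that it runs without earthkit-geo; the derived quantities hit_ratio and fill are illustrative names, not part of the library.

```python
from collections import namedtuple

# local stand-in with the same field names as the documented return value
CacheInfo = namedtuple(
    "CacheInfo", ["hits", "misses", "maxsize", "currsize", "count", "policy"]
)
info = CacheInfo(
    hits=9, misses=1, maxsize=524288000, currsize=259170724, count=1, policy="largest"
)

lookups = info.hits + info.misses
# guard against division by zero right after the cache has been cleared
hit_ratio = info.hits / lookups if lookups else 0.0
# maxsize may be unset (e.g. None), in which case no fill level is defined
fill = info.currsize / info.maxsize if info.maxsize else 0.0

print(f"hit ratio: {hit_ratio:.0%}, cache fill: {fill:.1%}")
```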

Clearing the in-memory cache

The in-memory cache can be cleared using the clear_precomputed_memory_cache() function.

>>> from earthkit.geo.regrid import clear_precomputed_memory_cache, precomputed_memory_cache_info
>>> clear_precomputed_memory_cache()
>>> precomputed_memory_cache_info()
CacheInfo(hits=0, misses=0, maxsize=524288000, currsize=0, count=0, policy='largest')

In-memory cache limits

Warning

These config options are only used when regrid-precomputed-weights-memory-cache-policy is largest or lru.

regrid-precomputed-weights-maximum-memory-cache-size

The regrid-precomputed-weights-maximum-memory-cache-size option defines the maximum memory size of the in-memory cache in bytes. The default is 500 MB.
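
The byte values shown in the examples on this page are consistent with 1 MB being counted as 1024**2 bytes. A quick check of that arithmetic, using only the numbers that appear on this page (the constant name MIB is just a local helper):

```python
MIB = 1024**2  # bytes per "MB" as used by the values on this page

default_maxsize = 500 * MIB  # the documented default of 500 MB
example_maxsize = 100 * MIB  # the 100 MB limit used in the example script

print(default_maxsize, example_maxsize)
```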

regrid-precomputed-weights-memory-cache-strict-mode

When the regrid-precomputed-weights-memory-cache-strict-mode option is True, a ValueError is raised if the weights cannot fit into the cache. When it is False and the weights cannot fit, they are simply not stored in the cache. The default is False.
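
The difference between the two modes can be sketched as a small admission check. This is a simplified model only: the function admit and its arguments are hypothetical, and it ignores eviction by comparing the weights against the full cache size.

```python
def admit(size, maxsize, strict=False):
    """Decide whether weights of `size` bytes may be stored in the cache."""
    if size <= maxsize:
        return True
    if strict:
        # strict mode: refuse loudly
        raise ValueError(
            f"weights of {size} bytes do not fit into a cache of {maxsize} bytes"
        )
    # non-strict mode: the weights are still used, just not cached
    return False


print(admit(50, 100))   # fits into the cache
print(admit(150, 100))  # too large, silently skipped in non-strict mode
try:
    admit(150, 100, strict=True)
except ValueError as exc:
    print("strict mode:", exc)
```

Non-strict mode favours robustness, since regridding continues without caching, while strict mode makes an undersized cache visible immediately.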

In-memory cache config parameters

Name: regrid-precomputed-weights-maximum-memory-cache-size
Default: '500MB'
Description: The maximum memory size of the in-memory precomputed weight cache in bytes. Only used when regrid-precomputed-weights-memory-cache-policy is "largest" or "lru". Can be set to None. See In-memory caching for precomputed weights for more information.

Name: regrid-precomputed-weights-memory-cache-policy
Default: 'lru'
Description: The in-memory precomputed weights cache policy. Valid values: off, unlimited, largest and lru. See In-memory caching for precomputed weights for more information.

Name: regrid-precomputed-weights-memory-cache-strict-mode
Default: False
Description: Raise exception if the weights cannot be fitted into the in-memory cache. Only used when regrid-precomputed-weights-memory-cache-policy is "largest" or "lru". See In-memory caching for precomputed weights for more information.

Other earthkit-geo config options can be found here.

Notebooks

Examples

import numpy as np
from earthkit.geo import config
from earthkit.geo.grids.array import regrid
from earthkit.geo.grids.utils import precomputed_memory_cache_info

# set memory cache with a maximum size of 100 MB to evict the largest matrices first
config.set(
    regrid_precomputed_weights_memory_cache_policy="largest",
    regrid_precomputed_weights_maximum_memory_cache_size=100 * 1024**2,
)
print(precomputed_memory_cache_info())

# create a random data array and regrid it
data = np.random.rand(5248)
interpolated_data = regrid(
    data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]}, backend="precomputed"
)
print(precomputed_memory_cache_info())

# repeat interpolation, this time the weights are loaded from the cache
data = np.random.rand(5248)
interpolated_data = regrid(
    data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]}, backend="precomputed"
)
print(precomputed_memory_cache_info())

output:

CacheInfo(hits=0, misses=0, maxsize=104857600, currsize=0, count=0, policy='largest')
CacheInfo(hits=0, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')
CacheInfo(hits=1, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')