In-memory caching for precomputed weights¶
Note
This caching applies only to the precomputed backends in regrid().
Purpose¶
earthkit-geo provides an in-memory cache for precomputed interpolation weights (i.e. sparse interpolation matrices). When it is enabled, weights loaded from disk are kept in memory, so subsequent calls to regrid() with the same grids do not have to load them from disk again. The cache can be configured with a maximum size and an eviction policy.
Note
The earthkit-geo in-memory cache is configured through the Configuration.
In-memory cache policies¶
The primary config option to control the in-memory cache is regrid-precomputed-weights-memory-cache-policy, which can take the following values: lru (the default), largest, unlimited and off.
LRU cache policy¶
When regrid-precomputed-weights-memory-cache-policy is “lru” (the default), the least recently used matrices are evicted from the in-memory cache first. The eviction policy is applied before loading the weights to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the regrid-precomputed-weights-memory-cache-strict-mode option. The maximum size of the in-memory cache is defined by the regrid-precomputed-weights-maximum-memory-cache-size option. The default is 500 MB.
>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "lru")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'lru'
>>> config.get("regrid-precomputed-weights-maximum-memory-cache-size")
524288000
>>> config.get("regrid-precomputed-weights-memory-cache-strict-mode")
False
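The least-recently-used ordering can be sketched with a small byte-bounded cache. This is an illustration under assumptions, not the earthkit-geo implementation: the LRUByteCache class, its method names and the toy sizes are all hypothetical. It evicts before storing, so the new entry always fits.

```python
from collections import OrderedDict


class LRUByteCache:
    """Toy byte-bounded LRU cache (hypothetical, not the earthkit-geo code)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize       # capacity in bytes
        self.currsize = 0            # bytes currently held
        self._items = OrderedDict()  # key -> (value, nbytes), oldest first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, nbytes):
        if nbytes > self.maxsize:
            return False  # can never fit, so it is not cached
        if key in self._items:
            self.currsize -= self._items.pop(key)[1]
        # evict the least recently used entries until the new one fits
        while self.currsize + nbytes > self.maxsize:
            _, (_, evicted_bytes) = self._items.popitem(last=False)
            self.currsize -= evicted_bytes
        self._items[key] = (value, nbytes)
        self.currsize += nbytes
        return True

    def keys(self):
        return list(self._items)


cache = LRUByteCache(maxsize=100)
cache.put("a", "A", 40)
cache.put("b", "B", 40)
cache.get("a")           # "a" becomes the most recently used entry
cache.put("c", "C", 40)  # needs room: "b" is evicted, not "a"
print(cache.keys())
```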
Largest cache policy¶
When regrid-precomputed-weights-memory-cache-policy is “largest”, the largest matrices are evicted from the in-memory cache first. The eviction policy is applied before loading the weights to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the regrid-precomputed-weights-memory-cache-strict-mode option. The maximum size of the in-memory cache is defined by the regrid-precomputed-weights-maximum-memory-cache-size option. The default is 500 MB.
>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "largest")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'largest'
>>> config.get("regrid-precomputed-weights-maximum-memory-cache-size")
524288000
>>> config.get("regrid-precomputed-weights-memory-cache-strict-mode")
False
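A largest-first eviction order can be sketched as follows. The helper evict_largest_first() and the sizes used are hypothetical and only illustrate the ordering; how earthkit-geo selects the entries internally is not shown here.

```python
def evict_largest_first(sizes, incoming, maxsize):
    """Return the keys to evict so that `incoming` bytes fit within `maxsize`.

    `sizes` maps a cache key to its size in bytes. Hypothetical helper,
    not part of earthkit-geo.
    """
    if incoming > maxsize:
        return None  # cannot fit even into an empty cache
    used = sum(sizes.values())
    evicted = []
    # walk the entries from largest to smallest until enough room is free
    for key in sorted(sizes, key=sizes.get, reverse=True):
        if used + incoming <= maxsize:
            break
        used -= sizes[key]
        evicted.append(key)
    return evicted


sizes = {"a": 10, "b": 70, "c": 15}  # 95 bytes in use
print(evict_largest_first(sizes, incoming=20, maxsize=100))
```

Removing the single largest entry frees enough room here, so the smaller matrices stay cached.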
Unlimited cache policy¶
When regrid-precomputed-weights-memory-cache-policy is “unlimited”, all matrices are kept in memory.
>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "unlimited")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'unlimited'
Off cache policy¶
When regrid-precomputed-weights-memory-cache-policy is “off”, there is no cache and the matrices are always loaded from disk.
>>> from earthkit.geo import cache, config
>>> config.set("regrid-precomputed-weights-memory-cache-policy", "off")
>>> config.get("regrid-precomputed-weights-memory-cache-policy")
'off'
Getting the state of the in-memory cache¶
The current status of the in-memory cache can be retrieved using the precomputed_memory_cache_info() function. It returns a namedtuple with fields hits, misses, maxsize, currsize, count and policy.
>>> from earthkit.geo.regrid import precomputed_memory_cache_info
>>> precomputed_memory_cache_info()
CacheInfo(hits=9, misses=1, maxsize=524288000, currsize=259170724, count=1, policy='largest')
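The returned fields can be combined into simple diagnostics such as a hit ratio. The sketch below uses a stand-in namedtuple with the field names listed above and the values from the example, so that it runs without earthkit-geo installed; in practice the value would come from precomputed_memory_cache_info(). It assumes maxsize is a positive number of bytes, as in the example.

```python
from collections import namedtuple

# Stand-in with the documented field names (an assumption for illustration)
CacheInfo = namedtuple("CacheInfo", "hits misses maxsize currsize count policy")

info = CacheInfo(
    hits=9, misses=1, maxsize=524288000, currsize=259170724, count=1, policy="largest"
)

lookups = info.hits + info.misses
hit_ratio = info.hits / lookups if lookups else 0.0  # share of lookups served from memory
fill = info.currsize / info.maxsize                  # share of the byte budget in use

print(f"hit ratio: {hit_ratio:.0%}, cache fill: {fill:.1%}")
```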
Clearing the in-memory cache¶
The in-memory cache can be cleared using the clear_precomputed_memory_cache() function.
>>> from earthkit.geo.regrid import clear_precomputed_memory_cache, precomputed_memory_cache_info
>>> clear_precomputed_memory_cache()
>>> precomputed_memory_cache_info()
CacheInfo(hits=0, misses=0, maxsize=524288000, currsize=0, count=0, policy='largest')
In-memory cache limits¶
Warning
These config options are only used when regrid-precomputed-weights-memory-cache-policy is largest or lru.
- regrid-precomputed-weights-maximum-memory-cache-size
  The regrid-precomputed-weights-maximum-memory-cache-size option defines the maximum memory size of the in-memory cache in bytes. The default is 500 MB.
- regrid-precomputed-weights-memory-cache-strict-mode
  When the regrid-precomputed-weights-memory-cache-strict-mode option is True, a ValueError is raised if the weights cannot be fitted into the cache. If it is False and the weights cannot be fitted into the cache, they are simply not loaded into the cache. The default is False.
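The interaction of the two limits can be sketched as below. This is an illustration only: admit() is a hypothetical helper, and it assumes that "cannot be fitted" means the weights are larger than the whole cache, so that no amount of eviction would help.

```python
MAXSIZE = 500 * 1024**2  # 524288000 bytes, the documented default


def admit(nbytes, maxsize=MAXSIZE, strict=False):
    """Decide whether weights of `nbytes` bytes may enter the cache (hypothetical)."""
    if nbytes <= maxsize:
        return True  # fits, possibly after evicting other entries
    if strict:
        raise ValueError(f"weights of {nbytes} bytes do not fit into a {maxsize}-byte cache")
    return False     # non-strict: the weights are used but not cached


print(admit(100 * 1024**2))  # within the limit
print(admit(MAXSIZE + 1))    # too large, silently skipped in non-strict mode
```

With strict=True the second call would raise ValueError instead of returning False.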
In-memory cache config parameters¶
| Name | Default | Description |
|---|---|---|
| regrid-precomputed-weights-maximum-memory-cache-size | '500MB' | The maximum memory size of the in-memory precomputed weights cache in bytes. Only used when regrid-precomputed-weights-memory-cache-policy is largest or lru. |
| regrid-precomputed-weights-memory-cache-policy | 'lru' | The in-memory precomputed weights cache policy. Valid values: off, unlimited, largest and lru. See In-memory caching for precomputed weights for more information. |
| regrid-precomputed-weights-memory-cache-strict-mode | False | Raise an exception if the weights cannot be fitted into the in-memory cache. Only used when regrid-precomputed-weights-memory-cache-policy is largest or lru. |
Other earthkit-geo config options can be found here.
Notebooks¶
Examples¶
import numpy as np

from earthkit.geo import config
from earthkit.geo.grids.array import regrid
from earthkit.geo.grids.utils import precomputed_memory_cache_info

# set memory cache with a maximum size of 100 MB to evict the largest matrices first
config.set(
    regrid_precomputed_weights_memory_cache_policy="largest",
    regrid_precomputed_weights_maximum_memory_cache_size=100 * 1024**2,
)
print(precomputed_memory_cache_info())

# create a random data array and regrid it
data = np.random.rand(5248)
interpolated_data = regrid(
    data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]}, backend="precomputed"
)
print(precomputed_memory_cache_info())

# repeat the interpolation; this time the weights are loaded from the cache
data = np.random.rand(5248)
interpolated_data = regrid(
    data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]}, backend="precomputed"
)
print(precomputed_memory_cache_info())
output:
CacheInfo(hits=0, misses=0, maxsize=104857600, currsize=0, count=0, policy='largest')
CacheInfo(hits=0, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')
CacheInfo(hits=1, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')