Configuration

New in version 1.0.0.

Note

Currently, this configuration only affects the precomputed backend of regrid().

earthkit-geo maintains a global configuration.

The configuration is automatically loaded from and saved into a YAML file located at ~/.config/earthkit/geo/config.yaml. An alternative path can be specified via the EARTHKIT_GEO_CONFIG_FILE environment variable, which is only read at startup.
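
The lookup order described above can be sketched with the standard library alone. The helper name resolve_config_path is hypothetical and is not part of earthkit-geo; only the default path and the EARTHKIT_GEO_CONFIG_FILE variable are taken from the text.

```python
import os
from pathlib import Path

# Default location stated in the documentation.
DEFAULT_CONFIG_FILE = "~/.config/earthkit/geo/config.yaml"


def resolve_config_path(environ=None):
    """Return the config file path, preferring EARTHKIT_GEO_CONFIG_FILE.

    Hypothetical helper for illustration; not an earthkit-geo function.
    """
    environ = os.environ if environ is None else environ
    custom = environ.get("EARTHKIT_GEO_CONFIG_FILE")
    # Fall back to the default location when the variable is unset or empty.
    return Path(os.path.expanduser(custom or DEFAULT_CONFIG_FILE))


print(resolve_config_path({}))
print(resolve_config_path({"EARTHKIT_GEO_CONFIG_FILE": "/tmp/my-config.yaml"}))
```

Because the variable is only read at startup, it has to be set before the Python session that imports earthkit-geo is started.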

The configuration can be accessed and modified from Python. The configuration options can also be defined as environment variables, which take precedence over the config file.

Accessing configuration options

The earthkit-geo configuration can be accessed using the Python API:

import earthkit.geo

# Access one of the config options
cache_path = earthkit.geo.config.get("user-cache-directory")
print(cache_path)

# If this is the last line of a Notebook cell, this
# will display a table with all the current configuration
earthkit.geo.config

Warning

When an environment variable is set, it takes precedence over the config parameter, and its value is returned from get().

Changing configuration

Note

It is recommended to restart your Jupyter kernels after changing or resetting config options.

The earthkit-geo configuration can be modified using the Python API:

import earthkit.geo

# Change the location of the user defined cache:
earthkit.geo.config.set("user-cache-directory", "/big-disk/earthkit-geo-cache")

# Change the download timeout
earthkit.geo.config.set("url-download-timeout", "1m")

# Multiple values can be set together. The argument list
# can be a dictionary:
earthkit.geo.config.set({"url-download-timeout": "1m", "check-out-of-date-urls": True})

# Alternatively, we can use keyword arguments. However, because
# the "-" character is not allowed in variable names in Python, we have
# to replace "-" with "_" in all the keyword arguments:
earthkit.geo.config.set(url_download_timeout="1m", check_out_of_date_urls=True)
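
The keyword-argument naming rule can be illustrated with a few lines of plain Python. This is only a sketch of the "_" to "-" mapping described in the comments above; kwargs_to_option_names is a hypothetical helper, not an earthkit-geo function.

```python
def kwargs_to_option_names(**kwargs):
    """Map Python keyword names to config option names ("_" -> "-").

    Hypothetical helper for illustration; not an earthkit-geo function.
    """
    return {name.replace("_", "-"): value for name, value in kwargs.items()}


options = kwargs_to_option_names(url_download_timeout="1m", check_out_of_date_urls=True)
print(options)
# {'url-download-timeout': '1m', 'check-out-of-date-urls': True}
```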

Warning

When an environment variable is set, the new value passed to set() is saved into the config file, but get() will still return the value of the environment variable. A warning is also generated.

Temporary configuration

We can create a temporary configuration (as a context manager) as a copy of the original configuration. We will still refer to it as “config”, but it is completely independent of the original object, and changes are not saved into the YAML file (even when config.autosave is True).

import earthkit.geo

print(earthkit.geo.config.get("url-download-timeout"))

with earthkit.geo.config.temporary():
    earthkit.geo.config.set("url-download-timeout", 5)
    print(earthkit.geo.config.get("url-download-timeout"))

# Temporary config can also be created with arguments:
with earthkit.geo.config.temporary("url-download-timeout", 11):
    print(earthkit.geo.config.get("url-download-timeout"))

Output:

30
5
11

Warning

When an environment variable is set, the same rules apply as for set().
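
The behaviour of a temporary configuration can be mimicked with a toy class. The sketch below is not the earthkit-geo implementation: SketchConfig is a hypothetical stand-in, and it uses snapshot-and-restore on a single object rather than an independent copy. It only shows that changes made inside the with block are gone afterwards.

```python
from contextlib import contextmanager


class SketchConfig:
    """Toy stand-in for a config object; not the earthkit-geo class."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        self._values[name] = value

    @contextmanager
    def temporary(self, name=None, value=None):
        saved = dict(self._values)  # snapshot of the current options
        try:
            if name is not None:
                self.set(name, value)
            yield self
        finally:
            self._values = saved  # discard every change made inside the block


config = SketchConfig({"url-download-timeout": 30})

with config.temporary():
    config.set("url-download-timeout", 5)
    inside = config.get("url-download-timeout")

with config.temporary("url-download-timeout", 11):
    with_args = config.get("url-download-timeout")

after = config.get("url-download-timeout")
print(inside, with_args, after)
```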

Resetting configuration

Note

It is recommended to restart your Jupyter kernels after changing or resetting the configuration.

The earthkit-geo configuration can be reset using the Python API:

import earthkit.geo

# Reset a named config option to its default value
earthkit.geo.config.reset("user-cache-directory")

# Reset all the config options to their default values
earthkit.geo.config.reset()

Warning

When an environment variable is set, the same rules apply as for set().

Environment variables

Each configuration parameter has a corresponding environment variable (see the List of environment variables section for the full list). When an environment variable is set, it takes precedence over the config parameter, as the following examples show.

First, let us assume that the value of url-download-timeout is 30 in the config file and no environment variable is set.

>>> from earthkit.geo import config
>>> config.get("url-download-timeout")
30

Then, set the environment variable EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT.

export EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT=5
>>> from earthkit.geo import config
>>> config.get("url-download-timeout")
5
>>> config.env()
{'url-download-timeout': ('EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT', '5')}
>>> config.set("url-download-timeout", 10)
UserWarning: Config option 'url-download-timeout' is also set by environment variable
'EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT'.The environment variable takes precedence and
its value is returned when calling get(). Still, the value set here will be
saved to the config file.
>>> config.get("url-download-timeout")
5

Finally, unset the environment variable and check the config value again, which is now the value from the config file.

unset EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT
>>> from earthkit.geo import config
>>> config.get("url-download-timeout")
10
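
The precedence rule shown in this session can be summarised in a small standalone sketch. It assumes the naming pattern visible in the list of environment variables (prefix EARTHKIT_GEO_, upper case, "-" replaced by "_"); effective_value is a hypothetical helper, and unlike the real get() it does not convert the environment string to another type.

```python
import os

ENV_PREFIX = "EARTHKIT_GEO_"


def effective_value(name, file_values, environ=None):
    """Return the environment value if set, else the config file value.

    Hypothetical helper for illustration; environment values stay strings.
    """
    environ = os.environ if environ is None else environ
    env_name = ENV_PREFIX + name.upper().replace("-", "_")
    if env_name in environ:
        return environ[env_name]
    return file_values[name]


file_values = {"url-download-timeout": 10}

# No environment variable: the config file value is used.
from_file = effective_value("url-download-timeout", file_values, {})

# Environment variable set: it takes precedence over the config file.
from_env = effective_value(
    "url-download-timeout",
    file_values,
    {"EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT": "5"},
)
print(from_file, from_env)
```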

List of configuration parameters

This is the list of all the config parameters:

| Name | Default | Description |
| --- | --- | --- |
| cache-policy | 'user' | Caching policy. Valid values: off, temporary and user. See Disk-based caching of regridding precomputed weights caching for more information. |
| check-out-of-date-urls | False | Perform an HTTP request to check if the remote version of a cached file has changed. |
| download-out-of-date-urls | False | Re-download URLs when the remote version of a cached file has changed. |
| maximum-cache-disk-usage | None | Maximum disk usage as a percentage of the full disk capacity of the filesystem where the cache is located (e.g. 90%). When the total disk usage exceeds this limit (it’s not limited to the cache usage alone), earthkit-geo evicts older cached entries until the usage is below the specified limit. Can be set to None. Ignored when cache-policy is off. See Disk-based caching of regridding precomputed weights caching for more information. |
| maximum-cache-size | '5GB' | Maximum disk space used by the earthkit-geo cache (e.g. 100G or 2T). When exceeded, earthkit-geo evicts older cached entries until the usage is below the specified limit. Can be set to None. Ignored when cache-policy is off. See Disk-based caching of regridding precomputed weights caching for more information. |
| regrid-precomputed-weights-maximum-memory-cache-size | '500MB' | The maximum memory size of the in-memory precomputed weight cache in bytes. Only used when regrid-precomputed-weights-memory-cache-policy is "largest" or "lru". Can be set to None. See In-memory caching for precomputed weights for more information. |
| regrid-precomputed-weights-memory-cache-policy | 'lru' | The in-memory precomputed weights cache policy. Valid values: off, unlimited, largest and lru. See In-memory caching for precomputed weights for more information. |
| regrid-precomputed-weights-memory-cache-strict-mode | False | Raise an exception if the weights cannot be fitted into the in-memory cache. Only used when regrid-precomputed-weights-memory-cache-policy is "largest" or "lru". See In-memory caching for precomputed weights for more information. |
| temporary-cache-directory-root | None | Parent of the cache directory when cache-policy is temporary. See Disk-based caching of regridding precomputed weights caching for more information. |
| temporary-directory-root | None | Parent of the temporary directory when cache-policy is off. See Disk-based caching of regridding precomputed weights caching for more information. |
| url-download-timeout | '30s' | Timeout when downloading from a URL. |
| user-cache-directory | '~/.cache/earthkit-geo' | Cache directory used when cache-policy is user. See Disk-based caching of regridding precomputed weights caching for more information. |
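
The eviction rule mentioned for maximum-cache-size ("evicts older cached entries until the usage is below the specified limit") can be sketched as follows. This is only an illustration under stated assumptions: evict_oldest is a hypothetical helper, "older" is taken to mean least recently accessed, and sizes are plain byte counts. The actual earthkit-geo logic may differ.

```python
def evict_oldest(entries, max_size):
    """Drop the oldest entries until the total size no longer exceeds max_size.

    entries: list of (name, size_in_bytes, last_access_time) tuples.
    Returns (kept_names, evicted_names). Hypothetical helper for illustration.
    """
    kept = sorted(entries, key=lambda entry: entry[2])  # oldest first
    evicted = []
    total = sum(size for _, size, _ in kept)
    while kept and total > max_size:
        name, size, _ = kept.pop(0)  # remove the oldest remaining entry
        evicted.append(name)
        total -= size
    return [name for name, _, _ in kept], evicted


entries = [("a.npz", 40, 1), ("b.npz", 30, 3), ("c.npz", 50, 2)]
kept, evicted = evict_oldest(entries, max_size=90)
print(kept, evicted)
# ['c.npz', 'b.npz'] ['a.npz']
```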

List of environment variables

This is the list of the config environment variables:

| Config option name | Environment variable |
| --- | --- |
| cache-policy | EARTHKIT_GEO_CACHE_POLICY |
| check-out-of-date-urls | EARTHKIT_GEO_CHECK_OUT_OF_DATE_URLS |
| download-out-of-date-urls | EARTHKIT_GEO_DOWNLOAD_OUT_OF_DATE_URLS |
| maximum-cache-disk-usage | EARTHKIT_GEO_MAXIMUM_CACHE_DISK_USAGE |
| maximum-cache-size | EARTHKIT_GEO_MAXIMUM_CACHE_SIZE |
| regrid-precomputed-weights-maximum-memory-cache-size | EARTHKIT_GEO_REGRID_PRECOMPUTED_WEIGHTS_MAXIMUM_MEMORY_CACHE_SIZE |
| regrid-precomputed-weights-memory-cache-policy | EARTHKIT_GEO_REGRID_PRECOMPUTED_WEIGHTS_MEMORY_CACHE_POLICY |
| regrid-precomputed-weights-memory-cache-strict-mode | EARTHKIT_GEO_REGRID_PRECOMPUTED_WEIGHTS_MEMORY_CACHE_STRICT_MODE |
| temporary-cache-directory-root | EARTHKIT_GEO_TEMPORARY_CACHE_DIRECTORY_ROOT |
| temporary-directory-root | EARTHKIT_GEO_TEMPORARY_DIRECTORY_ROOT |
| url-download-timeout | EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT |
| user-cache-directory | EARTHKIT_GEO_USER_CACHE_DIRECTORY |
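
Every entry in this list follows the same pattern, so it can be useful to check which options a given environment overrides. The sketch below uses only the standard library; env_var_name and env_overrides are hypothetical helpers, and the returned dictionary merely imitates the shape of the config.env() output shown earlier.

```python
OPTIONS = [
    "cache-policy",
    "check-out-of-date-urls",
    "download-out-of-date-urls",
    "maximum-cache-disk-usage",
    "maximum-cache-size",
    "regrid-precomputed-weights-maximum-memory-cache-size",
    "regrid-precomputed-weights-memory-cache-policy",
    "regrid-precomputed-weights-memory-cache-strict-mode",
    "temporary-cache-directory-root",
    "temporary-directory-root",
    "url-download-timeout",
    "user-cache-directory",
]


def env_var_name(option):
    """Derive the environment variable name from a config option name."""
    return "EARTHKIT_GEO_" + option.upper().replace("-", "_")


def env_overrides(environ):
    """Return {option: (variable, value)} for every option set in environ."""
    found = {}
    for option in OPTIONS:
        variable = env_var_name(option)
        if variable in environ:
            found[option] = (variable, environ[variable])
    return found


overrides = env_overrides({"EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT": "5", "HOME": "/home/user"})
print(overrides)
# {'url-download-timeout': ('EARTHKIT_GEO_URL_DOWNLOAD_TIMEOUT', '5')}
```

Passing os.environ instead of the literal dictionary inspects the current process environment.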