komanawa.BASE.runtime_memtest#

created matt_dumont on: 7/4/25

A runtime and memory test for the BASE solver. These tests can be imported and run, or this script can be called directly: python komanawa.BASE.runtime_memtest [memtest|runtimetest]. If neither test is specified, both are run.

memtest requires the optional dependency memory_profiler, which can be installed with pip install memory-profiler.

Expected runtime and maximum memory usage (from my machine, AMD Ryzen 5 7600X 6-Core Processor):

  • Runtime Test: ~ 117 seconds for 4 chains and 25000 iterations with 1 year inflections (39 inflection points).

  • Memory Test: ~ 150 MiB (note the runtime was much higher, around 380 seconds, due to the overhead of memory profiling).

Classes#

BASESolver

class for predicting the source concentration from the BPEFM model

BaseSampledParam

Schema for a sampled parameter which captures the distribution and arguments for the distribution to check that the distribution is valid and has not changed.

NormalPathGenerator

A base class for the path change generators which defines the KS test for the path change generator.

Functions#

memory_profile()

a memory profile for the BASE solver using a temporary directory to avoid cluttering the filesystem.

runtime_test([nchains, niterations, n_inflections])

A runtime test for the BASE solver using a temporary directory to avoid cluttering the filesystem.

Module Contents#

class BASESolver(save_dir, n_inflections, ts_data, mrt, mrt_p1, mrt_p2, frac_p1, f_p1, f_p2, precision=2, min_conc=0.01, inflection_times=None, cdf_inflection_start=None, sigma=None, interpolator='linear', interpolator_kwargs=None, serialize_receptor_data=True, predict_ncores=None, nchains=5, niterations=50000, start_age=np.nan, verbose=True)#

class for predicting the source concentration from the BPEFM model

BASE: Bayesian approach to source estimation. This class uses the MT-DREAM(ZS) algorithm to conduct Bayesian inference on the source concentration for measured time series data and a BPEFM model.

The MT-DREAM(ZS) algorithm was introduced in: Laloy, E. & Vrugt, J. A. High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing. Water Resources Research 48, W01526 (2012).

Instance Variables:

Variables:
  • age_bounds – Age bounds for the model (years since start date)

  • age_fractions – Age fractions for age in the BEPFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)

  • age_step – Age step for the BEPFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)

  • ages – Ages for the BEPFM model (see komanawa.gw_age_tools.exponential_piston_flow.make_age_dist)

  • end_date – End date of the observed time series data (pd.Timestamp)

  • f_p1 – Fraction of the total mass in the first peak (age model parameter)

  • f_p2 – Fraction of the total mass in the second peak (age model parameter)

  • frac_p1 – Fraction of the total mass in the first peak (age model parameter)

  • inflection_times – Inflection times for the model (years since start date). The source concentration is parameterized at these times. Note that len(inflection_times) = n_inflections + 2, where n_inflections is passed in the init; the method automatically includes the start and end of the time series

  • measured_age_bounds – Age bounds for the measured time series data (years since start date)

  • mrt – Mean residence time for the age distribution (yrs) (age model parameter)

  • mrt_p1 – Mean residence time for the first peak (yrs) (age model parameter)

  • mrt_p2 – Mean residence time for the second peak (yrs) (age model parameter)

  • nchains – Number of pydream chains for the model. Note that each chain is a separate process, so if nchains exceeds the available processors the run will be MUCH slower

  • niterations – The number of algorithm iterations to run.

  • precision – precision of the age distribution (decimal places years) default is 2, approximately monthly

  • predict_ncores – Number of cores for prediction (see param)

  • save_dir – working directory for the model

  • serialize_receptor_data – Boolean flag to serialize receptor data

  • start_date – Start date of the time series data (pd.Timestamp)

  • t – time index for the time series data (decimal years since start date)

  • ts_data – Time series data (pandas Series) Index=self.t, values=concentration

Class attributes:

Variables:
  • available_interpolators – tuple of available interpolators

  • all_t – not used in this class, but used in the child classes

  • _force_old_predict – a flag to force the use of the old predict method for testing purposes

Basic Usage:

from komanawa.BASE import BASESolver, generators, BaseSampledParam
from pathlib import Path
import matplotlib.pyplot as plt

# Set up the problem
first_model_name = 'first_model'
n_inflections = 5  # this will increase to 7 with the addition of a start and end point
ndata = get_ndata()  # get your data, this should be a pandas series with a datetime index
output_dir = Path.home()
cdf_threshold = 0.15
mrt = 12
f_p1 = 0.7

# create the solver
solver = BASESolver(
            save_dir=output_dir,
            ts_data=ndata,
            mrt=mrt,
            mrt_p1=mrt,
            mrt_p2=mrt,
            frac_p1=1,
            f_p1=f_p1,
            f_p2=f_p1,
            n_inflections=n_inflections,
            cdf_inflection_start=cdf_threshold,
        )

# set priors
lows = [0.001, 0.001, 0.1, 0.1, 0.2, 0.5, 1.0]
ups = [0.5, 2, 6, 6, 6, 6, 6]
params = BaseSampledParam(distribution=generators.UniformPathGenerator, low=lows, up=ups, n=7)

# run the model
run = solver.run_base(model_name=first_model_name,
        params=params, verbose=False)

# plot the results
nplot = 0.2 # plot 20% of the results
fig, ax = solver.plot_best_source_receptor(first_model_name, nplot=nplot, receptor_spacing=1)
plt.show()
Parameters:
  • save_dir – directory to run dreamzs in, files are named via model name kwarg

  • n_inflections – number of inflection times to use (int) or frequency of inflection times (see https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases)

  • ts_data – time series data to fit to

  • mrt – mean residence time of the source (yrs)

  • mrt_p1 – mean residence time of the first piston (yrs)

  • mrt_p2 – mean residence time of the second piston (yrs)

  • frac_p1 – fraction of the source that is in the first piston

  • f_p1 – fraction of exponential flow in the first piston

  • f_p2 – fraction of exponential flow in the second piston (use a dummy value if frac_p1 = 1)

  • precision – precision of the age distribution (decimal places years) default is 2, approximately monthly

  • min_conc – minimum concentration for the interpolated source concentration, default is 0.01. This prevents weird interpolation behaviours in the early, poorly constrained part of the time series. It is not a hard limit, but rather a floor for the interpolated source concentration: the actual value is set to the maximum of the interpolated value and min_conc.

  • inflection_times – None or list of datetimes of the inflection times. If None, the inflection times will be evenly spaced between the start and end of the ages + time series. Only one of cdf_inflection_start or inflection_times may be provided (the other must be None). inflection_times must be None if n_inflections is a frequency code

  • cdf_inflection_start – None or float, the threshold for the cdf of ages at which to start the evenly spaced inflection points. Only one of cdf_inflection_start or inflection_times may be provided (the other must be None).

  • sigma – None or float or array of floats, the uncertainty in the time series data. If None, no uncertainty is used and self.log_likelihood_no_sigma is used; if a float, the same uncertainty is used for all points; if an array, a per-point uncertainty is used. Whenever sigma is not None, self.log_likelihood_sigma is used.

  • interpolator

    str, the interpolator to use to interpolate the source concentration from the n_inflection points to the needed frequency for the 1d age function (BEPFM), options are:

    • ’linear’: piecewise linear interpolation (default)

    • ’cubicspline’: cubic spline interpolation, see scipy.interpolate.CubicSpline

    • ’pchip’: PCHIP 1-D monotonic cubic interpolation, see scipy.interpolate.PchipInterpolator

    • ’akima’: Akima interpolation, see scipy.interpolate.Akima1DInterpolator

  • interpolator_kwargs – None or dict, additional kwargs passed to the interpolator (see interpolator)

  • serialize_receptor_data – bool, whether to serialize the receptor data for later use (increases disk use but decreases runtime)

  • predict_ncores – None or int, number of cores to use for prediction, if None then use all available (virtual) cores

  • nchains – number of chains to run in dreamzs. Note that each chain is a separate process, so if nchains exceeds the available processors the run will be MUCH slower

  • niterations – The number of algorithm iterations to run. The default is 50000. Higher values will take longer to run but may produce better results.

  • start_age – start age for the age distribution (yrs) default is np.nan which will use the maximum of the mrt_p1 and mrt_p2

  • verbose – bool, whether to print progress to the console
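To illustrate how precision, the default ’linear’ interpolator, and the min_conc floor fit together, here is a minimal sketch. The inflection times and concentrations below are invented for illustration; the solver's internal variable names and exact steps may differ.

```python
import numpy as np

# hypothetical inflection-point parameterization (years since start date)
inflection_times = np.array([0.0, 10.0, 20.0, 30.0])
inflection_concs = np.array([0.0, 1.2, 3.0, 2.5])  # sampled source concentrations
min_conc = 0.01  # default floor
precision = 2    # decimal places of years, approximately monthly

# interpolate to the fine time step needed by the age model ('linear' default)
step = 10.0 ** -precision
t = np.round(np.arange(0.0, 30.0 + step, step), precision)
source = np.interp(t, inflection_times, inflection_concs)

# min_conc is a floor on the interpolated values, not a hard limit on the parameters
source = np.maximum(source, min_conc)
```

Swapping the interpolator (e.g. scipy.interpolate.PchipInterpolator for ’pchip’) changes only the interpolation call; the floor is applied the same way.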

convert_age_to_decimal_years(age)#

convert an age to a decimal year

Parameters:

age – years since start date

Returns:

decimal year (e.g. 1950.54)

convert_age_to_dt(age)#

convert an age to a datetime

Parameters:

age – age(s) to convert (years since start date)

Returns:

datetime(s)

convert_decimal_years_to_age(decimal_years)#

convert a decimal year to an age

Parameters:

decimal_years – decimal years to convert e.g. 1950.54

Returns:

age (float) years since start date

convert_decimal_years_to_dt(decimal_years)#

convert a decimal year to a datetime

Parameters:

decimal_years – decimal years to convert e.g. 1950.54

Returns:

datetime

convert_dt_to_age(dt)#

convert a datetime to an age (years since start date)

Parameters:

dt – datetime(s) to convert

Returns:

age(s) years from start date

convert_dt_to_decimal_years(dt)#

convert a datetime to a decimal year

Parameters:

dt – datetime to convert

Returns:

decimal year (e.g. 1950.54)
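The four converters above are inverses of one another around a fixed start date. A self-contained sketch of the underlying arithmetic follows; the 365.25-day year and the start date are assumptions for illustration, not necessarily the library's exact implementation.

```python
import pandas as pd

start_date = pd.Timestamp('1950-01-01')  # hypothetical start of the observed series

def dt_to_age(dt, start_date):
    # datetime -> age in decimal years since start_date
    return (dt - start_date) / pd.Timedelta(days=365.25)

def age_to_dt(age, start_date):
    # age in decimal years since start_date -> datetime
    return start_date + pd.Timedelta(days=365.25) * age

age = dt_to_age(pd.Timestamp('1960-07-02'), start_date)
roundtrip = age_to_dt(age, start_date)  # recovers the original datetime (to float precision)
```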

export_source_receptor(model_name, outdir, nexport, receptor_spacing=1, percentiles=(1, 5, 10, 25), tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', percentile_format='hdf', include_raw=False)#

export the source and receptor data to csv files or hdf file store

The HDF file will contain the following keys the csv files will have the same suffixes as the keys below:

  • source_percentiles: source data percentiles

  • receptor_percentiles: receptor data percentiles

  • measured: measured data

  • source_raw: optional if include_raw=True, source data

  • receptor_raw: optional if include_raw=True, receptor data

Parameters:
  • outdir – output directory

  • nexport – number of results to export (int) or fraction of results to export (float, nexport <= 1)

  • receptor_spacing – space between receptor prediction points

  • percentiles – tuple of percentiles to export fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • tvals – None or list of times to sample at (years since data starts, self.start_date). Note: tvals are serialized by hash, so we suggest rounding to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas covers less than project_forward, the remainder of the assumed time series takes the same delta as the last point. Note: forward_ts_deltas is serialized by hash, so we suggest rounding to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:

percentile_format – str, ‘hdf’ or ‘csv’ format to save the percentiles

Returns:
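The ‘absolute’ vs ‘fraction’ behaviour of forward_ts_deltas_type reduces to simple arithmetic on the steady state concentration. A minimal sketch with invented numbers:

```python
import pandas as pd

ss_conc = 4.0  # illustrative steady state concentration (conc. at self.age_bounds[1])

# deltas indexed by decimal years since the end of the fitted series
abs_deltas = pd.Series([0.0, 0.5, 1.0], index=[0.0, 5.0, 10.0])     # additive offsets
frac_deltas = pd.Series([1.0, 1.1, 1.25], index=[0.0, 5.0, 10.0])   # multipliers

fwd_abs = ss_conc + abs_deltas    # forward_ts_deltas_type='absolute'
fwd_frac = ss_conc * frac_deltas  # forward_ts_deltas_type='fraction'
```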

get_parameter_log(model_name)#

get the parameter log for the model run from the model name

Parameters:

model_name – name for the model (str)

Returns:

dict of parameter names and str values

get_rmse(model_name, ntest, raw=False)#

get the rmse for the model run from the model name

Parameters:
  • model_name – name for the model (str)

  • ntest – number of test simulations to use for the rmse

  • raw – if True return the raw rmse values, if False return the 50th percentile and 5th and 95th percentiles

Returns:

rmse (float)
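As a sketch of the raw=False summary, this reduces a set of per-simulation RMSE values to the 50th, 5th and 95th percentiles. The synthetic values below are illustrative only; the library computes the underlying RMSEs from ntest test simulations.

```python
import numpy as np

rng = np.random.default_rng(0)
rmse_vals = np.abs(rng.normal(1.0, 0.1, size=500))  # stand-in for per-simulation RMSEs

# raw=True would return rmse_vals directly; raw=False summarizes them
p50, p5, p95 = np.percentile(rmse_vals, [50, 5, 95])
```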

get_ss_conc(model_name, nplot, site=None)#

get the steady state concentration (e.g. the last concentration in the time series) for the model run

Parameters:
  • model_name – model name

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • site – None (not used, for compatibility with MultiSiteDilutionBASESolver)

Returns:

pd.Series of steady state concentrations

load_logps_params(model_name, _concat=True)#

load the unsorted logps and sampled parameters from the model run

Parameters:

model_name – name for the model (str)

Returns:

log_p (np.array) shape=(nsimulations)

Returns:

params (np.array) shape=(nsimulations, nparams)

log_likelihood_no_sigma(params)#

calculate the log likelihood of the model (where the sigma of the observations IS NOT provided). The log likelihood without observation sigma can be changed by subclassing this class and overriding this method.

Parameters:

params – sampled parameters for the model instance (see BASESolver.run_base)

Returns:

likelihood (float)

log_likelihood_sigma(params)#

calculate the log likelihood of the model (where the sigma of the observations IS provided). The log likelihood with observation sigma can be changed by subclassing this class and overriding this method.

Parameters:

params – sampled parameters for the model instance (see BASESolver.run_base)

Returns:

likelihood (float)
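The documentation does not spell out the likelihood form, but a standard iid Gaussian log likelihood with known sigma is a plausible shape for an override. This is a sketch only; BASESolver's actual formula and method signature may differ.

```python
import numpy as np

def log_likelihood_sigma_sketch(obs, pred, sigma):
    # iid Gaussian log likelihood with known observation sigma (hypothetical form)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), np.shape(obs))
    resid = np.asarray(obs, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                        - resid ** 2 / (2 * sigma ** 2)))
```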

make_forward_ts_deltas(time_series=None, age_series=None, age_precision=2, values_precision=2, forward_ts_deltas_type='absolute')#

make a given time series / age series into deltas (differences relative to the start). This is a convenience function for creating forward_ts_deltas parameters for the model

Parameters:
  • time_series – pd.Series of time series values (datetime index)

  • age_series – pd.Series of ages (years since start date)

  • age_precision – int, precision to round the age series index

  • values_precision – int, precision to round the values in the age series values

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Returns:

pd.Series of forward_ts_deltas

Returns:

forward_ts_deltas_type: str, ‘absolute’ or ‘fraction’ index = years since start date (self.age_bounds[1])

Returns:

project_forward: int, the number of years to project forward

Returns:

start_val: the value at self.age_bounds[1]

make_receptor_points(receptor_spacing=None, project_forward=None, tvals=None, site=None)#

make the receptor points for the model, the following options are available:

  • make_receptor_points(receptor_spacing=None, project_forward=None, tvals=None) -> self.t (sampling points)

  • make_receptor_points(receptor_spacing=1, project_forward=None, tvals=None) -> evenly spaced receptor points (about yearly) from 0 to self.t.max()

  • make_receptor_points(receptor_spacing=1, project_forward=10, tvals=None) -> evenly spaced receptor points (about yearly) from 0 to self.t.max() + 10

  • make_receptor_points(receptor_spacing=None, project_forward=None, tvals=*args) -> sampling at tvals (years since data starts)

  • all other combinations are invalid and will raise an exception

Parameters:
  • receptor_spacing – the spacing to use (decimal years)

  • project_forward – None or int, if an int, project the final concentration N years forward.

  • tvals – None or list of times to sample at (years since data starts, self.start_date)

  • site – None and should always be None, used to ensure same signature as other methods in children classes

Returns:
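The option combinations above can be sketched as a small dispatch function. This is an illustrative re-implementation of the documented behaviour, not the library's code; exact spacing and validation in BASESolver may differ.

```python
import numpy as np

def receptor_points_sketch(t, receptor_spacing=None, project_forward=None, tvals=None):
    # t: observation times in decimal years since the start date
    if tvals is not None:
        if receptor_spacing is not None or project_forward is not None:
            raise ValueError('tvals cannot be combined with the other options')
        return np.asarray(tvals, dtype=float)          # sample at tvals
    if receptor_spacing is None:
        if project_forward is not None:
            raise ValueError('project_forward requires receptor_spacing')
        return np.asarray(t, dtype=float)              # default: the sampling points self.t
    # evenly spaced points from 0 to t.max() (+ project_forward years if given)
    end = float(np.max(t)) + (project_forward or 0)
    return np.arange(0.0, end + receptor_spacing, receptor_spacing)
```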

make_source_receptor_data(model_name, nplot, receptor_spacing=1, tvals=None, project_forward=None, source_only=False, forward_ts_deltas=None, forward_ts_deltas_type='absolute', recalc=False, _return_ncomputed=False, return_interpolated_source=True)#

make the source and receptor data for the run model. if self.serialize_receptor_data is True then the receptor data will be saved to a npz file for later use, and any receptors already computed will be loaded from the file. If the file is not found then the receptor data will be recomputed. The process of predicting the receptor data is parallelized using multiprocessing, but still can take some time, especially for large numbers of simulations.

The save file is unique to the model_name, receptor_spacing, project_forward, and tvals. Note that changes to the class instance attributes are not checked and will not cause the data to be recomputed.

Note that passing tvals or forward_ts_deltas, will serialize the receptor data with the hash of the tvals/forward_ts_deltas used in the filename. Tvals/forward_ts_deltas will also be saved in the npz file and checked on reload. Note that minor changes in the tvals/forward_ts_deltas (e.g. float imprecision) will cause the hash to change and the data to be recomputed. Therefore we suggest rounding tvals/forward_ts_deltas to a reasonable precision before passing them to this method.

Parameters:
  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – the number of years between receptor prediction points

  • tvals – None or list of times to sample at (years since data starts, self.start_date). Note: tvals are serialized by hash, so we suggest rounding to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • source_only – bool, if True only return the source data (don’t compute the receptor data)

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas covers less than project_forward, the remainder of the assumed time series takes the same delta as the last point. Note: forward_ts_deltas is serialized by hash, so we suggest rounding to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • recalc – bool, if True then recalculate the receptor data even if it has been serialized

  • _return_ncomputed – bool, if True return the number of computed receptor data points, for testing only

  • return_interpolated_source – bool, if True return the interpolated source data, if False return the raw source data

Returns:

source_data: source concentration predictions (ndarray, shape=(nsims, nsource_points)) (interpolated to self.precision if return_interpolated_source=True)

Returns:

source_points: source concentration timings, years since start date (ndarray, shape=(nsource_points,), where nsource_points = self.n_inflections if project_forward is None, else self.n_inflections + 1) (interpolated to self.precision if return_interpolated_source=True)

Returns:

outreceptor_data: receptor concentration predictions (ndarray, shape=(nsims, nreceptor_points))

Returns:

receptor_points: receptor timings, years since start date (ndarray, shape=(nreceptor_points,))

parameters_match_log(model_name, params, starts, start_random, hardboundaries, multitry, **kwargs)#

check the parameters against the log file, see run_base for more info

Parameters:
  • params – as would have been passed to self.run_base

  • starts – as would have been passed to self.run_base

  • start_random – as would have been passed to self.run_base

  • hardboundaries – as would have been passed to self.run_base

  • multitry – as would have been passed to self.run_base

  • kwargs – as would have been passed to self.run_base

Returns:

args_match(bool), (got_args_dict, expect_arg_dict)

plot_best_receptor(model_name, nplot, receptor_spacing=1, percentiles=(1, 5, 10, 25), pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#

plot the best n predictions of the receptor concentrations

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – space between receptor points (decimal years)

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • tvals – None or list of times to sample at (years since data starts, self.start_date). Note: tvals are serialized by hash, so we suggest rounding to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas covers less than project_forward, the remainder of the assumed time series takes the same delta as the last point. Note: forward_ts_deltas is serialized by hash, so we suggest rounding to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • figsize – size for the figure

  • plot_measured – bool, whether to plot the measured data on the plot

  • add_rmse – bool, whether to add the rmse to the plot. Note this will require generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized

  • ax_recept – None or axis to plot on

  • allsite_on_one – not used for BASESolver

  • rt_leg_hand_lab – if True also return the handles and labels for the legend

Returns:

fig, ax_recept if rt_leg_hand_lab is False (default)

Returns:

fig, ax_recept, (handles, labels), rmse if rt_leg_hand_lab is True

plot_best_source(model_name, nplot, percentiles=(1, 5, 10, 25), plot_inflection_points=True, plot_annual_vlines=True, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', pdf_alpha=0.1, figsize=(10, 10), plot_measured=True, ax_source=None, rt_leg_hand_lab=False)#

plot fill between plots of the best n params and the predicted receptor concentrations

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • project_forward – None or int, if an int then project the best n params forward by this many years at constant concentration

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas covers less than project_forward, the remainder of the assumed time series takes the same delta as the last point. Note: forward_ts_deltas is serialized by hash, so we suggest rounding to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • figsize – size for the figure

  • plot_measured – bool, whether to plot the measured data on the plot

  • ax_source – None or axis to plot on

  • rt_leg_hand_lab – if True also return the handles and labels for the legend

Returns:

fig, ax_source - if rt_leg_hand_lab is False (default)

Returns:

fig, ax_source (handles, labels), rmse - if rt_leg_hand_lab is True

plot_best_source_receptor(model_name, nplot, receptor_spacing=1, sharey=True, percentiles=(1, 5, 10, 25), title_additon='', pdf_alpha=0.1, plot_inflection_points=True, plot_annual_vlines=True, tvals=None, project_forward=None, forward_ts_deltas=None, forward_ts_deltas_type='absolute', figsize=(10, 10), plot_measured=True, add_rmse=False, ax_source=None, ax_recept=None, allsite_on_one=False, rt_leg_hand_lab=False)#

plot fill between plots of the best n params and the predicted receptor concentrations

Parameters:
  • model_name – name for the model (str)

  • nplot – number of results to plot (int) or fraction of results to plot (float, nplot<=1)

  • receptor_spacing – space between receptor points (decimal years)

  • sharey – bool, whether to share the y axis between the source and receptor plots

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • title_additon – str, appended to the title; no character is inserted between the model name and the addition string

  • pdf_alpha – alpha for the pdfs. Note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • tvals – None or list of times to sample at (years since data starts, self.start_date). Note: tvals are serialized by hash, so we suggest rounding to a reasonable precision before passing them to this method.

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • forward_ts_deltas – None or pd.Series (index = decimal years since the end date (self.age_bounds[1]), values = deltas from time zero). If the index of forward_ts_deltas covers less than project_forward, the remainder of the assumed time series takes the same delta as the last point. Note: forward_ts_deltas is serialized by hash, so we suggest rounding to a reasonable precision before passing it to this method.

  • forward_ts_deltas_type – str, ‘absolute’ (default) or ‘fraction’. Where ss_conc is the steady state concentration (conc. at self.age_bounds[1]), the forward source is calculated as:

  • if ‘absolute’:
    • forward_source = ss_conc + forward_ts_deltas

  • if ‘fraction’:
    • forward_source = ss_conc * forward_ts_deltas

Parameters:
  • figsize – size for the figure

  • plot_measured – bool, whether to plot the measured data on the prediction plot

  • add_rmse – bool, whether to add the rmse to the plot. Note this will require generating the receptor concentration at all self.t times, which may add significant time to the plot generation, particularly if the predictions are not serialized

  • ax_source – matplotlib axis to plot the source data on or None to create a new figure

  • ax_recept – matplotlib axis to plot the receptor data on or None to create a new figure

  • allsite_on_one – not used in BASESolver

  • rt_leg_hand_lab – if True also return the handles and labels for the legend

Returns:

fig, (ax_source, ax_recept) - if rt_leg_hand_lab is False (default)

Returns:

fig, (ax_source, ax_recept), (src_handles, src_labels), (rec_handles, rec_labels), rmse - if rt_leg_hand_lab is True

plot_breakpoints(ax=None)#

plot the inflection points on the time series data

Parameters:

ax – matplotlib axis to plot on or None to create a new figure

Returns:

plot_log_ps(model_name, ax=None)#

plot the log likelihoods for each chain

Parameters:
  • model_name – name for the model (str)

  • ax – matplotlib axis to plot on or None to create a new figure

Returns:

plot_measured(ax, handles, labels, c=None, site=None, label='Measured data', alpha=1, **kwargs)#

plot the measured data on the plot

Parameters:
  • ax – matplotlib axis to plot on

  • handles – handles for the legend (don’t use ax.legend() as it may cause weird behavior)

  • labels – labels for the legend (don’t use ax.legend() as it may cause weird behavior)

  • c – color if None use ‘r’

  • site – site to plot (not used in BASESolver)

  • label – label for the measured data

  • alpha – alpha for the measured data

  • kwargs – other kwargs for the scatter plot

Returns:

plot_single_pdf(model_name, baselab, ax, percentiles, data, xdata, project_forward, pdf_alpha, plot_inflection_points, plot_annual_vlines, plot_xlab, plot_ylab, plot_measured, add_rmse, site=None, plot_color='b', leg_color=None, data_color=None)#

Actually plot the pdfs for the source or receptor data. This is used internally both for the source and receptor data for the plot_best_source_receptor, plot_best_source, and plot_best_receptor methods. It can also be used externally to plot pdfs in the same fashion to support custom plotting. Note that it expects specific data formats, so read the docs!

Parameters:
  • model_name – name for the model (str)

  • baselab – the fundamental label for the dataset (e.g. Source, Receptor… etc.)

  • ax – matplotlib axis to plot on

  • percentiles – tuple of percentiles to plot fill between. e.g. a percentile of 5 will plot the 5th and 95th

  • data – the data to plot (np.array) shape=(nsims, time). typically the data will be either source or receptor data from self.make_source_receptor_data

  • xdata – the xdata to plot against (np.array) shape=(time,). typically the xdata will be either source or receptor points from self.make_source_receptor_data

  • project_forward – None or int, if an int then project the best n params forward by this many years at the end of the time series

  • pdf_alpha – alpha for the pdfs note that alphas are additive (e.g. if there are 4 percentiles then the alpha for each percentile will be [pdf_alpha, pdf_alpha * 2, pdf_alpha * 3, pdf_alpha * 4])

  • plot_inflection_points – bool, whether to plot the inflection points

  • plot_annual_vlines – bool, whether to plot annual vertical lines

  • plot_xlab – bool, whether to set the xlabel

  • plot_ylab – bool, whether to set the ylabel

  • plot_measured – bool, whether to plot the measured data

  • add_rmse – bool, whether to calculate the rmse of the model, which will be returned

  • site – None or the site to plot. site must == None in BASESolver. If the site is None in the multi solvers and plot_measured then the measured data will be plotted for all sites

  • plot_color – color for the pdfs

  • leg_color – color for the pdfs on the legend (if None then plot_color is used)

  • data_color – color for the measured data if None then ‘r’ will be used (BASESolver) or the site color will be used (MultiSolver+)

Returns:

  • out_ups: the upper percentiles (useful for setting ylims)

  • out_downs: the lower percentiles (useful for setting ylims)

  • handles: handles for the legend (don’t use ax.legend() as it may cause weird behavior)

  • labels: labels for the legend (don’t use ax.legend() as it may cause weird behavior)

  • rmse: float or None, the rmse of the model if add_rmse is True
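The upper/lower percentile bands returned here can be sketched with numpy; this is an illustrative stand-in (the pairing of each percentile p with 100 - p is assumed from the percentiles description above, not confirmed by the source):

```python
import numpy as np

# data shaped (nsims, time), as plot_single_pdf expects
rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 50))

percentiles = (5, 25)  # assumed: each value p gives a band between p and 100 - p
out_downs = [np.percentile(data, p, axis=0) for p in percentiles]
out_ups = [np.percentile(data, 100 - p, axis=0) for p in percentiles]
```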

predict_bespoke(ts_concs, receptor_tvals, site=None, use_dilution=0)#

Predict a bespoke time series source concentration at a given set of receptor time values. This is useful for predicting the concentration at a receptor point that is not in the original time series. Note that this is an advanced feature and may be more difficult to use.

Parameters:
  • ts_concs – pd.Series of source concentrations (index = age in years since the start date, self.start_date; values = concentration). Note that the full period of the source data must be passed in, not just the time series of interest.

  • receptor_tvals – list of times to sample at (years since data starts, self.start_date)

  • site – dummy to support same call in MultiSiteDilutionBASESolver etc.

  • use_dilution – dummy to support same call in MultiSiteDilutionBASESolver etc.

Returns:

pd.Series of receptor concentrations (index = receptor_tvals, values = concentration)
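The input/output contract can be illustrated with a stand-in: the real method routes the source series through the BPEFM, but a simple interpolation (an assumption for illustration only, not the solver's logic) shows the expected pd.Series shapes:

```python
import numpy as np
import pandas as pd

# hypothetical source series: index = years since start, values = concentration
ts_concs = pd.Series(np.linspace(1.0, 5.0, 101), index=np.linspace(0, 100, 101))
receptor_tvals = [10.5, 25.0, 60.25]

# stand-in for the BPEFM convolution: plain interpolation, to show shapes only
out = pd.Series(np.interp(receptor_tvals, ts_concs.index, ts_concs.values),
                index=receptor_tvals)
```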

run_base(model_name, params, starts=None, verbose=None, nverbose=100, start_random=True, hardboundaries=True, multitry=False, rerun=False, test_run=False, _clear_test_data=True, _raise_exception_on_test=True, raise_if_run=False, **kwargs)#

run the BASE model. This method will run the model using pydream and save the results to the save_dir.

  • If the model has already been run (and rerun=False) then it will check the log files to ensure that the args/kwargs/instance.attrs match the expected args/kwargs/instance.attrs.

  • If the model has not been run then it will run the model and save the results to the save_dir.

  • If test_run is True then it will run the model for the first 10 iterations and then raise an InterruptedError after removing the test data (assuming _clear_test_data=True). This is useful for testing the setup of the model. You may also want to set _raise_exception_on_test=False to avoid the exception being raised.

Parameters:
  • model_name – name for the model (str)

  • params – A list of pydream.parameters.SampledParam objects. Note SampledParam uses scipy, so you can pass either a list of instances or an instance that will produce the correct number of parameters; e.g., [SampledParam(uniform, [1,2], [2,3])] should be equivalent to [SampledParam(uniform, 1, 2), SampledParam(uniform, 2, 3)]. You can also pass a single SampledParam instance, which will be used for all inflection points. Be careful of scipy’s uniform distribution, which uses loc and scale rather than min and max. You can also pass a komanawa.BASE.generators.uniform_path_generator.UniformPathGenerator or komanawa.BASE.generators.normal_path_generator.NormalPathGenerator instance as parameters.

  • starts – Either a list of start locations to initialize chains in, or a single start location to initialize all chains in. Default: None, note that if start_random is True then this will be ignored/overwritten. Starts can be provided as a 1d iterable (nparams,) or a 2d iterable (nchains, nparams)

  • verbose – None (use self.verbose) or bool Whether to print verbose output (including acceptance or rejection of moves and the current acceptance rate)

  • nverbose – The number of iterations between verbose output. Default = 100

  • start_random – Whether to initialize chains from a random point in parameter space drawn from the prior (default = True). Will override any starting position set when sample was called.

  • hardboundaries – Whether to reflect points back into the bounds of a hard prior (i.e., if using a uniform prior, reflect points outside the boundaries back in, so you don’t waste time evaluating points with logpdf = -inf).

  • multitry – {bool, int} Whether to use multi-try sampling. Default is False. If set to True, 5 multiple tries will be used. Can also directly specify an integer if desired.

  • rerun – bool, whether to rerun the model if it already exists

  • test_run – bool, if True then setup the model and run for the first 10 iterations to test the setup, note that this will clear any saved data after running then raise InterruptedError

  • _clear_test_data – bool, if True then clear the data after running, only valid if test_run is True, used for testing only

  • _raise_exception_on_test – bool, if True then raise an InterruptedError after running the test, only valid if test_run is True, used for testing only

  • raise_if_run – bool, if True then raise an InterruptedError if the model would actually run BASE. This is used to check that runs exist and that parameters are set effectively, while avoiding large re-runs.

  • kwargs – other kwargs passed to pydream.core.run_dream

Returns:

run (bool if the model was run)
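The scipy uniform gotcha flagged in the params documentation above is easy to trip over; a quick check of the loc/scale behaviour:

```python
from scipy.stats import uniform

# scipy's uniform takes (loc, scale), i.e. (minimum, width), NOT (min, max):
# uniform(loc=1, scale=2) spans [1, 3], not [1, 2].
dist = uniform(loc=1, scale=2)

# to sample on [a, b], pass loc=a, scale=b - a
a, b = 2.0, 5.0
dist_ab = uniform(loc=a, scale=b - a)
```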

class BaseSampledParam(distribution, *args, **kwargs)#

Schema for a sampled parameter which captures the distribution and arguments for the distribution to check that the distribution is valid and has not changed.

Parameters:
  • distribution – The distribution to use (e.g. scipy.stats.uniform, scipy.stats.norm)

  • args – The arguments to pass to the distribution

  • kwargs – The keyword arguments to pass to the distribution

serialize()#

Serialize the object to a unique string to capture metadata

Returns:
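A hypothetical sketch of what such a serialization could look like — serialize_param and the string format below are illustrative, not the library's implementation; the point is only that changed distribution arguments must yield a different string:

```python
from scipy.stats import norm, uniform

def serialize_param(distribution, *args, **kwargs):
    # hypothetical: build a unique metadata string for a sampled parameter so
    # saved runs can be checked against the current settings
    return f"{distribution.name}|args={args}|kwargs={sorted(kwargs.items())}"

a = serialize_param(uniform, 0, 2)
b = serialize_param(uniform, 0, 3)
```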

class NormalPathGenerator(start_bounds, low, up, scale, n, delta_center=0)#

Bases: komanawa.BASE.generators.parent_path_change.ParentPathChange

A base class for the path change generators which defines the KS test for the path change generator.

Generate a random path of length n with a starting point in the interval start_bounds. The process: first pick a value from the uniform distribution (uniform(*start_bounds)), then for each additional point in the series pick a value from a normal distribution centered on the previous value with a scale of scale. If the picked value falls outside the interval set by the upper and lower bounds, it is truncated to the bound value.

The delta_center parameter can be used to set the mean difference between points in the path to a value other than 0. This can be useful for generating paths that represent a known increase in the source data (e.g., intensification).

The up, low, scale, and delta_center parameters can be scalars or lists of length n-1. If they are scalars the value is used for all points in the path. If they are iterable the value is used for the corresponding point in the path. The first point in the path is always generated from the uniform distribution (uniform(*start_bounds)).

Parameters:
  • start_bounds – (low, high) bounds for the starting point

  • up – upper bound for the path, can be a scalar or a list of length n-1 (one for change)

  • low – lower bound for the path, can be a scalar or a list of length n-1 (one for change)

  • scale – scale for the normal distribution used to generate the changes along the path, can be a scalar or a list of length n-1 (one for change)

  • n – length of the path

  • delta_center – center of the normal distribution used to generate the changes along the path, can be a scalar or a list of length n-1 (one for change)
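The truncated random-walk process described above can be sketched in numpy; generate_path is an illustrative stand-in based on the description, not the library implementation:

```python
import numpy as np

def generate_path(start_bounds, low, up, scale, n, delta_center=0.0, rng=None):
    # stand-in for the NormalPathGenerator process (assumed logic)
    rng = np.random.default_rng(rng)
    # scalars or length n-1 iterables, one value per change
    low = np.broadcast_to(low, n - 1)
    up = np.broadcast_to(up, n - 1)
    scale = np.broadcast_to(scale, n - 1)
    delta_center = np.broadcast_to(delta_center, n - 1)
    path = np.empty(n)
    # first point: uniform(*start_bounds)
    path[0] = rng.uniform(*start_bounds)
    for i in range(1, n):
        # each change: normal centered on the previous value (+ delta_center)
        step = rng.normal(path[i - 1] + delta_center[i - 1], scale[i - 1])
        # truncate to the bounds rather than redraw
        path[i] = np.clip(step, low[i - 1], up[i - 1])
    return path

path = generate_path((0.5, 1.5), low=0.0, up=10.0, scale=0.2, n=20, rng=42)
```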

comptest(sample, test=kstest, n_gen=None, random_state=None, **kwargs)#

Calculate the scipy test for the path change generator. The test is calculated for each time step in the path. The first time step is the starting point of the path, the second time step is the difference between the first and second points in the path, the third time step is the difference between the second and third points in the path, etc.

The test can be any of the scipy tests like kstest, Mann-Whitney U test, wasserstein_distance_nd, etc.

Parameters:
  • sample – sample to test, shape=(nsims, self.n)

  • test – the scipy test to use (default: kstest)

  • n_gen – number of samples to generate for the test, if None use the length of the sample

  • kwargs – other keyword arguments to pass to the scipy.stats test

Returns:

[testResult, …] for each time step in the path (see the scipy function)
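A minimal sketch of this per-time-step testing scheme using scipy (the exact semantics of the library's comptest are assumed from the description above; truncation at the bounds is omitted for simplicity):

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(0)
nsims, n, scale = 500, 10, 0.3

# simulate a suite of paths: uniform start, then normal increments
start = rng.uniform(0, 1, size=(nsims, 1))
deltas = rng.normal(0, scale, size=(nsims, n - 1))
paths = np.concatenate([start, start + np.cumsum(deltas, axis=1)], axis=1)

# first time step: the starting points; subsequent steps: consecutive differences
results = [kstest(paths[:, 0], 'uniform')]
diffs = np.diff(paths, axis=1)
results += [kstest(diffs[:, i], norm(loc=0, scale=scale).cdf) for i in range(n - 1)]
```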

logpdf(q0)#

Calculate the log probability of the path (q0) given the parameters of the generator.

Parameters:

q0

Returns:

plot_path(x=None, nsigmas=2, ax=None, round_lab=2, lab_rotation=0)#

Plot the path generated by the generator

Parameters:
  • x – None or an array of length n to use as the x axis if None the x axis will be the integers from 0 to n

  • nsigmas – number of standard deviations to plot around the center path (default 2, 95th percentile)

  • ax – None or an axis to plot on

  • round_lab – number of decimal places to round the labels to

  • lab_rotation – rotation of the x axis labels

Returns:

fig, ax

plot_test(test_results, ax, x=None, plot_stats=('statistic', 'pvalue'), plot_on_twinx=True, colors=None, **kwargs)#

Plot the results of the KS test for the path change generator.

Parameters:
  • test_results – [testResult, …] for each time step in the path

  • ax – axis to plot on or None (if None a new figure is created)

  • plot_stats – tuple subset of result values to plot or None (assumes results are float)

  • plot_on_twinx – if True plot the first of plot_stats on the ax and then each subsequent statistic on individual twinx axes

  • colors – dictionary of colors for each statistic or None (if None use default colors)

  • kwargs – other keyword arguments to pass to the ax.plot

Returns:

fig, ax, (handles, labels)

rvs(random_state=None)#

Generate a random path of length n with a starting point in the interval start_bounds. The process: first pick a value from the uniform distribution (uniform(*start_bounds)), then for each additional point in the series pick a value from a normal distribution centered on the previous value with a scale of scale. If the picked value falls outside the interval set by the upper and lower bounds, it is truncated to the bound value.

Parameters:

random_state – None, use the global random state, or an integer to seed the random state. As this is a Markov chain process, the random state is used to generate a suite of integer seeds to make each delta deterministic.

Returns:
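The seeding scheme described for random_state can be sketched as follows; rvs_seeded is illustrative (truncation bounds are omitted for brevity), showing only how one integer seed per delta makes the whole path reproducible:

```python
import numpy as np

def rvs_seeded(start_bounds, scale, n, random_state=None):
    # use the given state to draw one integer seed per point so that each
    # delta is deterministic for a given random_state (assumed scheme)
    master = np.random.default_rng(random_state)
    seeds = master.integers(0, 2**31 - 1, size=n)
    path = np.empty(n)
    path[0] = np.random.default_rng(seeds[0]).uniform(*start_bounds)
    for i in range(1, n):
        path[i] = np.random.default_rng(seeds[i]).normal(path[i - 1], scale)
    return path

p1 = rvs_seeded((0, 1), 0.1, 5, random_state=7)
p2 = rvs_seeded((0, 1), 0.1, 5, random_state=7)
```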

memory_profile()#

A memory profile for the BASE solver using a temporary directory to avoid cluttering the filesystem.

Returns:

The memory profile is printed to the console.

runtime_test(nchains=4, niterations=25000, n_inflections='1YE')#

A runtime test for the BASE solver using a temporary directory to avoid cluttering the filesystem.

Parameters:
  • nchains – number of chains to run concurrently

  • niterations – number of iterations for each chain

  • n_inflections – the n_inflections parameter for the BASESolver (default ‘1YE’, yearly inflection points)

Returns:

Prints and returns the total runtime of the solver (setup and execution) in seconds.
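The temporary-directory pattern shared by both tests can be sketched generically; timed_in_tempdir is a hypothetical helper, not part of the module:

```python
import tempfile
import time
from pathlib import Path

def timed_in_tempdir(work):
    # run `work` inside a temporary directory (removed afterwards, so nothing
    # is left on the filesystem) and return the elapsed wall-clock seconds
    t0 = time.perf_counter()
    with tempfile.TemporaryDirectory() as tdir:
        work(Path(tdir))
    return time.perf_counter() - t0

elapsed = timed_in_tempdir(lambda d: (d / 'scratch.txt').write_text('hello'))
```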