AIMGenerator

class risksyn.generator.AIMGenerator(risk, degree=2, max_model_size=80, compress=True, proc_epsilon=0.1, n_jobs=-1)

Bases: object

Generate synthetic data with interpretable privacy risk guarantees.

Uses the AIM (Adaptive and Iterative Mechanism) pipeline for synthesis.

Parameters:

risk (Risk) – Risk specification defining the privacy guarantee.
degree (int, default 2) – Maximum degree of marginals used by AIM.
max_model_size (int, default 80) – Maximum model size parameter for AIM.
compress (bool, default True) – Whether to use compression in AIM.
proc_epsilon (float, default 0.1) – Epsilon budget allocated to data preprocessing (domain estimation). Only used if domain bounds are not provided for numeric columns.
n_jobs (int, default -1) – Number of parallel jobs for AIM’s internal graphical model.

Examples

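>>> import pandas as pd
>>> from risksyn import Risk, AIMGenerator
>>> df = pd.DataFrame({"age": [23, 35, 47, 62]})  # placeholder data; use your own DataFrame
>>> risk = Risk.from_advantage(0.2)
>>> gen = AIMGenerator(risk=risk, degree=3)
>>> gen.fit(df, domain={"age": {"lower": 0, "upper": 100}})
>>> synthetic_df = gen.generate(count=1000)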
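
The domain argument passed to fit() can be assembled programmatically when a table has many columns. The following is a minimal sketch, not part of risksyn: build_domain is a hypothetical helper, and the column names and bounds are illustrative. It only produces the dictionary shape documented for fit(), and the bounds are assumed to come from public knowledge rather than from the data itself.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def build_domain(
    numeric_bounds: Mapping[str, Tuple[float, float]],
    categorical_values: Mapping[str, Iterable[Any]],
) -> Dict[str, Dict[str, Any]]:
    """Assemble a domain dict in the shape described for fit() (hypothetical helper)."""
    domain: Dict[str, Dict[str, Any]] = {}
    for column, (lower, upper) in numeric_bounds.items():
        if lower > upper:
            raise ValueError(f"lower bound exceeds upper bound for {column!r}")
        # Numeric columns: {"col": {"lower": min, "upper": max}}
        domain[column] = {"lower": lower, "upper": upper}
    for column, values in categorical_values.items():
        if column in domain:
            raise ValueError(f"{column!r} is listed as both numeric and categorical")
        # Categorical columns: {"col": {"categories": [...]}}
        domain[column] = {"categories": list(values)}
    return domain


# Illustrative column names and values only.
domain = build_domain(
    numeric_bounds={"age": (0, 100)},
    categorical_values={"sex": ["F", "M"]},
)
```

The resulting dictionary can be passed as fit(df, domain=domain), which avoids spending proc_epsilon on private domain estimation for the columns it covers.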

__init__(risk, degree=2, max_model_size=80, compress=True, proc_epsilon=0.1, n_jobs=-1)

fit(data, domain=None, unsafe_infer_bounds=False)

Fit the generator to the data.

Parameters:

data (pd.DataFrame) – DataFrame to fit the model on.
domain (dict, optional) – Domain specification for columns. For numeric columns, use {"col": {"lower": min, "upper": max}}. For categorical columns, use {"col": {"categories": ["val1", "val2", ...]}}. Required when the privacy budget is low, to avoid failures in private domain estimation.
unsafe_infer_bounds (bool, default False) – If True, infer domain bounds from data min/max for numeric columns that lack explicit bounds. This leaks information about the data and weakens the privacy guarantee.

Returns:

Returns self for method chaining.

Return type:

AIMGenerator

Raises:

ValueError – If the privacy budget is insufficient for the required preprocessing (which applies when numeric columns lack domain bounds).

Warns:

UserWarning – If the privacy budget for generation is smaller than the budget for preprocessing, or if unsafe_infer_bounds is True.
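
Because fit() reports these conditions as UserWarning rather than as errors, a pipeline may prefer to fail fast on them. A minimal sketch using the standard warnings module follows; fit_strict is a hypothetical helper, not part of risksyn, and generator is assumed to be any object exposing the fit(data, domain=...) signature documented above.

```python
import warnings


def fit_strict(generator, data, domain=None):
    """Call generator.fit(), raising instead of warning on any UserWarning."""
    with warnings.catch_warnings():
        # Inside this block only, UserWarning and its subclasses are raised
        # as exceptions; the previous filter state is restored on exit.
        warnings.simplefilter("error", category=UserWarning)
        return generator.fit(data, domain=domain)
```

Note that this escalates every UserWarning emitted during the call, including ones raised by other libraries, so it is stricter than filtering on a specific message.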

generate(count)

Generate synthetic records.

Parameters:

count (int) – Number of records to generate.

Returns:

DataFrame with synthetic data matching the schema of the fitted data.

Return type:

pd.DataFrame

Raises:

RuntimeError – If fit() has not been called.
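
Since fit() returns the generator itself and generate() raises RuntimeError on an unfitted generator, the two calls can be chained or guarded. The helpers below are a hedged sketch: fit_and_sample and sample_if_fitted are hypothetical names, and generator stands for any object with the fit() and generate() methods documented here.

```python
def fit_and_sample(generator, data, count, domain=None):
    """Fit once, then draw `count` synthetic records in a single expression."""
    # fit() is documented to return the generator, so the call chains.
    return generator.fit(data, domain=domain).generate(count=count)


def sample_if_fitted(generator, count):
    """Return synthetic records, or None if the generator was never fitted."""
    try:
        return generator.generate(count=count)
    except RuntimeError:
        # Documented behaviour when fit() has not been called.
        return None
```

Catching RuntimeError this broadly would also hide unrelated runtime failures, so the guard is best kept to call sites where an unfitted generator is the only expected cause.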

store(path)

Store the fitted generator to disk.

Parameters:

path (str or Path) – Directory path to store the generator.

Raises:

RuntimeError – If fit() has not been called.

classmethod load(path)

Load a fitted generator from disk.

Parameters:

path (str or Path) – Directory path containing the stored generator.

Returns:

The loaded generator, ready for generation.

Return type:

AIMGenerator
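
Fitting is the step that touches the private data, so it is usually preferable to persist a fitted generator and reload it rather than fit again. The sketch below shows one way to combine store() and load(); load_or_fit is a hypothetical helper, and it assumes that store() creates the target directory and that the directory's existence means a complete generator was saved there.

```python
from pathlib import Path


def load_or_fit(generator_cls, path, fit_new):
    """Reuse a stored generator if present; otherwise fit one and store it.

    generator_cls: class exposing the load(path) classmethod.
    fit_new: zero-argument callable returning a fitted generator.
    """
    path = Path(path)
    if path.is_dir():
        # Assumption: an existing directory holds a previously stored generator.
        return generator_cls.load(path)
    generator = fit_new()
    generator.store(path)
    return generator


# Intended use (names from the class example above, path is illustrative):
# gen = load_or_fit(
#     AIMGenerator,
#     "models/aim",
#     lambda: AIMGenerator(risk=risk).fit(df, domain=domain),
# )
```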