AIMGenerator
- class risksyn.generator.AIMGenerator(risk, degree=2, max_model_size=80, compress=True, proc_epsilon=0.1, n_jobs=-1)
Bases: object
Generate synthetic data with interpretable privacy risk guarantees.
Uses the AIM (Adaptive and Iterative Mechanism) pipeline for synthesis.
- Parameters:
risk (Risk) – Risk specification defining the privacy guarantee.
degree (int, default 2) – Maximum degree of marginals used by AIM.
max_model_size (int, default 80) – Maximum model size parameter for AIM.
compress (bool, default True) – Whether to use compression in AIM.
proc_epsilon (float, default 0.1) – Epsilon budget allocated to data preprocessing (domain estimation). Only used if domain bounds are not provided for numeric columns.
n_jobs (int, default -1) – Number of parallel jobs for AIM’s internal graphical model.
Examples
>>> from risksyn import Risk, AIMGenerator
>>> risk = Risk.from_advantage(0.2)
>>> gen = AIMGenerator(risk=risk, degree=3)
>>> gen.fit(df, domain={"age": {"lower": 0, "upper": 100}})
>>> synthetic_df = gen.generate(count=1000)
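The domain mapping passed to fit() takes one entry per column: lower and upper for numeric columns, categories for categorical ones. A minimal sketch of assembling that mapping in plain Python follows. build_domain and the column names are hypothetical and not part of risksyn; only the resulting dict shape comes from this page.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def build_domain(
    numeric_bounds: Mapping[str, Tuple[float, float]],
    categorical_values: Mapping[str, Iterable[Any]],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper (not a risksyn function): build a `domain` dict."""
    domain: Dict[str, Dict[str, Any]] = {}
    # Numeric columns map to {"lower": ..., "upper": ...}.
    for column, (lower, upper) in numeric_bounds.items():
        if lower > upper:
            raise ValueError(f"{column}: lower bound {lower} exceeds upper bound {upper}")
        domain[column] = {"lower": lower, "upper": upper}
    # Categorical columns map to {"categories": [...]}.
    for column, values in categorical_values.items():
        if column in domain:
            raise ValueError(f"{column} is listed as both numeric and categorical")
        domain[column] = {"categories": list(values)}
    return domain


# Illustrative columns; the bounds are fixed in advance, not read from the data.
domain = build_domain(
    numeric_bounds={"age": (0, 100)},
    categorical_values={"sex": ["F", "M"]},
)
```

The bounds should come from public knowledge about the columns. Reading them from the data itself has the same leakage problem that unsafe_infer_bounds describes.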
- fit(data, domain=None, unsafe_infer_bounds=False)
Fit the generator to the data.
- Parameters:
data (pd.DataFrame) – DataFrame to fit the model on.
domain (dict, optional) – Domain specification for columns. For numeric columns, use {"col": {"lower": min, "upper": max}}. For categorical columns, use {"col": {"categories": ["val1", "val2", ...]}}. Required when privacy budget is low to avoid private domain estimation failures.
unsafe_infer_bounds (bool, default False) – If True, infer domain bounds from data min/max for numeric columns that lack explicit bounds. This leaks information about the data and weakens the privacy guarantee.
- Returns:
Returns self for method chaining.
- Return type:
AIMGenerator
- Raises:
ValueError – If privacy budget is insufficient for the required preprocessing (when numeric columns lack domain bounds).
- Warns:
UserWarning – If privacy budget for generation is smaller than for processing, or if unsafe_infer_bounds is used.
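The ValueError and UserWarning conditions above can be read as a budget split between preprocessing and generation. The sketch below is one possible reading, not risksyn's implementation. It assumes the Risk specification resolves to a single total epsilon and that proc_epsilon is subtracted from it only when some numeric column lacks bounds. split_budget and its arguments are hypothetical names.

```python
import warnings


def split_budget(total_epsilon: float, proc_epsilon: float, needs_preprocessing: bool) -> float:
    """Hypothetical sketch: return the epsilon left for generation."""
    if not needs_preprocessing:
        # Every numeric column has explicit bounds, so no budget goes to
        # private domain estimation (assumed behaviour).
        return total_epsilon
    if total_epsilon <= proc_epsilon:
        # Mirrors the documented ValueError: the budget cannot cover preprocessing.
        raise ValueError(
            "privacy budget is insufficient for preprocessing; "
            "provide domain bounds for numeric columns"
        )
    generation_epsilon = total_epsilon - proc_epsilon
    if generation_epsilon < proc_epsilon:
        # Mirrors the documented UserWarning: generation gets less than preprocessing.
        warnings.warn(
            "privacy budget for generation is smaller than for processing",
            UserWarning,
        )
    return generation_epsilon


# Illustrative numbers only: 0.1 is the documented default for proc_epsilon.
generation_epsilon = split_budget(total_epsilon=1.0, proc_epsilon=0.1, needs_preprocessing=True)
```

Under this reading, supplying bounds for every numeric column leaves the whole budget for generation, which is why the domain argument matters most when the budget is small.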
- generate(count)
Generate synthetic records.
- Parameters:
count (int) – Number of records to generate.
- Returns:
DataFrame with synthetic data matching the schema of the fitted data.
- Return type:
pd.DataFrame
- Raises:
RuntimeError – If fit() has not been called.
- store(path)
Store the fitted generator to disk.
- Parameters:
path (str or Path) – Directory path to store the generator.
- Raises:
RuntimeError – If fit() has not been called.
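Both generate() and store() raise RuntimeError until fit() has been called, and fit() returns self so calls can be chained. The stand-in class below sketches that contract in isolation. FitFirstSketch is hypothetical and performs no synthesis; it is not how risksyn implements the check.

```python
from typing import Dict, List, Optional


class FitFirstSketch:
    """Stand-in class (not risksyn code) showing the fit-before-use contract."""

    def __init__(self) -> None:
        self._columns: Optional[List[str]] = None  # set by fit()

    def fit(self, columns: List[str]) -> "FitFirstSketch":
        # A real generator would learn a model here; this only records the schema.
        self._columns = list(columns)
        return self  # returning self allows method chaining

    def generate(self, count: int) -> List[Dict[str, None]]:
        if self._columns is None:
            raise RuntimeError("fit() must be called before generate()")
        # Placeholder records with the fitted schema and no real values.
        return [{column: None for column in self._columns} for _ in range(count)]


# Chained call, possible because fit() returns self.
records = FitFirstSketch().fit(["age", "sex"]).generate(count=2)
```

Callers that load or build generators dynamically can catch RuntimeError around generate() or store() to detect an unfitted instance.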