ACRO Demonstration#

This is a simple notebook to get you started with using the acro package to add disclosure risk control to your analysis.

Assumptions#

For the purposes of this tutorial we assume some minimal prior experience of using Python for data science.

In particular, the use of the industry-standard pandas package for:

storing and manipulating datasets
creating basic tables, pivot tables, and plots (e.g. histograms)

This example is a Jupyter notebook split into cells.

Cells may contain code or text/images, and normally they are processed by stepping through them one by one.
To run (or render) a cell, click the run icon (at the top of the page) or press Shift+Return on your keyboard. This displays any output created (for a code cell) and moves the focus to the next cell.

A: The basic concepts#

1: A research session:#

by which we mean the activity of running a series of commands (interactively or via a script) that:

ingest some data,
manipulate it, and then
produce (and store) some outputs.

2: Types of commands:#

Whether interactive, or just running a final script, we can think of the commands that get run in a session as dividing into:

manipulation commands that load and transform data into the shape you want
feedback commands that report on your data - but are never intended to be exported. For example, running head() or describe() commands to make sure your manipulations have got the data into the format you want.
query commands that produce an output from your data (table/plot/regression model etc.) that you might want to export from the Trusted Research Environment (TRE)

3: Risk Assessment vs decision making:#

SACRO stands for Semi-Automated Checking of Research Outputs. The prefix ‘Semi’ is important here - because in a principles-based system humans should make decisions about output requests. To help with that we provide the SACRO-Viewer, which collates all the relevant information for them.

A key part of that information is the Risk Assessment.

Since it involves calculating metrics and comparing them to thresholds (the TRE’s risk appetite) it can be done automatically, at the time an output query runs on the data.
This is what the ACRO tool does when you use it as part of your workflow.

4: What ACRO does#

The ACRO package aims to support you in producing Safe Outputs with minimal changes to your workflow. To do that we provide:

drop-in replacements for the most commonly used output commands,
- keeping the same syntax as the originals, and
- supporting as many of the options as we can (features supported will increase over time in response to demand).
a set of session-management commands to help you manage the set of files you request for output.
It is important to note that acro currently outputs results (tables, details of regression models etc.) as .csv files.
- In other words, we separate the process of creating outputs, which must be done inside the TRE, from the process of formatting them for publication, which can be done outside the TRE with your preferred toolchain.
- ACRO handles creation. We would like to hear from researchers about whether support for formatting is important to them.

B: Getting Started with the demonstration#

Step 1: Setting up the environment with the tools we will use#

We will begin by importing some standard data science packages, and also the acro package itself.

[1]:

import os

import numpy as np
import pandas as pd

from acro import ACRO

Step 2: Starting an ACRO session#

To do this we create an acro object by running the cell below.

You can leave out the default parameters, but the cell below shows how you can:

provide the name of a config (risk appetite) file the TRE may have asked you to use
turn automatic suppression on or off right from the start of your session

Note that when the cell runs it should report (in a different coloured font/background)

what version of acro is running: this should be 0.4.12
the TRE’s risk appetite: that defines the rules your outputs will be checked against.
whether suppression is automatically applied to disclosive outputs.

[2]:

acro = ACRO(config="default", suppress=False)

INFO:acro:version: 0.4.12
INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}
INFO:acro:automatic suppression: False

Step 3: Loading some test data#

The following cells in this step just contain standard ingestion and manipulation commands to load some data into a Pandas dataframe ready to be queried. We will use some open-source data about nursery admissions.

There is no change to your workflow here.

Do whatever you want in this step!
We just assume you end up with your data in a pandas dataframe.

[3]:

from scipy.io.arff import loadarff

##--- Manipulation commands ---
# specify where the data is
path = os.path.join("../data", "nursery.arff")

# read it in using a common dataloader
data = loadarff(path)


# store in a pandas dataframe, decode the byte-string values to text,
# and rename the 'class' column to something more descriptive
df = pd.DataFrame(data[0])
df = df.select_dtypes([object])
df = df.stack().str.decode("utf-8").unstack()
df.rename(columns={"class": "recommendation"}, inplace=True)


# make the children variable numeric
# so we can report statistics like mean etc.
# (assign the result back: calling replace(..., inplace=True) on a single
# column is chained assignment, which does not update df under pandas'
# Copy-on-Write behaviour)
df["children"] = df["children"].replace({"more": "4"})
df["children"] = pd.to_numeric(df["children"])

# replace the "4 or more" category with a random number of children
# between 4 and 9 (randint's upper bound is exclusive)
df["children"] = df.apply(
    lambda row: (
        row["children"] if row["children"] in (1, 2, 3) else np.random.randint(4, 10)
    ),
    axis=1,
)

[4]:

##--- Feedback Command ----
# show the first 5 rows to make sure everything is how we would expect
df.head()

[4]:

	parents	has_nurs	form	children	housing	finance	social	health	recommendation
0	usual	proper	complete	1	convenient	convenient	nonprob	recommended	recommend
1	usual	proper	complete	1	convenient	convenient	nonprob	priority	priority
2	usual	proper	complete	1	convenient	convenient	nonprob	not_recom	not_recom
3	usual	proper	complete	1	convenient	convenient	slightly_prob	recommended	recommend
4	usual	proper	complete	1	convenient	convenient	slightly_prob	priority	priority

C: Producing tables that are ‘Safe Outputs’#

The easiest way to make tables in python is to use the industry-standard pandas crosstab() function.

There are hundreds (thousands?) of web sites showing how to do this.
You can make (hierarchical) 2-D tables (or 1-D if you add a ‘dummy’ variable containing the same value for each row)
You can specify what the table cells contain by:
- providing a statistic, for example mean, count, standard deviation or median (pandas calls these aggregation functions)
- specifying what variable to report on
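Both ideas can be tried in plain pandas first. This is a minimal sketch with a made-up five-row dataframe whose column names mirror the nursery data; the `all` dummy column is a hypothetical name chosen for the 1-D case.

```python
import pandas as pd

# Hypothetical toy data using the same column names as the nursery dataframe.
toy = pd.DataFrame({
    "parents": ["usual", "usual", "pretentious", "great_pret", "pretentious"],
    "recommendation": ["priority", "not_recom", "priority", "spec_prior", "priority"],
    "children": [1, 2, 3, 2, 1],
})

# 1-D frequency table: cross 'parents' with a dummy column
# that holds the same value in every row.
toy["all"] = "total"
counts_1d = pd.crosstab(index=toy["parents"], columns=toy["all"])

# 2-D table of a statistic: the mean number of children
# in each (recommendation, parents) cell; empty cells become NaN.
mean_children = pd.crosstab(
    index=toy["recommendation"],
    columns=toy["parents"],
    values=toy["children"],
    aggfunc="mean",
)

print(counts_1d)
print(mean_children)
```

In the notebook, making the same call with the `acro.` prefix is what adds the disclosure checks.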

The acro version uses all the pandas code, but adds extra code that checks for disclosure risks depending on the statistic you ask for.

Example 1: A simple 2-D table of frequencies stratified by two variables#

Note that, having imported the pandas package with the short name pd (as most people do), you would normally write

pd.crosstab(...)

so the only change is to use the prefix acro. rather than pd.

NB: the first two parameters to crosstab() (index and columns) are mandatory and positional, so you could just write acro.crosstab(df.recommendation, df.parents) to save typing.

Now run the next cell.

[5]:

acro.crosstab(index=df.recommendation, columns=df.parents)

INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents        |great_pret   |pretentious  |usual       |
recommendation |             |             |            |
--------------------------------------------------------|
not_recom      |          ok |          ok |          ok|
priority       |          ok |          ok |          ok|
recommend      | threshold;  | threshold;  | threshold; |
spec_prior     |          ok |          ok |          ok|
very_recom     | threshold;  |          ok |          ok|
--------------------------------------------------------|

INFO:acro:records:add(): output_0

[5]:

parents	great_pret	pretentious	usual
recommendation
not_recom	1440	1440	1440
priority	858	1484	1924
recommend	0	0	2
spec_prior	2022	1264	758
very_recom	0	132	196

How to understand this output#

The top part (with a pink background) is the risk analysis produced by acro. It is telling us that:

the overall summary is fail because 4 cells are failing the ‘minimum threshold’ check
then it is showing which cells failed so you can choose how to respond
finally it is telling us that it has saved the table and risk assessment to our acro session with id “output_0”

The part below is the normal output produced by the pandas crosstab() function.

As this is such a small table, it is not hard to spot the four problematic cells with zero or low counts,
but of course this might be harder for a bigger table.
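On a bigger table you would not want to find these cells by eye. The sketch below shows what a minimum-threshold check amounts to: it re-enters the counts printed above by hand, assumes the `safe_threshold` of 10 from the default risk appetite, and lists the `[row, column]` position and labels of every cell below it. `low_count_cells` is a hypothetical helper for illustration only, not part of ACRO; ACRO runs its own checks automatically when you call `acro.crosstab()`.

```python
import numpy as np
import pandas as pd

# Counts copied by hand from the crosstab output above.
table = pd.DataFrame(
    [[1440, 1440, 1440],
     [858, 1484, 1924],
     [0, 0, 2],
     [2022, 1264, 758],
     [0, 132, 196]],
    index=pd.Index(
        ["not_recom", "priority", "recommend", "spec_prior", "very_recom"],
        name="recommendation",
    ),
    columns=pd.Index(["great_pret", "pretentious", "usual"], name="parents"),
)

SAFE_THRESHOLD = 10  # assumed: 'safe_threshold' from the default risk appetite


def low_count_cells(counts: pd.DataFrame, threshold: int):
    """Hypothetical helper: positions and labels of cells with fewer than `threshold` records."""
    # np.nonzero returns the row and column indices of True entries, in row-major order.
    rows, cols = np.nonzero(counts.to_numpy() < threshold)
    positions = [[int(r), int(c)] for r, c in zip(rows, cols)]
    labels = [(counts.index[r], counts.columns[c]) for r, c in positions]
    return positions, labels


positions, labels = low_count_cells(table, SAFE_THRESHOLD)
print(positions)  # [[2, 0], [2, 1], [2, 2], [4, 0]]
print(labels)
```

The positions found this way are the same as the `threshold` cells listed for `output_0` in the session listing later in this notebook. Zero counts are caught here simply because 0 is below the threshold.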

How to respond to this output#

There are basically three choices:

We might decide these low numbers reveal something where the public interest outweighs the disclosure risk. Rather than being a strict rules-based system, acro lets you attach an ‘exception request’ to an output, to send a message to the output checkers. For example, you could type:

acro.add_exception("output_0", "I think you should let me have this because...")

We redesign our data so that none of the cells in the resulting table represent fewer than n people (10 for the default risk appetite). For example, we could recode ‘very_recom’ and ‘priority’ into one label. But maybe it is revealing that the ‘recommend’ label is almost never used?
We can redact the disclosive cells, and acro will do this for us: we simply enable the option to suppress disclosive cells and re-run the query.
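For option 2 the recoding itself is ordinary pandas. As a sketch, the merge of `very_recom` into `priority` can be applied to the counts printed earlier so that its effect is easy to check; on the real data you would instead recode `df["recommendation"]` (shown as a comment) and re-run the query. Note that this particular merge only fixes the `very_recom` cell: the `recommend` row is still below the threshold.

```python
import pandas as pd

# Counts copied by hand from the crosstab output above.
table = pd.DataFrame(
    [[1440, 1440, 1440],
     [858, 1484, 1924],
     [0, 0, 2],
     [2022, 1264, 758],
     [0, 132, 196]],
    index=pd.Index(
        ["not_recom", "priority", "recommend", "spec_prior", "very_recom"],
        name="recommendation",
    ),
    columns=pd.Index(["great_pret", "pretentious", "usual"], name="parents"),
)

# On the real data the recode would happen before the query, e.g.
#   df["recommendation"] = df["recommendation"].replace({"very_recom": "priority"})
# Here the equivalent merge is applied to the table of counts:
# relabel the row, then add together rows that now share a label.
merged = table.rename(index={"very_recom": "priority"}).groupby(level=0).sum()
print(merged)
```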

The cell below shows option 3. When you run the cell below you should see that:

the status now changes to review (so the output-checker knows what has been applied)
the code automatically adds an exception request saying that suppression has been applied
and, most importantly, the cells are redacted.
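What suppression does to the table can be mimicked with pandas `DataFrame.mask()`, which replaces the cells where a condition holds with `NaN`. This also explains why the suppressed table in the output of the next cell is printed with decimals (`1440.0`): integer columns cannot hold `NaN`, so pandas upcasts them to `float64`. The sketch applies only the threshold rule, with an assumed threshold of 10, to the counts from the first table; ACRO's own suppression applies all of its checks.

```python
import pandas as pd

# Counts copied by hand from the first crosstab output.
table = pd.DataFrame(
    [[1440, 1440, 1440],
     [858, 1484, 1924],
     [0, 0, 2],
     [2022, 1264, 758],
     [0, 132, 196]],
    index=pd.Index(
        ["not_recom", "priority", "recommend", "spec_prior", "very_recom"],
        name="recommendation",
    ),
    columns=pd.Index(["great_pret", "pretentious", "usual"], name="parents"),
)

SAFE_THRESHOLD = 10  # assumed: 'safe_threshold' from the default risk appetite

# mask() keeps cells where the condition is False and puts NaN where it is True.
suppressed = table.mask(table < SAFE_THRESHOLD)
print(suppressed)
print(suppressed.dtypes)  # every column now holds floats because of the NaNs
```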

[6]:

acro.enable_suppression()
acro.crosstab(index=df.recommendation, columns=df.parents)

INFO:acro:get_summary(): review; threshold: 4 cells suppressed;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents        |great_pret   |pretentious  |usual       |
recommendation |             |             |            |
--------------------------------------------------------|
not_recom      |          ok |          ok |          ok|
priority       |          ok |          ok |          ok|
recommend      | threshold;  | threshold;  | threshold; |
spec_prior     |          ok |          ok |          ok|
very_recom     | threshold;  |          ok |          ok|
--------------------------------------------------------|

INFO:acro:records:add(): output_1
INFO:acro:records:exception request was added to output_1

[6]:

parents	great_pret	pretentious	usual
recommendation
not_recom	1440.0	1440.0	1440.0
priority	858.0	1484.0	1924.0
recommend	NaN	NaN	NaN
spec_prior	2022.0	1264.0	758.0
very_recom	NaN	132.0	196.0

An example of a more complex table#

Just to show off the sort of tables that crosstab() can produce, let’s make something more complex. Going through the parameters in order:

passing a list of variable names to index (rather than a single variable/column name) tells it we want a hierarchy within the rows.
- we can do the same to columns as well (or instead) if we want to
setting values=df.children (a column of the dataset) tells it we want to report something about the number of children for each sub-group (table cell)
setting aggfunc="mean" tells it the statistic we want to report is the mean number of children (which introduces additional risks of dominance)
setting margins=True tells it to display row and column sub-totals

It’s worth noting that, including the totals, there are 6 columns in the risk assessment but only 5 in the suppressed table. This is because once suppression has replaced numbers with NaN, pandas removes the fully suppressed column (‘recommend’) from the table.
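The two dominance checks reported below (p-ratio and nk-rule) look at the individual contributions to a cell, here each family's number of children. The sketch uses the textbook forms of the two rules with the parameters from the default risk appetite printed earlier (`safe_nk_n`=2, `safe_nk_k`=0.9, `safe_pratio_p`=0.1): the N-k rule flags a cell if its n largest contributions make up more than k of the total, and the p% rule flags it if the total minus the two largest contributions is less than p times the largest. This is an illustration of the rules, not ACRO's code; ACRO's implementation may differ in details such as how empty cells are treated.

```python
import pandas as pd


def nk_rule_fails(values, n=2, k=0.9):
    """N-k dominance: flag if the n largest contributions exceed fraction k of the total."""
    vals = pd.Series(values, dtype=float).sort_values(ascending=False)
    total = vals.sum()
    if total <= 0:
        return True  # assumption: treat empty or all-zero cells as unsafe
    return bool(vals.iloc[:n].sum() / total > k)


def p_ratio_fails(values, p=0.1):
    """p% rule: flag if (total - largest - second largest) < p * largest."""
    vals = pd.Series(values, dtype=float).sort_values(ascending=False)
    total = vals.sum()
    if total <= 0:
        return True  # assumption: treat empty or all-zero cells as unsafe
    largest = vals.iloc[0]
    second = vals.iloc[1] if len(vals) > 1 else 0.0
    remainder = total - largest - second
    return bool(remainder < p * largest)


# A cell built from just two families fails both rules.
print(nk_rule_fails([3, 3]), p_ratio_fails([3, 3]))  # True True
# A cell with many similar contributions passes both.
print(nk_rule_fails([2] * 50), p_ratio_fails([2] * 50))  # False False
```

This is why a table of means can be flagged by the dominance rules even where the cell counts are not tiny: the risk depends on how the values are distributed within the cell.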

[7]:

acro.suppress = True
acro.crosstab(
    index=[df.parents, df.finance],
    columns=df.recommendation,
    values=df.children,
    aggfunc="mean",
    margins=True,
)

INFO:acro:get_summary(): review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed;
INFO:acro:outcome_df:
-----------------------------------------------------------------------------------------------------------------|
|recommendation        | not_recom| priority recommend                      |spec_prior |very_recom          |All|
|parents     finance   |          |                                         |           |                    |   |
-----------------------------------------------------------------------------------------------------------------|
|great_pret  convenient|  ok      |  ok                  p-ratio; nk-rule;  | ok        | p-ratio; nk-rule;  | ok|
|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        | p-ratio; nk-rule;  | ok|
|pretentious convenient|  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|
|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|
|usual       convenient|  ok      |  ok       threshold; p-ratio; nk-rule;  | ok        |                 ok | ok|
|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|
|All                   |  ok      |  ok       threshold; p-ratio; nk-rule;  | ok        |                 ok | ok|
-----------------------------------------------------------------------------------------------------------------|

INFO:acro:records:add(): output_2
INFO:acro:records:exception request was added to output_2

[7]:

	recommendation	not_recom	priority	spec_prior	very_recom	All
parents	finance
great_pret	convenient	3.104167	2.789062	3.320043	NaN	3.122222
great_pret	inconv	3.123611	2.401734	3.372943	NaN	3.134259
pretentious	convenient	3.065278	3.058594	3.289384	2.590909	3.104167
pretentious	inconv	3.008333	2.997207	3.345588	1.363636	3.077315
usual	convenient	3.134722	3.135892	3.325581	2.607692	3.133920
usual	inconv	3.102778	3.075000	3.362319	1.363636	3.087037
All		3.089815	2.983826	3.339021	2.185976	3.109816

D: What other sorts of analysis does ACRO currently support?#

We are continually adding support for more types of analysis as users prioritise them.

ACRO currently supports:

Tables via acro.crosstab() and acro.pivot_table().
- supported aggregation functions are: mean, median, sum, std, count, and mode.
Survival analysis via: acro.surv_function(), acro.survival_table() and acro.survival_plot()
Histograms via: acro.hist()
Regression via: acro.ols(), acro.logit() and acro.probit(), with options for specifying the formula in ‘R-style’ by adding the suffix ‘r’, e.g. acro.olsr() etc.

You can get help on any of these with the standard Python help() function, as shown in the next cell.

[8]:

help(acro.logit)

Help on method logit in module acro.acro_regression:

logit(endog, exog, missing: 'str | None' = None, check_rank: 'bool' = True) -> 'BinaryResultsWrapper' method of acro.acro.ACRO instance
    Fits Logit model.

    Parameters
    ----------
    endog : array_like
        A 1-d endogenous response variable. The dependent variable.
    exog : array_like
        A nobs x k array where nobs is the number of observations and k is
        the number of regressors. An intercept is not included by default
        and should be added by the user.
    missing : str | None
        Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no
        nan checking is done. If ‘drop’, any observations with nans are
        dropped. If ‘raise’, an error is raised. Default is ‘none’.
    check_rank : bool
        Check exog rank to determine model degrees of freedom. Default is
        True. Setting to False reduces model initialization time when
        exog.shape[1] is large.

    Returns
    -------
    BinaryResultsWrapper
        Results.
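The docstring notes that `logit` does not add an intercept for you. A minimal sketch of preparing `endog` and `exog` with pandas: a 0/1 outcome, one-hot encoded categorical regressors, and an explicit constant column. The toy dataframe and the choice of outcome (`priority` versus everything else) are hypothetical; with the real data you would start from `df`, and the final `acro.logit` call is left as a comment because it needs an ACRO session.

```python
import pandas as pd

# Hypothetical stand-in for the nursery dataframe.
toy = pd.DataFrame({
    "parents": ["usual", "pretentious", "great_pret", "usual", "great_pret", "pretentious"],
    "finance": ["convenient", "inconv", "convenient", "inconv", "inconv", "convenient"],
    "recommendation": ["priority", "spec_prior", "spec_prior", "priority", "not_recom", "priority"],
})

# Binary dependent variable: 1 if the application was given 'priority', else 0.
endog = (toy["recommendation"] == "priority").astype(int)

# One-hot encode the categorical regressors, dropping the first level of each
# so the dummies are not collinear with the intercept.
exog = pd.get_dummies(toy[["parents", "finance"]], drop_first=True, dtype=float)

# logit does not add an intercept by default, so add a constant column ourselves.
exog.insert(0, "const", 1.0)

print(exog)
# In the notebook this could then be passed on, e.g.
#   results = acro.logit(endog, exog)
```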

E: ACRO functionality to let users manage their outputs#

As explained above, you need to create an “acro session” whenever your code is run.

After that, every time you run an acro ‘query’ command, both the output and the risk assessment are saved as part of the acro session.

But we recognise that:

You may not want to request release of all your outputs - for example, the first table we produced above.
It is good practice to provide a more informative name than just output_n for the .csv files that acro produces
It helps the output checker if you provide some comments saying what the outputs are.
You might want to add more things to the bundles of files you want to take out, such as:
- outputs from analyses that acro doesn’t currently support
- your code itself (which many journals want)
- maybe a version of your paper in pdf/word format etc.

Therefore acro provides the following commands for ‘session management’.

1: Listing the current contents of an ACRO session#

This output is not beautiful (there’s a GUI coming soon), but it should let you identify outputs you want to rename, comment on, or delete.

[9]:

_ = acro.print_outputs()

uid: output_0
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: acro.crosstab(index=df.recommendation, columns=df.parents)
summary: fail; threshold: 4 cells may need suppressing;
outcome: parents          great_pret  pretentious        usual
recommendation
not_recom                ok           ok           ok
priority                 ok           ok           ok
recommend       threshold;   threshold;   threshold;
spec_prior               ok           ok           ok
very_recom      threshold;            ok           ok
output: [parents         great_pret  pretentious  usual
recommendation
not_recom             1440         1440   1440
priority               858         1484   1924
recommend                0            0      2
spec_prior            2022         1264    758
very_recom               0          132    196]
timestamp: 2026-02-11T18:33:37.547019
comments: []
exception:

uid: output_1
status: review
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: acro.crosstab(index=df.recommendation, columns=df.parents)
summary: review; threshold: 4 cells suppressed;
outcome: parents          great_pret  pretentious        usual
recommendation
not_recom                ok           ok           ok
priority                 ok           ok           ok
recommend       threshold;   threshold;   threshold;
spec_prior               ok           ok           ok
very_recom      threshold;            ok           ok
output: [parents         great_pret  pretentious   usual
recommendation
not_recom           1440.0       1440.0  1440.0
priority             858.0       1484.0  1924.0
recommend              NaN          NaN     NaN
spec_prior          2022.0       1264.0   758.0
very_recom             NaN        132.0   196.0]
timestamp: 2026-02-11T18:33:37.566599
comments: []
exception: Suppression automatically applied where needed

uid: output_2
status: review
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 2, 'p-ratio': 9, 'nk-rule': 9, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[4, 2], [6, 2]], 'p-ratio': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'nk-rule': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'all-values-are-same': []}}
command: acro.crosstab(
summary: review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed;
outcome: recommendation         not_recom priority                      recommend  \
parents     finance
great_pret  convenient        ok       ok             p-ratio; nk-rule;
            inconv            ok       ok             p-ratio; nk-rule;
pretentious convenient        ok       ok             p-ratio; nk-rule;
            inconv            ok       ok             p-ratio; nk-rule;
usual       convenient        ok       ok  threshold; p-ratio; nk-rule;
            inconv            ok       ok             p-ratio; nk-rule;
All                           ok       ok  threshold; p-ratio; nk-rule;

recommendation         spec_prior          very_recom All
parents     finance
great_pret  convenient         ok  p-ratio; nk-rule;   ok
            inconv             ok  p-ratio; nk-rule;   ok
pretentious convenient         ok                  ok  ok
            inconv             ok                  ok  ok
usual       convenient         ok                  ok  ok
            inconv             ok                  ok  ok
All                            ok                  ok  ok
output: [recommendation          not_recom  priority  spec_prior  very_recom       All
parents     finance
great_pret  convenient   3.104167  2.789062    3.320043         NaN  3.122222
            inconv       3.123611  2.401734    3.372943         NaN  3.134259
pretentious convenient   3.065278  3.058594    3.289384    2.590909  3.104167
            inconv       3.008333  2.997207    3.345588    1.363636  3.077315
usual       convenient   3.134722  3.135892    3.325581    2.607692  3.133920
            inconv       3.102778  3.075000    3.362319    1.363636  3.087037
All                      3.089815  2.983826    3.339021    2.185976  3.109816]
timestamp: 2026-02-11T18:33:37.754124
comments: []
exception: Suppression automatically applied where needed

2: Remove some ACRO outputs before finalising#

At the start of this demo we made a disclosive output: it’s the first one, with status fail.

We don’t want to waste the output checker’s time, so let’s remove it.

[10]:

acro.remove_output("output_0")

INFO:acro:records:remove(): output_0 removed

3: Rename ACRO outputs before finalising#

This is an example of renaming the outputs to provide more descriptive names.

[11]:

acro.rename_output("output_1", "crosstab_recommendation_vs_parents")
acro.rename_output("output_2", "mean_children_by_parents_finance_recommendation")

INFO:acro:records:rename_output(): output_1 renamed to crosstab_recommendation_vs_parents
INFO:acro:records:rename_output(): output_2 renamed to mean_children_by_parents_finance_recommendation

4: Add a comment to output#

This is an example of adding a comment to outputs.

It can be used to provide a description or to pass additional information to the TRE staff. They will see it alongside your file in the output checking viewer - rather than having it in an email somewhere.

[12]:

acro.add_comments(
    "mean_children_by_parents_finance_recommendation",
    "too few cases of recommend to report",
)

INFO:acro:records:a comment was added to mean_children_by_parents_finance_recommendation

5: Request an exception#

An example of providing a reason why an exception should be made:

acro.add_exception("output_n", "This is evidence of systematic bias?")

6: Adding a custom output#

As mentioned above, you might want to request release of all sorts of things, including:

your code
outputs from analyses acro doesn’t support (yet)

In ACRO we can add a file to our session, with a comment describing what it is.

[13]:

acro.custom_output("acro_demo_2026.py", "This is the code that produced this session")

INFO:acro:records:add_custom(): output_3

F: Finishing your session and producing a folder of files to release#

This is an example of the function finalise(), which users must call at the end of each session.

It takes each output and saves it to a CSV file (or the original file type for custom outputs).
It also saves the SDC analysis for each output to a json file.
It adds checksums for everything - so we know they’ve not been edited.
It puts them all in a folder with the name you supply.
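The text does not say which checksum algorithm ACRO uses or how it stores the values, so the sketch below only illustrates the idea, using SHA-256 from Python's standard `hashlib`: record a checksum for every file in an output folder, then recompute later and compare. `file_checksum`, `folder_checksums` and `changed_files` are hypothetical helpers, not ACRO functions.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def folder_checksums(folder: str) -> dict:
    """Map each file (path relative to `folder`) to its checksum."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: str, recorded: dict) -> list:
    """Files whose current checksum differs from the recorded one, or that are missing."""
    current = folder_checksums(folder)
    return [name for name, value in recorded.items() if current.get(name) != value]


# Example use (hypothetical folder name):
#   recorded = folder_checksums("my_acro_outputs_v1")
#   ...later...
#   changed_files("my_acro_outputs_v1", recorded)  # [] if nothing was edited
```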

ACRO will not overwrite previous sessions.

So every time you call finalise() on a session you need to either:

delete the previous folder, or
provide a new folder name
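One way to handle the second option is to pick the first unused versioned folder name. `next_free_folder` below is a hypothetical helper, not part of ACRO; it only checks the file system with `os.path.exists`, and the `_v1`, `_v2`, ... naming scheme is an assumption that simply matches the folder name used in the next cell.

```python
import os


def next_free_folder(base: str, start: int = 1) -> str:
    """Return the first name of the form f"{base}_v{n}" that does not exist yet."""
    n = start
    while os.path.exists(f"{base}_v{n}"):
        n += 1
    return f"{base}_v{n}"


# In the notebook this could be used as:
#   output = acro.finalise(next_free_folder("my_acro_outputs"))
```

If you would rather reuse the same name, the first option (deleting the old folder, for example with `shutil.rmtree`) works too, but it discards the earlier session's files.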

[14]:

output = acro.finalise("my_acro_outputs_v1")

INFO:acro:records:outputs written to: my_acro_outputs_v1

G: Reminder about getting help while you work#

If you remember the name of the command and want an explanation of what it does or of its syntax, type help(acro.command_name) at the Python prompt.
If you can’t remember the name of the command, type help(ACRO) at the Python prompt.
- this is not as user-friendly, but it lists all the available commands.
