ACRO Demonstration#
This is a simple notebook to get you started with using the acro package to add disclosure risk control to your analysis.
Assumptions#
We assume you are familiar with using Python and pandas for:
storing and manipulating datasets
creating basic tables, pivot_tables, and plots (e.g. histograms)
This example is a Jupyter notebook split into cells.
Cells may contain code or text/images, and normally they are processed by stepping through them one-by-one.
To run (or render) a cell, click the run icon (at the top of the page) or press shift-return on your keyboard. That will display any output created (for a code cell) and move the focus to the next cell.
A: The basic concepts#
1: A research session:#
by which we mean the activity of running a series of commands (interactively or via a script) that:
ingest some data,
manipulate it, and then
produce (and store) some outputs.
2: Types of commands:#
Whether interactive, or just running a final script, we can think of the commands that get run in a session as dividing into:
manipulation commands that load and transform data into the shape you want
feedback commands that report on your data - but are never intended to be exported. For example, running head() or describe() commands to make sure your manipulations have got the data into the format you want.
query commands that produce an output from your data (table/plot/regression model etc.) that you might want to export from the Trusted Research Environment (TRE)
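The three kinds of command can be illustrated with plain pandas. This is only a sketch on a hypothetical four-row dataframe (the data and variable names are made up for the example), not part of the demonstration session itself:

```python
import pandas as pd

# Hypothetical toy data standing in for whatever your session ingests.
df = pd.DataFrame(
    {
        "parents": ["usual", "usual", "pretentious", "great_pret"],
        "children": ["1", "2", "more", "3"],
    }
)

# Manipulation command: recode and convert the data; nothing is displayed.
df["children"] = pd.to_numeric(df["children"].replace({"more": "4"}))

# Feedback command: a quick look for your own benefit, never exported.
print(df.head())

# Query command: produces something you might ask to take out of the TRE.
table = pd.crosstab(df["parents"], df["children"])
print(table)
```

Only the last kind of command produces something an output checker would ever need to see.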
3: Risk Assessment vs decision making:#
SACRO stands for Semi-Automated Checking of Research Outputs. The prefix ‘Semi’ is important here - because in a principles-based system humans should make decisions about output requests. To help with that we provide the SACRO-Viewer, which collates all the relevant information for them.
A key part of that information is the Risk Assessment.
Since it involves calculating metrics and comparing them to thresholds (the TRE’s risk appetite) it can be done automatically, at the time an output query runs on the data.
This is what the ACRO tool does when you use it as part of your workflow.
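To make "calculating metrics and comparing them to thresholds" concrete, here is a minimal sketch of an automated check on cell counts. The function name, the cell labels and the counts are all hypothetical, and the real ACRO checks are more extensive than this:

```python
# Hypothetical risk appetite: cells describing fewer than 10 records fail.
SAFE_THRESHOLD = 10


def assess_counts(counts, threshold=SAFE_THRESHOLD):
    """Return (status, failing_cells) for a mapping of cell label -> count."""
    failing = [cell for cell, n in counts.items() if n < threshold]
    status = "fail" if failing else "ok"
    return status, failing


# Made-up counts for three table cells.
status, failing = assess_counts({"a": 120, "b": 3, "c": 45})
print(status, failing)
```

The assessment only reports; deciding what to do about a failing cell is still left to people.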
4: What ACRO does#
The ACRO package aims to support you in producing Safe Outputs with minimal changes to your workflow. To do that we provide:
drop-in replacements for the most commonly used output commands,
keeping the same syntax as the originals, and
supporting as many of the options as we can (features supported will increase over time in response to demand).
a set of session-management commands to help you manage the set of files you request for output.
It is important to note that acro currently outputs results (tables, details of regression models etc.) as .csv files.
In other words we separate the process of creating outputs, which must be done inside the TRE, from the process of formatting them for publication, which can be done outside the TRE with your preferred toolchain.
ACRO handles creation. We are interested in hearing from researchers whether it is important to support them with formatting.
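As an illustration of the formatting step that happens outside the TRE, the sketch below turns the text of a released .csv file into a Markdown table using only the standard library. The file contents and the helper name `csv_to_markdown` are hypothetical; any toolchain you prefer would do the same job:

```python
import csv
import io

# Hypothetical contents of a released .csv file (not from any real study).
released_csv = "group,count\nA,120\nB,45\n"


def csv_to_markdown(text):
    """Convert CSV text into a simple Markdown table string."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "---|" * len(header))
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)


markdown_table = csv_to_markdown(released_csv)
print(markdown_table)
```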
B: Getting Started with the demonstration#
Step 1: Setting up the environment with the tools we will use#
We will begin by importing some standard data science packages, and also the acro package itself.
[1]:
import os
import numpy as np
import pandas as pd
from acro import ACRO
Step 2: Starting an ACRO session#
To do this we create an acro object by running the cell below.
You can leave out the default parameters, but the cell below shows how you can:
provide the name of a config (risk appetite) file the TRE may have asked you to use
turn automatic suppression on or off right from the start of your session
Note that when the cell runs it should report (in a different coloured font/background)
what version of acro is running: this should be 0.4.12
the TRE’s risk appetite: that defines the rules your outputs will be checked against.
whether suppression is automatically applied to disclosive outputs.
[2]:
acro = ACRO(config="default", suppress=False)
INFO:acro:version: 0.4.12
INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}
INFO:acro:automatic suppression: False
Step 3: Loading some test data#
The following cells in this step just contain standard ingestion and manipulation commands to load some data into a Pandas dataframe ready to be queried. We will use some open-source data about nursery admissions.
There is no change to your workflow here
Do whatever you want in this step!
We just assume you end up with your data in a pandas dataframe.
[3]:
from scipy.io.arff import loadarff
##--- Manipulation commands ---
# specify where the data is
path = os.path.join("../data", "nursery.arff")
# read it in using a common dataloader
data = loadarff(path)
# store in a pandas dataframe with some manipulation of types and variable names
df = pd.DataFrame(data[0])
df = df.select_dtypes([object])
df = df.stack().str.decode("utf-8").unstack()
df.rename(columns={"class": "recommendation"}, inplace=True)
# make the children variable numeric
# so we can report statistics like mean etc.
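# assign the result back rather than replacing in place on a column selection
df["children"] = df["children"].replace({"more": "4"})
df["children"] = pd.to_numeric(df["children"])
# swap the recoded "more" value for a random number of children from 4 to 9
df["children"] = df.apply(
    lambda row: (
        row["children"] if row["children"] in (1, 2, 3) else np.random.randint(4, 10)
    ),
    axis=1,
)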
[4]:
##--- Feedback Command ----
# show the first 5 rows to make sure everything is how we would expect
df.head()
[4]:
| | parents | has_nurs | form | children | housing | finance | social | health | recommendation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | usual | proper | complete | 1 | convenient | convenient | nonprob | recommended | recommend |
| 1 | usual | proper | complete | 1 | convenient | convenient | nonprob | priority | priority |
| 2 | usual | proper | complete | 1 | convenient | convenient | nonprob | not_recom | not_recom |
| 3 | usual | proper | complete | 1 | convenient | convenient | slightly_prob | recommended | recommend |
| 4 | usual | proper | complete | 1 | convenient | convenient | slightly_prob | priority | priority |
C: Producing tables that are ‘Safe Outputs’#
The easiest way to make tables in python is to use the industry-standard pandas crosstab() function.
There are hundreds (thousands?) of web sites showing how to do this.
You can make (hierarchical) 2-D tables (or 1-D if you add a ‘dummy’ variable containing the same value for each row).
You can specify what the table cells contain by:
providing a statistic - for example: mean, count, std deviation, median etc. (pandas calls these aggregation functions)
specifying what variable to report on
The acro version uses all the pandas code - but it adds extra code that checks for disclosure risks depending on the statistic you ask for.
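The two ideas above (a ‘dummy’ variable for a 1-D table, and choosing the cell contents with a statistic and a variable) can be sketched with plain pandas. The four-row dataframe and the column name `dummy` are hypothetical and only there for illustration:

```python
import pandas as pd

# Hypothetical toy data; the column names echo the nursery dataset.
df = pd.DataFrame({"parents": ["usual", "usual", "pretentious", "great_pret"]})

# A constant 'dummy' column gives crosstab a second variable with one level,
# so the result is effectively a 1-D frequency table.
df["dummy"] = "count"
one_d = pd.crosstab(index=df["parents"], columns=df["dummy"])
print(one_d)

# Cell contents are chosen with `values` (the variable to report on)
# and `aggfunc` (the statistic to compute).
df["children"] = [1, 2, 3, 2]
means = pd.crosstab(
    index=df["parents"],
    columns=df["dummy"],
    values=df["children"],
    aggfunc="mean",
)
print(means)
```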
Example 1: A simple 2-D table of frequencies stratified by two variables#
Note that having imported the pandas package with the shortname pd (most people do) you would normally write
pd.crosstab(...)
so the only change is to use the prefix acro. rather than pd.
NB: the first two parameters to crosstab() are mandatory, so you could pass them positionally, e.g. acro.crosstab(df.recommendation, df.parents), to save typing.
Now run the next cell.
[5]:
acro.crosstab(index=df.recommendation, columns=df.parents)
INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents |great_pret |pretentious |usual |
recommendation | | | |
--------------------------------------------------------|
not_recom | ok | ok | ok|
priority | ok | ok | ok|
recommend | threshold; | threshold; | threshold; |
spec_prior | ok | ok | ok|
very_recom | threshold; | ok | ok|
--------------------------------------------------------|
INFO:acro:records:add(): output_0
[5]:
| parents | great_pret | pretentious | usual |
|---|---|---|---|
| recommendation | | | |
| not_recom | 1440 | 1440 | 1440 |
| priority | 858 | 1484 | 1924 |
| recommend | 0 | 0 | 2 |
| spec_prior | 2022 | 1264 | 758 |
| very_recom | 0 | 132 | 196 |
How to understand this output#
The top part (with a pink background) is the risk analysis produced by acro. It is telling us that:
the overall summary is fail because 4 cells are failing the ‘minimum threshold’ check
then it is showing which cells failed so you can choose how to respond
finally it is telling us that it has saved the table and risk assessment to our acro session with id “output_0”
The part below is the normal output produced by the pandas crosstab() function.
As this is such a small table it is not hard to spot the four problematic cells with zero or low counts,
but of course this might be harder for a bigger table.
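For a bigger table you could locate the low-count cells programmatically. The sketch below is a plain-pandas illustration rather than acro's own implementation: it re-types the table above by hand and assumes the check is simply "count below 10". The names `table`, `THRESHOLD` and `failing` are made up for the example.

```python
import pandas as pd

# The frequency table from Example 1, re-typed by hand for illustration.
table = pd.DataFrame(
    {
        "great_pret": [1440, 858, 0, 2022, 0],
        "pretentious": [1440, 1484, 0, 1264, 132],
        "usual": [1440, 1924, 2, 758, 196],
    },
    index=["not_recom", "priority", "recommend", "spec_prior", "very_recom"],
)

THRESHOLD = 10  # 'safe_threshold' in the default risk appetite shown earlier

# Boolean mask of cells whose count is below the threshold.
below = table < THRESHOLD

# Collect the (row, column) labels of every failing cell.
failing = [
    (row, col)
    for row in table.index
    for col in table.columns
    if below.loc[row, col]
]
print(failing)
```

Under that assumption the four cells found are the same ones marked `threshold;` in the risk analysis.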
How to respond to this input#
There are basically three choices:
We might decide these low numbers reveal something where the public interest outweighs the disclosure risk. Rather than being a strict rules-based system, acro lets you attach an ‘exception request’ to an output, to send a message to the output checkers. For example, you could type:
acro.add_exception('output_0',"I think you should let me have this because...")
We redesign our table so that none of the cells in the resulting table represents fewer than n people (10 for the default risk appetite). For example, we could recode ‘very_recom’ and ‘priority’ into one label. But maybe it is revealing that the ‘recommend’ label is not used?
We can redact the disclosive cells - and acro will do this for us. We simply enable the option to suppress disclosive cells and re-run the query.
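To see what the second choice (recoding) would achieve here, the sketch below merges the ‘priority’ and ‘very_recom’ rows of the hand-typed frequency table. In a real session you would recode the column in your dataframe before tabulating; merging rows of counts gives the same numbers and keeps the example short. The combined label `priority_or_very_recom` is hypothetical.

```python
import pandas as pd

# The frequency table from Example 1, re-typed by hand for illustration.
table = pd.DataFrame(
    {
        "great_pret": [1440, 858, 0, 2022, 0],
        "pretentious": [1440, 1484, 0, 1264, 132],
        "usual": [1440, 1924, 2, 758, 196],
    },
    index=["not_recom", "priority", "recommend", "spec_prior", "very_recom"],
)

# Map both labels onto one hypothetical combined label, then re-aggregate.
mapping = {
    "priority": "priority_or_very_recom",
    "very_recom": "priority_or_very_recom",
}
merged = table.rename(index=mapping).groupby(level=0).sum()
print(merged)

# Count how many cells are still below the default threshold of 10.
still_low = int((merged < 10).sum().sum())
print(still_low, "cells still below the threshold")
```

The merge removes the zero in the ‘very_recom’ row, but the three ‘recommend’ cells remain low, so recoding alone would not make this particular table pass.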
The next notebook cell shows option 3. When you run it you should see that:
the status now changes to review (so the output-checker knows what has been applied)
the code automatically adds an exception request saying that suppression has been applied
and, most importantly, the cells are redacted.
[6]:
acro.enable_suppression()
acro.crosstab(index=df.recommendation, columns=df.parents)
INFO:acro:get_summary(): review; threshold: 4 cells suppressed;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents |great_pret |pretentious |usual |
recommendation | | | |
--------------------------------------------------------|
not_recom | ok | ok | ok|
priority | ok | ok | ok|
recommend | threshold; | threshold; | threshold; |
spec_prior | ok | ok | ok|
very_recom | threshold; | ok | ok|
--------------------------------------------------------|
INFO:acro:records:add(): output_1
INFO:acro:records:exception request was added to output_1
[6]:
| parents | great_pret | pretentious | usual |
|---|---|---|---|
| recommendation | | | |
| not_recom | 1440.0 | 1440.0 | 1440.0 |
| priority | 858.0 | 1484.0 | 1924.0 |
| recommend | NaN | NaN | NaN |
| spec_prior | 2022.0 | 1264.0 | 758.0 |
| very_recom | NaN | 132.0 | 196.0 |
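You may notice that the counts now print as 1440.0 rather than 1440. A likely explanation is that NaN is a floating-point value, so a column containing it can no longer be stored as integers. The sketch below shows this effect in plain pandas on a hand-typed fragment of the table; using `where` to do the masking is an assumption for illustration, not necessarily how acro applies suppression.

```python
import pandas as pd

# A hand-typed fragment of the frequency table (integer counts).
counts = pd.DataFrame(
    {"great_pret": [858, 0], "usual": [1924, 2]},
    index=["priority", "recommend"],
)

# Keep cells that meet the threshold; everything else becomes NaN.
suppressed = counts.where(counts >= 10)

print(counts.dtypes)      # integer columns before masking
print(suppressed.dtypes)  # float columns after masking
print(suppressed)
```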
An example of a more complex table#
Just to show off the sort of tables that crosstab() can produce, let’s make something more complex. Going through the parameters in order:
passing a list of variable names to index (rather than a single variable/column name) tells it we want a hierarchy within the rows.
we can do the same to columns as well (or instead) if we want to
setting values=df.children (the name of a column in the dataset) tells it we want to report something about the number of children for each sub-group (table cell)
setting aggfunc="mean" tells it the statistic we want to report is the mean number of children (which introduces additional risks of dominance)
setting margins=True tells it to display row and column sub-totals
It’s worth noting that, including the totals, there are 6 columns in the risk assessment and 5 in the suppressed table. This is because after suppression has replaced numbers with NaN, pandas removes the fully suppressed column (‘recommend’) from the table.
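The disappearing column can be reproduced in plain pandas. `pd.crosstab` has a `dropna` parameter that defaults to True and excludes columns whose entries are all NaN; the sketch below shows the same effect with `DataFrame.dropna` on a small hypothetical table (the values are made up, and the exact mechanism inside acro is an assumption):

```python
import pandas as pd

nan = float("nan")

# Hypothetical table after suppression: 'recommend' is NaN in every row,
# 'very_recom' is NaN in only one row.
suppressed = pd.DataFrame(
    {
        "priority": [2.8, 2.4],
        "recommend": [nan, nan],
        "very_recom": [nan, 2.6],
    },
    index=["great_pret", "pretentious"],
)

# Drop only the columns in which every entry is NaN.
visible = suppressed.dropna(axis=1, how="all")
print(list(visible.columns))
```

A partly suppressed column such as ‘very_recom’ stays in the table; only a column with nothing left to show is removed.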
[7]:
acro.suppress = True
acro.crosstab(
    index=[df.parents, df.finance],
    columns=df.recommendation,
    values=df.children,
    aggfunc="mean",
    margins=True,
)
INFO:acro:get_summary(): review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed;
INFO:acro:outcome_df:
-----------------------------------------------------------------------------------------------------------------|
|recommendation | not_recom| priority recommend |spec_prior |very_recom |All|
|parents finance | | | | | |
-----------------------------------------------------------------------------------------------------------------|
|great_pret convenient| ok | ok p-ratio; nk-rule; | ok | p-ratio; nk-rule; | ok|
| inconv | ok | ok p-ratio; nk-rule; | ok | p-ratio; nk-rule; | ok|
|pretentious convenient| ok | ok p-ratio; nk-rule; | ok | ok | ok|
| inconv | ok | ok p-ratio; nk-rule; | ok | ok | ok|
|usual convenient| ok | ok threshold; p-ratio; nk-rule; | ok | ok | ok|
| inconv | ok | ok p-ratio; nk-rule; | ok | ok | ok|
|All | ok | ok threshold; p-ratio; nk-rule; | ok | ok | ok|
-----------------------------------------------------------------------------------------------------------------|
INFO:acro:records:add(): output_2
INFO:acro:records:exception request was added to output_2
[7]:
| recommendation | | not_recom | priority | spec_prior | very_recom | All |
|---|---|---|---|---|---|---|
| parents | finance | | | | | |
| great_pret | convenient | 3.104167 | 2.789062 | 3.320043 | NaN | 3.122222 |
| | inconv | 3.123611 | 2.401734 | 3.372943 | NaN | 3.134259 |
| pretentious | convenient | 3.065278 | 3.058594 | 3.289384 | 2.590909 | 3.104167 |
| | inconv | 3.008333 | 2.997207 | 3.345588 | 1.363636 | 3.077315 |
| usual | convenient | 3.134722 | 3.135892 | 3.325581 | 2.607692 | 3.133920 |
| | inconv | 3.102778 | 3.075000 | 3.362319 | 1.363636 | 3.087037 |
| All | | 3.089815 | 2.983826 | 3.339021 | 2.185976 | 3.109816 |
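The `p-ratio` and `nk-rule` flags in the risk assessment are dominance checks: they ask whether one or two records contribute so much of a cell's total that the reported statistic would reveal their values. The sketch below uses common textbook formulations with the default parameters from the config shown earlier (p = 0.1, n = 2, k = 0.9). Treat the exact formulas as an assumption rather than a statement of how acro implements them; the function names are hypothetical.

```python
def p_ratio_fails(values, p=0.1):
    """True if everything except the two largest values sums to less than
    p times the largest value. Assumes a non-empty list of values >= 0."""
    ordered = sorted(values, reverse=True)
    largest = ordered[0]
    rest = sum(ordered[2:])
    return largest > 0 and rest / largest < p


def nk_fails(values, n=2, k=0.9):
    """True if the n largest values make up more than a fraction k of the
    cell total. Assumes a non-empty list of values >= 0."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return total > 0 and sum(ordered[:n]) / total > k


# A cell built from only two records: both checks fail.
tiny_cell = [3, 1]
print(p_ratio_fails(tiny_cell), nk_fails(tiny_cell))

# A cell with many similar records: both checks pass.
broad_cell = [4, 3, 3, 2, 2, 1, 1]
print(p_ratio_fails(broad_cell), nk_fails(broad_cell))
```

Under these formulations any cell with only two contributing records fails both checks, which is consistent with the ‘recommend’ column (two records in total) being flagged in every row.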
D: What other sorts of analysis does ACRO currently support?#
We are continually adding support for more types of analysis as users prioritise them.
ACRO currently supports:
Tables via acro.crosstab() and acro.pivot_table().
supported aggregation functions are: mean, median, sum, std, count, and mode.
Survival analysis via: acro.surv_function(), acro.survival_table() and acro.survival_plot()
Histograms via: acro.hist()
Regression via: acro.ols(), acro.logit(), acro.probit() with options for specifying the formula in ‘R-style’ by adding the suffix ‘r’, e.g. acro.olsr() etc.
You can get help on using any of these using the standard python help() syntax, as shown in the next cell.
[8]:
help(acro.logit)
Help on method logit in module acro.acro_regression:
logit(endog, exog, missing: 'str | None' = None, check_rank: 'bool' = True) -> 'BinaryResultsWrapper' method of acro.acro.ACRO instance
Fits Logit model.
Parameters
----------
endog : array_like
A 1-d endogenous response variable. The dependent variable.
exog : array_like
A nobs x k array where nobs is the number of observations and k is
the number of regressors. An intercept is not included by default
and should be added by the user.
missing : str | None
Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no
nan checking is done. If ‘drop’, any observations with nans are
dropped. If ‘raise’, an error is raised. Default is ‘none’.
check_rank : bool
Check exog rank to determine model degrees of freedom. Default is
True. Setting to False reduces model initialization time when
exog.shape[1] is large.
Returns
-------
BinaryResultsWrapper
Results.
E: ACRO functionality to let users manage their outputs#
As explained above, you need to create an “acro session” whenever your code is run.
After that, every time you run an acro ‘query’ command, both the output and the risk assessment are saved as part of the acro session.
But we recognise that:
You may not want to request release of all your outputs - for example, the first table we produced above.
It is good practice to provide a more informative name than just output_n for the .csv files that acro produces.
It helps the output checker if you provide some comments saying what the outputs are.
You might want to add more things to the bundles of files you want to take out, such as:
outputs from analyses that acro doesn’t currently support
your code itself (which many journals want)
maybe a version of your paper in pdf/word format etc.
Therefore acro provides the following commands for ‘session management’.
1: Listing the current contents of an ACRO session#
This output is not beautiful (there’s a GUI coming soon) but it should let you identify outputs you want to rename, comment on, or delete.
[9]:
_ = acro.print_outputs()
uid: output_0
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: acro.crosstab(index=df.recommendation, columns=df.parents)
summary: fail; threshold: 4 cells may need suppressing;
outcome: parents great_pret pretentious usual
recommendation
not_recom ok ok ok
priority ok ok ok
recommend threshold; threshold; threshold;
spec_prior ok ok ok
very_recom threshold; ok ok
output: [parents great_pret pretentious usual
recommendation
not_recom 1440 1440 1440
priority 858 1484 1924
recommend 0 0 2
spec_prior 2022 1264 758
very_recom 0 132 196]
timestamp: 2026-02-11T18:33:37.547019
comments: []
exception:
uid: output_1
status: review
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: acro.crosstab(index=df.recommendation, columns=df.parents)
summary: review; threshold: 4 cells suppressed;
outcome: parents great_pret pretentious usual
recommendation
not_recom ok ok ok
priority ok ok ok
recommend threshold; threshold; threshold;
spec_prior ok ok ok
very_recom threshold; ok ok
output: [parents great_pret pretentious usual
recommendation
not_recom 1440.0 1440.0 1440.0
priority 858.0 1484.0 1924.0
recommend NaN NaN NaN
spec_prior 2022.0 1264.0 758.0
very_recom NaN 132.0 196.0]
timestamp: 2026-02-11T18:33:37.566599
comments: []
exception: Suppression automatically applied where needed
uid: output_2
status: review
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 2, 'p-ratio': 9, 'nk-rule': 9, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[4, 2], [6, 2]], 'p-ratio': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'nk-rule': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'all-values-are-same': []}}
command: acro.crosstab(
summary: review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed;
outcome: recommendation not_recom priority recommend \
parents finance
great_pret convenient ok ok p-ratio; nk-rule;
inconv ok ok p-ratio; nk-rule;
pretentious convenient ok ok p-ratio; nk-rule;
inconv ok ok p-ratio; nk-rule;
usual convenient ok ok threshold; p-ratio; nk-rule;
inconv ok ok p-ratio; nk-rule;
All ok ok threshold; p-ratio; nk-rule;
recommendation spec_prior very_recom All
parents finance
great_pret convenient ok p-ratio; nk-rule; ok
inconv ok p-ratio; nk-rule; ok
pretentious convenient ok ok ok
inconv ok ok ok
usual convenient ok ok ok
inconv ok ok ok
All ok ok ok
output: [recommendation not_recom priority spec_prior very_recom All
parents finance
great_pret convenient 3.104167 2.789062 3.320043 NaN 3.122222
inconv 3.123611 2.401734 3.372943 NaN 3.134259
pretentious convenient 3.065278 3.058594 3.289384 2.590909 3.104167
inconv 3.008333 2.997207 3.345588 1.363636 3.077315
usual convenient 3.134722 3.135892 3.325581 2.607692 3.133920
inconv 3.102778 3.075000 3.362319 1.363636 3.087037
All 3.089815 2.983826 3.339021 2.185976 3.109816]
timestamp: 2026-02-11T18:33:37.754124
comments: []
exception: Suppression automatically applied where needed
2: Remove some ACRO outputs before finalising#
At the start of this demo we made a disclosive output - it’s the first one, with status fail.
We don’t want to waste the output checker’s time, so let’s remove it.
[10]:
acro.remove_output("output_0")
INFO:acro:records:remove(): output_0 removed
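In a longer session you might want to pick out the failing outputs systematically rather than by eye. The sketch below works on a hand-written list that mirrors the uid and status fields printed earlier; it is not read from a live acro session, and the variable names are hypothetical.

```python
# Hand-written summary mirroring the uid/status fields listed above.
session = [
    {"uid": "output_0", "status": "fail"},
    {"uid": "output_1", "status": "review"},
    {"uid": "output_2", "status": "review"},
]

# Collect the ids of outputs whose risk assessment failed.
to_remove = [record["uid"] for record in session if record["status"] == "fail"]
print(to_remove)
```

Each uid found this way could then be passed to acro.remove_output(), as in the cell above.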
3: Rename ACRO outputs before finalising#
This is an example of renaming the outputs to provide more descriptive names.
[11]:
acro.rename_output("output_1", "crosstab_recommendation_vs_parents")
acro.rename_output("output_2", "mean_children_by_parents_finance_recommendation")
INFO:acro:records:rename_output(): output_1 renamed to crosstab_recommendation_vs_parents
INFO:acro:records:rename_output(): output_2 renamed to mean_children_by_parents_finance_recommendation
4: Add a comment to output#
[12]:
acro.add_comments(
    "mean_children_by_parents_finance_recommendation",
    "too few cases of recommend to report",
)
INFO:acro:records:a comment was added to mean_children_by_parents_finance_recommendation
5: Request an exception#
An example of providing a reason why an exception should be made:
acro.add_exception("output_n", "This is evidence of systematic bias?")
6: Adding a custom output.#
As mentioned above you might want to request release of all sorts of things
including your code,
or outputs from analyses acro doesn’t support (yet)
In ACRO we can add a file to our session with a comment describing what it is
[13]:
acro.custom_output("acro_demo_2026.py", "This is the code that produced this session")
INFO:acro:records:add_custom(): output_3
F: Finishing your session and producing a folder of files to release.#
This is an example of the function finalise() which the users must call at the end of each session.
It takes each output and saves it to a CSV file (or the original file type for custom outputs)
It also saves the SDC analysis for each output to a json file.
It adds checksums for everything - so we know they’ve not been edited.
It puts them all in a folder with the name you supply.
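The idea behind the checksums can be sketched with the standard library: record a hash of each file when it is written, and any later edit changes the hash. The choice of SHA-256, the file name and the helper `sha256_of` are assumptions for illustration; this is not acro's own code.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical output file written at the end of a session.
    output_file = Path(folder) / "example_output.csv"
    output_file.write_text("group,count\nA,120\n")
    recorded = sha256_of(output_file)

    # An untouched file still matches the recorded checksum.
    unchanged = sha256_of(output_file) == recorded

    # Editing a single digit changes the checksum.
    output_file.write_text("group,count\nA,121\n")
    edited = sha256_of(output_file) != recorded

print(unchanged, edited)
```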
ACRO will not overwrite previous sessions,
so every time you call finalise on a session you need to either:
delete the previous folder, or
provide a new folder name
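If you prefer the second option, a small helper can pick the next unused folder name for you. This is a hypothetical convenience function, not part of acro; it assumes a `<stem>_v<N>` naming scheme like the one used in this demo.

```python
import tempfile
from pathlib import Path


def next_free_folder(parent, stem):
    """Return the first '<stem>_v<N>' path under parent that does not exist."""
    version = 1
    while (Path(parent) / f"{stem}_v{version}").exists():
        version += 1
    return Path(parent) / f"{stem}_v{version}"


with tempfile.TemporaryDirectory() as parent:
    first = next_free_folder(parent, "my_acro_outputs")
    first.mkdir()  # pretend an earlier session was finalised here
    second = next_free_folder(parent, "my_acro_outputs")
    names = (first.name, second.name)

print(names)
```

The resulting path, converted to a string, could then be supplied as the folder name when you call finalise().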
[14]:
output = acro.finalise("my_acro_outputs_v1")
INFO:acro:records:outputs written to: my_acro_outputs_v1
G: Reminder about getting help while you work#
if you remember the name of the command and want an explanation of what it does or of its syntax, from the python prompt type: help(acro.command_name)
if you can’t remember the name of the command, from the python prompt type: help(acro.ACRO)
not as user friendly but will list all the available commands