ACRO Demonstration

[1]:
import os

import pandas as pd
[2]:
# Uncomment these lines if acro is not installed,
# i.e. you are running in development mode
# import sys
# sys.path.insert(0, os.path.abspath(".."))
[3]:
from acro import ACRO

Instantiate ACRO

[4]:
acro = ACRO(suppress=False)
INFO:acro:version: 0.4.8
INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}
INFO:acro:automatic suppression: False

Load test data

The dataset used in this notebook is the nursery dataset from OpenML.

  • In this version, the data can be read directly from the local machine after it has been downloaded.

  • The code below reads the data from a folder called “data”, which we assume sits at the same level as the folder you are working in.

  • The path may need to be changed if the data has been downloaded and stored elsewhere.

  • For example, use path = os.path.join("data", "nursery.arff") if the data is in a sub-folder of your working folder.
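The path handling described above can be sketched as a small helper that tries a few candidate folders in turn. The function name `find_nursery_file` and the default list of candidate folders are illustrative assumptions, not part of ACRO; adjust them to wherever you stored the file.

```python
import os


def find_nursery_file(candidates=("../data", "data"), filename="nursery.arff"):
    """Return the first existing path to the dataset file (hypothetical helper)."""
    for folder in candidates:
        path = os.path.join(folder, filename)
        if os.path.isfile(path):
            return path
    # None of the candidate folders holds the file: fail with a readable message.
    raise FileNotFoundError(
        f"{filename} not found in any of: {', '.join(candidates)}"
    )


# Usage in this notebook (not run here):
# path = find_nursery_file()
# data = loadarff(path)
```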

[5]:
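from scipy.io.arff import loadarff

# the "data" folder is assumed to sit next to the current working folder
path = os.path.join("..", "data", "nursery.arff")
data = loadarff(path)  # returns (records, metadata)
df = pd.DataFrame(data[0])
# loadarff returns nominal attributes as byte strings, so keep those columns
# and decode them to ordinary UTF-8 strings
df = df.select_dtypes([object])
df = df.stack().str.decode("utf-8").unstack()
# "class" is the target column; give it a more descriptive name
df.rename(columns={"class": "recommend"}, inplace=True)
df.head()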
[5]:
   parents  has_nurs      form  children     housing     finance         social       health  recommend
0    usual    proper  complete         1  convenient  convenient        nonprob  recommended  recommend
1    usual    proper  complete         1  convenient  convenient        nonprob     priority   priority
2    usual    proper  complete         1  convenient  convenient        nonprob    not_recom  not_recom
3    usual    proper  complete         1  convenient  convenient  slightly_prob  recommended  recommend
4    usual    proper  complete         1  convenient  convenient  slightly_prob     priority   priority

Examples of producing tabular output

We rely on the industry-standard package pandas for tabulating data.
In the next few examples we show:
  • first, how a researcher would normally make a call in pandas, saving the results in a variable that they can view on screen or save to a file

  • then that the call in SACRO is identical, except that:

    • “pd” is replaced by “acro”

    • the researcher immediately sees a copy of what the TRE (Trusted Research Environment) output checker will see.

Pandas crosstab

This is an example of a crosstab using pandas.
The first line makes the call and the second line prints the output to the screen.
[6]:
table = pd.crosstab(df.recommend, df.parents)
print(table)
parents     great_pret  pretentious  usual
recommend
not_recom         1440         1440   1440
priority           858         1484   1924
recommend            0            0      2
spec_prior        2022         1264    758
very_recom           0          132    196
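For readers new to pd.crosstab, here is the same kind of call on a tiny sample. The values are invented for illustration and are not taken from the nursery data. Each cell counts the rows that share that combination of values, and combinations that never occur appear as 0, just like the zero cells in the table above.

```python
import pandas as pd

# Made-up toy data, shaped like df.recommend and df.parents.
recommend = pd.Series(["priority", "priority", "not_recom", "priority"], name="recommend")
parents = pd.Series(["usual", "pretentious", "usual", "usual"], name="parents")

# Rows come from the first argument, columns from the second;
# each cell is the number of rows with that (recommend, parents) pair.
table = pd.crosstab(recommend, parents)
print(table)
```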

ACRO crosstab

  • This is an example of a crosstab using ACRO.

  • The INFO lines show the researcher what will be reported to the output checkers.

  • Then the (suppressed as necessary) table is shown via the print command as before.

[7]:
safe_table = acro.crosstab(
    df.recommend, df.parents, rownames=["recommendation"], colnames=["parents"]
)
print(safe_table)
INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents        |great_pret   |pretentious  |usual       |
recommendation |             |             |            |
--------------------------------------------------------|
not_recom      |          ok |          ok |          ok|
priority       |          ok |          ok |          ok|
recommend      | threshold;  | threshold;  | threshold; |
spec_prior     |          ok |          ok |          ok|
very_recom     | threshold;  |          ok |          ok|
--------------------------------------------------------|

INFO:acro:records:add(): output_0
parents         great_pret  pretentious  usual
recommendation
not_recom             1440         1440   1440
priority               858         1484   1924
recommend                0            0      2
spec_prior            2022         1264    758
very_recom               0          132    196
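The four flagged cells can be reproduced by hand. The sketch below assumes the threshold rule flags any cell whose count is strictly below `safe_threshold`, which is 10 in the configuration printed when ACRO was instantiated. The (row, column) positions it returns correspond to the four cells marked `threshold;` in the outcome table above. It illustrates the rule and is not ACRO's own code; the helper name `threshold_failures` is hypothetical.

```python
# Counts copied from the crosstab above. Rows are in the order
# not_recom, priority, recommend, spec_prior, very_recom; columns are in the
# order great_pret, pretentious, usual.
counts = [
    [1440, 1440, 1440],
    [858, 1484, 1924],
    [0, 0, 2],
    [2022, 1264, 758],
    [0, 132, 196],
]

SAFE_THRESHOLD = 10  # value of safe_threshold in the ACRO config shown earlier


def threshold_failures(table, threshold):
    """Return [row, col] positions of cells with fewer than `threshold` records."""
    return [
        [i, j]
        for i, row in enumerate(table)
        for j, value in enumerate(row)
        if value < threshold
    ]


print(threshold_failures(counts, SAFE_THRESHOLD))
# [[2, 0], [2, 1], [2, 2], [4, 0]]
```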

ACRO crosstab with suppression

  • This is an example of a crosstab that suppresses the cells that fail the disclosure tests.

  • Note that you need to set the suppress attribute of the acro object to True before running the crosstab command.

  • If you wish to keep suppressing outputs for the rest of your research session, leave suppress set to True; otherwise set it back to False.
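If you only want suppression for a single command, the toggling described above can be wrapped in a context manager so that the previous setting is always restored, even if the command raises an error. The helper name `suppression` is hypothetical; it relies only on the `suppress` attribute of the acro object that this notebook sets directly.

```python
from contextlib import contextmanager


@contextmanager
def suppression(acro_obj, enabled=True):
    """Temporarily set acro_obj.suppress, restoring the old value afterwards."""
    previous = acro_obj.suppress
    acro_obj.suppress = enabled
    try:
        yield acro_obj
    finally:
        # Runs on normal exit and on exceptions alike.
        acro_obj.suppress = previous


# Usage in this notebook (not run here):
# with suppression(acro):
#     safe_table = acro.crosstab(df.recommend, df.parents)
```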

[8]:
acro.suppress = True

safe_table = acro.crosstab(df.recommend, df.parents)
print(safe_table)
INFO:acro:get_summary(): fail; threshold: 4 cells suppressed;
INFO:acro:outcome_df:
----------------------------------------------------|
parents    |great_pret   |pretentious  |usual       |
recommend  |             |             |            |
----------------------------------------------------|
not_recom  |          ok |          ok |          ok|
priority   |          ok |          ok |          ok|
recommend  | threshold;  | threshold;  | threshold; |
spec_prior |          ok |          ok |          ok|
very_recom | threshold;  |          ok |          ok|
----------------------------------------------------|

INFO:acro:records:add(): output_1
parents     great_pret  pretentious   usual
recommend
not_recom       1440.0       1440.0  1440.0
priority         858.0       1484.0  1924.0
recommend          NaN          NaN     NaN
spec_prior      2022.0       1264.0   758.0
very_recom         NaN        132.0   196.0
[9]:
acro.suppress = False
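In the suppressed table, every cell that failed the threshold check is replaced with NaN and the other cells keep their counts. Because NaN is a floating-point value, pandas converts the whole table to floats, which is why it shows 1440.0 rather than 1440. A minimal pure-Python sketch of that masking step, assuming a cell is suppressed when its count is below 10; the helper name `suppress_small_cells` is hypothetical and this is not ACRO's implementation.

```python
import math

# Counts from the crosstab above (rows: not_recom, priority, recommend,
# spec_prior, very_recom; columns: great_pret, pretentious, usual).
counts = [
    [1440, 1440, 1440],
    [858, 1484, 1924],
    [0, 0, 2],
    [2022, 1264, 758],
    [0, 132, 196],
]


def suppress_small_cells(table, threshold=10):
    """Replace cells below `threshold` with NaN and convert the rest to float."""
    return [
        [float(value) if value >= threshold else math.nan for value in row]
        for row in table
    ]


for row in suppress_small_cells(counts):
    print(row)
# [1440.0, 1440.0, 1440.0]
# [858.0, 1484.0, 1924.0]
# [nan, nan, nan]
# [2022.0, 1264.0, 758.0]
# [nan, 132.0, 196.0]
```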

ACRO functionality to let users manage their outputs

1: List current ACRO outputs

This is an example of using the print_outputs function to list all the outputs created so far.

[10]:
acro.print_outputs()
uid: output_0
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(
summary: fail; threshold: 4 cells may need suppressing;
outcome: parents          great_pret  pretentious        usual
recommendation
not_recom                ok           ok           ok
priority                 ok           ok           ok
recommend       threshold;   threshold;   threshold;
spec_prior               ok           ok           ok
very_recom      threshold;            ok           ok
output: [parents         great_pret  pretentious  usual
recommendation
not_recom             1440         1440   1440
priority               858         1484   1924
recommend                0            0      2
spec_prior            2022         1264    758
very_recom               0          132    196]
timestamp: 2025-03-06T19:38:29.296719
comments: []
exception:

uid: output_1
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(df.recommend, df.parents)
summary: fail; threshold: 4 cells suppressed;
outcome: parents      great_pret  pretentious        usual
recommend
not_recom            ok           ok           ok
priority             ok           ok           ok
recommend   threshold;   threshold;   threshold;
spec_prior           ok           ok           ok
very_recom  threshold;            ok           ok
output: [parents     great_pret  pretentious   usual
recommend
not_recom       1440.0       1440.0  1440.0
priority         858.0       1484.0  1924.0
recommend          NaN          NaN     NaN
spec_prior      2022.0       1264.0   758.0
very_recom         NaN        132.0   196.0]
timestamp: 2025-03-06T19:38:29.315826
comments: []
exception:


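The `sdc` field of each output is a plain dictionary, so it can be summarised with ordinary Python. The sketch below works on a copy of the `sdc` value printed for output_0 and reports which checks failed and at which cells. The helper name `failed_checks` is hypothetical.

```python
# Copy of the sdc value printed for output_0 above.
sdc = {
    "summary": {"suppressed": False, "negative": 0, "missing": 0, "threshold": 4,
                "p-ratio": 0, "nk-rule": 0, "all-values-are-same": 0},
    "cells": {"negative": [], "missing": [],
              "threshold": [[2, 0], [2, 1], [2, 2], [4, 0]],
              "p-ratio": [], "nk-rule": [], "all-values-are-same": []},
}


def failed_checks(sdc):
    """Map each check with at least one failing cell to its cell positions."""
    return {
        check: sdc["cells"][check]
        for check, count in sdc["summary"].items()
        # "suppressed" is a True/False flag, not a count of failing cells.
        if check != "suppressed" and count > 0
    }


print(failed_checks(sdc))
# {'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]]}
```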

2: Remove some ACRO outputs before finalising

This is an example of deleting some of the ACRO outputs.
The name of the output that needs to be removed should be passed to the function remove_output.
  • The output name can be taken from the outputs listed by the print_outputs function,

  • or by listing the results and choosing the specific output that needs to be removed

[11]:
acro.remove_output("output_0")
INFO:acro:records:remove(): output_0 removed
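As the cell output in the listing step shows, print_outputs also returns its listing as a string, so the output names can be collected programmatically before choosing which one to remove. A minimal sketch, assuming the text layout shown above (a "uid: ..." line immediately followed by a "status: ..." line); the helper name `list_uids` is hypothetical.

```python
import re


def list_uids(outputs_text):
    """Return (uid, status) pairs from the text produced by acro.print_outputs()."""
    # Each record starts with a "uid: ..." line followed by a "status: ..." line.
    return re.findall(r"^uid: (\S+)\nstatus: (\S+)", outputs_text, flags=re.MULTILINE)


# Usage in this notebook (not run here):
# for uid, status in list_uids(acro.print_outputs()):
#     print(uid, status)
```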

3: Rename ACRO outputs before finalising

This is an example of renaming the outputs to provide a more descriptive name.

[12]:
acro.rename_output("output_1", "cross_tabulation")
INFO:acro:records:rename_output(): output_1 renamed to cross_tabulation
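Renaming only changes the name under which an output is stored; as the finalise step shows, the renamed record keeps the same command, status and SDC results. A minimal sketch of that bookkeeping, assuming outputs are held in an insertion-ordered dictionary keyed by name (an assumption, not ACRO's code). The helper `rename_key` is hypothetical, and its refusal to overwrite an existing name is a design choice of this sketch.

```python
def rename_key(records, old, new):
    """Return a copy of `records` with key `old` renamed to `new`, keeping order."""
    if old not in records:
        raise KeyError(f"no output named {old!r}")
    if new in records:
        raise ValueError(f"an output named {new!r} already exists")
    # Rebuild the dict so the renamed entry keeps its original position.
    return {(new if key == old else key): value for key, value in records.items()}


records = {"output_0": "table A", "output_1": "table B"}
print(list(rename_key(records, "output_1", "cross_tabulation")))
# ['output_0', 'cross_tabulation']
```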

4: Add a comment to output

This is an example of adding a comment to an output.
It can be used to provide a description or to pass additional information to the output checkers.
[13]:
acro.add_comments("cross_tabulation", "Please let me have this data.")
INFO:acro:records:a comment was added to cross_tabulation

5: (the big one) Finalise ACRO

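This is an example of the function finalise(), which users must call at the end of each session.

  • The first argument is the name of the folder the outputs are written to (here Examples).

  • It takes each output and saves it to a CSV file.

  • It also saves the SDC analysis for each output to a JSON or Excel file, depending on the format passed as the second argument (here "json").

  • For an output whose status is fail, it asks the researcher to explain why an exception should be granted, as shown at the end of the output below.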

[14]:
output = acro.finalise("Examples", "json")
INFO:acro:records:
uid: cross_tabulation
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(df.recommend, df.parents)
summary: fail; threshold: 4 cells suppressed;
outcome: parents      great_pret  pretentious        usual
recommend
not_recom            ok           ok           ok
priority             ok           ok           ok
recommend   threshold;   threshold;   threshold;
spec_prior           ok           ok           ok
very_recom  threshold;            ok           ok
output: [parents     great_pret  pretentious   usual
recommend
not_recom       1440.0       1440.0  1440.0
priority         858.0       1484.0  1924.0
recommend          NaN          NaN     NaN
spec_prior      2022.0       1264.0   758.0
very_recom         NaN        132.0   196.0]
timestamp: 2025-03-06T19:38:29.315826
comments: ['Please let me have this data.']
exception:

The status of the record above is: fail.
Please explain why an exception should be granted.

 exception requested
INFO:acro:records:outputs written to: Examples
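A rough sketch of what finalising involves, using only the standard library: each output table is written to its own CSV file, and the metadata for all outputs goes into a single JSON file. The record structure, the helper name `write_outputs` and the file names (`<name>.csv`, `results.json`) are assumptions for illustration; inspect the Examples folder that ACRO creates to see its actual layout.

```python
import csv
import json
import os


def write_outputs(folder, records):
    """Write each record's table to CSV and all metadata to one JSON file.

    `records` maps an output name to a dict with "header", "rows", "status"
    and "comments" keys; this structure is hypothetical.
    """
    os.makedirs(folder, exist_ok=True)
    summary = {}
    for name, record in records.items():
        csv_path = os.path.join(folder, f"{name}.csv")
        with open(csv_path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(record["header"])
            writer.writerows(record["rows"])
        # Metadata kept alongside the table, as the output checker would see it.
        summary[name] = {
            "status": record["status"],
            "comments": record["comments"],
            "files": [os.path.basename(csv_path)],
        }
    with open(os.path.join(folder, "results.json"), "w") as fh:
        json.dump(summary, fh, indent=2)
    return summary
```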