ACRO Demonstration
[1]:
import os
import pandas as pd
[2]:
# uncomment these lines if acro is not installed,
# i.e. you are in development mode
# import sys
# sys.path.insert(0, os.path.abspath(".."))
[3]:
from acro import ACRO
Instantiate ACRO
[4]:
acro = ACRO(suppress=False)
INFO:acro:version: 0.4.8
INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}
INFO:acro:automatic suppression: False
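The config printed above sets the parameters of ACRO's disclosure checks. To make three of them concrete, here is a minimal sketch of the textbook minimum-count threshold rule, N-K dominance rule and p% rule, applied to the (made-up) contributions behind a single cell. The function names are hypothetical, and ACRO's own implementation may handle edge cases such as zeros, ties or negative values differently.

```python
# Textbook forms of three cell-level disclosure rules (hypothetical helpers).
# Parameter values are copied from the ACRO config printed above; ACRO's own
# implementation may treat edge cases (zeros, ties, negatives) differently.
CONFIG = {"safe_threshold": 10, "safe_nk_n": 2, "safe_nk_k": 0.9, "safe_pratio_p": 0.1}


def fails_threshold(contributions, threshold=CONFIG["safe_threshold"]):
    """Minimum-count rule: fail if fewer than `threshold` records contribute."""
    return len(contributions) < threshold


def fails_nk(contributions, n=CONFIG["safe_nk_n"], k=CONFIG["safe_nk_k"]):
    """N-K dominance: fail if the n largest contributions exceed fraction k of the total."""
    total = sum(contributions)
    if total <= 0:
        return False  # nothing to dominate; assumes non-negative values
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n / total > k


def fails_p_ratio(contributions, p=CONFIG["safe_pratio_p"]):
    """p% rule: fail if the total minus the two largest contributions is below
    p times the largest, i.e. the second-largest contributor could estimate
    the largest one to within p."""
    ordered = sorted(contributions, reverse=True)
    if len(ordered) < 3:
        return True  # with one or two contributors the largest value is recoverable exactly
    return sum(ordered[2:]) < p * ordered[0]


if __name__ == "__main__":
    cell = [600, 350, 10, 10, 10, 5, 5, 5, 3, 2]  # made-up contributions to one cell
    print(fails_threshold(cell), fails_nk(cell), fails_p_ratio(cell))  # False True True
```

Here the cell has exactly 10 contributors, so it passes the threshold rule, yet the two largest values make up 95% of the total and would still be flagged by the dominance-style rules.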
Load test data
The dataset used in this notebook is the nursery dataset from OpenML.
In this version, the data can be read directly from the local machine after it has been downloaded.
The code below reads the data from a folder called "data", which we assume is at the same level as the folder where you are working.
The path might need to be changed if the data has been downloaded and stored elsewhere.
For example, use path = os.path.join("data", "nursery.arff") if the data is in a sub-folder of your working folder.
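If you switch between the two folder layouts described above, a small helper can try each location in turn. This is a hypothetical sketch using only the standard library; the candidate folder names simply mirror the two layouts mentioned in the text.

```python
from pathlib import Path


def find_data_file(filename="nursery.arff", candidates=("../data", "data")):
    """Return the first existing path to `filename`, trying each folder in order.

    Hypothetical helper: "../data" is a "data" folder next to the working
    folder, "data" is a sub-folder of the working folder.
    """
    for folder in candidates:
        path = Path(folder) / filename
        if path.is_file():
            return path
    tried = ", ".join(str(Path(folder) / filename) for folder in candidates)
    raise FileNotFoundError(f"{filename} not found; tried: {tried}")
```

For example, path = find_data_file() could replace the hard-coded path in the next cell, followed by data = loadarff(str(path)). Failing loudly with the list of locations tried makes a misplaced download easier to diagnose than a bare file-not-found error.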
[5]:
from scipy.io.arff import loadarff
path = os.path.join("..", "data", "nursery.arff")
data = loadarff(path)
df = pd.DataFrame(data[0])
df = df.select_dtypes([object])
df = df.stack().str.decode("utf-8").unstack()
df.rename(columns={"class": "recommend"}, inplace=True)
df.head()
[5]:
| | parents | has_nurs | form | children | housing | finance | social | health | recommend |
|---|---|---|---|---|---|---|---|---|---|
| 0 | usual | proper | complete | 1 | convenient | convenient | nonprob | recommended | recommend |
| 1 | usual | proper | complete | 1 | convenient | convenient | nonprob | priority | priority |
| 2 | usual | proper | complete | 1 | convenient | convenient | nonprob | not_recom | not_recom |
| 3 | usual | proper | complete | 1 | convenient | convenient | slightly_prob | recommended | recommend |
| 4 | usual | proper | complete | 1 | convenient | convenient | slightly_prob | priority | priority |
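The cell above decodes every value because scipy's loadarff returns nominal attributes as byte strings (for example b"usual"), and then renames the "class" column to "recommend". The same two steps can be sketched on plain Python records without scipy or pandas; the helper name is hypothetical and the sample row is copied from the first line of the table above.

```python
def decode_records(records, columns, rename=None):
    """Decode byte-string fields to str and apply an optional column renaming."""
    rename = rename or {}
    names = [rename.get(col, col) for col in columns]
    decoded = []
    for row in records:
        decoded.append({
            name: value.decode("utf-8") if isinstance(value, bytes) else value
            for name, value in zip(names, row)
        })
    return decoded


columns = ["parents", "has_nurs", "form", "children", "housing",
           "finance", "social", "health", "class"]
raw = [(b"usual", b"proper", b"complete", b"1", b"convenient",
        b"convenient", b"nonprob", b"recommended", b"recommend")]
rows = decode_records(raw, columns, rename={"class": "recommend"})
print(rows[0]["recommend"])  # recommend
```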
Examples of producing tabular output
First, we show how a researcher would normally make a call in pandas, saving the results in a variable that they can view on screen or save to file.
Then we show that the call is identical in ACRO, except that:
- "pd" is replaced by "acro";
- the researcher immediately sees a copy of what the TRE (Trusted Research Environment) output checker will see.
Pandas crosstab
[6]:
table = pd.crosstab(df.recommend, df.parents)
print(table)
parents great_pret pretentious usual
recommend
not_recom 1440 1440 1440
priority 858 1484 1924
recommend 0 0 2
spec_prior 2022 1264 758
very_recom 0 132 196
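With two categorical series, pd.crosstab counts how many records fall into each (row, column) combination. A standard-library sketch of that counting step (not how pandas implements it) shows where the zero cells come from: combinations that never occur are filled with 0, which is exactly what the disclosure checks below look at.

```python
from collections import Counter


def crosstab_counts(row_values, col_values):
    """Count co-occurrences of row and column labels, like a frequency crosstab."""
    counts = Counter(zip(row_values, col_values))
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    # Counter returns 0 for unseen keys, so empty combinations appear as 0.
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


recommend = ["priority", "priority", "not_recom", "recommend"]
parents = ["usual", "great_pret", "usual", "usual"]
rows, cols, counts = crosstab_counts(recommend, parents)
for label, row in zip(rows, counts):
    print(label, row)
# not_recom [0, 1]
# priority [1, 1]
# recommend [0, 1]
```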
ACRO crosstab
This is an example of a crosstab using ACRO.
The INFO lines show the researcher what will be reported to the output checkers.
The table is then shown via the print command as before. Because suppression is switched off here (suppress=False), the cells that fail the checks are still displayed.
[7]:
safe_table = acro.crosstab(
df.recommend, df.parents, rownames=["recommendation"], colnames=["parents"]
)
print(safe_table)
INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing;
INFO:acro:outcome_df:
--------------------------------------------------------|
parents |great_pret |pretentious |usual |
recommendation | | | |
--------------------------------------------------------|
not_recom | ok | ok | ok|
priority | ok | ok | ok|
recommend | threshold; | threshold; | threshold; |
spec_prior | ok | ok | ok|
very_recom | threshold; | ok | ok|
--------------------------------------------------------|
INFO:acro:records:add(): output_0
parents great_pret pretentious usual
recommendation
not_recom 1440 1440 1440
priority 858 1484 1924
recommend 0 0 2
spec_prior 2022 1264 758
very_recom 0 132 196
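For a frequency table, the threshold check reported above amounts to flagging every cell whose count is below safe_threshold (10 in the config). A minimal sketch, using the counts copied from the printed table, reproduces the four "threshold;" cells of the outcome table and the [row, column] positions that appear later in the sdc record. It covers the threshold rule only; ACRO also applies the other checks listed in its config.

```python
THRESHOLD = 10  # safe_threshold from the config printed when ACRO was instantiated

rows = ["not_recom", "priority", "recommend", "spec_prior", "very_recom"]
cols = ["great_pret", "pretentious", "usual"]
# Counts copied from the crosstab printed above.
table = [
    [1440, 1440, 1440],
    [858, 1484, 1924],
    [0, 0, 2],
    [2022, 1264, 758],
    [0, 132, 196],
]


def threshold_outcome(table, threshold=THRESHOLD):
    """Label each cell 'ok' or 'threshold;' and collect failing [row, col] positions."""
    outcome, failing = [], []
    for i, row in enumerate(table):
        labels = []
        for j, count in enumerate(row):
            if count < threshold:
                labels.append("threshold;")
                failing.append([i, j])
            else:
                labels.append("ok")
        outcome.append(labels)
    return outcome, failing


outcome, failing = threshold_outcome(table)
print(failing)  # [[2, 0], [2, 1], [2, 2], [4, 0]]
```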
ACRO crosstab with suppression
This is an example of a crosstab in which the cells that violate the disclosure tests are suppressed.
To enable this, set the suppress attribute of the acro object to True, then run the crosstab command.
If you wish to continue the research with suppression applied to later outputs, leave suppress set to True; otherwise, set it back to False, as in the cell below.
[8]:
acro.suppress = True
safe_table = acro.crosstab(df.recommend, df.parents)
print(safe_table)
INFO:acro:get_summary(): fail; threshold: 4 cells suppressed;
INFO:acro:outcome_df:
----------------------------------------------------|
parents |great_pret |pretentious |usual |
recommend | | | |
----------------------------------------------------|
not_recom | ok | ok | ok|
priority | ok | ok | ok|
recommend | threshold; | threshold; | threshold; |
spec_prior | ok | ok | ok|
very_recom | threshold; | ok | ok|
----------------------------------------------------|
INFO:acro:records:add(): output_1
parents great_pret pretentious usual
recommend
not_recom 1440.0 1440.0 1440.0
priority 858.0 1484.0 1924.0
recommend NaN NaN NaN
spec_prior 2022.0 1264.0 758.0
very_recom NaN 132.0 196.0
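Note that every count in the suppressed table is now printed as a float (1440.0 rather than 1440). That is a side effect of marking suppressed cells as NaN, which is a float; pandas upcasts an integer column to float when a missing value is placed in it. A standard-library sketch of the masking step, using the counts and failing positions shown above (the helper name is hypothetical):

```python
import math

# Counts and failing [row, column] positions as reported by ACRO above.
table = [[1440, 1440, 1440], [858, 1484, 1924], [0, 0, 2], [2022, 1264, 758], [0, 132, 196]]
failing = [[2, 0], [2, 1], [2, 2], [4, 0]]


def suppress(table, failing):
    """Return a copy of `table` with failing cells replaced by NaN.

    Every value is converted to float first so each row has a single type,
    mirroring the float columns in the suppressed table above.
    """
    out = [[float(v) for v in row] for row in table]
    for i, j in failing:
        out[i][j] = math.nan
    return out


print(suppress(table, failing)[4])  # [nan, 132.0, 196.0]
```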
[9]:
acro.suppress = False
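Switching suppression on and off by assigning acro.suppress, as in the two cells above, is easy to forget to undo. One option is a small context manager that restores the previous value when the block ends, even after an error. This is a hypothetical helper, not part of ACRO; it only uses getattr and setattr, so it should also work with the acro object (for example with temporarily_set(acro, "suppress", True): followed by the crosstab call), but here it is demonstrated on a stand-in object.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to `value` inside a with-block, then restore the old value."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the code inside the block raises.
        setattr(obj, name, old)


# Stand-in for the acro object; only its `suppress` attribute is used here.
fake_acro = SimpleNamespace(suppress=False)
with temporarily_set(fake_acro, "suppress", True):
    print(fake_acro.suppress)  # True
print(fake_acro.suppress)  # False
```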
ACRO functionality to let users manage their outputs
1: List current ACRO outputs
This is an example of using the print_outputs function to list all the outputs created so far.
[10]:
acro.print_outputs()
uid: output_0
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(
summary: fail; threshold: 4 cells may need suppressing;
outcome: parents great_pret pretentious usual
recommendation
not_recom ok ok ok
priority ok ok ok
recommend threshold; threshold; threshold;
spec_prior ok ok ok
very_recom threshold; ok ok
output: [parents great_pret pretentious usual
recommendation
not_recom 1440 1440 1440
priority 858 1484 1924
recommend 0 0 2
spec_prior 2022 1264 758
very_recom 0 132 196]
timestamp: 2025-03-06T19:38:29.296719
comments: []
exception:
uid: output_1
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(df.recommend, df.parents)
summary: fail; threshold: 4 cells suppressed;
outcome: parents great_pret pretentious usual
recommend
not_recom ok ok ok
priority ok ok ok
recommend threshold; threshold; threshold;
spec_prior ok ok ok
very_recom threshold; ok ok
output: [parents great_pret pretentious usual
recommend
not_recom 1440.0 1440.0 1440.0
priority 858.0 1484.0 1924.0
recommend NaN NaN NaN
spec_prior 2022.0 1264.0 758.0
very_recom NaN 132.0 196.0]
timestamp: 2025-03-06T19:38:29.315826
comments: []
exception:
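In the sdc field above, the cells entry appears to list zero-based [row, column] positions for each rule that failed; in this example they line up with the "threshold;" labels in the outcome table. A hypothetical helper that turns those positions back into row and column labels can make a long list easier to review. The dictionary below is a trimmed copy of the sdc record for output_1, and the labels are copied from the printed table.

```python
# Trimmed copy of the sdc record for output_1 printed above.
sdc = {
    "summary": {"suppressed": True, "threshold": 4},
    "cells": {"threshold": [[2, 0], [2, 1], [2, 2], [4, 0]], "p-ratio": [], "nk-rule": []},
}
row_labels = ["not_recom", "priority", "recommend", "spec_prior", "very_recom"]
col_labels = ["great_pret", "pretentious", "usual"]


def flagged_cells(sdc, row_labels, col_labels):
    """Translate [row, col] positions in sdc["cells"] into (rule, row label, column label)."""
    result = []
    for rule, positions in sdc["cells"].items():
        for i, j in positions:
            result.append((rule, row_labels[i], col_labels[j]))
    return result


for rule, row, col in flagged_cells(sdc, row_labels, col_labels):
    print(f"{rule}: {row} / {col}")
```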
2: Remove some ACRO outputs before finalising
The output name (its uid, e.g. output_0) can be taken from the list produced by the print_outputs function,
or by listing the results and choosing the specific output that needs to be removed.
[11]:
acro.remove_output("output_0")
INFO:acro:records:remove(): output_0 removed
3: Rename ACRO outputs before finalising
This is an example of renaming the outputs to provide a more descriptive name.
[12]:
acro.rename_output("output_1", "cross_tabulation")
INFO:acro:records:rename_output(): output_1 renamed to cross_tabulation
4: Add a comment to an output
[13]:
acro.add_comments("cross_tabulation", "Please let me have this data.")
INFO:acro:records:a comment was added to cross_tabulation
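Steps 2 to 4 all edit the set of outputs that will be handed to the output checkers. To picture what they do, here is a toy in-memory store keyed by output name. It is hypothetical and is not ACRO's implementation; in particular, never reusing a number after a removal is a design choice of this sketch, not documented ACRO behaviour.

```python
class ToyRecords:
    """Hypothetical store mimicking remove, rename and comment on named outputs."""

    def __init__(self):
        self.results = {}
        self._counter = 0  # keeps increasing, so names are never reused

    def add(self, output):
        name = f"output_{self._counter}"
        self._counter += 1
        self.results[name] = {"output": output, "comments": []}
        return name

    def remove(self, name):
        del self.results[name]  # raises KeyError for an unknown name

    def rename(self, old, new):
        if new in self.results:
            raise ValueError(f"{new} already exists")
        self.results[new] = self.results.pop(old)

    def add_comment(self, name, comment):
        self.results[name]["comments"].append(comment)


records = ToyRecords()
records.add("table 0")
records.add("table 1")
records.remove("output_0")
records.rename("output_1", "cross_tabulation")
records.add_comment("cross_tabulation", "Please let me have this data.")
print(records.results)
# {'cross_tabulation': {'output': 'table 1', 'comments': ['Please let me have this data.']}}
```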
5: (the big one) Finalise ACRO
This is an example of the finalise() function, which users must call at the end of each session.
It takes each output and saves it to a CSV file in the folder given as the first argument (here Examples).
It also saves the SDC analysis for each output to a JSON or Excel file, depending on the file format passed as the second argument (here "json").
For each output whose status is fail, you are asked to explain why an exception should be granted.
[14]:
output = acro.finalise("Examples", "json")
INFO:acro:records:
uid: cross_tabulation
status: fail
type: table
properties: {'method': 'crosstab'}
sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}
command: safe_table = acro.crosstab(df.recommend, df.parents)
summary: fail; threshold: 4 cells suppressed;
outcome: parents great_pret pretentious usual
recommend
not_recom ok ok ok
priority ok ok ok
recommend threshold; threshold; threshold;
spec_prior ok ok ok
very_recom threshold; ok ok
output: [parents great_pret pretentious usual
recommend
not_recom 1440.0 1440.0 1440.0
priority 858.0 1484.0 1924.0
recommend NaN NaN NaN
spec_prior 2022.0 1264.0 758.0
very_recom NaN 132.0 196.0]
timestamp: 2025-03-06T19:38:29.315826
comments: ['Please let me have this data.']
exception:
The status of the record above is: fail.
Please explain why an exception should be granted.
exception requested
INFO:acro:records:outputs written to: Examples
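The two kinds of file that finalise() produces can be pictured with the standard library alone: one CSV file per table plus a single JSON document holding the checks and comments. The sketch below is hypothetical; the function name, the results.json file name and the record layout are assumptions, and ACRO's real folder structure may differ.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_outputs(folder, outputs):
    """Write each table to <folder>/<name>.csv and all checks to <folder>/results.json.

    `outputs` maps a name to {"header": [...], "rows": [...], "sdc": {...}, "comments": [...]}.
    Hypothetical layout, not ACRO's actual file structure.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    summary = {}
    for name, record in outputs.items():
        with open(folder / f"{name}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(record["header"])
            writer.writerows(record["rows"])
        summary[name] = {"sdc": record["sdc"], "comments": record["comments"]}
    with open(folder / "results.json", "w", encoding="utf-8") as fh:
        json.dump(summary, fh, indent=2)
    return folder


if __name__ == "__main__":
    outputs = {
        "cross_tabulation": {
            "header": ["recommend", "great_pret", "pretentious", "usual"],
            "rows": [["not_recom", 1440.0, 1440.0, 1440.0], ["recommend", "", "", ""]],
            "sdc": {"summary": {"suppressed": True, "threshold": 4}},
            "comments": ["Please let me have this data."],
        }
    }
    with tempfile.TemporaryDirectory() as tmp:
        out = write_outputs(Path(tmp) / "Examples", outputs)
        print(sorted(p.name for p in out.iterdir()))  # ['cross_tabulation.csv', 'results.json']
```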