{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ACRO Demonstration\n",
    "This is a simple notebook to get you started with using the ```acro``` package to add disclosure risk control  to your analysis.\n",
    "\n",
    "### Assumptions\n",
    "For the purpose of this tutorial we assume some minimal prior experience with using python for data science.  \n",
    "In particular the use of the industry-standard Pandas package for:\n",
    "   -  storing and manipulating datasets\n",
    "   -  creating basic  tables, pivot_tables, and plots (e.g. histograms)\n",
    "\n",
    "This example is a Jupyter notebook split into cells.\n",
    "- Cells may contain code or text/images, and normally they are processed by stepping through them one-by-one.\n",
    "- To run (or render) a cell click the *run* icon (at the top of the page) or *shift-return* on your keyboard.\n",
    "  That will display any output created (for a code cell) and move the focus to the next cell."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A: The basic concepts\n",
    "### 1: A research _session_: \n",
    "by which we mean the activity of running a series of commands (interactively or via a script) that:\n",
    "-  ingest some data,\n",
    "-  manipulate it, and then\n",
    "-  produce (and store) some outputs.\n",
    "\n",
    "### 2: Types of commands: \n",
    "Whether interactive, or just running a final script, we can think of the commands that get run in a session as dividing into:\n",
    "- *manipulation* commands that load and transform data into the shape you want\n",
    "- *feedback* commands that report on your data - but are never intended to be exported.\n",
    "  For example, running ```head()``` or ```describe()``` commands to make sure your manipulations have got the data into the format you want.\n",
    "-  *query* commands that produce an output from your data (table/plot/regression model etc.) that you might want to export from the Trusted Research Environment (TRE)\n",
    "\n",
    "### 3: Risk Assessment vs decision making: \n",
    "SACRO stands for Semi-Automated Checking of Research Outputs. <br>\n",
    " The prefix 'Semi' is important here - because in a principles-based system humans should make _decisions_ about output requests. <br>\n",
    "To help with that we provide the SACRO-Viewer, which collates all the relevant information for them.\n",
    "\n",
    "A key part of that information is the  _Risk Assessment_. \n",
    "- Since it involves calculating metrics and comparing them to thresholds (the TRE's risk appetite) it can be done automatically, at the time an output query runs on the data.\n",
    "- This is what the ACRO tool does when you use it as part of your workflow.\n",
    "\n",
    "### 4: What ACRO does\n",
    "The ACRO package aims to support you in producing *Safe Outputs* with minimal changes to your work flow.\n",
    "To do that we provide:\n",
    "- drop-in replacements for the most commonly used *output commands*,\n",
    "  - keeping the same syntax as the originals, and\n",
    "  - supporting as many of the options as we can (features supported will increase over time in response to demand).\n",
    "- a set of *session-management* commands to help you manage the set of files you request for output.\n",
    "- **Important to note** that currently acro outputs results (tables, details of regression models etc.) as `.csv` files. <br>\n",
    "  - In other words we separate the processes of _creating_ outputs - which *must* be done *inside* the TRE.<br>\n",
    "    from the process of _formatting_ them for publication - which can be done outside the TRE with your preferred toolchain.\n",
    "  - ACRO handles creation. We are interested in hearing from researchers whether it is important to support them with formatting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## B: Getting Started with the demonstration\n",
    "\n",
    "### Step 1: Setting up the environment with the tools we will use\n",
    "We will begin by importing some standard data science packages, and also the acro  package itself."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from acro import ACRO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 2: Starting an ACRO session\n",
    "To do this we create an acro object by running the cell below. \n",
    "\n",
    "You can leave out the default parameters, but the cell below shows how you can:\n",
    "- provide the name of a *config* (risk appetite) file the TRE may have asked you to use\n",
    "- turn automatic suppression on or off right from the start of your session\n",
    "\n",
    "Note that when the cell runs it should report (in a different coloured font/background)\n",
    "- what version of acro is running: *this should be 0.4.12*\n",
    "- the TRE's risk appetite: that defines the rules your outputs will be checked against.\n",
    "- whether suppression is automatically applied to disclosive outputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:version: 0.4.12\n",
      "INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}\n",
      "INFO:acro:automatic suppression: False\n"
     ]
    }
   ],
   "source": [
    "acro = ACRO(config=\"default\", suppress=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3: Loading some test data\n",
    "\n",
    "The following cells in this step just contain standard *ingestion* and *manipulation* commands to load some data into a Pandas dataframe ready to be queried.<br>\n",
    "We will use some open-source data about nursery admissions.\n",
    "\n",
    "**There is no change to your workflow here** \n",
    "- Do whatever you want in this step!  \n",
    "- We just assume you end up with your data in a pandas dataframe. \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.io.arff import loadarff\n",
    "\n",
    "##--- Manipulation  commands ---\n",
    "# specify where the  data is\n",
    "path = os.path.join(\"../data\", \"nursery.arff\")\n",
    "\n",
    "# read it in using a common dataloader\n",
    "data = loadarff(path)\n",
    "\n",
    "\n",
    "# store in a pandas dataframe with some manipulation of type variable names\n",
    "df = pd.DataFrame(data[0])\n",
    "df = df.select_dtypes([object])\n",
    "df = df.stack().str.decode(\"utf-8\").unstack()\n",
    "df.rename(columns={\"class\": \"recommendation\"}, inplace=True)\n",
    "\n",
    "\n",
    "# make the children variable numeric\n",
    "# so we can report statistics like mean etc.\n",
    "\n",
    "df[\"children\"].replace(to_replace={\"more\": \"4\"}, inplace=True)\n",
    "df[\"children\"] = pd.to_numeric(df[\"children\"])\n",
    "\n",
    "df[\"children\"] = df.apply(\n",
    "    lambda row: (\n",
    "        row[\"children\"] if row[\"children\"] in (1, 2, 3) else np.random.randint(4, 10)\n",
    "    ),\n",
    "    axis=1,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>parents</th>\n",
       "      <th>has_nurs</th>\n",
       "      <th>form</th>\n",
       "      <th>children</th>\n",
       "      <th>housing</th>\n",
       "      <th>finance</th>\n",
       "      <th>social</th>\n",
       "      <th>health</th>\n",
       "      <th>recommendation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>usual</td>\n",
       "      <td>proper</td>\n",
       "      <td>complete</td>\n",
       "      <td>1</td>\n",
       "      <td>convenient</td>\n",
       "      <td>convenient</td>\n",
       "      <td>nonprob</td>\n",
       "      <td>recommended</td>\n",
       "      <td>recommend</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>usual</td>\n",
       "      <td>proper</td>\n",
       "      <td>complete</td>\n",
       "      <td>1</td>\n",
       "      <td>convenient</td>\n",
       "      <td>convenient</td>\n",
       "      <td>nonprob</td>\n",
       "      <td>priority</td>\n",
       "      <td>priority</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>usual</td>\n",
       "      <td>proper</td>\n",
       "      <td>complete</td>\n",
       "      <td>1</td>\n",
       "      <td>convenient</td>\n",
       "      <td>convenient</td>\n",
       "      <td>nonprob</td>\n",
       "      <td>not_recom</td>\n",
       "      <td>not_recom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>usual</td>\n",
       "      <td>proper</td>\n",
       "      <td>complete</td>\n",
       "      <td>1</td>\n",
       "      <td>convenient</td>\n",
       "      <td>convenient</td>\n",
       "      <td>slightly_prob</td>\n",
       "      <td>recommended</td>\n",
       "      <td>recommend</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>usual</td>\n",
       "      <td>proper</td>\n",
       "      <td>complete</td>\n",
       "      <td>1</td>\n",
       "      <td>convenient</td>\n",
       "      <td>convenient</td>\n",
       "      <td>slightly_prob</td>\n",
       "      <td>priority</td>\n",
       "      <td>priority</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  parents has_nurs      form  children     housing     finance         social  \\\n",
       "0   usual   proper  complete         1  convenient  convenient        nonprob   \n",
       "1   usual   proper  complete         1  convenient  convenient        nonprob   \n",
       "2   usual   proper  complete         1  convenient  convenient        nonprob   \n",
       "3   usual   proper  complete         1  convenient  convenient  slightly_prob   \n",
       "4   usual   proper  complete         1  convenient  convenient  slightly_prob   \n",
       "\n",
       "        health recommendation  \n",
       "0  recommended      recommend  \n",
       "1     priority       priority  \n",
       "2    not_recom      not_recom  \n",
       "3  recommended      recommend  \n",
       "4     priority       priority  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "##--- Feedback Command ----\n",
    "# show the first 5 rows to make sure everything is how we would expect\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## C: Producing tables that are 'Safe Outputs'\n",
    "\n",
    "The easiest way to make tables in python is to use the industry-standard pandas *crosstab()* function.  \n",
    "- There are hundreds (thousands?) of web sites showing how to do this.\n",
    "- You can make (hierarchical) 2-D tables (or 1-D if you add a 'dummy' variable containing the same value for each row)\n",
    "- you can specify what the table cells contain by:\n",
    "   - providing a statistic - for example: mean, count, std deviation, median etc.(pandas calls these *aggregation functions*)\n",
    "   - specifying what variable to report on\n",
    "\n",
    "The acro version uses all the pandas code - but it adds extra code that checks for disclosure risks depending on the statistic you ask for\n",
    "\n",
    "### Example 1: A simple 2-D table of frequencies stratified by two variables\n",
    "\n",
    "Note that having imported the pandas package with the shortname `pd`(most people do)  you would normally  write\n",
    "````\n",
    "pd.crosstab(...)\n",
    "````\n",
    "so the only change is to use the prefix `acro.` rather than `pd.`\n",
    "\n",
    "_NB_: the first two parameters to crosstab() are mandatory, so you could just do `crosstab(df.recommendation,df.parents)` to save typing.\n",
    "\n",
    "Now run the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing; \n",
      "INFO:acro:outcome_df:\n",
      "--------------------------------------------------------|\n",
      "parents        |great_pret   |pretentious  |usual       |\n",
      "recommendation |             |             |            |\n",
      "--------------------------------------------------------|\n",
      "not_recom      |          ok |          ok |          ok|\n",
      "priority       |          ok |          ok |          ok|\n",
      "recommend      | threshold;  | threshold;  | threshold; |\n",
      "spec_prior     |          ok |          ok |          ok|\n",
      "very_recom     | threshold;  |          ok |          ok|\n",
      "--------------------------------------------------------|\n",
      "\n",
      "INFO:acro:records:add(): output_0\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>parents</th>\n",
       "      <th>great_pret</th>\n",
       "      <th>pretentious</th>\n",
       "      <th>usual</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recommendation</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>not_recom</th>\n",
       "      <td>1440</td>\n",
       "      <td>1440</td>\n",
       "      <td>1440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>priority</th>\n",
       "      <td>858</td>\n",
       "      <td>1484</td>\n",
       "      <td>1924</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recommend</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>spec_prior</th>\n",
       "      <td>2022</td>\n",
       "      <td>1264</td>\n",
       "      <td>758</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>very_recom</th>\n",
       "      <td>0</td>\n",
       "      <td>132</td>\n",
       "      <td>196</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "parents         great_pret  pretentious  usual\n",
       "recommendation                                \n",
       "not_recom             1440         1440   1440\n",
       "priority               858         1484   1924\n",
       "recommend                0            0      2\n",
       "spec_prior            2022         1264    758\n",
       "very_recom               0          132    196"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acro.crosstab(index=df.recommendation, columns=df.parents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How to understand this output\n",
    "The top part (with a pink background) is the risk analysis produced by acro. \n",
    "It is telling us that:\n",
    "- the overall summary is _fail_ because 4 cells are failing the 'minimum threshold' check\n",
    "- then it is showing which cells failed so you can choose how to respond\n",
    "- finally it is telling us that is has saved the table and risk assessment to our acro session with id \"output_0\"\n",
    "\n",
    "The part below is the normal output produced by the pandas _crosstab()_ function. \n",
    "- As this is such a small table it is not hard to spot the four problematic cells with zero or low counts\n",
    "- but of course this might be harder for a bigger table.\n",
    "\n",
    "### How to respond to this input\n",
    "There are basically three choices:\n",
    "1. We might decide these low numbers reveal something where the public interest outweighs the disclosure risk.<br>\n",
    "Rather than being a strict rules-based system, acro lets you attach an 'exception request' to an output, to send a message to the output checkers.<br>\n",
    "For example, you could type: \n",
    "````\n",
    "acro.add_exception('output_0',\"I think you should let me have this because...\")\n",
    "````\n",
    "\n",
    "2. We redesign our data so that table so that none of the cells in the resulting table represent fewer than _n_ people (10 for the default risk appetite)<br>\n",
    "   For example, we could recode _'very_recommend'_ and _'priority'_ into one label.<br>\n",
    "   But maybe it is revealing that the _'recommend'_ label is not used?\n",
    "\n",
    "3. We can redact the disclosive cells - and **acro will do this for us**.<br>\n",
    "We simply enable the option to suppress disclosive cells and re-run the query.\n",
    "\n",
    "The cell below shows option 3.\n",
    "When you run the cell below you should see that:\n",
    "- the status now changes to `review` (so the output-checker knows what has been applied)\n",
    "- the code automatically adds an exception request saying that suppression has been applied\n",
    "- and, most importantly,  the cells are redacted. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:get_summary(): review; threshold: 4 cells suppressed; \n",
      "INFO:acro:outcome_df:\n",
      "--------------------------------------------------------|\n",
      "parents        |great_pret   |pretentious  |usual       |\n",
      "recommendation |             |             |            |\n",
      "--------------------------------------------------------|\n",
      "not_recom      |          ok |          ok |          ok|\n",
      "priority       |          ok |          ok |          ok|\n",
      "recommend      | threshold;  | threshold;  | threshold; |\n",
      "spec_prior     |          ok |          ok |          ok|\n",
      "very_recom     | threshold;  |          ok |          ok|\n",
      "--------------------------------------------------------|\n",
      "\n",
      "INFO:acro:records:add(): output_1\n",
      "INFO:acro:records:exception request was added to output_1\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>parents</th>\n",
       "      <th>great_pret</th>\n",
       "      <th>pretentious</th>\n",
       "      <th>usual</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recommendation</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>not_recom</th>\n",
       "      <td>1440.0</td>\n",
       "      <td>1440.0</td>\n",
       "      <td>1440.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>priority</th>\n",
       "      <td>858.0</td>\n",
       "      <td>1484.0</td>\n",
       "      <td>1924.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>recommend</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>spec_prior</th>\n",
       "      <td>2022.0</td>\n",
       "      <td>1264.0</td>\n",
       "      <td>758.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>very_recom</th>\n",
       "      <td>NaN</td>\n",
       "      <td>132.0</td>\n",
       "      <td>196.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "parents         great_pret  pretentious   usual\n",
       "recommendation                                 \n",
       "not_recom           1440.0       1440.0  1440.0\n",
       "priority             858.0       1484.0  1924.0\n",
       "recommend              NaN          NaN     NaN\n",
       "spec_prior          2022.0       1264.0   758.0\n",
       "very_recom             NaN        132.0   196.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acro.enable_suppression()\n",
    "acro.crosstab(index=df.recommendation, columns=df.parents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## An example of a  more complex table\n",
    "Just to show off the sort of tables that `crosstab()` can produce, let's make something more complex.<br>\n",
    "Going through the parameters in order:\n",
    "- passing a list of variable names to `index`  (rather than a single variable/column name) tells it we want a hierarchy within the rows.\n",
    "  - we can do the same to columns as well (or instead) if we want to   \n",
    "- setting `values=df.children`(the name of a column in the dataset) tells it we want to report something about the number of children for each sub-group (table cell)\n",
    "- setting `aggfunc=mean` tells it the statistic we want to report is the  mean number of children (which introduces additional risks of *dominance*)\n",
    "- setting `margins=True` tells it to display row and column sub-totals \n",
    "\n",
    "It's worth noting that including the totals there are  6 columns in the risk assessment and 5 in the suppressed table. <br>\n",
    "This is because after suppression has replaced numbers with `NaN`, pandas removes the fully suppressed column (_'recommend'_) from the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:get_summary(): review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed; \n",
      "INFO:acro:outcome_df:\n",
      "-----------------------------------------------------------------------------------------------------------------|\n",
      "|recommendation        | not_recom| priority recommend                      |spec_prior |very_recom          |All|\n",
      "|parents     finance   |          |                                         |           |                    |   |\n",
      "-----------------------------------------------------------------------------------------------------------------|\n",
      "|great_pret  convenient|  ok      |  ok                  p-ratio; nk-rule;  | ok        | p-ratio; nk-rule;  | ok|\n",
      "|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        | p-ratio; nk-rule;  | ok|\n",
      "|pretentious convenient|  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|\n",
      "|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|\n",
      "|usual       convenient|  ok      |  ok       threshold; p-ratio; nk-rule;  | ok        |                 ok | ok|\n",
      "|            inconv    |  ok      |  ok                  p-ratio; nk-rule;  | ok        |                 ok | ok|\n",
      "|All                   |  ok      |  ok       threshold; p-ratio; nk-rule;  | ok        |                 ok | ok|\n",
      "-----------------------------------------------------------------------------------------------------------------|\n",
      "\n",
      "INFO:acro:records:add(): output_2\n",
      "INFO:acro:records:exception request was added to output_2\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>recommendation</th>\n",
       "      <th>not_recom</th>\n",
       "      <th>priority</th>\n",
       "      <th>spec_prior</th>\n",
       "      <th>very_recom</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>parents</th>\n",
       "      <th>finance</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">great_pret</th>\n",
       "      <th>convenient</th>\n",
       "      <td>3.104167</td>\n",
       "      <td>2.789062</td>\n",
       "      <td>3.320043</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.122222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inconv</th>\n",
       "      <td>3.123611</td>\n",
       "      <td>2.401734</td>\n",
       "      <td>3.372943</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3.134259</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">pretentious</th>\n",
       "      <th>convenient</th>\n",
       "      <td>3.065278</td>\n",
       "      <td>3.058594</td>\n",
       "      <td>3.289384</td>\n",
       "      <td>2.590909</td>\n",
       "      <td>3.104167</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inconv</th>\n",
       "      <td>3.008333</td>\n",
       "      <td>2.997207</td>\n",
       "      <td>3.345588</td>\n",
       "      <td>1.363636</td>\n",
       "      <td>3.077315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">usual</th>\n",
       "      <th>convenient</th>\n",
       "      <td>3.134722</td>\n",
       "      <td>3.135892</td>\n",
       "      <td>3.325581</td>\n",
       "      <td>2.607692</td>\n",
       "      <td>3.133920</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inconv</th>\n",
       "      <td>3.102778</td>\n",
       "      <td>3.075000</td>\n",
       "      <td>3.362319</td>\n",
       "      <td>1.363636</td>\n",
       "      <td>3.087037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <th></th>\n",
       "      <td>3.089815</td>\n",
       "      <td>2.983826</td>\n",
       "      <td>3.339021</td>\n",
       "      <td>2.185976</td>\n",
       "      <td>3.109816</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "recommendation          not_recom  priority  spec_prior  very_recom       All\n",
       "parents     finance                                                          \n",
       "great_pret  convenient   3.104167  2.789062    3.320043         NaN  3.122222\n",
       "            inconv       3.123611  2.401734    3.372943         NaN  3.134259\n",
       "pretentious convenient   3.065278  3.058594    3.289384    2.590909  3.104167\n",
       "            inconv       3.008333  2.997207    3.345588    1.363636  3.077315\n",
       "usual       convenient   3.134722  3.135892    3.325581    2.607692  3.133920\n",
       "            inconv       3.102778  3.075000    3.362319    1.363636  3.087037\n",
       "All                      3.089815  2.983826    3.339021    2.185976  3.109816"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "acro.suppress = True\n",
    "acro.crosstab(\n",
    "    index=[df.parents, df.finance],\n",
    "    columns=df.recommendation,\n",
    "    values=df.children,\n",
    "    aggfunc=\"mean\",\n",
    "    margins=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## D: What other sorts of analysis does ACRO currently support?\n",
    "We are continually adding support for more types of analysis as users prioritise them.\n",
    "\n",
    "ACRO currently supports:\n",
    "- **Tables** via `acro.crosstab()` and `acro.pivot_table()`.\n",
    "   - supported aggregation functions are:  _mean_, _median_, _sum_, _std_, _count_, and _mode_.<br>\n",
    "- **Survival analysis** via: `acro.surv_function()`, `acro.survival_table()` and `acro.survival_plot()`<br>\n",
    "- **Histograms** via:`acro.hist()` <br>\n",
    "- **Regression**  via: `acro.ols()`, `acro.logit()`,`acro.probit()`\n",
    "    with options for specifying  formula in 'R-style' by adding the suffix 'r' e.g. `acro.olsr()` etc.\n",
    "\n",
    "You can get help on using any of these using the standard python `help()` syntax as shown in the next cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on method logit in module acro.acro_regression:\n",
      "\n",
      "logit(endog, exog, missing: 'str | None' = None, check_rank: 'bool' = True) -> 'BinaryResultsWrapper' method of acro.acro.ACRO instance\n",
      "    Fits Logit model.\n",
      "\n",
      "    Parameters\n",
      "    ----------\n",
      "    endog : array_like\n",
      "        A 1-d endogenous response variable. The dependent variable.\n",
      "    exog : array_like\n",
      "        A nobs x k array where nobs is the number of observations and k is\n",
      "        the number of regressors. An intercept is not included by default\n",
      "        and should be added by the user.\n",
      "    missing : str | None\n",
      "        Available options are ‘none’, ‘drop’, and ‘raise’. If ‘none’, no\n",
      "        nan checking is done. If ‘drop’, any observations with nans are\n",
      "        dropped. If ‘raise’, an error is raised. Default is ‘none’.\n",
      "    check_rank : bool\n",
      "        Check exog rank to determine model degrees of freedom. Default is\n",
      "        True. Setting to False reduces model initialization time when\n",
      "        exog.shape[1] is large.\n",
      "\n",
      "    Returns\n",
      "    -------\n",
      "    BinaryResultsWrapper\n",
      "        Results.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(acro.logit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## E: ACRO functionality to let users manage their outputs\n",
    "As explained above, you need to create an \"acro session\" whenever your code is run.\n",
    "\n",
    "After that, every time you run an acro `query' command both the output and the risk assessment are saved as part of the acro session.\n",
    "\n",
    "But we recognise that: \n",
    "- You may not want to request release of all your outputs - for example, the first table we produced above.\n",
    "- It is  good practice to provide a more informative name than just *output_n* for the .csv files that acro produces\n",
    "- It helps the output checker if you provide some comments saying what the outputs are.\n",
    "- You might want to add more things to the bundles of files you want to take out, such as:\n",
    "   - outputs from analyses that acro doesn't currently support\n",
    "   - your code itself (which many journals want)\n",
    "   - maybe a version of your paper in pdf/word format etc.\n",
    "\n",
    "Therefore acro provides the following commands for  'session management'\n",
    "### 1: Listing the  current contents of an  ACRO session\n",
    "This output is not beautiful (there's a GUI coming soon) but it should let you identify outputs you want to rename,comment on, or delete."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "uid: output_0\n",
      "status: fail\n",
      "type: table\n",
      "properties: {'method': 'crosstab'}\n",
      "sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n",
      "command: acro.crosstab(index=df.recommendation, columns=df.parents)\n",
      "summary: fail; threshold: 4 cells may need suppressing; \n",
      "outcome: parents          great_pret  pretentious        usual\n",
      "recommendation                                       \n",
      "not_recom                ok           ok           ok\n",
      "priority                 ok           ok           ok\n",
      "recommend       threshold;   threshold;   threshold; \n",
      "spec_prior               ok           ok           ok\n",
      "very_recom      threshold;            ok           ok\n",
      "output: [parents         great_pret  pretentious  usual\n",
      "recommendation                                \n",
      "not_recom             1440         1440   1440\n",
      "priority               858         1484   1924\n",
      "recommend                0            0      2\n",
      "spec_prior            2022         1264    758\n",
      "very_recom               0          132    196]\n",
      "timestamp: 2026-02-11T18:33:37.547019\n",
      "comments: []\n",
      "exception: \n",
      "\n",
      "uid: output_1\n",
      "status: review\n",
      "type: table\n",
      "properties: {'method': 'crosstab'}\n",
      "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n",
      "command: acro.crosstab(index=df.recommendation, columns=df.parents)\n",
      "summary: review; threshold: 4 cells suppressed; \n",
      "outcome: parents          great_pret  pretentious        usual\n",
      "recommendation                                       \n",
      "not_recom                ok           ok           ok\n",
      "priority                 ok           ok           ok\n",
      "recommend       threshold;   threshold;   threshold; \n",
      "spec_prior               ok           ok           ok\n",
      "very_recom      threshold;            ok           ok\n",
      "output: [parents         great_pret  pretentious   usual\n",
      "recommendation                                 \n",
      "not_recom           1440.0       1440.0  1440.0\n",
      "priority             858.0       1484.0  1924.0\n",
      "recommend              NaN          NaN     NaN\n",
      "spec_prior          2022.0       1264.0   758.0\n",
      "very_recom             NaN        132.0   196.0]\n",
      "timestamp: 2026-02-11T18:33:37.566599\n",
      "comments: []\n",
      "exception: Suppression automatically applied where needed\n",
      "\n",
      "uid: output_2\n",
      "status: review\n",
      "type: table\n",
      "properties: {'method': 'crosstab'}\n",
      "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 2, 'p-ratio': 9, 'nk-rule': 9, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[4, 2], [6, 2]], 'p-ratio': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'nk-rule': [[0, 2], [0, 4], [1, 2], [1, 4], [2, 2], [3, 2], [4, 2], [5, 2], [6, 2]], 'all-values-are-same': []}}\n",
      "command: acro.crosstab(\n",
      "summary: review; threshold: 2 cells suppressed; p-ratio: 9 cells suppressed; nk-rule: 9 cells suppressed; \n",
      "outcome: recommendation         not_recom priority                      recommend  \\\n",
      "parents     finance                                                        \n",
      "great_pret  convenient        ok       ok             p-ratio; nk-rule;    \n",
      "            inconv            ok       ok             p-ratio; nk-rule;    \n",
      "pretentious convenient        ok       ok             p-ratio; nk-rule;    \n",
      "            inconv            ok       ok             p-ratio; nk-rule;    \n",
      "usual       convenient        ok       ok  threshold; p-ratio; nk-rule;    \n",
      "            inconv            ok       ok             p-ratio; nk-rule;    \n",
      "All                           ok       ok  threshold; p-ratio; nk-rule;    \n",
      "\n",
      "recommendation         spec_prior          very_recom All  \n",
      "parents     finance                                        \n",
      "great_pret  convenient         ok  p-ratio; nk-rule;   ok  \n",
      "            inconv             ok  p-ratio; nk-rule;   ok  \n",
      "pretentious convenient         ok                  ok  ok  \n",
      "            inconv             ok                  ok  ok  \n",
      "usual       convenient         ok                  ok  ok  \n",
      "            inconv             ok                  ok  ok  \n",
      "All                            ok                  ok  ok  \n",
      "output: [recommendation          not_recom  priority  spec_prior  very_recom       All\n",
      "parents     finance                                                          \n",
      "great_pret  convenient   3.104167  2.789062    3.320043         NaN  3.122222\n",
      "            inconv       3.123611  2.401734    3.372943         NaN  3.134259\n",
      "pretentious convenient   3.065278  3.058594    3.289384    2.590909  3.104167\n",
      "            inconv       3.008333  2.997207    3.345588    1.363636  3.077315\n",
      "usual       convenient   3.134722  3.135892    3.325581    2.607692  3.133920\n",
      "            inconv       3.102778  3.075000    3.362319    1.363636  3.087037\n",
      "All                      3.089815  2.983826    3.339021    2.185976  3.109816]\n",
      "timestamp: 2026-02-11T18:33:37.754124\n",
      "comments: []\n",
      "exception: Suppression automatically applied where needed\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "_ = acro.print_outputs()"
   ]
  },
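  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `threshold` entry in each `sdc` record above lists the `[row, column]` positions of cells whose counts are too small to release.\n",
    "As a rough illustration of that idea (it is not ACRO's implementation), the sketch below applies a minimum-count rule to the counts shown for the first table.\n",
    "The threshold of 10 and the helper name `cells_below_threshold` are assumptions made for this example; the real value is set by your TRE.\n",
    "\n",
    "```python\n",
    "# Illustrative sketch only: a minimum-count check on a small table.\n",
    "# SAFE_THRESHOLD and cells_below_threshold are hypothetical names.\n",
    "SAFE_THRESHOLD = 10  # assumed value; the real one is set by the TRE\n",
    "\n",
    "# Counts copied from the first crosstab (rows: recommendation, columns: parents)\n",
    "counts = [\n",
    "    [1440, 1440, 1440],  # not_recom\n",
    "    [858, 1484, 1924],  # priority\n",
    "    [0, 0, 2],  # recommend\n",
    "    [2022, 1264, 758],  # spec_prior\n",
    "    [0, 132, 196],  # very_recom\n",
    "]\n",
    "\n",
    "\n",
    "def cells_below_threshold(table, threshold):\n",
    "    '''Return [row, col] positions whose count is below the threshold.'''\n",
    "    return [\n",
    "        [r, c]\n",
    "        for r, row in enumerate(table)\n",
    "        for c, value in enumerate(row)\n",
    "        if value < threshold\n",
    "    ]\n",
    "\n",
    "\n",
    "flagged = cells_below_threshold(counts, SAFE_THRESHOLD)\n",
    "print(flagged)\n",
    "```\n",
    "\n",
    "Under these assumptions the flagged positions are the same four cells listed under `threshold` for the first table."
   ]
  },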
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2: Remove some ACRO outputs before finalising \n",
    "At the start of this demo we made a disclosive output -it;s the first one with status _fail_.\n",
    "\n",
    "We don't want to waste the output checker's time so lets remove it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:records:remove(): output_0 removed\n"
     ]
    }
   ],
   "source": [
    "acro.remove_output(\"output_0\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3: Rename ACRO outputs before finalising\n",
    "This is an example of renaming the outputs to provide  more descriptive names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:records:rename_output(): output_1 renamed to  crosstab_recommendation_vs_parents\n",
      "INFO:acro:records:rename_output(): output_2 renamed to mean_children_by_parents_finance_recommendation\n"
     ]
    }
   ],
   "source": [
    "acro.rename_output(\"output_1\", \" crosstab_recommendation_vs_parents\")\n",
    "acro.rename_output(\"output_2\", \"mean_children_by_parents_finance_recommendation\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4: Add a comment to output\n",
    "This is an example of adding a comment to outputs.  \n",
    "It can be used to provide a description or to pass additional information to the TRE staff.<br>\n",
    "They will see it alongside your file in the output checking viewer - rather than having it in an email somewhere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:records:a comment was added to mean_children_by_parents_finance_recommendation\n"
     ]
    }
   ],
   "source": [
    "acro.add_comments(\n",
    "    \"mean_children_by_parents_finance_recommendation\",\n",
    "    \"too few cases of recommend to report\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Request an exception\n",
    "An example of providing a reason why an exception should be made\n",
    "````\n",
    "acro.add_exception(\"output_n\", \"This is evidence of systematic bias?\")\n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6: Adding a custom output.\n",
    "\n",
    "As mentioned above you might want to request release of all sorts of things\n",
    "- including your code,\n",
    "- or outputs from analyses *acro* doesn't support (yet)\n",
    "\n",
    "In ACRO we can add a file to our session with a comment describing what it is"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:records:add_custom(): output_3\n"
     ]
    }
   ],
   "source": [
    "acro.custom_output(\"acro_demo_2026.py\", \"This is the code that produced this session\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## F: Finishing your session and producing a folder of files to release.\n",
    "This is an example of the function _finalise()_ which the users must call at the end of each session.  \n",
    "- It takes each output and saves it to a CSV file (or the original file type for custom outputs)\n",
    "- It also saves the SDC analysis for each output to a json file.\n",
    "- It adds checksums for everything - so we know they've not been edited.\n",
    "- It puts them all in a folder with the name you supply.\n",
    "\n",
    "**ACRO will not overwrite previous sessions**\n",
    "  \n",
    "So every time you call finalise on a session you need to either:\n",
    "  - delete the previous folder, or\n",
    "  - provide a new folder name  "
   ]
  },
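  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because an existing folder is never overwritten, one option is to pick the next unused versioned name before finalising.\n",
    "The sketch below is one way that might be done with the standard library; `next_free_folder` and the `_vN` naming scheme are assumptions for illustration and are not part of acro.\n",
    "\n",
    "```python\n",
    "# Illustrative sketch only: choose a folder name that does not exist yet.\n",
    "# next_free_folder is a hypothetical helper, not part of acro.\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def next_free_folder(base, parent='.'):\n",
    "    '''Return the first name of the form base_vN that is not already on disk.'''\n",
    "    version = 1\n",
    "    while (Path(parent) / f'{base}_v{version}').exists():\n",
    "        version += 1\n",
    "    return f'{base}_v{version}'\n",
    "\n",
    "\n",
    "folder_name = next_free_folder('my_acro_outputs')\n",
    "print(folder_name)\n",
    "```\n",
    "\n",
    "The returned name could then be supplied as the folder name when you call `finalise()`."
   ]
  },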
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:acro:records:outputs written to: my_acro_outputs_v1\n"
     ]
    }
   ],
   "source": [
    "output = acro.finalise(\"my_acro_outputs_v1\")"
   ]
  },
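  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The checksums let anyone confirm that a file has not changed since the session was finalised.\n",
    "The sketch below shows the general idea using SHA-256 from the standard library.\n",
    "The hash algorithm, the helper names and the demo file are all assumptions for illustration; look inside your own output folder to see how ACRO actually records its checksums.\n",
    "\n",
    "```python\n",
    "# Illustrative sketch only: detect whether a file was edited after it was hashed.\n",
    "# sha256_of_file and is_unchanged are hypothetical helpers, not part of acro.\n",
    "import hashlib\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def sha256_of_file(path):\n",
    "    '''Return the SHA-256 hex digest of the bytes in a file.'''\n",
    "    return hashlib.sha256(Path(path).read_bytes()).hexdigest()\n",
    "\n",
    "\n",
    "def is_unchanged(path, recorded_digest):\n",
    "    '''True if the file still matches the digest recorded earlier.'''\n",
    "    return sha256_of_file(path) == recorded_digest\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as folder:\n",
    "    demo = Path(folder) / 'demo_output.csv'\n",
    "    demo.write_text('a,b')\n",
    "    recorded = sha256_of_file(demo)  # digest taken at release time\n",
    "    before_edit = is_unchanged(demo, recorded)\n",
    "    demo.write_text('a,c')  # simulate someone editing the file\n",
    "    after_edit = is_unchanged(demo, recorded)\n",
    "\n",
    "print(before_edit, after_edit)\n",
    "```\n",
    "\n",
    "Any change to the file contents, however small, produces a different digest, which is what makes the comparison useful to an output checker."
   ]
  },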
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## G: Reminder about getting help while you work\n",
    "\n",
    "- if you remember the name of the command and want an explanation or to explain the syntax <br>\n",
    "from the python prompt type: ` help(acro.command_name)`\n",
    "\n",
    "\n",
    "- if you can't remember the name of the command, from the python prompt type: `help(acro.ACRO)`\n",
    "  - not as user friendly but will list all the available commands "
   ]
  },
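  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the full `help()` listing is more than you need, a shorter list of command names can be built with Python's built-in `dir()`.\n",
    "The sketch below is a generic illustration: `public_methods` and the stand-in class `DemoSession` are hypothetical names, used so that the example runs without acro installed.\n",
    "\n",
    "```python\n",
    "# Illustrative sketch only: list the public methods of any Python object.\n",
    "# public_methods and DemoSession are hypothetical names used for this example.\n",
    "\n",
    "\n",
    "def public_methods(obj):\n",
    "    '''Return sorted names of callable attributes that do not start with an underscore.'''\n",
    "    return sorted(\n",
    "        name\n",
    "        for name in dir(obj)\n",
    "        if not name.startswith('_') and callable(getattr(obj, name))\n",
    "    )\n",
    "\n",
    "\n",
    "class DemoSession:\n",
    "    '''Stand-in object with two made-up commands.'''\n",
    "\n",
    "    def crosstab(self):\n",
    "        pass\n",
    "\n",
    "    def finalise(self):\n",
    "        pass\n",
    "\n",
    "\n",
    "print(public_methods(DemoSession()))\n",
    "```\n",
    "\n",
    "In a live session you could pass your own `acro` object in place of the stand-in."
   ]
  },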
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "sacro2025",
   "language": "python",
   "name": "sacro2025"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}