{ "cells": [ { "cell_type": "markdown", "id": "00cac1f9", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# ACRO Demonstration" ] }, { "cell_type": "code", "execution_count": 1, "id": "e33fd4fb", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import os\n", "\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "c01cfe12", "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# uncomment these lines if acro is not installed,\n", "# i.e. you are in development mode\n", "# import sys\n", "# sys.path.insert(0, os.path.abspath(\"..\"))" ] }, { "cell_type": "code", "execution_count": 3, "id": "cc8d993a", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "from acro import ACRO, add_constant, add_to_acro" ] }, { "cell_type": "markdown", "id": "530efcfe", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Instantiate ACRO" ] }, { "cell_type": "code", "execution_count": 4, "id": "4b8a77e2", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:version: 0.4.8\n", "INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}\n", "INFO:acro:automatic suppression: True\n" ] } ], "source": [ "acro = ACRO(suppress=True)" ] }, { "cell_type": "markdown", "id": "27a2baaa", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Load test data\n", "The dataset used in this notebook is the nursery dataset from OpenML. \n", "- The dataset can be downloaded directly from OpenML.\n", "- In this version, it is read from a copy that has already been downloaded to the local machine.\n", "- The code below reads the data from a folder called \"data\" which we assume is at the same level as the folder where you are working.\n", "- The path might need to be changed if the data has been downloaded and stored elsewhere.\n", " - For example, use `path = os.path.join(\"data\", \"nursery.arff\")` if the data is in a sub-folder of your working folder." ] }, { "cell_type": "code", "execution_count": 5, "id": "ac790b2b-b02f-49f7-8237-a033abed6e87", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " parents has_nurs form children housing finance social \\\n", "0 usual proper complete 1 convenient convenient nonprob \n", "1 usual proper complete 1 convenient convenient nonprob \n", "2 usual proper complete 1 convenient convenient nonprob \n", "3 usual proper complete 1 convenient convenient slightly_prob \n", "4 usual proper complete 1 convenient convenient slightly_prob \n", "\n", " health recommend \n", "0 recommended recommend \n", "1 priority priority \n", "2 not_recom not_recom \n", "3 recommended recommend \n", "4 priority priority " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy.io.arff import loadarff\n", "\n", "path = os.path.join(\"../data\", \"nursery.arff\")\n", "data = loadarff(path)\n", "df = pd.DataFrame(data[0])\n", "df = df.select_dtypes([object])\n", "df = df.stack().str.decode(\"utf-8\").unstack()\n", "df.rename(columns={\"class\": \"recommend\"}, inplace=True)\n", "df.head()" ] }, { "cell_type": "markdown", "id": "ea2a0d76-ba68-4a74-93c3-bdaa88c48929", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Convert 'more than 3' children to a random integer between 4 and 9\n", "Change the children column from categorical to numeric so that we can test some of the ACRO functions that require a numeric feature." ] }, { "cell_type": "code", "execution_count": 6, "id": "b43810a8-4da9-4cec-a613-e2562ed95601", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " children column entries in raw file ['1' '2' '3' 'more']\n" ] } ], "source": [ "print(f\" children column entries in raw file {df.children.unique()}\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "042f8e9d-b33b-4daf-851e-9f70b5d4859a", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# assign the result back instead of calling replace(..., inplace=True) on a column:\n", "# chained in-place updates do not modify df under pandas copy-on-write\n", "df[\"children\"] = df[\"children\"].replace(to_replace={\"more\": \"4\"})\n", "df[\"children\"] = pd.to_numeric(df[\"children\"])\n", "\n", "# np.random.randint excludes its upper bound, so values 4 to 9 are drawn\n", "df[\"children\"] = df.apply(\n", " lambda row: (\n", " row[\"children\"] if row[\"children\"] in (1, 2, 3) else np.random.randint(4, 10)\n", " ),\n", " axis=1,\n", ")" ] }, { "cell_type": "markdown", "id": "d098c704", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Examples of producing tabular output\n", "We rely on the industry-standard package **pandas** for tabulating data. \n", "In the next few examples we show:\n", "- first, how a researcher would normally make a call in pandas, saving the results in a variable that they can view on screen (or save to file)\n", "- then how the call is identical in ACRO, except that:\n", " - \"pd\" is replaced by \"acro\"\n", " - the researcher immediately sees a copy of what the TRE output checker will see.\n", " " ] }, { "cell_type": "markdown", "id": "4ae844a0", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Pandas crosstab\n", "This is an example of a crosstab using pandas. \n", "We first make the call, then the second line prints the output to the screen." ] }, { "cell_type": "code", "execution_count": 8, "id": "961684cb", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440 1440 1440\n", "priority 858 1484 1924\n", "recommend 0 0 2\n", "spec_prior 2022 1264 758\n", "very_recom 0 132 196\n" ] } ], "source": [ "table = pd.crosstab(df.recommend, df.parents)\n", "print(table)" ] }, { "cell_type": "markdown", "id": "d642ed00", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO crosstab\n", "This is an example of a crosstab using ACRO. \n", "The INFO lines show the researcher what will be reported to the output checkers.\n", "Then the (suppressed as necessary) table is shown via the print command as before."
] }, { "cell_type": "code", "execution_count": 9, "id": "bb4b2677", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 4 cells suppressed; \n", "INFO:acro:outcome_df:\n", "----------------------------------------------------|\n", "parents |great_pret |pretentious |usual |\n", "recommend | | | |\n", "----------------------------------------------------|\n", "not_recom | ok | ok | ok|\n", "priority | ok | ok | ok|\n", "recommend | threshold; | threshold; | threshold; |\n", "spec_prior | ok | ok | ok|\n", "very_recom | threshold; | ok | ok|\n", "----------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_0\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440.0 1440.0 1440.0\n", "priority 858.0 1484.0 1924.0\n", "recommend NaN NaN NaN\n", "spec_prior 2022.0 1264.0 758.0\n", "very_recom NaN 132.0 196.0\n" ] } ], "source": [ "safe_table = acro.crosstab(\n", " df.recommend,\n", " df.parents,\n", ")\n", "print(safe_table)" ] }, { "cell_type": "markdown", "id": "b3d09450", "metadata": {}, "source": [ "### ACRO crosstab with totals\n", "This is an example of a crosstab with totals (margins) and suppression. \n", "Note that when `margins=True`, any row or column in which all the cells are disclosive is deleted. If you wish to see such rows or columns, set `show_suppressed=True`. \n", "The `show_suppressed` parameter does not work with hierarchical tables or when the aggregation function is the standard deviation."
] }, { "cell_type": "code", "execution_count": 10, "id": "42825f24", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 5 cells suppressed; \n", "INFO:acro:outcome_df:\n", "------------------------------------------------------------------|\n", "parents |great_pret |pretentious |usual |All |\n", "recommend | | | | |\n", "------------------------------------------------------------------|\n", "not_recom | ok | ok | ok | ok|\n", "priority | ok | ok | ok | ok|\n", "recommend | threshold; | threshold; | threshold; | threshold; |\n", "spec_prior | ok | ok | ok | ok|\n", "very_recom | threshold; | ok | ok | ok|\n", "All | ok | ok | ok | ok|\n", "------------------------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "parents great_pret pretentious usual All\n", "recommend \n", "not_recom 1440.0 1440 1440 4320\n", "priority 858.0 1484 1924 4266\n", "spec_prior 2022.0 1264 758 4044\n", "very_recom NaN 132 196 328\n", "All 4320.0 4320 4318 12958\n" ] } ], "source": [ "safe_table = acro.crosstab(df.recommend, df.parents, margins=True)\n", "print(safe_table)" ] }, { "cell_type": "markdown", "id": "b3e48dec", "metadata": {}, "source": [ "### ACRO crosstab without suppression\n", "This is an example of a crosstab without suppressing the cells that violate the disclosure tests. \n", "To do this, set the `suppress` attribute of the acro object to `False`, then run the crosstab command. \n", "If you wish to continue the research with the outputs suppressed, set `suppress` back to `True` afterwards; otherwise leave it as it is."
] }, { "cell_type": "code", "execution_count": 11, "id": "27329e57", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 4 cells may need suppressing; \n", "INFO:acro:outcome_df:\n", "----------------------------------------------------|\n", "parents |great_pret |pretentious |usual |\n", "recommend | | | |\n", "----------------------------------------------------|\n", "not_recom | ok | ok | ok|\n", "priority | ok | ok | ok|\n", "recommend | threshold; | threshold; | threshold; |\n", "spec_prior | ok | ok | ok|\n", "very_recom | threshold; | ok | ok|\n", "----------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_2\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440 1440 1440\n", "priority 858 1484 1924\n", "recommend 0 0 2\n", "spec_prior 2022 1264 758\n", "very_recom 0 132 196\n" ] } ], "source": [ "acro.suppress = False\n", "\n", "safe_table = acro.crosstab(df.recommend, df.parents)\n", "print(safe_table)" ] }, { "cell_type": "code", "execution_count": 12, "id": "ed5134ab", "metadata": {}, "outputs": [], "source": [ "acro.suppress = True" ] }, { "cell_type": "markdown", "id": "8b603548", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO crosstab with aggregation function" ] }, { "cell_type": "code", "execution_count": 13, "id": "298d2b40", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 1 cells suppressed; p-ratio: 4 cells suppressed; nk-rule: 4 cells suppressed; \n", "INFO:acro:outcome_df:\n", "------------------------------------------------------------------------------------|\n", "parents |great_pret |pretentious |usual |\n", "recommend | | | |\n", 
"------------------------------------------------------------------------------------|\n", "not_recom | ok | ok | ok|\n", "priority | ok | ok | ok|\n", "recommend | p-ratio; nk-rule; | p-ratio; nk-rule; | threshold; p-ratio; nk-rule; |\n", "spec_prior | ok | ok | ok|\n", "very_recom | p-ratio; nk-rule; | ok | ok|\n", "------------------------------------------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_3\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440.0 1440.0 1440.0\n", "priority 858.0 1484.0 1924.0\n", "recommend NaN NaN NaN\n", "spec_prior 2022.0 1264.0 758.0\n", "very_recom NaN 132.0 196.0\n" ] } ], "source": [ "safe_table = acro.crosstab(\n", " df.recommend, df.parents, values=df.children, aggfunc=\"count\"\n", ")\n", "print(safe_table)" ] }, { "cell_type": "code", "execution_count": 14, "id": "b4aea046", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 2 cells suppressed; p-ratio: 8 cells suppressed; nk-rule: 8 cells suppressed; \n", "INFO:acro:outcome_df:\n", "---------------------------------------------------------------------------------------------------------------------------------------------------------|\n", " mode_aggfunc |mean |\n", "parents great_pret pretentious usual |great_pret pretentious usual |\n", "recommend | |\n", "---------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "not_recom ok ok ok | ok ok ok|\n", "priority ok ok ok | ok ok ok|\n", "recommend p-ratio; nk-rule; p-ratio; nk-rule; threshold; p-ratio; nk-rule; | p-ratio; nk-rule; p-ratio; nk-rule; threshold; p-ratio; nk-rule; |\n", "spec_prior ok ok ok | ok ok ok|\n", "very_recom p-ratio; nk-rule; ok ok | p-ratio; nk-rule; ok ok|\n", 
"---------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_4\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " mode_aggfunc mean \n", "parents great_pret pretentious usual great_pret pretentious usual\n", "recommend \n", "not_recom 2.0 1.0 1.0 3.125694 3.105556 3.074306\n", "priority 1.0 1.0 1.0 2.665501 3.030323 3.116944\n", "recommend NaN NaN NaN NaN NaN NaN\n", "spec_prior 3.0 3.0 3.0 3.353610 3.370253 3.393140\n", "very_recom NaN 1.0 1.0 NaN 2.204545 2.244898\n" ] } ], "source": [ "safe_table = acro.crosstab(\n", " df.recommend, df.parents, values=df.children, aggfunc=[\"mode\", \"mean\"]\n", ")\n", "print(safe_table)" ] }, { "cell_type": "markdown", "id": "d66e565b", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO pivot_table\n", "This is an example of pivot table using ACRO. \n", "- Some researchers may prefer this to using crosstab. 
\n", "- Again, the call syntax is identical to the pandas `pd.pivot_table`.\n", "- In this case the output is non-disclosive." ] }, { "cell_type": "code", "execution_count": 15, "id": "966c1a9b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): pass\n", "INFO:acro:outcome_df:\n", "------------------------------|\n", " mean |std |\n", " children |children|\n", "parents | |\n", "------------------------------|\n", "great_pret ok | ok |\n", "pretentious ok | ok |\n", "usual ok | ok |\n", "------------------------------|\n", "\n", "INFO:acro:records:add(): output_5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " mean std\n", " children children\n", "parents \n", "great_pret 3.140972 2.270396\n", "pretentious 3.129630 2.250436\n", "usual 3.110648 2.213072\n" ] } ], "source": [ "table = acro.pivot_table(\n", " df, index=[\"parents\"], values=[\"children\"], aggfunc=[\"mean\", \"std\"]\n", ")\n", "print(table)" ] }, { "cell_type": "markdown", "id": "4adc374b", "metadata": {}, "source": [ "### ACRO pivot_table with margins" ] }, { "cell_type": "code", "execution_count": 16, "id": "9c024b65", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 5 cells suppressed; p-ratio: 5 cells suppressed; nk-rule: 5 cells suppressed; \n", "INFO:acro:outcome_df:\n", "-----------------------------------------------------------------------------------------------------------|\n", " children |\n", "recommend not_recom priority recommend spec_prior very_recom All|\n", "parents |\n", "-----------------------------------------------------------------------------------------------------------|\n", "great_pret ok ok threshold; p-ratio; nk-rule; ok threshold; p-ratio; nk-rule; ok|\n", "pretentious ok ok threshold; p-ratio; nk-rule; ok ok ok|\n", "usual ok ok threshold; p-ratio; nk-rule; ok ok ok|\n", "All 
ok ok threshold; p-ratio; nk-rule; ok ok ok|\n", "-----------------------------------------------------------------------------------------------------------|\n", "\n", "INFO:acro:Disclosive cells were deleted from the dataframe before calculating the pivot table\n", "INFO:acro:records:add(): output_6\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " children \n", "recommend not_recom priority spec_prior very_recom All\n", "parents \n", "great_pret 3.125694 2.665501 3.353610 NaN 3.140972\n", "pretentious 3.105556 3.030323 3.370253 2.204545 3.129630\n", "usual 3.074306 3.116944 3.393140 2.244898 3.111626\n", "All 3.101852 2.996015 3.366222 2.228659 3.127412\n" ] } ], "source": [ "safe_table = acro.pivot_table(\n", " df, columns=[\"recommend\"], index=[\"parents\"], values=[\"children\"], margins=True\n", ")\n", "print(safe_table)" ] }, { "cell_type": "markdown", "id": "8446fa99-c073-48b8-875e-700dfa17ea0c", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Regression examples using ACRO\n", "\n", "Again there is an industry-standard package in Python, this time called **statsmodels**.\n", "- The examples below illustrate the use of the ACRO wrappers for standard statsmodels functions.\n", "- Note that statsmodels models can also be specified with an 'R-like' formula; in ACRO these versions are called by adding an 'r' suffix to the command name (e.g. `olsr`).\n", "- Most statsmodels functions return a \"results object\" which has a \"summary\" function that produces printable/saveable outputs.\n", "\n", "### Start by manipulating the nursery data to get two numeric variables\n", "- The 'recommend' column is converted to an integer scale." ] }, { "cell_type": "code", "execution_count": 17, "id": "72aefb22", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# assign the result back instead of calling replace(..., inplace=True) on a column\n", "df[\"recommend\"] = df[\"recommend\"].replace(\n", " to_replace={\n", " \"not_recom\": \"0\",\n", " \"recommend\": \"1\",\n", " \"very_recom\": \"2\",\n", " \"priority\": \"3\",\n", " \"spec_prior\": \"4\",\n", " },\n", ")\n", "df[\"recommend\"] = pd.to_numeric(df[\"recommend\"])\n", "\n", "new_df = df[[\"recommend\", \"children\"]]\n", "new_df = new_df.dropna()" ] }, { "cell_type": "markdown", "id": "3ef880e6-726f-4a0b-9bcd-da8861dbd5a7", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO OLS\n", "This is an example of ordinary least squares regression using ACRO.\n", "- Above, the recommend column was converted from categorical to numeric.\n", "- Now we regress recommend on children.\n", "- This version includes a constant (intercept).\n", "- This is just to show how the regression is done using ACRO.\n", "- **No correlation is expected to be seen by using these variables**" ] }, { "cell_type": "code", "execution_count": 18, "id": "2f462e42", "metadata": { "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:ols() outcome: pass; dof=12958.0 >= 10\n", "INFO:acro:records:add(): output_7\n" ] }, { "data": { "text/html": [ "
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & recommend & \\textbf{ R-squared: } & 0.001 \\\\\n", "\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.001 \\\\\n", "\\textbf{Method:} & Least Squares & \\textbf{ F-statistic: } & 13.83 \\\\\n", "\\textbf{Date:} & Thu, 06 Mar 2025 & \\textbf{ Prob (F-statistic):} & 0.000201 \\\\\n", "\\textbf{Time:} & 19:39:47 & \\textbf{ Log-Likelihood: } & -25121. \\\\\n", "\\textbf{No. Observations:} & 12960 & \\textbf{ AIC: } & 5.025e+04 \\\\\n", "\\textbf{Df Residuals:} & 12958 & \\textbf{ BIC: } & 5.026e+04 \\\\\n", "\\textbf{Df Model:} & 1 & \\textbf{ } & \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ } & \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{const} & 2.2099 & 0.025 & 87.263 & 0.000 & 2.160 & 2.260 \\\\\n", "\\textbf{children} & 0.0245 & 0.007 & 3.718 & 0.000 & 0.012 & 0.037 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lclc}\n", "\\textbf{Omnibus:} & 77090.215 & \\textbf{ Durbin-Watson: } & 2.883 \\\\\n", "\\textbf{Prob(Omnibus):} & 0.000 & \\textbf{ Jarque-Bera (JB): } & 1741.570 \\\\\n", "\\textbf{Skew:} & -0.486 & \\textbf{ Prob(JB): } & 0.00 \\\\\n", "\\textbf{Kurtosis:} & 1.489 & \\textbf{ Cond. No. } & 6.90 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{OLS Regression Results}\n", "\\end{center}\n", "\n", "Notes: \\newline\n", " [1] Standard Errors assume that the covariance matrix of the errors is correctly specified." ], "text/plain": [ "\n", "\"\"\"\n", " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: recommend R-squared: 0.001\n", "Model: OLS Adj. 
R-squared: 0.001\n", "Method: Least Squares F-statistic: 13.83\n", "Date: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\n", "Time: 19:39:47 Log-Likelihood: -25121.\n", "No. Observations: 12960 AIC: 5.025e+04\n", "Df Residuals: 12958 BIC: 5.026e+04\n", "Df Model: 1 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const 2.2099 0.025 87.263 0.000 2.160 2.260\n", "children 0.0245 0.007 3.718 0.000 0.012 0.037\n", "==============================================================================\n", "Omnibus: 77090.215 Durbin-Watson: 2.883\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 1741.570\n", "Skew: -0.486 Prob(JB): 0.00\n", "Kurtosis: 1.489 Cond. No. 6.90\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", "\"\"\"" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = new_df[\"recommend\"]\n", "x = new_df[\"children\"]\n", "x = add_constant(x)\n", "\n", "results = acro.ols(y, x)\n", "results.summary()" ] }, { "cell_type": "markdown", "id": "0c826271", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO OLSR\n", "This is an example of ordinary least squares regression using the 'R-like' statsmodels api, i.e. 
from a formula and dataframe using ACRO." ] }, { "cell_type": "code", "execution_count": 19, "id": "cc90f7c9", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:olsr() outcome: pass; dof=12958.0 >= 10\n", "INFO:acro:records:add(): output_8\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " OLS Regression Results \n", "==============================================================================\n", "Dep. Variable: recommend R-squared: 0.001\n", "Model: OLS Adj. R-squared: 0.001\n", "Method: Least Squares F-statistic: 13.83\n", "Date: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\n", "Time: 19:39:47 Log-Likelihood: -25121.\n", "No. Observations: 12960 AIC: 5.025e+04\n", "Df Residuals: 12958 BIC: 5.026e+04\n", "Df Model: 1 \n", "Covariance Type: nonrobust \n", "==============================================================================\n", " coef std err t P>|t| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "Intercept 2.2099 0.025 87.263 0.000 2.160 2.260\n", "children 0.0245 0.007 3.718 0.000 0.012 0.037\n", "==============================================================================\n", "Omnibus: 77090.215 Durbin-Watson: 2.883\n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 1741.570\n", "Skew: -0.486 Prob(JB): 0.00\n", "Kurtosis: 1.489 Cond. No. 6.90\n", "==============================================================================\n", "\n", "Notes:\n", "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n" ] } ], "source": [ "results = acro.olsr(formula=\"recommend ~ children\", data=new_df)\n", "print(results.summary())" ] }, { "cell_type": "markdown", "id": "2816eac7", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO Probit\n", "This is an example of probit regression using ACRO.\n", "We use a different combination of variables from the original dataset.\n", "\n", "Again, we support the use of R-like formulas, because ACRO also supports R." ] }, { "cell_type": "code", "execution_count": 20, "id": "5b1a1611", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:probit() outcome: pass; dof=12958.0 >= 10\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.693145\n", " Iterations 2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:add(): output_9\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Probit Regression Results \n", "==============================================================================\n", "Dep. Variable: finance No. Observations: 12960\n", "Model: Probit Df Residuals: 12958\n", "Method: MLE Df Model: 1\n", "Date: Thu, 06 Mar 2025 Pseudo R-squ.: 3.602e-06\n", "Time: 19:39:47 Log-Likelihood: -8983.2\n", "converged: True LL-Null: -8983.2\n", "Covariance Type: nonrobust LLR p-value: 0.7992\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -0.0039 0.019 -0.207 0.836 -0.041 0.033\n", "children 0.0012 0.005 0.254 0.799 -0.008 0.011\n", "==============================================================================\n" ] } ], "source": [ "new_df = df[[\"finance\", \"children\"]]\n", "new_df = new_df.dropna()\n", "\n", "y = new_df[\"finance\"].astype(\"category\").cat.codes # numeric\n", "y.name = \"finance\"\n", "x = new_df[\"children\"]\n", "x = add_constant(x)\n", "\n", "results = acro.probit(y, x)\n", "print(results.summary())" ] }, { "cell_type": "markdown", "id": "f38b4334", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ACRO Logit\n", "This is an example of logistic regression using ACRO's wrapper for the statsmodels logit function." ] }, { "cell_type": "code", "execution_count": 21, "id": "dcf30f8f", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:logit() outcome: pass; dof=12958.0 >= 10\n", "INFO:acro:records:add(): output_10\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.693145\n", " Iterations 3\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: finance No. Observations: 12960
Model: Logit Df Residuals: 12958
Method: MLE Df Model: 1
Date: Thu, 06 Mar 2025 Pseudo R-squ.: 3.602e-06
Time: 19:39:47 Log-Likelihood: -8983.2
converged: True LL-Null: -8983.2
Covariance Type: nonrobust LLR p-value: 0.7992
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [0.025 0.975]
const -0.0062 0.030 -0.207 0.836 -0.065 0.053
children 0.0020 0.008 0.254 0.799 -0.013 0.017
" ], "text/latex": [ "\\begin{center}\n", "\\begin{tabular}{lclc}\n", "\\toprule\n", "\\textbf{Dep. Variable:} & finance & \\textbf{ No. Observations: } & 12960 \\\\\n", "\\textbf{Model:} & Logit & \\textbf{ Df Residuals: } & 12958 \\\\\n", "\\textbf{Method:} & MLE & \\textbf{ Df Model: } & 1 \\\\\n", "\\textbf{Date:} & Thu, 06 Mar 2025 & \\textbf{ Pseudo R-squ.: } & 3.602e-06 \\\\\n", "\\textbf{Time:} & 19:39:47 & \\textbf{ Log-Likelihood: } & -8983.2 \\\\\n", "\\textbf{converged:} & True & \\textbf{ LL-Null: } & -8983.2 \\\\\n", "\\textbf{Covariance Type:} & nonrobust & \\textbf{ LLR p-value: } & 0.7992 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "\\begin{tabular}{lcccccc}\n", " & \\textbf{coef} & \\textbf{std err} & \\textbf{z} & \\textbf{P$> |$z$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n", "\\midrule\n", "\\textbf{const} & -0.0062 & 0.030 & -0.207 & 0.836 & -0.065 & 0.053 \\\\\n", "\\textbf{children} & 0.0020 & 0.008 & 0.254 & 0.799 & -0.013 & 0.017 \\\\\n", "\\bottomrule\n", "\\end{tabular}\n", "%\\caption{Logit Regression Results}\n", "\\end{center}" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: finance No. 
Observations: 12960\n", "Model: Logit Df Residuals: 12958\n", "Method: MLE Df Model: 1\n", "Date: Thu, 06 Mar 2025 Pseudo R-squ.: 3.602e-06\n", "Time: 19:39:47 Log-Likelihood: -8983.2\n", "converged: True LL-Null: -8983.2\n", "Covariance Type: nonrobust LLR p-value: 0.7992\n", "==============================================================================\n", " coef std err z P>|z| [0.025 0.975]\n", "------------------------------------------------------------------------------\n", "const -0.0062 0.030 -0.207 0.836 -0.065 0.053\n", "children 0.0020 0.008 0.254 0.799 -0.013 0.017\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = acro.logit(y, x)\n", "results.summary()" ] }, { "cell_type": "markdown", "id": "a962ff17", "metadata": {}, "source": [ "### ACRO survival analysis\n", "This is an example of survival tables and plots using ACRO.\n", "- The `flchain` dataset from the R `survival` package, loaded through statsmodels' `get_rdataset`, is used for the survival analysis.\n", "- A subset of the dataset (the first 20 female records) is used in this example to demonstrate the survival analysis.\n", "- The `output` parameter of `surv_func` defines the type of output (table or plot)."
] }, { "cell_type": "code", "execution_count": 22, "id": "dcf2c86d", "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "\n", "data = sm.datasets.get_rdataset(\"flchain\", \"survival\").data\n", "data = data.loc[data.sex == \"F\", :]\n", "data = data.iloc[:20, :]\n", "# data.head()" ] }, { "cell_type": "code", "execution_count": 23, "id": "df1cbb9e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 76 cells suppressed; \n", "INFO:acro:outcome_df:\n", "-----------------------------------------------------------|\n", " Surv_prob |Surv_prob_SE |num_at_risk |num_events |\n", "Time | | | |\n", "-----------------------------------------------------------|\n", "51 ok | ok | ok | ok|\n", "69 threshold; | threshold; | threshold; | threshold; |\n", "85 threshold; | threshold; | threshold; | threshold; |\n", "91 threshold; | threshold; | threshold; | threshold; |\n", "115 threshold; | threshold; | threshold; | threshold; |\n", "372 threshold; | threshold; | threshold; | threshold; |\n", "667 threshold; | threshold; | threshold; | threshold; |\n", "874 threshold; | threshold; | threshold; | threshold; |\n", "1039 threshold; | threshold; | threshold; | threshold; |\n", "1046 threshold; | threshold; | threshold; | threshold; |\n", "1281 threshold; | threshold; | threshold; | threshold; |\n", "1286 threshold; | threshold; | threshold; | threshold; |\n", "1326 threshold; | threshold; | threshold; | threshold; |\n", "1355 threshold; | threshold; | threshold; | threshold; |\n", "1626 threshold; | threshold; | threshold; | threshold; |\n", "1903 threshold; | threshold; | threshold; | threshold; |\n", "1914 threshold; | threshold; | threshold; | threshold; |\n", "2776 threshold; | threshold; | threshold; | threshold; |\n", "2851 threshold; | threshold; | threshold; | threshold; |\n", "3309 threshold; | threshold; | threshold; | threshold; |\n", 
"-----------------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_11\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Surv prob Surv prob SE num at risk num events\n", "Time \n", "51 0.95 0.048734 20.0 1.0\n", "69 NaN NaN NaN NaN\n", "85 NaN NaN NaN NaN\n", "91 NaN NaN NaN NaN\n", "115 NaN NaN NaN NaN\n", "372 NaN NaN NaN NaN\n", "667 NaN NaN NaN NaN\n", "874 NaN NaN NaN NaN\n", "1039 NaN NaN NaN NaN\n", "1046 NaN NaN NaN NaN\n", "1281 NaN NaN NaN NaN\n", "1286 NaN NaN NaN NaN\n", "1326 NaN NaN NaN NaN\n", "1355 NaN NaN NaN NaN\n", "1626 NaN NaN NaN NaN\n", "1903 NaN NaN NaN NaN\n", "1914 NaN NaN NaN NaN\n", "2776 NaN NaN NaN NaN\n", "2851 NaN NaN NaN NaN\n", "3309 NaN NaN NaN NaN\n" ] } ], "source": [ "safe_table = acro.surv_func(data.futime, data.death, output=\"table\")\n", "print(safe_table)" ] }, { "cell_type": "code", "execution_count": 24, "id": "8762ab29", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:get_summary(): fail; threshold: 76 cells suppressed; \n", "INFO:acro:outcome_df:\n", "-----------------------------------------------------------|\n", " Surv_prob |Surv_prob_SE |num_at_risk |num_events |\n", "Time | | | |\n", "-----------------------------------------------------------|\n", "51 ok | ok | ok | ok|\n", "69 threshold; | threshold; | threshold; | threshold; |\n", "85 threshold; | threshold; | threshold; | threshold; |\n", "91 threshold; | threshold; | threshold; | threshold; |\n", "115 threshold; | threshold; | threshold; | threshold; |\n", "372 threshold; | threshold; | threshold; | threshold; |\n", "667 threshold; | threshold; | threshold; | threshold; |\n", "874 threshold; | threshold; | threshold; | threshold; |\n", "1039 threshold; | threshold; | threshold; | threshold; |\n", "1046 threshold; | threshold; | threshold; | threshold; |\n", "1281 threshold; | threshold; | threshold; | threshold; |\n", "1286 threshold; | threshold; | threshold; 
| threshold; |\n", "1326 threshold; | threshold; | threshold; | threshold; |\n", "1355 threshold; | threshold; | threshold; | threshold; |\n", "1626 threshold; | threshold; | threshold; | threshold; |\n", "1903 threshold; | threshold; | threshold; | threshold; |\n", "1914 threshold; | threshold; | threshold; | threshold; |\n", "2776 threshold; | threshold; | threshold; | threshold; |\n", "2851 threshold; | threshold; | threshold; | threshold; |\n", "3309 threshold; | threshold; | threshold; | threshold; |\n", "-----------------------------------------------------------|\n", "\n", "INFO:acro:records:add(): output_12\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "(, 'acro_artifacts/kaplan-mier_0.png')\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "safe_plot = acro.surv_func(\n", " data.futime, data.death, output=\"plot\", filename=\"kaplan-mier.png\"\n", ")\n", "print(safe_plot)" ] }, { "cell_type": "markdown", "id": "9e554eea", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# ACRO functionality to let users manage their outputs\n", "\n", "### 1: List current ACRO outputs\n", "This example uses the `print_outputs` function to list all the outputs created so far." ] }, { "cell_type": "code", "execution_count": 25, "id": "ec960039", "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "uid: output_0\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(\n", "summary: fail; threshold: 4 cells suppressed; \n", "outcome: parents great_pret pretentious usual\n", "recommend \n", "not_recom ok ok ok\n", "priority ok ok ok\n", "recommend threshold; threshold; threshold; \n", "spec_prior ok ok ok\n", "very_recom threshold; ok ok\n", "output: [parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440.0 1440.0 1440.0\n", "priority 858.0 1484.0 1924.0\n", "recommend NaN NaN NaN\n", "spec_prior 2022.0 1264.0 758.0\n", "very_recom NaN 132.0 196.0]\n", "timestamp: 2025-03-06T19:39:46.897407\n", "comments: []\n", "exception: \n", "\n", "uid: output_1\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 5, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': 
{'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [2, 3], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(df.recommend, df.parents, margins=True)\n", "summary: fail; threshold: 5 cells suppressed; \n", "outcome: parents great_pret pretentious usual All\n", "recommend \n", "not_recom ok ok ok ok\n", "priority ok ok ok ok\n", "recommend threshold; threshold; threshold; threshold; \n", "spec_prior ok ok ok ok\n", "very_recom threshold; ok ok ok\n", "All ok ok ok ok\n", "output: [parents great_pret pretentious usual All\n", "recommend \n", "not_recom 1440.0 1440 1440 4320\n", "priority 858.0 1484 1924 4266\n", "spec_prior 2022.0 1264 758 4044\n", "very_recom NaN 132 196 328\n", "All 4320.0 4320 4318 12958]\n", "timestamp: 2025-03-06T19:39:46.961631\n", "comments: []\n", "exception: \n", "\n", "uid: output_2\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(df.recommend, df.parents)\n", "summary: fail; threshold: 4 cells may need suppressing; \n", "outcome: parents great_pret pretentious usual\n", "recommend \n", "not_recom ok ok ok\n", "priority ok ok ok\n", "recommend threshold; threshold; threshold; \n", "spec_prior ok ok ok\n", "very_recom threshold; ok ok\n", "output: [parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440 1440 1440\n", "priority 858 1484 1924\n", "recommend 0 0 2\n", "spec_prior 2022 1264 758\n", "very_recom 0 132 196]\n", "timestamp: 2025-03-06T19:39:46.980090\n", "comments: []\n", "exception: \n", "\n", "uid: output_3\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", 
"sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 1, 'p-ratio': 4, 'nk-rule': 4, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 2]], 'p-ratio': [[2, 0], [2, 1], [2, 2], [4, 0]], 'nk-rule': [[2, 0], [2, 1], [2, 2], [4, 0]], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(\n", "summary: fail; threshold: 1 cells suppressed; p-ratio: 4 cells suppressed; nk-rule: 4 cells suppressed; \n", "outcome: parents great_pret pretentious \\\n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom p-ratio; nk-rule; ok \n", "\n", "parents usual \n", "recommend \n", "not_recom ok \n", "priority ok \n", "recommend threshold; p-ratio; nk-rule; \n", "spec_prior ok \n", "very_recom ok \n", "output: [parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440.0 1440.0 1440.0\n", "priority 858.0 1484.0 1924.0\n", "recommend NaN NaN NaN\n", "spec_prior 2022.0 1264.0 758.0\n", "very_recom NaN 132.0 196.0]\n", "timestamp: 2025-03-06T19:39:47.019919\n", "comments: []\n", "exception: \n", "\n", "uid: output_4\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 2, 'p-ratio': 8, 'nk-rule': 8, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 2], [2, 5]], 'p-ratio': [[2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], 'nk-rule': [[2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(\n", "summary: fail; threshold: 2 cells suppressed; p-ratio: 8 cells suppressed; nk-rule: 8 cells suppressed; \n", "outcome: mode_aggfunc \\\n", "parents great_pret pretentious \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; p-ratio; nk-rule; \n", 
"spec_prior ok ok \n", "very_recom p-ratio; nk-rule; ok \n", "\n", " mean \\\n", "parents usual great_pret \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend threshold; p-ratio; nk-rule; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom ok p-ratio; nk-rule; \n", "\n", " \n", "parents pretentious usual \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; threshold; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom ok ok \n", "output: [ mode_aggfunc mean \n", "parents great_pret pretentious usual great_pret pretentious usual\n", "recommend \n", "not_recom 2.0 1.0 1.0 3.125694 3.105556 3.074306\n", "priority 1.0 1.0 1.0 2.665501 3.030323 3.116944\n", "recommend NaN NaN NaN NaN NaN NaN\n", "spec_prior 3.0 3.0 3.0 3.353610 3.370253 3.393140\n", "very_recom NaN 1.0 1.0 NaN 2.204545 2.244898]\n", "timestamp: 2025-03-06T19:39:47.068066\n", "comments: []\n", "exception: \n", "\n", "uid: output_5\n", "status: pass\n", "type: table\n", "properties: {'method': 'pivot_table'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 0, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: table = acro.pivot_table(\n", "summary: pass\n", "outcome: mean std\n", " children children\n", "parents \n", "great_pret ok ok\n", "pretentious ok ok\n", "usual ok ok\n", "output: [ mean std\n", " children children\n", "parents \n", "great_pret 3.140972 2.270396\n", "pretentious 3.129630 2.250436\n", "usual 3.110648 2.213072]\n", "timestamp: 2025-03-06T19:39:47.105651\n", "comments: []\n", "exception: \n", "\n", "uid: output_6\n", "status: fail\n", "type: table\n", "properties: {'method': 'pivot_table'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 5, 'p-ratio': 5, 'nk-rule': 5, 'all-values-are-same': 0}, 'cells': 
{'negative': [], 'missing': [], 'threshold': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], 'p-ratio': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], 'nk-rule': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], 'all-values-are-same': []}}\n", "command: safe_table = acro.pivot_table(\n", "summary: fail; threshold: 5 cells suppressed; p-ratio: 5 cells suppressed; nk-rule: 5 cells suppressed; \n", "outcome: children \\\n", "recommend not_recom priority recommend spec_prior \n", "parents \n", "great_pret ok ok threshold; p-ratio; nk-rule; ok \n", "pretentious ok ok threshold; p-ratio; nk-rule; ok \n", "usual ok ok threshold; p-ratio; nk-rule; ok \n", "All ok ok threshold; p-ratio; nk-rule; ok \n", "\n", " \n", "recommend very_recom All \n", "parents \n", "great_pret threshold; p-ratio; nk-rule; ok \n", "pretentious ok ok \n", "usual ok ok \n", "All ok ok \n", "output: [ children \n", "recommend not_recom priority spec_prior very_recom All\n", "parents \n", "great_pret 3.125694 2.665501 3.353610 NaN 3.140972\n", "pretentious 3.105556 3.030323 3.370253 2.204545 3.129630\n", "usual 3.074306 3.116944 3.393140 2.244898 3.111626\n", "All 3.101852 2.996015 3.366222 2.228659 3.127412]\n", "timestamp: 2025-03-06T19:39:47.231513\n", "comments: []\n", "exception: \n", "\n", "uid: output_7\n", "status: pass\n", "type: regression\n", "properties: {'method': 'ols', 'dof': 12958.0}\n", "sdc: {}\n", "command: results = acro.ols(y, x)\n", "summary: pass; dof=12958.0 >= 10\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: [ recommend R-squared: 0.001\n", "Dep. Variable: \n", "Model: OLS Adj. R-squared: 0.001000\n", "Method: Least Squares F-statistic: 13.830000\n", "Date: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\n", "Time: 19:39:47 Log-Likelihood: -25121.000000\n", "No. 
Observations: 12960 AIC: 50250.000000\n", "Df Residuals: 12958 BIC: 50260.000000\n", "Df Model: 1 NaN NaN\n", "Covariance Type: nonrobust NaN NaN, coef std err t P>|t| [0.025 0.975]\n", "const 2.2099 0.025 87.263 0.0 2.160 2.260\n", "children 0.0245 0.007 3.718 0.0 0.012 0.037, 77090.215 Durbin-Watson: 2.883\n", "Omnibus: \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 1741.57\n", "Skew: -0.486 Prob(JB): 0.00\n", "Kurtosis: 1.489 Cond. No. 6.90]\n", "timestamp: 2025-03-06T19:39:47.388052\n", "comments: []\n", "exception: \n", "\n", "uid: output_8\n", "status: pass\n", "type: regression\n", "properties: {'method': 'olsr', 'dof': 12958.0}\n", "sdc: {}\n", "command: results = acro.olsr(formula=\"recommend ~ children\", data=new_df)\n", "summary: pass; dof=12958.0 >= 10\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: [ recommend R-squared: 0.001\n", "Dep. Variable: \n", "Model: OLS Adj. R-squared: 0.001000\n", "Method: Least Squares F-statistic: 13.830000\n", "Date: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\n", "Time: 19:39:47 Log-Likelihood: -25121.000000\n", "No. Observations: 12960 AIC: 50250.000000\n", "Df Residuals: 12958 BIC: 50260.000000\n", "Df Model: 1 NaN NaN\n", "Covariance Type: nonrobust NaN NaN, coef std err t P>|t| [0.025 0.975]\n", "Intercept 2.2099 0.025 87.263 0.0 2.160 2.260\n", "children 0.0245 0.007 3.718 0.0 0.012 0.037, 77090.215 Durbin-Watson: 2.883\n", "Omnibus: \n", "Prob(Omnibus): 0.000 Jarque-Bera (JB): 1741.57\n", "Skew: -0.486 Prob(JB): 0.00\n", "Kurtosis: 1.489 Cond. No. 6.90]\n", "timestamp: 2025-03-06T19:39:47.414293\n", "comments: []\n", "exception: \n", "\n", "uid: output_9\n", "status: pass\n", "type: regression\n", "properties: {'method': 'probit', 'dof': 12958.0}\n", "sdc: {}\n", "command: results = acro.probit(y, x)\n", "summary: pass; dof=12958.0 >= 10\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: [ finance No. Observations: 12960\n", "Dep. 
Variable: \n", "Model: Probit Df Residuals: 12958.000000\n", "Method: MLE Df Model: 1.000000\n", "Date: Thu, 06 Mar 2025 Pseudo R-squ.: 0.000004\n", "Time: 19:39:47 Log-Likelihood: -8983.200000\n", "converged: True LL-Null: -8983.200000\n", "Covariance Type: nonrobust LLR p-value: 0.799200, coef std err z P>|z| [0.025 0.975]\n", "const -0.0039 0.019 -0.207 0.836 -0.041 0.033\n", "children 0.0012 0.005 0.254 0.799 -0.008 0.011]\n", "timestamp: 2025-03-06T19:39:47.439598\n", "comments: []\n", "exception: \n", "\n", "uid: output_10\n", "status: pass\n", "type: regression\n", "properties: {'method': 'logit', 'dof': 12958.0}\n", "sdc: {}\n", "command: results = acro.logit(y, x)\n", "summary: pass; dof=12958.0 >= 10\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: [ finance No. Observations: 12960\n", "Dep. Variable: \n", "Model: Logit Df Residuals: 12958.000000\n", "Method: MLE Df Model: 1.000000\n", "Date: Thu, 06 Mar 2025 Pseudo R-squ.: 0.000004\n", "Time: 19:39:47 Log-Likelihood: -8983.200000\n", "converged: True LL-Null: -8983.200000\n", "Covariance Type: nonrobust LLR p-value: 0.799200, coef std err z P>|z| [0.025 0.975]\n", "const -0.0062 0.030 -0.207 0.836 -0.065 0.053\n", "children 0.0020 0.008 0.254 0.799 -0.013 0.017]\n", "timestamp: 2025-03-06T19:39:47.457696\n", "comments: []\n", "exception: \n", "\n", "uid: output_11\n", "status: fail\n", "type: table\n", "properties: {'method': 'surv_func'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 76, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], [10, 0], [10, 1], [10, 2], [10, 3], [11, 0], [11, 1], [11, 2], 
[11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.surv_func(data.futime, data.death, output=\"table\")\n", "summary: fail; threshold: 76 cells suppressed; \n", "outcome: Surv_prob Surv_prob_SE num_at_risk num_events\n", "Time \n", "51 ok ok ok ok\n", "69 threshold; threshold; threshold; threshold; \n", "85 threshold; threshold; threshold; threshold; \n", "91 threshold; threshold; threshold; threshold; \n", "115 threshold; threshold; threshold; threshold; \n", "372 threshold; threshold; threshold; threshold; \n", "667 threshold; threshold; threshold; threshold; \n", "874 threshold; threshold; threshold; threshold; \n", "1039 threshold; threshold; threshold; threshold; \n", "1046 threshold; threshold; threshold; threshold; \n", "1281 threshold; threshold; threshold; threshold; \n", "1286 threshold; threshold; threshold; threshold; \n", "1326 threshold; threshold; threshold; threshold; \n", "1355 threshold; threshold; threshold; threshold; \n", "1626 threshold; threshold; threshold; threshold; \n", "1903 threshold; threshold; threshold; threshold; \n", "1914 threshold; threshold; threshold; threshold; \n", "2776 threshold; threshold; threshold; threshold; \n", "2851 threshold; threshold; threshold; threshold; \n", "3309 threshold; threshold; threshold; threshold; \n", "output: [ Surv prob Surv prob SE num at risk num events\n", "Time \n", "51 0.95 0.048734 20.0 1.0\n", "69 NaN NaN NaN NaN\n", "85 NaN NaN NaN NaN\n", "91 NaN NaN NaN NaN\n", "115 NaN NaN NaN NaN\n", "372 NaN NaN NaN NaN\n", "667 NaN NaN NaN NaN\n", "874 NaN NaN NaN NaN\n", "1039 NaN NaN NaN NaN\n", "1046 NaN NaN NaN NaN\n", "1281 NaN NaN NaN NaN\n", "1286 NaN NaN NaN 
NaN\n", "1326 NaN NaN NaN NaN\n", "1355 NaN NaN NaN NaN\n", "1626 NaN NaN NaN NaN\n", "1903 NaN NaN NaN NaN\n", "1914 NaN NaN NaN NaN\n", "2776 NaN NaN NaN NaN\n", "2851 NaN NaN NaN NaN\n", "3309 NaN NaN NaN NaN]\n", "timestamp: 2025-03-06T19:39:48.298262\n", "comments: []\n", "exception: \n", "\n", "uid: output_12\n", "status: fail\n", "type: survival plot\n", "properties: {'method': 'surv_func'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 76, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], [10, 0], [10, 1], [10, 2], [10, 3], [11, 0], [11, 1], [11, 2], [11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_plot = acro.surv_func(\n", "summary: fail; threshold: 76 cells suppressed; \n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: ['acro_artifacts/kaplan-mier_0.png']\n", "timestamp: 2025-03-06T19:39:48.450221\n", "comments: []\n", "exception: \n", "\n", "\n" ] }, { "data": { "text/plain": [ "'uid: output_0\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'crosstab\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 4, \\'p-ratio\\': 0, \\'nk-rule\\': 0, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[2, 0], [2, 1], [2, 2], 
[4, 0]], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.crosstab(\\nsummary: fail; threshold: 4 cells suppressed; \\noutcome: parents great_pret pretentious usual\\nrecommend \\nnot_recom ok ok ok\\npriority ok ok ok\\nrecommend threshold; threshold; threshold; \\nspec_prior ok ok ok\\nvery_recom threshold; ok ok\\noutput: [parents great_pret pretentious usual\\nrecommend \\nnot_recom 1440.0 1440.0 1440.0\\npriority 858.0 1484.0 1924.0\\nrecommend NaN NaN NaN\\nspec_prior 2022.0 1264.0 758.0\\nvery_recom NaN 132.0 196.0]\\ntimestamp: 2025-03-06T19:39:46.897407\\ncomments: []\\nexception: \\n\\nuid: output_1\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'crosstab\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 5, \\'p-ratio\\': 0, \\'nk-rule\\': 0, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[2, 0], [2, 1], [2, 2], [2, 3], [4, 0]], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.crosstab(df.recommend, df.parents, margins=True)\\nsummary: fail; threshold: 5 cells suppressed; \\noutcome: parents great_pret pretentious usual All\\nrecommend \\nnot_recom ok ok ok ok\\npriority ok ok ok ok\\nrecommend threshold; threshold; threshold; threshold; \\nspec_prior ok ok ok ok\\nvery_recom threshold; ok ok ok\\nAll ok ok ok ok\\noutput: [parents great_pret pretentious usual All\\nrecommend \\nnot_recom 1440.0 1440 1440 4320\\npriority 858.0 1484 1924 4266\\nspec_prior 2022.0 1264 758 4044\\nvery_recom NaN 132 196 328\\nAll 4320.0 4320 4318 12958]\\ntimestamp: 2025-03-06T19:39:46.961631\\ncomments: []\\nexception: \\n\\nuid: output_2\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'crosstab\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': False, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 4, \\'p-ratio\\': 0, \\'nk-rule\\': 0, 
\\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[2, 0], [2, 1], [2, 2], [4, 0]], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.crosstab(df.recommend, df.parents)\\nsummary: fail; threshold: 4 cells may need suppressing; \\noutcome: parents great_pret pretentious usual\\nrecommend \\nnot_recom ok ok ok\\npriority ok ok ok\\nrecommend threshold; threshold; threshold; \\nspec_prior ok ok ok\\nvery_recom threshold; ok ok\\noutput: [parents great_pret pretentious usual\\nrecommend \\nnot_recom 1440 1440 1440\\npriority 858 1484 1924\\nrecommend 0 0 2\\nspec_prior 2022 1264 758\\nvery_recom 0 132 196]\\ntimestamp: 2025-03-06T19:39:46.980090\\ncomments: []\\nexception: \\n\\nuid: output_3\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'crosstab\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 1, \\'p-ratio\\': 4, \\'nk-rule\\': 4, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[2, 2]], \\'p-ratio\\': [[2, 0], [2, 1], [2, 2], [4, 0]], \\'nk-rule\\': [[2, 0], [2, 1], [2, 2], [4, 0]], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.crosstab(\\nsummary: fail; threshold: 1 cells suppressed; p-ratio: 4 cells suppressed; nk-rule: 4 cells suppressed; \\noutcome: parents great_pret pretentious \\\\\\nrecommend \\nnot_recom ok ok \\npriority ok ok \\nrecommend p-ratio; nk-rule; p-ratio; nk-rule; \\nspec_prior ok ok \\nvery_recom p-ratio; nk-rule; ok \\n\\nparents usual \\nrecommend \\nnot_recom ok \\npriority ok \\nrecommend threshold; p-ratio; nk-rule; \\nspec_prior ok \\nvery_recom ok \\noutput: [parents great_pret pretentious usual\\nrecommend \\nnot_recom 1440.0 1440.0 1440.0\\npriority 858.0 1484.0 1924.0\\nrecommend NaN NaN NaN\\nspec_prior 2022.0 1264.0 758.0\\nvery_recom NaN 132.0 196.0]\\ntimestamp: 
2025-03-06T19:39:47.019919\\ncomments: []\\nexception: \\n\\nuid: output_4\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'crosstab\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 2, \\'p-ratio\\': 8, \\'nk-rule\\': 8, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[2, 2], [2, 5]], \\'p-ratio\\': [[2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], \\'nk-rule\\': [[2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.crosstab(\\nsummary: fail; threshold: 2 cells suppressed; p-ratio: 8 cells suppressed; nk-rule: 8 cells suppressed; \\noutcome: mode_aggfunc \\\\\\nparents great_pret pretentious \\nrecommend \\nnot_recom ok ok \\npriority ok ok \\nrecommend p-ratio; nk-rule; p-ratio; nk-rule; \\nspec_prior ok ok \\nvery_recom p-ratio; nk-rule; ok \\n\\n mean \\\\\\nparents usual great_pret \\nrecommend \\nnot_recom ok ok \\npriority ok ok \\nrecommend threshold; p-ratio; nk-rule; p-ratio; nk-rule; \\nspec_prior ok ok \\nvery_recom ok p-ratio; nk-rule; \\n\\n \\nparents pretentious usual \\nrecommend \\nnot_recom ok ok \\npriority ok ok \\nrecommend p-ratio; nk-rule; threshold; p-ratio; nk-rule; \\nspec_prior ok ok \\nvery_recom ok ok \\noutput: [ mode_aggfunc mean \\nparents great_pret pretentious usual great_pret pretentious usual\\nrecommend \\nnot_recom 2.0 1.0 1.0 3.125694 3.105556 3.074306\\npriority 1.0 1.0 1.0 2.665501 3.030323 3.116944\\nrecommend NaN NaN NaN NaN NaN NaN\\nspec_prior 3.0 3.0 3.0 3.353610 3.370253 3.393140\\nvery_recom NaN 1.0 1.0 NaN 2.204545 2.244898]\\ntimestamp: 2025-03-06T19:39:47.068066\\ncomments: []\\nexception: \\n\\nuid: output_5\\nstatus: pass\\ntype: table\\nproperties: {\\'method\\': \\'pivot_table\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 0, \\'p-ratio\\': 0, 
\\'nk-rule\\': 0, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: table = acro.pivot_table(\\nsummary: pass\\noutcome: mean std\\n children children\\nparents \\ngreat_pret ok ok\\npretentious ok ok\\nusual ok ok\\noutput: [ mean std\\n children children\\nparents \\ngreat_pret 3.140972 2.270396\\npretentious 3.129630 2.250436\\nusual 3.110648 2.213072]\\ntimestamp: 2025-03-06T19:39:47.105651\\ncomments: []\\nexception: \\n\\nuid: output_6\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'pivot_table\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 5, \\'p-ratio\\': 5, \\'nk-rule\\': 5, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], \\'p-ratio\\': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], \\'nk-rule\\': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.pivot_table(\\nsummary: fail; threshold: 5 cells suppressed; p-ratio: 5 cells suppressed; nk-rule: 5 cells suppressed; \\noutcome: children \\\\\\nrecommend not_recom priority recommend spec_prior \\nparents \\ngreat_pret ok ok threshold; p-ratio; nk-rule; ok \\npretentious ok ok threshold; p-ratio; nk-rule; ok \\nusual ok ok threshold; p-ratio; nk-rule; ok \\nAll ok ok threshold; p-ratio; nk-rule; ok \\n\\n \\nrecommend very_recom All \\nparents \\ngreat_pret threshold; p-ratio; nk-rule; ok \\npretentious ok ok \\nusual ok ok \\nAll ok ok \\noutput: [ children \\nrecommend not_recom priority spec_prior very_recom All\\nparents \\ngreat_pret 3.125694 2.665501 3.353610 NaN 3.140972\\npretentious 3.105556 3.030323 3.370253 2.204545 3.129630\\nusual 3.074306 3.116944 3.393140 2.244898 3.111626\\nAll 3.101852 2.996015 3.366222 2.228659 3.127412]\\ntimestamp: 
2025-03-06T19:39:47.231513\\ncomments: []\\nexception: \\n\\nuid: output_7\\nstatus: pass\\ntype: regression\\nproperties: {\\'method\\': \\'ols\\', \\'dof\\': 12958.0}\\nsdc: {}\\ncommand: results = acro.ols(y, x)\\nsummary: pass; dof=12958.0 >= 10\\noutcome: Empty DataFrame\\nColumns: []\\nIndex: []\\noutput: [ recommend R-squared: 0.001\\nDep. Variable: \\nModel: OLS Adj. R-squared: 0.001000\\nMethod: Least Squares F-statistic: 13.830000\\nDate: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\\nTime: 19:39:47 Log-Likelihood: -25121.000000\\nNo. Observations: 12960 AIC: 50250.000000\\nDf Residuals: 12958 BIC: 50260.000000\\nDf Model: 1 NaN NaN\\nCovariance Type: nonrobust NaN NaN, coef std err t P>|t| [0.025 0.975]\\nconst 2.2099 0.025 87.263 0.0 2.160 2.260\\nchildren 0.0245 0.007 3.718 0.0 0.012 0.037, 77090.215 Durbin-Watson: 2.883\\nOmnibus: \\nProb(Omnibus): 0.000 Jarque-Bera (JB): 1741.57\\nSkew: -0.486 Prob(JB): 0.00\\nKurtosis: 1.489 Cond. No. 6.90]\\ntimestamp: 2025-03-06T19:39:47.388052\\ncomments: []\\nexception: \\n\\nuid: output_8\\nstatus: pass\\ntype: regression\\nproperties: {\\'method\\': \\'olsr\\', \\'dof\\': 12958.0}\\nsdc: {}\\ncommand: results = acro.olsr(formula=\"recommend ~ children\", data=new_df)\\nsummary: pass; dof=12958.0 >= 10\\noutcome: Empty DataFrame\\nColumns: []\\nIndex: []\\noutput: [ recommend R-squared: 0.001\\nDep. Variable: \\nModel: OLS Adj. R-squared: 0.001000\\nMethod: Least Squares F-statistic: 13.830000\\nDate: Thu, 06 Mar 2025 Prob (F-statistic): 0.000201\\nTime: 19:39:47 Log-Likelihood: -25121.000000\\nNo. Observations: 12960 AIC: 50250.000000\\nDf Residuals: 12958 BIC: 50260.000000\\nDf Model: 1 NaN NaN\\nCovariance Type: nonrobust NaN NaN, coef std err t P>|t| [0.025 0.975]\\nIntercept 2.2099 0.025 87.263 0.0 2.160 2.260\\nchildren 0.0245 0.007 3.718 0.0 0.012 0.037, 77090.215 Durbin-Watson: 2.883\\nOmnibus: \\nProb(Omnibus): 0.000 Jarque-Bera (JB): 1741.57\\nSkew: -0.486 Prob(JB): 0.00\\nKurtosis: 1.489 Cond. No. 
6.90]\\ntimestamp: 2025-03-06T19:39:47.414293\\ncomments: []\\nexception: \\n\\nuid: output_9\\nstatus: pass\\ntype: regression\\nproperties: {\\'method\\': \\'probit\\', \\'dof\\': 12958.0}\\nsdc: {}\\ncommand: results = acro.probit(y, x)\\nsummary: pass; dof=12958.0 >= 10\\noutcome: Empty DataFrame\\nColumns: []\\nIndex: []\\noutput: [ finance No. Observations: 12960\\nDep. Variable: \\nModel: Probit Df Residuals: 12958.000000\\nMethod: MLE Df Model: 1.000000\\nDate: Thu, 06 Mar 2025 Pseudo R-squ.: 0.000004\\nTime: 19:39:47 Log-Likelihood: -8983.200000\\nconverged: True LL-Null: -8983.200000\\nCovariance Type: nonrobust LLR p-value: 0.799200, coef std err z P>|z| [0.025 0.975]\\nconst -0.0039 0.019 -0.207 0.836 -0.041 0.033\\nchildren 0.0012 0.005 0.254 0.799 -0.008 0.011]\\ntimestamp: 2025-03-06T19:39:47.439598\\ncomments: []\\nexception: \\n\\nuid: output_10\\nstatus: pass\\ntype: regression\\nproperties: {\\'method\\': \\'logit\\', \\'dof\\': 12958.0}\\nsdc: {}\\ncommand: results = acro.logit(y, x)\\nsummary: pass; dof=12958.0 >= 10\\noutcome: Empty DataFrame\\nColumns: []\\nIndex: []\\noutput: [ finance No. Observations: 12960\\nDep. 
Variable: \\nModel: Logit Df Residuals: 12958.000000\\nMethod: MLE Df Model: 1.000000\\nDate: Thu, 06 Mar 2025 Pseudo R-squ.: 0.000004\\nTime: 19:39:47 Log-Likelihood: -8983.200000\\nconverged: True LL-Null: -8983.200000\\nCovariance Type: nonrobust LLR p-value: 0.799200, coef std err z P>|z| [0.025 0.975]\\nconst -0.0062 0.030 -0.207 0.836 -0.065 0.053\\nchildren 0.0020 0.008 0.254 0.799 -0.013 0.017]\\ntimestamp: 2025-03-06T19:39:47.457696\\ncomments: []\\nexception: \\n\\nuid: output_11\\nstatus: fail\\ntype: table\\nproperties: {\\'method\\': \\'surv_func\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 76, \\'p-ratio\\': 0, \\'nk-rule\\': 0, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], [10, 0], [10, 1], [10, 2], [10, 3], [11, 0], [11, 1], [11, 2], [11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: safe_table = acro.surv_func(data.futime, data.death, output=\"table\")\\nsummary: fail; threshold: 76 cells suppressed; \\noutcome: Surv_prob Surv_prob_SE num_at_risk num_events\\nTime \\n51 ok ok ok ok\\n69 threshold; threshold; threshold; threshold; \\n85 threshold; threshold; threshold; threshold; \\n91 threshold; threshold; threshold; threshold; \\n115 threshold; threshold; threshold; threshold; \\n372 threshold; threshold; threshold; threshold; 
\\n667 threshold; threshold; threshold; threshold; \\n874 threshold; threshold; threshold; threshold; \\n1039 threshold; threshold; threshold; threshold; \\n1046 threshold; threshold; threshold; threshold; \\n1281 threshold; threshold; threshold; threshold; \\n1286 threshold; threshold; threshold; threshold; \\n1326 threshold; threshold; threshold; threshold; \\n1355 threshold; threshold; threshold; threshold; \\n1626 threshold; threshold; threshold; threshold; \\n1903 threshold; threshold; threshold; threshold; \\n1914 threshold; threshold; threshold; threshold; \\n2776 threshold; threshold; threshold; threshold; \\n2851 threshold; threshold; threshold; threshold; \\n3309 threshold; threshold; threshold; threshold; \\noutput: [ Surv prob Surv prob SE num at risk num events\\nTime \\n51 0.95 0.048734 20.0 1.0\\n69 NaN NaN NaN NaN\\n85 NaN NaN NaN NaN\\n91 NaN NaN NaN NaN\\n115 NaN NaN NaN NaN\\n372 NaN NaN NaN NaN\\n667 NaN NaN NaN NaN\\n874 NaN NaN NaN NaN\\n1039 NaN NaN NaN NaN\\n1046 NaN NaN NaN NaN\\n1281 NaN NaN NaN NaN\\n1286 NaN NaN NaN NaN\\n1326 NaN NaN NaN NaN\\n1355 NaN NaN NaN NaN\\n1626 NaN NaN NaN NaN\\n1903 NaN NaN NaN NaN\\n1914 NaN NaN NaN NaN\\n2776 NaN NaN NaN NaN\\n2851 NaN NaN NaN NaN\\n3309 NaN NaN NaN NaN]\\ntimestamp: 2025-03-06T19:39:48.298262\\ncomments: []\\nexception: \\n\\nuid: output_12\\nstatus: fail\\ntype: survival plot\\nproperties: {\\'method\\': \\'surv_func\\'}\\nsdc: {\\'summary\\': {\\'suppressed\\': True, \\'negative\\': 0, \\'missing\\': 0, \\'threshold\\': 76, \\'p-ratio\\': 0, \\'nk-rule\\': 0, \\'all-values-are-same\\': 0}, \\'cells\\': {\\'negative\\': [], \\'missing\\': [], \\'threshold\\': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], [10, 0], [10, 1], [10, 2], [10, 3], [11, 0], 
[11, 1], [11, 2], [11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], \\'p-ratio\\': [], \\'nk-rule\\': [], \\'all-values-are-same\\': []}}\\ncommand: safe_plot = acro.surv_func(\\nsummary: fail; threshold: 76 cells suppressed; \\noutcome: Empty DataFrame\\nColumns: []\\nIndex: []\\noutput: [\\'acro_artifacts/kaplan-mier_0.png\\']\\ntimestamp: 2025-03-06T19:39:48.450221\\ncomments: []\\nexception: \\n\\n'" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "acro.print_outputs()" ] }, { "cell_type": "markdown", "id": "3136bc78", "metadata": {}, "source": [ "### 2: Remove some ACRO outputs before finalising \n", "This is an example of deleting some of the ACRO outputs. \n", "Pass the name (uid) of the output to be removed to the function remove_output(). \n", "- Each output has a uid, such as output_0, and a timestamp recording when it was created. \n", "- The uid can be taken from the outputs listed by the print_outputs() function, or by listing the results and choosing the specific output that needs to be removed." ] }, { "cell_type": "code", "execution_count": 26, "id": "e4ee985e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:remove(): output_0 removed\n" ] } ], "source": [ "acro.remove_output(\"output_0\")" ] }, { "cell_type": "markdown", "id": "df2a02e0", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 3: Rename ACRO outputs before finalising\n", "This is an example of renaming the outputs to provide a more descriptive name. 
\n", "Renaming an output does not change the timestamp recorded for it." ] }, { "cell_type": "code", "execution_count": 27, "id": "b9d0b9ac", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:rename_output(): output_2 renamed to pivot_table\n" ] } ], "source": [ "acro.rename_output(\"output_2\", \"pivot_table\")" ] }, { "cell_type": "markdown", "id": "56d2b6a1", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 4: Add a comment to an output\n", "This is an example of adding comments to an output. \n", "Comments can be used to provide a description or to pass additional information to the output checkers." ] }, { "cell_type": "code", "execution_count": 28, "id": "8e21f7b0", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:a comment was added to output_1\n", "INFO:acro:records:a comment was added to output_1\n" ] } ], "source": [ "acro.add_comments(\"output_1\", \"Please let me have this data.\")\n", "acro.add_comments(\"output_1\", \"6 cells were suppressed in this table\")" ] }, { "cell_type": "markdown", "id": "8496fed4", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 5: Add an unsupported output to the list of outputs\n", "This is an example of adding an unsupported output (such as an image) to the list of outputs. The file name and a description of the output are passed to custom_output()." ] }, { "cell_type": "code", "execution_count": 29, "id": "1e8000a1", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:add_custom(): output_13\n" ] } ], "source": [ "acro.custom_output(\n", " \"XandY.jpeg\", \"This output is an image showing the relationship between X and Y\"\n", ")" ] }, { "cell_type": "markdown", "id": "5a586694", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 6: (the big one) Finalise 
ACRO\n", "This is an example of the function _finalise()_, which users must call at the end of each session. \n", "- It takes each output and saves it to a CSV file. \n", "- It also saves the SDC analysis for each output to a JSON or Excel file, depending on the format passed as the second argument (json in the call below). \n", "- The first argument is the name of the directory the results are written to (NURSERY below). \n", "- For each output whose status is fail or review, it asks the user to explain why an exception should be granted." ] }, { "cell_type": "code", "execution_count": 30, "id": "f941aca2", "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_1\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 5, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [2, 3], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(df.recommend, df.parents, margins=True)\n", "summary: fail; threshold: 5 cells suppressed; \n", "outcome: parents great_pret pretentious usual All\n", "recommend \n", "not_recom ok ok ok ok\n", "priority ok ok ok ok\n", "recommend threshold; threshold; threshold; threshold; \n", "spec_prior ok ok ok ok\n", "very_recom threshold; ok ok ok\n", "All ok ok ok ok\n", "output: [parents great_pret pretentious usual All\n", "recommend \n", "not_recom 1440.0 1440 1440 4320\n", "priority 858.0 1484 1924 4266\n", "spec_prior 2022.0 1264 758 4044\n", "very_recom NaN 132 196 328\n", "All 4320.0 4320 4318 12958]\n", "timestamp: 2025-03-06T19:39:46.961631\n", "comments: ['Please let me have this data.', '6 cells were suppressed in this table']\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " suppressed\n" ] }, { "name": "stderr", "output_type": "stream", 
"text": [ "INFO:acro:records:\n", "uid: output_3\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 1, 'p-ratio': 4, 'nk-rule': 4, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 2]], 'p-ratio': [[2, 0], [2, 1], [2, 2], [4, 0]], 'nk-rule': [[2, 0], [2, 1], [2, 2], [4, 0]], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(\n", "summary: fail; threshold: 1 cells suppressed; p-ratio: 4 cells suppressed; nk-rule: 4 cells suppressed; \n", "outcome: parents great_pret pretentious \\\n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom p-ratio; nk-rule; ok \n", "\n", "parents usual \n", "recommend \n", "not_recom ok \n", "priority ok \n", "recommend threshold; p-ratio; nk-rule; \n", "spec_prior ok \n", "very_recom ok \n", "output: [parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440.0 1440.0 1440.0\n", "priority 858.0 1484.0 1924.0\n", "recommend NaN NaN NaN\n", "spec_prior 2022.0 1264.0 758.0\n", "very_recom NaN 132.0 196.0]\n", "timestamp: 2025-03-06T19:39:47.019919\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " exception requested\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_4\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 2, 'p-ratio': 8, 'nk-rule': 8, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 2], [2, 5]], 'p-ratio': [[2, 0], [2, 1], [2, 2], [2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], 'nk-rule': [[2, 0], [2, 1], [2, 2], 
[2, 3], [2, 4], [2, 5], [4, 0], [4, 3]], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(\n", "summary: fail; threshold: 2 cells suppressed; p-ratio: 8 cells suppressed; nk-rule: 8 cells suppressed; \n", "outcome: mode_aggfunc \\\n", "parents great_pret pretentious \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom p-ratio; nk-rule; ok \n", "\n", " mean \\\n", "parents usual great_pret \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend threshold; p-ratio; nk-rule; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom ok p-ratio; nk-rule; \n", "\n", " \n", "parents pretentious usual \n", "recommend \n", "not_recom ok ok \n", "priority ok ok \n", "recommend p-ratio; nk-rule; threshold; p-ratio; nk-rule; \n", "spec_prior ok ok \n", "very_recom ok ok \n", "output: [ mode_aggfunc mean \n", "parents great_pret pretentious usual great_pret pretentious usual\n", "recommend \n", "not_recom 2.0 1.0 1.0 3.125694 3.105556 3.074306\n", "priority 1.0 1.0 1.0 2.665501 3.030323 3.116944\n", "recommend NaN NaN NaN NaN NaN NaN\n", "spec_prior 3.0 3.0 3.0 3.353610 3.370253 3.393140\n", "very_recom NaN 1.0 1.0 NaN 2.204545 2.244898]\n", "timestamp: 2025-03-06T19:39:47.068066\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " exception requested\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_6\n", "status: fail\n", "type: table\n", "properties: {'method': 'pivot_table'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 5, 'p-ratio': 5, 'nk-rule': 5, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], 'p-ratio': [[0, 2], [0, 
4], [1, 2], [2, 2], [3, 2]], 'nk-rule': [[0, 2], [0, 4], [1, 2], [2, 2], [3, 2]], 'all-values-are-same': []}}\n", "command: safe_table = acro.pivot_table(\n", "summary: fail; threshold: 5 cells suppressed; p-ratio: 5 cells suppressed; nk-rule: 5 cells suppressed; \n", "outcome: children \\\n", "recommend not_recom priority recommend spec_prior \n", "parents \n", "great_pret ok ok threshold; p-ratio; nk-rule; ok \n", "pretentious ok ok threshold; p-ratio; nk-rule; ok \n", "usual ok ok threshold; p-ratio; nk-rule; ok \n", "All ok ok threshold; p-ratio; nk-rule; ok \n", "\n", " \n", "recommend very_recom All \n", "parents \n", "great_pret threshold; p-ratio; nk-rule; ok \n", "pretentious ok ok \n", "usual ok ok \n", "All ok ok \n", "output: [ children \n", "recommend not_recom priority spec_prior very_recom All\n", "parents \n", "great_pret 3.125694 2.665501 3.353610 NaN 3.140972\n", "pretentious 3.105556 3.030323 3.370253 2.204545 3.129630\n", "usual 3.074306 3.116944 3.393140 2.244898 3.111626\n", "All 3.101852 2.996015 3.366222 2.228659 3.127412]\n", "timestamp: 2025-03-06T19:39:47.231513\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " some reason\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_11\n", "status: fail\n", "type: table\n", "properties: {'method': 'surv_func'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 76, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], 
[10, 0], [10, 1], [10, 2], [10, 3], [11, 0], [11, 1], [11, 2], [11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.surv_func(data.futime, data.death, output=\"table\")\n", "summary: fail; threshold: 76 cells suppressed; \n", "outcome: Surv_prob Surv_prob_SE num_at_risk num_events\n", "Time \n", "51 ok ok ok ok\n", "69 threshold; threshold; threshold; threshold; \n", "85 threshold; threshold; threshold; threshold; \n", "91 threshold; threshold; threshold; threshold; \n", "115 threshold; threshold; threshold; threshold; \n", "372 threshold; threshold; threshold; threshold; \n", "667 threshold; threshold; threshold; threshold; \n", "874 threshold; threshold; threshold; threshold; \n", "1039 threshold; threshold; threshold; threshold; \n", "1046 threshold; threshold; threshold; threshold; \n", "1281 threshold; threshold; threshold; threshold; \n", "1286 threshold; threshold; threshold; threshold; \n", "1326 threshold; threshold; threshold; threshold; \n", "1355 threshold; threshold; threshold; threshold; \n", "1626 threshold; threshold; threshold; threshold; \n", "1903 threshold; threshold; threshold; threshold; \n", "1914 threshold; threshold; threshold; threshold; \n", "2776 threshold; threshold; threshold; threshold; \n", "2851 threshold; threshold; threshold; threshold; \n", "3309 threshold; threshold; threshold; threshold; \n", "output: [ Surv prob Surv prob SE num at risk num events\n", "Time \n", "51 0.95 0.048734 20.0 1.0\n", "69 NaN NaN NaN NaN\n", "85 NaN NaN NaN NaN\n", "91 NaN NaN NaN NaN\n", "115 NaN NaN NaN NaN\n", "372 NaN NaN NaN NaN\n", "667 NaN NaN NaN NaN\n", "874 NaN NaN NaN NaN\n", "1039 NaN NaN NaN NaN\n", "1046 NaN 
NaN NaN NaN\n", "1281 NaN NaN NaN NaN\n", "1286 NaN NaN NaN NaN\n", "1326 NaN NaN NaN NaN\n", "1355 NaN NaN NaN NaN\n", "1626 NaN NaN NaN NaN\n", "1903 NaN NaN NaN NaN\n", "1914 NaN NaN NaN NaN\n", "2776 NaN NaN NaN NaN\n", "2851 NaN NaN NaN NaN\n", "3309 NaN NaN NaN NaN]\n", "timestamp: 2025-03-06T19:39:48.298262\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " some other reason\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_12\n", "status: fail\n", "type: survival plot\n", "properties: {'method': 'surv_func'}\n", "sdc: {'summary': {'suppressed': True, 'negative': 0, 'missing': 0, 'threshold': 76, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[1, 0], [1, 1], [1, 2], [1, 3], [2, 0], [2, 1], [2, 2], [2, 3], [3, 0], [3, 1], [3, 2], [3, 3], [4, 0], [4, 1], [4, 2], [4, 3], [5, 0], [5, 1], [5, 2], [5, 3], [6, 0], [6, 1], [6, 2], [6, 3], [7, 0], [7, 1], [7, 2], [7, 3], [8, 0], [8, 1], [8, 2], [8, 3], [9, 0], [9, 1], [9, 2], [9, 3], [10, 0], [10, 1], [10, 2], [10, 3], [11, 0], [11, 1], [11, 2], [11, 3], [12, 0], [12, 1], [12, 2], [12, 3], [13, 0], [13, 1], [13, 2], [13, 3], [14, 0], [14, 1], [14, 2], [14, 3], [15, 0], [15, 1], [15, 2], [15, 3], [16, 0], [16, 1], [16, 2], [16, 3], [17, 0], [17, 1], [17, 2], [17, 3], [18, 0], [18, 1], [18, 2], [18, 3], [19, 0], [19, 1], [19, 2], [19, 3]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_plot = acro.surv_func(\n", "summary: fail; threshold: 76 cells suppressed; \n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: ['acro_artifacts/kaplan-mier_0.png']\n", "timestamp: 2025-03-06T19:39:48.450221\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain 
why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " suppressed\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: pivot_table\n", "status: fail\n", "type: table\n", "properties: {'method': 'crosstab'}\n", "sdc: {'summary': {'suppressed': False, 'negative': 0, 'missing': 0, 'threshold': 4, 'p-ratio': 0, 'nk-rule': 0, 'all-values-are-same': 0}, 'cells': {'negative': [], 'missing': [], 'threshold': [[2, 0], [2, 1], [2, 2], [4, 0]], 'p-ratio': [], 'nk-rule': [], 'all-values-are-same': []}}\n", "command: safe_table = acro.crosstab(df.recommend, df.parents)\n", "summary: fail; threshold: 4 cells may need suppressing; \n", "outcome: parents great_pret pretentious usual\n", "recommend \n", "not_recom ok ok ok\n", "priority ok ok ok\n", "recommend threshold; threshold; threshold; \n", "spec_prior ok ok ok\n", "very_recom threshold; ok ok\n", "output: [parents great_pret pretentious usual\n", "recommend \n", "not_recom 1440 1440 1440\n", "priority 858 1484 1924\n", "recommend 0 0 2\n", "spec_prior 2022 1264 758\n", "very_recom 0 132 196]\n", "timestamp: 2025-03-06T19:39:46.980090\n", "comments: []\n", "exception: \n", "\n", "The status of the record above is: fail.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " a reason is provided\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:\n", "uid: output_13\n", "status: review\n", "type: custom\n", "properties: {}\n", "sdc: {}\n", "command: custom\n", "summary: review\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: ['XandY.jpeg']\n", "timestamp: 2025-03-06T19:39:48.518030\n", "comments: ['This output is an image showing the relationship between X and Y']\n", "exception: \n", "\n", "The status of the record above is: review.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { 
"name": "stdin", "output_type": "stream", "text": [ " image is not disclosive\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:outputs written to: NURSERY\n" ] } ], "source": [ "output = acro.finalise(\"NURSERY\", \"json\")" ] }, { "cell_type": "markdown", "id": "113d84ec", "metadata": {}, "source": [ "### 7: Add a directory of outputs to an acro object \n", "This is an example of adding a directory of files (produced by the researcher without using ACRO) to an acro object and creating a results file for checking. \n", "Here add_to_acro() adds the files in test_add_to_acro and writes the results to SDC_results." ] }, { "cell_type": "code", "execution_count": 31, "id": "fdea993a", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:version: 0.4.8\n", "INFO:acro:config: {'safe_threshold': 10, 'safe_dof_threshold': 10, 'safe_nk_n': 2, 'safe_nk_k': 0.9, 'safe_pratio_p': 0.1, 'check_missing_values': False, 'survival_safe_threshold': 10, 'zeros_are_disclosive': True}\n", "INFO:acro:automatic suppression: False\n", "INFO:acro:records:add_custom(): output_0\n", "INFO:acro:records:rename_output(): output_0 renamed to crosstab.pkl\n", "INFO:acro:records:\n", "uid: crosstab.pkl\n", "status: review\n", "type: custom\n", "properties: {}\n", "sdc: {}\n", "command: custom\n", "summary: review\n", "outcome: Empty DataFrame\n", "Columns: []\n", "Index: []\n", "output: ['test_add_to_acro/crosstab.pkl']\n", "timestamp: 2025-03-06T19:41:22.128464\n", "comments: ['']\n", "exception: \n", "\n", "The status of the record above is: review.\n", "Please explain why an exception should be granted.\n", "\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ " pickle file need some explanation \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:acro:records:outputs written to: SDC_results\n" ] } ], "source": [ "import shutil\n", "\n", "table = pd.crosstab(df.recommend, df.parents)\n", "# save the output table to a file and add this file to a directory\n", "src_path = \"test_add_to_acro\"\n", 
"file_path = \"crosstab.pkl\"\n", "dest_path = \"SDC_results\"\n", "if not os.path.exists(src_path):\n", "    table.to_pickle(file_path)\n", "    os.mkdir(src_path)\n", "    # move the pickle into src_path; the default copy function handles a single file\n", "    shutil.move(file_path, src_path)\n", "\n", "# add the output to acro\n", "add_to_acro(src_path, dest_path)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "testacro", "language": "python", "name": "testacro" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }