{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "525fc252-192d-41e0-9357-c3cdd61e7540",
   "metadata": {},
   "source": [
    "# K-anonymity\n",
    "\n",
    "K-anonymity is a privacy-preserving technique used in data anonymization to protect the identities of individuals in a dataset. Its goal is to ensure that each record is indistinguishable from at least k-1 other records with respect to a set of quasi-identifier attributes, so that every combination of quasi-identifier values is shared by at least k records. Quasi-identifiers are attributes that are not unique identifiers on their own but that, when combined, could identify an individual (for example, age, ZIP code, and gender).\n",
    "\n",
    "To achieve K-anonymity, the values of the quasi-identifiers are generalized (replaced with coarser values) or suppressed (removed) until every record belongs to a group of at least k records with identical quasi-identifier values. An attacker who knows a target's quasi-identifiers can then narrow the target down to such a group, but cannot single out one record in the anonymized dataset.\n",
    "\n",
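    "AIJack supports the [Mondrian](https://ieeexplore.ieee.org/document/1617393) algorithm, which efficiently anonymizes tabular data by recursively partitioning the records into groups of at least k and generalizing the quasi-identifiers within each group."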
   ]
  },
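  {
   "cell_type": "markdown",
   "id": "3f9a1c2e-5b7d-4e8a-9c01-2d4f6a8b0c1e",
   "metadata": {},
   "source": [
    "The definition above can be checked mechanically. The next cell is a minimal sketch and not part of AIJack: the helper `is_k_anonymous` and the toy `age`/`zip`/`disease` table are hypothetical and introduced only for illustration. It groups rows by their quasi-identifier values and tests whether the smallest group has at least k rows. The raw table fails for k=2 because every row is unique, while the generalized table (ages bucketed into decades, ZIP codes truncated to two digits) passes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e2b4d6f-1a3c-4b5d-8f7e-9a0b1c2d3e4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "\n",
    "def is_k_anonymous(table, quasi_identifiers, k):\n",
    "    # Hypothetical helper (not an AIJack function): every combination of\n",
    "    # quasi-identifier values must occur in at least k rows.\n",
    "    group_sizes = table.groupby(quasi_identifiers).size()\n",
    "    return bool(group_sizes.min() >= k)\n",
    "\n",
    "\n",
    "# Toy data, assumed for illustration only.\n",
    "raw = pd.DataFrame(\n",
    "    {\n",
    "        \"age\": [23, 27, 31, 38],\n",
    "        \"zip\": [\"13053\", \"13068\", \"14850\", \"14853\"],\n",
    "        \"disease\": [\"flu\", \"cold\", \"flu\", \"asthma\"],\n",
    "    }\n",
    ")\n",
    "\n",
    "# Generalize: bucket ages into decades and keep only the first two ZIP digits.\n",
    "generalized = raw.assign(\n",
    "    age=(raw[\"age\"] // 10 * 10).astype(str) + \"s\",\n",
    "    zip=raw[\"zip\"].str[:2] + \"***\",\n",
    ")\n",
    "\n",
    "raw_ok = is_k_anonymous(raw, [\"age\", \"zip\"], k=2)\n",
    "generalized_ok = is_k_anonymous(generalized, [\"age\", \"zip\"], k=2)\n",
    "raw_ok, generalized_ok"
   ]
  },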
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc4b336e-75c4-4fac-8b16-9e7af35fa233",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "from aijack.defense.kanonymity import Mondrian"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4939daa5-c27b-49e0-8243-f3f4c89d73bb",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "      <th>col5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>test1</td>\n",
       "      <td>x</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>test1</td>\n",
       "      <td>x</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>test2</td>\n",
       "      <td>x</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>test3</td>\n",
       "      <td>w</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>test2</td>\n",
       "      <td>y</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>test3</td>\n",
       "      <td>z</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1 col2   col3 col4  col5\n",
       "0     6    1  test1    x    20\n",
       "1     6    1  test1    x    30\n",
       "2     8    2  test2    x    50\n",
       "3     8    2  test3    w    45\n",
       "4     8    1  test2    y    35\n",
       "5     4    2  test3    y    20\n",
       "6     4    1  test3    y    20\n",
       "7     2    1  test3    z    22\n",
       "8     2    2  test3    y    32"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# This test code is based on https://github.com/glassonion1/anonypy\n",
    "\n",
    "data = [\n",
    "    [6, \"1\", \"test1\", \"x\", 20],\n",
    "    [6, \"1\", \"test1\", \"x\", 30],\n",
    "    [8, \"2\", \"test2\", \"x\", 50],\n",
    "    [8, \"2\", \"test3\", \"w\", 45],\n",
    "    [8, \"1\", \"test2\", \"y\", 35],\n",
    "    [4, \"2\", \"test3\", \"y\", 20],\n",
    "    [4, \"1\", \"test3\", \"y\", 20],\n",
    "    [2, \"1\", \"test3\", \"z\", 22],\n",
    "    [2, \"2\", \"test3\", \"y\", 32],\n",
    "]\n",
    "\n",
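    "columns = [\"col1\", \"col2\", \"col3\", \"col4\", \"col5\"]\n",
    "# Quasi-identifier columns to be generalized.\n",
    "feature_columns = [\"col1\", \"col2\", \"col3\"]\n",
    "# Whether each column is continuous (numeric) or categorical.\n",
    "is_continuous_map = {\n",
    "    \"col1\": True,\n",
    "    \"col2\": False,\n",
    "    \"col3\": False,\n",
    "    \"col4\": False,\n",
    "    \"col5\": True,\n",
    "}\n",
    "# Sensitive attribute, kept as-is in the anonymized table.\n",
    "sensitive_column = \"col4\"\n",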
    "\n",
    "df = pd.DataFrame(data=data, columns=columns)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "99d45579-22fb-4e38-bd81-bc4b7747c951",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
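    "# Anonymize with k=2. Columns that are neither feature columns nor the\n",
    "# sensitive column (here col5) do not appear in the result.\n",
    "mondrian = Mondrian(k=2)\n",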
    "adf_ignore_unused_features = mondrian.anonymize(\n",
    "    df, feature_columns, sensitive_column, is_continuous_map\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "dad46817-01e4-41e4-b306-1ae5d094302f",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>1</td>\n",
       "      <td>test3</td>\n",
       "      <td>z</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>1</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>2</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>2</td>\n",
       "      <td>test3</td>\n",
       "      <td>y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>6.666667</td>\n",
       "      <td>1</td>\n",
       "      <td>test1_test2</td>\n",
       "      <td>x</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6.666667</td>\n",
       "      <td>1</td>\n",
       "      <td>test1_test2</td>\n",
       "      <td>x</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6.666667</td>\n",
       "      <td>1</td>\n",
       "      <td>test1_test2</td>\n",
       "      <td>y</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8.000000</td>\n",
       "      <td>2</td>\n",
       "      <td>test2_test3</td>\n",
       "      <td>x</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8.000000</td>\n",
       "      <td>2</td>\n",
       "      <td>test2_test3</td>\n",
       "      <td>w</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       col1 col2         col3 col4\n",
       "0  3.000000    1        test3    z\n",
       "1  3.000000    1        test3    y\n",
       "2  3.000000    2        test3    y\n",
       "3  3.000000    2        test3    y\n",
       "4  6.666667    1  test1_test2    x\n",
       "5  6.666667    1  test1_test2    x\n",
       "6  6.666667    1  test1_test2    y\n",
       "7  8.000000    2  test2_test3    x\n",
       "8  8.000000    2  test2_test3    w"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "adf_ignore_unused_features"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}