{ "cells": [ { "cell_type": "markdown", "id": "525fc252-192d-41e0-9357-c3cdd61e7540", "metadata": {}, "source": [ "# K-anonymity\n", "\n", "K-anonymity is a privacy-preserving technique used in data anonymization to protect the identities of individuals in a dataset. Its goal is to ensure that each record is indistinguishable from at least k-1 other records with respect to a set of quasi-identifier attributes, so that every combination of quasi-identifier values is shared by at least k records. Quasi-identifiers are attributes that, when combined, could identify an individual.\n", "\n", "To achieve K-anonymity, the values of the quasi-identifiers are generalized or suppressed until every record belongs to a group of at least k records whose quasi-identifier values are identical. An attacker who knows a person's quasi-identifiers can then narrow the match down to such a group, but cannot single out that person's record within it.\n", "\n", "AIJack supports the [Mondrian](https://ieeexplore.ieee.org/document/1617393) algorithm, which anonymizes tabular data efficiently by recursively partitioning the records along the quasi-identifier attributes and generalizing the values within each partition." ] }, { "cell_type": "code", "execution_count": null, "id": "bc4b336e-75c4-4fac-8b16-9e7af35fa233", "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from aijack.defense.kanonymity import Mondrian" ] }, { "cell_type": "code", "execution_count": 2, "id": "4939daa5-c27b-49e0-8243-f3f4c89d73bb", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>col1</th><th>col2</th><th>col3</th><th>col4</th><th>col5</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>6</td><td>1</td><td>test1</td><td>x</td><td>20</td></tr>\n",
"<tr><th>1</th><td>6</td><td>1</td><td>test1</td><td>x</td><td>30</td></tr>\n",
"<tr><th>2</th><td>8</td><td>2</td><td>test2</td><td>x</td><td>50</td></tr>\n",
"<tr><th>3</th><td>8</td><td>2</td><td>test3</td><td>w</td><td>45</td></tr>\n",
"<tr><th>4</th><td>8</td><td>1</td><td>test2</td><td>y</td><td>35</td></tr>\n",
"<tr><th>5</th><td>4</td><td>2</td><td>test3</td><td>y</td><td>20</td></tr>\n",
"<tr><th>6</th><td>4</td><td>1</td><td>test3</td><td>y</td><td>20</td></tr>\n",
"<tr><th>7</th><td>2</td><td>1</td><td>test3</td><td>z</td><td>22</td></tr>\n",
"<tr><th>8</th><td>2</td><td>2</td><td>test3</td><td>y</td><td>32</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " col1 col2 col3 col4 col5\n", "0 6 1 test1 x 20\n", "1 6 1 test1 x 30\n", "2 8 2 test2 x 50\n", "3 8 2 test3 w 45\n", "4 8 1 test2 y 35\n", "5 4 2 test3 y 20\n", "6 4 1 test3 y 20\n", "7 2 1 test3 z 22\n", "8 2 2 test3 y 32" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This test code is based on https://github.com/glassonion1/anonypy\n", "\n", "data = [\n", " [6, \"1\", \"test1\", \"x\", 20],\n", " [6, \"1\", \"test1\", \"x\", 30],\n", " [8, \"2\", \"test2\", \"x\", 50],\n", " [8, \"2\", \"test3\", \"w\", 45],\n", " [8, \"1\", \"test2\", \"y\", 35],\n", " [4, \"2\", \"test3\", \"y\", 20],\n", " [4, \"1\", \"test3\", \"y\", 20],\n", " [2, \"1\", \"test3\", \"z\", 22],\n", " [2, \"2\", \"test3\", \"y\", 32],\n", "]\n", "\n", "columns = [\"col1\", \"col2\", \"col3\", \"col4\", \"col5\"]\n", "feature_columns = [\"col1\", \"col2\", \"col3\"]\n", "is_continuous_map = {\n", " \"col1\": True,\n", " \"col2\": False,\n", " \"col3\": False,\n", " \"col4\": False,\n", " \"col5\": True,\n", "}\n", "sensitive_column = \"col4\"\n", "\n", "df = pd.DataFrame(data=data, columns=columns)\n", "df" ] }, { "cell_type": "code", "execution_count": 3, "id": "99d45579-22fb-4e38-bd81-bc4b7747c951", "metadata": { "tags": [] }, "outputs": [], "source": [ "mondrian = Mondrian(k=2)\n", "adf_ignore_unused_features = mondrian.anonymize(\n", " df, feature_columns, sensitive_column, is_continuous_map\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "dad46817-01e4-41e4-b306-1ae5d094302f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>col1</th><th>col2</th><th>col3</th><th>col4</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>3.000000</td><td>1</td><td>test3</td><td>z</td></tr>\n",
"<tr><th>1</th><td>3.000000</td><td>1</td><td>test3</td><td>y</td></tr>\n",
"<tr><th>2</th><td>3.000000</td><td>2</td><td>test3</td><td>y</td></tr>\n",
"<tr><th>3</th><td>3.000000</td><td>2</td><td>test3</td><td>y</td></tr>\n",
"<tr><th>4</th><td>6.666667</td><td>1</td><td>test1_test2</td><td>x</td></tr>\n",
"<tr><th>5</th><td>6.666667</td><td>1</td><td>test1_test2</td><td>x</td></tr>\n",
"<tr><th>6</th><td>6.666667</td><td>1</td><td>test1_test2</td><td>y</td></tr>\n",
"<tr><th>7</th><td>8.000000</td><td>2</td><td>test2_test3</td><td>x</td></tr>\n",
"<tr><th>8</th><td>8.000000</td><td>2</td><td>test2_test3</td><td>w</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " col1 col2 col3 col4\n", "0 3.000000 1 test3 z\n", "1 3.000000 1 test3 y\n", "2 3.000000 2 test3 y\n", "3 3.000000 2 test3 y\n", "4 6.666667 1 test1_test2 x\n", "5 6.666667 1 test1_test2 x\n", "6 6.666667 1 test1_test2 y\n", "7 8.000000 2 test2_test3 x\n", "8 8.000000 2 test2_test3 w" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adf_ignore_unused_features" ] }, { "cell_type": "code", "execution_count": null, "id": "552bfa42-be85-442d-9568-992d08d5b919", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }