Differences

Below are the differences between two revisions of the page.

--- en:cs:k-nn_multiple_imputation [2024/04/25 16:13] – [Imputation on $N \in \mathbb{N}$ features data set] fraggle
+++ en:cs:k-nn_multiple_imputation [2024/05/27 15:34] (current version) – [Unique missing value imputation] fraggle
@@ Line 2: / Line 2: @@
   * $A$ and $B$ two sets, $g: A \longrightarrow B$ a function:
-    * Image of a subset under g: $A^{\prime} \subset A, g(A^{\prime}) = \{g(x) \in B| \, x \in A^{\prime}\} \subset B$
+    * $g$ is injective $\iff \forall x, y \in A, g(x) =_{B} g(y) \implies x =_{A} y$
-    * Preimage of a subset under g: $B^{\prime} \subset B, g^{-1}(B^{\prime}) = \{x \in A| \, g(x) \in B^{\prime}\} \subset A$
+    * $g$ is surjective $\iff \forall y \in B, \exists x \in A, y =_{B} g(x)$
-    * g is an injection $ \iff \forall x, y \in A, g(x) =_{B} g(y) \implies x =_{A} y$
+    * $g$ is bijective $\iff g$ is injective and surjective $\iff \forall y \in B, \exists! x \in A, y =_{B} g(x) \iff \exists g^{-1}: B \longrightarrow A, g \circ g^{-1} = id_B \land g^{-1} \circ g = id_A$
+    * Image of a subset under $g$: $A^{\prime} \subset A, g(A^{\prime}) = \{g(x) \in B| \, x \in A^{\prime}\} \subset B$
+    * Preimage of a subset under $g$: $B^{\prime} \subset B, g^{-1}(B^{\prime}) = \{x \in A| \, g(x) \in B^{\prime}\} \subset A$
   * For a given $d \in \mathbb{N}$, let's define the function $f$:
 $$ \begin{array}{lrcl}
    f: & \mathbb{R}^{d} & \longrightarrow & \mathbb{R}^{d} \\
-      & X = (x_{1}, \ldots, x_{d}) & \stackrel{f}{\longmapsto} & Y = f(X) = (y_{1}, \ldots, y_{d})
+      & X = (x_{1}, \ldots, x_{d}) & \stackrel{f}{\longmapsto} & Y = f(X) = (y_{1} = f_{1}(x_{1}), \ldots, y_{d} = f_{d}(x_{d}))
    \end{array}
 $$
@@ Line 16: / Line 18: @@
 $f$ will be called the prediction function in subsequent sections.
-  * For a given normed vector space $(E, \|~\|_{E})$ over the field $K$ and $X \in E$, let's define the binary relation $\le_{X}$:
+  * For a given normed vector space $(E, \|~\|_{E})$ over the field $K$ and $X \in E$, let's define the binary relation $\le_{X}$ (nearest neighborhood order):
-$$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} \le_{X} X_{2} \iff \|X - X_{1}\|_{E} \le_{K} \|X - X_{2}\|_{E}$$ (nearest neighborhood order)
+$$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} \le_{X} X_{2} \iff \|X - X_{1}\|_{E} \le_{K} \|X - X_{2}\|_{E}$$
-  * For a given normed vector space $(E, \|~\|_{E})$ over the field $K$ and $X \in E$, let's define the binary relation $=_{X}$:
+  * For a given normed vector space $(E, \|~\|_{E})$ over the field $K$ and $X \in E$, let's define the binary relation $=_{X}$ (nearest neighborhood equality):
-$$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} =_{X} X_{2} \iff \|X - X_{1}\|_{E} =_{K} \|X - X_{2}\|_{E}$$ (nearest neighborhood equality)
+$$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} =_{X} X_{2} \iff \|X - X_{1}\|_{E} =_{K} \|X - X_{2}\|_{E}$$
 ====== k-NN multiple imputation ======
@@ Line 41: / Line 43: @@
 ===== k-NN =====
-For $X \in \mathbb{R}^{d}$, $(\mathcal{D}, \le_{X})$ is a totally ordered finite set.
+For $X \in \mathbb{R}^{d}$, $(\mathcal{D}, \le_{X})$ is a totally ordered finite set: $\mathcal{D} = \{X_{1}, \ldots, X_{n}\}$ with $X_{i-1} \le_{X} X_{i}$ for all $i \in \{2,\ldots,n\}$
 For $k \in \{1,\ldots,n\}$, let's define:
-  * $\mathcal{N}^{k}_{X} = \{X_{i} \in \mathcal{D}|\, i \in \{1,\ldots,k\}\}$ the ordered finite set of the $k$ nearest neighbors of $X$ in $\mathcal{D}$
+  * $\mathcal{N}^{k}_{X} = \{X_{i} \in \mathcal{D}|\, i \in \{1,\ldots,k\}\}$, with $X_{i-1} \le_{X} X_{i}$ for all $i \in \{2,\ldots,k\}$, the ordered finite set of the $k$ nearest neighbors of $X$ in $\mathcal{D}$
   * $\underset{\le_{X}}\max \mathcal{N}^{k}_{X}$ the $k$-th nearest neighbor of $X$ in $\mathcal{D}$
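
The ordering $\le_{X}$ and the set $\mathcal{N}^{k}_{X}$ defined in this hunk can be illustrated with a minimal Python sketch. It assumes the Euclidean norm on $\mathbb{R}^{d}$ and stores $\mathcal{D}$ as a list of tuples; the names `nn_sorted` and `k_nearest` are hypothetical and do not come from the page.

```python
import math

def nn_sorted(x, data):
    """Return data sorted by the nearest-neighbor order <=_x (Euclidean norm assumed)."""
    # sorted() is stable: points at equal distance from x (=_x) keep their input order.
    return sorted(data, key=lambda p: math.dist(x, p))

def k_nearest(x, data, k):
    """Return N^k_x, the first k elements of data under <=_x."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be in {1, ..., n}")
    return nn_sorted(x, data)[:k]

data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (5.0, 0.0)]
x = (0.5, 0.0)
neighbors = k_nearest(x, data, 3)
kth = neighbors[-1]  # maximum of N^k_x under <=_x, i.e. the k-th nearest neighbor
```

Ties under $=_{X}$ are broken here by input order, which is a choice of this sketch rather than something the page specifies.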
@@ Line 62: / Line 64: @@
   * Impute with the mean:
 \[
-Y^* = \frac{1}{k} (Y_{1} + \ldots + Y_{k})
+Y^* = \frac{1}{k} \sum_{i=1}^{k} Y_{i}
 \]
+  * Impute with the median:
   * Impute with random sampling:
@@ Line 83: / Line 87: @@
 ===== Imputation on $N \in \mathbb{N}$ features data set =====
-Given $f: (X_{1},\ldots,X_{N-1}) \mapsto Y_N = f((X_{1},\ldots,X_{N-1}))$, let's define $\mathcal{D^{\prime}} = \{(X_{1,i},\ldots,X_{N-1,i}, Y_{N,i}) \in \mathbb{R}^{d} \times \ldots \times \mathbb{R}^{d} | \, \forall i \in \{1,\ldots,n\} Y_{N,i} = f((X_{1,i},\ldots,X_{N-1,i}))\}$ an $N$-feature finite data set.
+Given $f: (X_{1},\ldots,X_{N-1}) \mapsto Y_N = f((X_{1},\ldots,X_{N-1}))$, let's define $\mathcal{D^{\prime}} = \{(X_{1,i},\ldots,X_{N-1,i}, Y_{N,i}) \in \mathbb{R}^{d} \times \ldots \times \mathbb{R}^{d} | \, \forall i \in \{1,\ldots,n\}, Y_{N,i} = f((X_{1,i},\ldots,X_{N-1,i}))\}$ an $N$-feature finite data set.
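
As a rough illustration of imputing $Y_{N}$ from the $N-1$ observed features, the sketch below ranks the complete rows by distance to the incomplete one and applies the mean rule $Y^* = \frac{1}{k} \sum_{i=1}^{k} Y_{i}$. Several points are assumptions of the sketch, not statements from the page: each feature is a scalar (the page uses $\mathbb{R}^{d}$-valued features), the norm is Euclidean, the median rule is taken to be the usual sample median (the page only lists its heading), and `knn_impute` is a hypothetical name.

```python
import math
import statistics

def knn_impute(x, complete_rows, k, rule="mean"):
    """Impute the missing last feature of x from its k nearest complete rows.

    complete_rows: list of (features, y) pairs, features being a tuple of floats.
    Scalar features and the Euclidean norm are assumptions of this sketch.
    """
    if not 1 <= k <= len(complete_rows):
        raise ValueError("k must be in {1, ..., n}")
    # Order the complete rows by distance to x on the observed features.
    ranked = sorted(complete_rows, key=lambda row: math.dist(x, row[0]))
    ys = [y for _, y in ranked[:k]]
    if rule == "mean":
        return sum(ys) / k  # Y* = (1/k) * sum_{i=1..k} Y_i
    if rule == "median":
        return statistics.median(ys)  # assumed reading of "Impute with the median"
    raise ValueError(f"unknown rule: {rule!r}")

rows = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 2.0), 6.0), ((9.0, 9.0), 50.0)]
y_mean = knn_impute((0.2, 0.1), rows, k=3, rule="mean")
y_median = knn_impute((0.2, 0.1), rows, k=3, rule="median")
```

With $k = 3$ the distant row `((9.0, 9.0), 50.0)` is excluded, so it does not affect either imputed value.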