Differences
Below are the differences between two revisions of the page.
| en:cs:k-nn_multiple_imputation [2024/04/25 16:13] – [Imputation on $N \in \mathbb{N}$ features data set] fraggle | en:cs:k-nn_multiple_imputation [2024/05/27 15:34] (current version) – [Unique missing value imputation] fraggle | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| * $A$ and $B$ two sets, $g: A \longrightarrow B$ a function: | * $A$ and $B$ two sets, $g: A \longrightarrow B$ a function: | ||
| - | * Subset g image : $A^{\prime} \subset A, g(A^{\prime}) = \{g(x) \in B| \, x \in A^{\prime}\} \subset B$ | + | |
| - | * Subset inverse g image: $B^{\prime} \subset B, g^{-1}(B^{\prime}) = \{x \in A| \, g(x) \in B^{\prime}\} \subset A$ | + | * $g$ is surjective $\iff \forall y \in B, \exists x \in A, y =_{B} g(x)$ |
| - | * $g$ is injective $\iff \forall x, y \in A, g(x) =_{B} g(y) \implies x =_{A} y$ | + | * $g$ is bijective $\iff g$ is injective and surjective $\iff \forall y \in B, \exists! x \in A, y =_{B} g(x) \iff \exists g^{-1}: B \longrightarrow A, g \circ g^{-1} = id_B \land g^{-1} \circ g = id_A$ |
| + | | ||
| + | * Subset inverse | ||
| + | | ||
| * For a given $d \in \mathbb{N}$, | * For a given $d \in \mathbb{N}$, | ||
| - | |||
| $$ \begin{array}{lrcl} | $$ \begin{array}{lrcl} | ||
| f: & \mathbb{R}^{d} & \longrightarrow & \mathbb{R}^{d} \\ | f: & \mathbb{R}^{d} & \longrightarrow & \mathbb{R}^{d} \\ | ||
| - | & X = (x_{1}, \ldots, x_{d}) & \stackrel{f}{\longmapsto} & Y = f(X) = (y_{1}, \ldots, y_{d}) | + | & X = (x_{1}, \ldots, x_{d}) & \stackrel{f}{\longmapsto} & Y = f(X) = (y_{1} |
| | | ||
| $$ | $$ | ||
| Line 16: | Line 18: | ||
| $f$ will be called the prediction function in subsequent sections. | $f$ will be called the prediction function in subsequent sections. | ||
| - | * For a given normed vector space $(E, \|~\|_{E})$ over a field $K$ and $X \in E$, let us define the binary relation $\le_{X}$: | + | * For a given normed vector space $(E, \|~\|_{E})$ over a field $K$ and $X \in E$, let us define the binary relation $\le_{X}$ |
| - | $$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} \le_{X} X_{2} \iff \|X - X_{1}\|_{E} \le_{K} \|X - X_{2}\|_{E}$$ | + | $$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} \le_{X} X_{2} \iff \|X - X_{1}\|_{E} \le_{K} \|X - X_{2}\|_{E}$$ |
| - | * For a given normed vector space $(E, \|~\|_{E})$ over a field $K$ and $X \in E$, let us define the binary relation $=_{X}$: | + | * For a given normed vector space $(E, \|~\|_{E})$ over a field $K$ and $X \in E$, let us define the binary relation $=_{X}$ |
| - | $$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} =_{X} X_{2} \iff \|X - X_{1}\|_{E} =_{K} \|X - X_{2}\|_{E}$$ | + | $$\forall X_{1} \in E \land \forall X_{2} \in E, X_{1} =_{X} X_{2} \iff \|X - X_{1}\|_{E} =_{K} \|X - X_{2}\|_{E}$$ |
| ====== k-NN multiple imputation ====== | ====== k-NN multiple imputation ====== | ||
| Line 41: | Line 43: | ||
| ===== k-NN ===== | ===== k-NN ===== | ||
| - | For $X \in \mathbb{R}^{d}$, | + | For $X \in \mathbb{R}^{d}$, |
| For $k \in \{1, | For $k \in \{1, | ||
| - | * $\mathcal{N}^{k}_{X} = \{X_{i} \in \mathcal{D}|\, | + | * $\mathcal{N}^{k}_{X} = \{X_{i} \in \mathcal{D}|\, |
| * $\underset{\le_{X}}\max \mathcal{N}^{k}_{X}$ the k-th nearest neighbor of $X$ in $\mathcal{D}$ | * $\underset{\le_{X}}\max \mathcal{N}^{k}_{X}$ the k-th nearest neighbor of $X$ in $\mathcal{D}$ | ||
| Line 62: | Line 64: | ||
| * Impute with the mean: | * Impute with the mean: | ||
| \[ | \[ | ||
| - | Y^* = \frac{1}{k} | + | Y^* = \frac{1}{k} |
| \] | \] | ||
| + | |||
| + | * Impute with the median: | ||
| * Impute with random sampling: | * Impute with random sampling: | ||
| Line 83: | Line 87: | ||
| ===== Imputation on $N \in \mathbb{N}$ features data set ===== | ===== Imputation on $N \in \mathbb{N}$ features data set ===== | ||
| - | Given $f: (X_{1}, | + | Given $f: (X_{1}, |
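
The k-NN section and the three imputation rules listed in the diff (mean, median, random sampling) can be illustrated with a minimal Python sketch. This is not the page's own implementation. It assumes the Euclidean norm for $\|~\|_{E}$, a scalar target per data point (the page's $f$ is vector-valued), and ties broken by the order of the data set. The names `k_nearest`, `impute`, `data`, `targets` and `query` are hypothetical and chosen only for this example.

```python
import math
import random
import statistics


def k_nearest(x, data, k):
    """Return the indices of the k points of `data` closest to `x`, nearest first.

    Sorting by the distance ||x - x_i|| realises the ordering "x_1 <=_X x_2"
    (assumption: Euclidean norm). Python's sort is stable, so ties keep the
    original order of `data`.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    order = sorted(range(len(data)), key=lambda i: math.dist(x, data[i]))
    return order[:k]


def impute(x, data, targets, k, strategy="mean", rng=None):
    """Impute a scalar value for `x` from the targets of its k nearest neighbors."""
    values = [targets[i] for i in k_nearest(x, data, k)]
    if strategy == "mean":
        return statistics.fmean(values)
    if strategy == "median":
        return statistics.median(values)
    if strategy == "random":
        # Draw one neighbor's target uniformly at random.
        return (rng or random).choice(values)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical toy data set: four points in R^2 with one scalar target each.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
targets = [1.0, 2.0, 6.0, 100.0]
query = (0.1, 0.1)

mean_value = impute(query, data, targets, k=3, strategy="mean")
median_value = impute(query, data, targets, k=3, strategy="median")
random_value = impute(query, data, targets, k=3, strategy="random",
                      rng=random.Random(0))
```

With `k=3` the far point `(5.0, 5.0)` is excluded, so its target does not influence any of the three imputed values. Repeating the random-sampling draw several times is one way to produce multiple imputed values for the same missing entry.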