knnFeat
Feature Extraction with KNN
Description
A Python implementation of feature extraction with k-nearest neighbors (KNN).
@momijiame has since published an updated version of this implementation, which I recommend using instead:
https://github.com/momijiame/gokinjo
pip install gokinjo
Requirements
Python 3.x
numpy
scikit-learn
scipy
Install
git clone git@github.com:upura/knnFeat.git
cd knnFeat
pip install -r requirements.txt
Demo
A notebook version of this demo can be seen here.
Packages for visualization
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
Data generation
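# 500 points drawn uniformly from the square [-0.5, 0.5) x [-0.5, 0.5)
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.column_stack((x0, x1))
# Label 1 where both coordinates share a sign (first and third quadrants), else 0
y = (x0 * x1 > 0).astype(int)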
Visualization
Feature extraction with KNN
from knnFeat import knnExtract
newX = knnExtract(X, y, k=1, folds=5)
Visualization
Algorithm
Quote from here.
It generates k * c new features, where c is the number of class labels. The new features are computed from the distances between the observations and their k nearest neighbors inside each class, as follows:
The first test feature contains the distances between each test instance and its nearest neighbor inside the first class.
The second test feature contains the sums of distances between each test instance and its 2 nearest neighbors inside the first class.
The third test feature contains the sums of distances between each test instance and its 3 nearest neighbors inside the first class.
And so on.
This procedure repeats for each class label, generating k * c new features. Then, the new training features are generated using an n-fold cross-validation (CV) approach, in order to avoid overfitting.
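The procedure quoted above can be sketched with NumPy alone. This is an illustrative sketch, not the code shipped in knnFeat: the function names knn_distance_features and knn_features_out_of_fold, the seed parameter, and the random fold split are all assumptions made for the example. It also assumes Euclidean distance and that every class has at least k members in each reference set.

```python
import numpy as np


def knn_distance_features(X_ref, y_ref, X_query, k, classes):
    """Hypothetical helper: cumulative distances from each query point to its
    nearest reference points within each class.

    Returns an array of shape (len(X_query), k * len(classes)).
    Assumes every class has at least k members in X_ref.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_ref = np.asarray(y_ref)
    blocks = []
    for c in classes:
        members = X_ref[y_ref == c]
        # Pairwise Euclidean distances, shape (n_query, n_members).
        diff = X_query[:, None, :] - members[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=2))
        # Keep the k smallest distances per row, then accumulate them so that
        # column j holds the sum of distances to the j + 1 nearest neighbors.
        nearest = np.sort(dist, axis=1)[:, :k]
        blocks.append(np.cumsum(nearest, axis=1))
    return np.hstack(blocks)


def knn_features_out_of_fold(X, y, k=1, folds=5, seed=0):
    """Hypothetical helper: training features computed fold by fold, so a
    point's features only use neighbors from the other folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    features = np.empty((len(X), k * len(classes)))
    rng = np.random.default_rng(seed)
    # Shuffle the indices and split them into `folds` roughly equal parts.
    for held_out in np.array_split(rng.permutation(len(X)), folds):
        rest = np.setdiff1d(np.arange(len(X)), held_out)
        features[held_out] = knn_distance_features(
            X[rest], y[rest], X[held_out], k, classes
        )
    return features


# Small hand-checkable example: two points per class, one query at the origin.
X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
y_demo = np.array([0, 0, 1, 1])
feats = knn_distance_features(X_demo, y_demo, np.array([[0.0, 0.0]]), k=2, classes=[0, 1])
print(feats)  # [[0. 1. 2. 5.]]

# Out-of-fold features on data shaped like the demo above.
rng_demo = np.random.default_rng(0)
X_train = rng_demo.random((200, 2)) - 0.5
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)
oof = knn_features_out_of_fold(X_train, y_train, k=1, folds=5)
print(oof.shape)  # (200, 2)
```

In the small example, the query point coincides with a class-0 point, so its first feature is 0. Computed in-sample, every training point would get such a zero for its own class, which would reveal its label; the fold-wise version avoids this because a point is never its own neighbor.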
Development
flake8 .
pytest
pytest -v -m 'success' --cov=.
License
Author