How to decide how many n_neighbors to consider while implementing LocalOutlierFactor?

I have a data set with 134,000 rows and 200 columns. I am trying to identify the outliers in the data set using LocalOutlierFactor from scikit-learn. Although I understand how the algorithm works, I am unable to decide on a value of n_neighbors for my data set.

Kindly suggest.

edited yesterday

ukemi

1238

asked 2 days ago

Neha Bhushan

New contributor

1

Use grid search to find the optimal number of neighbors
– Ethan
2 days ago

This paper may be of interest: Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection (Feb 5, 2019)
– ukemi
yesterday

python scikit-learn outlier hyperparameter-tuning k-nn

1 Answer

One normally uses Grid Search for calculating the optimum parameters in these situations:

import numpy as np

from sklearn.model_selection import GridSearchCV

from sklearn.neighbors import KNeighborsClassifier

n = 30 # Max number of neighbours you want to consider

# n_neighbors must be at least 1, so the range starts at 1 rather than 0
param_grid = {'n_neighbors': np.arange(1, n + 1)}

grid = GridSearchCV(KNeighborsClassifier(), param_grid)

You can then fit the grid to your data to find the best value among those you supplied. This need not be a global optimum, or even a local one if the returned value sits at either end of your input range:

grid.fit(X_train, y_train)

You can view the optimum parameters from your input by calling:

grid.best_params_

>>> {'n_neighbors': ?}
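To see whether the returned value sits at the edge of the searched range, the per-candidate scores can be read from `cv_results_`. This is a minimal sketch only: the synthetic data from `make_classification` stands in for your own `X_train`, `y_train`, and the range 1 to 15 is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train, y_train (an assumption for illustration)
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = np.arange(1, 16)  # illustrative range of n_neighbors values
grid = GridSearchCV(KNeighborsClassifier(), {'n_neighbors': candidates}, cv=5)
grid.fit(X_train, y_train)

# Mean cross-validated accuracy per candidate, in the same order as `candidates`
mean_scores = grid.cv_results_['mean_test_score']
best_k = grid.best_params_['n_neighbors']

# If the best value is the smallest or largest candidate, the range may be too narrow
at_boundary = best_k in (candidates.min(), candidates.max())

for k, score in zip(candidates, mean_scores):
    print(k, round(score, 3))
print('best:', best_k, 'at boundary:', at_boundary)
```

If `at_boundary` is true, it is probably worth widening the range and searching again.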

You can automatically select an estimator with said optimum parameters by calling:

model = grid.best_estimator_

y_pred = model.predict(X_test) # best_estimator_ is already refit on the training data (refit=True is the default)

Note: you can find the optimum values of other parameters by adding them to the input dictionary param_grid.
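As a rough illustration of that note, the sketch below searches over `weights` and `p` alongside `n_neighbors`. The candidate values and the synthetic data are assumptions chosen for the example, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train, y_train (an assumption for illustration)
X_train, y_train = make_classification(n_samples=300, n_features=10, random_state=0)

# Every combination of these values is evaluated: 4 * 2 * 2 = 16 candidates
param_grid = {
    'n_neighbors': [3, 5, 11, 21],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],  # 1 = Manhattan distance, 2 = Euclidean distance
}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

n_candidates = len(grid.cv_results_['params'])
print(n_candidates, grid.best_params_)
```

The number of fits grows multiplicatively with each added parameter, so on a data set of the size in the question a coarse grid is likely the practical starting point.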

edited yesterday

answered yesterday

ukemi

1238

I think the question asks about how many neighbours to choose in the LocalOutlierFactor data pre-processor, not in applying the KNearestNeighbors Classifier. It is a more difficult problem in that Outlier Detection is in general an unsupervised task.
– Attack68
yesterday
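Following up on that comment: without labels there is no score for a grid search to optimise, so one heuristic, suggested in the original LOF paper, is to compute the LOF score over a range of neighbourhood sizes and rank each point by its maximum score. The sketch below is an illustration under assumptions, not a tuned recipe: the synthetic data, the range `k_values`, and the choice to look at the top 5 points are all hypothetical.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Synthetic stand-in data: 200 inliers around the origin, then 5 points far away
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 5)),
    rng.normal(loc=10.0, scale=0.5, size=(5, 5)),
])

k_values = [10, 20, 35, 50]  # hypothetical range of n_neighbors to sweep

scores = []
for k in k_values:
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(X)
    # negative_outlier_factor_ holds -LOF, so negate it: larger means more outlying
    scores.append(-lof.negative_outlier_factor_)

scores = np.array(scores)      # shape (len(k_values), n_samples)
max_lof = scores.max(axis=0)   # per-point maximum over all neighbourhood sizes

# Indices of the 5 points with the largest maximum LOF
top5 = np.argsort(max_lof)[-5:]
print(sorted(top5))
```

Note that the lower end of the range here is larger than the 5-point cluster; with a smaller `n_neighbors` those points would mostly see each other and could look like inliers. The scikit-learn documentation also notes that `n_neighbors=20` appears to work well in general, which makes it a reasonable first value to try.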

