How to decide how many n_neighbors to consider while implementing LocalOutlierFactor?

I have a data set with 134,000 rows and 200 columns. I am trying to identify the outliers in the data set using LocalOutlierFactor from scikit-learn. Although I understand how the algorithm works, I am unable to decide on a value of n_neighbors for my data set.

Kindly suggest.

  • Use grid search to find the optimal number of neighbors.
    – Ethan
    2 days ago

  • This paper may be of interest: Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection (Feb 5, 2019)
    – ukemi
    yesterday
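
Ethan's grid-search suggestion can be sketched for LocalOutlierFactor roughly as follows. This is only a sketch under stated assumptions: the data is synthetic, and a set of known outlier labels (`y_true`) is assumed to be available for scoring, which is often not the case in practice. The candidate values and the ROC AUC criterion are illustrative choices, not recommendations from this thread.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Synthetic stand-in for the real data: a dense Gaussian cluster plus
# a few uniformly scattered points that play the role of outliers.
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 5))
X_outliers = rng.uniform(low=-8.0, high=8.0, size=(25, 5))
X = np.vstack([X_inliers, X_outliers])
# Assumed labels: 1 marks an outlier, 0 an inlier.
y_true = np.r_[np.zeros(500, dtype=int), np.ones(25, dtype=int)]

candidates = [5, 10, 20, 35, 50]  # hypothetical grid of n_neighbors values
auc_by_k = {}
for k in candidates:
    lof = LocalOutlierFactor(n_neighbors=k)
    lof.fit(X)
    # negative_outlier_factor_ is the negated LOF, so flip the sign:
    # larger scores then mean "more outlying".
    scores = -lof.negative_outlier_factor_
    auc_by_k[k] = roc_auc_score(y_true, scores)

best_k = max(auc_by_k, key=auc_by_k.get)
print(auc_by_k)
print("best n_neighbors on this toy data:", best_k)
```

With 134,000 rows, refitting LOF for every candidate may be slow, so running the sweep on a random subsample first is one way to keep it manageable.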
















python scikit-learn outlier hyperparameter-tuning k-nn






asked 2 days ago by Neha Bhushan (edited yesterday by ukemi)


1 Answer

Grid search is the usual way to find the best parameters in these situations:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

n = 30  # Max number of neighbours you want to consider
# n_neighbors must be at least 1, so the range starts at 1 rather than 0
param_grid = {'n_neighbors': np.arange(1, n + 1)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid)

You can then fit the grid to your data to find the best values among those you provided. These need not be global optima, and if the returned value lies at either end of your input range it may not even be a local optimum:

grid.fit(X_train, y_train)

You can view the best parameters from your input by calling:

>>> grid.best_params_
{'n_neighbors': ?}

You can obtain an estimator with these parameters, already refit on the training data, by calling:

model = grid.best_estimator_  # refit on X_train, y_train (refit=True is the default)
y_pred = model.predict(X_test)

Note: you can find the best values of other parameters by adding them to the input dictionary param_grid.

  • I think the question asks about how many neighbours to choose for the LocalOutlierFactor data pre-processor, not for the KNeighborsClassifier. That is a more difficult problem, because outlier detection is in general an unsupervised task.
    – Attack68
    yesterday
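
When no labels are available, one heuristic that is sometimes used is to avoid committing to a single n_neighbors: compute LOF scores over a range of values and rank each point by its maximum score across that range. A minimal sketch, assuming synthetic data and an arbitrary range (`k_values` is a hypothetical choice, not a recommendation):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(42)
X = np.vstack([
    rng.normal(size=(300, 3)),            # bulk of the data
    rng.uniform(-10, 10, size=(10, 3)),   # a few scattered points
])

k_values = range(10, 51, 10)  # hypothetical range of n_neighbors to sweep

# One column of LOF scores per value of n_neighbors (larger = more outlying).
scores = np.column_stack([
    -LocalOutlierFactor(n_neighbors=k).fit(X).negative_outlier_factor_
    for k in k_values
])

# Aggregate across the sweep: each point is ranked by its highest LOF.
max_lof = scores.max(axis=1)

# Indices of the ten most outlying points under this aggregate score.
top10 = np.argsort(max_lof)[::-1][:10]
print(top10)
```

Checking how stable the top-ranked points are across the columns of `scores` gives a rough sense of how sensitive the result is to n_neighbors.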











answered yesterday by ukemi (edited yesterday)