Having fewer averaged trials than features
Suppose I have an experiment with 70 features and 48 samples. The target variable is binary (0, 1), and the 48 samples are split evenly: 24 correspond to outcome 1 and the other 24 to outcome 0.
Note that my data is averaged: each of the 48 samples is the mean of "n" trials.
I am interested in using an SVM and obtaining the weight vector (70x1).
In other words, I have fewer samples than features, but each sample is the average of several trials.
If I don't have access to the raw data, what is the best way to deal with it? And does having the mean of the trials as my data offer any advantage over having 48 raw trials instead?
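A minimal sketch of fitting a linear SVM on 48 averaged samples and reading off the 70-dimensional weight vector, with synthetic data standing in for the real measurements (the regularization strength `C=0.1` is an illustrative choice, not a recommendation from the data):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 70))   # placeholder for the 48 averaged samples
y = np.repeat([0, 1], 24)       # 24 samples per class

# Standardize features; with more features than samples, a strong
# regularizer (small C) helps keep the weight vector from overfitting
clf = make_pipeline(StandardScaler(), LinearSVC(C=0.1, max_iter=10000))
clf.fit(X, y)

w = clf.named_steps["linearsvc"].coef_.ravel()  # the 70x1 weight vector
print(w.shape)  # (70,)

# Leave-one-out CV gives an honest accuracy estimate with only 48 samples
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(scores.mean())
```

Leave-one-out cross-validation is used here because, with only 48 samples, a single held-out test split would be too small to trust.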
feature-selection svm overfitting model-selection
asked yesterday
HaneenSu
62
1 Answer
Regarding the mean of the trials: you cannot recover the original trials from the averages, so you have to trust the results of a classification model trained on the averaged data.
Regarding a dataset with more features than samples, I would suggest the following steps:
1- Reduce the dimension of your data from 70 to 2 using PCA or t-SNE.
2- Visualize the result with a scatter plot to see whether your labels are linearly separable in the 2D space.
3- If the PCA projection linearly separates the labels, your labels are likely close to linearly separable in the original 70D space as well, so you can feed the PCA result (or even your original features) to a linear SVM.
4- If only the t-SNE projection linearly separates the labels, your labels are probably not linearly separable in the original 70D space, because unlike PCA, t-SNE is not a linear transformation. In this case, apply a kernel SVM to your data.
5- If none of the above works because of the lack of data, try low-variance classifiers such as Random Forest.
6- If Random Forest doesn't work either, try Bayesian methods such as Bayesian logistic regression: they place a prior on the parameters' distributions and update it to a posterior using the training data, which compensates for the shortage of training data.
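Steps 1-5 above can be sketched as follows; the data is synthetic and the hyperparameters (`C`, `perplexity`, `n_estimators`) are illustrative assumptions, not values from the original answer:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 70))   # placeholder for the averaged samples
y = np.repeat([0, 1], 24)

# Steps 1-2: project to 2D for visual inspection
# (scatter-plot X_pca and X_tsne colored by y to judge separability)
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)

# Step 3: if the PCA projection separates the classes, try a linear SVM
linear_acc = cross_val_score(
    SVC(kernel="linear", C=0.1), X, y, cv=LeaveOneOut()).mean()

# Step 4: if only the t-SNE projection separates them, try a kernel SVM
rbf_acc = cross_val_score(
    SVC(kernel="rbf", C=1.0, gamma="scale"), X, y, cv=LeaveOneOut()).mean()

# Step 5: fall back to a low-variance ensemble such as a random forest
rf_acc = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X, y, cv=LeaveOneOut()).mean()

print(linear_acc, rbf_acc, rf_acc)
```

On purely random data like this, all three accuracies should hover near chance; the point is the workflow, with leave-one-out cross-validation used throughout because 48 samples leave no room for a held-out test set.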
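For step 6, scikit-learn has no full Bayesian logistic regression, but an L2-penalized logistic regression is equivalent to the MAP estimate under a zero-mean Gaussian prior on the weights, so it captures the same regularizing idea (the value `C=0.05` is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, LeaveOneOut

rng = np.random.default_rng(1)
X = rng.normal(size=(48, 70))   # placeholder for the averaged samples
y = np.repeat([0, 1], 24)

# C = 1/lambda: a small C corresponds to a tight Gaussian prior
# (strong regularization), which helps when features outnumber samples
map_clf = LogisticRegression(C=0.05, max_iter=5000)
acc = cross_val_score(map_clf, X, y, cv=LeaveOneOut()).mean()
print(acc)
```

A fully Bayesian treatment (posterior over the weights rather than a point estimate) would need a probabilistic-programming library instead; this MAP sketch is only the simplest stand-in.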
edited yesterday
answered yesterday
pythinker