Having fewer averaged trials than features














Suppose I have an experiment with 70 features and 48 samples. The target variable is binary (0, 1), and the 48 samples are split evenly: 24 correspond to outcome 1 and the other 24 to outcome 0.

Note that the data I have is averaged: each of the 48 samples is the mean of "n" trials.

I am interested in using an SVM and obtaining the weight vector (70x1).
So I have fewer samples than features; however, each sample is the outcome of averaged trials.

Given that I don't have access to the raw data, what is the best way to deal with it? And does having the mean of trials as my data give an advantage over having 48 raw trials instead?
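For concreteness, the setup can be sketched with synthetic stand-in data (the array shapes and class balance match the description above; the values and the `C` hyperparameter are placeholders):

```python
# Synthetic stand-in for the setup described above: 48 samples x 70
# features, 24 samples per class. A linear SVM then yields one weight
# per feature, i.e. the desired 70x1 weight vector.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 70))     # 48 averaged samples, 70 features
y = np.repeat([0, 1], 24)         # balanced binary outcome

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_.reshape(-1, 1)      # weight vector, shape (70, 1)
print(w.shape)
```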



























feature-selection svm overfitting model-selection

asked yesterday

HaneenSu
          1 Answer
About dealing with the mean of trials: you cannot recover the original trials from the averages, so you have to trust the results of a classification model trained on the averaged data. Averaging is not a disadvantage per se: the mean of n independent trials has 1/n of the single-trial noise variance, so each of your 48 samples is a cleaner estimate of the underlying signal than a single raw trial would be.

About dealing with a dataset that has more features than samples, I would suggest the following steps:

1- Reduce the dimension of your data from 70 to 2 using PCA or t-SNE.

2- Visualize the projection with a scatter plot to see whether your labels are separable in the 2-D space.

3- If the PCA projection linearly separates the labels, your labels are likely close to linearly separable in the original 70-D space, so you could feed the PCA components (or even your original features) to a linear SVM.

4- If only the t-SNE embedding separates the labels, they are probably not linearly separable in the original 70-D space, because unlike PCA, t-SNE is not a linear transformation. In that case, apply a kernel SVM.

5- If none of the above works because of the lack of data, try low-variance classifiers such as Random Forest.

6- If Random Forest doesn't work either, try Bayesian methods such as Bayesian logistic regression: these methods place a prior on the parameter distributions and update it to a posterior using the training data, which compensates for the shortage of training data.
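Steps 1-3 can be sketched as follows, on synthetic stand-in data (the class means are shifted so the example is separable; in place of eyeballing a scatter plot, the sketch checks separability by scoring a linear SVM on the 2-D projection):

```python
# Sketch of steps 1-3: project 70 features to 2 PCA components, then
# use the training accuracy of a linear SVM on the projection as a
# stand-in for visually judging separability in the scatter plot.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# two classes with shifted means, so they are roughly separable
X = np.vstack([rng.normal(-1.0, 1.0, size=(24, 70)),
               rng.normal(+1.0, 1.0, size=(24, 70))])
y = np.repeat([0, 1], 24)

X2 = PCA(n_components=2).fit_transform(X)   # step 1: 70-D -> 2-D
# step 2 would be: plt.scatter(X2[:, 0], X2[:, 1], c=y)

acc = SVC(kernel="linear").fit(X2, y).score(X2, y)  # step 3 proxy
print(round(acc, 2))
```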



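On step 6: scikit-learn has no fully Bayesian logistic regression, but an L2-penalized logistic regression is exactly the MAP estimate under a zero-mean Gaussian prior on the weights, which captures the same idea (the prior regularizes the fit when samples are scarce). A minimal sketch, with the prior strength `C` chosen arbitrarily:

```python
# L2-penalized logistic regression as a MAP stand-in for Bayesian
# logistic regression: the L2 penalty corresponds to a zero-mean
# Gaussian prior on the weights; smaller C means a tighter prior.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 1.0, size=(24, 70)),
               rng.normal(+1.0, 1.0, size=(24, 70))])
y = np.repeat([0, 1], 24)

clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(X, y)
print(clf.coef_.shape)  # one weight per feature
```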



                edited yesterday

























                answered yesterday









pythinker
