Why do we choose principal components based on maximum variance explained?












I've seen many people choose the number of principal components for PCA based on maximum variance explained. So my question is: do we always have to choose principal components based on maximum variance explained? Is this applicable to all scenarios, e.g. text count vectors (BoW, tf-idf, ...) where the number of dimensions is really high?



Does maximum variance mean that most of the information about my data in the higher-dimensional space is captured in the lower-dimensional representation?



Usually I'd plot something like this to see the variance explained.



import numpy as np
import matplotlib.pyplot as plt

# pca is an already-fitted sklearn.decomposition.PCA instance
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of principal components')
plt.ylabel('Cumulative explained variance ratio')
plt.show()


[Plot: cumulative explained variance ratio against the number of principal components]
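From that curve I would then pick the smallest number of components that reaches some target fraction of the variance. A minimal sketch of what I mean (the 0.95 threshold and the random data below are just placeholders):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))    # stand-in for my real feature matrix

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# smallest number of components whose cumulative explained variance reaches 95%
k = int(np.argmax(cumvar >= 0.95)) + 1
print(k)

# scikit-learn can also do this directly: PCA(n_components=0.95, svd_solver='full')
# keeps just enough components to explain at least 95% of the variance.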










Tags: machine-learning, python, scikit-learn, pca






asked 2 days ago by user214
3 Answers






> do we always have to choose principal components based on maximum variance explained?

Yes. "Maximum variance explained" is closely related to the main objective, as follows.

Our main objective is: for a limited budget of $K$ dimensions, what information $\mathbf{a}=(a_1,\ldots,a_K)$ should we keep from the original data $\mathbf{x}=(x_1,\ldots,x_D)$ ($D \gg K$) in order to be able to reconstruct $\mathbf{x}$ from $\mathbf{a}$ as closely as possible?

If we only allow rotation and scaling of the original data, i.e. $a_k := \mathbf{x} \cdot \mathbf{v}_k$ for an unknown set of vectors $V_K=\{\mathbf{v}_k \mid \mathbf{v}_k \in \mathbb{R}^D,\ 1 \leq k \leq K\}$, and define the reconstruction error as
$$loss(\mathbf{x},V_K):=\left\| \mathbf{x}-\underbrace{\sum_{k=1}^{K}\overbrace{(\mathbf{x} \cdot \mathbf{v}_k)}^{a_k}\mathbf{v}_k}_{\hat{\mathbf{x}}} \right\|^2,$$
then the solution $V^*_K$ that minimizes this error is PCA. For the first dimension, PCA keeps the projection of the data on the vector $\mathbf{v}^*_1$ in the direction of largest data variance, namely $a^*_1$. For the second dimension, it keeps the projection on the vector $\mathbf{v}^*_2$ in the direction of second-largest data variance, namely $a^*_2$, and so on.

In other words, when we try to find the $K$-vector set $V_K$ that minimizes $loss(X,V_K)=\frac{1}{N}\sum_{n=1}^{N}loss(\mathbf{x}_n,V_K)$, the solution $V^*_K$ includes the vector $\mathbf{v}^*_k$ that lies in the direction of the $k$-th largest data variance.

Note that "ratio of variance explained" is a measure from statistics. Using the previous notation, it is defined as

$$R(X,V_K):=1 - \frac{loss(X,V_K)}{Var(X)}.$$

Since the variance of the original data $Var(X)$ does not depend on the solution, minimizing $loss(X,V_K)$ is equivalent to maximizing $R(X,V_K)$. For example, if $K=2$, then $V^*_2=\{\mathbf{v}^*_1, \mathbf{v}^*_2\}$ minimizes $loss(X,V_2)$ and equivalently maximizes $R(X,V_2)$. Ideally, if the original data $X$ can be perfectly reconstructed from $V_K$, then $R(X, V_K)$ is $1$.

> Does maximum variance mean that most of the information about my data in the higher dimension is captured in the lower dimension?

Yes. If we agree that "keep as much information as possible" is equivalent to "be able to reconstruct the data as closely as possible", then our objective $\min_{V_K} loss(X,V_K)$ formalizes "keep as much information as possible", and its solution is "maximum variance".
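As a quick numerical check of this equivalence, here is a minimal sketch added for illustration (not part of the original derivation; note that scikit-learn works with mean-centred data, whereas the formulas above use the raw $\mathbf{x}$). Reconstructing from the top $K$ principal directions and computing $1 - loss/Var(X)$ reproduces the cumulative explained_variance_ratio_:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # toy correlated data

K = 3
pca = PCA().fit(X)
Xc = X - pca.mean_                             # scikit-learn's PCA centres the data first

A = Xc @ pca.components_[:K].T                 # scores a_k = x . v_k
X_hat = A @ pca.components_[:K] + pca.mean_    # reconstruction from the top K directions

loss = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # average squared reconstruction error
total_var = np.sum(np.var(X, axis=0))              # Var(X)
print(1 - loss / total_var)                        # R(X, V_K)
print(pca.explained_variance_ratio_[:K].sum())     # same value, up to floating-point error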






answered yesterday by Esmailian (edited 21 hours ago)
Principal Component Analysis is commonly used in machine learning as a preprocessing step for dimensionality reduction. You can imagine that this is useful for things like visualization or for reducing the size of your training set. We want to maximize the variance so that we preserve as much information about the original data as possible and lose only a small amount.



In answer to your question: yes, high variance in this case means preserving, in a lower dimension, most of the information captured in the high-dimensional data. There is a mathematical intuition for this: when you project points orthogonally onto a line, could you revert back to the original points?
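To make that projection intuition concrete, here is a minimal sketch with made-up 2-D data: project the points onto the single highest-variance direction and map them back; the only thing you cannot recover is the spread along the discarded direction.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) * [3.0, 0.5]   # 2-D points stretched mostly along one direction

pca = PCA(n_components=1).fit(X)
Z = pca.transform(X)                 # what we keep: 1-D coordinates along the top direction
X_back = pca.inverse_transform(Z)    # best attempt to revert back to the original points

# the irrecoverable part is the variance along the dropped direction (roughly 0.25 here)
print(np.mean(np.sum((X - X_back) ** 2, axis=1)))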



On that note, if someone would like to provide the mathematical intuition explicitly, I would welcome that answer.






answered 2 days ago by Ethan

In addition to what has been said:

> Why do we choose principal components based on maximum variance explained?

Because the variance left to the rest of the components is precisely the residual you want to minimize when looking for the best representation of your data in fewer dimensions (the best mean-square linear representation, of course).

> do we always have to choose principal components based on maximum variance explained?

Yes, if dimensionality reduction is what you want.

However, there are applications where the residual components are the ones that tell the story :-)
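One concrete case where the residual components tell the story is anomaly detection; the sketch below is added for illustration and is not part of the original answer. A point that looks ordinary along the top components but lies far from the principal subspace shows up with a large reconstruction error:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 20))   # data lying near a 5-D subspace of R^20

pca = PCA(n_components=5).fit(X)

def residual_error(points):
    # squared distance to the principal subspace, i.e. what the kept components miss
    back = pca.inverse_transform(pca.transform(points))
    return np.sum((points - back) ** 2, axis=1)

normal_point = X[:1]                                  # a typical sample
odd_point = X[:1] + 3.0 * rng.normal(size=(1, 20))    # perturbed off the subspace

print(residual_error(normal_point))   # close to 0: well explained by the top components
print(residual_error(odd_point))      # large: the residual flags it as unusual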






answered yesterday by m0nzderr












