“Deep Noether's Theorem”: Building in Symmetry Constraints
If I have a learning problem that should have an inherent symmetry, is there a way to subject my learning problem to a symmetry constraint to enhance learning?
For example, if I am doing image recognition, I might want 2D rotational symmetry, meaning that a rotated version of an image should get the same result as the original.
Or if I am learning to play tic-tac-toe, then rotating the board by 90° should yield the same game play.
Has any research been done on this?
machine-learning
asked Aug 4 '17 at 16:44 by aidan.plenert.macdonald
Yes, some; e.g., Group Equivariant Convolutional Networks (code), Harmonic Networks: Deep Translation and Rotation Equivariance, Deep Rotation Equivariant Network, Exploiting Cyclic Symmetry in Convolutional Neural Networks etc. You just don't see it much in the wild yet.
– Emre
Aug 4 '17 at 18:05
@Emre Thanks! Do you know of any work done outside of CNNs?
– aidan.plenert.macdonald
Aug 4 '17 at 23:27
No, I only have superficial knowledge of this niche. Notwithstanding, CNNs seem like a natural setting ...
– Emre
Aug 5 '17 at 20:05
I should also mention Risi Kondor's PhD dissertation, Group theoretical methods in machine learning (pdf).
– Emre
Aug 7 '17 at 4:42
2 Answers
From Emre's comment above, Section 4.4 of Group theoretical methods in machine learning by Risi Kondor has detailed information and proofs about creating kernel methods that inherently have symmetries. I will summarize it in a hopefully intuitive way (I am a physicist, not a mathematician!).
Most ML algorithms have a matrix multiplication like
\begin{align}
s_i &= \sum_j W_{ij}\,x_j \\
&= \sum_j W_{ij}\,(\vec{e}_j \cdot \vec{x})
\end{align}
with $\vec{x}$ being the input and $W_{ij}$ being the weights we wish to train.
Kernel Method
Enter the realm of kernel methods and let the algorithm handle the input via
\begin{align}
s_i &= \sum_j W_{ij}\,k(e_j, x)
\end{align}
where we now generalize to $x, e_j \in \mathcal{X}$.
Consider a group $G$ that acts on $\mathcal{X}$ via $x \rightarrow T_g(x)$ for $g \in G$. A simple way to make our algorithm invariant under this group is to make a kernel,
\begin{align}
k^G(x, y) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_g(y))
\end{align}
with $k(x, y) = k(T_g(x), T_g(y))$.
So,
\begin{align}
k^G(x, T_h(y)) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{gh}(y)) \\
&= \frac{1}{|G|} \sum_{g \in G} k(x, T_{g}(y)) \\
&= \frac{1}{|G|} \sum_{g \in G} k(T_{g}(x), y)
\end{align}
For $k(x, y) = x \cdot y$, which works for all unitary representations,
\begin{align}
k^G(x, T_h(y)) &= \left[ \frac{1}{|G|} \sum_{g \in G} T_{g}(x) \right] \cdot y
\end{align}
This offers a transformation matrix that can symmetrize the input to the algorithm.
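The matrix form above is easy to check numerically. Here is a minimal sketch of my own (not from Kondor's thesis), using the two-element reflection group $\{I, \mathrm{diag}(1,-1)\}$ acting on $\mathbb{R}^2$ with the linear kernel; the group choice and all names are mine:

```python
import numpy as np

# Two-element group acting on R^2: identity and reflection across the x-axis.
group = [np.eye(2), np.diag([1.0, -1.0])]

# Symmetrizing matrix P = (1/|G|) * sum_g T_g; here it projects onto the x-axis.
P = sum(group) / len(group)

def k_G(x, y):
    """Group-averaged linear kernel: (1/|G|) sum_g x . T_g(y)."""
    return np.mean([x @ (T @ y) for T in group])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)

# k^G(x, y) equals (P x) . y, as derived above ...
assert np.isclose(k_G(x, y), (P @ x) @ y)
# ... and is invariant when y is transformed by a group element.
assert np.isclose(k_G(x, group[1] @ y), k_G(x, y))
```

One caveat: for the $\frac{\pi}{2}$-rotation group used in the next section, $\frac{1}{|G|}\sum_g T_g$ is exactly the zero matrix on $\mathbb{R}^2$ (since $R(\pi) = -I$ and $R(3\pi/2) = -R(\pi/2)$), so the linear kernel degenerates there; a different base kernel, like the squared distance below, stays informative.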
SO(2) Example
Actually just the cyclic group of $\frac{\pi}{2}$ rotations, for simplicity.
Let us run linear regression on data $(\vec{x}_i, y_i) \in \mathbb{R}^2 \times \mathbb{R}$ where we expect a rotational symmetry.
Our optimization problem becomes
\begin{align}
\min_{W_{j}} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\
\tilde{y}_i &= \sum_j W_{j}\,k_G(e_j, x_i) + b_i
\end{align}
The kernel $k(x, y) = \| x - y \|^2$ satisfies $k(x, y) = k(T_g(x), T_g(y))$. You could also use $k(x, y) = x \cdot y$ or a variety of other kernels.
Thus,
\begin{align}
k_G(e_j, x_i) &= \frac{1}{4} \sum_{n=1}^4 \| R(n\pi/2)\,\vec{e}_j - \vec{x}_i \|^2 \\
&= \frac{1}{4} \sum_{n=1}^4 \left[ ( \cos(n\pi/2) - \vec{x}_{i1} )^2 + ( \sin(n\pi/2) - \vec{x}_{i2} )^2 \right] \\
&= \frac{1}{4} \left[ 2 \vec{x}_{i1}^2 + 2 \vec{x}_{i2}^2 + (1 - \vec{x}_{i1} )^2 + (1 - \vec{x}_{i2} )^2 + (1 + \vec{x}_{i1} )^2 + (1 + \vec{x}_{i2} )^2 \right] \\
&= \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1
\end{align}
Note that we needn't sum over $j$ because the kernel value is the same for both basis vectors. So our problem becomes
\begin{align}
\min_{W} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\
\tilde{y}_i &= W \left[ \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1 \right] + b_i
\end{align}
Which yields the expected rotational symmetry!
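To sanity-check the algebra, here is a small numerical sketch of my own (the helper names are mine, not from any linked code): averaging the squared-distance kernel over the four $\pi/2$ rotations of $\vec{e}_j$ collapses to $\vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1$ for both basis vectors.

```python
import numpy as np

def rot(n):
    """Rotation matrix for an angle of n * pi / 2."""
    c, s = np.cos(n * np.pi / 2), np.sin(n * np.pi / 2)
    return np.array([[c, -s], [s, c]])

def k(x, y):
    """Base kernel: squared Euclidean distance."""
    return np.sum((x - y) ** 2)

def k_G(x, y):
    """Average of the base kernel over the four pi/2 rotations of x."""
    return np.mean([k(rot(n) @ x, y) for n in range(1, 5)])

x = np.array([0.3, -1.2])
for e in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    # Matches the closed form x1^2 + x2^2 + 1 for both basis vectors,
    # which is why the sum over j collapses to a single feature.
    assert np.isclose(k_G(e, x), x @ x + 1.0)
```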
Tic-Tac-Toe
Example code can be seen here. It shows how we can create a matrix that encodes the symmetry and use it in training. Note that the results are quite poor when I actually run it! I am working with other kernels at the moment.
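Since the linked code is not reproduced here, a hypothetical sketch of the same idea (the board encoding and helper names are mine): average a base kernel over the eight symmetries of the 3×3 board, the dihedral group $D_4$, so that symmetric positions get identical feature values.

```python
import numpy as np

def d4_orbit(board):
    """All eight images of a 3x3 board under rotations and reflections."""
    images = []
    for quarter_turns in range(4):
        r = np.rot90(board, quarter_turns)
        images.append(r)
        images.append(np.fliplr(r))
    return images

def k(a, b):
    """Base kernel on boards: elementwise dot product."""
    return float(np.sum(a * b))

def k_G(a, b):
    """Kernel averaged over the D4 orbit of the second argument."""
    return np.mean([k(a, img) for img in d4_orbit(b)])

# Hypothetical encoding: X = 1, O = -1, empty = 0.
board = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]])
probe = np.array([[0, 1, 0], [0, 0, 0], [0, -1, 0]])

# Rotating the board does not change the kernel value.
assert np.isclose(k_G(probe, np.rot90(board)), k_G(probe, board))
```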
answered Aug 8 '17 at 0:22 by aidan.plenert.macdonald (edited yesterday)
Good job, Aidan! If you have time, you can write a more detailed blog post. The community will be most interested.
– Emre
Aug 8 '17 at 20:37
Not sure what community you are referring to, but I started writing more. I wanted to find a way to estimate the optimal kernel given a set of data, so I optimized entropy on kernel space to intuitively get a new set of features that are symmetrically constrained and also maximally entropic (i.e., informative). Whether that is the right approach, I can't say. Just a warning: the math is a bit of a hack job right now, and kind of straight out of stat mech. overleaf.com/read/kdfzdbyhpbbq
– aidan.plenert.macdonald
Aug 10 '17 at 0:00
Is there any meaningful approach when the symmetry group is not known?
– leitasat
Jun 7 '18 at 12:39
@leitasat How do you know it's symmetric if you don't know the group?
– aidan.plenert.macdonald
Jun 7 '18 at 17:20
@aidan.plenert.macdonald from the data. Let's say we have 1000 sets of 100 pictures each, and within each set there are pictures of one object from different viewpoints. Can any algorithm "learn the idea" of SO(3) symmetry and use it on previously unseen objects?
– leitasat
Jun 13 '18 at 13:11
Turns out this is just the study of Invariant Theory applied to Machine Learning.
answered 17 mins ago by aidan.plenert.macdonald