“Deep Noether's Theorem”: Building in Symmetry Constraints

If I have a learning problem that should have an inherent symmetry, is there a way to impose a symmetry constraint on the problem to enhance learning?

For example, if I am doing image recognition, I might want 2D rotational symmetry, meaning that a rotated version of an image should get the same result as the original.

Or if I am learning to play tic-tac-toe, then rotating the board by 90° should yield the same game play.

Has any research been done on this?










machine-learning

asked Aug 4 '17 at 16:44 – aidan.plenert.macdonald

  • Yes, some; e.g., Group Equivariant Convolutional Networks (code), Harmonic Networks: Deep Translation and Rotation Equivariance, Deep Rotation Equivariant Network, Exploiting Cyclic Symmetry in Convolutional Neural Networks, etc. You just don't see it much in the wild yet. – Emre, Aug 4 '17 at 18:05

  • @Emre Thanks! Do you know of any work done outside of CNNs? – aidan.plenert.macdonald, Aug 4 '17 at 23:27

  • No, I only have superficial knowledge of this niche. Notwithstanding, CNNs seem like a natural setting ... – Emre, Aug 5 '17 at 20:05

  • I should also mention Risi Kondor's PhD dissertation, Group theoretical methods in machine learning (pdf). – Emre, Aug 7 '17 at 4:42


















2 Answers

From Emre's comment above, Section 4.4 of Group theoretical methods in machine learning by Risi Kondor has detailed information and proofs about creating kernel methods that inherently have symmetries. I will summarize it in a hopefully intuitive way (I am a physicist, not a mathematician!).



Most ML algorithms have a matrix multiplication like,
\begin{align}
s_i &= \sum_j W_{ij}~x_j \\
&= \sum_j W_{ij}~(\vec{e}_j \cdot \vec{x})
\end{align}
with $\vec{x}$ being the input and $W_{ij}$ being the weights we wish to train.



Kernel Method



Enter the realm of kernel methods and let the algorithm handle input via,
\begin{align}
s_i &= \sum_j W_{ij}~k(e_j,~x)
\end{align}
where now we generalize to $x, e_j \in \mathcal{X}$.



Consider a group $G$ that acts on $\mathcal{X}$ via $x \rightarrow T_g(x)$ for $g \in G$. A simple way to make our algorithm invariant under this group is to make a kernel,
\begin{align}
k^G(x, y) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_g(y))
\end{align}
with $k(x, y) = k(T_g(x), T_g(y))$.
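
As a concrete illustration (a sketch of my own, not from the thesis), here is the group-averaged kernel in NumPy for the $\frac{\pi}{2}$-rotation group used in the example further down, with the rotation-invariant base kernel $k(x, y) = \| x - y \|^2$:

    import numpy as np

    def rotation(theta):
        # 2D rotation matrix R(theta).
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s], [s, c]])

    # The group here is the four pi/2 rotations (cyclic group C4); any finite
    # group of transformations T_g would work the same way.
    group = [rotation(n * np.pi / 2) for n in range(1, 5)]

    def k(x, y):
        # Base kernel ||x - y||^2, which satisfies k(x, y) = k(T_g x, T_g y).
        return np.sum((x - y) ** 2)

    def k_G(x, y):
        # Group-averaged kernel k^G(x, y) = (1/|G|) sum_g k(x, T_g(y)).
        return np.mean([k(x, T @ y) for T in group])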



So,
\begin{align}
k^G(x, T_h(y)) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{gh}(y)) \\
&= \frac{1}{|G|} \sum_{g \in G} k(x, T_{g}(y)) \\
&= \frac{1}{|G|} \sum_{g \in G} k(T_{g}(x), y)
\end{align}
where the second line relabels $gh \rightarrow g$ (right multiplication by $h$ merely permutes $G$), and the third applies the invariance $k(x, T_g(y)) = k(T_{g^{-1}}(x), y)$ and relabels again.



For $k(x, y) = x \cdot y$, which works for all unitary representations,
\begin{align}
k^G(x, T_h(y)) &= \left[ \frac{1}{|G|} \sum_{g \in G} T_{g}(x) \right] \cdot y
\end{align}



This yields a transformation matrix, $\frac{1}{|G|} \sum_{g \in G} T_g$, that can symmetrize the input to the algorithm.
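
A minimal sketch of that symmetrizer for a linear kernel. I illustrate it with a two-element reflection group of my own choosing, since for the full $\frac{\pi}{2}$-rotation group the matrix is identically zero (the four rotations sum to zero):

    import numpy as np

    # Symmetrizer P = (1/|G|) sum_g T_g for the linear kernel k(x, y) = x . y,
    # so that k^G(x, T_h(y)) = (P x) . y. The group {I, diag(-1, 1)} (a
    # reflection about the y-axis) is an assumption chosen for illustration.
    G = [np.eye(2), np.diag([-1.0, 1.0])]
    P = sum(G) / len(G)           # [[0, 0], [0, 1]]

    x = np.array([3.0, 5.0])
    print(P @ x)                  # [0. 5.]: the reflection-invariant part of x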



SO(2) Example



Actually, for simplicity, just the cyclic group of $\frac{\pi}{2}$ rotations.



Let us run linear regression on data $(\vec{x}_i, y_i) \in \mathbb{R}^2 \times \mathbb{R}$ where we expect a rotational symmetry.



Our optimization problem becomes,
\begin{align}
\min_{W_{j}} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\
\tilde{y}_i &= \sum_j W_{j}\, k_G(e_j, x_i) + b_i
\end{align}



The kernel $k(x, y) = \| x - y \|^2$ satisfies $k(x, y) = k(T_g(x), T_g(y))$. You could also use $k(x, y) = x \cdot y$ or a variety of other kernels.



Thus,
\begin{align}
k_G(e_j, x_i) &= \frac{1}{4} \sum_{n=1}^4 \| R(n\pi/2)~\vec{e}_j - \vec{x}_i \|^2 \\
&= \frac{1}{4} \sum_{n=1}^4 \left[ ( \cos(n\pi/2) - \vec{x}_{i1} )^2 + ( \sin(n\pi/2) - \vec{x}_{i2} )^2 \right] \\
&= \frac{1}{4} \left[ 2 \vec{x}_{i1}^2 + 2 \vec{x}_{i2}^2 + (1 - \vec{x}_{i1} )^2 + (1 - \vec{x}_{i2} )^2 + (1 + \vec{x}_{i1} )^2 + (1 + \vec{x}_{i2} )^2 \right] \\
&= \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1
\end{align}



Note that we needn't sum over $j$ because $k_G(e_j, x_i)$ is the same for both basis vectors. So our problem becomes,
\begin{align}
\min_{W} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\
\tilde{y}_i &= W \left[ \vec{x}_{i1}^2 + \vec{x}_{i2}^2 + 1 \right] + b_i
\end{align}



This yields the expected rotational symmetry!
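
A quick numeric check of the derivation (my own sketch), fitting $W$ and a single shared bias $b$ (a simplification of the per-sample $b_i$ above) by least squares on synthetic rotation-invariant data:

    import numpy as np

    def k_G(x, y):
        # Average of ||R(n*pi/2) x - y||^2 over the four rotations.
        Rs = [np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
              for t in np.pi / 2 * np.arange(1, 5)]
        return np.mean([np.sum((R @ x - y) ** 2) for R in Rs])

    # The averaged kernel collapses to x1^2 + x2^2 + 1, as derived above.
    x = np.array([0.3, -1.7])
    assert np.isclose(k_G(np.array([1.0, 0.0]), x), x @ x + 1.0)

    # Least-squares fit on synthetic data y = ||x||^2 + noise (an assumption),
    # using the single symmetrized feature k_G(e_j, x_i).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sum(X ** 2, axis=1) + 0.1 * rng.normal(size=200)
    phi = np.sum(X ** 2, axis=1) + 1.0
    A = np.stack([phi, np.ones_like(phi)], axis=1)
    W, b = np.linalg.lstsq(A, y, rcond=None)[0]
    print(W, b)   # roughly 1.0 and -1.0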



Tic-Tac-Toe



Example code can be seen here. It shows how we can create a matrix that encodes the symmetry and use it. Note that the results are really bad when I actually run it! I am working with other kernels at the moment.
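
A minimal sketch of the same idea for tic-tac-toe (my own, not the linked code): encode the board as a 9-vector and build the matrix that averages it over the four 90° rotations, i.e. the symmetrizer $\frac{1}{|G|} \sum_g T_g$ for the $C_4$ action permuting the nine cells.

    import numpy as np

    def rotation_permutation(n):
        # Permutation matrix for rotating a flattened 3x3 board by n*90 degrees.
        idx = np.rot90(np.arange(9).reshape(3, 3), k=n).ravel()
        P = np.zeros((9, 9))
        P[np.arange(9), idx] = 1.0
        return P

    # Symmetrizer (1/|G|) sum_g T_g for the C4 action on the nine cells.
    S = sum(rotation_permutation(n) for n in range(4)) / 4.0

    board = np.zeros(9)
    board[0], board[4] = 1.0, -1.0    # X in a corner, O in the centre
    print((S @ board).reshape(3, 3))  # corner mark spread over all four corners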






answered Aug 8 '17 at 0:22 – aidan.plenert.macdonald

  • Good job, Aidan! If you have time, you can write a more detailed blog post. The community will be most interested. – Emre, Aug 8 '17 at 20:37








  • Not sure what community you are referring to, but I started writing more. I wanted to find a way to estimate the optimal kernel given a set of data. So I optimized entropy on kernel space to intuitively get a new set of features that are symmetrically constrained and also maximally entropic (i.e., informative). Whether that is the right approach, I can't say. Just a warning: the math is a bit of a hack job right now and kind of straight out of stat mech. overleaf.com/read/kdfzdbyhpbbq – aidan.plenert.macdonald, Aug 10 '17 at 0:00

  • Is there any meaningful approach when the symmetry group is not known? – leitasat, Jun 7 '18 at 12:39

  • @leitasat How do you know it's symmetric if you don't know the group? – aidan.plenert.macdonald, Jun 7 '18 at 17:20

  • @aidan.plenert.macdonald From the data. Let's say we have 1000 sets of 100 pictures each, and within each set there are pictures of one object from different viewpoints. Can any algorithm "learn the idea" of SO(3) symmetry and use it on previously unseen objects? – leitasat, Jun 13 '18 at 13:11




















It turns out this is just the study of Invariant Theory applied to machine learning.






answered 17 mins ago – aidan.plenert.macdonald








