Aggregate NumPy array with condition as mask

I have a matrix $b$ with elements:
$$b =
\begin{pmatrix}
0.01 & 0.02 & \cdots & 1 \\
0.01 & 0.02 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
0.01 & 0.02 & \cdots & 1 \\
\end{pmatrix}
$$
Through a series of vectorised calculations, $b$ is used to compute $a$, another matrix with the same dimensions/shape as $b$:
$$a =
\begin{pmatrix}
3 & 5 & \cdots & 17 \\
2 & 6 & \cdots & 23 \\
\vdots & \vdots & \ddots & \vdots \\
4 & 3 & \cdots & 19 \\
\end{pmatrix}
$$

At this point it is important to note that the elements of $a$ and $b$ have a one-to-one correspondence. The different values along a row (let's call them $\sigma$), $0.01, 0.02, \ldots$, are the different parameters for a series of simulations that I'm running. Hence, for a fixed value of, say, $\sigma = 0.01$, the length of its column corresponds to the total number of "simulations" I'm running for that particular parameter. If you know Python vectorisation then you'll start to understand what I'm doing.



It is known that the higher $\sigma$ is, the more simulations for that particular $\sigma$ will have a value higher than 5, i.e. more of the matrix elements along a column will have a value bigger than 5. Essentially what I'm doing is vectorising $N$ (columns) different simulations for $M$ (rows) different parameters. Now I wish to find the value of $\sigma$ for which the number of simulations with a result bigger than 5 exceeds 95% of the total number of simulations.



To put it more concisely, for a $\sigma$ of 0.02, each simulation would have results of $$5, 6, \ldots, 3$$ with, say, a total of $N$ simulations. So let $$\kappa = \sum(\text{all the simulations that have values bigger than 5}),$$ i.e. the count of simulations whose value exceeds 5. I wish to find the FIRST $\sigma$ for which
$$\frac{\kappa}{N} > 0.95,$$
i.e. the FIRST $\sigma$ for which more than 95% of the simulations have a value $> 5$.



The code that I have written is:



# say 10000 simulations for a particular sigma
SIMULATION = 10000

# say 100 different values of sigma ranging from 0.01 to 1
# this is equivalent to matrix b in mathjax above
SIGMA = np.ones((EXPERIMENTS, 100)) * np.linspace(0.01, 1, 100)

def return_sigma(matrix, simulation, sigma):
    """
    My idea here is I put in sigma and matrix and the total number of simulations.
    Each time using np.ndenumerate, looping over i and j to compare if the
    element values are greater than 5. If yes then I add 1 to counter, if no
    then continue. If the number of experiments with result bigger than 5 is
    bigger than 95% of the total number of experiments then I return that
    particular sigma.
    """
    counter = 0
    for (i, j), value in np.ndenumerate(matrix):
        if value[i, j] > 5:
            counter += 1
        if counter/experiments > 0.95*simulation:
            break
    return sigma[0, j]  # sigma[:, j] should all be the same anyway

# Now this can be run by:
print(return_sigma(a, SIMULATION, SIGMA))


which doesn't quite seem to work, and as I'm not well-versed with 2D slicing this is quite a challenging problem for me. Thanks in advance.



EDIT
I apologise for not giving away my calculation, as it's part of a coursework of mine. I have generated a for 15 different values of $\sigma$ with 15 simulations each, and here it is:



array([[ 6,  2, 12, 12, 14, 14, 11, 11,  9, 23, 15,  3, 10, 12, 10],
       [ 7,  7,  6,  9, 13,  8, 11, 17, 13,  8, 10, 16, 11, 16,  8],
       [14,  6,  4,  8, 10,  9, 11, 14, 12, 14,  5,  8, 18, 29, 22],
       [ 4, 12, 12,  3,  7,  8,  5, 13, 13, 10, 14, 16, 22, 15, 22],
       [ 9,  8,  7, 12, 12,  6,  4, 13, 12, 12, 18, 20, 18, 14, 23],
       [ 8,  6,  8,  6, 12, 11, 11,  4,  9,  9, 13, 19, 13, 11, 20],
       [12,  8,  7, 17,  3,  9, 11,  5, 12, 24, 11, 12, 17,  9, 16],
       [ 4,  8,  7,  5,  6, 10,  9,  6,  4, 13, 13, 14, 18, 20, 23],
       [ 5, 10,  5,  6,  8,  4,  7,  7, 10, 11,  9, 22, 14, 30, 17],
       [ 6,  4,  5,  9,  8,  8,  4, 21, 14, 18, 21, 13, 14, 22, 10],
       [ 6,  2,  7,  7,  8,  3,  7, 19, 14,  7, 13, 12, 18,  8, 12],
       [ 5,  7,  6,  4, 13,  9,  4,  3, 20, 11, 11,  8, 12, 29, 14],
       [ 6,  3, 13,  6, 12, 10, 17,  6,  9, 15, 12, 12, 16, 12, 15],
       [ 2,  9,  8, 15,  5,  4,  5,  7, 16, 13, 20, 18, 14, 18, 14],
       [14, 10,  7, 11,  8, 13, 14, 13, 12, 19,  9, 10, 11, 17, 13]])


As you can see, as $\sigma$ gets higher, the number of matrix elements in each column that are bigger than 5 increases.
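
For example, a quick per-column count of the values above 5 makes this trend visible (a minimal sketch; a is assumed to hold the 15x15 array pasted above):

import numpy as np

# a: the 15x15 results array above (rows = simulations, columns = sigma values)
counts = (a > 5).sum(axis=0)   # number of simulations above 5, for each sigma column
print(counts)                  # the counts generally grow towards the right-hand columns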



EDIT 2
So now condition is giving me the right thing, which is an array of booleans:



array([[False, False, False, False, False, False, False, False,  True,  True],
       ....................................................................,
       [False, False, False, False, False, False, False,  True,  True,  True]])


So now the last row is the important thing here, as it corresponds to the parameters, which in this case are:



array([[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.5],
       ...........................................................,
       [0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5]])


Now the last row of condition is telling me that the first True happens at $\sigma = 0.4$, i.e. the first $\sigma$ for which more than 95% of the total simulations for that $\sigma$ have a result $> 5$. So now I need to return the index of condition where the first True in the last row appears, i.e. [i, j]. Then doing b[i, j] should give me the parameter I want (which I'm not sure your next few lines of code are doing).
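
For what it's worth, this last step can be done directly with np.argmax on the last row (a minimal sketch, assuming condition and b are the arrays shown above):

import numpy as np

last_row = condition[-1]                 # one boolean per sigma column
if last_row.any():
    j = int(np.argmax(last_row))         # index of the first True in that row
    first_sigma = b[-1, j]               # every row of b holds the same sigma values
    print(first_sigma)                   # should give 0.4 for the arrays shown above
else:
    print("threshold never reached")     # np.argmax would otherwise silently return 0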










  • If you could provide the matrix a (a dummy version at least) it would be helpful to check the output against your expectations. – n1k31t4, Mar 31 at 23:22

  • Hi, thanks for the reminder, I've added a to my edit. – user3613025, Mar 31 at 23:41

  • Thanks for adding the example. I'd just like to point out that this smaller test matrix will probably never hit your target threshold of 95% of simulations going over 5 ;) – n1k31t4, Mar 31 at 23:47

2 Answers

Answer by n1k31t4 (answered Mar 31 at 23:43):

I think I have understood your problem (mostly from the comments added in your function).



I'll show step by step what the logic is, building upon each previous step to get the final solution.



First we want to find all positions where the matrix is larger than 5:



a > 5    # returns a boolean array with true/false in each position


Now we want to check each row and count whether the proportion of matches (> 5) has reached a certain threshold, $0.95 N$. We can divide by the number of simulations (the number of columns) to essentially normalise by the number of simulations:



(a > 5) / SIMULATION    # returns the value of one match


These values are required to sum to your threshold for an experiment to be valid.



Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the number of matches for each experiment (each row).



np.cumsum((a > 5) / SIMULATION, axis=1)     # still same shape as b


Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:



## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SIMULATION)
## because we already "normalised" the values within the cumsum.
condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
mask = np.where(condition)


I broke it down now as the expressions are getting long.



That gave us the i and j coordinates of the places where the condition was True. We just want the place where the threshold was first breached, so we need the indices of the first occurrence in each row:



valid_rows = np.unique(mask[0], return_index=True)[1]    # [1] gets the indices themselves


Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:



valid_cols = mask[1][valid_rows]


So now you can get the corresponding values from the parameter matrix using these valid rows/columns:



params = b[valid_rows, valid_cols]




If this is correct, it should be significantly faster than your solution because it avoids looping over the 2D array and instead utilises NumPy's vectorised methods and ufuncs.
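
Putting those steps together in one place (just a sketch using this answer's per-row reading, where each row of a is one experiment, and a, b, SIMULATION are as defined in the question; per the comments below, swap to axis=0 if each column instead holds one parameter's simulations):

import numpy as np

# running fraction of matches (> 5) across each row of a
running = np.cumsum((a > 5) / SIMULATION, axis=1)
condition = running > 0.95               # True from the column where a row crosses 95%

rows, cols = np.where(condition)         # coordinates of every True cell, in row-major order
unique_rows, first_pos = np.unique(rows, return_index=True)
first_cols = cols[first_pos]             # first column at which each row crossed the threshold

params = b[unique_rows, first_cols]      # corresponding parameter values from b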






  • Hi, I'd really love to try your method but it's really late here and I can barely open my eyes, so I'm going to try it tomorrow and let you know. Cheers. – user3613025, Mar 31 at 23:45

  • Your method seems to be doing fine until I try to print mask, where it just keeps giving me an empty array, and subsequently valid_rows, valid_cols and params all become empty arrays too. Even if the first $\sigma$ value had already given me over 95% of > 5, your params should still be returning the first $\sigma$ value, right? Happy to send you my data and code in private if you'd like. – user3613025, Apr 2 at 13:51

  • From your comment it sounds like I interpreted the logic incorrectly. If each value must be over (0.95 * 5), then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulations are over 5. That sounds different to your description. – n1k31t4, Apr 2 at 16:08

  • Sorry, let me explain it more clearly. Looking at the example a output that I gave above, each column represents the set of simulations being run with a specific value of $\sigma$, in increasing order (across the rows). I would like to find the first $\sigma$ value for which the number of elements in that set of simulations with values > 5 is bigger than 95% of the number of simulations. So for 100 simulations (each matrix element is one simulation), if 96 of them turn out to be > 5 (more than 95% of the total simulations), I want that particular $\sigma$ value. – user3613025, Apr 2 at 16:42

  • So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as altering condition to perform np.cumsum(..., axis=0)... ? – n1k31t4, Apr 2 at 23:56


Answer by user3658307 (answered Mar 31 at 23:34):

Is this helpful?



import numpy as np, numpy.random as npr
N_sims = 15  # sims per sigma
N_vals = 15  # num sigmas
# Parameters
SIGMA = np.ones((N_sims, N_vals)) * np.linspace(0.01, 1, N_vals)
# Generate "results" :3 (i.e., the matrix a)
RESULTS = npr.random_integers(low=1, high=10, size=SIGMA.shape)
for i in range(N_vals):
    RESULTS[:, i] += npr.random_integers(low=0, high=1, size=(N_sims)) + i // 3
print("SIGMA\n", SIGMA)
print("RESULTS\n", RESULTS)
# Mark the positions > 5
more_than_five = RESULTS > 5
print("more_than_five\n", more_than_five)
# Count how many are greater than five, per column (i.e., per sigma)
counts = more_than_five.sum(axis=0)
print('COUNTS\n', counts)
# Compute the proportions (so, 1 if all exps were > 5)
proportions = counts.astype(float) / N_sims
print('Proportions\n', proportions)
# Find the first time it is larger than 0.95
first_index = np.argmax(proportions > 0.95)
print('---\nFIRST INDEX\n', first_index)
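
If this matches what you're after, the corresponding $\sigma$ value can then be read off the parameter matrix (a small follow-up sketch reusing the names above; note that np.argmax returns 0 when no proportion exceeds 0.95, so that case is checked explicitly):

if (proportions > 0.95).any():
    first_sigma = SIGMA[0, first_index]   # all rows of SIGMA are identical
    print("first sigma over the 95% threshold:", first_sigma)
else:
    print("no sigma reached the 95% threshold")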




