Aggregate NumPy array with condition as mask
I have a matrix $b$ with elements:
$$b =
\begin{pmatrix}
0.01 & 0.02 & \cdots & 1 \\
0.01 & 0.02 & \cdots & 1 \\
\vdots & \vdots & \ddots & \vdots \\
0.01 & 0.02 & \cdots & 1 \\
\end{pmatrix}$$
Through a series of vectorised calculations, $b$ is used to compute $a$, another matrix with the same dimensions/shape as $b$:
$$a =
\begin{pmatrix}
3 & 5 & \cdots & 17 \\
2 & 6 & \cdots & 23 \\
\vdots & \vdots & \ddots & \vdots \\
4 & 3 & \cdots & 19 \\
\end{pmatrix}$$
At this point it is important to note that the elements of $a$ and $b$ have a one-to-one correspondence. The different values along a row (let's call them $\sigma$), i.e. $0.01, 0.02, \ldots$, are different parameters for a series of simulations that I'm running. Hence for a fixed value of, say, $\sigma = 0.01$, the length of its column corresponds to the total number of "simulations" I'm running for that particular parameter. If you know Python vectorisation then you'll start to see what I'm doing.
It is known that the higher the $\sigma$, the more simulations for that particular $\sigma$ will have a value higher than 5, i.e. more of the matrix elements along a column will have a value bigger than 5. Essentially what I'm doing is vectorising $N$ (columns) different simulations for $M$ (rows) different parameters. Now I wish to find the value of $\sigma$ for which the total number of simulations with a result bigger than 5 exceeds 95% of the total number of simulations.
To put it more concisely: for a $\sigma$ of 0.02, the simulations would have results such as $$5, 6, \ldots, 3$$ with, say, $N$ simulations in total. So let $$\kappa = \sum (\text{all the simulations that have values bigger than } 5),$$ i.e. the number of simulations whose value is greater than 5. I wish to find the FIRST $\sigma$ for which
$$\frac{\kappa}{N} > 0.95,$$
i.e. the FIRST $\sigma$ for which more than 95% of the simulations have a value $> 5$.
The code that I have written is:
import numpy as np

# say 10000 simulations for a particular sigma
SIMULATION = 10000

# say 100 different values of sigma ranging from 0.01 to 1
# this is equivalent to matrix b in the MathJax above
SIGMA = np.ones((EXPERIMENTS, 100)) * np.linspace(0.01, 1, 100)

def return_sigma(matrix, simulation, sigma):
    """
    My idea here is that I put in sigma, the matrix and the total number of
    simulations. Using np.ndenumerate I loop over i and j and check whether
    the element value is greater than 5. If yes I add 1 to a counter, if not
    I continue. Once the number of experiments with a result bigger than 5 is
    bigger than 95% of the total number of experiments, I return that
    particular sigma.
    """
    counter = 0
    for (i, j), value in np.ndenumerate(matrix):
        if value[i, j] > 5:
            counter += 1
        if counter / experiments > 0.95 * simulation:
            break
    return sigma[0, j]  # sigma[:, j] should all be the same anyway

Now this can be run by:

print(return_sigma(a, SIMULATION, SIGMA))
which doesn't seem to quite work. I'm not well versed with 2D slicing, so this is quite a challenging problem for me. Thanks in advance.
EDIT
I apologise for not giving away my calculation, as it's part of a piece of coursework. I have generated a (the results matrix) for 15 different values of $\sigma$ with 15 simulations each; here it is:
array([[ 6, 2, 12, 12, 14, 14, 11, 11, 9, 23, 15, 3, 10, 12, 10],
[ 7, 7, 6, 9, 13, 8, 11, 17, 13, 8, 10, 16, 11, 16, 8],
[14, 6, 4, 8, 10, 9, 11, 14, 12, 14, 5, 8, 18, 29, 22],
[ 4, 12, 12, 3, 7, 8, 5, 13, 13, 10, 14, 16, 22, 15, 22],
[ 9, 8, 7, 12, 12, 6, 4, 13, 12, 12, 18, 20, 18, 14, 23],
[ 8, 6, 8, 6, 12, 11, 11, 4, 9, 9, 13, 19, 13, 11, 20],
[12, 8, 7, 17, 3, 9, 11, 5, 12, 24, 11, 12, 17, 9, 16],
[ 4, 8, 7, 5, 6, 10, 9, 6, 4, 13, 13, 14, 18, 20, 23],
[ 5, 10, 5, 6, 8, 4, 7, 7, 10, 11, 9, 22, 14, 30, 17],
[ 6, 4, 5, 9, 8, 8, 4, 21, 14, 18, 21, 13, 14, 22, 10],
[ 6, 2, 7, 7, 8, 3, 7, 19, 14, 7, 13, 12, 18, 8, 12],
[ 5, 7, 6, 4, 13, 9, 4, 3, 20, 11, 11, 8, 12, 29, 14],
[ 6, 3, 13, 6, 12, 10, 17, 6, 9, 15, 12, 12, 16, 12, 15],
[ 2, 9, 8, 15, 5, 4, 5, 7, 16, 13, 20, 18, 14, 18, 14],
[14, 10, 7, 11, 8, 13, 14, 13, 12, 19, 9, 10, 11, 17, 13]])
As you can see, as $\sigma$ gets higher, the number of matrix elements in each column that are bigger than 5 increases.
EDIT 2
So now condition is giving me the right thing, which is an array of booleans:
array([[False, False, False, False, False, False, False, False, True, True],
       ....................................................................,
       [False, False, False, False, False, False, False, True, True, True]])
The last row is the important thing here, as it corresponds to the parameters, in this case
array([[0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.5],
       ...........................................................,
       [0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5]])
Now the last row of condition is telling me that the first True happens at $\sigma = 0.4$, i.e. that is the first $\sigma$ for which more than 95% of its simulations give a result > 5. So now I need to return the index [i, j] of condition at which the first True in the last row appears; doing b[i, j] should then give me the parameter I want. (I'm not sure whether your next few lines of code are doing that.)
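In other words, I think something like this sketch is what I'm after (assuming condition and b are the arrays shown above):

import numpy as np

# column index of the first True in the last row of condition
# (np.argmax returns the position of the first maximum, i.e. the first True;
#  note it would also return 0 if the row contained no True at all)
j = np.argmax(condition[-1])
first_sigma = b[-1, j]  # the parameter value for that column, 0.4 in the example above
print(first_sigma)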
Tags: python, numpy, dataset, data
If you could provide the matrix a (a dummy version at least) it would be helpful to check output against your expectations.
– n1k31t4, Mar 31 at 23:22
Hi, thanks for the reminder, I've added a to my edit.
– user3613025, Mar 31 at 23:41
Thanks for adding the example. I'd just like to point out that this smaller test matrix will probably never hit your target threshold of 95% of simulations going over 5 ;)
– n1k31t4, Mar 31 at 23:47
2 Answers
I think I have understood your problem (mostly from the comments added in your function). I'll show the logic step by step, building on each previous step to get to the final solution.
First we want to find all positions where the matrix is larger than 5:
a > 5  # returns a boolean array with True/False in each position
Now we want to check each row to count whether the proportion of matches (> 5) has reached a certain threshold, $N \times 0.95$. We can divide by the number of simulations (the number of columns) to essentially normalise by the number of simulations:
(a > 5) / SIMULATION  # returns the value of one match
These values are required to sum to your threshold for an experiment to be valid.
Now we cumulatively sum across each row. As the True/False array is ones and zeros, we now have a running total of the number of matches for each experiment (each row):
np.cumsum((a > 5) / SIMULATION, axis=1)  # still the same shape as b
Now we just need to find out where (in each row) the sum of matches reaches your threshold. We can use np.where:
## EDIT: we only need to check the cumsum is greater than 0.95 and not (0.95 * SIMULATION)
## because we already "normalised" the values within the cumsum.
condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
mask = np.where(condition)
I broke it down here as the expressions are getting long.
That gives us the i and j coordinates of the places where the condition was True. We just want to find the place where we first breached the threshold, so we want the indices of the first time in each row:
valid_rows = np.unique(mask[0], return_index=True)[1]  # [1] gets the indices themselves
Now we can simply use these indices to get the first index in each valid row, where the threshold was breached:
valid_cols = mask[1][valid_rows]
So now you can get the corresponding values from the parameter matrix using these valid rows/columns:
params = b[valid_rows, valid_cols]
If this is correct, it should be significantly faster than your solution, because it avoids looping over the 2D array and instead uses NumPy's vectorised methods and ufuncs.
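For reference, the individual steps above chained together (just a sketch that repeats the same lines, assuming a, b and SIMULATION are defined as in the question):

import numpy as np

# cumulative, normalised count of values > 5, thresholded at 95%
condition = np.cumsum((a > 5) / SIMULATION, axis=0) > 0.95
mask = np.where(condition)  # row indices and column indices of the True entries

# positions of the first True entry for each distinct row index in mask[0]
valid_rows = np.unique(mask[0], return_index=True)[1]
valid_cols = mask[1][valid_rows]  # the column where that row first crossed the threshold

params = b[valid_rows, valid_cols]  # look up the corresponding parameter values
print(params)

Whether axis=0 or axis=1 is the right choice depends on whether the simulations for one sigma run down a column or along a row, which is what the comments below try to pin down.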
Hi, I'd really love to try your method but it's really late here and I can barely open my eyes, so I'm going to try it tomorrow and let you know. Cheers.
– user3613025, Mar 31 at 23:45
Your method seems to be doing fine until I tried to print mask, where it'd just keep giving me an empty array, and subsequently valid_rows, valid_cols and params all become empty arrays too. Even if the first $\sigma$ value had already given me over 95% of results > 5, your params should still be returning that first $\sigma$ value, right? Happy to send you my data/code in private if you'd like.
– user3613025, Apr 2 at 13:51
From your comment it sounds like I interpreted the logic incorrectly. If each value must be over (0.95 * 5) then you should just change the condition line to match your needs. My line checks that more than 95% of the experiment's simulations are over 5. That sounds different to your description.
– n1k31t4, Apr 2 at 16:08
Sorry, let me explain it more clearly. Looking at the example a output that I gave above, each column represents the set of simulations being run with a specific value of $\sigma$, in increasing order (across the rows). I would like to find the first $\sigma$ value for which the number of elements in that set of simulations with values > 5 is bigger than 95% of the number of simulations. So for 100 simulations (each matrix element is one simulation), if 96 of them turn out to be > 5 (bigger than 95% of the total), I want that particular $\sigma$ value.
– user3613025, Apr 2 at 16:42
So each row in b is identical? And each column in a contains the results of num_rows experiments for the sigma value of that column? Could the solution then be as simple as altering condition to perform np.cumsum(..., axis=0)...?
– n1k31t4, Apr 2 at 23:56
Is this helpful?
import numpy as np, numpy.random as npr

N_sims = 15  # sims per sigma
N_vals = 15  # num sigmas

# Parameters
SIGMA = np.ones((N_sims, N_vals)) * np.linspace(0.01, 1, N_vals)

# Generate "results" :3 (i.e., the matrix a)
RESULTS = npr.random_integers(low=1, high=10, size=SIGMA.shape)
for i in range(N_vals):
    RESULTS[:, i] += npr.random_integers(low=0, high=1, size=(N_sims)) + i // 3

print("SIGMA\n", SIGMA)
print("RESULTS\n", RESULTS)

# Mark the positions > 5
more_than_five = RESULTS > 5
print("more_than_five\n", more_than_five)

# Count how many are greater than five, per column (i.e., per sigma)
counts = more_than_five.sum(axis=0)
print('COUNTS\n', counts)

# Compute the proportions (so, 1 if all exps were > 5)
proportions = counts.astype(float) / N_sims
print('Proportions\n', proportions)

# Find the first time it is larger than 0.95
first_index = np.argmax(proportions > 0.95)
print('---\nFIRST INDEX\n', first_index)
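To get the $\sigma$ value itself rather than just the column index, one could then read it off the parameter matrix (a small addition reusing the variables from the snippet above; note that np.argmax returns 0 when no proportion exceeds the threshold, so that case is checked separately):

if proportions[first_index] > 0.95:
    first_sigma = SIGMA[0, first_index]  # every row of SIGMA is identical
    print('FIRST SIGMA\n', first_sigma)
else:
    print('No sigma reached the 95% threshold')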