Web page about "On the convergance of the Benjamini-Hochberg procedure"
by Dean Palejev and Mladen Savov

Below is the workflow for calculating the theoretical bound for the power of the Benjamini-Hochberg procedure, described in our article, when the alternative has a Beta distribution.

Notation used: there are m tests; the parameters of the Beta are denoted by firstParameter and secondParameter in the code. The proportion of significant tests is propSignif, or 1 - γ; the significance level α is sometimes denoted also by q.

First we start with some functions, G being the cdf of the Beta, H is described in the article in Definition 2.3:

G <- function(x, firstParameter, secondParameter) {return(pbeta(x, firstParameter, secondParameter))}
H <- function(x, firstParameter, secondParameter) {return((G(x, firstParameter, secondParameter) - x)/x)}

In order to evaluate the theoretical bound described in the article, we need to find the zeroes of the following functions in getPower (see the comment before that function):

G1 <- function(x, firstParameter, secondParameter, eta, bound) {return(G(x, firstParameter, secondParameter) - bound/(1-eta))}
G2 <- function(x, firstParameter, secondParameter, eta, bound) {return(G(x, firstParameter, secondParameter) - bound/(1+eta))}

We can find a numerical solution for x_{α, γ} as per article [8] in the references (J. A. Ferreira and A. H. Zwinderman. On the Benjamini-Hochberg method. Ann. Statist., 34(4):1827-1849, 2006):

f <- function (x, q, gamma, firstParameter, secondParameter)
			{G(q*x, firstParameter, secondParameter)/(q*x) - 1 - (1-q)/(q*(1-gamma))}

This is the function c from the article, appearing first in Section 2.2. It is related to the famous Petrov's inequality (see Proposition S1.10 in the Supplementary Material):

cF <- function(epsilon) {return((1+epsilon)*log(1+epsilon) - epsilon)}

This is the function K_F^M from the article, appearing first in Lemma 2.5, it evalues the contribution in the theoretical bound due to the non-significant tests:

KFM <- function(x, m, gamma, epsilon, alpha)
{
	tmpSum <- 0
	for (t in (ceiling(m*x):m))
	{
		t1 <- t*alpha/m	
		tmpSum <- tmpSum + exp(-m*gamma*t1*cF(epsilon/gamma))
	}
			
	return(2*min( tmpSum ,exp(-2*m*(epsilon*alpha*x)^2/gamma)))
}

This is the function K_G^M from the article, appearing first in Lemma 2.5, it evalues the contribution in the theoretical bound due to the significant tests:

KGM <- function(x, m, gamma, epsilon, alpha, firstParameter, secondParameter)				
{
	gamma1 <- 1-gamma
	tmpSum <- 0
	
	for (t in (ceiling(m*x):m))
	{
		t1 <- t*alpha/m	
		tmpSum <- tmpSum + exp(-m*gamma1*t1*G(t1, firstParameter, secondParameter)*cF(t1/(gamma1*G(t1, firstParameter, secondParameter))))
	}
		
	return(2*min(tmpSum, exp(-2*m*(epsilon*alpha*x)^2/gamma1) ))
}

This calculates numerically x_{α, γ}:

get_x_alpha_gamma <- function(firstParameter, secondParameter, propSignif, q)			
{
	perc <- propSignif
	gamma <- 1 - perc

	solution <- uniroot(f, lower=10^(-50), upper=1, tol = 10^(-10), q = q, gamma = gamma, 
			firstParameter = firstParameter, secondParameter = secondParameter)$root

	return(solution)
}

This is the main function, given m, the parameters of the Beta, the proportion of significant tests and η, it calculates the probability that the power is between lowerBound and upperBound (denoted by powerBound) and a special case when upperBound = 1 (the probability is denoted by powerBound2), and also returns the terms in the respective formulas. Here lowerBound and upperBound are the left and right ends of the interval in the left-hand side of equation (2.30), namely lowerBound = (1-η)G(αx₁) and upperBound = (1+η)G(αx₂). Also, x1 and x2 are the respective x₁ and x₂ (if they exist) for which the previous equalities are satisfied.

getPower <- function(firstParameter, secondParameter, propSignif, m, lowerBound, upperBound, eta)
{
		
# if we can't solve G1 (equal signs at the end), there is not much to return
if(G1(10^(-50), firstParameter = firstParameter, secondParameter = secondParameter, eta = eta, bound = lowerBound) * 
	G1(1, firstParameter = firstParameter, secondParameter = secondParameter, eta = eta, bound = lowerBound) > 0)
{
tmpOutput <- rep(NA, 6)
names(tmpOutput) <- c("powerBound", "powerBound2", "term1", "term2", "term3", "term3a")
return(tmpOutput)
}


### step (4) of the workflow
y1 <- uniroot(G1, lower=10^(-50), upper=1, tol = 10^(-10), firstParameter = firstParameter, secondParameter = secondParameter, eta = eta, bound = lowerBound)$root
x1 <- y1/alpha			

### similar to step (4) of the workflow, but when the upper bound is not 1
y2 <- uniroot(G2, lower=10^(-50), upper=2, tol = 10^(-10), firstParameter = firstParameter, secondParameter = secondParameter, eta = eta, bound = upperBound)$root
x2 <- y2/alpha			


if (ceiling(m*x1)/m >= x_alpha_gamma)   ### the condition in steps (3) and (5) of the article workflow is not met
{tmpOutput <- rep(NA, 6)}



if (ceiling(m*x1)/m < x_alpha_gamma)
{
epsilon1 <- (1-gamma)/2 *(H(alpha*ceiling(m*x1)/m,firstParameter, secondParameter) - H(alpha*x_alpha_gamma, firstParameter, secondParameter))
term1 <- (KFM(x1, m, gamma, epsilon1, alpha) + KGM(x1, m, gamma, epsilon1, alpha, firstParameter, secondParameter))
term3a <- (exp(-m*cF(eta)*(1-gamma)*G(alpha*x1, firstParameter, secondParameter)))
tmp2 <- 1- term1 - term3a			### the terms from equation (2.42) from the article; these also appear among others in (2.41)


if ((upperBound >= (1+eta)*G(alpha, firstParameter, secondParameter)) || ((1+eta)*G(alpha*x_alpha_gamma, firstParameter, secondParameter) > upperBound))
### here we can calculate only one of the probabilities
{
	tmpOutput <- c(NA, tmp2, term1, NA, NA, term3a)
}

if ((upperBound < (1+eta)*G(alpha, firstParameter, secondParameter)) && ((1+eta)*G(alpha*x_alpha_gamma, firstParameter, secondParameter) <= upperBound))
{
epsilon2 <- (1-gamma)/2 *(H(alpha*x_alpha_gamma, firstParameter, secondParameter) - H(alpha*ceiling(m*x2)/m, firstParameter, secondParameter))

term2 <- (KFM(x2, m, gamma, epsilon2, alpha) + KGM(x2, m, gamma, epsilon2, alpha, firstParameter, secondParameter)) 
term3 <- (exp(-m*cF(eta)*(1-gamma)*G(alpha*x1, firstParameter, secondParameter)) + exp(-m*cF(eta)*(1-gamma)*G(alpha*x2, firstParameter, secondParameter)))

tmp <- 1 - term1 - term2 - term3     ### the terms from equation (2.41) from the article;

tmpOutput <- c(tmp, tmp2, term1, term2, term3, term3a)
}
}

names(tmpOutput) <- c("powerBound", "powerBound2", "term1", "term2", "term3", "term3a")
return(tmpOutput)
}

Now, an example.

We will be running the above function over the following values of η and will take the maximum power achieved:

etaValues <- seq(0.001, 0.5, 0.001)

The following parameters can to be changed depending on the number of tests, the parameters of the Beta and the proportion of significant tests:

m <- 2000				# number of tests
firstParameter <- 0.1			# first Beta parameter
secondParameter <- 5			# second Beta parameter
alpha <- 0.05				# significance level	
propSignif <- 0.1			# proportion significant tests

q <- alpha
gamma <- 1-propSignif

If firstParameter < 1 and secondParameter ≥ 1 from Lemma 2.4 we have that condition (M) is satisfied.
Now we calculate and print x_{α, γ} and the real power limit as in steps (1) and (2) from the article workflow.

print(x_alpha_gamma <- get_x_alpha_gamma(firstParameter, secondParameter, propSignif, q))
print(realPowerLimit <- G(alpha*x_alpha_gamma, firstParameter, secondParameter))

that shows that x_{α, γ} = 0.07297654; and the real power limit is 0.696926.

Now we want to find a bound for the probability that the power is betwen two numbers, lowerBound and upperBound, that are so that the real power limit is between them. Here we have chosen values that are about 80% and 120% of that real power limit, but these can be anything as long as the real power limit is between them. As mentioned in the article, this works best when the lower bound is not very close to the real power limit.

lowerBound <- as.numeric(format(realPowerLimit * 0.8, digits = 3))
upperBound <- min(as.numeric(format(realPowerLimit * 1.2, digits = 3)), 1)

Then for each η, we calculate the probability bounds

tmpResults <- NULL
for (eta in etaValues)
{
	try(a <-  getPower(firstParameter, secondParameter, propSignif, m, lowerBound, upperBound, eta))

	tmpResults <- rbind(tmpResults, c(eta, a))
	colnames(tmpResults)[1] <- "eta"
}

tmpResults <- data.frame(tmpResults)

Now, we find for which η the maximum bound was achieved, which is step (6) of the article workflow, these are the columns eta and powerBound in the output of the line below:

format(tmpResults[which(tmpResults$powerBound == max(tmpResults$powerBound, na.rm = TRUE)), ], digits = 3)

which yields the following output

      eta powerBound powerBound2  term1   term2 term3 term3a
164 0.164      0.627       0.797 0.0216 0.00978 0.342  0.182

Note that if the required upper bound is 1, there are 2 options, to look for the maximum powerBound or powerBound2, we think that using powerBound2 yields better results, e.g.:

lowerBound <- as.numeric(format(realPowerLimit * 0.8, digits = 3))
upperBound <- 1

tmpResults <- NULL
for (eta in etaValues)
{
	try(a <-  getPower(firstParameter, secondParameter, propSignif, m, lowerBound, upperBound, eta))

	tmpResults <- rbind(tmpResults, c(eta, a))
	colnames(tmpResults)[1] <- "eta"
}

tmpResults <- data.frame(tmpResults)

and then compare the results of

format(tmpResults[which(tmpResults$powerBound == max(tmpResults$powerBound, na.rm = TRUE)), ], digits = 3)
format(tmpResults[which(tmpResults$powerBound2 == max(tmpResults$powerBound2, na.rm = TRUE)), ], digits = 3)

that are

      eta powerBound powerBound2  term1   term2 term3 term3a
168 0.168      0.693       0.794 0.0398     0 	0.267  0.166

      eta powerBound powerBound2  term1   term2 term3 term3a
165 0.165      0.688       0.797 0.0253     0 	0.287  0.178

Web page about "On the convergance of the Benjamini-Hochberg procedure" by Dean Palejev and Mladen Savov

Below is the workflow for calculating the theoretical bound for the power of the Benjamini-Hochberg procedure, described in our article, when the alternative has a Beta distribution.

Notation used: there are m tests; the parameters of the Beta are denoted by firstParameter and secondParameter in the code. The proportion of significant tests is propSignif, or 1 - γ; the significance level α is sometimes denoted also by q.

Web page about "On the convergance of the Benjamini-Hochberg procedure"
by Dean Palejev and Mladen Savov