The R package betareg provides:

Beta regression for modeling beta-distributed dependent variables on the open unit interval (0, 1), e.g., rates and proportions, see Cribari-Neto and Zeileis (2010, doi:10.18637/jss.v034.i02).

Extended-support beta regression models for variables on the closed unit interval [0, 1] with boundary observations at 0 and/or 1, see Kosmidis and Zeileis (2024, doi:10.48550/arXiv.2409.07233).

Alternative specifications of the classical beta regression model: Bias-corrected and bias-reduced estimation, finite mixture models, and recursive partitioning for (0, 1) beta regression, see Grün, Kosmidis, and Zeileis (2012, doi:10.18637/jss.v048.i11).

The stable version of `betareg`

is available on CRAN:

`install.packages("betareg")`

The latest development version can be installed from R-universe:

`install.packages("betareg", repos = "https://zeileis.R-universe.dev")`

The package is available under the General Public License version 3 or version 2

A nice first illustration of beta regression is the analysis of reading accuracy scores from primary school children from Smithson & Verkuilen (2006). Package and data can be loaded via:

```
library("betareg")
data("ReadingSkills", package = "betareg")
```

The reading `accuracy`

was scaled to be within (0, 1). Its
mean is explained by verbal `iq`

score with separate lines by
`dyslexia`

(control vs. dyslexic). The precision parameter is
explained by main effects of the two explanatory variables. More details
are provided in `?ReadingSkills`

.

```
<- betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
br summary(br)
##
## Call:
## betareg(formula = accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)
##
## Quantile residuals:
## Min 1Q Median 3Q Max
## -2.3625 -0.5872 0.3026 0.9425 1.5874
##
## Coefficients (mean model with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1232 0.1428 7.864 3.73e-15 ***
## dyslexia -0.7416 0.1428 -5.195 2.04e-07 ***
## iq 0.4864 0.1331 3.653 0.000259 ***
## dyslexia:iq -0.5813 0.1327 -4.381 1.18e-05 ***
##
## Phi coefficients (precision model with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.3044 0.2227 14.835 < 2e-16 ***
## dyslexia 1.7466 0.2623 6.658 2.77e-11 ***
## iq 1.2291 0.2672 4.600 4.23e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Type of estimator: ML (maximum likelihood)
## Log-likelihood: 65.9 on 7 Df
## Pseudo R-squared: 0.5756
## Number of iterations: 25 (BFGS) + 1 (Fisher scoring)
```

The regression summary shows that `accuracy`

increases
with `iq`

for the control group but not for the dyslexic
group (even slightly decreases). This can be brought out more clearly
graphically. This also highlights that the model employs a logit link so
that the fitted curves always remain within (0, 1).

```
<- palette.colors()[c(4, 8)]
pal <- c(19, 17)
pch plot(accuracy ~ iq, data = ReadingSkills, col = pal[dyslexia], pch = pch[dyslexia])
<- -30:30/10
iq lines(iq, predict(br, newdata = data.frame(dyslexia = "no", iq = iq)), col = pal[1], lwd = 2)
lines(iq, predict(br, newdata = data.frame(dyslexia = "yes", iq = iq)), col = pal[2], lwd = 2)
legend("topleft", c("Control", "Dyslexic"), pch = pch, col = pal, bty = "n")
```

For going beyond this basic analysis the following extensions can be considered.

To analyze the original accuracy scores in [0, 1] (without scaling
the perfect scores of `1`

to `0.99`

), use the
variable `accuracy1`

(instead of `accuracy`

) in
the code above. The `betareg()`

model then automatically
estimates an additional exceedence parameter that accounts for the
boundary probability of a perfect score. For this data set, most
coefficients shrink a bit, rendering some coefficients only weakly
significant but the qualitative interpretations still remain
similar.

`betareg(accuracy1 ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills)`

See Kosmidis and Zeileis (2024) and the documentation of
`betareg()`

for more details.

Bias-reduced estimation (instead of the default maximum likelihood
estimation) can be used by adding the argument `type = "BR"`

in `betareg()`

. This slightly shrinks all coefficient
estimates but, on this data, leads to qualitatively identical
results.

`betareg(accuracy ~ dyslexia * iq | dyslexia + iq, data = ReadingSkills, type = "BR")`

See Grün et al. (2012) and the documentation of
`betareg()`

for more details.

To find subgroups in a beta regression by recursively splitting
subsamples, beta regression trees can be used. Here, this strategy can
be used to figure out the different `iq`

effects by
`dyslexia`

rather than fixing the variables’ interaction in
advance.

`betatree(accuracy ~ iq | iq, ~ dyslexia + ..., data = ReadingSkills, minsize = 10)`

See Grün et al. (2012) and the documentation of
`betatree()`

for more details.

To find clusters in a beta regression finite mixtures of beta
regressions can be used. Here, this technique can be employed to find
the different `iq`

effects in the data without even requiring
the `dyslexia`

information.

`betamix(accuracy ~ iq, data = ReadingSkills, k = 3, ...)`

See Grün et al. (2012) and the documentation of
`betamix()`

for more details.