MAPpoly is an R package to construct genetic maps in autopolyploids with even ploidy levels. This quick start guide will present some essential functions to construct a tetraploid potato map. Please refer to MAPpolyâ€™s Reference Manual and the Extended Tutorial for a comprehensive description of all functions.

There are several functions to read discrete and probabilistic dosage-based genotype data sets in MAPpoly. You can read a data set from TXT, CSV, VCF, fitPoly-generated or import it from the R packages polyRAD, polymapR, and updog. The data set distributed along with MAPpoly is a subset of markers from (Pereira et al., 2021) in CSV format. Let us read it into MAPpoly

```
library(mappoly)
file.name <- system.file("extdata/potato_example.csv", package = "mappoly")
dat <- read_geno_csv(file.in = file.name, ploidy = 4)
print(dat, detailed = T)
plot(dat)
```

The output figure shows a bar plot on the left-hand side with the number of markers in each allele dosage combination in \(P_1\) and \(P_2\), respectively. The upper-right plot contains the \(\log_{10}(p-value)\) from \(\chi^2\) tests for all markers, considering the expected segregation patterns under Mendelian inheritance.

Quality control (QC) procedures are fundamental to identify:

- Individuals from crosses other than \(P_1 \times P_2\)
- Individuals and markers that exceeds a defined threshold of missing data points
- Markers with distorted segregation
- Markers with the same genotypic information (redundant markers). Removed markers are positioned into the final map.

Depending on the data set, these procedures can be conducted in any order. Let us first remove individuals from crosses other than \(P_1 \times P_2\). When using the interactive function, the user needs to select a polygon around the individuals to be removed by clicking its vertices and pressing Esc.

Now, let us filter out markers and individuals with more than 5% of missing data. You can update the threshold interactively

```
dat <- filter_missing(dat, type = "marker", filter.thres = .05)
dat <- filter_missing(dat, type = "individual", filter.thres = .05)
```

Finally, we can filter out markers with distorted segregation and redundant information. At this point, we do not consider preferential pairing and double reduction.

```
seq.filt <- filter_segregation(dat, chisq.pval.thres = 0.05/dat$n.mrk)
seq.filt <- make_seq_mappoly(seq.filt)
seq.red <- elim_redundant(seq.filt)
```

After filtering, 1990 markers were left to be mapped. Now, let us create a sequence of ordered markers to proceed with the analysis. You can also plot the data set for a specific sequence of markers

and check the distribution of the markers in the reference genome, if the information is available

The two-point analysis calculates the pairwise recombination fraction in a sequence of markers. At this point of the analysis, where we have many markers, we use the function `est_pairwise_rf2`

, which has a less detailed output than the original `est_pairwise_rf`

(which will be used later) but can handle tens of thousands of markers, even when using a personal computer. Nevertheless, the analysis can take a while depending on the number of markers if few cores are available.

```
ncores <- parallel::detectCores() - 1
tpt <- est_pairwise_rf2(seq.init, ncpus = ncores)
m <- rf_list_to_matrix(tpt) ## converts rec. frac. list into a matrix
sgo <- make_seq_mappoly(go) ## creates a sequence of markers in the genome order
plot(m, ord = sgo, fact = 5) ## plots a rec. frac. matrix using the genome order, averaging neighbor cells in a 5 x 5 grid
```

We can cluster the markers in linkage groups by using function `group_mappoly`

. The function uses the recombination fraction matrix and UPGMA method to group markers. Use the option `comp.mat = TRUE`

to compare the linkage-based clustering results with the chromosome information. If your data set does not contain chromosome information, use the option `comp.mat = FALSE`

. You also can use the interactive version to change the number of expected groups

In the table above, the rows indicate linkage groups obtained using linkage information and, the columns are the chromosomes in the reference genome. Notice the diagonal indicating the concordance between the two sources of information.

Markers are ordered within linkage groups. In this tutorial, we will show the step-by-step procedure using Linkage Group 1 (LG1). You can do the same for the remaining linkage groups.

Since we had a good concordance between genome and linkage information, we will use only markers assigned to a particular linkage group using both sources of information. We will do that using `genomic.info = 1`

, so the function uses the intersection of the markers assigned using linkage and the chromosome with the highest number of allocated markers. To use only the linkage information, do not use the argument `genomic.info`

. You also need the recombination fraction matrix for that group.

Let us order the markers in the sequence using the MDS algorithm.

Usually, at this point, the user can use diagnostic plots to remove markers that disturb the ordering procedure. We didnâ€™t use that procedure in this tutorial, but we encourage the user to check the example in `?mds_mappoly`

. Now, let us use the reference genome to order the markers.

You also can order the markers the reference genome

For the sake of this short tutorial, let us use the MDS order (`s1.mds`

) to phase the markers. Still, you can also try the genome order (`s1.gen`

) and compare the resulting maps using function `compare_maps`

, and you will notice that the genomic order yields a better map since its likelihood is higher.

Estimating the genetic map for a given order involves the computation of recombination fraction between adjacent markers and inferring the linkage phase configuration of those markers in both parents. The core function to perform these tasks in `MAPpoly`

is `est_rf_hmm_sequential`

. This function uses the pairwise recombination fraction as the first source of information to sequentially position allelic variants in specific parental homologs. The algorithm relies on the likelihood obtained through a hidden Markov model (HMM) for situations where pairwise analysis has limited power. Once all markers are positioned, the final map is reconstructed using the HMM multipoint algorithm. For a detailed description of the `est_rf_hmm_sequential`

arguments, please refer to MAPpolyâ€™s Reference Manual and the Extended Tutorial.

First we need to calculate the pairwise recombination fraction for markers in sequence `s1.mds`

using `est_pairwise_rf`

, which contains the information necessary to the proper working of the phasing algorithm.

```
tpt1 <- est_pairwise_rf(s1.mds, ncpus = ncores)
lg1.map <- est_rf_hmm_sequential(input.seq = s1.mds,
start.set = 3,
thres.twopt = 10,
thres.hmm = 20,
extend.tail = 50,
info.tail = TRUE,
twopt = tpt1,
sub.map.size.diff.limit = 5,
phase.number.limit = 20,
reestimate.single.ph.configuration = TRUE,
tol = 10e-3,
tol.final = 10e-4)
```

Now, use the functions `print`

and `plot`

to view the map results:

Now let us update the recombination fractions by allowing a global error in the HMM recombination fraction re-estimation. Using this approach, the genetic mapâ€™s length will be updated by removing spurious recombination events. This procedure can be applied using either the probability distribution provided by the genotype calling software using function `est_full_hmm_with_prior_prob`

or assuming a global genotype error like the following example

```
lg1.map.up <- est_full_hmm_with_global_error(input.map = lg1.map, error = 0.05,
verbose = TRUE)
plot(lg1.map.up, mrk.names = TRUE, cex = 0.7)
```

We can also use the ordinary least squares (OLS) method and the weighted MDS followed by fitting a one dimensional principal curve (wMDS_to_1D_pc)

```
lg1.map.ols <- reest_rf(lg1.map, m1, method = "ols")
lg1.map.mds <- reest_rf(lg1.map, m1, method = "wMDS_to_1D_pc", input.mds = mds.o1)
```

Now let us create a list with the maps and plot the results

To use the genetic map in conjunction with QTL analysis software, we need to obtain the homolog probability for all linkage groups for all individuals in the full-sib population. In this short guide, we will proceed only with one linkage group, but this procedure should be applied to all chromosomes in real situations. Let us use the updated map

```
g1 <- calc_genoprob_error(lg1.map.up, step = 1, error = 0.05)
to.qtlpoly <- export_qtlpoly(g1) #export to QTLpoly
h1 <- calc_homologprob(g1)
plot(h1, lg = 1, ind = 10)
```

Now let us compute the preferential pairing profile for linkage group 1

It is possible to export a phased map to an external CSV file using

For a script with a complete analysis of the data set presented here, please refer to the Complete script

To use the next functions, let us load the complete genetic map

```
in.file <- "https://github.com/mmollina/SCRI/raw/main/docs/tetra/maps_updated.rda"
map_file <- tempfile()
download.file(in.file, map_file)
load(map_file)
```