2024-02-14 @Atsushi Kawaguchi

The msma package provides functions for a matrix decomposition method incorporating sparse and supervised modeling for a multiblock multivariable data analysis.

Preparation

Install package (as necessary)

if(!require("msma")) install.packages("msma")

Load package

library(msma)

Getting started

Simulated multiblock data (list data) by using the function simdata.

Sample size is 50. The correlation coeficient is 0.8. The numbers of columns for response and predictor can be specified by the argument Yps and Xps, respectively. The length of vecor represents the number of blocks. That is, response has three blocks with the numbers of columns being 3, 4, and 5 and predictor has one block with the number of columns being 3.

dataset0 = simdata(n = 50, rho = 0.8, Yps = c(3, 4, 5), Xps = 3, seed=1)
X0 = dataset0$X; Y0 = dataset0$Y 

The data generated here is applied to the msma function.

One Component

The argument comp can specify the number of components. The arguments lambdaX and lambdaY can specify the regularization parameters for X and Y, respectively.

First, we set comp=1, which will perform an analysis with 1 component.

fit01 = msma(X0, Y0, comp=1, lambdaX=0.05, lambdaY=1:3)
fit01
## Call:
## msma.default(X = X0, Y = Y0, comp = 1, lambdaX = 0.05, lambdaY = 1:3)
## 
## Numbers of non-zeros for X block: 
##        comp1
## block1     3
## 
## Numbers of non-zeros for X super: 
##         comp1
## comp1-1     1
## 
## Numbers of non-zeros for Y block: 
##        comp1
## block1     1
## block2     1
## block3     1
## 
## Numbers of non-zeros for Y super: 
##         comp1
## comp1-1     3

The plot function is available. In default setting, the block weights are displayed as a barplot.

plot(fit01)

Two Component

Next, we set comp=2, which will perform an analysis with 2 components.

fit02 = msma(X0, Y0, comp=2, lambdaX=0.03, lambdaY=0.01*(1:3))
fit02
## Call:
## msma.default(X = X0, Y = Y0, comp = 2, lambdaX = 0.03, lambdaY = 0.01 * 
##     (1:3))
## 
## Numbers of non-zeros for X block: 
##        comp1 comp2
## block1     3     3
## 
## Numbers of non-zeros for X super: 
##         comp1 comp2
## comp1-1     1     1
## 
## Numbers of non-zeros for Y block: 
##        comp1 comp2
## block1     3     3
## block2     4     4
## block3     5     5
## 
## Numbers of non-zeros for Y super: 
##         comp1 comp2
## comp1-1     3     3

Single Block

Two matrics are prepared by specifying arguments Yps and Xps.

dataset1 = simdata(n = 50, rho = 0.8, Yps = 5, Xps = 5, seed=1)
X1 = dataset1$X[[1]]; Y1 = dataset1$Y 

Principal Component Analysis (PCA)

If input is a matrix, a principal component analysis is implemented.

(fit111 = msma(X1, comp=5))
## Call:
## msma.default(X = X1, comp = 5)
## 
## Numbers of non-zeros for X block: 
##        comp1 comp2 comp3 comp4 comp5
## block1     5     5     5     5     5
## 
## Numbers of non-zeros for X super: 
##         comp1 comp2 comp3 comp4 comp5
## comp1-1     1     1     1     1     1

The weight (loading) vectors can be obtained as follows.

fit111$wbX
## $block1
##           comp1       comp2       comp3         comp4       comp5
## X.1.1 0.4309622 -0.74170223 -0.03672379  0.1325580413 -0.49520613
## X.1.2 0.4483196  0.31188303  0.63228246  0.5490205405  0.02310504
## X.1.3 0.4601597 -0.19547078 -0.38567734  0.1474129336  0.76129277
## X.1.4 0.4392794  0.55811865 -0.57117598 -0.0006449093 -0.41145448
## X.1.5 0.4566923  0.05386584  0.35196769 -0.8119567864  0.07331836

The bar plots of weight vectors are provided by the function plot. The component number is specified by the argument axes. The plot type is selected by the argument plottype. Furthermore, since this function uses the barplot function originally built into R, its arguments are also available. In the following example, on the horizontal axis, the magnification of the variable names is set to 0.7 by setting cex.names=0.7, and the variable names are oriented as las=2.

par(mfrow=c(1,2))
plot(fit111, axes = 1, plottype="bar", cex.names=0.7, las=2)
plot(fit111, axes = 2, plottype="bar", cex.names=0.7, las=2)

The score vectors for first six subjects.

lapply(fit111$sbX, head)
## $block1
##            [,1]          [,2]        [,3]        [,4]        [,5]
## [1,]  0.7097369  0.0487564120  0.10746733 -0.02462727 -0.00598565
## [2,] -0.6976955 -0.5423072581 -0.98211121 -0.23652145 -0.16120137
## [3,]  2.4367362 -0.0238218850 -0.32403419 -0.44206969 -0.47004393
## [4,] -2.4460385 -0.0007036966  0.08112764  0.14263545 -0.45584684
## [5,]  1.7708133  0.9741849574 -0.64716134  0.09377875 -0.78085072
## [6,] -0.8043943 -0.9469205017 -0.34705994 -0.62641753  0.26617649

The scatter plots for the score vectors specified by the argument v. The argument axes is specified by the two length vector represents which components are displayed.

par(mfrow=c(1,2))
plot(fit111, v="score", axes = 1:2, plottype="scatter")
plot(fit111, v="score", axes = 2:3, plottype="scatter")

When the argument v was specified as “cpev”, the cummulative eigenvalues are plotted.

par(mfrow=c(1,2))
plot(fit111, v="cpev", ylim=c(0.7, 1))

Other functions in R for PCA

There is the R function prcomp to implement PCA.

(fit1112 = prcomp(X1, scale=TRUE))
## Standard deviations (1, .., p=5):
## [1] 2.0446732 0.5899513 0.4458638 0.3926788 0.3439156
## 
## Rotation (n x k) = (5 x 5):
##             PC1         PC2         PC3           PC4         PC5
## [1,] -0.4309746  0.74172462 -0.03722419 -1.351296e-01  0.49442882
## [2,] -0.4483141 -0.31171881  0.63044575 -5.510830e-01 -0.02629751
## [3,] -0.4601629  0.19547669 -0.38616901 -1.416349e-01 -0.76213651
## [4,] -0.4392701 -0.55816074 -0.57114566  4.296727e-05  0.41144993
## [5,] -0.4566918 -0.05405032  0.35470924  8.111640e-01 -0.06859643
summary(fit1112)
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5
## Standard deviation     2.0447 0.58995 0.44586 0.39268 0.34392
## Proportion of Variance 0.8361 0.06961 0.03976 0.03084 0.02366
## Cumulative Proportion  0.8361 0.90575 0.94551 0.97634 1.00000

This Rotation is almost the same as the output of msma, but it can be made closer by setting the argument ceps as follows.

fit1113 = msma(X1, comp=5, ceps=0.0000001)
fit1113$wbX
## $block1
##           comp1       comp2       comp3       comp4       comp5
## X.1.1 0.4309745 -0.74172365 -0.03694568  0.13514153 -0.49444798
## X.1.2 0.4483141  0.31172789  0.63155896  0.54980542  0.02621947
## X.1.3 0.4601628 -0.19547781 -0.38588003  0.14252697  0.76211630
## X.1.4 0.4392701  0.55815708 -0.57114810  0.00105051 -0.41145010
## X.1.5 0.4566918  0.05404487  0.35306479 -0.81187175  0.06871165

Plotting the scores with the signs turned over, we see that similar scores are calculated.

par(mfrow=c(1,2))
biplot(fit1112)
plot(-fit1113$sbX[[1]][,1:2],xlab="Component 1",ylab="Component 2")