In this paper they place a prior on the representation of a logistic regression model using known protein-protein interactions. They do so by regularizing the weights of the model with the Laplacian encoding of the interaction graph.
Here is a regularization term of this form:
$$\lambda ||w||_1 + \eta w^T L w$$
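As a concrete sketch (my own minimal translation, not the paper's code; `lam` and `eta` stand for $\lambda$ and $\eta$), the penalty could be computed as:

```python
import numpy as np

def graph_penalty(w, L, lam, eta):
    """Sparsity (L1) term plus the Laplacian quadratic form w^T L w."""
    return lam * np.sum(np.abs(w)) + eta * (w @ L @ w)
```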
A small example:
Given a small graph with three nodes A, B, and C and one edge {A-B}, we have the following Laplacian:
$$
L = D - A =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
$$
L =
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
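This is easy to compute programmatically; here is a small sketch (assuming nodes A, B, C map to indices 0, 1, 2):

```python
import numpy as np

def laplacian(n_nodes, edges):
    """Build L = D - A for an undirected, unweighted graph."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))  # degree matrix
    return D - A

L = laplacian(3, [(0, 1)])  # one edge: A-B
# [[ 1. -1.  0.]
#  [-1.  1.  0.]
#  [ 0.  0.  0.]]
```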
If we have a small linear regression of the form:
$$y = x_Aw_A + x_Bw_B + x_Cw_C$$
Then we can expand $w^T L w$ to gain insight into how it impacts the weights:
$$
w^T L w =
\begin{bmatrix} w_A & w_B & w_C \end{bmatrix}
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} w_A \\ w_B \\ w_C \end{bmatrix}
$$
$$
=
\begin{bmatrix} w_A & w_B & w_C \end{bmatrix}
\begin{bmatrix} w_A - w_B \\ -w_A + w_B \\ 0 \end{bmatrix}
$$
$$
= (w_A^2 - w_A w_B) + (-w_A w_B + w_B^2)
$$
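A quick numeric sanity check of this expansion, reusing the `laplacian` helper sketched above:

```python
w = np.array([0.5, -1.0, 2.0])         # arbitrary w_A, w_B, w_C
quad = w @ laplacian(3, [(0, 1)]) @ w  # w^T L w
assert np.isclose(quad, w[0]**2 - 2 * w[0] * w[1] + w[1]**2)
```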
Note that this sum is exactly $(w_A - w_B)^2$. Because the squared terms look like an ordinary $L_2$ penalty, we can set them aside to see the distinctive impact of the regularization: the cross terms.
$$
(-w_A w_B) + (-w_A w_B) = -2 w_A w_B
$$
The Laplacian regularization seems to increase the weights of features that are connected in the graph, since minimizing $-2 w_A w_B$ pushes $w_A w_B$ to be large. Together with the squared terms and the $L_1$ penalty that is also used, the weights cannot grow without bound.
A few more experiments:
If we perform the same computation for a graph with two edges {A-B, B-C}, we get cross terms that increase the weights of both pairwise interactions:
$$ = -2w_Aw_B -2w_Bw_C$$
If we perform the same computation for a graph with two edges: {A-B, A-C} we have no surprises:
$$ = -2w_Aw_B -2w_Aw_C$$
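More generally, for an unweighted graph $w^T L w = \sum_{(i,j) \in E} (w_i - w_j)^2$, so each edge contributes a cross term $-2 w_i w_j$. Both two-edge cases above can be checked with the same helper:

```python
w = np.array([0.5, -1.0, 2.0])
for edges in ([(0, 1), (1, 2)],    # {A-B, B-C}
              [(0, 1), (0, 2)]):   # {A-B, A-C}
    quad = w @ laplacian(3, edges) @ w
    # w^T L w equals the sum of squared differences over the edges
    expected = sum((w[i] - w[j]) ** 2 for i, j in edges)
    assert np.isclose(quad, expected)
```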
Another thing to think about is the case with no edges. If self-loops exist by default (contributing to the degree but not to the off-diagonal adjacency), the degree matrix has 1 on the diagonal, so $L$ is the identity and the term reduces to an $L_2$ penalty. If no self-loops are defined, $L$ is the zero matrix, yielding no regularization at all.
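A sketch of both no-edge conventions (assuming a self-loop adds 1 to the degree and nothing to the off-diagonal adjacency):

```python
w = np.array([0.5, -1.0, 2.0])
L_selfloops = np.eye(3)        # self-loops by default: L = I
L_none = np.zeros((3, 3))      # no self-loops: L = 0
print(w @ L_selfloops @ w)     # ||w||^2: an L2 penalty
print(w @ L_none @ w)          # 0.0: no regularization
```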
Contribution:
A contribution of this paper is to apply the penalty to the absolute values of the weights, to make training easier:
$$|w|^T L |w|$$
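A minimal sketch of this penalty (my own translation of the formula, not the paper's code):

```python
def abs_laplacian_penalty(w, L):
    """|w|^T L |w|: the Laplacian penalty applied to weight magnitudes,
    so only magnitude differences across edges are penalized, not signs."""
    a = np.abs(w)
    return a @ L @ a
```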
TODO: Add more about how this impacts learning.
Overview
Here a high-level figure from the paper shows the data and targets together with the graph prior. It looks nice, so I wanted to include it.
