In this paper they place a prior on the representation of a logistic regression model using known protein-protein interactions. They do so by regularizing the weights of the model with the Laplacian encoding of the interaction graph.
Here is a regularization term of this form:
$$\lambda ||w||_1 + \eta w^T L w$$
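As a concrete sketch (my own minimal translation, not the paper's code; `lam` and `eta` stand for $\lambda$ and $\eta$), the penalty could be computed as:

```python
import numpy as np

def graph_penalty(w, L, lam, eta):
    """Sparsity (L1) term plus the Laplacian quadratic form w^T L w."""
    return lam * np.sum(np.abs(w)) + eta * (w @ L @ w)
```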
A small example:
Given a small graph with three nodes A, B, and C and one edge {A-B}, we have the following Laplacian:
$$
L = D - A =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
$$
L =
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
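This is easy to compute programmatically; here is a small sketch (assuming nodes A, B, C map to indices 0, 1, 2):

```python
import numpy as np

def laplacian(n_nodes, edges):
    """Build L = D - A for an undirected, unweighted graph."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))  # degree matrix
    return D - A

L = laplacian(3, [(0, 1)])  # one edge: A-B
# [[ 1. -1.  0.]
#  [-1.  1.  0.]
#  [ 0.  0.  0.]]
```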
If we have a small linear regression of the form:
$$y = x_Aw_A + x_Bw_B + x_Cw_C$$
Then we can expand $w^T L w$ to gain insight into how it impacts the weights:
$$
w^T L w =
\begin{bmatrix} w_A & w_B & w_C \end{bmatrix}
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} w_A \\ w_B \\ w_C \end{bmatrix}
$$
$$
=
\begin{bmatrix} w_A & w_B & w_C \end{bmatrix}
\begin{bmatrix} w_A - w_B \\ -w_A + w_B \\ 0 \end{bmatrix}
$$
$$
= (w_A^2 - w_A w_B) + (-w_A w_B + w_B^2)
$$
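A quick numeric sanity check of this expansion, reusing the `laplacian` helper sketched above:

```python
w = np.array([0.5, -1.0, 2.0])         # arbitrary w_A, w_B, w_C
quad = w @ laplacian(3, [(0, 1)]) @ w  # w^T L w
assert np.isclose(quad, w[0]**2 - 2 * w[0] * w[1] + w[1]**2)
```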
Note that this sum is exactly $(w_A - w_B)^2$. Because the squared terms look like an ordinary $L_2$ penalty, we can set them aside to see the distinctive impact of the regularization: the cross terms.
$$
(-w_A w_B) + (-w_A w_B) = -2 w_A w_B
$$
The Laplacian regularization seems to increase the weights of features that are connected in the graph, since minimizing $-2 w_A w_B$ pushes $w_A w_B$ to be large. Together with the squared terms and the $L_1$ penalty that is also used, the weights cannot grow without bound.
A few more experiments:
If we perform the same computation for a graph with two edges {A-B, B-C}, we get cross terms that increase the weights of both pairwise interactions:
$$ = -2w_Aw_B -2w_Bw_C$$
If we perform the same computation for a graph with two edges: {A-B, A-C} we have no surprises:
$$ = -2w_Aw_B -2w_Aw_C$$
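More generally, for an unweighted graph $w^T L w = \sum_{(i,j) \in E} (w_i - w_j)^2$, so each edge contributes a cross term $-2 w_i w_j$. Both two-edge cases above can be checked with the same helper:

```python
w = np.array([0.5, -1.0, 2.0])
for edges in ([(0, 1), (1, 2)],    # {A-B, B-C}
              [(0, 1), (0, 2)]):   # {A-B, A-C}
    quad = w @ laplacian(3, edges) @ w
    # w^T L w equals the sum of squared differences over the edges
    expected = sum((w[i] - w[j]) ** 2 for i, j in edges)
    assert np.isclose(quad, expected)
```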
Another thing to think about is the case with no edges. If self-loops exist by default (contributing to the degree but not to the off-diagonal adjacency), the degree matrix has 1 on the diagonal, so $L$ is the identity and the term reduces to an $L_2$ penalty. If no self-loops are defined, $L$ is the zero matrix, yielding no regularization at all.
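A sketch of both no-edge conventions (assuming a self-loop adds 1 to the degree and nothing to the off-diagonal adjacency):

```python
w = np.array([0.5, -1.0, 2.0])
L_selfloops = np.eye(3)        # self-loops by default: L = I
L_none = np.zeros((3, 3))      # no self-loops: L = 0
print(w @ L_selfloops @ w)     # ||w||^2: an L2 penalty
print(w @ L_none @ w)          # 0.0: no regularization
```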
Contribution:
A contribution of this paper is to apply the penalty to the absolute values of the weights, to make training easier:
$$|w|^T L |w|$$
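A minimal sketch of this penalty (my own translation of the formula, not the paper's code):

```python
def abs_laplacian_penalty(w, L):
    """|w|^T L |w|: the Laplacian penalty applied to weight magnitudes,
    so only magnitude differences across edges are penalized, not signs."""
    a = np.abs(w)
    return a @ L @ a
```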
TODO: Add more about how this impacts learning.
Overview
Here a high-level figure from the paper shows the data and targets together with the graph prior. It looks nice, so I wanted to include it.
