Context
Let A∈RN×p be a “compressing matrix,” where N<<p. Let Az=y. It is generally not possible to recover x without any further constraint. The field of compressed sensing adds the constraint that z is s−sparse and asks the following two questions.
(1) What matrix A should we use to ensure the perfect recovery of the sparse vector x?
(2) How do we go about finding such algorithm?
For what follows, an excellent reference is available here.
It can be shown that the problem of
z∈{s-sparse}min∥z∥1subject toAz=Ax for an s-sparse vector x
has unique solution if and only if A satisfies the restricted nullspace property. There are two sufficient conditions of A that enable this.
(1) Pairwise incoherence
∥ASTAS−I∥∞≤2s1,∀∣S∣≤s
(2) Restricted Isometry property
δ=S:∣S∣=smax∥ASTAS−I∥2≤31
One way to obtain such matrix A in (1) is to utilize the randomized matrix by Johnson-Lindenstrauss type inequality. However, the resulting dimension N is not generally small enough. Specifically, we want O(s) but we have an order of O(s2).
For condition (2), while the complexity of N is more favorable with O(s∗constant), the difficulty lies in the fact that the only matrices known to satisfy this condition are those that are randomized by normal distribution.
Given these theoretical challenges, I want to introduce an elegant greedy-based approximation algorithm to recover x that applies to the matrix A with δ3s≤121, and provides proof of its convergence.
Iterative Hard Thresholding Algorithm
Algorithm: Iterative Sparse Minimization
Objective: 
Minimize the L1-norm of z subject to Az=y.
Initialization: 
Choose an initial guess z0 and set t=0.
Repeat until convergence:
  - 
    Gradient Step: Update a by: at+1=zt−AT(Azt−y) 
- 
    Sparsity Constraint: Update z by finding the closest s-sparse vector to at+1: zt+1=argz∈{s-sparse}min∥at+1−z∥2 
- 
    Update Iteration: Set  t = t + 1 . 
Output: 
Return zt as the solution.
  - Note that −AT(Azt−y) is like gradient descent step.
Theorem Let RIP constant δ3s≤121. Then we have linear convergence of the above algorithm. That is,
∥zt+1−z∗∥2≤β∥z0−z∗∥2
for some β<1.
proof: Let S=supp(z∗)∪supp(zt+1). We have
∥zt+1−z∗∥2=∥zt+1,S−zS∗∥2≤∥zt+1,S−at+1,S∥2+∥at+1,S−zS∗∥2≤∥zS∗−at+1,S∥2+∥at+1,S−zS∗∥2=2∥zS∗−at+1,S∥2
Now, we have
∥zS∗−at+1.S∥2=∥zS∗−zt,S+ASTA(zt−z∗)∥2=∥−rt,S+ASTArt∥2=∥−rt,S+ASTArt,S+ASTArt,SC∥2=∥−rt,S+ASTArt,S∥2+∥ASTArt,SC∥2=∥(I−ASTAS)rt∥+∥ASTASCrt∥≤δ3s∥rt∥2+∥ASTArt,S−/S∥2=δ3s∥rt∥2+∥ASTAS−/Srt,∥2
where rt is supported on S−, rt:=zt−z∗ and AS is a matrix with zero on columns S.
Now, it suffices to show that ∥ASTAS−/S∥2≤2δ3s, since we would then have the desired inequality with β=2t1 by combining two inequalities above.
To this end, let ∥x∥2,∥y∥2=1. We have
xT(ASTAS−/S)y=xST(ATA)yS−/S=41(∥AxS+AyS−/S∥22−∥AxS−AyS−/S∥22)=41(∥xS+yS−/S∥2+(xS+yS−/S)T(ATA−I)(xS+yS−/S)−∥2xS−yS−/S∥22−(xS−yS−/S)T(ATA−I)(xS−yS−/S))=41((xS+yS−/S)T(ATA−I)(xS+yS−/S)−(xS−yS−/S)T(ATA−I)(xS−yS−/S))≤2δ3s
where the last inequality due to the fact that xS±yS−/S is 3s-sparse (recall that S−=supp(rt)⊂supp(z∗)∪supp(zt)). Therefore, we have established the claim ∥zt+1−z∗∥2≤β∥z0−z∗∥2 with β=2t1.
Conclusion
We have established the convergence of the Hard Thresholding algorithm for reconstructing the vector x. In today’s landscape, where data storage is relatively inexpensive, the practicality of compressed sensing, or reconstruction problems, may not be immediately relevant for everyday data scientists. Nevertheless, the underlying mathematics of this field is quite fascinating, particularly in the context of exploring different trade-offs.
References
Simon Foucart, H. R. (2012). A Mathematical Introduction to Compressive Sensing. Springer.