Optimizing Sparsity over Lattices and Semigroups

Motivated by problems in optimization we study the sparsity of the solutions to systems of linear Diophantine equations and linear integer programs, i.e., the number of non-zero entries of a solution, which is often referred to as the $\ell_0$-norm. Our main results are improved bounds on the $\ell_0$-norm of sparse solutions to systems $A x = b$, where $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ and $x$ is either a general integer vector (lattice case) or a non-negative integer vector (semigroup case). In the lattice case and certain scenarios of the semigroup case, we give polynomial time algorithms for computing solutions with $\ell_0$-norm satisfying the obtained bounds.


Introduction
This paper discusses the problem of finding sparse solutions to systems of linear Diophantine equations and integer linear programs. We investigate the $\ell_0$-norm $\|x\|_0 := |\{i : x_i \neq 0\}|$, a function widely used in the theory of compressed sensing [6,9], which measures the sparsity of a given vector $x = (x_1, \dots, x_n)^\top \in \mathbb{R}^n$ (despite its name, the $\ell_0$-norm is not actually a norm).
Sparsity is a topic of interest in several areas of optimization. The $\ell_0$-norm minimization problem over the reals is central in classical compressed sensing, where a linear programming relaxation provides a guaranteed approximation [8,9]. Support minimization for solutions to Diophantine equations is relevant for the theory of compressed sensing for discrete-valued signals [11,12,17]. There is still little understanding of discrete signals in the compressed sensing paradigm, despite the fact that there are many applications in which the signal is known to have discrete-valued entries, for instance, in wireless communication [22] and the theory of error-correcting codes [7]. Sparsity was also investigated in integer optimization [1,10,20], where many combinatorial optimization problems have useful interpretations as sparse semigroup problems. For example, the edge-coloring problem can be seen as a problem in the semigroup generated by matchings of the graph [18]. Our results provide natural out-of-the-box sparsity bounds for problems with linear constraints and integer variables in a general form.

Lattices: sparse solutions of linear Diophantine systems
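Each integer matrix $A \in \mathbb{Z}^{m \times n}$ determines the lattice $L(A) := \{Ax : x \in \mathbb{Z}^n\}$ generated by the columns of $A$. By an easy reduction via row transformations, we may assume without loss of generality that the rank of $A$ is $m$. Throughout, $[n] := \{1, \dots, n\}$, $\binom{[n]}{m}$ denotes the set of all $m$-element subsets of $[n]$, and for $\gamma \subseteq [n]$ we write $A_\gamma$ for the submatrix of $A$ formed by the columns indexed by $\gamma$. By $\gcd(A)$ we denote the greatest common divisor of the determinants $\det(A_\tau)$ with $\tau \in \binom{[n]}{m}$.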
Since $L(A_\gamma)$ is the lattice spanned by the columns of $A$ indexed by $\gamma$, it is a sublattice of $L(A)$. We first deal with a natural question: can the description of a given lattice $L(A)$ in terms of $A$ be made sparser by passing from $A$ to $A_\gamma$, with $\gamma$ having a smaller cardinality than $n$ and satisfying $L(A) = L(A_\gamma)$? That is, we want to discard some of the columns of $A$ and generate $L(A)$ by $|\gamma|$ columns, with $|\gamma|$ as small as possible.
For stating our results, we need several number-theoretic functions. Given $z \in \mathbb{Z}_{>0}$, consider the prime factorization $z = p_1^{s_1} \cdots p_k^{s_k}$ with pairwise distinct prime factors $p_1, \dots, p_k$ and their multiplicities $s_1, \dots, s_k \in \mathbb{Z}_{>0}$. Then the number of prime factors $\sum_{i=1}^{k} s_i$, counting multiplicities, is denoted by $\Omega(z)$. Furthermore, we introduce $\Omega_m(z) := \sum_{i=1}^{k} \min\{s_i, m\}$. That is, $m$ acts as a threshold that caps the multiplicity with which each prime is counted. In the case $m = 1$ we thus have $\omega(z) := \Omega_1(z) = k$, which is the number of distinct prime factors of $z$. The functions $\Omega$ and $\omega$ are called the prime $\Omega$-function and the prime $\omega$-function, respectively, in number theory [15]. We call $\Omega_m$ the truncated prime $\Omega$-function.
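The three functions can be illustrated with a minimal Python sketch. The function names (`prime_multiplicities`, `truncated_omega`, `big_omega`, `small_omega`) are hypothetical and chosen for this illustration only, and plain trial division is assumed to be adequate for small inputs.

```python
def prime_multiplicities(z: int) -> dict[int, int]:
    """Return {prime: multiplicity} for an integer z >= 1 (trial division)."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= z:
        while z % p == 0:
            factors[p] = factors.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:  # whatever remains is a single prime factor
        factors[z] = factors.get(z, 0) + 1
    return factors


def truncated_omega(z: int, m: int) -> int:
    """Omega_m(z): each prime is counted with multiplicity capped at m."""
    return sum(min(s, m) for s in prime_multiplicities(z).values())


def big_omega(z: int) -> int:
    """Omega(z): number of prime factors counted with multiplicity."""
    return sum(prime_multiplicities(z).values())


def small_omega(z: int) -> int:
    """omega(z) = Omega_1(z): number of distinct prime factors."""
    return truncated_omega(z, 1)
```

For instance, $72 = 2^3 \cdot 3^2$ gives $\Omega(72) = 5$, $\Omega_2(72) = 4$ and $\omega(72) = 2$.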
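Theorem 1 Let $A \in \mathbb{Z}^{m \times n}$, with $m \le n$, and let $\tau \in \binom{[n]}{m}$ be such that the matrix $A_\tau$ is non-singular. Then the equality $L(A) = L(A_\gamma)$ holds for some $\gamma$ satisfying $\tau \subseteq \gamma \subseteq [n]$ and
$$|\gamma| \le m + \Omega_m\!\left(\frac{|\det(A_\tau)|}{\gcd(A)}\right). \tag{1}$$
Given $A$ and $\tau$, the set $\gamma$ can be computed in polynomial time.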
As an immediate consequence of Theorem 1 we obtain the following.
Corollary 2 Consider the linear Diophantine system
$$Ax = b, \quad x \in \mathbb{Z}^n \tag{2}$$
with $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$ and $m \le n$. Let $\tau \in \binom{[n]}{m}$ be such that the $m \times m$ matrix $A_\tau$ is non-singular. If (2) is feasible, then (2) has a solution $x$ satisfying the sparsity bound
$$\|x\|_0 \le m + \Omega_m\!\left(\frac{|\det(A_\tau)|}{\gcd(A)}\right).$$
Under the above assumptions, for given $A$, $b$ and $\tau$, such a sparse solution can be computed in polynomial time.
From the optimization perspective, Corollary 2 deals with the problem of minimizing the $\ell_0$-norm over the affine lattice $\{x \in \mathbb{Z}^n : Ax = b\}$.

Semigroups: sparse solutions in integer programming
Consider next the standard form of the feasibility constraints of integer linear programming
$$Ax = b, \quad x \in \mathbb{Z}^n_{\ge 0}. \tag{3}$$
For a given matrix $A$, the set of all $b$ such that (3) is feasible is the semigroup $\mathrm{Sg}(A) = \{Ax : x \in \mathbb{Z}^n_{\ge 0}\}$ generated by the columns of $A$. If (3) has a solution, i.e., $b \in \mathrm{Sg}(A)$, how sparse can such a solution be? In other words, we are interested in the $\ell_0$-norm minimization problem
$$\min\{\|x\|_0 : Ax = b,\ x \in \mathbb{Z}^n_{\ge 0}\}. \tag{4}$$
Problem (4) is NP-hard, because deciding the feasibility of (3) [23, § 18.2], or even solving the relaxation of (4) with the condition $x \in \mathbb{Z}^n_{\ge 0}$ replaced by $x \in \mathbb{R}^n$ [19], is NP-hard. Taking the NP-hardness of Problem (4) into account, our aim is to estimate the optimal value of (4) under the assumption that this problem is feasible. In [2, Theorem 1.1 (i)] (see also [1, Theorem 1]), it was shown that for any $b \in \mathrm{Sg}(A)$ there exists an $x \in \mathbb{Z}^n_{\ge 0}$ such that $Ax = b$ and
$$\|x\|_0 \le m + \left\lfloor \log_2 \frac{\sqrt{\det(A A^\top)}}{\gcd(A)} \right\rfloor. \tag{5}$$
In [1, Theorem 2], it was shown that the bound (5) cannot be improved significantly, but nevertheless we show here how to improve it in some special cases. As a consequence of Theorem 1 we obtain the following.
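Corollary 3 Let $A \in \mathbb{Z}^{m \times n}$ be a matrix whose columns positively span $\mathbb{R}^m$ and let $b \in L(A)$. If $\tau \in \binom{[n]}{m}$ is a set for which the matrix $A_\tau$ is non-singular, then there is a solution $x$ of the integer-programming feasibility problem $Ax = b$, $x \in \mathbb{Z}^n_{\ge 0}$ that satisfies the sparsity bound
$$\|x\|_0 \le 2m + \Omega_m\!\left(\frac{|\det(A_\tau)|}{\gcd(A)}\right). \tag{6}$$
Under the above assumptions, for given $A$, $b$ and $\tau$, such a sparse solution $x$ can be computed in polynomial time.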
Note that for a fixed $m$, (6) is usually much tighter than (5), because the function $\Omega_m(z)$ is bounded from above by $\log_2(z)$ and is much smaller than $\log_2(z)$ on average.
Furthermore, we take a closer look at the case $m = 1$ of a single equation and tighten the given bounds in this case. That is, we consider the knapsack feasibility problem
$$a^\top x = b, \quad x \in \mathbb{Z}^n_{\ge 0}, \tag{7}$$
where $a \in \mathbb{Z}^n$ and $b \in \mathbb{Z}$. Without loss of generality we can assume that all components of the vector $a$ are non-zero. It follows from (5) that a feasible problem (7) has a solution $x$ with
$$\|x\|_0 \le 1 + \left\lfloor \log_2 \frac{\|a\|_2}{\gcd(a)} \right\rfloor. \tag{8}$$
If all components of $a$ have the same sign, without loss of generality we can assume $a \in \mathbb{Z}^n_{>0}$. In this setting, Theorem 1.2 in [2] strengthens the bound (8) by replacing the $\ell_2$-norm of the vector $a$ with the $\ell_\infty$-norm. It was conjectured in [2, page 247] that a bound $\|x\|_0 \le c + \lfloor \log_2(\|a\|_\infty / \gcd(a)) \rfloor$ with an absolute constant $c$ holds for an arbitrary $a \in \mathbb{Z}^n$. We obtain the following result, which covers the case that has not been settled so far and yields a confirmation of this conjecture.
Corollary 4 Let $a = (a_1, \dots, a_n)^\top \in (\mathbb{Z} \setminus \{0\})^n$ be a vector that contains both positive and negative components. If the knapsack feasibility problem $a^\top x = b$, $x \in \mathbb{Z}^n_{\ge 0}$ has a solution, then there is a solution $x$ satisfying the sparsity bound
$$\|x\|_0 \le 2 + \min\left\{ \omega\!\left(\frac{|a_i|}{\gcd(a)}\right) : i \in [n] \right\}.$$
Under the above assumptions, for given $a$ and $b$, such a sparse solution $x$ can be computed in polynomial time.
Our next contribution is that, given additional structure on $A$, we can improve on [2, Theorem 1.1 (i)], which in turn also gives an improvement on [2, Theorem 1.2]. For $a_1, \dots, a_n \in \mathbb{R}^m$, we denote by $\mathrm{cone}(a_1, \dots, a_n)$ the convex conic hull of the set $\{a_1, \dots, a_n\}$. Now assume the matrix $A = (a_1, \dots, a_n) \in \mathbb{Z}^{m \times n}$ with columns $a_i$ satisfies the following conditions:
$$\mathrm{cone}(a_1, \dots, a_n) \text{ is an } m\text{-dimensional pointed cone}, \tag{10}$$
$$\mathrm{cone}(a_1) \text{ is an extreme ray of } \mathrm{cone}(a_1, \dots, a_n). \tag{11}$$
Note that the previously best sparsity bound for the general case of the integer-programming feasibility problem is (5). Using the Cauchy-Binet formula, (5) can be written as
$$\|x\|_0 \le m + \left\lfloor \log_2 \frac{\sqrt{\sum_{I \in \binom{[n]}{m}} \det(A_I)^2}}{\gcd(A)} \right\rfloor.$$
The following theorem improves this bound in the "pointed cone case" by removing a fraction of $m/n$ of the terms in the sum under the square root.
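Theorem 5 Let $A = (a_1, \dots, a_n) \in \mathbb{Z}^{m \times n}$ satisfy (9)-(11) and, for $b \in \mathbb{Z}^m$, consider the integer-programming feasibility problem
$$Ax = b, \quad x \in \mathbb{Z}^n_{\ge 0}. \tag{12}$$
If (12) is feasible, then there is a feasible solution $x$ satisfying the sparsity bound
$$\|x\|_0 \le m + \left\lfloor \log_2 \frac{\sqrt{\sum_{I \in \binom{[n] \setminus \{1\}}{m}} \det(A_I)^2}}{\gcd(A)} \right\rfloor.$$
We omit the proof of this result due to the page limit for the IPCO proceedings. Instead we focus on the particularly interesting case $m = 1$. In this case, assumption (10) is equivalent to $a \in \mathbb{Z}^n_{>0} \cup \mathbb{Z}^n_{<0}$. Without loss of generality, one can assume $a \in \mathbb{Z}^n_{>0}$.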
When dealing with bounds for sparsity it would be interesting to understand the worst-case scenario among all members of the semigroup, which is described by the function
$$\mathrm{ICR}(A) := \max_{b \in \mathrm{Sg}(A)} \ \min\{\|x\|_0 : Ax = b,\ x \in \mathbb{Z}^n_{\ge 0}\}. \tag{13}$$
We call $\mathrm{ICR}(A)$ the integer Carathéodory rank, in resemblance to the classical problem of finding the integer Carathéodory number for Hilbert bases [24]. The above results for the problem $Ax = b$, $x \in \mathbb{Z}^n_{\ge 0}$ can be phrased as upper bounds on $\mathrm{ICR}(A)$. We are interested in the complexity of computing $\mathrm{ICR}(A)$. The first question is: can the integer Carathéodory rank of a matrix $A$ be computed at all? After all, the semigroup has infinitely many elements and, although $\mathrm{ICR}(A)$ is a finite number, a direct use of (13) would require determining the sparsest representation $Ax = b$ for each of the infinitely many elements $b$ of $\mathrm{Sg}(A)$. It turns out that $\mathrm{ICR}(A)$ is computable, as the inequality $\mathrm{ICR}(A) \le k$ can be expressed as the formula $\forall x \in \mathbb{Z}^n_{\ge 0}\ \exists y \in \mathbb{Z}^n_{\ge 0} : (Ax = Ay) \wedge (\|y\|_0 \le k)$ in Presburger arithmetic [14]. Beyond this fact, the complexity status of computing $\mathrm{ICR}(A)$ is largely open, even when $A$ is just one row:
Problem 7 Given the input $a = (a_1, \dots, a_n)^\top \in \mathbb{Z}^n$, is it NP-hard to compute $\mathrm{ICR}(a^\top)$?
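For a single right-hand side $b$, the inner minimization in the definition of the integer Carathéodory rank can be explored by brute force on tiny instances. The following Python sketch does this for one row with positive entries. It is exponential in $n$ and is only meant as an illustration; the function names `representable` and `min_support` are hypothetical, and $a \in \mathbb{Z}^n_{>0}$, $b \ge 0$ are assumed.

```python
from itertools import combinations


def representable(gens: tuple[int, ...], b: int) -> bool:
    """True if b is a non-negative integer combination of the positive gens."""
    reach = [False] * (b + 1)
    reach[0] = True
    for v in range(1, b + 1):
        reach[v] = any(g <= v and reach[v - g] for g in gens)
    return reach[b]


def min_support(a: list[int], b: int):
    """Smallest l0-norm of x >= 0 with a^T x = b, or None if infeasible.

    Assumes all entries of a are positive and b >= 0. Every subset of
    generators of size k is tried before size k + 1, so the first hit
    is a sparsest representation.
    """
    if b == 0:
        return 0
    for k in range(1, len(a) + 1):
        for subset in combinations(a, k):
            if representable(subset, b):
                return k
    return None
```

With $a = (6, 10, 15)$, the element $b = 30$ needs a single generator, while $b = 31 = 6 + 10 + 15$ needs all three, because any two of the generators share a common prime factor that does not divide 31.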
The Frobenius number $\max(\mathbb{Z}_{\ge 0} \setminus \mathrm{Sg}(a^\top))$, defined under the assumptions $a \in \mathbb{Z}^n_{>0}$ and $\gcd(a) = 1$, is yet another value associated with $\mathrm{Sg}(a^\top)$. The Frobenius number can be computed in polynomial time when $n$ is fixed [5,16] but is NP-hard to compute when $n$ is not fixed [21]. It seems that there might be a connection between computing the Frobenius number and $\mathrm{ICR}(a^\top)$.

Proofs of Theorem 1 and its consequences
The proof of Theorem 1 relies on the theory of finite Abelian groups. We write Abelian groups additively. An Abelian group $G$ is said to be a direct sum of its finitely many subgroups $G_1, \dots, G_m$, which is written as $G = G_1 \oplus \cdots \oplus G_m$, if every element of $G$ has a unique representation as a sum $x_1 + \cdots + x_m$ with $x_i \in G_i$. A primary cyclic group is a non-zero finite cyclic group whose order is a power of a prime number. We use $G/H$ to denote the quotient of $G$ modulo its subgroup $H$.
The fundamental theorem of finite Abelian groups states that every finite Abelian group $G$ has a primary decomposition, which is essentially unique. That is, $G$ decomposes into a direct sum of primary cyclic subgroups, and this decomposition is unique up to automorphisms of $G$. We denote by $\kappa(G)$ the number of direct summands in the primary decomposition of $G$.
For a subset $S$ of a finite Abelian group $G$, we denote by $\langle S \rangle$ the subgroup of $G$ generated by $S$. We call a subset $S$ of $G$ non-redundant if the subgroups $\langle T \rangle$ generated by proper subsets $T$ of $S$ are properly contained in $\langle S \rangle$. In other words, $S$ is non-redundant if $\langle S \setminus \{x\} \rangle$ is a proper subgroup of $\langle S \rangle$ for every $x \in S$. The following result can be found in [13, Lemma A.6].
Theorem 8 Let G be a finite Abelian group. Then the maximum cardinality of a non-redundant subset S of G is equal to κ(G).
We will also need the following lemmas, proved in the Appendix.

[...] $\Lambda$, where $i \in I$, is in the group generated by the remaining elements. Suppose $j \in I$ and we want to check if $\varphi(a_j)$ is in the group generated by all $\varphi(a_i)$ with $i \in I \setminus \{j\}$. Since $\Lambda = L(A_\tau)$, this is equivalent to checking $a_j \in L(A_{(I \setminus \{j\}) \cup \tau})$ and is thus reduced to solving a system of linear Diophantine equations with the left-hand side matrix $A_{(I \setminus \{j\}) \cup \tau}$ and the right-hand side vector $a_j$. Thus, carrying out the above procedure for every $j \in I$ and removing $j$ from $I$ whenever $a_j \in L(A_{(I \setminus \{j\}) \cup \tau})$, we eventually arrive at a set $I$ that determines a non-redundant subset $S$ of $\mathbb{Z}^m / \Lambda$. This is done by solving at most $n - m$ linear Diophantine systems in total, where the matrix of each system is a sub-matrix of $A$ and the right-hand side vector of the system is a column of $A$.
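The removal procedure just described can be sketched for the single-row case $m = 1$, where membership of $a_j$ in the lattice generated by the remaining entries reduces to a divisibility test by their gcd. The function name `prune_columns` is hypothetical, indices are 0-based, and the restriction to $m = 1$ is an assumption made to keep the sketch short (for general $m$ one would solve a linear Diophantine system instead).

```python
from functools import reduce
from math import gcd


def prune_columns(a: list[int], tau: int) -> list[int]:
    """Greedy column removal for one row a (case m = 1).

    Index tau is always kept (a[tau] must be non-zero). Every other index j
    is dropped if a[j] already lies in the lattice generated by a[tau] and
    the entries still kept, i.e. if it is divisible by their gcd.
    Returns the sorted index set gamma.
    """
    if a[tau] == 0:
        raise ValueError("a[tau] must be non-zero")
    kept = [j for j in range(len(a)) if j != tau]  # the index set I
    for j in list(kept):
        others = [a[tau]] + [a[i] for i in kept if i != j]
        g = reduce(gcd, others)  # generator of the lattice without a[j]
        if a[j] % g == 0:        # a[j] is redundant, remove j from I
            kept.remove(j)
    return sorted([tau] + kept)
```

For $a = (30, 6, 10, 15)$ and $\tau = \{1\}$ (index 0 in the code) nothing can be removed, and $|\gamma| = 4 = 1 + \omega(30)$. Appending the entry 7 makes the columns 6, 10 and 15 redundant, leaving $\gamma$ with two elements.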
□

Remark 1 (Optimality of the bounds). For a given $\Delta \in \mathbb{Z}_{\ge 2}$ let us consider matrices $A \in \mathbb{Z}^{m \times n}$ with $\Delta = |\det(A_\tau)| / \gcd(A)$. We construct a matrix $A$ that shows the optimality of the bound (1). As in the proof of Theorem 1, we assume $\tau = [m]$ and use the notation $B = A_\tau$. Consider the prime factorization $\Delta = p_1^{n_1} \cdots p_s^{n_s}$. We will fix the matrix $B$ to be a diagonal matrix with diagonal entries $d_1, \dots, d_m \in \mathbb{Z}_{>0}$ so that $\det(B) = d_1 \cdots d_m = \Delta$.
The diagonal entries are defined by distributing the prime factors of $\Delta$ among the diagonal entries of $B$. If the multiplicity $n_i$ of the prime $p_i$ is less than $m$, we introduce $p_i$ as a factor of multiplicity 1 in $n_i$ of the $m$ diagonal entries of $B$. If the multiplicity $n_i$ is at least $m$, we are able to distribute the factors $p_i$ among all of the diagonal entries of $B$ so that each diagonal entry contains the factor $p_i$ with multiplicity at least 1.
The group $\mathbb{Z}^m / \Lambda = \mathbb{Z}^m / L(B)$ is a direct sum of $m$ cyclic groups $G_1, \dots, G_m$ of orders $d_1, \dots, d_m$, respectively. By the Chinese Remainder Theorem, these cyclic groups can be further decomposed into direct sums of primary cyclic groups. By our construction, a prime factor $p_i$ of multiplicity $n_i < m$ generates a cyclic direct summand of order $p_i$ in $n_i$ of the subgroups $G_1, \dots, G_m$. If $n_i \ge m$, then each of the groups $G_1, \dots, G_m$ has a direct summand that is a non-trivial cyclic group whose order is a power of $p_i$. Summarizing, we see that the decomposition of $\mathbb{Z}^m / \Lambda$ into primary cyclic groups contains $n_i$ summands of order $p_i$ when $n_i < m$, and $m$ summands whose order is a power of $p_i$ when $n_i \ge m$. The total number of summands is thus $\sum_{i=1}^{s} \min\{m, n_i\} = \Omega_m(\Delta)$. Now, fix $n = m + \Omega_m(\Delta)$ and choose columns $a_{m+1}, \dots, a_n$ so that $\varphi(a_{m+1}), \dots, \varphi(a_n)$ generate all direct summands in the decomposition of $\mathbb{Z}^m / \Lambda$ into primary cyclic groups. With this choice, $\varphi(a_{m+1}), \dots, \varphi(a_n)$ generate $\mathbb{Z}^m / \Lambda$, which means that $L(A) = \mathbb{Z}^m$ and implies $\gcd(A) = 1$. On the other hand, any proper subset of $\{\varphi(a_{m+1}), \dots, \varphi(a_n)\}$ generates a proper subgroup of $\mathbb{Z}^m / \Lambda$, as some of the direct summands in the decomposition of $\mathbb{Z}^m / \Lambda$ into primary cyclic groups will be missing. This means $L(A_{[m] \cup I}) \subsetneq \mathbb{Z}^m$ for every $I \subsetneq \{m+1, \dots, n\}$.
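The distribution of prime factors in Remark 1 can be sketched in Python as follows. The names `diagonal_entries` and `primary_summands` are hypothetical. When a prime has multiplicity at least $m$, the remark only requires that every diagonal entry receives it at least once; placing the surplus factors on the first entry is an arbitrary choice made for this illustration.

```python
def prime_multiplicities(z: int) -> dict[int, int]:
    """Return {prime: multiplicity} for an integer z >= 1 (trial division)."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= z:
        while z % p == 0:
            factors[p] = factors.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:
        factors[z] = factors.get(z, 0) + 1
    return factors


def diagonal_entries(delta: int, m: int) -> list[int]:
    """Diagonal d_1, ..., d_m of B with d_1 * ... * d_m = delta."""
    d = [1] * m
    for p, mult in prime_multiplicities(delta).items():
        if mult < m:
            for i in range(mult):      # p appears once in mult of the entries
                d[i] *= p
        else:
            for i in range(m):         # p appears at least once in every entry
                d[i] *= p
            d[0] *= p ** (mult - m)    # surplus factors (illustrative choice)
    return d


def primary_summands(d: list[int]) -> int:
    """Number of primary cyclic summands of Z/d_1 + ... + Z/d_m."""
    return sum(len(prime_multiplicities(di)) for di in d)
```

For $\Delta = 72$ and $m = 2$ this yields the diagonal $(12, 6)$, whose quotient group splits into $4 = \Omega_2(72)$ primary cyclic summands.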
Proof (Corollary 2). Feasibility of (2) can be expressed as $b \in L(A)$. Choose $\gamma$ from the assertion of Theorem 1. One has $b \in L(A) = L(A_\gamma)$ and so there exists a solution $x$ of (2) whose support is a subset of $\gamma$. This sparse solution $x$ can be computed by solving the Diophantine system with the left-hand side matrix $A_\gamma$ and the right-hand side vector $b$.
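For a single equation ($m = 1$), the last step of this proof can be sketched with an iterated extended Euclidean algorithm. The function names `ext_gcd` and `sparse_lattice_solution` are hypothetical, the index set `gamma` is assumed to contain distinct 0-based indices, and the restriction to one row is an assumption of the sketch.

```python
def ext_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with s*a + t*b == g and g = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


def sparse_lattice_solution(a: list[int], gamma: list[int], b: int):
    """Integer x with a^T x = b and support inside gamma, or None."""
    x = [0] * len(a)
    g = 0  # invariant: sum(a[k] * x[k]) == g
    for i in gamma:
        g_new, s, t = ext_gcd(g, a[i])
        x = [s * xk for xk in x]  # rescale the combination found so far
        x[i] = t                  # and bring in the new generator
        g = g_new
    if g == 0:
        return x if b == 0 else None
    if b % g != 0:
        return None
    return [xk * (b // g) for xk in x]
```

With $a = (30, 6, 10, 15, 7)$ and $\gamma$ consisting of the first and last index, every integer $b$ is represented using only those two columns.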

Proof (Corollary 3).
Assume that the Diophantine system $Ax = b$, $x \in \mathbb{Z}^n$ has a solution. It suffices to show that, in this case, the integer-programming feasibility problem $Ax = b$, $x \in \mathbb{Z}^n_{\ge 0}$ has a solution, too, and that one can find a solution of the desired sparsity to the integer-programming feasibility problem in polynomial time.
One can determine $\gamma$ as in Theorem 1 in polynomial time. Using $\gamma$, we can determine a solution $x^* = (x^*_1, \dots, x^*_n)^\top \in \mathbb{Z}^n$ of the Diophantine system $Ax = b$, $x \in \mathbb{Z}^n$ satisfying $x^*_i = 0$ for $i \in [n] \setminus \gamma$ in polynomial time, as described in the proof of Corollary 2.
Let $a_1, \dots, a_n$ be the columns of $A$. Since the matrix $A_\tau$ is non-singular, the $m$ vectors $a_i$, where $i \in \tau$, together with the vector $v = -\sum_{i \in \tau} a_i$ positively span $\mathbb{R}^m$. Since the columns of $A$ positively span $\mathbb{R}^m$, the conic version of the Carathéodory theorem implies the existence of a set $\beta \subseteq [n]$ with $|\beta| \le m$ such that $v$ is in the conic hull of $\{a_i : i \in \beta\}$. Consequently, the set $\{a_i : i \in \beta \cup \tau\}$, and hence also the larger set $\{a_i : i \in \beta \cup \gamma\}$, positively spans $\mathbb{R}^m$. Let $I = \beta \cup \gamma$. By construction, $|I| \le |\beta| + |\gamma| \le m + |\gamma|$.
Since the vectors $a_i$ with $i \in I$ positively span $\mathbb{R}^m$, there exists a choice of rational coefficients $\lambda_i > 0$ ($i \in I$) with $\sum_{i \in I} \lambda_i a_i = 0$. After rescaling we can assume $\lambda_i \in \mathbb{Z}_{>0}$. Define $x' = (x'_1, \dots, x'_n)^\top \in \mathbb{Z}^n_{\ge 0}$ by setting $x'_i = \lambda_i$ for $i \in I$ and $x'_i = 0$ otherwise. The vector $x'$ is a solution of $Ax = 0$. Choosing $N \in \mathbb{Z}_{>0}$ large enough, we can ensure that the vector $x^* + N x'$ has non-negative components. Hence, $x = x^* + N x'$ is a solution of the system $Ax = b$, $x \in \mathbb{Z}^n_{\ge 0}$ whose support is contained in $I$, so it satisfies the desired sparsity estimate. The coefficients $\lambda_i$ and the number $N$ can be computed in polynomial time.
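The shift $x = x^* + N x'$ can be made concrete for a single row ($m = 1$). In the sketch below, the function name `make_nonnegative` is hypothetical, and the explicit choice of the coefficients $\lambda_i$ (the total absolute weight of the opposite sign class) is one possible choice for $m = 1$, not the construction used in the proof, which only asserts existence.

```python
def make_nonnegative(a: list[int], x_star: list[int], index_set: list[int]) -> list[int]:
    """Turn an integer solution x_star of a^T x = b into a non-negative one.

    Assumes x_star vanishes outside index_set and that the entries a[i],
    i in index_set, contain both a positive and a negative number.
    """
    if any(x_star[i] != 0 for i in range(len(a)) if i not in index_set):
        raise ValueError("x_star must be supported on index_set")
    pos = sum(a[i] for i in index_set if a[i] > 0)
    neg = -sum(a[i] for i in index_set if a[i] < 0)
    if pos == 0 or neg == 0:
        raise ValueError("index_set must contain entries of both signs")
    # lam is strictly positive on index_set and satisfies sum(lam[i]*a[i]) == 0
    lam = [0] * len(a)
    for i in index_set:
        lam[i] = neg if a[i] > 0 else pos
    # smallest N >= 0 with x_star[i] + N*lam[i] >= 0 for all i in index_set;
    # -(x // l) is ceil(-x / l) for positive l
    n_shift = max([0] + [-(x_star[i] // lam[i]) for i in index_set])
    return [x_star[i] + n_shift * lam[i] for i in range(len(a))]
```

For $a = (30, -7)$ and the integer solution $x^* = (-3, -13)$ of $a^\top x = 1$, the kernel vector is $x' = (7, 30)$, $N = 1$ suffices, and the result is $x = (4, 17)$.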
[...] in the unknowns $y_1, \dots, y_t$ has a solution that is not identically equal to zero.
Proof. The proof is inspired by the approach in [3, § 3.1] (used in a different context), which suggests reformulating the underlying equation over the integers as two strict inequalities and then using Minkowski's first theorem [4, Ch. VII, Sect. 3] from the geometry of numbers. Consider the convex set $Y \subseteq \mathbb{R}^t$ defined by the $2t$ strict linear inequalities
$$-1 < y_1 a_1 + \cdots + y_t a_t < 1, \qquad -2 < y_i < 2 \ \text{ for all } i \in \{2, \dots, t\}.$$
Clearly, the set $Y$ is the interior of a hyper-parallelepiped and can also be described as $Y = \{y \in \mathbb{R}^t : \|My\|_\infty < 1\}$, where $M$ is the upper triangular matrix
$$M = \begin{pmatrix} a_1 & a_2 & \cdots & a_t \\ & 1/2 & & \\ & & \ddots & \\ & & & 1/2 \end{pmatrix}.$$
It is easy to see that the $t$-dimensional volume $\mathrm{vol}(Y)$ of $Y$ is
$$\mathrm{vol}(Y) = \frac{2^t}{\det(M)} = \frac{2^{2t-1}}{a_1}.$$
The assumption $t > 1 + \log_2(a_1)$ gives $2^{t-1} > a_1$, so the volume of $Y$ is strictly larger than $2^t$. Thus, by Minkowski's first theorem, the origin-symmetric convex set $Y$ contains a non-zero integer vector $y = (y_1, \dots, y_t)^\top \in \mathbb{Z}^t$. Without loss of generality we can assume that $y_1 \ge 0$ (if this is not the case, one can replace $y$ by $-y$). The vector $y$ is a desired solution from the assertion of the lemma. □
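The proof via Minkowski's theorem is non-constructive, but for small $t$ a vector with the properties derived above (an integer solution of $y_1 a_1 + \cdots + y_t a_t = 0$ with $y_1 \ge 0$ and $|y_i| < 2$ for $i \ge 2$) can be found by exhaustive search. The sketch below does exactly that. The function name `small_kernel_vector` is hypothetical, positive integer entries are assumed, and the enumeration is exponential in $t$, so it illustrates the statement rather than the proof technique.

```python
from itertools import product


def small_kernel_vector(a: list[int]):
    """Search y != 0 with sum(y_i * a_i) == 0, y_1 >= 0, y_i in {-1, 0, 1} for i >= 2.

    Assumes a[0] > 0. Returns None if no such vector exists.
    """
    a1, rest = a[0], a[1:]
    for tail in product((-1, 0, 1), repeat=len(rest)):
        if not any(tail):
            continue  # a zero tail would force y_1 = 0 as well
        s = sum(y * c for y, c in zip(tail, rest))
        if s % a1 == 0:
            y = [(-s) // a1, *tail]
            if y[0] < 0:
                y = [-v for v in y]  # replace y by -y, as in the proof
            return y
    return None
```

For $a = (5, 3, 4, 9, 11)$ we have $t = 5 > 1 + \log_2 5$, so the search is guaranteed to succeed; one valid vector is $(3, 0, -1, 0, -1)$, since $15 - 4 - 11 = 0$.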