Python integer optimization with PuLP involving polynomials that I think can be faster

Question 1

I have a problem where we are given multivariate polynomials in variables like y_i and z_j and the goal is to express them as sums of products of terms of the form y_i-z_j with nonnegative integer coefficients (each of the factors of one of these products must specifically have a coefficient of 1 for one y variable and a coefficient of -1 for one z variable). For example, you might be given

(y_1 - z_1)*(y_3 - z_2) + (y_1 + y_2 - z_1 - z_2)*(y_1 - z_3) - (y_1 - z_2)*(y_1 - z_3)

and you would want to express this positively like

(y_1 - z_1)*(y_3 - z_2) + (y_2 - z_1)*(y_1 - z_3)

The same polynomial could be given to you in a different way, such as

y_1*y_2 + y_1*y_3 - y_1*z_1 - y_1*z_2 - y_2*z_3 - y_3*z_1 + z_1*z_2 + z_1*z_3

and you would have to pick this apart to get the expression above.

The goal, specifically, is if we were to consider new variables a_ij = y_i-z_j, we want to express the polynomial with nonnegative integer coefficients in terms of these variables. Since the variables are not linearly independent, a change of variables won't necessarily work, and my code uses linear optimization.

After some discussion with a member of the community, a requirement that is perhaps easier to understand is that we want to express the polynomial as a sum of terms of a specific form. The form is

a*(y_i + y_j + ... - z_k - z_l + ...)*(y_m + y_n + ... - z_o - z_p + ...)*(...) + ...

where:

The coefficient a is always a positive integer
The sign of y terms is always positive
The sign of z terms is always negative
Within one factor, the number of y is equal to the number of z
Within one factor, there must be at least one y and one z

For clarification, each of the y and z terms have unit coefficient (depending on the sign), but when we express it in this level of generality some of them can be the same, for example

2*y_1 - z_2 - z_3 = y_1 + y_1 - z_2 - z_3

My program expresses it with factors of exactly one y with exactly one z, so this doesn't come up.

It took me a year to even figure out how to do this at all. I have some working code now (it's actually available on PyPI, but only for solving a specific problem) that I think can be faster/more efficient in general.

The polynomial is passed as a symengine object. I want to use sympy, so I cast it to a string and have sympy then parse it. I create a vector expressed as a dict of monomials with integer coefficients, then I have to somehow create a set of basis vectors to then express the result positively with an integer optimization (using PuLP). I think this is part of where the speed and efficiency problem is.

I create the basis vectors by taking all y variables that occur in terms of the polynomial that have only y variables, and all z variables that occur in terms of the polynomial that have only z variables, and take products of differences of these. I have a conjecture that they are square-free in these terms, and so far that's actually been working out. But I end up with huge numbers of basis vectors for any sizeable polynomial, on the order of 20K+. I use xpress for smaller cases (5000 rows and columns), and use CBC for larger ones.

So the question is: is there a way to speed this up, perhaps by trimming down the number of basis vectors? Any other comments or suggestions for improvements on the code are welcome. I've had a good experience with math-y people lurking here before which is why I'm trying this site.

from symengine import *
import numpy as np
import pulp as pu
import itertools as it
import sympy
import re
 
n = 100
var2 = symarray('y',n)
var3 = symarray('z',n)
dimen = 0
monom_to_vec = {}
def poly_to_vec(poly,vec0=None): 
 global dimen, monom_to_vec
 dc = poly.as_coefficients_dict()
 mn = list(dc.keys())
 
 if vec0 is None:
 init_basevec(mn)
 vec = np.zeros(dimen)
 try:
 for i in range(len(mn)-1,-1,-1):
 cf = dc[mn[i]]
 index = monom_to_vec[mn[i]]
 if vec0 is not None and abs(vec0[index])<abs(cf):
 return None
 vec[index] = cf
 except Exception:
 return None
 return vec
def init_basevec(monoms):
 global dimen, monom_to_vec
 monom_to_vec = {}
 index = 0
 for mn in monoms:
 monom_to_vec[mn] = index
 index+=1
 dimen = index
def compute_positive_rep(val,var2=var2,var3=var3,neg=False):
 notint = False
 try:
 int(val)
 except Exception:
 notint = True
 if notint: 
 sval = str(val)
 val = expand(val)
 
 regex = re.compile("y_([0-9]+)")
 regex2 = re.compile("z_([0-9]+)") 
 n1 = max([int(index) for index in regex.findall(sval)])+1
 n2 = max([int(index) for index in regex2.findall(sval)])+1
 var22 = sympy.symarray('y',n1)
 var33 = sympy.symarray('z',n2) 
 
 val_poly = sympy.poly(sval,*var22[1:],*var33[1:])
 
 if val_poly == sympy.sympify(0):
 return 0
 
 did = False 
 
 list0 = []
 list1 = []
 mn = val_poly.monoms()
 
 L0 = [0 for i in range(1,n1)]
 L1 = [0 for i in range(1,n2)]
 mn1L = []
 mnn1L = []
 for mm0 in mn:
 if list(mm0[:n1-1]) == L0:
 mnn1L += [mm0]
 if list(mm0[n1-1:]) == L1:
 mn1L += [mm0]
 
 base = []
 base_vectors = []
 base_monoms = []
 
 vec = poly_to_vec(val)
 
 set1 = set()
 set2 = set()
 full_varlist = []
 for mnn1 in mnn1L:
 for i in range(len(mnn1)):
 if mnn1[i]!=0:
 set2.add(i+1-n1+1)
 for index1 in range(1,n1):
 full_varlist += [[]]
 for index2 in set2:
 full_varlist[index1-1] += [var2[index1]-var3[index2]]
 comblists = []
 for mn1 in mn1L: 
 comblistmn1 = [1]
 for i in range(n1-1): 
 if mn1[i]!=0:
 arr = np.array(comblistmn1)
 comblistmn12 = []
 newset = it.combinations(full_varlist[i],mn1[i])
 combs = list(newset)
 for comb in combs:
 term = np.prod(comb)
 comblistmn12 += (arr*term).tolist()
 comblistmn1 = comblistmn12
 for b1 in comblistmn1:
 vec0 = poly_to_vec(expand(b1),vec)
 if vec0 is not None:
 base_vectors += [vec0]
 base_monoms += [b1]
 vrs = [pu.LpVariable(name=f"a{i}",lowBound=0,cat='Integer') for i in range(len(base_vectors))]
 lp_prob = pu.LpProblem('Problem',pu.LpMinimize)
 lp_prob += 0
 for i in range(len(vec)):
 eq = 0
 for j in range(len(base_vectors)):
 if base_vectors[j][i]!=0:
 eq += int(base_vectors[j][i])*vrs[j]
 lp_prob += eq == int(vec[i])
 try:
 status = lp_prob.solve(pu.XPRESS_PY(msg=0))
 except pu.PulpSolverError:
 status = lp_prob.solve(pu.PULP_CBC_CMD(msg=0))
 if status != 1:
 print(f"Counterexample {val}")
 exit(1)
 val = 0
 for i in range(len(base_vectors)):
 x = vrs[i].value()
 b1 = base_monoms[i]
 if x!=0:
 val += int(x)*b1
 return val
print(compute_positive_rep((var2[1]-var3[5])*(var2[5]-var3[1]) + (var2[3] + var2[7] - var3[1] - var3[2])*(var2[2] + var2[3] - var3[7] - var3[8]) - (var2[3] - var3[1])*(var2[2] - var3[7])))

This script runs for me, and below is the output:

(y_5 - z_1)*(y_1 - z_5) + (y_7 - z_1)*(y_3 - z_8) + (y_7 - z_2)*(y_2 - z_7) + (y_3 - z_8)*(y_3 - z_2)

It has not directly manipulated the expression to cancel the negative term, but rather started from scratch to try to express this positively.

You may need to pip install xpress if you don't have it already in your pulp installation.

Included below is a more substantial example that takes a very long time on my machine. Replace the last line with this.

print(compute_positive_rep((var2[3] - var3[1])*((var2[1] + var2[3] + var2[4] + var2[6] - var3[1] - var3[3] - var3[5] - var3[6])*(((var2[6] - var3[1])*(var2[4] - var3[1])*(var2[1] + var2[3] - var3[1] - var3[2]) + (var2[3] - var3[2])*(var2[1] - var3[2])*(var2[4] + var2[6] - var3[1] - var3[2]))*((var2[4] - var3[3])*(var2[6] - var3[3]) + (var2[1] - var3[4])*(var2[3] - var3[4]) + (var2[4] + var2[6] - var3[1] - var3[3])*(var2[1] + var2[3] - var3[1] - var3[4])) - (var2[6] - var3[1])*(var2[3] - var3[1])*(var2[1] - var3[1])*(var2[4] - var3[1])*(var2[1] + var2[3] + var2[4] + var2[6] - var3[1] - var3[2] - var3[3] - var3[4])) - (var2[6] - var3[1])*(var2[3] - var3[1])*(var2[1] - var3[1])*(var2[4] - var3[1])*((var2[4] - var3[3])*(var2[6] - var3[3]) + (var2[1] - var3[4])*(var2[3] - var3[4]) + (var2[1] + var2[3] - var3[2] - var3[4])*(var2[4] + var2[6] - var3[2] - var3[3])) - (((var2[6] - var3[1])*(var2[4] - var3[1])*(var2[1] + var2[3] - var3[1] - var3[3]) + (var2[1] - var3[3])*(var2[3] - var3[3])*(var2[4] + var2[6] - var3[1] - var3[3]))*((var2[6] - var3[1])*(var2[4] - var3[1])*(var2[1] + var2[3] - var3[1] - var3[2]) + (var2[3] - var3[2])*(var2[1] - var3[2])*(var2[4] + var2[6] - var3[1] - var3[2])) - (var2[6] - var3[1])*(var2[3] - var3[1])*(var2[1] - var3[1])*(var2[4] - var3[1])*((var2[4] - var3[2])*(var2[6] - var3[2]) + (var2[1] - var3[3])*(var2[3] - var3[3]) + (var2[4] + var2[6] - var3[1] - var3[2])*(var2[1] + var2[3] - var3[1] - var3[3])))) - (-(var2[6] - var3[1])*(var2[3] - var3[1])*(var2[1] - var3[1])*(var2[4] - var3[1])*((var2[4] - var3[2])*(var2[6] - var3[2])*(var2[1] + var2[3] - var3[2] - var3[3]) + (var2[1] - var3[3])*(var2[3] - var3[3])*(var2[4] + var2[6] - var3[2] - var3[3])) + (var2[6] - var3[1])*(var2[3] - var3[1])*(var2[1] - var3[1])*(var2[4] - var3[1])*(var2[1] + var2[3] + var2[4] + var2[6] - var3[2] - var3[3] - var3[5] - var3[6])*((var2[4] - var3[3])*(var2[6] - var3[3]) + (var2[1] - var3[4])*(var2[3] - var3[4]) + (var2[1] + var2[3] - var3[2] - var3[4])*(var2[4] + var2[6] - var3[2] - var3[3])))))

Question 2

Can you explain more about how you're linearizing all of those products? Are any of the variables binary? Or are the coefficients to those products constants and not variables?

Question 3

@Reinderien So I use symegine's function expand which will fully expand the result in terms of monomials with integer coefficients. The variables don't have anything substituted, these are abstract polynomials. It arbitrarily assigned indexes to the monomials and populates "monom_to_vec" which is a lookup translating the monomial to the index. Then, given an expanded polynomial, poly_to_vec will fill in those coordinates which are the integer coefficients of the monomials in a numpy array. So the value at the index in the array is the coefficient of the monomial. Is this clearer?

Question 4

What defines success here? Must the expression be as factored as possible, even if that means nesting depth greater than one?

Question 5

@Reinderien So we're not trying to factorize here per se. We want any expression in terms of products of terms like (y_5 - z_1) with nonnegative coefficients. So (y_5 - z_1)*(y_1-z_2)+(y_5-z_1)*(y_3-z_5) would be good, and the reason this is so hard is because you could also express the same polynomial as (y_5 - z_1)*(y_1-z_5)+(y_5-z_1)*(y_3-z_2). We're not looking for a unique expression or factorized, just positive in terms of terms like that. The assumption is that this is always possible (if it's not in a certain case, we don't care about that case).

Question 6

Must each sub expression have two terms? Why not write (y_5 - z_1)*(y_1-z_2)+(y_5-z_1)*(y_3-z_5) as (y_5 - z_1)*(y_1-z_2+y_3-z_5)?

Question 7

A lot of the more technical terms (basis vectors, etc.) are lost on me, so I can only partially speak to the theory in this question.

my code uses linear optimization

Ish. From lp_prob += 0 (which should have instead used setObjective) we can see that there actually isn't an optimization objective at all. This is a linear program, and qualifies as MILP, but is only constraint programming and not minimization.

Since this is for theoretical mathematics, a (shaky) argument could be made that you're better off constructing your LP matrices directly instead of through PuLP. Advantages are typically faster setup time, a more bare-metal representation of the problem construction, and easier graphing of the sparsity pattern; disadvantages are that it takes more code and relies more heavily on knowledge of linear algebra (especially: Kronecker products proliferate).

I want to use sympy, so I cast it to a string and have sympy then parse it

I think you can trim off several of your dependencies. I was able to hack through a solution with only scipy, numpy and sympy, which I consider more standard than symengine and xpress.

each of the y and z terms have unit coefficient (depending on the sign), but when we express it in this level of generality some of them can be the same

This is expressible by just limiting every factor to have unique variables, and allowing multiple factors.

My program expresses it with factors of exactly one y with exactly one z

This isn't strictly necessary. I demonstrate how to allow for anywhere from two variables to all variables on either side of each factor.

is there a way to speed this up, perhaps by trimming down the number of basis vectors?

Ill-defined. You haven't shown any profiling results (and I don't, either, other than measuring setup and solve times).

Included below is a more substantial example that takes a very long time on my machine.

That example is quite different from the first two, since it has cubic terms and so can't be solved with the pair-wise LP construction I showed. Oh well - perhaps that can still serve as inspiration.

Fiddly bits about the code:

Avoid global side-effects.

except; return None is not great - the exception is too broad, and anyway, better to just let the exception fall through and let the caller catch it (or perhaps re-throw a domain-specific exception), which is how Python exception handling was designed to work.

Run your code through a PEP8 linter. For one example, you don't have enough newlines between functions.

Add type hints to your function signatures.

Include a __main__ guard.

Example code

This is more general in one sense, as it accommodates for longer factor expressions, but less general in another sense as it can only handle order-2 polynomials.

import time
from typing import Iterator
import matplotlib.pyplot as plt
import numpy as np
import sympy
from scipy.optimize import milp, Bounds, LinearConstraint
import scipy.sparse as sp
'''
Factorize an expression to the form
a*(yi + yj + ... - zk - zl + ...)(ym + yn + ... - zo - zp + ...) + ...
where:
- the outer product's coefficient a is always a positive integer
- the sign of y terms is always positive
- the sign of z terms is always negative
- within one factor, the number of y is equal to the number of z
- within one factor, there must be at least one y and one z
LP Variables:
n_max_factors: non-negative outer factor integers
n_max_factors * 2 * n_vars: binary selection variables, +1 for y, -1 for z
n_max_factors * n_addends: addends per factor, left variable, and right variable
'''
n_max_factors = 4 # Estimate; will need increasing for larger problems
def parse(expr: str) -> tuple[
 sympy.Expr, # The parsed expression object
 list[sympy.Symbol], # All free symbols, sorted
 int, # number of variables
 int, # number of y variables
 int, # number of z variables
 int, # number of unique addends
]:
 # Parse and immediately expand to SOP form
 input_expr = sympy.expand(sympy.parse_expr(expr))
 # Unique tuple of all symbols sorted by name and index
 symbols = sorted(input_expr.free_symbols, key=lambda s: s.name)
 n_vars = len(symbols) # Number of unique polynomial variables
 n_y = sum(s.name.startswith('y') for s in symbols) # Length of y vector
 n_z = sum(s.name.startswith('z') for s in symbols) # Length of z vector
 # For each combination of the variables. The number of unique multiplicative terms.
 n_addends = n_vars * n_vars
 return input_expr, symbols, n_vars, n_y, n_z, n_addends
def flatten_factors(term: sympy.Expr) -> Iterator[sympy.Expr]:
 """
 Flatten a multiplicative Sympy expression into an iterator of terms
 """
 if isinstance(term, (sympy.Symbol, sympy.Number)):
 yield term
 elif isinstance(term, sympy.Pow):
 yield from tuple(flatten_factors(term.args[0])) * term.args[1]
 elif isinstance(term, sympy.Mul):
 for arg in term.args:
 yield from flatten_factors(arg)
 else:
 raise NotImplementedError()
def iterate_equation(
 input_expr: sympy.Expr,
 symbols: list[sympy.Symbol],
 n_vars: int,
 n_y: int,
 n_z: int,
) -> tuple[
 np.ndarray, # equation coefficients, square, dense
 np.ndarray, # flattened upper triangular equation coefficients
 np.ndarray, # signs for each variable, one dimension
 np.ndarray, # matrix of signs for each variable pair
 np.ndarray, # upper triangular indices, first dimension
 np.ndarray, # upper triangular indices, second dimension
]:
 # Equation coefficient matrix, dense, square
 equation_coefficients = np.zeros((n_vars, n_vars))
 for term in input_expr.args:
 args = tuple(flatten_factors(term))
 coef = next(
 (s for s in args if isinstance(s, sympy.Number)),
 1,
 )
 syms = sorted(
 (s for s in args if isinstance(s, sympy.Symbol)),
 key=lambda s: s.name,
 )
 if len(syms) != 2:
 raise NotImplementedError()
 i = symbols.index(syms[0])
 j = symbols.index(syms[1])
 equation_coefficients[i, j] = coef
 # Flattened version of the upper triangular half of the equation matrix
 equation_triu = equation_coefficients[np.triu_indices(n_vars)]
 # Vector of signs over one dimension of the equation matrix
 equation_sign_vector = np.concatenate((
 np.ones(n_y), -np.ones(n_z),
 ))
 # Square outer product matrix of signs, dense, square
 equation_signs = np.outer(equation_sign_vector, equation_sign_vector)
 trii, trij = np.triu_indices(n_vars)
 return equation_coefficients, equation_triu, equation_sign_vector, equation_signs, trii, trij
def make_bounds(
 equation_signs: np.ndarray,
 equation_sign_vector: np.ndarray,
 n_addends: int,
) -> Bounds:
 # Bounds for selectors
 sel_lower = np.tile(equation_sign_vector*0.5-0.5, n_max_factors*2)
 sel_upper = np.tile(equation_sign_vector*0.5+0.5, n_max_factors*2)
 # Bounds for addends
 add_lower = np.zeros((n_max_factors, n_addends))
 add_upper = np.zeros((n_max_factors, n_addends))
 add_lower[:, equation_signs.ravel() == -1] = -np.inf
 add_upper[:, equation_signs.ravel() == +1] = +np.inf
 return Bounds(
 lb=np.concatenate((
 np.zeros(n_max_factors), # outer factor coefficients
 sel_lower , # selectors
 add_lower.ravel(),
 )),
 ub=np.concatenate((
 np.full(shape=n_max_factors, fill_value=np.inf),
 sel_upper,
 add_upper.ravel(),
 )),
 )
def make_constraints(
 equation_triu: np.ndarray,
 equation_sign_vector: np.ndarray,
 equation_signs: np.ndarray,
 trii: np.ndarray,
 trij: np.ndarray,
 n_vars: int,
 n_addends: int,
) -> tuple[
 tuple[LinearConstraint, ...],
 int,
]:
 # Per factor, there must be at least two variables selected
 constraint_factorcount = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2, n_max_factors)),
 sp.kron(
 A=sp.eye(n_max_factors*2),
 B=equation_sign_vector,
 ),
 sp.csc_array((n_max_factors*2, n_max_factors * n_addends)),
 ), format='csc'),
 lb=2,
 )
 # Per factor, the number of y and z variables must be equal
 constraint_factorparity = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2, n_max_factors)),
 sp.kron(
 A=sp.eye(n_max_factors*2),
 B=np.ones(n_vars),
 ),
 sp.csc_array((n_max_factors*2, n_max_factors * n_addends)),
 ), format='csc'),
 lb=0,
 ub=0,
 )
 # The sum of each addend over all factors is pinned at the equation value
 A_equation = sp.lil_array((len(equation_triu), n_addends))
 vidx = np.arange(len(equation_triu))
 A_equation[vidx, trii*n_vars + trij] = 1
 A_equation[vidx, trij*n_vars + trii] = 1
 constraint_equation = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((len(equation_triu), n_max_factors)),
 sp.csc_array((len(equation_triu), n_max_factors*2*n_vars)),
 sp.kron(
 A=np.ones(n_max_factors),
 B=A_equation,
 ),
 ), format='csc'),
 lb=equation_triu,
 ub=equation_triu,
 )
 # Each addend will never exceed its corresponding outer factor
 # positive addend < factor -> factor - addend > 0
 # negative addend > -factor -> factor + addend > 0
 constraint_addends_factor = LinearConstraint(
 A=sp.hstack((
 sp.kron(
 A=sp.eye(n_max_factors),
 B=np.ones((n_addends, 1)),
 ),
 sp.csc_array((n_max_factors * n_addends, n_max_factors*2*n_vars)),
 sp.kron(
 A=-sp.eye(n_max_factors),
 B=sp.diags(equation_signs.ravel()),
 )
 ), format='csc'),
 lb=0,
 )
 '''
 If either of a given selector in a pair is zero, the corresponding addend must be zero
 addend == factor * sela * selb * eqsign
 addend*asign <= M*asel*vsign -> M*asel*vsign - addend*asign >= 0
 addend*asign <= M*bsel*vsign -> M*bsel*vsign - addend*asign >= 0
 '''
 M = 10*n_max_factors*np.abs(equation_triu).max() # rough estimate; must exceed all factors
 A_add = sp.vstack((
 sp.diags(-equation_signs.ravel()),
 sp.lil_array((n_addends, n_addends)),
 ), format='lil')
 i, j = np.indices((n_vars, n_vars))
 A_add[
 n_vars*n_vars + n_vars*i + j,
 n_vars*j + i,
 ] = -equation_signs
 constraint_addends_zero = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2*n_addends, n_max_factors)),
 sp.kron(
 A=sp.kron(
 A=sp.eye(n_max_factors * 2),
 B=sp.diags(M * equation_sign_vector),
 ),
 B=np.ones((n_vars, 1)),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=A_add,
 ),
 ), format='csc'),
 lb=0,
 )
 '''
 If both selectors in a pair are non-zero, the corresponding addend must equal the factor
 addend*asign >= factor - M(1 - sl*sgnl) - M(1 - sr*sgnr)
 2M >= factor + Msl*sgnl + Msr*sgnr - asign*addend
 '''
 constraint_addends_nonzero = LinearConstraint(
 A=sp.hstack((
 sp.kron(
 A=sp.eye(n_max_factors),
 B=np.ones((n_addends, 1)),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=sp.hstack((
 sp.kron(
 A=sp.diags(M * equation_sign_vector),
 B=np.ones((n_vars, 1)),
 ),
 sp.kron(
 A=np.ones((n_vars, 1)),
 B=sp.diags(M * equation_sign_vector),
 )
 )),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=sp.diags(-equation_signs.ravel()),
 ),
 ), format='csc'),
 ub=2*M,
 )
 constraints = (
 constraint_factorcount,
 constraint_factorparity,
 constraint_equation,
 constraint_addends_factor,
 constraint_addends_zero,
 constraint_addends_nonzero,
 )
 n_rows = sum(
 con.A.shape[0] for con in constraints
 )
 return constraints, n_rows
def solve(
 bounds: Bounds,
 constraints: tuple[LinearConstraint, ...],
 n_vars: int,
 n_addends: int,
) -> np.ndarray:
 result = milp(
 # This is constraint programming; there is no minimization objective
 c=np.zeros(n_max_factors + n_max_factors*2*n_vars + n_max_factors*n_addends),
 integrality=np.concatenate((
 np.ones(n_max_factors), # Coefficients are integral
 np.ones(n_max_factors * 2 * n_vars), # selectors are integral
 np.zeros(n_max_factors * n_addends), # addends are continuous
 )),
 bounds=bounds,
 constraints=constraints,
 )
 if not result.success:
 raise ValueError(result.message)
 print(result.message)
 return result.x
def triu_fold(square: np.ndarray) -> np.ndarray:
 """
 Convert a matrix so that its upper triangular is the sum of the transpose and the lower triangular is zero
 """
 output = square + square.T
 trii, trij = np.tril_indices(n=square.shape[0], k=-1)
 output[trii, trij] = 0
 output[np.diag_indices_from(output)] = np.diag(square)
 return output
def process_results(
 result_x: np.ndarray,
 symbols: list[sympy.Symbol],
 n_vars: int,
) -> tuple[
 np.ndarray, # outer factor coefficients
 np.ndarray, # binary selectors
 np.ndarray, # term addends
 np.ndarray, # term addends, sum
 np.ndarray, # term addends, upper triangular
 sympy.Expr, # final output expression
]:
 coeffs, selectors, addends = np.split(
 result_x.round().astype(int),
 (n_max_factors, n_max_factors + n_max_factors*2*n_vars),
 )
 selectors = selectors.reshape((n_max_factors, 2, n_vars))
 addends = addends.reshape((n_max_factors, n_vars, n_vars))
 addends_total = addends.sum(axis=0)
 addends_triu = triu_fold(addends_total)
 expr = sum(
 coeff
 * factor_selectors[0, :].dot(symbols)
 * factor_selectors[1, :].dot(symbols)
 for coeff, factor_selectors in zip(coeffs, selectors)
 )
 return coeffs, selectors, addends, addends_total, addends_triu, expr
def print_results(
 eqn: str,
 coeffs: np.ndarray,
 selectors: np.ndarray,
 addends: np.ndarray,
 addends_triu: np.ndarray,
 equation_coefficients: np.ndarray,
 expr: sympy.Expr,
 verbose: bool = False,
) -> None:
 if verbose:
 print(f'Selectors {selectors.shape}:')
 print(selectors.reshape((-1, addends.shape[1])))
 print(f'Term addends, output {addends.shape}:')
 print(addends)
 print(f'Term addends, upper triangular {addends_triu.shape}:')
 print(addends_triu)
 print(f'Term addends, input {equation_coefficients.shape}:')
 print(equation_coefficients.round().astype(int))
 print('Outer factor coefficients:', coeffs)
 print('Input expression:', eqn)
 print('Final expression:', expr)
def test(
 equation_coefficients: np.ndarray,
 coeffs: np.ndarray,
 selectors: np.ndarray,
 addends_full: np.ndarray,
 addends_triu: np.ndarray,
) -> None:
 assert np.allclose(equation_coefficients, addends_triu)
 addends_reco = np.zeros_like(addends_triu)
 for i_factor, (coef, (lhs, rhs), addend_slice) in enumerate(zip(coeffs, selectors, addends_full)):
 product = coef*np.outer(lhs, rhs)
 assert np.allclose(addend_slice, product)
 addends_reco += product
 addends_reco = triu_fold(addends_reco)
 assert np.allclose(addends_reco, addends_triu)
def eval_constraints(
 constraints: tuple[LinearConstraint, ...],
 result_x: np.ndarray,
) -> None:
 for constraint in constraints:
 evald = constraint.A @ result_x
 print('Evaluated constraint vector:')
 if evald.size % 10:
 print(evald)
 else:
 print(evald.reshape((-1, 10)))
 print('Lower bound:', constraint.lb)
 print(evald >= constraint.lb)
 print('Upper bound:', constraint.ub)
 print(evald <= constraint.ub)
 print()
def graph_constraints(
 eqn: str,
 constraints: tuple[LinearConstraint, ...],
) -> plt.Figure:
 all_constraints = sp.vstack([
 con.A for con in constraints
 ]).T.toarray() != 0
 fig, ax = plt.subplots()
 fig: plt.Figure
 ax: plt.Axes
 fig.tight_layout()
 ax.imshow(all_constraints, cmap='binary')
 ax.set_title('Sparsity pattern (transpose) to solve\n' + eqn)
 return fig
def run(eqn: str) -> None:
 start = time.perf_counter()
 input_expr, symbols, n_vars, n_y, n_z, n_addends = parse(eqn)
 (
 equation_coefficients, equation_triu, equation_sign_vector, equation_signs, trii, trij,
 ) = iterate_equation(input_expr, symbols, n_vars, n_y, n_z)
 bounds = make_bounds(equation_signs, equation_sign_vector, n_addends)
 constraints, n_rows = make_constraints(
 equation_triu, equation_sign_vector, equation_signs, trii, trij, n_vars, n_addends,
 )
 setup = time.perf_counter()
 result = solve(bounds, constraints, n_vars, n_addends)
 solved = time.perf_counter()
 coeffs, selectors, addends_full, addends_total, addends_triu, expr = process_results(result, symbols, n_vars)
 print(f'Problem size: {n_rows}x{result.size}')
 print(f'Setup time: {(setup - start)*1e3:.1f} ms')
 print(f'Solve time: {(solved - setup)*1e3:.1f} ms')
 print_results(
 eqn, coeffs, selectors, addends_total, addends_triu, equation_coefficients, expr,
 )
 print()
 graph_constraints(eqn, constraints)
 # eval_constraints(constraints, result)
 test(equation_coefficients, coeffs, selectors, addends_full, addends_triu)
def main() -> None:
 # In-question first example
 run(
 '(y_1 - z_1)*(y_3 - z_2) '
 '+ (y_1 + y_2 - z_1 - z_2)*(y_1 - z_3) '
 '- (y_1 - z_2)*(y_1 - z_3)'
 )
 # Example hard-coded in OP script
 run(
 '(y_1-z_5)*(y_5-z_1) '
 '+ (y_3 + y_7 - z_1 - z_2)*(y_2 + y_3 - z_7 - z_8) '
 '- (y_3 - z_1)*(y_2 - z_7)'
 )
 # From chat
 run(
 '2*y_1**2 + 3*y_1*y_3 + 3*y_1*y_4 + 2*y_1*y_5 + 3*y_1*y_6 + 3*y_1*y_7 - 4*y_1*z_1 - 4*y_1*z_2 - 3*y_1*z_3 - 3*y_1*z_4 - 2*y_1*z_5 - 2*y_1*z_6 + y_3**2 + 2*y_3*y_4 + y_3*y_5 + 2*y_3*y_6 + 2*y_3*y_7 - 3*y_3*z_1 - 3*y_3*z_2 - 2*y_3*z_3 - 2*y_3*z_4 - y_3*z_5 - y_3*z_6 + y_4**2 + y_4*y_5 +'
 '2*y_4*y_6 + 2*y_4*y_7 - 3*y_4*z_1 - 3*y_4*z_2 - 2*y_4*z_3 - 2*y_4*z_4 - y_4*z_5 - y_4*z_6 + y_5*y_6 + y_5*y_7 - 2*y_5*z_1 - 2*y_5*z_2 - y_5*z_3 - y_5*z_4 + y_6**2 + 2*y_6*y_7 - 3*y_6*z_1 - 3*y_6*z_2 - 2*y_6*z_3 - 2*y_6*z_4 - y_6*z_5 - y_6*z_6 + y_7**2 - 3*y_7*z_1 - 3*y_7*z_2 - 2*y_7*z_3 - 2*y_7*z_4 - y_7*z_5'
 '- y_7*z_6 + 2*z_1**2 + 4*z_1*z_2 + 3*z_1*z_3 + 3*z_1*z_4 + 2*z_1*z_5 + 2*z_1*z_6 + 2*z_2**2 + 3*z_2*z_3 + 3*z_2*z_4 + 2*z_2*z_5 + 2*z_2*z_6 + z_3**2 + 2*z_3*z_4 + z_3*z_5 + z_3*z_6 + z_4**2 + z_4*z_5 + z_4*z_6'
 '+' # Missing from pasted expression, inferred from context
 '(y_1 - z_1)*(y_7 - z_2) + (y_1 - z_1)*(y_1 - z_3) + (y_1 - z_1)*(y_6 - z_4) + 2*(y_1 - z_1)*(y_4 - z_6) + (y_3 - z_1)*(y_1 - z_2) + (y_3 - z_1)*(y_3 - z_4) + (y_4 - z_1)*(y_7 - z_1) + (y_4 - z_1)*(y_1 - z_2) + (y_4 - z_1)*(y_6 - z_2) + (y_4 - z_1)*(y_3 - z_4) + (y_6 - z_1)*(y_3 - z_3)'
 '+ (y_6 - z_1)*(y_6 - z_3) + (y_6 - z_1)*(y_5 - z_5) + (y_7 - z_1)*(y_5 - z_1) + (y_7 - z_1)*(y_1 - z_5) + (y_1 - z_2)*(y_7 - z_4) + 2*(y_1 - z_2)*(y_5 - z_5) + (y_3 - z_2)*(y_6 - z_6) + (y_4 - z_2)*(y_4 - z_4) + (y_7 - z_2)*(y_3 - z_2) + (y_7 - z_2)*(y_6 - z_2) + (y_7 - z_2)*(y_6 - z_3) + (y_7 - z_2)*(y_3 - z_4) + 2*(y_1 - z_3)*(y_3 - z_2)'
 '+ (y_1 - z_3)*(y_1 - z_4) + (y_4 - z_3)*(y_5 - z_3) + (y_7 - z_3)*(y_7 - z_6) + (y_1 - z_4)*(y_6 - z_3) + (y_4 - z_4)*(y_6 - z_5) + (y_4 - z_4)*(y_7 - z_6) + (y_5 - z_4)*(y_3 - z_4) + (y_4 - z_5)*(y_3 - z_3) + (y_1 - z_6)*(y_6 - z_2)'
 )
 
 # Last example in question; has cubics so cannot be solved in pair-factor form
 '''
 run(
 '(y_3 - z_1)*((y_1 + y_3 + y_4 + y_6 - z_1 - z_3 - z_5 - z_6)*(((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_2) '
 '+ (y_3 - z_2)*(y_1 - z_2)*(y_4 + y_6 - z_1 - z_2))*((y_4 - z_3)*(y_6 - z_3) + (y_1 - z_4)*(y_3 - z_4) '
 '+ (y_4 + y_6 - z_1 - z_3)*(y_1 + y_3 - z_1 - z_4)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*(y_1 + y_3 + y_4 + y_6 - z_1 - z_2 - z_3 - z_4)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_3)*(y_6 - z_3) + (y_1 - z_4)*(y_3 - z_4) '
 '+ (y_1 + y_3 - z_2 - z_4)*(y_4 + y_6 - z_2 - z_3)) - (((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_3) '
 '+ (y_1 - z_3)*(y_3 - z_3)*(y_4 + y_6 - z_1 - z_3))*((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_2) '
 '+ (y_3 - z_2)*(y_1 - z_2)*(y_4 + y_6 - z_1 - z_2)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_2)*(y_6 - z_2) + (y_1 - z_3)*(y_3 - z_3) '
 '+ (y_4 + y_6 - z_1 - z_2)*(y_1 + y_3 - z_1 - z_3)))) '
 '- (-(y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_2)*(y_6 - z_2)*(y_1 + y_3 - z_2 - z_3) '
 '+ (y_1 - z_3)*(y_3 - z_3)*(y_4 + y_6 - z_2 - z_3)) '
 '+ (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*(y_1 + y_3 + y_4 + y_6 - z_2 - z_3 - z_5 - z_6)*((y_4 - z_3)*(y_6 - z_3) '
 '+ (y_1 - z_4)*(y_3 - z_4) + (y_1 + y_3 - z_2 - z_4)*(y_4 + y_6 - z_2 - z_3)))'
 )
 '''
 plt.show()
if __name__ == '__main__':
 main()

Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 613x196
Setup time: 62.9 ms
Solve time: 33.4 ms
Input expression: (y_1 - z_1)*(y_3 - z_2) + (y_1 + y_2 - z_1 - z_2)*(y_1 - z_3) - (y_1 - z_2)*(y_1 - z_3)
Final expression: (y_1 - z_1)*(y_3 - z_2) + (y_1 - z_3)*(y_2 - z_1)
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 1671x484
Setup time: 29.5 ms
Solve time: 21.3 ms
Input expression: (y_1-z_5)*(y_5-z_1) + (y_3 + y_7 - z_1 - z_2)*(y_2 + y_3 - z_7 - z_8) - (y_3 - z_1)*(y_2 - z_7)
Final expression: (y_1 - z_5)*(y_5 - z_1) + (y_2 - z_7)*(y_7 - z_2) + (y_3 - z_1)*(y_3 - z_8) + (y_3 - z_8)*(y_7 - z_2)
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 2398x676
Setup time: 142.7 ms
Solve time: 939.3 ms
Input expression: 2*y_1**2 + 3*y_1*y_3 + 3*y_1*y_4 + 2*y_1*y_5 + 3*y_1*y_6 + 3*y_1*y_7 - 4*y_1*z_1 - 4*y_1*z_2 - 3*y_1*z_3 - 3*y_1*z_4 - 2*y_1*z_5 - 2*y_1*z_6 + y_3**2 + 2*y_3*y_4 + y_3*y_5 + 2*y_3*y_6 + 2*y_3*y_7 - 3*y_3*z_1 - 3*y_3*z_2 - 2*y_3*z_3 - 2*y_3*z_4 - y_3*z_5 - y_3*z_6 + y_4**2 + y_4*y_5 +2*y_4*y_6 + 2*y_4*y_7 - 3*y_4*z_1 - 3*y_4*z_2 - 2*y_4*z_3 - 2*y_4*z_4 - y_4*z_5 - y_4*z_6 + y_5*y_6 + y_5*y_7 - 2*y_5*z_1 - 2*y_5*z_2 - y_5*z_3 - y_5*z_4 + y_6**2 + 2*y_6*y_7 - 3*y_6*z_1 - 3*y_6*z_2 - 2*y_6*z_3 - 2*y_6*z_4 - y_6*z_5 - y_6*z_6 + y_7**2 - 3*y_7*z_1 - 3*y_7*z_2 - 2*y_7*z_3 - 2*y_7*z_4 - y_7*z_5- y_7*z_6 + 2*z_1**2 + 4*z_1*z_2 + 3*z_1*z_3 + 3*z_1*z_4 + 2*z_1*z_5 + 2*z_1*z_6 + 2*z_2**2 + 3*z_2*z_3 + 3*z_2*z_4 + 2*z_2*z_5 + 2*z_2*z_6 + z_3**2 + 2*z_3*z_4 + z_3*z_5 + z_3*z_6 + z_4**2 + z_4*z_5 + z_4*z_6+(y_1 - z_1)*(y_7 - z_2) + (y_1 - z_1)*(y_1 - z_3) + (y_1 - z_1)*(y_6 - z_4) + 2*(y_1 - z_1)*(y_4 - z_6) + (y_3 - z_1)*(y_1 - z_2) + (y_3 - z_1)*(y_3 - z_4) + (y_4 - z_1)*(y_7 - z_1) + (y_4 - z_1)*(y_1 - z_2) + (y_4 - z_1)*(y_6 - z_2) + (y_4 - z_1)*(y_3 - z_4) + (y_6 - z_1)*(y_3 - z_3)+ (y_6 - z_1)*(y_6 - z_3) + (y_6 - z_1)*(y_5 - z_5) + (y_7 - z_1)*(y_5 - z_1) + (y_7 - z_1)*(y_1 - z_5) + (y_1 - z_2)*(y_7 - z_4) + 2*(y_1 - z_2)*(y_5 - z_5) + (y_3 - z_2)*(y_6 - z_6) + (y_4 - z_2)*(y_4 - z_4) + (y_7 - z_2)*(y_3 - z_2) + (y_7 - z_2)*(y_6 - z_2) + (y_7 - z_2)*(y_6 - z_3) + (y_7 - z_2)*(y_3 - z_4) + 2*(y_1 - z_3)*(y_3 - z_2)+ (y_1 - z_3)*(y_1 - z_4) + (y_4 - z_3)*(y_5 - z_3) + (y_7 - z_3)*(y_7 - z_6) + (y_1 - z_4)*(y_6 - z_3) + (y_4 - z_4)*(y_6 - z_5) + (y_4 - z_4)*(y_7 - z_6) + (y_5 - z_4)*(y_3 - z_4) + (y_4 - z_5)*(y_3 - z_3) + (y_1 - z_6)*(y_6 - z_2)
Final expression: (2*y_1 + 2*y_7 - 2*z_1 - 2*z_2)*(y_1 + y_3 + y_4 + y_5 + y_6 + y_7 - z_1 - z_2 - z_3 - z_4 - z_5 - z_6) + (y_1 + y_3 + y_4 + y_6 - z_1 - z_2 - z_3 - z_4)*(2*y_1 + 2*y_3 + 2*y_4 + 2*y_5 + 2*y_6 + 2*y_7 - 2*z_1 - 2*z_2 - 2*z_3 - 2*z_4 - 2*z_5 - 2*z_6)

sparsity

Question 8

Thanks for your answer. I haven't looked at it fully yet, but I'm hoping it generalizes to polynomials of higher degree. I'll take a look.

Question 9

It will generalize to higher degrees (with code modifications), but it will scale very poorly. With an n-degree polynomial, every top-level term will need up to n factors, requiring n_vars**n selectors.

Question 10

So the scaling is what is needed. The 5th degree polynomial example is more typical, and we're currently running degree 7 in our test runs.

Question 11

Maybe a combination will solve that issue.

Question 12

I have an idea for an arbitrary-degree algorithm but it will take another while to implement. I can describe it in the chat.

score 3 · Answer 1 · 2023-12-07 23:29:11Z

A lot of the more technical terms (basis vectors, etc.) are lost on me, so I can only partially speak to the theory in this question.

my code uses linear optimization

Ish. From lp_prob += 0 (which should have instead used setObjective) we can see that there actually isn't an optimization objective at all. This is a linear program, and qualifies as MILP, but is only constraint programming and not minimization.

Since this is for theoretical mathematics, a (shaky) argument could be made that you're better off constructing your LP matrices directly instead of through PuLP. Advantages are typically faster setup time, a more bare-metal representation of the problem construction, and easier graphing of the sparsity pattern; disadvantages are that it takes more code and relies more heavily on knowledge of linear algebra (especially: Kronecker products proliferate).

I want to use sympy, so I cast it to a string and have sympy then parse it

I think you can trim off several of your dependencies. I was able to hack through a solution with only scipy, numpy and sympy, which I consider more standard than symengine and xpress.

each of the y and z terms have unit coefficient (depending on the sign), but when we express it in this level of generality some of them can be the same

This is expressible by just limiting every factor to have unique variables, and allowing multiple factors.

My program expresses it with factors of exactly one y with exactly one z

This isn't strictly necessary. I demonstrate how to allow for anywhere from two variables to all variables on either side of each factor.

is there a way to speed this up, perhaps by trimming down the number of basis vectors?

Ill-defined. You haven't shown any profiling results (and I don't, either, other than measuring setup and solve times).

Included below is a more substantial example that takes a very long time on my machine.

That example is quite different from the first two, since it has cubic terms and so can't be solved with the pair-wise LP construction I showed. Oh well - perhaps that can still serve as inspiration.

Fiddly bits about the code:

Avoid global side-effects.

except; return None is not great - the exception is too broad, and anyway, better to just let the exception fall through and let the caller catch it (or perhaps re-throw a domain-specific exception), which is how Python exception handling was designed to work.

Run your code through a PEP8 linter. For one example, you don't have enough newlines between functions.

Add type hints to your function signatures.

Include a __main__ guard.

Example code

This is more general in one sense, as it accommodates for longer factor expressions, but less general in another sense as it can only handle order-2 polynomials.

import time
from typing import Iterator
import matplotlib.pyplot as plt
import numpy as np
import sympy
from scipy.optimize import milp, Bounds, LinearConstraint
import scipy.sparse as sp
'''
Factorize an expression to the form
a*(yi + yj + ... - zk - zl + ...)(ym + yn + ... - zo - zp + ...) + ...
where:
- the outer product's coefficient a is always a positive integer
- the sign of y terms is always positive
- the sign of z terms is always negative
- within one factor, the number of y is equal to the number of z
- within one factor, there must be at least one y and one z
LP Variables:
n_max_factors: non-negative outer factor integers
n_max_factors * 2 * n_vars: binary selection variables, +1 for y, -1 for z
n_max_factors * n_addends: addends per factor, left variable, and right variable
'''
n_max_factors = 4 # Estimate; will need increasing for larger problems
def parse(expr: str) -> tuple[
 sympy.Expr, # The parsed expression object
 list[sympy.Symbol], # All free symbols, sorted
 int, # number of variables
 int, # number of y variables
 int, # number of z variables
 int, # number of unique addends
]:
 # Parse and immediately expand to SOP form
 input_expr = sympy.expand(sympy.parse_expr(expr))
 # Unique tuple of all symbols sorted by name and index
 symbols = sorted(input_expr.free_symbols, key=lambda s: s.name)
 n_vars = len(symbols) # Number of unique polynomial variables
 n_y = sum(s.name.startswith('y') for s in symbols) # Length of y vector
 n_z = sum(s.name.startswith('z') for s in symbols) # Length of z vector
 # For each combination of the variables. The number of unique multiplicative terms.
 n_addends = n_vars * n_vars
 return input_expr, symbols, n_vars, n_y, n_z, n_addends
def flatten_factors(term: sympy.Expr) -> Iterator[sympy.Expr]:
 """
 Flatten a multiplicative Sympy expression into an iterator of terms
 """
 if isinstance(term, (sympy.Symbol, sympy.Number)):
 yield term
 elif isinstance(term, sympy.Pow):
 yield from tuple(flatten_factors(term.args[0])) * term.args[1]
 elif isinstance(term, sympy.Mul):
 for arg in term.args:
 yield from flatten_factors(arg)
 else:
 raise NotImplementedError()
def iterate_equation(
 input_expr: sympy.Expr,
 symbols: list[sympy.Symbol],
 n_vars: int,
 n_y: int,
 n_z: int,
) -> tuple[
 np.ndarray, # equation coefficients, square, dense
 np.ndarray, # flattened upper triangular equation coefficients
 np.ndarray, # signs for each variable, one dimension
 np.ndarray, # matrix of signs for each variable pair
 np.ndarray, # upper triangular indices, first dimension
 np.ndarray, # upper triangular indices, second dimension
]:
 # Equation coefficient matrix, dense, square
 equation_coefficients = np.zeros((n_vars, n_vars))
 for term in input_expr.args:
 args = tuple(flatten_factors(term))
 coef = next(
 (s for s in args if isinstance(s, sympy.Number)),
 1,
 )
 syms = sorted(
 (s for s in args if isinstance(s, sympy.Symbol)),
 key=lambda s: s.name,
 )
 if len(syms) != 2:
 raise NotImplementedError()
 i = symbols.index(syms[0])
 j = symbols.index(syms[1])
 equation_coefficients[i, j] = coef
 # Flattened version of the upper triangular half of the equation matrix
 equation_triu = equation_coefficients[np.triu_indices(n_vars)]
 # Vector of signs over one dimension of the equation matrix
 equation_sign_vector = np.concatenate((
 np.ones(n_y), -np.ones(n_z),
 ))
 # Square outer product matrix of signs, dense, square
 equation_signs = np.outer(equation_sign_vector, equation_sign_vector)
 trii, trij = np.triu_indices(n_vars)
 return equation_coefficients, equation_triu, equation_sign_vector, equation_signs, trii, trij
def make_bounds(
 equation_signs: np.ndarray,
 equation_sign_vector: np.ndarray,
 n_addends: int,
) -> Bounds:
 # Bounds for selectors
 sel_lower = np.tile(equation_sign_vector*0.5-0.5, n_max_factors*2)
 sel_upper = np.tile(equation_sign_vector*0.5+0.5, n_max_factors*2)
 # Bounds for addends
 add_lower = np.zeros((n_max_factors, n_addends))
 add_upper = np.zeros((n_max_factors, n_addends))
 add_lower[:, equation_signs.ravel() == -1] = -np.inf
 add_upper[:, equation_signs.ravel() == +1] = +np.inf
 return Bounds(
 lb=np.concatenate((
 np.zeros(n_max_factors), # outer factor coefficients
 sel_lower , # selectors
 add_lower.ravel(),
 )),
 ub=np.concatenate((
 np.full(shape=n_max_factors, fill_value=np.inf),
 sel_upper,
 add_upper.ravel(),
 )),
 )
def make_constraints(
 equation_triu: np.ndarray,
 equation_sign_vector: np.ndarray,
 equation_signs: np.ndarray,
 trii: np.ndarray,
 trij: np.ndarray,
 n_vars: int,
 n_addends: int,
) -> tuple[
 tuple[LinearConstraint, ...],
 int,
]:
 # Per factor, there must be at least two variables selected
 constraint_factorcount = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2, n_max_factors)),
 sp.kron(
 A=sp.eye(n_max_factors*2),
 B=equation_sign_vector,
 ),
 sp.csc_array((n_max_factors*2, n_max_factors * n_addends)),
 ), format='csc'),
 lb=2,
 )
 # Per factor, the number of y and z variables must be equal
 constraint_factorparity = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2, n_max_factors)),
 sp.kron(
 A=sp.eye(n_max_factors*2),
 B=np.ones(n_vars),
 ),
 sp.csc_array((n_max_factors*2, n_max_factors * n_addends)),
 ), format='csc'),
 lb=0,
 ub=0,
 )
 # The sum of each addend over all factors is pinned at the equation value
 A_equation = sp.lil_array((len(equation_triu), n_addends))
 vidx = np.arange(len(equation_triu))
 A_equation[vidx, trii*n_vars + trij] = 1
 A_equation[vidx, trij*n_vars + trii] = 1
 constraint_equation = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((len(equation_triu), n_max_factors)),
 sp.csc_array((len(equation_triu), n_max_factors*2*n_vars)),
 sp.kron(
 A=np.ones(n_max_factors),
 B=A_equation,
 ),
 ), format='csc'),
 lb=equation_triu,
 ub=equation_triu,
 )
 # Each addend will never exceed its corresponding outer factor
 # positive addend < factor -> factor - addend > 0
 # negative addend > -factor -> factor + addend > 0
 constraint_addends_factor = LinearConstraint(
 A=sp.hstack((
 sp.kron(
 A=sp.eye(n_max_factors),
 B=np.ones((n_addends, 1)),
 ),
 sp.csc_array((n_max_factors * n_addends, n_max_factors*2*n_vars)),
 sp.kron(
 A=-sp.eye(n_max_factors),
 B=sp.diags(equation_signs.ravel()),
 )
 ), format='csc'),
 lb=0,
 )
 '''
 If either of a given selector in a pair is zero, the corresponding addend must be zero
 addend == factor * sela * selb * eqsign
 addend*asign <= M*asel*vsign -> M*asel*vsign - addend*asign >= 0
 addend*asign <= M*bsel*vsign -> M*bsel*vsign - addend*asign >= 0
 '''
 M = 10*n_max_factors*np.abs(equation_triu).max() # rough estimate; must exceed all factors
 A_add = sp.vstack((
 sp.diags(-equation_signs.ravel()),
 sp.lil_array((n_addends, n_addends)),
 ), format='lil')
 i, j = np.indices((n_vars, n_vars))
 A_add[
 n_vars*n_vars + n_vars*i + j,
 n_vars*j + i,
 ] = -equation_signs
 constraint_addends_zero = LinearConstraint(
 A=sp.hstack((
 sp.csc_array((n_max_factors*2*n_addends, n_max_factors)),
 sp.kron(
 A=sp.kron(
 A=sp.eye(n_max_factors * 2),
 B=sp.diags(M * equation_sign_vector),
 ),
 B=np.ones((n_vars, 1)),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=A_add,
 ),
 ), format='csc'),
 lb=0,
 )
 '''
 If both selectors in a pair are non-zero, the corresponding addend must equal the factor
 addend*asign >= factor - M(1 - sl*sgnl) - M(1 - sr*sgnr)
 2M >= factor + Msl*sgnl + Msr*sgnr - asign*addend
 '''
 constraint_addends_nonzero = LinearConstraint(
 A=sp.hstack((
 sp.kron(
 A=sp.eye(n_max_factors),
 B=np.ones((n_addends, 1)),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=sp.hstack((
 sp.kron(
 A=sp.diags(M * equation_sign_vector),
 B=np.ones((n_vars, 1)),
 ),
 sp.kron(
 A=np.ones((n_vars, 1)),
 B=sp.diags(M * equation_sign_vector),
 )
 )),
 ),
 sp.kron(
 A=sp.eye(n_max_factors),
 B=sp.diags(-equation_signs.ravel()),
 ),
 ), format='csc'),
 ub=2*M,
 )
 constraints = (
 constraint_factorcount,
 constraint_factorparity,
 constraint_equation,
 constraint_addends_factor,
 constraint_addends_zero,
 constraint_addends_nonzero,
 )
 n_rows = sum(
 con.A.shape[0] for con in constraints
 )
 return constraints, n_rows
def solve(
 bounds: Bounds,
 constraints: tuple[LinearConstraint, ...],
 n_vars: int,
 n_addends: int,
) -> np.ndarray:
 result = milp(
 # This is constraint programming; there is no minimization objective
 c=np.zeros(n_max_factors + n_max_factors*2*n_vars + n_max_factors*n_addends),
 integrality=np.concatenate((
 np.ones(n_max_factors), # Coefficients are integral
 np.ones(n_max_factors * 2 * n_vars), # selectors are integral
 np.zeros(n_max_factors * n_addends), # addends are continuous
 )),
 bounds=bounds,
 constraints=constraints,
 )
 if not result.success:
 raise ValueError(result.message)
 print(result.message)
 return result.x
def triu_fold(square: np.ndarray) -> np.ndarray:
 """
 Convert a matrix so that its upper triangular is the sum of the transpose and the lower triangular is zero
 """
 output = square + square.T
 trii, trij = np.tril_indices(n=square.shape[0], k=-1)
 output[trii, trij] = 0
 output[np.diag_indices_from(output)] = np.diag(square)
 return output
def process_results(
 result_x: np.ndarray,
 symbols: list[sympy.Symbol],
 n_vars: int,
) -> tuple[
 np.ndarray, # outer factor coefficients
 np.ndarray, # binary selectors
 np.ndarray, # term addends
 np.ndarray, # term addends, sum
 np.ndarray, # term addends, upper triangular
 sympy.Expr, # final output expression
]:
 coeffs, selectors, addends = np.split(
 result_x.round().astype(int),
 (n_max_factors, n_max_factors + n_max_factors*2*n_vars),
 )
 selectors = selectors.reshape((n_max_factors, 2, n_vars))
 addends = addends.reshape((n_max_factors, n_vars, n_vars))
 addends_total = addends.sum(axis=0)
 addends_triu = triu_fold(addends_total)
 expr = sum(
 coeff
 * factor_selectors[0, :].dot(symbols)
 * factor_selectors[1, :].dot(symbols)
 for coeff, factor_selectors in zip(coeffs, selectors)
 )
 return coeffs, selectors, addends, addends_total, addends_triu, expr
def print_results(
 eqn: str,
 coeffs: np.ndarray,
 selectors: np.ndarray,
 addends: np.ndarray,
 addends_triu: np.ndarray,
 equation_coefficients: np.ndarray,
 expr: sympy.Expr,
 verbose: bool = False,
) -> None:
 if verbose:
 print(f'Selectors {selectors.shape}:')
 print(selectors.reshape((-1, addends.shape[1])))
 print(f'Term addends, output {addends.shape}:')
 print(addends)
 print(f'Term addends, upper triangular {addends_triu.shape}:')
 print(addends_triu)
 print(f'Term addends, input {equation_coefficients.shape}:')
 print(equation_coefficients.round().astype(int))
 print('Outer factor coefficients:', coeffs)
 print('Input expression:', eqn)
 print('Final expression:', expr)
def test(
 equation_coefficients: np.ndarray,
 coeffs: np.ndarray,
 selectors: np.ndarray,
 addends_full: np.ndarray,
 addends_triu: np.ndarray,
) -> None:
 assert np.allclose(equation_coefficients, addends_triu)
 addends_reco = np.zeros_like(addends_triu)
 for i_factor, (coef, (lhs, rhs), addend_slice) in enumerate(zip(coeffs, selectors, addends_full)):
 product = coef*np.outer(lhs, rhs)
 assert np.allclose(addend_slice, product)
 addends_reco += product
 addends_reco = triu_fold(addends_reco)
 assert np.allclose(addends_reco, addends_triu)
def eval_constraints(
 constraints: tuple[LinearConstraint, ...],
 result_x: np.ndarray,
) -> None:
 for constraint in constraints:
 evald = constraint.A @ result_x
 print('Evaluated constraint vector:')
 if evald.size % 10:
 print(evald)
 else:
 print(evald.reshape((-1, 10)))
 print('Lower bound:', constraint.lb)
 print(evald >= constraint.lb)
 print('Upper bound:', constraint.ub)
 print(evald <= constraint.ub)
 print()
def graph_constraints(
 eqn: str,
 constraints: tuple[LinearConstraint, ...],
) -> plt.Figure:
 all_constraints = sp.vstack([
 con.A for con in constraints
 ]).T.toarray() != 0
 fig, ax = plt.subplots()
 fig: plt.Figure
 ax: plt.Axes
 fig.tight_layout()
 ax.imshow(all_constraints, cmap='binary')
 ax.set_title('Sparsity pattern (transpose) to solve\n' + eqn)
 return fig
def run(eqn: str) -> None:
 start = time.perf_counter()
 input_expr, symbols, n_vars, n_y, n_z, n_addends = parse(eqn)
 (
 equation_coefficients, equation_triu, equation_sign_vector, equation_signs, trii, trij,
 ) = iterate_equation(input_expr, symbols, n_vars, n_y, n_z)
 bounds = make_bounds(equation_signs, equation_sign_vector, n_addends)
 constraints, n_rows = make_constraints(
 equation_triu, equation_sign_vector, equation_signs, trii, trij, n_vars, n_addends,
 )
 setup = time.perf_counter()
 result = solve(bounds, constraints, n_vars, n_addends)
 solved = time.perf_counter()
 coeffs, selectors, addends_full, addends_total, addends_triu, expr = process_results(result, symbols, n_vars)
 print(f'Problem size: {n_rows}x{result.size}')
 print(f'Setup time: {(setup - start)*1e3:.1f} ms')
 print(f'Solve time: {(solved - setup)*1e3:.1f} ms')
 print_results(
 eqn, coeffs, selectors, addends_total, addends_triu, equation_coefficients, expr,
 )
 print()
 graph_constraints(eqn, constraints)
 # eval_constraints(constraints, result)
 test(equation_coefficients, coeffs, selectors, addends_full, addends_triu)
def main() -> None:
 # In-question first example
 run(
 '(y_1 - z_1)*(y_3 - z_2) '
 '+ (y_1 + y_2 - z_1 - z_2)*(y_1 - z_3) '
 '- (y_1 - z_2)*(y_1 - z_3)'
 )
 # Example hard-coded in OP script
 run(
 '(y_1-z_5)*(y_5-z_1) '
 '+ (y_3 + y_7 - z_1 - z_2)*(y_2 + y_3 - z_7 - z_8) '
 '- (y_3 - z_1)*(y_2 - z_7)'
 )
 # From chat
 run(
 '2*y_1**2 + 3*y_1*y_3 + 3*y_1*y_4 + 2*y_1*y_5 + 3*y_1*y_6 + 3*y_1*y_7 - 4*y_1*z_1 - 4*y_1*z_2 - 3*y_1*z_3 - 3*y_1*z_4 - 2*y_1*z_5 - 2*y_1*z_6 + y_3**2 + 2*y_3*y_4 + y_3*y_5 + 2*y_3*y_6 + 2*y_3*y_7 - 3*y_3*z_1 - 3*y_3*z_2 - 2*y_3*z_3 - 2*y_3*z_4 - y_3*z_5 - y_3*z_6 + y_4**2 + y_4*y_5 +'
 '2*y_4*y_6 + 2*y_4*y_7 - 3*y_4*z_1 - 3*y_4*z_2 - 2*y_4*z_3 - 2*y_4*z_4 - y_4*z_5 - y_4*z_6 + y_5*y_6 + y_5*y_7 - 2*y_5*z_1 - 2*y_5*z_2 - y_5*z_3 - y_5*z_4 + y_6**2 + 2*y_6*y_7 - 3*y_6*z_1 - 3*y_6*z_2 - 2*y_6*z_3 - 2*y_6*z_4 - y_6*z_5 - y_6*z_6 + y_7**2 - 3*y_7*z_1 - 3*y_7*z_2 - 2*y_7*z_3 - 2*y_7*z_4 - y_7*z_5'
 '- y_7*z_6 + 2*z_1**2 + 4*z_1*z_2 + 3*z_1*z_3 + 3*z_1*z_4 + 2*z_1*z_5 + 2*z_1*z_6 + 2*z_2**2 + 3*z_2*z_3 + 3*z_2*z_4 + 2*z_2*z_5 + 2*z_2*z_6 + z_3**2 + 2*z_3*z_4 + z_3*z_5 + z_3*z_6 + z_4**2 + z_4*z_5 + z_4*z_6'
 '+' # Missing from pasted expression, inferred from context
 '(y_1 - z_1)*(y_7 - z_2) + (y_1 - z_1)*(y_1 - z_3) + (y_1 - z_1)*(y_6 - z_4) + 2*(y_1 - z_1)*(y_4 - z_6) + (y_3 - z_1)*(y_1 - z_2) + (y_3 - z_1)*(y_3 - z_4) + (y_4 - z_1)*(y_7 - z_1) + (y_4 - z_1)*(y_1 - z_2) + (y_4 - z_1)*(y_6 - z_2) + (y_4 - z_1)*(y_3 - z_4) + (y_6 - z_1)*(y_3 - z_3)'
 '+ (y_6 - z_1)*(y_6 - z_3) + (y_6 - z_1)*(y_5 - z_5) + (y_7 - z_1)*(y_5 - z_1) + (y_7 - z_1)*(y_1 - z_5) + (y_1 - z_2)*(y_7 - z_4) + 2*(y_1 - z_2)*(y_5 - z_5) + (y_3 - z_2)*(y_6 - z_6) + (y_4 - z_2)*(y_4 - z_4) + (y_7 - z_2)*(y_3 - z_2) + (y_7 - z_2)*(y_6 - z_2) + (y_7 - z_2)*(y_6 - z_3) + (y_7 - z_2)*(y_3 - z_4) + 2*(y_1 - z_3)*(y_3 - z_2)'
 '+ (y_1 - z_3)*(y_1 - z_4) + (y_4 - z_3)*(y_5 - z_3) + (y_7 - z_3)*(y_7 - z_6) + (y_1 - z_4)*(y_6 - z_3) + (y_4 - z_4)*(y_6 - z_5) + (y_4 - z_4)*(y_7 - z_6) + (y_5 - z_4)*(y_3 - z_4) + (y_4 - z_5)*(y_3 - z_3) + (y_1 - z_6)*(y_6 - z_2)'
 )
 
 # Last example in question; has cubics so cannot be solved in pair-factor form
 '''
 run(
 '(y_3 - z_1)*((y_1 + y_3 + y_4 + y_6 - z_1 - z_3 - z_5 - z_6)*(((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_2) '
 '+ (y_3 - z_2)*(y_1 - z_2)*(y_4 + y_6 - z_1 - z_2))*((y_4 - z_3)*(y_6 - z_3) + (y_1 - z_4)*(y_3 - z_4) '
 '+ (y_4 + y_6 - z_1 - z_3)*(y_1 + y_3 - z_1 - z_4)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*(y_1 + y_3 + y_4 + y_6 - z_1 - z_2 - z_3 - z_4)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_3)*(y_6 - z_3) + (y_1 - z_4)*(y_3 - z_4) '
 '+ (y_1 + y_3 - z_2 - z_4)*(y_4 + y_6 - z_2 - z_3)) - (((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_3) '
 '+ (y_1 - z_3)*(y_3 - z_3)*(y_4 + y_6 - z_1 - z_3))*((y_6 - z_1)*(y_4 - z_1)*(y_1 + y_3 - z_1 - z_2) '
 '+ (y_3 - z_2)*(y_1 - z_2)*(y_4 + y_6 - z_1 - z_2)) '
 '- (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_2)*(y_6 - z_2) + (y_1 - z_3)*(y_3 - z_3) '
 '+ (y_4 + y_6 - z_1 - z_2)*(y_1 + y_3 - z_1 - z_3)))) '
 '- (-(y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*((y_4 - z_2)*(y_6 - z_2)*(y_1 + y_3 - z_2 - z_3) '
 '+ (y_1 - z_3)*(y_3 - z_3)*(y_4 + y_6 - z_2 - z_3)) '
 '+ (y_6 - z_1)*(y_3 - z_1)*(y_1 - z_1)*(y_4 - z_1)*(y_1 + y_3 + y_4 + y_6 - z_2 - z_3 - z_5 - z_6)*((y_4 - z_3)*(y_6 - z_3) '
 '+ (y_1 - z_4)*(y_3 - z_4) + (y_1 + y_3 - z_2 - z_4)*(y_4 + y_6 - z_2 - z_3)))'
 )
 '''
 plt.show()
if __name__ == '__main__':
 main()

Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 613x196
Setup time: 62.9 ms
Solve time: 33.4 ms
Input expression: (y_1 - z_1)*(y_3 - z_2) + (y_1 + y_2 - z_1 - z_2)*(y_1 - z_3) - (y_1 - z_2)*(y_1 - z_3)
Final expression: (y_1 - z_1)*(y_3 - z_2) + (y_1 - z_3)*(y_2 - z_1)
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 1671x484
Setup time: 29.5 ms
Solve time: 21.3 ms
Input expression: (y_1-z_5)*(y_5-z_1) + (y_3 + y_7 - z_1 - z_2)*(y_2 + y_3 - z_7 - z_8) - (y_3 - z_1)*(y_2 - z_7)
Final expression: (y_1 - z_5)*(y_5 - z_1) + (y_2 - z_7)*(y_7 - z_2) + (y_3 - z_1)*(y_3 - z_8) + (y_3 - z_8)*(y_7 - z_2)
Optimization terminated successfully. (HiGHS Status 7: Optimal)
Problem size: 2398x676
Setup time: 142.7 ms
Solve time: 939.3 ms
Input expression: 2*y_1**2 + 3*y_1*y_3 + 3*y_1*y_4 + 2*y_1*y_5 + 3*y_1*y_6 + 3*y_1*y_7 - 4*y_1*z_1 - 4*y_1*z_2 - 3*y_1*z_3 - 3*y_1*z_4 - 2*y_1*z_5 - 2*y_1*z_6 + y_3**2 + 2*y_3*y_4 + y_3*y_5 + 2*y_3*y_6 + 2*y_3*y_7 - 3*y_3*z_1 - 3*y_3*z_2 - 2*y_3*z_3 - 2*y_3*z_4 - y_3*z_5 - y_3*z_6 + y_4**2 + y_4*y_5 +2*y_4*y_6 + 2*y_4*y_7 - 3*y_4*z_1 - 3*y_4*z_2 - 2*y_4*z_3 - 2*y_4*z_4 - y_4*z_5 - y_4*z_6 + y_5*y_6 + y_5*y_7 - 2*y_5*z_1 - 2*y_5*z_2 - y_5*z_3 - y_5*z_4 + y_6**2 + 2*y_6*y_7 - 3*y_6*z_1 - 3*y_6*z_2 - 2*y_6*z_3 - 2*y_6*z_4 - y_6*z_5 - y_6*z_6 + y_7**2 - 3*y_7*z_1 - 3*y_7*z_2 - 2*y_7*z_3 - 2*y_7*z_4 - y_7*z_5- y_7*z_6 + 2*z_1**2 + 4*z_1*z_2 + 3*z_1*z_3 + 3*z_1*z_4 + 2*z_1*z_5 + 2*z_1*z_6 + 2*z_2**2 + 3*z_2*z_3 + 3*z_2*z_4 + 2*z_2*z_5 + 2*z_2*z_6 + z_3**2 + 2*z_3*z_4 + z_3*z_5 + z_3*z_6 + z_4**2 + z_4*z_5 + z_4*z_6+(y_1 - z_1)*(y_7 - z_2) + (y_1 - z_1)*(y_1 - z_3) + (y_1 - z_1)*(y_6 - z_4) + 2*(y_1 - z_1)*(y_4 - z_6) + (y_3 - z_1)*(y_1 - z_2) + (y_3 - z_1)*(y_3 - z_4) + (y_4 - z_1)*(y_7 - z_1) + (y_4 - z_1)*(y_1 - z_2) + (y_4 - z_1)*(y_6 - z_2) + (y_4 - z_1)*(y_3 - z_4) + (y_6 - z_1)*(y_3 - z_3)+ (y_6 - z_1)*(y_6 - z_3) + (y_6 - z_1)*(y_5 - z_5) + (y_7 - z_1)*(y_5 - z_1) + (y_7 - z_1)*(y_1 - z_5) + (y_1 - z_2)*(y_7 - z_4) + 2*(y_1 - z_2)*(y_5 - z_5) + (y_3 - z_2)*(y_6 - z_6) + (y_4 - z_2)*(y_4 - z_4) + (y_7 - z_2)*(y_3 - z_2) + (y_7 - z_2)*(y_6 - z_2) + (y_7 - z_2)*(y_6 - z_3) + (y_7 - z_2)*(y_3 - z_4) + 2*(y_1 - z_3)*(y_3 - z_2)+ (y_1 - z_3)*(y_1 - z_4) + (y_4 - z_3)*(y_5 - z_3) + (y_7 - z_3)*(y_7 - z_6) + (y_1 - z_4)*(y_6 - z_3) + (y_4 - z_4)*(y_6 - z_5) + (y_4 - z_4)*(y_7 - z_6) + (y_5 - z_4)*(y_3 - z_4) + (y_4 - z_5)*(y_3 - z_3) + (y_1 - z_6)*(y_6 - z_2)
Final expression: (2*y_1 + 2*y_7 - 2*z_1 - 2*z_2)*(y_1 + y_3 + y_4 + y_5 + y_6 + y_7 - z_1 - z_2 - z_3 - z_4 - z_5 - z_6) + (y_1 + y_3 + y_4 + y_6 - z_1 - z_2 - z_3 - z_4)*(2*y_1 + 2*y_3 + 2*y_4 + 2*y_5 + 2*y_6 + 2*y_7 - 2*z_1 - 2*z_2 - 2*z_3 - 2*z_4 - 2*z_5 - 2*z_6)

sparsity

Thanks for your answer. I haven't looked at it fully yet, but I'm hoping it generalizes to polynomials of higher degree. I'll take a look.
It will generalize to higher degrees (with code modifications), but it will scale very poorly. With an n-degree polynomial, every top-level term will need up to n factors, requiring n_vars**n selectors.
So the scaling is what is needed. The 5th degree polynomial example is more typical, and we're currently running degree 7 in our test runs.
I have an idea for an arbitrary-degree algorithm but it will take another while to implement. I can describe it in the chat.

Stack Exchange Network

Python integer optimization with PuLP involving polynomials that I think can be faster

1 Answer 1

Example code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python integer optimization with PuLP involving polynomials that I think can be faster

1 Answer 1

Example code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions