Hybrid quantum-classical network written in PyTorch does not get the correct gradients using automatic-differentiation #230
Hi all,
I'm trying to implement a hybrid quantum-classical network and train it on a GPU.
It is written in PyTorch and TensorCircuit, and I use automatic differentiation (AD) to compute the cost function.
I test it on a simple toy model, where I try to learn a sine function on [0, pi] by solving an ordinary differential equation (ODE): the equation loss is the residual du/dt - cos(t), and the initial-condition loss enforces u(0) = 0.
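In equation form (just restating the toy problem above), the residual, the initial condition, and the analytic solution I expect the network to learn are:

$$
\frac{du}{dt} - \cos(t) = 0, \qquad u(0) = 0 \;\Rightarrow\; u(t) = \sin(t), \quad t \in [0, \pi].
$$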
The QLayer:

```python
import torch
import torch.nn as nn
import tensorcircuit as tc

tc.set_backend("pytorch")

class QLayer(nn.Module):
    def __init__(self, n_qubits=5, n_layers=2):
        super(QLayer, self).__init__()
        self.n_qubits = n_qubits
        self.n_layers = n_layers
        self.weights = nn.Parameter(torch.rand(n_layers, n_qubits, 3) * 2 * torch.pi)
        self.K = tc.set_backend("tensorflow")
        qpreds_vmap = self.K.vmap(self.qcirc, vectorized_argnums=0)
        self.qpreds_batch = tc.interfaces.torch_interface(qpreds_vmap, jit=True)

    def qcirc(self, inputs, weights):
        # Angle embedding followed by entangling layers and a Pauli-Z measurement on each qubit
        c = tc.Circuit(self.n_qubits)
        for i in range(self.n_qubits):
            c.rx(i, theta=inputs[i])
        for j in range(self.n_layers):
            for i in range(self.n_qubits):
                c.r(i, theta=weights[j, i, 0], alpha=weights[j, i, 1], phi=weights[j, i, 2])
            for i in range(self.n_qubits):
                c.cnot(i, (i + 1) % self.n_qubits)
        return self.K.stack([self.K.real(c.expectation_ps(z=[i])) for i in range(self.n_qubits)])

    def forward(self, inputs):
        outputs = self.qpreds_batch(inputs, self.weights)
        return outputs
```
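As a sanity check on the wrapper gradients (my own sketch, not part of the model; the batch size and the particular weight index are arbitrary), I compare the gradient PyTorch reports for one weight entry against a central finite difference:

```python
import torch

qlayer = QLayer(n_qubits=5, n_layers=2)
x = torch.rand(4, 5)  # batch of 4 samples, one rotation angle per qubit

out = qlayer(x).sum()
out.backward()
analytic = qlayer.weights.grad[0, 0, 0].item()

# Central finite difference on the same weight entry
eps = 1e-3
with torch.no_grad():
    qlayer.weights[0, 0, 0] += eps
    plus = qlayer(x).sum().item()
    qlayer.weights[0, 0, 0] -= 2 * eps
    minus = qlayer(x).sum().item()
    qlayer.weights[0, 0, 0] += eps  # restore the original value
numeric = (plus - minus) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely if AD through the wrapper is correct
```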
The network:

```python
from torch.autograd import grad

class Network(nn.Module):
    def __init__(self, HL_dim=64, in_dim=1, out_dim=1):
        super(Network, self).__init__()
        # n_qubits and n_layers are defined globally in my script
        self.model = nn.Sequential(
            nn.Linear(in_dim, n_qubits), nn.Tanh(),
            QLayer(n_qubits=n_qubits, n_layers=n_layers),  # Quantum layer
            nn.Linear(n_qubits, out_dim))

    def forward(self, t):
        return self.model(t)

    def compute_loss(self, t):
        u = self.model(t)
        u_t = grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
        loss_fun = nn.MSELoss()  # Default reduction is 'mean'
        residual_1 = u_t - torch.cos(t)
        eqn_loss_1 = loss_fun(residual_1, torch.zeros_like(residual_1))
        device = u.device
        ## IC loss ##
        IC_loss = loss_fun(u[0], torch.tensor(0.0, device=device))
        return eqn_loss_1, IC_loss
```
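Before training, I also check (my own sketch, assuming n_qubits and n_layers are the same globals used above) that the gradient of the quantum layer with respect to its *inputs* flows back through the torch_interface wrapper, since compute_loss depends on exactly this path via autograd.grad(u, t, ...):

```python
import torch

qlayer = QLayer(n_qubits=n_qubits, n_layers=n_layers)
x = torch.rand(3, n_qubits, requires_grad=True)  # stand-in inputs to the quantum layer
y = qlayer(x).sum()
g_x = torch.autograd.grad(y, x, create_graph=True)[0]
print(g_x)  # should be non-zero; if it is zero (or an error is raised), the PDE residual cannot train
```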
The setting and optimization loop:

```python
from torch.optim.lr_scheduler import ExponentialLR

model = Network().to(device)  # device and epochs are defined earlier in my script
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = ExponentialLR(optimizer, gamma=0.9999)

Nt = 32
# domain dimensions
t_i, t_f = 0.0, torch.pi
t = torch.linspace(t_i, t_f, Nt).requires_grad_(True).to(device)

for epoch in range(epochs + 1):
    # Compute the individual losses
    eqn_loss_1, IC_loss = model.compute_loss(t.view(-1, 1))
    # Compute the total loss
    total_loss = eqn_loss_1 + IC_loss  # Strongly respect boundary, initial conditions and symmetry
    # Backward pass
    total_loss.backward()
    optimizer.step()
    optimizer.zero_grad()  # Reset grads
    scheduler.step()
```
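After training, I compare the prediction with the analytic solution u(t) = sin(t) (the plotting code is illustrative and assumes matplotlib is available):

```python
import matplotlib.pyplot as plt

with torch.no_grad():
    t_eval = torch.linspace(0.0, torch.pi, 200, device=device).view(-1, 1)
    u_pred = model(t_eval).cpu().squeeze()

t_np = t_eval.cpu().squeeze()
plt.plot(t_np, u_pred, label="predicted u(t)")
plt.plot(t_np, torch.sin(t_np), "--", label="sin(t)")
plt.xlabel("t")
plt.legend()
plt.show()
```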
The model does not converge to the correct answer, even though the loss keeps getting smaller.
I think this is due to an incorrect calculation of the gradients. How can I solve it? I need the GPU capabilities for scaling to larger networks and circuits.
The expected output (the one I get if I replace the QLayer with nn.Linear(n_qubits, n_qubits)):
[image: expected output]
The output I actually receive:
[image: received output]
(In both cases everything stays the same except for that one layer.)
Thanks in advance!