Module manual

Module contents

Kaiwu PyTorch plugin public API.

class kaiwu.torch_plugin.BoltzmannMachine(num_nodes: int, quadratic_coef: FloatTensor | None = None, linear_bias: FloatTensor | None = None, device=None)

Bases: AbstractBoltzmannMachine

Boltzmann Machine.

Args:

num_nodes (int): Total number of nodes in the model.

quadratic_coef (torch.FloatTensor, optional): Quadratic coefficient, shape [num_nodes, num_nodes].

linear_bias (torch.FloatTensor, optional): Linear bias, shape [num_nodes].

device (torch.device, optional): Device for tensor construction. If None, uses CPU.

clip_parameters(h_range, j_range) -> None

Clip the linear biases and quadratic weights in-place.

Args:

h_range (tuple[float, float]): Range for the linear weights, for example [-1, 1].

j_range (tuple[float, float]): Range for the quadratic weights, for example [-1, 1].

condition_sample(sampler, s_visible, dtype=torch.float32) -> Tensor

Sample from the Boltzmann Machine conditioned on the given nodes.

Args:

sampler (kaiwu.core.Optimizer): Optimizer used for sampling from the model.

s_visible: State of the visible layer.

Returns:

torch.Tensor: Spins sampled from the model (shape determined by sampler and sample_params).

forward(s_all: Tensor) -> Tensor

Compute the Hamiltonian.

Args:

s_all (torch.Tensor): Tensor of shape (B, N), where B is the batch size and N is the number of variables in the model.

Returns:

torch.Tensor: Hamiltonian of shape (B,).
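To make the (B, N) to (B,) shape contract concrete, the following is a minimal sketch in plain Python rather than PyTorch. It assumes the common quadratic form H(s) = sᵀJs + hᵀs; the sign convention and exact form used by the library are not stated above, and the function name `hamiltonian` is illustrative only.

```python
def hamiltonian(s_all, quadratic_coef, linear_bias):
    """Return one energy per row of s_all.

    Assumed form (not confirmed by the reference above):
    H(s) = s^T J s + h^T s.
    """
    energies = []
    for s in s_all:  # s is one configuration of N variables
        n = len(s)
        quad = sum(s[i] * quadratic_coef[i][j] * s[j]
                   for i in range(n) for j in range(n))
        lin = sum(linear_bias[i] * s[i] for i in range(n))
        energies.append(quad + lin)
    return energies  # length B, mirroring the documented shape (B,)


J = [[0.0, 1.0], [1.0, 0.0]]   # toy quadratic coefficients, shape [N, N]
h = [0.5, -0.5]                # toy linear bias, shape [N]
batch = [[1, 1], [1, -1]]      # B = 2 configurations of N = 2 spins
print(hamiltonian(batch, J, h))  # [2.0, -1.0]
```

Each input row yields exactly one scalar, which is why a batch of B configurations produces B energies.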

gibbs_sample(num_steps: int = 100, s_visible: Tensor | None = None, num_sample=None) -> Tensor

Sample from the Boltzmann Machine.

Args:

num_steps (int): Number of Gibbs sampling steps.

s_visible (torch.Tensor, optional): State of the visible layer, shape (B, num_visible). If None, the visible layer is initialized randomly.

num_sample (int, optional): Number of samples. If None, uses the batch size of s_visible.
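For readers unfamiliar with Gibbs sampling, here is a small stdlib-only sketch of the general technique. It is not the library's implementation: it assumes spins in {-1, +1}, a distribution p(s) ∝ exp(-E(s)) with E(s) = sᵀJs + hᵀs, and it omits the clamped visible layer (`s_visible`). The helper names are hypothetical.

```python
import math
import random


def energy(s, J, h):
    """Assumed energy E(s) = s^T J s + h^T s for one configuration."""
    n = len(s)
    return (sum(s[i] * J[i][j] * s[j] for i in range(n) for j in range(n))
            + sum(h[i] * s[i] for i in range(n)))


def gibbs_sample_sketch(J, h, num_steps=100, num_sample=4, rng=None):
    rng = rng or random.Random(0)
    n = len(h)
    # Random initial configurations, one per requested sample.
    samples = [[rng.choice((-1, 1)) for _ in range(n)]
               for _ in range(num_sample)]
    for _ in range(num_steps):
        for s in samples:
            for i in range(n):
                # Energy with spin i up and down, all other spins fixed.
                s[i] = 1
                e_up = energy(s, J, h)
                s[i] = -1
                e_down = energy(s, J, h)
                # Conditional P(s_i = +1 | rest) under p(s) ∝ exp(-E(s)).
                p_up = 1.0 / (1.0 + math.exp(e_up - e_down))
                s[i] = 1 if rng.random() < p_up else -1
    return samples


toy_J = [[0.0, 1.0], [1.0, 0.0]]
toy_h = [0.5, -0.5]
toy_samples = gibbs_sample_sketch(toy_J, toy_h, num_steps=5, num_sample=3)
print(len(toy_samples), len(toy_samples[0]))  # 3 2
```

Recomputing the full energy for every flip keeps the sketch transparent; a practical implementation would compute only the local field of spin i.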

hidden_bias(num_hidden: int) -> Tensor

Get the hidden bias.

Args:

num_hidden (int): Number of hidden nodes.

symmetrized_quadratic_coef()

Return the symmetrized quadratic coefficient.

visible_bias(num_visible) -> Tensor

Get the visible bias.

Args:

num_visible (int): Number of visible nodes.

class kaiwu.torch_plugin.QDiffusion(proposal_model: Module, energy_model: Module, token_spec: SequenceTokenSpec, energy_adapter: EnergyBackboneAdapter, config: QDiffusionConfig | None = None, dtype: dtype = torch.float32, device: device | str | None = None, freeze_proposal: bool = True, energy_head: Module | None = None)

Bases: Module

Energy-guided discrete diffusion wrapper over generic sequence backbones.

The class combines two backbone roles:

  • a proposal model that predicts token logits for the current noisy state

  • an energy model that reranks candidate reconstructions

It exposes both training-oriented APIs such as objective() and decoding-oriented APIs such as initialize_state(), step(), and generate().

energy(noisy_tokens: Tensor, candidate_tokens: Tensor, attention_mask: Tensor | None = None) -> Tensor

Scores candidate reconstructions conditioned on the noisy state.

Args:

noisy_tokens: Noisy token tensor used as conditioning input.

candidate_tokens: Candidate clean token tensor to score.

attention_mask: Optional attention mask for the energy model.

Returns:

torch.Tensor: A tensor of scalar energies with shape [batch, 1].

forward(noisy_tokens: Tensor, **kwargs: Any) -> Tensor

Runs the proposal model on the current noisy state.

Args:

noisy_tokens: Current noisy token tensor.

**kwargs: Additional keyword arguments forwarded to the proposal model.

Returns:

torch.Tensor: Proposal logits over the token vocabulary.

Raises:

TypeError: If the proposal model does not implement forward.

generate(input_tokens: Tensor, *, max_steps: int = 500, partial_masks: Tensor | None = None, temperature: float = 1.0, return_state: bool = False) -> Tensor | dict[str, Any]

Runs a complete iterative decoding loop inside the core class.

Args:

input_tokens: Initial token tensor.

max_steps: Number of decode iterations to run.

partial_masks: Optional boolean mask of fixed positions.

temperature: Sampling temperature stored in the decode state.

return_state: Whether to return the full final state dictionary.

Returns:

torch.Tensor | dict[str, Any]: Either the final token tensor or the full decode state.

get_non_special_symbol_mask(output_tokens: Tensor, partial_masks: Tensor | None = None) -> Tensor

Returns a boolean mask of editable non-special-token positions.

Args:

output_tokens: Token tensor to inspect.

partial_masks: Optional boolean mask of positions that should remain fixed.

Returns:

torch.Tensor: A boolean mask where True marks editable non-special positions.
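The masking rule described here can be sketched with nested lists in plain Python. This is an illustration of the idea only: the set of special token ids (BOS, EOS, PAD below) is a hypothetical stand-in for whatever the token spec defines, and `non_special_mask` is not a library function.

```python
def non_special_mask(output_tokens, special_ids, partial_masks=None):
    """True where a position holds a non-special token and is not pinned."""
    mask = []
    for b, row in enumerate(output_tokens):
        mask_row = []
        for t, tok in enumerate(row):
            editable = tok not in special_ids
            if partial_masks is not None and partial_masks[b][t]:
                editable = False  # position is held fixed by the caller
            mask_row.append(editable)
        mask.append(mask_row)
    return mask


# Hypothetical ids for this example: 0 = BOS, 2 = EOS, 1 = PAD.
tokens = [[0, 5, 7, 2, 1]]
fixed = [[False, True, False, False, False]]
print(non_special_mask(tokens, {0, 1, 2}, fixed))
# [[False, False, True, False, False]]
```

Only the token `7` is editable: `5` is pinned by the partial mask and the remaining positions hold special tokens.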

initialize_state(input_tokens: Tensor, partial_masks: Tensor | None = None, max_steps: int = 500, temperature: float = 1.0) -> dict[str, Any]

Creates the initial decoding state for an external generation loop.

Args:

input_tokens: Initial token tensor.

partial_masks: Optional boolean mask of fixed positions.

max_steps: Planned number of decode iterations.

temperature: Sampling temperature stored in the state payload.

Returns:

dict[str, Any]: A mutable state dictionary suitable for repeated step() calls.

objective(batch: dict[str, Tensor], weighting: str = 'constant') -> dict[str, Tensor]

Builds the one-step training objective used by an external loop.

Args:

batch: Batch dictionary containing at least batch["targets"].

weighting: Per-sample timestep weighting mode.

Returns:

dict[str, torch.Tensor]: A dictionary containing proposal logits, supervision masks, loss weights, and the EBM objective term.

proposal(noisy_tokens: Tensor, **kwargs: Any) -> Tensor

Semantic alias around forward() for proposal-side calls.

Args:

noisy_tokens: Current noisy token tensor.

**kwargs: Additional keyword arguments forwarded to the proposal model.

Returns:

torch.Tensor: Proposal logits over the token vocabulary.

step(state: dict[str, Any], partial_masks: Tensor | None = None) -> dict[str, Any]

Runs one denoising/reranking step and returns updated state.

Args:

state: Current decode state created by initialize_state().

partial_masks: Optional boolean mask of fixed positions.

Returns:

dict[str, Any]: The updated decode state after one iteration.
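The pairing of initialize_state() and step() suggests an external decode loop along the following lines. This is a sketch of the calling pattern only: it relies on the two documented signatures, while `_ToyModel` and the state keys `"tokens"` and `"step"` are hypothetical stand-ins so the example runs without the real QDiffusion class.

```python
def run_decode_loop(model, input_tokens, max_steps=500, temperature=1.0,
                    partial_masks=None):
    """Drive a decode loop using only initialize_state() and step()."""
    state = model.initialize_state(input_tokens, partial_masks=partial_masks,
                                   max_steps=max_steps,
                                   temperature=temperature)
    for _ in range(max_steps):
        state = model.step(state, partial_masks=partial_masks)
    return state


class _ToyModel:
    """Stand-in with the same two method names; NOT the real QDiffusion."""

    def initialize_state(self, input_tokens, partial_masks=None,
                         max_steps=500, temperature=1.0):
        # "tokens" and "step" are hypothetical keys chosen for this toy.
        return {"tokens": list(input_tokens), "step": 0}

    def step(self, state, partial_masks=None):
        return {"tokens": state["tokens"], "step": state["step"] + 1}


final_state = run_decode_loop(_ToyModel(), [3, 1, 4], max_steps=5)
print(final_state["step"])  # 5
```

Keeping the loop outside the model makes it possible to log, early-stop, or inspect the state between iterations, which generate() does not expose.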

to(*args: Any, **kwargs: Any) -> QDiffusion

Moves the module and refreshes cached device/dtype metadata.

Args:

*args: Positional arguments forwarded to nn.Module.to.

**kwargs: Keyword arguments forwarded to nn.Module.to.

Returns:

QDiffusion: The moved module instance.

class kaiwu.torch_plugin.QDiffusionConfig(num_diffusion_timesteps: int = 500, use_coupled_sampling: bool = False, num_candidates: int = 1, proposal_temperature: float = 0.0, proposal_noise_scale: float = 1.0, energy_temperature: float = 1.0, disable_resample: bool = False, resample_ratio: float = 0.25, resample_top_p: float = 0.95, decoding_strategy: str = 'reparam-uncond-deterministic-linear')

Bases: object

Configuration for energy-guided discrete generation.

Attributes:

num_diffusion_timesteps: Number of discrete noising steps used by the training objective.

use_coupled_sampling: Whether to use the coupled corruption variant.

num_candidates: Number of proposal candidates sampled at each decode step.

proposal_temperature: Temperature used for proposal-side sampling.

proposal_noise_scale: Gumbel noise scale used during proposal sampling.

energy_temperature: Temperature used when converting energies into reranking weights.

disable_resample: Whether to disable repetition-collapse resampling.

resample_ratio: Frequency threshold that triggers resampling.

resample_top_p: Top-p cutoff used during resampling.

decoding_strategy: Skeptical-remasking strategy string.

decoding_strategy: str = 'reparam-uncond-deterministic-linear'
disable_resample: bool = False
energy_temperature: float = 1.0
num_candidates: int = 1
num_diffusion_timesteps: int = 500
proposal_noise_scale: float = 1.0
proposal_temperature: float = 0.0
resample_ratio: float = 0.25
resample_top_p: float = 0.95
use_coupled_sampling: bool = False
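
To illustrate what "converting energies into reranking weights" with a temperature can look like, here is a sketch using a softmax over negative energies. Whether QDiffusion uses exactly this formula is an assumption; the function `rerank_weights` is hypothetical and only shows how energy_temperature would sharpen or flatten the weights over num_candidates candidates.

```python
import math


def rerank_weights(energies, energy_temperature=1.0):
    """Softmax over -E / T: lower energy gets a larger weight (assumed rule)."""
    scaled = [-e / energy_temperature for e in energies]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [x / total for x in exps]


candidate_energies = [1.0, 2.0, 3.0]  # one energy per candidate
weights = rerank_weights(candidate_energies, energy_temperature=1.0)
sharp = rerank_weights(candidate_energies, energy_temperature=0.1)
print(weights[0] > weights[1] > weights[2])  # True
print(sharp[0] > weights[0])                 # True
```

A lower temperature concentrates the weight on the lowest-energy candidate, while a higher temperature moves the weights toward uniform.
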
class kaiwu.torch_plugin.QVAE(encoder, decoder, bm: AbstractBoltzmannMachine, sampler, dist_beta, mean_x: float, num_vis: int)

Bases: Module

Quantum Variational Autoencoder (QVAE) Model

Args:

encoder: Encoder module

decoder: Decoder module

bm (AbstractBoltzmannMachine): Boltzmann machine

sampler: Sampler

dist_beta: Beta parameter for the distribution

mean_x (torch.Tensor): Bias of training data

num_vis (int): Number of visible variables in the Boltzmann machine

forward(x)

Forward propagation

Args:

x (torch.Tensor): Input data

Returns:

tuple: (recon_x, posterior, q, zeta)

recon_x: Reconstructed data

posterior: Posterior distribution object

q: Encoder output

zeta: Posterior sample

neg_elbo(x, kl_beta)

Compute negative ELBO loss

Args:

x (torch.Tensor): Input data

kl_beta (float): Weight coefficient for KL term

Returns:

tuple: (output, recon_x, neg_elbo, wd_loss, total_kl, cost, q, zeta)

output: Reconstructed output (sigmoid activated)

recon_x: Reconstructed data

neg_elbo: Negative ELBO loss

wd_loss: Weight decay loss

total_kl: KL divergence

cost: Reconstruction loss

q: Encoder output

zeta: Posterior sample
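As a rough guide to how the returned terms might relate, the sketch below combines a Bernoulli reconstruction cost with a kl_beta-weighted KL term. This is an assumption for illustration: the reference above does not state the exact formula, how wd_loss enters the total, or how total_kl is computed against the Boltzmann machine prior, so total_kl is simply passed in here.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_recon_cost(x, recon_logits):
    """Binary cross-entropy between data x and sigmoid(recon_logits)."""
    cost = 0.0
    for xi, zi in zip(x, recon_logits):
        p = _sigmoid(zi)
        cost -= xi * math.log(p) + (1 - xi) * math.log(1 - p)
    return cost


def neg_elbo_sketch(x, recon_logits, total_kl, kl_beta):
    """Assumed combination: reconstruction cost + kl_beta * KL."""
    return bernoulli_recon_cost(x, recon_logits) + kl_beta * total_kl


# Logits of 0 give p = 0.5 for every pixel, so the cost is 2 * log(2).
value = neg_elbo_sketch([1, 0], [0.0, 0.0], total_kl=2.0, kl_beta=0.5)
print(abs(value - (2 * math.log(2) + 1.0)) < 1e-9)  # True
```

Setting kl_beta below 1 down-weights the KL term relative to reconstruction, which is the usual reason for exposing such a coefficient.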

posterior(q_logits, beta)

Compute posterior distribution and its reparameterized sample

Args:

q_logits (torch.Tensor): Encoder output, log-odds

beta: Beta parameter for the distribution

Returns:

tuple: (Posterior distribution object, Sampled result zeta)

class kaiwu.torch_plugin.RestrictedBoltzmannMachine(num_visible: int, num_hidden: int, quadratic_coef: FloatTensor | None = None, linear_bias: FloatTensor | None = None, device=None)

Bases: AbstractBoltzmannMachine

Create a Restricted Boltzmann Machine.

Args:

num_visible (int): Number of visible nodes in the model.

num_hidden (int): Number of hidden nodes in the model.

quadratic_coef (torch.FloatTensor, optional): Quadratic coefficient, shape [num_visible, num_hidden].

linear_bias (torch.FloatTensor, optional): Linear bias, shape [num_hidden].

device (torch.device, optional): Device on which tensors are constructed.

clip_parameters(h_range, j_range) -> None

Clip the linear biases and quadratic weights in-place.

Args:

h_range (tuple[float, float]): Range for the linear weights, for example [-1, 1].

j_range (tuple[float, float]): Range for the quadratic weights, for example [-1, 1].

forward(s_all: Tensor) -> Tensor

Compute the Hamiltonian.

Args:

s_all (torch.Tensor): Tensor of shape (B, N), where B is the batch size and N is the number of variables in the model.

Returns:

torch.Tensor: Hamiltonian of shape (B,).

get_hidden(s_visible: Tensor, requires_grad: bool = False, bernoulli: bool = False) -> Tensor

Propagate visible spins to the hidden layer.

Args:

s_visible: Visible layer tensor.

requires_grad: Whether to allow gradient backpropagation.

get_visible(s_hidden: Tensor, bernoulli: bool = False) -> Tensor

Propagate hidden spins to the visible layer.
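For orientation, the textbook RBM visible-to-hidden propagation can be sketched in plain Python as below. It assumes the standard form p(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i W_ij) with a weight matrix of shape [num_visible, num_hidden]; the library's sign convention and the exact meaning of its `bernoulli` flag are not documented above, so the sampling behavior shown is an assumption.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def get_hidden_sketch(s_visible, weights, hidden_bias, bernoulli=False,
                      rng=None):
    """Return hidden activation probabilities, or 0/1 samples if bernoulli."""
    rng = rng or random.Random(0)
    num_hidden = len(hidden_bias)
    out = []
    for v in s_visible:  # one visible configuration per batch row
        probs = [
            _sigmoid(hidden_bias[j]
                     + sum(v[i] * weights[i][j] for i in range(len(v))))
            for j in range(num_hidden)
        ]
        if bernoulli:
            # Assumed meaning of the flag: draw a binary sample per unit.
            probs = [1.0 if rng.random() < p else 0.0 for p in probs]
        out.append(probs)
    return out


W = [[0.0, 0.0], [0.0, 0.0]]  # toy weights, 2 visible x 2 hidden
b_h = [0.0, 0.0]
print(get_hidden_sketch([[1.0, 0.0]], W, b_h))  # [[0.5, 0.5]]
```

The hidden-to-visible direction is symmetric: it uses the transposed weights and the visible bias in place of the hidden bias.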

property hidden_bias: Tensor

Return the hidden bias.

property visible_bias: Tensor

Return the visible bias.

class kaiwu.torch_plugin.UnsupervisedDBN(hidden_layers_structure=None)

Bases: Module

A general unsupervised Deep Belief Network (DBN) architecture.

This model is a stack of Restricted Boltzmann Machines (RBMs).

Args:

hidden_layers_structure (list, optional): A list of integers representing the number of hidden units in each layer. Defaults to [100, 100].

create_rbm_layer(input_dim)

Creates the layers of RBMs for the DBN.

Args:

input_dim (int): The dimension of the input data (number of visible units).

Returns:

UnsupervisedDBN: The instance itself with the RBM layers created.
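Since a DBN is a stack of RBMs, each layer's visible size is the previous layer's hidden size. The sketch below works out the (num_visible, num_hidden) pairs implied by an input dimension and a hidden-layer list. The helper `rbm_layer_shapes` is hypothetical and only mirrors the documented default of [100, 100]; it does not construct any RBM objects.

```python
def rbm_layer_shapes(input_dim, hidden_layers_structure=None):
    """Return (num_visible, num_hidden) for each RBM in the stack."""
    if hidden_layers_structure is None:
        hidden_layers_structure = [100, 100]  # documented default
    shapes = []
    num_visible = input_dim
    for num_hidden in hidden_layers_structure:
        shapes.append((num_visible, num_hidden))
        num_visible = num_hidden  # the next layer sees this layer's output
    return shapes


print(rbm_layer_shapes(784))            # [(784, 100), (100, 100)]
print(rbm_layer_shapes(20, [16, 8, 4]))  # [(20, 16), (16, 8), (8, 4)]
```

The hidden size of the last pair corresponds to what the output_dim property describes as the dimension of the final hidden layer.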

forward(data_in)

Performs a forward pass to transform the input data.

Args:

data_in (numpy.ndarray): The input data.

Returns:

numpy.ndarray: The transformed data after passing through all RBM layers.

Raises:

ValueError: If the model has not been built or trained yet.

get_rbm_layer(index)

Gets the RBM layer at the specified index.

Args:

index (int): The index of the RBM layer.

Returns:

RestrictedBoltzmannMachine or None: The RBM layer if found, otherwise None.

mark_as_trained()

Marks the model as trained.

Returns:

UnsupervisedDBN: The instance itself.

property num_layers

Returns the number of RBM layers.

Returns:

int: The number of layers.

property output_dim

Returns the output dimension of the DBN.

Returns:

int: The dimension of the final hidden layer.

reconstruct(data_in, layer_index=0)

Reconstructs the input from a specified RBM layer.

Args:

data_in (numpy.ndarray): The input data to be reconstructed.

layer_index (int, optional): The index of the RBM layer to use for reconstruction. Defaults to 0.

Returns:

numpy.ndarray: The reconstructed data.

Raises:

ValueError: If the model has no RBM layers or the layer index is out of range.

static reconstruct_with_rbm(rbm, data_in, device=None)

Reconstructs data using a single RBM.

Args:

rbm (RestrictedBoltzmannMachine): The trained RBM model.

data_in (numpy.ndarray): The input data.

device (torch.device, optional): The device to perform computation on. If None, uses the RBM's device. Defaults to None.

Returns:

tuple[numpy.ndarray, numpy.ndarray]: A tuple containing:

  • The reconstructed visible layer data.

  • The reconstruction error for each sample.
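
The two returned values can be pictured with a one-pass visible to hidden to visible reconstruction, sketched here in plain Python. Several details are assumptions: the sigmoid activations, the use of mean squared error as the per-sample error, and the argument layout of `reconstruct_sketch`, which takes raw weights and biases instead of an RBM object.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reconstruct_sketch(data_in, weights, visible_bias, hidden_bias):
    """Return (reconstructions, per-sample mean squared errors)."""
    num_visible, num_hidden = len(visible_bias), len(hidden_bias)
    recon, errors = [], []
    for v in data_in:
        # Upward pass: visible -> hidden activation probabilities.
        hidden = [
            _sigmoid(hidden_bias[j]
                     + sum(v[i] * weights[i][j] for i in range(num_visible)))
            for j in range(num_hidden)
        ]
        # Downward pass: hidden -> reconstructed visible probabilities.
        v_rec = [
            _sigmoid(visible_bias[i]
                     + sum(hidden[j] * weights[i][j]
                           for j in range(num_hidden)))
            for i in range(num_visible)
        ]
        recon.append(v_rec)
        errors.append(sum((a - b) ** 2 for a, b in zip(v, v_rec))
                      / num_visible)
    return recon, errors


zero_w = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # 2 visible x 3 hidden
recon, errors = reconstruct_sketch([[1.0, 0.0]], zero_w,
                                   [0.0, 0.0], [0.0, 0.0, 0.0])
print(recon, errors)  # [[0.5, 0.5]] [0.25]
```

With all-zero parameters every unit outputs 0.5, so the error is 0.25 regardless of the binary input, which makes a convenient sanity check.
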

transform(data_in)

An sklearn-compatible transform method.

Args:

data_in (numpy.ndarray): The input data.

Returns:

numpy.ndarray: The transformed data.