Module Reference
Module contents
Kaiwu PyTorch plugin public API.
- class kaiwu.torch_plugin.BoltzmannMachine(num_nodes: int, quadratic_coef: FloatTensor | None = None, linear_bias: FloatTensor | None = None, device=None)
Bases: AbstractBoltzmannMachine
Boltzmann Machine.
- Args:
num_nodes (int): Total number of nodes in the model.
quadratic_coef (torch.FloatTensor, optional): Quadratic coefficients, shape [num_nodes, num_nodes].
linear_bias (torch.FloatTensor, optional): Linear bias, shape [num_nodes].
device (torch.device, optional): Device for tensor construction. If None, uses CPU.
- clip_parameters(h_range, j_range) -> None
Clip linear and quadratic bias weights in-place.
- Args:
h_range (tuple[float, float]): Range for linear weights, for example [-1, 1].
j_range (tuple[float, float]): Range for quadratic weights, for example [-1, 1].
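As a rough illustration of what in-place clipping does, the sketch below clamps a bias vector and a coupling matrix to separate ranges. It uses plain Python lists so it runs without PyTorch; the helper name `clip_in_place` and the sample values are assumptions for illustration, not part of the kaiwu API.

```python
def clip_in_place(values, value_range):
    """Clamp every entry of a (possibly nested) list to [low, high], in place."""
    low, high = value_range
    for i, v in enumerate(values):
        if isinstance(v, list):
            clip_in_place(v, value_range)  # recurse into matrix rows
        else:
            values[i] = min(max(v, low), high)


# Hypothetical parameters: 3 linear terms and a 3x3 matrix of quadratic terms.
linear_bias = [-2.5, 0.3, 1.7]
quadratic_coef = [[0.0, 4.0, -0.2],
                  [4.0, 0.0, -3.0],
                  [-0.2, -3.0, 0.0]]

clip_in_place(linear_bias, (-1.0, 1.0))     # range for the linear terms
clip_in_place(quadratic_coef, (-2.0, 2.0))  # range for the quadratic terms
```

Entries already inside the range are left unchanged; only out-of-range values are moved to the nearest bound.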
- condition_sample(sampler, s_visible, dtype=torch.float32) -> Tensor
Sample from the Boltzmann Machine conditioned on the given visible-node states.
- Args:
sampler (kaiwu.core.Optimizer): Optimizer used for sampling from the model.
s_visible: State of the visible layer.
- Returns:
torch.Tensor: Spins sampled from the model (shape determined by sampler and sample_params).
- forward(s_all: Tensor) -> Tensor
Compute the Hamiltonian.
- Args:
s_all (torch.Tensor): Tensor of shape (B, N), where B is the batch size and N is the number of variables in the model.
- Returns:
torch.Tensor: Hamiltonian of shape (B,).
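The energy convention is not spelled out in this entry. The sketch below assumes a quadratic form H(s) = sᵀJs + hᵀs, where the sign and the absence of a 1/2 factor are assumptions, and shows how a (B, N) batch maps to B energies. The function name `hamiltonian` and the list-based inputs are illustrative only.

```python
def hamiltonian(s_all, quadratic_coef, linear_bias):
    """Return one energy per row of s_all, assuming H(s) = s^T J s + h^T s."""
    energies = []
    for s in s_all:
        n = len(s)
        quad = sum(s[i] * quadratic_coef[i][j] * s[j]
                   for i in range(n) for j in range(n))
        lin = sum(linear_bias[i] * s[i] for i in range(n))
        energies.append(quad + lin)
    return energies


J = [[0.0, 1.0], [1.0, 0.0]]
h = [0.5, -0.5]
batch = [[1, 1], [1, -1]]            # shape (B=2, N=2)
energies = hamiltonian(batch, J, h)  # one value per batch row, shape (B,)
```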
- gibbs_sample(num_steps: int = 100, s_visible: Tensor | None = None, num_sample=None) -> Tensor
Sample from the Boltzmann Machine.
- Args:
num_steps (int): Number of Gibbs sampling steps.
s_visible (torch.Tensor, optional): State of the visible layer, shape (B, num_visible). If None, the visible layer is initialized randomly.
num_sample (int, optional): Number of samples. If None, uses the batch size of s_visible.
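To make the role of num_steps concrete, here is a minimal single-chain Gibbs sampler. It is a sketch under stated assumptions, not the library's implementation: spins take values ±1, P(s) is proportional to exp(-H(s)) with H(s) = sᵀJs + hᵀs, and J is symmetric with a zero diagonal. The `seed` argument is added here for reproducibility.

```python
import math
import random


def gibbs_sample(J, h, num_steps=100, seed=0):
    """Run num_steps full sweeps over +/-1 spins and return the final state."""
    rng = random.Random(seed)
    n = len(h)
    s = [rng.choice([-1, 1]) for _ in range(n)]  # random initial state
    for _ in range(num_steps):
        for i in range(n):
            # Local field on spin i; H(s_i=+1) - H(s_i=-1) equals 2 * field.
            field = h[i] + 2.0 * sum(J[i][j] * s[j] for j in range(n) if j != i)
            p_up = 1.0 / (1.0 + math.exp(2.0 * field))
            s[i] = 1 if rng.random() < p_up else -1
    return s


# A strongly negative coupling makes aligned spins the low-energy states.
J = [[0.0, -5.0], [-5.0, 0.0]]
h = [0.0, 0.0]
sample = gibbs_sample(J, h, num_steps=50)
```

Each sweep resamples every spin once from its conditional distribution given the others. Very large fields would overflow `math.exp`, so a production version would need a numerically safer sigmoid.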
Get the hidden bias.
- Args:
num_hidden (int): Number of hidden nodes.
- class kaiwu.torch_plugin.QDiffusion(proposal_model: Module, energy_model: Module, token_spec: SequenceTokenSpec, energy_adapter: EnergyBackboneAdapter, config: QDiffusionConfig | None = None, dtype: dtype = torch.float32, device: device | str | None = None, freeze_proposal: bool = True, energy_head: Module | None = None)
Bases: Module
Energy-guided discrete diffusion wrapper over generic sequence backbones.
The class combines two backbone roles:
- a proposal model that predicts token logits for the current noisy state
- an energy model that reranks candidate reconstructions
It exposes both training-oriented APIs such as objective() and decoding-oriented APIs such as initialize_state(), step(), and generate().
- energy(noisy_tokens: Tensor, candidate_tokens: Tensor, attention_mask: Tensor | None = None) -> Tensor
Scores candidate reconstructions conditioned on the noisy state.
- Args:
noisy_tokens: Noisy token tensor used as conditioning input.
candidate_tokens: Candidate clean token tensor to score.
attention_mask: Optional attention mask for the energy model.
- Returns:
torch.Tensor: A tensor of scalar energies with shape [batch, 1].
- forward(noisy_tokens: Tensor, **kwargs: Any) -> Tensor
Runs the proposal model on the current noisy state.
- Args:
noisy_tokens: Current noisy token tensor.
**kwargs: Additional keyword arguments forwarded to the proposal model.
- Returns:
torch.Tensor: Proposal logits over the token vocabulary.
- Raises:
TypeError: If the proposal model does not implement forward.
- generate(input_tokens: Tensor, *, max_steps: int = 500, partial_masks: Tensor | None = None, temperature: float = 1.0, return_state: bool = False) -> Tensor | dict[str, Any]
Runs a complete iterative decoding loop inside the core class.
- Args:
input_tokens: Initial token tensor.
max_steps: Number of decode iterations to run.
partial_masks: Optional boolean mask of fixed positions.
temperature: Sampling temperature stored in the decode state.
return_state: Whether to return the full final state dictionary.
- Returns:
torch.Tensor | dict[str, Any]: Either the final token tensor or the full decode state.
- get_non_special_symbol_mask(output_tokens: Tensor, partial_masks: Tensor | None = None) -> Tensor
Returns a boolean mask of editable non-special-token positions.
- Args:
output_tokens: Token tensor to inspect.
partial_masks: Optional boolean mask of positions that should remain fixed.
- Returns:
torch.Tensor: A boolean mask where True marks editable non-special positions.
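The idea of an editable mask can be sketched as follows: a position is editable when its token is not a special symbol and it is not pinned by partial_masks. The function name `editable_mask`, the `special_ids` argument, and the token IDs are hypothetical; in the library the special symbols presumably come from the token_spec passed to the constructor.

```python
def editable_mask(tokens, special_ids, partial_mask=None):
    """True where a token is not special and not pinned by partial_mask."""
    mask = [tok not in special_ids for tok in tokens]
    if partial_mask is not None:
        mask = [m and not fixed for m, fixed in zip(mask, partial_mask)]
    return mask


# Hypothetical vocabulary: 0 = pad, 1 = bos, 2 = eos.
tokens = [1, 17, 42, 8, 2, 0]
fixed = [False, True, False, False, False, False]  # position 1 must stay as-is
mask = editable_mask(tokens, special_ids={0, 1, 2}, partial_mask=fixed)
```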
- initialize_state(input_tokens: Tensor, partial_masks: Tensor | None = None, max_steps: int = 500, temperature: float = 1.0) -> dict[str, Any]
Creates the initial decoding state for an external generation loop.
- Args:
input_tokens: Initial token tensor.
partial_masks: Optional boolean mask of fixed positions.
max_steps: Planned number of decode iterations.
temperature: Sampling temperature stored in the state payload.
- Returns:
dict[str, Any]: A mutable state dictionary suitable for repeated step() calls.
- objective(batch: dict[str, Tensor], weighting: str = 'constant') -> dict[str, Tensor]
Builds the one-step training objective used by an external loop.
- Args:
batch: Batch dictionary containing at least batch["targets"].
weighting: Per-sample timestep weighting mode.
- Returns:
dict[str, torch.Tensor]: A dictionary containing proposal logits, supervision masks, loss weights, and the EBM objective term.
- proposal(noisy_tokens: Tensor, **kwargs: Any) -> Tensor
Semantic alias around forward() for proposal-side calls.
- Args:
noisy_tokens: Current noisy token tensor.
**kwargs: Additional keyword arguments forwarded to the proposal model.
- Returns:
torch.Tensor: Proposal logits over the token vocabulary.
- step(state: dict[str, Any], partial_masks: Tensor | None = None) -> dict[str, Any]
Runs one denoising/reranking step and returns updated state.
- Args:
state: Current decode state created by initialize_state().
partial_masks: Optional boolean mask of fixed positions.
- Returns:
dict[str, Any]: The updated decode state after one iteration.
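An external generation loop built from initialize_state() and step() might look like the sketch below. Only the two method names and their documented parameters are taken from this page. `run_decode_loop` is a hypothetical helper, and `CountingStub` is a stand-in object whose state keys ("tokens", "steps_done") are invented so the example runs without the real model.

```python
def run_decode_loop(model, input_tokens, max_steps=500, partial_masks=None):
    """Drive initialize_state() and step() from outside the model."""
    state = model.initialize_state(input_tokens, partial_masks=partial_masks,
                                   max_steps=max_steps)
    for _ in range(max_steps):
        state = model.step(state, partial_masks=partial_masks)
    return state


class CountingStub:
    """Stand-in with the same two method names; not the real QDiffusion."""

    def initialize_state(self, input_tokens, partial_masks=None,
                         max_steps=500, temperature=1.0):
        return {"tokens": list(input_tokens), "steps_done": 0}

    def step(self, state, partial_masks=None):
        # A real step would denoise and rerank; the stub only counts calls.
        return {**state, "steps_done": state["steps_done"] + 1}


final_state = run_decode_loop(CountingStub(), [5, 6, 7], max_steps=3)
```

Driving the loop externally leaves room for logging, early stopping, or inspecting intermediate states, which generate() handles internally.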
- to(*args: Any, **kwargs: Any) -> QDiffusion
Moves the module and refreshes cached device/dtype metadata.
- class kaiwu.torch_plugin.QDiffusionConfig(num_diffusion_timesteps: int = 500, use_coupled_sampling: bool = False, num_candidates: int = 1, proposal_temperature: float = 0.0, proposal_noise_scale: float = 1.0, energy_temperature: float = 1.0, disable_resample: bool = False, resample_ratio: float = 0.25, resample_top_p: float = 0.95, decoding_strategy: str = 'reparam-uncond-deterministic-linear')
Bases: object
Configuration for energy-guided discrete generation.
- Attributes:
num_diffusion_timesteps: Number of discrete noising steps used by the training objective.
use_coupled_sampling: Whether to use the coupled corruption variant.
num_candidates: Number of proposal candidates sampled at each decode step.
proposal_temperature: Temperature used for proposal-side sampling.
proposal_noise_scale: Gumbel noise scale used during proposal sampling.
energy_temperature: Temperature used when converting energies into reranking weights.
disable_resample: Whether to disable repetition-collapse resampling.
resample_ratio: Frequency threshold that triggers resampling.
resample_top_p: Top-p cutoff used during resampling.
decoding_strategy: Skeptical-remasking strategy string.
- decoding_strategy: str = 'reparam-uncond-deterministic-linear'
- disable_resample: bool = False
- energy_temperature: float = 1.0
- num_candidates: int = 1
- num_diffusion_timesteps: int = 500
- proposal_noise_scale: float = 1.0
- proposal_temperature: float = 0.0
- resample_ratio: float = 0.25
- resample_top_p: float = 0.95
- use_coupled_sampling: bool = False
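The configuration says only that energy_temperature is used when converting energies into reranking weights. One common way to do that, assumed here rather than confirmed by the source, is a softmax over negative energies scaled by the temperature. The function name `rerank_weights` is hypothetical.

```python
import math


def rerank_weights(energies, energy_temperature=1.0):
    """Softmax over negative energies: lower energy gives a larger weight."""
    scaled = [-e / energy_temperature for e in energies]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# Three candidates (as with num_candidates=3) and their scalar energies.
weights = rerank_weights([1.0, 2.0, 4.0], energy_temperature=1.0)
flatter = rerank_weights([1.0, 2.0, 4.0], energy_temperature=10.0)
```

Under this assumption, a higher temperature flattens the weights toward uniform, while a lower one concentrates them on the lowest-energy candidate.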
- class kaiwu.torch_plugin.QVAE(encoder, decoder, bm: AbstractBoltzmannMachine, sampler, dist_beta, mean_x: float, num_vis: int)
Bases: Module
Quantum Variational Autoencoder (QVAE) model.
- Args:
encoder: Encoder module
decoder: Decoder module
bm (AbstractBoltzmannMachine): Boltzmann machine
sampler: Sampler
dist_beta: Beta parameter for the distribution
mean_x (torch.Tensor): Bias of training data
num_vis (int): Number of visible variables in the Boltzmann machine
- forward(x)
Forward propagation.
- Args:
x (torch.Tensor): Input data
- Returns:
tuple: (recon_x, posterior, q, zeta)
recon_x: Reconstructed data
posterior: Posterior distribution object
q: Encoder output
zeta: Posterior sample
- neg_elbo(x, kl_beta)
Compute the negative ELBO loss.
- Args:
x (torch.Tensor): Input data
kl_beta (float): Weight coefficient for KL term
- Returns:
tuple: (output, recon_x, neg_elbo, wd_loss, total_kl, cost, q, zeta)
output: Reconstructed output (sigmoid activated)
recon_x: Reconstructed data
neg_elbo: Negative ELBO loss
wd_loss: Weight decay loss
total_kl: KL divergence
cost: Reconstruction loss
q: Encoder output
zeta: Posterior sample
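Since kl_beta is passed on every call, a training loop can vary it over time. The sketch below shows a beta-weighted combination of the reconstruction and KL terms together with a linear warm-up schedule. Both are assumptions for illustration: this page does not state how neg_elbo combines cost, total_kl, and wd_loss internally, and the names `weighted_neg_elbo` and `linear_kl_warmup` are hypothetical.

```python
def weighted_neg_elbo(recon_cost, total_kl, kl_beta):
    """Reconstruction cost plus a beta-weighted KL term (assumed form)."""
    return recon_cost + kl_beta * total_kl


def linear_kl_warmup(step, warmup_steps, max_beta=1.0):
    """Ramp kl_beta linearly from 0 to max_beta over warmup_steps."""
    if warmup_steps <= 0:
        return max_beta
    return max_beta * min(1.0, step / warmup_steps)


# Halfway through a 10-step warm-up the KL term gets half weight.
beta = linear_kl_warmup(step=5, warmup_steps=10)
loss = weighted_neg_elbo(recon_cost=2.0, total_kl=4.0, kl_beta=beta)
```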
- class kaiwu.torch_plugin.RestrictedBoltzmannMachine(num_visible: int, num_hidden: int, quadratic_coef: FloatTensor | None = None, linear_bias: FloatTensor | None = None, device=None)
Bases: AbstractBoltzmannMachine
Create a Restricted Boltzmann Machine.
- Args:
num_visible (int): Number of visible nodes in the model.
num_hidden (int): Number of hidden nodes in the model.
quadratic_coef (torch.FloatTensor, optional): Quadratic coefficients, shape [num_visible, num_hidden].
linear_bias (torch.FloatTensor, optional): Linear bias, shape [num_hidden].
device (torch.device, optional): Device to construct tensors.
- clip_parameters(h_range, j_range) -> None
Clip linear and quadratic bias weights in-place.
- Args:
h_range (tuple[float, float]): Range for linear weights, for example [-1, 1].
j_range (tuple[float, float]): Range for quadratic weights, for example [-1, 1].
- forward(s_all: Tensor) -> Tensor
Compute the Hamiltonian.
- Args:
s_all (torch.Tensor): Tensor of shape (B, N), where B is the batch size and N is the number of variables in the model.
- Returns:
torch.Tensor: Hamiltonian of shape (B,).
Propagate visible spins to the hidden layer.
- Args:
s_visible: Visible layer tensor.
requires_grad: Whether to allow gradient backpropagation.
- get_visible(s_hidden: Tensor, bernoulli: bool = False) -> Tensor
Propagate hidden spins to the visible layer.
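For intuition, hidden-to-visible propagation in a textbook RBM with 0/1 units computes P(v_i = 1 | h) = sigmoid(b_i + Σ_j W_ij h_j) and optionally draws Bernoulli samples from it. The sketch below follows that textbook form. The unit encoding, the sign convention, and the meaning given to the bernoulli flag are assumptions, and the function name and `rng` argument are hypothetical.

```python
import math
import random


def visible_from_hidden(s_hidden, weights, visible_bias, bernoulli=False, rng=None):
    """Return P(v_i = 1 | h), or 0/1 samples drawn from it if bernoulli is True."""
    rng = rng or random.Random(0)
    out = []
    for i, b in enumerate(visible_bias):
        # weights has shape [num_visible][num_hidden].
        act = b + sum(weights[i][j] * hj for j, hj in enumerate(s_hidden))
        p = 1.0 / (1.0 + math.exp(-act))
        out.append((1.0 if rng.random() < p else 0.0) if bernoulli else p)
    return out


W = [[2.0, -1.0], [0.0, 0.0]]  # 2 visible units, 2 hidden units
b_visible = [0.0, 0.0]
probs = visible_from_hidden([1, 1], W, b_visible)
samples = visible_from_hidden([1, 1], W, b_visible, bernoulli=True)
```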
Return the hidden bias.
- property visible_bias: Tensor
Return the visible bias.
- class kaiwu.torch_plugin.UnsupervisedDBN(hidden_layers_structure=None)
Bases: Module
A general unsupervised Deep Belief Network (DBN) architecture.
This model is a stack of Restricted Boltzmann Machines (RBMs).
- Args:
hidden_layers_structure (list, optional): A list of integers representing the number of hidden units in each layer. Defaults to [100, 100].
- create_rbm_layer(input_dim)
Creates the layers of RBMs for the DBN.
- Args:
input_dim (int): The dimension of the input data (number of visible units).
- Returns:
UnsupervisedDBN: The instance itself with the RBM layers created.
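The relationship between input_dim and hidden_layers_structure can be sketched as a list of (num_visible, num_hidden) pairs, one per stacked RBM. This assumes the usual DBN stacking in which each layer's hidden units serve as the next layer's visible units. The helper name `rbm_layer_shapes` is hypothetical.

```python
def rbm_layer_shapes(input_dim, hidden_layers_structure=None):
    """Return (num_visible, num_hidden) for each stacked RBM."""
    hidden = hidden_layers_structure or [100, 100]  # documented default
    dims = [input_dim] + list(hidden)
    return list(zip(dims[:-1], dims[1:]))


shapes = rbm_layer_shapes(784, [256, 64])
default_shapes = rbm_layer_shapes(784)
```

Under this assumption, the number of pairs corresponds to num_layers and the last num_hidden corresponds to output_dim.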
- forward(data_in)
Performs a forward pass to transform the input data.
- Args:
data_in (numpy.ndarray): The input data.
- Returns:
numpy.ndarray: The transformed data after passing through all RBM layers.
- Raises:
ValueError: If the model has not been built or trained yet.
- get_rbm_layer(index)
Gets the RBM layer at the specified index.
- Args:
index (int): The index of the RBM layer.
- Returns:
RestrictedBoltzmannMachine or None: The RBM layer if found, otherwise None.
- property num_layers
Returns the number of RBM layers.
- Returns:
int: The number of layers.
- property output_dim
Returns the output dimension of the DBN.
- Returns:
int: The dimension of the final hidden layer.
- reconstruct(data_in, layer_index=0)
Reconstructs the input from a specified RBM layer.
- Args:
data_in (numpy.ndarray): The input data to be reconstructed.
layer_index (int, optional): The index of the RBM layer to use for reconstruction. Defaults to 0.
- Returns:
numpy.ndarray: The reconstructed data.
- Raises:
ValueError: If the model has no RBM layers or the layer index is out of range.
- static reconstruct_with_rbm(rbm, data_in, device=None)
Reconstructs data using a single RBM.
- Args:
rbm (RestrictedBoltzmannMachine): The trained RBM model.
data_in (numpy.ndarray): The input data.
device (torch.device, optional): The device to perform computation on. If None, uses the RBM's device. Defaults to None.
- Returns:
tuple[numpy.ndarray, numpy.ndarray]: A tuple containing:
- The reconstructed visible layer data.
- The reconstruction error for each sample.
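The page does not say which metric the per-sample reconstruction error uses. As one plausible reading, the sketch below computes the mean squared error of each row against its reconstruction. The metric choice and the name `per_sample_mse` are assumptions.

```python
def per_sample_mse(data_in, reconstructed):
    """Mean squared error of each row against its reconstruction."""
    errors = []
    for x, x_hat in zip(data_in, reconstructed):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


data = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
recon = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # first row differs in one entry
errors = per_sample_mse(data, recon)        # one error per sample
```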