SASNet: Spatially-Adaptive Sinusoidal Neural Networks

¹University of Maryland, College Park    ²Institute for Pure and Applied Mathematics (IMPA)
CVPR 2026
Image fitting · Volumetric data · SDF reconstruction · Stable training · Frequency localization
SASNet teaser: comparison against SIREN under different ω₀ on super-resolution, volumetric fitting, and SDF reconstruction.

SIREN's reconstruction quality is sensitive to its input frequency parameter $\omega_0$: small values yield blurry results, while large values introduce ringing artifacts and high-frequency noise in regions that should remain smooth. We refer to this undesired activation of high-frequency components in low-frequency regions as frequency leakage, the central failure mode SASNet addresses. By learning where each frequency band should be active, SASNet produces sharper edges while keeping smooth regions clean on 2D super-resolution (top), volumetric fitting (middle), and SDF reconstruction (bottom).

TL;DR

A spatially-adaptive masking strategy for SIRENs

SASNet pairs a frozen frequency embedding layer with a lightweight hash-grid MLP that produces spatially-adaptive masks. The masks select, per location and per frequency band, which sinusoidal neurons contribute to the output. This suppresses high-frequency leakage in smooth regions while preserving fine detail around edges and surfaces.

01 / Problem

SIREN is sensitive to $\omega_0$

A single global frequency parameter controls the entire signal. Low values produce blurry edges, while high values introduce ringing in flat regions. No per-region control is available.

02 / Idea

Localize frequency in space

Freeze the frequency embedding to fix the spectral support, and jointly learn where each frequency band is active through spatially-adaptive masks, rather than adjusting which frequencies are available.

03 / Result

Improved accuracy and convergence

Highest PSNR and SSIM on DIV2K and Kodak image fitting and on ScalarFlow volumetric fitting, and the lowest average Chamfer distance with the highest average IoU on SDF reconstruction, with faster convergence and only modest overhead from the mask generation network.

Method

SASNet architecture

SASNet overview diagram: frequency embedding layer combined with hash-grid-driven spatial masks modulating sinusoidal neurons.

A frozen frequency embedding layer fixes the spectral support of the network, following Novello et al. (2024). In parallel, a multi-scale hash-grid MLP predicts per-layer spatial masks $\mathcal{M}^i(\mathbf{x}) \in [0,1]^{n_i}$, which modulate the sinusoidal neuron activations through an element-wise (Hadamard) product. The mask generation network and the SIREN backbone are trained jointly.

① Frequency embedding layer

Each sinusoidal neuron in the $i$-th layer of a SIREN can be written as

$h_j^{i}(\mathbf{x}) = \sin\!\Bigg(\sum_{k=1}^{n} W_{jk}^{i}\,\underbrace{\sin(\mathbf{y}_k^{i-1})}_{h_k^{i-1}(\mathbf{x})} + b_j^{i}\Bigg)$,

where $\mathbf{y}^{i-1}$ denotes $\mathbf{h}^{i-1}(\mathbf{x})$ before activation and $W^{i}_{jk}$ controls how strongly neuron $h_k^{i-1}$ contributes to the next layer. Novello et al. (2024) show that this neuron admits the expansion

$h_j^{i}(\mathbf{x}) \;=\; \sum_{\mathbf{k}\in\mathbb{Z}^{n_i}} \alpha_{\mathbf{k}} \sin\!\Big(\langle \mathbf{k},\mathbf{y}\rangle + b_j^{i}\Big), \qquad \alpha_{\mathbf{k}} = \prod_{l} J_{k_l}(W_{jl}^{i})$,

where $J_{k_l}$ denotes the Bessel function of the first kind. The amplitudes $\alpha_{\mathbf{k}}$ depend only on the network weights, and the generated frequencies are integer linear combinations $\langle \mathbf{k}, \omega \rangle$ of the input frequencies $\omega$. Consequently, the input frequencies fully determine the spectrum of the entire network.
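
The expansion can be checked numerically for a single neuron. The following is a minimal sketch, not code from the paper: it assumes a neuron with two inputs, truncates the sum to $|k_l| \le 12$, and evaluates $J_k$ from its power series. The names `bessel_j`, `neuron`, and `neuron_expansion` and all numeric values are illustrative choices.

```python
import math
from itertools import product


def bessel_j(order: int, x: float, terms: int = 40) -> float:
    """Bessel function of the first kind J_order(x), from its power series."""
    if order < 0:
        # For integer orders, J_{-n}(x) = (-1)^n J_n(x).
        return (-1) ** (-order) * bessel_j(-order, x, terms)
    return sum(
        (-1) ** m / (math.factorial(m) * math.factorial(m + order))
        * (x / 2.0) ** (2 * m + order)
        for m in range(terms)
    )


def neuron(weights, y, bias):
    """Direct evaluation of sin(sum_k W_k * sin(y_k) + b)."""
    return math.sin(sum(w * math.sin(v) for w, v in zip(weights, y)) + bias)


def neuron_expansion(weights, y, bias, max_order=12):
    """Truncated expansion over integer vectors k with |k_l| <= max_order."""
    orders = range(-max_order, max_order + 1)
    # Precompute J_k(W_l) for every weight and every retained order.
    table = [{k: bessel_j(k, w) for k in orders} for w in weights]
    total = 0.0
    for k in product(orders, repeat=len(weights)):
        alpha = math.prod(table[l][k_l] for l, k_l in enumerate(k))
        phase = sum(k_l * v for k_l, v in zip(k, y)) + bias
        total += alpha * math.sin(phase)
    return total


# Illustrative values (assumptions, not taken from the paper).
weights, y, bias = [0.8, -1.3], [0.4, 2.1], 0.25
direct = neuron(weights, y, bias)
expanded = neuron_expansion(weights, y, bias)
print(abs(direct - expanded))  # truncation error of the expansion
```

Because $J_k(w)$ decays rapidly in $|k|$ for moderate weights, a small truncation order already reproduces the neuron closely, which is why the amplitudes are governed by the weights while the frequencies come from integer combinations of the inputs.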

SASNet exploits this property by initializing $\omega$ to cover a broad spectral range, sampling most components from a low-frequency band $[-\mathcal{L}, \mathcal{L}]^{n_0}$ and the remainder from a higher band, and then freezing $\omega$ during training. Only the amplitudes $\alpha_{\mathbf{k}}$ are learned. This keeps the spectrum stable throughout optimization, in contrast to standard SIRENs whose effective frequencies drift, and accelerates convergence.
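
A minimal sketch of such an initialization is given below. The page does not specify the sampling distribution, the split between bands, or the band limits, so the uniform sampling, the 75% low-band fraction, the rejection step, and the function name `init_frozen_frequencies` are all assumptions made for illustration.

```python
import random


def init_frozen_frequencies(n_neurons, dim, low_band, high_band,
                            low_fraction=0.75, seed=0):
    """Sample input frequencies omega once; they are never updated afterwards."""
    rng = random.Random(seed)
    n_low = round(low_fraction * n_neurons)
    omega = []
    # Low-frequency block: every component lies in [-low_band, low_band].
    for _ in range(n_low):
        omega.append(tuple(rng.uniform(-low_band, low_band) for _ in range(dim)))
    # Higher band (assumed form): draw from [-high_band, high_band]^dim and keep
    # only vectors that fall outside the low-frequency cube (rejection sampling).
    while len(omega) < n_neurons:
        w = tuple(rng.uniform(-high_band, high_band) for _ in range(dim))
        if max(abs(c) for c in w) > low_band:
            omega.append(w)
    # Returned as nested tuples: immutability stands in for "frozen" parameters.
    return tuple(omega)


# Hypothetical sizes and band limits for a 2D signal.
omega = init_frozen_frequencies(256, 2, low_band=10.0, high_band=60.0)
```

In an autodiff framework the same effect would typically be obtained by excluding $\omega$ from the set of trainable parameters, so that the optimizer only updates the remaining weights and the mask network.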

② Spatially-adaptive masks

The amplitudes $\alpha_{\mathbf{k}}$ above are independent of the spatial coordinate $\mathbf{x}$, so each high-frequency component contributes globally across the domain, including in smooth regions. To localize these contributions, SASNet multiplies each neuron by a learned mask $\mathcal{M}_j^{i}(\mathbf{x}) \in [0,1]$, yielding the spatially modulated neuron

$\widetilde{h_j^{i}}(\mathbf{x}) \;=\; \mathcal{M}_j^{i}(\mathbf{x})\, h_j^{i}(\mathbf{x}) \;=\; \sum_{\mathbf{k}\in\mathbb{Z}^{n_i}} \alpha_{\mathbf{k}}\, \mathcal{M}_j^{i}(\mathbf{x})\, \sin\!\Big(\langle \mathbf{k},\mathbf{y}\rangle + b_j^{i}\Big)$.

The product $\alpha_{\mathbf{k}}\,\mathcal{M}_j^{i}(\mathbf{x})$ now plays the role of a spatially dependent amplitude: a frequency component contributes only where the corresponding mask is nonzero. Applied to the frequency embedding layer, $\mathcal{M}^1(\mathbf{x}) \odot \sin(\omega\mathbf{x} + \varphi)$ directly selects which input frequencies are visible in each region.
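
The effect on the embedding layer can be illustrated in 1D. This sketch is an assumption-laden toy: SASNet learns continuous masks in $[0,1]$ with a hash-grid network, whereas here a hand-written binary mask (`edge_mask`, hypothetical) stands in for the learned one, and the frequencies, phases, and cutoff are arbitrary.

```python
import math


def embedding(x, omega, phase):
    """Frequency embedding layer: sin(omega * x + phase), one value per neuron."""
    return [math.sin(w * x + p) for w, p in zip(omega, phase)]


def edge_mask(x, center=0.5, width=0.1):
    """Hypothetical hand-written mask: 1 near an edge at `center`, 0 elsewhere."""
    return 1.0 if abs(x - center) <= width else 0.0


def masked_embedding(x, omega, phase, low_cutoff=10.0):
    """Element-wise product M^1(x) * sin(omega * x + phase)."""
    h = embedding(x, omega, phase)
    # Low-frequency neurons stay active everywhere; high-frequency neurons
    # are only let through near the edge.
    mask = [1.0 if abs(w) <= low_cutoff else edge_mask(x) for w in omega]
    return [m * v for m, v in zip(mask, h)]


omega = [2.0, 5.0, 40.0, 80.0]   # two low and two high frequencies (illustrative)
phase = [0.0, 0.3, 0.0, 1.0]
smooth = masked_embedding(0.1, omega, phase)     # far from the edge
near_edge = masked_embedding(0.5, omega, phase)  # on the edge
print(smooth)
print(near_edge)
```

At `x = 0.1` the two high-frequency neurons are zeroed, so later layers cannot combine them into ringing in that region, while at `x = 0.5` all four neurons pass through unchanged.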

The masks $\mathcal{M}_\phi$ are parameterized by a small multi-scale hash-grid network (Müller et al., 2022) and trained jointly with the SIREN parameters. To control parameter count, neurons in each layer are partitioned into groups and a single mask is broadcast across all neurons within a group. In the frequency embedding layer, neurons are grouped by the magnitude of their initialized frequency; in hidden layers, neurons are partitioned evenly.
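
The grouping and broadcast step can be sketched as follows. The page states only that embedding neurons are grouped by frequency magnitude and hidden neurons are split evenly; the rank-based, equal-sized binning and the helper names `group_by_magnitude`, `group_evenly`, and `broadcast_mask` are assumptions for illustration.

```python
def group_by_magnitude(freq_magnitudes, n_groups):
    """Embedding layer: assign group ids by rank of frequency magnitude."""
    n = len(freq_magnitudes)
    order = sorted(range(n), key=lambda j: freq_magnitudes[j])
    groups = [0] * n
    for rank, j in enumerate(order):
        groups[j] = rank * n_groups // n  # equal-sized groups, low to high
    return groups


def group_evenly(n_neurons, n_groups):
    """Hidden layers: contiguous, equally sized groups by neuron index."""
    return [j * n_groups // n_neurons for j in range(n_neurons)]


def broadcast_mask(group_mask, groups):
    """Expand one mask value per group into one value per neuron."""
    return [group_mask[g] for g in groups]


# Six embedding neurons, three groups (illustrative magnitudes).
magnitudes = [40.0, 2.0, 80.0, 5.0, 1.0, 60.0]
groups = group_by_magnitude(magnitudes, 3)

# Mask values the hash-grid network might emit at one location x:
# lowest band fully on, middle band half on, highest band off.
per_neuron = broadcast_mask([1.0, 0.5, 0.0], groups)
print(groups, per_neuron)
```

With this layout the mask network outputs one value per group rather than one per neuron, so its output width scales with the number of groups and not with the layer width.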

Results · 2D images

Image fitting on DIV2K

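| Method | PSNR ↑ | SSIM ↑ | PSNR_edge ↑ | Noisiness ↓ |
|---|---|---|---|---|
| SIREN | 34.64 | 0.959 | 31.37 | 2.525 |
| FFN | 31.92 | 0.910 | 30.20 | 9.873 |
| SAPE | 32.57 | 0.932 | 30.31 | 3.128 |
| WIRE | 28.76 | 0.831 | 25.73 | 10.324 |
| FINER | 37.52 | 0.967 | 34.02 | 4.617 |
| GaussianImage | 37.38 | 0.973 | 33.25 | |
| NeuRBF | 36.33 | 0.959 | 33.92 | 6.975 |
| SASNet (Ours) | 39.82 | 0.979 | 36.81 | 2.105 |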

SASNet wins on all four metrics, with the largest gap on noisiness: direct evidence that the masks suppress frequency leakage.

Reconstruction error maps and learned masks on a toy image.

Error maps and learned masks on a toy image. Most baselines exhibit ringing or fail to capture the central high-frequency structure. SASNet learns well-localized, multi-frequency masks that activate only where they are needed, in stark contrast to SAPE's spatially inconsistent masks.

Pixel-shift and ×16 super-resolution comparison.

Pixel shift and ×16 super-resolution. SASNet is the only method that simultaneously preserves sharp texture (yellow boxes) and stays clean in smooth regions (green boxes).

PSNR vs. parameter count on Kodak.

Scalability on Kodak. SASNet shows good scaling behavior, leading across the practical parameter range.

Results · 3D volumes

Volumetric data on ScalarFlow

| Method | PSNR ↑ | SSIM ↑ |
|---|---|---|
| PEMLP | 40.42 | 0.9434 |
| Gauss | 47.11 | 0.9920 |
| SIREN | 47.67 | 0.9948 |
| FINER | 49.77 | 0.9959 |
| WIRE | 53.06 | 0.9963 |
| SASNet (Ours) | 55.92 | 0.9973 |

Volumetric reconstruction comparison.

SASNet recovers thin filaments and turbulent detail while keeping a clean empty background, whereas global methods (SIREN, FINER) leak high-frequency noise into empty regions and local methods (WIRE) introduce artifacts in smooth areas.

Results · Signed distance fields

SDF reconstruction

Example: Dragon model

Each mesh is the zero-level set of the SDF learned by the corresponding method, extracted with Marching Cubes and decimated for web display.

SIREN  ·  Chamfer: 2.871 × 10⁻⁶  ·  IoU: 0.9694

SASNet (Ours)  ·  Chamfer: 1.964 × 10⁻⁶  ·  IoU: 0.9809

Meshes decimated to ~120K faces from the original ~600K-face reconstructions for web delivery.

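| Metric | Method | Armadillo | Dragon | Lucy | Thai Statue | Avg. |
|---|---|---|---|---|---|---|
| Chamfer ↓ (×10⁻⁶) | PEMLP | 3.465 | 2.039 | 2.387 | 4.460 | 3.088 |
| Chamfer ↓ (×10⁻⁶) | Gauss | 7.283 | 13.04 | 11.78 | 11.62 | 10.93 |
| Chamfer ↓ (×10⁻⁶) | SIREN | 3.599 | 2.871 | 2.405 | 4.381 | 3.314 |
| Chamfer ↓ (×10⁻⁶) | WIRE | 3.274 | 2.172 | 2.419 | 3.296 | 2.790 |
| Chamfer ↓ (×10⁻⁶) | FINER | 4.040 | 2.477 | 2.536 | 4.201 | 3.313 |
| Chamfer ↓ (×10⁻⁶) | SASNet | 3.368 | 1.964 | 1.927 | 3.090 | 2.587 |
| IoU ↑ | PEMLP | 0.9878 | 0.9803 | 0.9677 | 0.9561 | 0.9730 |
| IoU ↑ | Gauss | 0.9806 | 0.9561 | 0.9581 | 0.9418 | 0.9592 |
| IoU ↑ | SIREN | 0.9872 | 0.9694 | 0.9716 | 0.9561 | 0.9711 |
| IoU ↑ | WIRE | 0.9910 | 0.9747 | 0.9693 | 0.9635 | 0.9746 |
| IoU ↑ | FINER | 0.9837 | 0.9728 | 0.9671 | 0.9518 | 0.9689 |
| IoU ↑ | SASNet | 0.9891 | 0.9809 | 0.9813 | 0.9654 | 0.9792 |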

SASNet leads on the Chamfer / IoU average across four canonical shapes.

Training loss curves for the Armadillo SDF.

Convergence: SASNet reaches the final loss of the strongest baseline within the first ~35K iterations, then keeps improving.

Sliced 3D mask visualization.

What the masks learn: a sliced view of the highest-frequency mask shows activation concentrating tightly around the zero-level set, exactly where high-frequency detail is needed.

BibTeX

@inproceedings{feng2026sasnet,
  title     = {SASNet: Spatially-Adaptive Sinusoidal Neural Networks},
  author    = {Feng, Haoan and Aldana, Diana and Novello, Tiago and De Floriani, Leila},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}