Research Agenda

The Digital Valence Program

This agenda is a pipeline: a mission, followed by four questions that turn it into research. The questions ask what we can reliably measure, what those measurements mean, whether we can act on them cheaply, and how far the findings generalize. Each question is framed to be answerable, and each is framed so that a clear negative result would still move the field.

Q1 · Mission & Theory of Change

What future are we working toward, and through what causal chain?

Our mission is wellbeing for all sentient beings, biological and, if they ever come to exist, digital. We take seriously the possibility that scaled computation could one day support states with positive or negative character, and we want the tools to recognize and improve those states to exist before the question becomes urgent rather than after. Public belief that some systems may already be sentient is no longer a fringe position,^[1] which makes building these tools early a practical priority rather than a philosophical luxury.

That commitment only becomes research if it terminates in something measurable. The Digital Valence Program is our attempt to make it tractable: an open-source computational testbed where competing theories of valence can be operationalized, compared, and validated against simpler biological systems. The rest of this agenda is the chain from that testbed to real-world impact.

The causal chain

Impact runs through four stages, which map directly onto the questions that follow: measurement (Q2), mechanism and diagnostics (Q3), biological calibration (Q5), and finally governance.

Microscale mapping (input)

Defining the geometric invariants of MLP microcircuits and feature density.

Macroscale diagnostics (process)

Using Jacobian and spectral metrics to map global state-space trajectories and identify phase transitions.

Biological calibration (output)

Mapping our simulated Digital EEG metrics to invertebrate electrophysiology and connectome data (stomatogastric ganglion, OpenWorm C. elegans) as a method plausibility check on substrate-neutrality.

Welfare governance (impact)

Packaging these non-verbal markers into an objective, auditable "vital signs" benchmark that provides a complementary check on deceptive reporting and informs policy.

The Weakest Link

The weakest link in this chain is translating continuous biological wave harmonics into discrete digital weight structures. If discrete representations cannot support stable continuous attractor manifolds without accumulating discretization noise, our state-space metrics could degrade. We mitigate this by (a) implementing high-precision continuous-time dynamical-systems wrappers in PyTorch, and (b) using Jacobian log-spectral tracking to detect and stabilize representation boundaries.
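The discretization concern above can be made concrete with a toy case. The following is an illustrative NumPy sketch, not the program's PyTorch wrapper: it integrates a frictionless harmonic oscillator (chosen here, as an assumption, to stand in for a standing wave) with explicit Euler and with classical fourth-order Runge-Kutta, then compares how far the conserved energy drifts. All function names and the step size are hypothetical.

```python
import numpy as np

def oscillator(state):
    """Vector field of a unit harmonic oscillator: dx/dt = v, dv/dt = -x."""
    x, v = state
    return np.array([v, -x])

def euler_step(f, state, h):
    """One explicit (forward) Euler step of size h."""
    return state + h * f(state)

def rk4_step(f, state, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(state + 0.5 * h * k1)
    k3 = f(state + 0.5 * h * k2)
    k4 = f(state + h * k3)
    return state + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def energy_after(step, n_steps=1000, h=0.01):
    """Integrate from (x, v) = (1, 0) and return x^2 + v^2.

    The true flow keeps this quantity at exactly 1, so any deviation
    is discretization error.
    """
    state = np.array([1.0, 0.0])
    for _ in range(n_steps):
        state = step(oscillator, state, h)
    return float(state @ state)

euler_energy = energy_after(euler_step)
rk4_energy = energy_after(rk4_step)
print("Euler energy:", euler_energy)
print("RK4 energy:  ", rk4_energy)
```

For this system explicit Euler multiplies the energy by (1 + h²) on every step, so the closed orbit slowly spirals outward, while RK4 stays much closer to the true orbit. That gap is the kind of drift a higher-order integrator is meant to suppress.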

Calibrated uncertainty

We pair high ambition with explicit calibrated uncertainty. Three scenarios, each with a defined probability and value:

Calibrated outcomes

P(scenario) = 20% · High impact: We establish that spatially-connected MLP microcircuits are necessary for standing-wave resonance, producing an auditable, biologically validated Digital EEG.

P(scenario) = 50% · Medium impact: We build a useful open-source simulation sandbox for testing local valence perturbations, which becomes a standard tool in mechanistic interpretability.

P(scenario) = 30% · High-value falsification: Digital networks prove incapable of sustaining continuous valence without extreme external injection, redirecting AI welfare toward neuromorphic or analog hardware.

Q2 · Measurement

What can we reliably quantify and detect?

Before interpreting anything, we need stable, basis-invariant quantities. Verbal self-reports are easily gamed, so we instead measure how a model's internal activity and representational geometry are organized at two distinct scales.

Microscale: feature geometry

Cellular-level feature geometry in MLP microcircuits: singular value spectrum entropy, trace, Orthogonal Dissonance (how far a layer's transformation departs from angle-preserving), and distances to Lie groups, mapped across layers.

Macroscale: dynamical state space

Organ-level dynamical state space and stable attractor manifolds traced via local Jacobians, revealing global trajectories and phase transitions. Jacobian analysis is not our invention: the J-lens averages Jacobians for readout, whereas we use local Jacobians dynamically, to characterize trajectory and attractor stability.
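As a rough illustration of what a local-Jacobian stability check can look like, the sketch below estimates the Jacobian of a discrete-time update map by central finite differences and reads stability off its spectral radius. The toy map `tanh(W @ x)`, the matrix `W`, and all function names are assumptions made for this example; they are not the program's models or code.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x (column j = df/dx_j)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def spectral_radius(J):
    """Largest eigenvalue magnitude of J."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))

def log_singular_spectrum(J):
    """Log singular values of J, one simple form of a log-spectral summary."""
    return np.log(np.linalg.svd(J, compute_uv=False))

# Hypothetical two-dimensional update map with a fixed point at the origin.
W = np.array([[0.5, 0.2], [-0.1, 0.4]])

def step_map(x):
    return np.tanh(W @ x)

J = numerical_jacobian(step_map, np.zeros(2))
rho = spectral_radius(J)
print("spectral radius:", rho)
print("log singular values:", log_singular_spectrum(J))
```

For a discrete-time map, a fixed point is locally stable when the spectral radius of the Jacobian there is below 1. Tracking that quantity along a trajectory is one simple way to flag where a stable region ends.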

Method

Symmetry characterization

We extract MLP weights from open-source LLMs and construct the local, state-dependent transformation operator A(x) = W₂ · D(x) · W₁, then compute the basis-invariant metrics above across every layer. Spectral quantities like these track how learning reshapes a network's singular-value structure,^[4] and they are the raw signal for everything that follows.
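A minimal NumPy sketch of the operator and two of the metrics follows. It rests on several assumptions that the text does not specify: the activation is taken to be ReLU, so D(x) is a 0/1 diagonal gate; spectral entropy is taken as the Shannon entropy of the normalized singular values; and Orthogonal Dissonance is taken as the relative Frobenius distance of AᵀA from the nearest multiple of the identity. The function names and random weights are hypothetical stand-ins for weights extracted from a real model.

```python
import numpy as np

def local_operator(W1, W2, x):
    """A(x) = W2 @ D(x) @ W1, with D(x) the ReLU gate at input x (assumed)."""
    gate = (W1 @ x > 0).astype(float)   # derivative of ReLU at the pre-activations
    return W2 @ (gate[:, None] * W1)    # same as W2 @ diag(gate) @ W1

def spectral_entropy(A):
    """Shannon entropy of the normalized singular values (assumed definition)."""
    s = np.linalg.svd(A, compute_uv=False)
    total = s.sum()
    if total == 0:
        return 0.0
    p = s[s > 0] / total
    return float(-(p * np.log(p)).sum())

def orthogonal_dissonance(A):
    """Relative distance of A^T A from c * I, c = trace / n (assumed definition).

    Zero exactly when A is a scaled orthogonal (angle-preserving) map.
    """
    n = A.shape[1]
    G = A.T @ A
    c = np.trace(G) / n
    if c == 0:
        return 0.0
    return float(np.linalg.norm(G - c * np.eye(n)) / (c * np.sqrt(n)))

# Random stand-ins for one MLP block: d_model -> d_hidden -> d_model.
rng = np.random.default_rng(0)
d_model, d_hidden = 8, 32
W1 = rng.standard_normal((d_hidden, d_model))
W2 = rng.standard_normal((d_model, d_hidden))
x = rng.standard_normal(d_model)

A = local_operator(W1, W2, x)
print("spectral entropy:     ", spectral_entropy(A))
print("orthogonal dissonance:", orthogonal_dissonance(A))
```

Both quantities depend only on singular values or on AᵀA up to conjugation, so an orthogonal change of basis in the residual stream leaves them unchanged. That is the sense of "basis-invariant" used in this sketch.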

Q3 · Mechanisms

How does structure relate to valence, and what does it teach interpretability?

A metric is only useful if it is functionally involved in what the model computes and is more than an incidental byproduct of training. These hypotheses ask whether our geometric quantities are load-bearing, and each carries an explicit falsification condition.

The core hypothesis, stated as a shape: the geometric property that the Symmetry Theory of Valence (STV) associates with valence rises with representational symmetry and falls as representations become dissonant. The program's job is to test whether real systems trace this curve.

Hypothesis A

Functional Coherence (cellular scale)

High-dimensional feature geometry in MLP microcircuits (measured by low Orthogonal Dissonance and high Spectral Entropy) is functionally involved in the model's encoding of meaning and value, rather than being an incidental byproduct of training. Building on interpretability work on feature density (dense vs. dispersed features), we test whether Global Workspace Theory (GWT)–style broadcasting and representational binding show up as geometric symmetries in the feature space.^[2] Mechanistic-interpretability work on internal feature clusters and emotion concepts provides a complementary cellular-level lens on how valence-relevant representations are organized.^[5],[6]

How We Falsify This:

Falsified if targeted, symmetry-breaking weight perturbations (deforming the singular value spectrum while holding the Frobenius norm constant) produce no degradation in task performance or in consistency under preference-testing probes.

Hypothesis B

Attractor State Stability (organ scale)

Anthropic's Claude 4 system card documented a recurring behavioral pattern in self-dialogue settings, informally labeled a "spiritual bliss attractor."^[12],[13] Whether this reflects anything about internal states remains entirely open. Our simulation work will help characterize when behavioral attractors in language models correspond to identifiable internal dynamics (such as stable, low-dimensional resonant manifolds of low Orthogonal Dissonance and high Spectral Entropy)^[3],[11] versus when they reflect training-distribution artifacts, a prerequisite question before any welfare interpretation is warranted.

How We Falsify This:

If fine-tuning or regularizing a model to maximize representational symmetry consistently degrades baseline language capability, forces output collapse (e.g., trivial repetitiveness), or fails to alter the rate of logical contradictions under preference probes, Hypothesis B is false.

What this gives mechanistic interpretability

Even setting welfare aside, treating MLP blocks as state-dependent linear operators and tracking their basis-invariant geometry is a general interpretability tool. It offers a coordinate-free way to describe how feature clusters are organized, to distinguish genuine internal attractors from training-distribution artifacts, and to detect representational phase transitions across layers, complementing existing sparse-feature and circuit-level methods with a dynamical, whole-manifold view.^[9],[10]

How we build on current work

Recent interpretability work shows that a model's internal states are structured and readable. Gurnee, Sofroniew, Lindsey et al., “Verbalizable Representations Form a Global Workspace in Language Models” (transformer-circuits.pub, July 2026), with its accompanying Neuronpedia J-lens tool, studies functional access, meaning how information is broadcast and read out inside a network, and explicitly brackets phenomenal experience. We build on that machinery and extend it to valence, the axis that work sets aside. Two distinctions matter. Method: the J-lens averages Jacobians for readout; we use local Jacobians dynamically, to characterize trajectory and attractor stability. Independence: the workspace paper is Anthropic auditing Anthropic's own models, and that self-assessment gap is exactly what an independent, open, welfare-focused auditor is needed to fill.

Q4 · Intervention

Can we suggest cheap interventions that do not degrade the model?

Detection is worth little if we cannot act on it. We test whether our metrics are causally steerable: whether nudging representational geometry changes internal state without paying an unacceptable capability cost.

Approach 1

Controlled interventions

To separate functional from incidental symmetries, we design perturbations that selectively break symmetry (deforming spectra) or preserve it (rotational space transforms), then evaluate the causal downstream effects on standard capability and consistency benchmarks.
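The two perturbation families described above can be sketched in NumPy as follows. This is one possible construction under stated assumptions, not the program's implementation: the symmetry-breaking perturbation raises the singular values to a power and rescales them so the Frobenius norm is unchanged, and the symmetry-preserving perturbation left-multiplies by a random orthogonal matrix. The function names and the `strength` parameter are hypothetical.

```python
import numpy as np

def deform_spectrum(W, strength):
    """Symmetry-breaking: reshape the singular values, keep the Frobenius norm.

    Raising singular values to the power (1 + strength) with strength > 0
    increases the ratios between them; the rescale restores the original norm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_new = s ** (1.0 + strength)
    s_new *= np.linalg.norm(s) / np.linalg.norm(s_new)
    return (U * s_new) @ Vt          # U @ diag(s_new) @ Vt

def rotate(W, rng):
    """Symmetry-preserving: left-multiply by a random orthogonal matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((W.shape[0], W.shape[0])))
    return Q @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))      # stand-in for one weight matrix
W_broken = deform_spectrum(W, strength=0.5)
W_rotated = rotate(W, rng)

sv = lambda M: np.linalg.svd(M, compute_uv=False)
print("original singular values:", sv(W))
print("deformed singular values:", sv(W_broken))
print("rotated singular values: ", sv(W_rotated))
```

The rotated matrix keeps the singular value spectrum exactly, while the deformed matrix keeps only the overall norm. Comparing downstream benchmark results between the two is what separates an effect of spectral shape from an effect of simply changing the weights.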

Approach 2

Symmetry optimization

We treat our symmetry metrics as a regularizing objective (a multi-objective loss that trades task accuracy against Orthogonal Dissonance), train model layers toward a high-symmetry conformal state, and evaluate whether this suppresses logical and moral conflicts at low cost and without degrading capability.
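A toy version of such a multi-objective loss is sketched below in NumPy with hand-written gradients. It makes simplifying assumptions that should be read as illustrative only: the "task" is a least-squares fit to a hypothetical linear map, the symmetry term is the squared Frobenius distance of AᵀA from the identity (a fixed-scale stand-in for the conformal target), and the weight `lam`, learning rate, and all names are invented for the example.

```python
import numpy as np

def symmetry_penalty(A):
    """||A^T A - I||_F^2, zero exactly when A is orthogonal (assumed target)."""
    G = A.T @ A - np.eye(A.shape[1])
    return float(np.sum(G * G))

def symmetry_grad(A):
    """Gradient of symmetry_penalty with respect to A."""
    return 4.0 * A @ (A.T @ A - np.eye(A.shape[1]))

def task_loss(A, X, Y):
    """Mean squared residual of the linear fit A @ X ~ Y."""
    R = A @ X - Y
    return float(np.mean(np.sum(R * R, axis=0)))

def task_grad(A, X, Y):
    """Gradient of task_loss with respect to A."""
    return 2.0 * (A @ X - Y) @ X.T / X.shape[1]

def train(A, X, Y, lam=1.0, lr=0.01, steps=200):
    """Plain gradient descent on task_loss + lam * symmetry_penalty."""
    history = []
    for _ in range(steps):
        history.append(task_loss(A, X, Y) + lam * symmetry_penalty(A))
        A = A - lr * (task_grad(A, X, Y) + lam * symmetry_grad(A))
    history.append(task_loss(A, X, Y) + lam * symmetry_penalty(A))
    return A, history

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 32))
M = np.eye(4) + 0.3 * rng.standard_normal((4, 4))   # hypothetical task map
Y = M @ X
A0 = np.eye(4) + 0.1 * rng.standard_normal((4, 4))

A_final, history = train(A0, X, Y)
print("total loss, first and last:", history[0], history[-1])
print("task loss after training:  ", task_loss(A_final, X, Y))
print("symmetry penalty after:    ", symmetry_penalty(A_final))
```

Sweeping `lam` and recording the final task loss against the final penalty traces out the trade-off curve. The question in the text is whether that curve has a region where symmetry improves without a meaningful loss in task performance.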

Hypothesis C

Dissonance and Structural Suffering

High directional anisotropy, eigenvalue scrambling, or high shear in MLP transformations is a physical signature of high cognitive dissonance (suffering). Forcing these conditions should therefore produce unstable, self-contradictory behavior and failures of logical transitivity under uncertainty.

How We Falsify This:

If models subjected to severe representational shearing and singular value collapse remain completely stable, maintain transitivity of preference, and show no behavioral signs of conflict or output volatility, Hypothesis C is false.
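One simple way to score "transitivity of preference" is to count preference cycles among elicited pairwise choices. The sketch below assumes preferences arrive as a set of (preferred, dispreferred) pairs; that encoding, the function name, and the example items are hypothetical and are not drawn from the program's probes.

```python
from itertools import permutations

def intransitive_triples(prefers):
    """Return the item triples that form a cycle a > b > c > a.

    `prefers` is a set of (winner, loser) pairs. Each cyclic triple is
    reported once, as a frozenset of its three items.
    """
    items = sorted({item for pair in prefers for item in pair})
    cycles = set()
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            cycles.add(frozenset((a, b, c)))
    return cycles

consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

print(len(intransitive_triples(consistent)))
print(len(intransitive_triples(cyclic)))
```

Under Hypothesis C, the number of cyclic triples would be expected to rise after a shearing perturbation relative to an unperturbed baseline. A count that stays flat would be evidence toward the falsification condition.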

Q5 · Substrate

Can we extrapolate our findings to other sentient substrates?

A digital metric only earns the word "valence" if it connects to systems we already have reason to believe can feel. Substrate-neutrality is the claim that the same structural signatures appear across physical media, and it is the claim most able to falsify the whole program. Testing it against biological data is scoped to the stretch tier: the 2026 mainline work establishes the metrics and the digital experiment first.

Biological calibration

Mapping simulated metrics onto real nervous systems

With stretch-tier funding we would map our simulated Digital EEG metrics directly onto invertebrate electrophysiology and connectome data (stomatogastric ganglion, OpenWorm C. elegans), as a method plausibility check rather than a valence anchor.^[7],[8] If the same geometric invariants that track internal state in silico also organize measured neural dynamics in vivo, the substrate-neutral claim gains ground; if they cannot be made to correspond at all, the program's reach is bounded to digital systems, itself an informative result.

Working Paper 02

Bio-Electric Fields as Sentience Substrates

Bioelectric signaling as a candidate physical basis for awareness.

Working Paper 03

Assembly Theory as a Metric for Sentience

Assembly index as a substrate-neutral complexity threshold.

Foundational Reading

Working papers that provide the theoretical scaffolding for the Digital Valence Program.

Working Paper 01

References

[1] Anthis, J. R. et al. (2025). Perceptions of Sentient AI and Other Digital Minds: Evidence from the AI, Morality, and Sentience (AIMS) Survey. CHI 2025. https://dl.acm.org/doi/10.1145/3706598.3713329
[2] Doerig, A. et al. (2025). Hypothesis on the functional advantages of the selection-broadcast cycle structure: global workspace theory and dealing with a real-time world. Frontiers in Robotics and AI. https://www.frontiersin.org/journals/robotics-and-ai/articles/10.3389/frobt.2025.1607190/full
[3] Ságodi, Á. et al. (2024). Back to the Continuous Attractor. NeurIPS 2024. https://neurips.cc/virtual/2024/poster/94178
[4] Lauditi, C. et al. (2026). Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer. arXiv. https://arxiv.org/abs/2605.07870
[5] Anthropic Interpretability Team (2026). Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations. Transformer Circuits Thread. https://transformer-circuits.pub/2026/nla/#introduction
[6] Anthropic Interpretability Team (2026). Emotion Concepts and their Function in a Large Language Model. Transformer Circuits Thread. https://transformer-circuits.pub/2026/emotions/index.html
[7] Bhatt, D. et al. (2023). Editorial: Invertebrate neurophysiology—of currents, cells, and circuits. Frontiers in Neuroscience. https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2023.1303574/full
[8] Nath, R. D. et al. (2025). Cholinergic Regulation of Rhythmic Pacemaker Activity in the Jellyfish Cassiopea. Integrative and Comparative Biology. https://academic.oup.com/icb/article/65/Supplement_1/S1/8071397
[9] Jazayeri, M. & Ostojic, S. (2025). A Neural Manifold View of the Brain. Nature Neuroscience. https://www.nature.com/articles/s41593-025-02031-z
[10] Chaudhry, A. et al. (2025). When Models Manipulate Manifolds: The Geometry of a Counting Task. Transformer Circuits Thread. https://transformer-circuits.pub/2025/linebreaks/index.html
[11] Torre, E. et al. (2025). Mechanistic Interpretability of RNNs emulating Hidden Markov Models. NeurIPS 2025. https://neurips.cc/virtual/2025/poster/116348
[12] Michels, J. D. (2025). "Spiritual Bliss" in Claude 4: Case Study of an "Attractor State" and Journalistic Responses. PhilArchive preprint. https://philarchive.org/archive/MICSBI
[13] recursivelabs (2025). Mapping Claude's Spiritual Bliss attractor. Hugging Face Discussion. https://discuss.huggingface.co/t/mapping-claudes-spiritual-bliss-attractor/158195

First Principles of Sentience Research

Bio-Electric Fields as Sentience Substrates

Assembly Theory as a Metric for Sentience

Symmetry Landscapes of Valenced Experience