Samuel Douglas Caldwell Jr.


CRSCE Decompression Specification (Version 2025-12-20)


Abstract


Cross-Sums Compression and Expansion (CRSCE) decompression reconstructs a binary block from (i) four families of cyclic cross-sum constraints and (ii) a chained lateral hash (LH) oracle, yielding a $511 \times 511$ Cross-Sum Matrix (CSM) that is then deterministically mapped back to the original byte stream. Unlike classical codecs that encode an invertible transform, CRSCE stores global structural constraints and a cryptographic commitment. Accordingly, decompression is defined by acceptance constraints rather than a mandated solver: any reconstruction strategy is conforming if and only if the recovered CSM satisfies the canonical cross-sum equalities and reproduces the LH chain exactly. This document specifies the normative parsing rules, cross-sum mathematics, LH construction, acceptance criteria, fail-hard semantics, and a prototype-ready reconstruction framework based on deterministic elimination (DE) and iterative message passing (GOBP/LBP), with explicit validation of mathematical consistency with the CRSCE compression specification and a defensible feasibility and performance argument under stated assumptions (Caldwell, 2024; Herman & Kuba, 1999; Kschischang et al., 2001; National Institute of Standards and Technology [NIST], 2015; Pearl, 1988; Wainwright & Jordan, 2008).


Scope and Conformance


Scope


This specification defines CRSCE decompression for CRSCE format version 1. It covers:

  • Block payload parsing and field interpretation.
  • The normative cross-sum and LH verification functions.
  • Conformance requirements for success and failure.
  • A solver-agnostic reconstruction contract, plus an informative prototype-ready approach.

Normative keywords


The keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY indicate requirement levels as defined for technical standards (Bradner, 1997; Leiba, 2017).


Definitions and Notations


Constants and indexing

CRSCE uses a fixed block dimension: $$ s=511. $$ The CSM indices are: $$ r,\ c\ ∈\ \{0, \dots, s-1\}\ =\ \{0, \dots, 510\}. $$ The stored index domain for all cross-sum vectors and the $LH$ array is: $$ i\ ∈\ \{0, \dots, 510\}, $$ so each stored vector has exactly $s=511$ elements.

The cross-sum bit-width is: $$ b=\ \lceil \log_2(s) \rceil = \lceil \log_2(511) \rceil\ =\ 9, $$ and each stored cross-sum element is a raw integer in $\{0, \dots, 511\}$ encoded in exactly $b$ bits (Caldwell, 2024).


Primary Objects


  • CSM: the Cross-Sum Matrix, $$ CSM\ ∈\ \{0,\ 1\}^{s \times s}=\{0,\ 1\}^{511 \times 511}. $$
  • Cross-sum vectors: $LSM$, $VSM$, $DSM$, $XSM$, each of length $s$.
  • Lateral Hash Array ($LH$): $LH[0,\dots,510]$, a length-$s$ chain of SHA-256 digests used for deterministic verification (Caldwell, 2024; NIST, 2015).

Decompression Inputs and Outputs


Inputs

A conforming decoder consumes:

  1. A validated file header containing at least:
    • version
    • original_file_size_bytes
    • block_count
    • header integrity fields as defined by the CRSCE container format (Caldwell, 2024).
  2. A sequence of block_count fixed-size block payloads.

If original_file_size_bytes = 0, the file MUST contain block_count=0 and the decoder MUST output the empty byte stream (Caldwell, 2024).


Outputs


The decoder outputs a byte stream $Y$ such that $|Y| = $ original_file_size_bytes and $Y$ matches the original source bytes exactly.


Failure Semantics (Fail-hard)


If the decoder cannot produce a valid CSM for any block under the acceptance criteria (discussed below), it MUST stop decoding and return an ERROR. Best-effort or partial output MUST NOT be the default (Caldwell, 2024).


Block Payload Parsing


Let the block payload be a bitstream $P$ of fixed length $149,216$ bits ($18,652$ bytes), byte-aligned. A decoder MUST parse, in order:

  1. $LH[0], LH[1], \dots, LH[510]$, each 256 bits (SHA-256 digest size).
  2. $LSM[0..510]$: $s$ values, each $b=9$ bits, MSB-first within the 9-bit field.
  3. $VSM[0..510]$: same encoding.
  4. $DSM[0..510]$: same encoding.
  5. $XSM[0..510]$: same encoding.
  6. Exactly four trailing padding bits (expected to be zero) used to reach the byte boundary (Caldwell, 2024).
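As an informative consistency check, the field widths listed above sum exactly to the fixed payload size: $$ 511 \times 256 + 4 \times 511 \times 9 + 4 = 130{,}816 + 18{,}396 + 4 = 149{,}216\ \text{bits} = 18{,}652\ \text{bytes}. $$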

A decoder MAY treat nonzero padding bits as a format error.
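Informatively, the parsing rules above can be realized with a simple MSB-first bit reader. The sketch below is illustrative only; the function name parse_block_payload and its internal helper are inventions of this example, not definitions of the standard.

```python
# Illustrative (non-normative) parser for one CRSCE block payload.
S = 511          # block dimension
B = 9            # cross-sum element width in bits
DIGEST_BITS = 256

def parse_block_payload(payload: bytes):
    """Return (LH, LSM, VSM, DSM, XSM, pad) parsed from an 18,652-byte payload."""
    assert len(payload) == 18_652, "fixed CRSCE block payload size"
    bitpos = 0

    def read_bits(n: int) -> int:
        # MSB-first within each byte, bytes in increasing order.
        nonlocal bitpos
        value = 0
        for _ in range(n):
            byte = payload[bitpos // 8]
            value = (value << 1) | ((byte >> (7 - (bitpos % 8))) & 1)
            bitpos += 1
        return value

    lh = [read_bits(DIGEST_BITS).to_bytes(32, "big") for _ in range(S)]
    lsm = [read_bits(B) for _ in range(S)]
    vsm = [read_bits(B) for _ in range(S)]
    dsm = [read_bits(B) for _ in range(S)]
    xsm = [read_bits(B) for _ in range(S)]
    pad = read_bits(4)          # four trailing padding bits (expected zero)
    return lh, lsm, vsm, dsm, xsm, pad
```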


Canonical Reconstruction Target


CSM reconstruction objective


For each block, the decoder must reconstruct: $$ CSM\ ∈\ \{0,1\}^{511 \times 511} $$ such that both the cross-sum constraints and the $LH$ chain constraints (discussed later) are satisfied.


Canonical Inverse Byte Mapping


Once a valid CSM is obtained, the output bytes are derived deterministically by traversing the CSM row-major and packing bits MSB-first in each byte. This mapping is the inverse of the compressor’s “read bytes sequentially, emit bits MSB-first, write to CSM row-major” rule (Caldwell, 2024). Therefore, correctness of the recovered bytes follows directly from the correctness of the reconstructed CSM (see Byte-stream Correctness Given a Valid CSM, below).
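A minimal sketch of this inverse mapping, assuming the CSM is represented as 511 rows of 511 integer bits (the representation and function name are illustrative, not normative):

```python
def csm_to_bytes(csm) -> bytes:
    """Traverse the CSM row-major and pack bits MSB-first into bytes."""
    out = bytearray()
    acc, nbits = 0, 0
    for r in range(511):
        for c in range(511):
            acc = (acc << 1) | csm[r][c]
            nbits += 1
            if nbits == 8:
                out.append(acc)
                acc, nbits = 0, 0
    if nbits:
        # 511 * 511 = 261,121 bits leaves one trailing bit that does not fill a
        # byte; this sketch assumes final lengths are reconciled via
        # original_file_size_bytes during padding removal.
        out.append(acc << (8 - nbits))
    return bytes(out)
```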


Cross-Sum Constraints


All arithmetic in diagonal addressing is modulo $s=511$. All sums are taken over exactly $s=511$ terms.


Lateral Sum Vector (LSM)


For $r ∈ \{0,\dots,510\}$: $$ LSM[r]=\sum_{c=0}^{s-1}CSM[r,c] $$


Vertical Sum Vector (VSM)


For $c ∈ \{0,\dots,510\}$: $$ VSM[c]=\sum_{r=0}^{s-1}CSM[r,c] $$


Diagonal Sum Vector (DSM), cyclic


For $d ∈ \{0,\dots,510\}$: $$ DSM[d]=\sum_{r=0}^{s-1}CSM[r,\ (d-r)\ \bmod\ s] $$


Anti-Diagonal Sum Vector (XSM), cyclic


For $x ∈ \{0,\dots,510\}$: $$ XSM[x]=\sum_{r=0}^{s-1}CSM[r,\ (r-x)\ \bmod\ s] $$


Stored value interpretation


Each stored cross-sum element MUST be interpreted as the raw integer value of the corresponding sum, and MUST lie in $\{0, \dots, 511\}$, consistent with $b=9$ bits (Caldwell, 2024).
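The four definitions above can be recomputed directly for verification. The sketch below is illustrative; it assumes the CSM is a 511 x 511 array of 0/1 values, and the function name is not part of the standard.

```python
S = 511

def cross_sums(csm):
    """Recompute LSM, VSM, DSM, XSM from a candidate CSM per the definitions above."""
    lsm = [sum(csm[r][c] for c in range(S)) for r in range(S)]
    vsm = [sum(csm[r][c] for r in range(S)) for c in range(S)]
    dsm = [sum(csm[r][(d - r) % S] for r in range(S)) for d in range(S)]
    xsm = [sum(csm[r][(r - x) % S] for r in range(S)) for x in range(S)]
    return lsm, vsm, dsm, xsm
```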


Lateral Hash ($LH$) Chain


Hash Primitive


SHA-256 MUST be implemented exactly as specified in the Secure Hash Standard (NIST, 2015).


Fixed Seed


CRSCE defines a fixed, unchanging seed string. Let seedBytes be the literal ASCII/UTF-8 bytes of the seed string defined by the CRSCE standard (Caldwell, 2024).

Define: $$ N=SHA256(seedBytes) $$


Row Serialization $RowBytes(r)$


For each $r\ ∈\ \{0,\dots,510\}$, define $RowBytes(r)$ as the 511-bit row $CSM[r, 0], \dots, CSM[r, 510]$ packed into exactly 64 bytes, where:

  • bits within a byte are MSB-first.
  • bytes follow increasing column order from $c=0$ to $c=510$ (with the unused final bit of the 512-bit container treated according to the compressor's defined packing rule for 511 bits).
  • the mapping is identical to the compressor's row packing convention (Caldwell, 2024).

Chain definition


The $LH$ chain is defined as: $$ \begin{aligned} LH_0 &= SHA256(N\ ||\ RowBytes(0)) \\ LH_r &= SHA256(LH_{r-1}\ ||\ RowBytes(r)) \quad \text{for } r\ ∈\ \{1, \dots, 510\}. \end{aligned} $$
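The chain can be recomputed as sketched below. Two points are assumptions of this example rather than statements of the standard: the placeholder SEED_BYTES stands in for the fixed seed string (not reproduced here), and the unused final bit of each 64-byte row container is taken to be zero.

```python
import hashlib

S = 511
SEED_BYTES = b"..."   # placeholder for the literal seed bytes defined by the CRSCE standard

def row_bytes(csm, r: int) -> bytes:
    """Pack row r (511 bits, MSB-first) into 64 bytes; the 512th bit is assumed zero."""
    acc = 0
    for c in range(S):
        acc = (acc << 1) | csm[r][c]
    acc <<= 1                                   # assumed-zero final bit of the container
    return acc.to_bytes(64, "big")

def lh_chain(csm):
    """LH[0..510]: LH_0 = SHA256(N || RowBytes(0)); LH_r = SHA256(LH_{r-1} || RowBytes(r))."""
    prev = hashlib.sha256(SEED_BYTES).digest()  # N
    chain = []
    for r in range(S):
        prev = hashlib.sha256(prev + row_bytes(csm, r)).digest()
        chain.append(prev)
    return chain
```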


Acceptance Criteria and Correctness


Block Validity


A reconstructed candidate $\widehat{CSM}$ is valid if and only if:

  1. Cross-sum match: the recomputed $LSM,\ VSM,\ DSM,\ XSM$ equal the stored vectors at every index.
  2. $LH$ match: the recomputed $LH[0, \dots, 510]$ values equal the stored $LH$ chain (defined above).

The stored $LH$ array is $LH[0,\dots,510]$. A decoder MUST recompute the chain from the reconstructed $CSM$ and compare for exact equality.

If either condition fails, the candidate MUST be rejected.
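Expressed in terms of the illustrative cross_sums and lh_chain helpers sketched earlier (neither name is normative), the acceptance check reduces to two exact comparisons:

```python
def block_is_valid(csm, lsm, vsm, dsm, xsm, lh) -> bool:
    """Accept a candidate CSM only if both checks hold exactly."""
    if cross_sums(csm) != (lsm, vsm, dsm, xsm):
        return False                  # cross-sum mismatch -> reject
    return lh_chain(csm) == list(lh)  # LH chain must match exactly
```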


Byte-stream Correctness Given a Valid CSM


Let the compressor define a total order on block bits by reading input bytes sequentially, emitting each byte’s bits MSB-first, and placing emitted bits into CSM in row-major order (Caldwell, 2024). Define the decoder’s inverse as reading CSM in the same row-major order and packing groups of 8 bits MSB-first into bytes.

Because both mappings use the same bit order and traversal order, the composition “compress bit placement” followed by “decompress bit extraction” is the identity on the block bitstring, up to final-block padding removal. Thus, once $\widehat{CSM}$ is valid, the recovered bytes are determined uniquely and are correct by construction.


Fail-Hard Requirement


If no valid $\widehat{CSM}$ is found, the decoder MUST return an ERROR.

If decoding a block consumes more than DecoderTimeout, the decoder MUST return an ERROR. The value of DecoderTimeout is implementation-defined and serves as a safeguard against unsolvable blocks.


Reconstruction Strategy (Solver-Agnostic Contract)


Normative Statement

The reconstruction method is implementation-defined. A conforming decoder MAY use any internal method (including but not limited to deterministic elimination, constraint propagation, loopy belief propagation, and Game of Beliefs Protocol message passing) provided the acceptance criteria defined above are satisfied exactly (Caldwell, 2024).


Informative: why a message-passing approach is appropriate


The CRSCE constraint system can be represented as a factor graph with:

  • variable nodes $x_{r,c}\ ∈\ \{0,\ 1\}$ for each $CSM$ cell, and
  • factor nodes enforcing each cross-sum equality (one per row/column/diagonal/anti-diagonal index).

This aligns naturally with the factor-graph formalism and sum-product style message passing described in foundational work on belief propagation and factor graphs (Kschischang et al., 2001; Pearl, 1988; Wainwright & Jordan, 2008). CRSCE’s GOBP framing can be viewed as an engineering protocol for orchestrating such message updates and integrating deterministic constraints and local moves (Caldwell, 2024).
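For concreteness, each CSM cell participates in exactly four constraint lines, whose indices follow from the cyclic definitions given earlier. The helper below is illustrative only:

```python
S = 511

def cell_constraints(r: int, c: int):
    """Return the (row, column, diagonal, anti-diagonal) line indices covering cell (r, c)."""
    d = (r + c) % S   # DSM[d] includes CSM[r, c] because c = (d - r) mod s
    x = (r - c) % S   # XSM[x] includes CSM[r, c] because c = (r - x) mod s
    return r, c, d, x
```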


Informative: prototype-ready DE+GOBP decomposition


Disclaimer: As stated above, decompression is implementation-defined (Caldwell, 2024); this section merely suggests one possible implementation. Nothing in this section should be construed as precluding a different decompression implementation, provided that implementation remains compatible with the other sections of this standard.

A prototype-ready decoder commonly separates reconstruction into:

  1. Deterministic Elimination (DE):


    1. Simple Folding

      Maintain, for each constraint line with stored sum $S_{line}$, a residual $R = S_{line} - (\text{number of assigned ones on the line})$ and the count $U$ of unassigned variables on the line. If $R=0$, all unassigned variables on that line are forced to $0$; if $R=U$, all are forced to $1$. These implications are logically sound because they are a direct consequence of the equality constraint over $\{0,1\}$ variables. A minimal sketch appears after this list.

    2. Known-Hash Folding

      Certain row patterns that occur with regular frequency, such as alternating 0s and 1s, MAY be identified in advance. Hashes for these patterns can be pre-computed and stored in the decoder. At the start of the decoding process, a quick search of this known-hash table can be made to solve any rows that match a known hash value. The memory consumed by the table is justified because solving even one row this way avoids a non-trivial amount of compute and memory in the decoding process.

      This may be applied repeatedly during the decoding process.

  2. Iterative Inference (GOBP/LBP):


    When DE reaches a fixed point, apply iterative message passing on the factor graph to prioritize highly biased variables for assignment or to drive convergence toward a consistent global solution. The use of damping and scheduling is a standard stabilization practice in loopy graphs (Wainwright & Jordan, 2008).

    This may be applied repeatedly during the decoding process.

  3. LH-based Selection


    Because LH is an exact oracle over the reconstructed rows, it serves to reject candidate solutions that satisfy cross-sums but do not match the committed row content. This use of a cryptographic commitment as an accept/reject oracle is consistent with SHA-256’s standardized role as an integrity mechanism (NIST, 2015).

    This may be applied repeatedly during the decoding process when a candidate row is prepared.

Critically, LH is not required to “search the space” by itself; it is used to select among solutions produced by DE+GOBP (Caldwell, 2024).
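As a concrete illustration of the Simple Folding step referenced above, the sketch below applies the forced assignments implied by a single constraint line. The representation (a line_cells coordinate list, an assign dictionary mapping cells to 0, 1, or None, and the line's stored sum target) is an assumption of this example, not part of the standard.

```python
def fold_line(line_cells, assign, target):
    """One Simple Folding pass over a constraint line; returns True if any cell was forced."""
    ones = sum(1 for cell in line_cells if assign.get(cell) == 1)
    unassigned = [cell for cell in line_cells if assign.get(cell) is None]
    residual = target - ones                    # R = stored sum minus assigned ones
    if residual < 0 or residual > len(unassigned):
        raise ValueError("constraint violated: no consistent completion exists")
    if residual == 0:                           # R = 0: remaining cells forced to 0
        forced_value = 0
    elif residual == len(unassigned):           # R = U: remaining cells forced to 1
        forced_value = 1
    else:
        return False                            # no deterministic implication yet
    for cell in unassigned:
        assign[cell] = forced_value
    return bool(unassigned)
```

A DE pass would typically iterate this over all $4 \times 511$ lines until no further assignments occur (a fixed point), at which point iterative inference takes over.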


Feasibility and Performance Argument


Feasibility: existence of a valid solution for conforming blocks


For any block produced by a conforming CRSCE compressor, there exists at least one valid $CSM$: the $CSM$ constructed during compression from the input stream. The compressor computes and stores cross-sums and $LH$ from that $CSM$ (Caldwell, 2024). Therefore, the constraint system is satisfiable for conforming data; decompression is a search for that satisfying assignment.

The constraint structure has strong redundancy: each variable participates in multiple constraints (row, column, one diagonal family, and one anti-diagonal family). Such redundancy is a standard condition under which iterative constraint propagation and belief propagation heuristics often perform well in large structured systems, even when exact inference is intractable in the worst case (Kschischang et al., 2001; Wainwright & Jordan, 2008). Moreover, DE provides provably correct forced assignments that reduce degrees of freedom before stochastic or iterative steps are invoked.

This does not claim worst-case polynomial-time guarantees; rather, it supports a defensible engineering claim: given CRSCE’s highly constrained structure and deterministic oracle, a hybrid DE + message-passing solver has a significant likelihood of recovering the original CSM for compressor-generated instances, especially when parameters are tuned empirically (Caldwell, 2024).


Time model (explicit)


A conservative engineering time model treats a full decode attempt as dominated by operations proportional to $s^3$ per block, reflecting repeated passes across $O(s^2)$ variables over $O(s)$ iterations or phases (Caldwell, 2025a). Under the simplifying assumptions of:

  • an 8-core CPU at 3.5 GHz,
  • fully utilized across cores, and
  • with negligible I/O and no memory bandwidth bottleneck,

the effective operation rate is: $$ T = 8 \times 3.5 \times 10^9 = 2.8 \times 10^{10}\ \text{ops/s}. $$ Then the per-block decompression time estimate is: $$ T_{decomp} \approx \frac{s^3}{T} \times 10^3\ \text{ms}. $$ For $s=511$: $$ T_{decomp} \approx \frac{511^3}{2.8 \times 10^{10}\ \text{ops/s}} \times 10^3\ \text{ms} \approx 4.77\ \text{ms/block}. $$ This yields an expected throughput on the order of: $$ \frac{32{,}640\ \text{bytes}}{4.77\ \text{ms}} \approx 6.8\ \text{MB/s} $$ per the stated assumptions (Caldwell, 2025b).
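The arithmetic above can be reproduced directly; the snippet below is purely illustrative of the stated assumptions.

```python
s = 511
ops_per_block = s ** 3                                 # ~1.33e8 operations per block
op_rate = 8 * 3.5e9                                    # 8 cores at 3.5 GHz -> 2.8e10 ops/s
t_block_ms = ops_per_block / op_rate * 1e3             # ~4.77 ms per block
throughput_mb_s = (32_640 / 1e6) / (t_block_ms / 1e3)  # ~6.8 MB/s
print(f"{t_block_ms:.2f} ms/block, {throughput_mb_s:.1f} MB/s")
```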


Performance targets and scaling rationale


CRSCE is designed as a fixed-block, random-access-friendly format where performance can be improved by:

  • batching blocks,
  • parallelizing across blocks, and
  • accelerating iterative steps on GPU-like architectures.

A prototype plan targeting sub-millisecond block times via parallel implementations and tuned iteration parameters is consistent with the algorithm’s structure (Caldwell, 2025a). From a standards standpoint, these targets are non-normative, but they support a defensible feasibility claim: there exists a straightforward path to performance improvement beyond the conservative $s^3$ CPU model through parallelism and hardware acceleration.


Security and Robustness Considerations

  1. Integrity verification: $LH$ depends on SHA-256 correctness; implementations MUST match the standard exactly (NIST, 2015).
  2. No confidentiality:

    CRSCE does not provide secrecy, though it may enhance secrecy when used with robust encryption as discussed earlier.

  3. Fail-hard safety:

    Silent acceptance of invalid blocks constitutes data corruption; fail-hard behavior is required (Caldwell, 2024).

  4. Untrusted input handling:

    Decoders SHOULD validate header fields, block sizes, and bounds before allocation and processing.


Reference Pseudocode


Block decoding contract


DECODE_BLOCK(payload_bits):
    (LH, LSM, VSM, DSM, XSM, pad) = PARSE(payload_bits)
    if pad != 0000:
        MAY error

    CSM = RECONSTRUCT_IMPLEMENTATION_DEFINED(LSM, VSM, DSM, XSM, LH)
    if CSM not found:
        ERROR

    if not CROSS_SUMS_MATCH(CSM, LSM, VSM, DSM, XSM):
        ERROR
    if not LH_MATCH(CSM, LH):
        ERROR

    return CSM

Deterministic recovery of bytes from a valid CSM


CSM_TO_BYTES(CSM):
    bits = traverse row-major over r=0..510, c=0..510
    pack each 8 bits MSB-first into a byte
    return bytes

Padding removal for the final block is performed using original_file_size_bytes (Caldwell, 2024).


References


Bradner, S. (1997). Keywords for use in RFCs to indicate requirement levels (RFC 2119). RFC Editor. https://www.rfc-editor.org/rfc/rfc2119

Caldwell, S. (2024, December 3). Cross-Sums Compression and Expansion (CRSCE) Specification (working document). samcaldwell.net. https://samcaldwell.net/2024/12/03/cross-sums-compression-and-expansion-crsce/

Caldwell, S. (2025a, June 28). CRSCE GPU Decompression Implementation Plan (working document).

Caldwell, S. (2025b). Target Benchmarks (working document).

Herman, G. T., & Kuba, A. (Eds.). (1999). Discrete tomography: Foundations, algorithms, and applications. Birkhäuser.

Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. https://www.mit.edu/~6.454/www_fall_2002/lizhong/factorgraph.pdf

Leiba, B. (2017). Ambiguity of uppercase vs lowercase in RFC 2119 key words (RFC 8174). RFC Editor. https://www.rfc-editor.org/rfc/rfc8174

National Institute of Standards and Technology. (2015). Secure Hash Standard (SHS) (FIPS PUB 180-4). U.S. Department of Commerce. https://csrc.nist.gov/pubs/fips/180-4/upd1/final

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.

Wainwright, M. J., & Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1–2), 1–305. https://people.eecs.berkeley.edu/~jordan/papers/wainwright-jordan-fnt.pdf