Cross-Sums Compression and Expansion (CRSCE) Specification
Abstract
Cross-Sums Compression and Expansion (CRSCE) is a block-structured binary representation that encodes a $511 \times 511$ bit Cross-Sum Matrix (CSM) through (i) four families of cyclic cross-sum constraints and (ii) a chained lateral hash array (LH) used as a deterministic verification oracle. The CRSCE format defines a fixed-size block payload, a fixed header, and strict verification criteria for decoding. Reconstruction (decompression) is defined only by its acceptance constraints—any internal solving strategy (e.g., deterministic elimination, loopy belief propagation, or Game of Beliefs Protocol message passing) is implementation-defined, provided the final candidate CSM satisfies the canonical cross-sum constraints and the LH hash chain exactly. This document specifies the file format, bit-ordering, cross-sum definitions, LH construction, packing order, padding, random-access semantics, and conformance test vectors. It also provides explanatory context by relating CRSCE decompression to constrained reconstruction and message passing over factor graphs (Pearl, 1988; Kschischang et al., 2001) and to discrete tomography-style ambiguity management (Herman & Kuba, 1999). Cryptographic verification relies on SHA-256 as standardized by NIST (National Institute of Standards and Technology [NIST], 2015).
Terminology and Language
Conformance Language
The keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are to be interpreted as normative requirements.
Primary objects
- Input stream: a sequence of bytes to be encoded.
- Block: a contiguous segment of $s^2$ bits ($s=511$), i.e. $261,121$ bits, except that the final block may be padded with null bits to reach full block size.
- CSM: the Cross-Sum Matrix, a $511 \times 511$ matrix of bits.
- Cross-sum vectors: $LSM, VSM, DSM, XSM$, each stored for indices $0,...,510$ (length 511).
- LH array: the lateral hash chain values $LH[0,...,510]$ (length 511), each a SHA-256 digest.
Design Rationale
CRSCE encodes a dense binary block into a set of global constraints (cross sums) plus cryptographic commitments (LH). Decompression becomes a constrained reconstruction problem: find a CSM consistent with the cross-sum constraints and validate it with the LH oracle. This aligns with a general family of problems in discrete reconstruction where projection-like measurements can underdetermine the original object, requiring solver strategies and validation (Herman & Kuba, 1999). CRSCE’s message-passing-oriented decompression strategies (when used) are naturally expressed as factor-graph inference with local constraints and iterative belief updates (Pearl, 1988; Kschischang et al., 2001).
Importantly, CRSCE treats SHA-256 as an integrity mechanism: the LH chain provides deterministic accept/reject validation of row content (NIST, 2015).
CRSCE correctness does not depend on any particular inference algorithm, only on meeting the acceptance constraints.
Global Constants and Indexing
Fixed Matrix Size
$$s=511$$ The CSM is a $511 \times 511$ bit matrix with row indices $r \in \{0,\dots,510\}$ and column indices $c \in \{0,\dots,510\}$.
Canonical stored index domain
CRSCE stores cross-sum vectors and LH over the canonical index domain:
$$ i \in \{0,\dots,s-1\} = \{0,\dots,510\}. $$
Thus:
- $LSM$, $VSM$, $DSM$, $XSM$ each have $511$ elements indexed $0,\dots,510$.
- $LH$ has $511$ elements indexed $0,\dots,510$.
Bit-width for stored cross-sum values
$$ b = \lceil \log_2(s) \rceil = \lceil \log_2(511) \rceil = 9 $$ Each stored cross-sum element is a raw integer in $\{0,\dots,511\}$ and is serialized using $b=9$ bits (MSB-first within the 9-bit field).
Bit Ordering, Block Formation, and CSM Construction
Stream endianness and bit extraction
CRSCE follows little-endian conventions at the byte-stream level. Bytes are read sequentially from the data source. Each byte is serialized MSB-first into bits. For an input byte $B$, the emitted bit sequence is: $$ (\ (B\ \gg\ 7)\ \&\ 1,\ (B\ \gg\ 6)\ \&\ 1, \dots,\ (B\ \gg\ 0)\ \&\ 1\ ). $$
Mapping bits to CSM
Bits are written into the CSM left-to-right, top-to-bottom (row-major). For bit-position $𝑘 ∈ \{0,\dots, s^2\}$: $$ r=\left\lfloor \frac{k}{s}\right\rfloor,\qquad c = k \bmod s, $$ and the $k$-th emitted bit is stored at $CSM[r, c]$.
Block sizing and padding
Each full block contains: $$ s^2 = 511^2 = 261,121\ bits \approx 32,641\ bytes \approx 32\ KB $$
- All blocks except the last MUST be exactly $s^2$ bits
- The last block MAY be shorter; it MUST be padded with null ($0$) bits to reach $s^2$ bits.
- The file header MUST store the original (unpadded) input size in bytes so the decoder can remove padding in the final block.
Empty-file rule
If the original input size is $0$ bytes, the CRSCE output MUST contain:
- a valid header, and
- zero blocks ($block_count=0$) .
Cross-Sum Vectors
All cross-sums are computed using the canonical loop bounds $0,\dots,510$ ($511$ terms), and cyclic diagonal addressing uses modulo $s=511$. Let "$mod\ s$" mean modulo $511$.
Lateral Sum Vector (LSM)
For $r ∈ \{0,\dots,510\}$: $$ LSM[r]=\sum_{c=0}^{s-1}CSM[r,c] $$
Vertical Sum Vector (VSM)
For $c ∈ \{0,\dots,510\}$: $$ VSM[c]=\sum_{c=0}^{s-1}CSM[r,c] $$
Diagonal Sum Vector (DSM)
For $d ∈ \{0,\dots,510\}$: $$ DSM[d]=\sum_{r=0}^{s-1}CSM[r,\ (d-r)\ mod\ s)] $$
Anti-Diagonal Sum Vector (XSM)
For $x ∈ \{0,\dots,510\}$: $$ DSM[x]=\sum_{r=0}^{s-1}CSM[r,\ (r-x)\ mod\ s)] $$
Stored value definition
Each stored element is the raw integer value of the corresponding sum above and MUST lie in ${0,\dots,511}$. This is what allows $b=9$ storage.
Lateral Hash (LH) Chain
Hash Primative
CRSCE uses SHA-256 as specified by NIST (NIST,2015).
Fixed Seed
CRSCE defines a fixed, unchanging seed string:RG9uYWxkVHJ1bXBJbXBlYWNoSW5jYXJjZXJhdGVIaXN0b3J5UmVtZW1iZXJz
. The seed bytes are the literal ASCII/UTF-8 bytes of that character sequence (no Base64 decoding).
A hash of the seed can be precalculated in an implementation as it is unchanging.
RowBytes definition
For any row index $r$, define: $$ RowBytes(r)\ ∈\ \{0,1\}^{511}\ →\ {0,1}^{64\ bytes} $$ as the packed 511-bit row $CSM[r,0,\dots,510]$ using the canonical bit order:
- bits in a byte are MSB-first
- bytes follow the row-major traversal
Thus, $RowBytes(r)$ is exactly 64 bytes.
Chain definition (stored indices $0,\dots,510$)
Define: $$ N=SHA256(seedBytes) $$ Then: $$ LH_0 = SHA256(N||RowBytes(0)) $$ and for $r\ ∈\ \{1,\dots,510\}$: $$ LH_r = SHA256(LH_{r-1}||RowBytes(r)) $$ The LH array stored in the CRSCE payload is: $$ LH[0,\dots,510]. $$
Block Payload Layout and Bit Packing
Serialization order
Within each block payload, fields MUST be serialized in this exact order:
- $LH[0],LH[1],\dots,LH[510]$ (each 256 bits)
- $LSM[0,\dots,510]$ (each 9 bits, MSB-first within the 9-bit field)
- $VSM[0,\dots,510]$ (each 9 bits, MSB-first)
- $DSM[0,\dots,510]$ (each 9 bits, MSB-first)
- $XSM[0,\dots,510]$ (each 9 bits, MSB-first)
Array indices MUST be serialized in increasing order.
Cross-sum element bit order
Let a cross-sum value $v\ ∈\ \{0,\dots,511\}$. Its serialized 9-bit representation MUST by emitted MSB-first: $$ (v_8,\ v_7,\ \dots,\ v_0), $$ where $v = \sum_{j=0}^{8}v_j2^j$.
Block payload size (bits and padding)
The payload content size is: $$ \underbrace{511 \times 256}_{LH} \;+\; \underbrace{4 \times 9 \times 511}_{\text{four cross-sum arrays}} \;=\; 130{,}816 + 18{,}396 \;=\; 149{,}212~\text{bits}. $$ To store payloads byte-aligned, the encoder MUST append exactly 4 zero padding bits at the end of the payload, producing: $$ 149{,}216~\text{bits} $$ or $$ 18{,}652~\text{bytes}. $$ per block on disk.
File Container Format
All multibyte integer fields in the header are little-endian
Header fields (fixed layout)
The header MUST contain:
magic(4bytes): ASCIICRSCversion(uint16): currently1header_bytes(uint16): total header size in bytesoriginal_file_size_bytes(uint64)block_count(uint64)header_crc32(uint32): CRC-32 computed over the preceding header bytes (frommagicthroughblock_count)
Thus, for version 1: $$ header_bytes = 28. $$
Block framing and random access
Blocks are fixed-size. With: $$ header\_bytes = 28 block\_bytes = 18, 652 $$ the file offset to block $i$ (0-based) is: $$ block\_offset = header\_bytes + i \times block\_bytes $$
Decoder Acceptance Criteria and Error Semantics
Success Criteria
A decoded block is valid if and only if the decoder produces a $511 \times 511$ CSM such that:
- cross-sums match exactly for indices $0,\dots,510$ under the normative definition.
- The LH chain recomputed from the decoded $CSM$ matches the stored $LH[0,\dots,510]$ exactly.
Failure criteria (fail-hard)
If the decoder cannot produce a valid block that satisfies all acceptance criteria, it MUST:
- stop decoding, and
- return an ERROR
Partial-output or best-effort behavior is non-normative and MUST NOT be the default.
Reference Pseudocode (Explanatory, Non-Normative)
Encoding (per-block)
ENCODE_BLOCK(bytes[32768]) -> payload_bits:
CSM = BuildCSM(bytes) // MSB-first per byte; row-major fill
for r in 0..510:
LSM[r] = sum_{c=0..510} CSM[r,c]
for c in 0..510:
VSM[c] = sum_{r=0..510} CSM[r,c]
for d in 0..510:
DSM[d] = sum_{r=0..510} CSM[r,(d-r) mod 511]
for x in 0..510:
XSM[x] = sum_{r=0..510} CSM[r,(r-x) mod 511]
seedBytes = ASCII("RG9uYWxkVHJ1bXBJbXBlYWNoSW5jYXJjZXJhdGVIaXN0b3J5UmVtZW1iZXJz")
N = SHA256(seedBytes)
RowBytes(r) = Pack511Bits(CSM[r,0..510]) // 64 bytes, MSB-first in each byte
LH[0] = SHA256( N || RowBytes(0) )
for r in 1..510:
LH[r] = SHA256( LH[r-1] || RowBytes(r) )
payload = SerializeBits(
LH[0..510] (256-bit each),
LSM[0..510] (9-bit MSB-first each),
VSM[0..510] (9-bit MSB-first each),
DSM[0..510] (9-bit MSB-first each),
XSM[0..510] (9-bit MSB-first each)
)
append 4 zero bits
return payload
Decoding (accept/reject form)
DECODE_BLOCK(payload_bits) -> CSM or ERROR:
parse LH[0..510], LSM[0..510], VSM[0..510], DSM[0..510], XSM[0..510]
ignore final 4 padding bits
CSM = RECONSTRUCT_IMPLEMENTATION_DEFINED(LH, LSM, VSM, DSM, XSM)
if CSM == NONE:
return ERROR
if CrossSumsMatch(CSM, LSM, VSM, DSM, XSM) AND LHChainMatches(CSM, LH):
return CSM
else:
return ERROR
Conformance Test Vectors
TV0: Empty File
- $original_file_size_bytes = 0$
- $block_count = 0$
- No blocks.
TV1: One full block of $0x00$
- Input: $32,768$ bytes of $0x00$
- Expected: CSM all zeros.
- Cross sums: all stored values $0$
- $LH$: determined by the fixed seed and rows of 64 bytes $0x00$.
TV2: One full block of $0xFF$
- Input: $32,768$ bytes of $0xFF$
- Expected: CSM all ones.
- Cross sums (over indices $0..510$): all stored values $511$ (binary $111111111$).
- LH: determined by the fixed seed and rows of 64 bytes 0x00.
TV3: One full block of $0xAA
- Input: $32,768$ bytes of 0xAA (binary $10101010$, MSB-first)
- Expected: for all rows, $CSM[r, c] = 1 if c even$
- Cross sums:
- $LSM[r] = 256 for all r ∈ 0,...,510$
- $VSM[c] = 256 for all c ∈ 0,...,510$
- $DSM[d] = 256 for all d ∈ 0,...,510$
- $XSM[x] = 256 for all x ∈ 0,...,510$
Security and Robustness Considerations
Integrity and assumptions
CRSCE relies on SHA-256 as a standardized cryptographic hash for integrity validation (NIST, 2015). A successful decode requires exact LH chain verification, providing strong resistance against accidental corruption and many classes of malicious modification, subject to standard assumptions about SHA-256 preimage and collision resistance.
No confidentiality
CRSCE does not provide encryption and MUST NOT be interpreted as providing confidentiality. However, it MAY be used with proper encryption to increase the strength of the cryptographic systems where separate keys are used for each cross-sum/hash output for transmission on separate channels.
Defensive parsing
Decoders MUST treat the input as untrusted. Implementations MUST validate header fields before allocation and MUST bounds-check all reads and offsets to prevent memory safety failures.
Versioning and Forward Compatibility
$version$ must be present.
Decoders MUST reject unknown versions and MUST NOT attempt heuristic decoding of unknown formats.
Project-Normative Sources and Relationship to Working Documents
This specification canonizes and supersedes earlier internal drafts and working notes by fixing parameters and eliminating recurrent ambiguity (e.g., the canonical index domain $0..510$ for stored vectors, fixed $s=511$, and fixed $b = 9$). The reconstruction algorithm remains implementation-defined to permit experimentation and iteration while maintaining deterministic acceptance conditions.
References
Herman, G. T., & Kuba, A. (Eds.). (1999). Discrete tomography: Foundations, algorithms, and applications. Birkhäuser.
Kschischang, F. R., Frey, B. J., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2), 498–519. https://doi.org/10.1109/18.910572 University of Toronto Communications
National Institute of Standards and Technology. (2015). Secure Hash Standard (SHS) (FIPS PUB 180-4, Update 1). U.S. Department of Commerce. https://csrc.nist.gov/pubs/fips/180-4/upd1/final NIST Computer Security Resource Center
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann. ScienceDirect