Lama9901 23 hours ago [-]
let me be direct about where i see this going.
right now there's no standard way to verify a
computational result independently. you either
trust the number or you don't. that's true for
ML benchmarks, simulation outputs, pharma pipelines,
financial models — everything.
what this builds toward: any result, any domain,
packaged once, verifiable forever by anyone with
python and 5 minutes. no access to the original
environment. no trust required.
the physical anchor is the part that excites me most —
for materials and engineering, the chain connects to
actual physical reality. not a number i chose.
not a convention. physics.
that's a different category of proof than anything
that exists right now in this space.
if you're working in a domain where results need
to be audited, reproduced, or submitted to regulators —
this is the missing layer. try it:
if it works — let's talk about your use case.
if it doesn't — tell me exactly where it breaks.
proof not trust. that's the whole thing.
Lama9901 2 days ago [-]
Author update: spent the day doing a final pass before asking HN to re-up the post.
What changed since the original submission:
- 8 active claims (added DT-FEM-01 — FEM/digital twin verification)
- 107 tests passing, steward_audit PASS
- Every link on the site now points to the actual file in the repo
- system_manifest.json synced, all docs consistent
Still solo, still transparent about limitations (reports/known_faults.yaml).
Happy to answer any questions about the protocol design.
rubyrfranklin2 1 day ago [-]
Real-time speech translation is something I think about constantly running heyvid.ai — we're always chasing that latency vs. quality tradeoff for multilingual video. JEPA's approach is interesting because it sidesteps the typical encode-decode bottleneck that kills most real-time pipelines. I'd be curious how it holds up on accented or fast speech. Back at Adobe I saw how even 200ms of lag completely destroyed the perceived quality of live demos. The latency budget for translation is so much tighter than transcription-only, so any architectural win like this is worth watching closely.
wasabi991011 1 day ago [-]
Sorry, I think I missed how OP's post relates to this.
ddfproof 1 day ago [-]
looked at the repo — the bypass attack test caught my eye.
how common is this attack in practice? like do you actually
see people trying to game verification systems this way or
is it more of a theoretical concern you're protecting against?
ddfproof 1 day ago [-]
Also, just wanted to say the site itself looks really well put together. The layout is clean, everything is easy to follow, and the overall presentation feels polished. It’s genuinely pleasant to browse through and explore the project. Nice work on that.
Lama9901 23 hours ago [-]
spent a lot of time on that. the whole idea of the site was proof not trust,
so it had to actually feel like that, not just say it.
Lama9901 24 hours ago [-]
mostly theoretical right now — but that's the point
of building it before it's needed.
anyone submitting results for audit or regulatory review
has an incentive to make numbers look right. strip the
evidence, recompute hashes — if only integrity is being
checked, the attack is silent and undetectable.
i kept asking myself "what would i do if i wanted to
cheat this?" that was the first answer. so it became
an adversarial test:
tests/steward/test_cert02_*
the protocol shouldn't assume good faith.
especially not in regulated domains.
and thanks on the site — built that solo too.
Lama9901 4 hours ago [-]
shipped something today. then found a problem with it. fixed it. here's the full story.
there are three ways a computational result can lie to you:
the file was changed after the fact — SHA-256 catches this
the evidence was stripped from the bundle — the semantic layer catches this
the computation itself was run differently than claimed — nothing catches this
until today.
i added Step Chain Verification to ML_BENCH-01. every step of the computation hashes itself into the next:
init_params → hash_1
hash_1 + dataset → hash_2
hash_2 + metrics → hash_3
hash_3 + verdict → trace_root_hash
change the seed, skip a step, reorder anything — trace_root_hash doesn't match. the chain breaks.
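a minimal runnable sketch of that chaining in plain Python. the function names, the JSON serialization of each step's data, and the example values are assumptions for illustration, not the code in the repo:

```python
import hashlib
import json


def step_hash(step_name, data, prev_hash):
    """Hash one step together with the hash of the step before it."""
    # Canonical JSON so the same data always serializes the same way (assumed).
    payload = json.dumps(data, sort_keys=True)
    material = (step_name + payload + prev_hash).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def build_chain(steps):
    """steps is an ordered list of (name, data); returns every step hash in order."""
    prev = "genesis"  # fixed starting value for the first step
    hashes = []
    for name, data in steps:
        prev = step_hash(name, data, prev)
        hashes.append(prev)
    return hashes


# Hypothetical step data, only to have something to hash.
steps = [
    ("init_params", {"seed": 42}),
    ("generate_dataset", {"n_samples": 1000}),
    ("compute_metrics", {"accuracy": 0.91}),
    ("threshold_check", {"verdict": "PASS"}),
]
chain = build_chain(steps)
trace_root_hash = chain[-1]

# Changing the seed changes every later hash, including the root.
tampered = [("init_params", {"seed": 43})] + steps[1:]
print(build_chain(tampered)[-1] == trace_root_hash)

# Reordering two steps changes the root as well.
reordered = [steps[0], steps[2], steps[1], steps[3]]
print(build_chain(reordered)[-1] == trace_root_hash)
```

both comparisons print False. a real implementation would probably also want an unambiguous separator between the concatenated fields, which this sketch leaves out to stay close to the description above.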
this isn't blockchain. no network, no consensus, no tokens. same idea as git commits — each commit hashes its parent. except here it's computation steps, not code commits.
then i checked the actual verifier.
mg.py verify --pack bundle.zip — the command i've been telling people to run — wasn't checking trace_root_hash at all. the chain was in the data. the construction tests passed. but the verifier itself ignored it entirely.
so "three verification layers" was technically true in the data structure. not true in what the verifier actually ran.
i fixed it before posting. added to scripts/mg.py _verify_semantic():
trace_root_hash must equal the final step hash
if one field exists without the other → FAIL
if any step hash isn't valid 64-char hex → FAIL
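those three rules can be sketched roughly like this. it is not the actual _verify_semantic() code: the field name execution_trace, the per-step "hash" key, the lowercase-only hex check, and the choice to treat "neither field present" as nothing to check are all assumptions made for the example:

```python
import hashlib
import re

HEX64 = re.compile(r"[0-9a-f]{64}")  # 64 lowercase hex chars, as hexdigest() emits


def verify_step_chain(result):
    """Return (ok, message) for the step-chain rules described above."""
    trace = result.get("execution_trace")  # hypothetical field name
    root = result.get("trace_root_hash")

    if trace is None and root is None:
        return True, "no step chain present"  # assumption: nothing to check
    if trace is None or root is None:
        return False, "FAIL: one of execution_trace / trace_root_hash is missing"
    if not trace:
        return False, "FAIL: execution_trace is empty"

    for index, step in enumerate(trace):
        step_digest = step.get("hash")
        if not isinstance(step_digest, str) or HEX64.fullmatch(step_digest) is None:
            return False, "FAIL: step %d hash is not 64-char hex" % index

    if trace[-1]["hash"] != root:
        return False, "FAIL: trace_root_hash does not match final step hash"
    return True, "PASS"


# Two placeholder step hashes, just to have valid hex to work with.
h1 = hashlib.sha256(b"step-1").hexdigest()
h2 = hashlib.sha256(b"step-2" + h1.encode("utf-8")).hexdigest()
trace = [{"step": "init_params", "hash": h1}, {"step": "threshold_check", "hash": h2}]

good = {"execution_trace": trace, "trace_root_hash": h2}
wrong_root = {"execution_trace": trace, "trace_root_hash": h1}
orphan_root = {"trace_root_hash": h2}
bad_hex = {"execution_trace": [{"step": "init_params", "hash": "xyz"}], "trace_root_hash": h2}

for name, bundle in [("good", good), ("wrong_root", wrong_root),
                     ("orphan_root", orphan_root), ("bad_hex", bad_hex)]:
    print(name, verify_step_chain(bundle))
```

only the first bundle passes. note this sketch checks the shape of the chain and the root match; recomputing each step hash from the step data would be a separate, stronger check.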
then wrote tests/steward/test_cert03_step_chain_verify.py — 5 tests that attack the verifier specifically, not just the chain construction.
now mg.py verify actually runs all three layers:
integrity: SHA-256 root_hash match
semantic: job_snapshot present, payload.kind correct
step chain: trace_root_hash == final step hash
118 tests total. steward_audit PASS.
git clone https://github.com/Lama999901/metagenesis-core-public
python -m pytest tests/steward/test_cert03_step_chain_verify.py -v
the lesson: "i implemented X" and "X runs when you call verify" are two different things. found that gap myself. fixed it first.
```
# the chain is just SHA-256, chained:
hash_1 = SHA256("init_params" + data + "genesis")
hash_2 = SHA256("generate_dataset" + data + hash_1)
hash_3 = SHA256("compute_metrics" + data + hash_2)
trace_root_hash = SHA256("threshold_check" + data + hash_3)
```
change anything (seed, sample count, noise level, step order) and trace_root_hash changes. the verifier catches it.
118 tests. three independent layers. MIT license. no network. no trust required.
```
git clone https://github.com/Lama999901/metagenesis-core-public
python -m pytest tests/steward/test_cert03_step_chain_verify.py -v
```
itsthecourier 1 day ago [-]
"A hash-only check still passes. MetaGenesis Core adds a second layer: - integrity layer → PASS - semantic layer → FAIL (job_snapshot missing)"
Could you please elaborate on this?
Lama9901 1 day ago [-]
Sure. The semantic layer is a second verification pass that runs independently of file integrity.
Here's why SHA-256 alone isn't enough. An adversary can:
Remove job_snapshot from the artifact (stripping the core evidence of what actually ran)
Recompute all SHA-256 hashes to match the modified files
Rebuild the manifest
A hash-only verifier sees everything consistent and returns PASS. The attack succeeds silently.
The semantic layer catches this. After the integrity check passes, it independently verifies:
job_snapshot is present (evidence of the actual computation, not just file hashes)
payload.kind matches the registered claim type (can't swap one claim for another)
canary_mode flag is consistent (dual-mode execution provenance intact)
If job_snapshot was stripped, the semantic check returns FAIL: job_snapshot missing — even if every SHA-256 is valid.
This specific attack is an adversarial test in the public repo: tests/steward/test_cert02_pack_includes_evidence_and_semantic_verify.py
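To make the attack concrete, here is a toy sketch of it, not the repo's implementation. The bundle layout (a single artifact.json), the manifest format, the function names, and the claim kind "ml_benchmark" are all assumptions for illustration, and the canary_mode check is left out for brevity:

```python
import hashlib
import json


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def build_manifest(files):
    """files maps path -> bytes. Returns per-file hashes plus a root hash over them."""
    file_hashes = {path: sha256_hex(content) for path, content in sorted(files.items())}
    root = sha256_hex(json.dumps(file_hashes, sort_keys=True).encode("utf-8"))
    return {"files": file_hashes, "root_hash": root}


def verify_integrity(files, manifest):
    """Hash-only check: do the files still match the manifest shipped with them?"""
    return build_manifest(files) == manifest


def verify_semantic(files, expected_kind):
    """Second pass: is the evidence actually there, and is it the right claim?"""
    artifact = json.loads(files["artifact.json"])
    if "job_snapshot" not in artifact:
        return False, "FAIL: job_snapshot missing"
    if artifact.get("payload", {}).get("kind") != expected_kind:
        return False, "FAIL: payload.kind mismatch"
    return True, "PASS"


# An honest bundle (contents are made up).
artifact = {
    "payload": {"kind": "ml_benchmark", "accuracy": 0.91},
    "job_snapshot": {"seed": 42, "n_samples": 1000},
}
files = {"artifact.json": json.dumps(artifact).encode("utf-8")}
manifest = build_manifest(files)

# The attack: strip the evidence, recompute the hashes, rebuild the manifest.
stripped = {key: value for key, value in artifact.items() if key != "job_snapshot"}
forged_files = {"artifact.json": json.dumps(stripped).encode("utf-8")}
forged_manifest = build_manifest(forged_files)

print(verify_integrity(forged_files, forged_manifest))   # hash-only layer
print(verify_semantic(forged_files, "ml_benchmark"))     # semantic layer
```

The forged bundle is internally consistent, so the integrity check returns True, while the semantic check returns (False, "FAIL: job_snapshot missing").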
The deeper point — which I didn't explain in the original post:
In physics and engineering domains, the semantic layer connects to something stronger than an internal threshold. Young's modulus for aluminium is ~70 GPa. That's not a value I chose — it's been measured independently in thousands of labs worldwide.
When MTR-1 runs, it verifies the computation against that physical constant (rel_err ≤ 1%). The chain extends to FEM verification (DT-FEM-01, rel_err ≤ 2%) and drift monitoring (DRIFT-01).
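As a rough sketch of what a check like that amounts to (the function names and the example inputs are hypothetical, and this is not the MTR-1 code; only the ~70 GPa reference and the 1% and 2% tolerances come from the description above):

```python
E_ALUMINIUM_GPA = 70.0  # approximate Young's modulus of aluminium, in GPa


def relative_error(computed, reference):
    """Relative error of a computed value against an external reference value."""
    return abs(computed - reference) / abs(reference)


def check_against_anchor(computed_gpa, reference_gpa=E_ALUMINIUM_GPA, tolerance=0.01):
    """Return (ok, rel_err). The reference is external; it is not read from the bundle."""
    rel_err = relative_error(computed_gpa, reference_gpa)
    return rel_err <= tolerance, rel_err


# Hypothetical computed values, checked at the 1% and 2% tolerances.
for computed in (70.5, 71.0, 72.0):
    ok_1pct, rel_err = check_against_anchor(computed, tolerance=0.01)
    ok_2pct, _ = check_against_anchor(computed, tolerance=0.02)
    print(computed, round(rel_err, 4), ok_1pct, ok_2pct)
```

With these inputs, 70.5 passes both tolerances, 71.0 passes only the 2% one, and 72.0 fails both.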
The difference: tamper-evident provenance answers "was the bundle modified?" — the physical anchor answers "does the number agree with physical reality?" These are different questions. Both matter, but the second is harder to fake because the ground truth is external to the system.
This doesn't apply to ML accuracy or data pipelines — there the value is purely tamper-evident provenance, not physical grounding. The protocol is honest about that distinction in reports/known_faults.yaml.
measurablefunc 1 day ago [-]
This is another "art" project. Nice work OP.
Lama9901 1 day ago [-]
What would change your mind? Genuine question.
The adversarial test is public and runnable in 5 minutes:
If output isn't PASS/PASS on your machine, I want to know.
If the protocol design is flawed, I want to know where specifically.
Known limitations are machine-readable: reports/known_faults.yaml
Charon77 1 day ago [-]
First of all, I don't want to run anyone's code without proper explanation, so help me understand this.
Let's start with the verifier. The third-party verifier receives a bundle without knowing what its contents are and without access to the tool used to measure, and just runs a single command on the bundle, which presumably contains both the expected results and the actual measurements, either of which can easily be tampered with. What problem does that solve?
Lama9901 1 day ago [-]
Right question. Bundle alone proves nothing — you're correct.
Two things make it non-trivial to fake:
The pipeline is public. You can read scripts/steward_audit.py
before running anything. It's not a black box.
For materials claims — the expected value isn't in the bundle.
Young's modulus for aluminium is ~70 GPa. Not my number.
Physics. The verifier checks against that, not against
something I provided.
ML and pipelines — provenance only, no physical grounding.
Said so in known_faults.yaml :: SCOPE_001.
wasabi991011 1 day ago [-]
If I may ask, how much of the code, original post, and comments are AI generated?
Lama9901 1 day ago [-]
Heavily AI-assisted, not AI-generated.
Claude + Cursor wrote the structure. I fixed hundreds of
errors — wrong tests, broken pipelines, docs that didn't
match the code. That's literally why the verification
layer exists. AI gets it wrong constantly.
This comment — also Claude, on my direction. That's the
point. Tool, not author.
strip job_snapshot, recompute hashes, rebuild manifest — hash-only verifier passes silently.
Clone it and run it. If it doesn't work, tell me.