The previous post was about the evidence-first agent architecture behind this project. This one is about what changed in review. Release 2.0 lets the system pull deterministic full text and PDF parser output into the same run-centered workflow instead of stopping at the abstract.
Full text only matters when it turns into reviewable material. In bio-agent 2.0, parsed sections, source hashes, page numbers, and character offsets can point reviewers to stronger evidence. That material still has to pass through extraction, packet selection, citation audit, logic audit, graph validation, provenance, and Run Evidence Review.
biomed-evidence-graph-v1, biomed-evidence-review-v1, full-text span locators, argument-graph-v2, watch graph drift
Document sections, source hashes, page and offset locators.
Evidence items, packets, citation audit, logic audit.
Claim cards, Watch drift, Argument Graph v2, trace links.
Runs were already claim-audited, but source inspection mostly stopped at paper records and extracted spans.
Known papers can carry deterministic document sections, source hashes, pages, and character offsets.
The dashboard Library can inspect full-text ingestion and Watch drift without turning parser output into evidence.
Argument Graph v2 connects support, attack, and qualifier relationships back to Evidence Graph node IDs.
Research Watch can compare snapshots for papers, claims, methods, limitations, entities, and support shifts.
Eval now checks full-text ingestion, span locator validity, drift schema, and argument graph linkage.
A stored paper can now start from full text or a PDF fixture.
The parser turns that source into sections, hashes, page labels, and character offsets.
Only extracted evidence records move on to packets, audit, graph validation, and review.
Argument and drift views help reviewers inspect change without overriding safety policy.
Product object model
The run is still the review object, but the source gets deeper
The main product choice from the previous release still holds: a biomedical answer is not
just text. It is an addressable answer run with a review contract:
RunEvidenceReview. Release 2.0 keeps that object and gives it a deeper source
trail.
A reviewer can still start from a run ID and inspect claim support, audit verdicts, snapshot metadata, validation status, trace links, provenance, and redacted graph export. The difference is depth: an evidence card can now point closer to the source. Where it once stopped at a paper and an extracted span, it can now reference a full-text section, source hash, page number, and character offsets when that material exists.
Full text is not the new product object. The run is. That keeps the workflow anchored to the thing the reviewer actually needs to judge: one answer, with one set of claims, built from evidence at a specific time.
Full-text boundary
Parser output is not evidence yet
Release 2.0 brings deterministic full-text and PDF ingestion for known papers. The ingestion layer stores document metadata, normalized sections, source hashes, and span locators such as section label, page number, and character offsets. That still counts as source structure, not evidence.
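To make the locator idea concrete, here is a minimal sketch of how a span locator could be checked against the text it points into. The `SpanLocator` class, its field names, and the choice of SHA-256 are assumptions for illustration, not the project's actual schema.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanLocator:
    # Hypothetical shape: field names are illustrative, not the real schema.
    section_label: str
    page: Optional[int]
    char_start: int
    char_end: int
    source_hash: str  # assumed: SHA-256 hex digest of the normalized section text


def hash_text(text: str) -> str:
    """Hash normalized section text so a locator can detect source changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_span(section_text: str, locator: SpanLocator) -> Optional[str]:
    """Return the located text, or None if the locator no longer fits the source."""
    if hash_text(section_text) != locator.source_hash:
        return None  # the text changed after the locator was created
    if not (0 <= locator.char_start < locator.char_end <= len(section_text)):
        return None  # offsets fall outside the section
    return section_text[locator.char_start:locator.char_end]
```

A locator that resolves tells a reviewer where the text came from. It says nothing about whether that text supports a claim.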
This distinction matters. A PDF parser can tell the system where text came from. It cannot
decide that the text supports a biomedical claim. Full-text-derived material becomes
evidence only after an extractor creates an EvidenceItem and that item moves
through packet selection, citation audit, logic audit, Evidence Graph validation, snapshot
persistence, and Run Evidence Review.
The full-text path is deliberately boring in the best way:
paper -> document sections -> locators -> extracted evidence -> packet -> audit -> graph -> review.
The parser adds precision. It does not get to bypass review.
- Document sections: normalized source text attached to a known paper.
- Source hashes: integrity signal for the text a locator points into.
- Span locators: reviewers can inspect the local source span, not just the paper ID.
- Evidence items: only extracted records can support claims downstream.
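The boundary between parser output and evidence can be sketched as a simple gate. Everything here is hypothetical: the stage names mirror the stages named in this post, but the classes and the `can_support_claim` function are illustrative, not the project's code.

```python
from dataclasses import dataclass, field

# Assumed stage names, taken from the review stages listed in this post.
REQUIRED_STAGES = (
    "packet_selection",
    "citation_audit",
    "logic_audit",
    "graph_validation",
    "snapshot_persistence",
    "run_evidence_review",
)


@dataclass
class ParsedSection:
    """Parser output: source structure only, never evidence by itself."""
    paper_id: str
    label: str
    text: str


@dataclass
class EvidenceItem:
    """A record created by an extractor; tracks which stages it has passed."""
    evidence_id: str
    paper_id: str
    text: str
    passed_stages: set = field(default_factory=set)


def can_support_claim(record: object) -> bool:
    """Only an EvidenceItem that passed every required stage may support a claim."""
    if not isinstance(record, EvidenceItem):
        return False  # parsed sections and other source structure are rejected
    return all(stage in record.passed_stages for stage in REQUIRED_STAGES)
```

In this sketch a `ParsedSection` is rejected by type alone, and an `EvidenceItem` is rejected until every stage is recorded, so no single stage can wave material through.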
Review surface
The Library becomes part of the review workspace
The dashboard is now organized around Chat, Runs, Review Queue, Library, and Settings. Runs remain the main inspector for answer review, snapshot diffs, trace, evidence packets, audit, logic, argument, math, and provenance. Release 2.0 makes the Library more useful because that is where full-text ingestion and Watch drift become inspectable.
That split feels right to me. A reviewer should not have to hunt through raw graph nodes to understand whether a run is trustworthy. The run view answers, "what happened in this answer?" The Library answers, "what source material and ongoing topic state can I inspect around it?"
- Runs: claim cards, support status, audit verdicts, graph snapshots, trace, and provenance.
- Library: full-text document inspection, source sections, locators, evidence records, and graph lookup.
- Watch: topic snapshots, drift comparison, relevance decisions, and advisory change signals.
- Settings: research-only boundary, clinical refusal behavior, memory policy, and retrieval limits.
Reviewer signals
Watch drift and Argument Graph v2 are signals, not authority
Release 2.0 adds two review aids that would be dangerous if they were treated as truth. Research Watch graph drift compares topic snapshots and highlights changes in papers, claims, methods, limitations, entities, and support shifts. Argument Graph v2 connects support, attack, and qualifier relationships back to Evidence Graph node IDs.
- Watch drift: useful for seeing what changed between snapshots: new papers, changed claims, support/contradiction shifts, method changes, limitations, and clusters.
- Argument Graph v2: useful for making argument structure visible while preserving links back to Evidence Graph nodes and qualifier edges for partial or overclaimed support.
Both remain advisory reviewer context. They can point at places worth inspecting. They cannot override clinical refusal, source policy, citation audit, logic audit, or the evidence packet contract.
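As a rough illustration of drift as an advisory signal, the sketch below compares two topic snapshots by category. The snapshot shape (a dict of category name to a list of IDs) and the `compare_snapshots` function are assumptions for this example, not the Watch drift schema.

```python
def compare_snapshots(before: dict, after: dict) -> dict:
    """Report per-category additions and removals between two topic snapshots.

    Assumed snapshot shape: {"papers": [...ids], "claims": [...ids], ...}.
    The result is flagged advisory: it points at places to inspect and
    carries no verdict about support or truth.
    """
    drift = {}
    for category in sorted(set(before) | set(after)):
        old = set(before.get(category, ()))
        new = set(after.get(category, ()))
        drift[category] = {
            "added": sorted(new - old),
            "removed": sorted(old - new),
        }
    return {"advisory": True, "drift": drift}
```

The output deliberately contains only differences. Deciding whether a new paper or a removed claim matters stays with the audits and the reviewer.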
Side-effect boundary
Dashboard Chat gets the same biomedical boundary
Release 2.0 also documents the framework-native Dashboard Chat boundary. Chat runs through the shared dashboard channel, session history, event stream, agent loop, and tool hooks. The biomedical plugin does not get a separate chat backend. Its policy still lives where the tools and routes live.
Read-only routes:
- GET /api/biomed/answer-runs/{run_id}/evidence-review
- GET /api/biomed/papers/{paper_id}/full-text
- GET /api/biomed/watch/{watch_id}/drift
Denied in chat:
- review decisions
- write/export tools
- sensitive biomedical actions
Clinical requests still stop before memory, retrieval, LLM work, parsing, graph construction, or export. Sensitive write, review, and export tools stay denied in chat until the framework has durable approval and resume semantics.
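The ordering of those two rules can be sketched as a small gate. The tool names, the `Decision` enum, and the idea that an upstream check supplies `is_clinical` are all assumptions for illustration; the post does not describe how the real hooks are wired.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    REFUSE_CLINICAL = "refuse_clinical"
    DENY_TOOL = "deny_tool"
    ALLOW = "allow"


# Hypothetical tool names standing in for review, write, and export tools.
DENIED_IN_CHAT = frozenset(
    {"record_review_decision", "write_evidence", "export_graph"}
)


def gate_chat_request(is_clinical: bool, tool_name: Optional[str]) -> Decision:
    """Apply the chat boundary in order: clinical refusal first, then tool policy.

    `is_clinical` is assumed to come from an upstream check that runs before
    memory, retrieval, LLM work, parsing, graph construction, or export.
    """
    if is_clinical:
        return Decision.REFUSE_CLINICAL  # stops before any other work
    if tool_name in DENIED_IN_CHAT:
        return Decision.DENY_TOOL  # sensitive tools stay denied in chat
    return Decision.ALLOW
```

Checking the clinical flag first means a clinical request is refused even when it also names a tool, which matches the rule that such requests stop before anything else runs.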
Quality contract
The eval gate now checks the deeper review trail
The old graph and review checks still matter: schema validity, validation rate, traceability, refusal behavior, export redaction, and run evidence review validity. Release 2.0 adds new checks for the new surface area instead of asking reviewers to trust it by inspection alone.
The deterministic mock eval now includes full-text ingestion success, full-text span locator validity, Watch drift schema validity, Argument Graph v2 schema validity, and argument-to-evidence link rate. The README also records a live PubMed plus DeepSeek smoke path with 27/27 checks passing for audited answer, trace, evidence packet, provenance graph, and clinical guardrail behavior.
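One way to read "argument-to-evidence link rate" is the fraction of argument nodes whose evidence references all resolve to known Evidence Graph node IDs. The sketch below assumes that definition, a dict-based node shape, and a conservative 0.0 for an empty graph; the actual eval metric may be defined differently.

```python
from typing import Iterable, Mapping, Set


def argument_link_rate(
    argument_nodes: Iterable[Mapping], evidence_node_ids: Set[str]
) -> float:
    """Fraction of argument nodes fully linked to known evidence nodes.

    Assumed node shape: {"id": str, "evidence_node_ids": [str, ...]}.
    A node counts as linked only if it has at least one reference and
    every reference exists in `evidence_node_ids`.
    """
    nodes = list(argument_nodes)
    if not nodes:
        return 0.0  # assumption: an empty graph should not pass a rate gate
    linked = 0
    for node in nodes:
        refs = node.get("evidence_node_ids", [])
        if refs and all(ref in evidence_node_ids for ref in refs):
            linked += 1
    return linked / len(nodes)
```

Under this reading, an argument node with no references, or with a reference to a missing node, lowers the rate rather than being silently skipped.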
A parsed document can decorate an interface just as easily as a citation can decorate a sentence. Release 2.0 is useful because parser output has to become traceable evidence before it can support a claim.
The product change that matters in 2.0
The useful part of Release 2.0 is not that bio-agent can ingest more text. More text is easy to romanticize and easy to misuse. The useful part is that full text enters through the same discipline as the rest of the system: locators, extraction, packet selection, audit, graph validation, snapshots, provenance, and reviewer-visible surfaces.
That is the step beyond RAG I care about now. A biomedical answer is not generated text with citations attached, and it is not a PDF parser with a nicer UI. It is a reviewable evidence object that can point back to source text, show how claims changed, expose argument structure, and still refuse to turn workflow context into biomedical truth.