Active research line
Cross-Domain Bridging
Operationalizing and measuring whether language models, given the right inference-time augmentation, can produce structurally cross-domain answers at materially higher rates than current published synthetic-data-pipeline practice.
What this work is
The hard problem of synthetic-data and self-improvement pipelines is not generating more output — it is generating structurally novel output: answers whose form draws on patterns from outside the question's home domain. Most current synthetic-data practice produces volume; it does not reliably produce cross-domain structure. We built a controlled experiment to measure the gap and to test whether a specific class of inference-time augmentation closes it.
The experiment compares a multi-stage augmented pipeline against the strongest fair baseline we could construct from current published technique: a faithful instantiation of the Self-Instruct lineage, Evol-Instruct operators, Persona Hub, and constitutional self-critique. Five generating models across three vendor families produce answers to the same locked question set; cross-family LLM judges score every answer blind on bridging quality, novelty, boundary-audit discipline, and trust. The full evidence chain is committed in git from pre-registration through final scoring.
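The cross-family judging rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the model names, judge names, and vendor labels are hypothetical placeholders (the source does not name them), and `eligible_judges` is an assumed helper, not part of the committed analysis scripts.

```python
# Hypothetical generator and judge names mapped to vendor families.
# The source does not name the models; these are placeholders.
GENERATORS = {
    "gen-a1": "vendor_a",
    "gen-a2": "vendor_a",
    "gen-b1": "vendor_b",
    "gen-b2": "vendor_b",
    "gen-c1": "vendor_c",
}
JUDGES = {
    "judge-a": "vendor_a",
    "judge-b": "vendor_b",
    "judge-c": "vendor_c",
}


def eligible_judges(generator: str) -> list[str]:
    """Return the judges whose vendor family differs from the generator's."""
    family = GENERATORS[generator]
    return sorted(name for name, fam in JUDGES.items() if fam != family)


# Show which judges may score each generator's answers.
for gen in GENERATORS:
    print(gen, "->", eligible_judges(gen))
```

Under these assumptions, no judge ever scores an answer produced by a model from its own vendor family, which is the property the design relies on.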
Why the methodology is the point
Claims about reasoning capability collapse under the weight of unfalsifiable framing and selective reporting. We designed the experiment to be hard to wave away. The pre-registration was locked before any arm ran. The unblind mapping was committed with a SHA-256 anchor before any score landed. Every judge came from a different vendor family than the model it scored, which rules out in-family bias by construction. Krippendorff's α measures cross-judge concordance; the bridging axis clears the spec's full-pass threshold with margin. Every output, judgment, and analysis script is reconstructible from the committed evidence alone.
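For readers unfamiliar with the concordance statistic, here is a minimal sketch of Krippendorff's α, defined as 1 minus the ratio of observed to expected disagreement. It assumes an interval-level (squared-difference) distance; the source does not say which level of measurement the spec uses, and the function name and toy scores are illustrative, not taken from the experiment.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with a squared-difference (interval) metric.

    `units` is a list of lists: the scores that different judges gave to
    one answer. Units with fewer than two scores are not pairable and are
    dropped. The interval metric is an assumption made for this sketch.
    """
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable scores")

    # Observed disagreement: ordered pairs within each unit,
    # weighted by 1 / (m_u - 1), averaged over all pairable scores.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs across all pairable scores.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all scores are identical")

    return 1 - d_o / d_e


# Toy data: two judges in perfect agreement on three answers.
print(krippendorff_alpha_interval([[3, 3], [1, 1], [4, 4]]))  # 1.0
```

An α of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.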
Current state
Generation and judging are complete across all 5 × 5 × 30 = 750 cells, with 480 cross-family judging calls producing 2,400 blind score records. The internal findings document is draft-complete across twelve sections, including a stratified analysis of where the pre-registered predictions held and where they missed. The external paper (Cross-Domain Bridging Beyond Current Synthetic-Data Practice: A Five-Model, Three-Vendor Replication) is in draft v1.1 after one round of external review. Both documents are at the cold-re-read-then-lock gate; a human-judge concordance pass over the pre-registered ambiguous cell pool converts the verdict from "full pass on measurable criteria" to "full pass — locked."
What's next
Lock the paper. Run the human-judge concordance pass. Begin targeted outreach to frontier labs and research-program partners with the paper draft as the empirical anchor. A successor line of work — measuring the topology of which structural families transfer reliably across which domain pairs — is named and parked as a future experiment, not pulled into the current paper.
Reading the work
The public release of the paper follows the lock. This page will update with a direct link to the PDF and the citation block at that point. For substantive questions about the methodology or findings ahead of release, contact tim@cruxadjacent.com.