Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases
arXiv:2601.15476v1 Announce Type: new Abstract: This paper examines how to make large language models reliable for high-stakes legal work by reducing hallucinations. It distinguishes three AI paradigms: (1) standalone generative models (“creative oracle”), (2) basic retrieval-augmented systems (“expert archivist”), and (3) an advanced, end-to-end optimized RAG system (“rigorous archivist”). The authors introduce two reliability metrics -False Citation Rate (FCR) and Fabricated Fact Rate (FFR)- and evaluate 2,700 judicial-style answers from 12 LLMs across 75 legal tasks using expert, […]