FACTS Benchmark Suite: Systematically evaluating the factuality of large language models
Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
In The Gay Science (1882), German philosopher Friedrich Nietzsche famously proclaimed the death of God. Recognizing the enormous implications of secularization and the uprooting of Christianity’s “fundamental concept” (faith in God) and the resulting moral confusion, he exclaimed: “God is dead! Continue Reading…
Unveiling what it describes as the most capable model series yet for professional knowledge work, OpenAI launched GPT-5.2 today. The model was trained and deployed on NVIDIA infrastructure, including NVIDIA Hopper and GB200 NVL72 systems. It’s the latest example of how leading AI builders train and deploy at scale on NVIDIA’s full-stack AI infrastructure. Pretraining: The Bedrock of Intelligence AI models are getting more capable thanks to three scaling laws: pretraining, post-training and test-time scaling. Reasoning models, which […]
Trump signed an AI executive order targeting state laws and promising one national rulebook. Critics warn it could trigger court battles and prolong uncertainty for startups while Congress debates federal rules.
Isolation Forest may look technical, but its idea is simple: isolate points using random splits. If a point is isolated quickly, it is an anomaly; if it takes many splits, it is normal. Using the tiny dataset 1, 2, 3, 9, we can see the logic clearly. We build several random trees, measure how many splits each point needs, average the depths, and convert them into anomaly scores. Short depths become scores close to 1, long depths close […]
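The procedure in that excerpt (random splits, per-point depths, averaged and turned into scores) can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's own code: the function names `tree_depths`, `average_path_length`, and `anomaly_scores`, the tree count, and the seed are all assumptions made for the example, and the normalization uses the standard Isolation Forest formula s = 2^(-E[h] / c(n)) with an exact harmonic sum rather than the usual logarithmic approximation. It also assumes the points are distinct, as they are in the dataset 1, 2, 3, 9.

```python
import random


def tree_depths(data, rng, depth=0):
    """Build one random isolation tree over 1-D points.

    Returns {point: number of splits needed to isolate it}.
    Assumes all points are distinct (illustrative simplification).
    """
    if len(data) <= 1:
        return {x: depth for x in data}
    split = rng.uniform(min(data), max(data))
    left = [x for x in data if x < split]
    right = [x for x in data if x >= split]
    if not left or not right:
        # Degenerate split that separates nothing: draw again.
        return tree_depths(data, rng, depth)
    depths = tree_depths(left, rng, depth + 1)
    depths.update(tree_depths(right, rng, depth + 1))
    return depths


def average_path_length(n):
    """c(n): expected depth of an unsuccessful search in a binary search tree."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # H(n-1), computed exactly
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_scores(data, n_trees=200, seed=0):
    """Average each point's depth over many trees and convert to a score in (0, 1)."""
    rng = random.Random(seed)
    totals = {x: 0 for x in data}
    for _ in range(n_trees):
        for point, depth in tree_depths(list(data), rng).items():
            totals[point] += depth
    c = average_path_length(len(data))
    return {x: 2.0 ** (-(totals[x] / n_trees) / c) for x in data}


if __name__ == "__main__":
    for point, score in sorted(anomaly_scores([1, 2, 3, 9]).items()):
        print(point, round(score, 3))
```

On this dataset the first split lands between 3 and 9 three times out of four, so 9 is usually isolated after a single split and ends up with the highest score of the four points.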
Legal Intern Alexandra Rhodes contributed to this blog post. EFF filed an amicus brief urging the Arizona District Court to protect public school students’ freedom of speech and privacy by holding that the use of a school-issued laptop or email account does not categorically mean a student is “on campus.” We argued that students need private digital spaces beyond their school’s reach to speak freely, without the specter of constant school surveillance and punishment. Surveillance Software Exposed a […]
EFF has, for many years, raised the alarm about the proliferation of stalkerware—commercially-available apps designed to be installed covertly on another person’s device and exfiltrate data from that device without their knowledge. In particular, we have urged the makers of anti-virus products for Android phones to improve their detection of stalkerware and call it out explicitly to users when it is found. In 2020 and 2021, AV Comparatives ran tests to see how well the most popular anti-virus […]
AI is making inroads across the entire healthcare industry — from genomic research to drug discovery, clinical trial workflows and patient care. In a fireside chat Monday during the annual J.P. Morgan Healthcare Conference in San Francisco, NVIDIA founder and CEO Jensen Huang took the stage with industry leaders progressing each of these areas to advance biomedical science and meet the global demand for patient care. Healthcare has a more severe labor shortage than any other field — […]
When researchers are building large language models (LLMs), they aim to maximize performance under a particular computational and financial budget. Since training a model can amount to millions of dollars, developers need to be judicious with cost-impacting decisions about, for instance, the model architecture, optimizers, and training datasets before committing to a model. To anticipate the quality and accuracy of a large model’s predictions, practitioners often turn to scaling laws: using smaller, cheaper models to try to approximate […]
Computer-Aided Design (CAD) is the go-to method for designing most of today’s physical products. Engineers use CAD to turn 2D sketches into 3D models that they can then test and refine before sending a final version to a production line. But the software is notoriously complicated to learn, with thousands of commands to choose from. To be truly proficient in the software takes a huge amount of time and practice. MIT engineers are looking to ease CAD’s learning […]