digitado

technocracy

FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

digitado ⋅ December 9, 2025

Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.

technocracy

OpenAI’s new GPT-5.2 model takes on Google’s Gemini 3

digitado ⋅ December 12, 2025

As expected, and following multiple reports, OpenAI has officially announced the GPT-5.2 model. The company fast-tracked the release in response to the pressure it faces from the success of the latest models from Google and Anthropic. GPT-5.2 is also OpenAI’s second biggest update since the company officially launched GPT-5 in August.

GPT-5.2 is here with improved raw performance results

In the announcement, OpenAI describes GPT-5.2 as its “most capable model series yet […]

technocracy

MIT researchers propose a new model for legible, modular software

digitado ⋅ December 8, 2025

Coding with large language models (LLMs) holds huge promise, but it also exposes some long-standing flaws in software: code that’s messy, hard to change safely, and often opaque about what’s really happening under the hood. Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are charting a more “modular” path ahead. Their new approach breaks systems into “concepts,” separate pieces of a system, each designed to do one job well, and “synchronizations,” explicit rules that describe exactly […]

technocracy

Update to GPT-5 System Card: GPT-5.2

digitado ⋅ December 10, 2025

GPT-5.2 is the latest model family in the GPT-5 series. The comprehensive safety mitigation approach for these models is largely the same as that described in the GPT-5 System Card and GPT-5.1 System Card. Like OpenAI’s other models, the GPT-5.2 models were trained on diverse datasets, including information that is publicly available on the internet, information that we partner with third parties to access, and information that our users or human trainers and researchers provide or generate.

technocracy

The Absolute State of r/Anarcho_Capitalism

digitado ⋅ December 8, 2025

Seriously, y’all are getting way too caught up in your feelings about this debate. Submitted by /u/Bigger_Sherma

technocracy

5 Free Tools to Experiment with LLMs in Your Browser

digitado ⋅ December 11, 2025

Discover five free tools that let you run and test large language models directly in your browser without any setup.

technocracy

The cost of thinking

digitado ⋅ December 8, 2025

Large language models (LLMs) like ChatGPT can write an essay or plan a menu almost instantly. But until recently, it was also easy to stump them. The models, which rely on language patterns to respond to users’ queries, often failed at math problems and were not good at complex reasoning. Suddenly, however, they’ve gotten a lot better at these things. A new generation of LLMs, known as reasoning models, is being trained to solve complex problems. Like humans, […]

technocracy

The Rational Bull Elk

digitado ⋅ December 8, 2025

I watch a lot of nature documentaries. I’m not very choosy about the animals covered, whether whales, moles, lions, ants, chameleons, blowfish, or mosquitoes. I’m even fascinated by footage of bacteria under a microscope. I’m usually immersed as I sit in front of my large-screen television, so long as I learn something about the intricacies of the species filmed in vibrant colors. What do they eat and how do they avoid being eaten? What are their life expectancies, […]

technocracy

Building AI fluency at scale with ChatGPT Enterprise

digitado ⋅ December 8, 2025

Commonwealth Bank of Australia partners with OpenAI to roll out ChatGPT Enterprise to 50,000 employees, building AI fluency at scale to improve customer service and fraud response.
