March 2023

Autoregressive Models, OOD Prompts and the Interpolation Regime

digitado ⋅ March 30, 2023

A few years ago I was very much into maximum likelihood-based generative modeling and autoregressive models (see this, this or this). More recently, my focus shifted to characterising inductive biases of gradient-based optimization, focussing mostly on supervised learning. I only very recently started combining the two ideas, revisiting autoregressive models through the lens of inductive biases, motivated by a desire to understand a bit more about LLMs. As I did so, I found myself surprised by a number […]

technocracy

Finetuning Large Language Models On A Single GPU Using Gradient Accumulation

digitado ⋅ March 28, 2023

Previously, I shared an article on multi-GPU training strategies that speed up the finetuning of large language models. Several of these strategies include mechanisms such as model or tensor sharding, which distribute the model weights and computations across different devices to work around GPU memory limitations. However, many of us don’t have access to multi-GPU resources. So, this article illustrates a simple technique that works as a great workaround to train models with larger batch sizes when GPU […]

technocracy

What a Time for Language Models

digitado ⋅ March 26, 2023

Hi, I’m Jay from the Illustrated series of blog posts about language models. You’re getting this because you’ve subscribed to my mailing list. I have recently moved it to Substack and we’re due for an update. Language Models are All The Rage Now: It’s truly fascinating how quickly language models have been developing. Their commercial potential is now bleeding into the […]

technocracy

Jesse Johnson: Bringing Together AI and Medical Research

digitado ⋅ March 24, 2023

Today, we speak with Jesse Johnson, Ph.D., a prominent data scientist with over ten years of experience in AI. After a number of years leading data teams at biotech startups, Jesse recently founded a company called Merelogic, whose goal is to help biotech organizations turn their machine learning proof-of-concept projects into tangible impact. Jesse Johnson has transitioned from academia to biotechnology, applying his deep expertise in mathematics and data engineering to practical research and analysis of genomic structures […]

technocracy

We May be Surprised Again: Why I take LLMs seriously.

digitado ⋅ March 22, 2023

“Deep Learning is Easy, Learn something Harder” – I proclaimed in one of my early and provocative blog posts from 2016. While some observations were fair, that post is now evidence that I clearly underestimated the impact simple techniques would have, and probably gave counterproductive advice. I wasn’t alone in my deep learning skepticism; in fact, I’m far from being the most extreme deep learning skeptic. Many of us who grew up working in Bayesian ML, convex optimization, […]

technocracy

Rust for Haskell Developers

digitado ⋅ March 21, 2023

We love Haskell, but we also love learning new languages. In this article, we want to show how to use your Haskell knowledge to write Rust code. We’ll go through the concepts familiar to most Haskell developers, present a few gotchas, and cover these questions: To what extent is FP possible in Rust? (On the scale from Java streams to doing Lambda Calculus on paper.) Which Haskell concepts map well to Rust, and which don’t? Are there monads […]

technocracy

Prompt Engineering

digitado ⋅ March 15, 2023

Prompt Engineering, also known as In-Context Prompting, refers to methods for communicating with an LLM to steer its behavior toward desired outcomes without updating the model weights. It is an empirical science, and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics. This post only focuses on prompt engineering for autoregressive language models, so nothing with Cloze tests, image generation or multimodality models. At its core, the goal […]

technocracy

Bias-Variance Tradeoff in Machine Learning

digitado ⋅ March 13, 2023

Imagine a scenario in which a model works perfectly well with the data it was trained on, but provides incorrect predictions when it meets new, unfamiliar data. On the other hand, in certain cases, it struggles to grasp the intricacies of the data and thus fails to provide an accurate prediction. Striking a balance between accuracy and the ability to make predictions beyond the training data in an ML model is called the […]

technocracy

Haskell in Enterprise: Interview with Rob Harrison

digitado ⋅ March 7, 2023

We’ve all heard about Haskell success stories from famous companies like Meta and Tesla. But did you know that Haskell is successfully used in plenty of enterprises, many of which you wouldn’t think of as being at the forefront of technology? Today’s guest is Rob Harrison, a Lead Architect at Flowmo.co. He has worked as a technical lead on projects for clients like Vodafone and Tesco. In the interview, we’ll be talking about his experience and techniques […]
