Bayes, bits & brains

This site is about probability and information theory. We'll see how they help us understand machine learning and the world around us.

A few riddles

More about the content, prerequisites, and logistics later. For now, you can get a feel for what this is about by checking out the following riddles. I hope some of them nerd-snipe you! 😉 By the end of this minicourse, you will understand all of them.

🧠 Intelligence test

Test your intelligence with the following widget! You will be given a bunch of text snippets from Wikipedia, each cut off at a random place. Your job: predict the next letter! Try at least five snippets and compare your performance with some neural nets (GPT-2 and Llama 4).
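
How do the neural nets take this test? Roughly speaking, a language model assigns a probability to every possible continuation, and we can read off how much probability it put on each possible next letter. Below is a minimal sketch of one way to do this with GPT-2 through the Hugging Face transformers library; it illustrates the idea and is not the widget's actual implementation. Since GPT-2 predicts multi-character tokens rather than single letters, the sketch lumps together the probability of all tokens that start with the same character.

```python
# A minimal sketch (not the widget's actual code) of turning GPT-2's next-token
# distribution into a next-character distribution.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_char_distribution(prefix: str) -> dict[str, float]:
    """Sum GPT-2's next-token probabilities by the first character of each token."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]   # scores for the token following the prefix
    probs = torch.softmax(logits, dim=-1)
    dist: dict[str, float] = {}
    for token_id, p in enumerate(probs.tolist()):
        token = tokenizer.decode([token_id])
        if token:                                # skip anything that decodes to an empty string
            dist[token[0]] = dist.get(token[0], 0.0) + p
    return dist

# Which character does GPT-2 expect next? (A space counts as a character too.)
dist = next_char_distribution("Claude Shannon was an American mathematician and electrical enginee")
print(sorted(dist.items(), key=lambda kv: -kv[1])[:5])
```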

Don't feel bad if a machine beats you; they've been studying for this test their entire lives! But why? And why did Claude Shannon, the information theory GOAT, run this very experiment around 1950?

📈 Modelling returns
🌍 How much information is on Wikipedia?
🔮 Who's less wrong?
🦶 Average foot
🤓 Explaining XKCD jokes

Onboarding

As we go through the minicourse, we'll revisit each puzzle and understand what's going on. More importantly, we will pick up some important pieces of mathematics and build a solid theoretical background for machine learning.

Here are some questions we will explore.

  • What are KL divergence, entropy, and cross-entropy? What's the intuition behind them? (chapters 1-3; a tiny numerical teaser follows this list)
  • Where do the machine-learning principles of maximum likelihood and maximum entropy come from? (chapters 4-5)
  • Why do we use logits, softmax, and Gaussians all the time? (chapter 5)
  • How do we set up loss functions? (chapter 6)
  • How does compression work, and what intuition does it give us about LLMs? (chapter 7)
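
To give a taste of that first question, here is a tiny numerical sketch (my illustration, not material from the course): the three quantities computed for a fair coin and an overconfident model of it, using base-2 logarithms so that everything is measured in bits.

```python
# A tiny numerical teaser (not taken from the course): entropy, cross-entropy,
# and KL divergence for a fair coin p and an overconfident model q of it.
import math

p = [0.5, 0.5]   # true distribution (fair coin)
q = [0.9, 0.1]   # the model's guess

entropy       = -sum(pi * math.log2(pi) for pi in p)              # H(p)      = 1.000 bits
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # H(p, q)   ≈ 1.737 bits
kl_divergence = cross_entropy - entropy                           # D_KL(p‖q) ≈ 0.737 bits

print(entropy, cross_entropy, kl_divergence)
```

The first chapters build the intuition behind these numbers; the gap between cross-entropy and entropy, which is exactly the KL divergence, measures the extra bits you pay for modelling data from p with the wrong distribution q.
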
Prerequisites

How to read this

What's next?

This is your last chance. You can go on with your life and believe whatever you want to believe about KL divergence. Or you can go to the first chapter and see how deep the rabbit hole goes.
