Graveyard

I wrote all kinds of additional text and widgets that didn't quite fit the main story; hence this bonus content. If you like the mini-course so far, or some of the widgets dumped below catch your interest, check them out!

Bonus chapters

I wrote three chapters that show some nice stuff connected to KL divergence & entropy.

Kolmogorov Complexity

The Mandelbrot set looks incredibly complex, yet it's generated by a very short formula.

[Interactive widget: Mandelbrot set explorer (drag, scroll to zoom, double-click). The picture is generated by iterating z₀ = 0, zₙ₊₁ = zₙ² + c, where c belongs to the set if |zₙ| remains bounded; here capped at 50 iterations, with initial viewport [-2.5, 1.5] × [-1.5, 1.5].]
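To see just how short, here's a complete sketch that prints the set as ASCII art. The grid size and the 50-iteration cap mirror the widget but are otherwise arbitrary choices; the point is that the whole infinitely detailed picture fits in a dozen lines:

```python
# A complete Mandelbrot renderer: the infinitely intricate picture
# is produced by this short program, which is the sense in which
# its Kolmogorov complexity is tiny.
WIDTH, HEIGHT, MAX_ITER = 80, 24, 50   # illustrative choices

for row in range(HEIGHT):
    line = ""
    for col in range(WIDTH):
        # Map the pixel into the viewport [-2.5, 1.5] x [-1.5, 1.5].
        c = complex(-2.5 + 4.0 * col / WIDTH, -1.5 + 3.0 * row / HEIGHT)
        z = 0
        for _ in range(MAX_ITER):
            z = z * z + c
            if abs(z) > 2:   # once |z| > 2, the orbit provably escapes
                break
        line += "#" if abs(z) <= 2 else " "
    print(line)
```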

Kolmogorov complexity gives us the language to talk about all this, and much more.

For example, it tells us a lot about pseudorandomness.

[Interactive widget: coin-flip randomness tester. Click coins (or type text) to build a sequence, then see whether it passes statistical tests for randomness.]
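A practical proxy for Kolmogorov complexity is a general-purpose compressor: a sequence that deserves to be called random should be nearly incompressible. Here's a rough sketch of that test, with zlib standing in for the ideal compressor (the bit-packing and sequence lengths are just illustrative choices):

```python
import random
import zlib

def compression_ratio(bits: str) -> float:
    """Pack a 0/1 string into bytes, compress, and return
    compressed size / packed size. Close to 1.0 means 'incompressible'."""
    packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")
    return len(zlib.compress(packed, level=9)) / len(packed)

# A periodic sequence has a short description, so it compresses massively...
print(compression_ratio("01" * 4000))    # far below 1
# ...while the output of a decent RNG is essentially incompressible.
print(compression_ratio("".join(random.choice("01") for _ in range(8000))))
```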

It's also the ultimate answer to the question of how similar two files are.

[Interactive widget: three-categories file-similarity analysis.]
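One way to cash this out in runnable code is the normalized compression distance, which swaps the uncomputable Kolmogorov complexity for an ordinary compressor (zlib here; any compressor will do):

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length: a computable upper bound on Kolmogorov complexity."""
    return len(zlib.compress(data, level=9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for very similar files,
    near 1 for unrelated ones."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog. " * 20
b = b"the quick brown fox leaps over the lazy cat. " * 20
c = bytes(range(256)) * 4
print(ncd(a, b))  # noticeably smaller: the texts share most of their structure
print(ncd(a, c))  # close to 1: nothing in common for the compressor to exploit
```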

More in the Kolmogorov chapter.

Multiplicative Weights Update

Here's a riddle that's behind a ton of recent algorithms in CS and ML. Say you wanna get rich trading stocks. Lucky you: $n$ investors share their tips every day. Each day $t$, they give advice, and afterward you find out how they did. For investor $i$, you get $g_i^{(t)}$ - how many dollars she gained that day.

Your strategy: Start with equal trust in everyone: $p_i^{(0)} = 1/n$ for all investors. Each morning, randomly pick an investor based on your trust distribution and follow their advice. After seeing how everyone did, update your distribution to $p^{(t+1)}$.

How should you update? What's best?

Try it yourself with the widget below:

[Interactive widget: Multiplicative Weights Update in Action. Select algorithms and choose a scenario to compare how they fare.]
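Here's the standard answer as a minimal sketch: multiply each investor's trust by an exponential in her gain, then renormalize. The learning rate eps = 0.1 and the toy data are illustrative choices:

```python
import numpy as np

def multiplicative_weights(gains, eps=0.1):
    """gains[t, i] = dollars investor i gained on day t.
    Returns the trust distribution after all days."""
    T, n = gains.shape
    p = np.full(n, 1.0 / n)              # equal trust in everyone at the start
    for t in range(T):
        p = p * np.exp(eps * gains[t])   # reward good advice exponentially
        p /= p.sum()                     # renormalize to a distribution
    return p

rng = np.random.default_rng(0)
gains = rng.normal(size=(100, 5))        # 5 investors, 100 days of pure luck...
gains[:, 2] += 0.5                       # ...except investor 2 has real skill
print(multiplicative_weights(gains).round(3))  # trust piles onto investor 2
```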

Also, here's how we can use it in algorithms, for the noisy binary search problem.

[Interactive widget: noisy binary search over 15 positions, each starting at a uniform 6.7% belief, with a counter tracking the number of steps taken.]
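And here's a sketch of the flavor of algorithm the widget runs, under the assumption that each answer is correct independently with probability 2/3 (the widget's actual noise model may differ): keep a belief distribution over positions, query the median, and multiplicatively reweight whichever half the possibly-lying answer points to.

```python
import numpy as np

def noisy_binary_search(n=15, secret=11, p_correct=2/3, steps=100, seed=0):
    """Find `secret` in {1..n} when every comparison lies with prob 1 - p_correct."""
    rng = np.random.default_rng(seed)
    w = np.full(n, 1.0 / n)          # uniform belief over positions, as in the widget
    positions = np.arange(1, n + 1)
    for _ in range(steps):
        # Ask about the median of the current belief: "is the secret > m?"
        m = positions[np.searchsorted(np.cumsum(w), 0.5)]
        truth = secret > m
        answer = truth if rng.random() < p_correct else not truth
        # Multiplicative weights: scale each half by how consistent it is
        # with the (noisy) answer, then renormalize.
        agrees = (positions > m) == answer
        w *= np.where(agrees, p_correct, 1 - p_correct)
        w /= w.sum()
    return positions[np.argmax(w)]   # most probable position

print(noisy_binary_search())         # recovers 11 with high probability
```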

If you got interested, continue to the multiplicative weights chapter.

Fisher Information

A typical US election poll asks 1000-2000 random people. Do the math and you'll find such polls are usually within 2-3% of the truth. Pretty wild: it doesn't even matter how many people live in the country, 1000 random ones get you within a few percent!

But here's the thing: US elections are super close. We already know both parties will get around 50%. So maybe we should poll more people (or combine polls) to get within 0.1%. How many people would that take?
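Back-of-the-envelope, assuming a plain 95% normal-approximation interval: the margin of error is about 1.96·√(p(1-p)/n), so it shrinks only like 1/√n and the required sample size grows quadratically.

```python
from math import ceil

def sample_size(margin, p=0.5, z=1.96):
    """People needed so a 95% confidence interval is +/- `margin`."""
    return ceil(z * z * p * (1 - p) / margin ** 2)

print(sample_size(0.03))    # 1068: the usual poll size
print(sample_size(0.001))   # 960400: a 30x smaller margin costs 900x the people
```

That inverse-quadratic law is the kind of statement the Fisher information chapter makes precise.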

Explore the relationship in the widget below.

[Interactive widget: polling error calculator. Choose a target margin (0.1%-10%) and a scenario (45/55, 50/50, 55/45) to see the required sample size, estimated cost, and share of US voters needed; e.g., 1K people at roughly $2K, under 0.001% of voters.]

More in the Fisher information chapter.

Bonus random content

More applications of entropy and KL divergence, cut from the third chapter.

Chain rule & uniqueness

[Interactive widget: Mutual Information Explorer. Drag bars to adjust the probabilities of the joint distribution P(Weather, Transport). The configuration shown is independent:]

                🚶‍♀️      🚲      🚌    P(Weather)
  ☀️           14.0%   21.0%   35.0%     70.0%
  ☁️            6.0%    9.0%   15.0%     30.0%
  P(Transport) 20.0%   30.0%   50.0%

I(Weather; Transport) = 0.0000 bits
I(X;Y) = Σ P(x,y) log₂ [P(x,y) / (P(x)P(y))]
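To double-check the widget's arithmetic, here's a short sketch that computes I(X;Y) directly from the joint table above:

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint probability table P(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal over rows
    py = joint.sum(axis=0, keepdims=True)   # marginal over columns
    mask = joint > 0                        # 0 log 0 = 0 by convention
    return float((joint[mask] * np.log2(joint[mask] / (px * py)[mask])).sum())

# Weather (rows: sunny, cloudy) x Transport (columns: walk, bike, bus)
joint = [[0.14, 0.21, 0.35],
         [0.06, 0.09, 0.15]]
print(mutual_information(joint))  # ~0.0: the table factorizes as P(x)P(y)
```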
Conditional entropy and mutual information

More max entropy distributions, cut from the max-entropy chapter.

Beta Distribution

Exponential Family