Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets (Paper Explained)
Published 2 years ago • 72K plays • Length 29:47
Similar videos
- 29:51 • New Discovery: LLMs Have a Performance Phase
- 2:43 • PyTorch in 100 Seconds
- 37:01 • TransformerFAM: Feedback Attention Is Working Memory
- 57:21 • An Observation on Generalization
- 36:37 • ∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)
- 50:24 • Linformer: Self-Attention with Linear Complexity (Paper Explained)
- 20:27 • How Far Can We Scale Up? Deep Learning's Diminishing Returns (Article Review)
- 29:53 • TransGAN: Two Transformers Can Make One Strong GAN (Machine Learning Research Paper Explained)
- 28:12 • MLP-Mixer: An All-MLP Architecture for Vision (Machine Learning Research Paper Explained)
- 18:57 • Vision Transformers | Lecture 10 (Part 3) | Applied Deep Learning (Supplementary)
- 44:20 • PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)
- 8:38 • Transformers: The Best Idea in AI | Andrej Karpathy and Lex Fridman
- 0:57 • Introduction to Regularization
- 24:34 • Scaling Transformer to 1M Tokens and Beyond with RMT (Paper Explained)
- 43:04 • Deep Networks Are Kernel Machines (Paper Explained)
- 0:18 • Transformers | Basics of Transformers