Exigence for writing: this year’s AMC cutoffs are ridiculous (I didn’t qualify). The DHR cutoff is almost 150, there are almost 300 perfect scores, etc.
This year is also the first year where GPT is good enough to AK. Coincidence? I think not. In fact, we can argue that it’s most likely not leakers: in 2023, pyramid scheme leaked the test publicly on AoPS for several hours, and we still didn’t see anything that looked like this.
Ok, I’ve been wanting to write this for a long time, but this is the first time I’m mad enough to actually do it…
Back in like 2016, I remember watching the AlphaGo vs Lee Sedol match. Back then, in my like 4th-grade brain, my timeline was roughly “never”. I just didn’t think it would generalize.
In around like 2019, I did hear about GPT-2. I think I also saw a demo somewhere. I remember trying to count the number of words it could generate before it stopped making sense (hint: two hands, counting in unary, were enough for this). Back then, the full GPT-2 model was apparently “too dangerous to release to the public at the time”, and GPT-3 was in development. That was a funny joke to me. I didn’t believe in GPT-3. Nobody did.
* * *
So in December of 2022, I was genuinely shocked by the release of GPT-3.5 as ChatGPT. Everyone was. It was just… too good.
It was a huge amount of progress. For the first time, it could be claimed that AI could possibly have “consciousness”, or “think”, or “reason”, or whatever quality we like to attribute to humans.
I think I went through a period of denial following the release, because it was just that much of an improvement. I really didn’t believe it for a while; it felt kind of surreal.
Anyway, even within like hours of trying it, people like me started to deny that it had any real intelligence: that it was just an autoregressive model, that it couldn’t really reason. That everything it did was training data contamination (ok, to be fair, this was pretty true). That it couldn’t solve anything non-standard or anything it hadn’t seen before.
I think the claimed CF rating was like 1400? or something? But we all thought (correctly?) that it was more like 800, and we all thought (incorrectly) that it would never reach 1900. At least, it was definitely not in my plans for the next 5 years.
* * *
That year, I got into USACO Camp, and met Mark Chen (research VP? I forget; at OAI). At roughly the same time, GPT-4o was released to the public.
At the time of camp, I remember Brian mentioning that the primary issue with cheating was using AI to paraphrase shared solutions.
Mark gave us a lecture on diffusion models. I distinctly remember a few things from the lecture:
- advice he gave us: “if you want something, just ask”
- he mentioned that OAI was working on reasoning models for CP, and that he thought they would just generalize
- he demoed 4o to us, and told us that 4o could be trained to replicate a voice from just 10 seconds of audio data. Yes, 10 seconds.
Little did I know, point (2) was to be revealed to the public that September (literally 4 months later). This is like 1.75 years after the release of ChatGPT.
They claimed like CF 1800, and some math results, and that it solved a CF 2700 problem. I (like everyone) blamed most of it on data leakage, and sure enough, it did fall short of the benchmarks when it was released in like January (this year).
But it was also good enough that I couldn’t ignore it. This is when the cheater blogs on CF started showing up. This is when we started getting white text on USACO Plat problems (which means AI could solve at least one of them). It was scary, to say the least.
At this point, the issue with cheating on USACO was people using AI to solve the problems outright.
This was really an awakening, and I started to believe in the exponentials. Still, I thought, “maybe Plat will be safe, who knows”, and like “it won’t beat [insert GM I looked up to], right? At least, I’ll be safe for another USACO season, right?”
Also, this AIME, I got outscored by AI.
* * *
Teamscode summer contest, 2025, arrives. And with it, GPT-5.
One of our last 4 problems (I think it was called Novice Problem?) got instantly sniped. And I couldn’t solve that one; I think hyforces and penciltimer team-solved it.
That contest, I spent like 5 hours with the other setters weeding out GPT submissions. To my surprise, there were even GPT teams in like the top 5.
AI starts appearing at the top of Codeforces leaderboards.
And then IOI (AI gets a gold medal), IMO (AI gets a gold medal), and ICPC WF (AI AKs).
And that takes us to the present.
The EA people like to write timelines and predictions. But what is there to predict?
I’d say, maybe, original math research, but it appears to already have been done? Or CF 3000, but what use is that (and how do we know it’s not 3k already)? There really aren’t that many useful endpoints that I’d like to predict.
Some quick takes: long-term alignment is impossible. This is just a consequence of evolution.
Short-term alignment is possible, but hard. After all, look at China trying to align DeepSeek.
* * *
The EA people also like to talk about when the point of no return is. Sometimes, I can’t help but think that it has already passed.
Even consider this indirect scenario: humans get so dependent on AI that they “soon” become incapable of thinking. It goes from “need some information? ask AI”, to “stupid bug in code? ask AI”, to “have a hard math problem? ask AI”, to “writing some code? ask AI”, to “think? why think? ask AI!”
I was vibecoding the hillhacks website earlier and I’m experiencing this. Most of the stuff I offloaded was tedious, but I get the feeling that perhaps I’m no longer as capable of doing that stuff by myself as I was before. Although this could also be due to never having dealt with nextjs, or doing too much CP and not enough projects. But I could see someone starting with AI and never learning these skills. IMO, they are still somewhat important to have, but perhaps this is just “too little faith in AI”.
But then, while I was teaching robotics, I saw some 9-year-old kids trying to vibecode their robot (it didn’t work, and they gave up and started doing it themselves). That by itself is a bit shocking, but what I find more shocking is the way they use AI: they try to offload all of the thinking to the AI, instead of just offloading the coding (ok, in honesty, I do that for homework sometimes). But like, to not think and let AI do everything for you, doesn’t that just defeat the point? Especially in education…
And then there are cheaters just throwing everything into GPT. Like on the recent AMC. Or USACO.
Also, there are teachers who just use AI to make all their assignments. And the students use AI to write all their assignments. And maybe it’s AI-graded (or just 100s everywhere), idk. But it’s all AI slop. The internet is now also full of AI slop. I’m vulnerable to this too.
Surely, it is still worth thinking. If it’s not worth thinking anymore, all hope is probably gone. And thus this trend seems a bit problematic to me.
A certain RY once told me that he thinks research positions will still survive, because even if AI does all the research, humans will still be needed to interpret it; and if we don’t care about that, society is hopeless anyway. Although, why can’t AI interpret it too?
* * *
I’m starting to think that by the time I graduate college, AI will have taken over to such a significant extent that I won’t have anything to do. Like, is there really anything that I can do that AI can’t, even now? And why should I expect that to change, in my favor, over the next 4 years?