The rumored “Strawberry” model from OpenAI has been released in beta for ChatGPT Plus and Tier 5 API customers. The new line is called “o1,” a new type of AI model trained for multi-step reasoning. According to the release announcement, it is intended to tackle tough problems in science and coding.
The o1 series (o1 and o1-mini) takes a new approach to AI reasoning. These models are trained to spend more time processing information before responding, mimicking human thought patterns. In theory, o1 can explore options for solving problems, refine them, and identify mistakes in its own “thinking.” The result is better problem solving.
Is GPT o1 Smart?
In benchmark tests, the upcoming o1 model looks pretty spiffy. Don’t they always, though?
Allegedly, it performs on par with Ph.D. students in scientific disciplines and shows big improvements in math. For example, on a qualifying exam for the International Mathematical Olympiad, the o1 model correctly solved 83% of problems, compared to GPT-4o’s 13%. Its coding abilities have also been evaluated in contests, reaching the 89th percentile in Codeforces competitions.
The model has been trained on an all-new dataset tailored to the new multi-step approach. More importantly, it has been extensively trained using RLHF (Reinforcement Learning from Human Feedback).
Tantalizingly, one of the researchers suggested that the limits to performance improvement through RLHF seem to scale differently than the (rapidly diminishing) improvements from model pre-training. In other words, maybe they can keep improving it through RLHF indefinitely.
There is also a new safety approach that supposedly makes o1 more resistant to jailbreaking attempts, with the o1-preview model scoring 84 out of 100 on OpenAI’s toughest jailbreaking test, compared to GPT-4o’s score of 22.
What’s the ChatGPT o1 Preview Like?
If you’ve used Advanced Data Analysis (formerly Code Interpreter) in ChatGPT, you’ll be familiar with the little dropdown boxes to show the analysis steps the model went through and any code it wrote. The o1 preview operates similarly, showing its chain of thought beneath a message giving the amount of time it spent thinking.
It is moderately agentic. If you have tried AutoGPT or similar agentic frameworks, it’s a bit like that, but cleaner.
There are new “reasoning tokens” in the API documentation. These count the hidden tokens the model generates while working through each step of its reasoning, and they are billed as output tokens even though they are not returned in the response. It sounds like implementation in the API could be a little complicated and costly to boot. GPT-4o is about 4x cheaper than o1.
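To get a feel for how reasoning tokens could affect a bill, here is a rough back-of-the-envelope sketch. It assumes reasoning tokens are charged at the output-token rate on top of the visible answer. The `Usage` class, the `estimate_cost` helper, the token counts, and the per-million prices are all made up for illustration; only the 4x ratio comes from the comparison above. Check the current pricing page before trusting any real numbers.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical token counts for a single request."""
    prompt_tokens: int      # tokens sent to the model
    reasoning_tokens: int   # hidden "thinking" tokens (0 for a non-reasoning model)
    visible_tokens: int     # tokens in the answer you actually see


def estimate_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in dollars.

    Assumption: reasoning tokens are billed like output tokens.
    """
    billed_output = usage.reasoning_tokens + usage.visible_tokens
    total = usage.prompt_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Placeholder prices (dollars per million tokens), NOT real quotes.
BASE_INPUT, BASE_OUTPUT = 5.0, 15.0
PRICE_MULTIPLIER = 4  # the "about 4x" figure from the article

baseline = estimate_cost(Usage(1000, 0, 500), BASE_INPUT, BASE_OUTPUT)
reasoning = estimate_cost(
    Usage(1000, 3000, 500),
    BASE_INPUT * PRICE_MULTIPLIER,
    BASE_OUTPUT * PRICE_MULTIPLIER,
)

print(f"baseline model:  ${baseline:.4f}")
print(f"reasoning model: ${reasoning:.4f}")
print(f"ratio: {reasoning / baseline:.1f}x")
```

The point of the sketch is that the gap can end up much wider than the headline price difference: if the model spends several times as many tokens thinking as it does answering, you pay for all of them at the higher rate.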
Reasoning Test
I was asking ChatGPT to help me identify demographic correlations to advertise an affiliate product, Prodentim. [If you purchase through that link we will get a commission, thanks for your support!] The ads aren’t intended for The Servitor, but if you happen to be a developer with dental problems…
I have a test account at Rumble Advertising Center where I’m doing some small-scale pay-per-click advertising experiments. Any sort of supplement or nutritional product is expensive to advertise because the markets are highly competitive. I was asking GPT to help me find less expensive keyword categories that might cross over to the target demographic.
o1 demonstrates its thought process in a series of steps before providing its final output. When given a task to generate indirect keyword categories for targeting a specific demographic, the model broke down its approach into several stages.
The model first identified the key components of the keyword task, including the target demographic characteristics and the need for indirect, cost-effective keyword categories.
It considered the given demographic – middle-aged, predominantly male, white American conservatives with strong GOP and MAGA affiliations. It touched on interest topics like DIY home improvement, classic cars, outdoor activities, and financial planning.
The model’s final output was a list of ten indirect keyword categories. These ranged from men’s health supplements (not very helpful) to patriotic merchandise and veteran support communities. It was… ok. Not mind-blowing, but not bad.
Coming Soon to a ChatGPT Near You
Access to these new models is being rolled out gradually. ChatGPT Plus and Team users can now access o1 models, while API access is limited to qualified developers (those who have spent $1000+ on the OpenAI API).
The new approach by o1 provides a window into the AI’s decision-making process. It’s just a bit more transparent than previous models, showing each step of its reasoning before arriving at the final answer.
It’s neat, but I have questions about what to do with this expanded thought process and whether it actually improves the quality of the output. It may be a model in search of a use case.