Disclaimer: This post does not explain how to install and set up AutoGPT. For that, I recommend looking at the documentation. (Personally, I find using Docker the best option.)
Since March 17, AutoGPT has been steadily gaining attention from tech enthusiasts. As a result, the GitHub repository has surpassed 150K stars as of October. The project recently raised $12M and plans to take off by hiring more engineers and DevRels to spread the word.
However, AutoGPT has a blind spot: it produces irrelevant results when it isn't given enough information. And that's not its only flaw. AutoGPT has had hallucination issues in some executions when calling functions to solve tasks. (An update from OpenAI, however, now lets the assistant call third-party APIs through explicitly defined functions, so this can be solved in a fairly straightforward manner.) So users still need to spend time and effort designing prompts that can correctly guide these autonomous agents. And not only do they need to work on prompts, they also can't iterate on the given result, which makes the whole process a one-way sequence.
But you might be wondering: how does (vanilla) AutoGPT work under the hood? What causes these behaviors? Let's look inside the logs and figure out what's going on.
Initialization
When AutoGPT is initialized, it asks you what you want it to do. Let's say I need help figuring out what kind of flowers my wife would like. (For the record, this is just an example. I'm not even married. 😅)
Now, under the logs/DEBUG directory, you can see that a new directory is created. Its name follows the format yyyyMMdd_hhmmss_{agent_name}. Let's call this the "root" directory for simplicity.
Progression
Under the root directory, sub-directories (let's call them "step" directories) are created as the agent takes actions. Normally, each step directory contains four files (except for the 000 directory). Below is the order in which they are created. Notice that the numeric prefix doesn't match the creation order.
When a new step is initiated, 2_next_action.json is created. It stores information about the agent's next action, which, as you'll see later, is merely what the assistant prints in JSON format.
2-1) 0_full_message_history.json
0_full_message_history.json stores all the interactions between the user, the assistant, and the system. A role is simply an enum identifying the speaker. For more information, see the official documentation from OpenAI.
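A rough sketch of what an entry list in this file looks like, assuming the standard OpenAI role names (system, user, assistant); the content strings below are illustrative, not taken from a real run:

```python
import json

# Illustrative reconstruction of 0_full_message_history.json;
# real content strings come from the agent run and will differ.
raw = '''
[
  {"role": "system", "content": "You are FlowerGPT, an AI that researches flowers."},
  {"role": "user", "content": "Determine which next command to use."},
  {"role": "assistant", "content": "{\\"thoughts\\": {}, \\"command\\": {}}"}
]
'''

history = json.loads(raw)
roles = [message["role"] for message in history]
print(roles)  # ['system', 'user', 'assistant']
```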
The assistant's content can be broken down into thoughts and command. You can see that it uses a reasoning prompt similar to ReAct.
The plan attribute looks like this. It's nothing but a list of tasks that need to be done, and it gets updated as the steps proceed.
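Putting the pieces together, here's an approximation of the overall shape of the assistant's content. The field names reflect what the DEBUG logs show; all values are made up for this flower example:

```python
import json

# Approximate shape of the assistant's content in the logs;
# the values here are illustrative, not from a real run.
assistant_content = {
    "thoughts": {
        "text": "I should search for popular flower choices for spouses.",
        "reasoning": "A web search gives a broad starting point.",
        "plan": [
            "Search for popular flowers",
            "Narrow down by meaning and season",
            "Summarize recommendations",
        ],
        "criticism": "I don't know the wife's personal taste yet.",
        "speak": "I'll start by searching for popular flower choices.",
    },
    "command": {"name": "google", "args": {"query": "flowers wives like"}},
}

# The agent emits this as JSON text; the framework parses it back
# to decide the next action (what ends up in 2_next_action.json).
parsed = json.loads(json.dumps(assistant_content))
print(parsed["command"]["name"])        # google
print(parsed["thoughts"]["plan"][0])    # Search for popular flowers
```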
2-2) 1_current_context.json
1_current_context.json is the "holy grail" of AutoGPT. This data is what gets delivered in the messages attribute. It keeps the agent acting in a certain manner through the instructions the system gives the assistant. You guessed it: it's like configuring custom instructions in ChatGPT.
For those of you who do not know how OpenAI's Chat completions API works, Fig 10 is an official example from OpenAI. The messages attribute is where the message history is delivered to the LLM for providing context.
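In the same spirit as that example, here's a sketch of how a request body is assembled (the payload is only built here, not sent; the model name and messages are illustrative). The key point is that each request carries the whole relevant history, so continuing a conversation means appending to messages:

```python
# Sketch of a Chat Completions request body (no network call here).
# Model name and message contents are illustrative.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

# To continue the conversation, append the assistant's reply
# and the next user turn before sending the next request.
payload["messages"].append(
    {"role": "assistant", "content": "Hello! How can I help you today?"}
)
payload["messages"].append(
    {"role": "user", "content": "What flowers do wives usually like?"}
)

print(len(payload["messages"]))  # 4
```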
Now back to business. It's important to note that LLMs are not invincible: they cannot remember the whole history because of the limited context window. (A larger context window isn't the perfect cure either. Read this for more information.) OpenAI states the following in their documentation.
Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.
Here's where AutoGPT's weakness lies. It doesn't summarize the context being delivered, leading to early saturation of the context window. As a result, it forgets the role it needs to play. Also, now that OpenAI supports native function calling, defining functions inside the prompt is inefficient.
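For contrast, here's a minimal sketch of the kind of history trimming AutoGPT lacks: keep the system prompt, then keep the newest messages until an approximate token budget runs out. The chars-per-token heuristic is my simplification; a real implementation would use an actual tokenizer such as tiktoken:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages, budget):
    """Keep the system prompt plus the most recent messages within `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(approx_tokens(m["content"]) for m in system)
    for message in reversed(rest):  # walk from newest to oldest
        cost = approx_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are FlowerGPT."},
    {"role": "user", "content": "Find flowers my wife would like. " * 20},
    {"role": "assistant", "content": "Searching for popular flowers. " * 20},
    {"role": "user", "content": "Focus on spring flowers."},
]
trimmed = trim_history(history, budget=30)
print([m["role"] for m in trimmed])  # ['system', 'user']
```

Summarizing dropped messages instead of discarding them would preserve even more context, but even this simple policy keeps the system instructions (the agent's "role") from falling out of the window.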
However, the response format, shown in Fig 12, survives because of its recency: LLMs weigh recent context more heavily, a bias known as recency bias.
3) 3_user_input.txt
3_user_input.txt is nothing but plain text containing the user's feedback. It doesn't exist in steps where user input isn't needed.
Conclusion
Although AutoGPT's applications are limited by its current architecture, it gave tech enthusiasts a taste of what the future, AGI, might look like. A lot needs to be fixed before it can be used in production, but we're humans; we'll get there eventually.
And not only are we going to get there, we're going to get there fast. LLM-related studies are exploding every day, and people around the world have already started building agents based on their own theses. ASQ is one of them. We're building a personal assistant for everyone, for everything. If you're as excited about the future as we are, please subscribe to our blog. I'd also love to keep in touch with others who are building the future.
Sign up for Blog﹒ASQ
We talk about our journey toward making LLM-powered autonomous agents, the right way. Informative posts are also uploaded from time to time.