Langchain CSV Agent — a chain of function calls

Anukriti Ranjan
7 min read · Oct 29, 2023


Autonomous agents powered by LLMs are said to be the future, and they are a subject of much enthusiasm, as well as apprehension, at the moment. While they have not been rigorously defined, as generally understood they comprise the following components:

— A user interface to interact with humans or other bots via text, speech or vision

— A controller powered by LLMs/multimodal models to interpret the user requirement

— An arsenal of tools (software) at its disposal, to be triggered whenever one of them serves as a way to achieve that requirement

— A memory to keep track of what it has already done, what worked and what didn't, and to store any additional context it needs for the task at hand.

AI Agent (Image by author)

In its crudest implementation, this LLM-powered controller is nothing but a generative language model that is capable of predicting the next token. Via prompting, you can configure it to "think step by step" toward the user requirement by letting it know that there are tools that can be used in the process, and that these tools require inputs in a certain format. The LLM then tries to format the inputs to such a tool, which is then invoked. At least at the moment, the said tool is usually human-written logic (a function), and its invocation depends on the LLM predicting it as useful via its token generation.

Currently, the most challenging parts of this set-up are: a) designing prompts that direct the LLM toward the user requirement in a structured manner; b) getting it to output tokens in a form that can be easily parsed by subsequent software logic and run without throwing repeated errors (LLMs are capable of rectifying these outputs on iteration, but repeated iterations cost both time and money); and c) retrieving the most relevant context for the LLM so it performs the required task accurately. There is a lot of human ingenuity involved in getting such an agent to work as intended.

Agent Deep Dive

To understand primarily the first two aspects of agent design, I took a deep dive into Langchain's CSV Agent, which lets you ask natural-language questions about the data stored in a CSV file. Let's take a look at all (well, most) of the Python function invocations involved in this process.

This is a basic implementation:

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI

csv_file_path = "your_data.csv"            # path to your CSV file
user_question = "How many rows are there?"  # any natural-language question

# temperature=0 makes the model deterministic; the default temperature is 0.7
agent = create_csv_agent(OpenAI(temperature=0), csv_file_path, verbose=True)
output = agent.run(user_question)
print(output)
  1. Here, create_csv_agent reads the CSV file into a pandas dataframe df and returns the result of calling create_pandas_dataframe_agent(llm, df), where llm is the model used to instantiate the agent.
  2. create_pandas_dataframe_agent(llm, df) in turn returns the result of the class method AgentExecutor.from_agent_and_tools(agent=agent, tools=tools), where AgentExecutor is a class inheriting from a class called Chain, the "base interface that all chains should implement".
  3. Now, let's understand the agent and tools passed to the method in step 2. The default AgentType is ZERO_SHOT_REACT_DESCRIPTION. This type instantiates ZeroShotAgent using llm_chain and tool_names as parameters. ZeroShotAgent inherits from the Agent class, which in turn inherits from the BaseSingleActionAgent class, and takes llm_chain=llm_chain, allowed_tools=tool_names at instantiation. llm_chain is instantiated from LLMChain (which inherits from the Chain class) and takes llm=llm, prompt=prompt to initialize. llm here is the OpenAI model that was passed in, and prompt is constructed via another function called _get_prompt_and_tools, which returns both the prompt and the tools. After navigating a few other functions within _get_prompt_and_tools and a few variables saved in other files (like ZeroShotAgent.create_prompt(), which returns an instance of PromptTemplate, another class, on which the partial method is invoked to substitute df_head with the actual result and reduce the input variables to input and agent_scratchpad; a sketch of this step follows the prompt below), you will decipher that the prompt being passed here is as follows:
"""
You are working with a pandas dataframe in Python.
The name of the dataframe is `df`.
You should use the tools below to answer the question posed of you:
Python_REPL: A Python shell. Use this to execute python commands.
Input should be a valid python command.
If you want to see the output of a value,
you should print it out with `print(...)`.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
This is the result of `print(df.head())`:
{df_head}
Begin!
Question: {input}
{agent_scratchpad}
"""

An important element here is to set pd.set_option("display.max_columns", None), because then, when df.head(5).to_markdown() is rendered into the prompt, all the columns are shown to the LLM (5 is the default number of rows here). Also, any custom prompt you use with this agent must include agent_scratchpad; this requirement is checked in one of the classes.

Image from langchain library
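As for the display setting mentioned above, a minimal sketch of what to run before creating the agent:

import pandas as pd

# Without this, pandas truncates wide frames when the head of the
# dataframe is rendered, so the LLM may never see some column names.
pd.set_option("display.max_columns", None)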

4. The default tool here is PythonAstREPLTool, whose _run method is worth checking out to see how the Python code generated by the LLM is run before its results are shared back with the LLM.

## source: https://github.com/langchain-ai/langchain/blob/e80834d783c6306a68df54e6251d9fc307aee87c/libs/langchain/langchain/tools/python/tool.py#L15
import ast
from contextlib import redirect_stdout
from io import StringIO
from typing import Optional

from langchain.callbacks.manager import CallbackManagerForToolRun

def _run(
    self,
    query: str,
    run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
    """Use the tool."""
    try:
        if self.sanitize_input:
            query = sanitize_input(query)
        tree = ast.parse(query)
        # Execute everything except the last statement first
        module = ast.Module(tree.body[:-1], type_ignores=[])
        exec(ast.unparse(module), self.globals, self.locals)  # type: ignore
        # Then evaluate the last statement so its value can be returned
        module_end = ast.Module(tree.body[-1:], type_ignores=[])
        module_end_str = ast.unparse(module_end)  # type: ignore
        io_buffer = StringIO()
        try:
            with redirect_stdout(io_buffer):
                ret = eval(module_end_str, self.globals, self.locals)
                if ret is None:
                    return io_buffer.getvalue()
                else:
                    return ret
        except Exception:
            # The last statement was not an expression: exec it and
            # return whatever was printed instead
            with redirect_stdout(io_buffer):
                exec(module_end_str, self.globals, self.locals)
            return io_buffer.getvalue()
    except Exception as e:
        # Errors are returned as strings so the LLM can see and fix them
        return "{}: {}".format(type(e).__name__, str(e))

5. Now, coming to the agent.run(user_question) part. This run method is found in the Chain class, from which AgentExecutor inherits. It essentially invokes the chain via __call__, which in turn depends on three more methods: prep_inputs, _call and prep_outputs. The first and third are implemented in the Chain class itself, while the _call method is implemented in the child class.
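Schematically, __call__ ties the three together like this (a simplified sketch, not the exact library code):

def __call__(self, inputs):
    inputs = self.prep_inputs(inputs)          # validate and complete the inputs
    outputs = self._call(inputs)               # implemented by the child class
    return self.prep_outputs(inputs, outputs)  # validate and merge the outputs

The actual _call method implemented in AgentExecutor is worth reading in full: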

## source: https://github.com/langchain-ai/langchain/blob/a50630277295c3884be8e839b04718d0e99b4ea4/libs/langchain/langchain/agents/agent.py#L566
def _call(
    self,
    inputs: Dict[str, str],
    run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
    """Run text through and get agent response."""
    # Construct a mapping of tool name to tool for easy lookup
    name_to_tool_map = {tool.name: tool for tool in self.tools}
    # We construct a mapping from each tool to a color, used for logging.
    color_mapping = get_color_mapping(
        [tool.name for tool in self.tools], excluded_colors=["green", "red"]
    )
    intermediate_steps: List[Tuple[AgentAction, str]] = []
    # Let's start tracking the number of iterations and time elapsed
    iterations = 0
    time_elapsed = 0.0
    start_time = time.time()
    # We now enter the agent loop (until it returns something).
    while self._should_continue(iterations, time_elapsed):
        next_step_output = self._take_next_step(
            name_to_tool_map,
            color_mapping,
            inputs,
            intermediate_steps,
            run_manager=run_manager,
        )
        if isinstance(next_step_output, AgentFinish):
            return self._return(
                next_step_output, intermediate_steps, run_manager=run_manager
            )

        intermediate_steps.extend(next_step_output)
        if len(next_step_output) == 1:
            next_step_action = next_step_output[0]
            # See if tool should return directly
            tool_return = self._get_tool_return(next_step_action)
            if tool_return is not None:
                return self._return(
                    tool_return, intermediate_steps, run_manager=run_manager
                )
        iterations += 1
        time_elapsed = time.time() - start_time
    output = self.agent.return_stopped_response(
        self.early_stopping_method, intermediate_steps, **inputs
    )
    return self._return(output, intermediate_steps, run_manager=run_manager)

It is here that the agent_scratchpad is updated on each iteration until the "Final Answer" is obtained. The default max_iterations is 15.
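If a question needs more steps, the cap can be raised when the agent is created (a sketch; I am assuming here that create_csv_agent forwards extra keyword arguments such as max_iterations down to the AgentExecutor):

# assumption: extra kwargs are forwarded to the underlying AgentExecutor
agent = create_csv_agent(
    OpenAI(temperature=0), csv_file_path, verbose=True, max_iterations=30
)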

while self._should_continue(iterations, time_elapsed) → this part checks whether max_iterations (or the time limit) has been reached.

_take_next_step(...) → this function invokes the agent.plan() method using intermediate_steps, which is a list.

The agent.plan() method, in turn, has 3 methods being called inside it:

  • self.get_full_inputs() → uses self._construct_scratchpad(intermediate_steps) to aggregate thoughts via:

    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\n{self.observation_prefix}{observation}\n{self.llm_prefix}"
    return thoughts

    Here observation_prefix is "Observation: " and llm_prefix is "Thought: ", both found in the ZeroShotAgent class. The returned thoughts serve as the value for agent_scratchpad via {"agent_scratchpad": thoughts, "stop": self._stop} (see the illustrative scratchpad after this list).
  • llm_chain.predict() → uses the output from the previous step to predict. This method returns a call to itself, which, as we have seen earlier, depends on the _call method implemented in the child class, here LLMChain. That _call calls a generate method of LLMChain, which in turn calls a few other methods of the BaseOpenAI class.
  • self.output_parser.parse() → this parses the output generated by the LLM, mapping it into either an AgentAction (indicating a next step to perform) or an AgentFinish (indicating a final answer).
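Illustratively, after one tool call, the agent_scratchpad value fed back into the prompt would look roughly like this (hypothetical content, using the tool name from the prompt above):

agent_scratchpad = (
    "Thought: I should count the rows\n"
    "Action: Python_REPL\n"
    "Action Input: len(df)\n"
    "Observation: 150\n"
    "Thought: "
)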

It would be helpful to check out the parser:

def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
    includes_answer = FINAL_ANSWER_ACTION in text
    regex = (
        r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
    )
    action_match = re.search(regex, text, re.DOTALL)
    if action_match and includes_answer:
        if text.find(FINAL_ANSWER_ACTION) < text.find(action_match.group(0)):
            # if final answer is before the hallucination, return final answer
            start_index = text.find(FINAL_ANSWER_ACTION) + len(FINAL_ANSWER_ACTION)
            end_index = text.find("\n\n", start_index)
            return AgentFinish(
                {"output": text[start_index:end_index].strip()}, text[:end_index]
            )
        else:
            raise OutputParserException(
                f"{FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE}: {text}"
            )

    if action_match:
        action = action_match.group(1).strip()
        action_input = action_match.group(2)
        tool_input = action_input.strip(" ")
        # ensure if its a well formed SQL query we don't remove any trailing " chars
        if tool_input.startswith("SELECT ") is False:
            tool_input = tool_input.strip('"')

        return AgentAction(action, tool_input, text)

    elif includes_answer:
        return AgentFinish(
            {"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
        )

    if not re.search(r"Action\s*\d*\s*:[\s]*(.*?)", text, re.DOTALL):
        raise OutputParserException(
            f"Could not parse LLM output: `{text}`",
            observation=MISSING_ACTION_AFTER_THOUGHT_ERROR_MESSAGE,
            llm_output=text,
            send_to_llm=True,
        )
    elif not re.search(
        r"[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", text, re.DOTALL
    ):
        raise OutputParserException(
            f"Could not parse LLM output: `{text}`",
            observation=MISSING_ACTION_INPUT_AFTER_ACTION_ERROR_MESSAGE,
            llm_output=text,
            send_to_llm=True,
        )
    else:
        raise OutputParserException(f"Could not parse LLM output: `{text}`")
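As a quick illustration of what the regex extracts (hypothetical LLM output):

import re

regex = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
text = (
    "Thought: I should count the rows\n"
    "Action: Python_REPL\n"
    "Action Input: len(df)"
)
match = re.search(regex, text, re.DOTALL)
print(match.group(1).strip())  # Python_REPL
print(match.group(2).strip())  # len(df)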

The parsed output is added to the intermediate steps, and the loop continues until the final answer is obtained. In this maze of code, I was not able to locate the code where the selected tool is actually run and its observation is added to the prompt (possibly the part with run_manager). Let me know if you find it.

Final Thoughts

I have mixed feelings about the complexity of all of this right now, but I have to appreciate the detail and thought that have gone into it. After all, a good library must have robust error handling while catering to the diversity of LLMs, prompting strategies and tools that a user can deploy.
