Building a Multimodal AI Agent for Web Search and Stock Analysis
Artificial Intelligence (AI) has made significant strides in recent years, and one notable development is the multimodal AI agent: a system that can process and integrate multiple types of data inputs to make smarter decisions and provide more comprehensive responses. In this blog, we will explore what a multimodal AI agent is, how to build one, and its potential applications. We will also walk through the code and technical details of building a multimodal AI agent for web search and stock analysis.
What is a Multimodal AI Agent?
A multimodal AI agent is an artificial intelligence system designed to process and combine multiple forms of input, such as text, images, videos, or other sensory data. Unlike traditional AI agents, which rely on a single mode of input (e.g., just text or just images), multimodal agents can draw on diverse data sources and combine them to provide better insights and more accurate results.
For example, a multimodal AI system might take input from both text-based data (e.g., news articles) and numerical data (e.g., stock market prices) to offer a more complete understanding of a particular topic or make informed decisions.
Why Multimodal AI Agents Matter
In real-world applications, challenges often require a combination of data from various sources. For instance:
- Web search: If you are looking for detailed information about a company or a stock, you might need to search the web for articles and reviews, while also pulling in specific financial data to get a comprehensive view.
- Stock analysis: Making informed financial decisions often requires pulling data from various sources, such as stock prices, analyst recommendations, and company news, and presenting them in a useful format.
Multimodal AI agents can integrate these different types of information to create a more informed, actionable response. Whether it’s for business, healthcare, or finance, these agents help us understand complex issues better and faster.
How to Build a Multimodal AI Agent
Building a multimodal AI agent involves integrating multiple tools and models that specialize in processing different kinds of data. Below is a step-by-step guide to building a multimodal AI agent for web search and stock analysis.
Step 1: Choosing the Right AI Model
The first step is to select an AI model capable of processing the required data. In this case, we use Llama-3.3-70b served through Groq. This model handles complex tasks such as text generation, orchestrating tool calls like web search, and data interpretation.
Groq’s versatility allows us to use it in combination with various tools that serve specialized functions, such as searching the web (using DuckDuckGo) or fetching stock data (using Yahoo Finance).
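As a quick preview of the full code later in this post, creating the model object in phidata is a single call; the key value here is only a placeholder:
from phi.model.groq import Groq
# Placeholder value; replace with your own Groq API key
api_key = 'my-groq-api'
# Llama-3.3-70b served through Groq, reused by every agent below
model = Groq(id='llama-3.3-70b-versatile', api_key=api_key)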
Step 2: Selecting Tools and Integrating Them
Next, we integrate external tools that will allow the agent to gather the necessary data. For our agent, we are using the following (a short instantiation sketch follows the list):
- DuckDuckGo: A privacy-oriented search engine that will allow the agent to search the web for information.
- YFinanceTools: A Python library that fetches financial data, including stock prices, analyst recommendations, and company news.
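Both tools ship with the phi package used in the walkthrough below, and instantiating them takes one line each. The keyword flags on YFinanceTools are the same ones used in the full code later in this post:
from phi.tools.duckduckgo import DuckDuckGo
from phi.tools.yfinance import YFinanceTools
# Web search tool; no configuration required
search_tool = DuckDuckGo()
# Financial data tool; enable only the data types the agent should fetch
finance_tool = YFinanceTools(
    stock_price=True,
    analyst_recommendations=True,
    stock_fundamentals=True,
    company_news=True,
)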
Step 3: Setting Up the Agent
Now that we have the tools and model ready, the next step is to create the agent. We can create individual agents, such as a web search agent and a financial analysis agent, and then combine them into a team agent that handles both tasks.
Here is the complete breakdown of the agents in our project:
- Web Search Agent: This agent uses the Groq model and the DuckDuckGo tool to search the web for relevant information. It follows instructions to always include sources and provide results in markdown format.
- Financial Analysis Agent: This agent also uses the Groq model, but with the YFinanceTools to fetch financial data such as stock prices, analyst recommendations, and company news. It displays the results in tables for clarity.
- Multimodal Agent: The multimodal agent is the central agent that combines the web search agent and the financial analysis agent into a single team. This agent can respond to queries that require both web search and financial analysis, providing more complete responses.
Step 4: Querying the Agent
Once the agents are set up, we can send a query to the multimodal agent, and it will use both web search and financial analysis tools to generate a detailed response. For example, the query “Summarize analyst recommendations and share the latest news for TESLA stock” will prompt the multimodal agent to fetch the latest news on TESLA, along with analyst ratings and stock performance.
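If you prefer to capture the answer programmatically rather than streaming it to the console, phidata agents also expose a run() method. A minimal sketch, assuming run() returns a response object with a content attribute (the behaviour in recent phidata versions):
# Assumes phidata's Agent.run(), which returns a RunResponse with a .content string
response = multiagent.run('Summarize analyst recommendations and share the latest news for TESLA stock')
print(response.content)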
Code Walkthrough
Below is the code for the multimodal AI agent:
from phi.agent import Agent
from phi.model.groq import Groq
from phi.tools.yfinance import YFinanceTools
from phi.tools.duckduckgo import DuckDuckGo

# Replace with your own Groq API key
api_key = 'my-groq-api'

# Web search agent
websearchagent = Agent(
    name='webagent',
    role='Search the web for information',
    model=Groq(id='llama-3.3-70b-versatile', api_key=api_key),
    tools=[DuckDuckGo()],
    instructions=['Always include sources'],
    show_tool_calls=True,
    markdown=True,
)

# Financial analysis agent
finagent = Agent(
    name='finagent',
    model=Groq(id='llama-3.3-70b-versatile', api_key=api_key),
    tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, stock_fundamentals=True, company_news=True)],
    instructions=['Use tables to display the data'],
    show_tool_calls=True,
    markdown=True,
)

# Multimodal agent combining both agents
multiagent = Agent(
    team=[websearchagent, finagent],
    model=Groq(id='llama-3.3-70b-versatile', api_key=api_key),
    instructions=['Always include sources', 'Use tables to display the data'],
    show_tool_calls=True,
    markdown=True,
)

# Making the query
multiagent.print_response('Summarize analyst recommendations and share the latest news for TESLA stock', stream=True)
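Hard-coding the key is fine for a quick demo, but in practice it is safer to read it from the environment. A minimal sketch, assuming you have exported the key under the name GROQ_API_KEY (the variable name is our choice here, not a requirement of the code above):
import os
# GROQ_API_KEY is the environment variable name we chose to export beforehand
api_key = os.environ.get('GROQ_API_KEY')
if not api_key:
    raise RuntimeError('Set the GROQ_API_KEY environment variable before running the agents.')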
Detailed Code Explanation
- Imports: The code imports the necessary classes Agent, Groq, YFinanceTools, and DuckDuckGo to build the agents.
- API Key: The Groq model requires an API key, which is stored in the api_key variable.
- Web Search Agent: This agent is configured to search the web using the DuckDuckGo tool and always include sources in markdown format.
- Financial Analysis Agent: This agent uses YFinanceTools to fetch financial data and display it in tables. It is configured to fetch the stock price, analyst recommendations, stock fundamentals, and company news.
- Multimodal Agent: The multimodal agent combines both websearchagent and finagent into a team, enabling the agent to process both web search and financial data.
- Query: Finally, the agent is queried to summarize analyst recommendations and share the latest news for TESLA stock, which is processed using both the web search and financial analysis agents.
Applications of Multimodal AI Agents
Multimodal AI agents have a broad range of applications across various industries. Some examples include:
- Finance: Automating financial analysis, including stock market predictions, financial reporting, and data extraction from multiple sources.
- Healthcare: Analyzing patient records, medical research, and health data from multiple sources to make more informed healthcare decisions.
- E-commerce: Combining user reviews, product data, and purchase history to recommend products and optimize pricing strategies.
- Education: Integrating content from textbooks, online courses, and research papers to help students find resources and improve learning outcomes.
Further Usage and Extensions
This multimodal agent can be further extended in various ways:
- Adding More Tools: You can integrate additional tools, such as news APIs or social media scraping tools, to provide more diverse data (a short sketch of this follows the list).
- Advanced Data Processing: Incorporate machine learning models to analyze or classify the data before presenting it to the user.
- Real-time Analytics: Use real-time data streams to update the agent’s responses based on the latest information.
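For the first of these extensions, recent phidata versions also let you pass a plain Python function in an agent's tools list. The sketch below is illustrative only: fetch_headlines is a hypothetical helper you would implement against a news API of your choice, and it is not part of any library used above:
from phi.agent import Agent
from phi.model.groq import Groq

api_key = 'my-groq-api'  # same placeholder as in the main example

def fetch_headlines(ticker: str) -> str:
    """Hypothetical helper: return recent headlines for a ticker from a news API you choose."""
    # Call your preferred news API here and return the headlines as plain text.
    return f'No news source configured for {ticker} yet.'

# A plain Python function passed as an extra tool alongside the built-in ones
newsagent = Agent(
    name='newsagent',
    model=Groq(id='llama-3.3-70b-versatile', api_key=api_key),
    tools=[fetch_headlines],
    instructions=['Always include sources'],
    show_tool_calls=True,
    markdown=True,
)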
Conclusion
In this blog, we discussed the concept of multimodal AI agents, how to build one, and demonstrated a practical example where the agent is used to combine web search and stock analysis. Multimodal agents can help businesses and professionals make more informed decisions by integrating diverse data sources and providing a more comprehensive view of any given situation.
If you’re interested in exploring more or working on a similar project, feel free to check out the code in the GitHub repository, experiment with the agent, and make it your own!