Large Language Models - OpenAPI as the portal to the real world

A closer look at the OpenAPI specification

The OpenAPI Specification (OAS) is a widely adopted standard for describing and documenting RESTful APIs. An OpenAPI document is both human-readable and machine-readable, typically written in JSON or YAML, so developers and large language models alike can understand an API, interact with it, and generate code for it. With a well-defined OpenAPI specification, a large language model can determine how to call an API from its description. For developers, OAS promotes consistency, interoperability, and easy integration across platforms, tools, and AI-powered systems.

At the root level of an OpenAPI specification, three keys do most of the work. The info key gives a high-level description of the API. The servers key lists the server URLs that host the API. The paths key defines the API endpoints, their callable methods, and the available parameters. Parameter and response schemas can be defined inline or referenced with the $ref key, usually pointing at a definition under the root-level components key. Below is an annotated example of an OpenAPI specification for a Todo plugin.

openapi: 3.0.1 # Specification version
info:
  title: TODO Plugin # API name
  # The description supports multiple lines and CommonMark formatting.
  # Here it also carries instructions for the model.
  # Comments must stay outside the folded block, or they become part of the text.
  description: >
    A plugin that allows the user to create and manage a TODO list using ChatGPT.
    If you do not know the user's username, ask them first before making queries to the plugin.
    Otherwise, use the username "global".
  version: 'v1' # version of the API; semantic versioning may be used
servers:
  - url: http://localhost:5003 # The URL where the API is available. Format: scheme://host[:port][/path]
paths: # endpoint + http method combination is known as an operation
  /todos/{username}: # `{}` - denotes a path parameter. The URL resolves as http://localhost:5003/todos/{username}
    get: # HTTP Method - one of {get, post, put, patch, delete, head, options, and trace}. 
      operationId: getTodos # unique identifier across all operations
      summary: Get the list of todos # short overview of what the endpoint does.
      parameters:
      - in: path # can be one of path, query, header, cookie
        name: username
        schema:
          type: string
        required: true # path parameters are always required
        description: The name of the user.
      responses:
        "200":
          description: OK
          content:
            application/json: # Media type defines the format of the response
              schema: # definition of the object returned
                $ref: '#/components/schemas/getTodosResponse'

components: 
  schemas:
    getTodosResponse: # referenced above
      type: object
      properties:
        todos:
          type: array
          items:
            type: string
          description: The list of todos.
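
To make the structure concrete, here is a minimal Python sketch that walks the example above once it has been parsed into a dictionary. The dictionary literal mirrors the Todo spec, trimmed to the fields used here; in practice it would come from a YAML or JSON parser. The helper names resolve_ref and list_operations are illustrative and not part of any library, and the reference resolver is an assumption-laden simplification: it handles only local references that start with `#/` and ignores JSON Pointer escaping.

```python
# Parsed form of the Todo spec above, trimmed to the fields used here.
SPEC = {
    "openapi": "3.0.1",
    "info": {"title": "TODO Plugin", "version": "v1"},
    "servers": [{"url": "http://localhost:5003"}],
    "paths": {
        "/todos/{username}": {
            "get": {
                "operationId": "getTodos",
                "summary": "Get the list of todos",
                "parameters": [
                    {"in": "path", "name": "username",
                     "schema": {"type": "string"}, "required": True}
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/getTodosResponse"
                                }
                            }
                        },
                    }
                },
            }
        }
    },
    "components": {
        "schemas": {
            "getTodosResponse": {
                "type": "object",
                "properties": {
                    "todos": {"type": "array", "items": {"type": "string"}}
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def resolve_ref(spec, ref):
    """Follow a local reference such as '#/components/schemas/Name'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = spec
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def list_operations(spec):
    """Return (METHOD, path, operationId) for every operation in the spec."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # A path item can hold non-method keys too, so filter on the method names.
            if method in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("operationId")))
    return operations


if __name__ == "__main__":
    print(list_operations(SPEC))
    response = SPEC["paths"]["/todos/{username}"]["get"]["responses"]["200"]
    schema = response["content"]["application/json"]["schema"]
    print(resolve_ref(SPEC, schema["$ref"]))
```

The second print shows that the `$ref` in the response resolves to the getTodosResponse schema defined under components, which is the same lookup any consumer of the document has to perform.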

How large language models call external APIs

When a user interacts with a chat plugin powered by a large language model, the model interprets the user's input and decides whether calling an external API would help it respond. To make that decision, it evaluates the available APIs based on their OpenAPI Specifications. By examining the metadata in each specification, the model can determine whether an API is suitable for the given context and user intent.

For instance, if a user asks for weather information, the language model will look for APIs whose info objects contain relevant titles and descriptions, such as those related to weather services. Using the metadata in the info object, the model can filter and select appropriate APIs to interact with, and so generate accurate and contextually relevant responses for the user. The following illustrates how a language model might evaluate an OpenAPI Specification to select an appropriate API and endpoint:

1. Evaluate the info object: The language model first examines the info object, which contains the API's title, description, and version. This step helps the model gain a high-level understanding of the API's purpose and functionality and decide whether it's potentially relevant to the user's query.
2. Assess the paths object: If the API seems relevant based on the info object, the language model proceeds to evaluate the paths object. This component defines the available endpoints and their associated HTTP methods (e.g., GET, POST). The model examines the paths and their summaries and descriptions to find the ones that best match the user's intent and the specific information they seek.
3. Inspect the parameters and requestBody objects: Once a potentially suitable endpoint is identified, the language model inspects the parameters and requestBody objects, if applicable. These objects provide details about required and optional input parameters, data types, and constraints. The model uses this information to determine if it has the necessary data to make a successful API request and, if not, to prompt the user for additional input.
4. Analyze the responses object: Finally, the language model reviews the responses object, which defines the possible response codes, descriptions, and response payload structures. This information helps the model understand the expected output format and the type of information the API will return, ensuring that the response can be appropriately processed and presented to the user.
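
The steps above can be sketched in ordinary code. The following Python example is only an approximation under stated assumptions: a language model performs this evaluation implicitly rather than by running such a function, and the choice of operation (getTodos) is passed in rather than inferred from the user's words. The function name plan_request and the shape of its return value are hypothetical, and the spec dictionary is a trimmed copy of the Todo example.

```python
from urllib.parse import quote

# Trimmed copy of the Todo spec, already parsed into a dictionary.
SPEC = {
    "info": {"title": "TODO Plugin", "version": "v1"},
    "servers": [{"url": "http://localhost:5003"}],
    "paths": {
        "/todos/{username}": {
            "get": {
                "operationId": "getTodos",
                "summary": "Get the list of todos",
                "parameters": [
                    {"in": "path", "name": "username",
                     "required": True, "schema": {"type": "string"}}
                ],
                "responses": {
                    "200": {"description": "OK",
                            "content": {"application/json": {}}}
                },
            }
        }
    },
}

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def find_operation(spec, operation_id):
    """Locate an operation by its operationId (step 2, with the choice given)."""
    for path, item in spec["paths"].items():
        for method, operation in item.items():
            if method in HTTP_METHODS and operation.get("operationId") == operation_id:
                return method, path, operation
    raise KeyError(operation_id)


def plan_request(spec, operation_id, known_values):
    """Walk info, paths, parameters, and responses for one operation."""
    # Step 1: the info object identifies the API.
    api_name = spec["info"]["title"]

    # Step 2: the paths object yields the endpoint and HTTP method.
    method, path, operation = find_operation(spec, operation_id)
    parameters = operation.get("parameters", [])

    # Step 3: required parameters without a value mean the user must be asked.
    missing = [p["name"] for p in parameters
               if p.get("required") and p["name"] not in known_values]
    if missing:
        return {"api": api_name, "ask_user_for": missing}

    # Substitute path parameters, percent-encoding each value.
    url = spec["servers"][0]["url"] + path
    for p in parameters:
        if p["in"] == "path" and p["name"] in known_values:
            value = quote(str(known_values[p["name"]]), safe="")
            url = url.replace("{" + p["name"] + "}", value)

    # Step 4: the responses object says which media types to expect on success.
    success = operation["responses"].get("200", {})
    expects = sorted(success.get("content", {}))
    return {"api": api_name, "method": method.upper(), "url": url, "expects": expects}


if __name__ == "__main__":
    print(plan_request(SPEC, "getTodos", {}))
    print(plan_request(SPEC, "getTodos", {"username": "ada"}))
```

With no known values, the sketch reports that username must be requested from the user, which corresponds to the prompt-for-input branch of the third step. With a username supplied, it returns the method, the resolved URL, and the expected media type.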

The info object acts as a gateway: it lets the language model judge an API's relevance from its stated purpose and functionality. The subsequent evaluation of the other fields, such as paths, parameters, and responses, is still essential to confirm that the selected API and endpoint can address the user's query and produce an accurate, contextually relevant response.

Although these models evaluate APIs based on their OpenAPI Specification, biases and inaccuracies can arise during selection. For example, the language model may favor APIs with more detailed or keyword-rich descriptions, even if another API is better suited to the user's query. The model's understanding of an API's purpose and functionality is also limited to the information in the OpenAPI Specification, which may not be comprehensive or up to date. As a result, the model's decisions can be influenced by the quality and clarity of the API documentation rather than by the API's true capabilities or relevance to the user's needs. Users should therefore exercise critical judgment and, where it matters, cross-reference the model's recommendations with other reliable sources or expert opinions to make sure they select the most suitable API for their requirements.
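
The bias toward keyword-rich descriptions can be illustrated with a deliberately crude stand-in. A language model does not literally count keywords, so the Python sketch below is a toy under that assumption: it scores an API by how many words of the query appear in its info object. Both API descriptions and the function name keyword_score are invented for illustration.

```python
import re


def tokens(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_score(query, info):
    """Count query words that also appear in the API's title or description."""
    text = info.get("title", "") + " " + info.get("description", "")
    return len(tokens(query) & tokens(text))


# Two hypothetical info objects: a focused API with a terse description,
# and a broad API whose description lists many keywords.
TERSE = {
    "title": "Forecast API",
    "description": "Hourly forecasts by city.",
}
VERBOSE = {
    "title": "Everything Hub",
    "description": "Weather, news, sports, stocks, today, tomorrow, "
                   "rain, temperature for any city.",
}

QUERY = "Will it rain in Paris tomorrow?"

if __name__ == "__main__":
    print("terse:", keyword_score(QUERY, TERSE))
    print("verbose:", keyword_score(QUERY, VERBOSE))
```

Under this toy measure the keyword-heavy description outscores the focused forecast API, even though nothing in the score reflects which API would answer the question better. That is the failure mode described above: the quality of the description, not the capability of the API, drives the choice.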