AI PODs for Inferencing At a Glance

Updated: November 16, 2024

Overview

Companies around the world, in every industry, are keen to use AI to transform their business, improve customer satisfaction, and gain a competitive advantage. Deploying a generative AI application is a complex process that requires careful planning, evaluation of models and infrastructure, and disciplined execution. The opportunity to succeed comes with an equal risk of failure.

Many organizations are struggling to define a winning strategy for their AI investments while avoiding costly and complex point solutions. Infrastructure needs can vary significantly with the type and size of the AI model. Cisco® can help you right-size your investment in AI-related infrastructure, balancing current business and IT needs with future scalability.

What is AI inferencing?

AI inferencing involves taking a pre-trained model (e.g., GPT-4, Claude 3, Llama 3) and using it to analyze new data, generating inferences or the most probable outcomes based on that data. This process is widely used in applications like chatbots, coding assistance, and image recognition. While effective for general knowledge questions, traditional AI models can struggle with queries requiring specific data that wasn’t part of their training, such as proprietary company data.

This is where Retrieval-Augmented Generation (RAG) comes in. RAG enhances the accuracy and relevance of AI inferencing by incorporating external data sources that the original model wasn’t trained on. It connects the model to domain-specific data, enabling it to generate more precise and relevant outputs. For example, consider an insurance model trained on a country’s population data. Adding your customer-specific data lets the model provide more accurate and business-relevant insights.
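The retrieval step that RAG adds in front of inferencing can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample insurance documents, the function names (retrieve, build_prompt), and the simple word-overlap cosine similarity, which stands in for the embedding model and vector database a production deployment would more likely use. The sketch stops at assembling the augmented prompt and does not call any model.

```python
import math
import re
from collections import Counter

# Hypothetical proprietary documents that a pre-trained model has never seen.
DOCUMENTS = [
    "Policy P-100 covers flood damage up to 50000 dollars for homeowners.",
    "Policy P-200 covers vehicle theft with a 500 dollar deductible.",
    "Claims must be filed within 30 days of the incident.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Represent text as a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question before it goes to a model."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


if __name__ == "__main__":
    print(build_prompt("What is the deductible for vehicle theft?", DOCUMENTS))
```

The string returned by build_prompt is what would be sent to the pre-trained model, so the model answers from the retrieved company data rather than from its training data alone.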

Figure 1. AI POD for Inferencing solution

Benefits

  • Confidently deploy AI-ready infrastructure with performance assurance and seamless scalability, ensuring your systems are prepared for advanced AI workloads.

  • Accelerate AI model deployment and shorten time to production-ready inferencing by leveraging full-stack validation of infrastructure, software, and AI toolsets.

  • Operate with best-in-class single-support for your AI deployment architecture, streamlining operations and enhancing reliability across your AI systems.

What it does

Cisco has been developing and providing Validated Designs for over 20 years. Cisco Validated Designs (CVDs) are comprehensive, rigorously tested guidelines that help customers deploy and manage IT infrastructure effectively. They include detailed implementation guides, best practices, and real-world use cases, often incorporating Cisco technology partner products. CVDs reduce deployment risk, optimize performance, and ensure scalability, all while being supported by the Cisco Technical Assistance Center (TAC). This support and integration provide customers with a reliable and efficient path to achieving their business objectives.

Cisco AI PODs for Inferencing are CVD-based solutions for Edge Inference, RAG, and Large-Scale Inferencing. They provide accelerated deployment with centralized management and automation. The solution has been performance tested and demonstrates linear scalability in benchmark tests on real-life model simulations, with consistent performance across varying dataset sizes. Each layer of the infrastructure scales independently, which suits the PODs to both data center (DC) and edge AI deployments. Four configurations are available, differing in the number of CPUs and GPUs in the POD.

Regardless of the configuration, they all contain:

  • Cisco UCS X-Series Modular System
  • Cisco UCS X9508 Chassis
  • Cisco UCS X-Series M7 Compute Nodes
  • Cisco UCS X440p PCIe Node with NVIDIA GPUs
  • Cisco UCS 9108 Intelligent Fabric Modules
  • Cisco UCS 6536 Fabric Interconnect or Cisco UCS Fabric Interconnect 9108 100G
  • Cisco UCS X9416 X-Fabric Modules
  • Cisco Intersight®
  • Cisco Services
  • NVIDIA AI Enterprise subscription
  • Red Hat OpenShift licensing

Optional storage is also available from NetApp (FlexPod) and Pure Storage (FlashStack). Both provide DataOps toolkits to help developers and data scientists perform numerous data management tasks.

Learn more

  • For more in-depth information on the Cisco AI PODs for Inferencing, refer to the data sheet.

  • For information on all Cisco AI-native infrastructure for the data center, visit Cisco.com.

  • For more information on the Cisco UCS X-Series Modular System, visit https://www.cisco.com/go/ucsx.

Book an expert consultation to start your AI-ready infrastructure journey

Receive expert guidance on modernizing your network and compute infrastructure for AI. The consultation combines technologies, products, and Cisco Validated Designs to support and scale AI workloads while advancing sustainability initiatives.
