How to Integrate Genesys Cloud and Symbl to Obtain Call Scores

Though essential to many organizations’ operations, running a contact center is a costly and time-consuming endeavor, so it is vital for companies to get as high a return on their investment as possible.

By integrating a contact center platform solution such as Genesys Cloud with the conversational intelligence provided by Symbl, companies can extract valuable insights from interactions that will enable them to optimize their contact center performance.

One example of conversational intelligence is Symbl.ai’s Call Score API, which provides a concise evaluation of a contact center interaction and streamlines the call evaluation process, in turn accelerating improvements in the performance of a company’s contact center agents.

With this in mind, this guide offers a step-by-step explanation of how to integrate Genesys Cloud with Symbl to obtain Call Scores for the conversations that take place within your contact centers. 

What is Genesys Cloud?

Genesys Cloud is a contact center solution used by organizations in over 100 countries to track and manage their customer interactions across multiple channels, including  voice, email, messaging, apps, and social media. By combining Customer Experience (CX) and Customer Relationship Management (CRM) workflows, Genesys Cloud enables companies to streamline their communications, resulting in:

  • Enhanced Customer Service: shorter queues, faster response times, and higher levels of customer satisfaction.
  • Increased Agent Productivity: AI-powered features such as chatbots and automated routing allow human agents to concentrate on more intricate interactions that  require their expertise.
  • Better Data-Driven Insights: with all communications stored in a single location, it’s easier for organizations to extract valuable insights that help them improve their services and increase profitability. Additionally, real-time analytics offer increased insight into customer preferences and operational performance, further aiding companies in crafting and refining their short and long-term strategies.

What is Call Score? 

Symbl’s Call Score API allows you to evaluate the quality of conversations that occur within your organizations, as well as the performance of participants – at scale and, crucially, without having to listen to each conversation. It generates a numerical score (out of 100)  for each conversation, along with an individual score and a detailed breakdown for each of your defined criteria, which makes it easier to determine the quality of an interaction, assess the performance of the human agent, and identify and compare relevant interactions. 

Use cases for Call Score include:

  • Contact Center Quality Assessments: organizations can determine the quality of conversations taking place at their contact centers, as well as the overall efficacy of their teams. With the ability to assess call quality without having to listen in (or to recordings), supervisors and management can identify areas of improvement in less time and more frequently, leading to rapidly compounding increases in customer service levels. 
  • Sales Performance Evaluations: similarly, companies can use Call Score to objectively evaluate the performance of their sales teams and home in on the most essential areas for improvement. This then allows sales managers and trainers to structure sales meetings and professional development strategies most effectively. 
  • Refine Communication and Branding Guidelines: Call Score can help companies pinpoint exactly how they’d like their contact center and sales teams to come across to their customers, thereby assisting in refining their communication standards and brand voice.
  • Compliance: in addition to communication guidelines, Call Score helps you evaluate your employees’ compliance with regulations that pertain to your industry, such as data privacy legislation. This helps mitigate legal, reputational, and, ultimately, financial risk and encourages ethical conduct.
  • Recruitment: Call Score can also be used during the recruitment process, allowing you to measure a job candidate’s responses against the attributes you’re looking for. 

Scorecard and Criteria

Call Score is composed of two core concepts: scorecard and criteria. 

A scorecard is the summary of the Call Score: combining your chosen criteria to provide an overall breakdown, as well as a separate score for each criterion. To learn more about scorecards, please refer to our documentation.

Criteria, meanwhile, are the specific qualities or traits used to create a detailed evaluation of the quality of an interaction, as well as the agent involved, which is used to generate the Call Score. You can use a default set of criteria, called Managed Criteria,  for a simple out-of-the-box evaluation. 

Managed criteria include: 

  • Communication and engagement: the agent’s communication style and how well they engage with a customer. 
  • Question handling: the ability to answer questions and handle objections from the prospect.
  • Sales process: how well a sales representative adheres to your company’s sales process and protocols. 
  • Forward Motion: an agent’s ability to drive the interaction towards the desired outcome.  

Alternatively, you can define custom criteria by which to evaluate your contact center personnel. These can be combined with the Managed Criteria to create the most comprehensive scorecard for evaluating the efficacy of your contact centers. To learn more about criteria, please refer to our documentation.

Advantages of using Symbl’s Call Score for contact center evaluation 

The advantage of using Symbl’s Call Score API to automate quality assurance and evaluate contact center agents’ performance over Genesys’ native quality assurance and agent evaluation tools is twofold: 

  • The customizability and flexibility of performance evaluation tools in the native contact center tech stack is limited. If businesses have complex or nuanced quality assurance requirements, using Symbl’s custom criteria and exhaustive checklist questions and adjusting scoring weights and priorities provides this added flexibility. 
  • Customers are increasingly using a combination of multiple contact center vendors and technology solutions. Symbl’s Call Score API is a vendor-agnostic, low-code API that integrates seamlessly into CRMs, BI tools, or custom applications – all through a single API. This overcomes the constraints and limitations of vendor-provided out-of-the-box offerings.

How to Integrate Genesys Cloud and Symbl: Step-By-Step Implementation  

Let us turn our attention to integrating Genesys Cloud with Symbl to generate Call Scores for your contact center interactions.

Initial Setup

Before configuring the integration, you need to establish the correct setup, which involves three steps:

  1. Set Up a Genesys Cloud Organization: if you don’t have one already, create a new Genesys Cloud organization by completing the required initial setup tasks.
  2. Install AudioHook Monitor: in your Genesys organization, install AudioHook Monitor, the protocol Genesys Cloud uses to transmit audio data in near real-time to any third-party platform. In addition, for AudioHook to function, you will need to enable voice transcription.
  3. Acquire Your Symbl API Key: to connect to Symbl APIs, you will need access credentials, i.e., an app ID and app secret, which you can obtain by signing in to the Symbl developer platform.

Configure the AudioHook Monitor

With your environment set up, the next step is configuring your AudioHook Monitor so it knows where to transmit the audio feed. This is achieved by doing the following:

  1. Open the Genesys Cloud UI, go to Admin, on the far right of the top menu, and click Integrations. On the Integrations menu, click the three dots to the right of the AudioHook option, then select Edit Integration on the pop-up menu that appears.
  2. This will display an interface with three tabs: Details, Configuration, and Support – we’re only concerned with the first two:
    1. Click the Details tab and give your AudioHook instance a more meaningful name for easier future reference – in this case, we’ll simply call it Call Score.
    2. Click the Configuration tab and stay on Properties: set Channel to both, to specify stereo sound, and enter the following under Connection URI, which is the address that establishes a secure connection to Symbl’s Streaming API (which, in turn, uses the WebSocket Secure (WSS) protocol):

      wss://api.symbl.ai/v1/streaming/{CONNECTION_ID}?access_token={ACCESS_TOKEN}

      CONNECTION_ID is an arbitrary value of your choice, e.g., 12345, as Symbl’s servers will generate a unique conversation ID for the interaction. Meanwhile, ACCESS_TOKEN is the access token that you generate with your app ID and app secret (a sample token request is sketched after this list).
    3. Next, click on Credentials, enter your Symbl API key and secret into the appropriate fields, and click OK.
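
For reference, here is a minimal Python sketch of generating that access token with Symbl’s token:generate endpoint; the placeholder credentials are assumptions that you should replace with your own values from the developer platform.

# Minimal sketch: exchange your Symbl app ID and app secret for an access token.
# Replace the placeholder values with your own credentials.
import requests

APP_ID = "YOUR_SYMBL_APP_ID"
APP_SECRET = "YOUR_SYMBL_APP_SECRET"

response = requests.post(
    "https://api.symbl.ai/oauth2/token:generate",
    json={"type": "application", "appId": APP_ID, "appSecret": APP_SECRET},
    headers={"Content-Type": "application/json"},
)
access_token = response.json()["accessToken"]
print(access_token)  # use this value as ACCESS_TOKEN in the Connection URI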

Customizing the Call Score Request

Here, we will customize the request, as a JSON object, that will be sent to Symbl’s servers. This is where you can configure Call Score to best fit your organizational requirements.

Go back to Admin > Integrations > Edit Integration > Configuration (as in the last step) and then click the Advanced tab. Define a JSON object to customize the Symbl request, an example of which is presented below:

{
  "type": "start_request",
  "actions": [
    {
      "name": "generateCallScore",
      "parameters": {
        "conversationType": "general",
        "salesStage": "general",
        "prospectName": "Real-time Call Score",
        "callScoreWebhookUrl": "https://webhook.site/abcdef-123456"
      }
    }
  ]
}


The most important parameter is callScoreWebhookUrl, as this is the address to which Symbl will send the Call Score status once the interaction is complete. To learn more about how to customize your Call Score requests, please refer to the Streaming API documentation, which details the available parameters.

Selecting Which Interactions Receive Call Scores

To ensure a conversation receives a Call Score, the associated queue must have voice transcription activated; this is achieved as follows:

  1. On the UI, go to the Admin panel, select Contact Center, and then click Queues.
  2. Select a Queue, click on Voice, then Voice Transcription, and set it to On. Now, all calls in that queue will have a Call Score generated by Symbl, with the completed status sent to the webhook URL specified in the JSON object in the previous step.

Activate the Integration

Finally, with everything correctly configured, you can activate the integration. Return to the Admin panel, select Integrations, and toggle the button next to AudioHook to activate the integration. Now, you should be able to put a call through your Genesys Cloud organization and see the status messages at your specified webhook address, which will appear as below:

{
    "conversationId": "",
    "status": "completed"
}
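
If you prefer to receive these status messages at your own endpoint rather than a hosted webhook tester, a minimal receiver can be sketched as follows; it assumes Flask and a publicly reachable URL, and simply records the conversationId whenever Symbl reports a completed Call Score.

# Minimal webhook receiver sketch (assumes Flask and a publicly reachable endpoint).
# Point callScoreWebhookUrl in the AudioHook Advanced configuration at this route.
from flask import Flask, request, jsonify

app = Flask(__name__)
completed_conversations = []

@app.route("/callscore-status", methods=["POST"])
def callscore_status():
    payload = request.get_json(force=True)
    if payload.get("status") == "completed":
        completed_conversations.append(payload.get("conversationId"))
        print(f"Call Score ready for conversation {payload['conversationId']}")
    return jsonify({"received": True}), 200

if __name__ == "__main__":
    app.run(port=8080)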


Upon receiving the response, you can then use the conversationId to make a request to Symbl’s servers to retrieve the Call Score for that interaction. The request to be sent to the server is shown below:

GET https://api.symbl.ai/v1/conversations/{conversationId}/callscore/status
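
A minimal Python sketch of this retrieval, reusing the access token generated earlier and the conversationId delivered in the webhook payload, might look like the following.

# Minimal sketch: fetch the Call Score once the webhook has reported a completed status.
# Reuses the access token from the earlier sketch; the endpoint is the one shown above.
import requests

ACCESS_TOKEN = "YOUR_SYMBL_ACCESS_TOKEN"          # token from the earlier sketch
conversation_id = "CONVERSATION_ID_FROM_WEBHOOK"  # delivered in the webhook payload

response = requests.get(
    f"https://api.symbl.ai/v1/conversations/{conversation_id}/callscore/status",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
call_score = response.json()
print(call_score)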


This will retrieve the Call Score associated with that conversationId; the response will look similar to the JSON object shown below, depending on your configuration and chosen criteria.

{
  "score": 82,
  "summary": "The sales call had some good aspects, but there were areas that could be improved. The representative was able to engage the prospect and discuss the product, but …",
  "criteria": [
    {
      "name": "Communication and Engagement",
      "score": 75,
      "summary": "The representative engaged in a conversation with the prospect, discussing the product and its features. However …",
      "feedback": {
        "positive": {
          "summary": "The representative was able to engage the prospect in a conversation about the product and its features …"
        },
        "negative": {
          "summary": "The representative interrupted the prospect a few times during the call, and …"
        }
      }
    },
    {
      "name": "Question Handling",
      "score": 85,
      "summary": "The representative was able to answer most of the prospect's questions and provide information about the product. The new context shows …",
      "feedback": {
        "positive": {
          "summary": "The representative was able to answer all of the prospect's questions and provide information about the product, including …"
        },
        "negative": {
          "summary": "There were no major issues in question handling."
        }
      }
    }
  ]
}


You can then take the JSON object containing the Call Score and detailed insights and incorporate it into reports, applications, or any other desired media and platforms.

In summary

  • Genesys Cloud is a widely used contact center platform that enables organizations to manage their omnichannel customer interactions across voice, email, messaging, apps, and social media. 
  • Symbl’s Call Score API allows you to evaluate the quality of conversations and performance of participants – at scale and without having to listen to each conversation. 
  • Use cases for Call Score include: 
    • Contact center quality assessments
    • Sales performance evaluations
    • Refine communication and branding guidelines
    • Compliance
    • Recruitment 
  • Steps for integrating Genesys Cloud and Symbl include:
    • Initial setup, i.e., creating a Genesys Cloud organization, installing AudioHook Monitor, and acquiring your Symbl API key
    • Configuring AudioHook Monitor
    • Customizing the Call Score request
    • Selecting which interactions receive call scores
    • Activating the integration

To fully harness the capabilities of Call Score, and to learn how to tailor it to extract maximum value from your organization’s contact center interactions, we encourage you to familiarize yourself with the different options and parameters outlined in the documentation.

Additionally, to experiment with the other functionality provided by Nebula, Symbl’s proprietary LLM that has been fine-tuned for conversational analysis, sign up for Symbl’s developer platform.   

Real-Time AI Assistance for Call Center Agents

Call center agents often struggle with responding to challenging customer inquiries, especially during remote troubleshooting calls. To provide assistance to agents in real time and improve customer satisfaction, call centers can implement AI-assisted troubleshooting. One of the first AI call center solutions of this kind improved agent performance by 34%, and newer technology can offer even greater gains.

In this tutorial, you will build an AI call center assistant using Symbl’s intelligence APIs. The solution will stream real-time audio from Amazon Connect to Symbl via Amazon Kinesis and use Trackers, Nebula LLM, and retrieval augmented generation (RAG) to provide agents with real-time troubleshooting tips during phone conversations with customers.

Prerequisites

To follow along with this tutorial, you will need:

  • An AWS account with access to Amazon Connect and Amazon Kinesis
  • A Symbl.ai account with your App ID and App Secret
  • A Nebula API key for the Embedding and Chat APIs
  • A MongoDB Atlas cluster to use as the vector database
  • Python 3 with the boto3, symbl, requests, and pymongo packages installed

With these prerequisites in mind, let’s build your AI assistant for call center agents!

Set up streaming for phone conversations

In this initial step, you will stream audio data from Amazon Connect to Symbl using Amazon Kinesis. This will allow you to capture real-time conversation data for tracking and analysis.

1.1 Create a Kinesis data stream

Log in to your AWS Management Console, navigate to Amazon Kinesis, and create a new Kinesis Data Stream with an appropriate name (e.g. symblai-kinesis-data-stream).

1.2 Configure Amazon Connect to stream audio to Kinesis

In the AWS Management Console, go to Amazon Connect and follow the setup wizard to create an Amazon Connect instance if you don’t have one. Then select your instance and go to Data streaming. Enable data streaming and select the Kinesis Data Stream you created.

1.3 Set up a Python script to consume Kinesis stream

Create a new Python file called audio_receiver.py with the following code.

import boto3
from botocore.config import Config

# Configure boto3 client
config = Config(
    retries = dict(
        max_attempts = 10
    )
)

kinesis_client = boto3.client('kinesis', config=config)

# Stream details
stream_name = 'symblai-kinesis-data-stream' # Replace with your stream name
consumer_name = 'my-local-consumer'

def process_record(data):
    # TODO: This will be completed later in this tutorial.
    pass

def register_consumer():
    try:
        response = kinesis_client.register_stream_consumer(
            StreamARN=get_stream_arn(stream_name),
            ConsumerName=consumer_name
        )
        print(f"Consumer {consumer_name} registered successfully.")
        return response['Consumer']['ConsumerARN']
    except kinesis_client.exceptions.ResourceInUseException:
        print(f"Consumer {consumer_name} already exists.")
        return get_consumer_arn()

def get_stream_arn(stream_name):
    response = kinesis_client.describe_stream(StreamName=stream_name)
    return response['StreamDescription']['StreamARN']

def get_consumer_arn():
    response = kinesis_client.describe_stream_consumer(
        StreamARN=get_stream_arn(stream_name),
        ConsumerName=consumer_name
    )
    return response['ConsumerDescription']['ConsumerARN']

def main():
    consumer_arn = register_consumer()
    shard_id = "" # Populate from kinesis
    starting_sequence_number = ""  # Populate from kinesis

    shard_iterator = kinesis_client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType='AT_SEQUENCE_NUMBER',
        StartingSequenceNumber=starting_sequence_number
    )['ShardIterator']

    response = kinesis_client.get_records(ShardIterator=shard_iterator, Limit=2000)

    for record in response['Records']:
        process_record(record['Data'])

if __name__ == "__main__":
    main()


This code registers a consumer for the Kinesis stream and then fetches audio data records from the stream into your server code; this data is later sent to Symbl using the Streaming API.

1.4 Push data to Symbl using websocket

Now you will initiate a websocket connection to Symbl using the Streaming API. You can use the Python SDK to initiate the connection and send audio data to Symbl with the following code.

...
import symbl

connection_object = None  # module-level so process_record can access the connection

def process_record(data):
    connection_object.send_audio(data)

def main():
    global connection_object
    ...

    response = kinesis_client.get_records(ShardIterator=shard_iterator, Limit=2000)

    # trackers is defined in Step 2.2 of this tutorial
    connection_object = symbl.Streaming.start_connection(trackers=trackers)

    for record in response['Records']:
        process_record(record['Data'])


This ensures that the audio data is being sent to Symbl for processing. However, you need to be able to capture the response when Symbl detects a special event. For this, you need to set up Trackers which will act as a trigger for the AI assistant. You also need to set up a retrieval augmented generation (RAG) system to help with fetching contextually relevant text as the final response.

Determine when call center agents receive AI assistance

In this step, you will set up Symbl Trackers that automatically identify when assistance is needed by listening to conversations between call center agents and customers in real time.

2.1 Set up a custom tracker with Symbl

Trackers are part of Symbl’s intent detection engine. They identify relevant events from any live conversation; in this case, moments where a call center agent might need AI-assisted troubleshooting help during a call.


To set up a tracker, log in to your Symbl account and create a custom tracker by navigating to Trackers Management > Create Custom Tracker.

  • Tracker Name: Troubleshooting Tracker
  • Description: This tracker identifies when a customer shares a problem they are facing.
  • Categories: Contact Center
  • Language: en-US (or any preferred language)
  • Vocabulary: unable to do, facing a problem, trouble, cannot do it, issue

On saving, you can locate this tracker under Trackers Management > Your Trackers.

2.2 Fetch custom tracker details

When your custom tracker is detected in a streaming conversation, a tracker_response event is sent over the websocket connection. This event is created whenever the conversation contains any keyword or phrase in your custom tracker.

To ensure this event is triggered, you will need to send your tracker metadata when initiating a Streaming API connection. You can get your custom tracker details using the following code.

...
def get_troubleshooting_tracker():
    troubleshooting_tracker_url = f"https://api.symbl.ai/v1/manage/trackers?name={requests.utils.quote('Troubleshooting Tracker')}"

    headers = {
        'Authorization': f'Bearer {generate_token()}'
    }
    response = requests.request("GET", troubleshooting_tracker_url, headers=headers)
    troubleshooting_tracker = json.loads(response.text)['trackers']
    return troubleshooting_tracker

def main():

    ...
    response = kinesis_client.get_records(ShardIterator=shard_iterator, Limit=2000)

    trackers = get_troubleshooting_tracker()
    ...


Now generate a token using the following code.

import requests

def generate_token():
    APP_ID = "SYMBL_APP_ID" # Replace with your Symbl AppId
    APP_SECRET = "SYMBL_APP_SECRET" # Replace with your Symbl AppSecret
    url = "https://api.symbl.ai/oauth2/token:generate"

    payload = {
        "type": "application",
        "appId": APP_ID,
        "appSecret": APP_SECRET
    }
    headers = {
        "accept": "application/json",
        "content-type": "application/json"
    }

    response = requests.post(url, json=payload, headers=headers)
    return response.json()["accessToken"]  # extract the access token value from the response


2.3 Subscribe to tracker_response event with the Streaming API

When starting the websocket connection using the Streaming API, you will have to subscribe to the special Symbl events and handle them when triggered.

In this case, you will capture and handle a single event: the tracker_response event (view list of other supported events). This event is triggered when a customer mentions any of the words you added in the tracker vocabulary.

def handle_tracker_response(tracker):
    # TODO: You will populate this in the next section.
    pass
...
def main():

    ...
    response = kinesis_client.get_records(ShardIterator=shard_iterator, Limit=2000)

    trackers = get_troubleshooting_tracker()

    connection_object = symbl.Streaming.start_connection(trackers=trackers)

    events = {
        'tracker_response': lambda tracker: handle_tracker_response(tracker)
    }
    connection_object.subscribe(events)

    for record in response['Records']:
        process_record(record['Data'])


You have now established a connection and the live audio stream chunk is being sent out over the websocket. When your custom tracker is detected, Symbl will send out the tracker_response event.

Provide agents with real-time AI assistance

In this step, you will leverage Symbl’s proprietary Nebula LLM to display AI-assisted troubleshooting guidance.

Before you can use the Nebula LLM to fetch relevant contextual responses, you need to preprocess and store your knowledge data for efficient querying.

3.1 Create vector embeddings from knowledge data 

Knowledge data is internal data specific to your organization. You can create embeddings from it using the Embedding API and the Nebula embedding model, which creates vector embeddings from conversations, documents, or text data. An embedding is a vector representation of text used to compare and identify text with similar characteristics.

import requests
import json

NEBULA_API_KEY = "" # Replace with your Nebula API Key

def get_vector_embeddings(data):
    url = "https://api-nebula.symbl.ai/v1/model/embed"

    payload = json.dumps({
    "text": data # You’ll replace this in the next step
    })
    headers = {
    'ApiKey': NEBULA_API_KEY, # Replace with your value
    'Content-Type': 'application/json'
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    return response.text


You can use the sample knowledge corpus defined in the Appendix at the end of this tutorial. It contains metadata and troubleshooting steps for a couple of issues across three imaginary products. It is structured so that lines starting with "----" (four hyphens) align with the intent in our situation (acting as a key), and the values after that, until the next "----", act as the relevant knowledge value.

3.2 Store vector embeddings in a vector DB for efficient search

You can now create vector embeddings for the “key” and store them along with the corresponding data in any vector DB. These vector embeddings will allow you to query the vector DB efficiently and fetch associated data.

You can use any vector DB, but for this implementation we’ll use MongoDB Atlas.

After setting up MongoDB Atlas and creating a database (mydb) and a collection (mycollection), you can establish the connection using the MongoDB connection URI. Store that value in MONGODB_URI for use when storing and retrieving from the DB.

Using the code below, you can insert documents containing these embeddings and associated data into the MongoDB collection created above.

import json
import pymongo

MONGODB_URI = "" # Replace with your MongoDB Atlas connection URI

def get_vector_embeddings(data):
    # … Same as previously defined
    pass


def parse_text(text):
    data = []
    current_key = None
    current_value = []

    for line in text.splitlines():
        line = line.strip()

        # Start of a new section
        if line.startswith("----"):
            if current_key and current_value:
                data.append({
                    "key": current_key,
                    "value": "\n".join(current_value)
                })
                current_value = []

            # Extract key from the same line as ----
            current_key = line[4:].strip() 
        
        # Handle lines within a section (not empty and not a key line)
        elif current_key and line and not line.startswith("----"):
            current_value.append(line)

    # Capture the last section if it exists
    if current_key and current_value:
        data.append({
            "key": current_key,
            "value": "\n".join(current_value)
        })

    return data

def open_mongo_db_connection():
    mongoclient = pymongo.MongoClient(MONGODB_URI) # replace with your MongoDB URI
    db = mongoclient['mydb'] # replace with your database name
    collection = db['mycollection'] # replace with your collection name
    return collection

def populate_knowledge_data():

    parsed_data = []
    with open('knowledge_data.txt') as f:
        parsed_data = parse_text(f.read())

    collection = open_mongo_db_connection()

    for data in parsed_data:
        key_embedding = get_vector_embeddings(data['key'])

        document = {
            'data': data['value'],
            'embedding': json.loads(key_embedding)['embedding']
        }
        # Insert each document (knowledge value + key embedding) into the collection
        collection.insert_one(document)


The parse_text function is specific to the sample knowledge corpus defined in the Appendix. For your own knowledge corpus, modify the function to suit your needs. A minimal entry point for running this file is sketched below.
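
Since the Testing section later runs this file as store_vector.py, you can add a small entry point at the bottom of the file; the filename and the guard shown here are assumptions based on that testing step.

# Minimal entry point sketch for store_vector.py (filename assumed from the Testing section).
if __name__ == "__main__":
    # Load knowledge_data.txt, embed each key, and store the documents in MongoDB
    populate_knowledge_data()
    print("Knowledge data embedded and stored in the vector DB.")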

3.3 Configure a vector DB search index

Follow the instructions in Create an Atlas Vector Search Index to create a search index. You will use this to efficiently run a similarity search over the stored embeddings. Give the index a name (my_index) and configure the following fields:

  • numDimensions: length of the vector embeddings; 1024 for the Nebula Embedding Model 
  • path: field over which the vector embedding similarity search is carried out 
  • similarity: metric used for calculating similarity
{
    "fields": [
        {
            "numDimensions": 1024,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector"
        }
    ]
}


3.4 Retrieve knowledge context with vector search on tracker detection

Whenever your custom tracker is detected, you want to fetch data relevant to the tracker. To do this, you need to run a vector index search on your vector DB using the detected tracker. This will fetch only the data related to your tracker. 

This data will be added as context for the Nebula LLM chat along with other details. It will also help you get a useful response from it. It’s here where RAG comes into play.

For the detected tracker, you need to create vector embeddings and do a vector search on the DB to retrieve the contextual knowledge. 

Use the following code to retrieve the knowledge context from the vector DB.

def vector_index_search(tracker):
    collection = open_mongo_db_connection()
    tracker_embedding = get_vector_embeddings(tracker)
    tracker_embedding_vector = json.loads(tracker_embedding)['embedding']
    retrieved_context = collection.aggregate([
    {
        "$vectorSearch": {
            "queryVector": tracker_embedding_vector,
            "path": "embedding",
            "numCandidates": 10, #total number of embeddings in the database
            "limit": 1, #number of closest embeddings returned
            "index": "my_index"
            }
        }])
    return next(retrieved_context, None)


3.5 Build prompt for Nebula using retrieved knowledge context and transcript

Now that you have access to the contextual knowledge, you can pass this to Nebula along with the transcript of the chat to get a response. You can set up a system prompt to specify the model’s response behavior and give it all the relevant context. 

To chat with Nebula, use the Nebula Chat API as described below. This will return a response from the Nebula LLM which can be shared with the customer support agent. 

NEBULA_CHAT_URI = "" # Set this to the Nebula Chat API endpoint from the Nebula documentation

def get_nebula_response(conversation, relevant_info):
    payload = json.dumps({
    "max_new_tokens": 1024,
    "system_prompt": f"You are a customer support agent assistant. You help the agents perform their job better by providing them relevant answers for their inputs. You are respectful, professional and you always respond politely. You also respond in clear and concise terms. The agent is currently on a call with a customer. Relevant information: {relevant_info} . Recent conversation transcript: {conversation}",
    "messages": [
        {
        "role": "human",
        "text": "Hello, I am a customer support agent. I would like to help my customers based on the given context."
        },
        {
        "role": "assistant",
        "text": "Hello. I'm here to assist you."
        },
        {
        "role": "human",
        "text": "Given the customer issue, provide me with the most helpful details that will help me resolve the customer’s issue quickly."
        }
    ]
    })

    headers = {
        'ApiKey': NEBULA_API_KEY, # Replace with your value
        'Content-Type': 'application/json'
    }

    response = requests.request("POST", NEBULA_CHAT_URI, headers=headers, data=payload)
    print(json.loads(response.text)['messages'][-1]['text'])


3.6 Handle tracker_response and transcript

Now that you have all the other functionalities in place, you can update your tracker_response event handler to extract the conversation transcript and the tracker value, query the knowledge database using the tracker value, and get relevant knowledge data. 

This is the final step of the integration.

def handle_tracker_response(tracker):

    tracker_value = tracker['trackers'][0]['matches'][0]['value']

    conversation_message = '\n'.join([x.text for x in connection_object.conversation.get_messages().messages])

    relevant_info = vector_index_search(tracker_value)['data']

    get_nebula_response(conversation_message, relevant_info)

# vector_index_search(tracker) ... Already defined in Step 3.4
# get_nebula_response(conversation, relevant_info) ... Already defined in Step 3.5

Testing

To test your AI call center solution using the sample input defined in the GitHub repository, run the following commands.

% source venv/bin/activate # Activate virtual environment

% python store_vector.py # To store sample knowledge data into vector DB

% python audio_streamer.py # To store sample audio data in kinesis

% python audio_receiver.py # To fetch data from kinesis and call Nebula for coherent response

Note the shardId and sequence number from the first response of python audio_streamer.py, and use them to populate shard_id and starting_sequence_number in audio_receiver.py before running that script.

You should see an output similar to this:

Sure, here are some helpful details to assist you in resolving the customer's issue:

1. Ask the customer to open the HomeSync app and go to the 'Automations' tab.
2. Have them review each automation rule to ensure all devices are online and connected to the hub.
3. Check that the trigger conditions and actions are correctly set for each automation rule.
4. If there are any problematic rules, suggest deleting them and recreating them.
5. If the issue persists, advise the customer to restart the hub by unplugging it for 10 seconds.
6. If the issue still isn't resolved, ask the customer to check if their Wi-Fi router is working properly and if there are any other devices experiencing connectivity issues.
7. If necessary, suggest contacting the router manufacturer or internet service provider for further assistance.

Conclusion

In this tutorial, you’ve learned how to integrate Amazon Connect with Symbl to create an AI-assisted call center solution. With an AI assistant for agents that leverages Symbl’s real-time speech analysis, custom trackers, and Nebula LLM, call center personnel can provide faster, more efficient support to customers in real-time.

You can take this solution further by integrating it with a customer relationship management (CRM) system to provide personalized assistance based on customer history. You can then use data gathered on calls to generate reports and insights with Symbl that can improve the overall performance of your call center.

Appendix

This is the sample knowledge corpus defined in the tutorial. You can store this in your knowledge_data.txt file.

---- SmartHome Hub X1 METADATA

Manufacturer: SmartHome Solutions

Category: Smart Home Hub

Wi-Fi Compatibility: 2.4GHz only

Key Features:

Voice control integration
Mobile app control
Compatible with 100+ smart devices
Energy monitoring

---- SmartHome Hub X1 Wi-Fi Connection Issues:

Ensure your smartphone is connected to a 2.4GHz Wi-Fi network.
Locate the reset pinhole on the back of the device.
Use a paperclip to press and hold the reset button for 10 seconds until the LED flashes blue.
Open the SmartHome app and select 'Add New Device'.
Choose 'SmartHome Hub X1' from the list.
Enter your Wi-Fi password when prompted.
Wait for the connection process to complete (LED will turn solid green when successful).

---- SmartHome Hub X1 Device Not Responding:

Check if the power cable is securely connected.
Verify that your Wi-Fi network is functioning properly.
Restart the hub by unplugging it for 30 seconds, then plugging it back in.
If issues persist, perform a factory reset using the pinhole button.

---- ConnectTech METADATA

Category: Smart Home Hub

Wi-Fi Compatibility: Dual-band (2.4GHz and 5GHz)

Key Features:

Alexa and Google Assistant integration
Z-Wave and Zigbee compatible
IFTTT support
Advanced automation rules

---- ConnectTech Wi-Fi Connection Issues:

Ensure your smartphone is connected to your home Wi-Fi network.
Press and hold the 'Connect' button on top of the device for 5 seconds until the LED blinks white.
Open the ConnectHome app and tap 'Add Device'.
Select 'ConnectHome Central 2000' from the list.
Choose your preferred Wi-Fi network (2.4GHz or 5GHz) and enter the password.
Wait for the connection process to complete (LED will turn solid blue when successful).

---- ConnectTech Device Pairing Issues:

Put your smart device into pairing mode (refer to device manual).
In the ConnectHome app, select 'Add Device' and choose the device type.
Follow the in-app instructions for your specific device.
If pairing fails, move the device closer to the hub and try again.
For stubborn devices, try resetting them to factory settings before pairing.

---- SyncTech Solutions METADATA

Category: Smart Home Hub

Wi-Fi Compatibility: 2.4GHz only

Key Features:

Local processing for faster response
Customizable automation rules
Open API for developers
Energy usage insights

---- SyncTech Solutions Wi-Fi Connection Issues:

Connect your smartphone to your 2.4GHz Wi-Fi network.
Power on the HomeSync Controller 500.
Wait for the LED to blink green, indicating it's ready to connect.
Open the HomeSync app and tap the '+' icon.
Select 'Add Hub' and choose 'HomeSync Controller 500'.
Follow the on-screen instructions to enter your Wi-Fi credentials.
The LED will turn solid green once successfully connected.

---- SyncTech Solutions Automation Issues:

Open the HomeSync app and go to the 'Automations' tab.
Review each automation rule to ensure all devices are online.
Check that trigger conditions and actions are correctly set.
Try deleting problematic rules and recreating them.
If issues persist, restart the hub by unplugging it for 10 seconds.

The post Real-Time AI Assistance for Call Center Agents appeared first on Symbl.ai.

]]>
Transform Multimodal Interaction Data with Symbl and Snowflake

In today’s business landscape, data is the fuel that drives success. However, while structured data from CRM systems, transactions, and performance metrics are often well-managed, the wealth of unstructured interaction data remains largely untapped. How can organizations turn conversations into powerful insights and actions? By combining Symbl.ai’s contextual AI with Snowflake’s data capabilities, organizations can elevate their analytics, empower search-driven insights, and build agentic workflows that drive impactful business outcomes.

This blog post will dive into three core capabilities enabled by integrating Symbl.ai with Snowflake: Search, Data Enrichment, and Centralized Agentic Workflows.

1. Search: Making Interaction Data Actionable

Interaction data often holds critical information buried in conversations—whether they are customer calls, sales discussions, or internal meetings. Extracting valuable insights from this data requires more than basic transcription. With Symbl.ai’s AI-driven insights, unstructured interaction data becomes searchable, actionable, and a true source of business intelligence.

Using Symbl.ai, businesses can turn conversational data into structured insights such as action items, key questions, and topics discussed. When integrated with Snowflake, these insights become part of the broader data ecosystem, enabling powerful, context-aware search capabilities that make finding the right information quick and efficient.

Cortex Search: Powered by Purpose-Built Models and Custom AI

Cortex Search, utilizing advanced language models, offers semantic search functionality that enables users to look for specific interaction data with precision and context. Businesses have the option to leverage purpose-built models from Symbl.ai that are ready to deliver immediate value, as well as train custom models to fit their unique needs. This flexibility provides the foundation for tailored interaction intelligence, ensuring that businesses can achieve both rapid deployment and customized insights. Imagine being able to search for specific action items discussed in sales calls, identify customer objections across support conversations, or instantly find questions asked in product demos—all of these are made possible by combining Symbl.ai’s insights with Snowflake’s data management capabilities. 

2. Enriching Existing Analytics: Supercharging Data with Interaction Insights and Industry-Specific KPIs

Most organizations already have structured data from CRM systems, sales performance metrics, and customer history. However, these data sources only tell half the story. Symbl.ai provides a new layer of contextual insights by extracting information from conversations—such as call sentiment, engagement levels, and action items—and enriching existing analytics in Snowflake. This enrichment adds depth and context, allowing for a much more comprehensive analysis of business performance.

Adding a New Layer of Richness to Business Analytics and Understanding Key Industry KPIs

Consider customer journey analytics, which traditionally relies on CRM and transaction data. Different industries have specific KPIs that can be enriched by interaction data:

  • Financial services: customer satisfaction, cross-sell rate, and issue resolution time can be enhanced by analyzing customer conversations.
  • Healthcare: metrics such as patient satisfaction and care quality can be improved through insights derived from patient interactions.
  • Retail: customer loyalty, conversion rate, and average order value can all be enriched with sentiment and engagement data.
  • Telecommunications: customer retention, net promoter score (NPS), and average handling time can benefit significantly from deep insights into customer conversations.

By incorporating sentiment trends from customer conversations into these analytics, organizations can better understand the emotional trajectory of their customers’ journey, insight that can help in personalizing the customer experience and addressing pain points more effectively.

Take the example of a sales call. Symbl.ai can provide metrics like call scoring that evaluates how well a sales rep is performing based on criteria such as communication, engagement, and adherence to the sales process. In retail and consumer goods, call scoring can also track product preferences and sentiment trends, helping sales teams to better tailor offers and promotions. When these scores are combined with CRM data in Snowflake, it gives sales managers a complete view of not just sales outcomes but also what drives those outcomes—leading to better training, more effective coaching, and ultimately improved sales performance.

Real-Time Data Enrichment

With Symbl.ai’s programmable APIs, the process of enriching data can be automated to deliver real-time insights to Snowflake. This means that enriched analytics dashboards are always up to date, providing decision-makers with the most current information available. Imagine a dynamic sales dashboard that not only displays the numbers but also reveals the sentiment and effectiveness of each sales interaction—allowing leaders to take immediate action where needed.
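
As a concrete illustration of what such a pipeline can look like, the sketch below pulls the topics detected for a processed conversation from Symbl’s Conversations API and writes them into a Snowflake table using the snowflake-connector-python package. The table and column names, credentials, and schema are hypothetical placeholders rather than defaults of either product.

# Illustrative sketch only: push Symbl-detected topics for one conversation into Snowflake.
# Assumes a processed Symbl conversation ID, a valid Symbl access token, and a Snowflake
# account reachable via snowflake-connector-python. INTERACTION_TOPICS is a hypothetical table.
import requests
import snowflake.connector

SYMBL_TOKEN = "YOUR_SYMBL_ACCESS_TOKEN"
CONVERSATION_ID = "YOUR_CONVERSATION_ID"

# 1. Fetch topics for the conversation from Symbl's Conversations API.
topics = requests.get(
    f"https://api.symbl.ai/v1/conversations/{CONVERSATION_ID}/topics",
    headers={"Authorization": f"Bearer {SYMBL_TOKEN}"},
).json().get("topics", [])

# 2. Write them into Snowflake so they sit alongside existing CRM and analytics data.
conn = snowflake.connector.connect(
    user="YOUR_USER", password="YOUR_PASSWORD", account="YOUR_ACCOUNT",
    warehouse="YOUR_WAREHOUSE", database="YOUR_DATABASE", schema="YOUR_SCHEMA",
)
cur = conn.cursor()
for topic in topics:
    cur.execute(
        "INSERT INTO INTERACTION_TOPICS (CONVERSATION_ID, TOPIC, SCORE) VALUES (%s, %s, %s)",
        (CONVERSATION_ID, topic.get("text"), topic.get("score")),
    )
cur.close()
conn.close()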

3. Centralized Backbone for New Agentic Workflows

Beyond enriching existing analytics, the integration of Symbl.ai with Snowflake lays the foundation for a centralized backbone that supports new agentic workflows. These workflows are proactive, contextually aware, and driven by real-time insights from interaction data.

What Are Agentic Workflows?

Agentic workflows are workflows that respond intelligently to business events, guided by the data at their disposal. By centralizing interaction insights from Symbl.ai in Snowflake, businesses can build workflows that do more than react—they anticipate, adapt, and act based on real-time information.

Filling Knowledge Gaps with Real-Time Insights

Symbl.ai’s real-time streaming APIs allow organizations to process data from live interactions and use it to fill gaps in the knowledge base that powers customer service and internal decision-making. Imagine a scenario where a customer support team is frequently asked questions that are not well documented in the existing knowledge base. With Symbl.ai providing real-time insights, these gaps can be identified and filled immediately, ensuring that the knowledge base evolves dynamically to meet customer needs.

In addition, Symbl.ai’s capabilities can support Retrieval-Augmented Generation (RAG) implementations that provide enhanced context to customer interactions. By continuously feeding new, relevant information into Snowflake, these RAG models can deliver more accurate and context-driven responses, empowering both AI-driven agents and human support teams with the latest information.

By using Snowflake as a centralized repository, these insights can be stored, analyzed, and used to continuously improve the performance of these knowledge-based workflows. The data stored in Snowflake serves as the backbone that makes these workflows more intelligent and effective over time.

Connectors for Seamless Integration

Symbl.ai offers connectors that integrate interaction data across various communication platforms, such as Genesys, Five9, and CPaaS systems. These connectors make it easy to aggregate and analyze interaction data from different sources into a centralized Snowflake repository. This means organizations can maintain a unified view of all customer interactions and use it to drive consistent and informed agentic workflows.

For example, a financial services customer support workflow could be enhanced to automatically identify recurring issues and notify the product team for resolution, thus improving customer satisfaction and reducing churn. In telecommunications, workflows can help identify common network issues from customer interactions and drive faster resolution times. Similarly, marketing workflows in the retail sector can use interaction data to tailor campaigns more effectively based on the topics and sentiments observed during customer conversations, directly impacting campaign ROI and customer engagement.

Bringing It All Together: A Modern Interaction Intelligence Ecosystem

The integration of Symbl.ai with Snowflake enables organizations to build a powerful ecosystem for interaction intelligence, centered on three key capabilities:

  1. Search: Make unstructured interaction data easily discoverable and actionable through advanced, context-aware search.
  2. Data Enrichment: Enhance existing business analytics with rich, contextual insights from interaction data, driving deeper understanding and more informed decision-making.
  3. Centralized Agentic Workflows: Establish a centralized backbone for real-time, proactive workflows that respond intelligently to customer needs, enhancing both operational efficiency and customer experience.

Next Steps: Building Your Future with Symbl.ai and Snowflake

Integrating Symbl.ai and Snowflake provides a scalable, future-proof solution that enables organizations to harness the full potential of their interaction data. From powerful search capabilities and data enrichment to real-time, agentic workflows, this partnership offers the tools needed to transform raw data into actionable business outcomes.

Check out our step-by-step guide on how to build a sample real-time performance dashboard for the sales team by processing call recordings and transcripts from your existing meeting platforms, augmented with Salesforce data.

Start building today here.

Accelerate Time to Value for Real-Time Assistance with Symbl.ai’s Real-Time Assist API

We’re excited to introduce Symbl.ai’s Real-Time Assist API, the latest addition to our suite of generative APIs. The API empowers enterprises to bring in advanced real-time AI capabilities – such as contextual guidance for handling objections, flagging compliance violations, and keeping agents on script – without extensive development and operational overhead. Enterprises can now reduce time-to-value for live assistance from months to days, enabling faster go-to-market without compromising quality. 

The Challenge

For enterprises looking to build effective real-time agent assistance capabilities, the journey has been fraught with complexity, long development cycles, and resource-intensive maintenance.

Fragmented Technology Stack

Creating a real-time assistance solution typically involves manually integrating various components—Automatic Speech Recognition (ASR) systems, event detection models, knowledge bases, and Large Language Models (LLMs)—into a unified pipeline. Each of these components demands time-consuming evaluation and benchmarking, leading to extended setup times and a high risk of misalignment and failure. Additionally, building low-latency, scalable solutions with a fragmented technology stack is challenging. Integrating systems from different vendors can increase processing times and latency in delivering real-time insights. This delay hampers agents’ ability to respond swiftly, ultimately reducing the effectiveness of real-time assistance. As the volume of live calls rises, each component must be optimized for high traffic, which can create performance bottlenecks during peak demand.

Delayed Time-to-Market

Developing a reliable real-time assistance system could take enterprises 4-6 months, significantly delaying the time to market. This lag in deployment also led to delayed returns on investment, as teams struggled to build systems that could accurately detect events like objections, compliance violations, or deviations from scripts and generate real-time, contextually relevant responses.

Maintenance Overhead

Once the real-time assistance system was live, enterprises faced continuous maintenance challenges. Ensuring the system stayed up-to-date with new objections, compliance issues, and business priorities required ongoing manual intervention, making the entire process both expensive and resource-intensive. In some cases, teams had to rebuild or reconfigure key components, further adding to the operational burden.

The Symbl.ai Solution

Symbl.ai’s Real-Time Assist API addresses these challenges by offering an end-to-end, fully integrated solution that simplifies real-time agent assistance while reducing setup time.

Unified Platform for Real-Time Guidance

With Symbl.ai’s API, there’s no need to manually stitch together different technologies. Symbl.ai integrates ASR, event detection, and real-time response generation into a single API. The system automatically detects critical events—such as objections or compliance risks—during live calls and generates contextually relevant responses using your business’s knowledge base. Built for high-performance environments, Symbl.ai’s Real-Time Assist API ensures low-latency responses, guaranteeing fast, accurate assistance during live interactions, even at scale. Enterprises no longer need to spend months evaluating, integrating, and optimizing disparate tools.

Accelerated Time-to-Value

Symbl.ai’s Real-Time Assist API drastically cuts down the time to deploy real-time assistance, allowing enterprises to go live in days rather than months. The API comes with pre-built event detection capabilities and customizable objection types, ensuring that your teams get up and running quickly while maintaining full flexibility to tailor responses based on your business needs.

Minimal Maintenance with Continuous Improvement

Symbl.ai’s solution is designed to adapt as your business evolves. The API continuously learns from new call data, surfacing new objections and compliance issues so that appropriate guidance can be developed for agents. This ensures that real-time assistance stays relevant and up-to-date. Enterprises can focus on strategic initiatives while Symbl.ai manages the complexity of real-time assistance behind the scenes.

Key Features

  • Objection Handling: When a customer raises an objection—whether about pricing, product fit, or competitors—the API instantly provides the agent with relevant data and guidance, increasing the likelihood of a positive outcome.
  • Script Adherence: The API ensures agents follow approved scripts, guiding them back on track if they deviate and guaranteeing a consistent, high-quality experience across conversations.
  • Compliance Monitoring: Symbl.ai monitors calls in real time, prompting agents to deliver necessary disclosures or follow industry regulations, ensuring full compliance.
  • Real-Time Q&A: When customers ask detailed or technical questions, the API provides agents with accurate answers drawn directly from your company’s knowledge base, eliminating delays and enhancing customer satisfaction.

Note: In this initial version, the Real-Time Assist API only supports objection handling. Additional features will be made available in future releases.

How It Works

The following components of Symbl.ai’s Real-Time Assist API work together to deliver real-time guidance during live conversations:

  • Real-Time Assist User: User who interacts with different components of the API
  • Management API: Handles the configuration of essential elements such as Real-Time Assist (RTA) instances, assistants, and context. This ensures Real-Time Assist is tailored to the specific needs of the organization.
  • Real-Time API: Facilitates the live streaming of audio and video from your application to Symbl.ai. This API manages the start and stop requests for streaming, processing the data in real-time to generate contextually relevant responses and guidance based on the assistants configured for an RTA.
  • Customer Knowledge Base: Provides the necessary data and information that the API references during live interactions to ensure responses are based on organizational context and knowledge. This is where you can add product related information or any necessary information for the assistants you add.
  • Large Language Model (LLM): Generates assistant responses by combining the context from the call and the knowledge base. 
  • Conversation API: Provides post call analytics related to different assistants such as objection handling, Q&A, compliance, script adherence, and communication style.

Get Started

Getting started with the Real-Time Assist API is straightforward and involves two steps:

  1. Setup RTA:
    • Select the assistant type (e.g., objection handling) and specify the objections for the system to address in real time.
    • Provide context for handling objections, such as FAQs, sales playbooks, or product information as text documents.
  2. Stream Live Conversation:
    • Stream your live conversation to the Real-Time API via WebSockets and receive instant assistance for the configured assistants (a minimal sketch follows this list).
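
To illustrate the second step, here is a minimal sketch of streaming audio over a WebSocket from Python. The endpoint URL, query parameter, and message fields below are hypothetical placeholders rather than the documented contract; refer to the Real-Time Assist API documentation for the exact connection details.

# Illustrative sketch only: the endpoint URL, query parameter, and message fields are
# hypothetical placeholders, not the documented contract of the Real-Time Assist API.
import asyncio
import json
import websockets

RTA_WSS_URL = "wss://example.symbl.ai/realtime-assist"  # placeholder endpoint
ACCESS_TOKEN = "<your_access_token>"

async def stream_call_audio(audio_chunks):
    async with websockets.connect(f"{RTA_WSS_URL}?access_token={ACCESS_TOKEN}") as ws:
        # Describe the configured RTA/assistants to the service (placeholder fields).
        await ws.send(json.dumps({"type": "start_request", "rtaId": "<your_rta_id>"}))

        # Stream raw audio as binary frames as it is captured.
        for chunk in audio_chunks:
            await ws.send(chunk)

        await ws.send(json.dumps({"type": "stop_request"}))

        # Read assistance messages pushed by the server until it closes the connection.
        async for message in ws:
            print("Assist message:", message)

# Replace the placeholders above, then stream your captured audio chunks, e.g.:
# asyncio.run(stream_call_audio(captured_audio_chunks))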

To learn more, refer to our documentation.

Schedule a demo today to see how Symbl.ai can accelerate your journey to integrating real-time assistance into your platform, driving customer satisfaction and enhancing agent productivity. 

The post Accelerate Time to Value for Real-Time Assistance with Symbl.ai’s Real-Time Assist API appeared first on Symbl.ai.

Open-Source vs Closed-Source LLMs: Which is the Best For Your Organization? https://symbl.ai/developers/blog/open-source-vs-closed-source-llms-which-is-the-best-for-your-organization/ Fri, 04 Oct 2024 17:59:59 +0000 https://symbl.ai/?p=33314 Even in their relative infancy, Large Language Models (LLMs) have revolutionized the way organizations approach their work. Boosting productivity and excelling at tasks such as question-answering, sentiment analysis, text synthesis and summarization, LLMs can be applied to a growing number of use cases that enable companies to provide superior products and services – while saving […]

Even in their relative infancy, Large Language Models (LLMs) have revolutionized the way organizations approach their work. Boosting productivity and excelling at tasks such as question-answering, sentiment analysis, text synthesis and summarization, LLMs can be applied to a growing number of use cases that enable companies to provide superior products and services – while saving time and money. As LLM-powered applications become more integral to a company’s ability to keep pace in an increasingly competitive landscape, a crucial decision faced by organizations is choosing between an open-source or closed-source model for their projects. 

In this post, we provide a comprehensive comparison of open-source and closed-source LLMs, detailing the differences between them, their respective pros and cons, and, most importantly,  how to determine which is best for your organization. 


Understanding LLMs

LLMs are deep learning models that have been trained on vast amounts of textual data to learn the relationships and patterns within human languages. As a result, they're able to predict the next word (or, more accurately, the next token, i.e., roughly ¾ of a word) in a sequence, enabling them to understand and, subsequently, generate text. This grants LLMs a range of powerful natural language understanding (NLU) and processing (NLP) capabilities, including question-answering (QA), machine translation, and document analysis, among others, that have led to the development of increasingly powerful AI applications.

Although LLMs have their roots in the 1960s, with the development of the ELIZA model, and have been the subject of consistent research since the late 1990s, it was the introduction of the Transformer architecture in 2017 that ushered in the sophisticated LLMs we have today. Google's BERT followed in 2018, adding bidirectional representations (reading sequences both left-to-right and right-to-left), which allowed models to be trained on larger datasets in less time and with greater stability. OpenAI's Generative Pre-Trained Transformer (GPT), used in popular applications such as ChatGPT, builds on the same Transformer foundations.

Open Source LLMs

Open-source LLMs are characterized by their source code being publicly available, allowing anyone to use, modify, and distribute them. Though usually initially developed by a small team – or even an individual – open-source models are improved upon collaboratively by their communities, facilitating innovation and increased understanding. 

Use cases for open-source LLMs include: 

  • Customized Solutions: projects that require tailored solutions or where the organization needs to retain complete control of the model. Consequently, they are well-suited for solutions that use private data, as companies can still track the flow of data within the system. 
  • Educational and Research: their accessibility and transparency make them ideal for academic and research purposes, where practitioners can deeply explore their inner workings and push their capabilities. 

Examples of popular open-source LLMs include: 

  • Llama series (2, 3)
  • Mistral series (Mistral 7B, Mixtral 8x7B, etc.)
  • Falcon 180B
  • Grok-1
  • MPT series (7B, 30B) 
  • BLOOM (hosted on Hugging Face)  

The Pros and Cons of Open-Source LLMs

Let us explore the benefits and drawbacks of open-source language models. 

Pros

  • Control: organizations can retain greater control over their models, enabling the use of sensitive data without fear of how it could be used by vendors and, consequently, running into non-compliance issues.
  • Transparency: similarly, open-source LLMs provide greater transparency, giving companies greater insight into how a model works and arrives at its outputs. Additionally, the accessibility of an open-source model’s code enables thorough security audits to ensure potential security or ethical issues can be identified and remediated quickly.
  • Community Support and Collaboration: the global communities of widely-used open-source LLMs, composed of researchers and developers, are very engaged and contribute to their maintenance and development. 
  • Cost-Effectiveness: open-source LLMs are free to use, at least at smaller scales, which significantly reduces the barriers to entry for AI adoption.

Cons

  • Less Secure: because open-source LLMs are collectively overseen, they can be subject to less stringent security testing and fewer updates than closed models, for which there is clearer ownership and responsibility.
  • Stability: as they receive less consistent maintenance and have less rigorous QA standards, open-source LLMs can exhibit instability. Their developers also tend to have fewer resources to make models as robust as possible. 
  • Integration Challenges: open-source LLMs often offer fewer integrations with existing tools and platforms, and a lack of standardized APIs can lead to compatibility issues.

Closed Source LLMs

Closed-source LLMs are proprietary models developed and maintained by private vendors. In contrast to open-source models, their source code is not publicly accessible, and their usage typically requires a licensing fee.

Use cases for closed-source LLMs include: 

  • Commercial and Enterprise Solutions: companies often prefer closed-source models because of their stability, security, and vendor support – all of which are essential for enterprise-level applications.
  • Industry-Specific Applications: closed-source models can be optimized for specific industries, such as healthcare or finance, or specific tasks, such as conversational analysis or educational support, to offer specialized functionality that boosts productivity and efficiency in a particular field.

Examples of popular closed-source LLMs include: 

  • OpenAI GPT series (3.5, 4, 4o, o1)
  • Google Gemini 
  • Anthropic Claude series (2, 3, 3.5)  
  • Command R
  • Nebula

Pros and Cons of Closed-Source LLMs

Let us now turn our attention to assessing the advantages and disadvantages of closed-source language models. 

Pros

  • More Performant and Robust: closed LLMs are developed and supported by a dedicated team of experts with the considerable computational resources required to build and support large models with vast capabilities. They are also governed by policies and controls that ensure they are extensively tested and refined to make them as secure and stable as possible. 
  • Proprietary Innovations: similarly, as closed-source LLM vendors possess greater computational and personnel resources, their models often feature cutting-edge capabilities and optimizations that are not yet available in open-source alternatives.
  • Simpler Integration: closed LLMs usually require minimal configuration, with many being effectively "plug-and-play". This considerably lowers the barriers to adoption by making AI technology available to companies that lack the required in-house technical knowledge. 
  • Consistency and Support: vendors provide dedicated support for their models, ensuring consistent performance and reliable troubleshooting – including detailed API documentation. Developers regularly release updates for their closed-source models to enhance performance and to fix bugs and security vulnerabilities.

Cons

  • Cost: companies are typically required to pay a licensing fee to access a closed-source LLM, adding to initial setup costs. 
  • Data privacy: many closed-source LLM vendors may use the data you enter into their models for future training and research purposes, as stipulated in their privacy agreements. This raises privacy concerns for companies, as they won't be able to account for the location and security of sensitive data – potentially leading to non-compliance with data privacy legislation. 
  • Limited flexibility and transparency: limited access to the model’s architecture and training data makes them less suitable for experimentation and research. This also makes it challenging for users to fully understand why the model generates certain output and how to improve it.

Open Source vs Closed Source LLMs: A Comparative Analysis

With a better understanding of both types of models, let us compare them across several aspects to help you determine which is the best fit for your project. 

Cost Comparison

Although open-source LLMs can be more cost-effective, as you avoid licensing fees, their total cost of ownership increases when you factor in the need for in-house personnel for setup and maintenance, as well as additional infrastructure, whether on-premises or cloud-based. Closed-source models, while incurring ongoing usage costs, don't require infrastructure upgrades and typically come with comprehensive support and maintenance, reducing operational overhead while providing predictable expenses.

Flexibility and Customization

Open-source LLMs are more flexible, which allows for extensive customization for specific use cases. Their greater transparency and control also better enable the use of private data. Closed-source LLMs are less transparent, by definition, and while some offer some fine-tuning capabilities, they are less customizable than open-source models. 

Security Considerations

Closed-source models are typically more secure, as only a few authorized individuals have access to the codebase and they are regularly updated. The code for open-source models, in contrast, is publicly available, so malicious actors can more easily identify vulnerabilities. Also, security updates are contributed by the model's development community, so they tend to be less frequent and less consistent than those for closed-source models. 

Performance and Support

Closed-source models are usually more performant, as they're backed by vendors with more resources at their disposal. They also come with dedicated support, which is especially useful for organizations without the expertise to maintain the model. Conversely, open-source models tend to be less stable, as they're subject to fewer compliance requirements and safeguards. Meanwhile, support is provided by the model's community through documentation, forums, etc., and is less structured than direct vendor support.

Conclusion

Ultimately, choosing between an open-source or closed-source LLM for your organization’s digital transformation projects depends on your particular needs and available resources. While open-source models offer the flexibility and cost savings ideal for smaller tailored projects and research, closed models provide the reliability and support vital for production-grade applications. It’s important to assess your requirements carefully to select the type of model that will best support your organization’s objectives and align with your digital roadmap. 

The post Open-Source vs Closed-Source LLMs: Which is the Best For Your Organization? appeared first on Symbl.ai.

Streaming Databases for Building Real-Time GenAI Applications https://symbl.ai/developers/blog/streaming-databases-for-building-real-time-genai-applications/ Fri, 04 Oct 2024 17:52:52 +0000 https://symbl.ai/?p=33310 The advent of Generative AI (GenAI) has revolutionized an array of industries by ushering in applications capable of generating human-like text, images, audio, and increasingly more. However, to develop reliable, real-time GenAI applications, it is not only the size and capabilities of the underlying model that are important but also how quickly and efficiently your […]

The advent of Generative AI (GenAI) has revolutionized an array of industries by ushering in applications capable of generating human-like text, images, audio, and increasingly more. However, to develop reliable, real-time GenAI applications, it is not only the size and capabilities of the underlying model that are important but also how quickly and efficiently your application can process data. This is where streaming databases come into play. 

This post explores the role of streaming databases in building real-time GenAI applications, providing insights into how they work and their benefits, as well as an overview of the top data streaming solutions and how to select the most suitable option for your GenAI application.

Understanding Streaming Databases

Streaming databases are designed to process continuous data streams in real time as they are generated from a source, such as an application. This is in contrast with a traditional relational database management system (RDBMS), in which static data is typically processed in batches at regular intervals, i.e., batch scheduling. 

Initially developed for the financial industry, to handle the high-velocity data associated with stock trading and fraud detection, streaming databases have evolved to support a wide range of real-time applications across all industries. By collecting, processing, and analyzing data as soon as it’s available, streaming databases enable immediate actions and insights within applications and systems.

As opposed to a specific type of database, the term “streaming database” actually refers to several types of databases that process streaming data in real-time, including in-memory, NoSQL, and time-series databases. The core capabilities of a streaming database include:

  • Data Streaming: the ability to process a continuous flow of data generated from various sources, known as producers. As producers generate data, the streaming database processes it and delivers it to endpoints, referred to as consumers.
  • Event-Driven Processing: instead of querying sources for new data, streaming databases listen for predefined events, such as data being added or changed, which trigger processing. This is essential for time-critical applications for which intermittent batch processing isn't feasible (see the sketch after this list).  
  • Real-Time Analytics: streaming databases allow for the instant analysis of live data, which enables faster, data-driven decision-making and the ability to deliver better products and services. 
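
To make the consumer side of this model concrete, the sketch below uses the kafka-python client to listen for new events on a topic and react to each one as it arrives, rather than polling on a schedule. The broker address, topic name, and handler are placeholders for illustration.

# Minimal event-driven consumer sketch using kafka-python (illustrative only).
# The broker address, topic name, and handler below are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "conversation-events",                      # hypothetical topic
    bootstrap_servers=["localhost:9092"],       # hypothetical broker
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",                 # only react to new events
)

def handle_event(event):
    # Placeholder: trigger downstream processing, e.g., refresh context for a GenAI app.
    print("New event received:", event)

# The consumer blocks and yields messages as soon as producers publish them.
for message in consumer:
    handle_event(message.value)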

The Benefits of Streaming Databases for GenAI Applications

Here are some of the key advantages that streaming databases bring to GenAI applications:

  • Real-Time Data Processing: streaming databases enable real-time data processing, which is essential for GenAI applications that require the most up-to-date information to perform effectively. Real-time chatbots and recommendation systems, for instance, benefit from streaming databases as they grant them access to the most current available information – resulting in superior user experiences.
  • Scalability and Performance: streaming databases are designed to process large volumes of continuous data with minimal latency by distributing processing tasks across multiple nodes. This enables horizontal scaling as data loads increase, which is desirable for GenAI applications that need to process vast amounts of data efficiently.
  • Easy Integration with AI Tools: GenAI platforms integrate seamlessly with streaming databases for efficient end-to-end application development. As well as enhancing the performance of GenAI applications, streaming databases support the continuous training and updating of AI models, by providing real-time access to continuously updated datasets, helping to improve their accuracy and capabilities.

Top Streaming Databases for Real-Time GenAI Applications

Let us turn our attention to looking at the top streaming data platforms, considering open-source, source-available, and closed options. 

Open-Source Streaming Databases: as they’re free to use (to a certain extent, in some cases), you might opt for an open-source platform if you need to minimize costs. Alternatively, if customization and control are required for your project, the transparency and access to the code provided by open-source databases make them a great fit. 

  • Apache Kafka: a widely-used data streaming platform known for its high throughput, fault tolerance, and scalability, making it ideal for building enterprise-level streaming applications.
  • RisingWave: well-suited for GenAI applications with its real-time SQL-based analytics and cloud-native scalability, enabling dynamic data exploration and interactive generation tasks.
  • Arroyo: provides low-latency, fault-tolerant processing, crucial for real-time adjustments and instant responses in GenAI applications that require real-time content generation and refinement.

Source-Available Streaming Databases: the licensing agreements for source-available streaming databases sit between open and closed-source to various degrees, with their vendors placing certain restrictions on how you can use their database, e.g., the number of users. 

  • KsqlDB: Built on Kafka Streams, it provides powerful real-time SQL queries on streaming data, making it effective at interactive data manipulation and dynamic content generation.
  • Materialize: provides instant materialized views with minimal latency, facilitating real-time data insights and immediate feedback, essential for interactive GenAI applications that rely on up-to-date data.
  • EventStoreDB: specializes in event sourcing with strong consistency, allowing for efficient handling of event-driven GenAI applications – particularly when tracking and managing complex event histories.

Closed-Source Streaming Databases: when stability and security are of paramount importance, as in production environments, then a closed-source streaming database is most suitable. Another reason to go the closed-source route is if support is essential, as the vendor takes responsibility – as opposed to it being a community effort, as with open-source solutions. 

  • Timeplus: excels in time-series analysis with advanced queries and visualizations, which is ideal for GenAI applications that involve real-time monitoring and dynamic content generation based on temporal data.
  • DeltaStream: offers seamless real-time data transformation and rapid pipeline deployment, which is essential for GenAI applications that require continuous and dynamic data processing.

Considerations for Selecting a Streaming Database

Here are the main aspects to consider when choosing a streaming database for your GenAI application:

  • Performance Metrics: evaluate metrics such as latency, throughput, and processing speed to ensure your chosen database platform meets the performance demands of your application.
  • Scalability: how well the database handles increasing data volumes and concurrent users without compromising performance. This includes its support for vertical, horizontal, and elastic, i.e., automatic, scaling. 
  • Ease of Use and Integration: how intuitive the streaming database’s user interface is, as well as how easily it integrates with the existing tools and platforms within your application’s ecosystem. 
  • Cost Considerations: calculate the initial setup and ongoing operational costs,  including licensing fees, infrastructure, and maintenance. It’s also important to factor in the potential hidden expenses associated with managing open-source solutions, such as support costs.

Conclusion

By providing continuous, event-driven data processing, streaming databases play a crucial role in the development of performant real-time GenAI applications. Their scaling capabilities ensure performance isn’t compromised as your application grows and their ease of integration incurs a low technical overhead when adding them to your IT ecosystem. 

As organizations find new and innovative ways to integrate GenAI into their operations, the use of streaming databases is sure to become increasingly common. We encourage you to explore the concepts and solutions from this post further, so you can determine how streaming databases can improve the efficacy of your real-time GenAI applications.

The post Streaming Databases for Building Real-Time GenAI Applications appeared first on Symbl.ai.

How to Fine-Tune GPT on Conversational Data https://symbl.ai/developers/blog/how-to-fine-tune-gpt-on-conversational-data/ Thu, 08 Aug 2024 17:51:31 +0000 https://symbl.ai/?p=33010 While AI has been around in various forms for decades, and has had mainstream applications, such as chatbots and virtual assistants (e.g., Alexa), it was ChatGPT that undoubtedly sparked the AI revolution we are currently in. With the most recent estimates placing its user base at over 180 million people, ChatGPT is not only the […]

While AI has been around in various forms for decades, and has had mainstream applications, such as chatbots and virtual assistants (e.g., Alexa), it was ChatGPT that undoubtedly sparked the AI revolution we are currently in. With the most recent estimates placing its user base at over 180 million people, ChatGPT is not only the most popular AI application but one of the most popular applications overall  – and the fastest-growing consumer application in history. 

However, despite its immense capabilities and advantages, ChatGPT, or more specifically GPT, the large language model (LLM) that powers the application, has a few limitations – particularly when it comes to commercial use. 

Firstly, it lacks specialized knowledge. Naturally, this is understandable: GPT, which stands for Generative Pre-trained Transformer, can't be expected to know everything, especially when the overall field of human knowledge is growing so rapidly. But also, in a more practical sense, GPT has a cutoff based on when its training process ended, e.g., the latest GPT-4o models' knowledge ends in October 2023. 

Secondly, and more importantly, there are limitations around the use of private and/or proprietary data. On one hand, there's no guarantee that GPT will understand an organization's distinct data formats or the nature of requests made by users – resulting in diminished efficacy at more specialized tasks. On the other hand, there's the problem of OpenAI using the sensitive data fed into GPT to train future models. Consequently, companies that enter private data into GPT may be unintentionally sharing sensitive information – making them non-compliant with data privacy laws.

Despite these challenges, as organizations become increasingly aware of the productivity-enhancing and cost-saving potential of generative AI, they are motivated to integrate LLMs like GPT with their distinct workflows and proprietary and private data. This is where LLM fine-tuning comes in. 

Fine-tuning is the process of taking a pre-trained base LLM and further training it on a specialized dataset for a particular task or knowledge domain. The pre-training stage involves inputting vast amounts of unstructured data from various sources into an LLM. Fine-tuning an LLM, conversely, requires a smaller, better-curated, and labeled domain- or task-specific dataset.

With all this in mind, this post takes you through the process of fine-tuning GPT for conversational data. We will detail how to access OpenAI’s interface, load the appropriate dataset, fine-tune your choice of model, monitor its progress, and make improvements, if necessary. 

How to Fine-Tune GPT on Conversational Data: Step-by-Step

Set Up Development Environment

First, you need to prepare your development environment accordingly by installing the OpenAI software development kit (SDK). We will be using the Python version of the SDK for our code examples, but it is also available in Node.js and .NET. We are also installing python-dotenv, which we will need to load our environment variables. 

pip install openai python-dotenv

# For Python Version 3 and above

pip3 install openai python-dotenv


From there, you can import the OpenAI class as shown below. This will return a client object that is used to access the OpenAI interface and that acts as a wrapper around calls to its various APIs. 

import os

from dotenv import load_dotenv
from openai import OpenAI

# Load the variables defined in the .env file into the environment.
load_dotenv()

client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],
)


To access OpenAI's APIs, you'll also need an API key, which you can obtain by signing up for its developer platform. Above, we load the contents of a .env file (like the one shown below) into the environment with python-dotenv and then read the API key via os.environ.

# .env 

OPENAI_API_KEY=your_openai_api_key


Choose a Model to Fine-Tune

With your environment established, you need to choose the model you want to fine-tune; OpenAI currently offers the following models for fine-tuning: 

  • gpt-4o-mini-2024-07-18
  • gpt-3.5-turbo
  • davinci-002
  • babbage-002

When looking at OpenAI's pricing, you can see that the newest model, gpt-4o-mini, is the second-lowest priced after babbage-002 – despite being the most current model with the largest context length. This is because gpt-4o-mini is a scaled-down version of GPT with fewer parameters; this results in a lower computational load and, consequently, lower costs. In contrast, gpt-3.5-turbo and davinci-002 are larger models with a greater number of parameters and a more complex architecture – hence their higher training prices. Ultimately, your model of choice will depend on the specific needs of your conversational use case – and your allotted budget. 
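
If you are unsure which model identifiers your account can access, you can list them programmatically through the SDK's models endpoint before creating a fine-tuning job; a quick check might look like this:

# List the model identifiers available to your account.
for model in client.models.list():
    print(model.id)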

Prepare the Datasets

Having set up your environment and decided on the model you want to fine-tune, next comes the vital step of preparing your fine-tuning data. For this example, we’re going to use the Anthropic_HH_Golden dataset, hosted on HuggingFace, which is an excellent resource for downloading datasets, as well as all other aspects of AI application development. 

The Anthropic dataset is suitable because it contains a large variety of conversational data to use in our fine-tuning use case. OpenAI expects fine-tuning data as JSONL; for the legacy completions models (babbage-002 and davinci-002), each line is a prompt-completion pair, as shown below: 

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
…
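
Note that for the chat models (gpt-4o-mini and gpt-3.5-turbo), OpenAI's fine-tuning endpoint instead expects each JSONL line to contain a messages array in the Chat Completions style, so conversational records are typically reshaped into a form like the illustrative line below (the content values are placeholders):

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "<user prompt>"}, {"role": "assistant", "content": "<ideal assistant reply>"}]}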


Lastly, this dataset is already conveniently divided into training and evaluation subsets, which saves us the effort of splitting it ourselves. Dividing the dataset in this way ensures that the model encounters different data during the fine-tuning and evaluation stages, which helps to prevent overfitting, i.e., where the model can’t generalize to unseen data. 

To download the dataset, you must clone its git repository onto your device with the following command:

git clone https://huggingface.co/datasets/Unified-Language-Model-Alignment/Anthropic_HH_Golden
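
The upload step that follows expects local training.jsonl and evaluation.jsonl files. One way to produce them is sketched below using the datasets library; the split names ("train", "test") and the field mapping are assumptions that you should adapt to the dataset's actual columns and to the format required by your chosen model.

# Sketch: write each split to a JSONL file for upload (split and field names are assumptions).
import json
from datasets import load_dataset

dataset = load_dataset("Unified-Language-Model-Alignment/Anthropic_HH_Golden")

def write_jsonl(split, path):
    with open(path, "w", encoding="utf-8") as f:
        for record in split:
            # Hypothetical mapping: adjust to the dataset's real column names and
            # to the prompt-completion or messages format your model requires.
            line = {"prompt": record.get("prompt", ""), "completion": record.get("chosen", "")}
            f.write(json.dumps(line) + "\n")

write_jsonl(dataset["train"], "training.jsonl")   # assumes a "train" split
write_jsonl(dataset["test"], "evaluation.jsonl")  # assumes a "test" split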


Upload the Training Datasets

Once you have prepared your datasets, the next step is uploading them using the Files API. The code snippet below uploads the training and evaluation datasets we created earlier and uses them to create a pair of file objects that will be used during the fine-tuning process. 

# Upload the training and evaluation files for use in fine-tuning.
training_dataset = client.files.create(
  file=open("training.jsonl", "rb"),
  purpose="fine-tune"
)

evaluation_dataset = client.files.create(
  file=open("evaluation.jsonl", "rb"),
  purpose="fine-tune"
)


Printing the file object allows you to examine its structure, an example of which is shown below:  

print(training_dataset)

#Response

{
  "id": "file-vGysBmsoqVn9c2TKB2D3s9',",
  "object": "file",
  "bytes": 120000,
  "created_at": 1677610602,
  "filename": "training.jsonl",
  "purpose": "fine-tune",
}


The key thing of note here is the “id” attribute, which is used to uniquely identify the file object. 

Create a Fine-Tuning Job

After loading your datasets and turning them into the required file objects, it’s now time to create a fine-tuning job through the fine-tuning API. In addition to creating a job, the fine-tuning API enables you to retrieve an existing job, check the status of a job, cancel a job, or list all existing jobs. 

The only required parameters are model, the name of the model you want to fine-tune (which can be one of OpenAI's base models, discussed earlier, or an existing fine-tuned model), and training_file, the file object's ID that was returned when uploading the training file in the previous step. You can copy and paste in the ID explicitly, or access the id attribute directly, as shown below. 

Additionally, as we have loaded an evaluation dataset, we will use it to create our fine-tuning job object, resulting in the code below. 

ft_job = client.fine_tuning.jobs.create(
  model="model_name",
  training_file=training_dataset.id, 
  validation_file=evaluation_dataset.id,
)


You can also choose to include hyperparameters when initially creating your fine-tuning job. OpenAI presently allows the configuration of three hyperparameters:

  • Number of epochs: the number of complete passes through the entire fine-tuning dataset
  • Learning rate multiplier:  a scaling factor applied to the base learning rate, which changes the speed at which the model’s weights are updated during fine-tuning
  • Batch size: the number of fine-tuning samples processed simultaneously before the model’s parameters are updated

However, it is recommended that you start fine-tuning without initially defining any hyperparameters, as OpenAI’s API will automatically configure them based on the size of the dataset. If you were to disregard this advice and wanted to include hyperparameters to establish finer control over the fine-tuning process, the code from above would now look as follows (using example values): 

ft_job = client.fine_tuning.jobs.create(
  model="model_name",
  training_file=training_dataset.id, 
  validation_file=evaluation_dataset.id,
  hyperparameters={
    "n_epochs": 10,
    "batch_size": 8,
    "learning_rate_multiplier": 0.3
  }
)


Similar to loading the fine-tuning data, the above operation will return a fine-tuning job object. This also has an id that remains important throughout the process as it’s used to reference and access the fine-tuning job for subsequent tasks. Once a fine-tuning job is complete, you will receive confirmation via email; the amount of time this requires will differ, depending on your choice of model and the size of the dataset.

Check the Status of Your Model During Fine-Tuning 

While your model is being fine-tuned, i.e., the job is in progress, you can check its ongoing status by requesting a list of events. OpenAI provides the following training metrics during the fine-tuning process:

  • Training loss: measures how well the model’s predictions match the target values in the training dataset; the lower the training loss, the better the model’s performance.
  • Training token accuracy: the percentage of tokens, i.e., segments of output, correctly predicted by the model.
  • Valid loss: measures the model’s performance on the evaluation (or validation) dataset, assessing its ability to generalize to unseen data points.
  • Valid token accuracy: the percentage of tokens correctly predicted by the model for the evaluation dataset – indicating its accuracy to generalize on new data.

You can request to see a list of events with the code below. The limit parameter defines how many events to list; if left undefined, OpenAI will produce 10 by default. 

client.fine_tuning.jobs.list_events(
  fine_tuning_job_id=ft_job.id,
  limit=2
)


Access Your Fine-Tuned Model

Although the fine-tuned model should be ready once the job is complete, it may take a few minutes for it to become available for use. If you cannot find your model, through its id, or requests to your model time out, it’s probably still loading and will be available shortly. 

When a fine-tuning job has been completed successfully, however, you will be able to use the retrieve function to search for it by its id – where you will now see the fine_tuned_model attribute contains the name of the model, where it previously was null. Additionally, the status attribute should now read “succeeded”. 

The code below shows how to retrieve a fine-tuning job by its id and the structure of the object you will receive in response. Again, while you could explicitly enter the fine-tuning job’s id, here, we are accessing it via the id attribute of the object. 

ft_retrieve = client.fine_tuning.jobs.retrieve(ft_job.id)

print(ft_retrieve)

#Response

{
  "object": "fine_tuning.job",
  "id": "ftjob-abc123",
  "model": "davinci-002",
  "created_at": 1692661014,
  "finished_at": 1692661190,
  "fine_tuned_model": "ft:davinci-002:my-org:custom_suffix:7q8mpxmy",
  "organization_id": "org-123",
  "result_files": [
      "file-abc123"
  ],
  "status": "succeeded",
  "validation_file": "file-sFseAwXoqWn8c2ZDB24j4",
  "training_file": "file-vGysBmsoqVn9c2TKB2D3s9",
  "hyperparameters": {
      "n_epochs": 4,
      "batch_size": 1,
      "learning_rate_multiplier": 1.0
  },
  "trained_tokens": 5768,
  "integrations": [],
  "seed": 0,
  "estimated_finish": 0
}


You can now specify the use of this model by passing it as a parameter to the Chat Completions API (for gpt-3.5-turbo and gpt-4o-mini) or within the OpenAI Playground to test its capabilities. Alternatively, if you've selected babbage-002 or davinci-002, you would use the Legacy Completions API.

completion = client.chat.completions.create(
  model="your fine-tuned model",
  messages=[
    {"role": "system", "content": "insert context here"},
    {"role": "user", "content": insert prompt here"}
  ]
)

print(completion.choices[0].message)


Access Model Checkpoints 

As well as producing a final model when a fine-tuning job is complete, model checkpoints are created at the end of each training epoch. Each checkpoint is a complete model that can be used in the same way as a fully fine-tuned model. 

Checkpointing is highly beneficial as it provides a layer of fault recovery: giving you a jumping-off point if your model crashes or the training process is interrupted for any reason – which increases in likelihood with the size of your model. Similarly, they give you a place to revert to if your model’s performance decreases with extra training, e.g., starts to overfit. All in all, model checkpointing adds structure and security to the fine-tuning process and allows for more experimentation. 

To access model checkpoints, the fine-tuning job must first finish successfully: you can confirm its completion by querying the status of a job, as shown in the previous step. From there, querying the checkpoints endpoint with the fine-tuning job's id will produce a list of checkpoints associated with the fine-tuning job. As with requesting events, the limit parameter defines how many checkpoints to return – otherwise, it defaults to 10. 

client.fine_tuning.jobs.list_checkpoints(
  fine_tuning_job_id=ft_job.id,
  limit=2
)


For each checkpoint object, the fine_tuned_model_checkpoint field is populated with the name of the model checkpoint, as shown below.

{
  "object": "list"
  "data": [
    {
      "object": "fine_tuning.job.checkpoint",
      "id": "ftckpt_zc4Q7MP6XxulcVzj4MZdwsAB",
      "created_at": 1721764867,
      "fine_tuned_model_checkpoint": "ft:gpt-4o-mini-2024-07-18:my-org:custom-suffix:96olL566:ckpt-step-2000",
      "metrics": {
        "full_valid_loss": 0.134,
        "full_valid_mean_token_accuracy": 0.874
      },
      "fine_tuning_job_id": "ftjob-abc123",
      "step_number": 2000,
    },
    {
      "object": "fine_tuning.job.checkpoint",
      "id": "ftckpt_enQCFmOTGj3syEpYVhBRLTSy",
      "created_at": 1721764800,
      "fine_tuned_model_checkpoint": "ft:gpt-4o-mini-2024-07-18:my-org:custom-suffix:7q8mpxmy:ckpt-step-1000",
      "metrics": {
        "full_valid_loss": 0.167,
        "full_valid_mean_token_accuracy": 0.781
      },
      "fine_tuning_job_id": "ftjob-abc123",
      "step_number": 1000,
    },
  ],
  "first_id": "ftckpt_zc4Q7MP6XxulcVzj4MZdwsAB",
  "last_id": "ftckpt_enQCFmOTGj3syEpYVhBRLTSy",
  "has_more": true
}


Improving Your Model

If after testing your model, it doesn’t perform as expected or isn’t as consistently correct as you anticipated, it is time to improve it. OpenAI enables you to refine your model in three ways: 

  • Quality: improving the quality of the fine-tuning data
    • Double-check all data points are formatted correctly
    • If your model struggles with particular types of prompts, add data points that directly demonstrate to the model how to respond to these accordingly. 
    • Refine your dataset’s diversity, i.e., ensure it has examples that reflect an accurate range of prompts and responses. 
  • Quantity: increasing the size of the dataset
    • The more complex the task for which you are fine-tuning, the more data you’re likely to require.
    • Increasing the size of the dataset means it is likely to contain a greater number of unconventional data points, i.e., edge cases, allowing the model to learn to generalize to them more effectively. 
    • Increasing the size of the dataset is also likely to remediate overfitting, as the model has more data from which to learn its true, underlying relationships – as opposed to just learning the correct responses. 
  • Hyperparameters: adjusting the hyperparameters of the fine-tuning job. Here are some guidelines on when to increase or decrease each hyperparameter:
    • Number of epochs 
      • Increase if: the model is underfitting, i.e., underperforming on both training and validation data; the model is converging slowly, i.e., the model’s training and valid loss is decreasing but has not stabilized.
      • Decrease if: the model is overfitting, i.e., performing well on training data but not the evaluation dataset; the model converges early in the training process but loss increases after additional epochs. 
    • Learning rate multiplier
      • Increase if: the model is converging slowly; you’re working with a particularly large dataset.
      • Decrease if: the model’s loss fluctuates considerably, i.e., oscillation; it is overfitting.
    • Batch size:
      • Increase if: the model is fine-tuning successfully (you can probably afford a larger batch size to accelerate the process), or if the model's loss is oscillating.
      • Decrease if: the model is converging poorly (smaller batches allow models to learn the data more thoroughly); the model is overfitting and other hyperparameter adjustments prove ineffective. 

Conclusion

In summary, the steps for fine-tuning GPT on conversational data include:

  • Setting up your development environment
  • Choosing a model to fine-tune
  • Preparing the datasets
  • Uploading the training datasets
  • Creating a fine-tuning job
  • Checking the status of your model during fine-tuning
  • Accessing your fine-tuned model
  • Accessing model checkpoints
  • Improving your model

Fine-tuning is an intricate process but can transform the efficacy of generative AI applications when applied correctly. We encourage you to develop your understanding and competency with further experimentation. This includes configuring different hyperparameters, loading different datasets, and attempting to fine-tune the different models OpenAI has available. You can learn more by referring to the resources we have provided below. 

Alternatively, if you’d prefer to sidestep the process of fine-tuning an LLM altogether, Nebula LLM is specialized to support your organization’s conversational use cases. 

Nebula LLM is Symbl.ai’s proprietary large language model specialized for human interactions. Fine-tuned on well-curated datasets containing over 100,000 business interactions across sales, customer success, and customer service and on 50 conversational tasks such as chain of thought reasoning, Q&A, conversation scoring, intent detection, and others, Nebula is ideal for tasks and workflows involving conversational data: 

  • Automated Customer Support: Nebula LLM can be used to equip chatbots with more authentic, engaging, and helpful conversational capabilities.
  • Real-time Agent Assistance: extract key insights and trends to help human agents on live calls and enhance customer support, including generating conversational summaries, generating responses to objections, and suggesting follow-up actions.
  • Call Scoring: score the important conversations taking place within your organization, based on performance criteria such as communication and engagement, question handling, and forward motion to assess a human agent’s performance and enable targeted coaching. 

To learn more about the model, sign up for access to the Nebula Playground.

Additional Resources 

The post How to Fine-Tune GPT on Conversational Data appeared first on Symbl.ai.

Building Performant Models with The Mixture of Experts (MoE) Architecture: A Brief Introduction https://symbl.ai/developers/blog/building-performant-models-with-the-mixture-of-experts-moe-architecture-a-brief-introduction/ Wed, 24 Jul 2024 17:00:00 +0000 https://symbl.ai/?p=32991 Understanding The Mixture of Experts (MoE) Architecture Mixture of experts (MoE) is an innovative machine learning architecture designed to optimize model efficiency and performance. The MoE framework utilizes specialized sub-networks called experts that each focus on a specific subset of data. A mechanism known as a gating network directs input to the most appropriate expert […]

Understanding The Mixture of Experts (MoE) Architecture

Mixture of experts (MoE) is an innovative machine learning architecture designed to optimize model efficiency and performance. The MoE framework utilizes specialized sub-networks called experts that each focus on a specific subset of data. A mechanism known as a gating network directs input to the most appropriate expert for addressing the given query. 

This results in only a fraction of the model’s neural network being activated at any given time, which reduces computational costs, optimizes resource usage, and enhances model performance.

While the MoE architecture has gained popularity in recent years, the concept is not a new one, having first been introduced in the paper Adaptive Mixtures of Local Experts (Robert A. Jacobs et al., 1991). This pioneering work proposed dividing an AI system into smaller, separate sub-systems, with each specializing in different training cases. This approach was shown to not only improve computational efficiency but also decrease training times – achieving target accuracy with fewer training epochs than conventional models.

How Mixture of Experts (MoE) Models Work

MoE models comprise multiple experts within a larger neural network – with each expert itself being a smaller neural network with its own parameters, i.e., weights and biases, allowing them to specialize in particular tasks. The MoE model’s gating network is responsible for choosing the best-suited expert(s) for each input, based on a probability distribution – such as a softmax function. 

This structure enforces sparsity, or conditional computation: only activating relevant experts and, subsequently, selecting portions of the model’s overall network. This contrasts with the density of conventional neural network architectures, in which a larger amount of layers and neurons are required to process every input. As a result, MoEs can maintain a high capacity without proportional increases in computational demands.
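
To make the routing mechanism concrete, below is a small, self-contained NumPy sketch of softmax gating with top-k expert selection. It is purely illustrative, with arbitrary dimensions, rather than a reproduction of any particular production implementation.

# Illustrative top-k MoE routing with a softmax gating network (NumPy; dimensions are arbitrary).
import numpy as np

rng = np.random.default_rng(0)

num_experts, d_in, d_out, top_k = 4, 8, 8, 2

# Each "expert" is a simple linear layer with its own weights.
expert_weights = [rng.normal(size=(d_in, d_out)) for _ in range(num_experts)]
gate_weights = rng.normal(size=(d_in, num_experts))  # gating network parameters

def moe_forward(x):
    # The gating network produces a probability distribution over experts (softmax).
    logits = x @ gate_weights
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()

    # Sparsity: only the top-k experts are activated for this input.
    top_experts = np.argsort(probs)[-top_k:]
    gate = probs[top_experts] / probs[top_experts].sum()  # renormalize over selected experts

    # Combine the selected experts' outputs, weighted by the gate.
    return sum(g * (x @ expert_weights[e]) for g, e in zip(gate, top_experts))

output = moe_forward(rng.normal(size=d_in))
print(output.shape)  # (8,)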

The Benefits and Challenges of MoE Models

The MoE architecture offers several benefits over traditional neural networks, which include: 

  • Increased Efficiency: By only activating a fraction of the model for each input, MoE models can be efficient and reduce overall computational demands.
  • Scalability: MoE models can successfully scale to large sizes, as adding more experts allows for more capacity without having to increase the computational load for each inference.
  • Specialization: with experts specializing in different areas or domains, MoE models can handle an assortment of tasks or datasets more effectively than conventional models.

Despite these advantages, however, implementing the MoE architecture still presents a few challenges:

  • Increased Complexity: MoE models introduce additional complexity in terms of architecture, dynamic routing, optimal expert utilization and training procedures. 
  • Training Considerations: the training process for MoE models can be more complex than for standard neural networks due to having to train both the experts and the gating network. Consequently, there are a number of aspects to keep in mind: 
    • Load Distribution: if some experts are disproportionately selected early on during training, they will be trained more quickly – and continue to be chosen more often as they offer more reliable predictions than those with less training. Techniques like noisy top-k gating mitigate this by evenly distributing the training load across experts.
    • Regularization: adding regularization terms, such as a load balancing loss (which penalizes overreliance on any one expert) and an expert diversity loss (which rewards the equal utilization of experts), facilitates balanced training and improves model generalization (a sketch of one common load-balancing formulation follows this list).
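
As a rough illustration of a load-balancing term, the sketch below computes an auxiliary loss of the kind popularized by the Switch Transformer: the number of experts multiplied by the dot product between the fraction of tokens routed to each expert and the mean router probability for that expert. Shapes and values are arbitrary.

# Rough sketch of a Switch Transformer-style load-balancing auxiliary loss (NumPy).
# router_probs: per-token softmax over experts; assignments: index of the expert chosen per token.
import numpy as np

def load_balancing_loss(router_probs, assignments, num_experts):
    # f_i: fraction of tokens dispatched to expert i.
    f = np.bincount(assignments, minlength=num_experts) / len(assignments)
    # P_i: mean router probability assigned to expert i across tokens.
    p = router_probs.mean(axis=0)
    # The loss is smallest when both quantities are uniform across experts.
    return num_experts * float(np.dot(f, p))

rng = np.random.default_rng(0)
num_tokens, num_experts = 16, 4
logits = rng.normal(size=(num_tokens, num_experts))
router_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assignments = router_probs.argmax(axis=1)

print(load_balancing_loss(router_probs, assignments, num_experts))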

Applications of MoE Models

Now that we’ve covered how the Mixture of Experts models work and why they’re advantageous, let us briefly take a look at some of the applications of MoE. 

  • Natural Language Processing (NLP): MoE models can significantly increase the efficacy of NLP models, with experts specializing in different aspects of language processing. For instance, an expert could focus on particular tasks (sentiment analysis, translation), domains (coding, law), or even specific languages.
  • Computer Vision: sparse MoE layers in vision transformers, such as V-MoE,  achieve state-of-the-art performance with reduced computational resources. Additionally, like NLP tasks, experts can be trained to specialize in different image styles, images taken under certain conditions (e.g., low light), or to recognize particular objects. 
  • Speech Recognition: the MoE architecture can be used to solve some of the inherent challenges of speech recognition models. Some experts can be dedicated to handling specific accents or dialects, others to parsing noisy audio, etc. 

Conclusion

The Mixture of Experts (MoE) architecture offers an approach to building more efficient, capable, and scalable machine learning models. By leveraging specialized experts and gating mechanisms, MoE models provide a tradeoff between the greater capacity of larger models and the greater efficiency of smaller models – achieving better performance with reduced computational costs. As research into MoE continues, and its complexity can be reduced, it will pave the way for more innovative machine learning solutions and the further advancement of the AI field.

The post Building Performant Models with The Mixture of Experts (MoE) Architecture: A Brief Introduction appeared first on Symbl.ai.

How to Implement WebSocket and SIP-based Integration with Symbl.ai https://symbl.ai/developers/blog/how-to-implement-websocket-and-sip-based-integration-with-symbl-ai/ Tue, 23 Jul 2024 17:00:00 +0000 https://symbl.ai/?p=32992 In today’s increasingly competitive landscape, applications that provide real-time data exchange and communication are crucial for enhancing user experiences, carving out market share, and, ultimately, driving business success. WebSockets and SIP (Session Initiation Protocol) are fundamental technologies for facilitating smooth, reliable online interactions. In this guide, we explore the concepts of WebSockets and SIP and […]

In today’s increasingly competitive landscape, applications that provide real-time data exchange and communication are crucial for enhancing user experiences, carving out market share, and, ultimately, driving business success. WebSockets and SIP (Session Initiation Protocol) are fundamental technologies for facilitating smooth, reliable online interactions.

In this guide, we explore the concepts of WebSockets and SIP and the role they play in developing performant modern applications. We also detail how to use these protocols to integrate your application with Symbl.ai’s conversational intelligence capabilities to draw maximum insights from your messages, calls, video conferences, and other interactions.  

What is WebSocket?

WebSocket is a widely used protocol for facilitating the exchange of data between a client and a server. It is well suited for any application that requires real-time, two-way communication between a web browser and a server, such as messaging applications, collaborative editing tools, stock tickers, displaying live sports results, and even online gaming.

How do WebSockets Work?

WebSockets sit on top of the Transmission Control Protocol/Internet Protocol (TCP/IP) stack and use it to establish a persistent connection between a client and server. To achieve this, WebSockets first use the Hypertext Transfer Protocol (HTTP – as used to serve websites to browsers) to establish a connection, i.e., a "handshake". Once the connection is established, WebSockets replace HTTP as the application-layer protocol to create a persistent two-way, or "full-duplex", connection, and the server will automatically send new data to the client as soon as it is available.

This is in contrast to how HTTP transmits data – whereby the client continually has to request data from the server and only receives a response if new data is available, i.e., HTTP long-polling. By maintaining a persistent connection, WebSockets eliminate the technical overhead of repeatedly establishing connections and sending HTTP request/response headers, significantly reducing latency and opening the door to a wider range of applications that rely on real-time communication. 
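
As a brief illustration, the sketch below uses Python's websockets library to open a connection, send a message, and then receive server-pushed updates as they arrive; the URL and message contents are placeholders.

# Minimal WebSocket client sketch (the URL and messages are placeholders).
import asyncio
import websockets

async def listen():
    # The HTTP handshake and protocol upgrade happen inside connect().
    async with websockets.connect("wss://example.com/updates") as ws:
        await ws.send("subscribe:prices")  # client -> server over the same connection
        # The server can now push messages at any time without further requests.
        async for message in ws:
            print("Received:", message)

asyncio.run(listen())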

What Are the Benefits of WebSockets?

  • Speed: as a low-latency protocol, WebSockets are ideal for applications that need to exchange data instantaneously. 
  • Simplicity: as WebSockets sits atop TCP/IP and uses HTTP to establish an initial connection, it does not require the installation of any additional hardware or software. 
  • Constant Ongoing Updates: WebSockets enable the server to transmit new data to the client without the need for requests, i.e. GET operations, allowing for continuous updates. 

What is SIP?

As useful as WebSockets are for general-purpose, bi-directional communication, they lack the mechanisms for real-time media transmission; this is where the Session Initiation Protocol (SIP) comes into play. SIP is a signaling protocol that’s used to establish interactive communication sessions, such as phone calls or video meetings. As an essential component of  Voice over Internet Protocol (VoIP), SIP can be used in a variety of multimedia applications, including IP telephony and video conferencing applications. 

How Does SIP Work?

SIP functions much like a call manager: establishing the connection between endpoints, handling the call, and closing the connection once it is finished. This starts with one of the endpoints initiating the call by sending an invite message to the other endpoint(s), which includes their credentials and the nature of the call, e.g., voice, video, etc. The other endpoints receive the invite message and respond with an OK message, comprising their information so the connection can be established. Upon receiving the OK message, the initiating endpoint sends an acknowledgement (ACK) message and the call can begin. 

These messages can be sent via TCP, as with WebSockets, as well as UDP (User Datagram Protocol) or TLS (Transport Layer Security). Once the connection is established, SIP hands over the transmission of media to another protocol such as the Real-time Transport Protocol (RTP) or the RTP Control Protocol (RTCP) (hence the name Session Initiation Protocol, as its role is solely to establish communication between endpoints). 

What Are the Benefits of SIP?

  • Interoperability: SIP is protocol-agnostic when it comes to the type of media being transmitted, with the ability to handle voice, video, and multimedia calls. 
  • Adaptability: SIP is compatible with a large variety of devices and components. Additionally, it works with legacy systems such as Public Switched Telephone Network (PSTN) and is designed in such a way to accommodate emerging technologies. 
  • Scalability: SIP can be used in both small and large-scale communication networks, with the ability to establish and terminate connections as necessary to utilize resources efficiently. 

How to Integrate Your Application with Symbl.ai via WebSocket

Now that we have explored WebSockets and how they work, let us move on to how to integrate your application with Symbl.ai’s conversational intelligence capabilities via WebSocket – which is accomplished through Symbl.ai’s Streaming API

In this example, the code samples are in Python, using functions from the Symbl.ai Python SDK; however, Symbl.ai also provides SDKs in Javascript and Go. 

Prepare Environment

Before you begin, you will need to install the Symbl.ai Python SDK, as shown below: 

# For Python Version < 3
pip install symbl

# For Python Version 3 and above
pip3 install symbl


Additionally, to connect to Symbl.ai's APIs, you will need access credentials, i.e., an app id and app secret, which you can obtain by signing into the developer platform.  

Create WebSocket Connection

The first step is establishing a connection to Symbl.ai's servers; this will create a connection object, which can have the following parameters:

  • credentials: your app id and app secret from Symbl.ai's developer platform.
  • speaker: a speaker object containing a name and userId field.
  • insight_types: the insights to be returned over the WebSocket connection, i.e., Questions and Action Items.
  • config: optional configurations for the conversation. For more details, see the config parameter in the Streaming API documentation.

The code snippet below is used to start a connection: 

connection_object = symbl.Streaming.start_connection(
    credentials={"app_id": "<app_id>", "app_secret": "<app_secret>"},
    insight_types=["question", "action_item"],
    speaker={"name": "John", "userId": "john@example.com"},
)


Receive Insights via Email

You can opt to receive insights from the interactions within your application via email. This will provide you with a link to view the conversation transcripts, as well as details such as the topics discussed, generated follow-ups, action items, etc., through Symbl.ai’s Summary UI.

To receive the insights via email, add the code below to the instantiation of the connection object:

actions = [
        {
          "invokeOn": "stop",
          "name": "sendSummaryEmail",
          "parameters": {
            "emails": [
              emailId #The email address associated with the user’s account in your application 
            ],
          },
        },
      ]


This results in a connection object like the one shown below:

connection_object = symbl.Streaming.start_connection(
    credentials={"app_id": "<app_id>", "app_secret": "<app_secret>"},
    insight_types=["question", "action_item"],
    speaker={"name": "John", "userId": "john@example.com"},
    actions=[
        {
            "invokeOn": "stop",
            "name": "sendSummaryEmail",
            "parameters": {
                "emails": [
                    emailId  # The email address associated with the user's account in your application
                ],
            },
        },
    ],
)


Subscribe to Events

Once the WebSocket connection is established, you can get live updates on conversation events such as the generation of a transcript, action items, questions, etc. Subscribing to events is how the client receives new information over the WebSocket without having to explicitly request it from the server.

The .subscribe method of the connection object listens for events from an interaction and allows you to subscribe to them in real time. It takes a dictionary parameter, where each key is an event and its value is a callback function to be executed when that event occurs.

The table below summarizes the different events you can subscribe to: 

Event              Description
message_response   Generates an event whenever a transcription is available.
message            Generates an event for live transcriptions. This will include the isFinal property, which is False initially, signifying that the transcription is not finalized.
insight_response   Generates an event whenever an action_item or question is identified in the transcription.
topic_response     Generates an event whenever a topic is identified in the transcription.

An example of how to set up events is shown below, with the events stored in a dictionary before being passed to the subscribe method:

events = {
    "message_response": lambda response: print(
        "Final Messages -> ",
        [message["payload"]["content"] for message in response["messages"]],
    ),
    "message": lambda response: print(
        "live transcription: {}".format(
            response["message"]["punctuated"]["transcript"]
        )
    )
    if "punctuated" in response["message"]
    else print(response),
    "insight_response": lambda response: [
        print(
            "Insights Item of type {} detected -> {}".format(
                insight["type"], insight["payload"]["content"]
            )
        )
        for insight in response["insights"]
    ],
    "topic_response": lambda response: [
        print(
            "Topic detected -> {} with root words, {}".format(
                topic["phrases"], topic["rootWords"]
            )
        )
        for topic in response["topics"]
    ],
}


connection_object.subscribe(events)


Send Audio From a Mic

This allows you to send audio data to the WebSocket directly from your mic. It is recommended that first-time users use this function when sending audio to Symbl.ai, to ensure that audio from their application works as expected.  

connection_object.send_audio_from_mic()


Send Audio Data

You can send custom binary audio data obtained from another source or library using the following code. 

connection_object.send_audio(data)

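For reference, a minimal sketch of sending audio in this way might look like the following; it assumes a hypothetical local file, call_audio.wav, containing 16 kHz, 16-bit mono PCM audio (adjust the connection's config parameter if your audio differs):

import wave

# Stream a local WAV file (hypothetical filename) to the WebSocket in small chunks
with wave.open("call_audio.wav", "rb") as audio_file:
    chunk = audio_file.readframes(4096)
    while chunk:
        connection_object.send_audio(chunk)  # send raw binary audio data
        chunk = audio_file.readframes(4096)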

Stop the Connection

Lastly, you need to close the WebSocket, with the code below:

connection_object.stop()


How to Integrate your Application with Symbl.ai via SIP

In this section, we will take you through the process of integrating your application with SIP through Symbl.ai’s Telephony API. As with our implementation of WebSocket above, the code snippets are in Python, but the Symbl.ai SDK is also available in JavaScript and Go. 

Prepare Environment

Before you begin, you will need to install the Symbl.ai Python SDK, as shown below: 

# For Python Version < 3
pip install symbl

# For Python Version 3 and above
pip3 install symbl

Additionally, to connect to Symbl.ai’s APIs, you’ll need access credentials, i.e., an app id and app secret, which you can obtain by signing into the developer platform.  

Create SIP Connection

After setting up your environment accordingly, the initial step requires you to establish a SIP connection. You will need to include a valid SIP URI to dial out to. 

The code snippet below allows you to start a Telephony connection with Symbl.ai via SIP:

connection_object = symbl.Telephony.start_sip(uri="sip:8002@sip.example.com") 


Receive Insights via Email

As with a WebSocket integration, you can choose to receive insights from the call via email. This will provide you with a link to view the conversation transcripts, as well as details such as the topics discussed, generated follow-ups, action items, etc., through Symbl.ai’s Summary UI.

To receive the insights via email, add the code below to the instantiation of the connection object:

actions = [
        {
          "invokeOn": "stop",
          "name": "sendSummaryEmail",
          "parameters": {
            "emails": [
              emailId #The email address associated with the user’s account in your application 
            ],
          },
        },
      ]


This results in a connection object like the one shown below:

connection_object = symbl.Telephony.start_sip(
    uri="sip:8002@sip.example.com",
    actions=[
        {
            "invokeOn": "stop",
            "name": "sendSummaryEmail",
            "parameters": {
                "emails": [
                    emailId  # The email address associated with the user's account in your application
                ],
            },
        },
    ],
)


Subscribe to Events

Once the SIP connection is established, you can get live updates on conversation events such as the generation of a transcript, action items, questions, etc.

The connection_object.subscribe method listens for events from a live call and lets you subscribe to them in real time. It takes a dictionary parameter, where each key is an event and its value is a callback function to be executed when that event occurs.

The table below summarizes the different events you can subscribe to: 

Event                 Description
message_response      Generates an event whenever a transcription is available.
insight_response      Generates an event whenever an action_item or question is identified in the message.
tracker_response      Generates an event whenever a tracker is identified in the transcription.
transcript_response   Also generates transcription values; however, these will include an isFinal property that will be False initially, meaning the transcription is not finalized.
topic_response        Generates an event whenever a topic is identified in any transcription.

An example of how to set up events is shown below, with the events stored in a dictionary before being passed to the subscribe method:

events = {
    'transcript_response': lambda response: print('transcript response: ' + str(response)),
    'insight_response': lambda response: print('insight response: ' + str(response))
}

connection_object.subscribe(events)


Stop the Connection

Finally, to end an active call, use the code below:

connection_object.stop()


Querying the Conversation Object

Whether implementing a WebSocket or SIP connection, you can use the conversation parameter associated with the Connection object to query Symbl.ai’s Conversation API to access specific elements of the recorded interaction. 

The following list highlights a selection of the functions provided by the Conversation API and their purpose. 

connection_object.conversation.get_conversation_id()
    Returns a unique conversation_id for the conversation being processed. This can then be passed to the other functions described below.

connection_object.conversation.get_messages(conversation_id)
    Returns a list of messages from a conversation. You can use this to produce a transcript for a video conference, meeting, or telephone call.

connection_object.conversation.get_topics(conversation_id)
    Returns the most relevant topics of discussion from the conversation, generated based on the overall scope of the discussion.

connection_object.conversation.get_action_items(conversation_id)
    Returns action items generated from the conversation.

connection_object.conversation.get_follow_ups(conversation_id)
    Returns follow-up items generated from the conversation, e.g., sending an email, making subsequent calls, booking appointments, setting up a meeting, etc.

connection_object.conversation.get_members(conversation_id)
    Returns a list of all the members in a conversation.

connection_object.conversation.get_questions(conversation_id)
    Returns explicit questions or requests for information that come up during the conversation.

connection_object.conversation.get_conversation(conversation_id)
    Returns the conversation metadata, such as meeting name, member name and email, start and end time of the meeting, meeting type, and meeting id.

connection_object.conversation.get_entities(conversation_id)
    Extracts entities from the conversation, such as locations, people, dates, organizations, datetime, daterange, and custom entities.

connection_object.conversation.get_trackers(conversation_id)
    Returns the occurrences of certain keywords or phrases from the conversation.

connection_object.conversation.get_analytics(conversation_id)
    Returns the speaker ratio, talk time, silence, pace, and overlap from the conversation.
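For example, a minimal sketch of querying a processed conversation with the functions above might look like this:

# Retrieve the ID of the conversation being processed
conversation_id = connection_object.conversation.get_conversation_id()

# Fetch the transcript messages and any action items that were detected
messages = connection_object.conversation.get_messages(conversation_id)
action_items = connection_object.conversation.get_action_items(conversation_id)

print(messages)
print(action_items)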

Conclusion 

To recap:

  • WebSocket is a widely used protocol for facilitating the exchange of data between a client and a server
  • SIP is a signaling protocol that is used to establish interactive communication sessions, such as phone calls or video meetings
  • The benefits of WebSockets include:
    • Speed
    • Simplicity
    • Constant ongoing updates
  • The benefits of SIP include:
    • Interoperability
    • Adaptability
    • Scalability
  • Integrating your application via WebSocket is done through Symbl.ai’s Streaming API and includes:
    • Preparing the environment
    • Creating a WebSocket connection
    • Subscribing to events
    • Sending audio from a mic, or prepared binary audio data
    • Stopping the connection
  • Integrating your application via SIP is done through Symbl.ai’s Telephony API and includes:
    • Preparing the environment
    • Creating a SIP connection
    • Subscribing to events
    • Stopping the connection
  • You can use the conversation parameter associated with the Connection object to query Symbl.ai’s Conversation API to access specific elements of the recorded interaction. 

To discover more about Symbl.ai’s powerful APIs and how you can tailor them to best fit the needs of your application, visit the Symbl.ai documentation.  Additionally, sign up for the development platform to gain access to the innovative large language model (LLM) that powers Symbl.ai’s conversational intelligence solutions, Nebula, to better understand how you can extract more value from the interactions that take place throughout your organization.  

The post How to Implement WebSocket and SIP-based Integration with Symbl.ai appeared first on Symbl.ai.

How to Build LLM Applications With LangChain and Nebula https://symbl.ai/developers/blog/how-to-build-llm-applications-with-langchain-and-nebula/ Mon, 22 Jul 2024 17:00:00 +0000
With millions of monthly downloads and a thriving community of over 100,000 developers, LangChain has rapidly emerged as one of the most popular tools for building large language model (LLM) applications. 

In this guide, we explore LangChain’s vast capabilities and take you through how to build a question-and-answer (QA) application – using Symbl.ai’s proprietary LLM Nebula as its underlying language model. 

What is LangChain?

LangChain is an LLM chaining framework available in Python and JavaScript that streamlines the development of end-to-end LLM applications. It provides a comprehensive library of building blocks that are designed to seamlessly connect – or “chain” – together to create LLM-powered solutions that can be applied to a large variety of use cases.  

Why Use LangChain to Build LLM Applications?

Some of the benefits of LangChain include:

  • Expansive Library of Components: LangChain features a rich selection of components that enable the development of a diverse range of LLM applications.  
  • Modular Design: LangChain is designed in a way that makes it easy to swap out the components within an application, such as its underlying LLM or an external data source, which makes it ideal for rapid prototyping. 
  • Enables the Development of Context-Aware Applications: one of the aspects at which LangChain excels is facilitating the development of context-aware LLM applications. Through the use of prompt templating, document retrieval, and vector stores, LangChain allows you to add context to the input passed to an LLM to produce higher-quality output. This includes the use of proprietary data, domain-specific information that an LLM hasn’t been trained on, and up-to-date information.
  • Large Collection of Integrations: LangChain includes over 600 (and growing) built-in integrations with a wide variety of tools and platforms, making it easier to incorporate an LLM application into your existing infrastructure and workflows.
  • Large Community: as one of the most popular LLM frameworks, LangChain boasts a large and active user base. This has resulted in a wealth of resources, such as tutorials and coding notebooks, that make it easier to get started with LangChain, as well as forums and groups to assist with troubleshooting.

    Just as importantly, LangChain’s developer community consistently contributes to the ecosystem, submitting new classes, features, and functionality. For instance, though officially available as a Python or JavaScript framework, the LangChain community has submitted a C# implementation. 

LangChain Components

With a better understanding of the advantages it offers, let us move on to looking at the main components within the LangChain framework. 

  • Chains: the core concept of LangChain, a chain allows you to connect different components together to perform different tasks. As well as a collection of ready-made chains tailored for specific purposes, you can create your own chains that form the foundation of your LLM applications.    
  • Document Loaders: classes that allow you to load text from external documents to add context to input prompts. Document loaders streamline the development of retrieval augmented generation (RAG) applications, in which the application adds context from an external data source to an input prompt before passing it to the LLM – allowing it to generate more informed and relevant output.

    LangChain features a range of out-of-the-box loaders for specific document types, such as PDFs, CSVs, and SQL, as well as for widely used platforms like Wikipedia, Reddit, Discord, and Google Drive. 
  • Text Splitters: divide large documents, e.g., a book or extensive research paper, into chunks so they can fit into the input prompt. Text splitters overcome the present limitations of context length in LLMs and enable the use of data from large documents in your applications. 
  • Retrievers: collect data from a document or vector store according to a given text query. LangChain contains a selection of retrievers that correspond to different document loaders and types of queries.     
  • Embedding Models: these convert text into vector embeddings, i.e., numerical representations that an LLM can process efficiently.  Embeddings capture different features of text from documents that allow an LLM to compare their semantic meaning with the user’s input query. 
  • Vector Stores:  used to store documents for efficient retrieval after they have been converted into embeddings. The most common type of vector store is a vector database, such as Pinecone, Weaviate, or ChromaDB. 
  • Indexes: separate data structures associated with vector stores and documents that pre-sort, i.e., index, embeddings for faster retrieval.   
  • Memory: modules that allow your LLM applications to draw on past queries and responses to add additional context to input prompts. Memory is especially useful in chatbot applications, as it allows the bot to access previous parts of its conversation(s) with the user to craft more accurate and relevant responses. 
  • Prompt Templates: allow you to precisely format the input prompt that is passed to an LLM.  They are particularly useful for scenarios in which you want to reuse the same prompt outline but with minor adjustments. Prompt templates allow you to construct a prompt from dynamic input, i.e., from input provided by the user, retrieved from a document, or derived from an LLM’s prior generated output.  
  • Output Parsers: allow you to structure an LLM’s output in a format that’s most useful or presentable to the user. Depending on their design, LLMs can generate output in various formats, such as JSON or XML, so an output parser allows you to traverse the output, extract the relevant information, and create a more structured representation. 
  • Agents: applications that can autonomously carry out a given task using the tools it is assigned (e.g., document loaders, retrievers, etc.) and use an LLM as its reasoning – or decision-making –  engine. LangChain’s strength in loading data from external data sources enables you to provide agents with more detailed, contextual task instructions for more accurate results. 
  • Models: wrappers that allow you to integrate a range of LLMs into your application. LangChain features two types of models: LLMs, which take a string as input and return a string, and chatbots, which take a sequence of messages as input and return messages as output.  

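To make a couple of these components more concrete, the snippet below is a brief sketch of a document loader and text splitter working together (the filename notes.txt is a placeholder):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load a local text file (placeholder filename) into LangChain Document objects
documents = TextLoader("notes.txt").load()

# Split the documents into ~500-character chunks with a small overlap so that
# long texts can be passed to an LLM without exceeding its context window
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} document(s) and produced {len(chunks)} chunks")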
How to Build a QA Bot: A Step-By-Step Implementation 

We are now going to explore the capabilities of LangChain by building a simple QA application with Nebula LLM. 

Our application will use a prompt template to send initial input to the LLM. The model’s response will then feed into a second prompt template, which will also be passed to the LLM. However, instead of making separate calls to the LLM to achieve this, we will construct a chain that executes all of these actions with a single call. 

Additionally, to access Nebula’s API, you will need an API key, which you can obtain by signing up to the Symbl.ai platform and using your App ID and App Secret.


Setting Up Your Environment

First, you need to set up your development environment by installing LangChain. There are three options depending on what you intend to use the framework for: 

  1. pip install langchain: the bare minimum requirements 
  2. pip install langchain[llms]: includes all the modules required to integrate common LLMs
  3. pip install langchain[all]: includes all the modules required for all integrations

While we do not require the dependencies for all integrations, we do want those related to LLMs, so we are going with option 2. The following commands install the requisite libraries: 

# For Python Version < 3
pip install langchain[llms]

# For Python Version 3 and above
pip3 install langchain[llms]


Loading the LLM

With your development environment correctly configured, the next step is loading our LLM of choice – which, in this case, is Nebula LLM. 

To use Nebula LLM, we are first going to leverage LangChain’s extensibility and create a custom LLM wrapper: extending LangChain’s LLM class to create our own NebulaLLM class. Our custom wrapper includes a _call method, which sends an initial system prompt (to establish context for the LLM) and the user’s input prompt – and returns Nebula’s response. This will enable us to call Nebula LLM in LangChain in the same way as an OpenAI model or an LLM hosted on HuggingFace.

import requests
import json
from typing import Optional

from langchain.llms.base import LLM


class NebulaLLM(LLM):
    # Declare the fields expected by LangChain's (Pydantic-based) LLM class
    api_key: str
    url: str = "https://api-nebula.symbl.ai/v1/model/chat"

    # Implement the class's _call method
    def _call(self, prompt: str, stop: Optional[list] = None) -> str:

        # Construct the message to be sent to Nebula
        payload = json.dumps({
            "max_new_tokens": 1024,
            "system_prompt": "You are a question and answering assistant. You are professional and always respond politely.",
            "messages": [
                {
                    "role": "human",
                    "text": prompt
                }
            ]
        })

        # Headers for the JSON payload
        headers = {
            'ApiKey': self.api_key,
            'Content-Type': 'application/json'
        }

        # POST request sent to Nebula, containing the model URL, headers,
        # and message; the JSON response is parsed and the reply returned
        response = requests.request("POST", self.url, headers=headers, data=payload)

        return response.json()['messages'][-1]['text']

    # Property methods expected by LangChain

    @property
    def _identifying_params(self) -> dict:
        return {"api_key": self.api_key}

    @property
    def _llm_type(self) -> str:
        return "nebula_llm"


The two properties at the end of the code snippet are getter methods required by LangChain to manage the attributes of the class. In this case, they provide access to the Nebula instance’s API key and type.  

Additionally, in the _call method, we accept an optional list, which is intended to contain a series of stop sequences that Nebula should adhere to when generating the response. In this case, however, the list goes unused and is included only to ensure compatibility with LangChain’s interface. 
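Before wiring the wrapper into a chain, you can sanity-check it with a direct call. The snippet below is a minimal sketch; the API key placeholder is yours to replace, and the resulting llm instance is reused in the chains created later in this guide:

# Instantiate the custom wrapper with your Nebula API key (placeholder value)
llm = NebulaLLM(api_key="<your_nebula_api_key>")

# Invoke the model directly with a simple prompt
print(llm("What is the capital of France?"))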

Creating Prompt Templates

Next, we are going to create the prompt templates that will be passed to the LLM and specify the format of its input. 

The first prompt template takes a location as an input and will be passed to Nebula LLM. It will then generate a response containing the most famous dish from said location that will be used as part of the second prompt passed to Nebula LLM. The second prompt then takes the dish returned from the initial prompt and generates a recipe. 

from langchain import PromptTemplate

# Creating the first prompt template
location_prompt = PromptTemplate(
    input_variables=["location"],
    template="What is the most famous dish from {location}? Only return the name of the dish",
)

# Creating the second prompt template
dish_prompt = PromptTemplate(
    input_variables=["dish"],
    template="Provide a short and simple recipe for how to prepare {dish} at home",
)

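If you want to see exactly what will be sent to the model, you can render a template directly, as an optional check:

# Fill the template's input variable with a concrete value
print(location_prompt.format(location="Thailand"))
# Prints: What is the most famous dish from Thailand? Only return the name of the dish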

Creating the Chains 

Finally, we are going to create a chain that takes a series of prompts and runs them in sequence in a single function call. For our example, we will wrap each prompt in an LLMChain and then combine both chains with a SimpleSequentialChain, which runs them one after the other.  

As well as the two chains, we also pass the sequential chain the argument verbose=True, which will cause the chain to show its process and how it arrived at its output.

from langchain.chains import LLMChain, SimpleSequentialChain

# Create the first chain, using the NebulaLLM instance (llm) created earlier
chain_one = LLMChain(llm=llm, prompt=location_prompt)

# Create the second chain
chain_two = LLMChain(llm=llm, prompt=dish_prompt)

# Run both chains with SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain_one, chain_two], verbose=True)

final_answer = overall_chain.run("Thailand")


Note that when calling SimpleSequentialChain, the order in which you pass the chains to the class is important: because chain_one’s output provides the input for chain_two, it must come first. 

Potential Use Cases for a QA bot 

Here are a few ways that a question-and-answer LLM application could add value to your organization.

  • Knowledge Base: through fine-tuning or RAG, you can supply a QA bot with task- or domain-specific data to create a knowledge base.
  • FAQ System: similarly, you can customize a QA bot to answer questions that are frequently asked by your customers. As well as addressing a customer’s query, the QA bot can direct them to the appropriate department for further assistance, if required. By delegating your FAQs to a bot, human agents have more availability for issues that require their expertise – and more customers can be served in less time.
  • Recommendation Systems: alternatively, by asking pertinent questions as well as answering them, a QA bot can act as a recommendation system, guiding customers to the most suitable product or service from your range. This allows customers to find what they are looking for in less time, boosts conversion rates, and, through effective upselling, can increase the average revenue per customer (ARPC). 
  • Onboarding and Training Assistant: QA bots can be used to streamline your company’s onboarding process – making it more interactive and efficient. A well-designed question-and-answer LLM application can replace the need for tedious forms: taking answers to questions as input and asking the employee additional questions if they didn’t supply sufficient information. Similarly, it can be used to handle FAQs regarding the most crucial aspects of your company’s policies and procedures.  

Additionally, a QA bot can help with your staff’s ongoing professional development needs, allowing an employee to learn at their own pace. From the quality of the answers given by the user, the application can determine their rate of progress and supply training resources to match: providing additional material if the user appears to be struggling, while glossing over concepts with which they are familiar. 

Conclusion 

In summary:

  • LangChain is an LLM chaining framework, available in Python and JavaScript, that enables the efficient development of end-to-end LLM applications
  • Reasons to use LangChain to develop LLM applications include: 
    • An expansive library of components
    • Modular design 
    • Enables the development of context-aware applications
    • Large collection of integrations
    • Large community
  • The core LangChain components include: 
    • Chains
    • Document loaders
    • Text splitters
    • Retrievers
    • Embedding models
    • Vector Stores
    • Indexes
    • Memory
    • Prompt templates
    • Output parsers
    • Agents
    • Models
  • The steps for creating a QA bot with LangChain and Nebula include:
    • Setting up your environment
    • Loading the LLM
    • Creating prompt templates
    • Creating the chains 
  • Potential Use Cases for a QA bot include: 
    • Knowledge base
    • FAQ system
    • Recommendation systems
    • Onboarding and training assistant

LangChain is a powerful and adaptable framework that provides everything you need to develop performant and robust LLM applications.  We encourage you to develop your comfort with its ecosystem by going through the LangChain documentation, familiarizing yourself with the different components on offer, and better understanding which could be most applicable to your intended use case.

Additionally, to discover how Nebula LLM can automate a variety of customer service tasks and transform your company’s unstructured interaction data into valuable insights, trends, and analytics, visit the Nebula Playground and gain exclusive access to our innovative proprietary LLM. 


The post How to Build LLM Applications With LangChain and Nebula appeared first on Symbl.ai.
