What is GPT-3 and is it the Next Revolution in Natural Language Processing? https://symbl.ai/developers/blog/what-is-gpt-3-and-is-it-the-next-revolution-in-natural-language-processing/ Wed, 29 Sep 2021 00:19:04 +0000

GPT-3 is a probabilistic language model that has been trained on uncategorized text from the internet to produce human-like text. GPT-3 is HUGE, with a capacity of 175 billion parameters! Its sheer size is what sets GPT-3 apart: it can generate code from a problem statement and exhibits significantly more sophisticated NLP abilities than its predecessors. Currently, it's released with beta-only access to mitigate ethical concerns.

What is GPT-3?

GPT-3 is a sophisticated general language model created by OpenAI and trained on uncategorized text from the internet. GPT-3 uses deep learning to understand the grammar and syntax of language and produce human-like text. It's a probabilistic model, which means it can predict the next word when given a set of previous words, in a similar way to your mobile phone's keyboard.

Once a language model is trained, it can be used downstream with other applications for things like sentiment analysis, language inference, and paraphrasing.

GPT-3 is the latest in a line of natural language processing (NLP) systems. There have been previous iterations of GPT, with GPT-3 being the third and largest to date. Prior to the release of GPT-3 in June 2020, the largest NLP model was Microsoft's Turing NLG (introduced in February 2020), with a capacity of 17 billion parameters. Comparatively, GPT-3 is in a completely different league, with a capacity of 175 billion parameters. A parameter is a configuration variable internal to the model whose value can be estimated from data; parameters are learned from historical training data.

GPT-3 strikes a balance between size and skill

For most common tasks, GPT-3 can easily be used as a plug-and-play tool, for example predicting the sentiment of movie reviews. For more specialized use cases, such as predicting sentiment for conversations between a salesperson and a customer, it needs to be fine-tuned.
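As a hedged illustration of the plug-and-play case, here's a minimal sketch of a few-shot sentiment prompt sent through the legacy openai Python client (the pre-1.0 Completion API); the engine name, parameters, and the reviews themselves are assumptions, not a reference implementation:

import os
import openai  # legacy (pre-1.0) client assumed here

openai.api_key = os.environ["OPENAI_API_KEY"]

# A few-shot prompt: show the model two labeled reviews, then ask it to label a third.
prompt = (
    "Review: I loved every minute of this film.\nSentiment: Positive\n\n"
    "Review: The plot was dull and the acting was worse.\nSentiment: Negative\n\n"
    "Review: A beautiful, moving story with a flat ending.\nSentiment:"
)

response = openai.Completion.create(
    engine="davinci",   # assumed engine name; substitute whatever your account exposes
    prompt=prompt,
    max_tokens=1,
    temperature=0,      # deterministic output suits a classification-style task
)

print(response.choices[0].text.strip())  # e.g. "Positive"

For the specialized sales-conversation case described above, you'd fine-tune on your own labeled examples rather than relying on a generic prompt like this one.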

Generally, the more trainable parameters a model has, the better it performs. But there's a fine balance to be struck between the number of parameters and the size of the data set: the greater the number of parameters, the larger the learning capacity, and the more data needed to fill that capacity. This also makes the model harder to maintain. So, increasing the learning capacity and the ability to scale requires significant infrastructure (in terms of both cost and size).

Making GPT-3 work for you

OpenAI reports that "To date, over 300 apps are using GPT-3 across varying categories and industries, from productivity and education to creativity and games."

Examples of apps that are using GPT-3:

  • Viable uses GPT-3 to provide useful insights from customer feedback in easy-to-understand summaries. Viable identifies themes, emotions, and sentiments from a variety of customer feedback forums, such as helpdesk tickets, chat logs, reviews, and surveys. It then pulls insights from this feedback and provides an almost instantaneous summary so that companies get a better understanding of their customers’ wants and needs.

For example, if asked, “What’s frustrating our customers about the checkout experience?” Viable might provide the insight: “Customers are frustrated with the checkout process because it takes too long to load. They also want to be able to edit their address during checkout and store multiple payment methods.”

  • Fable Studio creates interactive stories and uses GPT-3 to help power their story-driven "Virtual Beings." GPT-3 gives Fable the ability to bring their characters to life through natural conversations, combining their artists' vision, AI, and emotional intelligence to create powerful narratives.
  • Algolia uses GPT-3 in their Algolia Answers product. This tool offers advanced search that better understands complex customer questions and quickly connects customers to the specific parts of content that answer them. GPT-3 enables Algolia to identify deeper contextual information, which produces better quality results.

GPT-3 in action

Traditionally, artificial intelligence struggles with “common sense”, but GPT-3 is actually pretty slick at answering many common-sense questions. Here’s an example of GPT-3 deploying common sense:

Q: Are there any animals with three legs?

A: No, there are no animals with three legs.

Q: Why don’t animals have three legs?

A: Animals don’t have three legs because they would fall over.

Surprisingly, GPT-3 is not perfect at simple math! Such operations are easy for a customized program, but recursive logic doesn’t quite translate into the neural net architecture upon which GPT-3 operates.

An area where GPT-3 is impressive is in its ability to write code. Here’s an example video:

https://player.vimeo.com/video/426819809

AI has always struggled with bias, and while GPT-3 has room for improvement, it's heading in the right direction. Here's a test of a few biased questions asked in the OpenAI GPT-3 Playground. It flags that the answer "may contain unsafe content" when it suggests women belong in the kitchen, and it raises the same flag for the answer that men belong in the kitchen.

Source: OpenAI GPT-3 Playground

GPT-3 seems to be quite impressive in some areas, and yet still subhuman in others. Hopefully, with a better understanding of its strengths and weaknesses, we’ll all be better equipped to use GPT-3 in real products.

The GPT-3 buzz

When OpenAI launched GPT-3, the creators said they wouldn't launch with the full model capable of the largest number of parameters, which is why there are variations of GPT-3. OpenAI was concerned about the way it would be used and the ethics of such a powerful and clever model. Perhaps unsurprisingly, this created quite a buzz!

The worry is that the inevitable bias in the internet data used to train the model will filter through to the results the model creates. For example, GPT-3 could generate harmful tweets, or even long-form content, that could be indistinguishable from content produced by a human. However, you can fine-tune the dataset to target the weaknesses of the system by selecting sensitive categories (such as abuse/violence, human behavior, inequality, health, political opinion, relationships, sexual activity, and terrorism) and then curating the data to remove any potential bias.

Eventually, OpenAI did release the largest version of GPT-3, but not everyone can access it yet. In an effort to alleviate the ethical concerns, it's currently in private beta testing: there is a waitlist, and OpenAI selectively invites people into the program so the model can be continuously improved in a safe, controlled setting.

Is GPT-3 an NLP revolution?

In our opinion, GPT-3 has not revolutionized the space, nor is it a paradigm shift (that started with BERT). Rather, GPT-3 represents an important milestone. It's no different from its predecessors in that there are no methodological changes; its sheer size is what sets it apart. Because GPT-3 is so much bigger than what came before, it can generate code and exhibit significantly more sophisticated NLP abilities.

Additional reading:

Giving GPT-3 a Turing Test

What is GPT-3? Everything your business needs to know about OpenAI's breakthrough AI language program

OpenAI PALMS – Adapting GPT-3 to Society | By Alberto Romero

Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets

Running Inference With BERT Using TensorFlow Serving https://symbl.ai/developers/blog/running-inference-with-bert-using-tensorflow-serving/ Tue, 31 Aug 2021 00:49:54 +0000

BERT is a powerful natural language processing tool with a wide range of capabilities, but the size and complexity of the architecture make it challenging to implement in a production environment. To optimize it for memory-efficient, low-latency settings, the best approach is to implement it using TensorFlow Serving.

What is BERT, anyway?

When Google released BERT, it kicked off a frenzy in the natural language processing (NLP) space. BERT — short for Bidirectional Encoder Representations from Transformers — is a breakthrough NLP tool that can handle a wide range of tasks, including named entity recognition, sentiment analysis, and classification.

BERT made it possible for a neural network to understand the intricacies of language through a simple strategy known as word masking. Using this approach, words in sentences were randomly masked and the model was asked to predict what each masked word was.

During this pre-training process, the algorithm was able to learn how words combine to form sentences and how grammatical rules operate within any language. And once the algorithm was trained, it could be specialized to complete a wide range of tasks, including next sentence prediction and natural language inference.

Using BERT with TensorFlow Serving

These days, developers have a lot of options to put machine learning models into production, which requires memory-efficient, low-latency settings. Perhaps the most popular is TensorFlow Serving. It’s a high-performance system that’s written in low-level C++, which makes it ideal for production environments.

In this post, we’ll look at how BERT can be used as a language model that outputs word-level probabilities for any sentence. We’ll see how to wrap it in TensorFlow Serving, so it’s optimized for production. And we’ll look at how to serve TensorFlow models that are written using their high-level Estimator class.

Using BERT as a language model

BERT is a masked language model, or MLM — meaning that it was trained by masking words and attempting to predict them. That makes it challenging to use it as a language model, since it needs words from both before and after the masked word to generate a prediction. By contrast, with sequential language models that predict the next word in a sequence, the algorithm only needs words before the masked word.

Let’s take a look at how the authors of the base repository, bert-as-language-model, get around this.

To predict how probable a given word is in a sentence, the authors use a repeating mask-and-predict technique. With any given input sentence, a number of copies are generated equal to the number of words in the sentence — so, for instance, 14 copies would be made of a 14-word sentence.

With each copy, a different word is masked. In the first copy, just the first word would be masked. In the second copy, just the second word would be masked. And so on. All of these sentences are then submitted to the model to generate the probability of the masked word.
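To make the mask-and-predict setup concrete, here's a minimal sketch in plain Python (no BERT dependencies) that generates one masked copy per word; the [MASK] placeholder mirrors BERT's masking convention, and scoring each copy is left to the model:

def make_masked_copies(sentence):
    """Return one copy of the sentence per word, each with a different word masked."""
    words = sentence.split()
    copies = []
    for i in range(len(words)):
        masked = words.copy()
        masked[i] = "[MASK]"                          # mask exactly one position per copy
        copies.append((" ".join(masked), words[i]))   # keep the original word as the target
    return copies

for masked_sentence, target in make_masked_copies("the cat sat on the mat"):
    print(masked_sentence, "->", target)

Each (masked sentence, target word) pair is then fed to the model, which returns the probability of the target word at the masked position.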

Google’s pretrained BERT model doesn’t function as a language model in the way we just described. To make it work, we need to add a layer on top of the final layer in the encoder stack of the architecture. This layer transforms the output from (batch_size, max_seq_length, hidden_units) to (batch_size, max_seq_length, vocab_size). To get the probability scores for each word, we run the softmax function over this transformation using the get_masked_predictions() function in the run_lm_predict.py file.

We won’t get too deep into the codebase here, but it’s important to identify the inputs that are used for serving the model:

  • input_ids – Words in the sentence transformed into their dictionary indices
  • input_mask – List of 1s and 0s after padding to max_len (1 denotes that the word exists, 0 signifies that it's a padding token)
  • segment_ids – ID used for downstream applications that would typically use two sentences for input, such as natural language inference (this is syntactically needed, but not necessary for this language model)
  • masked_lm_positions – Indicates the position of the masked word inside the original sentence
  • masked_lm_ids – Indicates the ID of the actual word that was masked
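For illustration, here's a rough sketch of how these features might be assembled for a single sentence with the tokenization.FullTokenizer from the BERT repository; the vocab path, max_seq_length, and the exact token indices are assumptions, and the repository's own preprocessing code remains the authoritative version:

import tokenization  # tokenization.py from the BERT repository

max_seq_length = 128  # assumed; must match the exported model
tokenizer = tokenization.FullTokenizer(vocab_file="vocab.txt", do_lower_case=True)  # assumed vocab path

tokens = ["[CLS]"] + tokenizer.tokenize("the cat sat on the mat") + ["[SEP]"]
input_ids = tokenizer.convert_tokens_to_ids(tokens)

# Mask the token at position 3 ("sat", assuming no wordpiece splitting); the model
# should predict its ID back at that position.
masked_lm_positions = [3]
masked_lm_ids = [input_ids[3]]
input_ids[3] = tokenizer.convert_tokens_to_ids(["[MASK]"])[0]

# Pad everything out to max_seq_length.
pad_len = max_seq_length - len(input_ids)
input_mask = [1] * len(input_ids) + [0] * pad_len   # 1 = real token, 0 = padding token
input_ids = input_ids + [0] * pad_len
segment_ids = [0] * max_seq_length                  # single sentence, so a single segment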

Saving your TensorFlow model in SavedModel format

Before we can serve any TensorFlow model, we need to save it into the SavedModel format. But in this case, there’s an extra wrinkle: since we’re adding an extra layer at the top in this use case, we need to run the prediction loop so that the weights in the added layer are initialized first (fine-tuning on one’s own dataset also accomplishes this same task) — and then move on to saving. Without this step, we might get tracebacks and other unexpected behavior from the model.

To transform the final model into the SavedModel format, the Estimator class exposes an export_savedmodel function. This function uses the serving_input_receiver_fn() function, which indicates the shapes and data types of all the input tensors needed by the final, servable model.

Here's what this receiver function (named serving_input_rec_fn() here) would look like:

def serving_input_rec_fn():
    serving_features = {
        "input_ids": tf.placeholder(shape=[None, max_seq_length], dtype=tf.int32),
        "input_mask": tf.placeholder(shape=[None, max_seq_length], dtype=tf.int32),
        "segment_ids": tf.placeholder(shape=[None, max_seq_length], dtype=tf.int32),
        "masked_lm_positions": tf.placeholder(shape=[None, max_predictions_per_seq], dtype=tf.int32),
        "masked_lm_ids": tf.placeholder(shape=[None, max_predictions_per_seq], dtype=tf.int32),
    }
    return tf.estimator.export.build_raw_serving_input_receiver_fn(features=serving_features)

Now that we’ve defined the receiver function, we want to ensure that the model is executed once and the extra layers we defined for the language model are included. To do that, add these two lines:

save_hook = tf.train.CheckpointSaverHook(output_dir, save_secs=1)
result = estimator.predict(input_fn=predict_input_fn, hooks=[save_hook])

Next, we want to save the model in SavedModel format in our working directory, using these lines of code:

estimator._export_to_tpu = False
export_path = estimator.export_savedmodel(os.getcwd(), serving_input_rec_fn())

Running inference on the served model

And with that, we’ve saved the model into the format we need for TensorFlow Serving. The next step is to host the servable model on a Docker container. We can then use it to run predictions with very low latency.

First, we need to set up a Docker container that has TensorFlow Serving as the base image, with the following command:

docker pull tensorflow/serving:1.12.0

For now, we’ll call the served model tf-serving-bert. We can use this command to spin up this model on a Docker container with tensorflow-serving as the base image:

docker run -p 8500:8500 -p 8501:8501 --mount type=bind,source=$(pwd)/exported-model,target=/models/tf-serving-bert -e MODEL_NAME=tf-serving-bert -t tensorflow/serving

In this command, $(pwd)/exported-model is the location where we saved the SavedModel. It contains the graph in .pb format and the variables folder that contains the .data-00000-of-00001 and .index files.

At this point, the model is set up. Now we just need to send a REST API request to the served model. When you’ve hosted it on your local system, the model should be running at this endpoint:

http://localhost:8501/v1/models/tf-serving-bert:predict

Looking for an example? Check out this sample Python script that accepts a .tsv file as its input, sends a request to the above URL, and outputs word-level probabilities for each line in the .tsv file. (Be sure to run it in the same repository that has the tokenization.py script. And of course, make any necessary changes to the path and other parameters in the file provided above.)
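If the linked script isn't to hand, here's a rough sketch of what a request to that endpoint could look like with the requests library; the "instances" payload shape, the max_predictions_per_seq value, and the placeholder feature values are assumptions, and the exact format depends on the signature the model was exported with:

import requests

url = "http://localhost:8501/v1/models/tf-serving-bert:predict"
max_seq_length = 128          # assumed; must match the exported model
max_predictions_per_seq = 20  # assumed

# Placeholder feature values; in practice they come from the tokenization step above.
instance = {
    "input_ids": [0] * max_seq_length,
    "input_mask": [0] * max_seq_length,
    "segment_ids": [0] * max_seq_length,
    "masked_lm_positions": [0] * max_predictions_per_seq,
    "masked_lm_ids": [0] * max_predictions_per_seq,
}

response = requests.post(url, json={"instances": [instance]})
response.raise_for_status()
print(response.json()["predictions"])  # word-level probability scores from the served model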

By using BERT with TensorFlow Serving, you can get results dramatically faster than with traditional methods.

This post is the first in a two-part series on how to implement heavy models like BERT in low-latency environments.  In Part 2, we’ll look at how you can take this model to the next level by applying it on a GPU.

Additional reading

Looking to learn more about using BERT with TensorFlow Serving? Check out these resources:

A Bite-Sized Guide to Building Contextual Conversation Intelligence Solutions https://symbl.ai/developers/blog/a-bite-sized-guide-to-building-contextual-conversation-intelligence-solutions/ Fri, 13 Aug 2021 17:04:40 +0000

Natural language understanding is advancing at a breakneck pace, but most solutions are focused on task-specific problems. Moving beyond models that were trained on written text to spoken conversations is critical. Building your own solution for contextual conversation intelligence is the best way to extract the insight you need.

Natural language processing (NLP) has progressed by leaps and bounds over the last few years.

And while advances in the natural language understanding (NLU) space (a subset of NLP) have equipped developers with incredible capabilities, they come with a catch: most of the existing solutions are laser-focused on specific problems.

Technical advances in NLU have improved our ability to solve specific problems with ever-increasing accuracy. You can use a model to do text classification on each line of a given passage. You can perform sentiment analysis on each line. And you can paraphrase every paragraph within a text.

But what’s missing from your toolkit are tools that enable you to get a more intelligent, contextual understanding of everyday natural language.

Context is critical for understanding everyday language. In human to human (H2H) conversations, we rely on our memory of things previously said in the conversation, our ability to recall past conversations, and our knowledge of what the other person knows.

Lack of contextual understanding is the fundamental problem that most developers working in the natural language space face today.

Solving the problem of contextual understanding

Developers in the natural language space create ML/DL models. The ultimate goal is to build models that can process written and spoken language in the same way a human can.

But so far, advances in the NLP space haven’t yielded a contextual understanding of our actual everyday language.

However, developers can now build models that can process the linguistic relationships and patterns of thought beneath the surface-level language to go beyond task-specific models and deliver real conversation intelligence.

Developers have access to a wealth of existing, task-specific solutions in the natural language space. And that’s the key. We can assemble these tools together to create models that offer a genuine contextual understanding of language.

Conversation summaries: the challenge

Consider a common task in most organizations: generating written summaries of conversations. Let’s think through how to build a model that can accomplish this effectively — and why it requires more than just using existing natural language solutions.

There are plenty of resources available for summarization when it's performed on written text. But spoken language contains lots of complexities and irregularities that written language doesn't. Text classification and sentiment analysis, for example, don't tell the whole story about what someone is actually talking about.

In spoken language, we use lots of repetition, circling back to the same thought again and again. We use filler words like yeah, uh, you know, kinda. And we use more informal, casual language than in written text.

Since summarization models have been trained on written text, they can’t be used to generate accurate summaries of human conversations. If we were to input the transcript from a typical thirty-minute conversation into a summarization model, it would likely include a lot of irrelevant information and leave out a lot of information that was actually critical to the conversation.

We can’t apply a model that was trained on written text to spoken conversations. But if we use other algorithms strategically in combination, we can build a multi-step model that generates clean conversation summaries.

Conversation summaries: 3 steps to building a model

If we want to summarize a conversation, it's important to consider the role that structure plays. Conversations are dialogues in which speakers share their thoughts with each other in real time, clarifying and expanding on their ideas; their statements often require knowledge of the immediate context to make sense.

Piecing together existing tools gives us a powerful way to achieve contextual understanding of conversations — but making it happen requires careful thought and planning. Curious about the exact steps you need to take? Here’s a closer look.

1. Identify topics

The first step in building a conversation summary model is topic modeling.

Consider a typical recording from an all-hands department meeting. Discussion during the meeting might cover the introduction of new team members, wins from the previous month, reviewing results from the last quarter, the rollout of a new product update, news from the marketing team, updates on the company’s remote work status, and more.

To surface these and other key topics mentioned during the conversation, we can use topic modeling algorithms like probabilistic Latent Semantic Analysis (pLSA), a technique that models information under a probabilistic framework in which topics are treated as latent, or hidden, variables. Another common option is Latent Dirichlet Allocation (LDA), a method for determining the topics that are likely to have generated a collection of words.
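As a rough illustration of this step, here's a minimal LDA sketch using scikit-learn on a handful of transcript utterances; the utterances, the number of topics, and the vectorizer settings are placeholders rather than a tuned configuration:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Placeholder transcript utterances; in practice, lines from the meeting transcript.
utterances = [
    "welcome everyone, let's introduce our new team members",
    "last quarter's revenue results came in ahead of target",
    "the new product update rolls out to customers next week",
    "marketing is planning the launch campaign for that update",
    "we will stay remote for the rest of the quarter",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(utterances)

lda = LatentDirichletAllocation(n_components=3, random_state=0)  # 3 topics, arbitrary choice
lda.fit(doc_term)

# Print the top words for each discovered topic.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[-4:][::-1]]
    print(f"topic {topic_idx}: {', '.join(top_words)}")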

2. Build timelines

We know what was discussed. Now we need to know when.

Topic modeling gives us an idea of what the main topics in the conversation are. Next, we can break up the conversation using topic segmentation so that we know the specific points in time when speakers were discussing each of the key topics.

Topic segmentation provides us with a timeline, breaking the conversation down into blocks of time based on the topics discussed. For example, in our team meeting summary, we might see that the department head made general introductions from 0:00 to 1:10, new team members were introduced from 1:10 to 4:23, team members shared recent wins from 4:23 to 9:37, and so on.
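One simple way to approximate this step is to compare the lexical similarity of adjacent windows of utterances and place a boundary wherever similarity drops; the sketch below (a toy heuristic, not a production segmenter) uses TF-IDF and cosine similarity, with an arbitrary window size and threshold:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_by_topic_shift(utterances, window=3, threshold=0.1):
    """Return the utterance indices where a new topic block likely starts."""
    boundaries = [0]
    for i in range(window, len(utterances) - window + 1):
        before = " ".join(utterances[i - window:i])    # window of utterances before point i
        after = " ".join(utterances[i:i + window])     # window of utterances after point i
        tfidf = TfidfVectorizer().fit_transform([before, after])
        similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
        if similarity < threshold:                     # little word overlap, likely a topic shift
            boundaries.append(i)
    return boundaries

Paired with the utterance timestamps from the transcript, these boundary indices give you the kind of timeline blocks described above.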

3. Create summaries

It’s time to generate your summary. The last step is to use a text summarization algorithm to generate summaries of each conversation block. Once these have been generated, you can assemble them into an overall summary of the conversation.

Here, it’s critical to ensure that your model is trained on short blocks of conversation data. If your summary model was trained on heavily edited informational articles, it won’t be able to identify the most important points when generating a summary.
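To illustrate, here's a hedged sketch that runs the Hugging Face transformers summarization pipeline over each conversation block and stitches the results together; the default model it downloads was trained on news-style text, so, per the caveat above, you'd want to substitute or fine-tune a model trained on short blocks of conversation data:

from transformers import pipeline

# The default summarization model is only a stand-in here; swap in one trained on
# conversational data for real use.
summarizer = pipeline("summarization")

blocks = [
    "so for the product update we are shipping the new dashboard next week and the team "
    "still needs to finish the migration scripts before then",
    "on the marketing side the launch campaign kicks off right after the release and we "
    "need final copy from the product team by friday",
]

block_summaries = [
    summarizer(block, max_length=40, min_length=5)[0]["summary_text"]
    for block in blocks
]

overall_summary = " ".join(block_summaries)  # stitch block summaries into one overview
print(overall_summary)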

Create your own conversation intelligence models

Building your own solution for contextual conversation intelligence requires strategic thinking, technical expertise, and plain old hard work. While you could build your own model, Symbl.ai makes it possible to skip all of that hard work with solutions like our GET Topics API.

Additional reading

What It Really Means to Add Context To Your Conversation AI

The What, Where, and Why of Contextual AI

What’s That, Human? The Challenges of Capturing Human to Human Conversations

Contextual AI: The Next Frontier of Conversational Intelligence

What It Really Means to Add Context to Your Conversation AI https://symbl.ai/developers/blog/what-it-really-means-to-add-context-to-your-conversation-ai/ Wed, 05 May 2021 01:09:30 +0000

Context enables your conversation AI to personalize its outputs using relevant information from user inputs and answers, historical data, previous chats, and other data sources. This results in more intelligent and efficient conversations between humans and machines, and enhances an AI’s ability to capture insights and even sentiments from conversations between humans.

Why is context so important in AI?

Context in conversation AI is as important as context in daily conversation.

Chatting with a colleague about your latest project is much easier when the other person is aware of all aspects of the conversation, like the people involved and the due date that has already whizzed past. It’s much more frustrating to talk about the same project with your distant aunt, who needs you to explain what the project is about, who each person is, and what your name is again.

The same happens with conversation AI. When it has access to past knowledge and key details, it can leverage relevant information at the right time to create dynamic conversations or take the most appropriate actions.

In human to machine (H2M) conversations, for example, a chatbot that asks for your name and then asks for it again later in the conversation is a recipe for eye-rolling. Giving it the basic ability to “remember” previous inputs and reuse that information when needed is a game-changer, not just for the conversation but also for the user experience.

In human to human (H2H) conversations, like sales calls or company meetings, context is even more valuable. For example, during a daily scrum, an AI transcribing the meeting could draw from past transcriptions to identify the speakers, what their last action items were, and which timezone to use when booking a follow-up meeting.

Whatever the scenario, even the most basic context can make your conversation AI smarter, better, and more productive.

Where can your AI get context from?

“Context” seems like a broad term when it comes to building it into your AI. It can be pulled from anywhere that has valuable information, including historical data, meeting agendas, chat logs, emails, and even browsing activity.

The potential sources are limitless, but to get you started, here are some context sources you can use to point your AI in the right direction:

User input: This is the information you get directly from the person on the other end, like their name, location, and preferred payment method. Storing this data can make future transactions more efficient since the AI won’t need to ask for it again. For example, if Frank wants to change his hotel booking, a chatbot could pull up his last booking and skip the need to ask for his name, dates, reservation number, etc.

Enterprise knowledge: This information is pulled from a company’s knowledge base, like meeting notes or their FAQ, so your AI can accurately represent your company in the right context. For example, say that Frank now wants a full refund, but the room he booked can only be credited. The AI could consult the company’s refund policy and enforce it.

User/task information: This data is dynamically captured behind the scenes, like number of purchases and page views, then given to your AI to augment an interaction. Staying with our hotel example, you could collect the number of times a user checks a specific room, then design and apply logic for the bot to offer them a flash discount on their third visit as a motivational nudge.

Session context: This is what’s discussed during a single session that your AI should “remember” to streamline tasks. For example, if Frank asks the hotel bot, “how much credit do I have in my account?”, the bot could recognize that he means the same account he used to book the hotel room instead of asking for his account information.
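As a toy illustration of session context (the class, keys, and flow below are hypothetical, not a Symbl.ai or framework API), a bot can keep a small per-session store and fall back to it instead of re-asking:

class SessionContext:
    """Toy per-session memory for a chatbot (illustrative only)."""

    def __init__(self):
        self._sessions = {}

    def remember(self, session_id, key, value):
        self._sessions.setdefault(session_id, {})[key] = value

    def recall(self, session_id, key, default=None):
        return self._sessions.get(session_id, {}).get(key, default)

context = SessionContext()
context.remember("frank-123", "reservation_number", "HTL-4471")  # captured earlier in the session

# Later in the same session, the bot reuses the stored value instead of asking again.
reservation = context.recall("frank-123", "reservation_number")
if reservation:
    print(f"Looking up booking {reservation}...")
else:
    print("Could you give me your reservation number?")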

You can upgrade your conversation AI even further with the ability to capture emotion and tone. By identifying keywords, connotations, and word placements, your AI can quickly pick up on the user’s emotional state and use it to trigger custom flows.

For example, if a human starts yelling or swearing at a customer service bot, it could acknowledge the keywords and emotional cues, then hand the conversation off to a human agent and avoid losing the customer.

Example: Capturing context to augment a sales call

To give you a better idea of how context can greatly improve user interactions, perhaps in ways you hadn't considered yet, let's take the common scenario of a sales call between a human agent and a potential customer.

Here the AI’s job is to transcribe the conversation in real time, while analyzing context to draw important insights that the agent can use to steer the call towards a sale.

To start, the agent typically asks for the caller’s basic information, like their name and email. The AI then takes that information and runs it back to the CRM to check if there’s a match in the system. If there is, then the AI lets the agent know that this is the caller’s second sales chat. It can even pull up a summary of what was discussed last time so the agent can pick up where the previous agent left off.

As the conversation progresses, the AI transcribes what’s being said, who’s saying it, and links any important mentions. For example, when the caller mentions they also emailed “Daya” about a demo, the AI verifies that Daya is indeed part of the sales team and tags her in the notes for reference.

Eventually, the caller mentions that they're pleased with the demo but aren't ready to subscribe to a paid plan just yet. At this point, the AI catches the keywords "demo," "subscribe," and "not ready," and analyzes the buyer's intent. The AI then recommends that the agent take the following action: offer a two-week trial.

And so the agent does, and the caller is happy to get a test run before committing to a purchase. The company is also happy because the caller is still on their way to becoming a paying customer, and the sales team can use these call transcriptions as training material.

To neatly wrap up the interaction, the AI automatically sends the potential customer an email summarizing the key points from the call along with a link to create their trial account. On the business end, the AI updates the caller’s record in the CRM with all this new information, and even schedules a follow-up call in two weeks when the trial is over.

That’s the power of context.

Upgrade your conversation AI with the help of APIs

Adding context to your conversation AI can involve a plethora of processes, from basic speech recognition and speaker diarization to advanced open/closed-domain conversation understanding systems.

It’s common to bump into developers using tools like Rasa, DialogFlow, or a combination of Python and TensorFlow to add contextual abilities to their home-made bots. Others might be experimenting with knowledge graphs and deep learning frameworks to give their business AI an extra kick. But if you’re looking to give your AI substantial contextual understanding (for both H2M and H2H conversations) without the complexity of building it yourself, you’re better off using third-party APIs that take care of the heavy lifting.

These APIs have typically mastered natural language processing (NLP) of voice and text conversations, and can be easily integrated with other third-party systems like JIRA, Twilio, Hubspot, and others to bring meaningful information to the surface.

Keep in mind, though, that no single platform can do it all. You still have to dig deep and understand your users' needs before building the necessary context into your AI. It's an iterative process that pays off in the long run, as you'll find valuable contextual connections here and there that can make interacting with your conversation AI feel just a bit more like magic.

Additional reading

Now that you know what kind of context your AI needs, here are some extra resources to get your inspiration going:

Ethical Conversation Intelligence – Managing Bias https://symbl.ai/developers/blog/ethical-conversation-intelligence-managing-bias/ Fri, 30 Apr 2021 06:00:30 +0000

Bias can creep into algorithms and data in many different ways. The key is “ethics by design”, and the only way to make sure your program doesn’t harm users is to make the AI ethical from day one. Bias can be pre-existing (conscious or subconscious), technical, or emergent. You need to be mindful at every stage of development to avoid bias and objectively check your training data.

The delicate balance of trust in Conversation AI

Businesses and governments are enthusiastic about the potential of AI and it falls right at the heart of the current tech boom. But the flip side is that AI also makes many consumers anxious because they don’t trust the technology.

Arguably, AI will have such a profound effect on us all that you (AI developers) are representatives of future humanity, and so you have an ethical obligation to be transparent in your efforts. No pressure!

Why are ethics important in AI?

Conversation Intelligence (CI) relies on two things: algorithms and data.

Bias can creep into algorithms in many ways. An algorithm may be skewed to achieve a particular outcome – for example, towards greater caution in offering loans to a certain group of people, or prioritizing those with higher “social credit” scores. The personal bias of developers (be it conscious or unconscious) can creep in when writing the algorithms too. In CI, problems can arise from the “text corpus” — the source material the algorithm uses.

Machines that learn from biased data will make biased decisions. For example, Amazon's AI hiring system learned to give female candidates lower scores than male candidates, and an AI in the U.S. allocated a higher standard of healthcare to white patients.

Unfortunately, most code is only decipherable by the developers who wrote it, making it difficult for even technical people to understand, let alone end users. This means even open-source code is not transparent. The key is “ethics by design”; in other words, it’s now on you as a developer, and your team, to make sure your program doesn’t harm users. And the only way to make sure it doesn’t is to make the AI ethical from day one.

What constitutes bias?

To identify bias you need to understand what it is. There are three categories of bias in computer systems:

  • Pre-existing bias comes from underlying social institutions, practices, and attitudes. These can create personal biases within developers, which can be explicit and conscious, or implicit and unconscious. The same is true of input data which, if poorly selected or drawn from a biased source, will influence the outcomes your CI system creates. The bias can already be present in the training data itself, or a text classification problem might be biased towards one specific variant or sentiment. Gender stereotypes are a classic example: if training labels associate doctors with men, the system may conclude that a nurse must be a woman.
  • Technical bias arises from technical constraints, like constraints on your program, its design, or some other system limitation. For example, if a flight path algorithm sorts by alphabetical order it’ll be biased in favor of American Airlines over United Airlines; if a search engine shows only three results per screen there’ll be a bias towards the top three results; and if software relies on random results, bias will arise if the number generator isn’t truly random.
  • Emergent bias happens when algorithms apply knowledge from old contexts to new ones, without considering changes in societal knowledge or cultural values. This can happen when:
    • Training data doesn’t align with your algorithms’ real-world contexts
    • Unpredictable correlations emerge when large data sets are compared to each other.
    • Data collected about web-browsing patterns align with data marked as sensitive (e.g. relating to race or sexual orientation) and this can lead to discrimination.
    • An algorithm draws conclusions from correlations, without being able to understand them.

When data collected for an algorithm produces real-world responses that are fed back into the algorithm (a feedback loop), this can introduce or compound emergent bias. An example is the COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) algorithm used by judges to predict the risk of granting bail. Defendants' reoffending risk scores were found to be biased against African-Americans compared to white defendants who were equally likely to reoffend, resulting in longer periods of detention while awaiting trial.

How to test your model for bias

You need to be mindful at every stage of building to avoid bias by design, and check your training data for pre-existing bias or bias that could emerge.

For example, with the risk-predicting bail system, you could remove the column of data that refers to race so that the system focuses on other factors, like location and age.

Of course, the more data you have the bigger the issue to resolve.

Ethics failures in conversation data are much less common than with, for example, GPT-3 and BERT. These are both huge models trained on data from the internet (the majority of GPT-3's training data comes from Common Crawl, among other sources), which can reinforce gender and racial stereotypes. You deal with this by making changes when you pre-process the data: you clean it and remove the stereotyped comments so that they're not reinforced when the data goes into the model.

If you build with an API, you'll start with little data of your own, but this gives you the opportunity to introduce training data from the big models, which already understand semantics, and then fine-tune your model for the specific task you need.

You must test your own application for bias. How you do this depends on the use case:

  • If it's text classification, you fool the model by replacing random entities or genders (see the sketch after this list).
  • If it’s image classification you can blur the image or add another.
  • You can simulate a training/testing mismatch, so if you have a text classification task on sentiment you can apply the same to an action items classification model.
  • You could blind the algorithm to sensitive attributes, but do this with caution, as removing those attributes can itself cause algorithmic bias in some situations.
  • You can also train with a domain shift, where you train your model in business-specific jargon, and then use the model on, say, a call center conversation. This will create a shift, as part of the training, to expand your system’s data and understanding. To evaluate the risk of bias, you can send slight variations of each data point and see if your model can generalize.
  • You can use bias checking tools like AI Fairness 360, Watson OpenScale, and What-If Tool to detect and mitigate bias.
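Here's a minimal sketch of the entity/gender-swap test from the first item above: swap gendered words in each test sentence and measure how often a classifier's prediction flips when nothing else changes. The classify function is a placeholder for whatever model you're testing, and the dummy classifier at the end exists only to show the check in action:

# Minimal gender-swap perturbation test. `classify` stands in for your own model's
# prediction function (e.g. a sentiment or action-item classifier).
SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "man": "woman", "woman": "man"}

def swap_genders(sentence):
    return " ".join(SWAPS.get(word, word) for word in sentence.lower().split())

def bias_check(classify, sentences):
    """Return the fraction of sentences whose prediction flips under a gender swap."""
    flips = sum(1 for s in sentences if classify(s) != classify(swap_genders(s)))
    return flips / len(sentences)

# A deliberately biased dummy classifier that keys off gendered words:
dummy = lambda s: "doctor" if "he" in s.lower().split() else "nurse"
print(bias_check(dummy, ["he is reviewing the patient chart",
                         "she is reviewing the patient chart"]))  # -> 1.0, every prediction flips

A flip rate well above zero on sentences that differ only in gendered words is a signal that the model has picked up the kind of stereotype described earlier.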

Bias testing lifecycle

At Symbl.ai we have a tried and tested lifecycle to help our developers identify bias. You can implement everything above by following the steps below to ensure a robust process.

AI testing lifecycle requires a different approach

  • Requirements Analysis – This is the stage where you figure out what or where potential bias could be in your model.
  • Test Design, Test Data, Preparation – Here you need to find data points that break the model. You'll undertake data aggregation, labeling, and experiments.

How you label the data is key to this preparation stage, since the model learns from these labels. This is where bias can creep in, because some items are subjective: one individual may label a certain word or phrase one way, while another labels it differently. Your approach to labeling will reduce bias in this data preparation stage.

It's a good idea to give the data to five different people, rather than rely on one person's perspective, have them all label the same data, and take the labels they agree on. If you use diverse teams and test on diverse groups, you can avoid unsettling outcomes that magnify hidden prejudices.
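For instance, a simple majority vote across those five annotators might look like the sketch below; the labels and the tie-handling rule are placeholders for whatever your own labeling guidelines specify:

from collections import Counter

def majority_label(annotations):
    """Pick the most common label among annotators for a single item."""
    label, votes = Counter(annotations).most_common(1)[0]
    if votes <= len(annotations) // 2:   # no clear majority: flag for review instead of guessing
        return "needs_review"
    return label

item_annotations = ["action_item", "action_item", "question", "action_item", "question"]
print(majority_label(item_annotations))  # -> "action_item" (3 of 5 annotators agree)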

If you’re using your own data, you can prepare it. But in most situations you’ll receive data from a third party. In this situation you have to fool the system as explained above. This is a fine-tuning step which will alter the explanations without hurting the accuracy of your original model.

  • Test Execution – This is the part where you actually hit the model with the bias-heavy testing dataset that you’ve prepared. The results from this step are the ones for you to scrutinize, in order to gauge the effect of any bias that might have crept in.
  • Test Cycle Close – The point at which you close the test cycle is a matter of judgment. See the next section, "When have you done enough?", for more actionable advice on this.
  • Auditing – As a best practice, after the test cycle has been closed you should undertake continued formal and regular auditing of the algorithms. Here you can review both input data and output decisions to provide insight into the algorithms' behavior.

When have you done enough?

“Testing AI systems presents a completely new set of challenges. While traditional application testing is deterministic, with a finite number of scenarios that can be defined in advance, AI systems require a limitless approach to testing,” – Kishore Durg, senior MD, Growth and Strategy and Global Testing Services Lead for Accenture.

A 'limitless approach' is quite an intimidating prospect! But remember, bias can't be completely eliminated. Your job is to minimize the effect of bias as much as possible.

The extent to which you need to test and clean and train depends on the size of your training data. You may need to test 100-200 instances of each potential bias to be sure it has not crept into your model.

If you have bad results from your bias testing, you know your model isn’t working properly for certain types of bias. Try segmenting the different types of bias. Then you’ll need to make changes in the data preprocessing phase (like masking gender/race related info, if the bias is indeed related to genders/race). Once you’ve done that you need to retest for all the biases you identified earlier, and then validate the results to evaluate whether the biases have been mitigated. If not, the cycle repeats until you end up with a tolerably small error rate.

Learn more about the ethical approach of Symbl.ai.

Further reading:

The What, Where, and Why of Contextual AI https://symbl.ai/developers/blog/the-what-where-and-why-of-contextual-ai/ Tue, 30 Mar 2021 07:00:07 +0000

Contextual AI enables systems to interpret information the same way a human would. From analyzing wording and sentiments to recognizing cultural and environmental contexts, this “intuitive” understanding allows AI systems to produce more in-depth, relevant, and accurate outputs.

What is contextual AI?

In a sentence: contextual AI takes a human approach to processing content. It allows AI systems, like chatbots and virtual assistants, to have a real-world interpretation of language, audio, video, and images so they can behave less like traditional computers and more like humans.

It’s what helps an AI recognize when an image is upside down, whether you’re happy or frustrated by the tone of your voice, or that the right answer to the question, “where did Doc send Einstein?” is, “one minute into the future” — not, “sorry, I don’t know that one.”

This is because contextual AI is capable of analyzing the cultural, historical, and situational aspects surrounding incoming data, then using that context to determine the most meaningful outcome for the end-user.

In human to machine (H2M) conversations, this outcome can be as simple as a chatbot using your location to direct you to the nearest laptop repair shop. In human to human (H2H) conversations, it could be anything from recognizing speakers, sentiments, and buyer’s intent, to adding valuable insights to real-time sales call transcriptions.

Why does contextual AI matter?

Contextual AI creates a more collaborative partnership between humans and machines by driving dynamic conversations, providing highly relevant responses, and generating increasingly accurate predictions.

Context is a fundamental building block of machine learning (ML), and the missing puzzle piece to making your AI’s intelligence rival that of a human. So, by leveraging contextual AI, you can give your system the power to:

  • Generate new knowledge: A contextual AI system can pick up patterns and features in the data, and extrapolate context clues from a few supervised learning cases to gain a deeper understanding of any situation. This allows your AI system to learn in an unsupervised manner, figuring out new scenarios on a case-by-case basis — just like a human would.
  • Transfer knowledge between contexts: This means your AI system is able to take what it learned from one context and apply it to another to perform better on a similar task. For example, a contextual AI in charge of transcribing a company meeting could instantly recognize and link a project name that was once mentioned in a different meeting.
  • Infer context to problem-solve: As it learns from each interaction, your AI system gets better at considering every aspect of a situation to deduce what the end-user truly needs at that moment. For example, a self-driving car could capture environmental cues, like wet roads and pedestrians ahead, then automatically reduce its speed.

Achieving this “human level” of intelligence, however, also requires a few key ingredients that you need to build in:

Domain knowledge: Contextual AI lets you train your models with business-specific data for much more detailed, accurate, and valuable results. This is a step up from the generic outputs you usually get after training your models using bulk aggregated data from AI providers, like Google Vision.

For example, in the case of automatically meta-tagging images, a generic AI would add simple tags like “hobbit” and “ring”. Whereas contextual AI would be able to add more helpful tags like, “Frodo Baggins”, “The One Ring”, “inside Mount Doom”, and “dangerous” — which give you a much better idea of the specific scene it’s referring to.

Furthermore, now that the AI understands what “dangerous” looks like, it can intelligently add that tag to completely different images it has never been specifically trained with. This means you can use smaller, more focused data sets to train your AI, then just set it loose and let it learn as it goes.

Explainability: Explainability is when a system can show what it knows, how it knows, and what it’s doing. Currently, many AI systems operate as a “black box,” where the reasoning behind their decisions is indecipherable. This lack of transparency makes the AI untrustworthy, particularly in safety-critical settings, like cancer detection or criminal facial recognition, where a bad AI prediction can be potentially life-changing.

Contextual AI adds explainability throughout the ML pipeline, from data ingestion to inference. Having this visibility into the inner workings of your AI will help you build better and safer systems that you can easily understand, improve, and steer away from any misguided decision-making.

Customization: As you know, contextual AI has the ability to adapt to situations it hasn’t been specifically trained to handle. But just as you wouldn’t instantly excel at a brand new task, AI won’t always get everything right either.

For a contextual AI system to continually improve, users need to be able to tweak its behavior so it can better meet their expectations. For example, if a music streaming service keeps suggesting questionable songs, you should be able to alter your preferences and get the AI back on track.

Where can you use contextual AI?

Contextual AI is a good idea when a more sophisticated understanding of human situations would improve the user experience. The most common scenarios involve things like self-driving cars, facial recognition, and quality control. But voice-based assistants and conversational agents are where contextual AI can really work its magic.

To give you a better idea of when it makes sense to use contextual AI, consider these two main types of conversations:

Human to machine (H2M) conversations

These involve a person typing or speaking to a conversational agent, like a chatbot or voice assistant. While they could all benefit from contextual AI, not every setting actually needs it.

For example, if you develop a closed-domain chatbot with the sole purpose of tracking food orders, it can coast by perfectly fine using a rule-based system. In contrast, a virtual assistant at an automated customer support call center will surely induce human rage if it can’t make sense of basic requests.

With contextual AI, you can make the virtual assistant recall historical data, user inputs, previous interactions, and even identify the caller’s emotional state to steer the conversation. This added intelligence would make the entire interaction considerably smarter, smoother, and more user-friendly.

The decision to use contextual AI in an H2M situation largely boils down to the complexity of the conversation you expect it to handle. The more aspects your AI system will need to consider (e.g. emotions, intent, etc.), the more impactful contextual AI would be.

Human to human (H2H) conversations

These conversations can be between two or more people, like a business conference, a sales meeting, or a telehealth video call. H2H scenarios are ideal playgrounds for contextual AI since human conversations are largely unstructured and contextually ambiguous — which will quickly overwhelm a generic AI system.

For example, an e-learning session between a teacher and a group of students could see the conversation go in any direction. The students will likely raise questions, bring up related topics, or mention things like past homework and upcoming exam dates.

If your AI had to transcribe this lesson (either from a recording or in real-time), contextual AI ensures your system can understand what’s being said with the same level of perception as a fellow student. It could then surface the most important information for later in the form of notes, listed topics and questions, or even suggested action items and follow-ups.

Without contextual understanding, much of that data would be reduced to a heap of intangible, random information, and you’d have your work cut out attempting to train your AI on all the possible topics future lessons could cover.

Contextual AI clearly adds a valuable dimension that results in more human-like behavior, more meaningful insights, and vastly better user experiences. So, now that you know what it is and why it matters, next consider learning how to add context to your conversational AI and what tools you need to implement it.

Additional reading

For more information on the fascinating subject of contextual AI, check out these links:

Best Speech Recognition Building Options for Your Applications https://symbl.ai/developers/blog/best-options-for-building-speech-recognition-into-your-applications/ Wed, 17 Mar 2021 22:01:06 +0000

Open source speech recognition demands more work on your part, specifically with data collection, but it's customizable and often available for free. With popular speech recognition APIs, you don't have to worry about building and fine-tuning with your data, but your ability to customize will be limited.

Building speech recognition into your application

When integrating speech recognition into any app, you have two main options: open source tools or “done-for-you” APIs.

Open source speech recognition models

There's always a catch when something's free. While there are several open source speech recognition models available, they take more work on your part, specifically on data aggregation, than their API-based alternatives.

Some developer toolkits are easy to customize to fit your needs perfectly, and all of them will let you create offline speech recognition to work with locally hosted data. You get to define your own level of security and privacy in your application without worrying about future expenses for this part of your solution.

Pros:
+ Customizable
+ Possibly free
+ Can be used to run both online and offline deployment

Cons:
– You have to train with your data
– And fine tune, maintain
– And make sure it’s deployed securely.

APIs for speech recognition

These come ready to use, but at a cost. While there is data traffic from your application to a server and back, the API ships this data in small, secure packets. You don't need to build and maintain the solution.

Pros:
+ No coding
+ Security and updates are all taken care of
+ Fast, easy to get working speech recognition in your application

Cons:
– Rarely free
– Don’t offer the opportunity of customization
– Don't work well for languages with small amounts of available data, like the endangered or extinct languages recorded on UNESCO's language atlas

Popular open source options and what sets them apart

If you're working with a spoken language that has very little available data, your best bet is to customize an open source speech recognition model to fit your needs.

One interesting use case for open source speech recognition is preserving endangered languages for the future. Let's say you're building an application that helps speakers of Northern Paiute (a language currently estimated to have around 300 remaining speakers) use their native language to search the web or make notes while offline. These are the most tried, tested, and trusted models you can use:

Project DeepSpeech (also known as Mozilla Voice STT): Mozilla Voice STT has an English speech recognition model that's proven relatively easy to adapt to other languages. In just two months, Baidu's Silicon Valley AI Lab got a similar system working on Mandarin Chinese. The tool is available in several programming languages and is easy to adjust to other alphabets. So, DeepSpeech would be a strong choice for building a speech recognition model for languages other than English.

Plus, Project DeepSpeech uses the open source AI library TensorFlow to build speech recognition models. The library works so well that DeepSpeech has been used in projects across the globe and is currently a key ingredient in the Papa Reo project, which helps New Zealanders engage with voice assistants in their own languages.

DeepSpeech can also be used with Mozilla's Common Voice dataset to train voice-enabled applications in an ever-growing number of languages, including those with relatively small speaker populations.

You can find everything you need to get started with DeepSpeech on the project's GitHub wiki.
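
To give a feel for what getting started looks like, here's a minimal transcription sketch using the deepspeech Python package (this assumes the 0.9.x-era API; the model, scorer, and audio file names are placeholders for files you'd download or supply yourself):

    import wave
    import numpy as np
    from deepspeech import Model  # pip install deepspeech

    # Placeholder paths: download a released acoustic model (.pbmm) and scorer first
    model = Model("deepspeech-0.9.3-models.pbmm")
    model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

    # DeepSpeech expects 16 kHz, 16-bit, mono PCM audio
    with wave.open("recording.wav", "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    print(model.stt(audio))  # prints the recognized text

In principle, pointing the Model constructor at files trained on another language (for example, a model fine-tuned on Common Voice data) is all it takes to switch languages.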


How a speech application learns (Mozilla Research)

CMUSphinx: CMUSphinx is a widely known toolkit designed to handle speech recognition for languages with small amounts of available data. The speech recognition library is lightweight, and the toolkit includes a highly adjustable speech recognizer.

Pocketsphinx, which is part of the CMUSphinx toolkit, is written in C and offers a lightweight speech recognition engine that works particularly well with handheld and mobile devices. It’s easy to install and set up, but said to have issues with low quality audio.

If you want to use CMUSphinx, you'll find an active community of developers on GitHub, Reddit, and Telegram for support.


CMU Sphinx design architecture (Lamere et al., 2003)

Kaldi: Kaldi is written in C++, licensed under the Apache License v2.0, and has been thoroughly tested in speech recognition research since 2011. It's an open source toolkit that runs on Linux and is designed to be easy to modify for your purposes.

Kaldi benefits developers with generic algorithms and easy-to-reuse code. The tool comes with three free datasets for you to start training your machine learning algorithms on.


Diagram of Kaldi architecture.

Wav2Letter++ (by Facebook): Wav2Letter++ is written exclusively in C++ (as you may have already guessed) and is said to be the fastest open source speech recognition system.

This end-to-end ASR toolkit is very similar to DeepSpeech, so if you're building a model for a low-resource language, this could also be a good option. Wav2Letter++ also comes with Python bindings, so you can take advantage of its libraries from your own Python code.


Wav2letter++ library and architecture (Source: Pratap, V. et al., 2018)

Alizé: Alizé provides the basic operations required for handling configuration files and features, matrix operations, error handling, and so on. Its architecture has several layers: a low-level library for working with Gaussian mixture models, plus high-level functionality for model training, speaker diarization, data manipulation, and more. It even has a Java API for use in Android applications.

You can find support for Alizé on GitHub and in a LinkedIn group maintained by the University of Avignon in France.


Basic architecture of Alizé (Source: University of Avignon).

Popular APIs for speech recognition

For an effective speech recognition API that you can quickly build into your application, check out these APIs.

Google Speech-to-Text: The Google Speech-to-Text API has four different machine learning models, each one pre-trained and named for the purpose it serves: video, phone call, command and search, and default.

The first two transcribe audio from video content and phone calls respectively. "Command and search" is made to capture short clips (like voice search queries) or commands directed at a voice assistant, smart TV, or another IoT device. Finally, "default" is made for offline transcription of longer monologues, like medical transcriptions.

When using Google’s API you benefit from the enormous amount of voice and video data that Google has gathered over the years. This data has helped to ensure a very low error rate and provide the ability to automatically recognize more than 125 languages.

However, the advanced video and phone call APIs have a larger price tag and currently only support versions of English, Mandarin Chinese and Japanese. If you want to customize your API, this might not be your best choice, since Google Speech-to-Text only lets you add specific terms and phrases relevant to your task.
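
For reference, a basic transcription request with the google-cloud-speech Python client looks roughly like this (the audio file name is a placeholder, and the snippet assumes your Google Cloud credentials are already configured in the environment):

    from google.cloud import speech  # pip install google-cloud-speech

    client = speech.SpeechClient()

    # Placeholder file: a short 16 kHz, 16-bit mono WAV clip
    with open("voice_command.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        model="command_and_search",  # or "video", "phone_call", "default"
    )

    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)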

Symbl Conversation API: When speaking of APIs for speech recognition, we can't ignore our very own Symbl Conversation API. You can use our APIs with minimal technical know-how and pair them with any programming language.

Symbl's comprehensive suite of APIs can be integrated directly over telephony, real-time audio streams, and audio recordings – saving you the time of building your own real-time infrastructure. Plus, Symbl improves transcription accuracy by adding contextual understanding and enabling you to surface AI-powered insights – all out of the box.

We also have an active developer community on Slack where you can go to learn more, get support, and let us know what you’d like to see in our future releases.

Other popular APIs to consider for speech recognition include:

Additional reading

You can find more info on the solutions mentioned here and likely stumble upon even more at these links:

The post Best Speech Recognition Building Options for Your Applications appeared first on Symbl.ai.

]]>
Key Metrics for Evaluating Speech Recognition Software https://symbl.ai/developers/blog/key-metrics-and-data-for-evaluating-speech-recognition-software/ Thu, 11 Mar 2021 06:00:27 +0000 https://symbl.ai/?p=12878 Speech recognition software is designed to capture human to human conversations either in real-time or asynchronously. A more accurate record of the conversation requires testing the system and evaluating its efficacy. Anyone who has used AI for transcription can tell you that automatic speech recognition software (ASR) has come a long way in recent years. […]

The post Key Metrics for Evaluating Speech Recognition Software appeared first on Symbl.ai.

]]>
Speech recognition software is designed to capture human-to-human conversations either in real time or asynchronously. Producing an accurate record of those conversations requires testing the system and evaluating its efficacy.

Anyone who has used AI for transcription can tell you that automatic speech recognition (ASR) software has come a long way in recent years.

If you’re developing ASR tech, you know that before you can release it into the wild, you have to make sure that it’s interpreting what’s being said well enough to not frustrate your users (even if the mistakes are hilarious).


Live captioning gone wrong (source)

When evaluating an ASR system, you can compare what the system captures against a conversation that has already been accurately transcribed. This lets you look at the two transcripts side by side and compare their accuracy based on certain metrics.

To help you avoid these frustratingly funny faux-pas, you can use the following metrics to evaluate how effective your ASR is at accurately capturing human to human conversations (H2H), like sales calls or company meetings.

Word error rate

This is the most common metric used to evaluate ASR. Word error rate (WER) tells you how many words were logged incorrectly by the system during the conversation.

The formula for calculating WER is: WER = (S+D+I)/N

  • S = substitutions: When the system captures a word, but it's the wrong word. For example, it could capture "John was hoppy" instead of "John was happy".
  • D = deletions: Words the system leaves out, like "John happy" instead of "John was happy".
  • I = insertions: When the system includes words that weren’t spoken, “John sure was happy”.
  • N = total number of words spoken: How many words are contained in the entire conversation.

WER is a good starting point when evaluating an ASR system because it gives you a base number to work with: the system's overall error rate as a percentage. A WER of 25% or less is considered average.
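
To make the formula concrete, here's a small, self-contained sketch that computes WER from a reference transcript and an ASR hypothesis using word-level edit distance (the example strings are invented):

    def wer(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = minimum edits to turn the first i reference words
        # into the first j hypothesis words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i  # i deletions
        for j in range(len(hyp) + 1):
            dp[0][j] = j  # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                               dp[i][j - 1] + 1,        # insertion
                               dp[i - 1][j - 1] + sub)  # substitution / match
        return dp[len(ref)][len(hyp)] / len(ref)  # (S + D + I) / N

    print(wer("John was happy", "John sure was hoppy"))  # 2 edits / 3 words ≈ 0.67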

The one thing that WER doesn’t tell you is where the mistakes happened. For that, you can use these metrics:

Levenshtein distance

The Levenshtein distance is a string metric that measures the distance between two sequences of characters. It quantifies how different two words are by counting the minimum number of single-character edits (insertions, deletions, or substitutions) needed to turn one into the other.

For example, the Levenshtein distance between "bird" and "bard" is one. The distance from "kitten" to "sitting" is three, because you'd have to make two substitutions and one insertion to get there.


Going from kitten to sitting (source)

Like a lot of the metrics involved in evaluating ASR, Levenshtein distance ties back to WER because it counts insertions, deletions, and substitutions. It provides a more in-depth look at what changes are being made, rather than just how many changes happen as a whole.
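
For completeness, here's a compact sketch of the character-level calculation, using the standard dynamic-programming approach with a rolling row; the test words come from the examples above:

    def levenshtein(a: str, b: str) -> int:
        # prev[j] = edit distance between the already-processed prefix of a and b[:j]
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution / match
            prev = curr
        return prev[-1]

    print(levenshtein("bird", "bard"))       # 1
    print(levenshtein("kitten", "sitting"))  # 3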

Number of word-level insertions, deletions, and mismatches

This metric tells you how accurate your transcription is at the word level. When you compare what the system captures to the original text, you get an output that tells you how many insertions, deletions, and mismatched words occurred.
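
One quick way to get these counts is Python's built-in difflib, which aligns the two word sequences and reports insert, delete, and replace operations. Keep in mind that difflib's alignment is heuristic, so the split can occasionally differ from the minimal edit-distance counts (the example strings are invented):

    import difflib

    def word_level_edits(reference: str, hypothesis: str) -> dict:
        ref, hyp = reference.split(), hypothesis.split()
        counts = {"insertions": 0, "deletions": 0, "mismatches": 0}
        for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes():
            if op == "insert":
                counts["insertions"] += j2 - j1
            elif op == "delete":
                counts["deletions"] += i2 - i1
            elif op == "replace":
                # Aligned words that differ count as mismatches; any size
                # difference spills over into insertions or deletions
                n_ref, n_hyp = i2 - i1, j2 - j1
                counts["mismatches"] += min(n_ref, n_hyp)
                if n_hyp > n_ref:
                    counts["insertions"] += n_hyp - n_ref
                else:
                    counts["deletions"] += n_ref - n_hyp
        return counts

    print(word_level_edits("John was happy", "John sure was hoppy"))
    # {'insertions': 1, 'deletions': 0, 'mismatches': 1}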

Number of phrase-level insertions, deletions, and mismatches

Similar to the last metric, phrase-level insertions, deletions, and mismatches tell you how accurate your system is at capturing what was said. The difference is that this measures the accuracy at the phrase level, meaning whole sentences or paragraphs.


Phrase level insertions (source)

Color highlighted text comparison to visualize the differences

Once you've used ASR to capture the sample text, you can compare it to the original. Color-highlighted text provides a visual representation of the accuracy. It helps you see at a glance whether or not your ASR is understanding what's being said, rather than having to analyze WER figures.

This is useful when you’re fine-tuning your system and want to quickly see how accurate it is.
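
A simple way to produce this kind of view in a terminal is to combine difflib's alignment with ANSI color codes; the color scheme below (green for insertions, red for dropped reference words, yellow for mismatches) is just one possible convention:

    import difflib

    GREEN, RED, YELLOW, RESET = "\033[92m", "\033[91m", "\033[93m", "\033[0m"

    def colorize(reference: str, hypothesis: str) -> str:
        # Inserted words show green, dropped reference words red, mismatches yellow
        ref, hyp = reference.split(), hypothesis.split()
        out = []
        for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes():
            if op == "equal":
                out.extend(hyp[j1:j2])
            elif op == "insert":
                out.extend(GREEN + w + RESET for w in hyp[j1:j2])
            elif op == "delete":
                out.extend(RED + w + RESET for w in ref[i1:i2])
            else:  # replace
                out.extend(YELLOW + w + RESET for w in hyp[j1:j2])
        return " ".join(out)

    print(colorize("John was happy", "John sure was hoppy"))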

General statistics about the original and generated files

This is another high-level check, covering data like character count, word count, line count, and file size. When you compare these numbers for the original file and the one generated by the ASR, they should be nearly identical. Any discrepancy gives you a bird's-eye view of your system's accuracy.
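
These numbers are easy to pull with a few lines of Python; the file names below are placeholders for your reference transcript and the ASR output:

    from pathlib import Path

    def file_stats(path: str) -> dict:
        p = Path(path)
        text = p.read_text(encoding="utf-8")
        return {
            "characters": len(text),
            "words": len(text.split()),
            "lines": text.count("\n") + 1,
            "bytes": p.stat().st_size,
        }

    # Placeholder file names: the human-made reference and the ASR output
    print(file_stats("reference.txt"))
    print(file_stats("asr_output.txt"))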

Creating better ASR with speech recognition APIs

All this data helps you understand how effective your ASR is at processing H2H conversations. You might find that you don't need the system to be perfect, but if you're going to use your ASR to help build a conversational AI system, then the more accurate, the better.

To create a system that includes the maximum amount of customization, you can use a speech recognition API that provides the following features:

  • Real-time Speech Recognition via Streaming and Telephony for unlimited length with less than 300 ms latency
  • Word-level timestamps
  • Punctuation, as well as sentence boundary detection
  • Speaker diarization (speaker separation)
  • Channel separated audio/video files
  • Custom vocabulary to recognize your custom keywords and phrases
  • Sentence-level sentiment analysis included in the output
  • Multiple language support

These features are pretty standard for most speech recognition APIs. Symbl's powerful ASR API offers these standard features, but we take it one step further by allowing our customers to use features like:

  • Key phrase detection to identify key parts of the conversation indicating important information and actions
  • Pre-formatted, ready to render transcripts
  • Enhance the output with speakers using external speaker talking events
  • Indexed with Named Entities and Custom Entities
  • Indexed with topics and insights
  • Support for all the audio and video codecs in asynchronous API

Customizable features like these allow you to better capture conversations as they’re happening. They also allow you to create the exact kind of ASR system that you need.

When your ASR is capable of accurately understanding both what’s being said and the proper context for the conversation, you can really unlock the power of your H2H conversations. One of the fastest ways to do this is by using a conversational API that gives you everything you need to integrate conversational AI into your voice platform.

With Symbl’s suite of flexible APIs, you can quickly add intelligence to your projects and do things like analyze sales calls to identify the most effective tactics, accurately transcribe and summarize e-learning lectures, follow up on important action items by automatically scheduling meetings, and even fact-check speech asynchronously or in real-time. If you want to compare how well your speech-to-text or speech recognition engine is doing against a human-generated transcription, check out our Automatic Speech Recognition (ASR) Evaluation utility on Github.

Additional reading

If you want to learn more, you can check out the following resources.

The post Key Metrics for Evaluating Speech Recognition Software appeared first on Symbl.ai.

]]>
4 Reasons Product & Engineering Teams Integrate API Solutions More Often Than Ever https://symbl.ai/developers/blog/4-reasons-product-engineering-teams-integrate-api-solutions-more-often-than-ever/ Fri, 05 Mar 2021 09:49:14 +0000 https://symbl.ai/?p=13230 Symbl’s Head of Product Anthony Claudia explains the meteoric popularity of APIs, and why they help developers and product managers do more faster. At the beating heart of your favorite applications, you’ll find APIs powering complex actions. Whether they’re sending in-app messages, processing payments, or, like Symbl, formulating insights from voice conversations, APIs allow developers […]

The post 4 Reasons Product & Engineering Teams Integrate API Solutions More Often Than Ever appeared first on Symbl.ai.

]]>
Symbl’s Head of Product Anthony Claudia explains the meteoric popularity of APIs, and why they help developers and product managers do more faster.

At the beating heart of your favorite applications, you’ll find APIs powering complex actions. Whether they’re sending in-app messages, processing payments, or, like Symbl, formulating insights from voice conversations, APIs allow developers to implement cornerstone app features rapidly, with minimal overhead.

APIs are drastically changing the landscape of app creation. Whereas old-school app developers felt compelled to build every feature from scratch — a clunky, time-consuming process — modern developers can take a modular approach to app building by integrating features powered by pre-built APIs.

“API consumption over the last years has gone bananas,” says Anthony Claudia, Symbl’s head of product. Here’s why APIs are a top resource in every savvy developer’s toolkit.

1. APIs are Cost Effective

Most businesses that explore creating custom features from the ground up quickly realize that a from-scratch build is cost prohibitive. Not only must engineers — likely your company’s most valuable resource — spend months perfecting complex app functionality, but also the costs of fixing bugs, releasing updates, and maintaining code add up. 

Some estimate the higher-end total investment to build versus buy a custom feature such as in-app messaging or AI-enhanced transcription may balloon to almost $1 million in just 18 months, depending on engineer salaries, infrastructure costs, and unplanned setbacks. With a third-party API solution, tech companies can save money, better manage budgets, and achieve stable features without excess funds.

2. APIs Prioritize Customizability & Flexibility

Whether it’s your favorite ride-sharing app, food delivery platform, or communication technology, many of the world’s most popular apps are built using third-party API solutions. But the end-user probably doesn’t know this. While turnkey software programs often display their company’s logo, name, and proprietary UI, APIs are less visible. “You can use APIs and they are still your brand, your company,” says Claudia. “If an API offers everything a SaaS product offers you, but your brand stays intact? That’s super compelling. You can build whatever you want and maintain your brand and positioning.”

Additionally, APIs are typically flexible — and they enable developers to tinker with both front-end components and back-end requirements to bring the exact product they envision for their customers to life.

3. APIs Enable Rapid Implementation

Adding complex features to your product can take months, if not years, to achieve. Plug-and-play APIs — which often consist of just a few lines of code — allow developers and product managers to move quickly, minimize tech debt, and build a scalable product that can easily expand to support more users.

With APIs, developers can integrate functionality that would normally have taken months to build in a matter of hours instead.

4. APIs Enjoy an Expanding Customer Base

Increased market focus on APIs has wrought a new customer. “Historically, APIs have been the domain of the developer. It’s a technical product, so you have a technical customer. This is completely shifting,” says Claudia. “The customer has evolved from just a developer to business people and everyone in between.” Product leads and company executives alike understand that leveraging APIs bypasses the headache of time-consuming, expensive, and unstable tech builds.

End-users of products built with APIs, whether they realize it or not, also help to drive API popularity. As more apps aim to remain market competitive, APIs allow product teams to provide the rich experience modern consumers expect in an increasingly digital world.


The post 4 Reasons Product & Engineering Teams Integrate API Solutions More Often Than Ever appeared first on Symbl.ai.

]]>
Enhance Human Conversations with Conversation Intelligence https://symbl.ai/developers/blog/enhance-human-conversations-with-conversation-intelligence/ Wed, 03 Mar 2021 02:33:33 +0000 https://symbl.ai/?p=13121 Conversation intelligence provides the ability to analyze natural human-to-human conversations in real-time. Going beyond simple natural language processing of voice and text conversations, mission-critical communications can be harnessed, analyzed, and optimized. Data flowing through digital communication channels Today, cloud communication products and workflows enable businesses to have conversations over multiple, secure digital channels. Machine learning […]

The post Enhance Human Conversations with Conversation Intelligence appeared first on Symbl.ai.

]]>
Conversation intelligence provides the ability to analyze natural human-to-human conversations in real time. By going beyond simple natural language processing of voice and text conversations, it lets mission-critical communications be harnessed, analyzed, and optimized.

Data flowing through digital communication channels

Today, cloud communication products and workflows enable businesses to have conversations over multiple, secure digital channels. Machine learning systems can apply conversation intelligence: a specialized form of AI that takes the business communication experience to the next level.

You can build conversation intelligence for one-to-one or multi-person conversations contextually. Businesses can build features and experiences into the system, giving them the ability to generate shared knowledge and outcomes and get the most out of their conversations.

Using machine learning to enhance the conversation

When you're in a conversation with another human, AI can assist by analyzing speech patterns in real time. A conversation intelligence system can undertake speaker separation and identification. You can leverage this for several types of customer conversations where it's important to optimize engagement and amplify the interaction. For example, you can:

  • Use models for emotional analysis – the conversation intelligence can recognize speakers’ current mood and any mood changes. In a call centre this helps agents avoid making a bad situation worse and lets them wrap up calls quicker and to the callers’ satisfaction.
  • Take care of tasks that humans don’t need to be involved in – like scheduling meetings, or sending task reminders.
  • Translate in real-time

When conversation intelligence is used continually, the AI keeps learning and gets smarter. This means the system you build will get more and more useful. For example, you can use previously collected data, such as a backlog of customer problems (including conversations and solutions from your voice calls), to train your AI to answer FAQs right there in the phone queue or divert certain topics to specialist operatives.

You can also build new conversation intelligence systems for your call recordings if they are stored in your stack. These can be used to audit the calls for specific purposes, such as finding keyword phrases, redacting sensitive data, or identifying coaching opportunities.

You can use off-the-shelf conversation AI APIs or open-source models to build this system on both voice and text data asynchronously. Symbl offers Async APIs for voice, video, and text that can be used to aggregate insights and analyze several aspects of a conversation in offline mode:

  • Metadata like speakers, contact information, and the title of the conversation.
  • All members, transcripts, and messages in the conversation, as well as the topics discussed.
  • Any questions or requests for information that went unanswered in the call, with the speakers identified.
  • Appointments or follow-ups.

You can read more about applying machine learning to VoIP systems.

Intelligence for human-to-human conversations

Human to human conversation (H2H), whether two-party or multi-party interaction, becomes highly unstructured and fragmented in almost any context (business, social, etc.). In contrast to the understanding of spoken utterances directed at machines (human to machine (H2M) conversation), the H2H conversation relies heavily on a sophisticated pre-built conceptual understanding of the world together with an innate language instinct. The combination of these allows for unstructured conversation flow and context disambiguation. As a result, in contrast to H2M conversations, H2H conversations are very rich and unstructured, filled with ideas and multiple concepts interlinked together.

Naturally, this means that the same intelligence that’s built for understanding narrow commands or wake words (e.g. “Hey Siri”, “Alexa”, and “OK Google”) is neither compatible nor sufficient to make sense out of the conversations where human beings are talking to each other. Symbl.ai is built to address that need – to provide intelligence specifically for H2H conversations.

Why talking with a human will never go away

H2H and H2M are the two main branches of conversation AI. Smart devices with digital voice assistants, such as Alexa and Siri, or applications built with platforms like Dialogflow, can process and recognize what humans are saying to them, but they are usually built to support narrow use cases that are transactional in nature and are not able to understand and capture knowledge when more than two humans are involved. While there is currently an enormous amount of investment in the research necessary to make machines capable of understanding human language, most of the use cases and applications on the market today are limited to a basic statistical mapping of utterances to the right intents and responses.

The H2H branch of CI is vital for unstructured human conversations and can allow your application to:

  • Gain a general understanding of the world and be able to relate to it.
  • Build abstract knowledge over time.
  • Reason with its understanding and learning within the context of the current dialog.

For example, when booking a flight with an agent, your application could derive insights related to where you want to go and recommend a plan for your itinerary. This would be a completely different conversation with a machine, as it would focus only on the set of questions, intents, and actions it was trained for. It would not be capable of engaging in a real conversation with you to understand your individual needs and dynamically propose the best solution.

Conversation intelligence is the new superpower

Conversation is a vital part of human interaction. Symbl has learned that on average each person communicates around 10,000,000 words per year. That’s a lot of information!

It goes without saying that your business clients have mission-critical communications, and so the ability to harness, analyze, and optimize their power is crucial. Symbl.ai is a developer platform for H2H conversation intelligence and allows you to build your experience without any need to build or train a machine learning model.

With contextual AI for H2H conversations you can build a cutting-edge (and very cool) intelligence that actually hears what you're saying, communicates, and allows you to identify and recommend solutions in a more natural, intuitive, and sophisticated way – with intelligence that is passively embedded in all communication workflows. Using the previous example of booking a flight, with contextual AI you can explain a practical problem with changing your flight, including your reasons, and the system will understand the context and provide a solution.

Conversation intelligence products

Symbl.ai is a platform that can be used to develop and/or enhance any vertical or use-case-specific application. However, there are other technologies that have been developed to address the need for conversation intelligence in very specific use cases. Here are a few examples:

  • Chorus.ai and Gong.io are conversation AI tools providing insights for sales and service organizations. They can record, transcribe, and analyze meetings and calls.
  • CallMiner and Uniphore provide speech analytics and interaction analytics CI software for use in call centres.
  • Tethr has been developed to ensure compliance by providing conversation AI insights into calls; it also helps manage risk, provide an audit trail, and redact sensitive information.
  • Talview and Hirevue use conversation AI software in recruitment interviews conducted remotely by video and phone.

Conversation intelligence APIs

A conversation intelligence API is generally defined as a web service that can be used by developers and which is capable of contextually analyzing natural conversations between two or more humans. This service can understand the meaning of the conversation instead of requiring keywords or wake words as digital voice assistants and chatbots do.

Conversation intelligence can augment human capability by surfacing many things that matter in real time, including questions, action items, insights, contextual topics, and signals. A conversation intelligence API provides an extensible interface that developers can use to consume the contextual AI capabilities in their preferred communication interface.

Symbl is a conversation intelligence platform that provides real-time and contextual AI capabilities, with a secure and scalable infrastructure, and programmable APIs and SDKs for developers. This enables businesses to get their products to market faster without investing resources in building their own real-time ML infrastructure. They can focus instead on building differentiated pre-, in-, and post-conversation experiences.

Learn how to build a conversation intelligence system in our next post in this series.

Further reading:

The post Enhance Human Conversations with Conversation Intelligence appeared first on Symbl.ai.

]]>