Unlocking the Power of Customer Intelligence: From Data to Product

Remember that moment when you received a personalized recommendation from an online store, and you thought, ‘Wow, they understand what I like!’ It’s all thanks to the power of customer intelligence.

Customer intelligence is the practice of collecting and analyzing detailed customer data to understand how best to interact with each customer. By harnessing this valuable asset, data, companies gain insights into customers’ behaviors, preferences, needs, and feedback. Gone are the days when this intelligence sat untouched in a database or an Excel sheet; in today’s world, companies treat data as a product.

This blog will explore the importance of treating customer intelligence data as a product, the pitfalls and best practices to be aware of, and how to get started on this data-driven journey in your organization.

Examples of data as products

Now let’s look at some examples of how customer intelligence data can be packaged as a product:

  1. Dashboards to understand customer behavior: A dashboard that captures customers’ historical transactions, along with feedback signals such as click-through rates or revenue generated. It can be distributed across the organization and used by business stakeholders.
  2. AI products fueled by data: Recommendation engines that provide personalized recommendations based on each customer’s preferences are another example of data as a product.
  3. Churn prediction models: Data products that include churn prediction models help businesses proactively identify customers likely to churn. Using these insights, companies can implement retention strategies to reduce customer churn (a minimal sketch follows this list).
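
As a rough illustration of that third example, here is a minimal churn-model sketch in Python with scikit-learn. The file name `customers.csv` and the columns `monthly_spend`, `support_tickets`, `tenure_months`, and `churned` are hypothetical placeholders rather than a reference to any real dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical customer intelligence extract; column names are placeholders.
df = pd.read_csv("customers.csv")
X = df[["monthly_spend", "support_tickets", "tenure_months"]]
y = df["churned"]  # 1 = customer churned, 0 = customer retained

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Probability of churn for each held-out customer, ready to feed a retention workflow.
churn_risk = model.predict_proba(X_test)[:, 1]
print(churn_risk[:5])
```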

Benefits of treating customer intelligence data as a product

It should be evident from the introduction that companies need to care about this data to stand out in a competitive landscape. Here are the reasons why you should treat it like a product:

  1. Data-Driven Decision Making: Packaging your customer intelligence as a product makes it easily accessible to everyone in the organization and empowers both technical and business teams to use it to drive decisions. It is also a step toward fostering a data-driven culture in your organization.
  2. Improved customer experience: Treating data as a product can enable and fast-track AI solutions, hence providing a personalized experience to your customers.
  3. Enhanced Data Quality: When data is treated as a product, there is a greater emphasis on data quality, accuracy, and governance. This ensures that data is maintained, curated, and updated, leading to more reliable insights and decisions.
  4. Increased innovation and literacy: Data as a product can improve cross-collaboration across organizational stakeholders, allowing them to innovate and increase their knowledge of customer data.

How to get started

Embarking on this data-driven journey might seem overwhelming, but here are a few simple ways to begin:

  1. Assess Data Assets: Begin by assessing the data assets within the organization. Look for data that can provide insights, drive decision-making, or support innovation in different areas.
  2. Identify potential use cases: Once you have assessed the data, determine how the data can be leveraged to address specific business challenges or enhance existing products and services.
  3. Customer-Centric Approach: Adopt a customer-centric approach when developing data products. Understand the needs and preferences of potential data customers and tailor offerings to meet their specific requirements.
  4. Hire the right talent: To harness the potential of data, it’s essential to have the right talent. This could include data strategists, analysts, data scientists, and data governance managers.

Pitfalls and Best Practices

Treating data as a product can have ethical implications, so it is essential to be aware of them. Here are a few things to be careful about, along with how to mitigate them:

  1. Privacy: Data as a product requires companies to handle data with utmost care. Data products may contain PII, and organizations must ensure compliance with data privacy regulations, obtain informed consent, and protect individuals’ privacy rights. Implementing robust data anonymization and encryption techniques is crucial to safeguard user privacy throughout the data lifecycle.
  2. Security: Data products can be valuable assets that attract potential threats. Organizations must prioritize data security by implementing robust security measures, including access controls, encryption, and regular security audits. Protecting data from unauthorized access, breaches, and cyber attacks is essential to maintain trust and prevent harm to individuals.
  3. Bias and Fairness: Data products can inherit biases present in the underlying data. Biased data can lead to biased algorithms and discriminatory outcomes, perpetuating societal biases and inequalities. Organizations must address bias and ensure fairness in data products by carefully curating diverse and representative datasets.
  4. Informed Consent and Transparency: Organizations should ensure individuals are well-informed about how their data is collected, used, and shared as part of data products. Obtaining informed consent and providing transparent information about data practices is crucial. This includes communicating the purpose of data collection, individuals’ rights over their data, and the measures taken to protect their privacy and security. Customers tend to engage more with brands they trust.
  5. Data Quality Checks: Organizations should have appropriate frameworks to monitor data quality continuously. This can help flag data drift and bias so that organizations can mitigate them as quickly as possible. Low-quality data can significantly impact your data and AI products.

Customer intelligence is not just a buzzword but the bedrock of successful businesses in the digital age. By leveraging customer intelligence as a product, organizations can unlock endless possibilities for growth, innovation, and customer satisfaction. Let us embark on this transformative journey together, where data becomes the key to unlocking a world of opportunities and ensuring a brighter future for businesses and their valued customers.

A Data Scientist’s Guide to Prompt Engineering

The launch of ChatGPT has sparked significant interest in generative AI, and people are becoming more familiar with the ins and outs of large language models.

Many individuals, including myself, have become fascinated with understanding the inner workings of this powerful tool. It’s worth noting that prompt engineering plays a critical role in the success of such models. By carefully crafting effective “prompts,” data scientists can ensure that the model receives high-quality instructions that accurately reflect the underlying task.

Prompts are sets of instructions given to the model to elicit a particular output. Some examples of prompts include:

1. Act as a Data Scientist and explain Prompt Engineering.

2. Act as my nutritionist and give me vegan breakfast recipe suggestions.

3. Imagine you are a songwriter and create a song based on the lives of data scientists.

Prompt engineering is a Natural Language Processing (NLP) technique for creating and fine-tuning prompts to get accurate responses from the model. It is imperative to spend time fine-tuning the prompts for your use case; if not, the model can produce hallucinations.

4 benefits of prompt engineering

  • Improved accuracy: Prompt engineering can improve the accuracy of AI models by providing them with more specific and relevant information. This can lead to more effective outcomes and better decision-making.
  • Increased efficiency: Prompt engineering can increase AI models’ efficiency by reducing the time and effort required to train them.
  • Customizability: It can improve the flexibility of AI models by allowing them to be used for a broader range of tasks.
  • Increased user satisfaction: Prompt engineering can increase user satisfaction with AI models by making them more user-friendly and intuitive.

Industry-specific use cases of prompt engineering

Healthcare: Prompt engineering is being used to improve the accuracy of medical diagnosis, develop new treatments, and personalize healthcare. For example, prompt engineering can summarize a patient’s medical history and symptoms, identify potential drug interactions, and create personalized treatment plans.

Finance: In finance, prompt engineering is used to create intelligent assistants that can provide personalized investment advice or help customers with financial planning. These assistants can provide more effective and relevant guidance by customizing prompts based on a customer’s financial goals and risk tolerance.

Education: Prompt engineering personalizes learning, provides feedback on assignments, and creates engaging learning experiences. For example, prompt engineering can generate a personalized learning plan for each student, provide feedback on essays and code, and create interactive stories and games.

Tips and tricks to generate effective prompts

1. Understand the use case in detail to add “keywords” while generating prompts.

Instead of the prompt “Write a blog on the life of a data scientist”, you can instead phrase it as, “Act like a data scientist and write a blog on the life of a data scientist covering various aspects of their life such as the career journey, what to expect in the job and challenges.”

2. Test and refine the prompts based on the output.

If the desired output is not received in the first iteration, try adding more keywords or details to modify the response.

3. Reduce information that doesn’t serve any purpose in the prompt.

For example, instead of saying “Write a short, precise, and exciting description of XYZ product,” use “Use 3-5 sentences to write the product description.”
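
To make this test-and-refine loop concrete, here is a minimal sketch assuming the OpenAI Python SDK (version 1.x) with an `OPENAI_API_KEY` environment variable set; the model name and the prompts are illustrative only, not a prescribed setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text response."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# First iteration: a vague prompt.
draft = ask("Write a blog on the life of a data scientist")

# Refined iteration: add a role, keywords, and a length constraint (tips 1 and 3).
refined = ask(
    "Act like a data scientist and write a blog on the life of a data scientist, "
    "covering the career journey, what to expect in the job, and common challenges. "
    "Use 3-5 short paragraphs."
)

print(refined)
```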

Limitations and challenges

Bias in prompt generation: Prompt engineering can introduce bias into the training data if the prompts are not carefully designed. This can lead to models that are biased against certain groups of people or that produce inaccurate or misleading results.

Difficulty in generating effective prompts: Generating effective prompts can be challenging, requiring domain expertise and a deep understanding of the underlying task. Poorly designed prompts can lead to inaccurate or irrelevant model output.

Limited flexibility: Prompt engineering can limit the flexibility of language models, as they are trained to produce output based on a specific prompt. This can make adapting the model to new tasks or scenarios challenging.

Time-consuming: Prompt engineering can be time-consuming, requiring significant manual effort to design and test prompts. This can make scaling prompt engineering to larger datasets or more complex tasks challenging.

Lack of generalization: Language models trained using prompts may not generalize well to new and unseen data. This can limit the usefulness of the model in real-world applications.

Data privacy concerns: Prompt engineering requires access to large amounts of training data, which may contain sensitive information about individuals. This raises concerns about data privacy and the potential misuse of personal information.

Ethical considerations: Prompt engineering raises ethical concerns around the responsible use of language models, particularly in healthcare, finance, and education. There is a need for guidelines and regulations to ensure that language models are used responsibly and ethically.

By carefully crafting effective prompts, data scientists can ensure that the model is trained on high-quality data that accurately reflects the underlying task. This, in turn, can lead to more accurate and relevant model outputs, resulting in improved performance on various tasks. As such, prompt engineering has emerged as a crucial area of research in natural language processing and machine learning.

How AI can Effectively Automate Loan Application Approvals

AI can not only automate the credit decision process but also make it more efficient and accurate. Machine learning algorithms can be trained on historical loan data to predict the likelihood of a loan being approved or denied; this can help underwriters make more informed decisions and cut down on the significant time and resources necessitated by a manual review process.

Additionally, AI can identify patterns and trends in the data that human underwriters would otherwise take a significant amount of time to uncover. This can help improve the accuracy of the credit-decisioning process. However, it’s important to note that AI is only as good as the data that it’s trained on, so it’s crucial to have a clean and diverse dataset in order to achieve accurate results.

This blog will cover the three topics listed below:

1. Synthetic datasets for loan approval applications

2. Deep learning and its components (such as the neural network)

3. How to use LIME to interpret such a model and determine the features leading to the prediction

Deep Learning and its Components

For the purpose of this blog, I have used a synthetic dataset found online to demonstrate a deep learning use case.

The dataset contains the following columns:

  • Application ID: The unique identifier of an application
  • Gender: Gender of the applicant
  • Married: Marital status of the applicant
  • Dependents: Whether the applicant has any dependents
  • Education: Whether the applicant is a graduate
  • Self Employed: Whether the applicant is self-employed
  • Credit History: Whether the applicant has any previous credit history
  • Property Area: Whether the property in question is urban, semi-urban, or rural
  • Income: Whether the applicant’s income is low, medium, or high
  • Application Status: The target variable, signifying whether the application was approved

For the sake of simplicity, I have not performed fairness and bias testing on the gender attribute, even though it is provided in the data. However, it is generally recommended to examine the distribution of attributes such as gender to check for bias. Bias can creep into the source data unnoticed and can have serious legal implications.
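
Below is a minimal preprocessing sketch with pandas for a dataset shaped like the one described above. The file name `loan_applications.csv` and the label value `Approved` are assumptions about how the synthetic data is stored, not details from the original dataset.

```python
import pandas as pd

# Hypothetical file name; the columns mirror the synthetic dataset described above.
df = pd.read_csv("loan_applications.csv")

# The identifier carries no predictive signal, so drop it.
df = df.drop(columns=["Application ID"])

# Map the target to 0/1 (assuming the approved label is the string "Approved")
# and one-hot encode the categorical inputs.
y = (df["Application Status"] == "Approved").astype(int)
X = pd.get_dummies(df.drop(columns=["Application Status"]), drop_first=True).astype("float32")

print(X.shape, y.value_counts())
```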

What is Deep Learning?

Deep learning is a subfield of machine learning that seeks to replicate the pattern recognition abilities of the human brain by training artificial neural networks on large datasets. Deep learning aims to build more accurate predictive models and improve the performance of machine learning algorithms. Deep learning models are trained using neural network architectures and labeled datasets.

What is a Neural Network?

Neural networks are algorithms that attempt to simulate the way the human brain processes and recognizes patterns in data. They are composed of interconnected units called neurons, which process and transmit information.

The diagram below depicts the architecture of a simple neural network. Its main components are:

  • Input Layer: This layer stores and processes the input data. A neuron is the basic unit of a neural network (which are the dark blue circles on the left).
  • Activation Function: The activation function decides whether the neuron’s input is essential in the prediction process. There are multiple kinds of activation functions, such as sigmoid or tanh.
  • Weights: They control the strength of the connection between two neurons. In other words, a weight decides how much influence the input will have on the output.

I will further break down the concept of “weight” with the following example. Say that you are trying to make coffee. A standard coffee beverage requires just three ingredients (coffee, milk, and sugar). These ingredients can be referred to as the neurons, because they are the starting point of the process. The amount of each ingredient represents the weight. Once all the ingredients are mixed, they transform into another state. This transformation process is called “activation.”

[Diagram of a simple neural network. Credit: Investopedia.com]
  • Hidden Layer: The hidden layer takes all the inputs from the input layer and performs all the calculations needed to generate the output. This is also called “hidden,” as the operations are hidden from the user.
  • Output Layer: The calculations performed in the hidden layer are then sent to the output layer, where the user can view the results of the computations.

What is Classification?

Classification is a supervised machine learning task in which the goal is to predict class labels based on input data. There are various types of classification, including binary classification, which involves predicting one of two class labels (such as “approved” or “not approved”). In this case, we will use binary classification to solve the task.

You can click through here to check out a live example of how to solve this using the TensorFlow classification method.

Before you proceed, ensure TensorFlow is installed on your system. Here is a link that can assist you with that.
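
Here is a minimal sketch of such a binary classifier in TensorFlow/Keras, continuing from the `X` and `y` prepared in the preprocessing sketch above; the layer sizes and training settings are illustrative choices, not the exact configuration used in the original example.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A small feed-forward network: one hidden layer plus a sigmoid output
# that emits the probability of loan approval.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1, verbose=0)

loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.2f}")
```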


Explainable AI (XAI): Building Interpretable Models

Explainable AI (XAI) has been growing in popularity over the past few years as the adoption of AI has increased among companies. One of the biggest challenges that businesses face is explaining the complexities behind an AI model to a non-technical audience, for whom AI remains something of a black box. For this reason, XAI is also called the “glass-box approach”: it aims to cut through the confusion and provide users with techniques that can be leveraged to break down these complex models.

This blog serves as an end-to-end guide regarding the importance of XAI, some practical techniques to implement XAI, and common challenges faced by companies and individuals throughout the process of implementing XAI. 

Overview of XAI

XAI is a framework that can be integrated with an existing ML model to understand the output of AI or ML algorithms. This framework is not only used to explain the results behind an ML algorithm but also to gather feedback on those results and re-train the model accordingly. Feedback can come as suggestions from business stakeholders, for example to flag bias, after which one can tweak the hyperparameters and re-train the model.

XAI harks back to the transparent, rule-based approaches of early AI and stands in stark contrast with the concept of the “black box” in machine learning. It attempts to address the transparency issues in the decision-making rationale of the system.

The importance of XAI

There are numerous benefits that arise out of implementing XAI; the top six are listed below:

Reduces errors: The more visibility one has over the ML framework, the less space there is for errors that have the potential to cost a company thousands of dollars. Errors are not limited to choosing the wrong algorithm; they can also look like a billion-dollar lawsuit!

Curbs model bias: XAI allows you to capture bias before your model is deployed and act on it. Bias can be found in all shapes and forms; your model might be biased toward one sex, company, nationality, etc. It is essential to address bias so that no part of the target population is excluded from your model and its predictions.

Confidence and compliance: Seeking compliance approvals is an essential step in highly regulated industries. XAI frameworks allow you to explain AI models thoroughly and get the required approvals.

Model performance: One of the benefits of XAI is that it gives you the ability to gather feedback from stakeholders and re-train your model. This allows you to attain optimal model performance.

Informed decision-making and transparency: One of the goals of XAI is to make AI more collaborative. This means all AI-related decisions have the potential to include everyone.

Increased brand value: Statistically, customers are more likely to invest in your brand if they find it trustworthy. Using explainable AI frameworks strengthens your users’ trust and makes them more comfortable handing over their data. 

Practical techniques to implement XAI

1. LIME (Local Interpretable Model-Agnostic Explanations)

LIME is a model-agnostic technique, meaning it can be applied to any model. Its goal is to provide a local approximation of the model’s behavior around a single prediction. The local approximation is found by training an interpretable surrogate model on small perturbations of the original instance.

How to implement LIME:

1. Choose the instance for which you want to have an explanation of its black box prediction.

2. Perturb your dataset to create fake data points and generate the black-box predictions for them.

3. Weight the new samples according to their proximity to the instance of interest.

4. Train a weighted, interpretable model (such as linear/logistic regression and decision trees) on the dataset with the variations.

5. Explain the prediction by interpreting the local model.

The output of LIME is a list of features alongside their respective contribution to the model’s prediction; this gives you a clear understanding of each contributing feature.
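
As a minimal sketch, the steps above map onto the `lime` package roughly as follows; `model`, `X_train`, `X_test`, and `feature_names` are stand-ins for whatever fitted classifier and tabular data you want to explain.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

# model: any fitted classifier exposing predict_proba; X_train / X_test: feature arrays;
# feature_names: list of column names. All four are assumed to already exist.
explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(feature_names),
    class_names=["not approved", "approved"],
    mode="classification",
)

# Steps 1-5: pick an instance, perturb around it, weight the samples by proximity,
# fit a local interpretable model, and read off each feature's contribution.
instance = np.asarray(X_test)[0]
explanation = explainer.explain_instance(instance, model.predict_proba, num_features=5)
print(explanation.as_list())
```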

2. Fairness and bias testing

Bias often creeps into data, so it is essential to check the fairness and bias within the data first. Bias can come in all shapes and forms, but having biased data will lead to the exclusion of certain populations and thus an incomplete model. 

Below are a few simple ways you can check for bias in your data (a pandas sketch follows the list):

  • A considerable number of missing values could indicate that an entire population is missing, which means the predictions might favor one population exclusively.
  • Outliers could be another indication of bias.
  • Data skewness is another critical factor to analyze, because it is often an indicator that one population is favored over another.
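
Here is a minimal pandas sketch of those three checks; `df` is a hypothetical applicant DataFrame, and the `income` and `gender` columns are placeholders.

```python
import pandas as pd

# df: a hypothetical applicant DataFrame; "income" and "gender" are placeholder columns.

# 1. Missing values: a heavily missing attribute may mean a whole group is underrepresented.
print(df.isna().mean().sort_values(ascending=False))

# 2. Outliers: flag rows far outside the interquartile range of a numeric column.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} income outliers")

# 3. Skewness: strong skew in a feature, or imbalance across groups, can favor one population.
print(df["income"].skew())
print(df["gender"].value_counts(normalize=True))
```
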
3. SHAP (Shapley Additive Explanations)

Shapley values are a common technique used to assess feature importance. The output is a plot showing each feature’s importance. For example, if you were trying to predict how many people will test positive for the coronavirus each year, you could assess the influence of removing or adding a feature on the overall predictions.

Let’s say the average prediction is 50,000 people. How much has each feature value contributed to the prediction compared to the average prediction?

Comparing it to Game Theory, the “game” is the prediction task for the given dataset. The “players” are the feature values of the instance that collaborate to predict a value. 

In our coronavirus prediction example, features such as has_vaccinated, access_to_masks, and underlying_conditions came together to achieve the prediction of 52,000. Our goal is to explain the difference between the actual prediction (52,000) and the average prediction (50,000)—a difference of 2,000. 

A possible explanation could be that has_vaccinated contributed 500, access_to_masks contributed 500, and underlying_conditions contributed 1,000. The contributions add up to 2,000, the final prediction minus the average predicted coronavirus cases.

The Shapley value for each feature is the weight such that the sum of all Shapley values equals the difference between the model’s prediction and its average prediction.
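
A minimal sketch with the `shap` package; `model` is assumed to be a fitted tree-based classifier (for example, a random forest) and `X` the feature DataFrame it was trained on, neither of which comes from the example above.

```python
import shap

# model: a fitted tree-based classifier; X: its feature DataFrame (both assumed to exist).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Summary plot of each feature's contribution, analogous to attributing the 2,000-case
# gap to has_vaccinated, access_to_masks, and underlying_conditions in the example above.
shap.summary_plot(shap_values, X)
```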

Challenges of XAI

Limited model availability: Though a few model-agnostic approaches, such as LIME, are available, frameworks for complex ML methods remain limited.

Accuracy vs. interpretability: XAI makes an interpretable model tempting, but one may have to compromise on the model’s accuracy. This is a tough choice that any data scientist must make when opting for interpretable models.

XAI has proven itself to be a glimmer of hope for companies struggling to implement AI solutions as a result of pushback from compliance enforcement and other authorities. It is being adopted by doctors to predict mortality rates and the suitability of treatment options, as well as by insurers to predict fraudulent claims.

All in all, XAI represents a way for AI/ML models to solve impactful problems for the greater good. It will be interesting to observe the adoption curve for these frameworks in the coming years and across industries.

Unboxing the Concept of Drift in Machine Learning

Machine Learning Drift is a common phenomenon that occurs once the machine learning algorithm is deployed to production. It can adversely affect the overall performance of your machine learning model if not monitored closely and mitigated at the right time.

This article will provide an overview of machine learning drift and various types of drift, as well as cover some practical techniques to eliminate drift.

What is Machine Learning Drift?

Machine learning and AI models are built on the assumption that historical data projects an accurate representation of the future. But in a  fast-changing world, this is rarely the case. The COVID-19 pandemic and the Russia-Ukraine war are two examples of unprecedented events that impacted model predictions.

Drift is a phenomenon where a model degrades over time in terms of performance; one observes a sudden decrease in the model performance compared to the training performance.

Types of Model Drift

The two main types of Model Drift are as follows:

  1. Concept Drift: Concept drift occurs when the input data hasn’t changed, but user behavior has, leading to a change in the relationships between the input and target variables. One example is when the COVID-19 pandemic changed buyer behavior: people started purchasing more hand sanitizer and masks and spent less on travel. Any consumer-focused model trained pre-pandemic wouldn’t have been able to predict this behavior, so model accuracy decreased.
  2. Data Drift: Data Drift occurs when the properties of input or output data have changed.

Data drift can further be divided into two types:

Label Drift: This occurs when the output data shifts. For example, suppose you built a model to predict whether an applicant should receive a credit card, and the proportion of credit-worthy applications suddenly increases.

Feature Drift: This occurs when the input data shifts. For the same model described above, if one of the input variables is income and the incomes of most applicants increase or decrease, the feature distribution has drifted.

Causes of Drift

  • Changes in user behavior: Commonly, user behavior will evolve, leading to changes within the input data. This will eventually show up in the model performance.
  • Biased data: Data drift can occur as a result of bias in the input data. By “bias” I mean that your training data might favor one population over another. This can cause the model to be biased as well, leading to inaccurate model predictions.
  • Training data is not an accurate representation: It is possible that the training or input data that was used to train the model is not an accurate representation of the actual data, which can lead to Data Drift as well. As an example, you might have used consumer data from the USA to train your model but launched the product in India; because users have unique patterns, you will observe model deterioration.

Detecting Drift

The obvious way to detect drift is to monitor the model’s predictive accuracy. However, in some cases it might not be straightforward to calculate this accuracy. There are alternative methods you can use in such cases; two are described below (a sketch follows the list):

  1. Kolmogorov-Smirnov (K-S) test: The K-S test compares the distributions of the training data and the post-training (production) data. The null hypothesis states that the two distributions are identical; if it is rejected, we can conclude that the model has drifted.
  2. Population Stability Index (PSI): PSI is another metric that can detect population changes over time. PSI < 0.1 means no significant population change, whereas PSI ≥ 0.2 means a significant population change.
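
Here is a minimal sketch of both checks on a single numeric feature; the synthetic `train_values` and `live_values` arrays stand in for the training and post-deployment distributions, and the PSI function is one common formulation rather than a standard library call.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a new sample."""
    cuts = np.unique(np.percentile(expected, np.linspace(0, 100, bins + 1)))
    expected_pct = np.histogram(np.clip(expected, cuts[0], cuts[-1]), bins=cuts)[0] / len(expected)
    actual_pct = np.histogram(np.clip(actual, cuts[0], cuts[-1]), bins=cuts)[0] / len(actual)
    # Guard against log(0) for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Placeholder arrays for one feature before and after deployment.
train_values = np.random.normal(50, 10, 5_000)
live_values = np.random.normal(55, 12, 5_000)

stat, p_value = ks_2samp(train_values, live_values)
print(f"K-S p-value: {p_value:.4f}  (reject identical distributions if below 0.05)")
print(f"PSI: {psi(train_values, live_values):.3f}  (>= 0.2 suggests significant drift)")
```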

Dealing with Drift

Drift is an inevitable phenomenon, so it is better to be prepared and deploy the following mechanisms that can detect it well in advance, which will give you enough time to mitigate it.

  • Monitor the model: The model’s performance is bound to change over time. This doesn’t necessarily mean that the relationship between the input variables and the output has changed; it may just mean that the model was not trained on this particular segment of data, so it doesn’t know how to act on it. Hence, monitoring the model is necessary. Companies can develop their own frameworks to do this, or integrate tools such as AWS SageMaker and Deepchecks that exist in the marketplace today.
  • Training and test data should be consistent: Training and test data should be in sync. Check that both cover the same time period and similar locations.
  • Retraining and redeployment: A scenario could exist where the only option is to retrain the model. It is imperative to be prepared for such a scenario. At this point, it might make more sense to analyze the feature importance and add/delete a few that are the leading cause of drift.
  • Data monitoring: Sudden changes in the data are one of the causes of data drift. It is vital to have data quality mechanisms in place that could flag issues with the data. This will also help you to backtrack the data drift issue and assist in faster capture and hence remediation.
  • Unboxing the black box: The concepts of explainable AI and responsible AI are gaining popularity because they allow you to understand the model output; having such frameworks in place ensures that, in the case of a shift in the machine learning model’s performance, you can get to the root of the issue quickly. There are open-source frameworks available to leverage, such as AI Explainability 360 (AIX360) by IBM and the What-If Tool by Google, to name a few. There are also popular techniques such as LIME (Local Interpretable Model-Agnostic Explanations).
  • Data Quality Checks: It’s crucial to have data quality checks in place. Sometimes drift is caused by deteriorating data quality, and bias in the data can cause model performance to decay over time.
  • Developing statistical metrics: Model performance metrics can be used to track the performance of supervised learning models. Statistical metrics such as AUC and ROC can be put in place.

Drift can seem to be a challenging problem to solve. However, with the proper mechanisms in place it can be curbed and dealt with as it occurs.

Connect with Supreet Kaur on LinkedIn.
