Veriff’s Senior VP of Product, Suvrat Joshi, on how human intervention can drive better AI

Having led product development at Amazon, Dropbox, Meta, Microsoft, and Yahoo, Suvrat Joshi has firsthand experience of the impact of bias across online advertising, payments, and commerce. He explains the crucial role of human intervention in minimizing bias in our AI-driven models.

Chris Hooper
Director of Content at Veriff.com
November 23, 2023

To listen to the full conversation with Suvrat and explore more Veriff Voices podcast episodes, click here.

While Veriff is very much a tech company, a diverse team of human experts is central to our mission to make the internet safer for everyone. From initial research, modelling and development to improving and refining the live product, what differentiates our approach is the use of human intervention to continually improve our technology.

During a recent conversation for our Veriff Voices podcast series, our Senior Vice President of Product, Suvrat Joshi, explained how Veriff uses a process known as reinforcement learning from human feedback to constantly improve the accuracy of our artificial intelligence-based systems.

What is reinforcement learning from human feedback?

Reinforcement learning from human feedback (or RLHF for short) uses human intelligence to review outputs from a machine-learning model to identify and address issues.

“There's a data set. It's labelled, it's got attributes associated with it, and then you run model training, and the model learns. Then you create a model, which is then run in production,” says Suvrat. “The place where humans come in is the labelling and attribution.”

As well as preparing the input data set to train the model, human experts can review and annotate the output data set: “Essentially you inject humans into the loop in order to build a better model,” says Suvrat. 

“Humans can add to that labelled data set to enrich it, augment it, or at times even correct it. And that really is helpful because that serves as a fresh input to the model, so the model becomes that much smarter, and the output is that much better.” 

This process of human intervention to review and feed improved data back into the model is repeated in a cycle, hence the term reinforcement learning from human feedback.
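
To make that cycle more concrete, here is a minimal, purely illustrative sketch in Python of the loop Suvrat describes. It is not Veriff's implementation: the toy scoring and training functions, the 0.25–0.75 review band, and names such as human_review are assumptions made for the example.

```python
# Purely illustrative sketch of one human-feedback pass, as described above.
# The model, thresholds, and human_review function are hypothetical stand-ins.
from dataclasses import dataclass


@dataclass
class Example:
    features: list[float]
    label: int | None = None  # None until a label has been assigned


def train(dataset: list[Example]) -> list[float]:
    """Toy 'training': average the feature vectors of positively labelled examples."""
    positives = [e for e in dataset if e.label == 1] or dataset
    dim = len(positives[0].features)
    return [sum(e.features[i] for e in positives) / len(positives) for i in range(dim)]


def score(weights: list[float], features: list[float]) -> float:
    """Toy scoring function standing in for the production model."""
    return sum(w * f for w, f in zip(weights, features))


def human_review(example: Example, s: float) -> int:
    """Placeholder for a human expert confirming or correcting the model's output."""
    return int(input(f"score={s:.2f}, features={example.features} -> correct label (0/1): "))


labelled = [Example([1.0, 0.2], label=1), Example([0.1, 0.9], label=0)]
model = train(labelled)                        # initial model from the labelled data set

for ex in [Example([0.8, 0.3]), Example([0.2, 0.7])]:
    s = score(model, ex.features)
    if 0.25 < s < 0.75:                        # uncertain output -> route to a human
        ex.label = human_review(ex, s)
    else:
        ex.label = int(s >= 0.75)              # confident output -> accept automatically
    labelled.append(ex)                        # enriched, corrected data re-enters training

model = train(labelled)                        # retrain: the model "becomes that much smarter"
```

In a real system the toy functions would be replaced by a production model and a queue of trained reviewers, but the shape of the cycle is the same: score, send uncertain cases to humans, fold the corrected labels back into the data set, and retrain.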

Use cases for RLHF

Suvrat says the RLHF approach is widely applicable across sectors from gaming to fintech and healthcare, for use cases including payments, advertising and social media abuse.

“I won't call it fraud, it's really abuse or product abuse,” says Suvrat. “I think it's a really good one where it's widely used. They do need a lot of humans in the loop to provide that reinforcement learning or feedback to start capturing some of these strengths.”

Veriff’s core focus of identity verification is a prime candidate for RLHF, since the ability to stop fraud while maximising conversions is highly dependent on the quality of the overall dataset.

Using RLHF to address bias

A key use for RLHF is in reducing bias in machine learning models. This is important because, if left unchecked, bias can be progressively amplified over time in AI systems. As has been seen with even the latest and most advanced generative AI models, the results can be unpredictable and often undesirable.

“You can definitely remove bias,” says Suvrat. “I think that achieving a perfect model output all the time or over a period of time is hard but it's never impossible, and it's a great thing to aim for.”

However, to make that happen, Suvrat believes human input is essential.

“Augmentation is always needed. And it's continuous learning, which allows the model to either stay on point or be improved over time.”
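
To illustrate how that continuous human input can surface bias in practice, the sketch below compares model decisions against human verdicts across two groups of users. The group names, sample data, and 10% flagging threshold are invented for the example and are not Veriff metrics or tooling.

```python
# Illustrative sketch: using human review verdicts to surface per-group disparities.
# The groups, sample data, and threshold below are made-up examples.
from collections import defaultdict

# (group, model_decision, human_verdict) triples from a reviewed sample
reviewed = [
    ("group_a", "reject", "approve"),   # human overturned the model: a false rejection
    ("group_a", "approve", "approve"),
    ("group_b", "approve", "approve"),
    ("group_b", "reject", "reject"),
    ("group_b", "approve", "approve"),
]

stats = defaultdict(lambda: {"false_rejects": 0, "total": 0})
for group, decision, verdict in reviewed:
    stats[group]["total"] += 1
    if decision == "reject" and verdict == "approve":
        stats[group]["false_rejects"] += 1

for group, s in stats.items():
    rate = s["false_rejects"] / s["total"]
    flag = "  <- enrich and correct training data for this group" if rate > 0.10 else ""
    print(f"{group}: false-rejection rate {rate:.0%}{flag}")
```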

RLHF as a differentiator

As Suvrat points out, many identity verification products on the market are almost fully automated.

“What that means is you have a model, it's been tuned on some data, and you're going to throw it out into the real world,” says Suvrat. “And it performs well in some cases. But it doesn't perform well in others.”

This level of quality may be acceptable for some use cases, but when it comes to identity verification, accuracy is crucial. This is where an augmented model with human feedback in the loop stands out, offering improved fraud detection rates and better conversion.

“Offering that in a very competitive space is really awesome,” says Suvrat. “Having this human feedback in the loop, we can actually do all of that a lot better. And really all of our customers and everybody in this space is looking for that. We offer the best of both worlds and it's really price competitive as well.”

“That value proposition is what the customer is looking for. Even if they start out with other solutions which meet their needs in a basic way, they very quickly outgrow them. Because they realize that those applications don't quite meet their needs.”

Providing reassurance to customers around the use of AI

Suvrat recognises that there is a degree of natural suspicion around artificial intelligence, particularly among the general public. RLHF can help address concerns around the use of AI both for Veriff’s clients and their end customers.

“I think that's an essential part of building confidence,” says Suvrat, “tuning and improving so that we provide our customers with that reassurance that it's not just something running on autopilot.”

The wider benefits of enriched datasets

Veriff does offer fully automated products for different use cases, but even these benefit from the high-quality data sets derived from our RLHF process.

“What makes our automated solutions unique and rich is this human feedback in the loop,” says Suvrat. “And of course, we use that sort of labelling and that enrichment globally, in a compliant way, to make those models better. That does allow us to provide a superior product in the market.”

Veriff Voices

Listen to the full conversation with Suvrat and explore more Veriff Voices podcast episodes.
