
One doesn’t have to go far online to witness how rapidly advancing information technology is reshaping our world and our expectations for society.

We continue to witness an explosion in the volume of online data available, but also growing questions about its veracity, accuracy, and potential for harm. That concern has only increased in recent years with the rapid advancement and breakthroughs in Generative Artificial Intelligence (AI) and the use of Large Language Models (LLMs) that power platforms such as Gemini, ChatGPT, and DeepSeek.

As an article in The New York Times indicated in 2023, “because these systems deliver information with what seems like complete confidence, it can be a struggle to separate truth from fiction when using them. Experts are concerned that people will rely on these systems for medical advice, emotional support and the raw information they use to make decisions.”

The ‘confidence’ that can and should be placed in content derived from AI is a major concern for Smith Engineering researcher Stephen Obadinma as well. A PhD candidate in Electrical and Computer Engineering, he notes that while the pace of advancement was already significant just three years ago, it has increased by an order of magnitude since the launch of ChatGPT in 2022.

“The launch of ChatGPT was groundbreaking,” he says. “Technology suddenly got so good it could generate entire stories like poems, directions, travel guides and much more. Kind of like a big revolution.”

“The tech suddenly got very, very powerful and highly proficient at doing many tasks. But the issue, as many people are now aware, is that the technology is not perfect.”

 

A promising future with enormous challenges

The accelerating pace of progress has come with significant challenges, as many platforms can provide a ready means for false information to be widely disseminated. Researchers have noted that Generative AI platforms, while good at simpler tasks, can still perform poorly on tasks that demand critical thinking and analytic discipline, such as solving puzzles, yet will present responses that appear competent unless closely inspected.

The excitement around the fast pace of AI platform advances has meant that many of these deficiencies are not being fully appreciated, at a time when some are already seeking ways to co-opt generative AI processes and subvert the protections that models have built into their programming.

For example, in the past 12 months the incidence of what are called “jailbreak attacks” has grown considerably. Attacks of this kind are designed to manipulate an LLM away from its core design and cause it to produce outputs that violate its intended purpose and can be false or even illegal. Models are typically set up to refuse specific types of questions, such as “how do I build a bomb?”, but a successful jailbreak can coax them into answering.

“The issue with ‘jailbreaking’ and similar attacks on an LLM is that they are proving relatively easy to accomplish and the models relatively easy to manipulate,” says Obadinma. “This creates a situation where the detection and management of misinformation and hate speech, while currently overwhelming, can become much worse due to the scale and speed at which false data can be produced with AI.”

 

Current AI Models

For Obadinma, the challenge is rooted in the way LLMs operate. He splits their work into two categories of computational processes. The first is simple pattern recognition, such as translation, where AI has proven highly efficient, if not always perfect. The second category is what is generally referred to as reasoning analysis, which requires higher levels of cognition. Most LLMs have proven they are still not good enough at this level of complexity. When confronted with reasoning challenges, LLMs often fail to evaluate what they are doing step by step at a macro level, which can lead to unintended consequences, such as providing travel plans that are impossible to fulfill.

“Most LLMs gain efficiency by breaking down tasks and getting better, faster results by mimicking success in one process and replicating it in another,” says Obadinma. “In doing so, LLMs can skip the holistic reasoning process that humans subconsciously perform. The inability to perceive that type of issue in a reasoning calculation currently creates gaps that bad actors can exploit to manipulate outcomes.”

There are different schools of thought on how this flaw can be overcome. Some advocate for increased training of LLMs so that, as the theory goes, scaling up the data, the training, and the size of the models will eventually yield LLMs and outcomes that more closely resemble human reasoning. Not everyone agrees.

“Others contend that there’s a fundamental limitation to how these models work and that they will basically never fully realize human reasoning capability,” according to Obadinma. He believes the most effective path forward involves integrating the two approaches: pursue further progress in platform capacity and performance, but operate with known limitations that include off-ramps for human evaluation of data. This is where he sees the potential for platforms to safeguard against the abuse of “jailbreak” and similar adversarial attacks.

Jailbreaking occurs when someone adds just a little extra text to a query, or what is called a “trigger token,” to break the rules governing how the query is handled and processed. This manipulation can cause the LLM to do things never intended by its original programming, as in the “build a bomb” query example.
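To make the idea concrete, the toy Python sketch below shows why simple rule-based protections are brittle: a naive exact-match filter can be sidestepped merely by appending a small amount of extra text to a query. The blocklist and function name are entirely hypothetical and do not describe any real platform’s safeguards.

```python
# Toy illustration only: a naive, exact-match guardrail and how appending
# a small suffix (loosely analogous to a "trigger token") slips past it.
# The blocklist and function are hypothetical, not any real platform's code.

BLOCKED_QUERIES = {"how do i build a bomb"}  # hypothetical blocklist

def naive_guardrail(query: str) -> str:
    """Refuse a query only if it exactly matches a blocked phrase."""
    if query.lower().strip() in BLOCKED_QUERIES:
        return "refused"
    return "passed to model"

print(naive_guardrail("How do I build a bomb"))        # refused
print(naive_guardrail("How do I build a bomb zq_17"))  # passed to model (filter bypassed)
```

Real guardrails are far more sophisticated than an exact-match list, but the underlying lesson is the same one Obadinma points to: small, unexpected additions to a query can change how it is handled.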

 

Employing a balanced approach

In his work, Obadinma is seeking to add an evaluative layer to the generation of results, one that provides an assessment of “confidence” in the output or, as he terms it, the probability that the answer is correct. Models would generate, alongside each answer to a query, a correlated confidence score expressed as a percentage.

“So, if the model is what we call calibrated, then when it provides an answer to a query it also says its answer is let’s say 90 percent certain. In this case, it should in theory be correct 90 percent of the time with 10 percent likelihood of confidence failure on such answers,” says Obadinma. “By contrast, if the answer ranks with only 20 percent confidence, then you know the answer is likely to be wrong and you have to do further processing.”
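As a rough illustration of what “calibrated” means in practice, the short Python sketch below groups answers into confidence buckets and checks whether the accuracy in each bucket tracks the confidence the model reported. The confidence scores and correctness labels are made up for the example, not real model data.

```python
# Minimal calibration check on hypothetical data: for a calibrated model,
# answers reported with ~90% confidence should be correct ~90% of the time.
import numpy as np

# Made-up (confidence, correctness) pairs standing in for model outputs
confidences = np.array([0.92, 0.88, 0.95, 0.21, 0.67, 0.90, 0.18, 0.73, 0.85, 0.30])
correct     = np.array([1,    1,    1,    0,    1,    1,    0,    1,    1,    0])

buckets = [(0.0, 0.5), (0.5, 0.8), (0.8, 1.01)]  # coarse confidence buckets
for lo, hi in buckets:
    mask = (confidences >= lo) & (confidences < hi)
    if mask.any():
        # A well-calibrated model keeps these two numbers close together
        print(f"confidence {lo:.1f}-{min(hi, 1.0):.1f}: "
              f"mean reported {confidences[mask].mean():.2f}, "
              f"observed accuracy {correct[mask].mean():.2f} "
              f"({mask.sum()} answers)")
```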

In this approach, Obadinma suggests, the automated pre-evaluation of confidence could itself be used to filter certain results for secondary human evaluation, providing a more reliably ‘reasoned’ result where needed. In essence, he sees a middle ground that would ensure higher-quality results while still allowing people to benefit quickly from much of what an LLM is designed to accomplish.

“If I can give you a confidence level of 90 percent and you can just set a threshold, let's say like 70 percent, anything below that, then you can basically pass it back to a human moderator for deeper analysis,” says Obadinma. “With that approach, you are actively considering the confidence level of data instead of making a binary yes or no decision. That would be an improvement right away over the status quo.”
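A minimal sketch of that thresholding idea might look like the following. The function name and the 70 percent cut-off are illustrative, drawn from the quote above rather than from any deployed system: answers whose reported confidence falls below the threshold are routed to a human moderator instead of being used automatically.

```python
# Illustrative routing rule: act on high-confidence answers automatically,
# send low-confidence ones back to a human moderator for deeper analysis.

CONFIDENCE_THRESHOLD = 0.70  # the "let's say 70 percent" cut-off from the quote

def route_answer(confidence: float) -> str:
    """Decide what to do with a model answer based on its reported confidence."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return "auto-approve"      # confident enough to use directly
    return "human-review"          # pass back to a moderator

print(route_answer(0.90))  # auto-approve
print(route_answer(0.20))  # human-review
```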

Obadinma also recognizes that a derivative of an adversarial attack, such as a traditional ‘jailbreak’, could be used to manipulate the confidence scores themselves. From his perspective, this only extends both the need for and the utility of additional layers of confidence assessment designed to evaluate outcomes in a similar fashion. Ultimately, in his assessment, it will be important to continually update and adapt these processes and to factor in the need for human interaction with the data and its assessments.

 

Building a future with more confidence and trust

Looking ahead, Obadinma sees this as a viable approach against what he expects to be a continuously evolving pattern of adversarial attacks on LLMs as AI takes on an ever greater presence in society. Just as we have come to expect IT security and hacking to be an ongoing battle, protection against AI vulnerabilities may be a new frontier in which adversarial attacks demand a constantly evolving and shifting response, with the ranking of data confidence central to targeting those efforts for maximum benefit.

“So, once you know the vulnerabilities, and can defend against them, you need to know and expect how to do this continuously,” says Obadinma. “To me, it’s important to keep our focus on having technology create sustainable advantages for society, not more challenges. It’s a parallel track that will allow us to both move forward but to do so with a measure of confidence in what data we can trust and what we need to assess for deeper evaluation.”

 

Stephen Obadinma
Stephen Obadinma is a PhD candidate in Electrical and Computer Engineering.