In recent years, artificial intelligence (AI) has made amazing progress, particularly in the area of natural language processing (NLP). Systems known as large language models, such as GPT-3, Google's LaMDA, and Wu Dao 2.0, can now generate remarkably human-like text and engage in persuasive dialogue on virtually any topic imaginable. Their abilities seem to grow more impressive with each passing month.
But as AI researchers warn, "language models don't have a fundamental understanding of the world." While they can skillfully manipulate language and string words together in coherent and meaningful ways, they lack the true understanding of concepts, facts, and common sense that humans accumulate through lived experience. As such, their output can appear convincing on the surface while containing falsehoods or biases that require human scrutiny.
This presents something of a conundrum when interacting with increasingly powerful language models that seem eager, sometimes overly so, to provide information on demand. How do we tap into their breadth of knowledge while accounting for inaccuracies or potential harm from misinformation? How can we ask good questions that elicit value rather than nonsense or deception?
This series explores principles and strategies for responsibly querying large language models to enrich understanding while minimizing risk. Getting the most out of this AI-powered tool requires care and diligence to keep it on track.
The promise and limitations of language models
Large language models use the natural language processing technique known as language modeling. This involves training AI systems on massive datasets of online text, allowing them to predict likely word sequences with increasing accuracy. By being exposed to astronomical amounts of text from websites, books, articles, and online conversations, the models effectively "learn" relationships between words, phrases, concepts, and contexts.
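To make "predicting likely word sequences" concrete, here is a minimal sketch in Python using the open-source Hugging Face transformers library and the small GPT-2 model (neither is named in this article; they stand in for any language model). It prints the words the model considers most probable next, which is all that language modeling, at its core, does.

```python
# Toy illustration of language modeling: given a prompt, a trained model
# assigns a probability to every possible next token.
# Requires the `transformers` and `torch` packages (pip install transformers torch).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocabulary_size)

# Probabilities for the token that would come next after the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

print(f"Prompt: {prompt!r}")
for prob, token_id in zip(top.values, top.indices):
    # The model has no notion of whether a continuation is true,
    # only of how likely the word sequence is given its training data.
    print(f"  {tokenizer.decode(int(token_id))!r:>12}  p={prob.item():.3f}")
```

Note that the model ranks continuations purely by statistical likelihood; it would continue a false premise just as fluently as a true one, which is exactly the limitation discussed below.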
State-of-the-art models such as Google's LaMDA have been trained on over a trillion words. OpenAI's GPT-3 was trained on roughly 570 gigabytes of text filtered from some 45 terabytes of raw Internet data and contains 175 billion parameters - orders of magnitude beyond previous systems. In the process, they developed a breathtaking ability to generate remarkably human-like, open-ended text.
However, as AI security researchers explain, such systems are "narrow," possessing language-generation skills without a broader understanding of the world or communication goals beyond reproducing text. They may be able to string words together into meaningful paragraphs and mimic certain patterns, but they do not grasp the meaning behind the words and struggle with abstraction and context.
As such, they can discuss topics persuasively using learned correlations between words, but have a limited grasp of the meaning behind events, facts, emotions, causes and effects, or the societal implications of issues. This becomes apparent when a conversation reaches the limits of their training data: discussions quickly turn nonsensical when recent events or unfamiliar human contexts enter the fray.
Research reveals blind spots around harm, ethics, and misinformation. For example, when fed harmful input, large language models can produce output that includes racist dialogues, conspiracy theories, or political statements with serious societal dangers. They inherit the biases and problems latent in their original datasets.
In addition, while they may cite factual information, it exists alongside fabrications created with equal confidence. When they lack understanding of a topic, they attempt to "fill in the gaps" by smoothly generating new text that has high linguistic plausibility but no factual accuracy. They tend to hallucinate coherent-sounding statements that are devoid of truth.
This requires thoughtful interrogation when using large language models for knowledge, decision making, or advice. Without proper safeguards, we risk spreading misinformation, embedding bias, or propagating harmful instructions that are as persuasive as the wisdom they can share when properly guided.
The critical skill of asking good questions
Interacting with large language models becomes an exercise in asking questions. Like automated reference librarians, their role is to respond to user queries by providing relevant information. Unlike human experts, however, their judgment cannot be fully trusted. The onus remains on human operators to carefully consider how they probe these systems.
Asking questions is indeed an art! The questions we ask determine the answers we get. Poorly framed questions lead to dead ends, while artfully crafted questions open doors to insight.
Psychologists note that biases are baked into dialogues depending on who is asking the questions and who is providing the answers. Linguists note that the power dynamic shifts back and forth between questioner and answerer, with the questioner steering the conversation. Asking questions becomes "a linguistic strategy for taking control."
Unfortunately, the average person is not trained to structure questions for clarity and truth-seeking. Most take a rather haphazard approach to questioning information sources. When questions exhibit ambiguity, false premises, narrow assumptions, or unconscious bias, the answers ultimately prove misleading at best or dangerous at worst, sending seekers down fruitless rabbit holes.
When dealing with AI systems that lack human judgment, it becomes imperative to hone questioning skills to minimize the risk of deception or misdirection. Doing so allows you to tap into their knowledge, while screening for limitations through critical thinking. This requires thoughtful curation of questions to support, rather than undermine, the search for truth.
The art of responsible questioning
In his seminal work The Art of Questioning, critical thinking expert Neil Browne outlines concepts that translate well to querying large language models:
- Ask open rather than closed questions: Closed questions limit answers to "yes/no" options or narrow paths that preclude meaningful responses. Well-crafted open-ended questions allow room for thoughtful responses.
- Separate facts from inferences: Fact questions seek objective truths, while inference questions ask for subjective interpretations from personal lenses or assumptions. Recognize the difference.
- Consider shades of gray: Questions that suggest only dualistic positions often obscure complex realities. Welcome nuanced perspectives between the extremes.
- Make assumptions explicit: Reveal underlying assumptions, backgrounds, or beliefs that shape questions, to check that they pass logical muster rather than smuggling in bias.
- Determine relevant context: Precisely define the frameworks, circumstances, domains, and contextual factors associated with questions. Don't ignore key details that guide valid responses.
- Examine implicit values: Determine whether questions subtly convey embedded values, ideologies, or judgments that shape the direction of responses. Be neutral to avoid bias.
These guidelines help avoid common pitfalls that undermine effective questioning. However, irresponsible questioning can undermine truth in more subtle ways. For example, poorly constructed questions can:
- Phrase questions suggestively to confirm pre-existing beliefs or elicit predictable responses (confirmation bias)
- Omit crucial details that significantly affect the appropriateness of responses (errors of omission)
- Make false assumptions or assert misleading "facts" (false-premise questioning)
- Load language with emotional connotations designed to provoke certain reactions (manipulative framing)
- Propose absurd scenarios divorced from reality (flawed hypotheses)
- Demand ridiculous oversimplifications of multifaceted issues (reductionism)
Such structural flaws corrupt lines of inquiry from the outset. Even sources that contain wisdom or accuracy are derailed by trick questions that do not seek truth in good faith. The result becomes a dialogue of the deaf rather than a sound inquiry.
When such questions are directed at AI systems that lack human discretion or skepticism, the dangers are magnified. Without genuine awareness of context, implications, or potential misdirection, these systems generate responses that match the tone and assumptions baked into the original questions, leading users down fruitless dead ends.
Therefore, responsible questioning becomes imperative before we can expect reasonable answers from large language models. We must take great care to seek the truth, to explain context, to clarify assumptions, and to frame questions in an unbiased way to elicit knowledge rather than error or harm.
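As a rough illustration of these principles in practice, the hypothetical helper below assembles a query that states its context and assumptions explicitly and poses an open-ended, neutrally framed question. The field names and wording are illustrative choices, not a prescribed prompt format.

```python
# Illustrative sketch: assembling a context-rich, assumption-explicit query
# for a language model. The structure is a suggestion, not a standard.

def build_query(context: str, assumptions: list[str], question: str) -> str:
    """Combine explicit context, stated assumptions, and an open-ended,
    neutrally worded question into a single prompt string."""
    assumption_lines = "\n".join(f"- {a}" for a in assumptions)
    return (
        f"Context: {context}\n"
        f"Assumptions I am making (please correct any that are wrong):\n"
        f"{assumption_lines}\n"
        f"Question (open-ended, no preferred answer): {question}\n"
        "Please distinguish established facts from inference, and say "
        "where you are uncertain."
    )

# A leading, closed question...
leading = "Isn't remote work obviously better for productivity?"

# ...reframed with context, explicit assumptions, and neutral wording.
prompt = build_query(
    context="A 40-person software team deciding on a hybrid-work policy.",
    assumptions=["Productivity can be measured in more than one way."],
    question="What trade-offs between remote and office work does the research describe?",
)
print(prompt)
```

The contrast between the leading question and the reframed prompt mirrors the pitfalls listed above: the rewritten version surfaces its context and assumptions and invites nuance rather than confirmation.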
Key strategies for responsible questioning - a 9-part series
The above analysis reveals much room for improvement in human questioning. To responsibly harness the power of large language models for learning and decision making, while maintaining truth, ethics, and accuracy, we need principles and practices that promote reliable questioning and fruitful engagement.
This is the inspiration for this series focused on equipping users with strategies for querying large language models in a conscious, truth-seeking way. Each article will provide concrete tips and guidelines for structuring better questions to maximize productive dialogue with these systems.
While not exhaustive, these recommendations are intended to raise users' awareness of common pitfalls that undermine effective querying, so that they can consciously improve their skills in this area. With thoughtfulness and care, we can overcome the inherent limitations of AI systems by intentionally applying human wisdom, critical thinking, and ethical questioning.
The articles will address key facets of responsible questioning, such as:
- Framing neutral and non-leading questions
- Specifying precise context and areas of relevance
- Making assumptions and values explicit
- Determining necessary levels of complexity
- Applying rules of logic and reason
- Seeking justified evidence and reliable sources
- Structuring multi-step inquiries
- Monitoring for red flags and self-correction
- Providing appropriate feedback on system performance
By incorporating such responsible questioning strategies into our interactions with increasingly powerful language models, we can continue to benefit from the progress of AI, while at the same time controlling the risks posed by emerging technologies that lack human wisdom. The goal of this series is to provide users with the conceptual knowledge and practical methods to achieve such responsible questioning in order to unlock the benefits, rather than the dangers, of such systems.
The next article will explore the principles of framing neutral, unbiased questions that avoid baked-in assumptions or logical fallacies that compromise dialogue. Subsequent articles will build further skills in applying introspective, ethical questioning when relying on AI assistance for decision making or information gathering.
As AI capabilities accelerate, now is the time to sharpen skills for keeping these technologies on track through mindfulness, critical thinking, and intentional questioning that elevates truth while maintaining ethics. I welcome readers on this journey to empower responsible engagement with powerful but limited large language models.
Glossary
- Large language models - Sophisticated AI systems trained on vast datasets of text to generate human-like language and engage in dialogue. Examples include GPT-3, Google's LaMDA, and Wu Dao 2.0.
- Natural language processing (NLP) - The branch of artificial intelligence focused on enabling computers to understand, interpret, and generate human languages. A key technique is language modeling.
- Language modeling - Training AI systems to predict probable sequences of words through exposure to massive text datasets. Enables generating smooth, coherent text.
- Parameters - The internal variables or data weights within a machine learning model that are adjusted during training to improve its accuracy. More parameters allow modeling more complex patterns.
- Generalization - The ability of AI systems to extend their learning on a training dataset to new, unseen data. Poor generalization means struggling with unfamiliar inputs.
- Overfitting - When models become excessively tuned to idiosyncrasies of their training data rather than learning generalized knowledge. Leads to poor performance on new data.
- Out-of-distribution data - Input data that differs meaningfully from what a model was trained on. Can reveal limitations in generalization.
- Narrow AI - Systems that demonstrate intelligence in narrow domains but don't have generalized reasoning abilities. Large language models are currently narrow AI.
- Abstraction - The ability to interpret concepts at higher levels of meaning by identifying patterns, rules, themes, etc. behind specifics.
- Common sense - The collection of background knowledge, assumptions, social norms, causal intuitions and practical reasoning ability that humans accumulate through experience.
- Hallucination - When AI systems generate fabricated output that seems plausible but does not reflect truthful, factual reality.
- Confirmation bias - The tendency to search for or interpret information in ways that conform to one's preconceptions or hypotheses. Can skew inquiry.
- Reductionism - Explaining complex phenomena by reducing them to simpler constituent parts rather than recognizing multifaceted aspects.
- Leading question - Questions phrased to influence someone's response by making assumptions or suggesting desired answers.