2026-04-13 · 5 min read

What Makes a Good Simulation — An Input and Question Design Guide

Output quality in AI opinion simulation scales directly with input quality. Three principles for good background materials, five inputs to avoid, question design, and the top mistakes — in a practical checklist.

AI Opinion Simulation · Simulation Input · Question Design · Practical Guide · Starling Checklist

After running simulations a few times, one question comes up more than any other. "Why did the result come out weird?"

In almost every case the answer is the same. The input was either insufficient or contaminated. This post covers how to design that input — background materials, the question, the population — and reads closer to a checklist than a theory piece.

1. Three Principles for Good Background Materials

Principle 1 — Facts Only

Good background materials contain only verifiable facts. Product specs, price, release date, competitor info, publicly available market data. That's the entire list.

Sales numbers, pre-orders, post-launch reviews, poll numbers — these leak the answer. If the model sees the answer, it simply reproduces it. The validation value disappears.

Principle 2 — Specify the Population by Ratio

"100 general consumers" is a bad setup. The model produces a fuzzy "average consumer" abstraction. You have to break it down concretely.

Good example:

  • Year 1 (S24 users): ~15% early adopters
  • Year 2 (S23 users): ~30% considering replacement
  • Year 3 (S22 users): ~30% replacement window
  • Year 4+ (S21 or older): ~25%, motivated by performance degradation

When age, income, region, and interest ratios are specified, the simulation becomes much more accurate. Target representation is a ratio problem.
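The ratio breakdown above can be encoded as a simple data structure: segment labels mapped to weights that must sum to 1. This is a hypothetical helper for sanity-checking your own spec before pasting it into a simulation, not a Starling API:

```python
# Hypothetical population spec: segment label -> ratio (ratios must sum to 1.0).
population = {
    "Year 1 (S24 users, early adopters)": 0.15,
    "Year 2 (S23 users, considering replacement)": 0.30,
    "Year 3 (S22 users, replacement window)": 0.30,
    "Year 4+ (performance degradation)": 0.25,
}

def validate_population(pop: dict[str, float]) -> None:
    """Fail fast if the segment ratios don't cover the whole population."""
    total = sum(pop.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"Segment ratios sum to {total:.2f}, expected 1.00")

validate_population(population)  # passes: 0.15 + 0.30 + 0.30 + 0.25 = 1.00
```

If the ratios don't add up, fix the spec before running; a population that covers 90% of the market silently skews every downstream number.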

Principle 3 — Provide Enough Comparison Context

Consumers don't decide in a vacuum. Include competitor specs and prices so agents can realistically compare. For a policy simulation, include the opposing camp's platform. For an ad simulation, include competitors' concurrent campaigns.

2. Five Inputs to Avoid

  1. Sales, pre-orders, signup counts — they leak the answer. The model reads "already selling well" and over-reports positive sentiment.
  2. Post-launch reviews or community reactions — the model will reproduce them.
  3. Subjective evaluations of competitors — phrasing like "X is overhyped" injects bias.
  4. Poll numbers — especially fatal for opinion simulation. Their mere presence gives away the answer.
  5. Conclusion-implying statements — "this product is innovative," "this policy will face heavy backlash." Separate facts and judgments.

Rule of thumb — never include information you would only know after the simulation.
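The five categories above can be screened mechanically before a run. A minimal sketch, assuming a hypothetical keyword list (tune the markers to your own domain; this is not a Starling feature):

```python
# Hypothetical pre-flight check: flag phrases that usually signal answer leakage.
LEAK_MARKERS = [
    "pre-order", "sold", "units shipped", "review score",
    "poll", "approval rating", "signup count",
]

def find_leaks(background: str) -> list[str]:
    """Return the leak markers found in a background-material string."""
    text = background.lower()
    return [m for m in LEAK_MARKERS if m in text]

print(find_leaks("Specs: 6.2-inch display, $799. Pre-orders hit 1M units."))
# flags the pre-order mention as post-simulation information
```

A keyword scan won't catch every conclusion-implying sentence, but it reliably catches the numeric leaks (sales, polls, signups) that do the most damage.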

3. Question Design Principles

One Question, Clear Scope

One simulation answers one question. The urge to ask about product reaction, price sensitivity, and competitive positioning all at once is strong — but the result blurs. Split and run them separately; the answer gets sharper.

The Category Determines the Question

Starling uses category-specific simulation logic. Pick the category first, then frame the question the way that category asks.

  • Marketing Reaction: "How will consumers react to this product?"
  • Public Opinion: "What is the opinion distribution on this issue?"
  • Policy Debate: "What is the pro/con structure on this policy?"
  • Crisis Response: "How will opinion flow after this announcement?"
  • General: "How will the people around me react to this decision?"

Good vs Bad Questions

❌ Bad: "Is this product good?"
✅ Good: "How will consumers react to the Galaxy S25? What are the likely criticism points?"

❌ Bad: "How will public opinion land on this policy?"
✅ Good: "What is the age-by-income opinion distribution for a 4-day-workweek announcement? What are the core arguments on each side?"

The key is narrowing the context and specifying the lens.

4. Top 5 Mistakes

Mistake 1 — Leaving the Population as "Average"

If you don't specify the population, the model drifts toward its training-data average — a fuzzy group. Always provide concrete ratios.

Mistake 2 — Trusting a Single Run

LLM-based simulation is stochastic. One run is not trustworthy. Run three and compare — this is the minimum. If the three diverge materially, the input is weak or the question is ambiguous.
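The three-run comparison can be reduced to two numbers: the mean and the spread. A minimal sketch with hypothetical T2B scores (the 0.10 stability threshold is an assumption, not a Starling rule):

```python
import statistics

# Hypothetical: T2B scores from three repeated runs of the same simulation.
runs = [0.42, 0.45, 0.39]

mean = statistics.mean(runs)
spread = max(runs) - min(runs)

# A wide spread across runs signals weak input or an ambiguous question.
stable = spread <= 0.10
print(f"T2B ≈ {mean:.2f}, spread {spread:.2f}, {'stable' if stable else 'unstable'}")
```

If the runs diverge, the fix is upstream: strengthen the background material or narrow the question, then rerun; averaging unstable runs just hides the problem.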

Mistake 3 — Taking the Absolute Value at Face Value

T2B (top-2-box purchase intent) absolute values diverge from real purchase rates. Survey-style over-claim appears in simulation too, to a degree. Focus on direction and relative ordering (e.g., T2B exceeding B2B, or one variant outscoring another), and treat the real purchase rate as roughly 50–70% of the simulated T2B value.
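The discount is simple arithmetic. A sketch with a hypothetical score, assuming the 50–70% over-claim adjustment described above:

```python
# Rough calibration: survey-style over-claim means the simulated T2B
# overstates real purchase intent, so discount it (assumed 50-70% factor).
simulated_t2b = 0.40  # hypothetical top-2-box score from one run

real_low = simulated_t2b * 0.5
real_high = simulated_t2b * 0.7
print(f"Realistic purchase-rate range: {real_low:.0%}-{real_high:.0%}")
```

Report the range, not a point estimate; the absolute number is a rough anchor, while the comparison between variants is the trustworthy signal.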

Mistake 4 — Asking Sensory Questions

"How does this drink taste?" "How appealing is this perfume's scent?" — sensory questions cannot be reproduced by AI. The sensory domain requires actual product experience: focus group interviews (FGI) or product trials.

Mistake 5 — Missing the Model's Knowledge Cutoff

Events after the model's training cutoff (e.g., Thailand's 2026 election) are unknown to the model. For those, provide the event timeline in the background material. Conversely, events already in the training data may come pre-"answered"; for validation, target post-cutoff events.

5. Pre-Run Checklist

Check these before you run.

  • Background materials don't contain post-launch info (sales, reviews, polls)
  • Population is specified with concrete ratios
  • Comparison targets (competitors, alternatives) are included
  • No subjective evaluations ("X is great") are mixed in
  • The question focuses on one topic
  • The category matches the nature of the question
  • The question isn't sensory (taste, smell, touch)
  • You've planned at least three repeated runs
  • You've verified the model's knowledge cutoff against the event timing
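The checklist lends itself to a mechanical gate. A minimal sketch with hypothetical field names mirroring the bullets above (not a Starling API):

```python
from dataclasses import dataclass

@dataclass
class SimulationInput:
    # Hypothetical pre-run record; one field per checklist item above.
    has_post_launch_info: bool        # sales, reviews, polls in the background
    population_ratios_ok: bool        # concrete segment ratios provided
    includes_competitors: bool        # comparison targets present
    has_subjective_claims: bool       # "X is great"-style evaluations mixed in
    single_topic_question: bool       # one question, one topic
    category_matches: bool            # category fits the question's nature
    is_sensory_question: bool         # taste, smell, touch
    planned_runs: int                 # repeated runs planned
    cutoff_verified: bool             # event timing checked against cutoff

def ready_to_run(s: SimulationInput) -> bool:
    """True only when every checklist item passes."""
    return (
        not s.has_post_launch_info
        and s.population_ratios_ok
        and s.includes_competitors
        and not s.has_subjective_claims
        and s.single_topic_question
        and s.category_matches
        and not s.is_sensory_question
        and s.planned_runs >= 3
        and s.cutoff_verified
    )
```

One failing field blocks the run, which is the point: a single unchecked item (say, two planned runs instead of three) is enough to make the output untrustworthy.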

If any of these aren't checked, strengthen the input before running. Ten minutes saved on the input costs an hour on result interpretation.

Conclusion

When a simulation result looks off, 90% of the time it's the input. The other 10% is category choice or question design. The model's own limits rarely turn out to be the problem.

A good input summary:

  • Facts only (no answer leakage)
  • Ratios for the population
  • Comparison context included
  • One question, aligned with category
  • Three runs minimum

If you want to try it yourself, sign up for the free tier — credits are granted immediately on signup.

Try Starling for AI-powered consumer research.

Start for Free