When Auto FAQ Bots Write Answers That Nobody Asked For
1. Setting up the initial question harvesting from real sources
Don’t start with ChatGPT yet. It’ll hallucinate a fake question like “How do you enable mode X in product Y?” when that mode doesn’t exist. Pull input from support tickets, Slack, bug comments in Notion, customer calls, whatever you can find that came from a human’s mouth or keyboard first.
I pulled questions directly from four places:
- Zendesk tickets with tags like “confusing”, “did not understand”, or just too many back-and-forths
- Slack threads from customer success reps asking devs for help
- Notes from recorded onboarding calls (Otter.ai helped here)
- Search logs from the help center (the keywords that turn up no results are gold)
This produced 42-ish usable questions. Exported those into a big Airtable base, added a checkbox for “Include in FAQ”, and started toggling. This step took longer than expected because the Airtable API skips over nested rich text markdown inside long text fields — so if you pipe it into GPT raw, all the formatting is flattened and unreadable.
One weird Airtable behavior: if your FAQ questions are in a linked record field with back-references, Zapier only sees the record IDs, not the visible question text. You have to add an Airtable 'lookup' column with the actual text and use `Record -> Field -> first()["question"]` or whatever, which feels like a hack but works.
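If you'd rather pull the checked questions with a script instead of going through Zapier, this is roughly what that looks like with the pyairtable client. The base ID, table name, and field names ("Question", "Include in FAQ", "Source") are placeholders for whatever your schema actually uses:

```python
from pyairtable import Api

api = Api("YOUR_AIRTABLE_API_KEY")
table = api.table("appXXXXXXXXXXXXXX", "FAQ Questions")  # placeholder base ID and table name

# Only pull rows where the "Include in FAQ" checkbox is ticked.
records = table.all(formula="{Include in FAQ}")

questions = []
for rec in records:
    fields = rec["fields"]
    questions.append({
        "id": rec["id"],
        "question": fields.get("Question", "").strip(),
        "source": fields.get("Source", ""),  # e.g. Zendesk, Slack, call notes
    })

print(f"Pulled {len(questions)} questions tagged for the FAQ")
```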
2. Prompting GPT to write clearly but only when asked
I tried throwing all 42 questions at GPT-4 by dumping the list and saying “Write short, technical FAQs for each.” It did, but also made up five new questions with answers like “Yes, this is fully supported.” Which was… not great, because that feature wasn’t launched yet.
Eventually I used OpenAI’s function calling structure and piped in only the questions tagged “Include in FAQ” from Airtable. Then used a structured JSON schema like this:
```json
{
  "question": "string",
  "answer": "string",
  "notes": "optional"
}
```
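Roughly what that call looks like with the current OpenAI Python SDK. The model name, tool name, and system prompt here are illustrative, not my exact production setup:

```python
import json
from openai import OpenAI

client = OpenAI()

# Tool definition mirroring the schema above; "write_faq_entry" is a made-up name.
faq_tool = {
    "type": "function",
    "function": {
        "name": "write_faq_entry",
        "description": "Write one FAQ entry for a single, already-approved question.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "answer": {"type": "string"},
                "notes": {"type": "string"},
            },
            "required": ["question", "answer"],
        },
    },
}

def draft_faq(question: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": "Write short, technical FAQ answers. Never invent questions or features."},
            {"role": "user", "content": question},
        ],
        tools=[faq_tool],
        tool_choice={"type": "function", "function": {"name": "write_faq_entry"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)
```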
This worked fine except for a subtle failure: it reused the exact phrase “To resolve this, follow these steps” in 9 of the answers. I didn’t notice until pasting the batch into Notion, where the repetition was visually obvious. In the API response JSON, it wasn’t.
Now I run a RegEx matcher that caches all prior intros and flags ≥3 reuses. Haven’t found a good semantic de-duper yet that doesn’t overcorrect.
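The checker itself is nothing fancy. Something in this shape, run over each batch rather than a persistent cache; the only real decisions are the first-sentence heuristic and the threshold of three:

```python
import re
from collections import Counter

# Grab the first sentence of each answer as its "intro".
INTRO_RE = re.compile(r"^(.+?[.!?])(\s|$)")

def flag_repeated_intros(answers: list[str], max_reuse: int = 3) -> set[str]:
    """Return intro sentences that open max_reuse or more answers."""
    intros = Counter()
    for text in answers:
        match = INTRO_RE.match(text.strip())
        if match:
            intros[match.group(1).lower()] += 1
    return {intro for intro, count in intros.items() if count >= max_reuse}
```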
3. Handling stubs and empty answers in automated content
One real moment: I hit send on a batch upload to Intercom’s Articles tool and didn’t realize one answer was literally just “As of now, there’s no documentation on this topic.” That placeholder was supposed to be caught and replaced later. It wasn’t.
The chatbot surfaced that article anyway — and a user sent a screenshot with the caption: “cool FAQ bros.” So yeah, check BEFORE publication.
Quick validation steps I now use before pushing live (there's a rough code sketch after the list):
- Anything under 30 words gets flagged for review
- Answers mentioning “not yet implemented” are only allowed with a linked roadmap card
- Answers ending with a question mark get rejected (this happens more than you’d think)
- Paragraphs that include the exact phrase “we aim to” get dubbed wishy-washy and skipped
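Here's a rough version of those checks over a drafted answer string. The roadmap-card rule is reduced to "there must be some link in the answer", which is a simplification of what actually runs:

```python
def validation_problems(answer: str) -> list[str]:
    """Return the reasons a drafted answer should not go live (empty list = publishable)."""
    problems = []
    text = answer.strip()

    if len(text.split()) < 30:
        problems.append("under 30 words, needs human review")

    if "not yet implemented" in text.lower() and "http" not in text.lower():
        problems.append("mentions unimplemented work without a roadmap link")

    if text.endswith("?"):
        problems.append("answer ends with a question mark")

    if "we aim to" in text.lower():
        problems.append("wishy-washy phrasing ('we aim to')")

    return problems
```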
Another funny edge case was GPT translating certain tech phrases too literally — like turning “soft delete” into “the file lovingly removed itself but remains in spirit.” I wish I was kidding.
4. Avoiding recursive confusion from auto-generated internal links
The first time I let the AI add internal links between FAQ items, it linked every instance of a phrase like “export settings” to the same previously written answer — even if the new answer was about import quirks.
Why? Because I fed it a prompt template that said “When relevant, link known concepts.” GPT decided any matching noun phrase was “relevant” and started hyperlinking them all over like a Wikipedia conspiracy page.
This got bad inside Intercom, where the WYSIWYG editor shows the links but doesn’t flag that they all go to the same place. The live preview showed five links to the same article, all with different anchor text.
I now control this manually with a Notion database of official linkable phrases. Only those are valid internal targets. ChatGPT gets that db at runtime, so it only links where a phrase → URL mapping exists. Not scalable long-term, but it worked for our last 60 FAQ items.
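The linking pass then becomes mechanical. A sketch assuming the Notion database has already been exported into a flat phrase → URL dict (the phrases and URLs below are made up), writing markdown links:

```python
import re

# Exported from the Notion database of official linkable phrases (illustrative values).
LINKABLE = {
    "export settings": "https://help.example.com/export-settings",
    "import quirks": "https://help.example.com/import-quirks",
}

def add_internal_links(answer: str) -> str:
    """Link only whitelisted phrases, each at most once per answer."""
    for phrase, url in LINKABLE.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        answer = pattern.sub(f"[{phrase}]({url})", answer, count=1)
    return answer
```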
5. Connecting live feedback loops into FAQ selection is messy
After our first batch of AI-written FAQs went live, I tried to collect feedback clicks. Intercom logs “Was this helpful?” but not why someone clicked Yes or No.
So I added a Typeform popup using their embed JS. It slides up with one question: “What was unclear about this FAQ?” Responses pipe into Airtable. Then I wrote a Make.com scenario that checks for the answer keyword “outdated” or “not correct” and flags the FAQ for regeneration.
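The keyword check in that scenario is trivial to restate in a script, if you ever want it outside Make.com:

```python
REGEN_KEYWORDS = ("outdated", "not correct")

def should_flag_for_regen(feedback: str) -> bool:
    """True if the free-text feedback contains any of the regeneration trigger keywords."""
    text = feedback.lower()
    return any(keyword in text for keyword in REGEN_KEYWORDS)
```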
But here’s the friction point: sometimes the content is technically correct, just written in a style users don’t trust. For example, GPT sometimes says “Our team is working hard to deliver this soon.” Sounds corporate. People don’t believe it, even if it’s true.
Eventually I added an Airtable dropdown with the actual reasons: "Wrong info", "Too vague", "Writing tone", "Out of date reference", "External link broken". Now I can route fixes appropriately. The AI doesn't get to guess the tone fix though — that stays manual.
6. Trust collapsed when GPT wrote conflicting answers
This one broke hard. For some obscure file versioning issue, two separate FAQs gave totally different instructions — both AI-generated, based on slightly different original queries.
One said to duplicate the file before uploading. The other said not to duplicate because that breaks the hash-based sync. Both were based on partial truth. I discovered this because a customer followed both FAQs in order and wrote back: “It told me to do and not do the same thing.”
Turned out I was calling GPT with isolated context per question. No full product context, no feature map. That meant the model didn’t know the two questions referred to the same function.
I now use a context primer prompt before every generation pass. It pulls known limitations from our internal Notion database and includes caveats as a structured YAML chunk. That lives at the top of the system message, so every answer shares the same assumptions.
Downside: it costs more tokens and gets throttled more often, but so far it seems to reduce these contradiction events.
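For reference, the primer is just structured text prepended to every system message. Roughly this shape, with a single illustrative caveat based on the versioning mess above; the real list is longer and lives in Notion:

```python
CONTEXT_PRIMER = """\
# Known limitations pulled from the internal Notion database (one illustrative entry)
known_limitations:
  - feature: file versioning
    caveat: do not duplicate a file before uploading; duplication breaks hash-based sync
"""

def build_messages(question: str) -> list[dict]:
    """Prepend the shared primer so every answer starts from the same assumptions."""
    system = (
        "Write short, technical FAQ answers. "
        "Treat the YAML below as ground truth and never contradict it.\n\n"
        + CONTEXT_PRIMER
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```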
7. Using snippets but stopping them from snowballing template spam
One efficient move: I created reusable answer snippets in Notion for things like “Where do I find X setting” or “How long does it take for Y to go live.” These are boilerplate, and GPT will gladly insert them with proper casing and tone.
But on the fifth pass, I caught it slipping full snippet blocks into unrelated QA items. Example: it inserted the “how long until Y goes live” paragraph… into a question about deleting accounts. Why? Snippets were named vaguely (e.g. `snippet_timing_generic`) and it auto-chose them via name similarity.
Now all snippet inserts go through a strict trigger list (rough sketch below):
- Only inserted when question category is matched (via Airtable tag)
- Each snippet carries a max length cap, and GPT discards it if the existing answer is long enough
- Snippets come with pre-written link logic — if a link is included twice, the insert fails
- We log which snippets were used on which question IDs so we can detect overuse in audits
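In code, that gatekeeping amounts to something like the sketch below. The snippet shape, category tags, and word-count cap are simplified from what's actually stored in Airtable and Notion:

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    name: str
    body: str
    categories: set[str]  # Airtable tags allowed to use this snippet
    max_answer_words: int  # skip the snippet if the answer is already this long

usage_log: list[tuple[str, str]] = []  # (question_id, snippet_name), reviewed in audits

def maybe_insert(snippet: Snippet, question_id: str, category: str, answer: str) -> str:
    if category not in snippet.categories:
        return answer  # wrong category: never insert
    if len(answer.split()) >= snippet.max_answer_words:
        return answer  # answer is already long enough
    links = [word for word in snippet.body.split() if word.startswith("http")]
    if len(links) != len(set(links)):
        return answer  # the same link appears twice inside the snippet: fail the insert
    usage_log.append((question_id, snippet.name))
    return answer + "\n\n" + snippet.body
```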
The win: fewer premature rewrites. The trap: harder to maintain once you’re scaling FAQs across multiple product areas.
8. What parts still feel brittle even after all these workarounds
The ingestion pipeline still breaks sometimes. OpenAI usage logs don’t make it obvious when it failed to generate a specific item — you have to diff your input batch and outputs manually. I keep an n8n flow that tracks object counts at each step now. It pings me if the totals drop between prompt and output without an error in the logs.
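The n8n flow is just counter nodes wired together, but the check it performs is easy to state. Roughly this, with placeholder stage names and a print standing in for the alert:

```python
def check_counts(stage_counts: dict[str, int], alert) -> None:
    """Alert if the item count drops between any two consecutive pipeline stages."""
    stages = list(stage_counts.items())
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        if count < prev_count:
            alert(f"{prev_count - count} items lost between '{prev_name}' and '{name}'")

# Example: 42 questions went in, only 39 answers came back, and nothing errored.
check_counts(
    {"airtable_pull": 42, "gpt_answers": 39, "intercom_upload": 39},
    alert=print,
)
```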
And there’s still no great answer for what to do when a feature changes weekly. The old FAQ is technically true for a day, then becomes outdated again. I’ve considered tagging certain answers with “zombie TTL” — auto-unpublish after 7 days unless reaffirmed. Haven’t built it yet, but it’s on the whiteboard.