Building Prompt Stacks That Actually Automate Sales Funnels

1. Why prompt stacks beat single-shot GPT automation every time

If you’re connecting ChatGPT to your sales funnel through Zapier or Make, and relying on one monolithic prompt to summarize a lead, score it, determine intent, and suggest an action — yeah, that’s why half your leads get dumped into the wrong Airtable view and the other half get stuck with no follow-up.

Prompt stacks give you checkpoints. Not just in logic, but in visibility. You can see where your inputs go off the rails. I split out five micro-prompts for a B2B SaaS form: one to rewrite the job title, another to detect company size, then a risk evaluator using previous win-loss notes, followed by an objection predictor (surprisingly solid), and a final action trigger.

I didn’t do that from the beginning, obviously. It started as a single 400-word GPT-4 prompt inside a Make.com scenario. It worked… for four days. Then one Friday afternoon, I checked the pipeline, and GPT was confidently assigning product-qualified status to a freelancer signing up with a Hotmail address. That’s when I decided to break it into smaller prompts, stitch them with HTTP modules, and capture each stage’s JSON to a hidden Notion page for later review.
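
Outside of Make, the stacked version looks roughly like this. A minimal sketch in Python, assuming the official OpenAI SDK (v1+) and any JSON-mode-capable model; the stage names and instructions are placeholders, and the in-memory checkpoint list stands in for the hidden Notion page:

import json
from openai import OpenAI  # assumes the official OpenAI Python SDK, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each stage is one small, single-purpose prompt. Wording here is illustrative.
STAGES = [
    ("normalize_title", 'Rewrite the job title into a standard form. Reply with JSON: {"title": "..."}'),
    ("company_size", 'Assign a company size tier from 1-4. Reply with JSON: {"company_size_tier": "..."}'),
    ("predict_objection", 'Predict the most likely sales objection. Reply with JSON: {"likely_objection": "..."}'),
]

def run_stack(lead: dict) -> dict:
    state = dict(lead)   # accumulated state, passed forward between stages
    checkpoints = []     # stands in for the hidden Notion page of per-stage JSON
    for name, instruction in STAGES:
        prompt = instruction + "\n\nLead data:\n" + json.dumps(state)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                      # assumption: swap in whatever you run
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},  # force parseable output
        )
        stage_out = json.loads(resp.choices[0].message.content)
        checkpoints.append({"stage": name, "output": stage_out})
        state.update(stage_out)                       # later stages see earlier decisions
    state["_checkpoints"] = checkpoints
    return state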

2. How I pass state across stacked AI prompts using hidden fields

If you’re doing stacked GPT prompts inside a Zapier flow or Make scenario, you’ll hit this weird cognitive gap: the LLM can’t remember what it decided two steps ago, and you probably didn’t store it anywhere. Learn from my mistake. Always pass interim prompt results in structured objects — key-value pairs — and map them between steps explicitly.

Instead of just piping in raw text, I now always store state in a Notion database row or an Airtable base with fields like:

  • lead_score: Numeric value returned from GPT
  • company_size_tier: Output from my size categorization prompt
  • likely_objection: Text from my objection predictor prompt

Then I explicitly retrieve and inject them into each subsequent prompt with sentence templates like, “Given a company of size tier {{company_size_tier}}, and a predicted objection of ‘{{likely_objection}}’, what is the next best message?”
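
In code terms the injection step is nothing fancier than pulling those fields back out and dropping them into a named template. A small sketch, with the dictionary standing in for the Notion or Airtable row (field names match the list above, values are made up):

# The record mirrors the stored row holding interim results.
record = {
    "lead_score": 0.62,
    "company_size_tier": "2 (11-50 employees)",
    "likely_objection": "already paying for a competing tool",
}

# Explicit, named injection: every value the next prompt depends on is visible here,
# so a missing field fails loudly instead of silently shifting the meaning.
next_prompt = (
    "Given a company of size tier {company_size_tier}, "
    "and a predicted objection of '{likely_objection}', "
    "what is the next best message? Current lead score: {lead_score}."
).format(**record)

print(next_prompt)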

Make.com makes this more painful than it should be — the JSON has to be massaged with their text parser before you can reliably map custom structured data forward. I’ve seen a couple scenarios suddenly pass empty strings due to a non-breaking space character inside the original GPT field. No console error. Nothing visibly wrong. Just ghost data.

3. Detecting garbage inputs before they hit the first model

You’d think your front-end form validation would catch most junk, but it won’t. Especially if you’re running Facebook Lead Ads or pipe leads in via import tools. I finally put a pre-filter GPT prompt in front of my actual processing chain: a “Sanitizer” prompt.

Quick logic of the sanitizer step

- takes in raw name, email, job title, company
- checks for missing values
- rejects known junk domains (like mailinator, yopmail)
- flags job titles that are meaningless ('student', 'me')
- returns a structured result with an is_valid flag and notes
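
The sanitizer itself is a GPT prompt, but most of that checklist is deterministic. Here is roughly the same logic as plain Python, the kind of check you could run before the model call even fires (the blocklists are placeholders, obviously):

import unicodedata

JUNK_DOMAINS = {"mailinator.com", "yopmail.com"}        # extend with your own blocklist
MEANINGLESS_TITLES = {"student", "me", "n/a", "none"}

def sanitize(lead: dict) -> dict:
    notes = []
    required = ("name", "email", "job_title", "company")

    for field in required:
        if not str(lead.get(field, "")).strip():
            notes.append(f"missing {field}")

    email = str(lead.get("email", "")).lower()
    domain = email.split("@")[-1] if "@" in email else ""
    if domain in JUNK_DOMAINS:
        notes.append(f"junk domain: {domain}")

    if str(lead.get("job_title", "")).strip().lower() in MEANINGLESS_TITLES:
        notes.append("meaningless job title")

    # Catch direction-override and other invisible format characters (the reversed-name case below).
    for field in required:
        if any(unicodedata.category(ch) == "Cf" for ch in str(lead.get(field, ""))):
            notes.append(f"control characters in {field}")

    return {"is_valid": not notes, "notes": notes}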

It saved me from a weird bug where someone typed their last name "\u202EemanoneeL" (with a right-to-left override character to make it look like “Leonna” in reverse) and GPT totally lost it — stripped vowels, hallucinated gender, and assigned them to “tech support middle management” with 90%+ confidence.

I now run this sanitizer as the very first OpenAI module. Twenty extra tokens up front equals two hours of cleanup saved later.

4. When Make.com retries break your OpenAI prompt logic

This is the part that drove me into a two-day spiral. I had a perfectly working five-step GPT prompt chain inside Make. Then — randomly — the webhook fired twice. Once immediately. Once six seconds later. Same payload. Except OpenAI misbehaved on run #2.

Turns out Make.com, when a webhook chain fails past two HTTP modules, will auto-retry the whole scenario with the same payload… but it doesn’t isolate the HTTP history or GPT logs per execution. So you think you’re debugging run A, but you’re actually seeing merged traces from run A and B.

I only caught it when my Notion GPT notes started including snippets like:

“Original company: Dropbox. Second guess: Amazon. You may have submitted the person twice.”

GPT was responding to itself across runs.

Now I use a run_id UUID variable generated in the first step, passed all the way to the last. Each GPT prompt logs its output tagged to run_id, and I force deduplication by storing run_id keys in a Redis bucket via Hookdeck before kicking off the Make scenario. Sounds like overkill — is overkill — until you’ve seen GPT whisper to itself like an amnesiac ghost.
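
The dedup layer is tiny once it exists. A sketch of the idea with redis-py: fingerprint the raw payload, and only mint a run_id for the first writer inside the TTL window (the URL, TTL, and fingerprint choice are all up to you):

import hashlib
import uuid

import redis  # redis-py client; assumes a reachable Redis instance

r = redis.Redis.from_url("redis://localhost:6379/0")

def should_process(raw_payload: bytes, ttl_seconds: int = 3600) -> str | None:
    """Return a fresh run_id for the first delivery of a payload, None for a retry."""
    fingerprint = hashlib.sha256(raw_payload).hexdigest()
    run_id = str(uuid.uuid4())
    # SET ... NX succeeds only for the first writer; the six-seconds-later retry
    # hits an existing key and gets dropped instead of re-running the scenario.
    first_writer = r.set(f"dedupe:{fingerprint}", run_id, nx=True, ex=ttl_seconds)
    return run_id if first_writer else None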

5. Dealing with hallucinated contact roles from scraped email fields

This one bit me early. If you pipe in scraped emails from enrichment tools like Clearbit or Apollo, GPT will happily assign fictional titles. I once fed in name: Jacob, email: contact@overwatch.cloud and asked, “What is Jacob’s likely title, based on the domain?” GPT answered: “Head of Infrastructure” — with confidence. Reader, the org had three people and no infra team.

There’s no good way to fix this with GPT alone because it turns patterns from previous leads into confident lies. So now I combine a domain-based lookup (via Hunter) and a GPT prompt that scores role likelihood between 0–1 instead of assigning a role outright. More reliable output:

{
  "sales_like_score": 0.78,
  "engineering_like_score": 0.2,
  "is_head_role": false,
  "suggest_role": "senior operations analyst (low certainty)"
}

It’s clunkier, but helps you weed out hallucinated CMOs when the domain doesn’t even have an /about page.
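
The glue between the two signals is just a threshold check. A rough sketch: the 0.6 cutoff and the five-person floor are placeholders rather than tuned values, and the people count is whatever your enrichment lookup reports for the domain:

def resolve_role(gpt_scores: dict, people_found_for_domain: int) -> str | None:
    """Only pass a suggested role downstream when both signals agree."""
    # A tiny or unknown org can't support a confident "Head of X" guess.
    if gpt_scores.get("is_head_role") and people_found_for_domain < 5:
        return None

    best = max(gpt_scores.get("sales_like_score", 0.0),
               gpt_scores.get("engineering_like_score", 0.0))
    if best < 0.6:
        return None   # too uncertain: leave the CRM field blank
    return gpt_scores.get("suggest_role")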

6. How automations die when AI-generated values exceed field limits

Airtable limits text fields to a little over 1000 characters — almost never documented clearly. So when GPT outputs a 1200-character follow-up message (because of course you wrote a verbose prompt), Airtable field updates silently fail. I didn’t even know this was happening until weeks later, when sales told me they weren’t getting any next-step messages.

No error in Make.com. No failed module. Just… a step that writes nothing. It wasn’t until I added a conditional that logs outputs over 900 characters to a separate notes record that I found this:

“Recommended message: We at [BLANK] understand that [user goal insight here]…”

No closing, no link, because nothing made it into the actual field.

I now truncate every GPT-generated message with a final pass like this:

"Output must not exceed 880 characters. If needed, reduce tone verbosity. Bullet points preferred. No intro or fluff."

Very specific, but it put an end to the randomly empty records.
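
The prompt-level limit does most of the work, but I also like a hard backstop in code right before the field update, so an oversized message gets trimmed instead of silently vanishing. A minimal sketch; the 880 figure mirrors the prompt above, and the sentence-boundary trim is just one way to cut:

MAX_FIELD_CHARS = 880   # keep comfortably under the limit that was eating writes

def fit_for_airtable(message: str) -> str:
    """Trim an AI-generated message so the Airtable field update can never silently fail."""
    if len(message) <= MAX_FIELD_CHARS:
        return message
    truncated = message[:MAX_FIELD_CHARS]
    last_stop = truncated.rfind(". ")
    # Prefer cutting at a sentence boundary; otherwise cut hard and mark it.
    if last_stop > MAX_FIELD_CHARS // 2:
        return truncated[:last_stop + 1]
    return truncated[:MAX_FIELD_CHARS - 1] + "…"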

7. Prompt chaining works better when you avoid clever formatting

There’s this temptation to over-engineer your prompt outputs: YAML, HTML comments, embedded meta-code — none of it survives long across multiple GPT modules. Especially when you move data through tools like Airtable or Notion that auto-format or escape characters in transit.

At one point, I was returning GPT output like this:

---
action: send personalized onboarding
confidence_score: 0.74
notes: “User seems interested in API features.”
---

…which worked fine until Notion started changing the smart quotes and flattening the dashes. Later prompts failed to parse it as structured YAML. I burned two hours reformatting it before realizing — just switch to basic JSON. Ugly, resilient, parseable even after passing through Airtable’s markdown renderer.

Now I only use one format across all GPT outputs in stacks:

{ "action": "send intro", "score": 0.74, "note": "Mentions Hubspot" }

It’s boring. But my entire prompt chain stopped arguing with itself.
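
One boring helper to go with the boring format: normalize the quote characters that Notion and Airtable like to "improve" before you try to parse, and flag anything that still fails instead of guessing. A sketch:

import json

SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

def parse_stage_output(raw: str) -> dict:
    """Parse GPT JSON that may have picked up curly quotes on its way through other tools."""
    cleaned = raw.strip()
    for smart, plain in SMART_QUOTES.items():
        cleaned = cleaned.replace(smart, plain)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Don't guess at structure; route it to a manual-review branch instead.
        return {"parse_error": True, "raw": raw}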

8. Using human fallback notes to debug broken AI scoring decisions

After one horrific week of the AI labeling nearly every high-ICP lead as “casual curiosity” because they used emojis in the message, I started appending a second field: “Why this score?” Every GPT step must return a reason string along with the numeric or object result.

Sounds trivial. But when the model scores a lead as 0.35 sales-likelihood, reading that it was “due to mention of browsing for fun and no budget discussion” immediately made us go: oh, this person needs nurturing, not a 10-minute sales call.

I bake it into every instruction now:

"Explain your score decision in one sentence using only information from input."

And yeah, it costs tokens. Adds maybe 20–30 more per run. But debug speed improved massively — especially when Sales asks why GPT said not to follow up, and you don’t want to say “the model just didn’t feel like it.”
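
If you want the reason to be mandatory rather than optional, two tiny helpers are enough: one appends the rule to every scoring prompt, the other refuses to pass a result downstream without its explanation. A sketch; the "reason" field name is my own convention, nothing the API enforces:

REASON_RULE = (
    "Explain your score decision in one sentence using only information from input. "
    'Return it as a "reason" field in the JSON.'
)

def with_reason_rule(instruction: str) -> str:
    """Append the one-sentence-reason requirement to a scoring prompt."""
    return instruction.rstrip() + "\n\n" + REASON_RULE

def require_reason(stage_name: str, result: dict) -> dict:
    """Refuse to pass a score downstream without the explanation that makes it debuggable."""
    if not str(result.get("reason", "")).strip():
        raise ValueError(f"{stage_name}: model returned a score with no reason")
    return result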