Prompt Engineering Shortcuts That Keep Breaking My Automations
1. Using dynamic assist prompts inside Notion linked databases
I tried using Notion’s AI assist feature on a multi-table content tracker, which should have been fine, except once I added a “dynamic context” column that carried in subtask tags via a relation, the AI started stalling. It wasn’t that it didn’t work—it half-worked. It aligned bullet formats, hallucinated tone changes I didn’t ask for, and then pasted a blank line like it gave up.
I reached out to their team; no known bug on their end. But if the related column has more than three multi-select values, Notion seems to clip part of the source context during generation. Looking at the UI, what the AI saw didn’t include all the inline tags unless the database was manually opened. So unless you keep the page open and active during generation, your prompt data can get ghosted.
Fix: Use a formula to flatten related properties with some manual formatting inside. Something dumb like this:
```
format(join(prop("Context Tags"), ", ")) + " | " + prop("Owner")
```
Then map only that formula output into the AI prompt parameter. Lost about two hours fiddling with conditional logic before realizing I could just flatten the damn thing.
2. Zapier OpenAI prompt steps can silently drop input variables
Zapier’s integration with OpenAI started off smoothly enough: I prompted it to create bullet drafts from Airtable records. But when I added a new Airtable field and dropped it into the prompt after the fact, OpenAI stopped responding with full completions. It wasn’t an error. Just… shorter responses with no structure.
The issue came down to Zapier’s field mapping silently skipping nulls. If a variable comes in empty, Zapier excludes it from the mapped payload but doesn’t tell you. At all. The prompt still looks correct on screen. There’s no preview bug, no warning, nothing. But that field is just gone when the request hits OpenAI.
I confirmed this using webhook mode briefly to inspect the outgoing JSON. Fixing it required setting default text in Zapier like `[Missing]` so the field wouldn’t vanish. I filed it as feedback, but the closest match in the forums was someone thinking it was a GPT issue, not Zapier var parsing.
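If you would rather catch this in one place instead of setting a default on every mapped field, a Code by Zapier (Python) step ahead of the OpenAI action can do the coalescing. This is a minimal sketch; the field names are invented and the `[Missing]` placeholder is just the convention from above:

```python
# Code by Zapier (Python) step placed before the OpenAI action.
# input_data holds the mapped Airtable fields as strings; an unmapped
# or empty field shows up as "" or is missing from the dict entirely.

REQUIRED_FIELDS = ["title", "audience", "key_points"]  # hypothetical field names

cleaned = {}
for name in REQUIRED_FIELDS:
    value = (input_data.get(name) or "").strip()
    # Substitute a visible placeholder so the variable never silently
    # vanishes from the prompt that reaches OpenAI.
    cleaned[name] = value if value else "[Missing]"

# Build the prompt from the cleaned values so every slot is always present.
prompt = (
    "Write bullet drafts from this Airtable record.\n"
    f"Title: {cleaned['title']}\n"
    f"Audience: {cleaned['audience']}\n"
    f"Key points: {cleaned['key_points']}"
)

output = {"prompt": prompt}
```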
3. AI-assist paired with filters inside Make scenarios gets flaky fast
I had a Make scenario where Notion entries trigger an OpenAI summarizer step. As soon as I added a filter to run only on entries with new comments in the last 48 hours, the AI module started running inconsistently, sometimes on old records, sometimes not at all. Nothing in the logs looked wrong.
What actually caused it:
The filter condition was time-based, comparing a UTC datetime field to “now – 2 days.” But Make caches variables between modules. The scenario trigger pulled in old data if Notion’s webhook retried—which it does randomly when AI steps take longer than 30 seconds. So even though the time filter was valid, Make evaluated it against stale trigger data. I didn’t catch that until the fifth test, when I realized one run used a cached Notion ID from two hours ago despite the UI saying it was fresh.
I now force-refresh relevant data inside Make by adding a Notion Search Items module after the trigger, keyed by title, just to re-pull the latest block version. That nudged the AI step back into consistency. Not documented anywhere—found that solution on the 4th page of a Make community forum thread.
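For anyone curious what that re-pull actually does, it is roughly a call to Notion’s public search endpoint, sorted by last edit. A hedged Python sketch of the equivalent request (the token env var and the title argument are placeholders):

```python
# Rough equivalent of the "search by title, re-pull the latest version" step,
# hitting Notion's public search endpoint directly. NOTION_TOKEN is a placeholder.
import os
import requests

NOTION_TOKEN = os.environ["NOTION_TOKEN"]

def fetch_latest_page(title: str) -> dict | None:
    resp = requests.post(
        "https://api.notion.com/v1/search",
        headers={
            "Authorization": f"Bearer {NOTION_TOKEN}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
        json={
            "query": title,
            "filter": {"property": "object", "value": "page"},
            "sort": {"direction": "descending", "timestamp": "last_edited_time"},
        },
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    # Use the most recently edited match instead of whatever the
    # (possibly retried, possibly stale) trigger payload carried in.
    return results[0] if results else None
```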
4. Resurfacing prompt variables inside Airtable buttons needs careful escaping
I set up a system inside Airtable where one field was a long prompt string with embedded GPT tags (like [Summary], [Benefit], [Tone]) and a button that fired the full prompt at OpenAI through a Zapier webhook. The button would grab the prompt text and URL-encode it… except it didn’t.
First attempt: Button formula was this:
"https://hooks.zapier.com/hooks/catch/xyzabc/?prompt=" & ENCODE_URL_COMPONENT({Prompt Text})Looked right in the formula preview, but tags like [Summary] broke the request. Because Airtable only escapes outer-level characters—not square brackets—which Zapier parses as malformed query parameters.
I ended up Base64-encoding the prompt first in the button formula, then decoding it inside the Zap. Yes, this adds steps. But it stops Zapier from guessing where the query param ends. Also reduces the chance of nested quote errors when prompts start including markdown bullets or JSON.
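The decode side is a one-liner in a Code by Zapier (Python) step. A sketch, assuming the button passes the encoded text in a query parameter I’m calling `prompt_b64`:

```python
# Code by Zapier (Python) step: decode the Base64 prompt coming off the
# Airtable button before it reaches the OpenAI action.
# "prompt_b64" is just whatever the query parameter was named.
import base64

encoded = input_data.get("prompt_b64", "")

# "+" can arrive as a space after URL parsing, and padding characters
# sometimes get stripped in transit; repair both before decoding.
encoded = encoded.replace(" ", "+")
encoded += "=" * (-len(encoded) % 4)

prompt = base64.urlsafe_b64decode(encoded).decode("utf-8")

output = {"prompt": prompt}
```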
This also uncovered an undocumented edge case: if the encoded field exceeds ~1000 chars, Airtable truncates it at random points before passing to the webhook URL. I measured it with a dummy prompt hitting a Cloudflare log just to confirm. That’s the approximate breakpoint. So yeah, use POST instead of GET if you cross that line.
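For completeness, the POST version is just the same payload riding in the request body instead of the URL, so length limits and bracket escaping stop being your problem. A generic sketch (Airtable’s own scripting blocks run JavaScript, so treat this Python as shape-of-the-request only):

```python
# Generic illustration: with POST, the prompt travels in the JSON body,
# so URL length limits and query-string escaping no longer apply.
import requests

WEBHOOK_URL = "https://hooks.zapier.com/hooks/catch/xyzabc/"  # placeholder hook from above

def send_prompt(prompt_text: str) -> None:
    resp = requests.post(WEBHOOK_URL, json={"prompt": prompt_text}, timeout=30)
    resp.raise_for_status()
```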
5. Unexpected prompt leakage when chaining ChatGPT Memory with custom instructions
Everyone’s doing that trick where you set a system message like “You are a helpful copywriter” in the ChatGPT memory tab, and then layer prompt instructions in each chat. It works great—until you open a new thread and try to overwrite behavior. One time I typed, “Ignore previous structure. Write 5 sarcastic product taglines.” What I got was the same bland business tone as the last seven completions.
This wasn’t ChatGPT being boring—it was ignoring the new instruction. The system prompt in Memory overrode it silently. And because system messages aren’t exposed in the chat history UI, it just looked broken. Eventually I figured out: the memory applies BEFORE the prompt—you can’t fully override it inline.
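You can reproduce the same ordering effect over the API, where the role Memory plays is taken by an explicit system message that always lands ahead of your turn. A minimal sketch using the OpenAI Python SDK (the model name is just an example, and Memory isn’t literally a system message you control; this is an analogy):

```python
# Minimal API analogy: the system message (standing in for Memory / custom
# instructions) is prepended ahead of the user turn, so an inline
# "ignore previous structure" request competes with it rather than replacing it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # example model
    messages=[
        {"role": "system", "content": "You are a helpful copywriter."},
        {"role": "user", "content": "Ignore previous structure. Write 5 sarcastic product taglines."},
    ],
)
print(response.choices[0].message.content)
```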
A workaround I found quoted in a support thread was this:
Use meta-directive language to explicitly shut down prior behaviors. For example: “Disregard any stored memories. Override all prior tones.” Even that is not guaranteed unless you disable memory first.
Now I toggle memory off when testing especially weird prompt behaviors. Helps avoid caching effects where ChatGPT gets… familiar with your voice, even when you don’t want it to be.
6. Prompt token limits silently mess with GPT responses in scripts
Spent a late night building a CLI tool that takes user input from a markdown task list and feeds it to GPT-4 via the OpenAI API. Worked beautifully with short tasks. But once I dumped in a .md file over 1000 words, completions started… vanishing halfway. Like cut off mid-sentence or missing headers entirely.
I expected a 4K limit on input and, going by word count, figured I was nowhere near it. Even so, the final prompt (plus embedded instructions) hit around 3,900 tokens. GPT-4 counts everything, even line breaks and whitespace, as tokens, and JSON wrappers from template functions sometimes double the count. And because the prompt and the completion share the same context window, every token the prompt eats is one the response can’t use, which is presumably where the mid-sentence cutoffs came from.
This came up in a dev thread where someone shared this snippet:
"prompt": "\nTITLE: Tasks\nCONTENT: " + md_text + "\nINSTRUCTIONS:\nSummarize as daily plan."The newline characters seemed innocent. But in token terms, that added extra weight. The aha moment: replacing multi-line system prompt blocks with templated single-line JSON aided compression. Also switched from GPT-4 to GPT-3.5 for long docs—it handled overflow truncation better. Not better quality, just… fails more gracefully.
Still trying to patch in a token estimator before each prompt send, but haven’t gotten it accurate enough to avoid ~20 percent variance in practice.
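The estimator I keep coming back to is tiktoken. A rough sketch, with the caveat that local counts still drift from what the API bills for chat-formatted requests (hence the variance):

```python
# Rough pre-flight token check before sending a prompt to the API.
# tiktoken counts the raw text; chat formatting adds a few tokens per
# message on top of this, so treat the number as an estimate.
import tiktoken

MODEL_CONTEXT = 8192       # GPT-4 8K context: prompt and completion share it
RESERVED_FOR_REPLY = 1024  # leave headroom so the response doesn't get clipped

def estimate_tokens(text: str, model: str = "gpt-4") -> int:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # fallback encoding
    return len(enc.encode(text))

def fits(prompt: str) -> bool:
    return estimate_tokens(prompt) <= MODEL_CONTEXT - RESERVED_FOR_REPLY
```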
7. Reused prompt templates introduce silent bias over time with function calling
This happened in a project using OpenAI’s function calling to extract structured data from user support messages. Initially had a fixed prompt: “Extract name, plan tier, and platform if available.” It worked on ~80 test samples. Then accuracy started dipping across specific fields—especially when messages weren’t in English.
What changed? The assistant had, in effect, “learned” that most people used English and started defaulting parsed fields to assumed meanings. For example, it mapped “Planeta” from a Portuguese message to “basic”, which was wrong. But this was function calling over the API, not chat memory; the “hidden state” was really just the interaction of sampling temperature and the prompt’s accumulated ordering of instructions and examples over time.
Resolved it by testing reordered metadata placement. Putting “Do not assume meaning without explicit match” before the field list fixed most of the bias. It’s subtle. But large language models literally care about the order of intent. Putting qualifying instructions at the end of a prompt block? Might as well be invisible.
Eventually built a dynamic prompt generator that randomizes example order every few calls. That injected just enough entropy to stop the bias pattern from cementing. Not stable, but better than a hardcoded block that slowly drifts over time.
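The generator itself is nothing clever. A sketch of the idea, with invented fields and examples, and shuffling on every call rather than every few:

```python
# Sketch of the "inject entropy" idea: put the qualifier ahead of the field
# list, and shuffle the few-shot examples so no fixed ordering hardens into
# a default. Field names and examples are made up.
import random

FIELDS = ["name", "plan_tier", "platform"]

EXAMPLES = [
    ('"Hi, I\'m Ana, on the Pro plan, using iOS"',
     '{"name": "Ana", "plan_tier": "Pro", "platform": "iOS"}'),
    ('"Olá, sou o Rui, plano Planeta, Android"',
     '{"name": "Rui", "plan_tier": "Planeta", "platform": "Android"}'),
    ('"No name given, just asking about pricing"',
     '{"name": null, "plan_tier": null, "platform": null}'),
]

def build_prompt(message: str) -> str:
    examples = EXAMPLES[:]
    random.shuffle(examples)  # fresh ordering so no fixed pattern becomes a default
    shots = "\n".join(f"Message: {m}\nOutput: {o}" for m, o in examples)
    return (
        "Do not assume meaning without an explicit match.\n"   # qualifier goes first
        f"Extract {', '.join(FIELDS)} if available; leave a field null otherwise.\n\n"
        f"{shots}\n\nMessage: {message}\nOutput:"
    )
```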
