Creating a Creative GPT

Since OpenAI introduced the capability to build your own GPT, I have created a few private GPTs just to play around and test the limits of what is possible. One saved me from accepting a doomed consulting project. My favorite, one I continue to use, takes the supposed ‘bug’ of generative AI, ‘hallucination’, and turns it into a feature. It’s my brainstorming-when-I’m-stuck partner, Creative Gap Minder:

https://chatgpt.com/g/g-rw8PngZ4v-creative-gap-minder

As a longtime product professional, I am generally practiced and confident in coming up with a reasonable, high-internal-integrity plan and approach for a new product strategy or project plan. But how can one ever know what one might still be missing?

Many diverse eyes on your plan are really the only check outside of actual experimentation, so let’s add a robot eye to our diverse perspectives. This GPT suggests both frameworks to smoke out those blind spots, on your own or with your team, and ideas you may not have considered. Play with it and LMK what you think!

Whether you should have considered whatever ideas this GPT comes up with... your mileage may vary! It certainly fails embarrassingly for me at times, and I wouldn’t have it any other way. Since hallucination is really not deterministically different from non-hallucination, it is something you need to actually want as part of the product, not just guardrail around and pray. Otherwise an LLM is almost certainly the wrong tool for the job (even if AI more generally is likely right). Here, my own limits as a RAG expert notwithstanding, it’s a great tool for the use case, in principle.

Some qualitative observations through iteration:

  • I settled on, and strongly recommend, a framework (which I unfortunately cannot attribute) for checking the completeness of your custom GPT’s prompt. Include statements for each of the following: Role, Task, Context, Format, Tone, Examples, each under its own Markdown heading.

  • The GPT occasionally disobeys the RAG context I provided. This seems unlikely to be related to the prompt’s instruction to be provocative, or to a high ‘temperature’ setting (especially as OpenAI has guardrailed temperature risks themselves in a number of ways since 3.5).

  • GPT-4o improved the results markedly on the same bank of QA prompts: it both adhered to my prompt instructions and seemed to do a better job chunking and understanding the PDFs I gave it as context and reference material. Misalignment failures are non-zero but rare. This seems like a good tradeoff for now, but the trendline suggests that other foundation models offering more configurability and control will be better long-term for my creative demands, even with a safe use case like this.
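To make the Role/Task/Context/Format/Tone/Examples framework concrete, here is a minimal Python sketch that assembles a custom GPT’s instruction prompt with one Markdown heading per section. The section contents below are illustrative placeholders I made up, not the actual Creative Gap Minder prompt.

```python
# Sketch of the Role/Task/Context/Format/Tone/Examples prompt framework.
# Section text is hypothetical placeholder content, not the real GPT's prompt.

SECTIONS = ["Role", "Task", "Context", "Format", "Tone", "Examples"]

def build_prompt(parts: dict) -> str:
    """Assemble a custom-GPT instruction prompt, one Markdown heading per section."""
    missing = [s for s in SECTIONS if s not in parts]
    if missing:
        # The framework's point is completeness: flag any section you forgot.
        raise ValueError(f"Prompt is missing sections: {missing}")
    return "\n\n".join(f"# {s}\n{parts[s]}" for s in SECTIONS)

prompt = build_prompt({
    "Role": "You are a provocative brainstorming partner for product strategists.",
    "Task": "Surface gaps, risks, and unconsidered ideas in the user's plan.",
    "Context": "The user is an experienced product professional stress-testing a plan.",
    "Format": "A bulleted list of gaps, each with a suggested framework to probe it.",
    "Tone": "Direct, curious, and willing to be wrong.",
    "Examples": "User: 'Here is my launch plan...' -> You: 'Three gaps I notice...'",
})
print(prompt.splitlines()[0])  # prints "# Role"
```

The completeness check is the useful part: it turns the framework from a mnemonic into something you can actually fail.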
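On the temperature point: the custom GPT builder does not expose sampling controls, but the underlying Chat Completions API does, so the hypothesis could be tested directly there. A sketch of such a request follows; `temperature`, `model`, and `messages` are real API fields, but the model name and prompt text are placeholders, and no request is actually sent here.

```python
# Hypothetical Chat Completions request for probing the temperature question.
# No network call is made; this only builds the request parameters.

request = {
    "model": "gpt-4o",        # placeholder model name
    "temperature": 0.2,       # lower = more deterministic; the API default is 1.0
    "messages": [
        {"role": "system", "content": "You are a brainstorming partner."},
        {"role": "user", "content": "What gaps do you see in this plan?"},
    ],
}

# Sending it would look like client.chat.completions.create(**request)
# with an openai.OpenAI() client; omitted so the sketch stays self-contained.
assert 0.0 <= request["temperature"] <= 2.0  # the range the API accepts
```

Sweeping `temperature` across a few values against the same QA prompt bank would isolate its effect from the prompt’s be-provocative instruction.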

So probably Llama ultimately, but Perplexity next: it’s time to baseline what I could do there (to Mind the Gap?) with what they have out of the box…
