
After AI gives you an output, what do you usually do?

❝

WTF is an eval loop? An eval loop is a repeatable quality check for AI output.

The AI creates something. Then another step scores it against your standard.

If it fails, it gets fixed before it reaches people.

Imagine you are checking homework.

A student writes the answer neatly.

Nice handwriting. Clean steps.

Box around the final number.

But the answer is wrong.

The neatness did not help.

That is what AI output often feels like.

It can look correct before it is correct.

Take a customer reply.

AI writes:

I understand your frustration.
We are working to resolve this as soon as possible.

Polite. But did it mention the actual bug?

Did it say what was fixed?

Did it give the customer one clear next step?

If not, the problem is not grammar.

The problem is that nobody checked the answer.

We usually try to fix this by improving the prompt.

That helps, but only up to a point. A better question does not remove the need to check the answer.

Machina's Hermes article says the quiet part clearly:

❝

AI slop is not only a prompt problem. It is a quality-control problem.

So today's shortcut is simple.

After AI writes, make it check the work.

write
check
fix
then ship

If AI can make the draft, it can also help inspect the draft.

You still decide.

The checklist just makes the mistakes easier to see.

I am Alex. Welcome to Shortcu8 by Innov8.

Let's dive deep 🐰

⭐Today's Shortcut

Use this flow:

AI writes
-> output gets scored
-> bad output gets blocked
-> failure becomes a new test

That is the basic eval loop.
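
Here is a minimal Python sketch of that flow. It is an illustration, not a fixed recipe: `eval_loop`, `write`, `score`, and `fix` are hypothetical names, and you would plug in your own model calls for them. The 0.7 threshold is the one used in the scoring step later in this issue, and the limit of three attempts is an assumption.

```python
from typing import Callable, List, Optional


def eval_loop(
    task: str,
    write: Callable[[str], str],    # hypothetical: asks the AI for a draft
    score: Callable[[str], float],  # hypothetical: returns a score from 0 to 1
    fix: Callable[[str], str],      # hypothetical: asks the AI to repair a draft
    failures: List[str],            # blocked drafts are collected here
    threshold: float = 0.7,
    max_attempts: int = 3,          # assumption: stop after three tries
) -> Optional[str]:
    """Return a draft that passed the check, or None if every attempt was blocked."""
    draft = write(task)
    for attempt in range(max_attempts):
        if score(draft) >= threshold:
            return draft            # passed: safe to ship
        failures.append(draft)      # blocked: keep it so it can become a test
        if attempt < max_attempts - 1:
            draft = fix(draft)      # try again with a repaired draft
    return None                     # nothing passed, so nothing ships
```

If the function returns `None`, nothing goes out, and the `failures` list holds every blocked draft for you to review.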

You do not need to make it complicated on day one.

Start with one checklist.

Then use it every time before publishing, sending, or shipping AI work.

1. Stop Only Fixing the Prompt

Prompting is input-side.

It controls what you ask.

But slop is often output-side.

It happens after the AI answers.

Example:

Write a LinkedIn post about AI agents.

You get a decent post.

Then you ask again tomorrow.

This time it sounds like every other AI post.

So you try:

Make it more human.
Make it less generic.
Use my voice.
Avoid AI slop.

That may help.

But it still does not guarantee quality.

Because the model can still produce a weak version.

The real question is:

How do I know this output is good before I send it?

That is where evals come in.

2. Build a Small Quality Standard

Do not start with a giant system.

Start with 10 good examples.

For content, this could be:

your best posts
your best emails
your best scripts
your best landing page sections
anything you are proud to publish under your name

This is your gold standard.

It shows what "good" means for you.

Then write a simple rubric.

For our newsletter style, the rubric could be:

Specific: does it teach one clear thing?
Useful: can the reader apply it today?
Clear: can a beginner follow it?
Voice: does it sound like us?
Action: is there one step to try?

Now you have something better than vibes.

You have a standard.
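
If you want the rubric in a form a script can reuse, one option is a plain dictionary. This is only a sketch: the names `RUBRIC` and `rubric_lines` are made up for illustration, and the questions are copied from the list above.

```python
# The five criteria from the newsletter rubric, each with its check question.
RUBRIC = {
    "Specific": "does it teach one clear thing?",
    "Useful": "can the reader apply it today?",
    "Clear": "can a beginner follow it?",
    "Voice": "does it sound like us?",
    "Action": "is there one step to try?",
}


def rubric_lines(rubric: dict) -> str:
    """Render the rubric as a numbered list that can be pasted into a prompt."""
    return "\n".join(
        f"{i}. {name}: {question}"
        for i, (name, question) in enumerate(rubric.items(), start=1)
    )


print(rubric_lines(RUBRIC))
```

Keeping the rubric in one place means every check uses the same wording, so scores stay comparable from draft to draft.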

3. Score the Output

Use a simple score from 0 to 1.

Example:

Score this draft from 0 to 1 for each criterion:

1. Specific
2. Useful
3. Clear
4. Voice
5. Action

For each score, give one short reason.

If the average is below 0.7, do not approve it.
Suggest the smallest fix needed.

The judge can still miss things.

The point is the second layer:

one more chance to catch the weak run before it leaves.

Instead of asking:

Does this feel good?

You ask:

Where did it fail?

That is a better question.
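
The approval rule in the scoring prompt is plain arithmetic, so you can also apply it yourself once the judge returns its five scores. A small sketch, assuming the scores arrive as a dictionary keyed by criterion (the `review` function and its return shape are hypothetical):

```python
from statistics import mean

CRITERIA = ("Specific", "Useful", "Clear", "Voice", "Action")


def review(scores: dict, threshold: float = 0.7) -> dict:
    """Average the five rubric scores and decide whether the draft is approved."""
    for name in CRITERIA:
        if name not in scores:
            raise ValueError(f"missing score for {name}")
        if not 0 <= scores[name] <= 1:
            raise ValueError(f"score for {name} must be between 0 and 1")
    average = mean(scores[name] for name in CRITERIA)
    # The lowest-scoring criterion answers "where did it fail?"
    weakest = min(CRITERIA, key=lambda name: scores[name])
    return {"average": average, "approved": average >= threshold, "weakest": weakest}


result = review({"Specific": 0.9, "Useful": 0.8, "Clear": 0.9, "Voice": 0.4, "Action": 0.3})
print(result["approved"], result["weakest"])
```

In this example the average is 0.66, so the draft is not approved, and the weakest criterion is Action. That tells you where to make the smallest fix.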

4. Turn Failures Into Tests

This is the easy part to skip.

When an AI output fails, do not just fix that one output.

Save the failure.

Example:

Failure:
The article opened with broad AI hype before saying anything useful.

New rule:
Do not open with broad AI hype.
Start with a concrete scene, mistake, or use case.

Now the next output can be checked against that rule.

Every failure should make the system harder to fool next time.

That is how the quality floor rises.

Not from one perfect prompt.

From a loop.
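
Some saved failures can be checked mechanically. Below is a rough sketch that stores the "broad AI hype" failure as a rule and tests the opening line of the next draft against it. The `Rule` class, the `violated` function, and the regex are all assumptions for illustration. A pattern like this only catches obvious phrasing, so subtler failures still belong in the judge prompt as written rules.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str  # hypothetical regex that flags the saved failure


def opening(draft: str) -> str:
    """Return the first non-empty line of the draft."""
    for line in draft.splitlines():
        if line.strip():
            return line.strip()
    return ""


def violated(draft: str, rules: List[Rule]) -> List[str]:
    """Return the names of the rules that the draft's opening line breaks."""
    first = opening(draft)
    return [r.name for r in rules if re.search(r.pattern, first, re.IGNORECASE)]


rules: List[Rule] = []
# Failure spotted: the article opened with broad AI hype. Save it as a rule.
rules.append(
    Rule(
        name="no broad AI hype opener",
        pattern=r"\bAI is (changing|transforming|revolutionizing) (everything|the world)\b",
    )
)

print(violated("AI is changing everything.\nHere is one tip.", rules))
```

Each new failure adds one more entry to `rules`, so the list of checks grows as you go.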

5. Where Hermes Fits

You can do the beginner version manually.

A saved prompt is enough.

Hermes becomes useful when you want the loop to run repeatedly.

Based on the official Hermes docs, it has a few pieces that fit this:

Memory: Hermes can keep compact notes about your preferences, projects, and things it learned.
Skills: Hermes can use reusable task instructions, like a quality-checking skill.
Cron: Hermes can run scheduled tasks, like checking outputs every day.
Goals: Hermes can keep working toward an objective across turns until the goal is satisfied.

So the Hermes version looks like this:

1. Save your gold examples.
2. Create a scoring skill.
3. Run every draft through the scoring skill.
4. Block anything below 0.7.
5. Save failures as new rules.
6. Use goals or scheduled checks when the task repeats.

That is where Hermes becomes useful.

Beginner Setup

Copy this:

You are my AI output quality checker.

Score the output from 0 to 1 on:

1. Specific
2. Useful
3. Clear
4. Voice
5. Action

My standard:
- no generic AI hooks
- no vague hype
- no fake urgency
- no long intro
- one useful shortcut per piece
- examples should be practical

If the average score is below 0.7:
- do not approve it
- show what failed
- rewrite only the weak section

Use this on your next AI draft.
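
If your rules keep growing, you may prefer to assemble this prompt from lists instead of editing it by hand. A minimal sketch, where `build_checker_prompt`, `CRITERIA`, and `STANDARDS` are hypothetical names and the wording mirrors the prompt above:

```python
from typing import Sequence

CRITERIA = ["Specific", "Useful", "Clear", "Voice", "Action"]

STANDARDS = [
    "no generic AI hooks",
    "no vague hype",
    "no fake urgency",
    "no long intro",
    "one useful shortcut per piece",
    "examples should be practical",
]


def build_checker_prompt(
    criteria: Sequence[str], standards: Sequence[str], threshold: float = 0.7
) -> str:
    """Assemble the quality-checker prompt from the rubric and the standards list."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    bullets = "\n".join(f"- {s}" for s in standards)
    return (
        "You are my AI output quality checker.\n\n"
        "Score the output from 0 to 1 on:\n\n"
        f"{numbered}\n\n"
        "My standard:\n"
        f"{bullets}\n\n"
        f"If the average score is below {threshold}:\n"
        "- do not approve it\n"
        "- show what failed\n"
        "- rewrite only the weak section\n"
    )


print(build_checker_prompt(CRITERIA, STANDARDS))
```

When a new failure turns into a rule, append it to `STANDARDS` and the next check picks it up.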

The Real Shift

The upgrade is not:

better prompt

The upgrade is:

prompt + quality gate

That is how you stop AI slop before it reaches people.

The model can write. The loop checks.

You decide. That is the system we want.

Now go create something great.

The ShortList

🛠️Cool Tools of the Week:

Runway Aleph 2.0: The video AI company’s editing tool that allows for frame-by-frame editing.
Perplexity Bumblebee: The AI answer engine open-sourced its read-only scanner for macOS and Linux.
Meta Forum: Meta released an AI-assisted Reddit clone, calling it a dedicated space for “deeper discussions, real answers and communities you care about.”
Nvidia AI-Q: Nvidia released an open-source tool to give specialized deep research skills to agents.

📩 How was today's Shortcu8 👇️?

We read every reply. Just reply to this email and let us know how we can improve!

See you in the next Shortcu8, bye…👋


🐔 How to Fix AI Slop (easily)
