From Tedious Text to Polished HTML: Automating Your Blog with a Local LLM

I publish my blog posts on Medium first, then mirror them on my website so non-Medium members can read them. My site is a static, plain-HTML setup. The problem? Medium doesn't give you a quick way to export a single article as HTML. My old workflow was clumsy: copy the article, lose all the formatting, then manually rebuild the HTML and fix it up.

This post is about how I killed that manual work by automating the process with a local Ollama setup and small, efficient models like gemma3n and phi-3. If you've ever poured hours into writing a great article, only to lose even more time wrangling its formatting, you'll know exactly why I built this.

The Problem: The Manual Formatting Grind

Let’s break down the manual work involved in taking a draft to a final HTML page:



The Old Solution (And Its Failings)

A programmer’s first instinct is often to reach for regular expressions (regex). We could try to write a script that:

But this approach is brittle and quickly becomes a nightmare:
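To make the brittleness concrete, here is a minimal sketch of a regex script of the kind described above. The function name naive_to_html and its two heuristics (a short line without closing punctuation is a heading; anything matching https?://\S+ is a link) are illustrative assumptions, not a real tool:

```python
import html
import re

URL_RE = re.compile(r"https?://\S+")


def naive_to_html(text: str) -> str:
    """Convert plain text to HTML with line-by-line regex heuristics (illustrative only)."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        escaped = html.escape(line)
        # Heuristic: short lines without sentence-ending punctuation become headings.
        if len(line) < 60 and not re.search(r"[.!?:]$", line):
            out.append(f"<h2>{escaped}</h2>")
        else:
            # Wrap everything that looks like a URL in an anchor tag.
            linked = URL_RE.sub(
                lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
            )
            out.append(f"<p>{linked}</p>")
    return "\n".join(out)


draft = """Why regex breaks
Read the docs at https://example.com.
x = compute(data)"""
print(naive_to_html(draft))
```

Even on three lines it misfires twice: the line of code is promoted to an <h2>, and the sentence's trailing period is swallowed into the link's href. Each fix means another special-case pattern, and the patterns start interfering with each other.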



But What About WYSIWYG Editors?

You might be thinking, “Why not just use a WYSIWYG (What You See Is What You Get) editor like the one in WordPress or other CMS platforms?” While these editors are great for simple posts, they fall short when it comes to specialized, high-volume, or complex content for a few key reasons:



The New Solution: A Local LLM as Your Formatting Engine

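Instead of writing complex logic to parse the text, what if we could just describe the final output we want? This is where LLMs shine. By giving the model a set of plain-English instructions, we can delegate all the complex, context-aware formatting tasks.

Here's a peek at the Python payload that puts it all together:

payload = {
    "model": "gemma3n:e4b",
    "prompt": prompt,            # the formatting instructions plus the article text
    "stream": True,              # receive the reply as a stream of JSON chunks
    "options": {
        "temperature": 0.2,      # low randomness for consistent, repeatable formatting
        "num_ctx": 16384         # context window in tokens, so a full article fits
    }
}

The prompt itself is where the real work happens: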

# article_text holds the raw article copied from Medium (the variable name is illustrative)
prompt = f"""
You are an expert HTML formatter. Convert the following article text into clean, semantic HTML. Follow these rules precisely.

**Formatting Rules:**
- The first line of the text is the main title; wrap it in an <h1> tag.
- Identify all subsequent headings and use appropriate heading tags (<h2>, <h3>, etc.).
- Wrap all paragraphs in <p> tags.
- Identify blocks of code. Wrap them in <pre><code> tags. It is crucial that you preserve all original indentation and line breaks within these code blocks. If you can identify the language, add a class like `class="language-python"` to the <code> tag.
- Find all URLs (like http://example.com) and convert them into anchor tags (<a href="URL" target="_blank">URL</a>).

**Entity Linking Rule:**

- Identify names of companies, tools, frameworks, and organizations.
- For each *unique* entity, hyperlink its *first occurrence* to its legitimate official website.
- Do NOT add links inside <h1>, <h2>, <h3>, <pre>, or <code> tags.
- **Here are some examples of how to apply this rule:**
  - If you see “Apache Parquet”, the first time it appears it should become <a href="https://parquet.apache.org/" target="_blank">Apache Parquet</a>.
  - If you see “Apache Spark”, the first time it appears it should become <a href="https://spark.apache.org/" target="_blank">Apache Spark</a>.
  - If you see “Pandas”, the first time it appears it should become <a href="https://pandas.pydata.org/" target="_blank">Pandas</a>.
  - If you see “AWS Athena”, the first time it appears it should become <a href="https://aws.amazon.com/athena/" target="_blank">AWS Athena</a>.

- Use your knowledge to find the official websites for other tools and frameworks mentioned, even if they are not in these examples.

**Article Text:**
{article_text}
"""
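The post doesn't show the request itself, so here is a minimal sketch of how the payload could be sent, assuming Ollama is running on its default port 11434 and serving its /api/generate endpoint. With "stream": True, Ollama replies with one JSON object per line, each carrying a "response" fragment and a "done" flag. The helper names collect_stream and generate are hypothetical:

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments from Ollama's newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of generation
    return "".join(parts)


def generate(payload: dict, timeout: float = 600.0) -> str:
    """POST the payload to a local Ollama server and return the full generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the HTTP response yields one line, i.e. one JSON chunk, at a time.
        return collect_stream(resp)
```

Calling generate(payload) returns the HTML as one string. Streaming isn't strictly required when you only need the final text, but it lets you print tokens as they arrive and watch a long article being converted.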

The End Result: Time Saved, Quality Gained

What was once a 30-minute manual task of copying, pasting, and formatting is now a 1-minute automated script. We execute python process_draft.py, and out comes a beautiful, fully-formed draft.html file with:

By offloading the complex, nuanced work to a locally run LLM, we've created a tool that copes with far more edge cases than a regex-based script and is much easier to update: to change a rule, we just edit a line of English in the prompt. For a content workflow like this one, that is a game-changer.
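The post names process_draft.py but doesn't list it, so here is a minimal sketch of what such a script could look like, under a few assumptions: the raw Medium text has been pasted into draft.txt, Ollama is serving on its default localhost:11434, and the model sometimes wraps its answer in a Markdown code fence that has to be stripped before saving. The file names, the abbreviated PROMPT_TEMPLATE, and all function names are hypothetical, and this version sets "stream": False so it can read a single JSON reply:

```python
import json
import re
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "gemma3n:e4b"

# Abbreviated stand-in: paste the full formatting and entity-linking rules in here.
PROMPT_TEMPLATE = (
    "You are an expert HTML formatter. Convert the following article text into "
    "clean, semantic HTML. Follow these rules precisely.\n\n"
    "**Article Text:**\n{article_text}\n"
)

# Matches a whole reply wrapped in ```html ... ``` (or a bare ``` fence).
FENCE_RE = re.compile(r"^\s*```[a-zA-Z]*\s*\n(.*?)\n\s*```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove a Markdown code fence the model may wrap around its HTML output."""
    match = FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def format_draft(article_text: str) -> str:
    """Send one article to the local model and return the cleaned-up HTML."""
    payload = {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(article_text=article_text),
        "stream": False,  # one JSON object containing the full "response"
        "options": {"temperature": 0.2, "num_ctx": 16384},
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = json.load(resp)
    return strip_code_fence(body["response"])


def main(src: str = "draft.txt", dst: str = "draft.html") -> None:
    """Read the raw draft, convert it, and write the HTML next to it."""
    article_text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(format_draft(article_text), encoding="utf-8")
```

Adding an `if __name__ == "__main__": main()` guard at the bottom of the file gives you the one-line python process_draft.py workflow. Passing the article in as a format value rather than splicing it into the template text means braces inside the article itself can't break the prompt.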




