Blog Post Generator with Vector Search

An AI-powered blog post generator. It uses LanceDB semantic similarity search to find the most relevant existing posts, then supplies them as context when generating a new post.

Features

Semantic Similarity Search: Uses LanceDB and sentence transformers to find the 10 most semantically similar blog posts
Style Analysis: Analyzes existing posts for tone, structure, and writing patterns
Iterative Improvement: Grades each draft and revises it until it earns an A- (90+) or better
Smart Context: Combines category-based and semantic search for comprehensive context
Auto-generated Slugs: Creates SEO-friendly URL slugs
Dual AI Support: Works with both OpenAI and Ollama
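
To illustrate the auto-generated slugs feature, here is a minimal sketch of how an SEO-friendly slug could be derived from a title. The function name `slugify` and the `max_length` parameter are assumptions made for this example; the actual logic in blog_post_generator.py may differ.

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug.

    Hypothetical helper: the name and max_length default are illustrative.
    """
    # Strip accents so "Café" becomes "Cafe" instead of losing the letter.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Trim long slugs, then drop any hyphen left dangling at the cut.
    return slug[:max_length].rstrip("-")


print(slugify("AI Trends in Venture Capital: 2024 Edition!"))
# prints: ai-trends-in-venture-capital-2024-edition
```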

Installation

# Using uv (recommended)
uv sync

# Or using pip
pip install -r requirements.txt

Usage

Basic Usage

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Your blog post idea" \
  --output_file my-post.md

With Vector Search (Default)

python blog_post_generator.py \
  --source_file transcript.txt \
  --prompt "AI trends in venture capital" \
  --categories ai startups funding

Disable Vector Search (Category-only)

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Blog post idea" \
  --no_vector_search

Using Ollama (Local AI)

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Blog post idea" \
  --use_ollama \
  --ollama_model gemma2

How It Works

Vector Database: On first run, creates embeddings for all blog posts using sentence-transformers and stores them in LanceDB
Semantic Search: Finds the 10 posts most semantically similar to your prompt
Category Matching: Also finds posts matching the specified categories
Style Analysis: Analyzes writing patterns from both sets of posts
Content Generation: Generates the blog post using the combined context
Iterative Improvement: Grades and refines the draft until it reaches an A- (90+) or better
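
The semantic search step above boils down to ranking posts by how close their embedding is to the prompt's embedding. The sketch below shows that idea in plain Python with toy 3-dimensional vectors. It is not the project's code: the real tool uses sentence-transformers for embeddings and LanceDB for the lookup, and the function names, filenames, and vectors here are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, post_vecs, k=10):
    """Return the k filenames whose vectors are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in post_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Toy "embeddings" (hypothetical); real ones have hundreds of dimensions.
posts = {
    "ai-investment.md": [0.9, 0.1, 0.0],
    "cooking-notes.md": [0.0, 0.2, 0.9],
    "vc-market.md": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
print(top_k_similar(query, posts, k=2))
# prints: ['ai-investment.md', 'vc-market.md']
```

A vector database performs the same ranking, but with an index so it does not have to score every document on each query.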

Vector Search Output

When running with vector search enabled, you’ll see:

🔄 Building vector database from blog posts...
📊 Creating embeddings for 1683 documents...
✅ Vector database created with 1683 documents
🔍 Analyzing existing posts for style...
Found 5 category-relevant posts for style analysis
🧠 Finding semantically similar posts...
Found 10 semantically similar posts
  1. ai-investment-2024.md
  2. vc-market-ai-2024.md
  3. fundraising-compendium-guide.md
  4. the-venture-fund-of-the-future.md
  5. startup-ecosystem-trends.md
  ...and 5 more
📊 Using 15 total posts for style analysis
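
In the log above, 5 category-relevant posts plus 10 semantically similar posts give 15 posts in total. One plausible way to combine the two lists is sketched below. Whether the script removes duplicates when a post appears in both lists is not stated here, so the de-duplication step and the name `merge_context` are assumptions.

```python
def merge_context(category_posts, semantic_posts):
    """Combine both result lists, keeping first-seen order and dropping repeats.

    Hypothetical helper: de-duplication is an assumption, not confirmed behavior.
    """
    seen = set()
    merged = []
    for name in [*category_posts, *semantic_posts]:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged


# Illustrative filenames only.
category_hits = ["post-a.md", "post-b.md"]
semantic_hits = ["post-b.md", "post-c.md"]
print(merge_context(category_hits, semantic_hits))
# prints: ['post-a.md', 'post-b.md', 'post-c.md']
```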

Arguments

--source_file: Path to source content file (required)
--prompt: Blog post idea/prompt (required)
--output_file: Output markdown file (auto-generated if not provided)
--categories: Categories for style analysis (default: crypto, web3, data analysis)
--content_dir: Directory containing blog posts (default: ./content/post)
--no_vector_search: Disable semantic search, use only categories
--use_ollama: Use Ollama instead of OpenAI
--ollama_model: Ollama model name (default: gemma2)
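
For reference, the arguments listed above map naturally onto Python's standard argparse module. The following is a sketch of how such a parser might be declared, using only the flags and defaults documented in this section. The actual parser in blog_post_generator.py may be organized differently, and `build_parser` is a name chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (illustrative; not the project's code)."""
    parser = argparse.ArgumentParser(description="Blog post generator (sketch)")
    parser.add_argument("--source_file", required=True,
                        help="Path to source content file")
    parser.add_argument("--prompt", required=True,
                        help="Blog post idea/prompt")
    parser.add_argument("--output_file", default=None,
                        help="Output markdown file (auto-generated if omitted)")
    # nargs="+" lets several categories follow one flag.
    parser.add_argument("--categories", nargs="+",
                        default=["crypto", "web3", "data analysis"],
                        help="Categories for style analysis")
    parser.add_argument("--content_dir", default="./content/post",
                        help="Directory containing blog posts")
    parser.add_argument("--no_vector_search", action="store_true",
                        help="Disable semantic search, use only categories")
    parser.add_argument("--use_ollama", action="store_true",
                        help="Use Ollama instead of OpenAI")
    parser.add_argument("--ollama_model", default="gemma2",
                        help="Ollama model name")
    return parser


args = build_parser().parse_args([
    "--source_file", "transcript.txt",
    "--prompt", "AI trends in venture capital",
    "--categories", "ai", "startups", "funding",
])
print(args.categories, args.no_vector_search)
# prints: ['ai', 'startups', 'funding'] False
```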

Dependencies

OpenAI API key (set the OPENAI_API_KEY environment variable)
LanceDB for vector storage
sentence-transformers for embeddings
Standard Python packages: pandas, numpy, requests

Output Quality

The system aims for blog posts that grade A- or better (a score of 90+), with:

Compelling hooks for broad audiences
Clear argument development
Specific data and examples
Professional but engaging tone
Strong conclusions that tie back to opening

Example output grades:

📝 Grading attempt 1...
Grade: A (93/100)
✅ Target grade achieved: A
Final Grade: A- (91/100)
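
The grade-and-refine loop behind this output can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the `max_attempts` cap, and the stand-in grader and reviser are all hypothetical. In the real tool, grading and revision are done by the configured model (OpenAI or Ollama).

```python
from typing import Callable, Tuple

TARGET_SCORE = 90  # an A- corresponds to a score of 90 or more


def improve_until_target(
    draft: str,
    grade: Callable[[str], int],
    revise: Callable[[str, int], str],
    max_attempts: int = 5,  # assumed cap so the loop always terminates
) -> Tuple[str, int, int]:
    """Grade the draft and revise it until it reaches TARGET_SCORE.

    Returns (final_draft, final_score, attempts_used).
    """
    score = grade(draft)
    attempts = 1
    while score < TARGET_SCORE and attempts < max_attempts:
        draft = revise(draft, score)
        score = grade(draft)
        attempts += 1
    return draft, score, attempts


# Stand-ins so the sketch runs without calling a model.
def fake_grade(text: str) -> int:
    return min(100, 80 + 5 * text.count("[revised]"))


def fake_revise(text: str, score: int) -> str:
    return text + " [revised]"


final, score, attempts = improve_until_target(
    "First draft.", fake_grade, fake_revise
)
print(score, attempts)
# prints: 90 3
```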