Blog Post Generator with Vector Search

An AI-powered blog post generator. It uses LanceDB for semantic similarity search to find the most relevant existing posts, then passes them as context when generating new blog content.

Features

  • Semantic Similarity Search: Uses LanceDB and sentence transformers to find the 10 most semantically similar blog posts
  • Style Analysis: Analyzes existing posts for tone, structure, and writing patterns
  • Iterative Improvement: Grades each draft and revises it until it reaches an A- grade or better
  • Smart Context: Combines category-based and semantic search for comprehensive context
  • Auto-generated Slugs: Creates SEO-friendly URL slugs
  • Dual AI Support: Works with both OpenAI and Ollama
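
As an illustration of the slug feature, here is a minimal sketch of how an SEO-friendly slug might be derived from a title. The function name slugify and the max_length parameter are assumptions made for this example; the generator's actual slug logic is not shown in this README and may differ.

```python
import re
import unicodedata


def slugify(title: str, max_length: int = 60) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug.

    Hypothetical helper: not taken from blog_post_generator.py.
    """
    # Strip accents so "Café" becomes "Cafe" instead of losing the letter.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Truncate on a hyphen boundary so no word is cut in half.
    if len(slug) > max_length:
        slug = slug[:max_length].rsplit("-", 1)[0]
    return slug


print(slugify("AI Trends in Venture Capital"))
```

Keeping slugs ASCII-only and hyphen-separated is a common convention for URLs, which is why the sketch normalizes accents rather than percent-encoding them.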

Installation

# Using uv (recommended)
uv sync

# Or using pip
pip install -r requirements.txt

Usage

Basic Usage

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Your blog post idea" \
  --output_file my-post.md

With Vector Search (Default)

python blog_post_generator.py \
  --source_file transcript.txt \
  --prompt "AI trends in venture capital" \
  --categories ai startups funding

Disable Vector Search (Category-only)

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Blog post idea" \
  --no_vector_search

Using Ollama (Local AI)

python blog_post_generator.py \
  --source_file content.txt \
  --prompt "Blog post idea" \
  --use_ollama \
  --ollama_model gemma2

How It Works

  1. Vector Database: On first run, creates embeddings for all blog posts using sentence-transformers
  2. Semantic Search: Finds the 10 posts most semantically similar to your prompt
  3. Category Matching: Also finds posts matching the specified categories
  4. Style Analysis: Analyzes writing patterns from both sets of posts
  5. Content Generation: Creates the blog post using the combined context
  6. Iterative Improvement: Grades and refines the content until it reaches the target grade (A- or better)
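
The semantic search step can be pictured as ranking posts by the similarity of their embedding vectors to the prompt's vector. The sketch below shows that idea in plain Python with cosine similarity. It deliberately does not use LanceDB or sentence-transformers, the vectors are toy values, and the names cosine_similarity, top_k_similar, and unrelated-post.md are assumptions for illustration only. The distance metric the tool actually uses is not stated in this README.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query_vec: list[float],
    post_vecs: dict[str, list[float]],
    k: int = 10,
) -> list[str]:
    """Return the k post names whose vectors are most similar to the query."""
    ranked = sorted(
        post_vecs,
        key=lambda name: cosine_similarity(query_vec, post_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
posts = {
    "ai-investment-2024.md": [0.9, 0.1, 0.0],
    "startup-ecosystem-trends.md": [0.6, 0.4, 0.1],
    "unrelated-post.md": [0.0, 0.1, 0.9],  # hypothetical file name
}
query = [1.0, 0.2, 0.0]
print(top_k_similar(query, posts, k=2))
```

A vector database such as LanceDB performs the same kind of nearest-neighbor ranking, but with an index so that it does not have to compare the query against every stored vector.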

Vector Search Output

When running with vector search enabled, you’ll see:

🔄 Building vector database from blog posts...
📊 Creating embeddings for 1683 documents...
✅ Vector database created with 1683 documents
šŸ” Analyzing existing posts for style...
Found 5 category-relevant posts for style analysis
🧠 Finding semantically similar posts...
Found 10 semantically similar posts
  1. ai-investment-2024.md
  2. vc-market-ai-2024.md
  3. fundraising-compendium-guide.md
  4. the-venture-fund-of-the-future.md
  5. startup-ecosystem-trends.md
  ...and 5 more
📊 Using 15 total posts for style analysis
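
The sample output combines 5 category-relevant posts and 10 semantically similar posts into 15 posts for style analysis. A minimal sketch of such a merge follows. The function name merge_context is hypothetical, and the de-duplication step is an assumption: the README does not say whether a post found by both searches is counted once or twice.

```python
def merge_context(category_posts: list[str], semantic_posts: list[str]) -> list[str]:
    """Combine both result lists, keeping first-seen order.

    Assumption: a post found by both searches is kept only once.
    """
    # dict.fromkeys preserves insertion order and drops repeated keys.
    return list(dict.fromkeys(category_posts + semantic_posts))


category_hits = ["ai-investment-2024.md", "startup-ecosystem-trends.md"]
semantic_hits = ["ai-investment-2024.md", "vc-market-ai-2024.md"]
print(merge_context(category_hits, semantic_hits))
```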

Arguments

  • --source_file: Path to source content file (required)
  • --prompt: Blog post idea/prompt (required)
  • --output_file: Output markdown file (auto-generated if not provided)
  • --categories: Categories for style analysis (default: crypto, web3, data analysis)
  • --content_dir: Directory containing blog posts (default: ./content/post)
  • --no_vector_search: Disable semantic search, use only categories
  • --use_ollama: Use Ollama instead of OpenAI
  • --ollama_model: Ollama model name (default: gemma2)
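
For readers who want to see how these flags fit together, the sketch below declares them with Python's argparse. It is an assumption that the script uses argparse, and the default category list is read here as three entries ("crypto", "web3", "data analysis"); the real parser in blog_post_generator.py may be defined differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (a sketch, not the script's own parser)."""
    parser = argparse.ArgumentParser(
        description="Blog post generator with vector search"
    )
    parser.add_argument("--source_file", required=True,
                        help="Path to source content file")
    parser.add_argument("--prompt", required=True,
                        help="Blog post idea/prompt")
    parser.add_argument("--output_file", default=None,
                        help="Output markdown file (auto-generated if omitted)")
    # nargs="+" lets several categories follow one flag: --categories ai startups
    parser.add_argument("--categories", nargs="+",
                        default=["crypto", "web3", "data analysis"],
                        help="Categories for style analysis")
    parser.add_argument("--content_dir", default="./content/post",
                        help="Directory containing blog posts")
    parser.add_argument("--no_vector_search", action="store_true",
                        help="Disable semantic search, use only categories")
    parser.add_argument("--use_ollama", action="store_true",
                        help="Use Ollama instead of OpenAI")
    parser.add_argument("--ollama_model", default="gemma2",
                        help="Ollama model name")
    return parser


# Parse the same flags as the "With Vector Search" example above.
args = build_parser().parse_args([
    "--source_file", "transcript.txt",
    "--prompt", "AI trends in venture capital",
    "--categories", "ai", "startups", "funding",
])
print(args.categories, args.no_vector_search, args.ollama_model)
```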

Dependencies

  • OpenAI API key (set the OPENAI_API_KEY environment variable)
  • LanceDB for vector storage
  • sentence-transformers for embeddings
  • Standard Python packages: pandas, numpy, requests

Output Quality

The system targets blog posts that earn an A- grade or better (a score of 90+), with:

  • Compelling hooks for broad audiences
  • Clear argument development
  • Specific data and examples
  • Professional but engaging tone
  • Strong conclusions that tie back to the opening

Example output grades:

šŸ“ Grading attempt 1...
Grade: A (93/100)
✅ Target grade achieved: A
Final Grade: A- (91/100)
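
The grade-and-refine cycle behind this output can be sketched as a simple loop. Everything named here is an assumption for illustration: improve_until_target, the grade and revise callables, and the max_attempts cap are hypothetical, and the stand-in grader below replaces the real AI model call. Only the 90-point target comes from this README.

```python
from typing import Callable


def improve_until_target(
    draft: str,
    grade: Callable[[str], int],
    revise: Callable[[str, int], str],
    target_score: int = 90,
    max_attempts: int = 5,
) -> tuple[str, int]:
    """Grade the draft and revise it until it meets the target or attempts run out."""
    score = grade(draft)
    for _ in range(max_attempts - 1):
        if score >= target_score:
            break
        draft = revise(draft, score)
        score = grade(draft)
    return draft, score


# Stand-in grader and reviser so the loop can run without an AI model.
def fake_grade(text: str) -> int:
    # Each revision marker is worth 5 points on top of a base score of 80.
    return min(100, 80 + 5 * text.count("[revised]"))


def fake_revise(text: str, score: int) -> str:
    return text + " [revised]"


final_draft, final_score = improve_until_target("First draft.", fake_grade, fake_revise)
print(final_score)
```

Capping the number of attempts keeps the loop from running forever if a draft never reaches the target, which matters when each grading pass costs an API call.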