Compare commits

...

10 Commits

Author SHA1 Message Date
Kevin Bataille
ba43d70a56 Reuse --author flag for update_meta command
- Use existing --author flag instead of --author-filter
- Consistent with export command
- Cleaner CLI interface

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-17 00:11:18 +01:00
Kevin Bataille
66ea25002a Add author filter to update_meta command
- Add --author-filter option to filter posts by author name
- Resolve author names to IDs via WordPress API
- Support partial matching for author names
- Works with other filters (category, limit, post-ids)
- Fix argparse conflict with existing --author flag

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-17 00:10:20 +01:00
Kevin Bataille
93ea5794f0 Add direct WordPress meta description updater
- Add update_meta command to fetch, generate, and update meta on WordPress
- Require --site parameter to specify target website
- Support filtering by post IDs (--post-ids)
- Support filtering by category names (--category) or IDs (--category-id)
- Support limit parameter to batch process posts
- Skip existing good quality meta descriptions by default
- Add --force flag to regenerate all meta descriptions
- Include dry-run mode to preview changes
- Save update results to CSV for review
- Rate limited API calls (0.5s delay between requests)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-17 00:04:54 +01:00
Kevin Bataille
ba8e39b5d8 Add AI-powered meta description generation
- Add meta_description command to generate SEO-optimized meta descriptions
- Use AI to generate compelling, length-optimized descriptions (120-160 chars)
- Support --only-missing flag for posts without meta descriptions
- Support --only-poor flag to improve low-quality meta descriptions
- Include quality validation scoring (0-100)
- Add call-to-action detection and optimization
- Generate detailed CSV reports with validation metrics
- Add comprehensive documentation (META_DESCRIPTION_GUIDE.md)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 23:54:35 +01:00
Kevin Bataille
84f8fc6db5 Add post migration and author filter features
- Add migrate command to transfer posts between websites
- Support CSV-based and filtered migration modes
- Preserve original post dates (with --ignore-original-date option)
- Auto-create categories and tags on destination site
- Add author filtering to export (--author and --author-id flags)
- Include author_name column in exported CSV
- Add comprehensive documentation (MIGRATION_GUIDE.md, AUTHOR_FILTER_GUIDE.md)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 23:50:24 +01:00
Kevin Bataille
06d660f9c8 Add confidence breakdown display
- Shows High/Medium/Low count breakdown
- Helps verify all matching posts will be processed
- Example output:
  Filtered to 328 proposals (confidence >= Medium)
    Breakdown: High=293, Medium=35, Low=0

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 18:21:16 +01:00
Kevin Bataille
54168a1c00 Add strict confidence filtering option
### New Feature:
- --strict flag for exact confidence matching
- Default: Medium = Medium + High (or better)
- Strict: Medium = Medium only (exact match)

### Usage:
./seo category_apply -s mistergeek.net -c Medium      # Medium or better
./seo category_apply -s mistergeek.net -c Medium --strict  # Medium only

### Example Output:
# Default (or better):
Filtered to 328 proposals (confidence >= Medium)

# Strict mode:
Filtered to 156 proposals (confidence = Medium, strict mode)

### Benefits:
- More precise control over which posts to update
- Can process confidence levels separately
- Better for batch processing in stages

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 17:13:06 +01:00
Kevin Bataille
b265125656 Fix category_apply - Filter by site and show article titles
- Filter proposals by current_site (only applies to selected site)
- Show article title for each post
- Show current and proposed categories
- Better error logging

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 16:36:51 +01:00
Kevin Bataille
fa700cba98 Fix category lookup - handle French characters and existing categories better
### Fixes:
- Improved get_or_create_category() with multiple lookup strategies
- Handle French characters in category names (Jeu vidéo, Téléchargement)
- Better handling of 'term_exists' 400 error from WordPress
- Fetch existing category details when creation fails

### Lookup Order:
1. Exact name match (case-insensitive)
2. Slug match
3. Normalized slug (handles French characters)
4. Partial name match
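Step 3 above can be sketched with Python's `unicodedata`; the `normalize_slug` name is illustrative, not necessarily the helper used in the code:

```python
import unicodedata

def normalize_slug(name: str) -> str:
    # Fold accented characters to ASCII so 'Jeu vidéo' and an existing
    # 'jeu-video' category resolve to the same slug.
    ascii_name = (unicodedata.normalize('NFKD', name)
                  .encode('ascii', 'ignore')
                  .decode('ascii'))
    return '-'.join(ascii_name.lower().split())

normalize_slug('Jeu vidéo')        # -> 'jeu-video'
normalize_slug('Téléchargement')   # -> 'telechargement'
```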

### Benefits:
- No more errors for existing categories
- Handles accented characters properly
- Better caching of existing categories
- More robust category creation

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 16:33:01 +01:00
Kevin Bataille
00f0cce03e Fix method name - load_posts instead of load_csv
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
2026-02-16 16:10:02 +01:00
12 changed files with 3716 additions and 64 deletions

AUTHOR_FILTER_GUIDE.md Normal file

@@ -0,0 +1,226 @@
# Author Filter Guide
Export posts from specific authors using the enhanced export functionality.
## Overview
The export command now supports filtering posts by author name or author ID, making it easy to:
- Export posts from a specific author across all sites
- Combine author filtering with site filtering
- Export posts from multiple authors at once
## Usage
### Filter by Author Name
Export posts from a specific author (case-insensitive, partial match):
```bash
# Export posts by "John Doe"
./seo export --author "John Doe"
# Export posts by "admin" (partial match)
./seo export --author admin
# Export posts from multiple authors
./seo export --author "John Doe" "Jane Smith"
```
### Filter by Author ID
Export posts from specific author IDs:
```bash
# Export posts by author ID 1
./seo export --author-id 1
# Export posts from multiple author IDs
./seo export --author-id 1 2 3
```
### Combine with Site Filter
Export posts from a specific author on a specific site:
```bash
# Export John's posts from mistergeek.net only
./seo export --author "John Doe" --site mistergeek.net
# Export posts by author ID 1 from webscroll.fr
./seo export --author-id 1 -s webscroll.fr
```
### Dry Run Mode
Preview what would be exported:
```bash
./seo export --author "John Doe" --dry-run
```
## How It Works
1. **Author Name Matching**
- Case-insensitive matching
- Partial matches work (e.g., "john" matches "John Doe")
- Matches against author's display name and slug
2. **Author ID Matching**
- Exact match on WordPress user ID
- More reliable than name matching
- Useful when authors have similar names
3. **Author Information**
- The exporter fetches all authors from each site
- Author names are included in the exported CSV
- Posts are filtered before export
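The matching rules above amount to a simple predicate. This sketch uses a hypothetical `matches_author` helper to show the documented behavior, not the exporter's actual implementation:

```python
def matches_author(author: dict, filters: list[str]) -> bool:
    # Documented rules: case-insensitive, partial match,
    # checked against both display name and slug.
    name = author.get('name', '').lower()
    slug = author.get('slug', '').lower()
    return any(f.lower() in name or f.lower() in slug for f in filters)

matches_author({'name': 'John Doe', 'slug': 'john-doe'}, ['john'])  # True
```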
## Export Output
The exported CSV includes author information:
```csv
site,post_id,status,title,slug,url,author_id,author_name,date_published,...
mistergeek.net,123,publish,"VPN Guide",vpn-guide,https://...,1,John Doe,2024-01-15,...
```
### New Column: `author_name`
The export now includes the author's display name in addition to the author ID.
## Examples
### Example 1: Export All Posts by Admin
```bash
./seo export --author admin
```
Output: `output/all_posts_YYYY-MM-DD.csv`
### Example 2: Export Specific Author from Specific Site
```bash
./seo export --author "Marie" --site webscroll.fr
```
### Example 3: Export Multiple Authors
```bash
./seo export --author "John" "Marie" "Admin"
```
### Example 4: Export by Author ID
```bash
./seo export --author-id 5
```
### Example 5: Combine Author and Site Filters
```bash
./seo export --author "John" --site mistergeek.net --verbose
```
## Finding Author IDs
If you don't know the author ID, you can:
1. **Export all posts and check the CSV:**
```bash
./seo export
# Then open the CSV and check the author_id column
```
2. **Use WordPress Admin:**
- Go to Users → All Users
- Hover over a user name
- The URL shows the user ID (e.g., `user_id=5`)
3. **Use WordPress REST API directly:**
```bash
curl -u username:password https://yoursite.com/wp-json/wp/v2/users
```
## Tips
1. **Use quotes for names with spaces:**
```bash
./seo export --author "John Doe" # ✓ Correct
./seo export --author John Doe # ✗ Wrong (treated as 2 authors)
```
2. **Partial matching is your friend:**
```bash
./seo export --author "john" # Matches "John Doe", "Johnny", etc.
```
3. **Combine with migration:**
```bash
# Export author's posts, then migrate to another site
./seo export --author "John Doe" --site webscroll.fr
./seo migrate output/all_posts_*.csv --destination mistergeek.net
```
4. **Verbose mode for debugging:**
```bash
./seo export --author "John" --verbose
```
## Troubleshooting
### No posts exported
**Possible causes:**
- Author name doesn't match (try different spelling)
- Author has no posts
- Author doesn't exist on that site
**Solutions:**
- Use `--verbose` to see what's happening
- Try author ID instead of name
- Check if author exists on the site
### Author names not showing in CSV
**Possible causes:**
- WordPress REST API doesn't allow user enumeration
- Authentication issue
**Solutions:**
- Check WordPress user permissions
- Verify credentials in config
- Author ID will still be present even if name lookup fails
## API Usage
Use author filtering programmatically:
```python
from seo.app import SEOApp
app = SEOApp()
# Export by author name
csv_file = app.export(author_filter=["John Doe"])
# Export by author ID
csv_file = app.export(author_ids=[1, 2])
# Export by author and site
csv_file = app.export(
author_filter=["John"],
site_filter="mistergeek.net"
)
```
## Related Commands
- `seo migrate` - Migrate exported posts to another site
- `seo analyze` - Analyze exported posts with AI
- `seo export --help` - Show all export options
## See Also
- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Post migration guide
- [README.md](README.md) - Main documentation

META_DESCRIPTION_GUIDE.md Normal file

@@ -0,0 +1,327 @@
# Meta Description Generation Guide
AI-powered meta description generation and optimization for WordPress posts.
## Overview
The meta description generator uses AI to create SEO-optimized meta descriptions for your blog posts. It can:
- **Generate new meta descriptions** for posts without them
- **Improve existing meta descriptions** that are poor quality
- **Optimize length** (120-160 characters - ideal for SEO)
- **Include focus keywords** naturally
- **Add call-to-action** elements when appropriate
## Usage
### Generate for All Posts
```bash
# Generate meta descriptions for all posts
./seo meta_description
# Use a specific CSV file
./seo meta_description output/all_posts_2026-02-16.csv
```
### Generate Only for Missing Meta Descriptions
```bash
# Only generate for posts without meta descriptions
./seo meta_description --only-missing
```
### Improve Poor Quality Meta Descriptions
```bash
# Only regenerate meta descriptions with poor quality scores
./seo meta_description --only-poor
# Limit to first 10 poor quality meta descriptions
./seo meta_description --only-poor --limit 10
```
### Dry Run Mode
Preview what would be processed:
```bash
./seo meta_description --dry-run
./seo meta_description --dry-run --only-missing
```
## Command Options
| Option | Description |
|--------|-------------|
| `--only-missing` | Only generate for posts without meta descriptions |
| `--only-poor` | Only generate for posts with poor quality meta descriptions |
| `--limit <N>` | Limit number of posts to process |
| `--output`, `-o` | Custom output file path |
| `--dry-run` | Preview without generating |
| `--verbose`, `-v` | Enable verbose logging |
## How It Works
### 1. Content Analysis
The AI analyzes:
- Post title
- Content preview (first 500 characters)
- Excerpt (if available)
- Focus keyword (if specified)
- Current meta description (if exists)
### 2. AI Generation
The AI generates meta descriptions following SEO best practices:
- **Length**: 120-160 characters (optimal for search engines)
- **Keywords**: Naturally includes focus keyword
- **Compelling**: Action-oriented and engaging
- **Accurate**: Clearly describes post content
- **Active voice**: Uses active rather than passive voice
- **Call-to-action**: Includes CTA when appropriate
### 3. Quality Validation
Each generated meta description is scored on:
- **Length optimization** (120-160 chars = 100 points)
- **Proper ending** (period = +5 points)
- **Call-to-action words** (+5 points)
- **Overall quality** (minimum 70 points to pass)
### 4. Output
Results are saved to CSV with:
- Original meta description
- Generated meta description
- Length of generated meta
- Validation score (0-100)
- Whether length is optimal
- Whether it's an improvement
## Output Format
The tool generates a CSV file in `output/`:
```
output/meta_descriptions_20260216_143022.csv
```
### CSV Columns
| Column | Description |
|--------|-------------|
| `post_id` | WordPress post ID |
| `site` | Site name |
| `title` | Post title |
| `current_meta_description` | Existing meta (if any) |
| `generated_meta_description` | AI-generated meta |
| `generated_length` | Character count |
| `validation_score` | Quality score (0-100) |
| `is_optimal_length` | True if 120-160 chars |
| `improvement` | True if better than current |
| `status` | Generation status |
## Examples
### Example 1: Generate All Missing Meta Descriptions
```bash
# Export posts first
./seo export
# Generate meta descriptions for posts without them
./seo meta_description --only-missing
```
**Output:**
```
Generating AI-optimized meta descriptions...
Filter: Only posts without meta descriptions
Processing post 1/45
✓ Generated meta description (score: 95, length: 155)
...
✅ Meta description generation completed!
Results: output/meta_descriptions_20260216_143022.csv
📊 Summary:
Total processed: 45
Improved: 42 (93.3%)
Optimal length: 40 (88.9%)
Average score: 92.5
API calls: 45
```
### Example 2: Fix Poor Quality Meta Descriptions
```bash
# Only improve meta descriptions scoring below 70
./seo meta_description --only-poor --limit 20
```
### Example 3: Test with Small Batch
```bash
# Test with first 5 posts
./seo meta_description --limit 5
```
### Example 4: Custom Output File
```bash
./seo meta_description --output output/custom_meta_gen.csv
```
## Meta Description Quality Scoring
### Scoring Criteria
| Criteria | Points |
|----------|--------|
| Optimal length (120-160 chars) | 100 |
| Too short (< 120 chars) | 50 - (deficit) |
| Too long (> 160 chars) | 50 - (excess) |
| Ends with period | +5 |
| Contains CTA words | +5 |
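The table translates directly into a small scoring function. This is a sketch of the documented rules, not the tool's actual code; the names and the clamp to 0-100 are assumptions based on the stated score range:

```python
# CTA word list taken from the "CTA Words Detected" section below.
CTA_WORDS = {'learn', 'discover', 'find', 'explore',
             'read', 'get', 'see', 'try', 'start'}

def score_meta(description: str) -> int:
    n = len(description)
    if 120 <= n <= 160:
        score = 100                  # optimal length
    elif n < 120:
        score = 50 - (120 - n)       # too short: subtract the deficit
    else:
        score = 50 - (n - 160)       # too long: subtract the excess
    if description.endswith('.'):
        score += 5                   # proper ending
    if any(w in description.lower().split() for w in CTA_WORDS):
        score += 5                   # call-to-action detected
    return max(0, min(score, 100))   # clamp to the documented 0-100 range
```

A 127-character description that starts with "Discover" and ends with a period hits all three bonuses and clamps at 100.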
### Quality Thresholds
- **Excellent (90-100)**: Ready to use
- **Good (70-89)**: Minor improvements possible
- **Poor (< 70)**: Needs regeneration
### CTA Words Detected
The system looks for action words like:
- learn, discover, find, explore
- read, get, see, try, start
- and more...
## Best Practices
### Before Generation
1. **Export fresh data** - Ensure you have latest posts
```bash
./seo export
```
2. **Review focus keywords** - Posts with focus keywords get better results
3. **Test with small batch** - Try with `--limit 5` first
### During Generation
1. **Monitor scores** - Watch validation scores in real-time
2. **Check API usage** - Track number of API calls
3. **Use filters** - Target only what needs improvement
### After Generation
1. **Review results** - Open the CSV and check generated metas
2. **Manual approval** - Don't auto-publish; review first
3. **A/B test** - Compare performance of new vs old metas
## Integration with WordPress
### Manual Update
1. Open the generated CSV: `output/meta_descriptions_*.csv`
2. Copy generated meta descriptions
3. Update in WordPress SEO plugin (RankMath, Yoast, etc.)
### Automated Update (Future)
Future versions may support direct WordPress updates:
```bash
# Not yet implemented
./seo meta_description --apply-to-wordpress
```
## API Usage & Cost
### API Calls
- Each post requires 1 API call
- Rate limited to 2 calls/second (0.5s delay)
- Uses Claude AI via OpenRouter
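The pacing above can be sketched as a simple loop; function and parameter names here are illustrative, not the tool's actual API:

```python
import time

def generate_all(posts, generate_fn, delay=0.5):
    # One API call per post, with a 0.5 s pause between calls
    # (~2 calls/second), matching the documented rate limit.
    results = []
    for post in posts:
        results.append(generate_fn(post))
        time.sleep(delay)
    return results
```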
### Estimated Cost
Approximate cost per 1000 meta descriptions:
- **~$0.50 - $2.00** depending on content length
- Check OpenRouter pricing for current rates
### Monitoring
The summary shows:
- Total API calls made
- Cost tracking (if enabled)
## Troubleshooting
### No Posts to Process
**Problem:** "No posts to process"
**Solutions:**
1. Export posts first: `./seo export`
2. Check CSV has required columns
3. Verify filter isn't too restrictive
### Low Quality Scores
**Problem:** Generated metas scoring below 70
**Solutions:**
1. Add focus keywords to posts
2. Provide better content previews
3. Try regenerating with different parameters
### API Errors
**Problem:** "API call failed"
**Solutions:**
1. Check internet connection
2. Verify API key in `.env`
3. Check OpenRouter account status
4. Reduce batch size with `--limit`
### Rate Limiting
**Problem:** Too many API calls
**Solutions:**
1. Use `--limit` to batch process
2. Wait between batches
3. Upgrade API plan if needed
## Comparison with Other Tools
| Feature | This Tool | Other SEO Tools |
|---------|-----------|-----------------|
| AI-powered | ✅ Yes | ⚠️ Sometimes |
| Batch processing | ✅ Yes | ✅ Yes |
| Quality scoring | ✅ Yes | ❌ No |
| Custom prompts | ✅ Yes | ❌ No |
| WordPress integration | ⚠️ Manual | ✅ Some |
| Cost | Pay-per-use | Monthly subscription |
## Related Commands
- `seo export` - Export posts for analysis
- `seo analyze` - AI analysis with recommendations
- `seo seo_check` - SEO quality checking
## See Also
- [README.md](README.md) - Main documentation
- [ENHANCED_ANALYSIS_GUIDE.md](ENHANCED_ANALYSIS_GUIDE.md) - AI analysis guide
- [EDITORIAL_STRATEGY_GUIDE.md](EDITORIAL_STRATEGY_GUIDE.md) - Content strategy
---
**Made with ❤️ for better SEO automation**

MIGRATION_GUIDE.md Normal file

@@ -0,0 +1,269 @@
# Post Migration Guide
This guide explains how to migrate posts between WordPress sites using the SEO automation tool.
## Overview
The migration feature allows you to move posts from one WordPress site to another while preserving:
- Post content (title, body, excerpt)
- Categories (automatically created if they don't exist)
- Tags (automatically created if they don't exist)
- SEO metadata (RankMath, Yoast SEO)
- Post slug
## Migration Modes
There are two ways to migrate posts:
### 1. CSV-Based Migration
Migrate specific posts listed in a CSV file.
**Requirements:**
- CSV file with at least two columns: `site` and `post_id`
**Usage:**
```bash
# Basic migration (posts deleted from source after migration)
./seo migrate posts_to_migrate.csv --destination mistergeek.net
# Keep posts on source site
./seo migrate posts_to_migrate.csv --destination mistergeek.net --keep-source
# Publish immediately instead of draft
./seo migrate posts_to_migrate.csv --destination mistergeek.net --post-status publish
# Custom output file for migration report
./seo migrate posts_to_migrate.csv --destination mistergeek.net --output custom_report.csv
```
### 2. Filtered Migration
Migrate posts based on filters (category, date, etc.).
**Usage:**
```bash
# Migrate all posts from source to destination
./seo migrate --source webscroll.fr --destination mistergeek.net
# Migrate posts from specific categories
./seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN "Torrent Clients"
# Migrate posts with specific tags
./seo migrate --source webscroll.fr --destination mistergeek.net --tag-filter "guide" "tutorial"
# Migrate posts by date range
./seo migrate --source webscroll.fr --destination mistergeek.net --date-after 2024-01-01 --date-before 2024-12-31
# Limit number of posts
./seo migrate --source webscroll.fr --destination mistergeek.net --limit 10
# Combine filters
./seo migrate --source webscroll.fr --destination mistergeek.net \
--category-filter VPN \
--date-after 2024-01-01 \
--limit 5 \
--keep-source
```
## Command Options
### Required Options
- `--destination`, `--to`: Destination site (mistergeek.net, webscroll.fr, hellogeek.net)
- `--source`, `--from`: Source site (for filtered migration only)
- CSV file: Path to CSV with posts (for CSV-based migration)
### Optional Options
| Option | Description | Default |
|--------|-------------|---------|
| `--keep-source` | Keep posts on source site after migration | Delete after migration |
| `--post-status` | Status for migrated posts (draft, publish, pending) | draft |
| `--no-categories` | Don't create categories automatically | Create categories |
| `--no-tags` | Don't create tags automatically | Create tags |
| `--category-filter` | Filter by category names (filtered migration) | All categories |
| `--tag-filter` | Filter by tag names (filtered migration) | All tags |
| `--date-after` | Migrate posts after this date (YYYY-MM-DD) | No limit |
| `--date-before` | Migrate posts before this date (YYYY-MM-DD) | No limit |
| `--limit` | Maximum number of posts to migrate | No limit |
| `--output`, `-o` | Custom output file for migration report | Auto-generated |
| `--dry-run` | Preview what would be done without doing it | Execute |
| `--verbose`, `-v` | Enable verbose logging | Normal logging |
## Migration Process
### What Gets Migrated
1. **Post Content**
- Title
- Body content (HTML preserved)
- Excerpt
- Slug
2. **Categories**
- Mapped from source to destination
- Created automatically if they don't exist on destination
- Hierarchical structure preserved (parent-child relationships)
3. **Tags**
- Mapped from source to destination
- Created automatically if they don't exist on destination
4. **SEO Metadata**
- RankMath title and description
- Yoast SEO title and description
- Focus keywords
### What Doesn't Get Migrated
- Featured images (must be re-uploaded manually)
- Post author (uses destination site's default)
- Comments (not transferred)
- Custom fields (except SEO metadata)
- Post revisions
## Migration Report
After migration, a CSV report is generated in `output/` with the following information:
```csv
source_site,source_post_id,destination_site,destination_post_id,title,status,categories_migrated,tags_migrated,deleted_from_source
webscroll.fr,123,mistergeek.net,456,"VPN Guide",draft,3,5,True
```
## Examples
### Example 1: Migrate Specific Posts from CSV
1. Create a CSV file with posts to migrate:
```csv
site,post_id,title
webscroll.fr,123,VPN Guide
webscroll.fr,456,Torrent Tutorial
```
2. Run migration:
```bash
./seo migrate my_posts.csv --destination mistergeek.net
```
### Example 2: Migrate All VPN Content
```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
--category-filter VPN "VPN Reviews" \
--post-status draft \
--keep-source
```
### Example 3: Migrate Recent Content
```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
--date-after 2024-06-01 \
--limit 20
```
### Example 4: Preview Migration
```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
--category-filter VPN \
--dry-run
```
## Best Practices
### Before Migration
1. **Backup both sites** - Always backup before bulk operations
2. **Test with a few posts** - Migrate 1-2 posts first to verify
3. **Check category structure** - Review destination site's categories
4. **Plan URL redirects** - If deleting from source, set up redirects
### During Migration
1. **Use dry-run first** - Preview what will be migrated
2. **Start with drafts** - Review before publishing
3. **Monitor logs** - Watch for errors or warnings
4. **Limit batch size** - Migrate in batches of 10-20 posts
### After Migration
1. **Review migrated posts** - Check formatting and categories
2. **Add featured images** - Manually upload if needed
3. **Set up redirects** - From old URLs to new URLs
4. **Update internal links** - Fix cross-site links
5. **Monitor SEO** - Track rankings after migration
## Troubleshooting
### Common Issues
**1. "Site not found" error**
- Check site name is correct (mistergeek.net, webscroll.fr, hellogeek.net)
- Verify credentials in config.yaml or .env
**2. "Category already exists" warning**
- This is normal - the migrator found a matching category
- The existing category will be used
**3. "Failed to create post" error**
- Check WordPress REST API is enabled
- Verify user has post creation permissions
- Check authentication credentials
**4. Posts missing featured images**
- Featured images are not migrated automatically
- Upload images manually to destination site
- Update featured image on migrated posts
**5. Categories not matching**
- Categories are matched by name (case-insensitive)
- "VPN" and "vpn" will match
- "VPN Guide" and "VPN" will NOT match - new category created
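The matching rule can be illustrated with a lookup against a name-keyed cache; this is a sketch of the documented behavior, not the migrator's code:

```python
def find_category(existing: dict, name: str):
    # Exact name match, case-insensitive: 'VPN' matches 'vpn',
    # but 'VPN Guide' does not match 'VPN' (a new category is created).
    return existing.get(name.strip().lower())

cats = {'vpn': 12, 'software': 7}
find_category(cats, 'VPN')        # -> 12
find_category(cats, 'VPN Guide')  # -> None
```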
## API Usage
You can also use the migration feature programmatically:
```python
from seo.app import SEOApp
app = SEOApp()
# CSV-based migration
app.migrate(
csv_file='output/posts_to_migrate.csv',
destination_site='mistergeek.net',
create_categories=True,
create_tags=True,
delete_after=False,
status='draft'
)
# Filtered migration
app.migrate_by_filter(
source_site='webscroll.fr',
destination_site='mistergeek.net',
category_filter=['VPN', 'Software'],
date_after='2024-01-01',
limit=10,
create_categories=True,
delete_after=False,
status='draft'
)
```
## Related Commands
- `seo export` - Export posts from all sites
- `seo editorial_strategy` - Analyze and get migration recommendations
- `seo category_propose` - Get AI category recommendations
## See Also
- [README.md](README.md) - Main documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [CATEGORY_MANAGEMENT_GUIDE.md](CATEGORY_MANAGEMENT_GUIDE.md) - Category management

check_confidence.py Normal file

@@ -0,0 +1,34 @@
#!/usr/bin/env python3
import csv
from collections import Counter
import glob

files = sorted(glob.glob('output/category_proposals_*.csv'))
if files:
    with open(files[-1], 'r') as f:
        reader = csv.DictReader(f)
        proposals = list(reader)

    print("=== All Proposals ===")
    print(f"Total: {len(proposals)}\n")

    print("By Site:")
    sites = Counter(p['current_site'] for p in proposals)
    for site, count in sorted(sites.items()):
        print(f"  {site}: {count}")

    print("\nBy Confidence (all sites):")
    confs = Counter(p['category_confidence'] for p in proposals)
    for conf, count in sorted(confs.items()):
        print(f"  {conf}: {count}")

    print("\nBy Site and Confidence:")
    for site in ['mistergeek.net', 'webscroll.fr', 'hellogeek.net']:
        site_props = [p for p in proposals if p['current_site'] == site]
        confs = Counter(p['category_confidence'] for p in site_props)
        print(f"\n  {site} ({len(site_props)} total):")
        for conf, count in sorted(confs.items()):
            print(f"    {conf}: {count}")
        medium_or_better = [p for p in site_props if p['category_confidence'] in ['High', 'Medium']]
        print(f"    → Would process with -c Medium (default): {len(medium_or_better)}")


@@ -5,13 +5,16 @@ SEO Application Core - Integrated SEO automation functionality
 import logging
 from pathlib import Path
 from datetime import datetime
-from typing import Optional, List, Tuple
+from typing import Optional, List, Tuple, Dict
 from .exporter import PostExporter
 from .analyzer import EnhancedPostAnalyzer
 from .category_proposer import CategoryProposer
 from .category_manager import WordPressCategoryManager, CategoryAssignmentProcessor
 from .editorial_strategy import EditorialStrategyAnalyzer
+from .post_migrator import WordPressPostMigrator
+from .meta_description_generator import MetaDescriptionGenerator
+from .meta_description_updater import MetaDescriptionUpdater
 
 logger = logging.getLogger(__name__)
@@ -34,11 +37,23 @@ class SEOApp:
         else:
             logging.basicConfig(level=logging.INFO)
 
-    def export(self) -> str:
-        """Export all posts from WordPress sites."""
+    def export(self, author_filter: Optional[List[str]] = None,
+               author_ids: Optional[List[int]] = None,
+               site_filter: Optional[str] = None) -> str:
+        """
+        Export all posts from WordPress sites.
+
+        Args:
+            author_filter: List of author names to filter by
+            author_ids: List of author IDs to filter by
+            site_filter: Export from specific site only
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("📦 Exporting all posts from WordPress sites...")
-        exporter = PostExporter()
-        return exporter.run()
+        exporter = PostExporter(author_filter=author_filter, author_ids=author_ids)
+        return exporter.run(site_filter=site_filter)
 
     def analyze(self, csv_file: Optional[str] = None, fields: Optional[List[str]] = None,
                 update: bool = False, output: Optional[str] = None) -> str:
@@ -92,7 +107,8 @@
         return proposer.run(output_file=output)
 
     def category_apply(self, proposals_csv: str, site_name: str,
-                       confidence: str = 'Medium', dry_run: bool = False) -> dict:
+                       confidence: str = 'Medium', strict: bool = False,
+                       dry_run: bool = False) -> dict:
         """
         Apply AI category proposals to WordPress.
@@ -100,6 +116,7 @@
             proposals_csv: Path to proposals CSV
             site_name: Site to apply changes to (mistergeek.net, webscroll.fr, hellogeek.net)
             confidence: Minimum confidence level (High, Medium, Low)
+            strict: If True, only match exact confidence (not "or better")
             dry_run: If True, preview changes without applying
 
         Returns:
@@ -112,6 +129,7 @@
             proposals_csv=proposals_csv,
             site_name=site_name,
             confidence_threshold=confidence,
+            strict=strict,
             dry_run=dry_run
         )
@@ -161,6 +179,93 @@
         analyzer = EditorialStrategyAnalyzer()
         return analyzer.run(csv_file)
 
+    def migrate(self, csv_file: str, destination_site: str,
+                create_categories: bool = True, create_tags: bool = True,
+                delete_after: bool = False, status: str = 'draft',
+                output_file: Optional[str] = None,
+                ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts from CSV file to destination site.
+
+        Args:
+            csv_file: Path to CSV file with posts to migrate (must have 'site' and 'post_id' columns)
+            destination_site: Destination site name (mistergeek.net, webscroll.fr, hellogeek.net)
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts ('draft', 'publish', 'pending')
+            output_file: Custom output file path for migration report
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts to {destination_site}...")
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_from_csv(
+            csv_file=csv_file,
+            destination_site=destination_site,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            output_file=output_file,
+            ignore_original_date=ignore_original_date
+        )
+
+    def migrate_by_filter(self, source_site: str, destination_site: str,
+                          category_filter: Optional[List[str]] = None,
+                          tag_filter: Optional[List[str]] = None,
+                          date_after: Optional[str] = None,
+                          date_before: Optional[str] = None,
+                          status_filter: Optional[List[str]] = None,
+                          create_categories: bool = True,
+                          create_tags: bool = True,
+                          delete_after: bool = False,
+                          status: str = 'draft',
+                          limit: Optional[int] = None,
+                          ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts based on filters.
+
+        Args:
+            source_site: Source site name
+            destination_site: Destination site name
+            category_filter: List of category names to filter by
+            tag_filter: List of tag names to filter by
+            date_after: Only migrate posts after this date (YYYY-MM-DD)
+            date_before: Only migrate posts before this date (YYYY-MM-DD)
+            status_filter: List of statuses to filter by (e.g., ['publish', 'draft'])
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts
+            limit: Maximum number of posts to migrate
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts from {source_site} to {destination_site}...")
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_by_filter(
+            source_site=source_site,
+            destination_site=destination_site,
+            category_filter=category_filter,
+            tag_filter=tag_filter,
+            date_after=date_after,
+            date_before=date_before,
+            status_filter=status_filter,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            limit=limit,
+            ignore_original_date=ignore_original_date
+        )
+
     def status(self) -> dict:
         """Get status of output files."""
         files = list(self.output_dir.glob('*.csv'))
@@ -179,6 +284,85 @@ class SEOApp:
return status_info
def generate_meta_descriptions(self, csv_file: Optional[str] = None,
output_file: Optional[str] = None,
only_missing: bool = False,
only_poor_quality: bool = False,
limit: Optional[int] = None) -> Tuple[str, Dict]:
"""
Generate AI-optimized meta descriptions for posts.
Args:
csv_file: Path to CSV file with posts (uses latest export if not provided)
output_file: Custom output file path for results
only_missing: Only generate for posts without meta descriptions
only_poor_quality: Only generate for posts with poor quality meta descriptions
limit: Maximum number of posts to process
Returns:
Tuple of (output_file_path, summary_dict)
"""
logger.info("✨ Generating AI-optimized meta descriptions...")
if not csv_file:
csv_file = self._find_latest_export()
if not csv_file:
raise FileNotFoundError("No exported posts found. Run export() first or provide a CSV file.")
logger.info(f"Using file: {csv_file}")
generator = MetaDescriptionGenerator(csv_file)
return generator.run(
output_file=output_file,
only_missing=only_missing,
only_poor_quality=only_poor_quality,
limit=limit
)
def update_meta_descriptions(self, site: str,
post_ids: Optional[List[int]] = None,
category_names: Optional[List[str]] = None,
category_ids: Optional[List[int]] = None,
author_names: Optional[List[str]] = None,
limit: Optional[int] = None,
dry_run: bool = False,
skip_existing: bool = True,
force_regenerate: bool = False) -> Dict:
"""
Fetch posts from WordPress, generate AI meta descriptions, and update them.
Args:
site: WordPress site name (REQUIRED) - mistergeek.net, webscroll.fr, hellogeek.net
post_ids: Specific post IDs to update
category_names: Filter by category names
category_ids: Filter by category IDs
author_names: Filter by author names
limit: Maximum number of posts to process
dry_run: If True, preview changes without updating
skip_existing: If True, skip posts with existing good quality meta descriptions
force_regenerate: If True, regenerate even for good quality metas
Returns:
Statistics dict
"""
logger.info(f"🔄 Updating meta descriptions on {site}...")
if not site:
raise ValueError("Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
updater = MetaDescriptionUpdater(site)
return updater.run(
post_ids=post_ids,
category_ids=category_ids,
category_names=category_names,
author_names=author_names,
limit=limit,
dry_run=dry_run,
skip_existing=skip_existing,
force_regenerate=force_regenerate
)
def _find_latest_export(self) -> Optional[str]:
"""Find the latest exported CSV file."""
csv_files = list(self.output_dir.glob('all_posts_*.csv'))

View File

@@ -132,12 +132,33 @@ class WordPressCategoryManager:
}
return category_data['id']
elif response.status_code == 409:
# Category already exists
logger.info(f" Category '{category_name}' already exists")
existing = response.json()
if isinstance(existing, list) and len(existing) > 0:
return existing[0]['id']
elif response.status_code == 400:
# Category might already exist - search for it
error_data = response.json()
if error_data.get('code') == 'term_exists':
term_id = error_data.get('data', {}).get('term_id')
if term_id:
logger.info(f" Category '{category_name}' already exists (ID: {term_id})")
# Fetch the category details
cat_response = requests.get(
f"{base_url}/wp-json/wp/v2/categories/{term_id}",
auth=auth,
timeout=10
)
if cat_response.status_code == 200:
cat_data = cat_response.json()
# Update cache
if site_name in self.category_cache:
self.category_cache[site_name][cat_data['slug']] = {
'id': cat_data['id'],
'name': cat_data['name'],
'slug': cat_data['slug'],
'count': cat_data.get('count', 0)
}
return cat_data['id']
logger.warning(f" Category already exists or error: {error_data}")
return None
else:
logger.error(f"Error creating category: {response.status_code} - {response.text}")
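The `term_exists` recovery in the 400 branch above reduces to a small pure function; a minimal sketch (helper name hypothetical) under the assumption that WordPress returns the error payload shape shown:

```python
def extract_existing_term_id(error_payload: dict):
    """Return the existing term's ID when WordPress rejects a create
    with code 'term_exists'; None for any other error payload."""
    if error_payload.get('code') == 'term_exists':
        return error_payload.get('data', {}).get('term_id')
    return None

# Shape of the 400 response body handled above:
payload = {'code': 'term_exists',
           'message': 'A term with the name provided already exists.',
           'data': {'term_id': 42}}
extract_existing_term_id(payload)  # 42
```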
@@ -164,21 +185,42 @@ class WordPressCategoryManager:
if site_name not in self.category_cache:
self.fetch_categories(site_name)
# Check if category exists (by exact name first)
categories = self.category_cache.get(site_name, {})
# Try exact name match (case-insensitive)
category_name_lower = category_name.lower()
for slug, cat_data in categories.items():
if cat_data['name'].lower() == category_name_lower:
logger.info(f"✓ Found existing category '{category_name}' (ID: {cat_data['id']})")
return cat_data['id']
# Try slug match
slug = category_name.lower().replace(' ', '-').replace('/', '-')
if slug in categories:
logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[slug]['id']})")
return categories[slug]['id']
# Try alternative slug formats (handle French characters)
import unicodedata
normalized_slug = unicodedata.normalize('NFKD', slug)\
.encode('ascii', 'ignore')\
.decode('ascii')\
.lower()\
.replace(' ', '-')
if normalized_slug in categories:
logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[normalized_slug]['id']})")
return categories[normalized_slug]['id']
# Try partial match (if slug contains the category name)
for slug, cat_data in categories.items():
if category_name_lower in cat_data['name'].lower() or cat_data['name'].lower() in category_name_lower:
logger.info(f"✓ Found similar category '{cat_data['name']}' (ID: {cat_data['id']})")
return cat_data['id']
# Create new category
logger.info(f"Creating new category '{category_name}'...")
return self.create_category(site_name, category_name, description)
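The accent-stripping step above can be exercised on its own; a quick sketch of the NFKD round-trip used for French category names (the helper name is illustrative, not from the source):

```python
import unicodedata

def ascii_slug(name: str) -> str:
    # NFKD splits accented characters into base char + combining mark,
    # then the ASCII encode with 'ignore' drops the marks
    normalized = unicodedata.normalize('NFKD', name)
    ascii_only = normalized.encode('ascii', 'ignore').decode('ascii')
    return ascii_only.lower().replace(' ', '-')

ascii_slug('Sécurité Informatique')  # 'securite-informatique'
```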
def assign_post_to_category(self, site_name: str, post_id: int,
@@ -292,14 +334,16 @@ class CategoryAssignmentProcessor:
def process_proposals(self, proposals: List[Dict], site_name: str,
confidence_threshold: str = 'Medium',
strict: bool = False,
dry_run: bool = False) -> Dict[str, int]:
"""
Process AI category proposals and apply to WordPress.
Args:
proposals: List of proposal dicts from CSV
site_name: Site to apply changes to
site_name: Site to apply changes to (filters proposals)
confidence_threshold: Minimum confidence to apply (High, Medium, Low)
strict: If True, only match exact confidence level
dry_run: If True, don't actually make changes
Returns:
@@ -312,7 +356,23 @@ class CategoryAssignmentProcessor:
if dry_run:
logger.info("DRY RUN - No changes will be made")
# Filter by site
original_count = len(proposals)
proposals = [p for p in proposals if p.get('current_site', '') == site_name]
filtered_by_site = original_count - len(proposals)
logger.info(f"Filtered to {len(proposals)} posts on {site_name} ({filtered_by_site} excluded from other sites)")
# Filter by confidence
if strict:
# Exact match only
filtered_proposals = [
p for p in proposals
if p.get('category_confidence', 'Medium') == confidence_threshold
]
logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence = {confidence_threshold}, strict mode)")
else:
# Medium or better (default behavior)
confidence_order = {'High': 3, 'Medium': 2, 'Low': 1}
min_confidence = confidence_order.get(confidence_threshold, 2)
@@ -320,28 +380,36 @@ class CategoryAssignmentProcessor:
p for p in proposals
if confidence_order.get(p.get('category_confidence', 'Medium'), 2) >= min_confidence
]
logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence >= {confidence_threshold})")
# Show breakdown
high_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'High')
medium_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Medium')
low_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Low')
logger.info(f" Breakdown: High={high_count}, Medium={medium_count}, Low={low_count}")
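The "confidence or better" filter above reduces to an ordering check; a minimal sketch (predicate name hypothetical) of the rule applied to each proposal:

```python
confidence_order = {'High': 3, 'Medium': 2, 'Low': 1}

def meets_threshold(proposal: dict, threshold: str = 'Medium') -> bool:
    # Missing or unknown confidence values default to Medium (2)
    level = confidence_order.get(proposal.get('category_confidence', 'Medium'), 2)
    return level >= confidence_order.get(threshold, 2)

meets_threshold({'category_confidence': 'High'})  # True
meets_threshold({'category_confidence': 'Low'})   # False
meets_threshold({})                               # True (defaults to Medium)
```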
# Fetch existing categories
self.category_manager.fetch_categories(site_name)
# Process each proposal
for i, proposal in enumerate(filtered_proposals, 1):
post_title = proposal.get('title', 'Unknown')[:60]
post_id = proposal.get('post_id', '')
proposed_category = proposal.get('proposed_category', '')
current_categories = proposal.get('current_categories', '')
confidence = proposal.get('category_confidence', 'Medium')
logger.info(f"\n[{i}/{len(filtered_proposals)}] Post {post_id}: {post_title}...")
logger.info(f" Current categories: {current_categories}")
logger.info(f" Proposed: {proposed_category} (confidence: {confidence})")
if not post_id or not proposed_category:
logger.warning(" Skipping: Missing post_id or proposed_category")
self.processing_stats['errors'] += 1
continue
if dry_run:
logger.info(f" [DRY RUN] Would assign to: {proposed_category}")
continue
# Get or create the category
@@ -362,9 +430,10 @@ class CategoryAssignmentProcessor:
logger.info(f" ✓ Assigned to '{proposed_category}'")
else:
self.processing_stats['errors'] += 1
logger.error(f" ✗ Failed to assign category")
else:
self.processing_stats['errors'] += 1
logger.error(f" Failed to get/create category '{proposed_category}'")
self.processing_stats['total_posts'] = len(filtered_proposals)
@@ -381,6 +450,7 @@ class CategoryAssignmentProcessor:
def run(self, proposals_csv: str, site_name: str,
confidence_threshold: str = 'Medium',
strict: bool = False,
dry_run: bool = False) -> Dict[str, int]:
"""
Run complete category assignment process.
@@ -389,6 +459,7 @@ class CategoryAssignmentProcessor:
proposals_csv: Path to proposals CSV
site_name: Site to apply changes to
confidence_threshold: Minimum confidence to apply
strict: If True, only match exact confidence level
dry_run: If True, preview changes without applying
Returns:
@@ -404,5 +475,6 @@ class CategoryAssignmentProcessor:
proposals,
site_name,
confidence_threshold,
strict=strict,
dry_run=dry_run
)

View File

@@ -164,7 +164,7 @@ class CategoryProposer:
logger.info("\n📊 Analyzing editorial strategy to inform category proposals...")
analyzer = EditorialStrategyAnalyzer()
analyzer.load_posts(str(self.csv_file))
self.site_analysis = analyzer.analyze_site_content()
logger.info("✓ Editorial strategy analysis complete")

View File

@@ -47,6 +47,38 @@ Examples:
parser.add_argument('--site', '-s', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
help='WordPress site for category operations')
parser.add_argument('--description', '-d', help='Category description')
parser.add_argument('--strict', action='store_true', help='Strict confidence matching (exact match only)')
# Export arguments
parser.add_argument('--author', nargs='+', help='Filter by author name(s) for export')
parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s) for export')
# Migration arguments
parser.add_argument('--destination', '--to', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
help='Destination site for migration')
parser.add_argument('--source', '--from', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
help='Source site for filtered migration')
parser.add_argument('--keep-source', action='store_true', help='Keep posts on source site (default: delete after migration)')
parser.add_argument('--post-status', choices=['draft', 'publish', 'pending'], default='draft',
help='Status for migrated posts (default: draft)')
parser.add_argument('--no-categories', action='store_true', help='Do not create categories automatically')
parser.add_argument('--no-tags', action='store_true', help='Do not create tags automatically')
parser.add_argument('--category-filter', nargs='+', help='Filter by category names (for filtered migration)')
parser.add_argument('--tag-filter', nargs='+', help='Filter by tag names (for filtered migration)')
parser.add_argument('--date-after', help='Migrate posts after this date (YYYY-MM-DD)')
parser.add_argument('--date-before', help='Migrate posts before this date (YYYY-MM-DD)')
parser.add_argument('--limit', type=int, help='Limit number of posts to migrate')
parser.add_argument('--ignore-original-date', action='store_true', help='Use current date instead of original post date')
# Meta description arguments
parser.add_argument('--only-missing', action='store_true', help='Only generate for posts without meta descriptions')
parser.add_argument('--only-poor', action='store_true', help='Only generate for posts with poor quality meta descriptions')
# Update meta arguments
parser.add_argument('--post-ids', type=int, nargs='+', help='Specific post IDs to update')
parser.add_argument('--category', nargs='+', help='Filter by category name(s)')
parser.add_argument('--category-id', type=int, nargs='+', help='Filter by category ID(s)')
parser.add_argument('--force', action='store_true', help='Force regenerate even for good quality meta descriptions')
args = parser.parse_args()
@@ -72,6 +104,9 @@ Examples:
'category_apply': cmd_category_apply,
'category_create': cmd_category_create,
'editorial_strategy': cmd_editorial_strategy,
'migrate': cmd_migrate,
'meta_description': cmd_meta_description,
'update_meta': cmd_update_meta,
'status': cmd_status,
'help': cmd_help,
}
@@ -103,8 +138,19 @@ def cmd_export(app, args):
"""Export all posts."""
if args.dry_run:
print("Would export all posts from WordPress sites")
if args.author:
print(f" Author filter: {args.author}")
if args.author_id:
print(f" Author ID filter: {args.author_id}")
return 0
result = app.export(
author_filter=args.author,
author_ids=args.author_id,
site_filter=args.site
)
if result:
print(f"✅ Export completed! Output: {result}")
return 0
@@ -160,6 +206,8 @@ def cmd_category_apply(app, args):
print("Would apply category proposals to WordPress")
print(f" Site: {args.site}")
print(f" Confidence: {args.confidence}")
if args.strict:
print(f" Strict mode: Yes (exact match only)")
return 0
if not args.site:
@@ -180,11 +228,14 @@ def cmd_category_apply(app, args):
print(f"Applying categories from: {proposals_csv}")
print(f"Site: {args.site}")
print(f"Confidence threshold: {args.confidence}")
if args.strict:
print(f"Strict mode: Yes (exact match only)")
stats = app.category_apply(
proposals_csv=proposals_csv,
site_name=args.site,
confidence=args.confidence,
strict=args.strict,
dry_run=False
)
@@ -253,6 +304,196 @@ def cmd_editorial_strategy(app, args):
return 0
def cmd_migrate(app, args):
"""Migrate posts between websites."""
if args.dry_run:
print("Would migrate posts between websites")
if args.destination:
print(f" Destination: {args.destination}")
if args.source:
print(f" Source: {args.source}")
return 0
# Validate required arguments
if not args.destination:
print("❌ Destination site required. Use --destination mistergeek.net|webscroll.fr|hellogeek.net")
return 1
delete_after = not args.keep_source
create_categories = not args.no_categories
create_tags = not args.no_tags
# Check if using filtered migration or CSV-based migration
if args.source:
# Filtered migration
print(f"Migrating posts from {args.source} to {args.destination}")
print(f"Post status: {args.post_status}")
print(f"Delete after migration: {delete_after}")
if args.category_filter:
print(f"Category filter: {args.category_filter}")
if args.tag_filter:
print(f"Tag filter: {args.tag_filter}")
if args.date_after:
print(f"Date after: {args.date_after}")
if args.date_before:
print(f"Date before: {args.date_before}")
if args.limit:
print(f"Limit: {args.limit}")
result = app.migrate_by_filter(
source_site=args.source,
destination_site=args.destination,
category_filter=args.category_filter,
tag_filter=args.tag_filter,
date_after=args.date_after,
date_before=args.date_before,
status_filter=None,
create_categories=create_categories,
create_tags=create_tags,
delete_after=delete_after,
status=args.post_status,
limit=args.limit,
ignore_original_date=args.ignore_original_date
)
if result:
print(f"\n✅ Migration completed!")
print(f" Report: {result}")
else:
# CSV-based migration
csv_file = args.args[0] if args.args else None
if not csv_file:
print("❌ CSV file required. Provide path to CSV with 'site' and 'post_id' columns")
print(" Usage: seo migrate <csv_file> --destination <site>")
print(" Or use filtered migration: seo migrate --source <site> --destination <site>")
return 1
print(f"Migrating posts from CSV: {csv_file}")
print(f"Destination: {args.destination}")
print(f"Post status: {args.post_status}")
print(f"Delete after migration: {delete_after}")
result = app.migrate(
csv_file=csv_file,
destination_site=args.destination,
create_categories=create_categories,
create_tags=create_tags,
delete_after=delete_after,
status=args.post_status,
output_file=args.output,
ignore_original_date=args.ignore_original_date
)
if result:
print(f"\n✅ Migration completed!")
print(f" Report: {result}")
return 0
def cmd_meta_description(app, args):
"""Generate AI-optimized meta descriptions."""
if args.dry_run:
print("Would generate AI-optimized meta descriptions")
if args.only_missing:
print(" Filter: Only posts without meta descriptions")
if args.only_poor:
print(" Filter: Only posts with poor quality meta descriptions")
if args.limit:
print(f" Limit: {args.limit} posts")
return 0
csv_file = args.args[0] if args.args else None
print("Generating AI-optimized meta descriptions...")
if args.only_missing:
print(" Filter: Only posts without meta descriptions")
elif args.only_poor:
print(" Filter: Only posts with poor quality meta descriptions")
if args.limit:
print(f" Limit: {args.limit} posts")
output_file, summary = app.generate_meta_descriptions(
csv_file=csv_file,
output_file=args.output,
only_missing=args.only_missing,
only_poor_quality=args.only_poor,
limit=args.limit
)
if output_file and summary:
print(f"\n✅ Meta description generation completed!")
print(f" Results: {output_file}")
print(f"\n📊 Summary:")
print(f" Total processed: {summary.get('total_posts', 0)}")
print(f" Improved: {summary.get('improved', 0)} ({summary.get('improvement_rate', 0):.1f}%)")
print(f" Optimal length: {summary.get('optimal_length_count', 0)} ({summary.get('optimal_length_rate', 0):.1f}%)")
print(f" Average score: {summary.get('average_score', 0):.1f}")
print(f" API calls: {summary.get('api_calls', 0)}")
return 0
def cmd_update_meta(app, args):
"""Fetch, generate, and update meta descriptions directly on WordPress."""
if args.dry_run:
print("Would update meta descriptions on WordPress")
if not args.site:
print(" ❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
return 1
print(f" Site: {args.site}")
if args.post_ids:
print(f" Post IDs: {args.post_ids}")
if args.category:
print(f" Categories: {args.category}")
if args.author:
print(f" Authors: {args.author}")
if args.limit:
print(f" Limit: {args.limit} posts")
return 0
# Site is required
if not args.site:
print("❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
return 1
print(f"Updating meta descriptions on {args.site}...")
if args.post_ids:
print(f" Post IDs: {args.post_ids}")
if args.category:
print(f" Categories: {args.category}")
if args.author:
print(f" Authors: {args.author}")
if args.category_id:
print(f" Category IDs: {args.category_id}")
if args.limit:
print(f" Limit: {args.limit} posts")
print(f" Skip existing: {not args.force}")
print(f" Dry run: {args.dry_run}")
stats = app.update_meta_descriptions(
site=args.site,
post_ids=args.post_ids,
category_names=args.category,
category_ids=args.category_id,
author_names=args.author,
limit=args.limit,
dry_run=args.dry_run,
skip_existing=not args.force,
force_regenerate=args.force
)
if stats:
print(f"\n✅ Meta description update completed!")
print(f"\n📊 Summary:")
print(f" Total posts: {stats.get('total_posts', 0)}")
print(f" Updated: {stats.get('updated', 0)}")
print(f" Failed: {stats.get('failed', 0)}")
print(f" Skipped: {stats.get('skipped', 0)}")
print(f" API calls: {stats.get('api_calls', 0)}")
return 0
def cmd_status(app, args):
"""Show status."""
if args.dry_run:
@@ -279,10 +520,18 @@ SEO Automation CLI - Available Commands
Export & Analysis:
export Export all posts from WordPress sites
export --author "John Doe" Export posts by specific author
export --author-id 1 2 Export posts by author IDs
export -s mistergeek.net Export from specific site only
analyze [csv_file] Analyze posts with AI
analyze -f title Analyze specific fields (title, meta_description, categories, site)
analyze -u Update input CSV with new columns (creates backup)
category_propose [csv] Propose categories based on content
meta_description [csv] Generate AI-optimized meta descriptions
meta_description --only-missing Generate only for posts without meta descriptions
update_meta --site <site> Fetch, generate, and update meta on WordPress
update_meta --site A --post-ids 1 2 3 Update specific posts
update_meta --site A --category "VPN" Update posts in category
Category Management:
category_apply [csv] Apply AI category proposals to WordPress
@@ -293,11 +542,49 @@ Category Management:
Strategy & Migration:
editorial_strategy [csv] Analyze editorial lines and recommend migrations
editorial_strategy Get migration recommendations between sites
migrate <csv> --destination <site> Migrate posts from CSV to destination site
migrate --source <site> --destination <site> Migrate posts with filters
migrate --source A --to B --category-filter "VPN" Migrate specific categories
migrate --source A --to B --date-after 2024-01-01 --limit 10
Utility:
status Show output files status
help Show this help message
Export Options:
--author Filter by author name(s) (case-insensitive, partial match)
--author-id Filter by author ID(s)
--site, -s Export from specific site only
Meta Description Options:
--only-missing Only generate for posts without meta descriptions
--only-poor Only generate for posts with poor quality meta descriptions
--limit Limit number of posts to process
--output, -o Custom output file path
Update Meta Options:
--site, -s WordPress site (REQUIRED): mistergeek.net, webscroll.fr, hellogeek.net
--post-ids Specific post IDs to update
--category Filter by category name(s)
--category-id Filter by category ID(s)
--author Filter by author name(s)
--force Force regenerate even for good quality meta descriptions
Migration Options:
--destination, --to Destination site: mistergeek.net, webscroll.fr, hellogeek.net
--source, --from Source site for filtered migration
--keep-source Keep posts on source site (default: delete after migration)
--post-status Status for migrated posts: draft, publish, pending (default: draft)
--no-categories Do not create categories automatically
--no-tags Do not create tags automatically
--category-filter Filter by category names (for filtered migration)
--tag-filter Filter by tag names (for filtered migration)
--date-after Migrate posts after this date (YYYY-MM-DD)
--date-before Migrate posts before this date (YYYY-MM-DD)
--limit Limit number of posts to migrate
--ignore-original-date Use current date instead of original post date
--output, -o Custom output file path for migration report
Options:
--verbose, -v Enable verbose logging
--dry-run Show what would be done without doing it
@@ -307,14 +594,29 @@ Options:
--confidence, -c Confidence threshold: High, Medium, Low
--site, -s WordPress site: mistergeek.net, webscroll.fr, hellogeek.net
--description, -d Category description
--strict Strict confidence matching (exact match only, not "or better")
Examples:
seo export
seo export --author "John Doe"
seo export --author-id 1 2
seo export -s mistergeek.net --author "admin"
seo analyze -f title categories
seo category_propose
seo category_apply -s mistergeek.net -c Medium
seo category_create -s webscroll.fr "Torrent Clients"
seo editorial_strategy
seo migrate posts_to_migrate.csv --destination mistergeek.net
seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN
seo migrate --source A --to B --date-after 2024-01-01 --limit 10 --keep-source
seo meta_description # Generate for all posts
seo meta_description --only-missing # Generate only for posts without meta
seo meta_description --only-poor --limit 10 # Fix 10 poor quality metas
seo update_meta --site mistergeek.net # Update all posts on site
seo update_meta --site A --post-ids 1 2 3 # Update specific posts
seo update_meta --site A --category "VPN" --limit 10 # Update 10 posts in category
seo update_meta --site A --author "john" --limit 10 # Update 10 posts by author
seo update_meta --site A --dry-run # Preview changes
seo status
""")
return 0

View File

@@ -20,11 +20,21 @@ logger = logging.getLogger(__name__)
class PostExporter:
"""Export posts from WordPress sites to CSV."""
def __init__(self, author_filter: Optional[List[str]] = None,
author_ids: Optional[List[int]] = None):
"""
Initialize the exporter.
Args:
author_filter: List of author names to filter by (case-insensitive)
author_ids: List of author IDs to filter by
"""
self.sites = Config.WORDPRESS_SITES
self.all_posts = []
self.category_cache = {}
self.author_filter = author_filter
self.author_ids = author_ids
self.author_cache = {} # Cache author info by site
def fetch_category_names(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
"""Fetch category names from a WordPress site."""
@@ -50,8 +60,55 @@ class PostExporter:
self.category_cache[site_name] = categories
return categories
def fetch_authors(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
"""
Fetch all authors/users from a WordPress site.
Returns:
Dict mapping author ID to author data (name, slug)
"""
if site_name in self.author_cache:
return self.author_cache[site_name]
logger.info(f" Fetching authors from {site_name}...")
authors = {}
base_url = site_config['url'].rstrip('/')
api_url = f"{base_url}/wp-json/wp/v2/users"
auth = HTTPBasicAuth(site_config['username'], site_config['password'])
try:
response = requests.get(api_url, params={'per_page': 100}, auth=auth, timeout=10)
response.raise_for_status()
for user in response.json():
authors[user['id']] = {
'id': user['id'],
'name': user.get('name', ''),
'slug': user.get('slug', ''),
'description': user.get('description', '')
}
logger.info(f" ✓ Fetched {len(authors)} authors")
except Exception as e:
logger.warning(f" Could not fetch authors from {site_name}: {e}")
# Fallback: create empty dict if authors can't be fetched
# Author IDs will still be exported, just without names
self.author_cache[site_name] = authors
return authors
def fetch_posts_from_site(self, site_name: str, site_config: Dict,
authors_map: Optional[Dict[int, Dict]] = None) -> List[Dict]:
"""
Fetch all posts from a WordPress site.
Args:
site_name: Site name
site_config: Site configuration
authors_map: Optional authors mapping for filtering
Returns:
List of post data
"""
logger.info(f"\nFetching posts from {site_name}...")
posts = []
@@ -59,14 +116,23 @@ class PostExporter:
api_url = f"{base_url}/wp-json/wp/v2/posts"
auth = HTTPBasicAuth(site_config['username'], site_config['password'])
# Build base params
base_params = {'page': 1, 'per_page': 100, '_embed': True}
# Add author filter if specified
if self.author_ids:
base_params['author'] = ','.join(map(str, self.author_ids))
logger.info(f" Filtering by author IDs: {self.author_ids}")
for status in ['publish', 'draft']:
page = 1
while True:
try:
params = {**base_params, 'page': page, 'status': status}
logger.info(f" Fetching page {page} ({status} posts)...")
response = requests.get(
api_url,
params=params,
auth=auth,
timeout=10
)
@@ -76,7 +142,28 @@ class PostExporter:
if not page_posts:
break
# Filter by author name if specified
if self.author_filter and authors_map:
filtered_posts = []
for post in page_posts:
author_id = post.get('author')
if author_id and author_id in authors_map:
author_name = authors_map[author_id]['name'].lower()
author_slug = authors_map[author_id]['slug'].lower()
# Check if author matches filter
for filter_name in self.author_filter:
filter_lower = filter_name.lower()
if (filter_lower in author_name or
filter_lower == author_slug):
filtered_posts.append(post)
break
page_posts = filtered_posts
logger.info(f" ✓ Got {len(page_posts)} posts after author filter")
posts.extend(page_posts)
if page_posts:
logger.info(f" ✓ Got {len(page_posts)} posts")
page += 1
@@ -94,7 +181,8 @@ class PostExporter:
logger.info(f"✓ Total posts from {site_name}: {len(posts)}\n")
return posts
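The author filter applied inside the page loop reduces to a small predicate; a sketch (helper name hypothetical) of the matching rule used above — partial, case-insensitive on the display name, exact on the slug:

```python
def author_matches(name: str, slug: str, filters: list) -> bool:
    # Partial match on display name, exact match on slug, case-insensitive
    name, slug = name.lower(), slug.lower()
    return any(f.lower() in name or f.lower() == slug for f in filters)

author_matches('John Doe', 'john-doe', ['john'])  # True (partial name)
author_matches('John Doe', 'john-doe', ['jane'])  # False
```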
def extract_post_details(self, post: Dict, site_name: str, category_map: Dict,
author_map: Optional[Dict[int, Dict]] = None) -> Dict:
"""Extract post details for CSV export."""
title = post.get('title', {})
if isinstance(title, dict):
@@ -122,6 +210,13 @@ class PostExporter:
for cat_id in category_ids
]) if category_ids else ''
# Get author name from author map
author_id = post.get('author', '')
author_name = ''
if author_map and author_id:
author_data = author_map.get(author_id, {})
author_name = author_data.get('name', '')
return {
'site': site_name,
'post_id': post['id'],
@@ -129,7 +224,8 @@ class PostExporter:
'title': title.strip(),
'slug': post.get('slug', ''),
'url': post.get('link', ''),
'author_id': author_id,
'author_name': author_name,
'date_published': post.get('date', ''),
'date_modified': post.get('modified', ''),
'categories': category_names,
@@ -158,7 +254,7 @@ class PostExporter:
return ""
fieldnames = [
'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id', 'author_name',
'date_published', 'date_modified', 'categories', 'tags', 'excerpt',
'content_preview', 'seo_title', 'meta_description', 'focus_keyword', 'word_count',
]
@@ -173,24 +269,46 @@ class PostExporter:
logger.info(f"✓ CSV exported to: {output_file}")
return str(output_file)
def run(self, site_filter: Optional[str] = None) -> str:
"""
Run the complete export process.
Args:
site_filter: Optional site name to export from (default: all sites)
Returns:
Path to exported CSV file
"""
logger.info("="*70)
logger.info("EXPORTING ALL POSTS")
logger.info("="*70)
if self.author_filter:
logger.info(f"Author filter: {self.author_filter}")
if self.author_ids:
logger.info(f"Author IDs: {self.author_ids}")
if site_filter:
logger.info(f"Site filter: {site_filter}")
logger.info("Sites configured: " + ", ".join(self.sites.keys()))
for site_name, config in self.sites.items():
# Skip sites if filter is specified
if site_filter and site_name != site_filter:
logger.info(f"Skipping {site_name} (not in filter)")
continue
categories = self.fetch_category_names(site_name, config)
authors = self.fetch_authors(site_name, config)
posts = self.fetch_posts_from_site(site_name, config, authors)
if posts:
for post in posts:
post_details = self.extract_post_details(post, site_name, categories, authors)
self.all_posts.append(post_details)
if not self.all_posts:
logger.warning("No posts found matching criteria")
return ""
self.all_posts.sort(key=lambda x: (x['site'], x['post_id']))
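The new `site_filter` is a simple allow-one filter over the configured sites: an empty or missing filter keeps every site, otherwise only the exact name passes. A minimal standalone sketch of that skip logic (the site names below are placeholders, not real configuration):

```python
# Standalone sketch of the site_filter skip logic added to run();
# mirrors `if site_filter and site_name != site_filter: continue`.
from typing import Dict, List, Optional


def sites_to_process(sites: Dict[str, dict], site_filter: Optional[str] = None) -> List[str]:
    """Return the site names run() would actually export from."""
    return [name for name in sites if not site_filter or name == site_filter]


sites = {'example-a.com': {}, 'example-b.com': {}}
print(sites_to_process(sites))                   # every configured site
print(sites_to_process(sites, 'example-b.com'))  # only the filtered site
```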

src/seo/meta_description_generator.py (new file)

@@ -0,0 +1,482 @@
"""
Meta Description Generator - AI-powered meta description generation and optimization
"""
import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests
from .config import Config
logger = logging.getLogger(__name__)
class MetaDescriptionGenerator:
"""AI-powered meta description generator and optimizer."""
def __init__(self, csv_file: str):
"""
Initialize the generator.
Args:
csv_file: Path to CSV file with posts
"""
self.csv_file = Path(csv_file)
self.openrouter_api_key = Config.OPENROUTER_API_KEY
self.ai_model = Config.AI_MODEL
self.posts = []
self.generated_results = []
self.api_calls = 0
self.ai_cost = 0.0
# Meta description best practices
self.max_length = 160 # Optimal length for SEO
self.min_length = 120
self.include_keywords = True
def load_csv(self) -> bool:
"""Load posts from CSV file."""
logger.info(f"Loading CSV: {self.csv_file}")
if not self.csv_file.exists():
logger.error(f"CSV file not found: {self.csv_file}")
return False
try:
with open(self.csv_file, 'r', encoding='utf-8') as f:
reader = csv.DictReader(f)
self.posts = list(reader)
logger.info(f"✓ Loaded {len(self.posts)} posts from CSV")
return True
except Exception as e:
logger.error(f"Error loading CSV: {e}")
return False
def _build_prompt(self, post: Dict) -> str:
"""
Build AI prompt for meta description generation.
Args:
post: Post data dict
Returns:
AI prompt string
"""
title = post.get('title', '')
content_preview = post.get('content_preview', '')
excerpt = post.get('excerpt', '')
focus_keyword = post.get('focus_keyword', '')
current_meta = post.get('meta_description', '')
# Build context from available content
content_context = ""
if excerpt:
content_context += f"Excerpt: {excerpt}\n"
if content_preview:
content_context += f"Content preview: {content_preview[:300]}..."
prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.
**Post Title:** {title}
**Content Context:**
{content_context}
**Focus Keyword:** {focus_keyword if focus_keyword else 'Not specified'}
**Current Meta Description:** {current_meta if current_meta else 'None (needs to be created)'}
**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Include the focus keyword naturally if available
3. Make it compelling and action-oriented
4. Clearly describe what the post is about
5. Use active voice
6. Include a call-to-action when appropriate
7. Avoid clickbait - be accurate and valuable
8. Write in the same language as the content
**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""
return prompt
def _call_ai_api(self, prompt: str) -> Optional[str]:
"""
Call AI API to generate meta description.
Args:
prompt: AI prompt
Returns:
Generated meta description or None
"""
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {self.openrouter_api_key}",
"Content-Type": "application/json"
}
payload = {
"model": self.ai_model,
"messages": [
{
"role": "system",
"content": "You are an SEO expert specializing in meta description optimization. You write compelling, concise, and search-engine optimized meta descriptions."
},
{
"role": "user",
"content": prompt
}
],
"temperature": 0.7,
"max_tokens": 100
}
try:
response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
result = response.json()
self.api_calls += 1
# Extract generated text
if 'choices' in result and len(result['choices']) > 0:
meta_description = result['choices'][0]['message']['content'].strip()
# Remove quotes if AI included them
if meta_description.startswith('"') and meta_description.endswith('"'):
meta_description = meta_description[1:-1]
return meta_description
else:
logger.warning("No AI response received")
return None
except requests.exceptions.RequestException as e:
logger.error(f"API call failed: {e}")
return None
except Exception as e:
logger.error(f"Error processing AI response: {e}")
return None
def _validate_meta_description(self, meta: str) -> Dict:
"""
Validate meta description quality.
Args:
meta: Meta description text
Returns:
Validation results dict
"""
length = len(meta)
validation = {
'length': length,
'is_valid': False,
'too_short': False,
'too_long': False,
'optimal': False,
'score': 0
}
# Check length
if length < self.min_length:
validation['too_short'] = True
validation['score'] = max(0, 50 - (self.min_length - length))
elif length > self.max_length:
validation['too_long'] = True
validation['score'] = max(0, 50 - (length - self.max_length))
else:
validation['optimal'] = True
validation['score'] = 100
# Check if it ends with a period (good practice)
if meta.endswith('.'):
validation['score'] = min(100, validation['score'] + 5)
# Check for call-to-action words
cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
if any(word in meta.lower() for word in cta_words):
validation['score'] = min(100, validation['score'] + 5)
validation['is_valid'] = validation['score'] >= 70
return validation
def generate_for_post(self, post: Dict) -> Optional[Dict]:
"""
Generate meta description for a single post.
Args:
post: Post data dict
Returns:
Result dict with generated meta and validation
"""
title = post.get('title', '')
post_id = post.get('post_id', '')
current_meta = post.get('meta_description', '')
logger.info(f"Generating meta description for post {post_id}: {title[:50]}...")
# Skip if post has no title
if not title:
logger.warning(f"Skipping post {post_id}: No title")
return None
# Build prompt and call AI
prompt = self._build_prompt(post)
generated_meta = self._call_ai_api(prompt)
if not generated_meta:
logger.error(f"Failed to generate meta description for post {post_id}")
return None
# Validate the result
validation = self._validate_meta_description(generated_meta)
# Calculate improvement
improvement = False
if current_meta:
current_validation = self._validate_meta_description(current_meta)
improvement = validation['score'] > current_validation['score']
else:
improvement = True # Any meta is an improvement over none
result = {
'post_id': post_id,
'site': post.get('site', ''),
'title': title,
'current_meta_description': current_meta,
'generated_meta_description': generated_meta,
'generated_length': validation['length'],
'validation_score': validation['score'],
'is_optimal_length': validation['optimal'],
'improvement': improvement,
'status': 'generated'
}
logger.info(f"✓ Generated meta description (score: {validation['score']}, length: {validation['length']})")
# Rate limiting
time.sleep(0.5)
return result
def generate_batch(self, batch: List[Dict]) -> List[Dict]:
"""
Generate meta descriptions for a batch of posts.
Args:
batch: List of post dicts
Returns:
List of result dicts
"""
results = []
for i, post in enumerate(batch, 1):
logger.info(f"Processing post {i}/{len(batch)}")
result = self.generate_for_post(post)
if result:
results.append(result)
return results
def filter_posts_for_generation(self, posts: List[Dict],
only_missing: bool = False,
only_poor_quality: bool = False) -> List[Dict]:
"""
Filter posts based on meta description status.
Args:
posts: List of post dicts
only_missing: Only include posts without meta descriptions
only_poor_quality: Only include posts with poor meta descriptions
Returns:
Filtered list of posts
"""
filtered = []
for post in posts:
current_meta = post.get('meta_description', '')
if only_missing:
# Skip posts that already have meta descriptions
if current_meta:
continue
filtered.append(post)
elif only_poor_quality:
# Skip posts without meta descriptions (handle separately)
if not current_meta:
continue
# Check if current meta is poor quality
validation = self._validate_meta_description(current_meta)
if validation['score'] < 70:
filtered.append(post)
else:
# Include all posts
filtered.append(post)
return filtered
def save_results(self, results: List[Dict], output_file: Optional[str] = None) -> str:
"""
Save generation results to CSV.
Args:
results: List of result dicts
output_file: Custom output file path
Returns:
Path to saved file
"""
if not output_file:
output_dir = Path(__file__).parent.parent.parent / 'output'
output_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_file = output_dir / f'meta_descriptions_{timestamp}.csv'
output_file = Path(output_file)
output_file.parent.mkdir(parents=True, exist_ok=True)
fieldnames = [
'post_id', 'site', 'title', 'current_meta_description',
'generated_meta_description', 'generated_length',
'validation_score', 'is_optimal_length', 'improvement', 'status'
]
logger.info(f"Saving {len(results)} results to {output_file}...")
with open(output_file, 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(results)
logger.info(f"✓ Results saved to: {output_file}")
return str(output_file)
def generate_summary(self, results: List[Dict]) -> Dict:
"""
Generate summary statistics.
Args:
results: List of result dicts
Returns:
Summary dict
"""
if not results:
return {}
total = len(results)
improved = sum(1 for r in results if r.get('improvement', False))
optimal_length = sum(1 for r in results if r.get('is_optimal_length', False))
avg_score = sum(r.get('validation_score', 0) for r in results) / total
# Count by site
by_site = {}
for r in results:
site = r.get('site', 'unknown')
if site not in by_site:
by_site[site] = {'total': 0, 'improved': 0}
by_site[site]['total'] += 1
if r.get('improvement', False):
by_site[site]['improved'] += 1
summary = {
'total_posts': total,
'improved': improved,
'improvement_rate': (improved / total * 100) if total > 0 else 0,
'optimal_length_count': optimal_length,
'optimal_length_rate': (optimal_length / total * 100) if total > 0 else 0,
'average_score': avg_score,
'api_calls': self.api_calls,
'by_site': by_site
}
return summary
def run(self, output_file: Optional[str] = None,
only_missing: bool = False,
only_poor_quality: bool = False,
limit: Optional[int] = None) -> Tuple[str, Dict]:
"""
Run complete meta description generation process.
Args:
output_file: Custom output file path
only_missing: Only generate for posts without meta descriptions
only_poor_quality: Only generate for posts with poor quality meta descriptions
limit: Maximum number of posts to process
Returns:
Tuple of (output_file_path, summary_dict)
"""
logger.info("\n" + "="*70)
logger.info("AI META DESCRIPTION GENERATION")
logger.info("="*70)
# Load posts
if not self.load_csv():
return "", {}
# Filter posts
posts_to_process = self.filter_posts_for_generation(
self.posts,
only_missing=only_missing,
only_poor_quality=only_poor_quality
)
logger.info(f"Posts to process: {len(posts_to_process)}")
if only_missing:
logger.info("Filter: Only posts without meta descriptions")
elif only_poor_quality:
logger.info("Filter: Only posts with poor quality meta descriptions")
# Apply limit
if limit:
posts_to_process = posts_to_process[:limit]
logger.info(f"Limited to: {len(posts_to_process)} posts")
if not posts_to_process:
logger.warning("No posts to process")
return "", {}
# Generate meta descriptions
results = self.generate_batch(posts_to_process)
# Save results
if results:
output_path = self.save_results(results, output_file)
# Generate and log summary
summary = self.generate_summary(results)
logger.info("\n" + "="*70)
logger.info("GENERATION SUMMARY")
logger.info("="*70)
logger.info(f"Total posts processed: {summary['total_posts']}")
logger.info(f"Improved: {summary['improved']} ({summary['improvement_rate']:.1f}%)")
logger.info(f"Optimal length: {summary['optimal_length_count']} ({summary['optimal_length_rate']:.1f}%)")
logger.info(f"Average validation score: {summary['average_score']:.1f}")
logger.info(f"API calls made: {summary['api_calls']}")
logger.info("="*70)
return output_path, summary
else:
logger.warning("No results generated")
return "", {}


@@ -0,0 +1,631 @@
"""
Meta Description Updater - Fetch, generate, and update meta descriptions directly on WordPress
"""
import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests
from requests.auth import HTTPBasicAuth
from .config import Config
from .meta_description_generator import MetaDescriptionGenerator
logger = logging.getLogger(__name__)
class MetaDescriptionUpdater:
"""Fetch posts from WordPress, generate AI meta descriptions, and update them."""
def __init__(self, site_name: str):
"""
Initialize the updater.
Args:
site_name: WordPress site name (e.g., 'mistergeek.net')
"""
self.site_name = site_name
self.sites = Config.WORDPRESS_SITES
if site_name not in self.sites:
raise ValueError(f"Site '{site_name}' not found in configuration")
self.site_config = self.sites[site_name]
self.base_url = self.site_config['url'].rstrip('/')
self.auth = HTTPBasicAuth(
self.site_config['username'],
self.site_config['password']
)
self.openrouter_api_key = Config.OPENROUTER_API_KEY
self.ai_model = Config.AI_MODEL
self.posts = []
self.update_results = []
self.api_calls = 0
self.stats = {
'total_posts': 0,
'updated': 0,
'failed': 0,
'skipped': 0
}
def fetch_posts(self, post_ids: Optional[List[int]] = None,
category_ids: Optional[List[int]] = None,
category_names: Optional[List[str]] = None,
author_names: Optional[List[str]] = None,
limit: Optional[int] = None,
status: Optional[List[str]] = None) -> List[Dict]:
"""
Fetch posts from WordPress site.
Args:
post_ids: Specific post IDs to fetch
category_ids: Filter by category IDs
category_names: Filter by category names (will be resolved to IDs)
author_names: Filter by author names
limit: Maximum number of posts to fetch
status: Post statuses to fetch (default: ['publish'])
Returns:
List of post dicts
"""
logger.info(f"Fetching posts from {self.site_name}...")
if post_ids:
logger.info(f" Post IDs: {post_ids}")
if category_ids:
logger.info(f" Category IDs: {category_ids}")
if category_names:
logger.info(f" Category names: {category_names}")
if author_names:
logger.info(f" Authors: {author_names}")
if limit:
logger.info(f" Limit: {limit}")
# Resolve category names to IDs if needed
if category_names and not category_ids:
category_ids = self._get_category_ids_by_names(category_names)
# Resolve author names to IDs if needed
author_ids = None
if author_names:
author_ids = self._get_author_ids_by_names(author_names)
# Build API parameters
params = {
'per_page': 100,
'page': 1,
'status': ','.join(status) if status else 'publish',
'_embed': True
}
if post_ids:
# Fetch specific posts
posts = []
for post_id in post_ids:
try:
response = requests.get(
f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
auth=self.auth,
timeout=10
)
if response.status_code == 200:
posts.append(response.json())
else:
logger.warning(f" Post {post_id} not found or inaccessible")
except Exception as e:
logger.error(f" Error fetching post {post_id}: {e}")
self.posts = posts
else:
# Fetch posts with filters
if category_ids:
params['categories'] = ','.join(map(str, category_ids))
if author_ids:
params['author'] = ','.join(map(str, author_ids))
posts = []
while True:
try:
response = requests.get(
f"{self.base_url}/wp-json/wp/v2/posts",
params=params,
auth=self.auth,
timeout=30
)
response.raise_for_status()
page_posts = response.json()
if not page_posts:
break
posts.extend(page_posts)
if len(page_posts) < 100:
break
if limit and len(posts) >= limit:
break
params['page'] += 1
time.sleep(0.3)
except Exception as e:
logger.error(f"Error fetching posts: {e}")
break
# Apply limit if specified
if limit:
posts = posts[:limit]
self.posts = posts
logger.info(f"✓ Fetched {len(self.posts)} posts from {self.site_name}")
return self.posts
def _get_category_ids_by_names(self, category_names: List[str]) -> List[int]:
"""
Get category IDs by category names.
Args:
category_names: List of category names
Returns:
List of category IDs
"""
logger.info(f"Resolving category names to IDs...")
try:
response = requests.get(
f"{self.base_url}/wp-json/wp/v2/categories",
params={'per_page': 100},
auth=self.auth,
timeout=10
)
response.raise_for_status()
categories = response.json()
category_map = {cat['name'].lower(): cat['id'] for cat in categories}
category_ids = []
for name in category_names:
name_lower = name.lower()
if name_lower in category_map:
category_ids.append(category_map[name_lower])
logger.info(f"'{name}' -> ID {category_map[name_lower]}")
else:
# Try partial match
for cat_name, cat_id in category_map.items():
if name_lower in cat_name or cat_name in name_lower:
category_ids.append(cat_id)
logger.info(f"'{name}' -> ID {cat_id} (partial match)")
break
else:
logger.warning(f" ✗ Category '{name}' not found")
return category_ids
except Exception as e:
logger.error(f"Error fetching categories: {e}")
return []
def _get_author_ids_by_names(self, author_names: List[str]) -> List[int]:
"""
Get author/user IDs by author names.
Args:
author_names: List of author names
Returns:
List of author IDs
"""
logger.info(f"Resolving author names to IDs...")
try:
response = requests.get(
f"{self.base_url}/wp-json/wp/v2/users",
params={'per_page': 100},
auth=self.auth,
timeout=10
)
response.raise_for_status()
users = response.json()
author_map = {}
# Build map of name/slug to ID
for user in users:
name = user.get('name', '').lower()
slug = user.get('slug', '').lower()
author_map[name] = user['id']
author_map[slug] = user['id']
author_ids = []
for name in author_names:
name_lower = name.lower()
# Try exact match
if name_lower in author_map:
author_ids.append(author_map[name_lower])
logger.info(f"'{name}' -> ID {author_map[name_lower]}")
else:
# Try partial match
found = False
for author_name, author_id in author_map.items():
if name_lower in author_name or author_name in name_lower:
author_ids.append(author_id)
logger.info(f"'{name}' -> ID {author_id} (partial match: '{author_name}')")
found = True
break
if not found:
logger.warning(f" ✗ Author '{name}' not found")
return author_ids
except Exception as e:
logger.error(f"Error fetching authors: {e}")
return []
def _generate_meta_description(self, post: Dict) -> Optional[str]:
"""
Generate meta description for a post using AI.
Args:
post: Post data dict
Returns:
Generated meta description or None
"""
title = post.get('title', {}).get('rendered', '')
content = post.get('content', {}).get('rendered', '')
excerpt = post.get('excerpt', {}).get('rendered', '')
# Strip HTML tags to build a plain-text preview
import re
content_text = re.sub(r'<[^>]+>', '', content)[:500]
excerpt_text = re.sub(r'<[^>]+>', '', excerpt)
# Build prompt
prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.
**Post Title:** {title}
**Content Context:**
Excerpt: {excerpt_text}
Content preview: {content_text}...
**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Make it compelling and action-oriented
3. Clearly describe what the post is about
4. Use active voice
5. Include a call-to-action when appropriate
6. Avoid clickbait - be accurate and valuable
**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""
# Call AI API
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {self.openrouter_api_key}",
"Content-Type": "application/json"
}
payload = {
"model": self.ai_model,
"messages": [
{
"role": "system",
"content": "You are an SEO expert specializing in meta description optimization."
},
{
"role": "user",
"content": prompt
}
],
"temperature": 0.7,
"max_tokens": 100
}
try:
response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
result = response.json()
self.api_calls += 1
if 'choices' in result and len(result['choices']) > 0:
meta_description = result['choices'][0]['message']['content'].strip()
# Remove quotes if AI included them
if meta_description.startswith('"') and meta_description.endswith('"'):
meta_description = meta_description[1:-1]
return meta_description
else:
logger.warning("No AI response received")
return None
except Exception as e:
logger.error(f"API call failed: {e}")
return None
def _update_post_meta(self, post_id: int, meta_description: str) -> bool:
"""
Update post meta description in WordPress.
Args:
post_id: Post ID to update
meta_description: New meta description
Returns:
True if successful, False otherwise
"""
logger.info(f"Updating post {post_id}...")
# Write the RankMath meta key; a Yoast site would need
# '_yoast_wpseo_metadesc' instead, and either key must be
# exposed to the REST API for the update to take effect
meta_fields = {
'rank_math_description': meta_description
}
try:
# First, get current post meta to preserve other fields
response = requests.get(
f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
auth=self.auth,
timeout=10
)
if response.status_code != 200:
logger.error(f" Could not fetch post {post_id}")
return False
current_post = response.json()
current_meta = current_post.get('meta', {})
# Update with new meta description
updated_meta = {**current_meta, **meta_fields}
# Update post
update_response = requests.post(
f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
json={'meta': updated_meta},
auth=self.auth,
timeout=10
)
if update_response.status_code == 200:
logger.info(f" ✓ Updated post {post_id}")
return True
else:
logger.error(f" ✗ Failed to update post {post_id}: {update_response.status_code}")
logger.error(f" Response: {update_response.text}")
return False
except Exception as e:
logger.error(f" ✗ Error updating post {post_id}: {e}")
return False
def _validate_meta_description(self, meta: str) -> Dict:
"""Validate meta description quality."""
length = len(meta)
validation = {
'length': length,
'is_optimal': 120 <= length <= 160,
'too_short': length < 120,
'too_long': length > 160,
'score': 0
}
if validation['is_optimal']:
validation['score'] = 100
elif validation['too_short']:
validation['score'] = max(0, 50 - (120 - length))
else:
validation['score'] = max(0, 50 - (length - 160))
# Bonus for ending with period
if meta.endswith('.'):
validation['score'] = min(100, validation['score'] + 5)
# Bonus for CTA words
cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
if any(word in meta.lower() for word in cta_words):
validation['score'] = min(100, validation['score'] + 5)
return validation
def update_posts(self, dry_run: bool = False,
skip_existing: bool = False,
force_regenerate: bool = False) -> Dict:
"""
Generate and update meta descriptions for fetched posts.
Args:
dry_run: If True, preview changes without updating
skip_existing: If True, skip posts that already have meta descriptions
force_regenerate: If True, regenerate even for posts with good meta descriptions
Returns:
Statistics dict
"""
logger.info("\n" + "="*70)
logger.info("META DESCRIPTION UPDATE")
logger.info("="*70)
logger.info(f"Site: {self.site_name}")
logger.info(f"Posts to process: {len(self.posts)}")
logger.info(f"Dry run: {dry_run}")
logger.info(f"Skip existing: {skip_existing}")
logger.info(f"Force regenerate: {force_regenerate}")
logger.info("="*70)
self.stats['total_posts'] = len(self.posts)
for i, post in enumerate(self.posts, 1):
post_id = post.get('id')
title = post.get('title', {}).get('rendered', '')[:50]
logger.info(f"\n[{i}/{len(self.posts)}] Processing post {post_id}: {title}...")
# Check current meta description
meta_dict = post.get('meta', {})
current_meta = (
meta_dict.get('rank_math_description', '') or
meta_dict.get('_yoast_wpseo_metadesc', '') or
''
)
# Skip if has existing meta and skip_existing is True
if current_meta and skip_existing and not force_regenerate:
logger.info(f" Skipping: Already has meta description")
self.stats['skipped'] += 1
continue
# Validate existing meta (if any)
if current_meta and not force_regenerate:
validation = self._validate_meta_description(current_meta)
if validation['score'] >= 80:
logger.info(f" Skipping: Existing meta is good quality (score: {validation['score']})")
self.stats['skipped'] += 1
continue
# Generate new meta description
logger.info(f" Generating meta description...")
generated_meta = self._generate_meta_description(post)
if not generated_meta:
logger.error(f" ✗ Failed to generate meta description")
self.stats['failed'] += 1
continue
# Validate generated meta
validation = self._validate_meta_description(generated_meta)
logger.info(f" Generated: {generated_meta[:80]}...")
logger.info(f" Length: {validation['length']} chars, Score: {validation['score']}")
# Update post
if dry_run:
logger.info(f" [DRY RUN] Would update post {post_id}")
self.update_results.append({
'post_id': post_id,
'title': title,
'current_meta': current_meta,
'generated_meta': generated_meta,
'status': 'dry_run',
'validation_score': validation['score']
})
else:
success = self._update_post_meta(post_id, generated_meta)
if success:
logger.info(f" ✓ Successfully updated post {post_id}")
self.stats['updated'] += 1
self.update_results.append({
'post_id': post_id,
'title': title,
'current_meta': current_meta,
'generated_meta': generated_meta,
'status': 'updated',
'validation_score': validation['score']
})
else:
self.stats['failed'] += 1
self.update_results.append({
'post_id': post_id,
'title': title,
'status': 'failed',
'validation_score': validation['score']
})
# Rate limiting
time.sleep(0.5)
# Save results
self._save_results()
# Print summary
logger.info("\n" + "="*70)
logger.info("UPDATE SUMMARY")
logger.info("="*70)
logger.info(f"Total posts: {self.stats['total_posts']}")
logger.info(f"Updated: {self.stats['updated']}")
logger.info(f"Failed: {self.stats['failed']}")
logger.info(f"Skipped: {self.stats['skipped']}")
logger.info(f"API calls: {self.api_calls}")
logger.info("="*70)
return self.stats
def _save_results(self):
"""Save update results to CSV."""
if not self.update_results:
return
output_dir = Path(__file__).parent.parent.parent / 'output'
output_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_file = output_dir / f'meta_update_{self.site_name}_{timestamp}.csv'
fieldnames = [
'post_id', 'title', 'current_meta', 'generated_meta',
'status', 'validation_score'
]
with open(output_file, 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(self.update_results)
logger.info(f"\n✓ Results saved to: {output_file}")
def run(self, post_ids: Optional[List[int]] = None,
category_ids: Optional[List[int]] = None,
category_names: Optional[List[str]] = None,
author_names: Optional[List[str]] = None,
limit: Optional[int] = None,
dry_run: bool = False,
skip_existing: bool = False,
force_regenerate: bool = False) -> Dict:
"""
Run complete meta description update process.
Args:
post_ids: Specific post IDs to update
category_ids: Filter by category IDs
category_names: Filter by category names
author_names: Filter by author names
limit: Maximum number of posts to process
dry_run: If True, preview changes without updating
skip_existing: If True, skip posts with existing meta descriptions
force_regenerate: If True, regenerate even for good quality metas
Returns:
Statistics dict
"""
# Fetch posts
self.fetch_posts(
post_ids=post_ids,
category_ids=category_ids,
category_names=category_names,
author_names=author_names,
limit=limit
)
if not self.posts:
logger.warning("No posts found matching criteria")
return self.stats
# Update posts
return self.update_posts(
dry_run=dry_run,
skip_existing=skip_existing,
force_regenerate=force_regenerate
)
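Both `_get_category_ids_by_names` and `_get_author_ids_by_names` resolve names the same way: exact match against a lower-cased map first, then a substring match in either direction. A standalone sketch of that resolution step (the author map below is made up for illustration):

```python
# Exact-then-partial name resolution as used by the _get_*_ids_by_names
# helpers; the map here is illustrative, not real site data.
from typing import Dict, Optional


def resolve_id(name: str, id_map: Dict[str, int]) -> Optional[int]:
    """Exact match on the lower-cased name, then substring match either way."""
    key = name.lower()
    if key in id_map:
        return id_map[key]
    for candidate, candidate_id in id_map.items():
        if key in candidate or candidate in key:
            return candidate_id
    return None


authors = {'kevin bataille': 7, 'kevin-b': 7, 'guest writer': 12}
```

The partial match takes the first hit in map order, so an ambiguous fragment like 'kevin' silently resolves to whichever entry iterates first; logging the matched name, as the updater does, makes that visible.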

src/seo/post_migrator.py: new file, 1007 lines (diff suppressed because it is too large)