# Author Filter Guide

Export posts from specific authors using the enhanced export functionality.

## Overview

The export command now supports filtering posts by author name or author ID, making it easy to:

- Export posts from a specific author across all sites
- Combine author filtering with site filtering
- Export posts from multiple authors at once

## Usage

### Filter by Author Name

Export posts from a specific author (case-insensitive, partial match):

```bash
# Export posts by "John Doe"
./seo export --author "John Doe"

# Export posts by "admin" (partial match)
./seo export --author admin

# Export posts from multiple authors
./seo export --author "John Doe" "Jane Smith"
```

### Filter by Author ID

Export posts from specific author IDs:

```bash
# Export posts by author ID 1
./seo export --author-id 1

# Export posts from multiple author IDs
./seo export --author-id 1 2 3
```

### Combine with Site Filter

Export posts from a specific author on a specific site:

```bash
# Export John's posts from mistergeek.net only
./seo export --author "John Doe" --site mistergeek.net

# Export posts by author ID 1 from webscroll.fr
./seo export --author-id 1 -s webscroll.fr
```

### Dry Run Mode

Preview what would be exported:

```bash
./seo export --author "John Doe" --dry-run
```

## How It Works

1. **Author Name Matching**
   - Case-insensitive matching
   - Partial matches work (e.g., "john" matches "John Doe")
   - Matches against the author's display name and slug

2. **Author ID Matching**
   - Exact match on the WordPress user ID
   - More reliable than name matching
   - Useful when authors have similar names

3. **Author Information**
   - The exporter fetches all authors from each site
   - Author names are included in the exported CSV
   - Posts are filtered before export
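
The name-matching rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual code; the shape of the `author` dict is an assumption.

```python
def matches_author(author: dict, filters: list[str]) -> bool:
    """Case-insensitive, partial match against display name and slug.

    Note: the author dict shape here is assumed for illustration.
    """
    name = author.get("name", "").lower()
    slug = author.get("slug", "").lower()
    return any(f.lower() in name or f.lower() in slug for f in filters)


# "john" matches "John Doe" (partial, case-insensitive)
print(matches_author({"name": "John Doe", "slug": "john-doe"}, ["john"]))  # True
```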

## Export Output

The exported CSV includes author information:

```csv
site,post_id,status,title,slug,url,author_id,author_name,date_published,...
mistergeek.net,123,publish,"VPN Guide",vpn-guide,https://...,1,John Doe,2024-01-15,...
```

### New Column: `author_name`

The export now includes the author's display name in addition to the author ID.

## Examples

### Example 1: Export All Posts by Admin

```bash
./seo export --author admin
```

Output: `output/all_posts_YYYY-MM-DD.csv`

### Example 2: Export a Specific Author from a Specific Site

```bash
./seo export --author "Marie" --site webscroll.fr
```

### Example 3: Export Multiple Authors

```bash
./seo export --author "John" "Marie" "Admin"
```

### Example 4: Export by Author ID

```bash
./seo export --author-id 5
```

### Example 5: Combine Author and Site Filters

```bash
./seo export --author "John" --site mistergeek.net --verbose
```

## Finding Author IDs

If you don't know the author ID, you can:

1. **Export all posts and check the CSV:**

   ```bash
   ./seo export
   # Then open the CSV and check the author_id column
   ```

2. **Use the WordPress admin:**
   - Go to Users → All Users
   - Hover over a user name
   - The URL shows the user ID (e.g., `user_id=5`)

3. **Use the WordPress REST API directly:**

   ```bash
   curl -u username:password https://yoursite.com/wp-json/wp/v2/users
   ```

## Tips

1. **Use quotes for names with spaces:**

   ```bash
   ./seo export --author "John Doe"   # ✓ Correct
   ./seo export --author John Doe     # ✗ Wrong (treated as 2 authors)
   ```

2. **Partial matching is your friend:**

   ```bash
   ./seo export --author "john"   # Matches "John Doe", "Johnny", etc.
   ```

3. **Combine with migration:**

   ```bash
   # Export an author's posts, then migrate them to another site
   ./seo export --author "John Doe" --site webscroll.fr
   ./seo migrate output/all_posts_*.csv --destination mistergeek.net
   ```

4. **Verbose mode for debugging:**

   ```bash
   ./seo export --author "John" --verbose
   ```

## Troubleshooting

### No posts exported

**Possible causes:**
- Author name doesn't match (try a different spelling)
- Author has no posts
- Author doesn't exist on that site

**Solutions:**
- Use `--verbose` to see what's happening
- Try the author ID instead of the name
- Check that the author exists on the site

### Author names not showing in the CSV

**Possible causes:**
- WordPress REST API doesn't allow user enumeration
- Authentication issue

**Solutions:**
- Check WordPress user permissions
- Verify the credentials in the config
- The author ID will still be present even if the name lookup fails

## API Usage

Use author filtering programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# Export by author name
csv_file = app.export(author_filter=["John Doe"])

# Export by author ID
csv_file = app.export(author_ids=[1, 2])

# Export by author and site
csv_file = app.export(
    author_filter=["John"],
    site_filter="mistergeek.net"
)
```

## Related Commands

- `seo migrate` - Migrate exported posts to another site
- `seo analyze` - Analyze exported posts with AI
- `seo export --help` - Show all export options

## See Also

- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Post migration guide
- [README.md](README.md) - Main documentation

# Meta Description Generation Guide

AI-powered meta description generation and optimization for WordPress posts.

## Overview

The meta description generator uses AI to create SEO-optimized meta descriptions for your blog posts. It can:

- **Generate new meta descriptions** for posts without them
- **Improve existing meta descriptions** that are poor quality
- **Optimize length** (120-160 characters - ideal for SEO)
- **Include focus keywords** naturally
- **Add call-to-action** elements when appropriate

## Usage

### Generate for All Posts

```bash
# Generate meta descriptions for all posts
./seo meta_description

# Use a specific CSV file
./seo meta_description output/all_posts_2026-02-16.csv
```

### Generate Only for Missing Meta Descriptions

```bash
# Only generate for posts without meta descriptions
./seo meta_description --only-missing
```

### Improve Poor Quality Meta Descriptions

```bash
# Only regenerate meta descriptions with poor quality scores
./seo meta_description --only-poor

# Limit to the first 10 poor quality meta descriptions
./seo meta_description --only-poor --limit 10
```

### Dry Run Mode

Preview what would be processed:

```bash
./seo meta_description --dry-run
./seo meta_description --dry-run --only-missing
```

## Command Options

| Option | Description |
|--------|-------------|
| `--only-missing` | Only generate for posts without meta descriptions |
| `--only-poor` | Only generate for posts with poor quality meta descriptions |
| `--limit <N>` | Limit number of posts to process |
| `--output`, `-o` | Custom output file path |
| `--dry-run` | Preview without generating |
| `--verbose`, `-v` | Enable verbose logging |

## How It Works

### 1. Content Analysis

The AI analyzes:
- Post title
- Content preview (first 500 characters)
- Excerpt (if available)
- Focus keyword (if specified)
- Current meta description (if it exists)

### 2. AI Generation

The AI generates meta descriptions following SEO best practices:
- **Length**: 120-160 characters (optimal for search engines)
- **Keywords**: Naturally includes the focus keyword
- **Compelling**: Action-oriented and engaging
- **Accurate**: Clearly describes the post content
- **Active voice**: Uses active rather than passive voice
- **Call-to-action**: Includes a CTA when appropriate

### 3. Quality Validation

Each generated meta description is scored on:
- **Length optimization** (120-160 chars = 100 points)
- **Proper ending** (period = +5 points)
- **Call-to-action words** (+5 points)
- **Overall quality** (minimum 70 points to pass)

### 4. Output

Results are saved to CSV with:
- Original meta description
- Generated meta description
- Length of the generated meta
- Validation score (0-100)
- Whether the length is optimal
- Whether it's an improvement

## Output Format

The tool generates a CSV file in `output/`:

```
output/meta_descriptions_20260216_143022.csv
```

### CSV Columns

| Column | Description |
|--------|-------------|
| `post_id` | WordPress post ID |
| `site` | Site name |
| `title` | Post title |
| `current_meta_description` | Existing meta (if any) |
| `generated_meta_description` | AI-generated meta |
| `generated_length` | Character count |
| `validation_score` | Quality score (0-100) |
| `is_optimal_length` | True if 120-160 chars |
| `improvement` | True if better than current |
| `status` | Generation status |

## Examples

### Example 1: Generate All Missing Meta Descriptions

```bash
# Export posts first
./seo export

# Generate meta descriptions for posts without them
./seo meta_description --only-missing
```

**Output:**
```
Generating AI-optimized meta descriptions...
Filter: Only posts without meta descriptions
Processing post 1/45
✓ Generated meta description (score: 95, length: 155)
...

✅ Meta description generation completed!
Results: output/meta_descriptions_20260216_143022.csv

📊 Summary:
   Total processed: 45
   Improved: 42 (93.3%)
   Optimal length: 40 (88.9%)
   Average score: 92.5
   API calls: 45
```

### Example 2: Fix Poor Quality Meta Descriptions

```bash
# Only improve meta descriptions scoring below 70
./seo meta_description --only-poor --limit 20
```

### Example 3: Test with a Small Batch

```bash
# Test with the first 5 posts
./seo meta_description --limit 5
```

### Example 4: Custom Output File

```bash
./seo meta_description --output output/custom_meta_gen.csv
```

## Meta Description Quality Scoring

### Scoring Criteria

| Criteria | Points |
|----------|--------|
| Optimal length (120-160 chars) | 100 |
| Too short (< 120 chars) | 50 - (deficit) |
| Too long (> 160 chars) | 50 - (excess) |
| Ends with period | +5 |
| Contains CTA words | +5 |

### Quality Thresholds

- **Excellent (90-100)**: Ready to use
- **Good (70-89)**: Minor improvements possible
- **Poor (< 70)**: Needs regeneration

### CTA Words Detected

The system looks for action words like:
- learn, discover, find, explore
- read, get, see, try, start
- and more...
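
Taken together, the criteria above can be approximated with a small scoring function. This is a sketch of the table, not the tool's exact scorer, which may weight, cap, or word-match things differently.

```python
# Subset of the CTA words listed above
CTA_WORDS = {"learn", "discover", "find", "explore", "read", "get", "see", "try", "start"}

def score_meta_description(text: str) -> int:
    """Rough quality score following the criteria table above (assumed 0-100 clamp)."""
    n = len(text)
    if 120 <= n <= 160:
        score = 100                 # optimal length
    elif n < 120:
        score = 50 - (120 - n)      # too short: subtract the deficit
    else:
        score = 50 - (n - 160)      # too long: subtract the excess
    if text.rstrip().endswith("."):
        score += 5                  # proper ending
    words = {w.strip(".,!?").lower() for w in text.split()}
    if words & CTA_WORDS:
        score += 5                  # call-to-action word present
    return max(0, min(score, 100))
```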

## Best Practices

### Before Generation

1. **Export fresh data** - Ensure you have the latest posts

   ```bash
   ./seo export
   ```

2. **Review focus keywords** - Posts with focus keywords get better results

3. **Test with a small batch** - Try with `--limit 5` first

### During Generation

1. **Monitor scores** - Watch validation scores in real-time
2. **Check API usage** - Track the number of API calls
3. **Use filters** - Target only what needs improvement

### After Generation

1. **Review results** - Open the CSV and check the generated metas
2. **Manual approval** - Don't auto-publish; review first
3. **A/B test** - Compare performance of new vs old metas

## Integration with WordPress

### Manual Update

1. Open the generated CSV: `output/meta_descriptions_*.csv`
2. Copy the generated meta descriptions
3. Update them in your WordPress SEO plugin (RankMath, Yoast, etc.)

### Automated Update (Future)

Future versions may support direct WordPress updates:

```bash
# Not yet implemented
./seo meta_description --apply-to-wordpress
```

## API Usage & Cost

### API Calls

- Each post requires 1 API call
- Rate limited to 2 calls/second (0.5s delay)
- Uses Claude AI via OpenRouter
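
The pacing above amounts to sleeping half a second between calls. A minimal sketch, where `generate_meta` is a hypothetical stand-in for the real API call:

```python
import time

def process_posts(posts, generate_meta, delay=0.5):
    """Process posts sequentially, spacing API calls ~0.5 s apart."""
    results = []
    for i, post in enumerate(posts):
        results.append(generate_meta(post))
        if i < len(posts) - 1:
            time.sleep(delay)  # 0.5 s gap -> roughly 2 calls/second
    return results
```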

### Estimated Cost

Approximate cost per 1000 meta descriptions:
- **~$0.50 - $2.00** depending on content length
- Check OpenRouter pricing for current rates

### Monitoring

The summary shows:
- Total API calls made
- Cost tracking (if enabled)

## Troubleshooting

### No Posts to Process

**Problem:** "No posts to process"

**Solutions:**
1. Export posts first: `./seo export`
2. Check the CSV has the required columns
3. Verify the filter isn't too restrictive

### Low Quality Scores

**Problem:** Generated metas scoring below 70

**Solutions:**
1. Add focus keywords to posts
2. Provide better content previews
3. Try regenerating with different parameters

### API Errors

**Problem:** "API call failed"

**Solutions:**
1. Check your internet connection
2. Verify the API key in `.env`
3. Check your OpenRouter account status
4. Reduce the batch size with `--limit`

### Rate Limiting

**Problem:** Too many API calls

**Solutions:**
1. Use `--limit` to process in batches
2. Wait between batches
3. Upgrade your API plan if needed

## Comparison with Other Tools

| Feature | This Tool | Other SEO Tools |
|---------|-----------|-----------------|
| AI-powered | ✅ Yes | ⚠️ Sometimes |
| Batch processing | ✅ Yes | ✅ Yes |
| Quality scoring | ✅ Yes | ❌ No |
| Custom prompts | ✅ Yes | ❌ No |
| WordPress integration | ⚠️ Manual | ✅ Some |
| Cost | Pay-per-use | Monthly subscription |

## Related Commands

- `seo export` - Export posts for analysis
- `seo analyze` - AI analysis with recommendations
- `seo seo_check` - SEO quality checking

## See Also

- [README.md](README.md) - Main documentation
- [ENHANCED_ANALYSIS_GUIDE.md](ENHANCED_ANALYSIS_GUIDE.md) - AI analysis guide
- [EDITORIAL_STRATEGY_GUIDE.md](EDITORIAL_STRATEGY_GUIDE.md) - Content strategy

---

**Made with ❤️ for better SEO automation**

# Post Migration Guide

This guide explains how to migrate posts between WordPress sites using the SEO automation tool.

## Overview

The migration feature allows you to move posts from one WordPress site to another while preserving:
- Post content (title, body, excerpt)
- Categories (automatically created if they don't exist)
- Tags (automatically created if they don't exist)
- SEO metadata (RankMath, Yoast SEO)
- Post slug

## Migration Modes

There are two ways to migrate posts:

### 1. CSV-Based Migration

Migrate specific posts listed in a CSV file.

**Requirements:**
- A CSV file with at least two columns: `site` and `post_id`

**Usage:**
```bash
# Basic migration (posts deleted from source after migration)
./seo migrate posts_to_migrate.csv --destination mistergeek.net

# Keep posts on the source site
./seo migrate posts_to_migrate.csv --destination mistergeek.net --keep-source

# Publish immediately instead of draft
./seo migrate posts_to_migrate.csv --destination mistergeek.net --post-status publish

# Custom output file for the migration report
./seo migrate posts_to_migrate.csv --destination mistergeek.net --output custom_report.csv
```

### 2. Filtered Migration

Migrate posts based on filters (category, date, etc.).

**Usage:**
```bash
# Migrate all posts from source to destination
./seo migrate --source webscroll.fr --destination mistergeek.net

# Migrate posts from specific categories
./seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN "Torrent Clients"

# Migrate posts with specific tags
./seo migrate --source webscroll.fr --destination mistergeek.net --tag-filter "guide" "tutorial"

# Migrate posts by date range
./seo migrate --source webscroll.fr --destination mistergeek.net --date-after 2024-01-01 --date-before 2024-12-31

# Limit the number of posts
./seo migrate --source webscroll.fr --destination mistergeek.net --limit 10

# Combine filters
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --date-after 2024-01-01 \
  --limit 5 \
  --keep-source
```

## Command Options

### Required Options

- `--destination`, `--to`: Destination site (mistergeek.net, webscroll.fr, hellogeek.net)
- `--source`, `--from`: Source site (for filtered migration only)
- CSV file: Path to the CSV with posts (for CSV-based migration)

### Optional Options

| Option | Description | Default |
|--------|-------------|---------|
| `--keep-source` | Keep posts on source site after migration | Delete after migration |
| `--post-status` | Status for migrated posts (draft, publish, pending) | draft |
| `--no-categories` | Don't create categories automatically | Create categories |
| `--no-tags` | Don't create tags automatically | Create tags |
| `--category-filter` | Filter by category names (filtered migration) | All categories |
| `--tag-filter` | Filter by tag names (filtered migration) | All tags |
| `--date-after` | Migrate posts after this date (YYYY-MM-DD) | No limit |
| `--date-before` | Migrate posts before this date (YYYY-MM-DD) | No limit |
| `--limit` | Maximum number of posts to migrate | No limit |
| `--output`, `-o` | Custom output file for migration report | Auto-generated |
| `--dry-run` | Preview what would be done without doing it | Execute |
| `--verbose`, `-v` | Enable verbose logging | Normal logging |

## Migration Process

### What Gets Migrated

1. **Post Content**
   - Title
   - Body content (HTML preserved)
   - Excerpt
   - Slug

2. **Categories**
   - Mapped from source to destination
   - Created automatically if they don't exist on the destination
   - Hierarchical structure preserved (parent-child relationships)

3. **Tags**
   - Mapped from source to destination
   - Created automatically if they don't exist on the destination

4. **SEO Metadata**
   - RankMath title and description
   - Yoast SEO title and description
   - Focus keywords

### What Doesn't Get Migrated

- Featured images (must be re-uploaded manually)
- Post author (uses the destination site's default)
- Comments (not transferred)
- Custom fields (except SEO metadata)
- Post revisions

## Migration Report

After migration, a CSV report is generated in `output/` with the following information:

```csv
source_site,source_post_id,destination_site,destination_post_id,title,status,categories_migrated,tags_migrated,deleted_from_source
webscroll.fr,123,mistergeek.net,456,"VPN Guide",draft,3,5,True
```
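
Because the report is plain CSV, it is easy to post-process, for example to list the posts that were removed from the source and therefore need redirects. A sketch, using the column names from the header above:

```python
import csv

def posts_needing_redirects(report_path: str) -> list[dict]:
    """Rows where the post was deleted from the source site."""
    with open(report_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if row.get("deleted_from_source") == "True"]
```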

## Examples

### Example 1: Migrate Specific Posts from CSV

1. Create a CSV file with the posts to migrate:

   ```csv
   site,post_id,title
   webscroll.fr,123,VPN Guide
   webscroll.fr,456,Torrent Tutorial
   ```

2. Run the migration:

   ```bash
   ./seo migrate my_posts.csv --destination mistergeek.net
   ```

### Example 2: Migrate All VPN Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN "VPN Reviews" \
  --post-status draft \
  --keep-source
```

### Example 3: Migrate Recent Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --date-after 2024-06-01 \
  --limit 20
```

### Example 4: Preview a Migration

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --dry-run
```

## Best Practices

### Before Migration

1. **Backup both sites** - Always back up before bulk operations
2. **Test with a few posts** - Migrate 1-2 posts first to verify
3. **Check category structure** - Review the destination site's categories
4. **Plan URL redirects** - If deleting from source, set up redirects

### During Migration

1. **Use dry-run first** - Preview what will be migrated
2. **Start with drafts** - Review before publishing
3. **Monitor logs** - Watch for errors or warnings
4. **Limit batch size** - Migrate in batches of 10-20 posts

### After Migration

1. **Review migrated posts** - Check formatting and categories
2. **Add featured images** - Upload manually if needed
3. **Set up redirects** - From old URLs to new URLs
4. **Update internal links** - Fix cross-site links
5. **Monitor SEO** - Track rankings after migration

## Troubleshooting

### Common Issues

**1. "Site not found" error**
- Check the site name is correct (mistergeek.net, webscroll.fr, hellogeek.net)
- Verify the credentials in config.yaml or .env

**2. "Category already exists" warning**
- This is normal - the migrator found a matching category
- The existing category will be used

**3. "Failed to create post" error**
- Check the WordPress REST API is enabled
- Verify the user has post creation permissions
- Check the authentication credentials

**4. Posts missing featured images**
- Featured images are not migrated automatically
- Upload images manually to the destination site
- Update the featured image on migrated posts

**5. Categories not matching**
- Categories are matched by name (case-insensitive)
- "VPN" and "vpn" will match
- "VPN Guide" and "VPN" will NOT match - a new category is created
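
That rule is essentially a case-insensitive exact lookup, with no partial matching. A hedged sketch, where the `existing` name-to-ID mapping is an assumed shape, not the migrator's real data structure:

```python
def find_category_id(name: str, existing: dict):
    """Exact, case-insensitive name match; None means a new category gets created."""
    lookup = {n.lower(): cid for n, cid in existing.items()}
    return lookup.get(name.lower())


cats = {"VPN": 12, "Software": 7}
print(find_category_id("vpn", cats))        # 12 -> "VPN" and "vpn" match
print(find_category_id("VPN Guide", cats))  # None -> no partial match
```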

## API Usage

You can also use the migration feature programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# CSV-based migration
app.migrate(
    csv_file='output/posts_to_migrate.csv',
    destination_site='mistergeek.net',
    create_categories=True,
    create_tags=True,
    delete_after=False,
    status='draft'
)

# Filtered migration
app.migrate_by_filter(
    source_site='webscroll.fr',
    destination_site='mistergeek.net',
    category_filter=['VPN', 'Software'],
    date_after='2024-01-01',
    limit=10,
    create_categories=True,
    delete_after=False,
    status='draft'
)
```

## Related Commands

- `seo export` - Export posts from all sites
- `seo editorial_strategy` - Analyze and get migration recommendations
- `seo category_propose` - Get AI category recommendations

## See Also

- [README.md](README.md) - Main documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [CATEGORY_MANAGEMENT_GUIDE.md](CATEGORY_MANAGEMENT_GUIDE.md) - Category management

# SEO Performance Tracking Guide

Track and analyze your website's SEO performance using Google Analytics 4 and Google Search Console data.

## Overview

The SEO performance tracking features allow you to:

- **Analyze page performance** - Track pageviews, clicks, impressions, CTR, and rankings
- **Find keyword opportunities** - Discover keywords you can rank higher for
- **Generate SEO reports** - Create comprehensive performance reports
- **Import data** - Support for both CSV imports and API integration

## Commands

### 1. `seo performance` - Analyze Page Performance

Analyze traffic and search performance data.

**Usage:**

```bash
# Analyze with CSV exports
./seo performance --ga4 analytics.csv --gsc search.csv

# Analyze GA4 data only
./seo performance --ga4 analytics.csv

# Analyze GSC data only
./seo performance --gsc search.csv

# With custom output
./seo performance --ga4 analytics.csv --gsc search.csv --output custom_analysis.csv

# Preview
./seo performance --ga4 analytics.csv --dry-run
```

**Data Sources:**

- **Google Analytics 4**: Export from GA4 → Reports → Engagement → Pages and screens
- **Google Search Console**: Export from GSC → Performance → Search results → Export

**Metrics Analyzed:**

| Metric | Source | Description |
|--------|--------|-------------|
| Pageviews | GA4 | Number of page views |
| Sessions | GA4 | Number of sessions |
| Bounce Rate | GA4 | Percentage of single-page sessions |
| Engagement Rate | GA4 | Percentage of engaged sessions |
| Clicks | GSC | Number of search clicks |
| Impressions | GSC | Number of search impressions |
| CTR | GSC | Click-through rate |
| Position | GSC | Average search ranking |
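GA4 reports page paths (e.g. `/best-vpn/`) while GSC reports full URLs, so the two sources have to be joined on the path before the metrics above can sit in one row. A minimal stdlib sketch of that join (the column names are assumptions based on the tables above, not the tool's internal schema):

```python
from urllib.parse import urlparse

def merge_metrics(ga4_rows, gsc_rows):
    """Join GA4 rows (keyed by page path) with GSC rows (keyed by full URL)."""
    merged = {}
    for row in ga4_rows:
        merged[row["page_path"]] = dict(row)
    for row in gsc_rows:
        path = urlparse(row["page"]).path  # reduce the full URL to its path
        merged.setdefault(path, {"page_path": path}).update(
            {k: v for k, v in row.items() if k != "page"}
        )
    return merged

ga4 = [{"page_path": "/best-vpn/", "pageviews": 420, "sessions": 310}]
gsc = [{"page": "https://www.mistergeek.net/best-vpn/", "clicks": 35, "impressions": 1250}]
combined = merge_metrics(ga4, gsc)
# combined["/best-vpn/"] now carries GA4 and GSC metrics side by side
```

Pages that only appear in one export still get a row, which is useful for spotting pages that get traffic but no search impressions (or vice versa).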

### 2. `seo keywords` - Keyword Opportunities

Find keywords you can optimize for better rankings.

**Usage:**

```bash
# Analyze keyword opportunities
./seo keywords gsc_export.csv

# Limit results
./seo keywords gsc_export.csv --limit 20

# Custom output
./seo keywords gsc_export.csv --output keywords.csv
```

**What It Finds:**

- Keywords ranking in positions 5-20 (the easiest to improve)
- High-impression keywords with low CTR
- Keywords with good traffic potential
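The selection logic described above can be sketched as a simple filter. The thresholds here are taken from this guide (positions 5-20, low CTR on 100+ impressions); the tool's exact scoring may differ:

```python
def find_opportunities(rows, min_impressions=50):
    """Keep keywords that rank 5-20 or waste impressions with a low CTR."""
    opportunities = []
    for row in rows:
        striking_distance = 5 <= row["position"] <= 20
        low_ctr = row["impressions"] >= 100 and row["ctr"] < 0.02
        if (striking_distance or low_ctr) and row["impressions"] >= min_impressions:
            opportunities.append(row)
    # Highest-impression keywords first: the biggest potential wins
    return sorted(opportunities, key=lambda r: r["impressions"], reverse=True)

rows = [
    {"query": "best vpn 2024", "position": 8.5, "impressions": 1250, "ctr": 0.028},
    {"query": "torrent client", "position": 12.3, "impressions": 890, "ctr": 0.011},
    {"query": "niche term", "position": 3.0, "impressions": 30, "ctr": 0.10},
]
top = find_opportunities(rows)
```

The third row is dropped: it already ranks in the top 5 and its impression volume is too small to matter.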

**Example Output:**

```
✅ Found 47 keyword opportunities!

Top opportunities:
1. best vpn 2024 - Position: 8.5, Impressions: 1250
2. torrent client - Position: 12.3, Impressions: 890
3. vpn for gaming - Position: 9.1, Impressions: 650
```

### 3. `seo report` - Generate SEO Report

Create comprehensive SEO performance reports.

**Usage:**

```bash
# Generate report
./seo report

# Custom output
./seo report --output monthly_seo_report.md
```

**Report Includes:**

- Performance summary
- Traffic analysis
- Keyword opportunities
- SEO recommendations
- Action items
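Internally the report is plain Markdown assembled section by section (see `_generate_report_content` in `src/seo/app.py` in this change set). A minimal sketch of that idea:

```python
from datetime import datetime

def build_report(sections):
    """Assemble a Markdown report from (heading, body) pairs."""
    lines = [
        "# SEO Performance Report",
        f"Generated: {datetime.now():%Y-%m-%d %H:%M:%S}",
        "---",
    ]
    for heading, body in sections:
        lines.append(f"## {heading}")
        lines.append(body)
    return "\n\n".join(lines)

report = build_report([
    ("Summary", "This report provides insights into your website's SEO performance."),
    ("SEO Recommendations", "1. Review and optimize meta descriptions"),
])
```

Because the output is ordinary Markdown, the report can be committed to the repo, pasted into an issue, or rendered by any Markdown viewer.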

## Data Export Guides

### Export from Google Analytics 4

1. Go to **Google Analytics** → Your Property
2. Navigate to **Reports** → **Engagement** → **Pages and screens**
3. Set the date range (e.g., last 30 days)
4. Click **Share** → **Download file** → **CSV**
5. Save as `ga4_export.csv`

**Required Columns:**

- Page path
- Page title
- Views (pageviews)
- Sessions
- Bounce rate
- Engagement rate
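Before running an analysis it is worth checking that the export actually contains these columns. The tool does its own validation; this stdlib check just illustrates the idea:

```python
import csv
import io

REQUIRED_GA4_COLUMNS = {"page path", "views", "sessions"}

def missing_columns(csv_text, required):
    """Return the required columns absent from the CSV header (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = {col.strip().lower() for col in next(reader)}
    return required - header

sample = "Page path,Page title,Views,Sessions\n/home/,Home,100,80\n"
gaps = missing_columns(sample, REQUIRED_GA4_COLUMNS)
# an empty set means the export is usable
```

Running this on a freshly downloaded export catches a wrong report type (e.g., an Acquisition report instead of Pages and screens) before any analysis starts.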

### Export from Google Search Console

1. Go to **Google Search Console** → Your Property
2. Click **Performance** → **Search results**
3. Set the date range (e.g., last 30 days)
4. Check all metrics: Clicks, Impressions, CTR, Position
5. Click **Export** → **CSV**
6. Save as `gsc_export.csv`

**Required Columns:**

- Page (URL)
- Clicks
- Impressions
- CTR
- Position
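GSC exports typically write the CTR column as a percentage string (e.g. `3.2%`), so it needs converting before it can be compared against numeric thresholds. A defensive parser (the exact export format is an assumption; verify it against your own file):

```python
def parse_ctr(value):
    """Convert a GSC CTR cell like '3.2%' or '0.032' to a float fraction."""
    text = str(value).strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)
```

Handling both forms means the same code works whether the CSV came straight from the GSC UI or from an API dump that already uses fractions.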

## API Integration (Advanced)

For automated data fetching, configure API credentials:

### 1. Google Analytics 4 API

**Setup:**

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the **Google Analytics Data API**
4. Create service account credentials
5. Download the JSON key file
6. Share the GA4 property with the service account email

**Configuration:**

Add to `.env`:

```
GA4_CREDENTIALS=/path/to/ga4-credentials.json
GA4_PROPERTY_ID=properties/123456789
```
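At startup the tool only needs these two values from the environment. A minimal sketch of the equivalent readiness check (variable names taken from the `.env` snippet above; the tool's own config loader may do more):

```python
import os

def ga4_config():
    """Read GA4 API settings from the environment; None if not configured."""
    credentials = os.environ.get("GA4_CREDENTIALS")
    property_id = os.environ.get("GA4_PROPERTY_ID")
    if credentials and property_id:
        return {"credentials": credentials, "property_id": property_id}
    return None

# Simulate a configured environment
os.environ["GA4_CREDENTIALS"] = "/path/to/ga4-credentials.json"
os.environ["GA4_PROPERTY_ID"] = "properties/123456789"
config = ga4_config()
```

Returning `None` rather than raising lets the caller fall back to CSV mode, which is exactly how `seo performance` behaves when no API is available.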

### 2. Google Search Console API

**Setup:**

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Enable the **Search Console API**
3. Create service account credentials
4. Download the JSON key file
5. Share the GSC property with the service account email

**Configuration:**

Add to `.env`:

```
GSC_CREDENTIALS=/path/to/gsc-credentials.json
GSC_SITE_URL=https://www.mistergeek.net
```

### Using API Mode

Once configured, you can run without CSV files:

```bash
# Fetch data directly from the APIs
./seo performance --start-date 2024-01-01 --end-date 2024-01-31
```

## Performance Insights

### Low CTR Pages

Pages with high impressions but a low CTR need better titles/descriptions:

```bash
# Find pages with <2% CTR and 100+ impressions
./seo performance --gsc search.csv

# Check "low_ctr" section in output
```

**Action:** Optimize meta titles and descriptions
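The `low_ctr` selection can be reproduced in a few lines, using the thresholds stated above (<2% CTR, 100+ impressions); the same shape works for the `low_position` check in the next section:

```python
def flag_low_ctr(pages, max_ctr=0.02, min_impressions=100):
    """Pages that earn plenty of impressions but few clicks."""
    return [
        p for p in pages
        if p["impressions"] >= min_impressions
        and (p["clicks"] / p["impressions"]) < max_ctr
    ]

pages = [
    {"page": "/best-vpn/", "clicks": 5, "impressions": 800},    # 0.6% CTR -> flagged
    {"page": "/contact/", "clicks": 4, "impressions": 50},      # too few impressions
    {"page": "/top-tools/", "clicks": 40, "impressions": 900},  # 4.4% CTR -> fine
]
flagged = flag_low_ctr(pages)
```

The impressions floor matters: without it, a page with 3 impressions and 0 clicks would be "flagged" on statistically meaningless data.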

### Low Position Pages

Pages ranking beyond position 20 need content optimization:

```bash
# Find pages ranking >20 with 50+ impressions
./seo performance --gsc search.csv

# Check "low_position" section in output
```

**Action:** Improve content quality, add internal links

### Keyword Opportunities

Keywords ranking in positions 5-20 are easy to improve:

```bash
./seo keywords gsc_export.csv --limit 50
```

**Action:** Optimize content for these specific keywords

## Workflow Examples

### Weekly Performance Check

```bash
# 1. Export fresh data from GA4 and GSC

# 2. Analyze performance
./seo performance --ga4 weekly_ga4.csv --gsc weekly_gsc.csv

# 3. Review keyword opportunities
./seo keywords weekly_gsc.csv --limit 20

# 4. Generate report
./seo report --output weekly_report.md
```

### Monthly SEO Audit

```bash
# 1. Export full month data

# 2. Comprehensive analysis
./seo performance --ga4 month_ga4.csv --gsc month_gsc.csv

# 3. Identify top issues
# Review output for:
# - Low CTR pages
# - Low position pages
# - High impression, low click pages

# 4. Generate action plan
./seo report --output monthly_audit.md
```

### Content Optimization Sprint

```bash
# 1. Find keyword opportunities
./seo keywords gsc.csv --limit 50 > opportunities.txt

# 2. For each opportunity:
# - Review current content
# - Optimize for target keyword
# - Update meta description

# 3. Track improvements
# Re-run analysis after 2 weeks
./seo performance --gsc new_gsc.csv
```

## Output Files

All analysis results are saved to `output/`:

| File | Description |
|------|-------------|
| `performance_data_*.csv` | Raw performance metrics |
| `performance_analysis_*.csv` | Analysis with insights |
| `seo_report_*.md` | Markdown report |
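The `*` in these names is a run timestamp, so repeated analyses never overwrite each other, and the most recent file sorts last. Picking it back up is a one-liner, the same `sorted(glob.glob(...))` pattern that `check_confidence.py` in this change set uses:

```python
import glob
from datetime import datetime
from pathlib import Path

def timestamped_name(prefix, suffix=".csv"):
    """Build an output name like performance_analysis_20240115_093045.csv."""
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}{suffix}"

def latest(pattern="output/performance_analysis_*.csv"):
    """Most recent matching file, or None; lexical sort works thanks to the timestamp."""
    files = sorted(glob.glob(pattern))
    return Path(files[-1]) if files else None

name = timestamped_name("performance_analysis")
```

The `YYYYMMDD_HHMMSS` format is chosen precisely because lexicographic and chronological order coincide.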

## Troubleshooting

### No Data Loaded

**Problem:** "No data loaded. Provide GA4 and/or GSC export files."

**Solution:**

- Ensure the CSV files were properly exported
- Check that the file paths are correct
- Verify the CSV has the required columns

### Column Name Errors

**Problem:** "KeyError: 'pageviews'"

**Solution:**

- Ensure the GA4 export includes a pageviews column
- Column names are normalized automatically
- Check the CSV encoding (UTF-8)
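"Normalized automatically" means header variants like `Views` or `Page path` are mapped to canonical names before the analysis runs. A sketch of that normalization (the alias table is illustrative, not the tool's exact mapping):

```python
# Illustrative alias table: raw GA4/GSC header -> canonical column name
ALIASES = {
    "views": "pageviews",
    "page path": "page_path",
    "page title": "page_title",
    "bounce rate": "bounce_rate",
    "engagement rate": "engagement_rate",
}

def normalize_columns(header):
    """Lowercase, trim, and map known aliases; otherwise snake_case the name."""
    normalized = []
    for col in header:
        key = col.strip().lower()
        normalized.append(ALIASES.get(key, key.replace(" ", "_")))
    return normalized

cols = normalize_columns(["Page path", "Page title", "Views", "Sessions"])
```

If a `KeyError` persists after normalization, the column genuinely is not in the export, so re-export with the right report type.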

### API Authentication Errors

**Problem:** "Failed to initialize GA4 client"

**Solution:**

- Verify the service account JSON is valid
- Check that the API is enabled in Google Cloud
- Ensure the service account has access to the property

## Best Practices

### Data Collection

1. **Export regularly** - Weekly or monthly exports
2. **Consistent date ranges** - Use the same range for comparisons
3. **Keep historical data** - Archive old exports for trend analysis

### Analysis

1. **Focus on trends** - Look at changes over time
2. **Prioritize impact** - Fix high-traffic pages first
3. **Track improvements** - Re-analyze after optimizations

### Reporting

1. **Regular reports** - Weekly/monthly cadence
2. **Share insights** - Distribute to team/stakeholders
3. **Action-oriented** - Include specific recommendations

## Related Commands

- `seo export` - Export posts from WordPress
- `seo meta_description` - Generate meta descriptions
- `seo update_meta` - Update meta descriptions on WordPress

## See Also

- [README.md](README.md) - Main documentation
- [META_DESCRIPTION_GUIDE.md](META_DESCRIPTION_GUIDE.md) - Meta description guide
- [ANALYTICS_SETUP.md](ANALYTICS_SETUP.md) - API setup guide (if it exists)

---

**Made with ❤️ for better SEO automation**
34
check_confidence.py
Normal file
@@ -0,0 +1,34 @@
#!/usr/bin/env python3
import csv
from collections import Counter
import glob

files = sorted(glob.glob('output/category_proposals_*.csv'))
if files:
    with open(files[-1], 'r') as f:
        reader = csv.DictReader(f)
        proposals = list(reader)

    print("=== All Proposals ===")
    print(f"Total: {len(proposals)}\n")

    print("By Site:")
    sites = Counter(p['current_site'] for p in proposals)
    for site, count in sorted(sites.items()):
        print(f"  {site}: {count}")

    print("\nBy Confidence (all sites):")
    confs = Counter(p['category_confidence'] for p in proposals)
    for conf, count in sorted(confs.items()):
        print(f"  {conf}: {count}")

    print("\nBy Site and Confidence:")
    for site in ['mistergeek.net', 'webscroll.fr', 'hellogeek.net']:
        site_props = [p for p in proposals if p['current_site'] == site]
        confs = Counter(p['category_confidence'] for p in site_props)
        print(f"\n  {site} ({len(site_props)} total):")
        for conf, count in sorted(confs.items()):
            print(f"    {conf}: {count}")

        medium_or_better = [p for p in site_props if p['category_confidence'] in ['High', 'Medium']]
        print(f"    → Would process with -c Medium (default): {len(medium_or_better)}")
332
src/seo/app.py
@@ -5,13 +5,19 @@ SEO Application Core - Integrated SEO automation functionality
 import logging
 from pathlib import Path
 from datetime import datetime
-from typing import Optional, List, Tuple
+from typing import Optional, List, Tuple, Dict
 
 from .exporter import PostExporter
 from .analyzer import EnhancedPostAnalyzer
 from .category_proposer import CategoryProposer
 from .category_manager import WordPressCategoryManager, CategoryAssignmentProcessor
 from .editorial_strategy import EditorialStrategyAnalyzer
+from .post_migrator import WordPressPostMigrator
+from .meta_description_generator import MetaDescriptionGenerator
+from .meta_description_updater import MetaDescriptionUpdater
+from .performance_tracker import SEOPerformanceTracker
+from .performance_analyzer import PerformanceAnalyzer
+from .media_importer import WordPressMediaImporter
 
 logger = logging.getLogger(__name__)
 
@@ -34,11 +40,23 @@ class SEOApp:
         else:
             logging.basicConfig(level=logging.INFO)
 
-    def export(self) -> str:
-        """Export all posts from WordPress sites."""
+    def export(self, author_filter: Optional[List[str]] = None,
+               author_ids: Optional[List[int]] = None,
+               site_filter: Optional[str] = None) -> str:
+        """
+        Export all posts from WordPress sites.
+
+        Args:
+            author_filter: List of author names to filter by
+            author_ids: List of author IDs to filter by
+            site_filter: Export from specific site only
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("📦 Exporting all posts from WordPress sites...")
-        exporter = PostExporter()
-        return exporter.run()
+        exporter = PostExporter(author_filter=author_filter, author_ids=author_ids)
+        return exporter.run(site_filter=site_filter)
 
     def analyze(self, csv_file: Optional[str] = None, fields: Optional[List[str]] = None,
                 update: bool = False, output: Optional[str] = None) -> str:
@@ -92,7 +110,8 @@ class SEOApp:
         return proposer.run(output_file=output)
 
     def category_apply(self, proposals_csv: str, site_name: str,
-                       confidence: str = 'Medium', dry_run: bool = False) -> dict:
+                       confidence: str = 'Medium', strict: bool = False,
+                       dry_run: bool = False) -> dict:
         """
         Apply AI category proposals to WordPress.
 
@@ -100,6 +119,7 @@ class SEOApp:
             proposals_csv: Path to proposals CSV
             site_name: Site to apply changes to (mistergeek.net, webscroll.fr, hellogeek.net)
             confidence: Minimum confidence level (High, Medium, Low)
+            strict: If True, only match exact confidence (not "or better")
             dry_run: If True, preview changes without applying
 
         Returns:
@@ -112,6 +132,7 @@ class SEOApp:
             proposals_csv=proposals_csv,
             site_name=site_name,
             confidence_threshold=confidence,
+            strict=strict,
             dry_run=dry_run
         )
 
@@ -161,6 +182,93 @@ class SEOApp:
         analyzer = EditorialStrategyAnalyzer()
         return analyzer.run(csv_file)
 
+    def migrate(self, csv_file: str, destination_site: str,
+                create_categories: bool = True, create_tags: bool = True,
+                delete_after: bool = False, status: str = 'draft',
+                output_file: Optional[str] = None,
+                ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts from CSV file to destination site.
+
+        Args:
+            csv_file: Path to CSV file with posts to migrate (must have 'site' and 'post_id' columns)
+            destination_site: Destination site name (mistergeek.net, webscroll.fr, hellogeek.net)
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts ('draft', 'publish', 'pending')
+            output_file: Custom output file path for migration report
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_from_csv(
+            csv_file=csv_file,
+            destination_site=destination_site,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            output_file=output_file,
+            ignore_original_date=ignore_original_date
+        )
+
+    def migrate_by_filter(self, source_site: str, destination_site: str,
+                          category_filter: Optional[List[str]] = None,
+                          tag_filter: Optional[List[str]] = None,
+                          date_after: Optional[str] = None,
+                          date_before: Optional[str] = None,
+                          status_filter: Optional[List[str]] = None,
+                          create_categories: bool = True,
+                          create_tags: bool = True,
+                          delete_after: bool = False,
+                          status: str = 'draft',
+                          limit: Optional[int] = None,
+                          ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts based on filters.
+
+        Args:
+            source_site: Source site name
+            destination_site: Destination site name
+            category_filter: List of category names to filter by
+            tag_filter: List of tag names to filter by
+            date_after: Only migrate posts after this date (YYYY-MM-DD)
+            date_before: Only migrate posts before this date (YYYY-MM-DD)
+            status_filter: List of statuses to filter by (e.g., ['publish', 'draft'])
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts
+            limit: Maximum number of posts to migrate
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts from {source_site} to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_by_filter(
+            source_site=source_site,
+            destination_site=destination_site,
+            category_filter=category_filter,
+            tag_filter=tag_filter,
+            date_after=date_after,
+            date_before=date_before,
+            status_filter=status_filter,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            limit=limit,
+            ignore_original_date=ignore_original_date
+        )
+
     def status(self) -> dict:
         """Get status of output files."""
         files = list(self.output_dir.glob('*.csv'))
@@ -179,6 +287,85 @@ class SEOApp:
 
         return status_info
 
+    def generate_meta_descriptions(self, csv_file: Optional[str] = None,
+                                   output_file: Optional[str] = None,
+                                   only_missing: bool = False,
+                                   only_poor_quality: bool = False,
+                                   limit: Optional[int] = None) -> Tuple[str, Dict]:
+        """
+        Generate AI-optimized meta descriptions for posts.
+
+        Args:
+            csv_file: Path to CSV file with posts (uses latest export if not provided)
+            output_file: Custom output file path for results
+            only_missing: Only generate for posts without meta descriptions
+            only_poor_quality: Only generate for posts with poor quality meta descriptions
+            limit: Maximum number of posts to process
+
+        Returns:
+            Tuple of (output_file_path, summary_dict)
+        """
+        logger.info("✨ Generating AI-optimized meta descriptions...")
+
+        if not csv_file:
+            csv_file = self._find_latest_export()
+
+        if not csv_file:
+            raise FileNotFoundError("No exported posts found. Run export() first or provide a CSV file.")
+
+        logger.info(f"Using file: {csv_file}")
+
+        generator = MetaDescriptionGenerator(csv_file)
+        return generator.run(
+            output_file=output_file,
+            only_missing=only_missing,
+            only_poor_quality=only_poor_quality,
+            limit=limit
+        )
+
+    def update_meta_descriptions(self, site: str,
+                                 post_ids: Optional[List[int]] = None,
+                                 category_names: Optional[List[str]] = None,
+                                 category_ids: Optional[List[int]] = None,
+                                 author_names: Optional[List[str]] = None,
+                                 limit: Optional[int] = None,
+                                 dry_run: bool = False,
+                                 skip_existing: bool = True,
+                                 force_regenerate: bool = False) -> Dict:
+        """
+        Fetch posts from WordPress, generate AI meta descriptions, and update them.
+
+        Args:
+            site: WordPress site name (REQUIRED) - mistergeek.net, webscroll.fr, hellogeek.net
+            post_ids: Specific post IDs to update
+            category_names: Filter by category names
+            category_ids: Filter by category IDs
+            author_names: Filter by author names
+            limit: Maximum number of posts to process
+            dry_run: If True, preview changes without updating
+            skip_existing: If True, skip posts with existing good quality meta descriptions
+            force_regenerate: If True, regenerate even for good quality metas
+
+        Returns:
+            Statistics dict
+        """
+        logger.info(f"🔄 Updating meta descriptions on {site}...")
+
+        if not site:
+            raise ValueError("Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+
+        updater = MetaDescriptionUpdater(site)
+        return updater.run(
+            post_ids=post_ids,
+            category_ids=category_ids,
+            category_names=category_names,
+            author_names=author_names,
+            limit=limit,
+            dry_run=dry_run,
+            skip_existing=skip_existing,
+            force_regenerate=force_regenerate
+        )
+
     def _find_latest_export(self) -> Optional[str]:
         """Find the latest exported CSV file."""
         csv_files = list(self.output_dir.glob('all_posts_*.csv'))
@@ -188,3 +375,136 @@ class SEOApp:
 
         latest = max(csv_files, key=lambda f: f.stat().st_ctime)
         return str(latest)
+
+    def performance(self, ga4_file: Optional[str] = None,
+                    gsc_file: Optional[str] = None,
+                    start_date: Optional[str] = None,
+                    end_date: Optional[str] = None,
+                    output_file: Optional[str] = None) -> Tuple[str, Dict]:
+        """
+        Analyze page performance from GA4 and GSC data.
+
+        Args:
+            ga4_file: Path to GA4 export CSV (or use API if credentials configured)
+            gsc_file: Path to GSC export CSV (or use API if credentials configured)
+            start_date: Start date YYYY-MM-DD (for API mode)
+            end_date: End date YYYY-MM-DD (for API mode)
+            output_file: Custom output file path
+
+        Returns:
+            Tuple of (output_file_path, analysis_dict)
+        """
+        logger.info("📊 Analyzing page performance...")
+
+        # If CSV files provided, use analyzer
+        if ga4_file or gsc_file:
+            analyzer = PerformanceAnalyzer()
+            return analyzer.run(ga4_file=ga4_file, gsc_file=gsc_file, output_file=output_file)
+
+        # Otherwise try API mode
+        tracker = SEOPerformanceTracker()
+        if tracker.ga4_client or tracker.gsc_service:
+            return tracker.run(start_date=start_date, end_date=end_date, output_file=output_file)
+        else:
+            logger.error("No data source available. Provide CSV exports or configure API credentials.")
+            return "", {}
+
+    def keywords(self, gsc_file: str, limit: int = 50) -> List[Dict]:
+        """
+        Analyze keyword opportunities from GSC data.
+
+        Args:
+            gsc_file: Path to GSC export CSV
+            limit: Maximum keywords to return
+
+        Returns:
+            List of keyword opportunity dicts
+        """
+        logger.info("🔍 Analyzing keyword opportunities...")
+
+        analyzer = PerformanceAnalyzer()
+        analyzer.load_gsc_export(gsc_file)
+        analysis = analyzer.analyze()
+
+        opportunities = analysis.get('keyword_opportunities', [])[:limit]
+
+        logger.info(f"Found {len(opportunities)} keyword opportunities")
+
+        return opportunities
+
+    def seo_report(self, output_file: Optional[str] = None) -> str:
+        """
+        Generate comprehensive SEO performance report.
+
+        Args:
+            output_file: Custom output file path
+
+        Returns:
+            Path to report file
+        """
+        logger.info("📄 Generating SEO report...")
+
+        if not output_file:
+            output_dir = Path(__file__).parent.parent.parent / 'output'
+            output_dir.mkdir(parents=True, exist_ok=True)
+            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+            output_file = output_dir / f'seo_report_{timestamp}.md'
+
+        output_file = Path(output_file)
+
+        # Generate report content
+        report = self._generate_report_content()
+
+        # Write report
+        with open(output_file, 'w', encoding='utf-8') as f:
+            f.write(report)
+
+        logger.info(f"✓ Report saved to: {output_file}")
+        return str(output_file)
+
+    def _generate_report_content(self) -> str:
+        """Generate markdown report content."""
+        report = []
+        report.append("# SEO Performance Report\n")
+        report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
+        report.append("---\n")
+
+        # Summary section
+        report.append("## 📊 Summary\n")
+        report.append("This report provides insights into your website's SEO performance.\n")
+
+        # Add analysis sections
+        report.append("## 📈 Traffic Analysis\n")
+        report.append("*Import GA4/GSC data for detailed traffic analysis*\n")
+
+        report.append("## 🔍 Keyword Opportunities\n")
+        report.append("*Import GSC data for keyword analysis*\n")
+
+        report.append("## 📝 SEO Recommendations\n")
+        report.append("1. Review and optimize meta descriptions\n")
+        report.append("2. Improve content for low-ranking pages\n")
+        report.append("3. Build internal links to important pages\n")
+        report.append("4. Monitor keyword rankings regularly\n")
+
+        return "\n".join(report)
+
+    def import_media(self, migration_report: str,
+                     source_site: str = 'mistergeek.net',
+                     destination_site: str = 'hellogeek.net',
+                     dry_run: bool = True) -> Dict:
+        """
+        Import media from source to destination site for migrated posts.
+
+        Args:
+            migration_report: Path to migration report CSV
+            source_site: Source site name
+            destination_site: Destination site name
+            dry_run: If True, preview without importing
+
+        Returns:
+            Statistics dict
+        """
+        logger.info(f"📸 Importing media from {source_site} to {destination_site}...")
+
+        importer = WordPressMediaImporter(source_site, destination_site)
+        return importer.run_from_migration_report(migration_report, dry_run=dry_run)
|||||||
@@ -132,12 +132,33 @@ class WordPressCategoryManager:
|
|||||||
}
|
}
|
||||||
|
|
||||||
return category_data['id']
|
return category_data['id']
|
||||||
elif response.status_code == 409:
|
elif response.status_code == 400:
|
||||||
# Category already exists
|
# Category might already exist - search for it
|
||||||
logger.info(f" Category '{category_name}' already exists")
|
error_data = response.json()
|
||||||
existing = response.json()
|
if error_data.get('code') == 'term_exists':
|
||||||
if isinstance(existing, list) and len(existing) > 0:
|
term_id = error_data.get('data', {}).get('term_id')
|
||||||
return existing[0]['id']
|
if term_id:
|
||||||
|
logger.info(f" Category '{category_name}' already exists (ID: {term_id})")
|
||||||
|
|
||||||
|
# Fetch the category details
|
||||||
|
cat_response = requests.get(
|
||||||
|
f"{base_url}/wp-json/wp/v2/categories/{term_id}",
|
||||||
|
auth=auth,
|
||||||
|
timeout=10
|
||||||
|
)
|
||||||
|
if cat_response.status_code == 200:
|
||||||
|
cat_data = cat_response.json()
|
||||||
|
# Update cache
|
||||||
|
if site_name in self.category_cache:
|
||||||
|
self.category_cache[site_name][cat_data['slug']] = {
|
||||||
|
'id': cat_data['id'],
|
||||||
|
'name': cat_data['name'],
|
||||||
|
'slug': cat_data['slug'],
|
||||||
|
'count': cat_data.get('count', 0)
|
||||||
|
}
|
||||||
|
return cat_data['id']
|
||||||
|
|
||||||
|
logger.warning(f" Category already exists or error: {error_data}")
|
||||||
return None
|
return None
|
||||||
else:
|
else:
|
||||||
logger.error(f"Error creating category: {response.status_code} - {response.text}")
|
logger.error(f"Error creating category: {response.status_code} - {response.text}")
|
||||||
@@ -164,21 +185,42 @@ class WordPressCategoryManager:
         if site_name not in self.category_cache:
             self.fetch_categories(site_name)
 
-        # Check if category exists
-        slug = category_name.lower().replace(' ', '-').replace('/', '-')
+        # Check if category exists (by exact name first)
         categories = self.category_cache.get(site_name, {})
+
+        # Try exact name match (case-insensitive)
+        category_name_lower = category_name.lower()
+        for slug, cat_data in categories.items():
+            if cat_data['name'].lower() == category_name_lower:
+                logger.info(f"✓ Found existing category '{category_name}' (ID: {cat_data['id']})")
+                return cat_data['id']
+
+        # Try slug match
+        slug = category_name.lower().replace(' ', '-').replace('/', '-')
         if slug in categories:
             logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[slug]['id']})")
             return categories[slug]['id']
 
-        # Try alternative slug formats
-        alt_slug = category_name.lower().replace(' ', '-')
-        if alt_slug in categories:
-            logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[alt_slug]['id']})")
-            return categories[alt_slug]['id']
+        # Try alternative slug formats (handle French characters)
+        import unicodedata
+        normalized_slug = unicodedata.normalize('NFKD', slug)\
+            .encode('ascii', 'ignore')\
+            .decode('ascii')\
+            .lower()\
+            .replace(' ', '-')
+
+        if normalized_slug in categories:
+            logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[normalized_slug]['id']})")
+            return categories[normalized_slug]['id']
+
+        # Try partial match (if slug contains the category name)
+        for slug, cat_data in categories.items():
+            if category_name_lower in cat_data['name'].lower() or cat_data['name'].lower() in category_name_lower:
+                logger.info(f"✓ Found similar category '{cat_data['name']}' (ID: {cat_data['id']})")
+                return cat_data['id']
 
         # Create new category
+        logger.info(f"Creating new category '{category_name}'...")
         return self.create_category(site_name, category_name, description)
 
     def assign_post_to_category(self, site_name: str, post_id: int,
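The NFKD fallback above exists so accented French category names match their ASCII slugs. Reduced to a standalone helper, the same chain behaves like this:

```python
import unicodedata

def normalize_slug(slug: str) -> str:
    """Strip accents (NFKD decomposition + ASCII filter) and hyphenate spaces."""
    return unicodedata.normalize('NFKD', slug)\
        .encode('ascii', 'ignore')\
        .decode('ascii')\
        .lower()\
        .replace(' ', '-')

print(normalize_slug('Sécurité Informatique'))  # securite-informatique
```

NFKD splits each accented character into a base letter plus a combining mark; `encode('ascii', 'ignore')` then drops the marks, which is why 'é' survives as plain 'e' instead of disappearing entirely.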
@@ -292,14 +334,16 @@ class CategoryAssignmentProcessor:
 
     def process_proposals(self, proposals: List[Dict], site_name: str,
                           confidence_threshold: str = 'Medium',
+                          strict: bool = False,
                           dry_run: bool = False) -> Dict[str, int]:
         """
         Process AI category proposals and apply to WordPress.
 
         Args:
             proposals: List of proposal dicts from CSV
-            site_name: Site to apply changes to
+            site_name: Site to apply changes to (filters proposals)
             confidence_threshold: Minimum confidence to apply (High, Medium, Low)
+            strict: If True, only match exact confidence level
             dry_run: If True, don't actually make changes
 
         Returns:
@@ -312,7 +356,23 @@ class CategoryAssignmentProcessor:
         if dry_run:
             logger.info("DRY RUN - No changes will be made")
 
+        # Filter by site
+        original_count = len(proposals)
+        proposals = [p for p in proposals if p.get('current_site', '') == site_name]
+        filtered_by_site = original_count - len(proposals)
+
+        logger.info(f"Filtered to {len(proposals)} posts on {site_name} ({filtered_by_site} excluded from other sites)")
+
         # Filter by confidence
+        if strict:
+            # Exact match only
+            filtered_proposals = [
+                p for p in proposals
+                if p.get('category_confidence', 'Medium') == confidence_threshold
+            ]
+            logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence = {confidence_threshold}, strict mode)")
+        else:
+            # Medium or better (default behavior)
         confidence_order = {'High': 3, 'Medium': 2, 'Low': 1}
         min_confidence = confidence_order.get(confidence_threshold, 2)
 
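The two filtering modes introduced in this hunk — rank-based by default, exact label in `--strict` mode — can be exercised on their own. This is a minimal sketch of that logic, not the project's actual method:

```python
CONFIDENCE_ORDER = {'High': 3, 'Medium': 2, 'Low': 1}

def filter_by_confidence(proposals, threshold='Medium', strict=False):
    """Keep proposals at or above the threshold; in strict mode, only exact matches."""
    if strict:
        return [p for p in proposals
                if p.get('category_confidence', 'Medium') == threshold]
    min_rank = CONFIDENCE_ORDER.get(threshold, 2)
    return [p for p in proposals
            if CONFIDENCE_ORDER.get(p.get('category_confidence', 'Medium'), 2) >= min_rank]

rows = [{'category_confidence': c} for c in ('High', 'Medium', 'Low')]
print(len(filter_by_confidence(rows, 'Medium')))                # 2  (High + Medium)
print(len(filter_by_confidence(rows, 'Medium', strict=True)))   # 1  (Medium only)
```

Note the shared default of `'Medium'` for missing labels: a proposal with no `category_confidence` column passes the default threshold but would be excluded by `--strict High`.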
@@ -320,28 +380,36 @@ class CategoryAssignmentProcessor:
             p for p in proposals
             if confidence_order.get(p.get('category_confidence', 'Medium'), 2) >= min_confidence
         ]
 
         logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence >= {confidence_threshold})")
 
+        # Show breakdown
+        high_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'High')
+        medium_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Medium')
+        low_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Low')
+        logger.info(f"  Breakdown: High={high_count}, Medium={medium_count}, Low={low_count}")
+
         # Fetch existing categories
         self.category_manager.fetch_categories(site_name)
 
         # Process each proposal
         for i, proposal in enumerate(filtered_proposals, 1):
-            logger.info(f"\n[{i}/{len(filtered_proposals)}] Processing post {proposal.get('post_id')}...")
-            post_id = int(proposal.get('post_id', 0))
+            post_title = proposal.get('title', 'Unknown')[:60]
+            post_id = proposal.get('post_id', '')
             proposed_category = proposal.get('proposed_category', '')
             current_categories = proposal.get('current_categories', '')
             confidence = proposal.get('category_confidence', 'Medium')
 
+            logger.info(f"\n[{i}/{len(filtered_proposals)}] Post {post_id}: {post_title}...")
+            logger.info(f"  Current categories: {current_categories}")
+            logger.info(f"  Proposed: {proposed_category} (confidence: {confidence})")
+
             if not post_id or not proposed_category:
                 logger.warning("  Skipping: Missing post_id or proposed_category")
                 self.processing_stats['errors'] += 1
                 continue
 
             if dry_run:
-                logger.info(f"  Would assign to: {proposed_category}")
+                logger.info(f"  [DRY RUN] Would assign to: {proposed_category}")
                 continue
 
             # Get or create the category
@@ -362,9 +430,10 @@ class CategoryAssignmentProcessor:
                 logger.info(f"  ✓ Assigned to '{proposed_category}'")
             else:
                 self.processing_stats['errors'] += 1
+                logger.error(f"  ✗ Failed to assign category")
         else:
             self.processing_stats['errors'] += 1
-            logger.error(f"  Failed to get/create category '{proposed_category}'")
+            logger.error(f"  ✗ Failed to get/create category '{proposed_category}'")
 
         self.processing_stats['total_posts'] = len(filtered_proposals)
 
@@ -381,6 +450,7 @@ class CategoryAssignmentProcessor:
 
     def run(self, proposals_csv: str, site_name: str,
             confidence_threshold: str = 'Medium',
+            strict: bool = False,
             dry_run: bool = False) -> Dict[str, int]:
         """
         Run complete category assignment process.
@@ -389,6 +459,7 @@ class CategoryAssignmentProcessor:
             proposals_csv: Path to proposals CSV
             site_name: Site to apply changes to
             confidence_threshold: Minimum confidence to apply
+            strict: If True, only match exact confidence level
             dry_run: If True, preview changes without applying
 
         Returns:
@@ -404,5 +475,6 @@ class CategoryAssignmentProcessor:
             proposals,
             site_name,
             confidence_threshold,
-            dry_run
+            strict=strict,
+            dry_run=dry_run
         )
@@ -164,7 +164,7 @@ class CategoryProposer:
         logger.info("\n📊 Analyzing editorial strategy to inform category proposals...")
 
         analyzer = EditorialStrategyAnalyzer()
-        analyzer.load_csv(str(self.csv_file))
+        analyzer.load_posts(str(self.csv_file))
         self.site_analysis = analyzer.analyze_site_content()
 
         logger.info("✓ Editorial strategy analysis complete")
450  src/seo/cli.py
@@ -47,6 +47,48 @@ Examples:
     parser.add_argument('--site', '-s', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
                         help='WordPress site for category operations')
     parser.add_argument('--description', '-d', help='Category description')
+    parser.add_argument('--strict', action='store_true', help='Strict confidence matching (exact match only)')
+
+    # Export arguments
+    parser.add_argument('--author', nargs='+', help='Filter by author name(s) for export')
+    parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s) for export')
+
+    # Migration arguments
+    parser.add_argument('--destination', '--to', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Destination site for migration')
+    parser.add_argument('--source', '--from', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Source site for filtered migration')
+    parser.add_argument('--keep-source', action='store_true', help='Keep posts on source site (default: delete after migration)')
+    parser.add_argument('--post-status', choices=['draft', 'publish', 'pending'], default='draft',
+                        help='Status for migrated posts (default: draft)')
+    parser.add_argument('--no-categories', action='store_true', help='Do not create categories automatically')
+    parser.add_argument('--no-tags', action='store_true', help='Do not create tags automatically')
+    parser.add_argument('--category-filter', nargs='+', help='Filter by category names (for filtered migration)')
+    parser.add_argument('--tag-filter', nargs='+', help='Filter by tag names (for filtered migration)')
+    parser.add_argument('--date-after', help='Migrate posts after this date (YYYY-MM-DD)')
+    parser.add_argument('--date-before', help='Migrate posts before this date (YYYY-MM-DD)')
+    parser.add_argument('--limit', type=int, help='Limit number of posts to migrate')
+    parser.add_argument('--ignore-original-date', action='store_true', help='Use current date instead of original post date')
+
+    # Meta description arguments
+    parser.add_argument('--only-missing', action='store_true', help='Only generate for posts without meta descriptions')
+    parser.add_argument('--only-poor', action='store_true', help='Only generate for posts with poor quality meta descriptions')
+
+    # Update meta arguments
+    parser.add_argument('--post-ids', type=int, nargs='+', help='Specific post IDs to update')
+    parser.add_argument('--category', nargs='+', help='Filter by category name(s)')
+    parser.add_argument('--category-id', type=int, nargs='+', help='Filter by category ID(s)')
+    parser.add_argument('--force', action='store_true', help='Force regenerate even for good quality meta descriptions')
+
+    # Performance arguments
+    parser.add_argument('--ga4', help='Path to Google Analytics 4 export CSV')
+    parser.add_argument('--gsc', help='Path to Google Search Console export CSV')
+    parser.add_argument('--start-date', help='Start date YYYY-MM-DD (for API mode)')
+    parser.add_argument('--end-date', help='End date YYYY-MM-DD (for API mode)')
+
+    # Media import arguments
+    parser.add_argument('--from-site', help='Source site for media import (default: mistergeek.net)')
+    parser.add_argument('--to-site', help='Destination site for media import (default: hellogeek.net)')
 
     args = parser.parse_args()
 
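The new `--author` and `--author-id` flags rely on argparse's `nargs='+'`, which collects one or more space-separated values into a list (and `type=int` converts each element). A self-contained sketch of just those two arguments:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--author', nargs='+', help='Filter by author name(s)')
parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s)')

# Quoting keeps "John Doe" as a single list element on the shell side
args = parser.parse_args(['--author', 'John Doe', 'Jane Smith', '--author-id', '1', '2'])
print(args.author)     # ['John Doe', 'Jane Smith']
print(args.author_id)  # [1, 2]
```

Argparse maps the dashed flag `--author-id` to the attribute `args.author_id`, which is why the dispatch code later reads `args.author_id` rather than a dashed name.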
@@ -72,6 +114,13 @@ Examples:
         'category_apply': cmd_category_apply,
         'category_create': cmd_category_create,
         'editorial_strategy': cmd_editorial_strategy,
+        'migrate': cmd_migrate,
+        'meta_description': cmd_meta_description,
+        'update_meta': cmd_update_meta,
+        'performance': cmd_performance,
+        'keywords': cmd_keywords,
+        'report': cmd_report,
+        'import_media': cmd_import_media,
         'status': cmd_status,
         'help': cmd_help,
     }
@@ -103,8 +152,19 @@ def cmd_export(app, args):
     """Export all posts."""
     if args.dry_run:
         print("Would export all posts from WordPress sites")
+        if args.author:
+            print(f"  Author filter: {args.author}")
+        if args.author_id:
+            print(f"  Author ID filter: {args.author_id}")
         return 0
-    app.export()
+
+    result = app.export(
+        author_filter=args.author,
+        author_ids=args.author_id,
+        site_filter=args.site
+    )
+    if result:
+        print(f"✅ Export completed! Output: {result}")
     return 0
 
 
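`app.export(author_filter=...)` is documented above as matching author names case-insensitively and partially. One way that matching can be implemented — a sketch under those stated semantics, not the project's actual code — is:

```python
def matches_author(post_author: str, author_filters) -> bool:
    """Case-insensitive partial match of a post's author against any requested name."""
    if not author_filters:
        return True  # no filter supplied: keep every post
    author = post_author.lower()
    return any(f.lower() in author for f in author_filters)

print(matches_author('John Doe', ['admin', 'john']))  # True  ('john' is a partial match)
print(matches_author('Jane Smith', ['admin']))        # False
```

Partial matching is what makes `--author admin` catch authors like "Site Admin", per the usage guide at the top of this page.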
@@ -160,6 +220,8 @@ def cmd_category_apply(app, args):
         print("Would apply category proposals to WordPress")
         print(f"  Site: {args.site}")
         print(f"  Confidence: {args.confidence}")
+        if args.strict:
+            print(f"  Strict mode: Yes (exact match only)")
         return 0
 
     if not args.site:
if not args.site:
|
if not args.site:
|
||||||
@@ -180,11 +242,14 @@ def cmd_category_apply(app, args):
|
|||||||
print(f"Applying categories from: {proposals_csv}")
|
print(f"Applying categories from: {proposals_csv}")
|
||||||
print(f"Site: {args.site}")
|
print(f"Site: {args.site}")
|
||||||
print(f"Confidence threshold: {args.confidence}")
|
print(f"Confidence threshold: {args.confidence}")
|
||||||
|
if args.strict:
|
||||||
|
print(f"Strict mode: Yes (exact match only)")
|
||||||
|
|
||||||
stats = app.category_apply(
|
stats = app.category_apply(
|
||||||
proposals_csv=proposals_csv,
|
proposals_csv=proposals_csv,
|
||||||
site_name=args.site,
|
site_name=args.site,
|
||||||
confidence=args.confidence,
|
confidence=args.confidence,
|
||||||
|
strict=args.strict,
|
||||||
dry_run=False
|
dry_run=False
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -253,6 +318,196 @@ def cmd_editorial_strategy(app, args):
     return 0
 
 
+def cmd_migrate(app, args):
+    """Migrate posts between websites."""
+    if args.dry_run:
+        print("Would migrate posts between websites")
+        if args.destination:
+            print(f"  Destination: {args.destination}")
+        if args.source:
+            print(f"  Source: {args.source}")
+        return 0
+
+    # Validate required arguments
+    if not args.destination:
+        print("❌ Destination site required. Use --destination mistergeek.net|webscroll.fr|hellogeek.net")
+        return 1
+
+    delete_after = not args.keep_source
+    create_categories = not args.no_categories
+    create_tags = not args.no_tags
+
+    # Check if using filtered migration or CSV-based migration
+    if args.source:
+        # Filtered migration
+        print(f"Migrating posts from {args.source} to {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+        if args.category_filter:
+            print(f"Category filter: {args.category_filter}")
+        if args.tag_filter:
+            print(f"Tag filter: {args.tag_filter}")
+        if args.date_after:
+            print(f"Date after: {args.date_after}")
+        if args.date_before:
+            print(f"Date before: {args.date_before}")
+        if args.limit:
+            print(f"Limit: {args.limit}")
+
+        result = app.migrate_by_filter(
+            source_site=args.source,
+            destination_site=args.destination,
+            category_filter=args.category_filter,
+            tag_filter=args.tag_filter,
+            date_after=args.date_after,
+            date_before=args.date_before,
+            status_filter=None,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            limit=args.limit,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f"  Report: {result}")
+    else:
+        # CSV-based migration
+        csv_file = args.args[0] if args.args else None
+
+        if not csv_file:
+            print("❌ CSV file required. Provide path to CSV with 'site' and 'post_id' columns")
+            print("  Usage: seo migrate <csv_file> --destination <site>")
+            print("  Or use filtered migration: seo migrate --source <site> --destination <site>")
+            return 1
+
+        print(f"Migrating posts from CSV: {csv_file}")
+        print(f"Destination: {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+
+        result = app.migrate(
+            csv_file=csv_file,
+            destination_site=args.destination,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            output_file=args.output,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f"  Report: {result}")
+
+    return 0
+
+
+def cmd_meta_description(app, args):
+    """Generate AI-optimized meta descriptions."""
+    if args.dry_run:
+        print("Would generate AI-optimized meta descriptions")
+        if args.only_missing:
+            print("  Filter: Only posts without meta descriptions")
+        if args.only_poor:
+            print("  Filter: Only posts with poor quality meta descriptions")
+        if args.limit:
+            print(f"  Limit: {args.limit} posts")
+        return 0
+
+    csv_file = args.args[0] if args.args else None
+
+    print("Generating AI-optimized meta descriptions...")
+    if args.only_missing:
+        print("  Filter: Only posts without meta descriptions")
+    elif args.only_poor:
+        print("  Filter: Only posts with poor quality meta descriptions")
+    if args.limit:
+        print(f"  Limit: {args.limit} posts")
+
+    output_file, summary = app.generate_meta_descriptions(
+        csv_file=csv_file,
+        output_file=args.output,
+        only_missing=args.only_missing,
+        only_poor_quality=args.only_poor,
+        limit=args.limit
+    )
+
+    if output_file and summary:
+        print(f"\n✅ Meta description generation completed!")
+        print(f"  Results: {output_file}")
+        print(f"\n📊 Summary:")
+        print(f"  Total processed: {summary.get('total_posts', 0)}")
+        print(f"  Improved: {summary.get('improved', 0)} ({summary.get('improvement_rate', 0):.1f}%)")
+        print(f"  Optimal length: {summary.get('optimal_length_count', 0)} ({summary.get('optimal_length_rate', 0):.1f}%)")
+        print(f"  Average score: {summary.get('average_score', 0):.1f}")
+        print(f"  API calls: {summary.get('api_calls', 0)}")
+    return 0
+
+
+def cmd_update_meta(app, args):
+    """Fetch, generate, and update meta descriptions directly on WordPress."""
+    if args.dry_run:
+        print("Would update meta descriptions on WordPress")
+        if not args.site:
+            print("  ❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+            return 1
+        print(f"  Site: {args.site}")
+        if args.post_ids:
+            print(f"  Post IDs: {args.post_ids}")
+        if args.category:
+            print(f"  Categories: {args.category}")
+        if args.author:
+            print(f"  Authors: {args.author}")
+        if args.limit:
+            print(f"  Limit: {args.limit} posts")
+        return 0
+
+    # Site is required
+    if not args.site:
+        print("❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+        return 1
+
+    print(f"Updating meta descriptions on {args.site}...")
+    if args.post_ids:
+        print(f"  Post IDs: {args.post_ids}")
+    if args.category:
+        print(f"  Categories: {args.category}")
+    if args.author:
+        print(f"  Authors: {args.author}")
+    if args.category_id:
+        print(f"  Category IDs: {args.category_id}")
+    if args.limit:
+        print(f"  Limit: {args.limit} posts")
+    print(f"  Skip existing: {not args.force}")
+    print(f"  Dry run: {args.dry_run}")
+
+    stats = app.update_meta_descriptions(
+        site=args.site,
+        post_ids=args.post_ids,
+        category_names=args.category,
+        category_ids=args.category_id,
+        author_names=args.author,
+        limit=args.limit,
+        dry_run=args.dry_run,
+        skip_existing=not args.force,
+        force_regenerate=args.force
+    )
+
+    if stats:
+        print(f"\n✅ Meta description update completed!")
+        print(f"\n📊 Summary:")
+        print(f"  Total posts: {stats.get('total_posts', 0)}")
+        print(f"  Updated: {stats.get('updated', 0)}")
+        print(f"  Failed: {stats.get('failed', 0)}")
+        print(f"  Skipped: {stats.get('skipped', 0)}")
+        print(f"  API calls: {stats.get('api_calls', 0)}")
+    return 0
+
+
 def cmd_status(app, args):
     """Show status."""
     if args.dry_run:
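`cmd_migrate` forwards `--date-after`/`--date-before` to `migrate_by_filter` as plain `YYYY-MM-DD` strings. How such a window is applied to a post date is internal to the app; one plausible sketch (inclusive bounds assumed, and the `post_date` field name is illustrative) looks like this:

```python
from datetime import date

def in_date_window(post_date: str, date_after=None, date_before=None) -> bool:
    """Keep posts whose date falls inside the optional [date_after, date_before] window."""
    d = date.fromisoformat(post_date[:10])  # tolerate full 'YYYY-MM-DDTHH:MM:SS' timestamps
    if date_after and d < date.fromisoformat(date_after):
        return False
    if date_before and d > date.fromisoformat(date_before):
        return False
    return True

print(in_date_window('2024-06-15T08:30:00', date_after='2024-01-01'))  # True
print(in_date_window('2023-12-31', date_after='2024-01-01'))           # False
```

Whether the real implementation treats the bounds as inclusive or exclusive is not shown in this diff; the sketch picks inclusive bounds as the more common convention.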
@@ -272,6 +527,123 @@ def cmd_status(app, args):
     return 0
 
 
+def cmd_performance(app, args):
+    """Analyze page performance from GA4 and GSC data."""
+    if args.dry_run:
+        print("Would analyze page performance")
+        if args.ga4:
+            print(f"  GA4 file: {args.ga4}")
+        if args.gsc:
+            print(f"  GSC file: {args.gsc}")
+        return 0
+
+    print("Analyzing page performance...")
+
+    output_file, analysis = app.performance(
+        ga4_file=args.ga4,
+        gsc_file=args.gsc,
+        start_date=args.start_date,
+        end_date=args.end_date,
+        output_file=args.output
+    )
+
+    if output_file and analysis:
+        print(f"\n✅ Performance analysis completed!")
+        print(f"  Results: {output_file}")
+        print(f"\n📊 Summary:")
+        summary = analysis.get('summary', {})
+        print(f"  Total pages: {summary.get('total_pages', 0)}")
+        print(f"  Total pageviews: {summary.get('total_pageviews', 0)}")
+        print(f"  Total clicks: {summary.get('total_clicks', 0)}")
+        print(f"  Average CTR: {summary.get('average_ctr', 0):.2%}")
+        print(f"  Average position: {summary.get('average_position', 0):.1f}")
+    return 0
+
+
+def cmd_keywords(app, args):
+    """Analyze keyword opportunities from GSC data."""
+    if args.dry_run:
+        print("Would analyze keyword opportunities")
+        if args.args:
+            print(f"  GSC file: {args.args[0]}")
+        return 0
+
+    gsc_file = args.args[0] if args.args else None
+
+    if not gsc_file:
+        print("❌ GSC export file required")
+        print("  Usage: seo keywords <gsc_export.csv>")
+        return 1
+
+    print(f"Analyzing keyword opportunities from {gsc_file}...")
+
+    opportunities = app.keywords(gsc_file=gsc_file, limit=args.limit or 50)
+
+    if opportunities:
+        print(f"\n✅ Found {len(opportunities)} keyword opportunities!")
+        print(f"\nTop opportunities:")
+        for i, kw in enumerate(opportunities[:10], 1):
+            print(f"  {i}. {kw['query']} - Position: {kw['position']:.1f}, Impressions: {kw['impressions']}")
+    return 0
+
+
+def cmd_report(app, args):
+    """Generate comprehensive SEO performance report."""
+    if args.dry_run:
+        print("Would generate SEO performance report")
+        return 0
+
+    print("Generating SEO performance report...")
+
+    report_file = app.seo_report(output_file=args.output)
+
+    if report_file:
+        print(f"\n✅ Report generated!")
+        print(f"  Report: {report_file}")
+    return 0
+
+
+def cmd_import_media(app, args):
+    """Import media from source to destination site for migrated posts."""
+    if args.dry_run:
+        print("Would import media")
+        print(f"  Source: {args.from_site or 'mistergeek.net'}")
+        print(f"  Destination: {args.to_site or 'hellogeek.net'}")
+        if args.args:
+            print(f"  Migration report: {args.args[0]}")
+        return 0
+
+    migration_report = args.args[0] if args.args else None
+
+    if not migration_report:
+        print("❌ Migration report CSV required")
+        print("  Usage: seo import_media <migration_report.csv>")
+        return 1
+
+    source_site = args.from_site or 'mistergeek.net'
+    dest_site = args.to_site or 'hellogeek.net'
+
+    print(f"Importing media from {source_site} to {dest_site}...")
+    print(f"Migration report: {migration_report}")
+
+    stats = app.import_media(
+        migration_report=migration_report,
+        source_site=source_site,
+        destination_site=dest_site,
+        dry_run=False
+    )
+
+    if stats:
+        print(f"\n✅ Media import completed!")
+        print(f"\n📊 Summary:")
+        print(f"  Total posts: {stats.get('total_posts', 0)}")
+        print(f"  Posts with media: {stats.get('posts_with_media', 0)}")
+        print(f"  Images uploaded: {stats.get('images_uploaded', 0)}")
+        print(f"  Featured images set: {stats.get('featured_images_set', 0)}")
+        print(f"  Errors: {stats.get('errors', 0)}")
+    return 0
+
+
 def cmd_help(app, args):
     """Show help."""
     print("""
@@ -279,10 +651,18 @@ SEO Automation CLI - Available Commands
 
 Export & Analysis:
   export                            Export all posts from WordPress sites
+  export --author "John Doe"        Export posts by specific author
+  export --author-id 1 2            Export posts by author IDs
+  export -s mistergeek.net          Export from specific site only
   analyze [csv_file]                Analyze posts with AI
   analyze -f title                  Analyze specific fields (title, meta_description, categories, site)
   analyze -u                        Update input CSV with new columns (creates backup)
   category_propose [csv]            Propose categories based on content
+  meta_description [csv]            Generate AI-optimized meta descriptions
+  meta_description --only-missing   Generate only for posts without meta descriptions
+  update_meta --site <site>         Fetch, generate, and update meta on WordPress
+  update_meta --site A --post-ids 1 2 3   Update specific posts
+  update_meta --site A --category "VPN"   Update posts in category
 
 Category Management:
   category_apply [csv]              Apply AI category proposals to WordPress
@@ -293,11 +673,61 @@ Category Management:
 Strategy & Migration:
   editorial_strategy [csv]          Analyze editorial lines and recommend migrations
   editorial_strategy                Get migration recommendations between sites
+  migrate <csv> --destination <site>             Migrate posts from CSV to destination site
+  migrate --source <site> --destination <site>   Migrate posts with filters
+  migrate --source A --to B --category-filter "VPN"   Migrate specific categories
+  migrate --source A --to B --date-after 2024-01-01 --limit 10
 
 Utility:
   status                            Show output files status
+  performance [ga4.csv] [gsc.csv]   Analyze page performance
+  performance --ga4 analytics.csv --gsc search.csv   Analyze with both sources
+  keywords <gsc.csv>                Show keyword opportunities
+  report                            Generate SEO performance report
+  import_media <report.csv>         Import media for migrated posts
   help                              Show this help message
 
+Export Options:
+  --author          Filter by author name(s) (case-insensitive, partial match)
+  --author-id       Filter by author ID(s)
+  --site, -s        Export from specific site only
+
+Meta Description Options:
+  --only-missing    Only generate for posts without meta descriptions
+  --only-poor       Only generate for posts with poor quality meta descriptions
+  --limit           Limit number of posts to process
+  --output, -o      Custom output file path
+
+Update Meta Options:
+  --site, -s        WordPress site (REQUIRED): mistergeek.net, webscroll.fr, hellogeek.net
+  --post-ids        Specific post IDs to update
+  --category        Filter by category name(s)
+  --category-id     Filter by category ID(s)
|
||||||
|
--author Filter by author name(s)
|
||||||
|
--force Force regenerate even for good quality meta descriptions
|
||||||
|
|
||||||
|
Performance Options:
|
||||||
|
--ga4 Path to Google Analytics 4 export CSV
|
||||||
|
--gsc Path to Google Search Console export CSV
|
||||||
|
--start-date Start date YYYY-MM-DD (for API mode)
|
||||||
|
--end-date End date YYYY-MM-DD (for API mode)
|
||||||
|
--limit Limit number of results
|
||||||
|
|
||||||
|
Migration Options:
|
||||||
|
--destination, --to Destination site: mistergeek.net, webscroll.fr, hellogeek.net
|
||||||
|
--source, --from Source site for filtered migration
|
||||||
|
--keep-source Keep posts on source site (default: delete after migration)
|
||||||
|
--post-status Status for migrated posts: draft, publish, pending (default: draft)
|
||||||
|
--no-categories Do not create categories automatically
|
||||||
|
--no-tags Do not create tags automatically
|
||||||
|
--category-filter Filter by category names (for filtered migration)
|
||||||
|
--tag-filter Filter by tag names (for filtered migration)
|
||||||
|
--date-after Migrate posts after this date (YYYY-MM-DD)
|
||||||
|
--date-before Migrate posts before this date (YYYY-MM-DD)
|
||||||
|
--limit Limit number of posts to migrate
|
||||||
|
--ignore-original-date Use current date instead of original post date
|
||||||
|
--output, -o Custom output file path for migration report
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
--verbose, -v Enable verbose logging
|
--verbose, -v Enable verbose logging
|
||||||
--dry-run Show what would be done without doing it
|
--dry-run Show what would be done without doing it
|
||||||
@@ -307,14 +737,32 @@ Options:
|
|||||||
--confidence, -c Confidence threshold: High, Medium, Low
|
--confidence, -c Confidence threshold: High, Medium, Low
|
||||||
--site, -s WordPress site: mistergeek.net, webscroll.fr, hellogeek.net
|
--site, -s WordPress site: mistergeek.net, webscroll.fr, hellogeek.net
|
||||||
--description, -d Category description
|
--description, -d Category description
|
||||||
|
--strict Strict confidence matching (exact match only, not "or better")
|
||||||
|
|
||||||
Examples:
|
Examples:
|
||||||
seo export
|
seo export
|
||||||
|
seo export --author "John Doe"
|
||||||
|
seo export --author-id 1 2
|
||||||
|
seo export -s mistergeek.net --author "admin"
|
||||||
seo analyze -f title categories
|
seo analyze -f title categories
|
||||||
seo category_propose
|
seo category_propose
|
||||||
seo category_apply -s mistergeek.net -c Medium
|
seo category_apply -s mistergeek.net -c Medium
|
||||||
seo category_create -s webscroll.fr "Torrent Clients"
|
seo category_create -s webscroll.fr "Torrent Clients"
|
||||||
seo editorial_strategy
|
seo editorial_strategy
|
||||||
|
seo migrate posts_to_migrate.csv --destination mistergeek.net
|
||||||
|
seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN
|
||||||
|
seo migrate --source A --to B --date-after 2024-01-01 --limit 10 --keep-source
|
||||||
|
seo meta_description # Generate for all posts
|
||||||
|
seo meta_description --only-missing # Generate only for posts without meta
|
||||||
|
seo meta_description --only-poor --limit 10 # Fix 10 poor quality metas
|
||||||
|
seo update_meta --site mistergeek.net # Update all posts on site
|
||||||
|
seo update_meta --site A --post-ids 1 2 3 # Update specific posts
|
||||||
|
seo update_meta --site A --category "VPN" --limit 10 # Update 10 posts in category
|
||||||
|
seo update_meta --site A --author "john" --limit 10 # Update 10 posts by author
|
||||||
|
seo update_meta --site A --dry-run # Preview changes
|
||||||
|
seo performance --ga4 analytics.csv --gsc search.csv # Analyze performance
|
||||||
|
seo keywords gsc_export.csv # Show keyword opportunities
|
||||||
|
seo report # Generate SEO report
|
||||||
seo status
|
seo status
|
||||||
""")
|
""")
|
||||||
return 0
|
return 0
|
||||||
|
@@ -20,11 +20,21 @@ logger = logging.getLogger(__name__)
 
 class PostExporter:
     """Export posts from WordPress sites to CSV."""
 
-    def __init__(self):
-        """Initialize the exporter."""
+    def __init__(self, author_filter: Optional[List[str]] = None,
+                 author_ids: Optional[List[int]] = None):
+        """
+        Initialize the exporter.
+
+        Args:
+            author_filter: List of author names to filter by (case-insensitive)
+            author_ids: List of author IDs to filter by
+        """
         self.sites = Config.WORDPRESS_SITES
         self.all_posts = []
         self.category_cache = {}
+        self.author_filter = author_filter
+        self.author_ids = author_ids
+        self.author_cache = {}  # Cache author info by site
 
     def fetch_category_names(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
         """Fetch category names from a WordPress site."""
@@ -50,8 +60,55 @@ class PostExporter:
         self.category_cache[site_name] = categories
         return categories
 
-    def fetch_posts_from_site(self, site_name: str, site_config: Dict) -> List[Dict]:
-        """Fetch all posts from a WordPress site."""
+    def fetch_authors(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
+        """
+        Fetch all authors/users from a WordPress site.
+
+        Returns:
+            Dict mapping author ID to author data (name, slug)
+        """
+        if site_name in self.author_cache:
+            return self.author_cache[site_name]
+
+        logger.info(f"  Fetching authors from {site_name}...")
+        authors = {}
+        base_url = site_config['url'].rstrip('/')
+        api_url = f"{base_url}/wp-json/wp/v2/users"
+        auth = HTTPBasicAuth(site_config['username'], site_config['password'])
+
+        try:
+            response = requests.get(api_url, params={'per_page': 100}, auth=auth, timeout=10)
+            response.raise_for_status()
+
+            for user in response.json():
+                authors[user['id']] = {
+                    'id': user['id'],
+                    'name': user.get('name', ''),
+                    'slug': user.get('slug', ''),
+                    'description': user.get('description', '')
+                }
+            logger.info(f"  ✓ Fetched {len(authors)} authors")
+        except Exception as e:
+            logger.warning(f"  Could not fetch authors from {site_name}: {e}")
+            # Fallback: create empty dict if authors can't be fetched
+            # Author IDs will still be exported, just without names
+
+        self.author_cache[site_name] = authors
+        return authors
+
+    def fetch_posts_from_site(self, site_name: str, site_config: Dict,
+                              authors_map: Optional[Dict[int, Dict]] = None) -> List[Dict]:
+        """
+        Fetch all posts from a WordPress site.
+
+        Args:
+            site_name: Site name
+            site_config: Site configuration
+            authors_map: Optional authors mapping for filtering
+
+        Returns:
+            List of post data
+        """
         logger.info(f"\nFetching posts from {site_name}...")
 
         posts = []
@@ -59,14 +116,23 @@ class PostExporter:
         api_url = f"{base_url}/wp-json/wp/v2/posts"
         auth = HTTPBasicAuth(site_config['username'], site_config['password'])
 
+        # Build base params
+        base_params = {'page': 1, 'per_page': 100, '_embed': True}
+
+        # Add author filter if specified
+        if self.author_ids:
+            base_params['author'] = ','.join(map(str, self.author_ids))
+            logger.info(f"  Filtering by author IDs: {self.author_ids}")
+
         for status in ['publish', 'draft']:
             page = 1
             while True:
                 try:
+                    params = {**base_params, 'page': page, 'status': status}
                     logger.info(f"  Fetching page {page} ({status} posts)...")
                     response = requests.get(
                         api_url,
-                        params={'page': page, 'per_page': 100, 'status': status},
+                        params=params,
                         auth=auth,
                         timeout=10
                     )
@@ -76,7 +142,28 @@ class PostExporter:
                     if not page_posts:
                         break
 
+                    # Filter by author name if specified
+                    if self.author_filter and authors_map:
+                        filtered_posts = []
+                        for post in page_posts:
+                            author_id = post.get('author')
+                            if author_id and author_id in authors_map:
+                                author_name = authors_map[author_id]['name'].lower()
+                                author_slug = authors_map[author_id]['slug'].lower()
+
+                                # Check if author matches filter
+                                for filter_name in self.author_filter:
+                                    filter_lower = filter_name.lower()
+                                    if (filter_lower in author_name or
+                                            filter_lower == author_slug):
+                                        filtered_posts.append(post)
+                                        break
+
+                        page_posts = filtered_posts
+                        logger.info(f"  ✓ Got {len(page_posts)} posts after author filter")
+
                     posts.extend(page_posts)
+                    if page_posts:
                         logger.info(f"  ✓ Got {len(page_posts)} posts")
 
                     page += 1
@@ -94,7 +181,8 @@ class PostExporter:
         logger.info(f"✓ Total posts from {site_name}: {len(posts)}\n")
         return posts
 
-    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict) -> Dict:
+    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict,
+                             author_map: Optional[Dict[int, Dict]] = None) -> Dict:
         """Extract post details for CSV export."""
         title = post.get('title', {})
         if isinstance(title, dict):
@@ -122,6 +210,13 @@ class PostExporter:
             for cat_id in category_ids
         ]) if category_ids else ''
 
+        # Get author name from author map
+        author_id = post.get('author', '')
+        author_name = ''
+        if author_map and author_id:
+            author_data = author_map.get(author_id, {})
+            author_name = author_data.get('name', '')
+
         return {
             'site': site_name,
             'post_id': post['id'],
@@ -129,7 +224,8 @@ class PostExporter:
             'title': title.strip(),
             'slug': post.get('slug', ''),
             'url': post.get('link', ''),
-            'author_id': post.get('author', ''),
+            'author_id': author_id,
+            'author_name': author_name,
             'date_published': post.get('date', ''),
             'date_modified': post.get('modified', ''),
             'categories': category_names,
@@ -158,7 +254,7 @@ class PostExporter:
             return ""
 
         fieldnames = [
-            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id',
+            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id', 'author_name',
             'date_published', 'date_modified', 'categories', 'tags', 'excerpt',
             'content_preview', 'seo_title', 'meta_description', 'focus_keyword', 'word_count',
         ]
@@ -173,24 +269,46 @@ class PostExporter:
         logger.info(f"✓ CSV exported to: {output_file}")
         return str(output_file)
 
-    def run(self) -> str:
-        """Run the complete export process."""
+    def run(self, site_filter: Optional[str] = None) -> str:
+        """
+        Run the complete export process.
+
+        Args:
+            site_filter: Optional site name to export from (default: all sites)
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("="*70)
         logger.info("EXPORTING ALL POSTS")
         logger.info("="*70)
 
+        if self.author_filter:
+            logger.info(f"Author filter: {self.author_filter}")
+        if self.author_ids:
+            logger.info(f"Author IDs: {self.author_ids}")
+        if site_filter:
+            logger.info(f"Site filter: {site_filter}")
+
        logger.info("Sites configured: " + ", ".join(self.sites.keys()))
 
         for site_name, config in self.sites.items():
+            # Skip sites if filter is specified
+            if site_filter and site_name != site_filter:
+                logger.info(f"Skipping {site_name} (not in filter)")
+                continue
+
             categories = self.fetch_category_names(site_name, config)
-            posts = self.fetch_posts_from_site(site_name, config)
+            authors = self.fetch_authors(site_name, config)
+            posts = self.fetch_posts_from_site(site_name, config, authors)
 
             if posts:
                 for post in posts:
-                    post_details = self.extract_post_details(post, site_name, categories)
+                    post_details = self.extract_post_details(post, site_name, categories, authors)
                     self.all_posts.append(post_details)
 
         if not self.all_posts:
-            logger.error("No posts found on any site")
+            logger.warning("No posts found matching criteria")
             return ""
 
         self.all_posts.sort(key=lambda x: (x['site'], x['post_id']))
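The author-name matching rule applied during export (case-insensitive substring match on the display name, exact match on the slug) can be sketched in isolation. `matches_author` is a hypothetical helper written for illustration, not a function from the diff:

```python
def matches_author(author, filters):
    """Case-insensitive match: substring on display name, exact on slug."""
    name = author['name'].lower()
    slug = author['slug'].lower()
    return any(f.lower() in name or f.lower() == slug for f in filters)

author = {'name': 'John Doe', 'slug': 'jdoe'}
print(matches_author(author, ['john']))  # True: partial name match
print(matches_author(author, ['JDOE']))  # True: exact slug match, any case
print(matches_author(author, ['jane']))  # False
```

This is why `--author admin` in the guide above matches any author whose display name contains "admin".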
src/seo/media_importer.py (new file, 467 lines)
@@ -0,0 +1,467 @@
"""
Media Importer - Import media from one WordPress site to another
Specifically designed for migrated posts
"""

import logging
import os
import tempfile
import requests
from requests.auth import HTTPBasicAuth
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import csv

from .config import Config

logger = logging.getLogger(__name__)


class WordPressMediaImporter:
    """Import media from source WordPress site to destination site."""

    def __init__(self, source_site: str = 'mistergeek.net',
                 destination_site: str = 'hellogeek.net'):
        """
        Initialize media importer.

        Args:
            source_site: Source site name
            destination_site: Destination site name
        """
        self.source_site = source_site
        self.destination_site = destination_site
        self.sites = Config.WORDPRESS_SITES

        # Validate sites
        if source_site not in self.sites:
            raise ValueError(f"Source site '{source_site}' not found")
        if destination_site not in self.sites:
            raise ValueError(f"Destination site '{destination_site}' not found")

        # Setup source
        self.source_config = self.sites[source_site]
        self.source_url = self.source_config['url'].rstrip('/')
        self.source_auth = HTTPBasicAuth(
            self.source_config['username'],
            self.source_config['password']
        )

        # Setup destination
        self.dest_config = self.sites[destination_site]
        self.dest_url = self.dest_config['url'].rstrip('/')
        self.dest_auth = HTTPBasicAuth(
            self.dest_config['username'],
            self.dest_config['password']
        )

        self.media_cache = {}  # Cache source media ID -> dest media ID
        self.stats = {
            'total_posts': 0,
            'posts_with_media': 0,
            'images_downloaded': 0,
            'images_uploaded': 0,
            'featured_images_set': 0,
            'errors': 0
        }

    def fetch_migrated_posts(self, post_ids: Optional[List[int]] = None) -> List[Dict]:
        """
        Fetch posts that need media imported.

        Args:
            post_ids: Specific post IDs to process

        Returns:
            List of post dicts
        """
        logger.info(f"Fetching posts from {self.destination_site}...")

        if post_ids:
            # Fetch specific posts
            posts = []
            for post_id in post_ids:
                try:
                    response = requests.get(
                        f"{self.dest_url}/wp-json/wp/v2/posts/{post_id}",
                        auth=self.dest_auth,
                        timeout=10
                    )
                    if response.status_code == 200:
                        posts.append(response.json())
                except Exception as e:
                    logger.error(f"Error fetching post {post_id}: {e}")
            return posts
        else:
            # Fetch recent posts (assuming migrated posts are recent)
            try:
                response = requests.get(
                    f"{self.dest_url}/wp-json/wp/v2/posts",
                    params={
                        'per_page': 100,
                        'status': 'publish,draft',
                        '_embed': True
                    },
                    auth=self.dest_auth,
                    timeout=30
                )
                response.raise_for_status()
                return response.json()
            except Exception as e:
                logger.error(f"Error fetching posts: {e}")
                return []

    def get_source_post(self, post_id: int) -> Optional[Dict]:
        """
        Fetch corresponding post from source site.

        Args:
            post_id: Post ID on source site

        Returns:
            Post dict or None
        """
        try:
            response = requests.get(
                f"{self.source_url}/wp-json/wp/v2/posts/{post_id}",
                auth=self.source_auth,
                timeout=10,
                params={'_embed': True}
            )

            if response.status_code == 200:
                return response.json()
            else:
                logger.warning(f"Source post {post_id} not found")
                return None

        except Exception as e:
            logger.error(f"Error fetching source post {post_id}: {e}")
            return None

    def download_media(self, media_url: str) -> Optional[bytes]:
        """
        Download media file from source site.

        Args:
            media_url: URL of media file

        Returns:
            File content bytes or None
        """
        try:
            response = requests.get(media_url, timeout=30)
            response.raise_for_status()
            return response.content
        except Exception as e:
            logger.error(f"Error downloading {media_url}: {e}")
            return None

    def upload_media(self, file_content: bytes, filename: str,
                     mime_type: str = 'image/jpeg',
                     alt_text: str = '',
                     caption: str = '') -> Optional[int]:
        """
        Upload media to destination site.

        Args:
            file_content: File content bytes
            filename: Filename for the media
            mime_type: MIME type of the file
            alt_text: Alt text for the image
            caption: Caption for the image

        Returns:
            Media ID on destination site or None
        """
        try:
            # Upload file
            files = {'file': (filename, file_content, mime_type)}

            response = requests.post(
                f"{self.dest_url}/wp-json/wp/v2/media",
                files=files,
                auth=self.dest_auth,
                headers={
                    'Content-Disposition': f'attachment; filename="{filename}"',
                    'Content-Type': mime_type
                },
                timeout=30
            )

            if response.status_code == 201:
                media_data = response.json()
                media_id = media_data['id']

                # Update alt text and caption
                if alt_text or caption:
                    meta_update = {}
                    if alt_text:
                        meta_update['_wp_attachment_image_alt'] = alt_text
                    if caption:
                        meta_update['excerpt'] = caption

                    requests.post(
                        f"{self.dest_url}/wp-json/wp/v2/media/{media_id}",
                        json=meta_update,
                        auth=self.dest_auth,
                        timeout=10
                    )

                logger.info(f"✓ Uploaded {filename} (ID: {media_id})")
                return media_id
            else:
                logger.error(f"Error uploading {filename}: {response.status_code}")
                return None

        except Exception as e:
            logger.error(f"Error uploading {filename}: {e}")
            return None

    def import_featured_image(self, source_post: Dict, dest_post_id: int) -> bool:
        """
        Import featured image from source post to destination post.

        Args:
            source_post: Source post dict
            dest_post_id: Destination post ID

        Returns:
            True if successful
        """
        # Check if source has featured image
        featured_media_id = source_post.get('featured_media')
        if not featured_media_id:
            logger.info("  No featured image on source post")
            return False

        # Check if already imported
        if featured_media_id in self.media_cache:
            dest_media_id = self.media_cache[featured_media_id]
            logger.info(f"  Using cached media ID: {dest_media_id}")
        else:
            # Fetch media details from source
            try:
                media_response = requests.get(
                    f"{self.source_url}/wp-json/wp/v2/media/{featured_media_id}",
                    auth=self.source_auth,
                    timeout=10
                )

                if media_response.status_code != 200:
                    logger.error(f"Could not fetch media {featured_media_id}")
                    return False

                media_data = media_response.json()

                # Download media file
                media_url = media_data.get('source_url', '')
                if not media_url:
                    # Try alternative URL structure
                    media_url = media_data.get('guid', {}).get('rendered', '')

                file_content = self.download_media(media_url)
                if not file_content:
                    return False

                # Extract filename and mime type
                filename = media_data.get('slug', 'image.jpg') + '.jpg'
                mime_type = media_data.get('mime_type', 'image/jpeg')
                alt_text = media_data.get('alt_text', '')
                caption = media_data.get('caption', {}).get('rendered', '')

                # Upload to destination
                dest_media_id = self.upload_media(
                    file_content, filename, mime_type, alt_text, caption
                )

                if not dest_media_id:
                    return False

                # Cache the mapping
                self.media_cache[featured_media_id] = dest_media_id
                self.stats['images_uploaded'] += 1

            except Exception as e:
                logger.error(f"Error importing featured image: {e}")
                return False

        # Set featured image on destination post
        try:
            response = requests.post(
                f"{self.dest_url}/wp-json/wp/v2/posts/{dest_post_id}",
                json={'featured_media': dest_media_id},
                auth=self.dest_auth,
                timeout=10
            )

            if response.status_code == 200:
                logger.info(f"✓ Set featured image on post {dest_post_id}")
                self.stats['featured_images_set'] += 1
                return True
            else:
                logger.error(f"Error setting featured image: {response.status_code}")
                return False

        except Exception as e:
            logger.error(f"Error setting featured image: {e}")
            return False
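The `media_cache` lookup above is plain memoisation: each source media ID is downloaded and uploaded at most once, and later posts that share the same featured image reuse the destination ID. A minimal standalone sketch of that pattern, with a stubbed `upload` callable standing in for the real download/upload round trip:

```python
def import_once(cache, source_media_id, upload):
    """Return the destination media ID, uploading only on first sight."""
    if source_media_id in cache:
        return cache[source_media_id]   # reuse previously imported media
    dest_id = upload(source_media_id)   # expensive download + upload path
    cache[source_media_id] = dest_id
    return dest_id

uploads = []
def fake_upload(media_id):
    uploads.append(media_id)
    return media_id + 1000              # pretend destination assigns new IDs

cache = {}
import_once(cache, 7, fake_upload)      # first call uploads
import_once(cache, 7, fake_upload)      # cache hit, no second upload
print(cache, uploads)                   # {7: 1007} [7]
```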
|
|
||||||
|
def import_post_media(self, source_post: Dict, dest_post_id: int) -> int:
|
||||||
|
"""
|
||||||
|
Import all media from a post (featured image + inline images).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
source_post: Source post dict
|
||||||
|
dest_post_id: Destination post ID
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of images imported
|
||||||
|
"""
|
||||||
|
images_imported = 0
|
||||||
|
|
||||||
|
# Import featured image
|
||||||
|
if self.import_featured_image(source_post, dest_post_id):
|
||||||
|
images_imported += 1
|
||||||
|
|
||||||
|
# TODO: Import inline images from content
|
||||||
|
# This would require parsing the content for <img> tags
|
||||||
|
# and replacing source URLs with destination URLs
|
||||||
|
|
||||||
|
return images_imported
|
||||||
|
|
||||||
|
def process_posts(self, post_mappings: List[Tuple[int, int]],
|
||||||
|
dry_run: bool = False) -> Dict:
|
||||||
|
"""
|
||||||
|
Process media import for mapped posts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
post_mappings: List of (source_post_id, dest_post_id) tuples
|
||||||
|
dry_run: If True, preview without importing
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Statistics dict
|
||||||
|
"""
|
||||||
|
logger.info("\n" + "="*70)
|
||||||
|
logger.info("MEDIA IMPORTER")
|
||||||
|
logger.info("="*70)
|
||||||
|
logger.info(f"Source: {self.source_site}")
|
||||||
|
logger.info(f"Destination: {self.destination_site}")
|
||||||
|
logger.info(f"Posts to process: {len(post_mappings)}")
|
||||||
|
logger.info(f"Dry run: {dry_run}")
|
||||||
|
logger.info("="*70)
|
||||||
|
|
||||||
|
self.stats['total_posts'] = len(post_mappings)
|
||||||
|
|
||||||
|
for i, (source_id, dest_id) in enumerate(post_mappings, 1):
|
||||||
|
logger.info(f"\n[{i}/{len(post_mappings)}] Processing post mapping:")
|
||||||
|
logger.info(f" Source: {source_id} → Destination: {dest_id}")
|
||||||
|
|
||||||
|
# Fetch source post
|
||||||
|
source_post = self.get_source_post(source_id)
|
||||||
|
if not source_post:
|
||||||
|
logger.warning(f" Skipping: Source post not found")
|
||||||
|
self.stats['errors'] += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if source has media
|
||||||
|
if not source_post.get('featured_media'):
|
||||||
|
logger.info(f" No featured image to import")
|
||||||
|
continue
|
||||||
|
|
||||||
|
self.stats['posts_with_media'] += 1
|
||||||
|
|
||||||
|
if dry_run:
|
||||||
|
logger.info(f" [DRY RUN] Would import featured image")
|
||||||
|
self.stats['images_downloaded'] += 1
|
||||||
|
self.stats['images_uploaded'] += 1
|
||||||
|
self.stats['featured_images_set'] += 1
|
||||||
|
else:
|
||||||
|
# Import media
|
||||||
|
imported = self.import_post_media(source_post, dest_id)
|
||||||
|
if imported > 0:
|
||||||
|
self.stats['images_downloaded'] += imported
|
||||||
|
|
||||||
|
# Print summary
|
||||||
|
logger.info("\n" + "="*70)
|
||||||
|
logger.info("IMPORT SUMMARY")
|
||||||
|
logger.info("="*70)
|
||||||
|
logger.info(f"Total posts: {self.stats['total_posts']}")
|
||||||
|
logger.info(f"Posts with media: {self.stats['posts_with_media']}")
|
||||||
|
logger.info(f"Images downloaded: {self.stats['images_downloaded']}")
|
||||||
|
logger.info(f"Images uploaded: {self.stats['images_uploaded']}")
|
||||||
|
logger.info(f"Featured images set: {self.stats['featured_images_set']}")
|
||||||
|
logger.info(f"Errors: {self.stats['errors']}")
|
||||||
|
logger.info("="*70)
|
||||||
|
|
||||||
|
return self.stats
|
||||||
|
|
```python
    def run_from_csv(self, csv_file: str, dry_run: bool = False) -> Dict:
        """
        Import media for posts listed in CSV file.

        CSV should have columns: source_post_id, destination_post_id

        Args:
            csv_file: Path to CSV file with post mappings
            dry_run: If True, preview without importing

        Returns:
            Statistics dict
        """
        logger.info(f"Loading post mappings from: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                mappings = []

                for row in reader:
                    source_id = int(row.get('source_post_id', 0))
                    dest_id = int(row.get('destination_post_id', 0))

                    if source_id and dest_id:
                        mappings.append((source_id, dest_id))

            logger.info(f"✓ Loaded {len(mappings)} post mappings")

        except Exception as e:
            logger.error(f"Error loading CSV: {e}")
            return self.stats

        return self.process_posts(mappings, dry_run=dry_run)
```
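The row-parsing convention used by `run_from_csv` (integer `source_post_id`/`destination_post_id` columns, rows with a missing or zero side skipped) can be exercised in isolation. This standalone helper mirrors the loop above for illustration; it is not part of the diff:

```python
import csv
import io

def load_mappings(csv_text: str) -> list:
    """Collect (source_post_id, destination_post_id) pairs; skip rows where either side is 0/missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mappings = []
    for row in reader:
        source_id = int(row.get('source_post_id', 0))
        dest_id = int(row.get('destination_post_id', 0))
        if source_id and dest_id:
            mappings.append((source_id, dest_id))
    return mappings

sample = "source_post_id,destination_post_id\n10,42\n11,0\n12,43\n"
print(load_mappings(sample))  # [(10, 42), (12, 43)]
```

Like the original loop, `int()` raises on non-numeric cells; in the class method that is caught by the surrounding `try/except`.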
```python
    def run_from_migration_report(self, report_file: str,
                                  dry_run: bool = False) -> Dict:
        """
        Import media using migration report CSV.

        Args:
            report_file: Path to migration report CSV
            dry_run: If True, preview without importing

        Returns:
            Statistics dict
        """
        logger.info(f"Loading migration report: {report_file}")

        try:
            with open(report_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                mappings = []

                for row in reader:
                    source_id = int(row.get('source_post_id', 0))
                    dest_id = int(row.get('destination_post_id', 0))

                    if source_id and dest_id:
                        mappings.append((source_id, dest_id))

            logger.info(f"✓ Loaded {len(mappings)} post mappings from migration report")

        except Exception as e:
            logger.error(f"Error loading migration report: {e}")
            return self.stats

        return self.process_posts(mappings, dry_run=dry_run)
```
482
src/seo/meta_description_generator.py
Normal file
@@ -0,0 +1,482 @@
```python
"""
Meta Description Generator - AI-powered meta description generation and optimization
"""

import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests

from .config import Config

logger = logging.getLogger(__name__)


class MetaDescriptionGenerator:
    """AI-powered meta description generator and optimizer."""

    def __init__(self, csv_file: str):
        """
        Initialize the generator.

        Args:
            csv_file: Path to CSV file with posts
        """
        self.csv_file = Path(csv_file)
        self.openrouter_api_key = Config.OPENROUTER_API_KEY
        self.ai_model = Config.AI_MODEL
        self.posts = []
        self.generated_results = []
        self.api_calls = 0
        self.ai_cost = 0.0

        # Meta description best practices
        self.max_length = 160  # Optimal length for SEO
        self.min_length = 120
        self.include_keywords = True

    def load_csv(self) -> bool:
        """Load posts from CSV file."""
        logger.info(f"Loading CSV: {self.csv_file}")

        if not self.csv_file.exists():
            logger.error(f"CSV file not found: {self.csv_file}")
            return False

        try:
            with open(self.csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                self.posts = list(reader)

            logger.info(f"✓ Loaded {len(self.posts)} posts from CSV")
            return True
        except Exception as e:
            logger.error(f"Error loading CSV: {e}")
            return False

    def _build_prompt(self, post: Dict) -> str:
        """
        Build AI prompt for meta description generation.

        Args:
            post: Post data dict

        Returns:
            AI prompt string
        """
        title = post.get('title', '')
        content_preview = post.get('content_preview', '')
        excerpt = post.get('excerpt', '')
        focus_keyword = post.get('focus_keyword', '')
        current_meta = post.get('meta_description', '')

        # Build context from available content
        content_context = ""
        if excerpt:
            content_context += f"Excerpt: {excerpt}\n"
        if content_preview:
            content_context += f"Content preview: {content_preview[:300]}..."

        prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.

**Post Title:** {title}

**Content Context:**
{content_context}

**Focus Keyword:** {focus_keyword if focus_keyword else 'Not specified'}

**Current Meta Description:** {current_meta if current_meta else 'None (needs to be created)'}

**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Include the focus keyword naturally if available
3. Make it compelling and action-oriented
4. Clearly describe what the post is about
5. Use active voice
6. Include a call-to-action when appropriate
7. Avoid clickbait - be accurate and valuable
8. Write in the same language as the content

**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""

        return prompt
```
```python
    def _call_ai_api(self, prompt: str) -> Optional[str]:
        """
        Call AI API to generate meta description.

        Args:
            prompt: AI prompt

        Returns:
            Generated meta description or None
        """
        url = "https://openrouter.ai/api/v1/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.openrouter_api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": self.ai_model,
            "messages": [
                {
                    "role": "system",
                    "content": "You are an SEO expert specializing in meta description optimization. You write compelling, concise, and search-engine optimized meta descriptions."
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7,
            "max_tokens": 100
        }

        try:
            response = requests.post(url, json=payload, headers=headers, timeout=30)
            response.raise_for_status()

            result = response.json()
            self.api_calls += 1

            # Extract generated text
            if 'choices' in result and len(result['choices']) > 0:
                meta_description = result['choices'][0]['message']['content'].strip()

                # Remove quotes if AI included them
                if meta_description.startswith('"') and meta_description.endswith('"'):
                    meta_description = meta_description[1:-1]

                return meta_description
            else:
                logger.warning("No AI response received")
                return None

        except requests.exceptions.RequestException as e:
            logger.error(f"API call failed: {e}")
            return None
        except Exception as e:
            logger.error(f"Error processing AI response: {e}")
            return None
```
```python
    def _validate_meta_description(self, meta: str) -> Dict:
        """
        Validate meta description quality.

        Args:
            meta: Meta description text

        Returns:
            Validation results dict
        """
        length = len(meta)

        validation = {
            'length': length,
            'is_valid': False,
            'too_short': False,
            'too_long': False,
            'optimal': False,
            'score': 0
        }

        # Check length
        if length < self.min_length:
            validation['too_short'] = True
            validation['score'] = max(0, 50 - (self.min_length - length))
        elif length > self.max_length:
            validation['too_long'] = True
            validation['score'] = max(0, 50 - (length - self.max_length))
        else:
            validation['optimal'] = True
            validation['score'] = 100

        # Check if it ends with a period (good practice)
        if meta.endswith('.'):
            validation['score'] = min(100, validation['score'] + 5)

        # Check for call-to-action words
        cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
        if any(word in meta.lower() for word in cta_words):
            validation['score'] = min(100, validation['score'] + 5)

        validation['is_valid'] = validation['score'] >= 70

        return validation
```
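The scoring rules in `_validate_meta_description` can be checked standalone. This is a re-implementation for illustration only (same constants: 120-160 length band, +5 for a trailing period, +5 for a call-to-action word, valid at score >= 70), not the class method itself:

```python
MIN_LEN, MAX_LEN = 120, 160
CTA_WORDS = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']

def score_meta(meta: str) -> int:
    """Mirror of the generator's scoring: length band, period bonus, CTA bonus (capped at 100)."""
    length = len(meta)
    if length < MIN_LEN:
        score = max(0, 50 - (MIN_LEN - length))
    elif length > MAX_LEN:
        score = max(0, 50 - (length - MAX_LEN))
    else:
        score = 100
    if meta.endswith('.'):
        score = min(100, score + 5)
    # Note: substring match, as in the original -- 'get' also fires inside words like 'target'.
    if any(word in meta.lower() for word in CTA_WORDS):
        score = min(100, score + 5)
    return score

optimal = ("Discover how to write meta descriptions that rank: length rules, "
           "call-to-action phrasing, and keyword placement explained step by step.")
print(score_meta(optimal))  # 100
```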
```python
    def generate_for_post(self, post: Dict) -> Optional[Dict]:
        """
        Generate meta description for a single post.

        Args:
            post: Post data dict

        Returns:
            Result dict with generated meta and validation
        """
        title = post.get('title', '')
        post_id = post.get('post_id', '')
        current_meta = post.get('meta_description', '')

        logger.info(f"Generating meta description for post {post_id}: {title[:50]}...")

        # Skip if post has no title
        if not title:
            logger.warning(f"Skipping post {post_id}: No title")
            return None

        # Build prompt and call AI
        prompt = self._build_prompt(post)
        generated_meta = self._call_ai_api(prompt)

        if not generated_meta:
            logger.error(f"Failed to generate meta description for post {post_id}")
            return None

        # Validate the result
        validation = self._validate_meta_description(generated_meta)

        # Calculate improvement
        improvement = False
        if current_meta:
            current_validation = self._validate_meta_description(current_meta)
            improvement = validation['score'] > current_validation['score']
        else:
            improvement = True  # Any meta is an improvement over none

        result = {
            'post_id': post_id,
            'site': post.get('site', ''),
            'title': title,
            'current_meta_description': current_meta,
            'generated_meta_description': generated_meta,
            'generated_length': validation['length'],
            'validation_score': validation['score'],
            'is_optimal_length': validation['optimal'],
            'improvement': improvement,
            'status': 'generated'
        }

        logger.info(f"✓ Generated meta description (score: {validation['score']}, length: {validation['length']})")

        # Rate limiting
        time.sleep(0.5)

        return result

    def generate_batch(self, batch: List[Dict]) -> List[Dict]:
        """
        Generate meta descriptions for a batch of posts.

        Args:
            batch: List of post dicts

        Returns:
            List of result dicts
        """
        results = []

        for i, post in enumerate(batch, 1):
            logger.info(f"Processing post {i}/{len(batch)}")
            result = self.generate_for_post(post)
            if result:
                results.append(result)

        return results

    def filter_posts_for_generation(self, posts: List[Dict],
                                    only_missing: bool = False,
                                    only_poor_quality: bool = False) -> List[Dict]:
        """
        Filter posts based on meta description status.

        Args:
            posts: List of post dicts
            only_missing: Only include posts without meta descriptions
            only_poor_quality: Only include posts with poor meta descriptions

        Returns:
            Filtered list of posts
        """
        filtered = []

        for post in posts:
            current_meta = post.get('meta_description', '')

            if only_missing:
                # Skip posts that already have meta descriptions
                if current_meta:
                    continue
                filtered.append(post)

            elif only_poor_quality:
                # Skip posts without meta descriptions (handle separately)
                if not current_meta:
                    continue

                # Check if current meta is poor quality
                validation = self._validate_meta_description(current_meta)
                if validation['score'] < 70:
                    filtered.append(post)

            else:
                # Include all posts
                filtered.append(post)

        return filtered
```
```python
    def save_results(self, results: List[Dict], output_file: Optional[str] = None) -> str:
        """
        Save generation results to CSV.

        Args:
            results: List of result dicts
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'meta_descriptions_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'post_id', 'site', 'title', 'current_meta_description',
            'generated_meta_description', 'generated_length',
            'validation_score', 'is_optimal_length', 'improvement', 'status'
        ]

        logger.info(f"Saving {len(results)} results to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)

        logger.info(f"✓ Results saved to: {output_file}")
        return str(output_file)

    def generate_summary(self, results: List[Dict]) -> Dict:
        """
        Generate summary statistics.

        Args:
            results: List of result dicts

        Returns:
            Summary dict
        """
        if not results:
            return {}

        total = len(results)
        improved = sum(1 for r in results if r.get('improvement', False))
        optimal_length = sum(1 for r in results if r.get('is_optimal_length', False))
        avg_score = sum(r.get('validation_score', 0) for r in results) / total

        # Count by site
        by_site = {}
        for r in results:
            site = r.get('site', 'unknown')
            if site not in by_site:
                by_site[site] = {'total': 0, 'improved': 0}
            by_site[site]['total'] += 1
            if r.get('improvement', False):
                by_site[site]['improved'] += 1

        summary = {
            'total_posts': total,
            'improved': improved,
            'improvement_rate': (improved / total * 100) if total > 0 else 0,
            'optimal_length_count': optimal_length,
            'optimal_length_rate': (optimal_length / total * 100) if total > 0 else 0,
            'average_score': avg_score,
            'api_calls': self.api_calls,
            'by_site': by_site
        }

        return summary
```
```python
    def run(self, output_file: Optional[str] = None,
            only_missing: bool = False,
            only_poor_quality: bool = False,
            limit: Optional[int] = None) -> Tuple[str, Dict]:
        """
        Run complete meta description generation process.

        Args:
            output_file: Custom output file path
            only_missing: Only generate for posts without meta descriptions
            only_poor_quality: Only generate for posts with poor quality meta descriptions
            limit: Maximum number of posts to process

        Returns:
            Tuple of (output_file_path, summary_dict)
        """
        logger.info("\n" + "="*70)
        logger.info("AI META DESCRIPTION GENERATION")
        logger.info("="*70)

        # Load posts
        if not self.load_csv():
            return "", {}

        # Filter posts
        posts_to_process = self.filter_posts_for_generation(
            self.posts,
            only_missing=only_missing,
            only_poor_quality=only_poor_quality
        )

        logger.info(f"Posts to process: {len(posts_to_process)}")

        if only_missing:
            logger.info("Filter: Only posts without meta descriptions")
        elif only_poor_quality:
            logger.info("Filter: Only posts with poor quality meta descriptions")

        # Apply limit
        if limit:
            posts_to_process = posts_to_process[:limit]
            logger.info(f"Limited to: {len(posts_to_process)} posts")

        if not posts_to_process:
            logger.warning("No posts to process")
            return "", {}

        # Generate meta descriptions
        results = self.generate_batch(posts_to_process)

        # Save results
        if results:
            output_path = self.save_results(results, output_file)

            # Generate and log summary
            summary = self.generate_summary(results)

            logger.info("\n" + "="*70)
            logger.info("GENERATION SUMMARY")
            logger.info("="*70)
            logger.info(f"Total posts processed: {summary['total_posts']}")
            logger.info(f"Improved: {summary['improved']} ({summary['improvement_rate']:.1f}%)")
            logger.info(f"Optimal length: {summary['optimal_length_count']} ({summary['optimal_length_rate']:.1f}%)")
            logger.info(f"Average validation score: {summary['average_score']:.1f}")
            logger.info(f"API calls made: {summary['api_calls']}")
            logger.info("="*70)

            return output_path, summary
        else:
            logger.warning("No results generated")
            return "", {}
```
631
src/seo/meta_description_updater.py
Normal file
@@ -0,0 +1,631 @@
```python
"""
Meta Description Updater - Fetch, generate, and update meta descriptions directly on WordPress
"""

import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests
from requests.auth import HTTPBasicAuth

from .config import Config
from .meta_description_generator import MetaDescriptionGenerator

logger = logging.getLogger(__name__)


class MetaDescriptionUpdater:
    """Fetch posts from WordPress, generate AI meta descriptions, and update them."""

    def __init__(self, site_name: str):
        """
        Initialize the updater.

        Args:
            site_name: WordPress site name (e.g., 'mistergeek.net')
        """
        self.site_name = site_name
        self.sites = Config.WORDPRESS_SITES

        if site_name not in self.sites:
            raise ValueError(f"Site '{site_name}' not found in configuration")

        self.site_config = self.sites[site_name]
        self.base_url = self.site_config['url'].rstrip('/')
        self.auth = HTTPBasicAuth(
            self.site_config['username'],
            self.site_config['password']
        )

        self.openrouter_api_key = Config.OPENROUTER_API_KEY
        self.ai_model = Config.AI_MODEL

        self.posts = []
        self.update_results = []
        self.api_calls = 0
        self.stats = {
            'total_posts': 0,
            'updated': 0,
            'failed': 0,
            'skipped': 0
        }

    def fetch_posts(self, post_ids: Optional[List[int]] = None,
                    category_ids: Optional[List[int]] = None,
                    category_names: Optional[List[str]] = None,
                    author_names: Optional[List[str]] = None,
                    limit: Optional[int] = None,
                    status: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch posts from WordPress site.

        Args:
            post_ids: Specific post IDs to fetch
            category_ids: Filter by category IDs
            category_names: Filter by category names (will be resolved to IDs)
            author_names: Filter by author names
            limit: Maximum number of posts to fetch
            status: Post statuses to fetch (default: ['publish'])

        Returns:
            List of post dicts
        """
        logger.info(f"Fetching posts from {self.site_name}...")

        if post_ids:
            logger.info(f"  Post IDs: {post_ids}")
        if category_ids:
            logger.info(f"  Category IDs: {category_ids}")
        if category_names:
            logger.info(f"  Category names: {category_names}")
        if author_names:
            logger.info(f"  Authors: {author_names}")
        if limit:
            logger.info(f"  Limit: {limit}")

        # Resolve category names to IDs if needed
        if category_names and not category_ids:
            category_ids = self._get_category_ids_by_names(category_names)

        # Resolve author names to IDs if needed
        author_ids = None
        if author_names:
            author_ids = self._get_author_ids_by_names(author_names)

        # Build API parameters
        params = {
            'per_page': 100,
            'page': 1,
            'status': ','.join(status) if status else 'publish',
            '_embed': True
        }

        if post_ids:
            # Fetch specific posts
            posts = []
            for post_id in post_ids:
                try:
                    response = requests.get(
                        f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                        auth=self.auth,
                        timeout=10
                    )
                    if response.status_code == 200:
                        posts.append(response.json())
                    else:
                        logger.warning(f"  Post {post_id} not found or inaccessible")
                except Exception as e:
                    logger.error(f"  Error fetching post {post_id}: {e}")
            self.posts = posts
        else:
            # Fetch posts with filters
            if category_ids:
                params['categories'] = ','.join(map(str, category_ids))

            if author_ids:
                params['author'] = ','.join(map(str, author_ids))

            posts = []
            while True:
                try:
                    response = requests.get(
                        f"{self.base_url}/wp-json/wp/v2/posts",
                        params=params,
                        auth=self.auth,
                        timeout=30
                    )
                    response.raise_for_status()

                    page_posts = response.json()
                    if not page_posts:
                        break

                    posts.extend(page_posts)

                    if len(page_posts) < 100:
                        break
                    if limit and len(posts) >= limit:
                        break

                    params['page'] += 1
                    time.sleep(0.3)

                except Exception as e:
                    logger.error(f"Error fetching posts: {e}")
                    break

            # Apply limit if specified
            if limit:
                posts = posts[:limit]

            self.posts = posts

        logger.info(f"✓ Fetched {len(self.posts)} posts from {self.site_name}")
        return self.posts
```
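`fetch_posts` pages through the REST API 100 posts at a time, stopping on an empty page, a short page, or once the limit is reached. That loop shape can be sketched against a fake page source (no WordPress involved; `fake_api` stands in for the `requests.get` call and is purely illustrative):

```python
def paginate(fetch_page, per_page=100, limit=None):
    """Collect items page by page, mirroring the stop conditions in fetch_posts."""
    items, page = [], 1
    while True:
        page_items = fetch_page(page, per_page)
        if not page_items:        # empty page -> done
            break
        items.extend(page_items)
        if len(page_items) < per_page:   # short page -> last page
            break
        if limit and len(items) >= limit:
            break
        page += 1
    return items[:limit] if limit else items

# Fake API exposing 250 "posts".
def fake_api(page, per_page):
    start = (page - 1) * per_page
    return list(range(250))[start:start + per_page]

print(len(paginate(fake_api)))             # 250
print(len(paginate(fake_api, limit=120)))  # 120
```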
```python
    def _get_category_ids_by_names(self, category_names: List[str]) -> List[int]:
        """
        Get category IDs by category names.

        Args:
            category_names: List of category names

        Returns:
            List of category IDs
        """
        logger.info(f"Resolving category names to IDs...")

        try:
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/categories",
                params={'per_page': 100},
                auth=self.auth,
                timeout=10
            )
            response.raise_for_status()

            categories = response.json()
            category_map = {cat['name'].lower(): cat['id'] for cat in categories}

            category_ids = []
            for name in category_names:
                name_lower = name.lower()
                if name_lower in category_map:
                    category_ids.append(category_map[name_lower])
                    logger.info(f"  ✓ '{name}' -> ID {category_map[name_lower]}")
                else:
                    # Try partial match
                    for cat_name, cat_id in category_map.items():
                        if name_lower in cat_name or cat_name in name_lower:
                            category_ids.append(cat_id)
                            logger.info(f"  ✓ '{name}' -> ID {cat_id} (partial match)")
                            break
                    else:
                        logger.warning(f"  ✗ Category '{name}' not found")

            return category_ids

        except Exception as e:
            logger.error(f"Error fetching categories: {e}")
            return []

    def _get_author_ids_by_names(self, author_names: List[str]) -> List[int]:
        """
        Get author/user IDs by author names.

        Args:
            author_names: List of author names

        Returns:
            List of author IDs
        """
        logger.info(f"Resolving author names to IDs...")

        try:
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/users",
                params={'per_page': 100},
                auth=self.auth,
                timeout=10
            )
            response.raise_for_status()

            users = response.json()
            author_map = {}

            # Build map of name/slug to ID
            for user in users:
                name = user.get('name', '').lower()
                slug = user.get('slug', '').lower()
                author_map[name] = user['id']
                author_map[slug] = user['id']

            author_ids = []
            for name in author_names:
                name_lower = name.lower()

                # Try exact match
                if name_lower in author_map:
                    author_ids.append(author_map[name_lower])
                    logger.info(f"  ✓ '{name}' -> ID {author_map[name_lower]}")
                else:
                    # Try partial match
                    found = False
                    for author_name, author_id in author_map.items():
                        if name_lower in author_name or author_name in name_lower:
                            author_ids.append(author_id)
                            logger.info(f"  ✓ '{name}' -> ID {author_id} (partial match: '{author_name}')")
                            found = True
                            break

                    if not found:
                        logger.warning(f"  ✗ Author '{name}' not found")

            return author_ids

        except Exception as e:
            logger.error(f"Error fetching authors: {e}")
            return []
```
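Both resolvers use the same two-step lookup: an exact lowercase match first, then a bidirectional substring match where the first hit wins. A standalone sketch of that matching rule (illustrative helper, not the class methods):

```python
def resolve_ids(names, name_map):
    """Exact lowercase match first, then bidirectional partial match (first hit wins)."""
    ids = []
    for name in names:
        key = name.lower()
        if key in name_map:
            ids.append(name_map[key])
            continue
        for candidate, cid in name_map.items():
            # Bidirectional containment, e.g. 'tech' matches 'tech news'.
            if key in candidate or candidate in key:
                ids.append(cid)
                break
    return ids

catalog = {'tech news': 3, 'gaming': 7, 'reviews': 9}
print(resolve_ids(['Gaming', 'Tech', 'Unknown'], catalog))  # [7, 3]
```

Because the partial pass takes the first hit in dictionary order, an ambiguous name resolves to whichever candidate the API returned first.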
```python
    def _generate_meta_description(self, post: Dict) -> Optional[str]:
        """
        Generate meta description for a post using AI.

        Args:
            post: Post data dict

        Returns:
            Generated meta description or None
        """
        title = post.get('title', {}).get('rendered', '')
        content = post.get('content', {}).get('rendered', '')
        excerpt = post.get('excerpt', {}).get('rendered', '')

        # Strip HTML from content
        import re
        content_text = re.sub('<[^<]+?>', '', content)[:500]
        excerpt_text = re.sub('<[^<]+?>', '', excerpt)

        # Build prompt
        prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.

**Post Title:** {title}

**Content Context:**
Excerpt: {excerpt_text}
Content preview: {content_text}...

**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Make it compelling and action-oriented
3. Clearly describe what the post is about
4. Use active voice
5. Include a call-to-action when appropriate
6. Avoid clickbait - be accurate and valuable

**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""

        # Call AI API
        url = "https://openrouter.ai/api/v1/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.openrouter_api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": self.ai_model,
            "messages": [
                {
                    "role": "system",
                    "content": "You are an SEO expert specializing in meta description optimization."
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7,
            "max_tokens": 100
        }

        try:
            response = requests.post(url, json=payload, headers=headers, timeout=30)
            response.raise_for_status()

            result = response.json()
            self.api_calls += 1

            if 'choices' in result and len(result['choices']) > 0:
                meta_description = result['choices'][0]['message']['content'].strip()

                # Remove quotes if AI included them
                if meta_description.startswith('"') and meta_description.endswith('"'):
                    meta_description = meta_description[1:-1]

                return meta_description
            else:
                logger.warning("No AI response received")
                return None

        except Exception as e:
            logger.error(f"API call failed: {e}")
            return None
```
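`_generate_meta_description` strips HTML with the regex `<[^<]+?>` before building the prompt. A quick standalone check of that pattern (the helper name is illustrative):

```python
import re

def strip_html(html: str, max_len: int = 500) -> str:
    """Remove simple tags with the same pattern the updater uses, then truncate."""
    return re.sub('<[^<]+?>', '', html)[:max_len]

print(strip_html('<p>Hello <strong>world</strong></p>'))  # Hello world
```

The pattern handles ordinary rendered WordPress markup; it is not a full HTML parser (an attribute value containing `>` would confuse it).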
||||||
|
    def _update_post_meta(self, post_id: int, meta_description: str) -> bool:
        """
        Update post meta description in WordPress.

        Args:
            post_id: Post ID to update
            meta_description: New meta description

        Returns:
            True if successful, False otherwise
        """
        logger.info(f"Updating post {post_id}...")

        # Determine which SEO plugin meta key to use
        # Try RankMath first, then Yoast
        meta_fields = {
            'rank_math_description': meta_description
        }

        try:
            # First, get current post meta to preserve other fields
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                auth=self.auth,
                timeout=10
            )

            if response.status_code != 200:
                logger.error(f"  Could not fetch post {post_id}")
                return False

            current_post = response.json()
            current_meta = current_post.get('meta', {})

            # Update with new meta description
            updated_meta = {**current_meta, **meta_fields}

            # Update post
            update_response = requests.post(
                f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                json={'meta': updated_meta},
                auth=self.auth,
                timeout=10
            )

            if update_response.status_code == 200:
                logger.info(f"  ✓ Updated post {post_id}")
                return True
            else:
                logger.error(f"  ✗ Failed to update post {post_id}: {update_response.status_code}")
                logger.error(f"  Response: {update_response.text}")
                return False

        except Exception as e:
            logger.error(f"  ✗ Error updating post {post_id}: {e}")
            return False
    def _validate_meta_description(self, meta: str) -> Dict:
        """Validate meta description quality."""
        length = len(meta)

        validation = {
            'length': length,
            'is_optimal': 120 <= length <= 160,
            'too_short': length < 120,
            'too_long': length > 160,
            'score': 0
        }

        if validation['is_optimal']:
            validation['score'] = 100
        elif validation['too_short']:
            validation['score'] = max(0, 50 - (120 - length))
        else:
            validation['score'] = max(0, 50 - (length - 160))

        # Bonus for ending with period
        if meta.endswith('.'):
            validation['score'] = min(100, validation['score'] + 5)

        # Bonus for CTA words
        cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
        if any(word in meta.lower() for word in cta_words):
            validation['score'] = min(100, validation['score'] + 5)

        return validation
    def update_posts(self, dry_run: bool = False,
                     skip_existing: bool = False,
                     force_regenerate: bool = False) -> Dict:
        """
        Generate and update meta descriptions for fetched posts.

        Args:
            dry_run: If True, preview changes without updating
            skip_existing: If True, skip posts that already have meta descriptions
            force_regenerate: If True, regenerate even for posts with good meta descriptions

        Returns:
            Statistics dict
        """
        logger.info("\n" + "="*70)
        logger.info("META DESCRIPTION UPDATE")
        logger.info("="*70)
        logger.info(f"Site: {self.site_name}")
        logger.info(f"Posts to process: {len(self.posts)}")
        logger.info(f"Dry run: {dry_run}")
        logger.info(f"Skip existing: {skip_existing}")
        logger.info(f"Force regenerate: {force_regenerate}")
        logger.info("="*70)

        self.stats['total_posts'] = len(self.posts)

        for i, post in enumerate(self.posts, 1):
            post_id = post.get('id')
            title = post.get('title', {}).get('rendered', '')[:50]

            logger.info(f"\n[{i}/{len(self.posts)}] Processing post {post_id}: {title}...")

            # Check current meta description
            meta_dict = post.get('meta', {})
            current_meta = (
                meta_dict.get('rank_math_description', '') or
                meta_dict.get('_yoast_wpseo_metadesc', '') or
                ''
            )

            # Skip if has existing meta and skip_existing is True
            if current_meta and skip_existing and not force_regenerate:
                logger.info("  Skipping: Already has meta description")
                self.stats['skipped'] += 1
                continue

            # Validate existing meta (if any)
            if current_meta and not force_regenerate:
                validation = self._validate_meta_description(current_meta)
                if validation['score'] >= 80:
                    logger.info(f"  Skipping: Existing meta is good quality (score: {validation['score']})")
                    self.stats['skipped'] += 1
                    continue

            # Generate new meta description
            logger.info("  Generating meta description...")
            generated_meta = self._generate_meta_description(post)

            if not generated_meta:
                logger.error("  ✗ Failed to generate meta description")
                self.stats['failed'] += 1
                continue

            # Validate generated meta
            validation = self._validate_meta_description(generated_meta)
            logger.info(f"  Generated: {generated_meta[:80]}...")
            logger.info(f"  Length: {validation['length']} chars, Score: {validation['score']}")

            # Update post
            if dry_run:
                logger.info(f"  [DRY RUN] Would update post {post_id}")
                self.update_results.append({
                    'post_id': post_id,
                    'title': title,
                    'current_meta': current_meta,
                    'generated_meta': generated_meta,
                    'status': 'dry_run',
                    'validation_score': validation['score']
                })
            else:
                success = self._update_post_meta(post_id, generated_meta)

                if success:
                    logger.info(f"  ✓ Successfully updated post {post_id}")
                    self.stats['updated'] += 1
                    self.update_results.append({
                        'post_id': post_id,
                        'title': title,
                        'current_meta': current_meta,
                        'generated_meta': generated_meta,
                        'status': 'updated',
                        'validation_score': validation['score']
                    })
                else:
                    self.stats['failed'] += 1
                    self.update_results.append({
                        'post_id': post_id,
                        'title': title,
                        'status': 'failed',
                        'validation_score': validation['score']
                    })

            # Rate limiting
            time.sleep(0.5)

        # Save results
        self._save_results()

        # Print summary
        logger.info("\n" + "="*70)
        logger.info("UPDATE SUMMARY")
        logger.info("="*70)
        logger.info(f"Total posts: {self.stats['total_posts']}")
        logger.info(f"Updated: {self.stats['updated']}")
        logger.info(f"Failed: {self.stats['failed']}")
        logger.info(f"Skipped: {self.stats['skipped']}")
        logger.info(f"API calls: {self.api_calls}")
        logger.info("="*70)

        return self.stats
    def _save_results(self):
        """Save update results to CSV."""
        if not self.update_results:
            return

        output_dir = Path(__file__).parent.parent.parent / 'output'
        output_dir.mkdir(parents=True, exist_ok=True)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        output_file = output_dir / f'meta_update_{self.site_name}_{timestamp}.csv'

        fieldnames = [
            'post_id', 'title', 'current_meta', 'generated_meta',
            'status', 'validation_score'
        ]

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.update_results)

        logger.info(f"\n✓ Results saved to: {output_file}")
    def run(self, post_ids: Optional[List[int]] = None,
            category_ids: Optional[List[int]] = None,
            category_names: Optional[List[str]] = None,
            author_names: Optional[List[str]] = None,
            limit: Optional[int] = None,
            dry_run: bool = False,
            skip_existing: bool = False,
            force_regenerate: bool = False) -> Dict:
        """
        Run complete meta description update process.

        Args:
            post_ids: Specific post IDs to update
            category_ids: Filter by category IDs
            category_names: Filter by category names
            author_names: Filter by author names
            limit: Maximum number of posts to process
            dry_run: If True, preview changes without updating
            skip_existing: If True, skip posts with existing meta descriptions
            force_regenerate: If True, regenerate even for good quality metas

        Returns:
            Statistics dict
        """
        # Fetch posts
        self.fetch_posts(
            post_ids=post_ids,
            category_ids=category_ids,
            category_names=category_names,
            author_names=author_names,
            limit=limit
        )

        if not self.posts:
            logger.warning("No posts found matching criteria")
            return self.stats

        # Update posts
        return self.update_posts(
            dry_run=dry_run,
            skip_existing=skip_existing,
            force_regenerate=force_regenerate
        )
396
src/seo/performance_analyzer.py
Normal file
@@ -0,0 +1,396 @@
"""
SEO Performance Analyzer - Analyze page performance from imported data
Supports Google Analytics and Search Console CSV imports
"""

import csv
import logging
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class PerformanceAnalyzer:
    """Analyze SEO performance from imported CSV data."""

    def __init__(self):
        """Initialize performance analyzer."""
        self.performance_data = []
        self.analysis_results = {}
    def load_ga4_export(self, csv_file: str) -> List[Dict]:
        """
        Load Google Analytics 4 export CSV.

        Expected columns: page_path, page_title, pageviews, sessions, bounce_rate, etc.

        Args:
            csv_file: Path to GA4 export CSV

        Returns:
            List of data dicts
        """
        logger.info(f"Loading GA4 export: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                data = list(reader)

            # Normalize column names
            normalized = []
            for row in data:
                normalized_row = {}
                for key, value in row.items():
                    # Normalize key names
                    new_key = key.lower().replace(' ', '_').replace('-', '_')
                    if 'page' in new_key and 'path' in new_key:
                        normalized_row['page'] = value
                    elif 'page' in new_key and 'title' in new_key:
                        normalized_row['page_title'] = value
                    elif 'pageviews' in new_key or 'views' in new_key:
                        normalized_row['pageviews'] = int(value) if value else 0
                    elif 'sessions' in new_key:
                        normalized_row['sessions'] = int(value) if value else 0
                    elif 'bounce' in new_key and 'rate' in new_key:
                        normalized_row['bounce_rate'] = float(value) if value else 0.0
                    elif 'engagement' in new_key and 'rate' in new_key:
                        normalized_row['engagement_rate'] = float(value) if value else 0.0
                    elif 'duration' in new_key or 'time' in new_key:
                        normalized_row['avg_session_duration'] = float(value) if value else 0.0
                    else:
                        normalized_row[new_key] = value

                normalized.append(normalized_row)

            self.performance_data.extend(normalized)
            logger.info(f"✓ Loaded {len(normalized)} rows from GA4")
            return normalized

        except Exception as e:
            logger.error(f"Error loading GA4 export: {e}")
            return []
    def load_gsc_export(self, csv_file: str) -> List[Dict]:
        """
        Load Google Search Console export CSV.

        Expected columns: Page, Clicks, Impressions, CTR, Position

        Args:
            csv_file: Path to GSC export CSV

        Returns:
            List of data dicts
        """
        logger.info(f"Loading GSC export: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                data = list(reader)

            # Normalize column names
            normalized = []
            for row in data:
                normalized_row = {'page': ''}
                for key, value in row.items():
                    new_key = key.lower().replace(' ', '_')
                    if 'page' in new_key or 'url' in new_key:
                        normalized_row['page'] = value
                    elif 'clicks' in new_key:
                        normalized_row['clicks'] = int(value) if value else 0
                    elif 'impressions' in new_key:
                        normalized_row['impressions'] = int(value) if value else 0
                    elif 'ctr' in new_key:
                        normalized_row['ctr'] = float(value) if value else 0.0
                    elif 'position' in new_key or 'rank' in new_key:
                        normalized_row['position'] = float(value) if value else 0.0
                    elif 'query' in new_key or 'keyword' in new_key:
                        normalized_row['query'] = value

                normalized.append(normalized_row)

            # Merge with existing data
            self._merge_gsc_data(normalized)

            logger.info(f"✓ Loaded {len(normalized)} rows from GSC")
            return normalized

        except Exception as e:
            logger.error(f"Error loading GSC export: {e}")
            return []
    def _merge_gsc_data(self, gsc_data: List[Dict]):
        """Merge GSC data with existing performance data."""
        # Create lookup by page
        existing_pages = {p.get('page', ''): p for p in self.performance_data}

        for gsc_row in gsc_data:
            page = gsc_row.get('page', '')

            if page in existing_pages:
                # Update existing record
                existing_pages[page].update(gsc_row)
            else:
                # Add new record
                new_record = {
                    'page': page,
                    'page_title': '',
                    'pageviews': 0,
                    'sessions': 0,
                    'bounce_rate': 0.0,
                    'engagement_rate': 0.0,
                    'avg_session_duration': 0.0
                }
                new_record.update(gsc_row)
                self.performance_data.append(new_record)
    def analyze(self) -> Dict:
        """
        Analyze performance data.

        Returns:
            Analysis results dict
        """
        if not self.performance_data:
            logger.warning("No data to analyze")
            return {}

        logger.info("\n" + "="*70)
        logger.info("PERFORMANCE ANALYSIS")
        logger.info("="*70)

        # Calculate summary metrics
        total_pages = len(self.performance_data)
        total_pageviews = sum(p.get('pageviews', 0) for p in self.performance_data)
        total_clicks = sum(p.get('clicks', 0) for p in self.performance_data)
        total_impressions = sum(p.get('impressions', 0) for p in self.performance_data)

        avg_ctr = total_clicks / total_impressions if total_impressions > 0 else 0.0
        avg_position = sum(p.get('position', 0) for p in self.performance_data) / total_pages if total_pages > 0 else 0.0

        # Top pages
        top_by_views = sorted(
            self.performance_data,
            key=lambda x: x.get('pageviews', 0),
            reverse=True
        )[:20]

        top_by_clicks = sorted(
            self.performance_data,
            key=lambda x: x.get('clicks', 0),
            reverse=True
        )[:20]

        # Pages with issues
        low_ctr = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 100 and p.get('ctr', 0) < 0.02
        ]

        low_position = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 50 and p.get('position', 0) > 20
        ]

        high_impressions_low_clicks = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 500 and p.get('ctr', 0) < 0.01
        ]

        # Keyword opportunities (from GSC data)
        keyword_opportunities = self._analyze_keywords()

        analysis = {
            'summary': {
                'total_pages': total_pages,
                'total_pageviews': total_pageviews,
                'total_clicks': total_clicks,
                'total_impressions': total_impressions,
                'average_ctr': avg_ctr,
                'average_position': avg_position
            },
            'top_pages': {
                'by_views': top_by_views,
                'by_clicks': top_by_clicks
            },
            'issues': {
                'low_ctr': low_ctr,
                'low_position': low_position,
                'high_impressions_low_clicks': high_impressions_low_clicks
            },
            'keyword_opportunities': keyword_opportunities
        }
        # Recommendations are derived from the assembled analysis dict, so add
        # them after construction (referencing `analysis` inside its own dict
        # literal would raise a NameError)
        analysis['recommendations'] = self._generate_recommendations(analysis)

        # Log summary
        logger.info(f"Total pages analyzed: {total_pages}")
        logger.info(f"Total pageviews: {total_pageviews}")
        logger.info(f"Total clicks: {total_clicks}")
        logger.info(f"Total impressions: {total_impressions}")
        logger.info(f"Average CTR: {avg_ctr:.2%}")
        logger.info(f"Average position: {avg_position:.1f}")
        logger.info(f"\nPages with low CTR: {len(low_ctr)}")
        logger.info(f"Pages with low position: {len(low_position)}")
        logger.info(f"High impression, low click pages: {len(high_impressions_low_clicks)}")
        logger.info("="*70)

        self.analysis_results = analysis
        return analysis
    def _analyze_keywords(self) -> List[Dict]:
        """Analyze keyword opportunities from GSC data."""
        keywords = {}

        for page in self.performance_data:
            query = page.get('query', '')
            if not query:
                continue

            if query not in keywords:
                keywords[query] = {
                    'query': query,
                    'clicks': 0,
                    'impressions': 0,
                    'position': 0.0,
                    'pages': []
                }

            keywords[query]['clicks'] += page.get('clicks', 0)
            keywords[query]['impressions'] += page.get('impressions', 0)
            keywords[query]['pages'].append(page.get('page', ''))

        # Calculate average position per keyword
        for query in keywords:
            positions = [
                p.get('position', 0) for p in self.performance_data
                if p.get('query') == query
            ]
            if positions:
                keywords[query]['position'] = sum(positions) / len(positions)

        # Sort by impressions
        keyword_list = list(keywords.values())
        keyword_list.sort(key=lambda x: x['impressions'], reverse=True)

        # Filter opportunities (position 5-20, high impressions)
        opportunities = [
            k for k in keyword_list
            if 5 <= k['position'] <= 20 and k['impressions'] > 100
        ]

        return opportunities[:50]  # Top 50 opportunities
    def _generate_recommendations(self, analysis: Dict) -> List[str]:
        """Generate SEO recommendations."""
        recommendations = []

        issues = analysis.get('issues', {})

        # Low CTR
        low_ctr_count = len(issues.get('low_ctr', []))
        if low_ctr_count > 0:
            recommendations.append(
                f"📝 {low_ctr_count} pages have low CTR (<2% with 100+ impressions). "
                "Improve meta titles and descriptions to increase click-through rate."
            )

        # Low position
        low_pos_count = len(issues.get('low_position', []))
        if low_pos_count > 0:
            recommendations.append(
                f"📊 {low_pos_count} pages rank beyond position 20. "
                "Consider content optimization and internal linking."
            )

        # High impressions, low clicks
        high_imp_count = len(issues.get('high_impressions_low_clicks', []))
        if high_imp_count > 0:
            recommendations.append(
                f"⚠️ {high_imp_count} pages have 500+ impressions but <1% CTR. "
                "These are prime candidates for title/description optimization."
            )

        # Keyword opportunities
        keyword_count = len(analysis.get('keyword_opportunities', []))
        if keyword_count > 0:
            recommendations.append(
                f"🎯 {keyword_count} keyword opportunities identified (ranking 5-20). "
                "Focus content optimization on these keywords."
            )

        return recommendations
    def save_analysis(self, output_file: Optional[str] = None) -> str:
        """
        Save analysis results to CSV.

        Args:
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'performance_analysis_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'page', 'page_title', 'pageviews', 'sessions', 'bounce_rate',
            'engagement_rate', 'avg_session_duration', 'clicks', 'impressions',
            'ctr', 'position', 'query'
        ]

        logger.info(f"Saving analysis to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            # GA4 normalization may keep extra columns beyond the fieldnames
            # above; ignore them instead of letting DictWriter raise ValueError
            writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
            writer.writeheader()
            writer.writerows(self.performance_data)

        logger.info(f"✓ Saved to: {output_file}")
        return str(output_file)
    def run(self, ga4_file: Optional[str] = None,
            gsc_file: Optional[str] = None,
            output_file: Optional[str] = None) -> Tuple[str, Dict]:
        """
        Run complete performance analysis.

        Args:
            ga4_file: Path to GA4 export CSV
            gsc_file: Path to GSC export CSV
            output_file: Custom output file path

        Returns:
            Tuple of (output_file_path, analysis_dict)
        """
        logger.info("\n" + "="*70)
        logger.info("SEO PERFORMANCE ANALYZER")
        logger.info("="*70)

        # Load data
        if ga4_file:
            self.load_ga4_export(ga4_file)
        if gsc_file:
            self.load_gsc_export(gsc_file)

        if not self.performance_data:
            logger.error("No data loaded. Provide GA4 and/or GSC export files.")
            return "", {}

        # Analyze
        analysis = self.analyze()

        # Save
        output_path = self.save_analysis(output_file)

        return output_path, analysis
494
src/seo/performance_tracker.py
Normal file
@@ -0,0 +1,494 @@
"""
SEO Performance Tracker - Google Analytics 4 & Search Console Integration
Fetch and analyze page performance data for SEO optimization
"""

import csv
import json
import logging
from pathlib import Path
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Optional Google imports
try:
    from google.analytics.admin import AnalyticsAdminServiceClient
    from google.analytics.data import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange,
        Dimension,
        Metric,
        RunReportRequest,
    )
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    GOOGLE_AVAILABLE = True
except ImportError:
    GOOGLE_AVAILABLE = False
    logger.warning("Google libraries not installed. API mode disabled. Use CSV imports instead.")

from .config import Config


class SEOPerformanceTracker:
    """Track and analyze SEO performance from Google Analytics and Search Console."""
    def __init__(self, ga4_credentials: Optional[str] = None,
                 gsc_credentials: Optional[str] = None,
                 ga4_property_id: Optional[str] = None,
                 gsc_site_url: Optional[str] = None):
        """
        Initialize performance tracker.

        Args:
            ga4_credentials: Path to GA4 service account JSON
            gsc_credentials: Path to GSC service account JSON
            ga4_property_id: GA4 property ID (e.g., "properties/123456789")
            gsc_site_url: GSC site URL (e.g., "https://www.mistergeek.net")
        """
        self.ga4_credentials = ga4_credentials or Config.GA4_CREDENTIALS
        self.gsc_credentials = gsc_credentials or Config.GSC_CREDENTIALS
        self.ga4_property_id = ga4_property_id or Config.GA4_PROPERTY_ID
        self.gsc_site_url = gsc_site_url or Config.GSC_SITE_URL

        self.ga4_client = None
        self.gsc_service = None

        # Initialize clients
        self._init_ga4_client()
        self._init_gsc_service()

        self.performance_data = []
    def _init_ga4_client(self):
        """Initialize Google Analytics 4 client."""
        if not GOOGLE_AVAILABLE:
            logger.warning("Google libraries not installed. API mode disabled.")
            return

        if not self.ga4_credentials or not self.ga4_property_id:
            logger.warning("GA4 credentials not configured")
            return

        try:
            credentials = service_account.Credentials.from_service_account_file(
                self.ga4_credentials,
                scopes=["https://www.googleapis.com/auth/analytics.readonly"]
            )
            self.ga4_client = BetaAnalyticsDataClient(credentials=credentials)
            logger.info("✓ GA4 client initialized")
        except Exception as e:
            logger.error(f"Failed to initialize GA4 client: {e}")
            self.ga4_client = None
    def _init_gsc_service(self):
        """Initialize Google Search Console service."""
        if not GOOGLE_AVAILABLE:
            logger.warning("Google libraries not installed. API mode disabled.")
            return

        if not self.gsc_credentials:
            logger.warning("GSC credentials not configured")
            return

        try:
            credentials = service_account.Credentials.from_service_account_file(
                self.gsc_credentials,
                scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
            )
            self.gsc_service = build('webmasters', 'v3', credentials=credentials)
            logger.info("✓ GSC service initialized")
        except Exception as e:
            logger.error(f"Failed to initialize GSC service: {e}")
            self.gsc_service = None
    def fetch_ga4_data(self, start_date: str, end_date: str,
                       dimensions: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch data from Google Analytics 4.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)
            dimensions: List of dimensions to fetch

        Returns:
            List of performance data dicts
        """
        if not self.ga4_client:
            logger.warning("GA4 client not available")
            return []

        logger.info(f"Fetching GA4 data from {start_date} to {end_date}...")

        # Default dimensions
        if dimensions is None:
            dimensions = ['pagePath', 'pageTitle']

        # Default metrics
        metrics = [
            'screenPageViews',
            'sessions',
            'bounceRate',
            'averageSessionDuration',
            'engagementRate'
        ]

        try:
            request = RunReportRequest(
                property=self.ga4_property_id,
                dimensions=[Dimension(name=dim) for dim in dimensions],
                metrics=[Metric(name=metric) for metric in metrics],
                date_ranges=[DateRange(start_date=start_date, end_date=end_date)]
            )

            response = self.ga4_client.run_report(request)

            data = []
            for row in response.rows:
                row_data = {}

                # Extract dimensions
                for i, dim_header in enumerate(response.dimension_headers):
                    row_data[dim_header.name] = row.dimension_values[i].value

                # Extract metrics
                for i, metric_header in enumerate(response.metric_headers):
                    value = row.metric_values[i].value
                    # Convert to appropriate type
                    if metric_header.name in ['bounceRate', 'engagementRate']:
                        value = float(value) if value else 0.0
                    elif metric_header.name in ['screenPageViews', 'sessions']:
                        value = int(value) if value else 0
                    elif metric_header.name == 'averageSessionDuration':
                        value = float(value) if value else 0.0
                    row_data[metric_header.name] = value

                data.append(row_data)

            logger.info(f"✓ Fetched {len(data)} rows from GA4")
            return data

        except Exception as e:
            logger.error(f"Error fetching GA4 data: {e}")
            return []
    def fetch_gsc_data(self, start_date: str, end_date: str,
                       dimensions: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch data from Google Search Console.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)
            dimensions: List of dimensions to fetch

        Returns:
            List of performance data dicts
        """
        if not self.gsc_service:
            logger.warning("GSC service not available")
            return []

        logger.info(f"Fetching GSC data from {start_date} to {end_date}...")

        # Default dimensions
        if dimensions is None:
            dimensions = ['page']

        try:
            # Build request
            request = {
                'startDate': start_date,
                'endDate': end_date,
                'dimensions': dimensions,
                'rowLimit': 5000,
                'startRow': 0
            }

            response = self.gsc_service.searchanalytics().query(
                siteUrl=self.gsc_site_url,
                body=request
            ).execute()

            data = []
            if 'rows' in response:
                for row in response['rows']:
                    row_data = {
                        'page': row['keys'][0] if len(row['keys']) > 0 else '',
                        'clicks': row.get('clicks', 0),
                        'impressions': row.get('impressions', 0),
                        'ctr': row.get('ctr', 0.0),
                        'position': row.get('position', 0.0)
                    }

                    # Add query if available
                    if len(row['keys']) > 1:
                        row_data['query'] = row['keys'][1]

                    data.append(row_data)

            logger.info(f"✓ Fetched {len(data)} rows from GSC")
            return data

        except Exception as e:
            logger.error(f"Error fetching GSC data: {e}")
            return []

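The row-flattening logic in `fetch_gsc_data` can be isolated into a pure helper, which makes the `keys` ordering assumption explicit (first key is the page, an optional second key is the query). A sketch with a hypothetical sample row:

```python
def parse_gsc_row(row: dict) -> dict:
    """Flatten one Search Console API row into a plain dict."""
    keys = row.get('keys', [])
    data = {
        'page': keys[0] if len(keys) > 0 else '',
        'clicks': row.get('clicks', 0),
        'impressions': row.get('impressions', 0),
        'ctr': row.get('ctr', 0.0),
        'position': row.get('position', 0.0),
    }
    if len(keys) > 1:  # second dimension, when requested, is the query
        data['query'] = keys[1]
    return data

sample = {'keys': ['/blog/post', 'seo tips'], 'clicks': 12,
          'impressions': 340, 'ctr': 0.035, 'position': 8.2}
print(parse_gsc_row(sample)['query'])  # seo tips
```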
    def fetch_combined_data(self, start_date: str, end_date: str) -> List[Dict]:
        """
        Fetch and combine data from GA4 and GSC.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)

        Returns:
            List of combined performance data dicts
        """
        logger.info("\n" + "="*70)
        logger.info("FETCHING PERFORMANCE DATA")
        logger.info("="*70)

        # Fetch from both sources
        ga4_data = self.fetch_ga4_data(start_date, end_date)
        gsc_data = self.fetch_gsc_data(start_date, end_date)

        # Combine data by page path
        combined = {}

        # Add GA4 data
        for row in ga4_data:
            page_path = row.get('pagePath', '')
            combined[page_path] = {
                'page': page_path,
                'page_title': row.get('pageTitle', ''),
                'pageviews': row.get('screenPageViews', 0),
                'sessions': row.get('sessions', 0),
                'bounce_rate': row.get('bounceRate', 0.0),
                'avg_session_duration': row.get('averageSessionDuration', 0.0),
                'engagement_rate': row.get('engagementRate', 0.0),
                'clicks': 0,
                'impressions': 0,
                'ctr': 0.0,
                'position': 0.0
            }

        # Merge GSC data
        for row in gsc_data:
            page_path = row.get('page', '')

            if page_path in combined:
                # Update existing record
                combined[page_path]['clicks'] = row.get('clicks', 0)
                combined[page_path]['impressions'] = row.get('impressions', 0)
                combined[page_path]['ctr'] = row.get('ctr', 0.0)
                combined[page_path]['position'] = row.get('position', 0.0)
            else:
                # Create new record
                combined[page_path] = {
                    'page': page_path,
                    'page_title': '',
                    'pageviews': 0,
                    'sessions': 0,
                    'bounce_rate': 0.0,
                    'avg_session_duration': 0.0,
                    'engagement_rate': 0.0,
                    'clicks': row.get('clicks', 0),
                    'impressions': row.get('impressions', 0),
                    'ctr': row.get('ctr', 0.0),
                    'position': row.get('position', 0.0)
                }

        self.performance_data = list(combined.values())

        logger.info(f"✓ Combined {len(self.performance_data)} pages")
        logger.info("="*70)

        return self.performance_data

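The merge strategy above (GA4 rows seed the records, GSC rows either enrich a matching page or create a zero-filled one) can be sketched in isolation. This is a reduced sketch with a trimmed field set, assuming sample rows shaped like the two fetchers' outputs:

```python
def merge_by_page(ga4_rows, gsc_rows):
    """Merge GA4 and GSC rows into one record per page, zero-filling missing sources."""
    combined = {}
    for row in ga4_rows:
        page = row.get('pagePath', '')
        combined[page] = {'page': page, 'pageviews': row.get('screenPageViews', 0),
                          'clicks': 0, 'impressions': 0}
    for row in gsc_rows:
        page = row.get('page', '')
        # setdefault creates a zero-filled record for GSC-only pages
        record = combined.setdefault(page, {'page': page, 'pageviews': 0,
                                            'clicks': 0, 'impressions': 0})
        record['clicks'] = row.get('clicks', 0)
        record['impressions'] = row.get('impressions', 0)
    return list(combined.values())

rows = merge_by_page([{'pagePath': '/a', 'screenPageViews': 100}],
                     [{'page': '/a', 'clicks': 5, 'impressions': 200},
                      {'page': '/b', 'clicks': 1, 'impressions': 50}])
print(len(rows))  # 2
```

Note the join key: this only lines up if GA4's `pagePath` and GSC's `page` values are normalized to the same form (GSC returns full URLs by default), which is worth verifying against real responses.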
    def analyze_performance(self) -> Dict:
        """
        Analyze performance data and generate insights.

        Returns:
            Analysis results dict
        """
        if not self.performance_data:
            return {}

        logger.info("\n" + "="*70)
        logger.info("PERFORMANCE ANALYSIS")
        logger.info("="*70)

        # Calculate metrics
        total_pageviews = sum(p.get('pageviews', 0) for p in self.performance_data)
        total_clicks = sum(p.get('clicks', 0) for p in self.performance_data)
        total_impressions = sum(p.get('impressions', 0) for p in self.performance_data)

        avg_ctr = total_clicks / total_impressions if total_impressions > 0 else 0
        avg_position = sum(p.get('position', 0) for p in self.performance_data) / len(self.performance_data)

        # Top pages by pageviews
        top_pages = sorted(
            self.performance_data,
            key=lambda x: x.get('pageviews', 0),
            reverse=True
        )[:10]

        # Top pages by CTR
        top_ctr = sorted(
            [p for p in self.performance_data if p.get('impressions', 0) > 100],
            key=lambda x: x.get('ctr', 0),
            reverse=True
        )[:10]

        # Pages needing improvement (low CTR)
        low_ctr = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 100 and p.get('ctr', 0) < 0.02
        ]

        # Pages with good traffic but low position
        opportunity_pages = [
            p for p in self.performance_data
            if p.get('pageviews', 0) > 50 and p.get('position', 0) > 10
        ]

        analysis = {
            'summary': {
                'total_pages': len(self.performance_data),
                'total_pageviews': total_pageviews,
                'total_clicks': total_clicks,
                'total_impressions': total_impressions,
                'average_ctr': avg_ctr,
                'average_position': avg_position
            },
            'top_pages': top_pages,
            'top_ctr': top_ctr,
            'low_ctr': low_ctr,
            'opportunities': opportunity_pages
        }
        # Attach recommendations after the dict exists; referencing `analysis`
        # inside its own literal would raise a NameError.
        analysis['recommendations'] = self._generate_recommendations(analysis)

        # Log summary
        logger.info(f"Total pages: {analysis['summary']['total_pages']}")
        logger.info(f"Total pageviews: {analysis['summary']['total_pageviews']}")
        logger.info(f"Total clicks: {analysis['summary']['total_clicks']}")
        logger.info(f"Average CTR: {analysis['summary']['average_ctr']:.2%}")
        logger.info(f"Average position: {analysis['summary']['average_position']:.1f}")
        logger.info("="*70)

        return analysis

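The summary aggregation follows two different averaging conventions: CTR is click-weighted (total clicks over total impressions), while position is a plain per-page mean. A minimal sketch of that arithmetic (function name `summarize` is illustrative):

```python
def summarize(pages):
    """Aggregate totals, click-weighted CTR, and mean position over page records."""
    clicks = sum(p.get('clicks', 0) for p in pages)
    impressions = sum(p.get('impressions', 0) for p in pages)
    avg_ctr = clicks / impressions if impressions > 0 else 0
    avg_position = sum(p.get('position', 0) for p in pages) / len(pages) if pages else 0
    return {'total_clicks': clicks, 'total_impressions': impressions,
            'average_ctr': avg_ctr, 'average_position': avg_position}

s = summarize([{'clicks': 10, 'impressions': 500, 'position': 4.0},
               {'clicks': 5, 'impressions': 500, 'position': 12.0}])
print(s['average_ctr'], s['average_position'])  # 0.015 8.0
```

The unweighted position mean lets low-traffic pages pull the average; weighting by impressions would be the alternative if that matters for reporting.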
    def _generate_recommendations(self, analysis: Dict) -> List[str]:
        """Generate SEO recommendations based on analysis."""
        recommendations = []

        # Low CTR recommendations
        low_ctr_count = len(analysis.get('low_ctr', []))
        if low_ctr_count > 0:
            recommendations.append(
                f"📝 {low_ctr_count} pages have low CTR (<2%). "
                "Consider improving meta titles and descriptions."
            )

        # Position opportunities
        opportunity_count = len(analysis.get('opportunities', []))
        if opportunity_count > 0:
            recommendations.append(
                f"🎯 {opportunity_count} pages have good traffic but rank >10. "
                "Optimize content to improve rankings."
            )

        # High impressions, low clicks
        high_impressions = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 1000 and p.get('ctr', 0) < 0.01
        ]
        if high_impressions:
            recommendations.append(
                f"⚠️ {len(high_impressions)} pages have high impressions but very low CTR. "
                "Review title tags for better click appeal."
            )

        return recommendations

    def save_to_csv(self, output_file: Optional[str] = None) -> str:
        """
        Save performance data to CSV.

        Args:
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'performance_data_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'page', 'page_title', 'pageviews', 'sessions', 'bounce_rate',
            'avg_session_duration', 'engagement_rate', 'clicks', 'impressions',
            'ctr', 'position'
        ]

        logger.info(f"Saving {len(self.performance_data)} rows to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.performance_data)

        logger.info(f"✓ Saved to: {output_file}")
        return str(output_file)

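The `csv.DictWriter` pattern used in `save_to_csv` round-trips cleanly, with one caveat worth remembering: `csv.DictReader` hands every value back as a string. A self-contained sketch using a temporary file:

```python
import csv
import os
import tempfile

rows = [{'page': '/a', 'clicks': 5}, {'page': '/b', 'clicks': 2}]

# Write with DictWriter (newline='' avoids blank lines on Windows)
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['page', 'clicks'])
    writer.writeheader()
    writer.writerows(rows)

# Read back: numeric values come back as strings
with open(path, newline='', encoding='utf-8') as f:
    back = list(csv.DictReader(f))
os.remove(path)

print(back[0]['page'], back[1]['clicks'])  # /a 2
```

Also note that `DictWriter` raises `ValueError` if a row contains keys missing from `fieldnames` (unless `extrasaction='ignore'` is passed), so the fixed field list above must match the record shape exactly.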
    def run(self, start_date: Optional[str] = None,
            end_date: Optional[str] = None,
            output_file: Optional[str] = None) -> Tuple[str, Dict]:
        """
        Run complete performance analysis.

        Args:
            start_date: Start date (YYYY-MM-DD), default 30 days ago
            end_date: End date (YYYY-MM-DD), default yesterday
            output_file: Custom output file path

        Returns:
            Tuple of (output_file_path, analysis_dict)
        """
        # Default date range (last 30 days)
        if not end_date:
            end_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        if not start_date:
            start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')

        logger.info("\n" + "="*70)
        logger.info("SEO PERFORMANCE ANALYSIS")
        logger.info("="*70)
        logger.info(f"Date range: {start_date} to {end_date}")
        logger.info("="*70)

        # Fetch data
        self.fetch_combined_data(start_date, end_date)

        if not self.performance_data:
            logger.warning("No performance data available")
            return "", {}

        # Analyze
        analysis = self.analyze_performance()

        # Save
        output_path = self.save_to_csv(output_file)

        return output_path, analysis
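The default window computed in `run` (yesterday back to 30 days ago) is easy to get off by one, so it helps to pin it down with a fixed reference date. A sketch with the "today" injected for testability (the standalone `default_range` helper is illustrative):

```python
from datetime import datetime, timedelta

def default_range(today=None):
    """Default reporting window: yesterday back to 30 days before today."""
    today = today or datetime.now()
    end = (today - timedelta(days=1)).strftime('%Y-%m-%d')
    start = (today - timedelta(days=30)).strftime('%Y-%m-%d')
    return start, end

print(default_range(datetime(2024, 3, 31)))  # ('2024-03-01', '2024-03-30')
```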
src/seo/post_migrator.py (new file, 1007 lines): diff suppressed because it is too large.