### Fixes:
- Improved get_or_create_category() with multiple lookup strategies
- Handle French characters in category names (Jeu vidéo, Téléchargement)
- Better handling of 'term_exists' 400 error from WordPress
- Fetch existing category details when creation fails
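A minimal sketch of how the 'term_exists' case might be detected, assuming the WordPress REST API answers a duplicate creation with HTTP 400 and a JSON body whose `code` is `term_exists` and whose `data.term_id` holds the ID of the existing term. The helper name `existing_term_id` is hypothetical and is not taken from the project.

```python
from typing import Optional


def existing_term_id(status_code: int, payload: dict) -> Optional[int]:
    """Return the ID of the already-existing term, or None for any other response.

    Assumed payload shape (not confirmed by this commit):
    {"code": "term_exists", "data": {"status": 400, "term_id": 42}}
    """
    if status_code != 400 or payload.get("code") != "term_exists":
        return None
    data = payload.get("data") or {}
    term_id = data.get("term_id")
    return int(term_id) if term_id is not None else None
```

With the ID in hand, the caller can fetch the existing category's details instead of treating the response as a failure.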
### Lookup Order:
1. Exact name match (case-insensitive)
2. Slug match
3. Normalized slug (handles French characters)
4. Partial name match
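The lookup order above could be sketched roughly as follows. This is an illustration under assumptions, not the actual get_or_create_category() code: the names `normalize_slug` and `find_category` are hypothetical, and categories are assumed to be dicts with `name` and `slug` keys.

```python
import re
import unicodedata
from typing import Optional


def normalize_slug(name: str) -> str:
    """Strip accents and punctuation: 'Jeu vidéo' becomes 'jeu-video'."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def find_category(name: str, categories: list) -> Optional[dict]:
    """Try each lookup strategy in order and return the first match."""
    wanted = name.strip().lower()
    naive_slug = wanted.replace(" ", "-")
    normalized = normalize_slug(name)
    strategies = [
        lambda c: c["name"].strip().lower() == wanted,      # 1. exact name
        lambda c: c["slug"] == naive_slug,                  # 2. slug
        lambda c: normalize_slug(c["slug"]) == normalized,  # 3. normalized slug
        lambda c: bool(wanted) and (                        # 4. partial name
            wanted in c["name"].lower() or c["name"].lower() in wanted
        ),
    ]
    for matches in strategies:
        for category in categories:
            if matches(category):
                return category
    return None
```

Running the strategies from strictest to loosest keeps a partial match from shadowing an exact one.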
### Benefits:
- No more errors when a category already exists
- Handles accented characters properly
- Better caching of existing categories
- More robust category creation
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
### New Feature:
- Fetch existing categories from WordPress sites before AI proposals
- AI now prefers existing categories to avoid duplicates
- Shows existing categories in AI prompt for better suggestions
- Tracks whether proposed categories are existing or new
### Changes:
- fetch_existing_categories() method - Gets categories from all sites
- Updated AI prompt to include the list of existing categories
- New CSV column: is_existing_category (Yes/No)
- Statistics showing existing vs new categories
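As a rough illustration of the is_existing_category column and the statistics, the sketch below flags each proposal and counts the two groups. The function names and the `proposed_category` column are assumptions made for this example; only `is_existing_category` (Yes/No) comes from the change list.

```python
from collections import Counter


def flag_existing(rows: list, existing_names: list) -> list:
    """Add an is_existing_category column ('Yes'/'No') to each row in place."""
    known = {name.strip().lower() for name in existing_names}
    for row in rows:
        proposed = row["proposed_category"].strip().lower()  # hypothetical column
        row["is_existing_category"] = "Yes" if proposed in known else "No"
    return rows


def category_statistics(rows: list) -> dict:
    """Count posts using existing categories versus proposing new ones."""
    counts = Counter(row["is_existing_category"] for row in rows)
    return {"existing": counts["Yes"], "new": counts["No"]}
```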
### Benefits:
- Reduces category duplication
- Maintains consistency across posts
- AI makes smarter category suggestions
- Users can see which proposed categories already exist and which are new
### AI Prompt Enhancement:
EXISTING CATEGORIES (PREFER THESE TO AVOID DUPLICATES):
mistergeek.net:
- VPN
- Software
- Gaming
...
webscroll.fr:
- Torrenting
- File-Sharing
...
IMPORTANT: Use existing categories when possible...
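A prompt section like the one above could be assembled along these lines. This is only a sketch: `build_existing_categories_block` is a hypothetical name, the input is assumed to map each site to its category names, and the closing IMPORTANT instruction is left out because its full wording is not given here.

```python
def build_existing_categories_block(categories_by_site: dict) -> str:
    """Render per-site category lists as a plain-text prompt section."""
    lines = ["EXISTING CATEGORIES (PREFER THESE TO AVOID DUPLICATES):"]
    for site, names in categories_by_site.items():
        lines.append(f"{site}:")
        lines.extend(f"- {name}" for name in names)
    return "\n".join(lines)
```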
### Output:
Category Statistics:
Using existing categories: 145 posts
Proposing new categories: 12 posts
### Usage:
./seo category_propose # Now fetches existing categories automatically
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Major refactoring of category proposal system:
### Changes:
- Integrated editorial strategy into category proposals
- Added site migration recommendations to category proposals
- AI now considers editorial lines when suggesting categories
- Automatic detection of posts that should migrate between sites
### New Features:
- Editorial line definitions for each site
- Topic-based site matching algorithm
- Migration recommendations alongside category proposals
- Dual output: category_proposals_*.csv + migration_recommendations_*.csv
### Editorial Lines:
mistergeek.net: VPN, Software, Gaming, SEO, Tech (high-value)
webscroll.fr: Torrenting, File-Sharing, Tracker Guides (niche)
hellogeek.net: Experimental, Low-Traffic, Off-Brand (catch-all)
### Output Files:
1. category_proposals_*.csv - Categories + site recommendations
2. migration_recommendations_*.csv - Posts to migrate between sites
### CSV Columns Added:
- recommended_site - Best site for the post
- should_migrate - Yes/No flag
- migration_reason - Why migration is recommended
- current_site - Original site for comparison
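One way the topic-based matching and the four migration columns might fit together is sketched below. The scoring rule (count of overlapping topics, with hellogeek.net as the fallback) is an assumption for illustration, as is the name `recommend_site`; the actual algorithm is not described in this commit.

```python
# Editorial lines as listed above; hellogeek.net is treated as the catch-all.
EDITORIAL_LINES = {
    "mistergeek.net": {"vpn", "software", "gaming", "seo", "tech"},
    "webscroll.fr": {"torrenting", "file-sharing", "tracker guides"},
}
CATCH_ALL_SITE = "hellogeek.net"


def recommend_site(topics: list, current_site: str) -> dict:
    """Score each site by topic overlap and fill in the migration columns."""
    wanted = {topic.strip().lower() for topic in topics}
    scores = {site: len(wanted & line) for site, line in EDITORIAL_LINES.items()}
    best_site = max(scores, key=scores.get)
    if scores[best_site] == 0:
        best_site, reason = CATCH_ALL_SITE, "no topic matches an editorial line"
    elif scores.get(current_site, 0) == scores[best_site]:
        # Prefer staying put when the current site fits equally well.
        best_site, reason = current_site, ""
    else:
        matched = sorted(wanted & EDITORIAL_LINES[best_site])
        reason = f"topics match {best_site}: {', '.join(matched)}"
    should_migrate = best_site != current_site
    return {
        "current_site": current_site,
        "recommended_site": best_site,
        "should_migrate": "Yes" if should_migrate else "No",
        "migration_reason": reason if should_migrate else "",
    }
```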
### Benefits:
- Categories aligned with site strategy
- Automatic migration detection
- Smarter AI prompts with editorial context
- Unified category + migration workflow
### Usage:
./seo category_propose
# Generates both category and migration files
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Major refactoring to create a unified, self-contained Python package:
### Architecture Changes:
- Removed scripts/ directory completely
- All functionality now in src/seo/ package
- Single entry point: ./seo (imports from src/seo/cli)
- No external dependencies on scripts folder
### New Package Structure:
src/seo/
├── __init__.py - Package exports (SEOApp, PostExporter, etc.)
├── cli.py - Command-line interface
├── app.py - Main application class
├── config.py - Configuration management
├── exporter.py - Post export functionality (self-contained)
├── analyzer.py - Enhanced analyzer with selective fields
├── category_proposer.py - AI category proposals (self-contained)
├── seo_checker.py - Placeholder for future implementation
├── categories.py - Placeholder for future implementation
├── approval.py - Placeholder for future implementation
└── recategorizer.py - Placeholder for future implementation
### Features:
- All modules are self-contained (no scripts dependencies)
- EnhancedPostAnalyzer with selective field analysis
- CategoryProposer for AI-powered category suggestions
- Support for in-place CSV updates with backups
- Clean, integrated codebase
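An in-place CSV update with a backup could look roughly like this. It is a sketch under assumptions: the function name `update_csv_in_place`, the `.bak` suffix, and the `post_id` key column are all hypothetical choices for the example.

```python
import csv
import shutil
from pathlib import Path


def update_csv_in_place(csv_path, updates: dict, key: str = "post_id") -> Path:
    """Copy the file to <name>.bak, then rewrite it with the given changes.

    `updates` maps a key value to a {column: new_value} dict.
    """
    path = Path(csv_path)
    backup = path.with_suffix(path.suffix + ".bak")
    shutil.copy2(path, backup)  # keep the original before touching it

    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    for row in rows:
        for column, value in updates.get(row.get(key), {}).items():
            if column not in fieldnames:
                fieldnames.append(column)  # new columns go at the end
            row[column] = value

    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return backup
```

Taking the backup before reading means a failed rewrite still leaves a usable copy of the original file.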
### CLI Commands:
- seo export - Export posts from WordPress
- seo analyze - Analyze with AI (supports -f fields, -u update)
- seo category_propose - Propose categories
- seo status - Show output files
- seo help - Show help
### Usage Examples:
./seo export
./seo analyze -f title categories
./seo analyze -u -f meta_description
./seo category_propose
./seo status
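The commands and flags used above could be wired up with argparse along the following lines. This is a hedged sketch rather than the contents of cli.py: whether the project uses argparse is not stated, and the long option names `--fields` and `--update` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the commands shown in the usage examples."""
    parser = argparse.ArgumentParser(prog="seo")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("export", help="Export posts from WordPress")

    analyze = commands.add_parser("analyze", help="Analyze with AI")
    analyze.add_argument("-f", "--fields", nargs="+", default=[],
                         help="Fields to analyze, e.g. title categories")
    analyze.add_argument("-u", "--update", action="store_true",
                         help="Update the input CSV in place")

    commands.add_parser("category_propose", help="Propose categories")
    commands.add_parser("status", help="Show output files")
    return parser
```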
### Benefits:
- Single source of truth
- Easier to maintain and extend
- Proper Python package structure
- Can be installed with pip install -e .
- Clean imports throughout
- No path resolution issues
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
### Architecture Changes:
- Created src/seo/ package with modular architecture
- Main application class (SEOApp) with Rails-inspired API
- Separated concerns into distinct modules:
  - app.py: Main application orchestrator
  - cli.py: Command-line interface
  - config.py: Configuration management
  - exporter.py: Post export functionality
  - analyzer.py: AI analysis
  - recategorizer.py: Recategorization
  - seo_checker.py: SEO quality checking
  - categories.py: Category management
  - approval.py: User approval system
### New Features:
- Proper Python package structure (src layout)
- setup.py and setup.cfg for installation
- Can be installed with: pip install -e .
- Entry point: seo = seo.cli:main
- Cleaner imports and dependencies
### Benefits:
- Better code organization
- Easier to maintain and extend
- Follows Python best practices
- Proper package isolation
- Can be imported as library
- Testable components
- Clear separation of concerns
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Major refactoring to create a clean, integrated CLI application:
### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)
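Auto-detection of the latest export can lean on the date-stamped filename, since ISO dates sort correctly as plain strings. The sketch below assumes that approach; `latest_export` is a hypothetical name and the real detection logic may differ (for example, by using file modification times).

```python
from pathlib import Path
from typing import Optional


def latest_export(output_dir="output") -> Optional[Path]:
    """Return the newest all_posts_YYYY-MM-DD.csv in output_dir, if any."""
    candidates = sorted(Path(output_dir).glob("all_posts_*.csv"))
    return candidates[-1] if candidates else None
```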
### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status
### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage
### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>