Refactor SEO automation into unified CLI application

Major refactoring to create a clean, integrated CLI application:

### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)

### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status

### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage

### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Kevin Bataille
2026-02-16 14:24:44 +01:00
parent 3b51952336
commit 8c7cd24685
57 changed files with 16095 additions and 560 deletions

guides/PROJECT_GUIDE.md Normal file

@@ -0,0 +1,310 @@
# SEO Analysis & Improvement System - Project Guide
## 📋 Overview
A complete 4-phase SEO analysis pipeline that:
1. **Integrates** Google Analytics, Search Console, and WordPress data
2. **Identifies** high-potential keywords for optimization (positions 11-30)
3. **Discovers** new content opportunities using AI
4. **Generates** a comprehensive report with 90-day action plan
## 📂 Project Structure
```
seo/
├── input/                              # SOURCE DATA (your exports)
│   ├── new-propositions.csv            # WordPress posts
│   ├── README.md                       # How to export data
│   └── analytics/
│       ├── ga4_export.csv              # Google Analytics
│       └── gsc/
│           ├── Pages.csv               # GSC pages (required)
│           ├── Requêtes.csv            # GSC queries (optional)
│           └── ...
├── output/                             # RESULTS (auto-generated)
│   ├── results/
│   │   ├── seo_optimization_report.md  # 📍 PRIMARY OUTPUT
│   │   ├── posts_with_analytics.csv
│   │   ├── posts_prioritized.csv
│   │   ├── keyword_opportunities.csv
│   │   └── content_gaps.csv
│   │
│   ├── logs/
│   │   ├── import_log.txt
│   │   ├── opportunity_analysis_log.txt
│   │   └── content_gap_analysis_log.txt
│   │
│   └── README.md                       # Output guide
├── 🚀 run_analysis.sh                  # Run entire pipeline
├── analytics_importer.py               # Phase 1: Merge data
├── opportunity_analyzer.py             # Phase 2: Find wins
├── content_gap_analyzer.py             # Phase 3: Find gaps
├── report_generator.py                 # Phase 4: Generate report
├── config.py
├── requirements.txt
├── .env.example
└── .gitignore
```
## 🚀 Getting Started
### Step 1: Prepare Input Data
**Place WordPress posts CSV:**
```
input/new-propositions.csv
```
**Export Google Analytics 4:**
1. Go to: Analytics > Reports > Engagement > Pages and Screens
2. Set date range: Last 90 days
3. Download CSV → Save as: `input/analytics/ga4_export.csv`
**Export Google Search Console (Pages):**
1. Go to: Performance
2. Set date range: Last 90 days
3. Export CSV → Save as: `input/analytics/gsc/Pages.csv`
### Step 2: Run Analysis
```bash
# Run entire pipeline
./run_analysis.sh
# OR run steps individually
./venv/bin/python analytics_importer.py
./venv/bin/python opportunity_analyzer.py
./venv/bin/python content_gap_analyzer.py
./venv/bin/python report_generator.py
```
### Step 3: Review Report
Open: **`output/results/seo_optimization_report.md`**
Contains:
- Executive summary with current metrics
- Top 20 posts ranked by opportunity (with AI recommendations)
- Keyword opportunities breakdown
- Content gap analysis
- 90-day phased action plan
## 📊 What Each Script Does
### `analytics_importer.py` (Phase 1)
**Purpose:** Merge analytics data with WordPress posts
**Input:**
- `input/new-propositions.csv` (WordPress posts)
- `input/analytics/ga4_export.csv` (Google Analytics)
- `input/analytics/gsc/Pages.csv` (Search Console)
**Output:**
- `output/results/posts_with_analytics.csv` (enriched dataset)
- `output/logs/import_log.txt` (matching report)
**Handles:** French and English column names, URL normalization, multi-source merging
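The URL normalization and header handling mentioned above could look roughly like the sketch below. This is not the code in `analytics_importer.py`: the function names, the alias table, and the choice to match on host plus path are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical alias table. The guide only says French and English headers
# are handled; the exact header names below are illustrative.
COLUMN_ALIASES = {
    "page": "url",
    "pages": "url",
    "clics": "clicks",
    "clicks": "clicks",
    "impressions": "impressions",
    "position": "position",
}


def normalize_url(url: str) -> str:
    """Reduce a full URL to 'host/path' so the same page matches across exports."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop the trailing slash, query string, and fragment.
    path = parts.path.rstrip("/") or "/"
    return f"{host}{path}"


def canonical_columns(row: dict) -> dict:
    """Rename known French/English headers to a single canonical name."""
    return {COLUMN_ALIASES.get(key.strip().lower(), key): value
            for key, value in row.items()}
```

With both helpers applied to every row, a GSC row exported with French headers and a WordPress row with English headers end up sharing the same `url` key and value, which is what makes a multi-source merge possible.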
### `opportunity_analyzer.py` (Phase 2)
**Purpose:** Identify high-potential optimization opportunities
**Input:**
- `output/results/posts_with_analytics.csv`
**Output:**
- `output/results/keyword_opportunities.csv` (26 opportunities)
- `output/logs/opportunity_analysis_log.txt`
**Features:**
- Filters posts at positions 11-30 (page 2-3)
- Calculates opportunity scores (0-100)
- Generates AI recommendations for top 20 posts
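A minimal sketch of the position filter described above, assuming each post is a dict with `position` and `impressions` keys (hypothetical field names). The default thresholds mirror the values in this guide's `.env` example; sorting by impressions is an illustrative choice, not necessarily what `opportunity_analyzer.py` does.

```python
def find_opportunities(posts, min_position=11, max_position=30, min_impressions=50):
    """Keep posts ranking on pages 2-3 that have enough search visibility."""
    selected = [
        post for post in posts
        if min_position <= post["position"] <= max_position
        and post["impressions"] >= min_impressions
    ]
    # Most visible posts first.
    return sorted(selected, key=lambda post: post["impressions"], reverse=True)


posts = [
    {"title": "A", "position": 4.2, "impressions": 900},   # already on page 1
    {"title": "B", "position": 12.5, "impressions": 300},  # candidate
    {"title": "C", "position": 18.0, "impressions": 20},   # too few impressions
    {"title": "D", "position": 27.1, "impressions": 450},  # candidate
]
candidates = find_opportunities(posts)
```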
### `content_gap_analyzer.py` (Phase 3)
**Purpose:** Discover new content opportunities
**Input:**
- `output/results/posts_with_analytics.csv`
- `input/analytics/gsc/Requêtes.csv` (optional)
**Output:**
- `output/results/content_gaps.csv`
- `output/logs/content_gap_analysis_log.txt`
**Features:**
- Topic cluster extraction
- Gap identification
- AI-powered content suggestions
### `report_generator.py` (Phase 4)
**Purpose:** Create comprehensive report with action plan
**Input:**
- All analysis results from phases 1-3
**Output:**
- `output/results/seo_optimization_report.md` (**PRIMARY DELIVERABLE**)
- `output/results/posts_prioritized.csv`
**Features:**
- Comprehensive markdown report
- All 262 posts ranked
- 90-day action plan with estimated gains
## 📈 Understanding Your Report
### Key Metrics (Executive Summary)
- **Total Posts:** All posts analyzed
- **Monthly Traffic:** Current organic traffic
- **Total Impressions:** Search visibility (90 days)
- **Average Position:** Current ranking position
- **Opportunities:** Posts ready to optimize
### Top 20 Posts to Optimize
Each post shows:
- **Title** (the post name)
- **Current Position** (search ranking)
- **Impressions** (search visibility)
- **Traffic** (organic visits)
- **Priority Score** (0-100 opportunity rating)
- **Status** (page 1 vs page 2-3)
- **Recommendations** (how to improve)
### Priority Scoring (0-100)
A higher score means more potential gain for less effort.
Calculated from:
- **Position (35%)** - How close to page 1
- **Traffic Potential (30%)** - Search impressions
- **CTR Gap (20%)** - Improvement opportunity
- **Content Quality (15%)** - Existing engagement
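The weighting above amounts to a weighted sum. The sketch below assumes each component has already been scaled to 0-100; how the scripts derive those component values is not described in this guide, so the component names and sample inputs are placeholders.

```python
# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {
    "position": 0.35,
    "traffic_potential": 0.30,
    "ctr_gap": 0.20,
    "content_quality": 0.15,
}


def priority_score(components: dict) -> float:
    """Combine 0-100 component scores into a single 0-100 priority score."""
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(total, 1)


# Hypothetical component values for one post.
example = {"position": 80, "traffic_potential": 60, "ctr_gap": 50, "content_quality": 40}
score = priority_score(example)  # 28 + 18 + 10 + 6 = 62.0
```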
## 🎯 Action Plan
### Week 1-2: Quick Wins (+100 visits/month)
- Focus on posts at positions 11-15
- Update SEO titles and meta descriptions
- 30-60 minutes per post
### Week 3-4: Core Optimization (+150 visits/month)
- Posts 6-15 in priority list
- Add content sections
- Improve structure with headers
- 2-3 hours per post
### Week 5-8: New Content (+300 visits/month)
- Create 3-5 new posts from gap analysis
- Target high-search-demand topics
- 4-6 hours per post
### Week 9-12: Refinement (+100 visits/month)
- Monitor ranking improvements
- Refine underperforming optimizations
- Prepare next round of analysis
**Total: +650 visits/month potential gain**
## 🔧 Configuration
Edit `.env` to customize analysis:
```bash
# Position range for opportunities
ANALYSIS_MIN_POSITION=11
ANALYSIS_MAX_POSITION=30
# Minimum impressions to consider
ANALYSIS_MIN_IMPRESSIONS=50
# Posts for AI recommendations
ANALYSIS_TOP_N_POSTS=20
```
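For reference, settings like these could be read in Python as sketched below. The sketch assumes the values from `.env` have already been loaded into the process environment; how `config.py` actually does this is not shown here, and `load_analysis_settings` is a hypothetical helper. The fallback values match the example above.

```python
import os

# Fallbacks mirror the example values shown in this guide.
DEFAULTS = {
    "ANALYSIS_MIN_POSITION": 11,
    "ANALYSIS_MAX_POSITION": 30,
    "ANALYSIS_MIN_IMPRESSIONS": 50,
    "ANALYSIS_TOP_N_POSTS": 20,
}


def load_analysis_settings(env=None) -> dict:
    """Read the analysis settings as integers, falling back to DEFAULTS."""
    env = os.environ if env is None else env
    return {name: int(env.get(name, default)) for name, default in DEFAULTS.items()}


settings = load_analysis_settings({"ANALYSIS_MIN_IMPRESSIONS": "25"})
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.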
## 🐛 Troubleshooting
### Missing Input Files
```
❌ Error: File not found: input/...
```
→ Check that all files are in the correct locations
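A quick pre-flight check can list which files are missing before the pipeline runs. This is a sketch, not part of the shipped scripts; it treats the three inputs named in this guide as required, which is an assumption for the GA4 export.

```python
from pathlib import Path

# Paths as documented in this guide, relative to the project root.
REQUIRED_INPUTS = [
    "input/new-propositions.csv",
    "input/analytics/ga4_export.csv",
    "input/analytics/gsc/Pages.csv",
]


def missing_inputs(root=".") -> list:
    """Return the required input paths that do not exist under root."""
    return [rel for rel in REQUIRED_INPUTS if not (Path(root) / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_inputs():
        print(f"Missing: {rel}")
```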
### Empty Report Titles
✓ FIXED - Now correctly loads post titles from multiple column names
### No Opportunities Found
```
⚠️ No opportunities found in specified range
```
→ Try lowering `ANALYSIS_MIN_IMPRESSIONS` in `.env`
### API Errors
```
❌ AI generation failed: ...
```
→ Check `OPENROUTER_API_KEY` in `.env` and account balance
## 📚 Additional Resources
- **`input/README.md`** - How to export analytics data
- **`output/README.md`** - Output files guide
- **`QUICKSTART_ANALYSIS.md`** - Step-by-step tutorial
- **`ANALYSIS_SYSTEM.md`** - Technical documentation
## ✅ Success Checklist
- [ ] All input files placed in `input/` directory
- [ ] `.env` file configured with API key
- [ ] Ran `./run_analysis.sh` successfully
- [ ] Reviewed `output/results/seo_optimization_report.md`
- [ ] Identified 5-10 quick wins to start with
- [ ] Created action plan for first week
## 🎓 Key Learnings
### Why Positions 11-30 Matter
- **Page 1** posts are hard to move
- **Page 2-3** posts are easy wins (small improvements move them up)
- **Quick gains:** a 1-2 position improvement can raise CTR by 20-30%
### CTR Expectations by Position
- Position 1: ~30% CTR
- Position 5-10: 4-7% CTR
- Position 11-15: 1-2% CTR (quick wins)
- Position 16-20: 0.8-1% CTR
- Position 21-30: ~0.5% CTR
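These bands can be turned into a rough click estimate. The sketch below uses the midpoint of each range, which is a simplification chosen here, and assumes whole-number positions. It returns `None` for positions 2-4 because the list above gives no figure for them.

```python
# (first_position, last_position, low_ctr, high_ctr), from the list above.
CTR_BANDS = [
    (1, 1, 0.30, 0.30),
    (5, 10, 0.04, 0.07),
    (11, 15, 0.01, 0.02),
    (16, 20, 0.008, 0.01),
    (21, 30, 0.005, 0.005),
]


def expected_ctr(position):
    """Return the midpoint CTR for a position, or None if no band covers it."""
    for first, last, low, high in CTR_BANDS:
        if first <= position <= last:
            return (low + high) / 2
    return None


def estimated_clicks(impressions, position):
    """Estimate clicks as impressions times the band's midpoint CTR."""
    ctr = expected_ctr(position)
    return None if ctr is None else round(impressions * ctr)
```

For example, `estimated_clicks(1000, 12)` and `estimated_clicks(1000, 8)` compare the same impressions at a page-2 and a page-1 position, which gives a feel for what moving up one band is worth.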
### Content Quality Signals
- Higher bounce rate = less relevant content
- Low traffic = poor CTR or position
- Low impressions = insufficient optimization
## 📞 Support
### Check Logs First
```
output/logs/import_log.txt
output/logs/opportunity_analysis_log.txt
output/logs/content_gap_analysis_log.txt
```
### Common Issues
1. **Empty titles** → Fixed with flexible column name mapping
2. **File not found** → Check file locations match structure
3. **API errors** → Verify API key and account balance
4. **No opportunities** → Lower minimum impressions threshold
## 🚀 Ready to Optimize?
1. Prepare your input data
2. Run `./run_analysis.sh`
3. Open the report
4. Start with quick wins
5. Track improvements in 4 weeks
Good luck boosting your SEO! 📈
---
**Last Updated:** February 2026
**System Status:** Production Ready ✅