Refactor SEO automation into unified CLI application

Major refactoring to create a clean, integrated CLI application:

### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)

### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status

### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage

### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Kevin Bataille
2026-02-16 14:24:44 +01:00
parent 3b51952336
commit 8c7cd24685
57 changed files with 16095 additions and 560 deletions


@@ -0,0 +1,430 @@
# Storage & Draft Posts - Complete Guide
## Storage Architecture
### How Data is Stored
The Multi-Site SEO Analyzer **does NOT use a local database**. Instead:
1. **Fetches on demand** from the WordPress REST API
2. **Analyzes in-memory** using Python
3. **Exports to CSV files** for long-term storage and review
```
┌─────────────────────────────┐
│      3 WordPress Sites      │
│       (via REST API)        │
└──────────┬──────────────────┘
           ├─→ Fetch posts (published + optional drafts)
┌──────────▼──────────────────┐
│       Python Analysis       │
│   (in-memory processing)    │
└──────────┬──────────────────┘
           ├─→ Analyze titles
           ├─→ Analyze meta descriptions
           ├─→ Score (0-100)
           ├─→ AI recommendations (optional)
┌──────────▼──────────────────┐
│       CSV File Export       │
│    (persistent storage)     │
└─────────────────────────────┘
```
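The three stages in the diagram can be sketched in a few lines of Python. This is only an illustration of the fetch, analyze, export flow: `score_title`, `run_pipeline`, the 30-60 character scoring rule, and the injected `fetch` callable are all hypothetical and are not taken from the analyzer's source.

```python
import csv
from typing import Callable, Iterable

FIELDS = ["site", "post_id", "status", "title", "title_score"]


def score_title(title: str) -> int:
    """Hypothetical rule: full marks for titles of 30-60 characters."""
    length = len(title)
    if 30 <= length <= 60:
        return 100
    # Lose 2 points per character outside the 30-60 range, never below 0.
    distance = 30 - length if length < 30 else length - 60
    return max(0, 100 - 2 * distance)


def run_pipeline(fetch: Callable[[], Iterable[dict]], csv_path: str) -> int:
    """Fetch posts, score them in memory, write one CSV; return the row count."""
    rows = []
    for post in fetch():  # stage 1: fetch on demand
        rows.append({     # stage 2: analyze in memory
            "site": post["site"],
            "post_id": post["post_id"],
            "status": post["status"],
            "title": post["title"],
            "title_score": score_title(post["title"]),
        })
    # Stage 3: the CSV file is the only persistent output.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Passing the fetch step in as a callable keeps the sketch runnable without network access; a real run would replace it with calls to each site's REST API.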
### Why CSV Instead of Database?
**Advantages:**
- ✓ No database setup or maintenance
- ✓ Easy to import to Excel/Google Sheets
- ✓ Human-readable format
- ✓ Shareable with non-technical team members
- ✓ Version control friendly (Git-trackable)
- ✓ No dependencies on database software
**Disadvantages:**
- ✗ Each run is independent (no history is kept between runs)
- ✗ No real-time updates
- ✗ Manual comparison between runs
**When to use a database instead:**
- If analyzing >10,000 posts regularly
- If you need real-time dashboards
- If you want automatic tracking over time
---
## CSV Output Structure
### File Location
```
output/reports/seo_analysis_TIMESTAMP.csv
```
### Columns
| Column | Description | Example |
|--------|-------------|---------|
| `site` | WordPress site | mistergeek.net |
| `post_id` | WordPress post ID | 2845 |
| `status` | Post status | publish / draft |
| `title` | Post title | "Best VPN Services 2025" |
| `slug` | URL slug | best-vpn-services-2025 |
| `url` | Full URL | https://mistergeek.net/best-vpn-2025/ |
| `meta_description` | Meta description text | "Compare 50+ VPN..." |
| `title_score` | Title SEO score (0-100) | 92 |
| `title_issues` | Problems with title | "None" |
| `title_recommendations` | How to improve | "None" |
| `meta_score` | Meta description score (0-100) | 88 |
| `meta_issues` | Problems with meta | "None" |
| `meta_recommendations` | How to improve | "None" |
| `overall_score` | Combined score | 90 |
| `ai_recommendations` | Claude-generated tips | "Consider adding..." |
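If you prefer to inspect the file in Python rather than a spreadsheet, the columns above map naturally onto a small record type. The sketch below assumes the score columns hold plain integers and reads only a subset of the columns; `PostResult`, `load_results`, and `needs_work` are illustrative names, not part of the analyzer.

```python
from __future__ import annotations

import csv
from dataclasses import dataclass


@dataclass
class PostResult:
    site: str
    post_id: int
    status: str
    title: str
    overall_score: int


def load_results(csv_path: str) -> list[PostResult]:
    """Read an analyzer CSV, converting the numeric columns to int."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [
            PostResult(
                site=row["site"],
                post_id=int(row["post_id"]),
                status=row["status"],
                title=row["title"],
                overall_score=int(row["overall_score"]),
            )
            for row in csv.DictReader(handle)
        ]


def needs_work(results: list[PostResult], threshold: int = 50) -> list[PostResult]:
    """Posts scoring below the threshold, worst first."""
    low = [r for r in results if r.overall_score < threshold]
    return sorted(low, key=lambda r: r.overall_score)
```

The default threshold of 50 is an assumption chosen to match the "score < 50" cut-off used later in this guide.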
### Importing to Google Sheets
1. Download CSV from `output/reports/`
2. Open Google Sheets
3. File → Import → Upload CSV
4. Add columns for tracking:
- [ ] Status (Not Started / In Progress / Done)
- [ ] Notes
- [ ] Date Completed
5. Share with team
6. Filter and sort as needed
---
## Draft Posts Feature
### What Are Drafts?
Draft posts are unpublished WordPress posts. They're:
- Written but not published
- Not visible on the website
- Still stored in the WordPress database and retrievable through the REST API by an authenticated user
- Perfect for analyzing before publishing
### Using Draft Posts
**By default**, the analyzer fetches **only published posts**:
```bash
python scripts/multi_site_seo_analyzer.py
```
**To include draft posts**, use the `--include-drafts` flag:
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
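The analyzer's request code is not shown in this guide, but the flag plausibly maps onto the `status` parameter of the WordPress REST API posts endpoint. As a hedged sketch, the hypothetical helper `build_posts_url` below shows how the request URL might differ with and without drafts; requests for non-published statuses must be authenticated.

```python
from urllib.parse import urlencode


def build_posts_url(site: str, include_drafts: bool = False, page: int = 1) -> str:
    """Build a WordPress REST API URL for one page of posts (illustrative helper)."""
    statuses = ["publish", "draft"] if include_drafts else ["publish"]
    query = urlencode({
        "status": ",".join(statuses),  # drafts are only returned to authenticated users
        "per_page": 100,               # the REST API caps per_page at 100
        "page": page,
    })
    return f"https://{site}/wp-json/wp/v2/posts?{query}"
```

Because both statuses travel in one `status` parameter, including drafts adds no extra requests beyond any additional pages of results.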
### Output with Drafts
The CSV will include a `status` column showing which posts are published vs. draft:
```csv
site,post_id,status,title,meta_score,overall_score
mistergeek.net,2845,publish,"Best VPN",88,90
mistergeek.net,2901,draft,"New VPN Draft",45,55
webscroll.fr,1234,publish,"Torrent Guide",72,75
webscroll.fr,1235,draft,"Draft Tracker Review",20,30
```
### Use Cases for Drafts
**1. Optimize Before Publishing**
If you have draft posts ready to publish:
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
Review their SEO scores and improve titles/meta before publishing.
**2. Recover Previous Content**
If you have unpublished posts that were moved back to draft status:
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
Analyze them to decide: republish, improve, or delete.
**3. Audit Unpublished Work**
See what's sitting in drafts that could be published:
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts | grep "draft"
```
---
## Complete Examples
### Example 1: Analyze Published Only
```bash
python scripts/multi_site_seo_analyzer.py
```
**Output:**
- Analyzes: ~262 published posts
- Time: 2-3 minutes
- Drafts: Not included
### Example 2: Analyze Published + Drafts
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
**Output:**
- Analyzes: ~262 published + X drafts
- Time: 2-5 minutes (depending on draft count)
- Shows status column: "publish" or "draft"
### Example 3: Analyze Published + Drafts + AI
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20
```
**Output:**
- Analyzes: All posts (published + drafts)
- AI recommendations: Top 20 worst-scoring posts
- Cost: ~$0.20
- Time: 10-15 minutes
### Example 4: Focus on Drafts Only
With `--include-drafts` the script always returns both published posts and drafts, so filter for drafts in Excel/Sheets:
1. Run: `python scripts/multi_site_seo_analyzer.py --include-drafts`
2. Open CSV in Google Sheets
3. Filter `status` column = "draft"
4. Sort by `overall_score` (lowest first)
5. Optimize the 10 lowest-scoring drafts before publishing
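The same filter-and-sort steps can be done without a spreadsheet. This is a minimal sketch that assumes the `status` and `overall_score` columns described earlier; `lowest_scoring_drafts` is a hypothetical helper name.

```python
import csv


def lowest_scoring_drafts(csv_path: str, limit: int = 10) -> list:
    """Return up to `limit` draft rows from an analyzer CSV, lowest score first."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        drafts = [row for row in csv.DictReader(handle) if row["status"] == "draft"]
    # Scores are read as text, so convert before sorting numerically.
    drafts.sort(key=lambda row: int(row["overall_score"]))
    return drafts[:limit]
```

Each returned row is a dictionary keyed by column name, so `row["title"]` and `row["url"]` (when present) are available for a quick to-do list.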
---
## Comparing Results Over Time
### Manual Comparison
Since results are exported to CSV, you can track progress manually:
```bash
# Week 1
python scripts/multi_site_seo_analyzer.py --no-ai
# Save: seo_analysis_week1.csv
# (Optimize posts for 4 weeks)
# Week 5
python scripts/multi_site_seo_analyzer.py --no-ai
# Save: seo_analysis_week5.csv
# Compare in Excel/Sheets:
# Sort both by post_id
# Compare scores: Week 1 vs Week 5
```
### Calculating Improvement
Example:
| Post | Week 1 Score | Week 5 Score | Change |
|------|--------------|--------------|--------|
| Best VPN | 45 | 92 | +47 |
| Top 10 Software | 38 | 78 | +40 |
| Streaming Guide | 52 | 65 | +13 |
| **Average** | **45** | **78** | **+33** |
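The comparison in the table can also be computed directly from two exported files. The sketch below assumes both CSVs contain `site`, `post_id`, and `overall_score`; it matches posts on the (site, post ID) pair because post IDs are only unique within one site. `load_scores` and `compare_runs` are illustrative names.

```python
import csv


def load_scores(csv_path: str) -> dict:
    """Map (site, post_id) to overall_score for one analyzer run."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {
            (row["site"], row["post_id"]): int(row["overall_score"])
            for row in csv.DictReader(handle)
        }


def compare_runs(before_path: str, after_path: str):
    """Return per-post score changes and the average change for posts in both runs."""
    before = load_scores(before_path)
    after = load_scores(after_path)
    shared = sorted(before.keys() & after.keys())  # ignore posts missing from either run
    changes = {key: after[key] - before[key] for key in shared}
    average = sum(changes.values()) / len(changes) if changes else 0.0
    return changes, average
```

For example, `compare_runs("seo_analysis_week1.csv", "seo_analysis_week5.csv")` returns the change for every post present in both files plus the average change across them.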
---
## Organizing Your CSV Files
### Naming Convention
Create a folder for historical analysis:
```
output/
├── reports/
│ ├── 2025-02-16_initial_analysis.csv
│ ├── 2025-03-16_after_optimization.csv
│ ├── 2025-04-16_follow_up.csv
│ └── seo_analysis_20250216_120000.csv (latest)
```
### Archive Strategy
1. Run analyzer monthly
2. Save result with date: `seo_analysis_2025-02-16.csv`
3. Keep 12 months of history
4. Compare trends over time
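The archive steps above could be automated along these lines. This is a sketch under stated assumptions: the archive lives in its own folder separate from `output/reports/`, the newest report is identified by modification time, and `archive_latest` with its `keep` and `today` parameters is a hypothetical helper.

```python
from __future__ import annotations

import shutil
from datetime import date
from pathlib import Path


def archive_latest(reports_dir: str, archive_dir: str, keep: int = 12,
                   today: date | None = None) -> Path:
    """Copy the newest report into the archive under a dated name, then prune."""
    reports = Path(reports_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    candidates = sorted(reports.glob("seo_analysis_*.csv"),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no seo_analysis_*.csv files in {reports}")

    stamp = (today or date.today()).isoformat()  # e.g. 2025-02-16
    target = archive / f"seo_analysis_{stamp}.csv"
    shutil.copy2(candidates[-1], target)

    # ISO dates sort chronologically, so a name sort puts the oldest first.
    archived = sorted(archive.glob("seo_analysis_*.csv"))
    for old in archived[:-keep]:  # assumes keep >= 1
        old.unlink()
    return target
```

With a monthly run and `keep=12`, the archive folder always holds at most the last twelve months of results.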
---
## Advanced: Storing Recommendations
### Using a Master Spreadsheet
Instead of relying on CSV alone, create a master Google Sheet:
**Columns:**
- Post ID
- Title
- Current Score
- Issues
- Improvements Needed
- Status (Not Started / In Progress / Done)
- Completed Date
- New Score
**Process:**
1. Run analyzer: `python scripts/multi_site_seo_analyzer.py`
2. Copy relevant rows to master spreadsheet
3. As you optimize: update "Status" and "New Score"
4. Track progress visually
---
## Performance Considerations
### Fetch Time
- **Published only:** ~10-30 seconds (262 posts)
- **Published + drafts:** ~10-30 seconds (+X seconds per 100 drafts)
Drafts don't significantly affect speed, since published posts and drafts are fetched in the same API call.
### Analysis Time
- **Without AI:** ~1-2 minutes
- **With AI (10 posts):** ~5-10 minutes
- **With AI (50 posts):** ~20-30 minutes
Most of the run time comes from generating AI recommendations, not from fetching posts.
### Memory Usage
- **262 posts:** ~20-30 MB
- **262 posts + 100 drafts:** ~35-50 MB
No memory issues for typical WordPress sites.
---
## Troubleshooting
### "No drafts found"
**Problem:** You're using `--include-drafts` but get the same result as without it.
**Solutions:**
1. Verify you have draft posts on the site
2. Check that the API user has permission to view drafts (requires the `edit_posts` capability)
3. Try logging in and checking WordPress directly
### CSV Encoding Issues
**Problem:** CSV opens with weird characters in Excel.
**Solution:** Open with UTF-8 encoding:
- Excel: Data → From Text/CSV → select the CSV → set File Origin to "65001: Unicode (UTF-8)" → Load
- Sheets: Upload CSV, let Google handle encoding
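Another option is to rewrite the file with a UTF-8 byte order mark, which Excel uses to detect the encoding when a CSV is opened by double-click. The sketch assumes the analyzer writes UTF-8; `add_utf8_bom` is a hypothetical helper and not part of the tool.

```python
def add_utf8_bom(src_path: str, dst_path: str) -> None:
    """Copy a UTF-8 CSV to dst_path, prefixed with a byte order mark for Excel."""
    # "utf-8-sig" drops an existing BOM on read, so the output never gets two.
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        text = src.read()
    # On write, "utf-8-sig" emits the BOM once at the start of the file.
    with open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        dst.write(text)
```

Opening both files with `newline=""` leaves the line endings exactly as the analyzer wrote them.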
### Want to Use a Database Later?
If you outgrow CSV files, consider:
**SQLite** (built-in, no installation):
```python
import sqlite3
conn = sqlite3.connect('seo_analysis.db')
# Insert results into database
```
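To make the SQLite option concrete, here is one way an exported CSV could be loaded into a table keyed by run date, which gives the tracking over time that CSV files lack. The `results` table layout, the `run_date` column, and the `import_run` helper are assumptions made for this sketch.

```python
import csv
import sqlite3


def import_run(conn: sqlite3.Connection, csv_path: str, run_date: str) -> int:
    """Load one analyzer CSV into a `results` table; return the number of rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               run_date TEXT,
               site TEXT,
               post_id INTEGER,
               status TEXT,
               overall_score INTEGER,
               PRIMARY KEY (run_date, site, post_id)
           )"""
    )
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [
            (run_date, row["site"], int(row["post_id"]),
             row["status"], int(row["overall_score"]))
            for row in csv.DictReader(handle)
        ]
    with conn:  # commits on success, rolls back on error
        # OR REPLACE makes re-importing the same run idempotent.
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```

After importing several runs, a query such as `SELECT run_date, AVG(overall_score) FROM results GROUP BY run_date` shows the trend per run.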
**PostgreSQL** (professional option):
```python
import psycopg2
conn = psycopg2.connect("dbname=seo_db user=postgres")
# Insert results
```
But for now, CSV is perfect for your needs.
---
## Summary
### Storage
| Aspect | Implementation |
|--------|-----------------|
| Database? | No - CSV files |
| Location | `output/reports/` |
| Format | CSV (Excel/Sheets compatible) |
| Persistence | Permanent (until deleted) |
### Draft Posts
| Aspect | Usage |
|--------|-------|
| Default | Published only |
| Include drafts | `--include-drafts` flag |
| Output column | `status` (publish/draft) |
| Use case | Optimize before publishing, recover removed content |
### Commands
```bash
# Published only
python scripts/multi_site_seo_analyzer.py
# Published + Drafts
python scripts/multi_site_seo_analyzer.py --include-drafts
# Published + Drafts + AI
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20
# Skip AI (faster)
python scripts/multi_site_seo_analyzer.py --no-ai
```
---
## Next Steps
1. **First run (published only):**
```bash
python scripts/multi_site_seo_analyzer.py --no-ai
```
2. **Analyze results:**
```bash
open output/reports/seo_analysis_*.csv  # macOS; use xdg-open on Linux
```
3. **Optimize published posts** with score < 50
4. **Second run (include drafts):**
```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
5. **Decide on drafts:** Publish, improve, or delete
6. **Track progress:** Re-run monthly and compare scores
Ready? Start with: `python scripts/multi_site_seo_analyzer.py --no-ai`