Refactor SEO automation into unified CLI application

Major refactoring to create a clean, integrated CLI application:

### New Features:
- Unified CLI executable (./seo) with simple command structure
- All commands accept optional CSV file arguments
- Auto-detection of latest files when no arguments provided
- Simplified output directory structure (output/ instead of output/reports/)
- Cleaner export filename format (all_posts_YYYY-MM-DD.csv)

### Commands:
- export: Export all posts from WordPress sites
- analyze [csv]: Analyze posts with AI (optional CSV input)
- recategorize [csv]: Recategorize posts with AI
- seo_check: Check SEO quality
- categories: Manage categories across sites
- approve [files]: Review and approve recommendations
- full_pipeline: Run complete workflow
- analytics, gaps, opportunities, report, status

### Changes:
- Moved all scripts to scripts/ directory
- Created config.yaml for configuration
- Updated all scripts to use output/ directory
- Deprecated old seo-cli.py in favor of new ./seo
- Added AGENTS.md and CHANGELOG.md documentation
- Consolidated README.md with updated usage

### Technical:
- Added PyYAML dependency
- Removed hardcoded configuration values
- All scripts now properly integrated
- Better error handling and user feedback

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
430
guides/STORAGE_AND_DRAFTS.md
Normal file
@@ -0,0 +1,430 @@
# Storage & Draft Posts - Complete Guide

## Storage Architecture

### How Data is Stored

The Multi-Site SEO Analyzer **does NOT use a local database**. Instead:

1. **Fetches on-demand** from WordPress REST API
2. **Analyzes in-memory** using Python
3. **Exports to CSV files** for long-term storage and review

```
┌─────────────────────────────┐
│  3 WordPress Sites          │
│  (via REST API)             │
└──────────┬──────────────────┘
           │
           ├─→ Fetch posts (published + optional drafts)
           │
┌──────────▼──────────────────┐
│  Python Analysis            │
│  (in-memory processing)     │
└──────────┬──────────────────┘
           │
           ├─→ Analyze titles
           │
           ├─→ Analyze meta descriptions
           │
           ├─→ Score (0-100)
           │
           ├─→ AI recommendations (optional)
           │
┌──────────▼──────────────────┐
│  CSV File Export            │
│  (persistent storage)       │
└─────────────────────────────┘
```
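
The flow in the diagram can be sketched in a few lines of Python. This is only an illustration, not the analyzer's actual code: `score_title`, its 30-60 character window, and `export_rows` are hypothetical, and the two sample posts stand in for data fetched from the REST API.

```python
import csv
import io


def score_title(title: str) -> int:
    """Hypothetical 0-100 title score: full marks inside a 30-60 character window."""
    length = len(title)
    if 30 <= length <= 60:
        return 100
    # Lose 5 points per character outside the window, floored at 0.
    distance = 30 - length if length < 30 else length - 60
    return max(0, 100 - 5 * distance)


def export_rows(posts: list[dict]) -> str:
    """Score posts held in memory and return them as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["site", "post_id", "status", "title", "title_score"]
    )
    writer.writeheader()
    for post in posts:
        writer.writerow({**post, "title_score": score_title(post["title"])})
    return buffer.getvalue()


# Stand-in for posts fetched from the WordPress REST API (assumed sample data).
posts = [
    {"site": "mistergeek.net", "post_id": 2845, "status": "publish",
     "title": "Best VPN Services 2025: Compare the Top Picks"},
    {"site": "webscroll.fr", "post_id": 1235, "status": "draft",
     "title": "Draft Tracker Review"},
]
csv_text = export_rows(posts)
print(csv_text)
```

Nothing is kept between runs except the CSV text, which is why each run is independent of the previous one.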

### Why CSV Instead of a Database?

**Advantages:**
- ✓ No database setup or maintenance
- ✓ Easy to import to Excel/Google Sheets
- ✓ Human-readable format
- ✓ Shareable with non-technical team members
- ✓ Version control friendly (Git-trackable)
- ✓ No dependencies on database software

**Disadvantages:**
- ✗ Each run is independent (no running total)
- ✗ No real-time updates
- ✗ Manual comparison between runs

**When to use a database instead:**
- If analyzing >10,000 posts regularly
- If you need real-time dashboards
- If you want automatic tracking over time

---

## CSV Output Structure

### File Location
```
output/reports/seo_analysis_TIMESTAMP.csv
```

### Columns

| Column | Description | Example |
|--------|-------------|---------|
| `site` | WordPress site | mistergeek.net |
| `post_id` | WordPress post ID | 2845 |
| `status` | Post status | publish / draft |
| `title` | Post title | "Best VPN Services 2025" |
| `slug` | URL slug | best-vpn-services-2025 |
| `url` | Full URL | https://mistergeek.net/best-vpn-services-2025/ |
| `meta_description` | Meta description text | "Compare 50+ VPN..." |
| `title_score` | Title SEO score (0-100) | 92 |
| `title_issues` | Problems with title | "None" |
| `title_recommendations` | How to improve | "None" |
| `meta_score` | Meta description score (0-100) | 88 |
| `meta_issues` | Problems with meta | "None" |
| `meta_recommendations` | How to improve | "None" |
| `overall_score` | Combined score (0-100) | 90 |
| `ai_recommendations` | Claude-generated tips | "Consider adding..." |
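
Because the export is a plain CSV with named columns, it can be summarized with the standard library alone. The sketch below averages `overall_score` per `site`; the helper name `average_score_by_site` is hypothetical, the sample rows are made up, and it assumes `overall_score` is always an integer.

```python
import csv
import io
from collections import defaultdict


def average_score_by_site(csv_file) -> dict[str, float]:
    """Return the mean overall_score for each site in an exported CSV."""
    totals = defaultdict(lambda: [0, 0])  # site -> [score sum, post count]
    for row in csv.DictReader(csv_file):
        entry = totals[row["site"]]
        entry[0] += int(row["overall_score"])
        entry[1] += 1
    return {site: total / count for site, (total, count) in totals.items()}


# In practice this would be open("output/reports/<file>.csv", newline="", encoding="utf-8").
sample = io.StringIO(
    "site,post_id,status,title,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",90\n'
    'mistergeek.net,2901,draft,"New VPN Draft",55\n'
    'webscroll.fr,1234,publish,"Torrent Guide",75\n'
)
averages = average_score_by_site(sample)
print(averages)
```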

### Importing to Google Sheets

1. Download the CSV from `output/reports/`
2. Open Google Sheets
3. File → Import → Upload CSV
4. Add columns for tracking:
   - [ ] Status (Not Started / In Progress / Done)
   - [ ] Notes
   - [ ] Date Completed
5. Share with team
6. Filter and sort as needed

---

## Draft Posts Feature

### What Are Drafts?

Draft posts are unpublished WordPress posts. They're:
- Written but not published
- Not visible on the website
- Still stored in WordPress and available to authorized users
- Perfect for analyzing before publishing

### Using Draft Posts

**By default**, the analyzer fetches **only published posts**:

```bash
python scripts/multi_site_seo_analyzer.py
```

**To include draft posts**, use the `--include-drafts` flag:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
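
On the WordPress side, the difference between the two commands comes down to the `status` query parameter of the REST API posts endpoint. The sketch below builds such a URL; `posts_url` is a hypothetical helper, and whether the analyzer constructs its requests exactly this way is an assumption. Requests for drafts also need to be authenticated.

```python
from urllib.parse import urlencode


def posts_url(site: str, include_drafts: bool = False, page: int = 1) -> str:
    """Build a WordPress REST API posts URL for published posts, optionally with drafts."""
    statuses = ["publish", "draft"] if include_drafts else ["publish"]
    query = urlencode({
        "status": ",".join(statuses),  # comma-separated list of post statuses
        "per_page": 100,               # the REST API caps page size at 100
        "page": page,
    })
    return f"https://{site}/wp-json/wp/v2/posts?{query}"


published_only = posts_url("mistergeek.net")
with_drafts = posts_url("mistergeek.net", include_drafts=True)
print(published_only)
print(with_drafts)
```

Both statuses travel in a single request per page, so including drafts does not double the number of API calls.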

### Output with Drafts

The `status` column in the CSV shows which posts are published and which are drafts:

```csv
site,post_id,status,title,meta_score,overall_score
mistergeek.net,2845,publish,"Best VPN",88,90
mistergeek.net,2901,draft,"New VPN Draft",45,55
webscroll.fr,1234,publish,"Torrent Guide",72,75
webscroll.fr,1235,draft,"Draft Tracker Review",20,30
```

### Use Cases for Drafts

**1. Optimize Before Publishing**

If you have draft posts ready to publish:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

Review their SEO scores and improve titles/meta before publishing.

**2. Recover Previous Content**

If you have unpublished posts that were moved back to drafts:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

Analyze them to decide: republish, improve, or delete.

**3. Audit Unpublished Work**

See what's sitting in drafts that could be published:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
# The results land in the CSV, so filter the exported file for draft rows
grep ",draft," output/reports/seo_analysis_*.csv
```

---

## Complete Examples

### Example 1: Analyze Published Only

```bash
python scripts/multi_site_seo_analyzer.py
```

**Output:**
- Analyzes: ~262 published posts
- Time: 2-3 minutes
- Drafts: Not included

### Example 2: Analyze Published + Drafts

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

**Output:**
- Analyzes: ~262 published + X drafts
- Time: 2-5 minutes (depending on draft count)
- Shows status column: "publish" or "draft"

### Example 3: Analyze Published + Drafts + AI

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20
```

**Output:**
- Analyzes: All posts (published + drafts)
- AI recommendations: Top 20 worst-scoring posts
- Cost: ~$0.20
- Time: 10-15 minutes

### Example 4: Focus on Drafts Only

With `--include-drafts` the script always returns published posts and drafts together, so filter the CSV in Excel/Sheets:

1. Run: `python scripts/multi_site_seo_analyzer.py --include-drafts`
2. Open CSV in Google Sheets
3. Filter `status` column = "draft"
4. Sort by `overall_score` (lowest first)
5. Optimize the 10 lowest-scoring drafts before publishing
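
The filter-and-sort steps above can also be done without a spreadsheet. This is a minimal sketch using only the standard library; `worst_drafts` is a hypothetical helper and the sample rows reuse the CSV shown earlier in this guide.

```python
import csv
import io


def worst_drafts(csv_file, limit: int = 10) -> list[dict]:
    """Return up to `limit` draft rows, lowest overall_score first."""
    drafts = [row for row in csv.DictReader(csv_file) if row["status"] == "draft"]
    drafts.sort(key=lambda row: int(row["overall_score"]))
    return drafts[:limit]


# In practice, pass an open exported CSV file instead of this sample.
sample = io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
    'mistergeek.net,2901,draft,"New VPN Draft",45,55\n'
    'webscroll.fr,1234,publish,"Torrent Guide",72,75\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",20,30\n'
)
to_fix = worst_drafts(sample)
for row in to_fix:
    print(row["site"], row["post_id"], row["overall_score"], row["title"])
```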

---

## Comparing Results Over Time

### Manual Comparison

Since results are exported to CSV, you can track progress manually:

```bash
# Week 1
python scripts/multi_site_seo_analyzer.py --no-ai
# Save: seo_analysis_week1.csv

# (Optimize posts for 4 weeks)

# Week 5
python scripts/multi_site_seo_analyzer.py --no-ai
# Save: seo_analysis_week5.csv

# Compare in Excel/Sheets:
# Sort both by post_id
# Compare scores: Week 1 vs Week 5
```

### Calculating Improvement

Example:

| Post | Week 1 Score | Week 5 Score | Change |
|------|--------------|--------------|--------|
| Best VPN | 45 | 92 | +47 |
| Top 10 Software | 38 | 78 | +40 |
| Streaming Guide | 52 | 65 | +13 |
| **Average** | **45** | **78** | **+33** |
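
The same comparison can be scripted. The sketch below matches rows from two exports on `(site, post_id)` and reproduces the changes in the table; the helper names and the post IDs 101-103 are hypothetical, and only the `site`, `post_id`, and `overall_score` columns are assumed to be present.

```python
import csv
import io


def load_scores(csv_file) -> dict:
    """Map (site, post_id) to overall_score for one exported run."""
    return {
        (row["site"], row["post_id"]): int(row["overall_score"])
        for row in csv.DictReader(csv_file)
    }


def score_changes(before: dict, after: dict) -> dict:
    """Score difference for every post present in both runs."""
    return {key: after[key] - before[key] for key in before.keys() & after.keys()}


# Made-up sample data mirroring the table above.
week1 = load_scores(io.StringIO(
    "site,post_id,overall_score\n"
    "mistergeek.net,101,45\n"
    "mistergeek.net,102,38\n"
    "mistergeek.net,103,52\n"
))
week5 = load_scores(io.StringIO(
    "site,post_id,overall_score\n"
    "mistergeek.net,101,92\n"
    "mistergeek.net,102,78\n"
    "mistergeek.net,103,65\n"
))
changes = score_changes(week1, week5)
average_change = sum(changes.values()) / len(changes)
print(changes, round(average_change))
```

Matching on `site` as well as `post_id` matters because post IDs are only unique within a single WordPress site.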

---

## Organizing Your CSV Files

### Naming Convention

Create a folder for historical analysis:

```
output/
├── reports/
│   ├── 2025-02-16_initial_analysis.csv
│   ├── 2025-03-16_after_optimization.csv
│   ├── 2025-04-16_follow_up.csv
│   └── seo_analysis_20250216_120000.csv (latest)
```

### Archive Strategy

1. Run analyzer monthly
2. Save result with date: `seo_analysis_2025-02-16.csv`
3. Keep 12 months of history
4. Compare trends over time

---

## Advanced: Storing Recommendations

### Using a Master Spreadsheet

Instead of relying on CSV alone, create a master Google Sheet:

**Columns:**
- Post ID
- Title
- Current Score
- Issues
- Improvements Needed
- Status (Not Started / In Progress / Done)
- Completed Date
- New Score

**Process:**
1. Run analyzer: `python scripts/multi_site_seo_analyzer.py`
2. Copy relevant rows to master spreadsheet
3. As you optimize: update "Status" and "New Score"
4. Track progress visually

---

## Performance Considerations

### Fetch Time

- **Published only:** ~10-30 seconds (262 posts)
- **Published + drafts:** ~10-30 seconds (+X seconds per 100 drafts)

Drafts don't significantly impact speed since both are fetched in the same API call.

### Analysis Time

- **Without AI:** ~1-2 minutes
- **With AI (10 posts):** ~5-10 minutes
- **With AI (50 posts):** ~20-30 minutes

AI recommendations account for most of the run time, not the fetching.

### Memory Usage

- **262 posts:** ~20-30 MB
- **262 posts + 100 drafts:** ~35-50 MB

No memory issues for typical WordPress sites.

---

## Troubleshooting

### "No drafts found"

**Problem:** You're using `--include-drafts` but get the same result as without it.

**Solutions:**
1. Verify you have draft posts on the site
2. Check that the API user has permission to view drafts (needs the `edit_posts` capability)
3. Try logging in and checking WordPress directly

### CSV Encoding Issues

**Problem:** CSV opens with garbled characters in Excel.

**Solution:** Open with UTF-8 encoding:
- Excel: Data → From Text/CSV → select the CSV → set File Origin to "65001: Unicode (UTF-8)"
- Sheets: Upload CSV, let Google handle encoding
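
Another option is to give Excel a copy of the file that starts with a UTF-8 byte-order mark, which Excel uses to detect the encoding when the file is opened directly. The sketch below is an assumption-laden example: `add_utf8_bom` is a hypothetical helper, the file names are made up, and it presumes the export is already UTF-8.

```python
import codecs
import tempfile
from pathlib import Path


def add_utf8_bom(source: Path, target: Path) -> None:
    """Copy a UTF-8 CSV to target, prefixing a byte-order mark if it is missing."""
    data = source.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    target.write_bytes(data)


# Demo on a throwaway file; real exports would live under output/reports/.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "seo_analysis.csv"
    target = Path(tmp) / "seo_analysis_excel.csv"
    source.write_text(
        "site,title\nwebscroll.fr,Guide complet des téléchargements\n",
        encoding="utf-8",
    )
    add_utf8_bom(source, target)
    has_bom = target.read_bytes().startswith(codecs.BOM_UTF8)
    round_trip = target.read_text(encoding="utf-8-sig")

print(has_bom)
```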

### Want to Use a Database Later?

If you outgrow CSV files, consider:

**SQLite** (built-in, no installation):
```python
import sqlite3
conn = sqlite3.connect('seo_analysis.db')
# Insert results into database
```
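
As a rough idea of what the insert step could look like with SQLite, the sketch below loads one exported run into a table keyed by run date. The `seo_results` table, its columns, and the `load_into_sqlite` helper are all hypothetical, and an in-memory database is used so the example leaves no file behind.

```python
import csv
import io
import sqlite3


def load_into_sqlite(conn: sqlite3.Connection, csv_file, run_date: str) -> int:
    """Append one analysis run to a hypothetical seo_results table; returns rows loaded."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS seo_results (
            run_date TEXT,
            site TEXT,
            post_id INTEGER,
            status TEXT,
            overall_score INTEGER,
            PRIMARY KEY (run_date, site, post_id)
        )
        """
    )
    rows = [
        (run_date, row["site"], int(row["post_id"]), row["status"], int(row["overall_score"]))
        for row in csv.DictReader(csv_file)
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO seo_results VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


# Made-up sample rows standing in for an exported CSV.
sample = io.StringIO(
    "site,post_id,status,title,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",90\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",30\n'
)
conn = sqlite3.connect(":memory:")
loaded = load_into_sqlite(conn, sample, "2025-02-16")
drafts = conn.execute(
    "SELECT site, post_id, overall_score FROM seo_results WHERE status = 'draft'"
).fetchall()
print(loaded, drafts)
```

Storing `run_date` with every row is what would give the automatic tracking over time that separate CSV files lack.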

**PostgreSQL** (professional option):
```python
# Third-party driver: pip install psycopg2-binary
import psycopg2
conn = psycopg2.connect("dbname=seo_db user=postgres")
# Insert results
```

For a few hundred posts across three sites, CSV is enough.

---

## Summary

### Storage

| Aspect | Implementation |
|--------|-----------------|
| Database? | No - CSV files |
| Location | `output/reports/` |
| Format | CSV (Excel/Sheets compatible) |
| Persistence | Permanent (until deleted) |

### Draft Posts

| Aspect | Usage |
|--------|-------|
| Default | Published only |
| Include drafts | `--include-drafts` flag |
| Output column | `status` (publish/draft) |
| Use case | Optimize before publishing, recover removed content |

### Commands

```bash
# Published only
python scripts/multi_site_seo_analyzer.py

# Published + Drafts
python scripts/multi_site_seo_analyzer.py --include-drafts

# Published + Drafts + AI
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20

# Skip AI (faster)
python scripts/multi_site_seo_analyzer.py --no-ai
```

---

## Next Steps

1. **First run (published only):**
   ```bash
   python scripts/multi_site_seo_analyzer.py --no-ai
   ```

2. **Analyze results:**
   ```bash
   # macOS; on Linux use xdg-open
   open output/reports/seo_analysis_*.csv
   ```

3. **Optimize published posts** with score < 50

4. **Second run (include drafts):**
   ```bash
   python scripts/multi_site_seo_analyzer.py --include-drafts
   ```

5. **Decide on drafts:** Publish, improve, or delete

6. **Track progress:** Re-run monthly and compare scores

Ready? Start with: `python scripts/multi_site_seo_analyzer.py --no-ai`