# Storage & Draft Posts - Complete Guide

## Storage Architecture

### How Data is Stored

The Multi-Site SEO Analyzer **does NOT use a local database**. Instead, it:

1. **Fetches on demand** from the WordPress REST API
2. **Analyzes in memory** using Python
3. **Exports to CSV files** for long-term storage and review

```
┌─────────────────────────────┐
│      3 WordPress Sites      │
│       (via REST API)        │
└──────────┬──────────────────┘
           │
           ├─→ Fetch posts (published + optional drafts)
           │
┌──────────▼──────────────────┐
│       Python Analysis       │
│   (in-memory processing)    │
└──────────┬──────────────────┘
           │
           ├─→ Analyze titles
           │
           ├─→ Analyze meta descriptions
           │
           ├─→ Score (0-100)
           │
           ├─→ AI recommendations (optional)
           │
┌──────────▼──────────────────┐
│       CSV File Export       │
│    (persistent storage)     │
└─────────────────────────────┘
```

### Why CSV Instead of a Database?

**Advantages:**
- ✓ No database setup or maintenance
- ✓ Easy to import into Excel/Google Sheets
- ✓ Human-readable format
- ✓ Shareable with non-technical team members
- ✓ Version control friendly (Git-trackable)
- ✓ No dependency on database software

**Disadvantages:**
- ✗ Each run is independent (no built-in history across runs)
- ✗ No real-time updates
- ✗ Comparisons between runs are manual

**When to use a database instead:**
- If you analyze more than 10,000 posts regularly
- If you need real-time dashboards
- If you want automatic tracking over time

---

## CSV Output Structure

### File Location

```
output/reports/seo_analysis_TIMESTAMP.csv
```

### Columns

| Column | Description | Example |
|--------|-------------|---------|
| `site` | WordPress site | mistergeek.net |
| `post_id` | WordPress post ID | 2845 |
| `status` | Post status | publish / draft |
| `title` | Post title | "Best VPN Services 2025" |
| `slug` | URL slug | best-vpn-services-2025 |
| `url` | Full URL | https://mistergeek.net/best-vpn-2025/ |
| `meta_description` | Meta description text | "Compare 50+ VPN..." |
| `title_score` | Title SEO score (0-100) | 92 |
| `title_issues` | Problems with title | "None" |
| `title_recommendations` | How to improve | "None" |
| `meta_score` | Meta description score (0-100) | 88 |
| `meta_issues` | Problems with meta | "None" |
| `meta_recommendations` | How to improve | "None" |
| `overall_score` | Combined score | 90 |
| `ai_recommendations` | Claude-generated tips | "Consider adding..." |
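
Because the export is a plain CSV, Python's standard `csv` module can read it directly. A minimal sketch, assuming a comma-delimited file with the column names above; `load_rows` and `worst_posts` are hypothetical helpers, not part of the analyzer. Score columns are parsed as numbers so they sort numerically rather than as text.

```python
import csv
import io

# Score columns from the table above; any that are present are parsed as numbers.
SCORE_COLUMNS = ("title_score", "meta_score", "overall_score")


def load_rows(fileobj):
    """Read analyzer CSV rows, converting score columns to float."""
    rows = []
    for row in csv.DictReader(fileobj):
        for col in SCORE_COLUMNS:
            if row.get(col):  # skip columns that are missing or empty
                row[col] = float(row[col])
        rows.append(row)
    return rows


def worst_posts(rows, n=5):
    """Return the n rows with the lowest overall_score."""
    scored = [row for row in rows if isinstance(row.get("overall_score"), float)]
    return sorted(scored, key=lambda row: row["overall_score"])[:n]


# Sample rows taken from the "Output with Drafts" example in this guide.
sample = io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
    'mistergeek.net,2901,draft,"New VPN Draft",45,55\n'
    'webscroll.fr,1234,publish,"Torrent Guide",72,75\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",20,30\n'
)
for row in worst_posts(load_rows(sample), n=2):
    print(row["site"], row["post_id"], row["overall_score"])
```

For a real export, pass an open file instead: `with open(path, newline="", encoding="utf-8") as f: rows = load_rows(f)`.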

### Importing to Google Sheets

1. Locate the CSV in `output/reports/`
2. Open Google Sheets
3. File → Import → Upload the CSV
4. Add columns for tracking:
   - [ ] Status (Not Started / In Progress / Done)
   - [ ] Notes
   - [ ] Date Completed
5. Share with your team
6. Filter and sort as needed
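
If you prefer to add the tracking columns before uploading, a short script can append them to a copy of the export. A minimal sketch, assuming the analyzer writes UTF-8 CSV files; `add_tracking_columns` is a hypothetical helper and the column names simply mirror the checklist in step 4.

```python
import csv
import sys

# Tracking columns from step 4 above; rename them as you like.
TRACKING_COLUMNS = ["Status", "Notes", "Date Completed"]


def add_tracking_columns(src_path, dst_path):
    """Copy an analyzer CSV, appending tracking columns (Status starts as 'Not Started')."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + TRACKING_COLUMNS
        rows = list(reader)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            row.update({"Status": "Not Started", "Notes": "", "Date Completed": ""})
            writer.writerow(row)
    return len(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    count = add_tracking_columns(sys.argv[1], sys.argv[2])
    print(f"Wrote {count} rows to {sys.argv[2]}")
```

Run it as `python add_tracking_columns.py output/reports/seo_analysis_TIMESTAMP.csv tracking.csv` (script name hypothetical), then upload `tracking.csv`. The analyzer's own `status` column is lowercase, so it does not clash with the new `Status` column.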

---

## Draft Posts Feature

### What Are Drafts?

Draft posts are unpublished WordPress posts. They are:
- Written but not published
- Not visible on the public website
- Still stored in WordPress and retrievable through the REST API by users with sufficient permissions
- Worth analyzing before you publish them

### Using Draft Posts

**By default**, the analyzer fetches **only published posts**:

```bash
python scripts/multi_site_seo_analyzer.py
```

**To include draft posts**, use the `--include-drafts` flag:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```
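
This guide doesn't show how `--include-drafts` is implemented. On the WordPress side, it most likely amounts to requesting more than one post status from the standard `/wp-json/wp/v2/posts` endpoint. A minimal sketch of building such request URLs, assuming HTTPS and that endpoint; `posts_url` is a hypothetical helper, not code from `multi_site_seo_analyzer.py`. Any request that includes `draft` must be authenticated as a user with the `edit_posts` capability.

```python
from urllib.parse import urlencode


def posts_url(site, include_drafts=False, page=1, per_page=100):
    """Build a WordPress REST API URL for one page of posts (hypothetical helper)."""
    statuses = ["publish", "draft"] if include_drafts else ["publish"]
    params = {
        "status": ",".join(statuses),  # WordPress accepts a comma-separated status list
        "per_page": per_page,          # WordPress caps per_page at 100
        "page": page,                  # keep paging until the X-WP-TotalPages header is reached
    }
    return f"https://{site}/wp-json/wp/v2/posts?{urlencode(params)}"


print(posts_url("mistergeek.net"))
print(posts_url("mistergeek.net", include_drafts=True))
```

Requesting both statuses in one query is consistent with the performance note later in this guide that drafts add little fetch time.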

### Output with Drafts

The CSV will include a `status` column showing which posts are published vs. draft:

```csv
site,post_id,status,title,meta_score,overall_score
mistergeek.net,2845,publish,"Best VPN",88,90
mistergeek.net,2901,draft,"New VPN Draft",45,55
webscroll.fr,1234,publish,"Torrent Guide",72,75
webscroll.fr,1235,draft,"Draft Tracker Review",20,30
```
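
To see at a glance how many drafts each site has, you can count rows per `site` and `status`. A minimal sketch using the sample rows above; `count_by_status` is a hypothetical helper built on `collections.Counter`.

```python
import csv
import io
from collections import Counter


def count_by_status(fileobj):
    """Count rows per (site, status) pair in an analyzer CSV."""
    return Counter((row["site"], row["status"]) for row in csv.DictReader(fileobj))


sample = io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
    'mistergeek.net,2901,draft,"New VPN Draft",45,55\n'
    'webscroll.fr,1234,publish,"Torrent Guide",72,75\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",20,30\n'
)
for (site, status), n in sorted(count_by_status(sample).items()):
    print(f"{site:16} {status:8} {n}")
```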

### Use Cases for Drafts

**1. Optimize Before Publishing**

If you have draft posts ready to publish:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

Review their SEO scores and improve titles and meta descriptions before publishing.

**2. Recover Previous Content**

If you unpublished posts by reverting them to drafts:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

Analyze them to decide whether to republish, improve, or delete each one.

**3. Audit Unpublished Work**

See what's sitting in drafts that could be published. The `status` column is written to the exported CSV, so filter the file rather than the console output:

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
grep ",draft," output/reports/seo_analysis_*.csv
```

---

## Complete Examples

### Example 1: Analyze Published Only

```bash
python scripts/multi_site_seo_analyzer.py
```

**Output:**
- Analyzes: ~262 published posts
- Time: 2-3 minutes
- Drafts: Not included

### Example 2: Analyze Published + Drafts

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts
```

**Output:**
- Analyzes: ~262 published + X drafts
- Time: 2-5 minutes (depending on draft count)
- Shows status column: "publish" or "draft"

### Example 3: Analyze Published + Drafts + AI

```bash
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20
```

**Output:**
- Analyzes: All posts (published + drafts)
- AI recommendations: Top 20 worst-scoring posts
- Cost: ~$0.20
- Time: 10-15 minutes

### Example 4: Focus on Drafts Only

With `--include-drafts`, the CSV contains both published posts and drafts. To focus on drafts, filter in Excel/Sheets:

1. Run: `python scripts/multi_site_seo_analyzer.py --include-drafts`
2. Open the CSV in Google Sheets
3. Filter the `status` column to "draft"
4. Sort by `overall_score` (lowest first)
5. Optimize the 10 lowest-scoring drafts before publishing
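
The same filter-and-sort steps can be done in Python if you'd rather not open a spreadsheet. A minimal sketch, assuming the column names shown in this guide; `drafts_to_fix` is a hypothetical helper, and `limit=10` mirrors step 5.

```python
import csv
import io


def drafts_to_fix(fileobj, limit=10):
    """Return draft rows sorted by overall_score, lowest first."""
    drafts = [row for row in csv.DictReader(fileobj) if row["status"] == "draft"]
    drafts.sort(key=lambda row: float(row["overall_score"]))  # numeric, not text, sort
    return drafts[:limit]


# Sample rows from the "Output with Drafts" example in this guide.
sample = io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
    'mistergeek.net,2901,draft,"New VPN Draft",45,55\n'
    'webscroll.fr,1234,publish,"Torrent Guide",72,75\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",20,30\n'
)
for row in drafts_to_fix(sample):
    print(row["overall_score"], row["site"], row["title"])
```

With a real export, open the file with `newline=""` and `encoding="utf-8"` and pass the file object instead of `sample`.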

---

## Comparing Results Over Time

### Manual Comparison

Since results are exported to CSV, you can track progress manually:

```bash
# Week 1
python scripts/multi_site_seo_analyzer.py --no-ai
# Rename the new output file to: seo_analysis_week1.csv

# (Optimize posts for 4 weeks)

# Week 5
python scripts/multi_site_seo_analyzer.py --no-ai
# Rename the new output file to: seo_analysis_week5.csv

# Compare in Excel/Sheets:
# Sort both files by site, then post_id (post IDs are only unique within a site)
# Compare scores: Week 1 vs Week 5
```
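
The spreadsheet comparison can also be scripted. A minimal sketch, assuming both files use the column names from the CSV table above; it matches posts on `site` plus `post_id` because the same post ID can exist on different sites. `load_scores` and `compare_runs` are hypothetical helpers.

```python
import csv
import sys


def load_scores(path):
    """Map (site, post_id) -> overall_score for one analyzer CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return {
            (row["site"], row["post_id"]): float(row["overall_score"])
            for row in csv.DictReader(f)
        }


def compare_runs(before, after):
    """Return (site, post_id, before, after, change) for posts present in both runs."""
    shared = sorted(before.keys() & after.keys())
    return [
        (site, pid, before[(site, pid)], after[(site, pid)],
         after[(site, pid)] - before[(site, pid)])
        for site, pid in shared
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    week1, week5 = load_scores(sys.argv[1]), load_scores(sys.argv[2])
    for site, pid, old, new, delta in compare_runs(week1, week5):
        print(f"{site} {pid}: {old:.0f} -> {new:.0f} ({delta:+.0f})")
```

For example: `python compare_runs.py seo_analysis_week1.csv seo_analysis_week5.csv` (script name hypothetical). Posts that exist in only one run, such as newly published drafts, are left out of the comparison.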

### Calculating Improvement

Example:

| Post | Week 1 Score | Week 5 Score | Change |
|------|--------------|--------------|--------|
| Best VPN | 45 | 92 | +47 |
| Top 10 Software | 38 | 78 | +40 |
| Streaming Guide | 52 | 65 | +13 |
| **Average** | **45** | **78** | **+33** |
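
The arithmetic behind the table: each change is the Week 5 score minus the Week 1 score, and the averages are plain means rounded to whole numbers. For example, the change column averages (47 + 40 + 13) / 3 ≈ 33.3, shown as +33. A short sketch that reproduces the table's figures from its own numbers:

```python
from statistics import mean

# Scores copied from the example table above: (Week 1, Week 5).
scores = {
    "Best VPN": (45, 92),
    "Top 10 Software": (38, 78),
    "Streaming Guide": (52, 65),
}

changes = {post: after - before for post, (before, after) in scores.items()}
avg_before = mean(before for before, _ in scores.values())
avg_after = mean(after for _, after in scores.values())
avg_change = mean(changes.values())

for post, delta in changes.items():
    print(f"{post}: {delta:+d}")
# The table rounds the averages to whole numbers.
print(f"Average: {avg_before:.0f} -> {avg_after:.0f} ({avg_change:+.0f})")
```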

---

## Organizing Your CSV Files

### Naming Convention

Keep dated copies of each run in `output/reports/` to build a history:

```
output/
├── reports/
│   ├── 2025-02-16_initial_analysis.csv
│   ├── 2025-03-16_after_optimization.csv
│   ├── 2025-04-16_follow_up.csv
│   └── seo_analysis_20250216_120000.csv (latest)
```

### Archive Strategy

1. Run the analyzer monthly
2. Save each result with its date: `seo_analysis_2025-02-16.csv`
3. Keep 12 months of history
4. Compare trends over time
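
Steps 2 and 3 can be automated. A minimal sketch, assuming raw runs land in `output/reports/` as `seo_analysis_TIMESTAMP.csv` and that you archive one run per month, so keeping the 12 newest dated copies approximates 12 months of history. `archive_latest` is a hypothetical helper; it deletes older dated copies, so point it at a copy of your reports first if unsure.

```python
import re
import shutil
import sys
from datetime import date
from pathlib import Path

# Archived copies are named seo_analysis_YYYY-MM-DD.csv, as in step 2 above.
ARCHIVE_NAME = re.compile(r"seo_analysis_\d{4}-\d{2}-\d{2}\.csv")


def archive_latest(reports, today=None, keep=12):
    """Copy the newest raw analyzer CSV to a dated name, then prune old archives."""
    today = today or date.today()
    runs = [p for p in reports.glob("seo_analysis_*.csv")
            if not ARCHIVE_NAME.fullmatch(p.name)]
    if not runs:
        raise FileNotFoundError(f"no analyzer output found in {reports}")
    latest = max(runs, key=lambda p: p.stat().st_mtime)  # most recently written run
    target = reports / f"seo_analysis_{today.isoformat()}.csv"
    shutil.copy2(latest, target)

    # ISO dates sort chronologically, so the oldest archives come first.
    archives = sorted(p for p in reports.iterdir() if ARCHIVE_NAME.fullmatch(p.name))
    for old in archives[:max(len(archives) - keep, 0)]:
        old.unlink()
    return target


if __name__ == "__main__" and len(sys.argv) == 2:
    print(archive_latest(Path(sys.argv[1])))
```

Run it after each monthly analysis, for example `python archive_latest.py output/reports` (script name hypothetical). Raw timestamped runs are never deleted; only dated archives beyond the newest 12 are.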

---

## Advanced: Storing Recommendations

### Using a Master Spreadsheet

Instead of relying on CSV alone, create a master Google Sheet:

**Columns:**
- Post ID
- Title
- Current Score
- Issues
- Improvements Needed
- Status (Not Started / In Progress / Done)
- Completed Date
- New Score

**Process:**
1. Run analyzer: `python scripts/multi_site_seo_analyzer.py`
2. Copy relevant rows to master spreadsheet
3. As you optimize: update "Status" and "New Score"
4. Track progress visually
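
Filling in "New Score" can be partly scripted once the master sheet is exported as CSV. A minimal sketch, assuming the master sheet also has a `Site` column (not in the column list above, but useful because post IDs repeat across sites) and that "New Score" should come from the analyzer's `overall_score`. `update_new_scores` is a hypothetical helper.

```python
import csv
import io


def update_new_scores(master_rows, latest_rows):
    """Fill 'New Score' in master rows from the latest analyzer run.

    Assumes the master sheet has a 'Site' column next to 'Post ID', because
    WordPress post IDs are only unique within one site.
    """
    latest = {(r["site"], r["post_id"]): r["overall_score"] for r in latest_rows}
    updated = 0
    for row in master_rows:
        score = latest.get((row["Site"], row["Post ID"]))
        if score is not None:
            row["New Score"] = score
            updated += 1
    return updated


master = list(csv.DictReader(io.StringIO(
    "Site,Post ID,Title,Current Score,Status,New Score\n"
    "mistergeek.net,2845,Best VPN,45,Done,\n"
)))
latest = csv.DictReader(io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
))
print(update_new_scores(master, latest), master[0]["New Score"])
```

Write `master` back out with `csv.DictWriter` and re-import it into the sheet.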

---

## Performance Considerations

### Fetch Time

- **Published only:** ~10-30 seconds (262 posts)
- **Published + drafts:** ~10-30 seconds (+X seconds per 100 drafts)

Drafts don't significantly affect fetch time, since published posts and drafts are retrieved in the same API requests.

### Analysis Time

- **Without AI:** ~1-2 minutes
- **With AI (10 posts):** ~5-10 minutes
- **With AI (50 posts):** ~20-30 minutes

Most of the run time comes from AI recommendations, not from fetching.

### Memory Usage

- **262 posts:** ~20-30 MB
- **262 posts + 100 drafts:** ~35-50 MB

Memory is not a concern for typical WordPress sites.

---

## Troubleshooting

### "No drafts found"

**Problem:** You're using `--include-drafts` but get the same results as without it.

**Solutions:**
1. Verify that the site actually has draft posts
2. Check that the API user has permission to view drafts (requires the `edit_posts` capability)
3. Log in to WordPress and check the Drafts list directly
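
To tell a permissions problem apart from a site that simply has no drafts, you can query the REST API directly. A minimal sketch, assuming the analyzer authenticates with a WordPress Application Password over HTTP Basic auth (an assumption; this guide doesn't say which method it uses). `drafts_request` and `check_drafts` are hypothetical helpers, and network errors other than HTTP error responses are not handled.

```python
import base64
import json
import urllib.error
import urllib.request


def drafts_request(site, user, app_password):
    """Build an authenticated request for one page of draft posts."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        f"https://{site}/wp-json/wp/v2/posts?status=draft&per_page=100",
        headers={"Authorization": f"Basic {token}"},
    )


def check_drafts(site, user, app_password):
    """Return a one-line diagnosis of whether this user can list drafts."""
    request = drafts_request(site, user, app_password)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            posts = json.load(resp)
    except urllib.error.HTTPError as err:
        # WordPress rejects status=draft for bad credentials or users without edit_posts.
        return f"HTTP {err.code}: check credentials and the edit_posts capability"
    if not posts:
        return "Authenticated, but the site has no drafts"
    return f"{len(posts)} draft(s) visible (first page only)"
```

For example, `print(check_drafts("mistergeek.net", "your-username", "your-app-password"))`, with your own credentials in place of the placeholders.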

### CSV Encoding Issues

**Problem:** The CSV shows garbled characters when opened in Excel.

**Solution:** Import the file as UTF-8 instead of double-clicking it:
- Excel: Data → From Text/CSV → select the file → set **File Origin** to "65001: Unicode (UTF-8)"
- Sheets: Upload the CSV; Google Sheets detects the encoding automatically
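
Another option is to fix the file itself: Excel generally detects UTF-8 when a CSV starts with a byte-order mark (BOM), which Python writes through the `utf-8-sig` codec. A minimal sketch, assuming the analyzer's output is UTF-8; `add_utf8_bom` is a hypothetical helper.

```python
import sys


def add_utf8_bom(src_path, dst_path):
    """Re-save a UTF-8 CSV with a byte-order mark so Excel detects UTF-8 on double-click."""
    # utf-8-sig strips an existing BOM on read; newline="" keeps line endings as-is.
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)  # utf-8-sig writes the BOM before the first character


if __name__ == "__main__" and len(sys.argv) == 3:
    add_utf8_bom(sys.argv[1], sys.argv[2])
```

For example: `python add_utf8_bom.py output/reports/seo_analysis_TIMESTAMP.csv excel_ready.csv` (script name hypothetical).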

### Want to Use a Database Later?

If you outgrow CSV files, consider:

**SQLite** (part of Python's standard library, no installation):
```python
import sqlite3

conn = sqlite3.connect('seo_analysis.db')  # creates the file if it doesn't exist
# Insert results into the database here, then save them:
conn.commit()
conn.close()
```
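
If you do move to SQLite, one option is to load each CSV export as a dated run, which gives you the history across runs that CSV files lack. A minimal sketch, assuming the column names from the CSV table above; the `scores` table layout and the `load_run` helper are hypothetical. It uses an in-memory database so it runs anywhere; pass a file path such as `seo_analysis.db` to keep the data.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per post per run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scores (
    run_date      TEXT NOT NULL,
    site          TEXT NOT NULL,
    post_id       INTEGER NOT NULL,
    status        TEXT,
    title         TEXT,
    overall_score REAL,
    PRIMARY KEY (run_date, site, post_id)
)
"""


def load_run(conn, fileobj, run_date):
    """Insert one analyzer CSV into the scores table, tagged with run_date."""
    rows = [
        (run_date, r["site"], int(r["post_id"]), r.get("status"), r.get("title"),
         float(r["overall_score"]))
        for r in csv.DictReader(fileobj)
    ]
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.executemany("INSERT OR REPLACE INTO scores VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # use "seo_analysis.db" to persist to disk
load_run(conn, io.StringIO(
    "site,post_id,status,title,meta_score,overall_score\n"
    'mistergeek.net,2845,publish,"Best VPN",88,90\n'
    'webscroll.fr,1235,draft,"Draft Tracker Review",20,30\n'
), "2025-02-16")

# Average overall score per run; with several runs loaded this shows the trend.
query = "SELECT run_date, AVG(overall_score) FROM scores GROUP BY run_date ORDER BY run_date"
for run_date, avg_score in conn.execute(query):
    print(run_date, round(avg_score, 1))
```

The composite primary key lets the same post appear once per run, so re-loading a run for the same date replaces its rows instead of duplicating them.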

**PostgreSQL** (professional option, requires installing `psycopg2`):
```python
import psycopg2

conn = psycopg2.connect("dbname=seo_db user=postgres")
# Insert results
```

For the current volume (a few hundred posts across three sites), CSV files are sufficient.

---

## Summary

### Storage

| Aspect | Implementation |
|--------|-----------------|
| Database? | No - CSV files |
| Location | `output/reports/` |
| Format | CSV (Excel/Sheets compatible) |
| Persistence | Permanent (until deleted) |

### Draft Posts

| Aspect | Usage |
|--------|-------|
| Default | Published only |
| Include drafts | `--include-drafts` flag |
| Output column | `status` (publish/draft) |
| Use case | Optimize before publishing, recover removed content |

### Commands

```bash
# Published only
python scripts/multi_site_seo_analyzer.py

# Published + Drafts
python scripts/multi_site_seo_analyzer.py --include-drafts

# Published + Drafts + AI
python scripts/multi_site_seo_analyzer.py --include-drafts --top-n 20

# Skip AI (faster)
python scripts/multi_site_seo_analyzer.py --no-ai
```

---

## Next Steps

1. **First run (published only):**
   ```bash
   python scripts/multi_site_seo_analyzer.py --no-ai
   ```

2. **Analyze results** (`open` is macOS-specific):
   ```bash
   open output/reports/seo_analysis_*.csv
   ```

3. **Optimize published posts** with an `overall_score` below 50

4. **Second run (include drafts):**
   ```bash
   python scripts/multi_site_seo_analyzer.py --include-drafts
   ```

5. **Decide on drafts:** Publish, improve, or delete

6. **Track progress:** Re-run monthly and compare scores

Ready? Start with: `python scripts/multi_site_seo_analyzer.py --no-ai`