# Author Filter Guide

Export posts from specific authors using the enhanced export functionality.

## Overview

The export command now supports filtering posts by author name or author ID, making it easy to:

- Export posts from a specific author across all sites
- Combine author filtering with site filtering
- Export posts from multiple authors at once

## Usage

### Filter by Author Name

Export posts from a specific author (case-insensitive, partial match):

```bash
# Export posts by "John Doe"
./seo export --author "John Doe"

# Export posts by "admin" (partial match)
./seo export --author admin

# Export posts from multiple authors
./seo export --author "John Doe" "Jane Smith"
```

### Filter by Author ID

Export posts from specific author IDs:

```bash
# Export posts by author ID 1
./seo export --author-id 1

# Export posts from multiple author IDs
./seo export --author-id 1 2 3
```

### Combine with Site Filter

Export posts from a specific author on a specific site:

```bash
# Export John's posts from mistergeek.net only
./seo export --author "John Doe" --site mistergeek.net

# Export posts by author ID 1 from webscroll.fr
./seo export --author-id 1 -s webscroll.fr
```

### Dry Run Mode

Preview what would be exported:

```bash
./seo export --author "John Doe" --dry-run
```

## How It Works

1. **Author Name Matching**
   - Case-insensitive matching
   - Partial matches work (e.g., "john" matches "John Doe")
   - Matches against the author's display name and slug

2. **Author ID Matching**
   - Exact match on the WordPress user ID
   - More reliable than name matching
   - Useful when authors have similar names

3. **Author Information**
   - The exporter fetches all authors from each site
   - Author names are included in the exported CSV
   - Posts are filtered before export
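
The name-matching rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual code; the shape of the `author` dict is an assumption.

```python
def matches_author(author: dict, filters: list[str]) -> bool:
    """Case-insensitive, partial match against display name and slug.

    Note: the author dict shape here is assumed for illustration.
    """
    name = author.get("name", "").lower()
    slug = author.get("slug", "").lower()
    return any(f.lower() in name or f.lower() in slug for f in filters)


# "john" matches "John Doe" (partial, case-insensitive)
print(matches_author({"name": "John Doe", "slug": "john-doe"}, ["john"]))  # True
```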

## Export Output

The exported CSV includes author information:

```csv
site,post_id,status,title,slug,url,author_id,author_name,date_published,...
mistergeek.net,123,publish,"VPN Guide",vpn-guide,https://...,1,John Doe,2024-01-15,...
```

### New Column: `author_name`

The export now includes the author's display name in addition to the author ID.

## Examples

### Example 1: Export All Posts by Admin

```bash
./seo export --author admin
```

Output: `output/all_posts_YYYY-MM-DD.csv`

### Example 2: Export a Specific Author from a Specific Site

```bash
./seo export --author "Marie" --site webscroll.fr
```

### Example 3: Export Multiple Authors

```bash
./seo export --author "John" "Marie" "Admin"
```

### Example 4: Export by Author ID

```bash
./seo export --author-id 5
```

### Example 5: Combine Author and Site Filters

```bash
./seo export --author "John" --site mistergeek.net --verbose
```

## Finding Author IDs

If you don't know the author ID, you can:

1. **Export all posts and check the CSV:**

   ```bash
   ./seo export
   # Then open the CSV and check the author_id column
   ```

2. **Use the WordPress admin:**
   - Go to Users → All Users
   - Hover over a user name
   - The URL shows the user ID (e.g., `user_id=5`)

3. **Use the WordPress REST API directly:**

   ```bash
   curl -u username:password https://yoursite.com/wp-json/wp/v2/users
   ```

## Tips

1. **Use quotes for names with spaces:**

   ```bash
   ./seo export --author "John Doe"   # ✓ Correct
   ./seo export --author John Doe     # ✗ Wrong (treated as 2 authors)
   ```

2. **Partial matching is your friend:**

   ```bash
   ./seo export --author "john"   # Matches "John Doe", "Johnny", etc.
   ```

3. **Combine with migration:**

   ```bash
   # Export an author's posts, then migrate them to another site
   ./seo export --author "John Doe" --site webscroll.fr
   ./seo migrate output/all_posts_*.csv --destination mistergeek.net
   ```

4. **Verbose mode for debugging:**

   ```bash
   ./seo export --author "John" --verbose
   ```

## Troubleshooting

### No posts exported

**Possible causes:**
- Author name doesn't match (try a different spelling)
- Author has no posts
- Author doesn't exist on that site

**Solutions:**
- Use `--verbose` to see what's happening
- Try the author ID instead of the name
- Check that the author exists on the site

### Author names not showing in the CSV

**Possible causes:**
- WordPress REST API doesn't allow user enumeration
- Authentication issue

**Solutions:**
- Check WordPress user permissions
- Verify the credentials in the config
- The author ID will still be present even if the name lookup fails

## API Usage

Use author filtering programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# Export by author name
csv_file = app.export(author_filter=["John Doe"])

# Export by author ID
csv_file = app.export(author_ids=[1, 2])

# Export by author and site
csv_file = app.export(
    author_filter=["John"],
    site_filter="mistergeek.net"
)
```

## Related Commands

- `seo migrate` - Migrate exported posts to another site
- `seo analyze` - Analyze exported posts with AI
- `seo export --help` - Show all export options

## See Also

- [MIGRATION_GUIDE.md](MIGRATION_GUIDE.md) - Post migration guide
- [README.md](README.md) - Main documentation

# Meta Description Generation Guide

AI-powered meta description generation and optimization for WordPress posts.

## Overview

The meta description generator uses AI to create SEO-optimized meta descriptions for your blog posts. It can:

- **Generate new meta descriptions** for posts without them
- **Improve existing meta descriptions** that are poor quality
- **Optimize length** (120-160 characters - ideal for SEO)
- **Include focus keywords** naturally
- **Add call-to-action** elements when appropriate

## Usage

### Generate for All Posts

```bash
# Generate meta descriptions for all posts
./seo meta_description

# Use a specific CSV file
./seo meta_description output/all_posts_2026-02-16.csv
```

### Generate Only for Missing Meta Descriptions

```bash
# Only generate for posts without meta descriptions
./seo meta_description --only-missing
```

### Improve Poor Quality Meta Descriptions

```bash
# Only regenerate meta descriptions with poor quality scores
./seo meta_description --only-poor

# Limit to the first 10 poor quality meta descriptions
./seo meta_description --only-poor --limit 10
```

### Dry Run Mode

Preview what would be processed:

```bash
./seo meta_description --dry-run
./seo meta_description --dry-run --only-missing
```

## Command Options

| Option | Description |
|--------|-------------|
| `--only-missing` | Only generate for posts without meta descriptions |
| `--only-poor` | Only generate for posts with poor quality meta descriptions |
| `--limit <N>` | Limit number of posts to process |
| `--output`, `-o` | Custom output file path |
| `--dry-run` | Preview without generating |
| `--verbose`, `-v` | Enable verbose logging |

## How It Works

### 1. Content Analysis

The AI analyzes:
- Post title
- Content preview (first 500 characters)
- Excerpt (if available)
- Focus keyword (if specified)
- Current meta description (if it exists)

### 2. AI Generation

The AI generates meta descriptions following SEO best practices:
- **Length**: 120-160 characters (optimal for search engines)
- **Keywords**: Naturally includes the focus keyword
- **Compelling**: Action-oriented and engaging
- **Accurate**: Clearly describes the post content
- **Active voice**: Uses active rather than passive voice
- **Call-to-action**: Includes a CTA when appropriate

### 3. Quality Validation

Each generated meta description is scored on:
- **Length optimization** (120-160 chars = 100 points)
- **Proper ending** (period = +5 points)
- **Call-to-action words** (+5 points)
- **Overall quality** (minimum 70 points to pass)

### 4. Output

Results are saved to CSV with:
- Original meta description
- Generated meta description
- Length of the generated meta
- Validation score (0-100)
- Whether the length is optimal
- Whether it's an improvement

## Output Format

The tool generates a CSV file in `output/`:

```
output/meta_descriptions_20260216_143022.csv
```

### CSV Columns

| Column | Description |
|--------|-------------|
| `post_id` | WordPress post ID |
| `site` | Site name |
| `title` | Post title |
| `current_meta_description` | Existing meta (if any) |
| `generated_meta_description` | AI-generated meta |
| `generated_length` | Character count |
| `validation_score` | Quality score (0-100) |
| `is_optimal_length` | True if 120-160 chars |
| `improvement` | True if better than current |
| `status` | Generation status |

## Examples

### Example 1: Generate All Missing Meta Descriptions

```bash
# Export posts first
./seo export

# Generate meta descriptions for posts without them
./seo meta_description --only-missing
```

**Output:**
```
Generating AI-optimized meta descriptions...
Filter: Only posts without meta descriptions
Processing post 1/45
✓ Generated meta description (score: 95, length: 155)
...

✅ Meta description generation completed!
Results: output/meta_descriptions_20260216_143022.csv

📊 Summary:
   Total processed: 45
   Improved: 42 (93.3%)
   Optimal length: 40 (88.9%)
   Average score: 92.5
   API calls: 45
```

### Example 2: Fix Poor Quality Meta Descriptions

```bash
# Only improve meta descriptions scoring below 70
./seo meta_description --only-poor --limit 20
```

### Example 3: Test with a Small Batch

```bash
# Test with the first 5 posts
./seo meta_description --limit 5
```

### Example 4: Custom Output File

```bash
./seo meta_description --output output/custom_meta_gen.csv
```

## Meta Description Quality Scoring

### Scoring Criteria

| Criteria | Points |
|----------|--------|
| Optimal length (120-160 chars) | 100 |
| Too short (< 120 chars) | 50 - (deficit) |
| Too long (> 160 chars) | 50 - (excess) |
| Ends with period | +5 |
| Contains CTA words | +5 |

### Quality Thresholds

- **Excellent (90-100)**: Ready to use
- **Good (70-89)**: Minor improvements possible
- **Poor (< 70)**: Needs regeneration

### CTA Words Detected

The system looks for action words like:
- learn, discover, find, explore
- read, get, see, try, start
- and more...
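
Taken together, the criteria above can be approximated with a small scoring function. This is a sketch of the table, not the tool's exact scorer, which may weight, cap, or word-match things differently.

```python
# Subset of the CTA words listed above
CTA_WORDS = {"learn", "discover", "find", "explore", "read", "get", "see", "try", "start"}

def score_meta_description(text: str) -> int:
    """Rough quality score following the criteria table above (assumed 0-100 clamp)."""
    n = len(text)
    if 120 <= n <= 160:
        score = 100                 # optimal length
    elif n < 120:
        score = 50 - (120 - n)      # too short: subtract the deficit
    else:
        score = 50 - (n - 160)      # too long: subtract the excess
    if text.rstrip().endswith("."):
        score += 5                  # proper ending
    words = {w.strip(".,!?").lower() for w in text.split()}
    if words & CTA_WORDS:
        score += 5                  # call-to-action word present
    return max(0, min(score, 100))
```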

## Best Practices

### Before Generation

1. **Export fresh data** - Ensure you have the latest posts

   ```bash
   ./seo export
   ```

2. **Review focus keywords** - Posts with focus keywords get better results

3. **Test with a small batch** - Try with `--limit 5` first

### During Generation

1. **Monitor scores** - Watch validation scores in real-time
2. **Check API usage** - Track the number of API calls
3. **Use filters** - Target only what needs improvement

### After Generation

1. **Review results** - Open the CSV and check the generated metas
2. **Manual approval** - Don't auto-publish; review first
3. **A/B test** - Compare performance of new vs old metas

## Integration with WordPress

### Manual Update

1. Open the generated CSV: `output/meta_descriptions_*.csv`
2. Copy the generated meta descriptions
3. Update them in your WordPress SEO plugin (RankMath, Yoast, etc.)

### Automated Update (Future)

Future versions may support direct WordPress updates:

```bash
# Not yet implemented
./seo meta_description --apply-to-wordpress
```

## API Usage & Cost

### API Calls

- Each post requires 1 API call
- Rate limited to 2 calls/second (0.5s delay)
- Uses Claude AI via OpenRouter
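
The pacing above amounts to sleeping half a second between calls. A minimal sketch, where `generate_meta` is a hypothetical stand-in for the real API call:

```python
import time

def process_posts(posts, generate_meta, delay=0.5):
    """Process posts sequentially, spacing API calls ~0.5 s apart."""
    results = []
    for i, post in enumerate(posts):
        results.append(generate_meta(post))
        if i < len(posts) - 1:
            time.sleep(delay)  # 0.5 s gap -> roughly 2 calls/second
    return results
```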

### Estimated Cost

Approximate cost per 1000 meta descriptions:
- **~$0.50 - $2.00** depending on content length
- Check OpenRouter pricing for current rates

### Monitoring

The summary shows:
- Total API calls made
- Cost tracking (if enabled)

## Troubleshooting

### No Posts to Process

**Problem:** "No posts to process"

**Solutions:**
1. Export posts first: `./seo export`
2. Check the CSV has the required columns
3. Verify the filter isn't too restrictive

### Low Quality Scores

**Problem:** Generated metas scoring below 70

**Solutions:**
1. Add focus keywords to posts
2. Provide better content previews
3. Try regenerating with different parameters

### API Errors

**Problem:** "API call failed"

**Solutions:**
1. Check your internet connection
2. Verify the API key in `.env`
3. Check your OpenRouter account status
4. Reduce the batch size with `--limit`

### Rate Limiting

**Problem:** Too many API calls

**Solutions:**
1. Use `--limit` to process in batches
2. Wait between batches
3. Upgrade your API plan if needed

## Comparison with Other Tools

| Feature | This Tool | Other SEO Tools |
|---------|-----------|-----------------|
| AI-powered | ✅ Yes | ⚠️ Sometimes |
| Batch processing | ✅ Yes | ✅ Yes |
| Quality scoring | ✅ Yes | ❌ No |
| Custom prompts | ✅ Yes | ❌ No |
| WordPress integration | ⚠️ Manual | ✅ Some |
| Cost | Pay-per-use | Monthly subscription |

## Related Commands

- `seo export` - Export posts for analysis
- `seo analyze` - AI analysis with recommendations
- `seo seo_check` - SEO quality checking

## See Also

- [README.md](README.md) - Main documentation
- [ENHANCED_ANALYSIS_GUIDE.md](ENHANCED_ANALYSIS_GUIDE.md) - AI analysis guide
- [EDITORIAL_STRATEGY_GUIDE.md](EDITORIAL_STRATEGY_GUIDE.md) - Content strategy

---

**Made with ❤️ for better SEO automation**

# Post Migration Guide

This guide explains how to migrate posts between WordPress sites using the SEO automation tool.

## Overview

The migration feature allows you to move posts from one WordPress site to another while preserving:
- Post content (title, body, excerpt)
- Categories (automatically created if they don't exist)
- Tags (automatically created if they don't exist)
- SEO metadata (RankMath, Yoast SEO)
- Post slug

## Migration Modes

There are two ways to migrate posts:

### 1. CSV-Based Migration

Migrate specific posts listed in a CSV file.

**Requirements:**
- A CSV file with at least two columns: `site` and `post_id`

**Usage:**
```bash
# Basic migration (posts deleted from source after migration)
./seo migrate posts_to_migrate.csv --destination mistergeek.net

# Keep posts on the source site
./seo migrate posts_to_migrate.csv --destination mistergeek.net --keep-source

# Publish immediately instead of draft
./seo migrate posts_to_migrate.csv --destination mistergeek.net --post-status publish

# Custom output file for the migration report
./seo migrate posts_to_migrate.csv --destination mistergeek.net --output custom_report.csv
```

### 2. Filtered Migration

Migrate posts based on filters (category, date, etc.).

**Usage:**
```bash
# Migrate all posts from source to destination
./seo migrate --source webscroll.fr --destination mistergeek.net

# Migrate posts from specific categories
./seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN "Torrent Clients"

# Migrate posts with specific tags
./seo migrate --source webscroll.fr --destination mistergeek.net --tag-filter "guide" "tutorial"

# Migrate posts by date range
./seo migrate --source webscroll.fr --destination mistergeek.net --date-after 2024-01-01 --date-before 2024-12-31

# Limit the number of posts
./seo migrate --source webscroll.fr --destination mistergeek.net --limit 10

# Combine filters
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --date-after 2024-01-01 \
  --limit 5 \
  --keep-source
```

## Command Options

### Required Options

- `--destination`, `--to`: Destination site (mistergeek.net, webscroll.fr, hellogeek.net)
- `--source`, `--from`: Source site (for filtered migration only)
- CSV file: Path to the CSV with posts (for CSV-based migration)

### Optional Options

| Option | Description | Default |
|--------|-------------|---------|
| `--keep-source` | Keep posts on source site after migration | Delete after migration |
| `--post-status` | Status for migrated posts (draft, publish, pending) | draft |
| `--no-categories` | Don't create categories automatically | Create categories |
| `--no-tags` | Don't create tags automatically | Create tags |
| `--category-filter` | Filter by category names (filtered migration) | All categories |
| `--tag-filter` | Filter by tag names (filtered migration) | All tags |
| `--date-after` | Migrate posts after this date (YYYY-MM-DD) | No limit |
| `--date-before` | Migrate posts before this date (YYYY-MM-DD) | No limit |
| `--limit` | Maximum number of posts to migrate | No limit |
| `--output`, `-o` | Custom output file for migration report | Auto-generated |
| `--dry-run` | Preview what would be done without doing it | Execute |
| `--verbose`, `-v` | Enable verbose logging | Normal logging |

## Migration Process

### What Gets Migrated

1. **Post Content**
   - Title
   - Body content (HTML preserved)
   - Excerpt
   - Slug

2. **Categories**
   - Mapped from source to destination
   - Created automatically if they don't exist on the destination
   - Hierarchical structure preserved (parent-child relationships)

3. **Tags**
   - Mapped from source to destination
   - Created automatically if they don't exist on the destination

4. **SEO Metadata**
   - RankMath title and description
   - Yoast SEO title and description
   - Focus keywords

### What Doesn't Get Migrated

- Featured images (must be re-uploaded manually)
- Post author (uses the destination site's default)
- Comments (not transferred)
- Custom fields (except SEO metadata)
- Post revisions

## Migration Report

After migration, a CSV report is generated in `output/` with the following information:

```csv
source_site,source_post_id,destination_site,destination_post_id,title,status,categories_migrated,tags_migrated,deleted_from_source
webscroll.fr,123,mistergeek.net,456,"VPN Guide",draft,3,5,True
```
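
Because the report is plain CSV, it is easy to post-process, for example to list the posts that were removed from the source and therefore need redirects. A sketch, using the column names from the header above:

```python
import csv

def posts_needing_redirects(report_path: str) -> list[dict]:
    """Rows where the post was deleted from the source site."""
    with open(report_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if row.get("deleted_from_source") == "True"]
```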

## Examples

### Example 1: Migrate Specific Posts from CSV

1. Create a CSV file with the posts to migrate:

   ```csv
   site,post_id,title
   webscroll.fr,123,VPN Guide
   webscroll.fr,456,Torrent Tutorial
   ```

2. Run the migration:

   ```bash
   ./seo migrate my_posts.csv --destination mistergeek.net
   ```

### Example 2: Migrate All VPN Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN "VPN Reviews" \
  --post-status draft \
  --keep-source
```

### Example 3: Migrate Recent Content

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --date-after 2024-06-01 \
  --limit 20
```

### Example 4: Preview a Migration

```bash
./seo migrate --source webscroll.fr --destination mistergeek.net \
  --category-filter VPN \
  --dry-run
```

## Best Practices

### Before Migration

1. **Backup both sites** - Always back up before bulk operations
2. **Test with a few posts** - Migrate 1-2 posts first to verify
3. **Check category structure** - Review the destination site's categories
4. **Plan URL redirects** - If deleting from source, set up redirects

### During Migration

1. **Use dry-run first** - Preview what will be migrated
2. **Start with drafts** - Review before publishing
3. **Monitor logs** - Watch for errors or warnings
4. **Limit batch size** - Migrate in batches of 10-20 posts

### After Migration

1. **Review migrated posts** - Check formatting and categories
2. **Add featured images** - Upload manually if needed
3. **Set up redirects** - From old URLs to new URLs
4. **Update internal links** - Fix cross-site links
5. **Monitor SEO** - Track rankings after migration

## Troubleshooting

### Common Issues

**1. "Site not found" error**
- Check the site name is correct (mistergeek.net, webscroll.fr, hellogeek.net)
- Verify the credentials in config.yaml or .env

**2. "Category already exists" warning**
- This is normal - the migrator found a matching category
- The existing category will be used

**3. "Failed to create post" error**
- Check the WordPress REST API is enabled
- Verify the user has post creation permissions
- Check the authentication credentials

**4. Posts missing featured images**
- Featured images are not migrated automatically
- Upload images manually to the destination site
- Update the featured image on migrated posts

**5. Categories not matching**
- Categories are matched by name (case-insensitive)
- "VPN" and "vpn" will match
- "VPN Guide" and "VPN" will NOT match - a new category is created
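
That rule is essentially a case-insensitive exact lookup, with no partial matching. A hedged sketch, where the `existing` name-to-ID mapping is an assumed shape, not the migrator's real data structure:

```python
def find_category_id(name: str, existing: dict):
    """Exact, case-insensitive name match; None means a new category gets created."""
    lookup = {n.lower(): cid for n, cid in existing.items()}
    return lookup.get(name.lower())


cats = {"VPN": 12, "Software": 7}
print(find_category_id("vpn", cats))        # 12 -> "VPN" and "vpn" match
print(find_category_id("VPN Guide", cats))  # None -> no partial match
```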

## API Usage

You can also use the migration feature programmatically:

```python
from seo.app import SEOApp

app = SEOApp()

# CSV-based migration
app.migrate(
    csv_file='output/posts_to_migrate.csv',
    destination_site='mistergeek.net',
    create_categories=True,
    create_tags=True,
    delete_after=False,
    status='draft'
)

# Filtered migration
app.migrate_by_filter(
    source_site='webscroll.fr',
    destination_site='mistergeek.net',
    category_filter=['VPN', 'Software'],
    date_after='2024-01-01',
    limit=10,
    create_categories=True,
    delete_after=False,
    status='draft'
)
```

## Related Commands

- `seo export` - Export posts from all sites
- `seo editorial_strategy` - Analyze and get migration recommendations
- `seo category_propose` - Get AI category recommendations

## See Also

- [README.md](README.md) - Main documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [CATEGORY_MANAGEMENT_GUIDE.md](CATEGORY_MANAGEMENT_GUIDE.md) - Category management

# SEO Performance Tracking Guide

Track and analyze your website's SEO performance using Google Analytics 4 and Google Search Console data.

## Overview

The SEO performance tracking features allow you to:

- **Analyze page performance** - Track pageviews, clicks, impressions, CTR, and rankings
- **Find keyword opportunities** - Discover keywords you can rank higher for
- **Generate SEO reports** - Create comprehensive performance reports
- **Import data** - Support for both CSV imports and API integration

## Commands

### 1. `seo performance` - Analyze Page Performance

Analyze traffic and search performance data.

**Usage:**

```bash
# Analyze with CSV exports
./seo performance --ga4 analytics.csv --gsc search.csv

# Analyze GA4 data only
./seo performance --ga4 analytics.csv

# Analyze GSC data only
./seo performance --gsc search.csv

# With custom output
./seo performance --ga4 analytics.csv --gsc search.csv --output custom_analysis.csv

# Preview
./seo performance --ga4 analytics.csv --dry-run
```

**Data Sources:**

- **Google Analytics 4**: Export from GA4 → Reports → Engagement → Pages and screens
- **Google Search Console**: Export from GSC → Performance → Search results → Export

**Metrics Analyzed:**

| Metric | Source | Description |
|--------|--------|-------------|
| Pageviews | GA4 | Number of page views |
| Sessions | GA4 | Number of sessions |
| Bounce Rate | GA4 | Percentage of single-page sessions |
| Engagement Rate | GA4 | Percentage of engaged sessions |
| Clicks | GSC | Number of search clicks |
| Impressions | GSC | Number of search impressions |
| CTR | GSC | Click-through rate |
| Position | GSC | Average search ranking |
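GA4 reports page paths (e.g. `/best-vpn/`) while GSC reports full URLs, so the two sources have to be joined on the path before the metrics above can sit in one row. A minimal stdlib sketch of that join (the column names are assumptions based on the tables above, not the tool's internal schema):

```python
from urllib.parse import urlparse

def merge_metrics(ga4_rows, gsc_rows):
    """Join GA4 rows (keyed by page path) with GSC rows (keyed by full URL)."""
    merged = {}
    for row in ga4_rows:
        merged[row["page_path"]] = dict(row)
    for row in gsc_rows:
        path = urlparse(row["page"]).path  # reduce the full URL to its path
        merged.setdefault(path, {"page_path": path}).update(
            {k: v for k, v in row.items() if k != "page"}
        )
    return merged

ga4 = [{"page_path": "/best-vpn/", "pageviews": 420, "sessions": 310}]
gsc = [{"page": "https://www.mistergeek.net/best-vpn/", "clicks": 35, "impressions": 1250}]
combined = merge_metrics(ga4, gsc)
# combined["/best-vpn/"] now carries GA4 and GSC metrics side by side
```

Pages that only appear in one export still get a row, which is useful for spotting pages that get traffic but no search impressions (or vice versa).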

### 2. `seo keywords` - Keyword Opportunities

Find keywords you can optimize for better rankings.

**Usage:**

```bash
# Analyze keyword opportunities
./seo keywords gsc_export.csv

# Limit results
./seo keywords gsc_export.csv --limit 20

# Custom output
./seo keywords gsc_export.csv --output keywords.csv
```

**What It Finds:**

- Keywords ranking in positions 5-20 (the easiest to improve)
- High-impression keywords with low CTR
- Keywords with good traffic potential
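The selection logic described above can be sketched as a simple filter. The thresholds here are taken from this guide (positions 5-20, low CTR on 100+ impressions); the tool's exact scoring may differ:

```python
def find_opportunities(rows, min_impressions=50):
    """Keep keywords that rank 5-20 or waste impressions with a low CTR."""
    opportunities = []
    for row in rows:
        striking_distance = 5 <= row["position"] <= 20
        low_ctr = row["impressions"] >= 100 and row["ctr"] < 0.02
        if (striking_distance or low_ctr) and row["impressions"] >= min_impressions:
            opportunities.append(row)
    # Highest-impression keywords first: the biggest potential wins
    return sorted(opportunities, key=lambda r: r["impressions"], reverse=True)

rows = [
    {"query": "best vpn 2024", "position": 8.5, "impressions": 1250, "ctr": 0.028},
    {"query": "torrent client", "position": 12.3, "impressions": 890, "ctr": 0.011},
    {"query": "niche term", "position": 3.0, "impressions": 30, "ctr": 0.10},
]
top = find_opportunities(rows)
```

The third row is dropped: it already ranks in the top 5 and its impression volume is too small to matter.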

**Example Output:**

```
✅ Found 47 keyword opportunities!

Top opportunities:
1. best vpn 2024 - Position: 8.5, Impressions: 1250
2. torrent client - Position: 12.3, Impressions: 890
3. vpn for gaming - Position: 9.1, Impressions: 650
```

### 3. `seo report` - Generate SEO Report

Create comprehensive SEO performance reports.

**Usage:**

```bash
# Generate report
./seo report

# Custom output
./seo report --output monthly_seo_report.md
```

**Report Includes:**

- Performance summary
- Traffic analysis
- Keyword opportunities
- SEO recommendations
- Action items
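Internally the report is plain Markdown assembled section by section (see `_generate_report_content` in `src/seo/app.py` in this change set). A minimal sketch of that idea:

```python
from datetime import datetime

def build_report(sections):
    """Assemble a Markdown report from (heading, body) pairs."""
    lines = [
        "# SEO Performance Report",
        f"Generated: {datetime.now():%Y-%m-%d %H:%M:%S}",
        "---",
    ]
    for heading, body in sections:
        lines.append(f"## {heading}")
        lines.append(body)
    return "\n\n".join(lines)

report = build_report([
    ("Summary", "This report provides insights into your website's SEO performance."),
    ("SEO Recommendations", "1. Review and optimize meta descriptions"),
])
```

Because the output is ordinary Markdown, the report can be committed to the repo, pasted into an issue, or rendered by any Markdown viewer.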

## Data Export Guides

### Export from Google Analytics 4

1. Go to **Google Analytics** → Your Property
2. Navigate to **Reports** → **Engagement** → **Pages and screens**
3. Set the date range (e.g., last 30 days)
4. Click **Share** → **Download file** → **CSV**
5. Save as `ga4_export.csv`

**Required Columns:**

- Page path
- Page title
- Views (pageviews)
- Sessions
- Bounce rate
- Engagement rate
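Before running an analysis it is worth checking that the export actually contains these columns. The tool does its own validation; this stdlib check just illustrates the idea:

```python
import csv
import io

REQUIRED_GA4_COLUMNS = {"page path", "views", "sessions"}

def missing_columns(csv_text, required):
    """Return the required columns absent from the CSV header (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = {col.strip().lower() for col in next(reader)}
    return required - header

sample = "Page path,Page title,Views,Sessions\n/home/,Home,100,80\n"
gaps = missing_columns(sample, REQUIRED_GA4_COLUMNS)
# an empty set means the export is usable
```

Running this on a freshly downloaded export catches a wrong report type (e.g., an Acquisition report instead of Pages and screens) before any analysis starts.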

### Export from Google Search Console

1. Go to **Google Search Console** → Your Property
2. Click **Performance** → **Search results**
3. Set the date range (e.g., last 30 days)
4. Check all metrics: Clicks, Impressions, CTR, Position
5. Click **Export** → **CSV**
6. Save as `gsc_export.csv`

**Required Columns:**

- Page (URL)
- Clicks
- Impressions
- CTR
- Position
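GSC exports typically write the CTR column as a percentage string (e.g. `3.2%`), so it needs converting before it can be compared against numeric thresholds. A defensive parser (the exact export format is an assumption; verify it against your own file):

```python
def parse_ctr(value):
    """Convert a GSC CTR cell like '3.2%' or '0.032' to a float fraction."""
    text = str(value).strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)
```

Handling both forms means the same code works whether the CSV came straight from the GSC UI or from an API dump that already uses fractions.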

## API Integration (Advanced)

For automated data fetching, configure API credentials:

### 1. Google Analytics 4 API

**Setup:**

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the **Google Analytics Data API**
4. Create service account credentials
5. Download the JSON key file
6. Share the GA4 property with the service account email

**Configuration:**

Add to `.env`:

```
GA4_CREDENTIALS=/path/to/ga4-credentials.json
GA4_PROPERTY_ID=properties/123456789
```
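At startup the tool only needs these two values from the environment. A minimal sketch of the equivalent readiness check (variable names taken from the `.env` snippet above; the tool's own config loader may do more):

```python
import os

def ga4_config():
    """Read GA4 API settings from the environment; None if not configured."""
    credentials = os.environ.get("GA4_CREDENTIALS")
    property_id = os.environ.get("GA4_PROPERTY_ID")
    if credentials and property_id:
        return {"credentials": credentials, "property_id": property_id}
    return None

# Simulate a configured environment
os.environ["GA4_CREDENTIALS"] = "/path/to/ga4-credentials.json"
os.environ["GA4_PROPERTY_ID"] = "properties/123456789"
config = ga4_config()
```

Returning `None` rather than raising lets the caller fall back to CSV mode, which is exactly how `seo performance` behaves when no API is available.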

### 2. Google Search Console API

**Setup:**

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Enable the **Search Console API**
3. Create service account credentials
4. Download the JSON key file
5. Share the GSC property with the service account email

**Configuration:**

Add to `.env`:

```
GSC_CREDENTIALS=/path/to/gsc-credentials.json
GSC_SITE_URL=https://www.mistergeek.net
```

### Using API Mode

Once configured, you can run without CSV files:

```bash
# Fetch data directly from the APIs
./seo performance --start-date 2024-01-01 --end-date 2024-01-31
```

## Performance Insights

### Low CTR Pages

Pages with high impressions but a low CTR need better titles/descriptions:

```bash
# Find pages with <2% CTR and 100+ impressions
./seo performance --gsc search.csv

# Check "low_ctr" section in output
```

**Action:** Optimize meta titles and descriptions
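The `low_ctr` selection can be reproduced in a few lines, using the thresholds stated above (<2% CTR, 100+ impressions); the same shape works for the `low_position` check in the next section:

```python
def flag_low_ctr(pages, max_ctr=0.02, min_impressions=100):
    """Pages that earn plenty of impressions but few clicks."""
    return [
        p for p in pages
        if p["impressions"] >= min_impressions
        and (p["clicks"] / p["impressions"]) < max_ctr
    ]

pages = [
    {"page": "/best-vpn/", "clicks": 5, "impressions": 800},    # 0.6% CTR -> flagged
    {"page": "/contact/", "clicks": 4, "impressions": 50},      # too few impressions
    {"page": "/top-tools/", "clicks": 40, "impressions": 900},  # 4.4% CTR -> fine
]
flagged = flag_low_ctr(pages)
```

The impressions floor matters: without it, a page with 3 impressions and 0 clicks would be "flagged" on statistically meaningless data.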

### Low Position Pages

Pages ranking beyond position 20 need content optimization:

```bash
# Find pages ranking >20 with 50+ impressions
./seo performance --gsc search.csv

# Check "low_position" section in output
```

**Action:** Improve content quality, add internal links

### Keyword Opportunities

Keywords ranking in positions 5-20 are easy to improve:

```bash
./seo keywords gsc_export.csv --limit 50
```

**Action:** Optimize content for these specific keywords

## Workflow Examples

### Weekly Performance Check

```bash
# 1. Export fresh data from GA4 and GSC

# 2. Analyze performance
./seo performance --ga4 weekly_ga4.csv --gsc weekly_gsc.csv

# 3. Review keyword opportunities
./seo keywords weekly_gsc.csv --limit 20

# 4. Generate report
./seo report --output weekly_report.md
```

### Monthly SEO Audit

```bash
# 1. Export full month data

# 2. Comprehensive analysis
./seo performance --ga4 month_ga4.csv --gsc month_gsc.csv

# 3. Identify top issues
# Review output for:
# - Low CTR pages
# - Low position pages
# - High impression, low click pages

# 4. Generate action plan
./seo report --output monthly_audit.md
```

### Content Optimization Sprint

```bash
# 1. Find keyword opportunities
./seo keywords gsc.csv --limit 50 > opportunities.txt

# 2. For each opportunity:
# - Review current content
# - Optimize for target keyword
# - Update meta description

# 3. Track improvements
# Re-run analysis after 2 weeks
./seo performance --gsc new_gsc.csv
```

## Output Files

All analysis results are saved to `output/`:

| File | Description |
|------|-------------|
| `performance_data_*.csv` | Raw performance metrics |
| `performance_analysis_*.csv` | Analysis with insights |
| `seo_report_*.md` | Markdown report |
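The `*` in these names is a run timestamp, so repeated analyses never overwrite each other, and the most recent file sorts last. Picking it back up is a one-liner, the same `sorted(glob.glob(...))` pattern that `check_confidence.py` in this change set uses:

```python
import glob
from datetime import datetime
from pathlib import Path

def timestamped_name(prefix, suffix=".csv"):
    """Build an output name like performance_analysis_20240115_093045.csv."""
    return f"{prefix}_{datetime.now():%Y%m%d_%H%M%S}{suffix}"

def latest(pattern="output/performance_analysis_*.csv"):
    """Most recent matching file, or None; lexical sort works thanks to the timestamp."""
    files = sorted(glob.glob(pattern))
    return Path(files[-1]) if files else None

name = timestamped_name("performance_analysis")
```

The `YYYYMMDD_HHMMSS` format is chosen precisely because lexicographic and chronological order coincide.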

## Troubleshooting

### No Data Loaded

**Problem:** "No data loaded. Provide GA4 and/or GSC export files."

**Solution:**

- Ensure the CSV files were properly exported
- Check that the file paths are correct
- Verify the CSV has the required columns

### Column Name Errors

**Problem:** "KeyError: 'pageviews'"

**Solution:**

- Ensure the GA4 export includes a pageviews column
- Column names are normalized automatically
- Check the CSV encoding (UTF-8)
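"Normalized automatically" means header variants like `Views` or `Page path` are mapped to canonical names before the analysis runs. A sketch of that normalization (the alias table is illustrative, not the tool's exact mapping):

```python
# Illustrative alias table: raw GA4/GSC header -> canonical column name
ALIASES = {
    "views": "pageviews",
    "page path": "page_path",
    "page title": "page_title",
    "bounce rate": "bounce_rate",
    "engagement rate": "engagement_rate",
}

def normalize_columns(header):
    """Lowercase, trim, and map known aliases; otherwise snake_case the name."""
    normalized = []
    for col in header:
        key = col.strip().lower()
        normalized.append(ALIASES.get(key, key.replace(" ", "_")))
    return normalized

cols = normalize_columns(["Page path", "Page title", "Views", "Sessions"])
```

If a `KeyError` persists after normalization, the column genuinely is not in the export, so re-export with the right report type.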

### API Authentication Errors

**Problem:** "Failed to initialize GA4 client"

**Solution:**

- Verify the service account JSON is valid
- Check that the API is enabled in Google Cloud
- Ensure the service account has access to the property

## Best Practices

### Data Collection

1. **Export regularly** - Weekly or monthly exports
2. **Consistent date ranges** - Use the same range for comparisons
3. **Keep historical data** - Archive old exports for trend analysis

### Analysis

1. **Focus on trends** - Look at changes over time
2. **Prioritize impact** - Fix high-traffic pages first
3. **Track improvements** - Re-analyze after optimizations

### Reporting

1. **Regular reports** - Weekly/monthly cadence
2. **Share insights** - Distribute to team/stakeholders
3. **Action-oriented** - Include specific recommendations

## Related Commands

- `seo export` - Export posts from WordPress
- `seo meta_description` - Generate meta descriptions
- `seo update_meta` - Update meta descriptions on WordPress

## See Also

- [README.md](README.md) - Main documentation
- [META_DESCRIPTION_GUIDE.md](META_DESCRIPTION_GUIDE.md) - Meta description guide
- [ANALYTICS_SETUP.md](ANALYTICS_SETUP.md) - API setup guide (if it exists)

---

**Made with ❤️ for better SEO automation**
34
check_confidence.py
Normal file
@@ -0,0 +1,34 @@
#!/usr/bin/env python3
import csv
from collections import Counter
import glob

files = sorted(glob.glob('output/category_proposals_*.csv'))
if files:
    with open(files[-1], 'r') as f:
        reader = csv.DictReader(f)
        proposals = list(reader)

    print("=== All Proposals ===")
    print(f"Total: {len(proposals)}\n")

    print("By Site:")
    sites = Counter(p['current_site'] for p in proposals)
    for site, count in sorted(sites.items()):
        print(f"  {site}: {count}")

    print("\nBy Confidence (all sites):")
    confs = Counter(p['category_confidence'] for p in proposals)
    for conf, count in sorted(confs.items()):
        print(f"  {conf}: {count}")

    print("\nBy Site and Confidence:")
    for site in ['mistergeek.net', 'webscroll.fr', 'hellogeek.net']:
        site_props = [p for p in proposals if p['current_site'] == site]
        confs = Counter(p['category_confidence'] for p in site_props)
        print(f"\n  {site} ({len(site_props)} total):")
        for conf, count in sorted(confs.items()):
            print(f"    {conf}: {count}")

        medium_or_better = [p for p in site_props if p['category_confidence'] in ['High', 'Medium']]
        print(f"    → Would process with -c Medium (default): {len(medium_or_better)}")
332
src/seo/app.py
@@ -5,13 +5,19 @@ SEO Application Core - Integrated SEO automation functionality
 import logging
 from pathlib import Path
 from datetime import datetime
-from typing import Optional, List, Tuple
+from typing import Optional, List, Tuple, Dict
 
 from .exporter import PostExporter
 from .analyzer import EnhancedPostAnalyzer
 from .category_proposer import CategoryProposer
 from .category_manager import WordPressCategoryManager, CategoryAssignmentProcessor
 from .editorial_strategy import EditorialStrategyAnalyzer
+from .post_migrator import WordPressPostMigrator
+from .meta_description_generator import MetaDescriptionGenerator
+from .meta_description_updater import MetaDescriptionUpdater
+from .performance_tracker import SEOPerformanceTracker
+from .performance_analyzer import PerformanceAnalyzer
+from .media_importer import WordPressMediaImporter
 
 logger = logging.getLogger(__name__)
 
@@ -34,11 +40,23 @@ class SEOApp:
         else:
             logging.basicConfig(level=logging.INFO)
 
-    def export(self) -> str:
-        """Export all posts from WordPress sites."""
+    def export(self, author_filter: Optional[List[str]] = None,
+               author_ids: Optional[List[int]] = None,
+               site_filter: Optional[str] = None) -> str:
+        """
+        Export all posts from WordPress sites.
+
+        Args:
+            author_filter: List of author names to filter by
+            author_ids: List of author IDs to filter by
+            site_filter: Export from specific site only
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("📦 Exporting all posts from WordPress sites...")
-        exporter = PostExporter()
-        return exporter.run()
+        exporter = PostExporter(author_filter=author_filter, author_ids=author_ids)
+        return exporter.run(site_filter=site_filter)
 
     def analyze(self, csv_file: Optional[str] = None, fields: Optional[List[str]] = None,
                 update: bool = False, output: Optional[str] = None) -> str:
@@ -92,7 +110,8 @@ class SEOApp:
         return proposer.run(output_file=output)
 
     def category_apply(self, proposals_csv: str, site_name: str,
-                       confidence: str = 'Medium', dry_run: bool = False) -> dict:
+                       confidence: str = 'Medium', strict: bool = False,
+                       dry_run: bool = False) -> dict:
         """
         Apply AI category proposals to WordPress.
 
@@ -100,6 +119,7 @@ class SEOApp:
             proposals_csv: Path to proposals CSV
             site_name: Site to apply changes to (mistergeek.net, webscroll.fr, hellogeek.net)
             confidence: Minimum confidence level (High, Medium, Low)
+            strict: If True, only match exact confidence (not "or better")
             dry_run: If True, preview changes without applying
 
         Returns:
@@ -112,6 +132,7 @@ class SEOApp:
             proposals_csv=proposals_csv,
             site_name=site_name,
             confidence_threshold=confidence,
+            strict=strict,
             dry_run=dry_run
         )
 
@@ -161,6 +182,93 @@ class SEOApp:
         analyzer = EditorialStrategyAnalyzer()
         return analyzer.run(csv_file)
 
+    def migrate(self, csv_file: str, destination_site: str,
+                create_categories: bool = True, create_tags: bool = True,
+                delete_after: bool = False, status: str = 'draft',
+                output_file: Optional[str] = None,
+                ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts from CSV file to destination site.
+
+        Args:
+            csv_file: Path to CSV file with posts to migrate (must have 'site' and 'post_id' columns)
+            destination_site: Destination site name (mistergeek.net, webscroll.fr, hellogeek.net)
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts ('draft', 'publish', 'pending')
+            output_file: Custom output file path for migration report
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_from_csv(
+            csv_file=csv_file,
+            destination_site=destination_site,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            output_file=output_file,
+            ignore_original_date=ignore_original_date
+        )
+
+    def migrate_by_filter(self, source_site: str, destination_site: str,
+                          category_filter: Optional[List[str]] = None,
+                          tag_filter: Optional[List[str]] = None,
+                          date_after: Optional[str] = None,
+                          date_before: Optional[str] = None,
+                          status_filter: Optional[List[str]] = None,
+                          create_categories: bool = True,
+                          create_tags: bool = True,
+                          delete_after: bool = False,
+                          status: str = 'draft',
+                          limit: Optional[int] = None,
+                          ignore_original_date: bool = False) -> str:
+        """
+        Migrate posts based on filters.
+
+        Args:
+            source_site: Source site name
+            destination_site: Destination site name
+            category_filter: List of category names to filter by
+            tag_filter: List of tag names to filter by
+            date_after: Only migrate posts after this date (YYYY-MM-DD)
+            date_before: Only migrate posts before this date (YYYY-MM-DD)
+            status_filter: List of statuses to filter by (e.g., ['publish', 'draft'])
+            create_categories: If True, create categories if they don't exist
+            create_tags: If True, create tags if they don't exist
+            delete_after: If True, delete posts from source after migration
+            status: Status for new posts
+            limit: Maximum number of posts to migrate
+            ignore_original_date: If True, use current date instead of original post date
+
+        Returns:
+            Path to migration report CSV
+        """
+        logger.info(f"🚀 Migrating posts from {source_site} to {destination_site}...")
+
+        migrator = WordPressPostMigrator()
+        return migrator.migrate_posts_by_filter(
+            source_site=source_site,
+            destination_site=destination_site,
+            category_filter=category_filter,
+            tag_filter=tag_filter,
+            date_after=date_after,
+            date_before=date_before,
+            status_filter=status_filter,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=status,
+            limit=limit,
+            ignore_original_date=ignore_original_date
+        )
+
     def status(self) -> dict:
         """Get status of output files."""
         files = list(self.output_dir.glob('*.csv'))
@@ -179,6 +287,85 @@ class SEOApp:
 
         return status_info
 
+    def generate_meta_descriptions(self, csv_file: Optional[str] = None,
+                                   output_file: Optional[str] = None,
+                                   only_missing: bool = False,
+                                   only_poor_quality: bool = False,
+                                   limit: Optional[int] = None) -> Tuple[str, Dict]:
+        """
+        Generate AI-optimized meta descriptions for posts.
+
+        Args:
+            csv_file: Path to CSV file with posts (uses latest export if not provided)
+            output_file: Custom output file path for results
+            only_missing: Only generate for posts without meta descriptions
+            only_poor_quality: Only generate for posts with poor quality meta descriptions
+            limit: Maximum number of posts to process
+
+        Returns:
+            Tuple of (output_file_path, summary_dict)
+        """
+        logger.info("✨ Generating AI-optimized meta descriptions...")
+
+        if not csv_file:
+            csv_file = self._find_latest_export()
+
+        if not csv_file:
+            raise FileNotFoundError("No exported posts found. Run export() first or provide a CSV file.")
+
+        logger.info(f"Using file: {csv_file}")
+
+        generator = MetaDescriptionGenerator(csv_file)
+        return generator.run(
+            output_file=output_file,
+            only_missing=only_missing,
+            only_poor_quality=only_poor_quality,
+            limit=limit
+        )
+
+    def update_meta_descriptions(self, site: str,
+                                 post_ids: Optional[List[int]] = None,
+                                 category_names: Optional[List[str]] = None,
+                                 category_ids: Optional[List[int]] = None,
+                                 author_names: Optional[List[str]] = None,
+                                 limit: Optional[int] = None,
+                                 dry_run: bool = False,
+                                 skip_existing: bool = True,
+                                 force_regenerate: bool = False) -> Dict:
+        """
+        Fetch posts from WordPress, generate AI meta descriptions, and update them.
+
+        Args:
+            site: WordPress site name (REQUIRED) - mistergeek.net, webscroll.fr, hellogeek.net
+            post_ids: Specific post IDs to update
+            category_names: Filter by category names
+            category_ids: Filter by category IDs
+            author_names: Filter by author names
+            limit: Maximum number of posts to process
+            dry_run: If True, preview changes without updating
+            skip_existing: If True, skip posts with existing good quality meta descriptions
+            force_regenerate: If True, regenerate even for good quality metas
+
+        Returns:
+            Statistics dict
+        """
+        logger.info(f"🔄 Updating meta descriptions on {site}...")
+
+        if not site:
+            raise ValueError("Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+
+        updater = MetaDescriptionUpdater(site)
+        return updater.run(
+            post_ids=post_ids,
+            category_ids=category_ids,
+            category_names=category_names,
+            author_names=author_names,
+            limit=limit,
+            dry_run=dry_run,
+            skip_existing=skip_existing,
+            force_regenerate=force_regenerate
+        )
+
     def _find_latest_export(self) -> Optional[str]:
         """Find the latest exported CSV file."""
         csv_files = list(self.output_dir.glob('all_posts_*.csv'))
@@ -188,3 +375,136 @@ class SEOApp:
 
         latest = max(csv_files, key=lambda f: f.stat().st_ctime)
         return str(latest)
+
+    def performance(self, ga4_file: Optional[str] = None,
+                    gsc_file: Optional[str] = None,
+                    start_date: Optional[str] = None,
+                    end_date: Optional[str] = None,
+                    output_file: Optional[str] = None) -> Tuple[str, Dict]:
+        """
+        Analyze page performance from GA4 and GSC data.
+
+        Args:
+            ga4_file: Path to GA4 export CSV (or use API if credentials configured)
+            gsc_file: Path to GSC export CSV (or use API if credentials configured)
+            start_date: Start date YYYY-MM-DD (for API mode)
+            end_date: End date YYYY-MM-DD (for API mode)
+            output_file: Custom output file path
+
+        Returns:
+            Tuple of (output_file_path, analysis_dict)
+        """
+        logger.info("📊 Analyzing page performance...")
+
+        # If CSV files provided, use analyzer
+        if ga4_file or gsc_file:
+            analyzer = PerformanceAnalyzer()
+            return analyzer.run(ga4_file=ga4_file, gsc_file=gsc_file, output_file=output_file)
+
+        # Otherwise try API mode
+        tracker = SEOPerformanceTracker()
+        if tracker.ga4_client or tracker.gsc_service:
+            return tracker.run(start_date=start_date, end_date=end_date, output_file=output_file)
+        else:
+            logger.error("No data source available. Provide CSV exports or configure API credentials.")
+            return "", {}
+
+    def keywords(self, gsc_file: str, limit: int = 50) -> List[Dict]:
+        """
+        Analyze keyword opportunities from GSC data.
+
+        Args:
+            gsc_file: Path to GSC export CSV
+            limit: Maximum keywords to return
+
+        Returns:
+            List of keyword opportunity dicts
+        """
+        logger.info("🔍 Analyzing keyword opportunities...")
+
+        analyzer = PerformanceAnalyzer()
+        analyzer.load_gsc_export(gsc_file)
+        analysis = analyzer.analyze()
+
+        opportunities = analysis.get('keyword_opportunities', [])[:limit]
+
+        logger.info(f"Found {len(opportunities)} keyword opportunities")
+
+        return opportunities
+
+    def seo_report(self, output_file: Optional[str] = None) -> str:
+        """
+        Generate comprehensive SEO performance report.
+
+        Args:
+            output_file: Custom output file path
+
+        Returns:
+            Path to report file
+        """
+        logger.info("📄 Generating SEO report...")
+
+        if not output_file:
+            output_dir = Path(__file__).parent.parent.parent / 'output'
+            output_dir.mkdir(parents=True, exist_ok=True)
+            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+            output_file = output_dir / f'seo_report_{timestamp}.md'
+
+        output_file = Path(output_file)
+
+        # Generate report content
+        report = self._generate_report_content()
+
+        # Write report
+        with open(output_file, 'w', encoding='utf-8') as f:
+            f.write(report)
+
+        logger.info(f"✓ Report saved to: {output_file}")
+        return str(output_file)
+
+    def _generate_report_content(self) -> str:
+        """Generate markdown report content."""
+        report = []
+        report.append("# SEO Performance Report\n")
+        report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
+        report.append("---\n")
+
+        # Summary section
+        report.append("## 📊 Summary\n")
+        report.append("This report provides insights into your website's SEO performance.\n")
+
+        # Add analysis sections
+        report.append("## 📈 Traffic Analysis\n")
+        report.append("*Import GA4/GSC data for detailed traffic analysis*\n")
+
+        report.append("## 🔍 Keyword Opportunities\n")
+        report.append("*Import GSC data for keyword analysis*\n")
+
+        report.append("## 📝 SEO Recommendations\n")
+        report.append("1. Review and optimize meta descriptions\n")
+        report.append("2. Improve content for low-ranking pages\n")
+        report.append("3. Build internal links to important pages\n")
+        report.append("4. Monitor keyword rankings regularly\n")
+
+        return "\n".join(report)
+
+    def import_media(self, migration_report: str,
+                     source_site: str = 'mistergeek.net',
+                     destination_site: str = 'hellogeek.net',
+                     dry_run: bool = True) -> Dict:
+        """
+        Import media from source to destination site for migrated posts.
+
+        Args:
+            migration_report: Path to migration report CSV
+            source_site: Source site name
+            destination_site: Destination site name
+            dry_run: If True, preview without importing
+
+        Returns:
+            Statistics dict
+        """
+        logger.info(f"📸 Importing media from {source_site} to {destination_site}...")
+
+        importer = WordPressMediaImporter(source_site, destination_site)
+        return importer.run_from_migration_report(migration_report, dry_run=dry_run)
|||||||
@@ -132,12 +132,33 @@ class WordPressCategoryManager:
|
|||||||
}
|
}
|
||||||
|
|
||||||
return category_data['id']
|
return category_data['id']
|
||||||
elif response.status_code == 409:
|
elif response.status_code == 400:
|
||||||
# Category already exists
|
# Category might already exist - search for it
|
||||||
logger.info(f" Category '{category_name}' already exists")
|
error_data = response.json()
|
||||||
existing = response.json()
|
if error_data.get('code') == 'term_exists':
|
||||||
if isinstance(existing, list) and len(existing) > 0:
|
term_id = error_data.get('data', {}).get('term_id')
|
||||||
return existing[0]['id']
|
if term_id:
|
||||||
|
logger.info(f" Category '{category_name}' already exists (ID: {term_id})")
|
||||||
|
|
||||||
|
# Fetch the category details
|
||||||
|
cat_response = requests.get(
|
||||||
|
f"{base_url}/wp-json/wp/v2/categories/{term_id}",
|
||||||
|
auth=auth,
|
||||||
|
timeout=10
|
||||||
|
)
|
||||||
|
if cat_response.status_code == 200:
|
||||||
|
cat_data = cat_response.json()
|
||||||
|
# Update cache
|
||||||
|
if site_name in self.category_cache:
|
||||||
|
self.category_cache[site_name][cat_data['slug']] = {
|
||||||
|
'id': cat_data['id'],
|
||||||
|
'name': cat_data['name'],
|
||||||
|
'slug': cat_data['slug'],
|
||||||
|
'count': cat_data.get('count', 0)
|
||||||
|
}
|
||||||
|
return cat_data['id']
|
||||||
|
|
||||||
|
logger.warning(f" Category already exists or error: {error_data}")
|
||||||
return None
|
return None
|
||||||
else:
|
else:
|
||||||
logger.error(f"Error creating category: {response.status_code} - {response.text}")
|
logger.error(f"Error creating category: {response.status_code} - {response.text}")
|
||||||
@@ -164,21 +185,42 @@ class WordPressCategoryManager:
         if site_name not in self.category_cache:
             self.fetch_categories(site_name)
 
-        # Check if category exists
-        slug = category_name.lower().replace(' ', '-').replace('/', '-')
+        # Check if category exists (by exact name first)
         categories = self.category_cache.get(site_name, {})
+
+        # Try exact name match (case-insensitive)
+        category_name_lower = category_name.lower()
+        for slug, cat_data in categories.items():
+            if cat_data['name'].lower() == category_name_lower:
+                logger.info(f"✓ Found existing category '{category_name}' (ID: {cat_data['id']})")
+                return cat_data['id']
+
+        # Try slug match
+        slug = category_name.lower().replace(' ', '-').replace('/', '-')
         if slug in categories:
             logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[slug]['id']})")
             return categories[slug]['id']
 
-        # Try alternative slug formats
-        alt_slug = category_name.lower().replace(' ', '-')
-        if alt_slug in categories:
-            logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[alt_slug]['id']})")
-            return categories[alt_slug]['id']
+        # Try alternative slug formats (handle French characters)
+        import unicodedata
+        normalized_slug = unicodedata.normalize('NFKD', slug)\
+            .encode('ascii', 'ignore')\
+            .decode('ascii')\
+            .lower()\
+            .replace(' ', '-')
+
+        if normalized_slug in categories:
+            logger.info(f"✓ Found existing category '{category_name}' (ID: {categories[normalized_slug]['id']})")
+            return categories[normalized_slug]['id']
+
+        # Try partial match (if slug contains the category name)
+        for slug, cat_data in categories.items():
+            if category_name_lower in cat_data['name'].lower() or cat_data['name'].lower() in category_name_lower:
+                logger.info(f"✓ Found similar category '{cat_data['name']}' (ID: {cat_data['id']})")
+                return cat_data['id']
 
         # Create new category
+        logger.info(f"Creating new category '{category_name}'...")
         return self.create_category(site_name, category_name, description)
 
     def assign_post_to_category(self, site_name: str, post_id: int,
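The NFKD fallback above exists so accented French category names match their ASCII slugs. Reduced to a standalone helper, the same chain behaves like this:

```python
import unicodedata

def normalize_slug(slug: str) -> str:
    """Strip accents (NFKD decomposition + ASCII filter) and hyphenate spaces."""
    return unicodedata.normalize('NFKD', slug)\
        .encode('ascii', 'ignore')\
        .decode('ascii')\
        .lower()\
        .replace(' ', '-')

print(normalize_slug('Sécurité Informatique'))  # securite-informatique
```

NFKD splits each accented character into a base letter plus a combining mark; `encode('ascii', 'ignore')` then drops the marks, which is why 'é' survives as plain 'e' instead of disappearing entirely.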
@@ -292,14 +334,16 @@ class CategoryAssignmentProcessor:
 
     def process_proposals(self, proposals: List[Dict], site_name: str,
                           confidence_threshold: str = 'Medium',
+                          strict: bool = False,
                           dry_run: bool = False) -> Dict[str, int]:
         """
         Process AI category proposals and apply to WordPress.
 
         Args:
             proposals: List of proposal dicts from CSV
-            site_name: Site to apply changes to
+            site_name: Site to apply changes to (filters proposals)
             confidence_threshold: Minimum confidence to apply (High, Medium, Low)
+            strict: If True, only match exact confidence level
             dry_run: If True, don't actually make changes
 
         Returns:
@@ -312,7 +356,23 @@ class CategoryAssignmentProcessor:
         if dry_run:
             logger.info("DRY RUN - No changes will be made")
 
+        # Filter by site
+        original_count = len(proposals)
+        proposals = [p for p in proposals if p.get('current_site', '') == site_name]
+        filtered_by_site = original_count - len(proposals)
+
+        logger.info(f"Filtered to {len(proposals)} posts on {site_name} ({filtered_by_site} excluded from other sites)")
+
         # Filter by confidence
+        if strict:
+            # Exact match only
+            filtered_proposals = [
+                p for p in proposals
+                if p.get('category_confidence', 'Medium') == confidence_threshold
+            ]
+            logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence = {confidence_threshold}, strict mode)")
+        else:
+            # Medium or better (default behavior)
         confidence_order = {'High': 3, 'Medium': 2, 'Low': 1}
         min_confidence = confidence_order.get(confidence_threshold, 2)
 
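The two filtering modes introduced in this hunk — rank-based by default, exact label in `--strict` mode — can be exercised on their own. This is a minimal sketch of that logic, not the project's actual method:

```python
CONFIDENCE_ORDER = {'High': 3, 'Medium': 2, 'Low': 1}

def filter_by_confidence(proposals, threshold='Medium', strict=False):
    """Keep proposals at or above the threshold; in strict mode, only exact matches."""
    if strict:
        return [p for p in proposals
                if p.get('category_confidence', 'Medium') == threshold]
    min_rank = CONFIDENCE_ORDER.get(threshold, 2)
    return [p for p in proposals
            if CONFIDENCE_ORDER.get(p.get('category_confidence', 'Medium'), 2) >= min_rank]

rows = [{'category_confidence': c} for c in ('High', 'Medium', 'Low')]
print(len(filter_by_confidence(rows, 'Medium')))                # 2  (High + Medium)
print(len(filter_by_confidence(rows, 'Medium', strict=True)))   # 1  (Medium only)
```

Note the shared default of `'Medium'` for missing labels: a proposal with no `category_confidence` column passes the default threshold but would be excluded by `--strict High`.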
@@ -320,28 +380,36 @@ class CategoryAssignmentProcessor:
             p for p in proposals
             if confidence_order.get(p.get('category_confidence', 'Medium'), 2) >= min_confidence
         ]
 
         logger.info(f"Filtered to {len(filtered_proposals)} proposals (confidence >= {confidence_threshold})")
 
+        # Show breakdown
+        high_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'High')
+        medium_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Medium')
+        low_count = sum(1 for p in filtered_proposals if p.get('category_confidence') == 'Low')
+        logger.info(f"  Breakdown: High={high_count}, Medium={medium_count}, Low={low_count}")
+
         # Fetch existing categories
         self.category_manager.fetch_categories(site_name)
 
         # Process each proposal
         for i, proposal in enumerate(filtered_proposals, 1):
-            logger.info(f"\n[{i}/{len(filtered_proposals)}] Processing post {proposal.get('post_id')}...")
-            post_id = int(proposal.get('post_id', 0))
+            post_title = proposal.get('title', 'Unknown')[:60]
+            post_id = proposal.get('post_id', '')
             proposed_category = proposal.get('proposed_category', '')
             current_categories = proposal.get('current_categories', '')
             confidence = proposal.get('category_confidence', 'Medium')
 
+            logger.info(f"\n[{i}/{len(filtered_proposals)}] Post {post_id}: {post_title}...")
+            logger.info(f"  Current categories: {current_categories}")
+            logger.info(f"  Proposed: {proposed_category} (confidence: {confidence})")
+
             if not post_id or not proposed_category:
                 logger.warning("  Skipping: Missing post_id or proposed_category")
                 self.processing_stats['errors'] += 1
                 continue
 
             if dry_run:
-                logger.info(f"  Would assign to: {proposed_category}")
+                logger.info(f"  [DRY RUN] Would assign to: {proposed_category}")
                 continue
 
             # Get or create the category
@@ -362,9 +430,10 @@ class CategoryAssignmentProcessor:
                 logger.info(f"  ✓ Assigned to '{proposed_category}'")
             else:
                 self.processing_stats['errors'] += 1
+                logger.error(f"  ✗ Failed to assign category")
         else:
             self.processing_stats['errors'] += 1
-            logger.error(f"  Failed to get/create category '{proposed_category}'")
+            logger.error(f"  ✗ Failed to get/create category '{proposed_category}'")
 
         self.processing_stats['total_posts'] = len(filtered_proposals)
 
@@ -381,6 +450,7 @@ class CategoryAssignmentProcessor:
 
     def run(self, proposals_csv: str, site_name: str,
             confidence_threshold: str = 'Medium',
+            strict: bool = False,
             dry_run: bool = False) -> Dict[str, int]:
         """
         Run complete category assignment process.
@@ -389,6 +459,7 @@ class CategoryAssignmentProcessor:
             proposals_csv: Path to proposals CSV
             site_name: Site to apply changes to
             confidence_threshold: Minimum confidence to apply
+            strict: If True, only match exact confidence level
             dry_run: If True, preview changes without applying
 
         Returns:
@@ -404,5 +475,6 @@ class CategoryAssignmentProcessor:
             proposals,
             site_name,
             confidence_threshold,
-            dry_run
+            strict=strict,
+            dry_run=dry_run
         )
@@ -164,7 +164,7 @@ class CategoryProposer:
         logger.info("\n📊 Analyzing editorial strategy to inform category proposals...")
 
         analyzer = EditorialStrategyAnalyzer()
-        analyzer.load_csv(str(self.csv_file))
+        analyzer.load_posts(str(self.csv_file))
         self.site_analysis = analyzer.analyze_site_content()
 
         logger.info("✓ Editorial strategy analysis complete")
450  src/seo/cli.py
@@ -47,6 +47,48 @@ Examples:
     parser.add_argument('--site', '-s', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
                         help='WordPress site for category operations')
     parser.add_argument('--description', '-d', help='Category description')
+    parser.add_argument('--strict', action='store_true', help='Strict confidence matching (exact match only)')
+
+    # Export arguments
+    parser.add_argument('--author', nargs='+', help='Filter by author name(s) for export')
+    parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s) for export')
+
+    # Migration arguments
+    parser.add_argument('--destination', '--to', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Destination site for migration')
+    parser.add_argument('--source', '--from', choices=['mistergeek.net', 'webscroll.fr', 'hellogeek.net'],
+                        help='Source site for filtered migration')
+    parser.add_argument('--keep-source', action='store_true', help='Keep posts on source site (default: delete after migration)')
+    parser.add_argument('--post-status', choices=['draft', 'publish', 'pending'], default='draft',
+                        help='Status for migrated posts (default: draft)')
+    parser.add_argument('--no-categories', action='store_true', help='Do not create categories automatically')
+    parser.add_argument('--no-tags', action='store_true', help='Do not create tags automatically')
+    parser.add_argument('--category-filter', nargs='+', help='Filter by category names (for filtered migration)')
+    parser.add_argument('--tag-filter', nargs='+', help='Filter by tag names (for filtered migration)')
+    parser.add_argument('--date-after', help='Migrate posts after this date (YYYY-MM-DD)')
+    parser.add_argument('--date-before', help='Migrate posts before this date (YYYY-MM-DD)')
+    parser.add_argument('--limit', type=int, help='Limit number of posts to migrate')
+    parser.add_argument('--ignore-original-date', action='store_true', help='Use current date instead of original post date')
+
+    # Meta description arguments
+    parser.add_argument('--only-missing', action='store_true', help='Only generate for posts without meta descriptions')
+    parser.add_argument('--only-poor', action='store_true', help='Only generate for posts with poor quality meta descriptions')
+
+    # Update meta arguments
+    parser.add_argument('--post-ids', type=int, nargs='+', help='Specific post IDs to update')
+    parser.add_argument('--category', nargs='+', help='Filter by category name(s)')
+    parser.add_argument('--category-id', type=int, nargs='+', help='Filter by category ID(s)')
+    parser.add_argument('--force', action='store_true', help='Force regenerate even for good quality meta descriptions')
+
+    # Performance arguments
+    parser.add_argument('--ga4', help='Path to Google Analytics 4 export CSV')
+    parser.add_argument('--gsc', help='Path to Google Search Console export CSV')
+    parser.add_argument('--start-date', help='Start date YYYY-MM-DD (for API mode)')
+    parser.add_argument('--end-date', help='End date YYYY-MM-DD (for API mode)')
+
+    # Media import arguments
+    parser.add_argument('--from-site', help='Source site for media import (default: mistergeek.net)')
+    parser.add_argument('--to-site', help='Destination site for media import (default: hellogeek.net)')
 
     args = parser.parse_args()
 
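The new `--author` and `--author-id` flags rely on argparse's `nargs='+'`, which collects one or more space-separated values into a list (and `type=int` converts each element). A self-contained sketch of just those two arguments:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--author', nargs='+', help='Filter by author name(s)')
parser.add_argument('--author-id', type=int, nargs='+', help='Filter by author ID(s)')

# Quoting keeps "John Doe" as a single list element on the shell side
args = parser.parse_args(['--author', 'John Doe', 'Jane Smith', '--author-id', '1', '2'])
print(args.author)     # ['John Doe', 'Jane Smith']
print(args.author_id)  # [1, 2]
```

Argparse maps the dashed flag `--author-id` to the attribute `args.author_id`, which is why the dispatch code later reads `args.author_id` rather than a dashed name.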
@@ -72,6 +114,13 @@ Examples:
         'category_apply': cmd_category_apply,
         'category_create': cmd_category_create,
         'editorial_strategy': cmd_editorial_strategy,
+        'migrate': cmd_migrate,
+        'meta_description': cmd_meta_description,
+        'update_meta': cmd_update_meta,
+        'performance': cmd_performance,
+        'keywords': cmd_keywords,
+        'report': cmd_report,
+        'import_media': cmd_import_media,
         'status': cmd_status,
         'help': cmd_help,
     }
@@ -103,8 +152,19 @@ def cmd_export(app, args):
     """Export all posts."""
     if args.dry_run:
         print("Would export all posts from WordPress sites")
+        if args.author:
+            print(f"  Author filter: {args.author}")
+        if args.author_id:
+            print(f"  Author ID filter: {args.author_id}")
         return 0
-    app.export()
+
+    result = app.export(
+        author_filter=args.author,
+        author_ids=args.author_id,
+        site_filter=args.site
+    )
+    if result:
+        print(f"✅ Export completed! Output: {result}")
     return 0
 
 
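`app.export(author_filter=...)` is documented above as matching author names case-insensitively and partially. One way that matching can be implemented — a sketch under those stated semantics, not the project's actual code — is:

```python
def matches_author(post_author: str, author_filters) -> bool:
    """Case-insensitive partial match of a post's author against any requested name."""
    if not author_filters:
        return True  # no filter supplied: keep every post
    author = post_author.lower()
    return any(f.lower() in author for f in author_filters)

print(matches_author('John Doe', ['admin', 'john']))  # True  ('john' is a partial match)
print(matches_author('Jane Smith', ['admin']))        # False
```

Partial matching is what makes `--author admin` catch authors like "Site Admin", per the usage guide at the top of this page.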
@@ -160,6 +220,8 @@ def cmd_category_apply(app, args):
         print("Would apply category proposals to WordPress")
         print(f"  Site: {args.site}")
         print(f"  Confidence: {args.confidence}")
+        if args.strict:
+            print(f"  Strict mode: Yes (exact match only)")
         return 0
 
     if not args.site:
if not args.site:
|
if not args.site:
|
||||||
@@ -180,11 +242,14 @@ def cmd_category_apply(app, args):
|
|||||||
print(f"Applying categories from: {proposals_csv}")
|
print(f"Applying categories from: {proposals_csv}")
|
||||||
print(f"Site: {args.site}")
|
print(f"Site: {args.site}")
|
||||||
print(f"Confidence threshold: {args.confidence}")
|
print(f"Confidence threshold: {args.confidence}")
|
||||||
|
if args.strict:
|
||||||
|
print(f"Strict mode: Yes (exact match only)")
|
||||||
|
|
||||||
stats = app.category_apply(
|
stats = app.category_apply(
|
||||||
proposals_csv=proposals_csv,
|
proposals_csv=proposals_csv,
|
||||||
site_name=args.site,
|
site_name=args.site,
|
||||||
confidence=args.confidence,
|
confidence=args.confidence,
|
||||||
|
strict=args.strict,
|
||||||
dry_run=False
|
dry_run=False
|
||||||
)
|
)
|
||||||
|
|
||||||
@@ -253,6 +318,196 @@ def cmd_editorial_strategy(app, args):
     return 0
 
 
+def cmd_migrate(app, args):
+    """Migrate posts between websites."""
+    if args.dry_run:
+        print("Would migrate posts between websites")
+        if args.destination:
+            print(f"  Destination: {args.destination}")
+        if args.source:
+            print(f"  Source: {args.source}")
+        return 0
+
+    # Validate required arguments
+    if not args.destination:
+        print("❌ Destination site required. Use --destination mistergeek.net|webscroll.fr|hellogeek.net")
+        return 1
+
+    delete_after = not args.keep_source
+    create_categories = not args.no_categories
+    create_tags = not args.no_tags
+
+    # Check if using filtered migration or CSV-based migration
+    if args.source:
+        # Filtered migration
+        print(f"Migrating posts from {args.source} to {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+        if args.category_filter:
+            print(f"Category filter: {args.category_filter}")
+        if args.tag_filter:
+            print(f"Tag filter: {args.tag_filter}")
+        if args.date_after:
+            print(f"Date after: {args.date_after}")
+        if args.date_before:
+            print(f"Date before: {args.date_before}")
+        if args.limit:
+            print(f"Limit: {args.limit}")
+
+        result = app.migrate_by_filter(
+            source_site=args.source,
+            destination_site=args.destination,
+            category_filter=args.category_filter,
+            tag_filter=args.tag_filter,
+            date_after=args.date_after,
+            date_before=args.date_before,
+            status_filter=None,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            limit=args.limit,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f"  Report: {result}")
+    else:
+        # CSV-based migration
+        csv_file = args.args[0] if args.args else None
+
+        if not csv_file:
+            print("❌ CSV file required. Provide path to CSV with 'site' and 'post_id' columns")
+            print("  Usage: seo migrate <csv_file> --destination <site>")
+            print("  Or use filtered migration: seo migrate --source <site> --destination <site>")
+            return 1
+
+        print(f"Migrating posts from CSV: {csv_file}")
+        print(f"Destination: {args.destination}")
+        print(f"Post status: {args.post_status}")
+        print(f"Delete after migration: {delete_after}")
+
+        result = app.migrate(
+            csv_file=csv_file,
+            destination_site=args.destination,
+            create_categories=create_categories,
+            create_tags=create_tags,
+            delete_after=delete_after,
+            status=args.post_status,
+            output_file=args.output,
+            ignore_original_date=args.ignore_original_date
+        )
+
+        if result:
+            print(f"\n✅ Migration completed!")
+            print(f"  Report: {result}")
+
+    return 0
+
+
+def cmd_meta_description(app, args):
+    """Generate AI-optimized meta descriptions."""
+    if args.dry_run:
+        print("Would generate AI-optimized meta descriptions")
+        if args.only_missing:
+            print("  Filter: Only posts without meta descriptions")
+        if args.only_poor:
+            print("  Filter: Only posts with poor quality meta descriptions")
+        if args.limit:
+            print(f"  Limit: {args.limit} posts")
+        return 0
+
+    csv_file = args.args[0] if args.args else None
+
+    print("Generating AI-optimized meta descriptions...")
+    if args.only_missing:
+        print("  Filter: Only posts without meta descriptions")
+    elif args.only_poor:
+        print("  Filter: Only posts with poor quality meta descriptions")
+    if args.limit:
+        print(f"  Limit: {args.limit} posts")
+
+    output_file, summary = app.generate_meta_descriptions(
+        csv_file=csv_file,
+        output_file=args.output,
+        only_missing=args.only_missing,
+        only_poor_quality=args.only_poor,
+        limit=args.limit
+    )
+
+    if output_file and summary:
+        print(f"\n✅ Meta description generation completed!")
+        print(f"  Results: {output_file}")
+        print(f"\n📊 Summary:")
+        print(f"  Total processed: {summary.get('total_posts', 0)}")
+        print(f"  Improved: {summary.get('improved', 0)} ({summary.get('improvement_rate', 0):.1f}%)")
+        print(f"  Optimal length: {summary.get('optimal_length_count', 0)} ({summary.get('optimal_length_rate', 0):.1f}%)")
+        print(f"  Average score: {summary.get('average_score', 0):.1f}")
+        print(f"  API calls: {summary.get('api_calls', 0)}")
+    return 0
+
+
+def cmd_update_meta(app, args):
+    """Fetch, generate, and update meta descriptions directly on WordPress."""
+    if args.dry_run:
+        print("Would update meta descriptions on WordPress")
+        if not args.site:
+            print("  ❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+            return 1
+        print(f"  Site: {args.site}")
+        if args.post_ids:
+            print(f"  Post IDs: {args.post_ids}")
+        if args.category:
+            print(f"  Categories: {args.category}")
+        if args.author:
+            print(f"  Authors: {args.author}")
+        if args.limit:
+            print(f"  Limit: {args.limit} posts")
+        return 0
+
+    # Site is required
+    if not args.site:
+        print("❌ Site is required. Use --site mistergeek.net|webscroll.fr|hellogeek.net")
+        return 1
+
+    print(f"Updating meta descriptions on {args.site}...")
+    if args.post_ids:
+        print(f"  Post IDs: {args.post_ids}")
+    if args.category:
+        print(f"  Categories: {args.category}")
+    if args.author:
+        print(f"  Authors: {args.author}")
+    if args.category_id:
+        print(f"  Category IDs: {args.category_id}")
+    if args.limit:
+        print(f"  Limit: {args.limit} posts")
+    print(f"  Skip existing: {not args.force}")
+    print(f"  Dry run: {args.dry_run}")
+
+    stats = app.update_meta_descriptions(
+        site=args.site,
+        post_ids=args.post_ids,
+        category_names=args.category,
+        category_ids=args.category_id,
+        author_names=args.author,
+        limit=args.limit,
+        dry_run=args.dry_run,
+        skip_existing=not args.force,
+        force_regenerate=args.force
+    )
+
+    if stats:
+        print(f"\n✅ Meta description update completed!")
+        print(f"\n📊 Summary:")
+        print(f"  Total posts: {stats.get('total_posts', 0)}")
+        print(f"  Updated: {stats.get('updated', 0)}")
+        print(f"  Failed: {stats.get('failed', 0)}")
+        print(f"  Skipped: {stats.get('skipped', 0)}")
+        print(f"  API calls: {stats.get('api_calls', 0)}")
+    return 0
+
+
 def cmd_status(app, args):
     """Show status."""
     if args.dry_run:
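`cmd_migrate` forwards `--date-after`/`--date-before` to `migrate_by_filter` as plain `YYYY-MM-DD` strings. How such a window is applied to a post date is internal to the app; one plausible sketch (inclusive bounds assumed, and the `post_date` field name is illustrative) looks like this:

```python
from datetime import date

def in_date_window(post_date: str, date_after=None, date_before=None) -> bool:
    """Keep posts whose date falls inside the optional [date_after, date_before] window."""
    d = date.fromisoformat(post_date[:10])  # tolerate full 'YYYY-MM-DDTHH:MM:SS' timestamps
    if date_after and d < date.fromisoformat(date_after):
        return False
    if date_before and d > date.fromisoformat(date_before):
        return False
    return True

print(in_date_window('2024-06-15T08:30:00', date_after='2024-01-01'))  # True
print(in_date_window('2023-12-31', date_after='2024-01-01'))           # False
```

Whether the real implementation treats the bounds as inclusive or exclusive is not shown in this diff; the sketch picks inclusive bounds as the more common convention.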
@@ -272,6 +527,123 @@ def cmd_status(app, args):
     return 0
 
 
+def cmd_performance(app, args):
+    """Analyze page performance from GA4 and GSC data."""
+    if args.dry_run:
+        print("Would analyze page performance")
+        if args.ga4:
+            print(f"  GA4 file: {args.ga4}")
+        if args.gsc:
+            print(f"  GSC file: {args.gsc}")
+        return 0
+
+    print("Analyzing page performance...")
+
+    output_file, analysis = app.performance(
+        ga4_file=args.ga4,
+        gsc_file=args.gsc,
+        start_date=args.start_date,
+        end_date=args.end_date,
+        output_file=args.output
+    )
+
+    if output_file and analysis:
+        print(f"\n✅ Performance analysis completed!")
+        print(f"  Results: {output_file}")
+        print(f"\n📊 Summary:")
+        summary = analysis.get('summary', {})
+        print(f"  Total pages: {summary.get('total_pages', 0)}")
+        print(f"  Total pageviews: {summary.get('total_pageviews', 0)}")
+        print(f"  Total clicks: {summary.get('total_clicks', 0)}")
+        print(f"  Average CTR: {summary.get('average_ctr', 0):.2%}")
+        print(f"  Average position: {summary.get('average_position', 0):.1f}")
+    return 0
+
+
+def cmd_keywords(app, args):
+    """Analyze keyword opportunities from GSC data."""
+    if args.dry_run:
+        print("Would analyze keyword opportunities")
+        if args.args:
+            print(f"  GSC file: {args.args[0]}")
+        return 0
+
+    gsc_file = args.args[0] if args.args else None
+
+    if not gsc_file:
+        print("❌ GSC export file required")
+        print("  Usage: seo keywords <gsc_export.csv>")
+        return 1
+
+    print(f"Analyzing keyword opportunities from {gsc_file}...")
+
+    opportunities = app.keywords(gsc_file=gsc_file, limit=args.limit or 50)
+
+    if opportunities:
+        print(f"\n✅ Found {len(opportunities)} keyword opportunities!")
+        print(f"\nTop opportunities:")
+        for i, kw in enumerate(opportunities[:10], 1):
+            print(f"  {i}. {kw['query']} - Position: {kw['position']:.1f}, Impressions: {kw['impressions']}")
+    return 0
+
+
+def cmd_report(app, args):
+    """Generate comprehensive SEO performance report."""
+    if args.dry_run:
+        print("Would generate SEO performance report")
+        return 0
+
+    print("Generating SEO performance report...")
+
+    report_file = app.seo_report(output_file=args.output)
+
+    if report_file:
+        print(f"\n✅ Report generated!")
+        print(f"  Report: {report_file}")
+    return 0
+
+
+def cmd_import_media(app, args):
+    """Import media from source to destination site for migrated posts."""
+    if args.dry_run:
+        print("Would import media")
+        print(f"  Source: {args.from_site or 'mistergeek.net'}")
+        print(f"  Destination: {args.to_site or 'hellogeek.net'}")
+        if args.args:
+            print(f"  Migration report: {args.args[0]}")
+        return 0
+
+    migration_report = args.args[0] if args.args else None
+
+    if not migration_report:
+        print("❌ Migration report CSV required")
+        print("  Usage: seo import_media <migration_report.csv>")
+        return 1
+
+    source_site = args.from_site or 'mistergeek.net'
+    dest_site = args.to_site or 'hellogeek.net'
+
+    print(f"Importing media from {source_site} to {dest_site}...")
+    print(f"Migration report: {migration_report}")
+
+    stats = app.import_media(
+        migration_report=migration_report,
+        source_site=source_site,
+        destination_site=dest_site,
+        dry_run=False
+    )
+
+    if stats:
+        print(f"\n✅ Media import completed!")
+        print(f"\n📊 Summary:")
+        print(f"  Total posts: {stats.get('total_posts', 0)}")
+        print(f"  Posts with media: {stats.get('posts_with_media', 0)}")
+        print(f"  Images uploaded: {stats.get('images_uploaded', 0)}")
+        print(f"  Featured images set: {stats.get('featured_images_set', 0)}")
+        print(f"  Errors: {stats.get('errors', 0)}")
+    return 0
+
+
 def cmd_help(app, args):
     """Show help."""
     print("""
@@ -279,10 +651,18 @@ SEO Automation CLI - Available Commands
 
 Export & Analysis:
   export                            Export all posts from WordPress sites
+  export --author "John Doe"        Export posts by specific author
+  export --author-id 1 2            Export posts by author IDs
+  export -s mistergeek.net          Export from specific site only
   analyze [csv_file]                Analyze posts with AI
   analyze -f title                  Analyze specific fields (title, meta_description, categories, site)
   analyze -u                        Update input CSV with new columns (creates backup)
   category_propose [csv]            Propose categories based on content
+  meta_description [csv]            Generate AI-optimized meta descriptions
+  meta_description --only-missing   Generate only for posts without meta descriptions
+  update_meta --site <site>         Fetch, generate, and update meta on WordPress
+  update_meta --site A --post-ids 1 2 3   Update specific posts
+  update_meta --site A --category "VPN"   Update posts in category
 
 Category Management:
   category_apply [csv]              Apply AI category proposals to WordPress
@@ -293,11 +673,61 @@ Category Management:
 Strategy & Migration:
   editorial_strategy [csv]          Analyze editorial lines and recommend migrations
   editorial_strategy                Get migration recommendations between sites
+  migrate <csv> --destination <site>             Migrate posts from CSV to destination site
+  migrate --source <site> --destination <site>   Migrate posts with filters
+  migrate --source A --to B --category-filter "VPN"   Migrate specific categories
+  migrate --source A --to B --date-after 2024-01-01 --limit 10
 
 Utility:
   status                            Show output files status
+  performance [ga4.csv] [gsc.csv]   Analyze page performance
+  performance --ga4 analytics.csv --gsc search.csv   Analyze with both sources
+  keywords <gsc.csv>                Show keyword opportunities
+  report                            Generate SEO performance report
+  import_media <report.csv>         Import media for migrated posts
   help                              Show this help message
 
+Export Options:
+  --author          Filter by author name(s) (case-insensitive, partial match)
+  --author-id       Filter by author ID(s)
+  --site, -s        Export from specific site only
+
+Meta Description Options:
+  --only-missing    Only generate for posts without meta descriptions
+  --only-poor       Only generate for posts with poor quality meta descriptions
+  --limit           Limit number of posts to process
+  --output, -o      Custom output file path
+
+Update Meta Options:
+  --site, -s        WordPress site (REQUIRED): mistergeek.net, webscroll.fr, hellogeek.net
+  --post-ids        Specific post IDs to update
+  --category        Filter by category name(s)
+  --category-id     Filter by category ID(s)
|
||||||
|
--author Filter by author name(s)
|
||||||
|
--force Force regenerate even for good quality meta descriptions
|
||||||
|
|
||||||
|
Performance Options:
|
||||||
|
--ga4 Path to Google Analytics 4 export CSV
|
||||||
|
--gsc Path to Google Search Console export CSV
|
||||||
|
--start-date Start date YYYY-MM-DD (for API mode)
|
||||||
|
--end-date End date YYYY-MM-DD (for API mode)
|
||||||
|
--limit Limit number of results
|
||||||
|
|
||||||
|
Migration Options:
|
||||||
|
--destination, --to Destination site: mistergeek.net, webscroll.fr, hellogeek.net
|
||||||
|
--source, --from Source site for filtered migration
|
||||||
|
--keep-source Keep posts on source site (default: delete after migration)
|
||||||
|
--post-status Status for migrated posts: draft, publish, pending (default: draft)
|
||||||
|
--no-categories Do not create categories automatically
|
||||||
|
--no-tags Do not create tags automatically
|
||||||
|
--category-filter Filter by category names (for filtered migration)
|
||||||
|
--tag-filter Filter by tag names (for filtered migration)
|
||||||
|
--date-after Migrate posts after this date (YYYY-MM-DD)
|
||||||
|
--date-before Migrate posts before this date (YYYY-MM-DD)
|
||||||
|
--limit Limit number of posts to migrate
|
||||||
|
--ignore-original-date Use current date instead of original post date
|
||||||
|
--output, -o Custom output file path for migration report
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
--verbose, -v Enable verbose logging
|
--verbose, -v Enable verbose logging
|
||||||
--dry-run Show what would be done without doing it
|
--dry-run Show what would be done without doing it
|
||||||
@@ -307,14 +737,32 @@ Options:
|
|||||||
--confidence, -c Confidence threshold: High, Medium, Low
|
--confidence, -c Confidence threshold: High, Medium, Low
|
||||||
--site, -s WordPress site: mistergeek.net, webscroll.fr, hellogeek.net
|
--site, -s WordPress site: mistergeek.net, webscroll.fr, hellogeek.net
|
||||||
--description, -d Category description
|
--description, -d Category description
|
||||||
|
--strict Strict confidence matching (exact match only, not "or better")
|
||||||
|
|
||||||
Examples:
|
Examples:
|
||||||
seo export
|
seo export
|
||||||
|
seo export --author "John Doe"
|
||||||
|
seo export --author-id 1 2
|
||||||
|
seo export -s mistergeek.net --author "admin"
|
||||||
seo analyze -f title categories
|
seo analyze -f title categories
|
||||||
seo category_propose
|
seo category_propose
|
||||||
seo category_apply -s mistergeek.net -c Medium
|
seo category_apply -s mistergeek.net -c Medium
|
||||||
seo category_create -s webscroll.fr "Torrent Clients"
|
seo category_create -s webscroll.fr "Torrent Clients"
|
||||||
seo editorial_strategy
|
seo editorial_strategy
|
||||||
|
seo migrate posts_to_migrate.csv --destination mistergeek.net
|
||||||
|
seo migrate --source webscroll.fr --destination mistergeek.net --category-filter VPN
|
||||||
|
seo migrate --source A --to B --date-after 2024-01-01 --limit 10 --keep-source
|
||||||
|
seo meta_description # Generate for all posts
|
||||||
|
seo meta_description --only-missing # Generate only for posts without meta
|
||||||
|
seo meta_description --only-poor --limit 10 # Fix 10 poor quality metas
|
||||||
|
seo update_meta --site mistergeek.net # Update all posts on site
|
||||||
|
seo update_meta --site A --post-ids 1 2 3 # Update specific posts
|
||||||
|
seo update_meta --site A --category "VPN" --limit 10 # Update 10 posts in category
|
||||||
|
seo update_meta --site A --author "john" --limit 10 # Update 10 posts by author
|
||||||
|
seo update_meta --site A --dry-run # Preview changes
|
||||||
|
seo performance --ga4 analytics.csv --gsc search.csv # Analyze performance
|
||||||
|
seo keywords gsc_export.csv # Show keyword opportunities
|
||||||
|
seo report # Generate SEO report
|
||||||
seo status
|
seo status
|
||||||
""")
|
""")
|
||||||
return 0
|
return 0
|
||||||
|
@@ -20,11 +20,21 @@ logger = logging.getLogger(__name__)
 
 class PostExporter:
     """Export posts from WordPress sites to CSV."""
 
-    def __init__(self):
-        """Initialize the exporter."""
+    def __init__(self, author_filter: Optional[List[str]] = None,
+                 author_ids: Optional[List[int]] = None):
+        """
+        Initialize the exporter.
+
+        Args:
+            author_filter: List of author names to filter by (case-insensitive)
+            author_ids: List of author IDs to filter by
+        """
         self.sites = Config.WORDPRESS_SITES
         self.all_posts = []
         self.category_cache = {}
+        self.author_filter = author_filter
+        self.author_ids = author_ids
+        self.author_cache = {}  # Cache author info by site
 
     def fetch_category_names(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
         """Fetch category names from a WordPress site."""
@@ -50,8 +60,55 @@ class PostExporter:
         self.category_cache[site_name] = categories
         return categories
 
-    def fetch_posts_from_site(self, site_name: str, site_config: Dict) -> List[Dict]:
-        """Fetch all posts from a WordPress site."""
+    def fetch_authors(self, site_name: str, site_config: Dict) -> Dict[int, Dict]:
+        """
+        Fetch all authors/users from a WordPress site.
+
+        Returns:
+            Dict mapping author ID to author data (name, slug)
+        """
+        if site_name in self.author_cache:
+            return self.author_cache[site_name]
+
+        logger.info(f"  Fetching authors from {site_name}...")
+        authors = {}
+        base_url = site_config['url'].rstrip('/')
+        api_url = f"{base_url}/wp-json/wp/v2/users"
+        auth = HTTPBasicAuth(site_config['username'], site_config['password'])
+
+        try:
+            response = requests.get(api_url, params={'per_page': 100}, auth=auth, timeout=10)
+            response.raise_for_status()
+
+            for user in response.json():
+                authors[user['id']] = {
+                    'id': user['id'],
+                    'name': user.get('name', ''),
+                    'slug': user.get('slug', ''),
+                    'description': user.get('description', '')
+                }
+            logger.info(f"  ✓ Fetched {len(authors)} authors")
+        except Exception as e:
+            logger.warning(f"  Could not fetch authors from {site_name}: {e}")
+            # Fallback: create empty dict if authors can't be fetched
+            # Author IDs will still be exported, just without names
+
+        self.author_cache[site_name] = authors
+        return authors
+
+    def fetch_posts_from_site(self, site_name: str, site_config: Dict,
+                              authors_map: Optional[Dict[int, Dict]] = None) -> List[Dict]:
+        """
+        Fetch all posts from a WordPress site.
+
+        Args:
+            site_name: Site name
+            site_config: Site configuration
+            authors_map: Optional authors mapping for filtering
+
+        Returns:
+            List of post data
+        """
         logger.info(f"\nFetching posts from {site_name}...")
 
         posts = []
@@ -59,14 +116,23 @@ class PostExporter:
         api_url = f"{base_url}/wp-json/wp/v2/posts"
         auth = HTTPBasicAuth(site_config['username'], site_config['password'])
 
+        # Build base params
+        base_params = {'page': 1, 'per_page': 100, '_embed': True}
+
+        # Add author filter if specified
+        if self.author_ids:
+            base_params['author'] = ','.join(map(str, self.author_ids))
+            logger.info(f"  Filtering by author IDs: {self.author_ids}")
+
         for status in ['publish', 'draft']:
             page = 1
             while True:
                 try:
+                    params = {**base_params, 'page': page, 'status': status}
                     logger.info(f"  Fetching page {page} ({status} posts)...")
                     response = requests.get(
                         api_url,
-                        params={'page': page, 'per_page': 100, 'status': status},
+                        params=params,
                         auth=auth,
                         timeout=10
                     )
@@ -76,7 +142,28 @@ class PostExporter:
                     if not page_posts:
                         break
 
+                    # Filter by author name if specified
+                    if self.author_filter and authors_map:
+                        filtered_posts = []
+                        for post in page_posts:
+                            author_id = post.get('author')
+                            if author_id and author_id in authors_map:
+                                author_name = authors_map[author_id]['name'].lower()
+                                author_slug = authors_map[author_id]['slug'].lower()
+
+                                # Check if author matches filter
+                                for filter_name in self.author_filter:
+                                    filter_lower = filter_name.lower()
+                                    if (filter_lower in author_name or
+                                            filter_lower == author_slug):
+                                        filtered_posts.append(post)
+                                        break
+
+                        page_posts = filtered_posts
+                        logger.info(f"  ✓ Got {len(page_posts)} posts after author filter")
+
                     posts.extend(page_posts)
+                    if page_posts:
                         logger.info(f"  ✓ Got {len(page_posts)} posts")
 
                     page += 1
@@ -94,7 +181,8 @@ class PostExporter:
         logger.info(f"✓ Total posts from {site_name}: {len(posts)}\n")
         return posts
 
-    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict) -> Dict:
+    def extract_post_details(self, post: Dict, site_name: str, category_map: Dict,
+                             author_map: Optional[Dict[int, Dict]] = None) -> Dict:
         """Extract post details for CSV export."""
         title = post.get('title', {})
         if isinstance(title, dict):
@@ -122,6 +210,13 @@ class PostExporter:
             for cat_id in category_ids
         ]) if category_ids else ''
 
+        # Get author name from author map
+        author_id = post.get('author', '')
+        author_name = ''
+        if author_map and author_id:
+            author_data = author_map.get(author_id, {})
+            author_name = author_data.get('name', '')
+
         return {
             'site': site_name,
             'post_id': post['id'],
@@ -129,7 +224,8 @@ class PostExporter:
             'title': title.strip(),
             'slug': post.get('slug', ''),
             'url': post.get('link', ''),
-            'author_id': post.get('author', ''),
+            'author_id': author_id,
+            'author_name': author_name,
             'date_published': post.get('date', ''),
             'date_modified': post.get('modified', ''),
             'categories': category_names,
@@ -158,7 +254,7 @@ class PostExporter:
             return ""
 
         fieldnames = [
-            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id',
+            'site', 'post_id', 'status', 'title', 'slug', 'url', 'author_id', 'author_name',
             'date_published', 'date_modified', 'categories', 'tags', 'excerpt',
             'content_preview', 'seo_title', 'meta_description', 'focus_keyword', 'word_count',
         ]
@@ -173,24 +269,46 @@ class PostExporter:
         logger.info(f"✓ CSV exported to: {output_file}")
         return str(output_file)
 
-    def run(self) -> str:
-        """Run the complete export process."""
+    def run(self, site_filter: Optional[str] = None) -> str:
+        """
+        Run the complete export process.
+
+        Args:
+            site_filter: Optional site name to export from (default: all sites)
+
+        Returns:
+            Path to exported CSV file
+        """
         logger.info("="*70)
         logger.info("EXPORTING ALL POSTS")
         logger.info("="*70)
 
+        if self.author_filter:
+            logger.info(f"Author filter: {self.author_filter}")
+        if self.author_ids:
+            logger.info(f"Author IDs: {self.author_ids}")
+        if site_filter:
+            logger.info(f"Site filter: {site_filter}")
+
        logger.info("Sites configured: " + ", ".join(self.sites.keys()))
 
         for site_name, config in self.sites.items():
+            # Skip sites if filter is specified
+            if site_filter and site_name != site_filter:
+                logger.info(f"Skipping {site_name} (not in filter)")
+                continue
+
             categories = self.fetch_category_names(site_name, config)
-            posts = self.fetch_posts_from_site(site_name, config)
+            authors = self.fetch_authors(site_name, config)
+            posts = self.fetch_posts_from_site(site_name, config, authors)
 
             if posts:
                 for post in posts:
-                    post_details = self.extract_post_details(post, site_name, categories)
+                    post_details = self.extract_post_details(post, site_name, categories, authors)
                     self.all_posts.append(post_details)
 
         if not self.all_posts:
-            logger.error("No posts found on any site")
+            logger.warning("No posts found matching criteria")
             return ""
 
         self.all_posts.sort(key=lambda x: (x['site'], x['post_id']))
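The author-name matching rule applied during export (case-insensitive substring match on the display name, exact match on the slug) can be sketched in isolation. `matches_author` is a hypothetical helper written for illustration, not a function from the diff:

```python
def matches_author(author, filters):
    """Case-insensitive match: substring on display name, exact on slug."""
    name = author['name'].lower()
    slug = author['slug'].lower()
    return any(f.lower() in name or f.lower() == slug for f in filters)

author = {'name': 'John Doe', 'slug': 'jdoe'}
print(matches_author(author, ['john']))  # True: partial name match
print(matches_author(author, ['JDOE']))  # True: exact slug match, any case
print(matches_author(author, ['jane']))  # False
```

This is why `--author admin` in the guide above matches any author whose display name contains "admin".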
src/seo/media_importer.py (new file, 467 lines)
@@ -0,0 +1,467 @@
"""
Media Importer - Import media from one WordPress site to another
Specifically designed for migrated posts
"""

import logging
import os
import tempfile
import requests
from requests.auth import HTTPBasicAuth
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import csv

from .config import Config

logger = logging.getLogger(__name__)


class WordPressMediaImporter:
    """Import media from source WordPress site to destination site."""

    def __init__(self, source_site: str = 'mistergeek.net',
                 destination_site: str = 'hellogeek.net'):
        """
        Initialize media importer.

        Args:
            source_site: Source site name
            destination_site: Destination site name
        """
        self.source_site = source_site
        self.destination_site = destination_site
        self.sites = Config.WORDPRESS_SITES

        # Validate sites
        if source_site not in self.sites:
            raise ValueError(f"Source site '{source_site}' not found")
        if destination_site not in self.sites:
            raise ValueError(f"Destination site '{destination_site}' not found")

        # Setup source
        self.source_config = self.sites[source_site]
        self.source_url = self.source_config['url'].rstrip('/')
        self.source_auth = HTTPBasicAuth(
            self.source_config['username'],
            self.source_config['password']
        )

        # Setup destination
        self.dest_config = self.sites[destination_site]
        self.dest_url = self.dest_config['url'].rstrip('/')
        self.dest_auth = HTTPBasicAuth(
            self.dest_config['username'],
            self.dest_config['password']
        )

        self.media_cache = {}  # Cache source media ID -> dest media ID
        self.stats = {
            'total_posts': 0,
            'posts_with_media': 0,
            'images_downloaded': 0,
            'images_uploaded': 0,
            'featured_images_set': 0,
            'errors': 0
        }

    def fetch_migrated_posts(self, post_ids: Optional[List[int]] = None) -> List[Dict]:
        """
        Fetch posts that need media imported.

        Args:
            post_ids: Specific post IDs to process

        Returns:
            List of post dicts
        """
        logger.info(f"Fetching posts from {self.destination_site}...")

        if post_ids:
            # Fetch specific posts
            posts = []
            for post_id in post_ids:
                try:
                    response = requests.get(
                        f"{self.dest_url}/wp-json/wp/v2/posts/{post_id}",
                        auth=self.dest_auth,
                        timeout=10
                    )
                    if response.status_code == 200:
                        posts.append(response.json())
                except Exception as e:
                    logger.error(f"Error fetching post {post_id}: {e}")
            return posts
        else:
            # Fetch recent posts (assuming migrated posts are recent)
            try:
                response = requests.get(
                    f"{self.dest_url}/wp-json/wp/v2/posts",
                    params={
                        'per_page': 100,
                        'status': 'publish,draft',
                        '_embed': True
                    },
                    auth=self.dest_auth,
                    timeout=30
                )
                response.raise_for_status()
                return response.json()
            except Exception as e:
                logger.error(f"Error fetching posts: {e}")
                return []

    def get_source_post(self, post_id: int) -> Optional[Dict]:
        """
        Fetch corresponding post from source site.

        Args:
            post_id: Post ID on source site

        Returns:
            Post dict or None
        """
        try:
            response = requests.get(
                f"{self.source_url}/wp-json/wp/v2/posts/{post_id}",
                auth=self.source_auth,
                timeout=10,
                params={'_embed': True}
            )

            if response.status_code == 200:
                return response.json()
            else:
                logger.warning(f"Source post {post_id} not found")
                return None

        except Exception as e:
            logger.error(f"Error fetching source post {post_id}: {e}")
            return None

    def download_media(self, media_url: str) -> Optional[bytes]:
        """
        Download media file from source site.

        Args:
            media_url: URL of media file

        Returns:
            File content bytes or None
        """
        try:
            response = requests.get(media_url, timeout=30)
            response.raise_for_status()
            return response.content
        except Exception as e:
            logger.error(f"Error downloading {media_url}: {e}")
            return None

    def upload_media(self, file_content: bytes, filename: str,
                     mime_type: str = 'image/jpeg',
                     alt_text: str = '',
                     caption: str = '') -> Optional[int]:
        """
        Upload media to destination site.

        Args:
            file_content: File content bytes
            filename: Filename for the media
            mime_type: MIME type of the file
            alt_text: Alt text for the image
            caption: Caption for the image

        Returns:
            Media ID on destination site or None
        """
        try:
            # Upload file
            files = {'file': (filename, file_content, mime_type)}

            response = requests.post(
                f"{self.dest_url}/wp-json/wp/v2/media",
                files=files,
                auth=self.dest_auth,
                headers={
                    'Content-Disposition': f'attachment; filename="{filename}"',
                    'Content-Type': mime_type
                },
                timeout=30
            )

            if response.status_code == 201:
                media_data = response.json()
                media_id = media_data['id']

                # Update alt text and caption
                if alt_text or caption:
                    meta_update = {}
                    if alt_text:
                        meta_update['_wp_attachment_image_alt'] = alt_text
                    if caption:
                        meta_update['excerpt'] = caption

                    requests.post(
                        f"{self.dest_url}/wp-json/wp/v2/media/{media_id}",
                        json=meta_update,
                        auth=self.dest_auth,
                        timeout=10
                    )

                logger.info(f"✓ Uploaded {filename} (ID: {media_id})")
                return media_id
            else:
                logger.error(f"Error uploading {filename}: {response.status_code}")
                return None

        except Exception as e:
            logger.error(f"Error uploading {filename}: {e}")
            return None

    def import_featured_image(self, source_post: Dict, dest_post_id: int) -> bool:
        """
        Import featured image from source post to destination post.

        Args:
            source_post: Source post dict
            dest_post_id: Destination post ID

        Returns:
            True if successful
        """
        # Check if source has featured image
        featured_media_id = source_post.get('featured_media')
        if not featured_media_id:
            logger.info("  No featured image on source post")
            return False

        # Check if already imported
        if featured_media_id in self.media_cache:
            dest_media_id = self.media_cache[featured_media_id]
            logger.info(f"  Using cached media ID: {dest_media_id}")
        else:
            # Fetch media details from source
            try:
                media_response = requests.get(
                    f"{self.source_url}/wp-json/wp/v2/media/{featured_media_id}",
                    auth=self.source_auth,
                    timeout=10
                )

                if media_response.status_code != 200:
                    logger.error(f"Could not fetch media {featured_media_id}")
                    return False

                media_data = media_response.json()

                # Download media file
                media_url = media_data.get('source_url', '')
                if not media_url:
                    # Try alternative URL structure
                    media_url = media_data.get('guid', {}).get('rendered', '')

                file_content = self.download_media(media_url)
                if not file_content:
                    return False

                # Extract filename and mime type
                filename = media_data.get('slug', 'image.jpg') + '.jpg'
                mime_type = media_data.get('mime_type', 'image/jpeg')
                alt_text = media_data.get('alt_text', '')
                caption = media_data.get('caption', {}).get('rendered', '')

                # Upload to destination
                dest_media_id = self.upload_media(
                    file_content, filename, mime_type, alt_text, caption
                )

                if not dest_media_id:
                    return False

                # Cache the mapping
                self.media_cache[featured_media_id] = dest_media_id
                self.stats['images_uploaded'] += 1

            except Exception as e:
                logger.error(f"Error importing featured image: {e}")
                return False

        # Set featured image on destination post
        try:
            response = requests.post(
                f"{self.dest_url}/wp-json/wp/v2/posts/{dest_post_id}",
                json={'featured_media': dest_media_id},
                auth=self.dest_auth,
                timeout=10
            )

            if response.status_code == 200:
                logger.info(f"✓ Set featured image on post {dest_post_id}")
                self.stats['featured_images_set'] += 1
                return True
            else:
                logger.error(f"Error setting featured image: {response.status_code}")
                return False

        except Exception as e:
            logger.error(f"Error setting featured image: {e}")
            return False
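The `media_cache` lookup above is plain memoisation: each source media ID is downloaded and uploaded at most once, and later posts that share the same featured image reuse the destination ID. A minimal standalone sketch of that pattern, with a stubbed `upload` callable standing in for the real download/upload round trip:

```python
def import_once(cache, source_media_id, upload):
    """Return the destination media ID, uploading only on first sight."""
    if source_media_id in cache:
        return cache[source_media_id]   # reuse previously imported media
    dest_id = upload(source_media_id)   # expensive download + upload path
    cache[source_media_id] = dest_id
    return dest_id

uploads = []
def fake_upload(media_id):
    uploads.append(media_id)
    return media_id + 1000              # pretend destination assigns new IDs

cache = {}
import_once(cache, 7, fake_upload)      # first call uploads
import_once(cache, 7, fake_upload)      # cache hit, no second upload
print(cache, uploads)                   # {7: 1007} [7]
```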
|
|
||||||
|
def import_post_media(self, source_post: Dict, dest_post_id: int) -> int:
|
||||||
|
"""
|
||||||
|
Import all media from a post (featured image + inline images).
|
||||||
|
|
||||||
|
Args:
|
||||||
|
source_post: Source post dict
|
||||||
|
dest_post_id: Destination post ID
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Number of images imported
|
||||||
|
"""
|
||||||
|
images_imported = 0
|
||||||
|
|
||||||
|
# Import featured image
|
||||||
|
if self.import_featured_image(source_post, dest_post_id):
|
||||||
|
images_imported += 1
|
||||||
|
|
||||||
|
# TODO: Import inline images from content
|
||||||
|
# This would require parsing the content for <img> tags
|
||||||
|
# and replacing source URLs with destination URLs
|
||||||
|
|
||||||
|
return images_imported
|
||||||
|
|
||||||
|
def process_posts(self, post_mappings: List[Tuple[int, int]],
|
||||||
|
dry_run: bool = False) -> Dict:
|
||||||
|
"""
|
||||||
|
Process media import for mapped posts.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
post_mappings: List of (source_post_id, dest_post_id) tuples
|
||||||
|
dry_run: If True, preview without importing
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Statistics dict
|
||||||
|
"""
|
||||||
|
logger.info("\n" + "="*70)
|
||||||
|
logger.info("MEDIA IMPORTER")
|
||||||
|
logger.info("="*70)
|
||||||
|
logger.info(f"Source: {self.source_site}")
|
||||||
|
logger.info(f"Destination: {self.destination_site}")
|
||||||
|
logger.info(f"Posts to process: {len(post_mappings)}")
|
||||||
|
logger.info(f"Dry run: {dry_run}")
|
||||||
|
logger.info("="*70)
|
||||||
|
|
||||||
|
self.stats['total_posts'] = len(post_mappings)
|
||||||
|
|
||||||
|
for i, (source_id, dest_id) in enumerate(post_mappings, 1):
|
||||||
|
logger.info(f"\n[{i}/{len(post_mappings)}] Processing post mapping:")
|
||||||
|
logger.info(f" Source: {source_id} → Destination: {dest_id}")
|
||||||
|
|
||||||
|
# Fetch source post
|
||||||
|
source_post = self.get_source_post(source_id)
|
||||||
|
if not source_post:
|
||||||
|
logger.warning(f" Skipping: Source post not found")
|
||||||
|
self.stats['errors'] += 1
|
||||||
|
continue
|
||||||
|
|
||||||
|
# Check if source has media
|
||||||
|
if not source_post.get('featured_media'):
|
||||||
|
logger.info(f" No featured image to import")
|
||||||
|
continue
|
||||||
|
|
||||||
|
self.stats['posts_with_media'] += 1
|
||||||
|
|
||||||
|
if dry_run:
|
||||||
|
logger.info(f" [DRY RUN] Would import featured image")
|
||||||
|
self.stats['images_downloaded'] += 1
|
||||||
|
self.stats['images_uploaded'] += 1
|
||||||
|
self.stats['featured_images_set'] += 1
|
||||||
|
else:
|
||||||
|
# Import media
|
||||||
|
imported = self.import_post_media(source_post, dest_id)
|
||||||
|
if imported > 0:
|
||||||
|
self.stats['images_downloaded'] += imported
|
||||||
|
|
||||||
|
# Print summary
|
||||||
|
logger.info("\n" + "="*70)
|
||||||
|
logger.info("IMPORT SUMMARY")
|
||||||
|
logger.info("="*70)
|
||||||
|
logger.info(f"Total posts: {self.stats['total_posts']}")
|
||||||
|
logger.info(f"Posts with media: {self.stats['posts_with_media']}")
|
||||||
|
logger.info(f"Images downloaded: {self.stats['images_downloaded']}")
|
||||||
|
logger.info(f"Images uploaded: {self.stats['images_uploaded']}")
|
||||||
|
logger.info(f"Featured images set: {self.stats['featured_images_set']}")
|
||||||
|
logger.info(f"Errors: {self.stats['errors']}")
|
||||||
|
logger.info("="*70)
|
||||||
|
|
||||||
|
return self.stats
|
||||||
|
|
```python
    def run_from_csv(self, csv_file: str, dry_run: bool = False) -> Dict:
        """
        Import media for posts listed in CSV file.

        CSV should have columns: source_post_id, destination_post_id

        Args:
            csv_file: Path to CSV file with post mappings
            dry_run: If True, preview without importing

        Returns:
            Statistics dict
        """
        logger.info(f"Loading post mappings from: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                mappings = []

                for row in reader:
                    source_id = int(row.get('source_post_id', 0))
                    dest_id = int(row.get('destination_post_id', 0))

                    if source_id and dest_id:
                        mappings.append((source_id, dest_id))

            logger.info(f"✓ Loaded {len(mappings)} post mappings")

        except Exception as e:
            logger.error(f"Error loading CSV: {e}")
            return self.stats

        return self.process_posts(mappings, dry_run=dry_run)
```
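The row-parsing convention used by `run_from_csv` (integer `source_post_id`/`destination_post_id` columns, rows with a missing or zero side skipped) can be exercised in isolation. This standalone helper mirrors the loop above for illustration; it is not part of the diff:

```python
import csv
import io

def load_mappings(csv_text: str) -> list:
    """Collect (source_post_id, destination_post_id) pairs; skip rows where either side is 0/missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mappings = []
    for row in reader:
        source_id = int(row.get('source_post_id', 0))
        dest_id = int(row.get('destination_post_id', 0))
        if source_id and dest_id:
            mappings.append((source_id, dest_id))
    return mappings

sample = "source_post_id,destination_post_id\n10,42\n11,0\n12,43\n"
print(load_mappings(sample))  # [(10, 42), (12, 43)]
```

Like the original loop, `int()` raises on non-numeric cells; in the class method that is caught by the surrounding `try/except`.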
```python
    def run_from_migration_report(self, report_file: str,
                                  dry_run: bool = False) -> Dict:
        """
        Import media using migration report CSV.

        Args:
            report_file: Path to migration report CSV
            dry_run: If True, preview without importing

        Returns:
            Statistics dict
        """
        logger.info(f"Loading migration report: {report_file}")

        try:
            with open(report_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                mappings = []

                for row in reader:
                    source_id = int(row.get('source_post_id', 0))
                    dest_id = int(row.get('destination_post_id', 0))

                    if source_id and dest_id:
                        mappings.append((source_id, dest_id))

            logger.info(f"✓ Loaded {len(mappings)} post mappings from migration report")

        except Exception as e:
            logger.error(f"Error loading migration report: {e}")
            return self.stats

        return self.process_posts(mappings, dry_run=dry_run)
```
482
src/seo/meta_description_generator.py
Normal file
@@ -0,0 +1,482 @@
```python
"""
Meta Description Generator - AI-powered meta description generation and optimization
"""

import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests

from .config import Config

logger = logging.getLogger(__name__)


class MetaDescriptionGenerator:
    """AI-powered meta description generator and optimizer."""

    def __init__(self, csv_file: str):
        """
        Initialize the generator.

        Args:
            csv_file: Path to CSV file with posts
        """
        self.csv_file = Path(csv_file)
        self.openrouter_api_key = Config.OPENROUTER_API_KEY
        self.ai_model = Config.AI_MODEL
        self.posts = []
        self.generated_results = []
        self.api_calls = 0
        self.ai_cost = 0.0

        # Meta description best practices
        self.max_length = 160  # Optimal length for SEO
        self.min_length = 120
        self.include_keywords = True

    def load_csv(self) -> bool:
        """Load posts from CSV file."""
        logger.info(f"Loading CSV: {self.csv_file}")

        if not self.csv_file.exists():
            logger.error(f"CSV file not found: {self.csv_file}")
            return False

        try:
            with open(self.csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                self.posts = list(reader)

            logger.info(f"✓ Loaded {len(self.posts)} posts from CSV")
            return True
        except Exception as e:
            logger.error(f"Error loading CSV: {e}")
            return False

    def _build_prompt(self, post: Dict) -> str:
        """
        Build AI prompt for meta description generation.

        Args:
            post: Post data dict

        Returns:
            AI prompt string
        """
        title = post.get('title', '')
        content_preview = post.get('content_preview', '')
        excerpt = post.get('excerpt', '')
        focus_keyword = post.get('focus_keyword', '')
        current_meta = post.get('meta_description', '')

        # Build context from available content
        content_context = ""
        if excerpt:
            content_context += f"Excerpt: {excerpt}\n"
        if content_preview:
            content_context += f"Content preview: {content_preview[:300]}..."

        prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.

**Post Title:** {title}

**Content Context:**
{content_context}

**Focus Keyword:** {focus_keyword if focus_keyword else 'Not specified'}

**Current Meta Description:** {current_meta if current_meta else 'None (needs to be created)'}

**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Include the focus keyword naturally if available
3. Make it compelling and action-oriented
4. Clearly describe what the post is about
5. Use active voice
6. Include a call-to-action when appropriate
7. Avoid clickbait - be accurate and valuable
8. Write in the same language as the content

**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""

        return prompt
```
```python
    def _call_ai_api(self, prompt: str) -> Optional[str]:
        """
        Call AI API to generate meta description.

        Args:
            prompt: AI prompt

        Returns:
            Generated meta description or None
        """
        url = "https://openrouter.ai/api/v1/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.openrouter_api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": self.ai_model,
            "messages": [
                {
                    "role": "system",
                    "content": "You are an SEO expert specializing in meta description optimization. You write compelling, concise, and search-engine optimized meta descriptions."
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7,
            "max_tokens": 100
        }

        try:
            response = requests.post(url, json=payload, headers=headers, timeout=30)
            response.raise_for_status()

            result = response.json()
            self.api_calls += 1

            # Extract generated text
            if 'choices' in result and len(result['choices']) > 0:
                meta_description = result['choices'][0]['message']['content'].strip()

                # Remove quotes if AI included them
                if meta_description.startswith('"') and meta_description.endswith('"'):
                    meta_description = meta_description[1:-1]

                return meta_description
            else:
                logger.warning("No AI response received")
                return None

        except requests.exceptions.RequestException as e:
            logger.error(f"API call failed: {e}")
            return None
        except Exception as e:
            logger.error(f"Error processing AI response: {e}")
            return None
```
```python
    def _validate_meta_description(self, meta: str) -> Dict:
        """
        Validate meta description quality.

        Args:
            meta: Meta description text

        Returns:
            Validation results dict
        """
        length = len(meta)

        validation = {
            'length': length,
            'is_valid': False,
            'too_short': False,
            'too_long': False,
            'optimal': False,
            'score': 0
        }

        # Check length
        if length < self.min_length:
            validation['too_short'] = True
            validation['score'] = max(0, 50 - (self.min_length - length))
        elif length > self.max_length:
            validation['too_long'] = True
            validation['score'] = max(0, 50 - (length - self.max_length))
        else:
            validation['optimal'] = True
            validation['score'] = 100

        # Check if it ends with a period (good practice)
        if meta.endswith('.'):
            validation['score'] = min(100, validation['score'] + 5)

        # Check for call-to-action words
        cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
        if any(word in meta.lower() for word in cta_words):
            validation['score'] = min(100, validation['score'] + 5)

        validation['is_valid'] = validation['score'] >= 70

        return validation
```
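The scoring rules in `_validate_meta_description` can be checked standalone. This is a re-implementation for illustration only (same constants: 120-160 length band, +5 for a trailing period, +5 for a call-to-action word, valid at score >= 70), not the class method itself:

```python
MIN_LEN, MAX_LEN = 120, 160
CTA_WORDS = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']

def score_meta(meta: str) -> int:
    """Mirror of the generator's scoring: length band, period bonus, CTA bonus (capped at 100)."""
    length = len(meta)
    if length < MIN_LEN:
        score = max(0, 50 - (MIN_LEN - length))
    elif length > MAX_LEN:
        score = max(0, 50 - (length - MAX_LEN))
    else:
        score = 100
    if meta.endswith('.'):
        score = min(100, score + 5)
    # Note: substring match, as in the original -- 'get' also fires inside words like 'target'.
    if any(word in meta.lower() for word in CTA_WORDS):
        score = min(100, score + 5)
    return score

optimal = ("Discover how to write meta descriptions that rank: length rules, "
           "call-to-action phrasing, and keyword placement explained step by step.")
print(score_meta(optimal))  # 100
```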
```python
    def generate_for_post(self, post: Dict) -> Optional[Dict]:
        """
        Generate meta description for a single post.

        Args:
            post: Post data dict

        Returns:
            Result dict with generated meta and validation
        """
        title = post.get('title', '')
        post_id = post.get('post_id', '')
        current_meta = post.get('meta_description', '')

        logger.info(f"Generating meta description for post {post_id}: {title[:50]}...")

        # Skip if post has no title
        if not title:
            logger.warning(f"Skipping post {post_id}: No title")
            return None

        # Build prompt and call AI
        prompt = self._build_prompt(post)
        generated_meta = self._call_ai_api(prompt)

        if not generated_meta:
            logger.error(f"Failed to generate meta description for post {post_id}")
            return None

        # Validate the result
        validation = self._validate_meta_description(generated_meta)

        # Calculate improvement
        improvement = False
        if current_meta:
            current_validation = self._validate_meta_description(current_meta)
            improvement = validation['score'] > current_validation['score']
        else:
            improvement = True  # Any meta is an improvement over none

        result = {
            'post_id': post_id,
            'site': post.get('site', ''),
            'title': title,
            'current_meta_description': current_meta,
            'generated_meta_description': generated_meta,
            'generated_length': validation['length'],
            'validation_score': validation['score'],
            'is_optimal_length': validation['optimal'],
            'improvement': improvement,
            'status': 'generated'
        }

        logger.info(f"✓ Generated meta description (score: {validation['score']}, length: {validation['length']})")

        # Rate limiting
        time.sleep(0.5)

        return result

    def generate_batch(self, batch: List[Dict]) -> List[Dict]:
        """
        Generate meta descriptions for a batch of posts.

        Args:
            batch: List of post dicts

        Returns:
            List of result dicts
        """
        results = []

        for i, post in enumerate(batch, 1):
            logger.info(f"Processing post {i}/{len(batch)}")
            result = self.generate_for_post(post)
            if result:
                results.append(result)

        return results

    def filter_posts_for_generation(self, posts: List[Dict],
                                    only_missing: bool = False,
                                    only_poor_quality: bool = False) -> List[Dict]:
        """
        Filter posts based on meta description status.

        Args:
            posts: List of post dicts
            only_missing: Only include posts without meta descriptions
            only_poor_quality: Only include posts with poor meta descriptions

        Returns:
            Filtered list of posts
        """
        filtered = []

        for post in posts:
            current_meta = post.get('meta_description', '')

            if only_missing:
                # Skip posts that already have meta descriptions
                if current_meta:
                    continue
                filtered.append(post)

            elif only_poor_quality:
                # Skip posts without meta descriptions (handle separately)
                if not current_meta:
                    continue

                # Check if current meta is poor quality
                validation = self._validate_meta_description(current_meta)
                if validation['score'] < 70:
                    filtered.append(post)

            else:
                # Include all posts
                filtered.append(post)

        return filtered
```
```python
    def save_results(self, results: List[Dict], output_file: Optional[str] = None) -> str:
        """
        Save generation results to CSV.

        Args:
            results: List of result dicts
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'meta_descriptions_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'post_id', 'site', 'title', 'current_meta_description',
            'generated_meta_description', 'generated_length',
            'validation_score', 'is_optimal_length', 'improvement', 'status'
        ]

        logger.info(f"Saving {len(results)} results to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(results)

        logger.info(f"✓ Results saved to: {output_file}")
        return str(output_file)

    def generate_summary(self, results: List[Dict]) -> Dict:
        """
        Generate summary statistics.

        Args:
            results: List of result dicts

        Returns:
            Summary dict
        """
        if not results:
            return {}

        total = len(results)
        improved = sum(1 for r in results if r.get('improvement', False))
        optimal_length = sum(1 for r in results if r.get('is_optimal_length', False))
        avg_score = sum(r.get('validation_score', 0) for r in results) / total

        # Count by site
        by_site = {}
        for r in results:
            site = r.get('site', 'unknown')
            if site not in by_site:
                by_site[site] = {'total': 0, 'improved': 0}
            by_site[site]['total'] += 1
            if r.get('improvement', False):
                by_site[site]['improved'] += 1

        summary = {
            'total_posts': total,
            'improved': improved,
            'improvement_rate': (improved / total * 100) if total > 0 else 0,
            'optimal_length_count': optimal_length,
            'optimal_length_rate': (optimal_length / total * 100) if total > 0 else 0,
            'average_score': avg_score,
            'api_calls': self.api_calls,
            'by_site': by_site
        }

        return summary
```
```python
    def run(self, output_file: Optional[str] = None,
            only_missing: bool = False,
            only_poor_quality: bool = False,
            limit: Optional[int] = None) -> Tuple[str, Dict]:
        """
        Run complete meta description generation process.

        Args:
            output_file: Custom output file path
            only_missing: Only generate for posts without meta descriptions
            only_poor_quality: Only generate for posts with poor quality meta descriptions
            limit: Maximum number of posts to process

        Returns:
            Tuple of (output_file_path, summary_dict)
        """
        logger.info("\n" + "="*70)
        logger.info("AI META DESCRIPTION GENERATION")
        logger.info("="*70)

        # Load posts
        if not self.load_csv():
            return "", {}

        # Filter posts
        posts_to_process = self.filter_posts_for_generation(
            self.posts,
            only_missing=only_missing,
            only_poor_quality=only_poor_quality
        )

        logger.info(f"Posts to process: {len(posts_to_process)}")

        if only_missing:
            logger.info("Filter: Only posts without meta descriptions")
        elif only_poor_quality:
            logger.info("Filter: Only posts with poor quality meta descriptions")

        # Apply limit
        if limit:
            posts_to_process = posts_to_process[:limit]
            logger.info(f"Limited to: {len(posts_to_process)} posts")

        if not posts_to_process:
            logger.warning("No posts to process")
            return "", {}

        # Generate meta descriptions
        results = self.generate_batch(posts_to_process)

        # Save results
        if results:
            output_path = self.save_results(results, output_file)

            # Generate and log summary
            summary = self.generate_summary(results)

            logger.info("\n" + "="*70)
            logger.info("GENERATION SUMMARY")
            logger.info("="*70)
            logger.info(f"Total posts processed: {summary['total_posts']}")
            logger.info(f"Improved: {summary['improved']} ({summary['improvement_rate']:.1f}%)")
            logger.info(f"Optimal length: {summary['optimal_length_count']} ({summary['optimal_length_rate']:.1f}%)")
            logger.info(f"Average validation score: {summary['average_score']:.1f}")
            logger.info(f"API calls made: {summary['api_calls']}")
            logger.info("="*70)

            return output_path, summary
        else:
            logger.warning("No results generated")
            return "", {}
```
631
src/seo/meta_description_updater.py
Normal file
@@ -0,0 +1,631 @@
```python
"""
Meta Description Updater - Fetch, generate, and update meta descriptions directly on WordPress
"""

import csv
import json
import logging
import time
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple
import requests
from requests.auth import HTTPBasicAuth

from .config import Config
from .meta_description_generator import MetaDescriptionGenerator

logger = logging.getLogger(__name__)


class MetaDescriptionUpdater:
    """Fetch posts from WordPress, generate AI meta descriptions, and update them."""

    def __init__(self, site_name: str):
        """
        Initialize the updater.

        Args:
            site_name: WordPress site name (e.g., 'mistergeek.net')
        """
        self.site_name = site_name
        self.sites = Config.WORDPRESS_SITES

        if site_name not in self.sites:
            raise ValueError(f"Site '{site_name}' not found in configuration")

        self.site_config = self.sites[site_name]
        self.base_url = self.site_config['url'].rstrip('/')
        self.auth = HTTPBasicAuth(
            self.site_config['username'],
            self.site_config['password']
        )

        self.openrouter_api_key = Config.OPENROUTER_API_KEY
        self.ai_model = Config.AI_MODEL

        self.posts = []
        self.update_results = []
        self.api_calls = 0
        self.stats = {
            'total_posts': 0,
            'updated': 0,
            'failed': 0,
            'skipped': 0
        }

    def fetch_posts(self, post_ids: Optional[List[int]] = None,
                    category_ids: Optional[List[int]] = None,
                    category_names: Optional[List[str]] = None,
                    author_names: Optional[List[str]] = None,
                    limit: Optional[int] = None,
                    status: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch posts from WordPress site.

        Args:
            post_ids: Specific post IDs to fetch
            category_ids: Filter by category IDs
            category_names: Filter by category names (will be resolved to IDs)
            author_names: Filter by author names
            limit: Maximum number of posts to fetch
            status: Post statuses to fetch (default: ['publish'])

        Returns:
            List of post dicts
        """
        logger.info(f"Fetching posts from {self.site_name}...")

        if post_ids:
            logger.info(f"  Post IDs: {post_ids}")
        if category_ids:
            logger.info(f"  Category IDs: {category_ids}")
        if category_names:
            logger.info(f"  Category names: {category_names}")
        if author_names:
            logger.info(f"  Authors: {author_names}")
        if limit:
            logger.info(f"  Limit: {limit}")

        # Resolve category names to IDs if needed
        if category_names and not category_ids:
            category_ids = self._get_category_ids_by_names(category_names)

        # Resolve author names to IDs if needed
        author_ids = None
        if author_names:
            author_ids = self._get_author_ids_by_names(author_names)

        # Build API parameters
        params = {
            'per_page': 100,
            'page': 1,
            'status': ','.join(status) if status else 'publish',
            '_embed': True
        }

        if post_ids:
            # Fetch specific posts
            posts = []
            for post_id in post_ids:
                try:
                    response = requests.get(
                        f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                        auth=self.auth,
                        timeout=10
                    )
                    if response.status_code == 200:
                        posts.append(response.json())
                    else:
                        logger.warning(f"  Post {post_id} not found or inaccessible")
                except Exception as e:
                    logger.error(f"  Error fetching post {post_id}: {e}")
            self.posts = posts
        else:
            # Fetch posts with filters
            if category_ids:
                params['categories'] = ','.join(map(str, category_ids))

            if author_ids:
                params['author'] = ','.join(map(str, author_ids))

            posts = []
            while True:
                try:
                    response = requests.get(
                        f"{self.base_url}/wp-json/wp/v2/posts",
                        params=params,
                        auth=self.auth,
                        timeout=30
                    )
                    response.raise_for_status()

                    page_posts = response.json()
                    if not page_posts:
                        break

                    posts.extend(page_posts)

                    if len(page_posts) < 100:
                        break
                    if limit and len(posts) >= limit:
                        break

                    params['page'] += 1
                    time.sleep(0.3)

                except Exception as e:
                    logger.error(f"Error fetching posts: {e}")
                    break

            # Apply limit if specified
            if limit:
                posts = posts[:limit]

            self.posts = posts

        logger.info(f"✓ Fetched {len(self.posts)} posts from {self.site_name}")
        return self.posts
```
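`fetch_posts` pages through the REST API 100 posts at a time, stopping on an empty page, a short page, or once the limit is reached. That loop shape can be sketched against a fake page source (no WordPress involved; `fake_api` stands in for the `requests.get` call and is purely illustrative):

```python
def paginate(fetch_page, per_page=100, limit=None):
    """Collect items page by page, mirroring the stop conditions in fetch_posts."""
    items, page = [], 1
    while True:
        page_items = fetch_page(page, per_page)
        if not page_items:        # empty page -> done
            break
        items.extend(page_items)
        if len(page_items) < per_page:   # short page -> last page
            break
        if limit and len(items) >= limit:
            break
        page += 1
    return items[:limit] if limit else items

# Fake API exposing 250 "posts".
def fake_api(page, per_page):
    start = (page - 1) * per_page
    return list(range(250))[start:start + per_page]

print(len(paginate(fake_api)))             # 250
print(len(paginate(fake_api, limit=120)))  # 120
```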
```python
    def _get_category_ids_by_names(self, category_names: List[str]) -> List[int]:
        """
        Get category IDs by category names.

        Args:
            category_names: List of category names

        Returns:
            List of category IDs
        """
        logger.info(f"Resolving category names to IDs...")

        try:
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/categories",
                params={'per_page': 100},
                auth=self.auth,
                timeout=10
            )
            response.raise_for_status()

            categories = response.json()
            category_map = {cat['name'].lower(): cat['id'] for cat in categories}

            category_ids = []
            for name in category_names:
                name_lower = name.lower()
                if name_lower in category_map:
                    category_ids.append(category_map[name_lower])
                    logger.info(f"  ✓ '{name}' -> ID {category_map[name_lower]}")
                else:
                    # Try partial match
                    for cat_name, cat_id in category_map.items():
                        if name_lower in cat_name or cat_name in name_lower:
                            category_ids.append(cat_id)
                            logger.info(f"  ✓ '{name}' -> ID {cat_id} (partial match)")
                            break
                    else:
                        logger.warning(f"  ✗ Category '{name}' not found")

            return category_ids

        except Exception as e:
            logger.error(f"Error fetching categories: {e}")
            return []

    def _get_author_ids_by_names(self, author_names: List[str]) -> List[int]:
        """
        Get author/user IDs by author names.

        Args:
            author_names: List of author names

        Returns:
            List of author IDs
        """
        logger.info(f"Resolving author names to IDs...")

        try:
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/users",
                params={'per_page': 100},
                auth=self.auth,
                timeout=10
            )
            response.raise_for_status()

            users = response.json()
            author_map = {}

            # Build map of name/slug to ID
            for user in users:
                name = user.get('name', '').lower()
                slug = user.get('slug', '').lower()
                author_map[name] = user['id']
                author_map[slug] = user['id']

            author_ids = []
            for name in author_names:
                name_lower = name.lower()

                # Try exact match
                if name_lower in author_map:
                    author_ids.append(author_map[name_lower])
                    logger.info(f"  ✓ '{name}' -> ID {author_map[name_lower]}")
                else:
                    # Try partial match
                    found = False
                    for author_name, author_id in author_map.items():
                        if name_lower in author_name or author_name in name_lower:
                            author_ids.append(author_id)
                            logger.info(f"  ✓ '{name}' -> ID {author_id} (partial match: '{author_name}')")
                            found = True
                            break

                    if not found:
                        logger.warning(f"  ✗ Author '{name}' not found")

            return author_ids

        except Exception as e:
            logger.error(f"Error fetching authors: {e}")
            return []
```
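Both resolvers use the same two-step lookup: an exact lowercase match first, then a bidirectional substring match where the first hit wins. A standalone sketch of that matching rule (illustrative helper, not the class methods):

```python
def resolve_ids(names, name_map):
    """Exact lowercase match first, then bidirectional partial match (first hit wins)."""
    ids = []
    for name in names:
        key = name.lower()
        if key in name_map:
            ids.append(name_map[key])
            continue
        for candidate, cid in name_map.items():
            # Bidirectional containment, e.g. 'tech' matches 'tech news'.
            if key in candidate or candidate in key:
                ids.append(cid)
                break
    return ids

catalog = {'tech news': 3, 'gaming': 7, 'reviews': 9}
print(resolve_ids(['Gaming', 'Tech', 'Unknown'], catalog))  # [7, 3]
```

Because the partial pass takes the first hit in dictionary order, an ambiguous name resolves to whichever candidate the API returned first.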
```python
    def _generate_meta_description(self, post: Dict) -> Optional[str]:
        """
        Generate meta description for a post using AI.

        Args:
            post: Post data dict

        Returns:
            Generated meta description or None
        """
        title = post.get('title', {}).get('rendered', '')
        content = post.get('content', {}).get('rendered', '')
        excerpt = post.get('excerpt', {}).get('rendered', '')

        # Strip HTML from content
        import re
        content_text = re.sub('<[^<]+?>', '', content)[:500]
        excerpt_text = re.sub('<[^<]+?>', '', excerpt)

        # Build prompt
        prompt = f"""You are an SEO expert. Generate an optimized meta description for the following blog post.

**Post Title:** {title}

**Content Context:**
Excerpt: {excerpt_text}
Content preview: {content_text}...

**Requirements:**
1. Length: 120-160 characters (optimal for SEO)
2. Make it compelling and action-oriented
3. Clearly describe what the post is about
4. Use active voice
5. Include a call-to-action when appropriate
6. Avoid clickbait - be accurate and valuable

**Output Format:**
Return ONLY the meta description text, nothing else. No quotes, no explanations."""

        # Call AI API
        url = "https://openrouter.ai/api/v1/chat/completions"
        headers = {
            "Authorization": f"Bearer {self.openrouter_api_key}",
            "Content-Type": "application/json"
        }

        payload = {
            "model": self.ai_model,
            "messages": [
                {
                    "role": "system",
                    "content": "You are an SEO expert specializing in meta description optimization."
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7,
            "max_tokens": 100
        }

        try:
            response = requests.post(url, json=payload, headers=headers, timeout=30)
            response.raise_for_status()

            result = response.json()
            self.api_calls += 1

            if 'choices' in result and len(result['choices']) > 0:
                meta_description = result['choices'][0]['message']['content'].strip()

                # Remove quotes if AI included them
                if meta_description.startswith('"') and meta_description.endswith('"'):
                    meta_description = meta_description[1:-1]

                return meta_description
            else:
                logger.warning("No AI response received")
                return None

        except Exception as e:
            logger.error(f"API call failed: {e}")
            return None
```
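`_generate_meta_description` strips HTML with the regex `<[^<]+?>` before building the prompt. A quick standalone check of that pattern (the helper name is illustrative):

```python
import re

def strip_html(html: str, max_len: int = 500) -> str:
    """Remove simple tags with the same pattern the updater uses, then truncate."""
    return re.sub('<[^<]+?>', '', html)[:max_len]

print(strip_html('<p>Hello <strong>world</strong></p>'))  # Hello world
```

The pattern handles ordinary rendered WordPress markup; it is not a full HTML parser (an attribute value containing `>` would confuse it).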
||||||
|
    def _update_post_meta(self, post_id: int, meta_description: str) -> bool:
        """
        Update post meta description in WordPress.

        Args:
            post_id: Post ID to update
            meta_description: New meta description

        Returns:
            True if successful, False otherwise
        """
        logger.info(f"Updating post {post_id}...")

        # Determine which SEO plugin meta key to use
        # Try RankMath first, then Yoast
        meta_fields = {
            'rank_math_description': meta_description
        }

        try:
            # First, get current post meta to preserve other fields
            response = requests.get(
                f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                auth=self.auth,
                timeout=10
            )

            if response.status_code != 200:
                logger.error(f"  Could not fetch post {post_id}")
                return False

            current_post = response.json()
            current_meta = current_post.get('meta', {})

            # Update with new meta description
            updated_meta = {**current_meta, **meta_fields}

            # Update post
            update_response = requests.post(
                f"{self.base_url}/wp-json/wp/v2/posts/{post_id}",
                json={'meta': updated_meta},
                auth=self.auth,
                timeout=10
            )

            if update_response.status_code == 200:
                logger.info(f"  ✓ Updated post {post_id}")
                return True
            else:
                logger.error(f"  ✗ Failed to update post {post_id}: {update_response.status_code}")
                logger.error(f"  Response: {update_response.text}")
                return False

        except Exception as e:
            logger.error(f"  ✗ Error updating post {post_id}: {e}")
            return False
    def _validate_meta_description(self, meta: str) -> Dict:
        """Validate meta description quality."""
        length = len(meta)

        validation = {
            'length': length,
            'is_optimal': 120 <= length <= 160,
            'too_short': length < 120,
            'too_long': length > 160,
            'score': 0
        }

        if validation['is_optimal']:
            validation['score'] = 100
        elif validation['too_short']:
            validation['score'] = max(0, 50 - (120 - length))
        else:
            validation['score'] = max(0, 50 - (length - 160))

        # Bonus for ending with period
        if meta.endswith('.'):
            validation['score'] = min(100, validation['score'] + 5)

        # Bonus for CTA words
        cta_words = ['learn', 'discover', 'find', 'explore', 'read', 'get', 'see', 'try', 'start']
        if any(word in meta.lower() for word in cta_words):
            validation['score'] = min(100, validation['score'] + 5)

        return validation
    def update_posts(self, dry_run: bool = False,
                     skip_existing: bool = False,
                     force_regenerate: bool = False) -> Dict:
        """
        Generate and update meta descriptions for fetched posts.

        Args:
            dry_run: If True, preview changes without updating
            skip_existing: If True, skip posts that already have meta descriptions
            force_regenerate: If True, regenerate even for posts with good meta descriptions

        Returns:
            Statistics dict
        """
        logger.info("\n" + "="*70)
        logger.info("META DESCRIPTION UPDATE")
        logger.info("="*70)
        logger.info(f"Site: {self.site_name}")
        logger.info(f"Posts to process: {len(self.posts)}")
        logger.info(f"Dry run: {dry_run}")
        logger.info(f"Skip existing: {skip_existing}")
        logger.info(f"Force regenerate: {force_regenerate}")
        logger.info("="*70)

        self.stats['total_posts'] = len(self.posts)

        for i, post in enumerate(self.posts, 1):
            post_id = post.get('id')
            title = post.get('title', {}).get('rendered', '')[:50]

            logger.info(f"\n[{i}/{len(self.posts)}] Processing post {post_id}: {title}...")

            # Check current meta description
            meta_dict = post.get('meta', {})
            current_meta = (
                meta_dict.get('rank_math_description', '') or
                meta_dict.get('_yoast_wpseo_metadesc', '') or
                ''
            )

            # Skip if has existing meta and skip_existing is True
            if current_meta and skip_existing and not force_regenerate:
                logger.info("  Skipping: Already has meta description")
                self.stats['skipped'] += 1
                continue

            # Validate existing meta (if any)
            if current_meta and not force_regenerate:
                validation = self._validate_meta_description(current_meta)
                if validation['score'] >= 80:
                    logger.info(f"  Skipping: Existing meta is good quality (score: {validation['score']})")
                    self.stats['skipped'] += 1
                    continue

            # Generate new meta description
            logger.info("  Generating meta description...")
            generated_meta = self._generate_meta_description(post)

            if not generated_meta:
                logger.error("  ✗ Failed to generate meta description")
                self.stats['failed'] += 1
                continue

            # Validate generated meta
            validation = self._validate_meta_description(generated_meta)
            logger.info(f"  Generated: {generated_meta[:80]}...")
            logger.info(f"  Length: {validation['length']} chars, Score: {validation['score']}")

            # Update post
            if dry_run:
                logger.info(f"  [DRY RUN] Would update post {post_id}")
                self.update_results.append({
                    'post_id': post_id,
                    'title': title,
                    'current_meta': current_meta,
                    'generated_meta': generated_meta,
                    'status': 'dry_run',
                    'validation_score': validation['score']
                })
            else:
                success = self._update_post_meta(post_id, generated_meta)

                if success:
                    logger.info(f"  ✓ Successfully updated post {post_id}")
                    self.stats['updated'] += 1
                    self.update_results.append({
                        'post_id': post_id,
                        'title': title,
                        'current_meta': current_meta,
                        'generated_meta': generated_meta,
                        'status': 'updated',
                        'validation_score': validation['score']
                    })
                else:
                    self.stats['failed'] += 1
                    self.update_results.append({
                        'post_id': post_id,
                        'title': title,
                        'status': 'failed',
                        'validation_score': validation['score']
                    })

            # Rate limiting
            time.sleep(0.5)

        # Save results
        self._save_results()

        # Print summary
        logger.info("\n" + "="*70)
        logger.info("UPDATE SUMMARY")
        logger.info("="*70)
        logger.info(f"Total posts: {self.stats['total_posts']}")
        logger.info(f"Updated: {self.stats['updated']}")
        logger.info(f"Failed: {self.stats['failed']}")
        logger.info(f"Skipped: {self.stats['skipped']}")
        logger.info(f"API calls: {self.api_calls}")
        logger.info("="*70)

        return self.stats
    def _save_results(self):
        """Save update results to CSV."""
        if not self.update_results:
            return

        output_dir = Path(__file__).parent.parent.parent / 'output'
        output_dir.mkdir(parents=True, exist_ok=True)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        output_file = output_dir / f'meta_update_{self.site_name}_{timestamp}.csv'

        fieldnames = [
            'post_id', 'title', 'current_meta', 'generated_meta',
            'status', 'validation_score'
        ]

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.update_results)

        logger.info(f"\n✓ Results saved to: {output_file}")
    def run(self, post_ids: Optional[List[int]] = None,
            category_ids: Optional[List[int]] = None,
            category_names: Optional[List[str]] = None,
            author_names: Optional[List[str]] = None,
            limit: Optional[int] = None,
            dry_run: bool = False,
            skip_existing: bool = False,
            force_regenerate: bool = False) -> Dict:
        """
        Run complete meta description update process.

        Args:
            post_ids: Specific post IDs to update
            category_ids: Filter by category IDs
            category_names: Filter by category names
            author_names: Filter by author names
            limit: Maximum number of posts to process
            dry_run: If True, preview changes without updating
            skip_existing: If True, skip posts with existing meta descriptions
            force_regenerate: If True, regenerate even for good quality metas

        Returns:
            Statistics dict
        """
        # Fetch posts
        self.fetch_posts(
            post_ids=post_ids,
            category_ids=category_ids,
            category_names=category_names,
            author_names=author_names,
            limit=limit
        )

        if not self.posts:
            logger.warning("No posts found matching criteria")
            return self.stats

        # Update posts
        return self.update_posts(
            dry_run=dry_run,
            skip_existing=skip_existing,
            force_regenerate=force_regenerate
        )
396
src/seo/performance_analyzer.py
Normal file
@@ -0,0 +1,396 @@
"""
SEO Performance Analyzer - Analyze page performance from imported data
Supports Google Analytics and Search Console CSV imports
"""

import csv
import logging
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)


class PerformanceAnalyzer:
    """Analyze SEO performance from imported CSV data."""

    def __init__(self):
        """Initialize performance analyzer."""
        self.performance_data = []
        self.analysis_results = {}
    def load_ga4_export(self, csv_file: str) -> List[Dict]:
        """
        Load Google Analytics 4 export CSV.

        Expected columns: page_path, page_title, pageviews, sessions, bounce_rate, etc.

        Args:
            csv_file: Path to GA4 export CSV

        Returns:
            List of data dicts
        """
        logger.info(f"Loading GA4 export: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                data = list(reader)

            # Normalize column names
            normalized = []
            for row in data:
                normalized_row = {}
                for key, value in row.items():
                    # Normalize key names
                    new_key = key.lower().replace(' ', '_').replace('-', '_')
                    if 'page' in new_key and 'path' in new_key:
                        normalized_row['page'] = value
                    elif 'page' in new_key and 'title' in new_key:
                        normalized_row['page_title'] = value
                    elif 'pageviews' in new_key or 'views' in new_key:
                        normalized_row['pageviews'] = int(value) if value else 0
                    elif 'sessions' in new_key:
                        normalized_row['sessions'] = int(value) if value else 0
                    elif 'bounce' in new_key and 'rate' in new_key:
                        normalized_row['bounce_rate'] = float(value) if value else 0.0
                    elif 'engagement' in new_key and 'rate' in new_key:
                        normalized_row['engagement_rate'] = float(value) if value else 0.0
                    elif 'duration' in new_key or 'time' in new_key:
                        normalized_row['avg_session_duration'] = float(value) if value else 0.0
                    else:
                        normalized_row[new_key] = value

                normalized.append(normalized_row)

            self.performance_data.extend(normalized)
            logger.info(f"✓ Loaded {len(normalized)} rows from GA4")
            return normalized

        except Exception as e:
            logger.error(f"Error loading GA4 export: {e}")
            return []
    def load_gsc_export(self, csv_file: str) -> List[Dict]:
        """
        Load Google Search Console export CSV.

        Expected columns: Page, Clicks, Impressions, CTR, Position

        Args:
            csv_file: Path to GSC export CSV

        Returns:
            List of data dicts
        """
        logger.info(f"Loading GSC export: {csv_file}")

        try:
            with open(csv_file, 'r', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                data = list(reader)

            # Normalize column names
            normalized = []
            for row in data:
                normalized_row = {'page': ''}
                for key, value in row.items():
                    new_key = key.lower().replace(' ', '_')
                    if 'page' in new_key or 'url' in new_key:
                        normalized_row['page'] = value
                    elif 'clicks' in new_key:
                        normalized_row['clicks'] = int(value) if value else 0
                    elif 'impressions' in new_key:
                        normalized_row['impressions'] = int(value) if value else 0
                    elif 'ctr' in new_key:
                        normalized_row['ctr'] = float(value) if value else 0.0
                    elif 'position' in new_key or 'rank' in new_key:
                        normalized_row['position'] = float(value) if value else 0.0
                    elif 'query' in new_key or 'keyword' in new_key:
                        normalized_row['query'] = value

                normalized.append(normalized_row)

            # Merge with existing data
            self._merge_gsc_data(normalized)

            logger.info(f"✓ Loaded {len(normalized)} rows from GSC")
            return normalized

        except Exception as e:
            logger.error(f"Error loading GSC export: {e}")
            return []
    def _merge_gsc_data(self, gsc_data: List[Dict]):
        """Merge GSC data with existing performance data."""
        # Create lookup by page
        existing_pages = {p.get('page', ''): p for p in self.performance_data}

        for gsc_row in gsc_data:
            page = gsc_row.get('page', '')

            if page in existing_pages:
                # Update existing record
                existing_pages[page].update(gsc_row)
            else:
                # Add new record
                new_record = {
                    'page': page,
                    'page_title': '',
                    'pageviews': 0,
                    'sessions': 0,
                    'bounce_rate': 0.0,
                    'engagement_rate': 0.0,
                    'avg_session_duration': 0.0
                }
                new_record.update(gsc_row)
                self.performance_data.append(new_record)
    def analyze(self) -> Dict:
        """
        Analyze performance data.

        Returns:
            Analysis results dict
        """
        if not self.performance_data:
            logger.warning("No data to analyze")
            return {}

        logger.info("\n" + "="*70)
        logger.info("PERFORMANCE ANALYSIS")
        logger.info("="*70)

        # Calculate summary metrics
        total_pages = len(self.performance_data)
        total_pageviews = sum(p.get('pageviews', 0) for p in self.performance_data)
        total_clicks = sum(p.get('clicks', 0) for p in self.performance_data)
        total_impressions = sum(p.get('impressions', 0) for p in self.performance_data)

        avg_ctr = total_clicks / total_impressions if total_impressions > 0 else 0.0
        avg_position = sum(p.get('position', 0) for p in self.performance_data) / total_pages if total_pages > 0 else 0.0

        # Top pages
        top_by_views = sorted(
            self.performance_data,
            key=lambda x: x.get('pageviews', 0),
            reverse=True
        )[:20]

        top_by_clicks = sorted(
            self.performance_data,
            key=lambda x: x.get('clicks', 0),
            reverse=True
        )[:20]

        # Pages with issues
        low_ctr = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 100 and p.get('ctr', 0) < 0.02
        ]

        low_position = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 50 and p.get('position', 0) > 20
        ]

        high_impressions_low_clicks = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 500 and p.get('ctr', 0) < 0.01
        ]

        # Keyword opportunities (from GSC data)
        keyword_opportunities = self._analyze_keywords()

        analysis = {
            'summary': {
                'total_pages': total_pages,
                'total_pageviews': total_pageviews,
                'total_clicks': total_clicks,
                'total_impressions': total_impressions,
                'average_ctr': avg_ctr,
                'average_position': avg_position
            },
            'top_pages': {
                'by_views': top_by_views,
                'by_clicks': top_by_clicks
            },
            'issues': {
                'low_ctr': low_ctr,
                'low_position': low_position,
                'high_impressions_low_clicks': high_impressions_low_clicks
            },
            'keyword_opportunities': keyword_opportunities
        }
        # Recommendations are derived from the assembled analysis dict, so add
        # them after construction (referencing `analysis` inside its own dict
        # literal would raise a NameError)
        analysis['recommendations'] = self._generate_recommendations(analysis)

        # Log summary
        logger.info(f"Total pages analyzed: {total_pages}")
        logger.info(f"Total pageviews: {total_pageviews}")
        logger.info(f"Total clicks: {total_clicks}")
        logger.info(f"Total impressions: {total_impressions}")
        logger.info(f"Average CTR: {avg_ctr:.2%}")
        logger.info(f"Average position: {avg_position:.1f}")
        logger.info(f"\nPages with low CTR: {len(low_ctr)}")
        logger.info(f"Pages with low position: {len(low_position)}")
        logger.info(f"High impression, low click pages: {len(high_impressions_low_clicks)}")
        logger.info("="*70)

        self.analysis_results = analysis
        return analysis
    def _analyze_keywords(self) -> List[Dict]:
        """Analyze keyword opportunities from GSC data."""
        keywords = {}

        for page in self.performance_data:
            query = page.get('query', '')
            if not query:
                continue

            if query not in keywords:
                keywords[query] = {
                    'query': query,
                    'clicks': 0,
                    'impressions': 0,
                    'position': 0.0,
                    'pages': []
                }

            keywords[query]['clicks'] += page.get('clicks', 0)
            keywords[query]['impressions'] += page.get('impressions', 0)
            keywords[query]['pages'].append(page.get('page', ''))

        # Calculate average position per keyword
        for query in keywords:
            positions = [
                p.get('position', 0) for p in self.performance_data
                if p.get('query') == query
            ]
            if positions:
                keywords[query]['position'] = sum(positions) / len(positions)

        # Sort by impressions
        keyword_list = list(keywords.values())
        keyword_list.sort(key=lambda x: x['impressions'], reverse=True)

        # Filter opportunities (position 5-20, high impressions)
        opportunities = [
            k for k in keyword_list
            if 5 <= k['position'] <= 20 and k['impressions'] > 100
        ]

        return opportunities[:50]  # Top 50 opportunities
    def _generate_recommendations(self, analysis: Dict) -> List[str]:
        """Generate SEO recommendations."""
        recommendations = []

        issues = analysis.get('issues', {})

        # Low CTR
        low_ctr_count = len(issues.get('low_ctr', []))
        if low_ctr_count > 0:
            recommendations.append(
                f"📝 {low_ctr_count} pages have low CTR (<2% with 100+ impressions). "
                "Improve meta titles and descriptions to increase click-through rate."
            )

        # Low position
        low_pos_count = len(issues.get('low_position', []))
        if low_pos_count > 0:
            recommendations.append(
                f"📊 {low_pos_count} pages rank beyond position 20. "
                "Consider content optimization and internal linking."
            )

        # High impressions, low clicks
        high_imp_count = len(issues.get('high_impressions_low_clicks', []))
        if high_imp_count > 0:
            recommendations.append(
                f"⚠️ {high_imp_count} pages have 500+ impressions but <1% CTR. "
                "These are prime candidates for title/description optimization."
            )

        # Keyword opportunities
        keyword_count = len(analysis.get('keyword_opportunities', []))
        if keyword_count > 0:
            recommendations.append(
                f"🎯 {keyword_count} keyword opportunities identified (ranking 5-20). "
                "Focus content optimization on these keywords."
            )

        return recommendations
    def save_analysis(self, output_file: Optional[str] = None) -> str:
        """
        Save analysis results to CSV.

        Args:
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'performance_analysis_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'page', 'page_title', 'pageviews', 'sessions', 'bounce_rate',
            'engagement_rate', 'avg_session_duration', 'clicks', 'impressions',
            'ctr', 'position', 'query'
        ]

        logger.info(f"Saving analysis to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            # GA4 normalization may keep extra columns beyond the fieldnames
            # above; ignore them instead of letting DictWriter raise ValueError
            writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
            writer.writeheader()
            writer.writerows(self.performance_data)

        logger.info(f"✓ Saved to: {output_file}")
        return str(output_file)
    def run(self, ga4_file: Optional[str] = None,
            gsc_file: Optional[str] = None,
            output_file: Optional[str] = None) -> Tuple[str, Dict]:
        """
        Run complete performance analysis.

        Args:
            ga4_file: Path to GA4 export CSV
            gsc_file: Path to GSC export CSV
            output_file: Custom output file path

        Returns:
            Tuple of (output_file_path, analysis_dict)
        """
        logger.info("\n" + "="*70)
        logger.info("SEO PERFORMANCE ANALYZER")
        logger.info("="*70)

        # Load data
        if ga4_file:
            self.load_ga4_export(ga4_file)
        if gsc_file:
            self.load_gsc_export(gsc_file)

        if not self.performance_data:
            logger.error("No data loaded. Provide GA4 and/or GSC export files.")
            return "", {}

        # Analyze
        analysis = self.analyze()

        # Save
        output_path = self.save_analysis(output_file)

        return output_path, analysis
494
src/seo/performance_tracker.py
Normal file
@@ -0,0 +1,494 @@
"""
SEO Performance Tracker - Google Analytics 4 & Search Console Integration
Fetch and analyze page performance data for SEO optimization
"""

import csv
import json
import logging
from pathlib import Path
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

logger = logging.getLogger(__name__)

# Optional Google imports
try:
    from google.analytics.admin import AnalyticsAdminServiceClient
    from google.analytics.data import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange,
        Dimension,
        Metric,
        RunReportRequest,
    )
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    GOOGLE_AVAILABLE = True
except ImportError:
    GOOGLE_AVAILABLE = False
    logger.warning("Google libraries not installed. API mode disabled. Use CSV imports instead.")

from .config import Config


class SEOPerformanceTracker:
    """Track and analyze SEO performance from Google Analytics and Search Console."""
    def __init__(self, ga4_credentials: Optional[str] = None,
                 gsc_credentials: Optional[str] = None,
                 ga4_property_id: Optional[str] = None,
                 gsc_site_url: Optional[str] = None):
        """
        Initialize performance tracker.

        Args:
            ga4_credentials: Path to GA4 service account JSON
            gsc_credentials: Path to GSC service account JSON
            ga4_property_id: GA4 property ID (e.g., "properties/123456789")
            gsc_site_url: GSC site URL (e.g., "https://www.mistergeek.net")
        """
        self.ga4_credentials = ga4_credentials or Config.GA4_CREDENTIALS
        self.gsc_credentials = gsc_credentials or Config.GSC_CREDENTIALS
        self.ga4_property_id = ga4_property_id or Config.GA4_PROPERTY_ID
        self.gsc_site_url = gsc_site_url or Config.GSC_SITE_URL

        self.ga4_client = None
        self.gsc_service = None

        # Initialize clients
        self._init_ga4_client()
        self._init_gsc_service()

        self.performance_data = []
    def _init_ga4_client(self):
        """Initialize Google Analytics 4 client."""
        if not GOOGLE_AVAILABLE:
            logger.warning("Google libraries not installed. API mode disabled.")
            return

        if not self.ga4_credentials or not self.ga4_property_id:
            logger.warning("GA4 credentials not configured")
            return

        try:
            credentials = service_account.Credentials.from_service_account_file(
                self.ga4_credentials,
                scopes=["https://www.googleapis.com/auth/analytics.readonly"]
            )
            self.ga4_client = BetaAnalyticsDataClient(credentials=credentials)
            logger.info("✓ GA4 client initialized")
        except Exception as e:
            logger.error(f"Failed to initialize GA4 client: {e}")
            self.ga4_client = None
    def _init_gsc_service(self):
        """Initialize Google Search Console service."""
        if not GOOGLE_AVAILABLE:
            logger.warning("Google libraries not installed. API mode disabled.")
            return

        if not self.gsc_credentials:
            logger.warning("GSC credentials not configured")
            return

        try:
            credentials = service_account.Credentials.from_service_account_file(
                self.gsc_credentials,
                scopes=["https://www.googleapis.com/auth/webmasters.readonly"]
            )
            self.gsc_service = build('webmasters', 'v3', credentials=credentials)
            logger.info("✓ GSC service initialized")
        except Exception as e:
            logger.error(f"Failed to initialize GSC service: {e}")
            self.gsc_service = None
    def fetch_ga4_data(self, start_date: str, end_date: str,
                       dimensions: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch data from Google Analytics 4.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)
            dimensions: List of dimensions to fetch

        Returns:
            List of performance data dicts
        """
        if not self.ga4_client:
            logger.warning("GA4 client not available")
            return []

        logger.info(f"Fetching GA4 data from {start_date} to {end_date}...")

        # Default dimensions
        if dimensions is None:
            dimensions = ['pagePath', 'pageTitle']

        # Default metrics
        metrics = [
            'screenPageViews',
            'sessions',
            'bounceRate',
            'averageSessionDuration',
            'engagementRate'
        ]

        try:
            request = RunReportRequest(
                property=self.ga4_property_id,
                dimensions=[Dimension(name=dim) for dim in dimensions],
                metrics=[Metric(name=metric) for metric in metrics],
                date_ranges=[DateRange(start_date=start_date, end_date=end_date)]
            )

            response = self.ga4_client.run_report(request)

            data = []
            for row in response.rows:
                row_data = {}

                # Extract dimensions
                for i, dim_header in enumerate(response.dimension_headers):
                    row_data[dim_header.name] = row.dimension_values[i].value

                # Extract metrics
                for i, metric_header in enumerate(response.metric_headers):
                    value = row.metric_values[i].value
                    # Convert to appropriate type
                    if metric_header.name in ['bounceRate', 'engagementRate']:
                        value = float(value) if value else 0.0
                    elif metric_header.name in ['screenPageViews', 'sessions']:
                        value = int(value) if value else 0
                    elif metric_header.name == 'averageSessionDuration':
                        value = float(value) if value else 0.0
                    row_data[metric_header.name] = value

                data.append(row_data)

            logger.info(f"✓ Fetched {len(data)} rows from GA4")
            return data

        except Exception as e:
            logger.error(f"Error fetching GA4 data: {e}")
            return []
    def fetch_gsc_data(self, start_date: str, end_date: str,
                       dimensions: Optional[List[str]] = None) -> List[Dict]:
        """
        Fetch data from Google Search Console.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)
            dimensions: List of dimensions to fetch

        Returns:
            List of performance data dicts
        """
        if not self.gsc_service:
            logger.warning("GSC service not available")
            return []

        logger.info(f"Fetching GSC data from {start_date} to {end_date}...")

        # Default dimensions
        if dimensions is None:
            dimensions = ['page']

        try:
            # Build request
            request = {
                'startDate': start_date,
                'endDate': end_date,
                'dimensions': dimensions,
                'rowLimit': 5000,
                'startRow': 0
            }

            response = self.gsc_service.searchanalytics().query(
                siteUrl=self.gsc_site_url,
                body=request
            ).execute()

            data = []
            if 'rows' in response:
                for row in response['rows']:
                    row_data = {
                        'page': row['keys'][0] if len(row['keys']) > 0 else '',
                        'clicks': row.get('clicks', 0),
                        'impressions': row.get('impressions', 0),
                        'ctr': row.get('ctr', 0.0),
                        'position': row.get('position', 0.0)
                    }

                    # Add query if available
                    if len(row['keys']) > 1:
                        row_data['query'] = row['keys'][1]

                    data.append(row_data)

            logger.info(f"✓ Fetched {len(data)} rows from GSC")
            return data

        except Exception as e:
            logger.error(f"Error fetching GSC data: {e}")
            return []

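The row-flattening logic in `fetch_gsc_data` can be isolated into a pure helper, which makes the `keys` ordering assumption explicit (first key is the page, an optional second key is the query). A sketch with a hypothetical sample row:

```python
def parse_gsc_row(row: dict) -> dict:
    """Flatten one Search Console API row into a plain dict."""
    keys = row.get('keys', [])
    data = {
        'page': keys[0] if len(keys) > 0 else '',
        'clicks': row.get('clicks', 0),
        'impressions': row.get('impressions', 0),
        'ctr': row.get('ctr', 0.0),
        'position': row.get('position', 0.0),
    }
    if len(keys) > 1:  # second dimension, when requested, is the query
        data['query'] = keys[1]
    return data

sample = {'keys': ['/blog/post', 'seo tips'], 'clicks': 12,
          'impressions': 340, 'ctr': 0.035, 'position': 8.2}
print(parse_gsc_row(sample)['query'])  # seo tips
```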
    def fetch_combined_data(self, start_date: str, end_date: str) -> List[Dict]:
        """
        Fetch and combine data from GA4 and GSC.

        Args:
            start_date: Start date (YYYY-MM-DD)
            end_date: End date (YYYY-MM-DD)

        Returns:
            List of combined performance data dicts
        """
        logger.info("\n" + "="*70)
        logger.info("FETCHING PERFORMANCE DATA")
        logger.info("="*70)

        # Fetch from both sources
        ga4_data = self.fetch_ga4_data(start_date, end_date)
        gsc_data = self.fetch_gsc_data(start_date, end_date)

        # Combine data by page path
        combined = {}

        # Add GA4 data
        for row in ga4_data:
            page_path = row.get('pagePath', '')
            combined[page_path] = {
                'page': page_path,
                'page_title': row.get('pageTitle', ''),
                'pageviews': row.get('screenPageViews', 0),
                'sessions': row.get('sessions', 0),
                'bounce_rate': row.get('bounceRate', 0.0),
                'avg_session_duration': row.get('averageSessionDuration', 0.0),
                'engagement_rate': row.get('engagementRate', 0.0),
                'clicks': 0,
                'impressions': 0,
                'ctr': 0.0,
                'position': 0.0
            }

        # Merge GSC data
        for row in gsc_data:
            page_path = row.get('page', '')

            if page_path in combined:
                # Update existing record
                combined[page_path]['clicks'] = row.get('clicks', 0)
                combined[page_path]['impressions'] = row.get('impressions', 0)
                combined[page_path]['ctr'] = row.get('ctr', 0.0)
                combined[page_path]['position'] = row.get('position', 0.0)
            else:
                # Create new record
                combined[page_path] = {
                    'page': page_path,
                    'page_title': '',
                    'pageviews': 0,
                    'sessions': 0,
                    'bounce_rate': 0.0,
                    'avg_session_duration': 0.0,
                    'engagement_rate': 0.0,
                    'clicks': row.get('clicks', 0),
                    'impressions': row.get('impressions', 0),
                    'ctr': row.get('ctr', 0.0),
                    'position': row.get('position', 0.0)
                }

        self.performance_data = list(combined.values())

        logger.info(f"✓ Combined {len(self.performance_data)} pages")
        logger.info("="*70)

        return self.performance_data

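The merge strategy above (GA4 rows seed the records, GSC rows either enrich a matching page or create a zero-filled one) can be sketched in isolation. This is a reduced sketch with a trimmed field set, assuming sample rows shaped like the two fetchers' outputs:

```python
def merge_by_page(ga4_rows, gsc_rows):
    """Merge GA4 and GSC rows into one record per page, zero-filling missing sources."""
    combined = {}
    for row in ga4_rows:
        page = row.get('pagePath', '')
        combined[page] = {'page': page, 'pageviews': row.get('screenPageViews', 0),
                          'clicks': 0, 'impressions': 0}
    for row in gsc_rows:
        page = row.get('page', '')
        # setdefault creates a zero-filled record for GSC-only pages
        record = combined.setdefault(page, {'page': page, 'pageviews': 0,
                                            'clicks': 0, 'impressions': 0})
        record['clicks'] = row.get('clicks', 0)
        record['impressions'] = row.get('impressions', 0)
    return list(combined.values())

rows = merge_by_page([{'pagePath': '/a', 'screenPageViews': 100}],
                     [{'page': '/a', 'clicks': 5, 'impressions': 200},
                      {'page': '/b', 'clicks': 1, 'impressions': 50}])
print(len(rows))  # 2
```

Note the join key: this only lines up if GA4's `pagePath` and GSC's `page` values are normalized to the same form (GSC returns full URLs by default), which is worth verifying against real responses.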
    def analyze_performance(self) -> Dict:
        """
        Analyze performance data and generate insights.

        Returns:
            Analysis results dict
        """
        if not self.performance_data:
            return {}

        logger.info("\n" + "="*70)
        logger.info("PERFORMANCE ANALYSIS")
        logger.info("="*70)

        # Calculate metrics
        total_pageviews = sum(p.get('pageviews', 0) for p in self.performance_data)
        total_clicks = sum(p.get('clicks', 0) for p in self.performance_data)
        total_impressions = sum(p.get('impressions', 0) for p in self.performance_data)

        avg_ctr = total_clicks / total_impressions if total_impressions > 0 else 0
        avg_position = sum(p.get('position', 0) for p in self.performance_data) / len(self.performance_data)

        # Top pages by pageviews
        top_pages = sorted(
            self.performance_data,
            key=lambda x: x.get('pageviews', 0),
            reverse=True
        )[:10]

        # Top pages by CTR
        top_ctr = sorted(
            [p for p in self.performance_data if p.get('impressions', 0) > 100],
            key=lambda x: x.get('ctr', 0),
            reverse=True
        )[:10]

        # Pages needing improvement (low CTR)
        low_ctr = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 100 and p.get('ctr', 0) < 0.02
        ]

        # Pages with good traffic but low position
        opportunity_pages = [
            p for p in self.performance_data
            if p.get('pageviews', 0) > 50 and p.get('position', 0) > 10
        ]

        analysis = {
            'summary': {
                'total_pages': len(self.performance_data),
                'total_pageviews': total_pageviews,
                'total_clicks': total_clicks,
                'total_impressions': total_impressions,
                'average_ctr': avg_ctr,
                'average_position': avg_position
            },
            'top_pages': top_pages,
            'top_ctr': top_ctr,
            'low_ctr': low_ctr,
            'opportunities': opportunity_pages
        }
        # Attach recommendations after the dict exists; referencing `analysis`
        # inside its own literal would raise a NameError.
        analysis['recommendations'] = self._generate_recommendations(analysis)

        # Log summary
        logger.info(f"Total pages: {analysis['summary']['total_pages']}")
        logger.info(f"Total pageviews: {analysis['summary']['total_pageviews']}")
        logger.info(f"Total clicks: {analysis['summary']['total_clicks']}")
        logger.info(f"Average CTR: {analysis['summary']['average_ctr']:.2%}")
        logger.info(f"Average position: {analysis['summary']['average_position']:.1f}")
        logger.info("="*70)

        return analysis

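The summary aggregation follows two different averaging conventions: CTR is click-weighted (total clicks over total impressions), while position is a plain per-page mean. A minimal sketch of that arithmetic (function name `summarize` is illustrative):

```python
def summarize(pages):
    """Aggregate totals, click-weighted CTR, and mean position over page records."""
    clicks = sum(p.get('clicks', 0) for p in pages)
    impressions = sum(p.get('impressions', 0) for p in pages)
    avg_ctr = clicks / impressions if impressions > 0 else 0
    avg_position = sum(p.get('position', 0) for p in pages) / len(pages) if pages else 0
    return {'total_clicks': clicks, 'total_impressions': impressions,
            'average_ctr': avg_ctr, 'average_position': avg_position}

s = summarize([{'clicks': 10, 'impressions': 500, 'position': 4.0},
               {'clicks': 5, 'impressions': 500, 'position': 12.0}])
print(s['average_ctr'], s['average_position'])  # 0.015 8.0
```

The unweighted position mean lets low-traffic pages pull the average; weighting by impressions would be the alternative if that matters for reporting.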
    def _generate_recommendations(self, analysis: Dict) -> List[str]:
        """Generate SEO recommendations based on analysis."""
        recommendations = []

        # Low CTR recommendations
        low_ctr_count = len(analysis.get('low_ctr', []))
        if low_ctr_count > 0:
            recommendations.append(
                f"📝 {low_ctr_count} pages have low CTR (<2%). "
                "Consider improving meta titles and descriptions."
            )

        # Position opportunities
        opportunity_count = len(analysis.get('opportunities', []))
        if opportunity_count > 0:
            recommendations.append(
                f"🎯 {opportunity_count} pages have good traffic but rank >10. "
                "Optimize content to improve rankings."
            )

        # High impressions, low clicks
        high_impressions = [
            p for p in self.performance_data
            if p.get('impressions', 0) > 1000 and p.get('ctr', 0) < 0.01
        ]
        if high_impressions:
            recommendations.append(
                f"⚠️ {len(high_impressions)} pages have high impressions but very low CTR. "
                "Review title tags for better click appeal."
            )

        return recommendations

    def save_to_csv(self, output_file: Optional[str] = None) -> str:
        """
        Save performance data to CSV.

        Args:
            output_file: Custom output file path

        Returns:
            Path to saved file
        """
        if not output_file:
            output_dir = Path(__file__).parent.parent.parent / 'output'
            output_dir.mkdir(parents=True, exist_ok=True)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            output_file = output_dir / f'performance_data_{timestamp}.csv'

        output_file = Path(output_file)
        output_file.parent.mkdir(parents=True, exist_ok=True)

        fieldnames = [
            'page', 'page_title', 'pageviews', 'sessions', 'bounce_rate',
            'avg_session_duration', 'engagement_rate', 'clicks', 'impressions',
            'ctr', 'position'
        ]

        logger.info(f"Saving {len(self.performance_data)} rows to {output_file}...")

        with open(output_file, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(self.performance_data)

        logger.info(f"✓ Saved to: {output_file}")
        return str(output_file)

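The `csv.DictWriter` pattern used in `save_to_csv` round-trips cleanly, with one caveat worth remembering: `csv.DictReader` hands every value back as a string. A self-contained sketch using a temporary file:

```python
import csv
import os
import tempfile

rows = [{'page': '/a', 'clicks': 5}, {'page': '/b', 'clicks': 2}]

# Write with DictWriter (newline='' avoids blank lines on Windows)
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['page', 'clicks'])
    writer.writeheader()
    writer.writerows(rows)

# Read back: numeric values come back as strings
with open(path, newline='', encoding='utf-8') as f:
    back = list(csv.DictReader(f))
os.remove(path)

print(back[0]['page'], back[1]['clicks'])  # /a 2
```

Also note that `DictWriter` raises `ValueError` if a row contains keys missing from `fieldnames` (unless `extrasaction='ignore'` is passed), so the fixed field list above must match the record shape exactly.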
    def run(self, start_date: Optional[str] = None,
            end_date: Optional[str] = None,
            output_file: Optional[str] = None) -> Tuple[str, Dict]:
        """
        Run complete performance analysis.

        Args:
            start_date: Start date (YYYY-MM-DD), default 30 days ago
            end_date: End date (YYYY-MM-DD), default yesterday
            output_file: Custom output file path

        Returns:
            Tuple of (output_file_path, analysis_dict)
        """
        # Default date range (last 30 days)
        if not end_date:
            end_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')
        if not start_date:
            start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')

        logger.info("\n" + "="*70)
        logger.info("SEO PERFORMANCE ANALYSIS")
        logger.info("="*70)
        logger.info(f"Date range: {start_date} to {end_date}")
        logger.info("="*70)

        # Fetch data
        self.fetch_combined_data(start_date, end_date)

        if not self.performance_data:
            logger.warning("No performance data available")
            return "", {}

        # Analyze
        analysis = self.analyze_performance()

        # Save
        output_path = self.save_to_csv(output_file)

        return output_path, analysis
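The default window computed in `run` (yesterday back to 30 days ago) is easy to get off by one, so it helps to pin it down with a fixed reference date. A sketch with the "today" injected for testability (the standalone `default_range` helper is illustrative):

```python
from datetime import datetime, timedelta

def default_range(today=None):
    """Default reporting window: yesterday back to 30 days before today."""
    today = today or datetime.now()
    end = (today - timedelta(days=1)).strftime('%Y-%m-%d')
    start = (today - timedelta(days=30)).strftime('%Y-%m-%d')
    return start, end

print(default_range(datetime(2024, 3, 31)))  # ('2024-03-01', '2024-03-30')
```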
src/seo/post_migrator.py (new file, 1007 lines): diff suppressed because it is too large.