# yt-sync.sh Improvements

## Problem
- Current scan takes 1-2 hours
- Scans 200 videos × 15 channels = 3000 metadata fetches
- Most videos are skipped (already downloaded or too old)
- Cron runs hourly, so a new run starts while the previous one is still scanning (overlap)

## Speed Improvements for yt-dlp

### 1. Break on existing (RECOMMENDED)
```bash
--break-on-existing
```
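Stops scanning when it hits a video that is already in the download archive (this only has an effect together with `--download-archive`). Channel listings are ordered newest first, so once we hit a video we already have, everything after it is older.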

### 2. Break on date reject
```bash
--break-on-reject
```
Stops as soon as a video is rejected by a filter, here the `--dateafter` range. With a newest-first listing, the scan ends at the first video that is too old.

### 3. Reduce playlist scan depth
```bash
--playlist-end 50  # Instead of 200
```
Most channels don't post 50 videos in 30 days.

### 4. Track last sync timestamp
Store last successful sync time and use tighter --dateafter:
```bash
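LAST_SYNC_FILE="$YOUTUBE_DIR/.last_sync"
# Record the date before the run starts, so videos published mid-run are not missed next time.
SYNC_STARTED=$(date '+%Y%m%d')
if [[ -s "$LAST_SYNC_FILE" ]]; then
    LAST_SYNC=$(<"$LAST_SYNC_FILE")
else
    # First run: fall back to a 30-day window (GNU date syntax).
    LAST_SYNC=$(date -d '30 days ago' '+%Y%m%d')
fi
# An array keeps the flag and its value as separate arguments; expand it as "${DATE_AFTER[@]}".
DATE_AFTER=(--dateafter "$LAST_SYNC")
# After successful sync:
echo "$SYNC_STARTED" > "$LAST_SYNC_FILE"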
```

### 5. Parallel channel downloads (aggressive)
Use GNU parallel to download multiple channels simultaneously:
```bash
parallel -j 3 yt-dlp [opts] ::: "${CHANNELS[@]}"
```
Risk: More likely to trigger rate limiting.

## Scheduling Options

### Option A: Systemd timer (prevents overlap)
```ini
# ~/.config/systemd/user/yt-sync.timer
[Unit]
Description=YouTube Sync Timer

[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

```ini
# ~/.config/systemd/user/yt-sync.service
[Unit]
Description=YouTube Sync

[Service]
Type=oneshot
ExecStart=/home/cjennings/.local/bin/yt-sync.sh all
ExecStartPost=/home/cjennings/.local/bin/yt-sync.sh sync
```

Systemd won't start a new run while the previous one is still running: if the timer fires while the service is still active, nothing happens.
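
To try the timer, the unit files have to be loaded and the timer enabled. A minimal sketch, assuming both files are saved under `~/.config/systemd/user/` as shown above; `run` and `enable_yt_sync_timer` are hypothetical helper names, and `DRY_RUN` is an illustrative switch that prints the commands instead of executing them:

```bash
#!/usr/bin/env bash
# Hypothetical helper: with DRY_RUN set, print the command instead of running it.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: load the units, enable the timer, show the next run.
enable_yt_sync_timer() {
    run systemctl --user daemon-reload                # pick up new or edited unit files
    run systemctl --user enable --now yt-sync.timer   # start the timer now and keep it enabled
    run systemctl --user list-timers yt-sync.timer    # show when it will fire next
}

# Usage: DRY_RUN=1 enable_yt_sync_timer   (preview)
#        enable_yt_sync_timer             (apply)
```

User timers only fire while the user's systemd instance is running, so on a machine without a permanent login session `loginctl enable-linger` may also be needed.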

### Option B: Lock file wrapper
```bash
#!/bin/bash
LOCKFILE="/tmp/yt-sync.lock"
exec 200>"$LOCKFILE"
flock -n 200 || { echo "Already running"; exit 1; }
# ... run sync ...
```
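
The same locking idea can be wrapped in a reusable function, so that cron calls one command and the lock is released automatically when it exits. This is only a sketch: `run_locked` is a hypothetical name, file descriptor 9 and exit code 75 are arbitrary choices, and it assumes `flock` from util-linux is installed.

```bash
#!/usr/bin/env bash
# run_locked LOCKFILE CMD [ARGS...]
# Runs CMD while holding an exclusive, non-blocking lock on LOCKFILE.
# Returns 75 if another process holds the lock, otherwise CMD's exit status.
run_locked() {
    local lockfile="$1"
    shift
    (
        # fd 9 is opened on the lock file by the redirection below.
        flock -n 9 || { echo "Already running" >&2; exit 75; }
        "$@"
    ) 9>"$lockfile"
}

# Example (not executed here):
#   run_locked /tmp/yt-sync.lock /home/cjennings/.local/bin/yt-sync.sh all
```

Because the lock is tied to the open file descriptor, it goes away when the subshell exits, even if the sync crashes; no stale lock file cleanup is needed.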

### Option C: Longer cron interval
```cron
# Every 4 hours during off-peak
0 0,4,20 * * * /home/cjennings/.local/bin/yt-sync.sh all && /home/cjennings/.local/bin/yt-sync.sh sync
```

## Recommended Changes

1. Add `--break-on-existing` to YT_OPTS (biggest win)
2. Add `--break-on-reject` to YT_OPTS
3. Reduce `--playlist-end` to 50
4. Use systemd timer instead of cron
5. Optionally track last sync date for tighter filtering
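
Put together, changes 1 to 3 and 5 might look roughly like the sketch below. How `yt-sync.sh` actually builds `YT_OPTS` is not shown in these notes, so the array form, the `build_yt_opts` helper, the `ARCHIVE_FILE` path, and the `YOUTUBE_DIR` default are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: names and paths below are assumptions, not taken from yt-sync.sh.
YOUTUBE_DIR="${YOUTUBE_DIR:-$HOME/youtube}"
ARCHIVE_FILE="$YOUTUBE_DIR/.archive.txt"

# build_yt_opts YYYYMMDD
# Fills the global YT_OPTS array with the options recommended above.
build_yt_opts() {
    local date_after="$1"
    YT_OPTS=(
        --download-archive "$ARCHIVE_FILE"   # needed for --break-on-existing to have an effect
        --break-on-existing                  # stop at the first already-archived video
        --break-on-reject                    # stop at the first video rejected by a filter
        --playlist-end 50                    # scan at most 50 entries per channel
        --dateafter "$date_after"            # ignore anything older than this date
    )
}

# Default to a 30-day window (GNU date syntax, as used earlier in this document).
build_yt_opts "$(date -d '30 days ago' '+%Y%m%d')"

# Print the resulting command line instead of running it.
printf '%q ' yt-dlp "${YT_OPTS[@]}"
echo
```

Keeping the options in an array means each flag and value is passed to `yt-dlp` as its own argument when expanded as `"${YT_OPTS[@]}"`, which avoids word-splitting surprises with paths that contain spaces.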