Mirroring My GitHub Repositories

Once upon a time I distrusted all proprietary code, self-hosted everything (even my email), and was generally fairly paranoid about the privacy and security of my data. While I still run a server at home, it's mostly for the convenience of having a local SMB share and a couple of test backends for app development. I now push all of my code to GitHub, where I'm quite confident that it'll be safe for the foreseeable future. That isn't a guarantee though, as GitHub has to deal with a steady stream of DMCA takedown requests, legitimate or not. One case that comes to mind is when youtube-dl was taken down. To GitHub's credit, they reinstated youtube-dl, and then went as far as to set up a developer defense fund and publicly share all of the DMCA takedown notices that they receive. youtube-dl was a pretty high-visibility case though, with wide coverage from various news outlets. If any of my repos were to be erroneously taken down, I doubt I'd receive the same support. DMCA takedown notices aren't the only concern either. GitLab suffered a data loss event a couple of years back. Accidents happen, I get that. I just don't want to risk them happening to my code.

Experience has taught me that hosts like GitHub and GitLab have far better backup strategies and maintenance policies than I do, not to mention teams of people dedicated to safeguarding the data, so I'm no longer interested in avoiding them completely by running my own git server. I think a fair compromise is to continue to push my code to GitHub and simply maintain a local mirror. As such, I wrote a small Python script to do just that:

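#!/usr/bin/env python3

from os.path import exists
from subprocess import run

import requests

# GitHub caps per_page at 100, so walk the pages until one comes back empty.
# requests reads the credentials for api.github.com from ~/.netrc.
repos = []
page = 1
while True:
    response = requests.get(
        "https://api.github.com/user/repos",
        params={"per_page": 100, "page": page},
    )
    response.raise_for_status()
    batch = response.json()
    if not batch:
        break
    repos.extend(batch)
    page += 1

for repo in repos:
    name = repo['name']
    url = repo['clone_url']
    dirname = name + '.git'
    if exists(dirname):
        # Existing mirror: fetch the latest refs from the remote.
        print('updating ' + name + '...')
        run(['git', 'remote', 'update'], cwd=dirname)
    else:
        # First run for this repo: create a bare mirror clone in ./<name>.git.
        print('cloning ' + name + '...')
        run(['git', 'clone', '--mirror', url])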

It's quite simple: it queries the GitHub API for my repos, then checks whether each one has already been cloned. If it has, the script fetches the latest changes into the existing mirror; otherwise it clones the repo for the first time. I'm relying on my ~/.netrc file for authentication here, and I've set the script up as a cron job that runs once per hour.
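
Since everything hinges on ~/.netrc, a quick sanity check can save some head-scratching when cron runs the script unattended. The following is only a sketch, not part of the mirror script: the helper name missing_hosts and the HOSTS tuple are made up for illustration, and it assumes that the API request looks up credentials under api.github.com while git, cloning over HTTPS, looks them up under github.com.

```python
#!/usr/bin/env python3

from netrc import netrc

# Assumption: these are the two hosts that need credentials. api.github.com
# is used for the API query, github.com for git clone/fetch over HTTPS.
HOSTS = ("api.github.com", "github.com")


def missing_hosts(path, hosts=HOSTS):
    """Return the hosts that have no matching entry in the netrc file at `path`.

    A missing file counts as every host being absent. Note that a `default`
    entry in the file matches any host.
    """
    try:
        entries = netrc(path)
    except FileNotFoundError:
        return list(hosts)
    # authenticators() returns None when no entry applies to the host.
    return [host for host in hosts if entries.authenticators(host) is None]
```

Calling missing_hosts(os.path.expanduser("~/.netrc")) before the main loop and bailing out if the list is non-empty would make a misconfigured machine fail loudly instead of quietly skipping private repos.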