The Palos Publishing Company


Sync local and cloud folders with Python

Syncing local and cloud folders using Python involves comparing files between a local directory and a cloud storage location, then uploading, downloading, or deleting files to keep both in sync. This is commonly used for backups, file sharing, or cloud storage management.

Below is a detailed guide with an example Python script demonstrating how to sync a local folder with a cloud folder using Google Drive as the cloud storage example. The same principles apply to other cloud services like AWS S3, Dropbox, or OneDrive, with appropriate API changes.


Key Concepts for Syncing

  1. Listing files: Get lists of files in both local and cloud folders.

  2. Comparing files: Identify new, updated, or deleted files by comparing timestamps, hashes, or file sizes.

  3. Uploading: Upload new or updated files from local to cloud.

  4. Downloading: Download new or updated files from cloud to local (if two-way sync).

  5. Deleting: Optionally delete files that exist only in one location.

  6. Conflict resolution: Decide how to handle files changed in both places.
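As a minimal sketch, the comparison in steps 2–5 can be expressed as a pure function over two path→hash maps (the names `plan_sync`, `local`, and `cloud` are illustrative, not part of any SDK):

```python
def plan_sync(local, cloud):
    """local and cloud map relative path -> content hash (e.g. an MD5 digest)."""
    to_upload = [p for p in local if p not in cloud]                        # new locally
    to_update = [p for p in local if p in cloud and local[p] != cloud[p]]   # content changed
    to_delete = [p for p in cloud if p not in local]                        # cloud-only
    return to_upload, to_update, to_delete
```

For example, `plan_sync({'a.txt': 'h1'}, {'a.txt': 'h2', 'old.txt': 'h3'})` returns `([], ['a.txt'], ['old.txt'])`: nothing new, one changed file to update, one cloud-only file to (optionally) delete.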


Example: Sync Local Folder with Google Drive Folder

Prerequisites

  • Python 3.x installed

  • google-api-python-client and google-auth-httplib2 libraries (pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib)

  • Google Cloud Project with Drive API enabled

  • OAuth credentials JSON file downloaded (credentials.json)

Step 1: Setup Google Drive API Authentication

```python
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
import os.path

SCOPES = ['https://www.googleapis.com/auth/drive']

def authenticate_drive():
    creds = None
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Cache the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return creds
```

Step 2: List Files in a Google Drive Folder

```python
from googleapiclient.discovery import build

def list_drive_files(service, folder_id):
    query = f"'{folder_id}' in parents and trashed=false"
    results = service.files().list(
        q=query,
        fields="files(id, name, modifiedTime, md5Checksum)"
    ).execute()
    return results.get('files', [])
```
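Note that this call returns at most one page of results (the Drive API's `pageSize` defaults to 100 and caps at 1000). For folders with many files, a paginated variant that follows `nextPageToken` might look like this (`list_drive_files_paged` is an illustrative name):

```python
def list_drive_files_paged(service, folder_id):
    """List every non-trashed file in a Drive folder, following nextPageToken."""
    query = f"'{folder_id}' in parents and trashed=false"
    files, page_token = [], None
    while True:
        response = service.files().list(
            q=query,
            fields="nextPageToken, files(id, name, modifiedTime, md5Checksum)",
            pageSize=100,
            pageToken=page_token,
        ).execute()
        files.extend(response.get('files', []))
        page_token = response.get('nextPageToken')
        if not page_token:  # no more pages
            return files
```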

Step 3: Sync Logic

```python
import os
import hashlib

from googleapiclient.http import MediaFileUpload

def md5sum(filename):
    hash_md5 = hashlib.md5()
    with open(filename, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

def sync_local_to_drive(service, local_folder, drive_folder_id):
    # List local files with metadata
    local_files = {}
    for root, dirs, files in os.walk(local_folder):
        for file in files:
            path = os.path.join(root, file)
            rel_path = os.path.relpath(path, local_folder).replace("\\", "/")
            local_files[rel_path] = {
                'path': path,
                'md5': md5sum(path),
                'mtime': os.path.getmtime(path)
            }

    # List Drive files metadata
    drive_files = list_drive_files(service, drive_folder_id)
    drive_files_map = {f['name']: f for f in drive_files}

    # Upload new files, or update files whose content changed
    for rel_path, info in local_files.items():
        drive_file = drive_files_map.get(rel_path)
        if drive_file:
            # Compare MD5 checksums; update only if the content differs
            if drive_file.get('md5Checksum') != info['md5']:
                print(f"Updating file on Drive: {rel_path}")
                media = MediaFileUpload(info['path'], resumable=True)
                service.files().update(fileId=drive_file['id'], media_body=media).execute()
        else:
            # Upload new file
            print(f"Uploading new file to Drive: {rel_path}")
            file_metadata = {
                'name': rel_path,
                'parents': [drive_folder_id]
            }
            media = MediaFileUpload(info['path'], resumable=True)
            service.files().create(body=file_metadata, media_body=media).execute()

    # Optionally delete Drive files that no longer exist locally
    for drive_file_name, drive_file in drive_files_map.items():
        if drive_file_name not in local_files:
            print(f"File on Drive not found locally, deleting: {drive_file_name}")
            service.files().delete(fileId=drive_file['id']).execute()
```

Step 4: Running the Sync

```python
def main():
    local_folder = '/path/to/local/folder'
    drive_folder_id = 'your-google-drive-folder-id'

    creds = authenticate_drive()
    service = build('drive', 'v3', credentials=creds)
    sync_local_to_drive(service, local_folder, drive_folder_id)

if __name__ == '__main__':
    main()
```

Notes

  • This example performs a one-way sync (local to cloud). To make it two-way, also download files that are missing locally or newer on Drive than the local copies.

  • For other cloud services like AWS S3, Dropbox, or OneDrive, you’d use their SDKs (boto3, dropbox, msal + requests respectively) but the logic remains similar.

  • Handling large files or many files may require pagination and chunked uploads.

  • Conflict resolution strategies should be implemented if syncing both ways.

  • MD5 hashes reliably detect content changes, but computing them for large files is costly; comparing modification times is cheaper and often sufficient.
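As a starting point for the download direction of a two-way sync, here is a hedged sketch (`download_missing` is an illustrative helper name, not part of the Drive SDK). It loads each file fully into memory via `get_media(...).execute()`, which suits smaller files; for large files, prefer chunked downloads with `MediaIoBaseDownload`:

```python
import os

def download_missing(service, drive_files, local_folder):
    """Fetch Drive files that do not yet exist locally (hypothetical helper).

    drive_files is the list returned by list_drive_files() from Step 2.
    """
    downloaded = []
    for f in drive_files:
        dest = os.path.join(local_folder, f['name'])
        if os.path.exists(dest):
            continue  # a full two-way sync would also compare md5Checksum here
        data = service.files().get_media(fileId=f['id']).execute()
        with open(dest, 'wb') as fh:
            fh.write(data)
        downloaded.append(f['name'])
    return downloaded
```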

