neurotic.datasets.gdrive

The neurotic.datasets.gdrive module implements a class for downloading files from Google Drive using paths, rather than file IDs or shareable links.

class neurotic.datasets.gdrive.GoogleDriveDownloader(client_secret_file, tokens_file=None, save_tokens=False)

A class for downloading files from Google Drive using paths.

Files can be specified for download using URL-like paths of the form

gdrive://<drive name>/<folder 1>/<…>/<folder N>/<file name>

The “<drive name>” may be “My Drive” for files located in a personal Google Drive, or it may be the name of a Shared Drive that the user has permission to access.

Note that these URL-like paths are not equivalent to ordinary URLs associated with Google Drive files, such as shareable links, which are composed of pseudorandom file IDs and do not reveal anything about the name of the file or the folders containing it.
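To make the path format concrete, the following sketch splits a gdrive:// path into its drive name, folder names, and file name. The helper `split_gdrive_path` is hypothetical and is not part of neurotic; it also assumes that no drive, folder, or file name contains a "/" character. It illustrates the format only, not how the class parses paths internally.

```python
from typing import List, Tuple

GDRIVE_SCHEME = "gdrive://"


def split_gdrive_path(gdrive_url: str) -> Tuple[str, List[str], str]:
    """Split a gdrive:// path into (drive name, folder names, file name).

    Hypothetical helper for illustration; assumes names contain no "/".
    """
    if not gdrive_url.startswith(GDRIVE_SCHEME):
        raise ValueError(f"expected a path beginning with {GDRIVE_SCHEME!r}")

    parts = gdrive_url[len(GDRIVE_SCHEME):].split("/")

    # At minimum there must be a drive name and a file name,
    # and no component may be empty.
    if len(parts) < 2 or any(part == "" for part in parts):
        raise ValueError(f"malformed gdrive path: {gdrive_url!r}")

    drive_name, *folders, file_name = parts
    return drive_name, folders, file_name


# Example path (the folder and file names are made up).
drive, folders, name = split_gdrive_path(
    "gdrive://My Drive/data/2020-01-01/signals.csv"
)
```

Here `drive` holds the drive name, `folders` the intermediate folder names in order, and `name` the file name.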

This class can only download files that are uniquely identifiable by their paths. Google Drive does not require file or folder names to be unique, so two or more files or folders with identical names may coexist in a folder. Such files and folders cannot be distinguished by their paths, so they cannot be downloaded using this class. A download will fail while traversing the file tree if at any step there is more than one folder or file that matches the path.
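The uniqueness requirement can be illustrated with a small in-memory model of a file tree. This is a sketch under stated assumptions, not the class's actual implementation: `Node` and `resolve_unique` are hypothetical names, the tree is a local stand-in for Google Drive, and any two children sharing a name are treated as ambiguous regardless of whether they are files or folders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A file or folder in a toy file tree; duplicate names are allowed."""
    name: str
    is_folder: bool = False
    children: List["Node"] = field(default_factory=list)


def resolve_unique(root: Node, names: List[str]) -> Node:
    """Walk down from root one name at a time, requiring a unique match."""
    current = root
    for name in names:
        matches = [child for child in current.children if child.name == name]
        if not matches:
            raise FileNotFoundError(f"{name!r} not found in {current.name!r}")
        if len(matches) > 1:
            # Identical names cannot be told apart by path alone.
            raise LookupError(
                f"{len(matches)} items named {name!r} in {current.name!r}"
            )
        current = matches[0]
    return current


# Toy tree: "dup.csv" appears twice in the same folder.
root = Node("My Drive", True, [
    Node("data", True, [Node("a.csv"), Node("dup.csv"), Node("dup.csv")]),
])

found = resolve_unique(root, ["data", "a.csv"])  # unique, so this succeeds
```

Resolving `["data", "dup.csv"]` in this tree raises `LookupError`, mirroring the failure described above for paths that match more than one item.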

This class manages access authorization, optionally saving authorization tokens to a file so that the authorization flow does not need to be repeated in the future.

The client_secret_file should be the path to a client secret file in JSON format, obtained from the Google API Console. The Drive API must be enabled for the corresponding client.

If save_tokens=False, the authorization flow (a request via web browser for permission to access Google Drive) will always run the first time a new instance of this class is used, and authorization will not persist after the instance is destroyed. If save_tokens=True and a file path is provided with tokens_file, access/refresh tokens resulting from a successful authorization are stored in the file, and tokens are loaded from the file in the future, so that the authorization flow does not need to be repeated.
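A minimal construction sketch, using only the constructor signature documented above. The wrapper `make_downloader` is hypothetical, and the choice to enable `save_tokens` whenever a `tokens_file` is given is a convenience of this sketch rather than library behavior.

```python
def make_downloader(client_secret_file, tokens_file=None):
    """Create a GoogleDriveDownloader, persisting tokens if a file is given.

    Hypothetical convenience wrapper, not part of neurotic.
    """
    # Imported lazily so this sketch can be loaded without neurotic installed.
    from neurotic.datasets.gdrive import GoogleDriveDownloader

    return GoogleDriveDownloader(
        client_secret_file=client_secret_file,
        tokens_file=tokens_file,
        # Persist tokens only when there is somewhere to store them.
        save_tokens=tokens_file is not None,
    )
```

For example, `make_downloader("client_secret.json", "tokens.json")` would create an instance that stores its tokens in `tokens.json`; both file names are placeholders.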

GetSharedDrivesList()

Return information about available Shared Drives.

GetUserEmail()

Get the email address for the authorized Google Drive account.
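The two informational methods above might be combined as follows to check which account and Shared Drives an instance can see. The helper `describe_account` is hypothetical. Because the structure of the value returned by GetSharedDrivesList() is not specified here, the sketch passes it through unchanged.

```python
def describe_account(downloader):
    """Collect basic account information from a GoogleDriveDownloader.

    Hypothetical helper; `downloader` is any object that provides
    GetUserEmail() and GetSharedDrivesList().
    """
    return {
        "email": downloader.GetUserEmail(),
        # Returned as-is: its exact structure is not documented here.
        "shared_drives": downloader.GetSharedDrivesList(),
    }
```

Calling `describe_account(downloader)` before a batch of downloads is one way to confirm that the intended account was authorized.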

authorize()

Obtain tokens for reading the contents of a Google Drive account.

If save_tokens=True, tokens will be loaded from the tokens_file if possible. If tokens cannot be restored this way, or if the loaded tokens have expired, an authorization flow will be initiated, prompting the user through a web browser to grant read-only privileges to the client associated with the client_secret_file. When the authorization flow completes, if save_tokens=True, the newly created tokens will be stored in the tokens_file for future use.

Authorization is performed automatically when needed, but this method can be called directly to retrieve (and possibly store) tokens without initiating a download.

deauthorize()

Forget tokens and delete the tokens_file. The authorization flow will be required for the next download.
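The authorization methods can be combined into a simple lifecycle, sketched below. The helpers `ensure_authorized` and `reset_authorization` are hypothetical, and the sketch assumes that is_authorized() returns a value that is truthy when tokens are available.

```python
def ensure_authorized(downloader):
    """Obtain tokens up front, without starting a download.

    Returns True if authorize() had to be called, False otherwise.
    Assumes is_authorized() returns a truthy value when authorized.
    """
    if downloader.is_authorized():
        return False
    downloader.authorize()
    return True


def reset_authorization(downloader):
    """Discard tokens so that the next download needs a fresh authorization."""
    downloader.deauthorize()
```

Calling `ensure_authorized` at program start moves any browser prompt to a predictable moment instead of the first download.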

download(gdrive_url, local_file, overwrite_existing=False, show_progress=True, bytes_per_chunk=5242880)

Download a file from Google Drive using a URL-like path beginning with “gdrive://”.
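A hedged usage sketch for download(), based only on the signature above. The wrapper `fetch` and its `chunk_mib` parameter are hypothetical. The default `bytes_per_chunk` of 5242880 equals 5 × 1024 × 1024 bytes, or 5 MiB; judging by its name, it sets the size of each downloaded chunk, which is an assumption of this sketch.

```python
MIB = 1024 * 1024
DEFAULT_BYTES_PER_CHUNK = 5 * MIB  # 5242880, the default in the signature


def fetch(downloader, gdrive_url, local_file, chunk_mib=5):
    """Download one file via a GoogleDriveDownloader-like object.

    Hypothetical wrapper; chunk_mib expresses the chunk size in MiB.
    """
    # Reject ordinary URLs early: only gdrive:// paths are accepted.
    if not gdrive_url.startswith("gdrive://"):
        raise ValueError(f"not a gdrive:// path: {gdrive_url!r}")

    downloader.download(
        gdrive_url,
        local_file,
        overwrite_existing=False,
        show_progress=True,
        bytes_per_chunk=chunk_mib * MIB,
    )
    return local_file
```

A call such as `fetch(downloader, "gdrive://My Drive/data/signals.csv", "signals.csv")` would request the file at that path; the path and local file name are made up for illustration.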

is_authorized()

Get the current authorization state.