Using Rclone for SharePoint Shared Files
Rclone is a powerful, storage-agnostic tool for transferring files across the web. You can copy files to and from dozens of storage providers, including S3, FTP, and OneDrive, and yes, you can copy files from SharePoint sites using WebDAV.
How To Use Rclone?
This multi-threaded program manages data in cloud storage. Beyond plain transfers, it can sync directories, mount remotes as local file systems, and wrap other storage in virtual remotes such as crypt, cache, compress, and union.
However, if you want to automate downloading files "Shared" with a SharePoint user on a SharePoint site, Rclone has a known issue with both the onedrive and the webdav configurations.
The problem is that our user doesn't own the files shared with them, so Rclone does not show them when you list files.
Let's assume we were given the following SharePoint information:
url = https://myacct.sharepoint.com/sites/MYSITE
user = me@mycompany.com
pass = somepassword
As mentioned, Rclone cannot see files that have been shared with this user, so the workaround is to use the Microsoft Graph API to list the files and then configure an Rclone webdav remote with the information it returns.
To accomplish this, we use Microsoft Graph Explorer. Open the explorer in your browser, and then sign in as the SharePoint user.
Once logged in, scroll down. On the left, under "OneDrive", click "files shared with me" to see a list of files shared with this user.
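If you would rather script this step than click through Graph Explorer, the "files shared with me" query corresponds to the Graph endpoint GET /me/drive/sharedWithMe. The sketch below is a minimal illustration, assuming you already have a delegated access token for the SharePoint user (for example, the one Graph Explorer displays for the signed-in account). The helper names are hypothetical, token handling is out of scope, and only the first page of results is read.

```python
import json
import urllib.request

# Graph endpoint behind Graph Explorer's "files shared with me" query.
SHARED_WITH_ME_URL = "https://graph.microsoft.com/v1.0/me/drive/sharedWithMe"


def fetch_shared_with_me(access_token: str) -> dict:
    """Call Microsoft Graph as the signed-in SharePoint user (token is an assumption)."""
    request = urllib.request.Request(
        SHARED_WITH_ME_URL,
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # first page only; paging links are ignored


def webdav_urls(response: dict) -> list:
    """Return remoteItem.webDavUrl for every entry in the response's "value" list."""
    urls = []
    for item in response.get("value", []):
        url = item.get("remoteItem", {}).get("webDavUrl")
        if url:  # skip entries that carry no WebDAV URL
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Hypothetical, trimmed response used for illustration only.
    sample = {
        "value": [
            {
                "name": "filename.xlsx",
                "remoteItem": {
                    "webDavUrl": "https://myacct.sharepoint.com/personal/"
                    "useremail_domain_suffix/Documents/Path/To/"
                    "Shared%20Folder/filename.xlsx"
                },
            }
        ]
    }
    for url in webdav_urls(sample):
        print(url)
```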
The information we need to configure Rclone is the webDavUrl value under the "remoteItem" key. It should be something like:
https://myacct.sharepoint.com/personal/useremail_domain_suffix/Documents/Path/To/Shared%20Folder/filename.xlsx
The myacct and useremail_domain_suffix parts will change depending on your account name and the user who shared the file. NOTE: This URL is encoded (a space appears as %20, for example), and you will need to decode it before it can be used with Rclone and Zuar Runner.
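The decoding and trimming can be done with Python's standard library. This is a sketch that assumes the shared file lives in the sharer's "Documents" library, as in the example URL; split_webdav_url is a hypothetical helper. It returns the base URL for the Rclone remote and the decoded path to use after the remote name.

```python
from urllib.parse import unquote


def split_webdav_url(web_dav_url: str, marker: str = "/Documents/") -> tuple:
    """Decode a webDavUrl and split it into (remote base URL, path inside the remote).

    Assumes the library segment is "/Documents/"; pass a different marker if yours differs.
    """
    decoded = unquote(web_dav_url)  # percent-decoding, e.g. "%20" -> " "
    index = decoded.find(marker)
    if index == -1:
        raise ValueError(f"{marker!r} not found in {decoded!r}")
    cut = index + len(marker)
    return decoded[:cut], decoded[cut:]


if __name__ == "__main__":
    base, path = split_webdav_url(
        "https://myacct.sharepoint.com/personal/useremail_domain_suffix/"
        "Documents/Path/To/Shared%20Folder/filename.xlsx"
    )
    print(base)  # https://myacct.sharepoint.com/personal/useremail_domain_suffix/Documents/
    print(path)  # Path/To/Shared Folder/filename.xlsx
```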
Configure Rclone
With the information above, you're ready to configure Rclone to copy the file. If you don't have Rclone installed locally, install it first. Then open a terminal and type:
rclone config
Follow the prompts to create a new remote of type "WebDAV" and give it a name; this guide uses sharepoint-webdav. For the URL, use the decoded webDavUrl you got in the previous step, excluding everything after Documents/. In the example above you would use:
https://myacct.sharepoint.com/personal/useremail_domain_suffix/Documents/
Select "SharePoint" as the vendor, then enter the username and password for your SharePoint user. Leave "Bearer Token" blank.
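For reference, the remote that rclone config creates ends up as a short section in rclone.conf. The sketch below renders such a section, assuming the key names used by rclone's WebDAV backend (type, url, vendor, user, pass). webdav_remote_section is a hypothetical helper, and the password must already be obscured with rclone obscure, because rclone does not store the plain-text password in this field.

```python
def webdav_remote_section(name: str, url: str, user: str, obscured_pass: str) -> str:
    """Render an rclone.conf section for a SharePoint WebDAV remote.

    obscured_pass is assumed to be the output of `rclone obscure <password>`;
    bearer_token is omitted, matching the blank answer in rclone config.
    """
    lines = [
        f"[{name}]",
        "type = webdav",
        f"url = {url}",
        "vendor = sharepoint",
        f"user = {user}",
        f"pass = {obscured_pass}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(webdav_remote_section(
        "sharepoint-webdav",
        "https://myacct.sharepoint.com/personal/useremail_domain_suffix/Documents/",
        "me@mycompany.com",
        "OBSCURED_PASSWORD_PLACEHOLDER",  # hypothetical placeholder
    ))
```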
Using Rclone with Zuar Runner
If you're a Zuar Runner user (highly recommended!), you can test your Rclone config locally before you upload it. In a terminal, type:
rclone lsl sharepoint-webdav:
Here sharepoint-webdav is the name of the remote in the config. In this case it will not list any files, but if it runs without an error, Rclone is configured properly.
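If you prefer to run that check from a script, a small wrapper around the same rclone lsl call works. This is a sketch; lsl_command and remote_ok are hypothetical names, and the optional --config argument lets you point at a config file other than the default one.

```python
import shutil
import subprocess
from typing import List, Optional


def lsl_command(remote: str, config_path: Optional[str] = None) -> List[str]:
    """Build the `rclone lsl <remote>:` command, optionally with an explicit --config."""
    cmd = ["rclone", "lsl", f"{remote}:"]
    if config_path:
        cmd += ["--config", config_path]
    return cmd


def remote_ok(remote: str, config_path: Optional[str] = None) -> bool:
    """Return True if rclone exits with status 0; an empty listing still counts as OK."""
    if shutil.which("rclone") is None:
        raise FileNotFoundError("rclone is not on PATH")
    result = subprocess.run(lsl_command(remote, config_path), capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr)  # show rclone's own error message
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(lsl_command("sharepoint-webdav")))  # rclone lsl sharepoint-webdav:
```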
If you already have Rclone installed and configured on your Runner instance, download the existing config file to your local machine and append the new Rclone remote to the end of it. Otherwise, just copy the new remote into a new file.
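To view the new remote in your local config file, run the following from your home directory (rclone config file prints the config path if yours is elsewhere):
cat .config/rclone/rclone.conf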
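Appending the new remote by hand works fine; if you script it, it is worth refusing to add a remote whose name already exists. A sketch, assuming the downloaded config is an unencrypted rclone.conf in INI format (append_remote is a hypothetical helper). configparser is used only to read section names, and the file is extended as plain text so the existing remotes are left untouched.

```python
import configparser
import tempfile
from pathlib import Path


def append_remote(config_path: Path, section_text: str) -> None:
    """Append a remote section to rclone.conf, refusing to duplicate a remote name."""
    new = configparser.ConfigParser(interpolation=None)
    new.read_string(section_text)
    current = config_path.read_text() if config_path.exists() else ""
    existing = configparser.ConfigParser(interpolation=None)
    existing.read_string(current)
    clashes = set(new.sections()) & set(existing.sections())
    if clashes:
        raise ValueError(f"remote already defined: {sorted(clashes)}")
    if current and not current.endswith("\n"):
        current += "\n"
    separator = "\n" if current else ""  # blank line between sections
    config_path.write_text(current + separator + section_text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "rclone.conf"
        conf.write_text("[existing]\ntype = local\n")
        append_remote(conf, "[sharepoint-webdav]\ntype = webdav\n")
        print(conf.read_text())
```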
With the new remote in the config file, upload the file back to your Runner instance. The job below reads it from /var/runner/data/rclone.conf.
Create a cmd job
Finally, we'll create the job. In your Runner UI, at the bottom left, click "Add Job" and then choose "Generic." Give the job a name like "[cmd] copy shared file", and enter the following JSON:
{
"cmd": "rclone copy sharepoint-webdav:Path/To/Shared\\ Folder/filename.xlsx /var/runner/data/ --config /var/runner/data/rclone.conf",
"shell": true
}
NOTE: After creating the job, click the little pencil next to "job type" and change it from io to cmd.
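In the cmd, sharepoint-webdav is the name of the Rclone remote (inside [] in the config file), and Path/To/Shared\\ Folder/filename.xlsx is the path to the file below the remote's Documents/ URL.
NOTE: The path to the file has been decoded, and the space is escaped twice: the backslash escapes the space for the shell, and that backslash is itself doubled because backslashes must be escaped inside a JSON string.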
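The two layers of escaping are easiest to see by building the job body programmatically. A sketch, where runner_cmd_job is a hypothetical helper: it escapes only spaces, mirroring the example, and lets json.dumps add the JSON-level escaping. For paths with other shell-special characters, shlex.quote would be the more general choice, though it produces a single-quoted command rather than the backslash form used above.

```python
import json


def runner_cmd_job(remote: str, remote_path: str,
                   dest: str = "/var/runner/data/",
                   config_path: str = "/var/runner/data/rclone.conf") -> str:
    """Build the cmd job JSON for copying one shared file with rclone."""
    # Layer 1: escape spaces for the shell, e.g. "Shared Folder" -> "Shared\ Folder".
    shell_path = remote_path.replace(" ", "\\ ")
    cmd = f"rclone copy {remote}:{shell_path} {dest} --config {config_path}"
    # Layer 2: json.dumps writes each backslash as "\\" inside the JSON string.
    return json.dumps({"cmd": cmd, "shell": True}, indent=1)


if __name__ == "__main__":
    print(runner_cmd_job("sharepoint-webdav", "Path/To/Shared Folder/filename.xlsx"))
```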