
How to copy an S3 folder with aws-sdk in Node.js

Written by Codemzy on February 9th, 2024

In this blog post, we will create a function to copy a folder or directory in AWS S3. We need to do a bit of extra work to copy objects because folders don't exist in S3 - even if it seems like they do!

This post contains affiliate links. If you use these links, I may earn a commission (at no cost to you). I only recommend products I use myself that solve a specific problem. In this post, you are recommended DigitalOcean Spaces, for S3-compatible storage with affordable and predictable pricing. Get started with a $200 credit when you use my referral.

Folders don't exist in S3, and yet I use folders in S3 all the time. And I know that doesn't make any sense - let me explain.

This is how I like to visualise my folders in S3:

// ❌
📁 projects/
├── 📁 project1/
│   ├── 📄 first-file.png
│   ├── 📄 second-file.jpg
│   └── 📄 third-file.pdf
└── 📁 project2/
    ├── 📄 first-file.pdf
    ├── 📄 second-file.jpg
    └── 📄 third-file.jpg

But S3 is object storage. It stores objects. And it doesn't store folders. Instead of folders, "projects" and "project1" are just prefixes in a flat namespace. Like this:

// ✅
📄 /projects/project1/first-file.png
📄 /projects/project1/second-file.jpg
📄 /projects/project1/third-file.pdf
📄 /projects/project2/first-file.pdf
📄 /projects/project2/second-file.jpg
📄 /projects/project2/third-file.jpg

And that makes folder-related actions, like deleting a folder or copying a folder, very tricky - because folders don't exist!

If folders don't exist, then how can you copy a folder in S3? Is it even possible?

Well, it is, and it isn't. Since folders don't really exist, there is no CopyFolder or CopyDirectory command in S3. We can't copy a folder to another folder, but we can copy the objects at one prefix and put the copies at another prefix.

*When I use the word folder in S3 going forward, what I really mean is the path before the filename - also known as the prefix.
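Since a "folder" is just a shared key prefix, "listing a folder" amounts to filtering keys by that prefix. Here's a quick sketch in plain JavaScript (the key names are just the examples from above):

```javascript
// All objects live in one flat namespace - a "folder" is just a shared key prefix
const keys = [
  "projects/project1/first-file.png",
  "projects/project1/second-file.jpg",
  "projects/project1/third-file.pdf",
  "projects/project2/first-file.pdf",
];

// "listing a folder" is really just filtering keys by prefix
function listFolder(allKeys, prefix) {
  return allKeys.filter((key) => key.startsWith(prefix));
}

console.log(listFolder(keys, "projects/project1/").length); // → 3
```

This is essentially what S3 does for you when you pass a `Prefix` to a list request, which is why folder-like behavior works even though no folder object exists.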

Node.js @aws-sdk setup

If you already have @aws-sdk installed and configured, you can skip to the next section. Let's start by installing the latest version (v3.x) of the AWS SDK.

npm install @aws-sdk/client-s3

Now we will configure it with an S3-compatible service.

I'm a big fan of (and currently use) DigitalOcean Spaces as my S3-compatible object storage provider. So my setup looks like this:

const { S3Client } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  endpoint: "https://ams3.digitaloceanspaces.com",
  forcePathStyle: false,
  region: "ams3",
  credentials: {
    accessKeyId: process.env.S3_KEY,
    secretAccessKey: process.env.S3_SECRET
  }
});

If you use AWS directly, the setup is similar, but it will look more like this:

const { S3Client } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  region:'eu-west-1',
  credentials: {
    accessKeyId: process.env.S3_KEY,
    secretAccessKey: process.env.S3_SECRET
  }
});

Don't forget to switch the region to wherever your buckets are located, and pass in your own credentials.

I use DigitalOcean Spaces for S3-compatible object storage because I like the predictable pricing. I'd recommend it to a friend - you can get a $200 credit to try out DigitalOcean Spaces here.

Ok, now we have S3 set up in Node.js, let's start coding a `copyFolder` function.

copyFolder function

function copyFolder() {
  // we will add the code here
};

Ok, you wanna copy a folder. But since folders don't exist, here's what we are going to do:

- Get a list of all the items at a specific prefix (a.k.a. the "folder")
- Copy each of those items to a new prefix (a.k.a. the new "folder")

So we will start with getting a list of all the items at the prefix - which is our `fromLocation`.

Our `copyFolder` function is going to need to take a few arguments:

- `fromBucket` - the bucket where we are copying the folder from
- `fromLocation` - the path to the folder/prefix
- `toBucket` - in case we want to copy the folder to a different bucket
- `toLocation` - the path for the new folder/prefix

Since we will be providing a few arguments, and one of them (`toBucket`) will be optional, I'm going to pass an object as a parameter to the `copyFolder` function so we don't have to worry about the order.

function copyFolder({ fromBucket, fromLocation, toBucket, toLocation }) {
  // we will add the code here
};

And we'll default `toBucket` to `fromBucket`, so if we are copying a folder within the same bucket, we only need to give it the `fromBucket` argument.

function copyFolder({ fromBucket, fromLocation, toBucket = fromBucket, toLocation }) {
  // we will add the code here
};

Now let's get a list of all the objects we need to copy.

List all the objects at the prefix (folder)

If you read my blog post on how to delete a folder in S3, we're going to start our `copyFolder` function in much the same way as the `deleteFolder()` function I created then.

We can use [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) (the latest version of ListObjects) to get a list of all the objects at the fromLocation (the objects we want to copy).

Since that's an asynchronous function, we will make the copyFolder function async so we can await the response when we send the command to our s3 client (that we set up earlier).

const { S3Client, ListObjectsV2Command } = require('@aws-sdk/client-s3');

// const s3 = new S3Client({ ...

async function copyFolder({ fromBucket, fromLocation, toBucket, toLocation }) {
  const listCommand = new ListObjectsV2Command({
    Bucket: fromBucket,
    Prefix: fromLocation,
  });
  let list = await s3.send(listCommand); // get the list
};

Now we have a list of the objects in the "folder" we want to copy. Let's copy them!

Copy all the objects at the prefix (folder)

To copy an object, we can use the CopyObjectCommand. It will look like this:

const copyCommand = new CopyObjectCommand({
  ACL: 'public-read', // access permissions
  Bucket: toBucket, // the new bucket (if supplied)
  CopySource: `${fromBucket}/${fromObjectKey}`, // the location of the file to be copied
  Key: toObjectKey // the new location
});
await s3.send(copyCommand);

We need to set the ACL even if it is the same as our copied objects, or it will be reset to "private".

> When you copy an object, the ACL metadata is not preserved and is set to private by default.
>
> - AWS CopyObjectCommand Input

We can get the Key (which is the full path to an object) for each object by looping through our list.

const fromObjectKeys = list.Contents.map(content => content.Key); // get the existing object keys

Then we can pass each Key to the CopyObjectCommand to tell it what to copy.

We will only get Contents in the response if some items exist in the folder, so we will wrap the CopyObjectCommand in an if statement, to only run if there are files to copy.

async function copyFolder({ fromBucket, fromLocation, toBucket, toLocation }) {
  // list the objects to copy
  const listCommand = new ListObjectsV2Command({
    Bucket: fromBucket,
    Prefix: fromLocation,
  });
  let list = await s3.send(listCommand); // get the list
  // copy the objects
  if (list.KeyCount) { // if items to copy
    const fromObjectKeys = list.Contents.map(content => content.Key); // get the existing object keys
    for (let fromObjectKey of fromObjectKeys) { // loop through items and copy each one
      const toObjectKey = fromObjectKey.replace(fromLocation, toLocation); // replace with the destination in the key
      // copy the file
      const copyCommand = new CopyObjectCommand({
        ACL: 'public-read',
        Bucket: toBucket,
        CopySource: `${fromBucket}/${fromObjectKey}`,
        Key: toObjectKey
      });
      await s3.send(copyCommand);
    }
    return `${fromObjectKeys.length} files copied.`;
  }
};
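The prefix swap that builds each destination key can be tried on its own. Here's a standalone sketch (`rewriteKey` is just my name for the one-liner used above):

```javascript
// Derive the destination key by swapping the source prefix for the destination prefix.
// String.replace only swaps the first occurrence, which is safe here because
// the prefix always sits at the start of the key.
function rewriteKey(fromObjectKey, fromLocation, toLocation) {
  return fromObjectKey.replace(fromLocation, toLocation);
}

console.log(rewriteKey("projects/project1/first-file.png", "projects/project1", "projects/project3"));
// → "projects/project3/first-file.png"
```

The filename part of the key is untouched - only the "folder" part of the path changes.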

Make it recursive

Our code works up to a point, but if there are over 1,000 items in your folder, you'll run into problems. And that's because ListObjectsV2 only returns up to 1,000 objects.

You might not expect over 1,000 objects in a folder, but it's better to be safe than sorry! Let's make our copyFolder function recursive just in case.

If ListObjectsV2 returns a NextContinuationToken, we know there are more objects to copy. We can wrap all of our code inside a recursiveCopy function, and call it again after we have copied the first 1,000 files, to fetch the next batch of keys.

async function copyFolder({ fromBucket, fromLocation, toBucket = fromBucket, toLocation }) {
  let count = 0;
  const recursiveCopy = async function(token) {
    const listCommand = new ListObjectsV2Command({
      Bucket: fromBucket,
      Prefix: fromLocation,
      ContinuationToken: token
    });
    let list = await s3.send(listCommand); // get the list
    if (list.KeyCount) { // if items to copy
      const fromObjectKeys = list.Contents.map(content => content.Key); // get the existing object keys
      for (let fromObjectKey of fromObjectKeys) { // loop through items and copy each one
        const toObjectKey = fromObjectKey.replace(fromLocation, toLocation); // replace with the destination in the key
        // copy the file
        const copyCommand = new CopyObjectCommand({
          ACL: 'public-read',
          Bucket: toBucket,
          CopySource: `${fromBucket}/${fromObjectKey}`,
          Key: toObjectKey
        });
        await s3.send(copyCommand);
        count += 1;
      }
    }
    if (list.NextContinuationToken) {
      return recursiveCopy(list.NextContinuationToken); // return, so we wait for every batch before reporting the count
    }
    return `${count} files copied.`;
  };
  return recursiveCopy();
};
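You can sanity-check the pagination logic without touching S3 by stubbing the client. This is a hypothetical stub I wrote for illustration - it is not part of the SDK - but it mimics how NextContinuationToken chains list responses together:

```javascript
// Stub "client" that serves pre-defined pages of keys, like paginated ListObjectsV2 responses
function makeStubClient(pages) {
  return {
    send: async (command) => {
      if (command.type !== "list") return {}; // pretend a copy succeeded
      const i = command.token ?? 0; // no token means first page
      return {
        KeyCount: pages[i].length,
        Contents: pages[i].map((Key) => ({ Key })),
        // hand back a token only while more pages remain
        NextContinuationToken: i + 1 < pages.length ? i + 1 : undefined,
      };
    },
  };
}

// Same recursive shape as copyFolder, minus the S3-specific command objects
async function copyAll(client) {
  let count = 0;
  const recurse = async (token) => {
    const list = await client.send({ type: "list", token });
    for (const { Key } of list.Contents) {
      await client.send({ type: "copy", Key });
      count += 1;
    }
    if (list.NextContinuationToken) {
      return recurse(list.NextContinuationToken); // keep going until no token is returned
    }
    return count;
  };
  return recurse();
}

const client = makeStubClient([["a", "b"], ["c"]]); // two "pages" of keys
copyAll(client).then((n) => console.log(`${n} files copied.`)); // → "3 files copied."
```

Note that the recursive call is returned, not just fired off - otherwise the function would resolve with the count from the first page only.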

I've added a count variable to keep track of the total number of files copied.

And that's it! You can now copy S3 folders from Node.js with the copyFolder function.

let copyResult = await copyFolder({ fromBucket: "my-bucket", fromLocation: "projects/project1", toLocation: "projects/project3" });
console.log(copyResult);
// 3 files copied.

Final Code

const { S3Client, ListObjectsV2Command, CopyObjectCommand } = require('@aws-sdk/client-s3');

// s3 client
const s3 = new S3Client({
  region: "your-region",
  credentials: {
    accessKeyId: process.env.S3_KEY,
    secretAccessKey: process.env.S3_SECRET
  }
});

// copies all items in a folder on s3
async function copyFolder({ fromBucket, fromLocation, toBucket = fromBucket, toLocation }) {
  let count = 0;
  const recursiveCopy = async function(token) {
    const listCommand = new ListObjectsV2Command({
      Bucket: fromBucket,
      Prefix: fromLocation,
      ContinuationToken: token
    });
    let list = await s3.send(listCommand); // get the list
    if (list.KeyCount) { // if items to copy
      const fromObjectKeys = list.Contents.map(content => content.Key); // get the existing object keys
      for (let fromObjectKey of fromObjectKeys) { // loop through items and copy each one
        const toObjectKey = fromObjectKey.replace(fromLocation, toLocation); // replace with the destination in the key
        // copy the file
        const copyCommand = new CopyObjectCommand({
          ACL: 'public-read',
          Bucket: toBucket,
          CopySource: `${fromBucket}/${fromObjectKey}`,
          Key: toObjectKey
        });
        await s3.send(copyCommand);
        count += 1;
      }
    }
    if (list.NextContinuationToken) {
      return recursiveCopy(list.NextContinuationToken); // return, so we wait for every batch before reporting the count
    }
    return `${count} files copied.`;
  };
  return recursiveCopy();
};