Showkath Ali Naseem

How to migrate SAP BTP – Amazon Web Services (AWS) S3 bucket files from one system to another

SAP BTP Object Store is a cloud storage service for storing and managing blobs/objects on SAP BTP, covering creation, upload, download, and deletion of objects. The service is specific to the underlying IaaS layer, such as Azure Blob Storage, Amazon Web Services (AWS), and Google Cloud Platform.

In this blog post I would like to focus on the AWS S3 bucket.

In AWS S3, a bucket is a container for objects. An object is a file together with any metadata that describes that file.

To store an object in Amazon S3, you create a bucket and then upload the object to the bucket. When the object is in the bucket, you can open it, download it, and move it. When you no longer need an object or a bucket, you can clean up your resources.

I assume you have already configured an Object Store instance in your BTP CF space to use the AWS storage service.

Option 1: Create the Object Store instance in your BTP CF space using either the cockpit or the CLI.

Configure Object Store to use Amazon Simple Storage Service

Option 2: In the MTA YAML (mta.yaml), add the Object Store service as a resource dependency. When the application is deployed, the AWS Object Store service instance is created automatically.
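For reference, such a resource entry in mta.yaml could look roughly like the following (a minimal sketch; the instance name is a placeholder, and the service plan assumes the s3-standard plan used on AWS landscapes):

resources:
  - name: my-objectstore                 # placeholder service instance name
    type: org.cloudfoundry.managed-service
    parameters:
      service: objectstore               # SAP BTP Object Store service
      service-plan: s3-standard          # plan for AWS; adjust to your landscape

The application module then lists my-objectstore under its requires section so that the credentials are injected at deploy time.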

In this blog post I would like to share how to migrate AWS S3 bucket files from one system to another.

Documents are hosted within an S3 bucket on AWS. To migrate objects from one S3 bucket to another S3 bucket or to other storage, you can use one of the following options:

1) Using the AWS Command Line Interface (AWS CLI), available for Mac, Linux, and Windows

2) Programmatically

3) Migration software

 

#1) Option 1:  AWS CLI

You can read about what the AWS CLI is here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html

Steps

  1. Install the AWS CLI [Instructions here]
  2. Verify the installation by executing the command below from a command prompt/terminal:
  • aws --version

3. Configure the AWS CLI:

  • aws configure

If you are using SAP BTP CF to create the AWS S3 object store, you can get these credentials from:
BTP cockpit > BTP subaccount > Space > Instances > ObjectStore > Credentials
You can also generate a service key.

Enter your access keys (access key ID and secret access key).

Press Enter to skip the default Region and default output options.

For more information about Amazon S3 Region parameters, see AWS service endpoints.

"credentials": {
    "access_key_id": "<Access Key ID of the AWS IAM technical user created>",
    "bucket": "<Name of the AWS S3 bucket provisioned>",
    "host": "<region_specific_s3_endpoint>",
    "region": "<Region in which the AWS S3 bucket is provisioned>",
    "secret_access_key": "<Secret Key ID of the AWS IAM technical user created>",
    "uri": "s3://<some_access_key_id>:<some_secret_access_key>@<region_specific_s3_endpoint>/<some_bucket_name>",
    "username": "<AWS IAM user name>"
}
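For illustration, the service-key values map to the aws configure prompts roughly like this (all values are placeholders taken from the JSON above):

C:\Yourfolder>aws configure
AWS Access Key ID [None]: <access_key_id from the service key>
AWS Secret Access Key [None]: <secret_access_key from the service key>
Default region name [None]: <region from the service key, or press Enter to skip>
Default output format [None]: <press Enter to skip>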

 

  4. Download files from the S3 bucket to a local folder on your machine:
  • C:\Yourfolder>aws s3 sync s3://EXAMPLE-BUCKET-SOURCE/ .
  5. Upload from the local directory to the BTP AWS S3 bucket:

#1) Once you have downloaded the files to your local machine, you can copy/migrate them to any system as per your requirement; you can also write a utility program for this.

#2) If you need to upload or migrate files to another BTP space's or subaccount's AWS S3 bucket, configure the other bucket's details just like in step 3, in a new command prompt (i.e., open a new session from the command prompt):

  • C:\Yourfolder>aws configure
  • C:\Yourfolder>aws s3 sync . s3://EXAMPLE-BUCKET-TARGET

Here the source can be a bucket name or a local folder (.).

Note: You can also copy the objects directly between the source and target buckets by running the following sync command using the AWS CLI:

aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET

Note: Update the sync command to include your source and target bucket names.

The sync command uses the CopyObject APIs to copy objects between S3 buckets.

6. To list the files (display file details in the CLI), use the command below:

 

  • aws s3 ls s3://your-bucket-id

 


 

Advanced Concepts

 

#1) How can I improve the transfer performance of the sync command for Amazon S3?

    Multi-threading: You can run multiple parallel instances of aws s3 cp, aws s3 mv, or aws s3 sync using the AWS CLI.

    Please follow https://aws.amazon.com/premiumsupport/knowledge-center/s3-improve-transfer-sync-command/
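For example, one sync can be split into several parallel runs by filtering on key prefixes with --exclude and --include (the prefixes below are only an illustration; pick prefixes that match your object names):

aws s3 sync s3://EXAMPLE-BUCKET-SOURCE s3://EXAMPLE-BUCKET-TARGET --exclude "*" --include "a*"
aws s3 sync s3://EXAMPLE-BUCKET-SOURCE s3://EXAMPLE-BUCKET-TARGET --exclude "*" --include "b*"

Each command then handles a disjoint set of keys and can run in its own terminal session at the same time.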

 

#2) AWS CLI configurations to speed up the data transfer:

Best practices and guidelines for setting additional configuration values for aws s3 transfer commands

https://awscli.amazonaws.com/v2/documentation/api/latest/topic/s3-config.html

multipart_chunksize: This value sets the size of each part that the AWS CLI uploads in a multipart upload for an individual file. This setting allows you to break down a larger file (for example, 300 MB) into smaller parts for quicker upload speeds.

Note: A multipart upload requires that a single file is uploaded in not more than 10,000 distinct parts. You must be sure that the chunksize that you set balances the part file size and the number of parts.

max_concurrent_requests: This value sets the number of requests that can be sent to Amazon S3 at a time. The default value is 10. You can increase it to a higher value depending on the resources of your machine. You must be sure that your machine has enough resources to support the maximum number of concurrent requests that you want.
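Both values can be set through the AWS CLI configuration, for example (the numbers below are only an illustration and should be tuned to your machine and network):

aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 16MB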

 

More details : https://aws.amazon.com/premiumsupport/knowledge-center/move-objects-s3-bucket/

 

Limitations:

Using the aws s3 ls or aws s3 sync commands (Option # AWS CLI) on large buckets (with 10 million objects or more) can be expensive, resulting in a timeout. If you encounter timeouts because of a large bucket, then consider using Amazon CloudWatch metrics to calculate the size and number of objects in a bucket. Also, consider using S3 Batch Operations to copy the objects.
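As a sketch, the object count of a bucket can be read from the S3 NumberOfObjects metric in CloudWatch, for example (bucket name and time range are placeholders; the metric is reported once per day):

aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name NumberOfObjects --dimensions Name=BucketName,Value=EXAMPLE-BUCKET-SOURCE Name=StorageType,Value=AllStorageTypes --start-time 2023-01-01T00:00:00Z --end-time 2023-01-02T00:00:00Z --period 86400 --statistics Average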

 

#2) Option 2: Programmatically

 

This option is suitable if you have many files in your S3 bucket (more than 10 million objects); in that case, consider using S3 Batch Operations. A custom application might be more efficient at performing a transfer at the scale of hundreds of millions of objects.

You can use S3 Batch Operations to automate the copy process. You can use the following Java sample code snippets as a reference to download an object from an S3 bucket:

https://help.sap.com/viewer/2ee77ef7ea4648f9ab2c54ee3aef0a29/Cloud/en-US/32517ae707c44ad48f635ea6bcbe271a.html

https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObject.html

 

For more information, see the documentation on the Amazon Web Services Web site.

You can choose the AWS SDK for the programming language you are comfortable with, connect to the S3 bucket, and write code to download files from the source S3 bucket and copy them to the destination as per your requirement.
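As a rough sketch (not the exact code from the links above), the following snippet uses the AWS SDK for Java v2 to list all objects in a source bucket, download each one, and upload it to a target bucket. All credentials, regions, and bucket names are placeholders that would come from the respective Object Store service keys, and each object is held in memory, so this is only suitable for smaller files:

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Response;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class S3BucketMigration {

    // Build an S3 client from the credentials found in an Object Store service key
    private static S3Client buildClient(String region, String accessKeyId, String secretAccessKey) {
        return S3Client.builder()
                .region(Region.of(region))
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create(accessKeyId, secretAccessKey)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values: replace with the service-key credentials of each Object Store instance
        S3Client source = buildClient("<source-region>", "<source-access-key>", "<source-secret-key>");
        S3Client target = buildClient("<target-region>", "<target-access-key>", "<target-secret-key>");
        String sourceBucket = "<source-bucket>";
        String targetBucket = "<target-bucket>";

        ListObjectsV2Request listRequest = ListObjectsV2Request.builder().bucket(sourceBucket).build();
        ListObjectsV2Response page;
        do {
            page = source.listObjectsV2(listRequest);
            for (S3Object object : page.contents()) {
                // Download the object from the source bucket into memory ...
                ResponseBytes<GetObjectResponse> bytes = source.getObjectAsBytes(
                        GetObjectRequest.builder().bucket(sourceBucket).key(object.key()).build());
                // ... and upload it to the target bucket under the same key
                target.putObject(
                        PutObjectRequest.builder().bucket(targetBucket).key(object.key()).build(),
                        RequestBody.fromBytes(bytes.asByteArray()));
            }
            // Follow pagination until all objects in the source bucket have been processed
            listRequest = listRequest.toBuilder().continuationToken(page.nextContinuationToken()).build();
        } while (Boolean.TRUE.equals(page.isTruncated()));
    }
}

For very large objects you would stream to a temporary file or use multipart upload instead of holding everything in memory, and for buckets with millions of objects S3 Batch Operations (mentioned above) is the better fit.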

 

 

Recommendations from AWS are below.

Source: https://aws.amazon.com/premiumsupport/knowledge-center/move-objects-s3-bucket/

 

#3) Option 3:  Migration Software

FYI: As mentioned in the documentation, you can try another option, "Use S3DistCp with Amazon EMR".

I have not tried this option due to the additional cost. Be sure to review Amazon EMR pricing.

 

      2 Comments

      Lakshmi Sankaran

      Hello All,

      I was using Option 1: AWS CLI

      But I had 2 issues:

      1. Source sync to the local machine didn't bring in all the files. I confirmed this by checking the properties of the folder on the local machine.
        • How to solve?
          • As per https://github.com/aws/aws-cli/issues/3273, the sync command in option one didn't work without --exact-timestamps. Please use --exact-timestamps to get the exact files out of the source!
      2. Destination sync didn't have access to a few files because the source files had a public ACL.
        • How to solve?
          • Use --acl public-read-write on the destination if you find that some objects are not accessible in the destination while the source had them with a public ACL.

       

      Regards

      Lakshmi

       

       

       

       

      Showkath Ali Naseem
      Blog Post Author

      Hi,

       

      #1) Regarding "AWS S3 sync does not sync all files":

      I have not tried all the parameters.

      • Did you get a chance to refer to the AWS documentation?

      https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

      --size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.

      --exact-timestamps (boolean) When syncing from S3 to local, same-sized items will be ignored only when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.

       

      • Someone posted a workaround for the same problem in this comment:

      https://github.com/aws/aws-cli/issues/3273#issuecomment-477015262

       

      • https://stackoverflow.com/questions/49228740/syncing-files-with-aws-s3-sync-that-have-a-minimum-timestamp

        If you don't need all the files locally, you could delete them after some time (48 hours?). This means fewer files will need to be compared. By default, aws s3 sync will not delete destination files that do not match a local file (but this can be configured via a flag).
        You could copy recent files (past 24 hours?) into a different directory and run aws s3 sync from that directory. Then, clear out those files after a successful sync run.
        If you have flexibility over the filenames, you could include the date in the filename (e.g. 2018-03-13-foo.txt) and then use --include and --exclude parameters to only copy files with the desired prefixes.
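        For example, with date-prefixed filenames the sync could be restricted like this (bucket name and date are placeholders):

        aws s3 sync s3://EXAMPLE-BUCKET-SOURCE . --exclude "*" --include "2018-03-13*"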

       

      #2) Regarding the access issue: I have not tried this scenario in this blog post.

       

      My input is: did you explore

       

      https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html

      or try other parameters, for example:

      --acl bucket-owner-full-control
      
      Best Wishes,
      Showkath.