How to migrate SAP BTP – Amazon Web Services (AWS) S3 bucket files from one system to another
SAP BTP Object Store is a cloud storage service for storing and managing blobs/objects on SAP BTP, which involves their creation, upload, download, and deletion. The service is backed by the underlying IaaS layer, such as Azure Blob Storage, Amazon Web Services (AWS), or Google Cloud Platform.
In this blog post I would like to focus on the AWS S3 bucket.
In AWS S3, a bucket is a container for objects. An object is a file plus any metadata that describes that file.
To store an object in Amazon S3, you create a bucket and then upload the object to the bucket. When the object is in the bucket, you can open it, download it, and move it. When you no longer need an object or a bucket, you can clean up your resources.
I assume you have already configured an Object Store instance in your BTP CF space to use the AWS storage service:
Option 1: Create an Object Store instance in your BTP CF space using either the cockpit or the CLI, and configure it to use Amazon Simple Storage Service (S3).
Option 2: In the MTA YAML, add the Object Store resource as a dependency. When the application is deployed, the AWS Object Store service instance is created automatically.
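As a rough sketch, such an mta.yaml resource might look as follows (the module name, instance name, and service plan are illustrative assumptions — check the plans actually available in your subaccount):

```yaml
# Sketch of an mta.yaml fragment; names and plan are illustrative, not from this post
resources:
  - name: my-objectstore                 # assumed service instance name
    type: org.cloudfoundry.managed-service
    parameters:
      service: objectstore               # SAP BTP Object Store service
      service-plan: s3-standard          # AWS S3 plan; verify in your subaccount

modules:
  - name: my-app                         # assumed application module
    type: nodejs
    path: app
    requires:
      - name: my-objectstore             # binds the instance on deployment
```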
In this blog post I would like to share how to migrate AWS S3 bucket files from one system to another.
Documents are hosted within an S3 bucket on AWS. To migrate objects from one S3 bucket to another S3 bucket or to other storage, you have these options:
1) Using the AWS Command Line Interface (AWS CLI), available for Mac/Linux/Windows
2) Programmatically, via the AWS SDKs
3) Migration software
#1) Option 1: AWS CLI
You can read about what the AWS CLI is here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html
1. Install the AWS CLI [Instructions here]
2. Verify the installation by executing the below command from a command prompt/terminal:
- aws --version
3. Configure the AWS CLI:
- aws configure
In case you are using SAP BTP CF to create the AWS S3 object store, you can get these credentials from:
BTP cockpit > BTP subaccount > Space > Instances > ObjectStore > Credentials
There you can generate your service key.
Enter your access keys (access key ID and secret access key).
Press Enter to skip the default Region and default output options.
For more information about Amazon S3 Region parameters, see AWS service endpoints.
"access_key_id": "<Access Key ID of the AWS IAM technical user created>",
"bucket": "<Name of the AWS S3 bucket provisioned>",
"region": "<Region in which the AWS S3 bucket is provisioned>",
"secret_access_key": "<Secret Key ID of the AWS IAM technical user created>",
"username": "<AWS IAM user name>"
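Instead of the interactive aws configure prompts, the service-key fields can also be set non-interactively. A sketch (the profile name btp-s3 is my own placeholder; substitute the real values from your service key):

```shell
# Map the Object Store service-key fields to a named AWS CLI profile
# (profile name "btp-s3" is illustrative):
aws configure set aws_access_key_id     "<access_key_id>"     --profile btp-s3
aws configure set aws_secret_access_key "<secret_access_key>" --profile btp-s3
aws configure set region                "<region>"            --profile btp-s3

# Then address the bucket from the service key using that profile:
aws s3 ls "s3://<bucket>" --profile btp-s3
```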
4. Download files from the S3 bucket to a local folder on your machine:
- C:\Yourfolder>aws s3 sync s3://EXAMPLE-BUCKET-SOURCE/ .
5. Upload from a local directory to the BTP AWS S3 bucket:
#1) As you have downloaded the files to your local machine, you can copy/migrate them to any system as per your requirement; you can also write a utility program for this.
#2) If you have a requirement to upload or migrate files to another BTP space's or subaccount's AWS S3 bucket, you need to configure the other bucket's details just like in step 3, in a new command prompt (i.e., open a new session from the command prompt):
- C:\Yourfolder>aws configure
- C:\Yourfolder>aws s3 sync . s3://EXAMPLE-BUCKET-TARGET
Here the source can be a bucket name or the local folder (.).
Note: You can also copy the objects between the source and target buckets directly by running the following sync command using the AWS CLI:
aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET
Note: Update the sync command to include your source and target bucket names.
The sync command uses the CopyObject API to copy objects between S3 buckets.
6. To list the files (display file details on the CLI), use the below command:
- aws s3 ls s3://your-bucket-id
#1) How can I improve the transfer performance of the sync command for Amazon S3?
Multi-threading: you can run multiple, parallel instances of aws s3 cp, aws s3 mv, or aws s3 sync using the AWS CLI.
For details, please follow https://aws.amazon.com/premiumsupport/knowledge-center/s3-improve-transfer-sync-command/
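A minimal sketch of this multi-threading approach: split the key space into disjoint prefixes and run one sync per prefix in the background (the bucket names and the 0*/1* split below are illustrative, not from this post):

```shell
# Run several sync instances in parallel, each over a disjoint key prefix
# (bucket names and prefix split are illustrative placeholders):
aws s3 sync s3://EXAMPLE-BUCKET-SOURCE s3://EXAMPLE-BUCKET-TARGET --exclude "*" --include "0*" &
aws s3 sync s3://EXAMPLE-BUCKET-SOURCE s3://EXAMPLE-BUCKET-TARGET --exclude "*" --include "1*" &
wait   # block until both background transfers finish
```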
#2) AWS CLI configurations to speed up the data transfer:
Best practices and guidelines for setting additional configuration values for aws s3 transfer commands
multipart_chunksize: This value sets the size of each part that the AWS CLI uploads in a multipart upload for an individual file. This setting allows you to break down a larger file (for example, 300 MB) into smaller parts for quicker upload speeds.
Note: A multipart upload requires that a single file be uploaded in no more than 10,000 distinct parts. You must make sure that the chunk size you set balances the part file size and the number of parts.
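To see how the 10,000-part limit interacts with the chunk size, a quick back-of-the-envelope check (the 8 MB figure is only an illustration, not a recommendation):

```shell
# Back-of-the-envelope: with an 8 MB chunk size, the 10,000-part cap
# limits a single object to 8 * 10000 / 1024 GB (integer arithmetic):
chunk_mb=8
max_parts=10000
echo "$((chunk_mb * max_parts / 1024)) GB"   # → 78 GB
```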
max_concurrent_requests: This value sets the number of requests that can be sent to Amazon S3 at a time. The default value is 10. You can increase it to a higher value, depending on the resources on your machine. You must be sure that your machine has enough resources to support the maximum number of concurrent requests that you want.
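Both values live in the AWS CLI's own configuration and can be set from the command line; a sketch (the numbers are examples only — size them to your machine and network):

```shell
# Raise parallelism and chunk size for aws s3 transfer commands
# (20 and 16MB are illustrative values, not recommendations):
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 16MB
```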
More details : https://aws.amazon.com/premiumsupport/knowledge-center/move-objects-s3-bucket/
Using the aws s3 ls or aws s3 sync commands (Option 1: AWS CLI) on large buckets (with 10 million objects or more) can be expensive, resulting in a timeout. If you encounter timeouts because of a large bucket, then consider using Amazon CloudWatch metrics to calculate the size and number of objects in a bucket. Also, consider using S3 Batch Operations to copy the objects.
#2) Option 2: Programmatically
This option is suitable if you have many files in your S3 bucket (more than 10 million objects); in that case, consider using S3 Batch Operations. A custom application might be even more efficient at performing a transfer at the scale of hundreds of millions of objects.
You can use S3 Batch Operations to automate the copy process, and you can use the AWS SDK sample code (for example, the Java snippets in the AWS documentation) as a reference for downloading an object from an S3 bucket.
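For orientation, a Batch Operations copy job can also be created from the CLI. The sketch below is an assumption-laden outline, not a tested recipe: the account ID, role ARN, bucket names, manifest location, and ETag are all placeholders you must replace with your own values.

```shell
# Sketch: create an S3 Batch Operations copy job from the CLI
# (account ID, role ARN, buckets, manifest path, and ETag are placeholders):
aws s3control create-job \
  --account-id 111122223333 \
  --operation '{"S3PutObjectCopy":{"TargetResource":"arn:aws:s3:::EXAMPLE-BUCKET-TARGET"}}' \
  --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::EXAMPLE-BUCKET-SOURCE/manifest.csv","ETag":"<etag-of-manifest>"}}' \
  --report '{"Bucket":"arn:aws:s3:::EXAMPLE-BUCKET-TARGET","Format":"Report_CSV_20180820","Enabled":true,"Prefix":"batch-report","ReportScope":"AllTasks"}' \
  --priority 10 \
  --role-arn arn:aws:iam::111122223333:role/s3-batch-copy-role
```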
For more information, see the documentation on the Amazon Web Services Web site.
You can choose the programming language you are comfortable with from the AWS SDKs, connect to the S3 bucket, and write code to download files from the source S3 bucket and copy them to the destination as per your requirement:
- Using the AWS SDK for Java
- Using the AWS SDK for .NET
- Using the AWS SDK for PHP and Running PHP Examples
- Using the AWS SDK for Ruby – Version 3
- Using the AWS SDK for Python (Boto)
- Using the AWS Mobile SDKs for iOS and Android
Recommendations from AWS:
- To copy objects across AWS accounts, set up the correct cross-account permissions on the bucket and the relevant AWS Identity and Access Management (IAM) role.
- If you’re using AWS CLI version 2 to copy objects across buckets, then your IAM role must also have proper permissions. Make sure that your IAM role can access s3:GetObjectTagging for source objects and s3:PutObjectTagging for destination objects.
- To increase the performance of the sync process, tune the AWS CLI to use a higher concurrence. You can also split sync commands for different prefixes to optimize your S3 bucket performance. For more information about optimizing the performance of your workload, see Best practices design patterns: Optimizing Amazon S3 performance.
#3) Option 3: Migration Software
FYI: As mentioned in the documentation, you can also try the option “Use S3DistCp with Amazon EMR”.
I have not tried this option due to the additional cost. Be sure to review Amazon EMR pricing.
I was using Option 1: AWS CLI, but had 2 issues:
#1) AWS S3 sync does not sync all files
I have not tried all the parameters:
--size-only (boolean) Makes the size of each key the only criteria used to decide whether to sync from source to destination.
--exact-timestamps (boolean) When syncing from S3 to local, same-sized items will be ignored only when the timestamps match exactly. The default behavior is to ignore same-sized items unless the local version is newer than the S3 version.
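If you do want to experiment with these flags, the usage is straightforward (bucket name below is a placeholder):

```shell
# Compare by object size only (ignores timestamps entirely):
aws s3 sync s3://EXAMPLE-BUCKET-SOURCE . --size-only

# Require exact timestamp matches for same-sized objects when pulling from S3:
aws s3 sync s3://EXAMPLE-BUCKET-SOURCE . --exact-timestamps
```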
If you don't need all the files locally, you could delete them after some time (48 hours?). This means fewer files will need to be compared. By default, aws s3 sync will not delete destination files that do not match a local file (but this can be configured via a flag).
You could copy recent files (past 24 hours?) into a different directory and run aws s3 sync from that directory. Then, clear out those files after a successful sync run.
If you have flexibility over the filenames, you could include the date in the filename (eg 2018-03-13-foo.txt) and then use --include and --exclude parameters to only copy files with desired prefixes.
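The date-in-filename approach can be sketched as follows (the date pattern and bucket name are illustrative; adjust them to your own naming scheme):

```shell
# Copy only files whose names start with a given date prefix
# (pattern and bucket are placeholders):
aws s3 sync . s3://EXAMPLE-BUCKET-TARGET --exclude "*" --include "2018-03-13-*"
```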
#2) Access issue: I have not tried this scenario in this blog post.
My input to you: did you explore or try other parameters?