Let's test and compare some of the best ways to upload and download documents using the SAP Document Management Service.

Uploading documents to a server usually consumes memory and disk space, so let's see whether we can optimize some of the resources involved.

The Solution


Let's build a Node.js server and compare the different approaches to uploading and downloading files.

Why Node.js?


Node.js is built on an event-driven, non-blocking I/O model, making it particularly suitable for handling concurrent I/O operations. This asynchronous nature allows Node.js to handle a large number of concurrent connections efficiently, without the overhead of threads, making it well suited for applications requiring high scalability.
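
As a quick illustration of that model (the file names here are placeholders, not part of this project), two reads can be issued concurrently and the event loop stays free while the OS performs the I/O:

const fs = require('fs/promises');

// Both reads are started immediately; neither blocks the event loop,
// so the process stays responsive while the OS does the actual I/O.
async function readBoth() {
  const [a, b] = await Promise.all([
    fs.readFile('a.txt', 'utf8'),
    fs.readFile('b.txt', 'utf8'),
  ]);
  console.log(a.length, b.length);
}

readBoth();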

We will be using the SAP Document Management Service client described in this blog: Node Js Client.

Node Server


const express = require('express');
const cors = require('cors');
// CmisSessionManager and sdmCredentials come from the SDM client
// described in the blog linked above.

const REPOSITORY_ID = "com.demo.test.sdm";

let sm = new CmisSessionManager(sdmCredentials);
// create the repository if it does not exist yet
// await sm.createRepositoryIfNotExists(REPOSITORY_ID, "provider", {});


const app = express();
const port = 3000;
app.use(cors());

app.get('/', (req, res) => {
  res.send('This is a test Server!');
});

app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});

This is a simple Node.js server that accepts incoming requests.
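
To sanity-check it, a quick sketch (assuming the server above is running locally on port 3000; Node 18+ ships a global fetch, so no extra dependency is needed):

// Smoke test for the server above.
fetch('http://localhost:3000/')
  .then((res) => res.text())
  .then((body) => console.log(body)); // logs "This is a test Server!"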

Download (Stream)


Let's create a download API that receives an object path from the client and returns the document, with the catch that the process should be very memory efficient.
app.get('/download', async (req, res) => {
  // create or reuse a CMIS session
  let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
  // get the object by path, e.g. req.query.objectPath = "/temp/doc.pdf"
  let obj = await session.getObjectByPath(req.query.objectPath);
  // get the content stream
  let result = await session.getContentStream(obj.succinctProperties["cmis:objectId"]);

  // set the required headers
  res.header('Content-Type', obj.succinctProperties["cmis:contentStreamMimeType"]);
  res.header('Content-Length', obj.succinctProperties["cmis:contentStreamLength"]);
  res.header('Content-Disposition', `attachment; filename="${obj.succinctProperties["cmis:name"]}"`);

  // pipe the document store response straight to the client
  result.body.pipe(res);
});

The above approach has the following advantages:

1. Memory Efficiency: Streams process data in smaller chunks, allowing you to handle files larger than the available memory (see the sketch after this list).

2. Scalability: When dealing with multiple data streams or concurrent I/O operations, streams enable efficient handling of concurrent requests without blocking other operations.

3. Reduced Latency: Streams can start processing data as soon as the initial chunks arrive, reducing overall latency.
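
To make the memory point concrete, here is a minimal standalone sketch (the file name, port, and route names are placeholders) contrasting a buffered read with a piped stream:

const express = require('express');
const fs = require('fs');

const app = express();

// Buffered: readFile loads the entire file into memory before responding,
// so a 2 GB file needs roughly 2 GB of heap before the first byte is sent.
app.get('/buffered', (req, res) => {
  fs.readFile('big.bin', (err, data) => res.end(data));
});

// Streamed: chunks (64 KB by default) are piped through as they are read,
// so memory stays flat and the first bytes reach the client immediately.
app.get('/streamed', (req, res) => {
  fs.createReadStream('big.bin').pipe(res);
});

app.listen(3001);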

Upload (using Multer)


Let's create an upload API using Multer, where the client uploads a single file and the server forwards it to DMS, but we have to do it in the most memory-efficient way.
const multer = require('multer');
const fs = require('fs');

// Set up disk storage for uploaded files
const storage = multer.diskStorage({
  destination: function (req, file, cb) {
    cb(null, 'uploads/');
  }
});

// Create a multer instance with the storage configuration
const upload = multer({ storage: storage });

app.post('/upload', upload.single('file'), async (req, res) => {
  if (req.file) {
    // create or reuse a DMS session
    let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
    // multer collects the file parts and returns a temp file path,
    // so let's create a read stream from that path
    let readStream = fs.createReadStream(req.file.path);
    // upload the file to DMS
    let response = await session.createDocumentFromStream("/temp", readStream, req.file.originalname);

    res.status(200).end(response.data);
  } else {
    res.status(400).end('Uploaded file not found.');
  }
});

In this approach we use Multer to parse the multipart form data in which the file is uploaded. Because we use the disk storage option, the uploaded file is cached on the server's disk, and Multer passes a file reference along with the request for the controller to use. We then stream the file to DMS and store it there.
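
One caveat with disk storage is that Multer does not delete its temp files for you. A small addition (an assumption on my part, not part of the original handler) is to unlink the temp file once the stream to DMS has finished:

// Inside the /upload handler, after createDocumentFromStream resolves:
// remove Multer's temp file so the server's disk does not fill up.
fs.unlink(req.file.path, (err) => {
  if (err) console.error('Failed to remove temp file:', err);
});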

Results


Total Data Uploaded: 2 GB
No. of Files: 10 txt files
Time Taken: 12 mins
Total Memory Used: 3.8 GB
Server Disk Used: 2.1 GB

Some advantages & disadvantages are:

Advantages:

  • Server-Side Validation: With Multer, developers can perform server-side validation on uploaded files.

  • Network Faults: reduces the impact of client-side network faults, since the server holds the entire file before it is uploaded to DMS.

  • Ease of Handling File Uploads: Multer simplifies the process of handling multipart/form-data uploads in Node.js.


Disadvantages:

  • Server-Side Resource Consumption: When using Multer with disk storage, uploaded files are temporarily stored on the server's disk before processing. This might consume server disk space, especially when dealing with large or numerous uploads.

  • Handling Large Files: Multer might encounter limitations when handling very large files that exceed server memory or disk space.


This approach is only a good fit when you are prototyping an MVP or handling small files.

Upload (using DMS Append)


Let's create another upload API where we try to remove Multer's disk and memory limitations.
app.post('/upload-optimised', async (req, res) => {

  // get the file metadata from custom headers
  const fileName = req.headers["cs-filename"];
  const opType = req.headers["cs-operation"];
  const mimeType = req.headers["content-type"];

  // create or reuse the DMS session
  let session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
  let response = { success: "false" };

  // if the operation is "create", create the document in DMS with the initial chunk
  if (opType === "create") {
    // create a document from the request stream
    response = await session.createDocumentFromStream("/temp", req, fileName);
  }

  // if the operation is "append", append the content to an existing file
  if (opType === "append") {
    const obj = await session.getObjectByPath("/temp/" + fileName);
    // get the object id from the object path
    const objId = obj.succinctProperties["cmis:objectId"];
    // append the content to the previously created file
    response = await session.appendContentFromStream(objId, req);
  }

  res.json(response);
});

In this approach, we append the file content over multiple HTTP requests, which is why custom client-side handling is required.

This is an HTML form that uploads a file to our server, but it manually breaks the file into chunks and uses the append functionality.
<html>
<head>
  <title>TEST</title>
</head>
<body>
<h1>Upload a File</h1>
<div>
  <input id="uploadFile" type="file" name="fileInput">
  <input type="button" value="Upload" onclick="uploadFile()">
</div>
<script>

  // triggered when the upload button is clicked
  function uploadFile() {
    let elementById = document.getElementById("uploadFile");
    // get the selected file
    const file = elementById.files[0];
    if (file) {
      // read the file content and upload it in chunks
      const reader = new FileReader();
      reader.onload = function (event) {
        const contents = event.target.result;
        console.log('File contents:', contents.length);
        uploadFileInChunks(file, contents);
      };
      reader.readAsText(file);
    }
  }

  async function uploadFileInChunks(file, content) {
    // specify your desired chunk size
    const chunkSize = 1024;

    // total number of chunks to be uploaded; could drive a progress bar
    const totalChunks = Math.ceil(content.length / chunkSize);

    for (let i = 0; i < totalChunks; i++) {
      // calculate the start of the chunk
      const start = i * chunkSize;
      // calculate the end of the chunk
      const end = Math.min(start + chunkSize, content.length);
      // cut the chunk out of the full content
      const chunk = content.slice(start, end);
      // process the chunk
      console.log('Chunk', i + 1, 'of', totalChunks, ':', chunk);
      // create on the first chunk, append on the rest
      const operation = i === 0 ? "create" : "append";

      const myHeaders = new Headers();
      myHeaders.append("cs-filename", file.name);
      myHeaders.append("cs-operation", operation);
      myHeaders.append("Content-Type", file.type);

      const requestOptions = {
        method: 'POST',
        headers: myHeaders,
        body: chunk,
        redirect: 'follow'
      };

      // upload the chunk to the server
      const response = await fetch("http://localhost:3000/upload-optimised/", requestOptions);
      console.log(await response.json());
    }
  }
</script>
</body>
</html>
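
Note that FileReader.readAsText only round-trips cleanly for text files, which matches the .txt files used in the test below. For binary content such as PDFs, one option, sketched here as an assumption rather than part of the original client, is to slice the Blob directly so no text decoding can corrupt the bytes:

// Binary-safe variant: slice the Blob itself instead of decoding it to text.
async function uploadBinaryInChunks(file) {
  const chunkSize = 1024 * 1024; // 1 MB per request
  const totalChunks = Math.ceil(file.size / chunkSize);

  for (let i = 0; i < totalChunks; i++) {
    // Blob.slice returns raw bytes, so binary files survive intact.
    const chunk = file.slice(i * chunkSize, Math.min((i + 1) * chunkSize, file.size));

    const response = await fetch("http://localhost:3000/upload-optimised/", {
      method: 'POST',
      headers: {
        "cs-filename": file.name,
        "cs-operation": i === 0 ? "create" : "append",
        "Content-Type": file.type || "application/octet-stream",
      },
      body: chunk,
    });
    console.log(await response.json());
  }
}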

Results


Total Data Uploaded: 2 GB
No. of Files: 10 txt files
Time Taken: 9 mins
Total Memory Used: 200 MB
Server Disk Used: 0 bytes

The results make the difference in server resource usage very clear.

This method has its own set of advantages and disadvantages:

Advantages of using the Append Approach:

  • Support for Large Files: By breaking the file into chunks, this approach allows uploading large files that may exceed the server's memory or disk space limitations.

  • Resume-able Uploads: If an upload is interrupted due to network issues or other reasons, the append approach facilitates resuming the upload from where it left off (see the sketch after this list).
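
To sketch how resuming could work (this helper route is hypothetical, not part of the original server): the client asks DMS how many bytes it already holds for the file, then skips the chunks below that offset and continues with "append" operations.

// Hypothetical helper route: report how many bytes of a file DMS already
// holds, so an interrupted client can resume from that offset.
app.get('/upload-status', async (req, res) => {
  const session = await sm.getOrCreateConnection(REPOSITORY_ID, "provider");
  try {
    const obj = await session.getObjectByPath("/temp/" + req.query.fileName);
    res.json({ uploadedBytes: Number(obj.succinctProperties["cmis:contentStreamLength"]) });
  } catch (e) {
    // nothing stored yet, start from the beginning
    res.json({ uploadedBytes: 0 });
  }
});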


Disadvantages of using the Append Approach:

  • Complexity in Client-Side Handling: Implementing the append approach requires significant client-side code to break the file into chunks, manage chunking logic, handle HTTP requests for each chunk, and manage the upload process.

  • Network Overhead: Breaking the file into chunks and sending multiple HTTP requests introduces additional network overhead compared to a single-file upload.

  • Loss of Transactional Integrity: Append uploads do not guarantee transactional integrity during the upload process. In case of failure or interruptions, managing the state of uploaded chunks and ensuring the file's consistency may pose challenges.

  • Limited Compatibility: Not all server-side storage or services may support append-style uploads. Compatibility issues might arise when integrating with certain storage systems or APIs that expect a complete file upload in a single request.


Conclusion


The choice of approach depends on specific project requirements, including file sizes, network conditions, server capabilities, and desired functionalities. Each approach has its strengths and trade-offs, and selecting the most suitable approach should be based on a careful consideration of these factors.

 

 