Florian Wiedemann

Perl module sapnwrfc to retrieve data from a huge SAP table with RFC_READ_TABLE

My requirement was to download the data of a big table containing several million entries to a CSV file. Knowing the function module RFC_READ_TABLE led me to believe that it would be an easy task, since I’m familiar with Perl and the nice CPAN module sapnwrfc.

But retrieving millions of rows with a single RFC call is not a good idea, as it will consume all your RAM (in my case 8 GB).

So I searched for a feasible way to retrieve the data, and I found what I was looking for – of course on SCN. A former member proposed a Python solution for the same requirement in an archived discussion.

Under the motto of “talk is cheap – we want some action!”, here is my working code:

# Meaning of ROWSKIPS and ROWCOUNT as parameters of function module RFC_READ_TABLE:
#
# For example, ROWSKIPS = 0, ROWCOUNT = 500 fetches the first 500 records,
# then ROWSKIPS = 500, ROWCOUNT = 500 gets the next 500 records, and so on.
# If both are left at 0, no chunking is done. The maximum value for either of these fields is 999999.
use strict;
use warnings;
use SAPNW::Rfc;

# Load the connection parameters (by default from sap.yml; see the
# sapnwrfc documentation)
SAPNW::Rfc->load_config;

my $RecordsCounter = 1;
my $Iteration = 0;
my $FetchSize = 1000;
my $RowSkips = 0;
my $RowCount = $FetchSize; # must equal $FetchSize, otherwise rows get lost or fetched twice

# Open RFC connection
my $conn = SAPNW::Rfc->rfc_connect;

# Look up the function module RFC_READ_TABLE
my $rd = $conn->function_lookup("RFC_READ_TABLE");

# Will hold the function call object created in each iteration
my $rc;

# Loop to get data out of table in several chunks
while ($RecordsCounter > 0){

    # Calculate the already retrieved rows that need to be skipped
    $RowSkips = $Iteration * $FetchSize;

    # Create a fresh function call for this chunk
    $rc = $rd->create_function_call;

    # Table from which the data is to be extracted
    $rc->QUERY_TABLE("/PLMB/AUTH_OBSID");

    # Delimiter between columns
    $rc->DELIMITER("@");

    # Columns to be retrieved
    $rc->FIELDS([ {'FIELDNAME' => 'OBJECT_ID'}, {'FIELDNAME' => 'SID'} ]);

    # SELECT criteria
    $rc->OPTIONS([{'TEXT' => 'OBJ_TYPE = \'PLM_DIR\''}]);
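    # Note: each line of the OPTIONS table is limited to 72 characters;
    # longer WHERE clauses must be split across several lines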

    # Number of rows to retrieve in this call
    $rc->ROWCOUNT($RowCount);

    # Number of rows to skip (already retrieved in the previous fetches)
    $rc->ROWSKIPS($RowSkips);

    # Function call
    $rc->invoke;
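    # (invoke typically raises a Perl exception on RFC errors; wrap it
    # in eval {} if the script should handle failures gracefully)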

    $Iteration++;

    # Data retrieved
    if (defined $rc->DATA->[0]) {

        print "Fetch $Iteration\n";

        foreach my $TableLine (@{ $rc->DATA }) {
            print "$TableLine->{WA}\n";
        }
    }

    # No more data to retrieve
    else {

        # Leave loop
        $RecordsCounter = 0;
    }

}

# Disconnect RFC connection
$conn->disconnect;
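
Note that the code above prints the rows to standard output. To end up with an actual CSV file, you can either redirect the output or write to a filehandle directly. Here is a minimal sketch of the latter, assuming the "@" delimiter never occurs in the data itself (the file name output.csv is my own choice):

# Open the output file once, before the while loop
open(my $csv, '>', 'output.csv') or die "Cannot open output.csv: $!";

# ... then, inside the loop, replace the print statement with:
foreach my $TableLine (@{ $rc->DATA }) {
    # Split the work area on the delimiter and trim the trailing
    # blanks of each field
    my @fields = split /\@/, $TableLine->{WA};
    s/\s+$// for @fields;
    print $csv join(',', @fields), "\n";
}

# ... and close the file after the loop
close($csv);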

 

      2 Comments

      Harald Boeing

      Hi Florian,

      All RFC_READ_TABLE versions that I have seen so far have one major flaw for such a mass extraction (leaving aside issues with alignment or truncation when using non-char-like types), and I don't think it will ever be fixed (check whether your system shows the same issue): downloading a huge table often takes too much time, because the ROWSKIPS parameter does not prevent the module from reading the data; it just does not add the skipped rows to the returned data table DATA. So each consecutive call reads more and more data (i.e. if you chunk it by 500 records, the first invocation reads 500 records, the next reads 1,000 records but only returns 500, the third reads 1,500 records and again returns only 500, etc.).
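
      To put numbers on that: with a chunk size of 500, extracting 1,000,000 rows takes 2,000 invocations, and the server reads 500 * (1 + 2 + ... + 2,000) = 500 * 2,001,000, i.e. roughly one billion rows in total, even though only one million of them are ever returned.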

      Also, note that the extraction does not impose any ordering, so the actual data returned is not well defined - at least in theory. In practice the database will most likely pick one execution plan and stick to it, so we should get all table rows.

      I previously also had the need to mass-extract data and implemented a Java program using JCo and parallel threads, partitioning the query into packages by some field. E.g. if you have a numeric key, you could divide the key range into equal intervals and then query for the records within each interval (see the sketch below). It is not great, though, as you need some a priori knowledge of how the values are distributed; but it works well if you have nicely distributed values (e.g. one internal number range), and it avoids reading data multiple times.
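
      For illustration, here is a rough sequential sketch of that partitioning idea in Perl, reusing the sapnwrfc setup from the blog post (the table ZMYTABLE, the NUMC(10) key field DOCNUMBER and the interval boundaries are made up; the Java/JCo version additionally ran the partitions in parallel threads):

      use strict;
      use warnings;
      use SAPNW::Rfc;

      # Hypothetical example: divide a numeric key range into equal
      # intervals and fetch each partition with its own RFC_READ_TABLE
      # call, so that no row is read more than once.
      SAPNW::Rfc->load_config;
      my $conn = SAPNW::Rfc->rfc_connect;
      my $rd   = $conn->function_lookup("RFC_READ_TABLE");

      my ($low, $high, $step) = (1, 1_000_000, 100_000);

      for (my $from = $low; $from <= $high; $from += $step) {
          my $to = $from + $step - 1;

          # Pad with leading zeros so that the character comparison in
          # the WHERE clause matches numeric order (assuming a NUMC(10) key)
          my $from_c = sprintf("%010d", $from);
          my $to_c   = sprintf("%010d", $to);

          my $rc = $rd->create_function_call;
          $rc->QUERY_TABLE("ZMYTABLE");
          $rc->DELIMITER("@");
          # Keep each OPTIONS line under 72 characters
          $rc->OPTIONS([ { 'TEXT' => "DOCNUMBER >= '$from_c' AND DOCNUMBER <= '$to_c'" } ]);
          $rc->invoke;

          foreach my $TableLine (@{ $rc->DATA }) {
              print "$TableLine->{WA}\n";
          }
      }

      $conn->disconnect;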

      Of course, all these problems disappear if we can implement and use our own custom ABAP RFC module, but often we don't have that luxury.

      Regards, harald

      Ulrich Schmidt

      In addition to the technical shortcomings mentioned by Harald, I would also like to point out the security shortcomings of RFC_READ_TABLE, mainly its lack of proper authorization checks and its proneness to SQL injection attacks.

      In a productive SAP system, this FM should be disabled for security reasons.