Skip to Content

Hana Smart Data Integration – The File Adapter

The SDI File Adapter is one of the adapters preinstalled with every SDI Agent. Its purpose is to make any file, fixed width or delimited, in every character encoding and all formats available to Hana as a virtual table to select from as easy as from any other Hana table..

 

Concept

 

Once the File Adapter is made visible in Hana it provides multiple tables out of the box, e.g. FILEDIRECTORY to get a list of all available files or FILECONTENT with file content as one large BLOB. As this might allow any Hana user with the create-remote-source privilege to read files, there needs to be some security built around.

When deploying the adapter in the agent, a root directory is to be set. The adapter does not allow to reach outside of this root directory, ever. Therefore sensible files like /etc/passwd or C:\Windows cannot be read. Second is some password (or AccessToken as it is labeled) to make sure the person creating the remote source in Hana is allowed to use this adapter.

The administrator installing the agent decides on these settings by deploying the FileAdapter and changing the preferences of this adapter in the AgentConfig tool.

 

/wp-content/uploads/2016/01/fileadapter_preferences_862606.png

As one virtual table specifies the file format definition and multiple files can have the matching structure, there is no 1:1 relationship between files and virtual tables. So what should the adapter return when browsing the list of available virtual tables? To provide this browsing option the adapter does scan a File Format Root Directory for files of name *.cfg and each file contains the format information. This is the second directory to configure.

 

With this information the remote source and the virtual tables can be created and the file content selected from.

 

Remote Source

 

When creating a remote source the root and file format directories are set by default to the directories of the adapter and can be further restricted here. The idea is, maybe one remote source points to the plan data subdirectory, another to the employee data (both sub directories of the adapter root directory) and users are allowed to use either the one or the other remote source. Just in case more fine grained security is needed.

The AccessToken has to match the token entered in the adapter preferences as security measure.

/wp-content/uploads/2016/01/fileadapter_createsource_862732.png

 

Alternatively the remote source can be created via SQL as well

 

CREATE REMOTE SOURCE “FileSource” ADAPTER “FileAdapter” AT LOCATION AGENT “agent_local”

CONFIGURATION ‘<?xml version=”1.0″ encoding=”UTF-8″?>

<ConnectionProperties name=”ConnectionInfo”>

<PropertyEntry name=”rootdir”>/usr/sap/FileAdapter/FileServer</PropertyEntry>

<PropertyEntry name=”fileformatdir”>/usr/sap/FileAdapter/FileformatDefinitions</PropertyEntry>

</ConnectionProperties>’

WITH CREDENTIAL TYPE ‘PASSWORD’ USING

  ‘<CredentialEntry name=”AccessTokenEntry”>

    <password>1234</password>

  </CredentialEntry>’;

 

The built-in tables

 

The following tables are provided by the adapter always. These are supposed to help building applications using the File Adapter.

/wp-content/uploads/2016/01/fileadapter_builtintables_862740.png

 

  • FILEDIRECTORY: List all files of the remote source, including the sub directories. Supports pushdown of LIKE and = on the PATH and NAME column.
    /wp-content/uploads/2016/01/fileadapter_filedirectory_862819.png
  • FILECONTENT: Allows to read a single file as is. It requires the PATH and NAME column to be restricted to a single file using a where clause and returns the entire file content as one BLOB.
    /wp-content/uploads/2016/01/fileadapter_filecontent_862818.png
  • FILECONTENTTEXT: As above but this time the file content is returned as NCLOB. That means the file is read with a specific character encoding. This can either be the default – based on the Byte Order Mark of the data file or the OS default – or provided as dataprovisioning parameter. In this example a ASCII file is read with UTF32 codepage, hence producing wrong characters.
    /wp-content/uploads/2016/01/fileadapter_filecontenttext_862835.png
  • FILECONTENTROWS: Again returns a BLOB but this time one row per line. A line is supposed to end with a \n (ASCII newline) character/byte.
    /wp-content/uploads/2016/01/fileadapter_filecontentrows_862836.png
  • CODEPAGES: Returns the list of all supported values for codepages by querying the adapter’s JVM installation.
  • LOCALE: Returns the locales the JVM does support.

 

Defining file formats

 

One option of creating a virtual table for a given file format is executing the create virtual table statement manually and providing the format information as additional metadata. Below is an example of the syntax and it also lists all possible parameters. Only some of them are mandatory, most can be omitted to leave them at a useful default.

 

create virtual table “v_plan” at “FileSource”.”<NULL>”.”<NULL>”.”v_plan”

REMOTE PROPERTY ‘dataprovisioning_parameters’=

‘<?xml version=”1.0″ encoding=”UTF-8″ standalone=”yes”?>

<Parameters>

<Parameter name=”FORCE_FILENAME_PATTERN”>plan%.txt</Parameter>

<Parameter name=”FORCE_DIRECTORY_PATTERN”></Parameter>

<Parameter name=”FORMAT”>CSV</Parameter>

<Parameter name=”CODEPAGE”>UTF-8</Parameter>

<Parameter name=”ROW_DELIMITER”>\r\n</Parameter>

<Parameter name=”COLUMN_DELIMITER”>;</Parameter>

<Parameter name=”TEXT_QUOTES”></Parameter>

<Parameter name=”SKIP_HEADER_LINES”>0</Parameter>

<Parameter name=”QUOTED_TEXT_CONTAIN_ROW_DELIMITER”>false</Parameter>

<Parameter name=”ESCAPE_CHAR”></Parameter>

<Parameter name=”TEXT_QUOTES_ESCAPE_CHAR”></Parameter>

<Parameter name=”ERROR_ON_COLUMNCOUNT”></Parameter>

<Parameter name=”LOCALE”>de_DE</Parameter>

<Parameter name=”DATEFORMAT”>dd. MMMM yyyy</Parameter>

<Parameter name=”TIMEFORMAT”>HH:mm:ss</Parameter>

<Parameter name=”SECONDDATEFORMAT”>yyyy.MM.dd HH:mm:ss</Parameter>

<Parameter name=”TIMESTAMPFORMAT”>yyyy.MM.dd HH:mm:ss</Parameter>

<Parameter name=”COLUMN”>YEAR;Integer</Parameter>

<Parameter name=”COLUMN”>COST_CENTER;varchar(8)</Parameter>

<Parameter name=”COLUMN”>PLAN_AMOUNT;decimal (15, 2)</Parameter>

<Parameter name=”COLUMN”>CHANGE_SECONDDATE;seconddate</Parameter>

</Parameters>’;

 

The other option is to create a .cfg file inside the Directory of file format definitions or a sub directory thereof. The content of the file are logically the same. The file has to have a first line with a description and then lists all parameters as key value pairs. Above example as cfg file would look like:

 

Fileformat to read a CSV file with plan data

FORCE_FILENAME_PATTERN=plan%.txt

FORCE_DIRECTORY_PATTERN=

FORMAT=CSV

CODEPAGE=UTF-8

ROW_DELIMITER=\r\n

COLUMN_DELIMITER=;

TEXT_QUOTES=

SKIP_HEADER_LINES=1

QUOTED_TEXT_CONTAIN_ROW_DELIMITER=false

ESCAPE_CHAR=

TEXT_QUOTES_ESCAPE_CHAR=

ERROR_ON_COLUMNCOUNT=false

LOCALE=de_DE

DATEFORMAT=dd. MMMM yyyy

TIMEFORMAT=HH:mm:ss

SECONDDATEFORMAT=yyyy.MM.dd HH:mm:ss

TIMESTAMPFORMAT=yyyy.MM.dd HH:mm:ss

COLUMN=YEAR;Integer

COLUMN=COST_CENTER;varchar(8)

COLUMN=PLAN_AMOUNT;decimal (15, 2)

COLUMN=CHANGE_SECONDDATE;seconddate

 

At the end both versions have the same endresult, a virtual table with all these parameters as virtual table attributes. The only difference is with the cfg files the browsing shows something and then the virtual table is created, whereas in the first option the table is created directly.

 

Note: Changing the cfg file after creating the virtual table has no effect. These parameters are really copied into the Hana data dictionary.

 

/wp-content/uploads/2016/01/fileadapter_tableproperties_862949.png

 

In case the file is a fixed width file, so the column start and end position is always the same, the position information is needed in addition.

 

Fixed width demo

FORMAT=FIXED

FORCE_FILENAME_PATTERN=plan%.txt

CODEPAGE=UTF-8

ROW_DELIMITER=\r\n

COLUMNSSTARTENDPOSITION=0-3;4-11;12-37;37-53

ROWLENGTH=54

COLUMN=YEAR;Integer

COLUMN=COST_CENTER;varchar(8)

COLUMN=PLAN_AMOUNT;decimal (15, 2)

COLUMN=CHANGE_SECONDDATE;seconddate

 

Parameters in detail

 

Global

  • FORCE_FILENAME_PATTERN, FORCE_DIRECTORY_PATTERN: A user might execute a simple “select * from virtualtable” without any where clause on directory and name of file. In that case every single file in the root directory and subdirectories will be read and parsed according to this virtual table format definitions. That might take a while a produce many errors. But if the virtual table does match to files in a certain directory, directory tree or to certain file names only, it makes sense to hard wire this information into the virtual table directly. Examples:
    • FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata: Reads all files in that directory only
    • FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata/%: Reads all files in that directory and subdirectories
    • FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plan%: Reads all files in directories that start with “plan”
    • FORCE_FILENAME_PATTERN=plan%.txt: Reads files like plan20120101.txt
    • FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata  FORCE_FILENAME_PATTERN=plan%.txt: Files inside the directory and matching the provided name pattern
  • FORMAT: Is either CSV (default) or FIXED
  • CODEPAGE: The character encoding the file is read with. By default the operating system default is used. In case the file has a Byte Order Mark this codepage is used always. Valid values of the Java installation can be found by creating a virtual table for CODEPAGE and querying its contents.
  • ROW_DELIMITER: A character sequence indicating the end of a row in the file. In case these are non-printable characters they can be provided encoded as \d65536 or \xFFFF or as Unicode notation \u+FFFF. Alternatively the typical \r and \n is supported as well. Examples:
    • \n Unix standard
    • \r\n Windows standard
    • \d13\d10 Windows standard but characters provided as decimal number
    • \x0D\x0A Windows standard but characters provided as hex number
  • SKIP_HEADER_LINES: In case the file has a header, the number of lines to be skipped is entered here.
  • ERROR_ON_COLUMNCOUNT: By default a row with less columns then defined is considered okay. By setting this parameter to true, it is expected that all rows of the file have as many columns as defined.
  • LOCALE: The decimal and date conversion from the strings in the file into native numbers or dates might be locale specific. For example the text “3,1415” is a perfectly nice value for PI but in the German notation where the comma is the decimal separator. With an English locale it would be a large integer number 31415. Same thing with dates in case the month or day name is used. “14. Oktober 2000” with locale German is fine, for all other languages it is not. Again, valid values for the locale can be found by querying a virtual table based on the LOCALE table of the adapter.
  • DATEFORMAT, TIMEFORMAT, SECONDDATEFORMAT, TIMESTAMPFORMAT: The file format can use these datatypes for the date/time related values. Each can have a different format string. The syntax of the format string is the Java SimpleDateFormat syntax.
  • COLUMN: multiple entries each consist of the columnname;datatype where the datatype is any normal Hana datatype.

 

CSV only

  • COLUMN_DELIMITER: The character sequence indicating the next column. If non-printable characters are used then either one of the encodings will work \d65536, \xFFFF or \u+FFFF
    • ; Meaning the ; is the column separator, so a line looks like 2000;IT Costs;435.55
    • | using the pipe character as delimiter
    • \d09 using an ASCII tab character as delimiter
  • TEXT_QUOTES: Sometimes text data is enclosed in quotes so a column delimiter inside the text does not break the format. The line 2000;IT Costs; software related only;435.55 would appear as 4 columns as the text contains a semicolon as well. If the file was created with quotes like 2000;”IT Costs; software related only”;435.55 then there is no such issue but the file parser needs to act more carefully and not just search for the next column delimiter. It needs to check if the text is inside the text quote character or outside.
  • ESCAPE_CHAR: Another way to deal with inline special characters is to escape those, like in 2000;IT Costs\; software related only;435.55. Here the \ char is an escape char and indicates that the subsequent character is to be taken literally, not as e.g. column delimiter.
  • TEXT_QUOTES_ESCAPE_CHAR: Leaves the question on how to make quotes appear inside the text, like in IT Costs; “software related” only. One option the file creator might have used is to simply use the global escape character: 2000;”IT Costs; \”software related\” only”;435.55. Another popular method is the have the quotes escaped by another quote like in 2000;”IT Costs; “”software related”” only”;435.55. In that case both the TEXT_QUOTE=” and the TEXT_QUOTE_ESCAPE_CHAR=” are set to the ” character.
  • QUOTED_TEXT_CONTAIN_ROW_DELIMITER: The default value is false and tells the parser regardless of any quotes or escape characters the text inside a row does never contain the row delimiter character sequence. In this case the parser can break the file into rows much faster, it needs to search for the character sequence only, and only the column parsing has to consider the escape and quote characters. If set to true parsing will be slower.

 

Fixed Width only

  • COLUMNSSTARTENDPOSITION: In a fixed width file the column positions need to be specified for each column. Example:
    • 0-3;4-11;12-37;38-53 defines that the first 4 characters are the first column, the next 8 contain the data for the second column etc. Columns need to be in the proper order.
    • 0;4;12;38 is equivalent to above example, the last column ends with the line end
    • 0;4;12;38-53 can be used as well. In fact every single column can either specify the start and end position or just the start.
  • ROWLENGTH: In fixed with files there does not need to be a row delimiter. Often the file has some and then they need to be stripped away. Examples assuming the last data character is at index 53 as specified above:
    • ROWLENGTH=56 ROW_DELIMITER= would work for a file that has a row delimiter. The payload text ranges from 0..53 and hence 54 characters long plus two characters for \r\n. But the last column does not contain the \r\n as it is told to end at index 53.
    • ROWLENGTH=54 ROW_DELIMITER=\r\n is equivalent to above. Each row is expected to be 54 characters plus 2 characters long. The main advantage of this notation is that COLUMNSSTARTENDPOSITION=0;4;12;38 would work as well as the trailing \r\n is stripped away. In the previous example the the last column would start at 38 but end at index 55 due to rowlength=56 and hence contain the \r\n characters in the last column.

 

 

 

Using the virtual table

 

When the virtual table is created it has 5 additional columns:

  • PATH: The path name of the file being read
  • NAME: The file name this row belongs to
  • ROWNUMBER: The line number inside the file
  • PARTITION: All rows get evenly distributed into 10 partitions (ROWNUMBER modulo 10)
  • ERROR: A string with all column parsing errors

 

/wp-content/uploads/2016/01/fileadapter_query_862996.png

 

Executing a select * from virtualtable will return all rows of all files the virtual table is allowed to read, that is all files or the matching ones based on FORCE_DIRECTORY_PATTERN and FORCE_FILENAME_PATTERN settings. Therefore each row tells what file it belongs to in the PATH and NAME columns. As this text can be quite lengthy, it might be a good idea to not select those columns unless the information is needed of course.

This is actually an important capability of the adapter, it supports projections. Therefore only those columns actually needed are transferred, parsed, converted into the proper datatype. The “top” clause is pushed down to the adapter as well, it would read the first lines and then finish.

The adapter also supports filters on PATH and NAME hence all where clauses with equal, in-lists or like are pushed to the adapter and it can navigate to the matching files directly.

 

When a parsing error occurs the reading does continue still but the information about all errors in that row are put into the ERROR column. Things like “Cannot convert to seconddate using the provided format and locale (abcd)”. Hence reading the file without checking the ERROR column is dangerous.

 

In order to simplify finding issues in the file, the line number is returned in the ROWNUMBER column. In above example the file starts with row number 2 as the first line is the header line.

 

In case the file is very large and has lots of expensive data conversions the virtual table provides a PARTITION column to read a subset of the file only. This can be used to conveniently execute for example two insert…select statements in parallel, each processing half the file.

insert into target select * from v_us_county_census where partition in (0,1,2,3,4);

insert into target select * from v_us_county_census where partition not in (0,1,2,3,4);

 

Since the flowgraph supports partitioned reading as well, the file adapter works well together with that feature.

 

51 Comments
You must be Logged on to comment or reply to a post.
  • Werner,

    Thank you for your post! We’re using Smart Data Integration to virtualize multiple tables from existing SQL Server sources. From a practical perspective, accessing smaller reference information from legacy sources without having to actually replicate the data in HANA is a way to stay current from outside source systems without the overhead of maintaining an alternative subscription and publication model.

      • LOL! Well, let me know when your newest baby is due–and of course– what the name/acronym for her will be, so that I can take her on a trial run as soon as possible! 🙂

        • Well, as analogies go, I will have my hands full of nurturing it and let it get more advanced. With the adapters I am quite happy, the UIs take the most time.

      • Werner,

        I am trying to configure a  xls file using the same format as csv. but it is taking ‘,’ as the default COLUMN_DELIMITER. Changing the COLUMN_DELIMITER parameter doesn’t change the output,
        Is there a way to work around it?

  • Hi

    This is a great post! I have the File Adapter working for CSV files. However, I am have difficulty with the Fixed File Format. As I am getting a “unknown keyword in the fileformat cfg file” error message. I cannot see an error in my CFG file. I’m wondering if Fixed File Format functionality was released in SPS 9 or a later release of HANA?

    CFG.PNG

    Thanks

    Colm

    • As I recall it was an SPS10 functionality.

      Please keep in mind the cfg file has to have a first line with the description. But that is unrelated to the problem you are facing.

      • Hi Werner,

        Thank you for your reply. I upgraded everything to SPS 11 (DP Agent, HANA DB, HANA Client, HANA Studio) and it now works for me.

        Cheers

        Colm

  • Hi all,

    Thanks for this post! Helps a lot.

    I´m facing the follwing issue:
    config setting “locale=de_DE” is ignored when replicating decimals into SAP HANA.

    in config file i´ve added rows

    locale=de_DE

    column=BETRAG;DECIMAL(17,2)

    data file has entry e.g. 10,5

    when replicating is see 105 in SAP HANA virtual table (instead of 10,5)

    when changing data value in data file from 10,5 to 10.5 value is replicated correctly (although locale is still set to de_DE).

    anyone facing similar issue?

    Thanks,

    Andreas

      • Hello Werner, Andreas,

        had a similar issue: CFG file with locale ‘de_DE’ and the value ‘0,01’ was written into the target table as value ‘1’.

        Opened a internal incidenet ticket 1670064335 which was closed on March 24th,

        with the comment that the issue has been identified and a fix will be available

        in the next version of the DPAgent.

        Perhaps the resolution covers your above described problem that the “,” is not correctly interpreted.

        BR Dietmar

        • Root cause got identified by development: The locale de_DE is stored in the virtual table properties table as German_Germany and this string the locale formatter does not recognize this as German locale.

  • Hello, thank you for this, it helped greatly to get me going.  I am picking up a csv file from a share and loading it into a table in Hana DB and is working great.  I am running into an issue when I try to do more than one file.  I have even just copied the file and changed name to make sure it was not an issue with the other file.  I have my config file set up like below.  But get this error:  (dberror) 2048 – column store error: “_SYS_BIC”.”SUGAR.INNOVATION.SDI::SDI_FILE_DF_SP”: line 8 col 0 (at pos 193): [2048] (range 3): column store error: task framework: [403] internal error: Remote execution error Data is not read by Hana as it seems for query “SELECT “FieldInventory_Inventory”.”PATH”, “FieldInventory_Inventory”.”NAME”, “FieldInventory_Inventory”.”ROWNUMBER”, “FieldInventory_Inventory”.”PlantNumber”, “FieldInventory_Inventory”.”FiscalYear”, “FieldInventory_Inventory”.”FiscalPeriod”, “FieldInventory_Inventory”.”Material”, “FieldInventory_Inventory”.”Mat_DEsc”, “FieldInventory_Inventory”.”Unit”, “FieldInventory_Inventory”.”Quantity”, “FieldInventory_Inventory”.”Damaged”, “FieldInventory_Inventory”.”OnHold”, “FieldInventory_Inventory”.”Aged” FROM “Inventory” “FieldInventory_Inventory” “

    Fileformat to read a CSV file with plan data

    FORCE_FILENAME_PATTERN=Field%.csv

    FORCE_DIRECTORY_PATTERN=\\fcqdo000\Interface\SDI\usr\sap\FileAdaptor\FileServer\Inventory

    FORMAT=CSV

    CODEPAGE=UTF-8

    ROW_DELIMITER=\r\n

    COLUMN_DELIMITER=,

    TEXT_QUOTES=

    SKIP_HEADER_LINES=2

    QUOTED_TEXT_CONTAIN_ROW_DELIMITER=false

    ESCAPE_CHAR=

    TEXT_QUOTES_ESCAPE_CHAR=

    ERROR_ON_COLUMNCOUNT=false

    LOCALE=de_DE

    DATEFORMAT=dd. MMMM yyyy

    TIMEFORMAT=HH:mm:ss

    SECONDDATEFORMAT=yyyy.MM.dd HH:mm:ss

    TIMESTAMPFORMAT=yyyy.MM.dd HH:mm:ss

    COLUMN=PlantNumber;Varchar(4)

    COLUMN=FiscalYear;varchar(4)

    COLUMN=FiscalPeriod;varchar(3)

    COLUMN=Material;varchar(15)

    COLUMN=Mat_DEsc;varchar(30)

    COLUMN=Unit;varchar(3)

    COLUMN=Quantity;Integer

    COLUMN=Damaged;Integer

    COLUMN=OnHold;Integer

    COLUMN=Aged;Integer

    • Hi Edward,

      I see this question is more than a year old, but, yes, you are correct – this was a bug in the FileAdapter.  It was fixed a while back and is available in the latest versions on both the 1.3.x line and the 2.x line.

      Jim

       

  • Good day,

    What if the files that I receive have incremental information and at the end of the file processing I want to move the file so the FileAdapter does not have to scan that file again?, what would be the right approach to this?

  • Great Blog Thanks,

    One issue i noticed is when i tried to use

    insert into “PhysicalTable” select Columns from “LogicalTable” where “PARTITION” in (0,1,2,3)

    i am getting errors saying IN operator is not allowed.

    SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database Unsupported condition (IN)

     

    its working fine when i use

    insert into “PhysicalTable” select Columns from “LogicalTable” where “PARTITION” = 0

     

    Regards,

    Sai

     

  • Hi

     

    I m loading a fixed file, about 40 columns and about  700.00 rows.

    i m facing a very bad problem.

    ROWS: ‘112148’,’392516′

    Are ALWAYS loaded in a bad way. I tried to manage the file chaing rows, characters, position without any success.

    Have anybody faced this issue?

    thanks in advance

    • Hi Matteo,

      yes, this was a bug in the fileAdapter – under some conditions, every few hundred thousand rows, the last few columns would get garbled.  This has been fixed – I think it in version 1.3.9, but certainly in the latest version on both the 1.3.x line and the 2.x line.

       

      Jim

       

  • Hi,

    what is best way to see if that partitioning on virtual table (I use flowgraph) works?

    I use HANA 1.0 SPS12 rev 009 with DPA on 1.3.5. So that should be sufficient.
    Somewhere I should see several tasks running in parallel, but my gut feeling is that this is not working…

    Tried it with splitting up Northrine-Westphalia into its “counties”.

    thanks

    • Hi Clemens,

      yes, you are correct – there is a bug preventing partitioning from working with the FileAdapter.  This bug is being fixed in both the 1.3.x lines and 2.x lines and should be released soon.  For now, your only alternative is to manually split a large file into several files and load each individually.

      Jim

       

  • Hi Werner,

    .TXT files are placed in DP Agent Server and it connected to HANA using File Adapter of SDI. The Column are separated by | (Pipe Delimiter). The virtual table is not able to read the last column data

    6+6|LA|2017|ACTUAL|3|BOL|2000105406|COC|5000013583|NAB|263466|128.83|680769.8
    6+6|LA|2017|ACTUAL|6|BOL|2000105432|COC|5000013583|NAB|12852|6.28|33208.28
    6+6|LA|2017|ACTUAL|3|BOL|2000105447|COC|5000013583|NAB|4284|2.09|11069.43
    6+6|LA|2017|ACTUAL|6|BOL|2000105447|COC|5000013583|NAB|6426|3.14|16604.14
    6+6|LA|2017|ACTUAL|3|BOL|2000105418|COC|5000013657|NAB|147798|71.83|344605.83

    i have data in this format in .txt file. The last column data (for e.g. 680769.8) is not visible in virtual table in hana for all the rows.

    Please find the cfg file .

    Please help me how can i get the last column data in virtual table.

     

    thanks,

    Smith

    • Hi Smith,

      yes, Vladimir is correct.  This was a bug in a couple of versions of the DP Agent, but it is fixed in the latest 1.3.x and 2.x versions.

       

      Jim

       

  • @jim.havlicek We are doing a scenario where we need to load HANA DB via SDI and source is File. Can you please let us know if xml files are supported in file adapter and does file adapter support complex nested structure in xml?

     

    File(xml file) –> SDI –> HANA DB table

     

    Thanks in advance,

    Aman

    • Hi Aman,

      no, the FileAdapter only supports delimited and fixed-length flat (i.e. no nesting) files.

      I will ask around a bit, but perhaps the Camel JDBC adapter in SDI would work for you.  You’d have to grab a JDBC driver for XML and try it.  I am not sure how well that will handle a nested structure either.

       

      Perhaps I am missing something – I will post back after I ask around the dev team.

      Jim

       

  • Hi Werner,

    We are trying to access multiple .txt files as source to one single virtual table. we have one .cfg file where we mentioned FORCE_FILENAME_PATTERN=*.txt to read all the files.  Are we doing anything wrong? we see only one file under file formats source connection .

    • Hi Kishore,

      the FileAdapter uses % as the wildcard, not *.    So, %.txt should get your oars back in the water.

       

      There are some edge cases that I should make you aware of :

      • the ‘txt’ part is case-sensitive, even on windows.  So, if that causes an issue, there is a complicated workaround – let me know.
      • also, the period acts as a single-character wildcard.  So, %.txt will match to a filename of whatevertxt    (with no extension)

      Jim

       

  • Hi Werner, Jim,

    I am testing with the FileAdapter on my local machine to a HANA 1.0 SPS12 DB on the Cloud Platform. Everything works fine, except it only reads up to 1,000 rows in the source file then stops. I tried changing the limits on the FileAdapter but it is still reading only 1,000 rows, and also tried adding parameter remote_objects_tree_max_size on the server.

    Any ideas where this limit is set?

     

    Thanks!

    KW

    • Hi,

      I ran the data provisioning without the config tool running on my machine, and it is loading past the 1000 row limit. At least that is what I think fixed it. I have not changed any limits on HANA.

       

      Thanks,

      KW

  • hi, I notice that there is an sFTP fileadapter as well. I was wondering if it’s possible to use basic FTP as source rather than sFTP? I’ve tried a few permutations but it doesn’t seem to work with FTP. is this expected?

  • Hi Werner,

     

    Could you please confirm if the Partition column in Virtual table for the flat files will always have Partitions in 0 to 9 range? or can it have more than 10 Partitions in case the file size is huge?

    Thanks.

  • Hi Werner,

     

    I hope you can help us out. Our scenario is that we have a flat file that is placed on a server every 15 minutes. We have setup the File Adapter on that server with the property below set:

    <PropertyEntry name=”usecdc”>true</PropertyEntry>

    We then created the remote source to that server and created the virtual table on top of that flat file in our HANA environment. We can do a data preview correctly.

    I have also built a Flowgraph on that Virtual Table and I’m able to do batch loads correctly.

    Our challenge is when enabling the “Real Time” flag under “Real-Time Behavior”.

    I can see that under the Task Monitor section of the Data Provisioning monitor it has created two new tasks ending in **_FL and *_RT. I can also see under the remote subscription monitor it created a subscription ending in **_RS.

    Everything is green and successful, but no data is getting replicated from that virtual table when real-time is enabled.

    I have tried a replication task but with no success either.

    Any pointers on how to do real time replication using the file adapter on a flat file in a server?

    Appreciate any input you can provide!

    Best regards,

    Marcelo

    • Hi Marcelo,

      the FileAdapter works with either a Rep task or a realtime flowgraph, however it can only replicate in two scenarios  :

      1. You have a single file and you append new records to the end of the file.  In this case, the new records are then inserted into the target table.
      2. You drop new files WITH UNIQUE NAMES into a folder.  In this case, the entire new file is inserted into the target table.

      It sounds as if you are attempting the second scenario.

      To make this work, you need to have a % wildcard in the FORCE_FILENAME_PATTERN setting in your CFG file  (e.g. SALES_%.CSV).   Then, each filename has to have a unique name (e.g. SALES_2018_12_28_11_00.CSV, SALES_2018_12_28_11_15.CSV, SALES_2018_12_28_11_30.CSV, etc.)

      When the 11_15 file is dropped into the folder, the CDC feature will detect it and trigger the load.

       

      Does that match you use case?

       

       

       

      • Hi Jim,

        Thanks for taking the time to reply. We got it to work. We were 99% there, we did everything correctly, just didn’t see the step where we need to call the stored procedure that is created automatically once we save the replication task. The modelling guide can be a little non intuitive at times.

        Once we called it, it did the initial load and started the replication. Our scenario is your case 1, where we’re appending new records to the end of the file.

        A follow-up question though, is that we noticed the first time we appended rows to the file. it took about 20 seconds to detect the new rows, and then 2 minutes to start the replication of the data (1000 rows only). The replication itself was fast.

        The next time we appended the file, the whole process took about 1 minute.

        Is there any reason there’d be such a delay, as the expectation is that it would be near real-time (in the seconds rather than in the minutes).

        Appreciate your response.

        • Hi Marcello,

          I wanted to touch base to let you know that I am still researching this question.  The interval is supposed to be 5 seconds, and our test team sees a delay of less than 5 seconds on replication.  But, I have been able to use one of my machines to reproduce  the behavior that you see  (60-90 second delays). So, I am working to isolate the differences and give you a concise answer.

          Thanks,

          Jim

           

          • Hi Marcello,

            The short answer is that you can adjust the following two DPServer parameters to make a tradeoff between throughput and latency.  If they are set very low (1 second is the minimum), then only one second worth of changes will be queued for each commit.  So, in your example above, the small change at the end of the file will be detected in 5 seconds, and then transmitted from the DP Agent to the DP Server, and then committed in 1 second.  By applying these changes, I was able to see the file changes in less than 10 seconds total on my test machine.

            However, these settings apply to all CDC activity on the DPServer, so beware.  If you also have high volume CDC activity on any remote source, the frequent commits will incur overhead that will reduce overall throughput. So, you’ll have to consider the tradeoffs when changing these from their default settings.

            ALTER SYSTEM ALTER CONFIGURATION (‘dpserver.ini’, ‘SYSTEM’) SET (‘persistence’, ‘distributor_commit_interval_sec’) = ‘1’ WITH RECONFIGURE;

            ALTER SYSTEM ALTER CONFIGURATION (‘dpserver.ini’, ‘SYSTEM’) SET (‘persistence’, ‘receiver_commit_interval_sec’) = ‘1’ WITH RECONFIGURE;

             

            You can refer to this KBA for more detail

            2710286 – SDI realtime replication latency between RECEIVER and DISPATCHER comoponents – tuning DP Server performance

             

            Jim

             

      • Jim, I thought the FileAdapter does not support realtime still? Obviously it does but then the PAM is wrong.

        It says on page 20: FileAdapter: Realtime = No.

         

        And why do you need to set a property use_cdc for the adapter? Either it has a realtime subscription or it does not.

        https://apps.support.sap.com/sap(bD1kZSZjPTAwMQ==)/support/pam/pam.html?smpsrv=https%3a%2f%2fwebsmp202.sap-ag.de#ts=5&s=smart%20data%20&filter=&o=name%7Casc&st=l&rpp=20&page=1&pvnr=73555000100900001404&pt=g%7Cd

        • Hi Werner,

          thanks for pointing that out – the PAM is wrong, the doc is correct.  That mistake on the PAM will be fixed the next time the PAM is updated.  CDC has been available in the FileAdapter for at least a year and a half.

          Jim

           

           

  • Hi Werner

    Could you help me in how to avoid these additional columns when i am trying write to a virtual table, in Flowgraph (XSA ) my data source (table ) doesn’t has these 5 columns.

  • Hi Kishore,

    Are you reading from a virtual table defined on the FileAdapter, and writing another virtual table?  And the target table doesn’t have the 5 built-in columns, right?

    If I understand correctly, you should be able to add a projection node to your flowgraph and simply remove those 5 built-in columns from the mapping.   Alternatively, if you don’t need the built-in columns in your flowgraph at all, then you could be more explicit and create a view on top of the virtual table, and then change the Data Source in your flowgraph to SELECT from the view instead of the virtual table.

    If that’s not the problem, perhaps I am misinterpreting your question.

    Hope that helps,

    Jim

     

    • Thank you for your valuable suggestionsJim, appreciated it.

      Yes correct projection will help me during the inbound, where as i can unheck the  addtional columns from the source and do mapping, but not during the other way ie., outbound from synonym to a virtual table, stills same issue during outbound with the 5 columns.

       

      I dont know how to create a view on virtual table in XSA Weide environment, I will explore this option, appreciate if you could share some reference.

       

      The  Scenario I am trying here:-

      I am reading from synonym created in XSA WedIDE for a  real table and writing to a virtual table of file adapter.

      my source table has only following columns

      ADAPTER_NAME;NVARCHAR(1024)
      SOURCE_VERSION;NVARCHAR(1024);
      CAPABILITY_NAME;NVARCHAR(1024);
      DESCRIPTION;NVARCHAR(1024);
      SCOPE;NVARCHAR(1024);
      IS_POOL_SUPPORTED;NVARCHAR(1024);
      IS_MDE_SUPPORTED;NVARCHAR(1024);

      where as when my virtual table has following columns

      PATH;NVARCHAR(1024);
      NAME;NVARCHAR(1024);
      ROWNUMBER;NVARCHAR(1024);
      ERROR;NVARCHAR(1024);
      PARTITION;NVARCHAR(1024);
      ADAPTER_NAME;NVARCHAR(1024)
      SOURCE_VERSION;NVARCHAR(1024);
      CAPABILITY_NAME;NVARCHAR(1024);
      DESCRIPTION;NVARCHAR(1024);
      SCOPE;NVARCHAR(1024);
      IS_POOL_SUPPORTED;NVARCHAR(1024);
      IS_MDE_SUPPORTED;NVARCHAR(1024);

       

      I am able to write successfully from virtual table to real table using synonym with a flow graph, but similarly i am not able to write from synonym to a virtual table.

       

      adding some screen captures to help more here.

       

      Thank you in advance.

      Kishore

      • Hi Kishore,

        ahh, now I understand.  To write to the FileAdapter, you should populate those built-in columns with hardcoded values.

        PATH = ”

        NAME = ”

        ROWNUMBER = 1

        ERROR = NULL

        PARTITION = NULL

        These values will not get written to the file, but that is enough to get past the problem.   These columns are ignored when writing under normal circumstances.

         

        Hope that helps,

        Jim

         

  • Thank you Jim appreciate you update.

     

    yes i was expecting that these ignored standard, but in mycase the the data target of the flowgraph is showing as error since i do not have matching columns form the source tables.

    do i have to add those columns in every table that i want to write to file using sdi file adatper ?

     

    Thanks

    Kishore

     

  • Dear Werner Daehn,

     

    I want to use the SDI File Adapter for the following scenario :

    every day, some external application will generate a new file in a path that I will read in order to integrate this in a BW flow.

    As soon as I have read the file, I want to be able to move it to an archive folder.

    Do you think there is a way to achieve this last step of moving the file with SDI / HANA ?

     

    Thanks a lot for your help !

    Vincent.