Hana Smart Data Integration – The File Adapter
The SDI File Adapter is one of the adapters preinstalled with every SDI Agent. Its purpose is to make any file, fixed width or delimited, in any character encoding and format, available to Hana as a virtual table that can be selected from as easily as any other Hana table.
Concept
Once the File Adapter is made visible in Hana, it provides multiple tables out of the box, e.g. FILEDIRECTORY to get a list of all available files or FILECONTENT with the file content as one large BLOB. As this would allow any Hana user with the create-remote-source privilege to read files, some security needs to be built around it.
When deploying the adapter in the agent, a root directory has to be set. The adapter never allows reaching outside of this root directory, so sensitive files like /etc/passwd or C:\Windows cannot be read. Second, there is a password (labeled AccessToken) to make sure the person creating the remote source in Hana is allowed to use this adapter.
The administrator installing the agent decides on these settings by deploying the FileAdapter and changing the preferences of this adapter in the AgentConfig tool.
As one virtual table specifies the file format definition and multiple files can have a matching structure, there is no 1:1 relationship between files and virtual tables. So what should the adapter return when browsing the list of available virtual tables? To provide this browsing option, the adapter scans a File Format Root Directory for files named *.cfg, each containing the format information. This is the second directory to configure.
With this information the remote source and the virtual tables can be created and the file content selected from.
Remote Source
When creating a remote source, the root and file format directories default to the directories of the adapter and can be restricted further here. The idea is that one remote source might point to the plan data subdirectory and another to the employee data (both subdirectories of the adapter root directory), and users are allowed to use either the one or the other remote source, just in case more fine-grained security is needed.
The AccessToken has to match the token entered in the adapter preferences, as a security measure.
Alternatively, the remote source can be created via SQL as well:
CREATE REMOTE SOURCE "FileSource" ADAPTER "FileAdapter" AT LOCATION AGENT "agent_local"
CONFIGURATION '<?xml version="1.0" encoding="UTF-8"?>
<ConnectionProperties name="ConnectionInfo">
<PropertyEntry name="rootdir">/usr/sap/FileAdapter/FileServer</PropertyEntry>
<PropertyEntry name="fileformatdir">/usr/sap/FileAdapter/FileformatDefinitions</PropertyEntry>
</ConnectionProperties>'
WITH CREDENTIAL TYPE 'PASSWORD' USING
'<CredentialEntry name="AccessTokenEntry">
<password>1234</password>
</CredentialEntry>';
The built-in tables
The following tables are always provided by the adapter. They are meant to help when building applications using the File Adapter; a short SQL example follows the list.
- FILEDIRECTORY: Lists all files of the remote source, including subdirectories. Supports pushdown of LIKE and = on the PATH and NAME columns.
- FILECONTENT: Allows reading a single file as is. It requires the PATH and NAME columns to be restricted to a single file via a where clause and returns the entire file content as one BLOB.
- FILECONTENTTEXT: As above, but this time the file content is returned as NCLOB. That means the file is read with a specific character encoding, either the default (based on the Byte Order Mark of the data file or the OS default) or one provided as a dataprovisioning parameter. Reading an ASCII file with a UTF-32 codepage, for example, would produce wrong characters.
- FILECONTENTROWS: Again returns a BLOB but this time one row per line. A line is supposed to end with a \n (ASCII newline) character/byte.
- CODEPAGES: Returns the list of all supported values for codepages by querying the adapter’s JVM installation.
- LOCALE: Returns the locales the JVM supports.
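As a rough sketch of how these built-in tables can be used (assuming the remote source "FileSource" created above and that the tables are exposed under exactly these names when browsing):
create virtual table "v_filedirectory" at "FileSource"."<NULL>"."<NULL>"."FILEDIRECTORY";
-- LIKE and = on PATH and NAME are pushed down, so only matching files are listed
select PATH, NAME from "v_filedirectory" where NAME like 'plan%.txt';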
Defining file formats
One option for creating a virtual table for a given file format is to execute the create virtual table statement manually and provide the format information as additional metadata. Below is an example of the syntax; it also lists all possible parameters. Only some of them are mandatory, most can be omitted and left at a useful default.
create virtual table "v_plan" at "FileSource"."<NULL>"."<NULL>"."v_plan"
REMOTE PROPERTY 'dataprovisioning_parameters'=
'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Parameters>
<Parameter name="FORCE_FILENAME_PATTERN">plan%.txt</Parameter>
<Parameter name="FORCE_DIRECTORY_PATTERN"></Parameter>
<Parameter name="FORMAT">CSV</Parameter>
<Parameter name="CODEPAGE">UTF-8</Parameter>
<Parameter name="ROW_DELIMITER">\r\n</Parameter>
<Parameter name="COLUMN_DELIMITER">;</Parameter>
<Parameter name="TEXT_QUOTES"></Parameter>
<Parameter name="SKIP_HEADER_LINES">0</Parameter>
<Parameter name="QUOTED_TEXT_CONTAIN_ROW_DELIMITER">false</Parameter>
<Parameter name="ESCAPE_CHAR"></Parameter>
<Parameter name="TEXT_QUOTES_ESCAPE_CHAR"></Parameter>
<Parameter name="ERROR_ON_COLUMNCOUNT"></Parameter>
<Parameter name="LOCALE">de_DE</Parameter>
<Parameter name="DATEFORMAT">dd. MMMM yyyy</Parameter>
<Parameter name="TIMEFORMAT">HH:mm:ss</Parameter>
<Parameter name="SECONDDATEFORMAT">yyyy.MM.dd HH:mm:ss</Parameter>
<Parameter name="TIMESTAMPFORMAT">yyyy.MM.dd HH:mm:ss</Parameter>
<Parameter name="COLUMN">YEAR;Integer</Parameter>
<Parameter name="COLUMN">COST_CENTER;varchar(8)</Parameter>
<Parameter name="COLUMN">PLAN_AMOUNT;decimal (15, 2)</Parameter>
<Parameter name="COLUMN">CHANGE_SECONDDATE;seconddate</Parameter>
</Parameters>';
The other option is to create a .cfg file inside the directory of file format definitions or a subdirectory thereof. The content of the file is logically the same: the first line has to contain a description, and the following lines list all parameters as key-value pairs. The above example as a cfg file would look like:
Fileformat to read a CSV file with plan data
FORCE_FILENAME_PATTERN=plan%.txt
FORCE_DIRECTORY_PATTERN=
FORMAT=CSV
CODEPAGE=UTF-8
ROW_DELIMITER=\r\n
COLUMN_DELIMITER=;
TEXT_QUOTES=
SKIP_HEADER_LINES=1
QUOTED_TEXT_CONTAIN_ROW_DELIMITER=false
ESCAPE_CHAR=
TEXT_QUOTES_ESCAPE_CHAR=
ERROR_ON_COLUMNCOUNT=false
LOCALE=de_DE
DATEFORMAT=dd. MMMM yyyy
TIMEFORMAT=HH:mm:ss
SECONDDATEFORMAT=yyyy.MM.dd HH:mm:ss
TIMESTAMPFORMAT=yyyy.MM.dd HH:mm:ss
COLUMN=YEAR;Integer
COLUMN=COST_CENTER;varchar(8)
COLUMN=PLAN_AMOUNT;decimal (15, 2)
COLUMN=CHANGE_SECONDDATE;seconddate
In the end both versions produce the same result, a virtual table with all these parameters as virtual table attributes. The only difference is that with the cfg files the browsing shows something and then the virtual table is created, whereas in the first option the table is created directly.
Note: Changing the cfg file after creating the virtual table has no effect. These parameters are really copied into the Hana data dictionary.
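Since the parameters end up in the data dictionary, they can also be inspected there; a minimal sketch, assuming the SYS.VIRTUAL_TABLE_PROPERTIES monitoring view is available in the installed revision (column names may differ between releases):
select * from "SYS"."VIRTUAL_TABLE_PROPERTIES" where TABLE_NAME = 'v_plan';
Consequently, the way to pick up a changed cfg file is to drop and recreate the virtual table.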
In case the file is a fixed width file, i.e. the column start and end positions are always the same, the position information is needed in addition.
Fixed width demo
FORMAT=FIXED
FORCE_FILENAME_PATTERN=plan%.txt
CODEPAGE=UTF-8
ROW_DELIMITER=\r\n
COLUMNSSTARTENDPOSITION=0-3;4-11;12-37;38-53
ROWLENGTH=54
COLUMN=YEAR;Integer
COLUMN=COST_CENTER;varchar(8)
COLUMN=PLAN_AMOUNT;decimal (15, 2)
COLUMN=CHANGE_SECONDDATE;seconddate
Parameters in detail
Global
- FORCE_FILENAME_PATTERN, FORCE_DIRECTORY_PATTERN: A user might execute a simple "select * from virtualtable" without any where clause on the directory and name of the file. In that case every single file in the root directory and subdirectories will be read and parsed according to this virtual table's format definition. That might take a while and produce many errors. But if the virtual table matches only files in a certain directory, directory tree or with certain file names, it makes sense to hard-wire this information into the virtual table directly. Examples:
- FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata: Reads all files in that directory only
- FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata/%: Reads all files in that directory and subdirectories
- FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plan%: Reads all files in directories that start with “plan”
- FORCE_FILENAME_PATTERN=plan%.txt: Reads files like plan20120101.txt
- FORCE_DIRECTORY_PATTERN=/usr/sap/FileAdapter/FileServer/plandata FORCE_FILENAME_PATTERN=plan%.txt: Files inside the directory and matching the provided name pattern
- FORMAT: Is either CSV (default) or FIXED
- CODEPAGE: The character encoding the file is read with. By default the operating system default is used. In case the file has a Byte Order Mark, that codepage is always used. Valid values of the Java installation can be found by creating a virtual table for the CODEPAGES table and querying its contents (an example follows this list).
- ROW_DELIMITER: A character sequence indicating the end of a row in the file. In case these are non-printable characters, they can be provided encoded as \d65536, as \xFFFF or in Unicode notation \u+FFFF. The typical \r and \n are supported as well. Examples:
- \n Unix standard
- \r\n Windows standard
- \d13\d10 Windows standard but characters provided as decimal number
- \x0D\x0A Windows standard but characters provided as hex number
- SKIP_HEADER_LINES: In case the file has a header, the number of lines to be skipped is entered here.
- ERROR_ON_COLUMNCOUNT: By default a row with fewer columns than defined is considered okay. By setting this parameter to true, all rows of the file are expected to have as many columns as defined.
- LOCALE: The decimal and date conversion from the strings in the file into native numbers or dates might be locale specific. For example, the text "3,1415" is a perfectly valid value for pi in the German notation, where the comma is the decimal separator. With an English locale it would be the large integer 31415. The same applies to dates when the month or day name is used: "14. Oktober 2000" is fine with the German locale, but not for other languages. Again, valid values for the locale can be found by querying a virtual table based on the LOCALE table of the adapter.
- DATEFORMAT, TIMEFORMAT, SECONDDATEFORMAT, TIMESTAMPFORMAT: The file format can use these datatypes for the date/time related values. Each can have a different format string. The syntax of the format string is the Java SimpleDateFormat syntax.
- COLUMN: Multiple entries, each consisting of columnname;datatype, where the datatype is any normal Hana datatype.
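To look up valid CODEPAGE and LOCALE values, virtual tables can be created on the adapter's CODEPAGES and LOCALE tables, in the same way as sketched for FILEDIRECTORY above (again assuming the remote source "FileSource" and these browsing names):
create virtual table "v_codepages" at "FileSource"."<NULL>"."<NULL>"."CODEPAGES";
create virtual table "v_locale" at "FileSource"."<NULL>"."<NULL>"."LOCALE";
select * from "v_codepages";
select * from "v_locale";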
CSV only
- COLUMN_DELIMITER: The character sequence indicating the next column. If non-printable characters are used, any of the encodings \d65536, \xFFFF or \u+FFFF will work.
- ; Meaning the ; is the column separator, so a line looks like 2000;IT Costs;435.55
- | using the pipe character as delimiter
- \d09 using an ASCII tab character as delimiter
- TEXT_QUOTES: Sometimes text data is enclosed in quotes so that a column delimiter inside the text does not break the format. The line 2000;IT Costs; software related only;435.55 would appear as 4 columns since the text contains a semicolon as well. If the file was created with quotes like 2000;"IT Costs; software related only";435.55 then there is no such issue, but the file parser needs to act more carefully and not just search for the next column delimiter: it needs to check whether the text is inside or outside the text quote character.
- ESCAPE_CHAR: Another way to deal with inline special characters is to escape those, like in 2000;IT Costs\; software related only;435.55. Here the \ char is an escape char and indicates that the subsequent character is to be taken literally, not as e.g. column delimiter.
- TEXT_QUOTES_ESCAPE_CHAR: That leaves the question of how to make quotes appear inside the text, as in IT Costs; "software related" only. One option the file creator might have used is to simply use the global escape character: 2000;"IT Costs; \"software related\" only";435.55. Another popular method is to have the quotes escaped by another quote, as in 2000;"IT Costs; ""software related"" only";435.55. In that case both TEXT_QUOTES=" and TEXT_QUOTES_ESCAPE_CHAR=" are set to the " character (see the worked example after this list).
- QUOTED_TEXT_CONTAIN_ROW_DELIMITER: The default value is false and tells the parser that, regardless of any quotes or escape characters, the text inside a row never contains the row delimiter character sequence. In this case the parser can break the file into rows much faster, since it only needs to search for that character sequence, and only the column parsing has to consider the escape and quote characters. If set to true, parsing will be slower.
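Putting the CSV quoting parameters together, a small made-up illustration: with COLUMN_DELIMITER=;, TEXT_QUOTES=" and TEXT_QUOTES_ESCAPE_CHAR=" the line
2000;"IT Costs; ""software related"" only";435.55
is parsed into the three values 2000, IT Costs; "software related" only and 435.55.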
Fixed Width only
- COLUMNSSTARTENDPOSITION: In a fixed width file the column positions need to be specified for each column. Example:
- 0-3;4-11;12-37;38-53 defines that the first 4 characters are the first column, the next 8 contain the data for the second column etc. Columns need to be in the proper order.
- 0;4;12;38 is equivalent to above example, the last column ends with the line end
- 0;4;12;38-53 can be used as well. In fact every single column can either specify the start and end position or just the start.
- ROWLENGTH: In fixed width files there does not need to be a row delimiter. Often the file has one anyway, and then it needs to be stripped away. Examples, assuming the last data character is at index 53 as specified above:
- ROWLENGTH=56 ROW_DELIMITER= would work for a file that has a row delimiter. The payload text ranges from 0..53 and is hence 54 characters long, plus two characters for \r\n. But the last column does not contain the \r\n as it is defined to end at index 53.
- ROWLENGTH=54 ROW_DELIMITER=\r\n is equivalent to the above. Each row is expected to be 54 plus 2 characters long. The main advantage of this notation is that COLUMNSSTARTENDPOSITION=0;4;12;38 would work as well, since the trailing \r\n is stripped away. In the previous example the last column would start at 38 but end at index 55 due to ROWLENGTH=56 and hence contain the \r\n characters in the last column.
Using the virtual table
When the virtual table is created it has 5 additional columns:
- PATH: The path name of the file being read
- NAME: The file name this row belongs to
- ROWNUMBER: The line number inside the file
- PARTITION: All rows get evenly distributed into 10 partitions (ROWNUMBER modulo 10)
- ERROR: A string with all column parsing errors
Executing a select * from the virtual table will return all rows of all files the virtual table is allowed to read, that is all files or only the matching ones based on the FORCE_DIRECTORY_PATTERN and FORCE_FILENAME_PATTERN settings. Each row tells which file it belongs to in the PATH and NAME columns. As this text can be quite lengthy, it might be a good idea not to select those columns unless the information is actually needed.
This is actually an important capability of the adapter: it supports projections. Therefore only those columns actually needed are transferred, parsed and converted into the proper datatype. The "top" clause is pushed down to the adapter as well, so it reads just the first lines and then finishes.
The adapter also supports filters on PATH and NAME, hence all where clauses with equals, in-lists or like are pushed down to the adapter and it can navigate to the matching files directly.
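For illustration, a query like the following (a sketch against the v_plan table defined earlier) transfers only the listed columns, stops after the first 100 rows and opens only the files matching the NAME filter:
select top 100 NAME, ROWNUMBER, YEAR, PLAN_AMOUNT from "v_plan" where NAME like 'plan2016%.txt';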
When a parsing error occurs, reading still continues, but the information about all errors in that row is put into the ERROR column, e.g. "Cannot convert to seconddate using the provided format and locale (abcd)". Hence reading the file without checking the ERROR column is dangerous.
In order to simplify finding issues in the file, the line number is returned in the ROWNUMBER column. In the above example the file starts with row number 2, as the first line is the header line.
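A simple post-load quality check, sketched under the assumption that the ERROR column is empty or null for clean rows:
select NAME, ROWNUMBER, ERROR from "v_plan" where ERROR is not null and ERROR <> '';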
In case the file is very large and has lots of expensive data conversions, the virtual table provides a PARTITION column so that only a subset of the file needs to be read. This can be used, for example, to conveniently execute two insert...select statements in parallel, each processing half the file.
insert into target select * from v_us_county_census where partition in (0,1,2,3,4);
insert into target select * from v_us_county_census where partition not in (0,1,2,3,4);
Since the flowgraph supports partitioned reading as well, the file adapter works well together with that feature.
Werner,
Thank you for your post! We're using Smart Data Integration to virtualize multiple tables from existing SQL Server sources. From a practical perspective, accessing smaller reference information from legacy sources without having to actually replicate the data in HANA is a way to stay current from outside source systems without the overhead of maintaining an alternative subscription and publication model.
Glad you like the solution - as it is my baby. 🙂
LOL! Well, let me know when your newest baby is due--and of course-- what the name/acronym for her will be, so that I can take her on a trial run as soon as possible! 🙂
Well, as analogies go, I will have my hands full of nurturing it and let it get more advanced. With the adapters I am quite happy, the UIs take the most time.
Werner,
I am trying to configure an xls file using the same format as csv, but it is taking ',' as the default COLUMN_DELIMITER. Changing the COLUMN_DELIMITER parameter doesn't change the output.
Is there a way to work around it?
Hi
This is a great post! I have the File Adapter working for CSV files. However, I am having difficulty with the Fixed File Format, as I am getting an "unknown keyword in the fileformat cfg file" error message. I cannot see an error in my CFG file. I'm wondering if Fixed File Format functionality was released in SPS 9 or a later release of HANA?
Thanks
Colm
As I recall it was an SPS10 functionality.
Please keep in mind the cfg file has to have a first line with the description. But that is unrelated to the problem you are facing.
Hi Werner,
Thank you for your reply. I upgraded everything to SPS 11 (DP Agent, HANA DB, HANA Client, HANA Studio) and it now works for me.
Cheers
Colm
Hi all,
Thanks for this post! Helps a lot.
I'm facing the following issue:
The config setting "locale=de_DE" is ignored when replicating decimals into SAP HANA.
In the config file I've added the rows
locale=de_DE
column=BETRAG;DECIMAL(17,2)
The data file has an entry such as 10,5.
When replicating I see 105 in the SAP HANA virtual table (instead of 10,5).
When changing the data value in the data file from 10,5 to 10.5 the value is replicated correctly (although the locale is still set to de_DE).
Is anyone facing a similar issue?
Thanks,
Andreas
You are right, this stopped working. I have created an internal bug message in bugzilla: 109029
Hello Werner, Andreas,
had a similar issue: CFG file with locale 'de_DE' and the value '0,01' was written into the target table as value '1'.
Opened an internal incident ticket 1670064335 which was closed on March 24th, with the comment that the issue has been identified and a fix will be available in the next version of the DPAgent.
Perhaps the resolution covers your above described problem that the "," is not correctly interpreted.
BR Dietmar
Root cause was identified by development: the locale de_DE is stored in the virtual table properties as German_Germany, and the locale formatter does not recognize this string as the German locale.
Hello, thank you for this, it helped greatly to get me going. I am picking up a csv file from a share and loading it into a table in Hana DB and is working great. I am running into an issue when I try to do more than one file. I have even just copied the file and changed name to make sure it was not an issue with the other file. I have my config file set up like below. But get this error: (dberror) 2048 - column store error: "_SYS_BIC"."SUGAR.INNOVATION.SDI::SDI_FILE_DF_SP": line 8 col 0 (at pos 193): [2048] (range 3): column store error: task framework: [403] internal error: Remote execution error Data is not read by Hana as it seems for query "SELECT "FieldInventory_Inventory"."PATH", "FieldInventory_Inventory"."NAME", "FieldInventory_Inventory"."ROWNUMBER", "FieldInventory_Inventory"."PlantNumber", "FieldInventory_Inventory"."FiscalYear", "FieldInventory_Inventory"."FiscalPeriod", "FieldInventory_Inventory"."Material", "FieldInventory_Inventory"."Mat_DEsc", "FieldInventory_Inventory"."Unit", "FieldInventory_Inventory"."Quantity", "FieldInventory_Inventory"."Damaged", "FieldInventory_Inventory"."OnHold", "FieldInventory_Inventory"."Aged" FROM "Inventory" "FieldInventory_Inventory" "
Fileformat to read a CSV file with plan data
FORCE_FILENAME_PATTERN=Field%.csv
FORCE_DIRECTORY_PATTERN=\\fcqdo000\Interface\SDI\usr\sap\FileAdaptor\FileServer\Inventory
FORMAT=CSV
CODEPAGE=UTF-8
ROW_DELIMITER=\r\n
COLUMN_DELIMITER=,
TEXT_QUOTES=
SKIP_HEADER_LINES=2
QUOTED_TEXT_CONTAIN_ROW_DELIMITER=false
ESCAPE_CHAR=
TEXT_QUOTES_ESCAPE_CHAR=
ERROR_ON_COLUMNCOUNT=false
LOCALE=de_DE
DATEFORMAT=dd. MMMM yyyy
TIMEFORMAT=HH:mm:ss
SECONDDATEFORMAT=yyyy.MM.dd HH:mm:ss
TIMESTAMPFORMAT=yyyy.MM.dd HH:mm:ss
COLUMN=PlantNumber;Varchar(4)
COLUMN=FiscalYear;varchar(4)
COLUMN=FiscalPeriod;varchar(3)
COLUMN=Material;varchar(15)
COLUMN=Mat_DEsc;varchar(30)
COLUMN=Unit;varchar(3)
COLUMN=Quantity;Integer
COLUMN=Damaged;Integer
COLUMN=OnHold;Integer
COLUMN=Aged;Integer
Hi Edward,
I see this question is more than a year old, but, yes, you are correct - this was a bug in the FileAdapter. It was fixed a while back and is available in the latest versions on both the 1.3.x line and the 2.x line.
Jim
Great blog.
Good day,
What if the files that I receive have incremental information and at the end of the file processing I want to move the file so the FileAdapter does not have to scan that file again?, what would be the right approach to this?
Hi Antonio,
with the FileAdapter, you can create a virtual procedure that will run any batch/script file that you put on the DP Agent machine. So you can write a batch file to delete/move the files, and then call it from a flowgraph.
START HERE : This is the main link that gets you started – it shows how to call a BAT file or script file through the FileAdapter.
https://help.sap.com/viewer/7952ef28a6914997abc01745fef1b607/2.0_SPS01/en-US/f4ceea5385614430aed257f9e42cb39e.html
This page shows how to create a virtual procedure with WebIDE. It is NOT specific to the FileAdapter. The FileAdapter has fewer capabilities than a regular DBMS adapter, so some of the magic mentioned on this page didn’t seem to work for the FileAdapter. For instance, step 3
https://uacp2.hana.ondemand.com/viewer/71c4a6e6b4dc4a5ab3e17bb1d7e98104/2.0_SPS01/en-US/c1507d9164364248aae0995ae144394c.html
This link shows how to create a virtual procedure with SQL, but frankly it discusses some more complicated topics than you need to get a simple one working.
https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.01/en-US/bdf548f907614458894543c7768051c6.html
This page describes the CALL statement
https://uacp2.hana.ondemand.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.01/en-US/20d364c175191014b592f500ccb5510c.html
Good luck, hope that helps,
Jim
Great Blog Thanks,
One issue i noticed is when i tried to use
insert into "PhysicalTable" select Columns from "LogicalTable" where "PARTITION" in (0,1,2,3)
i am getting errors saying IN operator is not allowed.
SAP DBTech JDBC: [403]: internal error: Error opening the cursor for the remote database Unsupported condition (IN)
its working fine when i use
insert into "PhysicalTable" select Columns from "LogicalTable" where "PARTITION" = 0
Regards,
Sai
Hi
I'm loading a fixed width file, about 40 columns and about 700,000 rows.
I'm facing a very bad problem.
Rows '112148' and '392516' are ALWAYS loaded in a bad way. I tried to manipulate the file by changing rows, characters and positions without any success.
Has anybody faced this issue?
thanks in advance
Hi Matteo,
yes, this was a bug in the fileAdapter - under some conditions, every few hundred thousand rows, the last few columns would get garbled. This has been fixed - I think it was in version 1.3.9, but certainly in the latest version on both the 1.3.x line and the 2.x line.
Jim
Hi,
what is the best way to see if the partitioning on the virtual table (I use a flowgraph) works?
I use HANA 1.0 SPS12 rev 009 with DPA on 1.3.5. So that should be sufficient.
Somewhere I should see several tasks running in parallel, but my gut feeling is that this is not working…
Tried it with splitting up North Rhine-Westphalia into its "counties".
thanks
Hi Clemens,
yes, you are correct - there is a bug preventing partitioning from working with the FileAdapter. This bug is being fixed in both the 1.3.x lines and 2.x lines and should be released soon. For now, your only alternative is to manually split a large file into several files and load each individually.
Jim
Hi Werner,
.TXT files are placed on the DP Agent server, which is connected to HANA using the File Adapter of SDI. The columns are separated by | (pipe delimiter). The virtual table is not able to read the last column data.
6+6|LA|2017|ACTUAL|3|BOL|2000105406|COC|5000013583|NAB|263466|128.83|680769.8
6+6|LA|2017|ACTUAL|6|BOL|2000105432|COC|5000013583|NAB|12852|6.28|33208.28
6+6|LA|2017|ACTUAL|3|BOL|2000105447|COC|5000013583|NAB|4284|2.09|11069.43
6+6|LA|2017|ACTUAL|6|BOL|2000105447|COC|5000013583|NAB|6426|3.14|16604.14
6+6|LA|2017|ACTUAL|3|BOL|2000105418|COC|5000013657|NAB|147798|71.83|344605.83
I have data in this format in the .txt file. The last column data (e.g. 680769.8) is not visible in the virtual table in Hana for any of the rows.
Please find the cfg file .
Please help me with how I can get the last column data into the virtual table.
thanks,
Smith
Hi Smith,
yes, Vladimir is correct. This was a bug in a couple of versions of the DP Agent, but it is fixed in the latest 1.3.x and 2.x versions.
Jim
Hi Smith,
Check the version and SP of your DP Agent, I have the same scenario and it works for me. I am using SP02 Patch3.
Best Ragards.
Vladimir
@jim.havlicek We are doing a scenario where we need to load HANA DB via SDI and source is File. Can you please let us know if xml files are supported in file adapter and does file adapter support complex nested structure in xml?
File(xml file) --> SDI --> HANA DB table
Thanks in advance,
Aman
Hi Aman,
no, the FileAdapter only supports delimited and fixed-length flat (i.e. no nesting) files.
I will ask around a bit, but perhaps the Camel JDBC adapter in SDI would work for you. You'd have to grab a JDBC driver for XML and try it. I am not sure how well that will handle a nested structure either.
Perhaps I am missing something - I will post back after I ask around the dev team.
Jim
Hi Aman,
what are you trying to do with the XML file?
thanks,
Jim
Hi Werner,
We are trying to access multiple .txt files as source for one single virtual table. We have one .cfg file where we set FORCE_FILENAME_PATTERN=*.txt to read all the files. Are we doing anything wrong? We see only one file under the file formats source connection.
Hi Kishore,
the FileAdapter uses % as the wildcard, not *. So, %.txt should get your oars back in the water.
There are some edge cases that I should make you aware of :
Jim
Hi Werner, Jim,
I am testing with the FileAdapter on my local machine to a HANA 1.0 SPS12 DB on the Cloud Platform. Everything works fine, except it only reads up to 1,000 rows in the source file then stops. I tried changing the limits on the FileAdapter but it is still reading only 1,000 rows, and also tried adding parameter remote_objects_tree_max_size on the server.
Any ideas where this limit is set?
Thanks!
KW
Hi,
I ran the data provisioning without the config tool running on my machine, and it is loading past the 1000 row limit. At least that is what I think fixed it. I have not changed any limits on HANA.
Thanks,
KW
hi, I notice that there is an sFTP fileadapter as well. I was wondering if it's possible to use basic FTP as source rather than sFTP? I've tried a few permutations but it doesn't seem to work with FTP. is this expected?
Hi Lance,
correct - we do not support basic FTP.
Jim
Hi Werner,
Could you please confirm if the Partition column in Virtual table for the flat files will always have Partitions in 0 to 9 range? or can it have more than 10 Partitions in case the file size is huge?
Thanks.
Yes, 10 partitions are hardcoded into the adapter at the moment.
Hi Werner,
I hope you can help us out. Our scenario is that we have a flat file that is placed on a server every 15 minutes. We have setup the File Adapter on that server with the property below set:
<PropertyEntry name="usecdc">true</PropertyEntry>
We then created the remote source to that server and created the virtual table on top of that flat file in our HANA environment. We can do a data preview correctly.
I have also built a Flowgraph on that Virtual Table and I'm able to do batch loads correctly.
Our challenge is when enabling the "Real Time" flag under "Real-Time Behavior".
I can see that under the Task Monitor section of the Data Provisioning monitor it has created two new tasks ending in **_FL and *_RT. I can also see under the remote subscription monitor it created a subscription ending in **_RS.
Everything is green and successful, but no data is getting replicated from that virtual table when real-time is enabled.
I have tried a replication task but with no success either.
Any pointers on how to do real time replication using the file adapter on a flat file in a server?
Appreciate any input you can provide!
Best regards,
Marcelo
Hi Marcelo,
the FileAdapter works with either a Rep task or a realtime flowgraph, however it can only replicate in two scenarios :
It sounds as if you are attempting the second scenario.
To make this work, you need to have a % wildcard in the FORCE_FILENAME_PATTERN setting in your CFG file (e.g. SALES_%.CSV). Then, each filename has to have a unique name (e.g. SALES_2018_12_28_11_00.CSV, SALES_2018_12_28_11_15.CSV, SALES_2018_12_28_11_30.CSV, etc.)
When the 11_15 file is dropped into the folder, the CDC feature will detect it and trigger the load.
Does that match your use case?
Hi Jim,
Thanks for taking the time to reply. We got it to work. We were 99% there, we did everything correctly, just didn't see the step where we need to call the stored procedure that is created automatically once we save the replication task. The modelling guide can be a little non intuitive at times.
Once we called it, it did the initial load and started the replication. Our scenario is your case 1, where we're appending new records to the end of the file.
A follow-up question though: we noticed that the first time we appended rows to the file, it took about 20 seconds to detect the new rows, and then 2 minutes to start the replication of the data (1000 rows only). The replication itself was fast.
The next time we appended the file, the whole process took about 1 minute.
Is there any reason there'd be such a delay, as the expectation is that it would be near real-time (in the seconds rather than in the minutes).
Appreciate your response.
Hi Marcello,
I wanted to touch base to let you know that I am still researching this question. The interval is supposed to be 5 seconds, and our test team sees a delay of less than 5 seconds on replication. But, I have been able to use one of my machines to reproduce the behavior that you see (60-90 second delays). So, I am working to isolate the differences and give you a concise answer.
Thanks,
Jim
Hi Marcello,
The short answer is that you can adjust the following two DPServer parameters to make a tradeoff between throughput and latency. If they are set very low (1 second is the minimum), then only one second worth of changes will be queued for each commit. So, in your example above, the small change at the end of the file will be detected in 5 seconds, and then transmitted from the DP Agent to the DP Server, and then committed in 1 second. By applying these changes, I was able to see the file changes in less than 10 seconds total on my test machine.
However, these settings apply to all CDC activity on the DPServer, so beware. If you also have high volume CDC activity on any remote source, the frequent commits will incur overhead that will reduce overall throughput. So, you'll have to consider the tradeoffs when changing these from their default settings.
ALTER SYSTEM ALTER CONFIGURATION ('dpserver.ini', 'SYSTEM') SET ('persistence', 'distributor_commit_interval_sec') = '1' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('dpserver.ini', 'SYSTEM') SET ('persistence', 'receiver_commit_interval_sec') = '1' WITH RECONFIGURE;
You can refer to this KBA for more detail
2710286 - SDI realtime replication latency between RECEIVER and DISPATCHER components - tuning DP Server performance
Jim
Dear Jim ,
Please let us know what the default values of the parameters distributor_commit_interval_sec and receiver_commit_interval_sec are. I could not find them via the approaches mentioned below.
How can I identify the SAP HANA default value of a parameter?
Starting with SAP HANA 2.0 SPS 04 you can check view CONFIGURATION_PARAMETER_PROPERTIES or analysis command SQL: "HANA_Configuration_Parameters_Properties" (SAP Note 1969700) in order to display default values of SAP HANA parameters.
Other options to display default values are:
Regards
Venkat
Jim, I thought the FileAdapter still does not support realtime? Obviously it does, but then the PAM is wrong.
It says on page 20: FileAdapter: Realtime = No.
And why do you need to set a property use_cdc for the adapter? Either it has a realtime subscription or it does not.
https://apps.support.sap.com/sap(bD1kZSZjPTAwMQ==)/support/pam/pam.html?smpsrv=https%3a%2f%2fwebsmp202.sap-ag.de#ts=5&s=smart%20data%20&filter=&o=name%7Casc&st=l&rpp=20&page=1&pvnr=73555000100900001404&pt=g%7Cd
Hi Werner,
thanks for pointing that out - the PAM is wrong, the doc is correct. That mistake on the PAM will be fixed the next time the PAM is updated. CDC has been available in the FileAdapter for at least a year and a half.
Jim
Hi Werner,
you were also correct about the use_cdc parameter. It is vestigial and that will be corrected also.
thanks,
Jim
Hi Werner
Could you help me with how to avoid these additional columns when I am trying to write to a virtual table in a flowgraph (XSA)? My data source (table) doesn't have these 5 columns.
Hi Kishore,
Are you reading from a virtual table defined on the FileAdapter, and writing another virtual table? And the target table doesn't have the 5 built-in columns, right?
If I understand correctly, you should be able to add a projection node to your flowgraph and simply remove those 5 built-in columns from the mapping. Alternatively, if you don't need the built-in columns in your flowgraph at all, then you could be more explicit and create a view on top of the virtual table, and then change the Data Source in your flowgraph to SELECT from the view instead of the virtual table.
If that's not the problem, perhaps I am misinterpreting your question.
Hope that helps,
Jim
Thank you for your valuable suggestions Jim, appreciate it.
Yes, correct, a projection will help me during the inbound, where I can uncheck the additional columns from the source and do the mapping, but not the other way around, i.e. outbound from a synonym to a virtual table; there it is still the same issue with the 5 columns.
I don't know how to create a view on a virtual table in the XSA WebIDE environment. I will explore this option; I'd appreciate it if you could share some reference.
The Scenario I am trying here:-
I am reading from a synonym created in XSA WebIDE for a real table and writing to a virtual table of the file adapter.
My source table has only the following columns:
ADAPTER_NAME;NVARCHAR(1024)
SOURCE_VERSION;NVARCHAR(1024);
CAPABILITY_NAME;NVARCHAR(1024);
DESCRIPTION;NVARCHAR(1024);
SCOPE;NVARCHAR(1024);
IS_POOL_SUPPORTED;NVARCHAR(1024);
IS_MDE_SUPPORTED;NVARCHAR(1024);
whereas my virtual table has the following columns:
PATH;NVARCHAR(1024);
NAME;NVARCHAR(1024);
ROWNUMBER;NVARCHAR(1024);
ERROR;NVARCHAR(1024);
PARTITION;NVARCHAR(1024);
ADAPTER_NAME;NVARCHAR(1024)
SOURCE_VERSION;NVARCHAR(1024);
CAPABILITY_NAME;NVARCHAR(1024);
DESCRIPTION;NVARCHAR(1024);
SCOPE;NVARCHAR(1024);
IS_POOL_SUPPORTED;NVARCHAR(1024);
IS_MDE_SUPPORTED;NVARCHAR(1024);
I am able to write successfully from a virtual table to a real table using a synonym with a flowgraph, but I am not able to write from the synonym to a virtual table in the same way.
adding some screen captures to help more here.
Thank you in advance.
Kishore
Hi Kishore,
ahh, now I understand. To write to the FileAdapter, you should populate those built-in columns with hardcoded values.
PATH = ''
NAME = ''
ROWNUMBER = 1
ERROR = NULL
PARTITION = NULL
These values will not get written to the file, but that is enough to get past the problem. These columns are ignored when writing under normal circumstances.
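Expressed in SQL terms, the mapping would look roughly like this (a sketch showing only two of your real columns; "v_target_file" and "MY_SOURCE_TABLE" are placeholder names, and the flowgraph mapping is the equivalent):
insert into "v_target_file" ("PATH", "NAME", "ROWNUMBER", "ERROR", "PARTITION", "ADAPTER_NAME", "SOURCE_VERSION")
select '', '', 1, NULL, NULL, "ADAPTER_NAME", "SOURCE_VERSION" from "MY_SOURCE_TABLE";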
Hope that helps,
Jim
Thank you Jim appreciate you update.
Yes, I was expecting these standard columns to be ignored, but in my case the data target of the flowgraph shows an error since I do not have matching columns in the source tables.
Do I have to add those columns to every table that I want to write to a file using the SDI file adapter?
Thanks
Kishore
Hi Kishore,
yes, you need to add those columns every time you write to the FileAdapter, with the same hardcoded values.
Jim
Dear Werner Daehn,
I want to use the SDI File Adapter for the following scenario :
every day, some external application will generate a new file in a path that I will read in order to integrate this in a BW flow.
As soon as I have read the file, I want to be able to move it to an archive folder.
Do you think there is a way to achieve this last step of moving the file with SDI / HANA ?
Thanks a lot for your help !
Vincent.
Hi Vincent,
yes you can. Actually, that question was asked previously on 9/29/2017. Here's the answer :
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/#comment-393023
Let me know if you run into any problems.
thanks,
Jim
Hi Werner,
It is a really great blog, giving insights into the use of the File Adapter. I am new to HANA and SDI data provisioning.
I have a scenario where CSV files will be generated in a shared folder with different names. I need to pull them into SAP HANA (on Cloud) using the file adapter every day. I understood that using a config file we can get the file underneath the file adapter and can create a virtual table on top of that.
So what would be the best approach to get the files into HANA every day, and how can I make it possible without missing any data or file?
It would be good if you could give some suggestions and advice on that. It's a bit urgent and I really need to figure it out.
Hi Sunil,
there are a couple of approaches depending on your requirements, so let me give you some ideas and then you can ask followup questions. In either of these scenarios, you only need to create one CFG file with a wildcard in the filename like this :
FORCE_FILENAME_PATTERN=DailySalesFile%.CSV
#1 : you could then create a realtime flowgraph or a replication task that reads from the virtual table and writes to a physical table. The flowgraph/reptask will be running all the time and whenever a new file that matches that naming pattern is copied into the folder, the entire file will be read by the flowgraph or reptask and appended to the physical table. Once a file has been copied to the physical table, you can delete the file. You could, for instance, periodically delete old files through some external process like the one that is generating the files. Or you might consider calling a virtual procedure to do the cleanup - see the answer previously posted to this blog here :
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/#comment-393023
# 2 : you could create the same flowgraph as in #1, but run it as a batch flowgraph. With this approach, you can choose when you run the flowgraph (through some external scheduler, for instance). You could run it daily, weekly, etc. and it would copy all files that are currently in the folder. And then, you can follow the same procedure as in #1 to delete all of the files that have already been loaded so that you don't load them twice.
Hope that helps. If you have followup questions, let me know,
Jim
Thanks for the prompt response. It is great news and I am glad you mentioned about it.
I have following questions now.
Looking forward to connect with you soon with your response.
Thanks & Regards
Sunil Sharma
Hi Sunil,
1. Yes, the File Adapter supports realtime replication. It has been supported for at least 18 months, but I am not sure of the exact version number. What version are you using? It is limited to append only, either at the end of an existing file or via the addition of a totally new file like you are doing. In short, we don't replicate updating or deleting records within a file, but I don't think you need that anyway.
2. you don't really need access to the installation folder, but you do need access to the File Adapter's Root Directory and File Format Root Directory. Your data files will generally be in the Root Directory, or a subfolder thereof, and your .CFG files will be in the File Format Root Directory, or a subfolder thereof.
You can use a text editor to create the CFG file, there are some examples in the doc - you can just start with a simple one that will give you all of the required parameters, and then it is just a matter of defining all of the columns. The batch file mentioned in that video is helpful in that it will read your data file and make reasonable choices for the columns' datatypes, but you would probably need to manually edit the CFG file to get it perfect anyway. I am pretty sure that the batch file will work from any folder, so if your admin can either give you access to that specific utility subfolder, or even copy it somewhere for you, I believe that will work.
Depending on what version you are running, the XSA version of WebIDE has a wizard to create the CFG file, given a data file. I think that is explained in an SAP Hana Academy video on youtube - I can chase down a link and add it here later.
One last trick - if you are using the XSC version of WebIDE, and if you already have the Hana table defined that will be the target of the load, you can create a CFG file based on that table (which should have the same columns as the file). To do that, you can create a flowgraph that reads from the Hana table and writes to a Template File. Simply by activating this flowgraph, the flowgraph will create a CFG file for the Template File - you don't even need to execute the flowgraph. Then you can just rename the CFG to whatever you want as if you had done all of typing, and create a virtual table from it. (Then you can discard that flowgraph, the whole point was to get it to create the CFG file for you).
Or, like I said above, just create it by hand in a text editor - unless you have to make dozens of them , the old-school approach is completely reasonable.
3. Even though you will have lots of data files, you will only have one CFG file and one virtual table. In your case, you will have data files like this :
Mydata_12032019.csv (let's say each file has 100 records)
Mydata_12042019.csv
Mydata_12052019.csv
Your one config file will be named something like Mydata.CFG and will have
FORCE_FILENAME_PATTERN=Mydata_%.CSV (the wildcard is why this works on multiple files)
Then you create a virtual table from that CFG file and call it V_Mydata.
If you run SELECT * FROM V_Mydata, it will read all 3 of the files above and return 300 rows.
When you run the SELECT against the virtual table, it reads whatever files are available AT THAT MOMENT. So, if you then deleted file Mydata_12032019.csv and re-ran the query, you'd only get 200 records. Likewise, if you now added a couple of new files, Mydata_12062019.csv and Mydata_12072019.csv, and re-ran the query, you'd get 400 records. The point is that a virtual table is really just "federation", so when you run a query using the virtual table, it reads the files from scratch each time.
Now let's contrast that with a replication task. Let's say you create a replication task on that same virtual table, and set the target as a physical table in hana called, for instance, P_Mydata. If you have those 3 files above in the folder when you start the reptask, all 300 records will be copied into P_Mydata. The reptask will keep track of which ones it has read (and, actually, how many records from each file). Then, if you added a couple of new files into that folder, Mydata_12062019.csv and Mydata_12072019.csv, it will detect them in 5 seconds or so, and read those also and now P_Mydata will have 500 records. Once they are read, you can delete the files and it will not affect P_Mydata. This is how a reptask on the FileAdapter differs from other reptasks - it doesn't support replication of deletes, only appends - but this is what you want for your use case anyway.
To demonstrate the difference between replication and federation, let's see what happens when you delete files. Let's say you put all 5 files in the folder while the replication task is running. You should end up with 500 records in P_Mydata. If you run SELECT * FROM V_Mydata, there will also be 500 in the virtual table. Now, delete 2 of the files. There will still be 500 rows in the physical table, P_Mydata, because the reptask copied them there. But, if you re-run the SELECT * FROM V_Mydata, i.e the virtual table, since this is federation, the FileAdapter will answer the query at that moment, and only return 300 records.
Hope that clears it up a bit, but if not, ask more questions.
Jim
Hi Jim,
You have just revealed the bible for this case. It is like my project is completed right here. Thanks a lot for your response and suggestions. I have a very good picture of the file adapter and how it works now.
Thank you so much again! Let me work on and test this. If I have any doubts, I will definitely reach out to you again.
Have a great weekend. Jim!
Thank You
Sunil Sharma
How can I move a file from the root directory to another directory?
Hi William,
I am not 100% sure what you mean. Do you mean after you load a file, you want to move it to somewhere else? If so, there have been a few other folks that have asked that as well
Here’s the answer :
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/#comment-393023
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/comment-page-1/#comment-456159
Let me know if you run into any problems, or if that's not what you mean.
thanks,
Jim
Hello HANA XSA Guru’s,
We have a requirement to read a file on a daily basis, and the file name carries a datestamp.
Example: FILENAME_20200118.csv. We schedule a flowgraph that runs daily and we tried FILENAME_%.csv, but it picks up all the files. The requirement is that the virtual table should pick up only today's file.
Note: I tried a file name pattern of FILENAME_'yyyy.MM.dd'.csv but we get a "file does not exist" error.
Can anyone help us? Thanks.
Regards,
Manoj
Hi Manoj,
what most folks do is to read today's file, and then move it to a folder like COMPLETED or somesuch. Then, your CFG file with FILENAME_%.CSV will work just fine because there will always only be today's file in the folder. This certainly extends to running every other day, weekly, whatever. Be careful not to make COMPLETED a subfolder of your normal folder, however, because in some versions of the FileAdapter, the % will search in subfolders as well.
Several others have asked how to move the file after the flowgraph executes - here is the answer.
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/#comment-393023
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/comment-page-1/#comment-456159
Let me know if you run into any problems, or if that solution doesn't work for you.
Also, if the files are small enough, you can "read" all of them and filter on the NAME column in the virtual table. In this approach, the fileAdapter would read all of the records from all of the files, but only today's date would make it past the filter. To get it to filter on "today", you could construct the string on the outside, and pass it to the flowgraph as a parameter. Or, you might be able to craft that filter within the flowgraph itself with some SQL to get today's date and manipulate the string. Maybe - I haven't tried it.
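A sketch of that last idea, assuming a virtual table named V_FILENAME over files following the FILENAME_yyyyMMdd.csv pattern from the question:
select * from "V_FILENAME" where NAME = 'FILENAME_' || to_varchar(current_date, 'YYYYMMDD') || '.csv';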
thanks,
Jim
Hello There!
How can I read a file from inside a ZIP archive?
Is there any property available for the same at configuration file level?
Can you please suggest how to achieve the same.
Thank you.
Hi Srinivasa,
unfortunately, the FileAdapter doesn't support reading Zip files - at least not yet anyway.
Here are two workarounds I can think of :
https://blogs.sap.com/2016/01/06/hana-smart-data-integration-the-file-adapter/comment-page-1/#comment-456159
Hello FileAdapter folks,
I just published a KBA full of performance tips for the FileAdapter.
2885316 - How to diagnose and fix performance problems with FileAdapter data loads
Jim
Dear Werner Daehn ,
Please take a look and help for this question regarding with the file adapter.
https://answers.sap.com/questions/13048637/cannot-provide-the-data-file-table-in-file-adapter.html
Dear Werner Daehn ,
Now I created a configuration file and a file adapter with a virtual table "teste" based on the standard SAP table VBAK, by that I mean with the same columns. As you mentioned, the virtual table is created with 5 additional columns.
I would like to try to export the content of VBAK to the virtual table, ideally without manually doing an insert for each table entry.
And I am struggling with the SQL: trying to insert the content of VBAK into the virtual table teste
INSERT INTO SAPHANADB."teste" WITH w1 AS (SELECT * FROM SAPHANADB."VBAK") SELECT * FROM w1;
Could not execute 'INSERT INTO SAPHANADB."teste" WITH w1 AS (SELECT * FROM SAPHANADB."VBAK") SELECT * FROM w1'
SAP DBTech JDBC: [270]: not enough values: DP_CLEAR_STA_HDR
It doesn't work as there are 5 additional fields in the virtual table: PATH, NAME, ROWNUMBER, ERROR, PARTITION.
Any SQL guru who could help me ?
Hi Raoul,
you need to supply some hardcoded values for those 5 columns in the insert statement, but they are essentially ignored.
SELECT '','',1,'',1,* FROM SAPHANADB."VBAK"
Jim
Hello There!
How can I read data from an Excel file?
Is there any property available for the same at configuration file level?
Can you please suggest how to achieve the same?
Thank you.
Hi Burak,
you can use the ExcelAdapter instead of the fileAdapter.
https://help.sap.com/viewer/7952ef28a6914997abc01745fef1b607/2.0_SPS02/en-US/94b50c05702f4050ad1a0cfd759bb3ad.html
Jim
Hello,
Please somebody help me: I have the following file which is separated by commas, but the values are inside quotes.
When I check the values in the virtual table, the result is empty.
This is my CFG
The problem is that the virtual table does not show the result; however, when I delete the quotes the virtual table returns data.
Thanks for your help.
Regards
Hi Emilio,
1) please check the framework.trc file in <install folder>\log on your dp Agent machine. There might be a useful error message there.
2) When you "delete the quotes", do you simply use a text editor to search-replace ALL of the quotes in the file? Perhaps your file contains an unmatched quote somewhere, or a quote within a data field (e.g. a field named DIMENSIONS that contains "5' 4" x 3' ". As a debugging step, I'd recommend trimming the file down to only a couple of lines and manually check that the quotes line up correctly. Then, see if you can get one or two lines to load in the virtual table.
Jim
Hello Jim,
Thanks.
Hi Emilio,
can you ensure that your dpagentconfig.ini file has framework.log.level=INFO (just to make sure we see any warnings that might be generated also)
Then try it with just the first two lines from your problematic CSV file. I am still suspicious that there is a problem in the file somewhere, and that Excel is correcting it automatically.
If necessary, you can paste the entire CFG file and the two-line data file here and I can take a quick look.
thanks,
Jim
Hi Experts,
I have a fixed length file with a Header record (single) and Item records (multiple). Also, Header and Items can be repeated any number of times. Below is the file structure.
Header1
Item1
Item2
Header 2
Item1
Item2
Please suggest a way to read the file and replicate the records into a physical table.
Thanks