This is the second post of a series that talks about the BlogProject application development with HANA XS. If you missed the first one make sure to have a look:
Given the fact that I will develop the application in parallel with the posts, I will update them when and if new requirements arise. If I miss anything, forgive me if there are any inconsistencies regarding what I am proposing and what I will actually do.
- Application schema
The above schema represents the data model of the application. Some things that might by confusing are the “subject”, “latitude” and “longitude” columns and the POST2POST table. Let’s start with the columns. The “subject” column of the POST table is going to hold the subject of a post, which will be the result of the use of the text analysis capabilities of SAP HANA. The “latitude” and “longitude” columns will help us with the geospatial statistics. Lastly, the POST2POST table will save all the link action between the posts, storing, for each link, the post that includes the link and the post that is referenced.
Column or row?
In the most cases the tables we create in HANA are columnar, but you can use row tables of course, depending on your application’s needs. Each table type benefits specific processes.
a) Easier to insert and update
b) Better if you need the complete row, because the reconstructing is one of the most expensive column store operations
c) Preferred if the table has a small number of rows (e. g. configuration tables).
d) If your application needs to only process a single record at one time
a) Only affected columns have to be read during the selection process of a query.
b) All the columns can serve as an index
c) Facilitates higher compression rates
d) Better if the table holds huge amounts of data that should be aggregated and analyzed
e) Better if the table has a large number of columns.
f) Preferred if calculations are typically executed on single or a few columns only.
To sum up, the basic difference is the type of processes for which we use each table type. In OLAP processes it is better to use column stores because for analysis we query certain columns and column stores provide a much better compression, thus minimizing the querying time. On the other hand, row stores are better for OLTP processes facilitating fast inserts and updates.
Notes and suggestions:
To enable fast on-the-fly aggregations, ad-hoc reporting, and to benefit from compression mechanisms on transactional data it is recommended to store them in a column-based table.
If you need to join tables avoid using different storage types, since using both storage engines reduces performance.
Attribute, Analytic and Calculation Views are only supported on columnar tables.
Enabling search is only possible on column tables.
How to create
There are two ways to create a table:
We can create a table via the repository by creating a new file in the “data” folder, with the extension .hdbtable. (If we choose to create a new “Database Table” and not a “File”, then the repository understands the file extension and we don’t need to add it. This applies to all file types). Then all we have to do is execute an SQL CREATE statement.
To create a table, simply create a “New Table” in the “tables” folder of the schema and then add the columns, types etc. in a graphical manner. If we create our tables that way, we will not be able to add a foreign key to the table. To do so we will have to add it with a simple SQL statement.
For our application I have created the 5 tables below via the “Systems” view (Catalog):
HANA does not support the “autoincrement” of a column (e.g. id), so we need to create a sequence to do that, which also provides additional capabilities.
A sequence definition includes the below variables, of which only the schema is compulsory, some have default values and the rest are optional:
schema = “”;
increment_by = integer; //the incrementation value (default = 1)
start_with = integer; //the first value of the sequence (default = -1)
maxvalue = integer; //the maximum value of the sequence
nomaxvalue = boolean; //if the sequence has a max value or not (default = false)
minvalue = integer; // the minimum value of the sequence
nominvalue = boolean; // if the sequence has a min value or not (default = false)
cycles = boolean;//if the sequence starts with the minvalue after the maxvalue has been reached, or the opposite
reset_by = “”; //the query that will be used on server reboot to find the value that the sequence will start with
public = boolean; //(default = false)
depends_on_table = “”; //the dependency to a specific table
depends_on_view = “”; // the dependency to a specific view
How to create
Create a new file inside the “data” folder of the repository with the extension .hdbsequence. In our application, we just want the ID columns to increment by 1 so I used a few variables. For example, for the USER table:
Create a “New Sequence” in the “Sequences” folder of the schema and add the values of the sequence in a graphical manner as shown below:
For the BlogProject I created the below sequences. The POST2POST table does not need a sequence because it does not have an ID column, but two columns that are foreign keys referencing the POST table’s ID.
Note: A single sequence can be used for all tables, but its value is incrementing regardless of the table. For example if we get the value 5 from a call of the sequence and then call it for another table, we get the value 6 and not the next value we were probably expecting for the specific column of the table. If we want a single incremental value for each table, then we must create different sequences, one for each.
Calling a sequence
Sequences are not associated with tables, but can only be restricted to apply to certain tables (with the “depends_on_table” variable). In fact, they are used by applications through SQL statements, which can use CURRVAL to get the current value of the sequence, or NEXTVAL to get the next value.
For example if we want to insert a new user to our USER table we execute this statement:
insert into “BlogProject”.“USER”
values (“BlogProject”.“MyBlogProject.data::userID”.NEXTVAL, ‘user’, ‘user’, ‘user’, 1)
- Accessing database objects
When we access a table from a file in the repository we write “schema::table_name”, and if we call it from the SQL console we write “schema”.”table_name”. The same rule applies to all the database objects.
And now we came to the end of our persistence model creation. In the next post we will talk about the modeling of our data using information views and other data-centric application logic enablers (triggers, functions and procedures).
Thanks for reading!