Skip to Content
Technical Articles
Author's profile photo Mark Gearhart

Reducing storage for text data in ASE

I have not found a write-up which adequately describes the storage of text data in ASE, especially when it comes to storing data efficiently space-wise. So, here is an example I worked through recently, after receiving complaints about the HUGE amount of space used by relatively small amounts of text in ASE.
 
In this example, we are going to take an 7.81GB table, reduce it to 798MB, and then reduce it to 487MB. These steps show how to reduce storage for text data.
 
LOB (text and image) data compression is not documented very well. There are actually 2 things available in ASE. The first thing is called In-Row storage. The second thing is called Data Compression, which I will call an “enhancement” to In-Row storage. None of this is documented in one spot. These two features were added to ASE over several years. To get to the bottom of this, you will have to search through 9000 pages of ASE documentation. OK, first In-Row storage.
 

IN-ROW STORAGE

 
This feature moves LOB data from index pages into data pages. By default, LOB data is stored in index pages called Off-Row storage, with ONE ROW PER INDEX PAGE. So, if you have a 1,000,000 row table with text data, you will have 1,000,000 index pages of text data. Even if the actual text is 1 byte, a full index page will be reserved for it on disk. This is an incredible waste of disc space. By contrast, In-Row storage will pack as much data as possible into data pages.
 
I think it is kind of weird to call it an “index” page. It is really an “Off-Row LOB page”. But I suspect the ASE engineers had probably coded up the dataserver with two kinds of pages, “data” and “index”, and maybe adding a third kind of page might have been a monumental task. Maybe.
 
Our dataserver page size is 8192 bytes. That would be 8GB more or less for 1,000,000 rows. Here is an example. I added extra evidence to verify disk space since I was having trouble recognizing the effect:

1> create table SMART_Messages (
2> messageId   int             not null,
3> jsonMessage text            not null)
4> go

 

bcp TEST..SMART_Messages in 1million.dat

OK, a table is created with a text datatype and we have loaded it up with 1,000,000 rows. Let’s look at the “before” size.


1> sp_helpdb TEST
2> go

 

device_fragments size free_mbytes
data08 5120.0 MB 165 MB
data07 10240.0 MB 7461 MB

 


1> sp_spaceused SMART_Messages
2> go

 

name rowtotal reserved data index_size unused
SMART_Messages 1000000 8034224 KB 33328 KB 8000008 KB 848 KB

 


1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 1000000.0 32.5546875 33.3046875
SMART_Messages 255 NULL 7812.5390625 7812.6171875

In this last query, you can clearly see the space usage. The text data uses 7.812GB of storage! There is not nearly that much text data. In fact, the average jsonMessage text length is 755 bytes. In reality, 1,000,000 rows of data should be about 755MB more or less. OK, let’s move the text data to In-Row storage. To do this, you have to estimate the number of bytes for an In-Row LOB column. The range is 0 up to the logical page size of the database.


1>select @@maxpagesize
2>go
 
8192

Row size is actually 8108 bytes due to overhead. Subtract messageId (4 bytes) and jsonMessage can be 8108-4 = 8104. OK, make it an even 8000. This is the maximum amount of json text that can fit into one In-Row data page. If the json text is more than 8000 bytes, it will remain in Off-Row index page storage. If the json text is small, then a bunch of them can fit into one data page, up to 8000 bytes, before the next data page is allocated.
 
We will now modify the column to have In-Row storage, and migrate the data from Off-Row to In-Row. If you get a warning from the alter table command below, then lower the number.
 


W (1): Warning: Row size (8128 bytes) could exceed row size limit, which is 8108 bytes.

Do an sp_help on the table, add up datatype lengths in the length column of the output, and subtract from 8108. That is how many bytes are free on a data page after all the columns have been accounted for.


1> alter table SMART_Messages modify jsonMessage in row(8000)
2> go
1> update SMART_Messages set jsonMessage=jsonMessage
2> go
1> sp_helpdb TEST
2> go

 

device_fragments size free_mbytes
data08 5120.0 MB 4392 MB
data07 10240.0 MB 10144 MB

 


1> sp_spaceused SMART_Messages
2> go<

 

name rowtotal reserved data index_size unused
SMART_Messages 1000000 823336 KB 817448 KB 8 KB 5840 KB

 


1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 1000000.0 798.296875 803.921875
SMART_Messages 255 NULL .0390625 .1171875

The sp_spaceusage query shows 798MB for everything. The sp_helpdb query shows that we freed up 6.9GB ((5120-165)+(10240-7461)) – ((5120-4392)+(10240-10144)). Yay! It worked. There is a tiny bit of index storage used. I am not sure what that is since the maximum json text message is 955 bytes and we allow for 8000 bytes In-Row storage. I suspect it is needed due to the fact that text data is part of the schema, and Off-Row storage could still be used for extra-large text greater than 8000 bytes.
 
We also see something extra in the table schema, due to In-Row storage. There are now 2 extra columns in the schema definition for In_row_Len and Object_storage.


1> sp_help SMART_Messages
2> go

 

Column_name Type Length In_row_Len Prec Scale Object_storage
messageId int 4 NULL NULL NULL NULL
jsonMessage text 16 8000 NULL NULL in row

Now the question is, does the In-Row behavior persist as new rows are inserted? Let’s try it by inserting 1,000,000 more rows.


1> insert SMART_Messages select * from SMART_Messages
2> go
1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 2000000.0 1593.921875 1599.039
SMART_Messages 255 NULL .015625 .0625

Yay! The jsonMessage column is In-Row forever. Let’s see if a brand new table will work for In-Row storage without having to migrate from Off-Row to In-Row:


1> create table SMART_Messages (
2> messageId   int             not null,
3> jsonMessage text            not null in row (8000))
4> go

 


bcp TEST..SMART_Messages in 1million.dat

 


1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 1000000.0 798.1796875 803.2890625
SMART_Messages 255 NULL .015625 .0625

OK great. In-Row storage works for a table created with the “in row” setting on a text column.

You can also force In-Row storage by default with the inrow_lob_length parameter of alter database command.


1> alter database TEST set inrow_lob_length=8000
2> go

 


1> create table SMART_Messages (
2> messageId   int             not null,
3> jsonMessage text            not null)
4> go

 


bcp TEST..SMART_Messages in 1million.dat

 


1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 1000000.0 798.46093755 803.2890625
SMART_Messages 255 NULL .015625 .0625

OK great. Now you don’t even need to use the “in row (8000)” thing when you create the table unless you want something other than 8000.
 

DATA COMPRESSION

 
The first thing to know about LOB data and compression is that Off-Row LOB data is NOT COMPRESSED, no matter how hard you try. In-Row LOB data is only compressed if you set the inrow_lob_length database option. This is totally undocumented. You may think you are compressing text data, but you are not unless this specific option is included in the database setup. So let’s compress In-Row LOB data to see if it is better than uncompressed.


1> sp_configure "enable compression",1
2> go
1> alter database TEST set compression=page
2> go
1> alter database TEST set lob_compression=9  -- optional. 9 is the default.
2> go
1> alter database TEST set inrow_lob_length=8000
2> go

 


1> create table SMART_Messages (
2> messageId   int             not null,
3> jsonMessage text            not null)
4> with compression = page   -- not really needed since database option is set
5> go

 


bcp TEST..SMART_Messages in 1million.dat

 


1> sp_spaceusage "display using unit=MB","table",SMART_Messages
2> go

 

TableName IndId NumRows UsedPages RsvdPages
SMART_Messages 0 1000000.0 487.5 487.53125
SMART_Messages 255 NULL .015625 .0625

You can see that compression works. It looks to me like we have gone from 798MB for the In-Row uncompressed, down to 487MB for the In-Row compressed. That is a 39% reduction in size. Not bad.

Assigned Tags

      5 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Tilman Model-Bosch
      Tilman Model-Bosch

      Thanks for the blog post .

      You may want to mention DB option 'deallocate first text page' resp. table attribute 'dealloc_first_txtpg' .
      If the table attribute is set for a table with LOB columns AND you update a LOB to NULL, then the first LOB page  gets deallocated.
      If it is not set - then it will remain allocated - wasting one server page size to store a NULL .
      DB option 'deallocate first text page' sets this table attribute as default for all tables with LOB columns created in the database.

      Author's profile photo Mark Gearhart
      Mark Gearhart
      Blog Post Author

      Actually, I did not know that. Thanks for the tip.

      Author's profile photo Ben Slade
      Ben Slade

      Interesting article.

      Also, you might want to mention that compression isn’t available without purchasing an extra license (although it comes with the Edge license).

      I think in-row LOB’s don’t require an extra license (correct me if I’m wrong)

      Also, if you modify a column to in-row LOB and get the “exceeds row size” warning, you have to modify it to off-row first before trying a small value.  But according the the document page syntax, there is no “modify off row” option.  And when I try ‘alter table mytable modify mycolumn off “row”‘ it indeed doesn’t work.

      Also, if you modify a text/image column to in-row, does it require any programming changes to clients that are programmed to retrieve arbitrarily large text/image columns in chunks?

      Author's profile photo Sean He
      Sean He

      The client application need not any change.

      Author's profile photo Tilman Model-Bosch
      Tilman Model-Bosch

      In Row LOBs are invisble to the application - it just changes the way ASE stores these data. I don't see how this could require chnages on client side.