
As noted before, the River Definition Language aims to make business application development easier in several different ways. The main goal of the language design is to focus the developer as much as possible on the business problem being solved, and less on structuring the code to fit some pattern or lifecycle/performance consideration.

This post explores the language from that perspective: what code can be written, what are its semantics, and how it can help in achieving a productive (and yet efficient) development model. We assume familiarity with the River Definition Language (RDL), specifically its syntax and specification. Further information can be found here.

What does the Code Do?

As noted earlier, for the sake of readability, the code of an RDL specification (a “program”) is meant to be written and read as a specification: a declaration of how the application’s data is to be updated, or how certain values are calculated given a set of inputs and a state of the application.

The immediate corollary for the developer reading and writing RDL is that the code should not be interpreted as reading values from persistent storage into memory, changing those values in memory, and writing them back to storage. Instead, the code should be read and written as a specification of how certain values are changed - how the application state is modified.

This declarative model shouldn’t be foreign to anyone who writes SQL queries, for example.

When writing a SQL query, the developer has a certain mental image of the data structures (the tables) and their relationships. The developer then writes down what data needs to be read, filtered, and so on. The query itself is declarative in the sense that it does not specify how exactly the data is read, in what order, or using what indices. It doesn’t specify whether the data is first projected and then filtered, or the other way around. It is the job of the DBMS and its underlying SQL engine to come up with that plan. In that sense, a SQL query is a mini-program of its own. Note that I did not mention data manipulation, only queries[1].

With RDL, the exercise is very similar. Only here, we apply it to data manipulation, not just queries. This of course makes it harder to reason about, but it has the potential to enable more concise and clear code that distills the application’s idea and not its realization in a given system.

This way of looking at and writing code isn’t new. Functional programming languages have been doing something similar for quite a while. In functional languages, one focuses one’s mental energy on defining the functions - the manipulations of the data (and occasionally counting brackets ;) ) - and the code reflects only that. A developer using a functional-paradigm language usually doesn’t care much about memory assignments, loop counters, or the underlying memory model. In that sense, SQL is a functional language: one does not maintain a loop variable going over the table being read - there’s simply no need for it.

The downside (at least one of them) is that sometimes, describing complex application logic using just functions isn’t very intuitive for a developer.

The merits and criticism of functional programming are beyond the scope of this post; suffice it to say that the notion of focusing on intent by describing how data is changed given a certain input is close enough to what we’re trying to achieve with RDL. This spirit also lends itself quite nicely to behind-the-scenes optimization, specifically parallelizing code, so it aligns rather well with the goals we have set for RDL.

RDL in itself isn’t a purely functional language. For the reasons given above, we adopt some principles of the functional paradigm. But we also try to make developers feel at home and comfortable, and to help with creating useful abstractions when necessary. As a result, in RDL a developer also talks in “objects”, and in how these objects relate and interact with each other. Abstractions relying on defining new data types are possible, as well as encapsulating data with relevant actions. But the underlying “programming model” is rather functional. The goal of familiar and intuitive code also drove us to allow all kinds of “shortcuts”, or syntactic sugar, over an otherwise functional model.

Changing Data

One aspect of this model is that, from a programming point of view, the developer deals exclusively with data values, not with their memory locations or with pointers to those values. This is usually referred to as “value semantics”. Meaning: a value of data is just that - a value. It is not changed, but rather used to compute (/create/define) a new value.

This also means that variable definitions are not definitions of places in memory where data is stored, but rather name bindings. In other words: a variable is simply a way to tag a specific value computed in the program. After a name is bound to a value, it can be used to refer to that value elsewhere in the program, usually in the scope of the block defining it. A variable can be re-bound to another value, so it will refer to the new value from that point on (“from that point on” - referring to the lexical structure of the code block in which the variable is used).

So in fact, when writing:

let e = Employee { name : 'John McClane',
                   position : 'Police Officer' … };

it is simply attaching the name ‘e’ to the value Employee { name : ‘John McClane’, position : ‘Police Officer’ … }, to be used in the scope where ‘e’ is defined.

Action parameters actually behave similarly - they simply bind values to names that are then used in the specification of the business logic in that action. This means that parameters are “passed-by-value”.

For those who prefer the memory-management perspective: all object instances are immutable - they don’t change after they’re initialized/allocated. So when looking at a variable, you’re looking at a stable value; there’s no chance of anyone else changing the data “under you”.

This is the point where the reader should ask himself how it is then possible to change the value a variable is bound to; after all, one could write:

let e = Employee { name : 'John McClane', position : 'Police Officer' … };
e.name = 'Hans Gruber';

i.e. changing the name of the value ‘e’ is bound to.

What happened in this case? Isn’t it supposed to be an immutable value?

The answer is that indeed this is possible, and also that the value hasn’t changed. The model still holds. What happens in this case is that ‘e’ is implicitly re-bound to a new value - the value that ‘e’ was bound to before the rebinding, with the name changed.

So the above snippet can be thought of as:

let e = Employee { name : 'John McClane', position : 'Police Officer' … };
e = Employee { name : 'Hans Gruber', position : 'Police Officer' … }; // the "old" value of 'e' with the name changed

Again, for those looking at it from a memory model point of view, this behavior is similar to what is known as “copy-on-write” - the object instance is copied to a new location whenever it is changed.

Why did we do this?

To allow a more familiar feel for developers coming from traditional imperative languages, where assignment happens “in place”. This looks very similar in that sense. The difference appears when another variable is bound to the same value.

For example:

let e = Employee { name : 'John McClane',
                   position : 'Police Officer' … };
let m = e;
e.name = 'Hans Gruber';

In this case, ‘m’ points to the “old value” (with ‘John McClane’) and ‘e’ points to the “new value”. So a new value was computed (the name change), and re-bound to ‘e’, but not to ‘m’.
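For readers who like something they can run, here is a rough analogy in Python (not RDL); the frozen dataclass and dataclasses.replace are just stand-ins for immutable object values and re-binding:

from dataclasses import dataclass, replace

# A frozen dataclass is a rough stand-in for an immutable RDL object value.
@dataclass(frozen=True)
class Employee:
    name: str
    position: str

e = Employee(name="John McClane", position="Police Officer")
m = e  # 'm' is bound to the same value as 'e'

# The RDL statement e.name = 'Hans Gruber' is, in effect, a re-binding of 'e'
# to a new value computed from the old one:
e = replace(e, name="Hans Gruber")

print(m.name)  # John McClane -- 'm' still refers to the old value
print(e.name)  # Hans Gruber  -- 'e' now refers to the new value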

What about ‘this’?

As noted previously and above, the River Definition Language allows defining objects and encapsulating data in them. One of the implications of this is that actions defined on an object have access to the data of the object value (the instance) on which they are invoked; i.e. they have access to an implicit parameter, called ‘this’, which is passed to every action.

So when the following action:

action Employee.setName(n : String)
{
     this.name = n;
}


is invoked:

e.setName('Hans Gruber');

the value bound to ‘e’ is passed to ‘setName’ as ‘this’.

‘this’ is just another parameter passed to every action, so it is also a name binding to a value.

But what happens when we change the value of ‘this’? Are we changing the value? Or is it also passed by value?

The answer is that ‘this’, when used inside an action definition (its body), behaves like any other parameter or variable. It does receive special treatment when it comes to the return value of the action - the ‘this’ value is always returned from the action. The difference is also apparent in the client code invoking the action: if the action is invoked on a variable, that variable is re-bound to the value returned from the action for ‘this’.

So the example above in fact behaves as if it was written like this:

action Employee.setName(n : String) : Employee
{
     this.name = n;
     return this;
}

e = e.setName('Hans Gruber');

For an action that is defined to return a value, this only means that it in fact returns a tuple of values.

So the action:

action Employee.updateSalary(newSalary : DecimalFloat) : DecimalFloat
{
     let oldSalary = this.salary;
     this.salary = newSalary;
     return oldSalary;
}

is in fact defined as:

action Employee.updateSalary(newSalary : DecimalFloat) : (Employee, DecimalFloat)
{
     let oldSalary = this.salary;
     this.salary = newSalary;
     return (this, oldSalary);
}


and the invocation:

            previousSalary = e.updateSalary(100000);

is in fact:

            (e,previousSalary) = e.updateSalary(100000);

Note that this example uses the tuple syntax, which is also useful for other purposes. Here it is just used to illustrate the behavior defined for ‘this’.
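To make the tuple threading of ‘this’ concrete, here is a small Python sketch of the same idea (Python rather than RDL; the name update_salary and the explicit threading are mine, purely for illustration):

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Employee:
    name: str
    salary: float

# "Employee.updateSalary" desugared: 'this' is an explicit parameter,
# and the action returns a (new this, declared result) tuple.
def update_salary(this: Employee, new_salary: float) -> tuple:
    old_salary = this.salary
    this = replace(this, salary=new_salary)  # re-bind 'this' to a new value
    return this, old_salary

e = Employee(name="John McClane", salary=80000.0)
# The call site implicitly re-binds 'e' to the returned 'this':
e, previous_salary = update_salary(e, 100000.0)
print(e.salary, previous_salary)  # 100000.0 80000.0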

A series of statements changing values bound to variables in a given block is therefore just a definition of a single function broken into individual steps to ease readability.

For example, the action:

action Employee.promote()
{
     this.level = this.level + 1;
     this.updateSalary(this.salary * 1.1);
}

is in fact the function (in some pseudo code, assume all necessary definitions are in place):

            promote (this : Employee) =  new Employee(this.name, this.level + 1, this.salary * 1.1);

---

Streams (collections of instances) are values as well, and are therefore subject to the same rules as other values. So when a value is inserted into or removed from a stream – it’s a new stream value. And when a value is changed in a stream, one can view that as a change to the stream as well – the old value is removed and the new one is inserted.

But streams are also collections of values (or, in fact, a representation of a collection of values) so we can easily apply a computation to an entire set of values by simply applying it to the stream. Also computing a new stream value from an existing one is easy enough.

For example, given a set of Employee instances, one can compute a new stream value - a stream of strings containing the employee names - simply by writing:

            select name from employees

where employees is just a variable bound to the specific set of Employee instances.

Remember that in RDL we define the store’s set of values for a given object as a stream as well - the entity stream for that object, which is also available in code.

Similarly, computing the stream of names and salaries of all the highly paid employees would be a computation defined as:

            select name,salary from Employee where salary > 50000

In this case, we computed a new value out of the Employee entity stream - out of the store. An entity stream is simply a definition of a stream value – the values in the application store – which is “created automatically” given the definition of an entity.

Note that all objects essentially define a stream and can be manipulated similarly. For example, the Integer type, which defines the integer scalar, can also be thought of as a stream of values, and queried accordingly. For instance, the stream of all the even integers can be expressed simply as:

            Integer[Integer % 2 = 0]

(the Integer in the predicate specification refers to the individual object instances – the numbers – passed to the filtering function).

Similarly, given an ‘isPrime’ action which tests whether a given integer is a prime number, we could express the set of all prime numbers as:

            Integer[isPrime(Integer)]
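As a loose analogy (in Python rather than RDL), such object streams behave like lazily evaluated generators: a predicate over the stream describes a new, filtered stream without ever materializing all of its values. The helper is_prime below is just an illustrative placeholder for the ‘isPrime’ action:

from itertools import count, islice

def is_prime(n):
    # Naive primality test, just for illustration.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

# "Integer[Integer % 2 = 0]" -- the stream of even integers, defined lazily.
evens = (n for n in count() if n % 2 == 0)

# "Integer[isPrime(Integer)]" -- the stream of prime numbers, also lazy.
primes = (n for n in count() if is_prime(n))

print(list(islice(evens, 5)))   # [0, 2, 4, 6, 8]
print(list(islice(primes, 5)))  # [2, 3, 5, 7, 11]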

What about Associations?

Associations between objects present an interesting conundrum in this case. Semantically they represent relationships between object instances (values), which are part of the model. So from a data model point of view these are “pointers” – pointing from one value to another. But object values as treated in RDL are immutable, and this includes values that are associated to one another.

Changing the association itself is similar to changing any other element of an object – it computes a new value of the pointing instance.

For example:

let e = Employee[[id = 1305]];
e.manager = Employee[[id = 2013]];

(assume manager is an association to another Employee instance)

In this case, a new value of Employee is computed, and e is re-bound to it. This in itself isn’t surprising since a value is a value, whether it’s a structured one or a simple scalar value.

But what if the associated value itself is changed?

For example, with the statement:

e.address.country = 'Oz';

(assume e.address is an association to an Address object)

we change the associated value, and then re-bind it to e.address. Another way to look at it is that we in fact compute two values – the associated one and the one bound to e – a new Address and a new Employee.

As before, other bindings to the same Employee instance won’t be automatically re-bound to the new Employee value, the one who lives in Oz. This includes other associations.

So far, this isn’t anything new. But here comes the “tricky” part – the associations actually model relationships between object instances, and these are reflected in the store. So when the application’s data store is updated, changes made through a different variable are reflected in the state of the store. Another way to look at it is that in the data store we do in fact have reference semantics (as opposed to value semantics) – the references between the instances. But these references exist only in the store, not in memory, which is irrelevant in our model.

For example, consider this slightly expanded example:

let e1 = Employee[[…]];
let e2 = Employee[[…]];
…
e1.address = e2.address; // they moved in together
e1.address.country = 'Oz'; // relocating

At this point, e1.address != e2.address – they have different country values.

But, e1.address is e2.address – they are looking at the same instance, from an identity point of view. Recall that the ‘is’ operator compares the identity of two instances (in “DB-speak” – their keys, but this is generally a broader concept).

What happens if we write e1.save(); at this point? It would update the corresponding Address instance with the new value, with the country being ‘Oz’.

What happens if, alternatively, we write the following?

e2.address.city = 'Metropolis';
e2.save();

It would result in a new application state where the address of both employees is in Metropolis, and not in Oz.

So it matters which value gets saved, but two different values with the same identity will end up referring to the same value in the application’s data store.
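One way to picture this (again a hedged Python sketch, not the actual River runtime) is a store keyed by identity: ‘save’ simply replaces whatever value currently sits under that key, so the last value saved for a given identity wins:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Address:
    id: int      # the identity ("key" in DB-speak)
    city: str
    country: str

# A toy application store: identity -> current value.
store = {}

def save(value):
    # Saving a value makes it the state for that identity in the new store.
    store[value.id] = value

shared = Address(id=7, city="New York", country="USA")
save(shared)

e1_address = replace(shared, country="Oz")       # e1's copy, relocated to Oz
e2_address = replace(shared, city="Metropolis")  # e2's copy, moved to Metropolis

# e1_address != e2_address (different values), but they share one identity,
# so whichever is saved last determines the state for that identity.
save(e2_address)
print(store[7])  # Address(id=7, city='Metropolis', country='USA') -- no Oz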

The key point is that business logic expressed in RDL reflects the changes made to the application’s data. This state of data is reflected in the store, which is the real “memory” of the application - its logical/semantic memory. The language model deals only with changes to that memory. Not with any other changes.

An implementation of this model may of course make use of a computer system’s memory and distinguish between that memory and the persisted data, e.g. for optimization purposes. But this is an implementation detail and not part of the language/execution model definition. The RDL “program” deals only with the changes it needs to make to that state.

The River Flows - Between States

Looking at a River program from this declarative point of view leads us to interpret the code accordingly. So a piece of business logic written in RDL (an action) is in fact a specification of how the state of the application changes. You can look at an action as one function that receives as input the application state (the store), and possibly some other values, and outputs a new application state, possibly with new values.

In other words, an action represents how a new application state is computed given a previous state and a set of inputs. The input is composed of the formal parameters defined for an action (+’this’) and the streams of values in the application store when the action is computed – all the entity streams.

The action code itself defines how the values in the input are used to compute new values that will either be returned as a result of the computation or used to update the application state (the store). In this sense, the block of statements inside an action defines a pipeline of computation through which the (streams of) values flow. The source of the pipeline is the formal input argument set + the application store. The sink of the pipeline is the output argument set + the application store.

The action defines the new state by indicating what value is part of the new state (or what values are removed). This is done using the ‘save’ and ‘delete’ actions. In other words, what we call “modifying the store” is in fact calculating the new store state. The built-in store modifying operations (save, delete) are not commands to the underlying database; they are simply an indication of what value is part of the new state.

Of course, an action can also not calculate a new state of the application, and simply “leave” the application in the same state, i.e. the output state is the same as the input state. This is what we refer to as a pure computation – an action that simply computes a new value on the output channel, without modifying the application state.
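If it helps to see that view spelled out, here is a Python sketch (the names hire and headcount, and the dictionary-of-tuples store, are invented for illustration): an action is just a function from (store, inputs) to (new store, outputs), and a pure computation is the special case that returns the store unchanged.

# The "application state" is just a value here: a mapping of entity names
# to streams (tuples) of values.
def hire(store, name):
    # A state-changing action: the new store has one more employee.
    new_store = {**store, "Employee": store["Employee"] + (name,)}
    return new_store, "hired " + name

def headcount(store):
    # A pure computation: the output state is the input state, untouched.
    return store, len(store["Employee"])

store = {"Employee": ("John McClane",)}
store, message = hire(store, "Hans Gruber")
store, n = headcount(store)
print(message, n)  # hired Hans Gruber 2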

The values of the store (the entity streams) are treated in a special way in this model – they are logically read and written for each action without the need to explicitly specify it.

One could think of the store as the ‘this’ variable of the entire application, passed to each flow of actions (each root invocation). It is the exact same idea, on a larger scale – instead of passing a variable called ‘this’, we pass a lot of variables named after the entities defined for the application. Semantically, each ‘save’ and ‘delete’ needs to be reflected in the store (like changes to ‘this’ are reflected in an action) and flows are independent of each other, like separate transactions. Technically it doesn’t necessarily mean that it’s a call to the DB – that’s an implementation detail.

Another analogy that might be useful for developers is that of "version controlling the application": you can think of an application's data as if it were stored in a version control system. An action invocation in this sense is like making a private branch from that state, and making local changes that are then merged back into the master branch - the application's store. In this analogy, variables are simply tags associated with interim values. They are there so one can refer to these values, in this branch alone. The different statements are expressions of how to compute new versions of the values that were given as input (= "copied from the master branch").

Sailing down the River

Having this model in place allows for easier performance optimization, specifically parallelization. Treating values as immutable, and making actions referentially transparent, allows for easier caching and distribution of the computation. Looking at an action block as the definition of a pipeline allows us to opt for computation models that are not necessarily bound to physical memory, possibly handling infinite value streams.

But this doesn’t come at the price of an awkward syntax. The syntax and its semantics are pretty straightforward and consistent, and in most cases “look” imperative. When writing RDL code, one only has to pay attention to the fact that changes in one place are not automatically reflected in other bindings. We maintain that from a readability and maintainability standpoint this is actually good news – relying on references to the same object values being updated in different places, through different “pointers/references”, often leads to more complex and harder-to-maintain code. Maintaining such semantics in RDL would essentially mean that a developer has to explicitly rebind variables to changed values. But this only serves to highlight the complexity that is already there.

Easier Coding

It’s not just about looking imperative while being functional. It’s worth noting that the RDL syntax allows for simpler expression in some cases where traditional imperative languages end up with (usually) more convoluted code.

One specific area where this shines is the use of loops. There is rarely a need to define a loop in a River program, though it’s possible. Almost all situations where a loop is needed can easily be expressed using a select or apply-to construct over a given stream or set of streams.

For example, paying a 120% bonus to all employees with a high performance rating can be simply achieved with (assume all definitions are in place):

            apply giveBonus(1.2) to Employee where performanceRating = 5;

or:

            apply giveBonus(1.2) to Employee[performanceRating = 5];

There is no need for for-each or while loops.
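One way to read the apply statement above (a rough Python analogy; give_bonus and the Employee fields are invented for illustration) is as mapping an action over a filtered stream, with the resulting values making up the new state:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Employee:
    name: str
    salary: float
    performance_rating: int

def give_bonus(e, factor):
    # Compute a new Employee value with the bonus applied.
    return replace(e, salary=e.salary * factor)

employees = [
    Employee("John McClane", 50000.0, 5),
    Employee("Hans Gruber", 60000.0, 3),
]

# "apply giveBonus(1.2) to Employee[performanceRating = 5]":
# apply the action to the filtered stream; the results form the new state.
employees = [give_bonus(e, 1.2) if e.performance_rating == 5 else e
             for e in employees]
print(employees[0].salary)  # 60000.0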

Projecting a value out of a given stream of values is trivial using the select statement.

But note that a select statement also serves as a mapping function (usually called ‘map’ in other languages).

For example, say we want to calculate the commission for all sales done today.

The imperative method, which is also possible in RDL, would be:

let commissions = [];
foreach (sale in Sale[dateSold = now().day])
     add calcCommission(sale) to commissions;

but this can be more elegantly written as:

let commissions =
     select calcCommission(Sale) from Sale[dateSold = now().day];

since calcCommission is simply an expression over the value of Sale.
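The same shape falls out of any language with map and filter. For instance, a rough Python equivalent (calc_commission and the Sale fields are placeholders) is a single filter-then-map expression:

from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Sale:
    amount: float
    date_sold: date

def calc_commission(sale):
    # Placeholder commission rule: 3% of the sale amount.
    return sale.amount * 0.03

sales = [Sale(1000.0, date.today()), Sale(500.0, date(2020, 1, 1))]

# "select calcCommission(Sale) from Sale[dateSold = now().day]"
# is just a filter-then-map over the Sale stream:
commissions = [calc_commission(s) for s in sales if s.date_sold == date.today()]
print(commissions)  # [30.0]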

External Data

As noted previously, applications today don’t always define all the data they use as part of their own data model. It’s in fact quite common to see applications that integrate several data sources.

To this end, the River model is open to the consumption of external data and services; it is often necessary to read and persist data in other data containers besides the data store defined by the River model. The immediate corollary of this is that some of the update semantics and behavior defined for River runtime containers are not implemented in the same way elsewhere, so one can’t necessarily rely on the same semantics in those cases.

At the same time, we strive for a development experience that is as easy and consistent as possible. This is why external objects are consumed in the code in a way that is very similar to objects defined in River. A developer must take care not to confuse the two cases. External objects are annotated accordingly (@existing), so it shouldn’t be hard to follow.


[1] Of course, the reality is that SQL queries are not purely declarative, and that often optimization considerations are taken into account when writing the queries, sometimes DBMS-specific. But for the purpose of discussion, we’ll leave it at that.