On a streaming server or cluster, all projects run in a workspace. There can be one workspace for the whole cluster – in fact there is one created by default (called “default”) – or there can be many workspaces defined on the cluster. But regardless of whether there is one or many, all projects must run in a workspace.
At first glance, the purpose of a workspace may not be obvious, and I do get questions about it from time to time, so I thought I’d talk about the raison d’être of workspaces on the streaming cluster.
In the picture below, I have two workspaces, one called “default” and the other called “workspace1”. And you can see that there are two instances of the project “dmm165_final” running – one instance in each workspace.
Workspaces exist primarily for two reasons:
1. They define a namespace. The fully qualified URI of a stream on a streaming cluster is: cluster.workspace.project.stream. Any publisher or subscriber connecting to a stream uses that URI to connect. The publisher/subscriber doesn’t have to know where the project/stream is physically running on the cluster, but it does need to know which workspace. This protected namespace enables a couple of things:
- the ability to run multiple instances of the same project. Within a workspace, every project must have a unique name. But you can run multiple instances of the same project as long as you run each instance in a different workspace.
- the ability to assign workspaces to users or groups. Each user/group then doesn’t have to worry about duplicating the project names of other users/groups. They can do whatever they want in their own workspace(s) without knowing what’s going on in the other workspaces, which eliminates any need for coordination.
2. Access control. Authorization privileges can be defined at the workspace level, so you can control which users or roles can access which workspaces. Again, this works well when you have different users or groups sharing the same streaming cluster. You can limit a user or group to specific workspaces so that they can’t start/stop projects in other workspaces. You could even have certain workspaces that contain “master projects” that are always running, where only a system administrator can start/stop projects, alongside user workspaces where users can create and run ad-hoc projects that subscribe to the output of the “master projects” and apply further processing logic.
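To make the two points above concrete, here is a small toy model in Python. It is only a sketch of the idea and not the streaming cluster's actual API: the `Cluster` and `Workspace` classes, the `operators` permission set, the users "admin" and "alice", and the stream name "out_stream" are all made up for illustration. It shows that project names only need to be unique within a workspace, that start permissions are checked per workspace, and that the stream URI is built from cluster, workspace, project, and stream.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str
    # Hypothetical permission model: users allowed to start projects here.
    operators: set[str] = field(default_factory=set)
    # Names of the projects currently running in this workspace.
    running: set[str] = field(default_factory=set)


class Cluster:
    def __init__(self, name: str, admin: str):
        self.name = name
        # Every cluster starts with one workspace called "default".
        self.workspaces = {"default": Workspace("default", {admin})}

    def add_workspace(self, name: str, operators: set[str]) -> None:
        self.workspaces[name] = Workspace(name, set(operators))

    def start_project(self, user: str, workspace: str, project: str) -> None:
        ws = self.workspaces[workspace]
        # Access control is decided at the workspace level.
        if user not in ws.operators:
            raise PermissionError(f"{user} may not start projects in {workspace}")
        # Project names must be unique only within a single workspace.
        if project in ws.running:
            raise ValueError(f"{project} is already running in {workspace}")
        ws.running.add(project)

    def stream_uri(self, workspace: str, project: str, stream: str) -> str:
        if project not in self.workspaces[workspace].running:
            raise LookupError(f"{project} is not running in {workspace}")
        # Fully qualified name: cluster.workspace.project.stream
        return f"{self.name}.{workspace}.{project}.{stream}"


cluster = Cluster("cluster1", admin="admin")
cluster.add_workspace("workspace1", operators={"alice"})

# The same project runs twice, once in each workspace.
cluster.start_project("admin", "default", "dmm165_final")
cluster.start_project("alice", "workspace1", "dmm165_final")

uri = cluster.stream_uri("workspace1", "dmm165_final", "out_stream")
print(uri)
```

In this sketch, "alice" trying to start a project in "default" raises a `PermissionError`, and starting "dmm165_final" a second time in the same workspace raises a `ValueError`, which mirrors the namespace and access-control rules described above.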
Here’s how “workspace” is defined in the Smart Data Streaming Configuration and Admin guide, in the section on “Managing your streaming cluster”:
A named scope (similar to a directory) to which you deploy projects and their supporting files, adapters, and data services. Cluster workspaces give you the option to manage the permissions of related objects together. Every cluster has at least one workspace, and any workspace can contain projects, supporting files, adapters, and data services from multiple nodes.