Implement: Data Privacy in Hyperledger Fabric through Private Collections
Hyperledger Fabric is a permissioned blockchain infrastructure that was introduced to solve the problem of establishing accountability in multi-party business scenarios.
Although Hyperledger’s membership model establishes the identity of every participant in the network, in a real-world multi-party business scenario, data protection and privacy laws impose legal requirements that may prevent all data from being shared among all parties in the network.
Prior to version 1.2, privacy in Hyperledger Fabric was addressed through the concept of channels. Organizations (peer nodes) that wanted to share private data with selected other organizations were joined in a separate channel. The problem with this approach is duplication of data and a proliferation of channels among groups of organizations.
Version 1.2 was a huge step forward in addressing this issue via the concept of private data collections. A collection can be considered an overlay on a channel, so a single channel can contain many information-sharing groups. A hash of the private data shared in a collection is still maintained in the ledger state to ensure validity and integrity. The transaction data itself is disseminated peer to peer, so it is not exposed even to the orderer nodes.
Assuming the reader already has a fair understanding of the basic Hyperledger Fabric architecture, I will cut to the chase here and share our viewpoint on implementing “Privacy through SideDBs”.
A single chaincode can define many collections. Data stored in a specific collection is viewable only by the orgs belonging to that collection.
Below are the simple steps needed to see private data collections in action:
Step 1: Define collections in collection-config.json
Step 2: Separate the data into Public and Private structs [so that all orgs in the channel have visibility into some parts of the data publicly, while other parts are hidden and shared only among certain parties]
Step 3: Add handlers in code to save the public part of the data to the world state DB and the private part to collections.
Step 1: Define collection-config.json:
In the example below we have defined two private collections. “collections1” is used to persist data that is shared ONLY between CompanyA and CompanyB (CompanyC has no visibility into this data), while “collections2” is used for data that can be seen ONLY by CompanyA and CompanyC (CompanyB has no visibility into it).
(Note that the peer node CompanyA, with its own MSP, is named ‘CompanyAMSP’ in our SAP testnet network, hence the name CompanyAMSP.member in the collections config.)
[{
"name": "collections1",
"policy": "OR('CompanyAMSP.member','CompanyBMSP.member')",
"requiredPeerCount": 0,
"maxPeerCount": 3,
"blockToLive": 0
},
{
"name": "collections2",
"policy": "OR('CompanyAMSP.member','CompanyCMSP.member')",
"requiredPeerCount": 0,
"maxPeerCount": 2,
"blockToLive": 0
}]
The blockToLive parameter defines how long (measured in blocks) the data is kept before it is automatically purged (a provision for GDPR compliance requirements). A value of 0 means the data is never purged.
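A quick way to catch mistakes in the collection definition before deployment is to parse it locally. The following is a minimal sketch in Go, not part of the original chaincode: the CollectionConfig type and the parseCollections function are hypothetical names, and the checks (non-empty unique names, requiredPeerCount not larger than maxPeerCount) are sanity checks chosen for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CollectionConfig mirrors the fields used in the collection-config.json above.
// The type name is an assumption made for this sketch.
type CollectionConfig struct {
	Name              string `json:"name"`
	Policy            string `json:"policy"`
	RequiredPeerCount int    `json:"requiredPeerCount"`
	MaxPeerCount      int    `json:"maxPeerCount"`
	BlockToLive       uint64 `json:"blockToLive"`
}

// parseCollections decodes the JSON array and applies a few basic sanity checks.
func parseCollections(raw []byte) ([]CollectionConfig, error) {
	var cols []CollectionConfig
	if err := json.Unmarshal(raw, &cols); err != nil {
		return nil, fmt.Errorf("invalid collection config: %v", err)
	}
	seen := map[string]bool{}
	for _, c := range cols {
		if c.Name == "" {
			return nil, fmt.Errorf("collection without a name")
		}
		if seen[c.Name] {
			return nil, fmt.Errorf("duplicate collection name %q", c.Name)
		}
		seen[c.Name] = true
		if c.RequiredPeerCount > c.MaxPeerCount {
			return nil, fmt.Errorf("collection %q: requiredPeerCount exceeds maxPeerCount", c.Name)
		}
	}
	return cols, nil
}
```

Calling parseCollections with the contents of collection-config.json returns the two collections defined above, or an error describing the first problem found.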
NOTE: collection-config.json need not be zipped along with the other chaincode source files during deployment. After the chaincode is deployed to the channel, the collection-config.json needs to be uploaded during the “instantiate” stage.
Step 2: Divide the data into Public and Private structs:
Let’s say every peer node belonging to the channel should be able to check the validity and authenticity of an item in the blockchain and read certain basic attributes of the item. If a company wants to share some information about the item, say manufacturer_country and price, with only certain companies based on some contractual agreement, and keep it hidden from the others, then we need to divide the data struct as follows:
// Item_Public holds the attributes visible to every org in the channel.
type Item_Public struct {
	ItemID      string `json:"item_id"`
	Description string `json:"description"`
	Owner       string `json:"owner"`
	Status      int    `json:"status"`
	Components  string `json:"components"`
}

// Item_Private holds the attributes shared only with members of a collection.
type Item_Private struct {
	ItemID               string `json:"item_id"`
	Manufacturer_Country string `json:"manufacturer_country"`
	Price                string `json:"item_price"`
}
Please note that since the item_id/key identifies the unique item, it is good practice to use the same key in the normal channel data and in the private DB, so that the public and private parts of an item can be matched and data integrity is preserved.
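The addItem handler in Step 3 calls ToJSON() and ToJSON_Private() on these structs, but their bodies are not shown in this article. A minimal sketch of what such helpers could look like, assuming they simply marshal the struct with encoding/json and return the bytes (the real helpers may differ):

```go
package main

import "encoding/json"

// The structs are repeated here so the sketch is self-contained.
type Item_Public struct {
	ItemID      string `json:"item_id"`
	Description string `json:"description"`
	Owner       string `json:"owner"`
	Status      int    `json:"status"`
	Components  string `json:"components"`
}

type Item_Private struct {
	ItemID               string `json:"item_id"`
	Manufacturer_Country string `json:"manufacturer_country"`
	Price                string `json:"item_price"`
}

// ToJSON serializes the public part (assumed implementation; returns nil on error).
func (i *Item_Public) ToJSON() []byte {
	b, err := json.Marshal(i)
	if err != nil {
		return nil
	}
	return b
}

// ToJSON_Private serializes the private part (assumed implementation; returns nil on error).
func (i *Item_Private) ToJSON_Private() []byte {
	b, err := json.Marshal(i)
	if err != nil {
		return nil
	}
	return b
}
```

Both documents carry the same item_id, which is what lets a reader with access to the collection join the public and private parts of an item.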
Step 3: Implement code to handle private collections:
For eg: below is a simple addItem function that stores the public part in the world state DB and the private part in the collection/sideDB.
[Please note that I am storing the private data only in collections1 in this code. We can make the code dynamic so that it determines which collection to store to based on specific business logic, such as the item id or the owner.]
func (t *GenericChaincode) addItem(stub shim.ChaincodeStubInterface, args []string) peer.Response {
	// Expected args: item id, description, status, components, manufacturer country, price
	if len(args) != 6 {
		return shim.Error("Incorrect number of arguments, expecting 6")
	}
	id := strings.ToLower(args[0])
	// Get the identity that submitted the transaction (for eg: a member of 'CompanyAMSP')
	creator, err := stub.GetCreator()
	if err != nil {
		return Error(http.StatusInternalServerError, "Could not get the creator, Error "+err.Error())
	}
	// Unmarshal the creator from []byte into a SerializedIdentity
	sId := &msp.SerializedIdentity{}
	if err := proto.Unmarshal(creator, sId); err != nil {
		// return Unmarshal error via HTTP
		return Error(http.StatusInternalServerError, "Could not deserialize a SerializedIdentity, Error "+err.Error())
	}
	// Strip the "MSP" suffix to get the org name ('CompanyAMSP' -> 'CompanyA')
	nodeId := strings.TrimSuffix(sId.Mspid, "MSP")
	// Status is an int in Item_Public, so convert the string argument (needs the strconv import)
	status, err := strconv.Atoi(args[2])
	if err != nil {
		return shim.Error("Status must be an integer: " + err.Error())
	}
	// Form the Public Asset struct
	item := &Item_Public{ItemID: args[0], Description: args[1],
		Owner:      nodeId,
		Status:     status,
		Components: args[3]}
	// Reject the item if it already exists in the collection
	value, err := stub.GetPrivateData("collections1", id)
	if err == nil && value != nil {
		return shim.Error("This item already exists in collection: " + id)
	}
	// First save public struct in World State DB
	if err := stub.PutState(id, item.ToJSON()); err != nil {
		return Error(http.StatusInternalServerError, err.Error())
	}
	// Now put private struct in SideDB / private collection
	item_private := &Item_Private{ItemID: args[0], Manufacturer_Country: args[4], Price: args[5]}
	if err := stub.PutPrivateData("collections1", id, item_private.ToJSON_Private()); err != nil {
		return shim.Error(err.Error())
	}
	return Success(http.StatusCreated, "Private Item Added", nil)
}
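The note before addItem mentions that the target collection could be chosen dynamically. One possible sketch is shown below, assuming the business rule is "pick the collection by the partner org the data is shared with". The routing table and the function names collectionFor and orgFromMSPID are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// collectionByPartner is a hypothetical routing table: partner org -> collection name.
// The collection names match the collection-config.json from Step 1.
var collectionByPartner = map[string]string{
	"CompanyB": "collections1",
	"CompanyC": "collections2",
}

// orgFromMSPID strips the "MSP" suffix, e.g. "CompanyBMSP" -> "CompanyB".
func orgFromMSPID(mspID string) string {
	return strings.TrimSuffix(mspID, "MSP")
}

// collectionFor returns the collection to use for data shared with the given partner MSP.
func collectionFor(partnerMSPID string) (string, error) {
	name, ok := collectionByPartner[orgFromMSPID(partnerMSPID)]
	if !ok {
		return "", fmt.Errorf("no private collection configured for %q", partnerMSPID)
	}
	return name, nil
}
```

In addItem, the hard-coded "collections1" could then be replaced by the result of collectionFor, with the partner MSP passed in as an extra argument or derived from the item.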
Most shim API methods that are available for accessing normal channel data are available for private collections too.
Channel (world state) data | Private collection data |
GetState(id) | GetPrivateData(COLLECTION_NAME, id) |
PutState(id, value) | PutPrivateData(COLLECTION_NAME, id, value) |
GetQueryResult(queryStr) | GetPrivateDataQueryResult(COLLECTION_NAME, queryStr) |
DelState(id) | DelPrivateData(COLLECTION_NAME, id) |
GetHistoryForKey(id) | <Not yet available> |
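To illustrate the read side, here is a minimal sketch that combines GetState and GetPrivateData into one view of an item. It is not taken from the original chaincode: the itemStore interface lists only the two stub methods the sketch needs (the shim stub provides both with these signatures), while readItem and the in-memory memStore are hypothetical helpers so the example can run without a Fabric network. Treating an error or an empty result from GetPrivateData as "private part not visible" is a simplification made here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// itemStore is the subset of the chaincode stub used by this sketch.
type itemStore interface {
	GetState(key string) ([]byte, error)
	GetPrivateData(collection, key string) ([]byte, error)
}

// readItem returns the public attributes of an item and, when the private
// part is available in the given collection, merges its attributes in.
func readItem(s itemStore, collection, id string) (map[string]interface{}, error) {
	pub, err := s.GetState(id)
	if err != nil {
		return nil, err
	}
	if pub == nil {
		return nil, fmt.Errorf("item %q not found", id)
	}
	view := map[string]interface{}{}
	if err := json.Unmarshal(pub, &view); err != nil {
		return nil, err
	}
	priv, err := s.GetPrivateData(collection, id)
	if err != nil || priv == nil {
		// Private part not visible here: return the public view only.
		return view, nil
	}
	if err := json.Unmarshal(priv, &view); err != nil {
		return nil, err
	}
	return view, nil
}

// memStore is an in-memory stand-in for the stub, for local experiments only.
type memStore struct {
	public  map[string][]byte
	private map[string]map[string][]byte
}

func (m memStore) GetState(key string) ([]byte, error) { return m.public[key], nil }

func (m memStore) GetPrivateData(collection, key string) ([]byte, error) {
	return m.private[collection][key], nil
}
```

Because both documents share the same key, a member of the collection sees the merged view, while everyone else sees only the public attributes.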
Putting the above pieces together, we have a simple chaincode implementation demonstrating private data usage.
A simple implementation can be found in this GitHub repo.
Challenges in 1.3 and enhancements in 1.4 w.r.t. Private Data:
Up until version 1.3, if more granular access control (for eg: at the transaction level) was needed, we had to implement our own access control logic as part of the chaincode. This was needed because of the non-deterministic nature of peer selection based on the channel-wide endorsement policy.
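As an illustration of such hand-written access control, the sketch below checks the caller's MSP ID against a per-function allow list before the handler runs. The rule table, the function names in it, and the authorize helper are all hypothetical. In real chaincode the MSP ID would come from the deserialized result of GetCreator(), as in addItem above.

```go
package main

import "fmt"

// allowedCallers is a hypothetical rule table:
// chaincode function name -> MSP IDs allowed to invoke it.
var allowedCallers = map[string][]string{
	"addItem":         {"CompanyAMSP"},
	"readPrivateItem": {"CompanyAMSP", "CompanyBMSP"},
}

// authorize returns nil when the caller's MSP ID may invoke the function.
func authorize(function, callerMSPID string) error {
	for _, id := range allowedCallers[function] {
		if id == callerMSPID {
			return nil
		}
	}
	return fmt.Errorf("MSP %q is not allowed to invoke %q", callerMSPID, function)
}
```

A handler would call authorize first and return an error response when it fails, so that the rule is enforced on whichever peer endorses the transaction.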
Fabric version 1.4 supports both chaincode-based and state-based endorsement policies. This makes it possible to enforce tighter access control at the level of an individual transaction.
Thank you for sharing good insights on the topic.
I assume “collection1” should not have “CompanyC1MSP.member” for scenario 1 (“collections1” is used to persist data that are shared ONLY between CompanyA and CompanyB (companyC doesn’t have visibility to this data) ) which you mentioned
Thank you Jinesh Krishnan, Yes I meant to have only CompanyA and B. Updated it. Thank you .
Nice article Sangeetha! Very clean and straightforward tutorials.
One question in mind tho, even with private data collections, one would still need to invoke the chaincode to save the data into the private data collection.
In a practical scenario, a malicious peer would be able to write their own Nodejs server and attach a block listener that sniffs all blocks committed to the ledger. They would then have access to the block containing the args that were passed to the chaincode to be saved into the private data collection, which would expose the data that was supposed to be private.
Any thoughts on this issue?
Hi Jack Yeoh ,
Thank you for your comments !
The example I have used illustrates the fundamentals of private collections. In a practical scenario and implementation, we would send the data not via chaincode arguments but via "transient fields" (as shown in this section of "How to pass private data").
In this case, the data sent in the transient field is excluded from the channel transaction (it is not recorded in the ledger blocks either, thus making the private data truly private).
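A minimal sketch of the chaincode side of this approach, assuming the client places the private JSON under a transient key named "item_private" (a hypothetical key name). The transientStub interface lists only the two stub methods used, GetTransient and PutPrivateData, and fakeStub is an in-memory stand-in for local experiments:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// transientKey is an assumed name for the transient map entry.
const transientKey = "item_private"

// transientStub is the subset of the chaincode stub used by this sketch.
type transientStub interface {
	GetTransient() (map[string][]byte, error)
	PutPrivateData(collection, key string, value []byte) error
}

// savePrivateFromTransient reads the private payload from the transient map
// (not from the chaincode args) and stores it in the given collection.
func savePrivateFromTransient(s transientStub, collection, id string) error {
	transient, err := s.GetTransient()
	if err != nil {
		return err
	}
	payload, ok := transient[transientKey]
	if !ok || len(payload) == 0 {
		return fmt.Errorf("transient field %q is missing", transientKey)
	}
	// Check that the payload is valid JSON before storing it.
	var fields map[string]interface{}
	if err := json.Unmarshal(payload, &fields); err != nil {
		return fmt.Errorf("transient field %q is not valid JSON: %v", transientKey, err)
	}
	return s.PutPrivateData(collection, id, payload)
}

// fakeStub is an in-memory stand-in for the stub, for local experiments only.
type fakeStub struct {
	transient map[string][]byte
	saved     map[string][]byte
}

func (f *fakeStub) GetTransient() (map[string][]byte, error) { return f.transient, nil }

func (f *fakeStub) PutPrivateData(collection, key string, value []byte) error {
	f.saved[collection+"/"+key] = value
	return nil
}
```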
Furthermore, the channel's Readers policy can be used to fine-tune what is returned to event listeners.
Let me know if that answers your question.
Thanks !
Hi Sangeetha,
somehow, i have missed your blog from March, but was talking about this topic (permissioning in blockchain) during my Lounge meetup earlier this week. in my mind, keeping a portion of the block transaction private is critical in the business settings, where e.g. suppliers should not know about each others' prices but only having them disclosed when bidding and later dealing with their customer in the network. it's one of the most critical features of the fabric, which is setting it apart from permissionless or all public blockchains like bitcoin. what's good about fabric samples is that anyone can download and run them at no cost as they are part of the open source Hyperledger Fabric project.
Thank you for posting and answering questions about it here.
Regards, greg