Tuesday, February 28, 2017

Applying blockchain to healthcare - part 5 (logic)

In my last post, I showed how to store data in a blockchain using an ethereum smart contract.  In this blog post, I will expand on that and show how to add validation logic to the smart contract.  Here is the Patient smart contract updated to enforce a little bit of validation:

pragma solidity ^0.4.2;

contract Patient {

  string public name;
  string public dateOfBirth;
  string public gender;

  // FAMILY^GIVEN^MIDDLE
  function SetName(string _name) {
    // a string's length is only available after converting it to bytes
    if(bytes(_name).length == 0) {
        throw; // abort and revert the call (newer compilers use require/revert)
    }
    name = _name;
  }
  // YYYYMMDD 
  function SetDateOfBirth(string _dateOfBirth) {
    bytes memory dobBytes = bytes(_dateOfBirth);
    // check length
    if(dobBytes.length != 8) {
        throw;
    }
    // check for numeric; use uint (var i = 0 would infer the tiny uint8 type)
    for(uint i = 0; i < 8; i++) {
        if(dobBytes[i] < '0' || dobBytes[i] > '9') {
            throw;
        }
    }
    // TODO: validate year, month and day ranges (not implemented yet)
    dateOfBirth = _dateOfBirth;
  }
  // M,F,U,O
  function SetGender(string _gender) {
    bytes memory genderBytes = bytes(_gender);
    // exactly one character...
    if(genderBytes.length != 1) {
        throw;
    }
    // ...and it must be one of the allowed codes
    if(genderBytes[0] != 'M' &&
       genderBytes[0] != 'F' &&
       genderBytes[0] != 'O' &&
       genderBytes[0] != 'U') {
           throw;
    }

    gender = _gender;
  }
}

The logic should be easy to follow for anyone with programming experience.  A few notes about it:

1) In each case, the string must be converted to bytes to perform validation.  Solidity unfortunately has poor support for strings - you cannot access the individual characters of a string or even get its length directly!  The solidity-stringutils library provides much-needed string functions.  

2) Solidity currently has very limited support for exceptions - you cannot give them a name or attach any data to them.  The caller simply receives a cryptic error like "VM Exception: invalid JUMP at 034277a5bb8a98c36ad8ebf8d38277272ce38b35e6a39916f77bd6c3903e8c45/692a70d2e424a56d2c6c27aa97d1a86395877b3a:2119"
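Since a failed check only reaches the caller as an opaque error, and a transaction that hits `throw` still costs gas, it can be worth running the same checks in the client before sending anything.  Below is a minimal Python sketch that mirrors the contract's byte-level rules and returns a readable reason for each rejection.  The function names are hypothetical, and the calendar check for year, month and day is an assumption that goes beyond the contract, which only leaves a comment for it.

```python
import datetime
from typing import Optional

# Hypothetical client-side mirror of the Patient contract's checks.
# Each function returns None when the value is accepted, or a reason string
# (something the contract's bare `throw` cannot give the caller).

ALLOWED_GENDERS = {b"M", b"F", b"O", b"U"}


def validate_name(name: str) -> Optional[str]:
    # SetName: reject an empty string (length measured in bytes, as in solidity)
    if len(name.encode("utf-8")) == 0:
        return "name must not be empty"
    return None


def validate_date_of_birth(dob: str) -> Optional[str]:
    raw = dob.encode("utf-8")
    # SetDateOfBirth: exactly 8 bytes...
    if len(raw) != 8:
        return "date of birth must be 8 characters (YYYYMMDD)"
    # ...and every byte must be an ASCII digit '0'..'9'
    if any(b < ord("0") or b > ord("9") for b in raw):
        return "date of birth must be numeric"
    # Assumption: the year/month/day check the contract leaves as a TODO,
    # done here with the standard library's calendar rules.
    try:
        datetime.date(int(dob[0:4]), int(dob[4:6]), int(dob[6:8]))
    except ValueError:
        return "date of birth is not a valid calendar date"
    return None


def validate_gender(gender: str) -> Optional[str]:
    raw = gender.encode("utf-8")
    # SetGender: a single byte that is one of M, F, O, U
    if len(raw) != 1 or raw not in ALLOWED_GENDERS:
        return "gender must be one of M, F, O, U"
    return None
```

For example, `validate_date_of_birth("19730230")` passes both checks the contract actually performs, but the sketch rejects it because February 30 does not exist.  The contract remains the real gatekeeper, since anyone can call it directly and skip the client.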

Hopefully this gives you an idea of how data can be stored in ethereum and how logic can be applied to that data.  Next up is ethereum events, which enable notifying the outside world of changes in ethereum and support basic queries.

Monday, February 20, 2017

Applying blockchain to healthcare - part 4 (storing data)

In my previous blog post, I talked about how blockchain is a database that can be trusted.  In this post, I want to show how we can store data in a blockchain using ethereum - a popular open source project based on blockchain.

Storage in ethereum is handled by smart contracts.  A smart contract instance consists of the following:
1) Data - named properties with different types such as strings, numbers, arrays, bytes and maps.  The data is highly structured, like a SQL schema (unlike schemaless NoSQL databases).

2) Code - functions that can receive parameters, return data and access data in the smart contract.  The code is tightly coupled to the data, similar to how methods work in object-oriented languages, which facilitates encapsulation.  SQL stored procedures are also similar, but they are procedural rather than object-oriented.

3) Address - a globally unique address which allows access to a specific smart contract.  This is similar to a pointer or reference to an instance in an object oriented language, or a document identifier in NoSQL or a row id/primary key in a SQL database.

Smart contracts are not organized the way rows are in SQL tables or documents are in MongoDB collections.  Smart contracts simply exist in ethereum and have a unique address.  This is similar to how you instantiate an object or struct in memory in most languages (except that smart contracts are persistent).  Smart contracts are similar in concept to the persistent distributed objects that were popular in the 90s with DCOM and CORBA technologies.
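To make the three parts concrete, here is a toy Python model (an analogy, not how an ethereum client is implemented): a contract instance bundles data with the functions that operate on it, and the "world" is just a mapping from address to instance, with no table or collection grouping the instances.  The address scheme is an assumption: real ethereum contract addresses are derived from the creator's address and nonce using Keccak-256, which Python's standard library does not provide, so the sketch hashes a counter with SHA-256 instead.  `ToyContract` and `ToyWorldState` are hypothetical names.

```python
import hashlib
import itertools
from typing import Dict


class ToyContract:
    """Data (fields) and code (methods) bundled together, like a smart contract."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def get(self, key: str) -> str:
        # Unset values read back as "", like an unset solidity string
        return self.data.get(key, "")


class ToyWorldState:
    """Every instance simply lives at a unique address; there are no tables."""

    def __init__(self) -> None:
        self._instances: Dict[str, ToyContract] = {}
        self._counter = itertools.count()

    def create(self) -> str:
        # Hypothetical address scheme: SHA-256 of a counter, last 20 bytes.
        # (Real ethereum uses Keccak-256 of the creator's address and nonce.)
        digest = hashlib.sha256(str(next(self._counter)).encode()).digest()
        address = "0x" + digest[-20:].hex()
        self._instances[address] = ToyContract()
        return address

    def at(self, address: str) -> ToyContract:
        # Look up an instance by address, like a pointer or primary key
        return self._instances[address]
```

Each call to `create()` returns a new address, and that address is the only handle you need to reach the instance's data and functions later.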

Let's assume we are building a healthcare application and want to store patient demographics (a very common feature).  In SQL, we might have a Patients table with columns for the primary key, name, date of birth and gender:

CREATE TABLE PATIENTS (
   ID      INT,
   NAME    VARCHAR(64),
   DOB     DATETIME,
   GENDER  CHAR(1),
   PRIMARY KEY( ID )
);

In NoSQL, we might store a JSON document that has properties for document id, name, date of birth and gender:

{
  id: "a0e430cc-e386-473a-a81f-310f0f733f47",
  name: "DOE^JOHN",
  dob: new Date("1973-02-01"),
  gender: "M"
}

In ethereum, we declare a smart contract with data members for name, date of birth and gender, as well as functions used to set those data members (declaring the data members public makes solidity generate getter functions for them automatically):

pragma solidity ^0.4.2;

contract Patient {

  string public name;
  string public dateOfBirth;
  string public gender;

  // FAMILY^GIVEN^MIDDLE
  function SetName(string _name) {
    name = _name;
  }
  // YYYYMMDD 
  function SetDateOfBirth(string _dateOfBirth) {
    dateOfBirth = _dateOfBirth;
  }
  // M,F,U,O
  function SetGender(string _gender) {
    gender = _gender;
  }
}

The above smart contract can be compiled using Remix, the online solidity compiler.  Once compiled, you can deploy it to an ethereum network (e.g. the main net, a test net, testrpc or a private network).  For the purpose of this blog, we will run an ethereum virtual machine simulator in the web browser so we don't have to deal with accounts, ether and deployment.  Click the "Environment" icon in the upper right (it looks like a cube) and select the first radio button, "JavaScript VM".  Now paste the smart contract above into the editor in the IDE.  Remix should automatically compile the smart contract and show a red "Create" button.  Click the red "Create" button, and Remix will create an instance of our patient smart contract and return its address.  

You will notice blue buttons for each data member that can be pressed to show the value of that data member.  Initially they are empty, but we can easily set them by entering strings in the edit boxes next to the red buttons and pressing the red buttons.  Let's do that now - enter the string "DOE^JOHN" in string_name and press the "SetName" button.  Remix responds with some output, and we can now check whether the data was stored by pressing the blue "name" button.  Press it now and notice that the "decoded" value shows "DOE^JOHN".
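The blue buttons call the getter functions that solidity generates for `public` data members, while the red buttons send transactions that change the contract's state.  As a rough analogy only (nothing here talks to ethereum), the Patient contract maps onto a Python class with read-only properties standing in for the generated getters.  The method names deliberately copy the contract rather than following Python naming conventions.

```python
class Patient:
    """Python analogue of the Patient contract (no validation, like the original)."""

    def __init__(self) -> None:
        # solidity initializes string state variables to ""
        self._name = ""
        self._date_of_birth = ""
        self._gender = ""

    # `string public name;` -> compiler-generated getter (the blue buttons)
    @property
    def name(self) -> str:
        return self._name

    @property
    def date_of_birth(self) -> str:
        return self._date_of_birth

    @property
    def gender(self) -> str:
        return self._gender

    # Setter functions (the red buttons); in ethereum each call is a transaction
    def SetName(self, name: str) -> None:  # FAMILY^GIVEN^MIDDLE
        self._name = name

    def SetDateOfBirth(self, date_of_birth: str) -> None:  # YYYYMMDD
        self._date_of_birth = date_of_birth

    def SetGender(self, gender: str) -> None:  # M, F, U, O
        self._gender = gender
```

Mirroring the Remix walkthrough: after `p = Patient()` and `p.SetName("DOE^JOHN")`, reading `p.name` returns `"DOE^JOHN"`.  Assigning `p.name` directly fails, much as the contract's `name` can only be changed through its own functions.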

Congratulations!  You have now created your own patient smart contract and deployed it.  This smart contract could be used to represent a single patient in your application.  In the next post, we will look at adding validation logic to the smart contract.

Sunday, February 19, 2017

Applying blockchain to healthcare - part 3 (a different kind of database)

If I were to describe what blockchain is in one sentence, it would be this:

"A distributed database that can be trusted"

Let's break this down a bit:

"A distributed database"

A database that runs on many computers, each of which has a full copy of the data, and which work together to keep their copies in sync.  This is similar to a high availability database deployment like you might see with SQL Server, Oracle or MongoDB - with a few significant differences:

- Number of computers.  A SQL Server HA deployment typically consists of two computers - an active node and a passive node.  A blockchain database has no real limit on the number of computers, and some deployments run on thousands of computers (e.g. bitcoin, ethereum).

- Connectivity between computers.  A SQL Server HA deployment requires a reliable connection between the active and passive nodes.  If the network is not reliable, the system will run into many problems trying to figure out who is in charge.  As a result, both computers are often deployed in the same data center.  If the active and passive nodes are split across separate data centers, there must be a high-speed, reliable network between them.  Blockchain, on the other hand, assumes the network is not reliable.  Any given blockchain node can come and go at any point, and the entire system will continue to function, even if the network slows down.

One of the benefits of blockchain running on many computers and being resilient to network problems is that it is much more reliable.  Computer and network failures happen regularly on blockchain, yet the system continues to run - which gives us confidence that it will not go down.  Contrast that with an HA SQL Server deployment.  Failures on such a system are rare, so you often don't discover that failover isn't working as expected until a failure actually occurs.

"database that can be trusted"

Blockchain brings a different model of trust than we are used to, and this is probably the hardest thing to fully understand as it requires a paradigm shift.  Today we put our trust in many things that are not in our control.  We trust the bank with our money, the government to represent the people's interests, the media to report truthfully and even our IT administrator with our data.  Is SQL Server trustworthy?  In some ways it is - it is proven technology, and as a developer, I am very confident that an INSERT or UPDATE will work properly.  I even trust that its authentication and authorization mechanisms protect sensitive data.  While this sounds like a trustworthy technology, it has several shortcomings:

- Identity management handled by applications.  Databases typically have user accounts, but they are rarely integrated with end user accounts.  Database accounts are typically granted to applications, which handle identity management for end users.  The application may delegate authentication to a centralized system such as Active Directory, but once the user is authenticated, all database actions are done through the application's database account.  This means that end users have to trust the application to do the right thing, as well as everyone who has higher levels of access.  This is rarely a problem in the real world, as those with higher levels of access are usually trustworthy - but there is no guarantee.  It is entirely possible for someone with higher level access to do bad things.  For example, an IT administrator could delete database records of staff actions that resulted in the death of a patient, to avoid potential fines from a malpractice lawsuit.  I hate to say it, but chances are this has happened, and likely more than once.  With blockchain, changes are validated via a digital signature made by the end user.  Identity management is handled by the end user, not the database!  The database simply validates that actions attributed to a user were in fact done by that user and nobody else.  It is practically impossible for anyone else to impersonate that user (assuming the private key is kept secret).  Decentralizing identity management gives us as users a level of trust that we don't have today with existing database technologies.

- Data is immutable.  This is a new aspect of databases for most of us - it means that once a record is written, it can never be changed.  As a result, UPDATE and DELETE don't work the same way.  An update is done by writing a new block to the blockchain that records the change - the original data is never changed (it remains in the blockchain).  Data immutability gives us a level of trust in the underlying data that we currently don't have.
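Here is a minimal sketch of why rewriting history is detectable: each block stores the hash of the previous block, so editing an old record changes its hash and breaks the link that follows it, while an "update" is simply appended as a new record.  This is a toy hash chain using SHA-256 from Python's standard library; real blockchains add consensus, signatures, Merkle trees and more (and ethereum uses Keccak-256).  The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

Block = Dict[str, Any]


def block_hash(block: Block) -> str:
    # Hash a canonical JSON encoding of the block's contents
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: List[Block], record: Dict[str, str]) -> None:
    # Each new block points at the hash of the block before it
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "record": record})


def verify_chain(chain: List[Block]) -> bool:
    # Recompute every link; an edited block breaks the link after it
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


def current_value(chain: List[Block], field: str) -> str:
    # An "UPDATE" is just a later record: the latest one wins, history stays
    value = ""
    for block in chain:
        if block["record"].get("field") == field:
            value = block["record"]["value"]
    return value
```

Appending `{"field": "name", "value": "DOE^JOHN"}` and then `{"field": "name", "value": "DOE^JOHN^Q"}` leaves both records in the chain, with `current_value` returning the newer one.  Editing the first record afterwards makes `verify_chain` return False.  Tampering with the newest block is not caught by this link check alone; in a real network, the other nodes' copies and the consensus rules cover that case.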

Data immutability with decentralized identity management provides a new paradigm of trust.  It means that end users can be fully in control of their identity and of the changes made on their behalf.  It also means that we have a full history of data changes that cannot be altered or corrupted by anyone.  While these properties increase end users' control over their data, they also put a higher level of responsibility on them - specifically around managing their private keys.  Fortunately, there are solutions available today to help protect private keys, such as the trezor and ledger hardware wallets and uPort (via iOS/Android devices).
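The digital signature idea behind decentralized identity can be sketched with textbook RSA: the user keeps a private exponent secret and signs a hash of each change, and anyone holding the public key can verify the signature.  This is purely illustrative and assumes a toy key built from two small, well-known primes with no padding, so it is not secure.  Ethereum itself uses ECDSA over the secp256k1 curve, which Python's standard library does not include.  The names `sign` and `verify` are hypothetical.

```python
import hashlib
from typing import Tuple

# Toy key pair from two known Mersenne primes (far too small to be secure)
P = 2**61 - 1
Q = 2**31 - 1
N = P * Q
E = 65537                          # public exponent, shared with everyone
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, kept secret (Python 3.8+)


def digest(message: bytes) -> int:
    # Reduce the SHA-256 hash of the message into the range [0, N)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes, private_key: Tuple[int, int] = (D, N)) -> int:
    # Only the holder of D can produce this value
    d, n = private_key
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int, public_key: Tuple[int, int] = (E, N)) -> bool:
    # Anyone with (E, N) can check that the signature matches the message
    e, n = public_key
    return pow(signature, e, n) == digest(message)
```

A signature over one change does not verify for a different change, and an altered signature does not verify at all, which is the property that lets the database check who made a change without managing anyone's identity.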

Blockchain Drawbacks

This may sound great, but the blockchain database has a few drawbacks compared to SQL and NoSQL technologies:
1) Blockchain has much lower transaction throughput and higher latency.  While a SQL or NoSQL database can complete a transaction in milliseconds, a blockchain transaction often takes seconds, minutes or even hours to be confirmed!  This slowness means we can't use blockchain as a direct replacement for the databases we use today.
2) Limited or no query support.  Blockchains are more of a write-only (append-only) transaction log than a database.  In most solutions, a SQL or NoSQL database is populated with data read from the blockchain so that queries can be run against it.  In the case of ethereum, filters can be used to do primitive queries.  Some query use cases can be addressed by processing the queries in a smart contract or in the application tier.
3) Cost.  Blockchain requires considerably more CPU, storage and network resources than a corresponding SQL Server HA installation.
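The pattern in item 2, copying data out of the blockchain into a conventional database for querying, might look like the following sketch.  It assumes the change records have already been read from the chain (for example via ethereum events or filters) and decoded into tuples; the record layout, the `build_index` name and the sample addresses are hypothetical.  It loads the records into an in-memory SQLite table shaped like the PATIENTS table from part 4, keyed by contract address, so ordinary SQL queries work.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical decoded change record: (contract address, field, new value).
# A real indexer would produce these by reading blocks/events in order.
ChangeRecord = Tuple[str, str, str]

# Fixed mapping from contract field names to table columns
COLUMNS = {"name": "NAME", "dateOfBirth": "DOB", "gender": "GENDER"}


def build_index(records: List[ChangeRecord]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PATIENTS (ADDRESS TEXT PRIMARY KEY, NAME TEXT, DOB TEXT, GENDER TEXT)"
    )
    for address, field, value in records:
        # Create the row the first time an address is seen
        conn.execute("INSERT OR IGNORE INTO PATIENTS (ADDRESS) VALUES (?)", (address,))
        # Apply the change; the column name comes from COLUMNS, never from input
        conn.execute(
            f"UPDATE PATIENTS SET {COLUMNS[field]} = ? WHERE ADDRESS = ?",
            (value, address),
        )
    conn.commit()
    return conn
```

Once the index is built, a query such as `SELECT NAME FROM PATIENTS WHERE GENDER = 'F'` runs against SQLite rather than the blockchain, which is exactly the kind of query the chain itself cannot answer efficiently.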