Wednesday, May 21, 2014

Fast Image Loading with Cornerstone

During SIIM 2014, Dr. Paul Chang stopped by to see the Cornerstone demo and reminded me that you not only need fast panning/zooming/window/level but also fast loading of images. So far I have implemented a WADO image loader and plan to eventually build one for WADO-RS.  The problem with these standard protocols is that the entire image needs to be loaded before a useful view of the image can be displayed. While it is possible to draw the image from top to bottom as the pixels are received, users generally find such behavior unacceptable.  As a rule of thumb, users expect a system to respond to user input in less than a second - including the display of images.  This is a real challenge for medical images which are typically lossless, high resolution and have more than 8 bits per pixel.  One or more of the following techniques are often applied to improve the time to first image:

1) Variable Compression. A variety of compression algorithms exist, each with different characteristics for speed, bitstream size, and image quality. One can imagine using an algorithm that is fast, produces small bitstreams, and has low image quality to display an initial view of the image, then following up with another view generated by a second algorithm that is slower, produces a larger bitstream, and has higher image quality. This is exactly how the Cornerstone 3D image server works - it sends a lossy JPEG while the user is interacting with the system and then follows up with a lossless PNG once they stop.
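The two-pass strategy can be sketched as a small loader that requests a lossy rendition on every interaction and upgrades to a lossless one when interaction stops. This is a minimal sketch, not the actual Cornerstone server API; `createTwoPassLoader` and the injected `fetchRendition` transport function are hypothetical names.

```javascript
// Minimal sketch of the two-pass (lossy first, lossless later) strategy.
// fetchRendition(imageId, options) is a hypothetical transport function
// supplied by the caller (e.g. wrapping an XHR to the image server).
function createTwoPassLoader(fetchRendition) {
  let pending = null; // image still awaiting its lossless upgrade
  return {
    // Called on every interaction (pan/zoom/window-level):
    // fetch a fast, small, lower-quality rendition.
    interact(imageId) {
      pending = imageId;
      return fetchRendition(imageId, { quality: 'lossy-jpeg' });
    },
    // Called once the user stops interacting:
    // upgrade the last displayed image to full fidelity.
    settle() {
      if (pending === null) return null;
      const imageId = pending;
      pending = null;
      return fetchRendition(imageId, { quality: 'lossless-png' });
    }
  };
}
```

The key design point is that the lossless request is only issued after interaction settles, so rapid window/level changes never queue up expensive lossless transfers.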



2) Image Tiling. If you have used Google Maps, you have seen tiled rendering at work. A large image is broken up into smaller pieces called tiles at different resolutions. When the image is first displayed, a very low resolution tile is retrieved and scaled to the viewport size. This results in a "blurry" image, which is then replaced with a sharper one as the higher resolution tiles are retrieved. As the user pans the image, regions are exposed that do not yet have high resolution tiles. These regions are again displayed by scaling up the lower resolution tile and are sharpened once the higher resolution tiles arrive.
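The bookkeeping behind tiled rendering is mostly simple arithmetic: given a viewport position at some resolution level, work out which tiles it covers and request those. The sketch below assumes a fixed 256-pixel tile size and illustrative names (`tilesForViewport` is not an existing Cornerstone function).

```javascript
// Sketch of basic tile math for a Google Maps style tile pyramid.
// Assumes square tiles of a fixed size; coordinates (x, y) are the
// viewport's top-left corner in pixel coordinates at the given level.
const TILE_SIZE = 256;

function tilesForViewport(x, y, width, height, level) {
  const tiles = [];
  const firstCol = Math.floor(x / TILE_SIZE);
  const lastCol = Math.floor((x + width - 1) / TILE_SIZE);
  const firstRow = Math.floor(y / TILE_SIZE);
  const lastRow = Math.floor((y + height - 1) / TILE_SIZE);
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ level, row, col });
    }
  }
  return tiles;
}
```

A real viewer would run this once for a coarse level to get the immediate blurry view, and again for the target level to fetch the sharp tiles that replace it.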



3) Interlaced or Progressive Encoding. Image pixels can be encoded such that a reasonable view of the image can be displayed after reading just a portion of the bitstream, then updated to show the entire image once all bits are read. The simplest way to do this is to interlace the image - encode all the even rows first, followed by the odd rows. In this case, a reasonable view of the image can be displayed after reading half of the bitstream, as the system can make up the missing odd rows by interpolating between the even rows. Once the remaining bits are read, the interpolated odd rows are replaced with the real data and the image is updated. More complex versions of this are the Adam7 algorithm and the Discrete Wavelet Transform. Note that standard web image formats like PNG, JPEG and GIF all support some form of interlaced or progressive encoding, but each does it slightly differently.
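The simple even/odd row scheme described above can be sketched in a few lines: given only the even rows, synthesize the missing odd rows by averaging their neighbors so a full-height preview can be shown after half the bitstream has arrived. Rows are modeled here as plain arrays of grayscale values; the function name is illustrative.

```javascript
// Sketch of the two-pass interlacing preview: reconstruct a full-height
// image from only the even rows by interpolating the missing odd rows.
// evenRows is an array of rows, each row an array of grayscale values.
function deinterlacePreview(evenRows) {
  const out = [];
  for (let i = 0; i < evenRows.length; i++) {
    out.push(evenRows[i]); // real even row
    const next = evenRows[i + 1];
    if (next === undefined) {
      // Last odd row has no even row below it: replicate the row above.
      out.push(evenRows[i].slice());
    } else {
      // Interpolated odd row: average of the even rows above and below.
      out.push(evenRows[i].map((v, x) => (v + next[x]) / 2));
    }
  }
  return out;
}
```

Once the second half of the bitstream arrives, each interpolated row is simply overwritten with the decoded real row.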



4) Server Side Rendering. In this case, user input is sent to the server, which produces a rendered image and sends it back to the client for display. Remote screen sharing technologies such as VNC or Remote Desktop are perhaps the simplest form of server side rendering. More advanced forms have the client interpret the user input and make calls to a render server accordingly. This strategy works very well on an enterprise network where latency is low but quickly becomes unusable as you move outside the enterprise (e.g. across the city, from home, or across the country). When it comes to interactivity, most users find anything below 10 FPS unusable. To achieve 10 FPS, each client/server round trip must complete in less than 100 ms, which includes both network latency and render time. Outside the enterprise network, latency starts around 40 ms and grows as you get farther away, leaving little time for the server to actually handle the request. Due to this sensitivity to latency, server side rendering will be less attractive than solutions that use client side rendering, where the only limit is the available processing power.
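The latency budget above is worth making explicit as arithmetic. The 10 FPS and 40 ms figures come from the paragraph; the function name is just illustrative.

```javascript
// Back-of-the-envelope latency budget for server side rendering:
// at a target frame rate, how much time remains for the server to
// actually render once round-trip network latency is subtracted.
function serverRenderBudgetMs(targetFps, roundTripLatencyMs) {
  const frameBudgetMs = 1000 / targetFps; // e.g. 10 FPS -> 100 ms/frame
  return Math.max(0, frameBudgetMs - roundTripLatencyMs);
}
```

At 10 FPS with 40 ms of round-trip latency, the server has only 60 ms left to decode, render, encode, and start sending the frame; once latency reaches the full 100 ms frame budget, interactivity is gone entirely.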

All of these techniques result in changes to the underlying pixel data for an image. Currently Cornerstone is designed around static images - that is, images whose pixel data does not change. To support the techniques listed above, the following is required:

1) It must be possible for a region of the pixel data to be externally invalidated when the underlying pixel data has been updated for that region. This invalidation will cause Cornerstone to re-render the image so the changed pixels are displayed to the user.
2) It must be possible to detect when the user interacts with the image in a way that requires changes to the underlying pixel data. Cornerstone currently emits the CornerstoneImageRendered event when the image is rendered, which can be used to detect changes to translation, scale, and viewport size, but there may be better ways to deal with this.
3) It must be possible for Cornerstone to render subviews of an image. Currently Cornerstone always renders the entire image even if not all of the pixels are displayed. This strategy doesn't work at all for very large images like pathology slides, where the client doesn't have enough memory to hold the entire slide in the first place. An interesting side effect of this change is that existing operations like WW/WC might get faster for images larger than the viewport, since only the viewable pixels would need to be pushed through the LUT for each frame (rather than all pixels, as it does today).
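One way to picture the first requirement is a small dirty-region tracker: external code marks a rectangle of the pixel data as invalid, and the renderer re-renders only if a dirty rectangle intersects the visible subview. This is purely a hypothetical sketch of the requirement, not an existing or planned Cornerstone API; all names are assumptions.

```javascript
// Hypothetical sketch of region invalidation. Regions and viewports are
// rectangles of the form { x, y, width, height } in image pixel space.
function createInvalidationTracker() {
  let dirty = []; // rectangles whose pixel data has changed

  return {
    // Called by external code (e.g. a progressive decoder) when the
    // underlying pixel data for a region has been updated.
    invalidate(region) {
      dirty.push(region);
    },
    // Called by the renderer: does any dirty region overlap the
    // currently visible rectangle, forcing a re-render?
    needsRender(visible) {
      return dirty.some(r =>
        r.x < visible.x + visible.width &&
        r.x + r.width > visible.x &&
        r.y < visible.y + visible.height &&
        r.y + r.height > visible.y);
    },
    // Called after a successful re-render.
    clear() {
      dirty = [];
    }
  };
}
```

Combining this with subview rendering (requirement 3) means an off-screen dirty region costs nothing until the user pans it into view.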

Implementing this is quite challenging, but the bigger challenge is how to implement it while keeping Cornerstone easy to understand. I am just now beginning to think through the design and expect it to take several iterations to get right. Look for future posts to see how the design emerges.

2 comments:

  1. Very informative and interesting read. Today the problem in HTML5-based web viewers is precisely instant image display with 16-bit grayscale values. It can take up to 30 seconds before windowing can be adjusted in many HTML5 viewers. I hope DICOM standards like WADO-RS also start supporting a JPIP-like protocol to stream images.

    It would be interesting to work on your project.

  2. Really good article and thanks for such knowledge sharing.
