While working on the faster image loading mechanism for Cornerstone, I found myself making further changes to the drawImage() function. I already felt this function was too complex, as it had to deal with at least 12 different flows due to the different types of images and caching scenarios. I realized that adding support for the various faster image loading techniques would quickly make the function hard to read and unmaintainable. Clearly something had to change, but what?
After thinking about it for a bit, I had an "aha" moment and realized that the image rendering responsibility should be moved to the image loader. The image loader design already provided flexibility with respect to the image format and protocol used, and it would also have to be aware of the various fast image loading techniques being used. Moving the image rendering responsibility to the image loader would allow the fastest possible rendering.
The only issue with moving rendering to the image loader is code reuse. I envision a wide variety of image loaders that pull full uncompressed pixel data from different servers in different formats. For these types of image loaders, it doesn't make sense to have them cut and paste the rendering code from another image loader. It would be much better to have this generic image rendering mechanism in a shared location.
Based on the above, I decided to split the drawImage() function into three new functions - renderGrayscaleImage(), renderColorImage() and renderWebImage(). This simple refactoring immediately made the code easier to understand, so I knew I was on the right path. The drawImage() function was simplified to some boilerplate logic and delegated the actual rendering to the image object returned by the image loader. I then proceeded to modify each of the image loaders to call one of the three newly created render functions, and everything was working again.
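To make the new shape concrete, here is a minimal sketch of what an image loader might look like after this change, assuming the image object carries a render property that the boilerplate in drawImage() invokes. The loader name, the 'example' scheme and the pixel data source are hypothetical, and whether the loader returns the image directly or via a promise depends on the Cornerstone version.

```javascript
// Minimal sketch (not the actual Cornerstone code): an image loader builds an
// image object and delegates rendering to the shared grayscale renderer.
function loadExampleImage(imageId) {
  var image = {
    imageId: imageId,
    rows: 256,
    columns: 256,
    // ...other properties (windowing defaults, pixel spacing, etc.)
    getPixelData: function () {
      return examplePixelData; // hypothetical source of uncompressed pixels
    },
    // The loader decides how the image is rendered; grayscale images reuse
    // the shared renderGrayscaleImage() function rather than copy/pasting it.
    render: cornerstone.renderGrayscaleImage
  };
  return image;
}

// Register the loader for a hypothetical 'example:' imageId scheme.
cornerstone.registerImageLoader('example', loadExampleImage);
```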
This simple refactoring not only made the existing code simpler and easier to understand, but it also prepares Cornerstone for the more complex functionality that will be needed to handle faster image loading. You can find the commit for these changes here.
Tuesday, May 27, 2014
Sunday, May 25, 2014
Implementing a QIDO-RS Service
Having previously implemented a basic QIDO-RS worklist using JavaScript, I figured it would be useful to implement a QIDO-RS service to improve my understanding of the standard. For those unfamiliar with QIDO-RS, it allows web based queries for studies, series and instances - similar to C-FIND. This weekend I built a simple implementation using C#, Visual Studio 2013 and ASP.NET MVC 5 WebApi. You can find the source code for this project on my GitHub here.
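For context before diving into the findings: a QIDO-RS study search is just an HTTP GET against a studies resource with matching and includefield query parameters. A minimal client-side sketch might look like the following; the base URL and parameter values are hypothetical, and the exact Accept header may vary by implementation.

```javascript
// Minimal sketch of querying a QIDO-RS service for studies from JavaScript.
// The base URL is hypothetical; the query parameters follow PS3.18 section 6.7.
var url = '/qidors/studies' +
          '?PatientName=DOE^JOHN' +
          '&StudyDate=20140101-20141231' +
          '&includefield=StudyDescription';

var xhr = new XMLHttpRequest();
xhr.open('GET', url, true);
// Request the DICOM JSON model described in PS3.18 Annex F
xhr.setRequestHeader('Accept', 'application/json');
xhr.onload = function () {
  // The response body is a JSON array with one entry per matching study
  var studies = JSON.parse(xhr.responseText);
  console.log('Found ' + studies.length + ' matching studies');
};
xhr.send();
```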
Overall I found the implementation to be fairly straightforward and it took me about 16 hours total. About 2/3 of this time was related to a) understanding the standard and b) working through various issues with Visual Studio 2013 and ASP.NET MVC 5 WebApi (neither of which I had used before).
Here are some of my thoughts after getting this implemented:
1. I found several errors in the DICOM standard:
- Modality has an incorrect tag in Table 6.7.1-2a
- Missing Study Description tag in Table 6.7.1-2
- Misspelled StudyInstanceUid on line 641
- JSON example in F.2.1.1.2 is invalid JSON (missing a comma between array entries)
2. The standard was difficult to understand
- It makes several references to concepts documented elsewhere (e.g. fuzzymatching)
- It seems to be designed as a web wrapper around C-FIND, which therefore requires fully understanding C-FIND
3. It isn't clear what functionality is required and what is optional. This may be because it is designed as a wrapper around C-FIND, of which I don't have expert level knowledge.
4. The JSON mapping was not designed for ease of use by JavaScript developers. From F.2 in the standard: "The DICOM JSON Model follows the Native DICOM Model for XML very closely, so that systems can take advantage of both formats without much retooling". Here are some specific issues (illustrated by the snippet after this list):
- The attribute tag (group/element) is used as the property name in the JSON object. JavaScript code cannot use dot notation to access these properties since it is not supported for property names that do not begin with a letter. This could have easily been solved by putting a letter like x in front of the group/element.
- The inclusion of the VR field seems unnecessary since the native JavaScript type system is used.
- Putting all values in an array is awkward. It would have been better to only use an array for attributes with a value multiplicity greater than 1.
- The use of Alphabetic, Ideographic and Phonetic in the PN mapping is unclear. I am sure these are documented somewhere else in the standard - but where? Having a PN value be an object instead of string is also a bit awkward.
- The casing of property names is not consistent: vr starts with a lowercase letter while the others start with an uppercase letter. The most common naming convention in JSON/JavaScript is to start property names with a lowercase letter.
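To illustrate several of these points, here is roughly what a single study looks like in the DICOM JSON model and what a JavaScript consumer has to write to get at the values (the attribute values themselves are made up):

```javascript
// Roughly what one matching study looks like in the DICOM JSON model.
// The attribute tag is the property name, every value is wrapped in a
// "Value" array, and PN values are objects rather than strings.
var study = {
  "00100010": {                              // PatientName
    "vr": "PN",
    "Value": [ { "Alphabetic": "DOE^JOHN" } ]
  },
  "0020000D": {                              // StudyInstanceUID
    "vr": "UI",
    "Value": [ "1.2.3.4.5" ]                 // made-up UID
  }
};

// Dot notation fails because the property name starts with a digit:
//   study.00100010            // SyntaxError
// Bracket notation is required, plus unwrapping the Value array and PN object:
var patientName = study["00100010"].Value[0].Alphabetic;
var studyUid = study["0020000D"].Value[0];
```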
Overall this is a positive step toward bringing DICOM into the world of the web, but more can be done to reduce the barriers developers face when using DICOM. The best way DICOM can reduce these barriers is to make web browser and JavaScript based consumption a top priority and make it as easy to use as possible in that environment. Efforts such as the dicomWeb online documentation are very helpful, and so are open source implementations of the standard.
Wednesday, May 21, 2014
Fast Image Loading with Cornerstone
During SIIM 2014, Dr. Paul Chang stopped by to see the Cornerstone demo and reminded me that you not only need fast panning/zooming/window/level but also fast loading of images. So far I have implemented a WADO image loader and plan to eventually build one for WADO-RS. The problem with these standard protocols is that the entire image needs to be loaded before a useful view of the image can be displayed. While it is possible to draw the image from top to bottom as the pixels are received, users generally find such behavior unacceptable. As a rule of thumb, users expect a system to respond to user input in less than a second - including the display of images. This is a real challenge for medical images which are typically lossless, high resolution and have more than 8 bits per pixel. One or more of the following techniques are often applied to improve the time to first image:
1) Variable Compression. A variety of compression algorithms exist, each with different characteristics for speed, size of bitstream and image quality. One can imagine using an algorithm that is fast, produces small bitstreams and has low image quality to display an initial view of the image, and then following up with another view generated by a different algorithm that is slower, produces a larger bitstream and has higher image quality. This is exactly how the Cornerstone 3D image server works - it sends a lossy JPEG while the user is interacting with the system and then follows up with a lossless PNG once they stop.
2) Image Tiling. If you have used Google Maps, you have seen tiled rendering at work. A large image is broken up into smaller pieces called tiles at different resolutions. When the image is first displayed, a very low resolution tile is retrieved and scaled to the viewport size. This results in a "blurry" image, which is then replaced with a sharper one as the higher resolution tiles are retrieved. As the user pans the image, regions of the image are exposed that do not yet have high resolution tiles. These regions are again displayed by scaling up the lower resolution tile and sharpened once the higher resolution tiles are retrieved.
3) Interlaced or progressive encoding. Image pixels can be encoded such that a reasonable view of the image can be displayed after reading just a portion of the bitstream and then updated to show the entire image after all bits are read. The simplest way to do this is to interlace the image - encode all the even rows first followed by the odd rows. In this case, a reasonable view of the image can be displayed after reading half of the bitstream, as the system can make up the missing odd rows by interpolating between the even rows (a simple sketch of this interpolation appears after this list). Once the remaining bits are read, the interpolated odd rows are replaced with the real data and the image is updated. More complex versions of this are the Adam7 algorithm (shown below) or a discrete wavelet transform. Note that standard web image formats like PNG, JPEG and GIF all support some form of interlaced or progressive encoding, but each does it slightly differently.
4) Server side rendering. In this case, the user input is sent to the server, which produces a rendered image and sends it back to the client for display. Remote screen sharing technologies such as VNC or Remote Desktop are perhaps the simplest form of server side rendering. More advanced forms exist where the client interprets the user input and then makes calls to the render server accordingly. This strategy works very well on an enterprise network where latency is low, but quickly becomes unusable as you move outside the enterprise (e.g. the other side of the city, home, or across the country). When it comes to interactivity, most users find < 10 FPS unusable. To achieve 10 FPS, each client/server round trip must complete in less than 100 ms, which includes network latency and render time. Outside the enterprise network, latency starts around 40 ms and goes up as you get farther away, leaving little time for the server to actually handle the request. Because server side rendering is so sensitive to latency, it will be less attractive than solutions that use client side rendering, where the only limit is the available processing power.
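To make the interlacing idea in 3) concrete, here is a rough sketch of reconstructing a displayable frame from only the even rows by interpolating the missing odd rows. It assumes 8-bit grayscale pixel data stored row-major in typed arrays; the function name is just for illustration.

```javascript
// Rough sketch: build a displayable image from only the even rows of an
// interlaced bitstream by interpolating the missing odd rows.
// evenRows holds rows 0, 2, 4, ... packed contiguously.
function interpolateOddRows(evenRows, width, height) {
  var full = new Uint8Array(width * height);
  var row, col;

  // Copy the even rows that were actually received.
  for (row = 0; row < height; row += 2) {
    var src = (row / 2) * width;
    full.set(evenRows.subarray(src, src + width), row * width);
  }

  // Fill each odd row with the average of the rows above and below it.
  for (row = 1; row < height; row += 2) {
    var above = (row - 1) * width;
    var below = (row + 1 < height) ? (row + 1) * width : above;
    for (col = 0; col < width; col++) {
      full[row * width + col] = (full[above + col] + full[below + col]) >> 1;
    }
  }
  return full;
}
```

Once the odd rows arrive, they simply overwrite the interpolated values and the image is redrawn.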
All of these techniques result in changes to the underlying pixel data for an image. Currently Cornerstone is designed around static images - that is, images whose pixel data does not change. To support the techniques listed above, the following is required:
1) It must be possible for a region of the pixel data to be externally invalidated when the underlying pixel data has been updated for that region. This invalidation will cause Cornerstone to re-render the image so the changed pixels are displayed to the user (a hypothetical sketch of such an API follows this list).
2) It must be possible to detect when the user interacts with the image in a way that requires changes to the underlying pixel data. Cornerstone currently emits the CornerstoneImageRendered event when the image is rendered, which can be used to detect changes to translation, scale and viewport size, but there may be better ways to deal with this.
3) It must be possible for Cornerstone to render subviews of an image. Currently Cornerstone always renders the entire image even if not all of the pixels are displayed. This strategy doesn't work at all for very large images like pathology slides, where the client doesn't have enough memory to hold the entire slide in the first place. An interesting side effect of this change is that existing operations like WW/WC might get faster for images that are larger than the viewport, as Cornerstone would only need to push the viewable pixels through the LUT for each frame (rather than all pixels, as it does today).
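None of this exists in Cornerstone yet, but here is a rough sketch of how requirement 1) might look from the caller's side. The function invalidateImageRegion() and the helper copyTileIntoPixelData() are hypothetical names used purely for illustration.

```javascript
// Hypothetical sketch of requirement 1): an image loader has just received
// higher quality pixel data for a region and asks Cornerstone to redraw it.
// invalidateImageRegion() does not exist today - its name, signature and
// behavior are assumptions for illustration only.
function onTileReceived(element, image, tile) {
  // Copy the newly received pixels into the image's pixel data (details omitted)
  copyTileIntoPixelData(image, tile); // hypothetical helper

  // Tell Cornerstone which rectangle changed so only those pixels are re-rendered
  cornerstone.invalidateImageRegion(element, {
    x: tile.x,
    y: tile.y,
    width: tile.width,
    height: tile.height
  });
}
```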
While implementing this is quite challenging, the bigger challenge is how to implement it while keeping Cornerstone easy to understand. I am just now beginning to think through the design for this and expect it to take me several iterations to get right. Look for future posts to see how the design emerges.