Module openj9.dtfj

Class CompressedRecordArray

java.lang.Object
com.ibm.dtfj.corereaders.zos.util.CompressedRecordArray
All Implemented Interfaces:
Serializable

public final class CompressedRecordArray extends Object implements Serializable
This class represents an array of records which are stored in a compressed format while still allowing random access to them. Each record is simply an array of ints, and all records must be the same length. To implement this, we divide the array of records into blocks. There is an index and a bit stream: the index gives the start of each block in the bit stream, and each block contains a set of records stored in an encoded format. A header at the beginning defines the encoding used, which is chosen dynamically to give the best compression. Deltas (i.e. the differences between values in adjacent records) are stored rather than the values themselves, which gives good results for data whose values change in a reasonably regular fashion. The number of records per block is configurable, and there is a space/time trade-off: a large number of records per block gives better compression at the cost of more time to extract each record, because extraction has to start at the beginning of the block and decompress each record in turn until it reaches the one wanted.
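The scheme described above can be illustrated with a simplified sketch. This is not the OpenJ9 implementation: the class name DeltaBlockArray is hypothetical, and to keep it short it writes zigzag-encoded deltas as byte-aligned varints rather than into a bit stream, with no encoding header and no dynamic choice of encoding. What it shares with the description is the block index, the storage of deltas between adjacent records, and the sequential decoding from the start of a block that makes larger blocks slower to read. Storing the first record of each block as plain values, so that every block decodes on its own, is also an assumption of the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of block-based delta compression with random access.
 * Simplified: byte-aligned zigzag varints instead of a bit stream, and no
 * encoding header or dynamic choice of encoding.
 */
final class DeltaBlockArray {
    private final int blockSizeLog2;
    private final int recordSize;
    private final int[] previous;                            // last record added, used to form deltas
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final List<Integer> starts = new ArrayList<>();  // byte offset of each block while adding
    private int count;
    private byte[] data;                                     // compressed stream, set by close()
    private int[] index;                                     // block start offsets, set by close()

    DeltaBlockArray(int blockSizeLog2, int recordSize) {
        this.blockSizeLog2 = blockSizeLog2;
        this.recordSize = recordSize;
        this.previous = new int[recordSize];
    }

    void add(int[] record) {
        if (data != null) throw new IllegalStateException("add() after close()");
        boolean blockStart = (count & ((1 << blockSizeLog2) - 1)) == 0;
        if (blockStart) starts.add(out.size());
        for (int i = 0; i < recordSize; i++) {
            // The first record of a block is stored as-is so each block decodes on its own;
            // later records store the (wrapping) difference from the previous record.
            int delta = blockStart ? record[i] : record[i] - previous[i];
            writeVarint((delta << 1) ^ (delta >> 31)); // zigzag: small negatives become small positives
            previous[i] = record[i];
        }
        count++;
    }

    void close() {
        data = out.toByteArray();
        index = starts.stream().mapToInt(Integer::intValue).toArray();
    }

    void get(int recordNumber, int[] record) {
        if (data == null) throw new IllegalStateException("get() before close()");
        if (recordNumber < 0 || recordNumber >= count) {
            throw new IndexOutOfBoundsException("record " + recordNumber);
        }
        int block = recordNumber >>> blockSizeLog2;              // which block
        int offset = recordNumber & ((1 << blockSizeLog2) - 1);  // position inside the block
        int pos = index[block];
        // Decode sequentially from the start of the block up to the wanted record.
        for (int r = 0; r <= offset; r++) {
            for (int i = 0; i < recordSize; i++) {
                int value = 0, shift = 0, b;
                do {
                    b = data[pos++] & 0xFF;
                    value |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                int delta = (value >>> 1) ^ -(value & 1); // undo zigzag
                record[i] = (r == 0) ? delta : record[i] + delta;
            }
        }
    }

    /** Rough size in bytes of the compressed stream plus the block index (valid after close()). */
    int memoryUsage() {
        return data.length + index.length * Integer.BYTES;
    }

    private void writeVarint(int v) {
        // 7 bits per byte, high bit set on every byte except the last.
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}
```

In this sketch a random read costs one index lookup plus up to 2^blockSizeLog2 record decodes, which is the space/time trade-off described above.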

I wrote a test to measure the performance on some real-life data (in fact, this data is the reason I wrote this class in the first place). The data consists of a file containing z/OS fpos_t objects obtained by calling fgetpos sequentially for every block (4060 bytes) in an svcdump. Each fpos_t object is actually an array of 8 ints containing obscure information about the disk geometry or something similar, but the important thing is that it changes in a reasonably regular fashion and so is a good candidate for compression via deltas. The original file had a length of 3401088 bytes. Here are the results, which suggest that a block size of 32 (blockSizeLog2 = 5, since 2^5 = 32) is a good choice (the time is that taken to write the data and then read it back again to check it):

log2   block size   memory usage (bytes)   time (ms)
0      1            4191388                782
1      2            2706992                691
2      4            1217920                621
3      8            790472                 620
4      16           516772                 721
5      32           340448                 942
6      64           334304                 1362
7      128          334304                 2223
8      256          340448                 3966
9      512          355808                 7470
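The figures in the table can be checked with a little arithmetic. The sketch below copies the memory column from the table; the class name BenchmarkArithmetic is hypothetical, and it assumes each fpos_t is 8 ints of 4 bytes each (32 bytes), which is consistent with the file length dividing exactly into 106284 records.

```java
/** Hypothetical helper doing arithmetic on the benchmark table; figures copied from the table. */
final class BenchmarkArithmetic {
    static final int ORIGINAL_FILE_BYTES = 3401088;
    // Assumption: each fpos_t is 8 ints of 4 bytes each.
    static final int BYTES_PER_FPOS = 8 * Integer.BYTES;
    // Memory usage in bytes, indexed by blockSizeLog2 (0 to 9).
    static final int[] MEMORY_BYTES = {
        4191388, 2706992, 1217920, 790472, 516772,
        340448, 334304, 334304, 340448, 355808
    };

    /** Number of fpos_t records in the original file. */
    static int recordCount() {
        return ORIGINAL_FILE_BYTES / BYTES_PER_FPOS;
    }

    /** How many times smaller the array is than the original file. */
    static double compressionRatio(int blockSizeLog2) {
        return (double) ORIGINAL_FILE_BYTES / MEMORY_BYTES[blockSizeLog2];
    }

    /** blockSizeLog2 with the smallest memory usage (the first one on a tie). */
    static int smallestMemoryLog2() {
        int best = 0;
        for (int k = 1; k < MEMORY_BYTES.length; k++) {
            if (MEMORY_BYTES[k] < MEMORY_BYTES[best]) best = k;
        }
        return best;
    }
}
```

On these numbers, a block size of 32 makes the array about 10 times smaller than the original file. The smallest footprint is at blockSizeLog2 = 6 and 7 (334304 bytes), less than 2% below the 340448 bytes at 5, while the measured time rises from 942 ms to 1362 ms and 2223 ms. With a block size of 1 the array is larger than the original file.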

See Also:
  • Constructor Summary

    Constructors
    Constructor                                               Description
    CompressedRecordArray(int blockSizeLog2, int recordSize)  Create a new CompressedRecordArray.
  • Method Summary

    Modifier and Type  Method                               Description
    void               add(int[] record)                    Add a new record.
    void               close()                              Close this CompressedRecordArray.
    void               get(int recordNumber, int[] record)  Get the record with the given number.
    static void        main(String[] args)                  This method is provided to test the CompressedRecordArray.
    int                memoryUsage()                        Give a rough estimate of how many bytes of storage we use.

    Methods declared in class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • CompressedRecordArray

      public CompressedRecordArray(int blockSizeLog2, int recordSize)
      Create a new CompressedRecordArray. A blockSizeLog2 of 5 (32 records per block) gives good results.
      Parameters:
      blockSizeLog2 - the number of records in each block expressed as a power of 2
      recordSize - the number of ints in each record
  • Method Details

    • add

      public void add(int[] record)
      Add a new record. Data is copied from the given array.
      Parameters:
      record - an array of ints which forms the record to be added
    • close

      public void close()
      Close this CompressedRecordArray. This must be called before any reading is done and no more records may be added afterwards.
    • get

      public void get(int recordNumber, int[] record)
      Get the record with the given number. To save on GC overhead, the caller supplies the int array that the record is copied into.
      Parameters:
      recordNumber - the sequential number of the record to read
      record - the array to copy the record into
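The add/close/get contract (records are copied in, close() separates writing from reading, and get() fills a buffer the caller owns) can be sketched with a hypothetical uncompressed stand-in. SimpleRecordArray and sumFirstField are illustrative names, not part of the DTFJ API, and throwing IllegalStateException when the contract is broken is an assumption: the documentation above does not say what CompressedRecordArray does in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical uncompressed stand-in following the add/close/get contract described above. */
final class SimpleRecordArray {
    private final int recordSize;
    private final List<int[]> records = new ArrayList<>();
    private boolean closed;

    SimpleRecordArray(int recordSize) {
        this.recordSize = recordSize;
    }

    void add(int[] record) {
        if (closed) throw new IllegalStateException("add() after close()");
        // Store a copy, so later changes to the caller's array do not alter the stored record.
        records.add(Arrays.copyOf(record, recordSize));
    }

    void close() {
        closed = true;
    }

    void get(int recordNumber, int[] record) {
        if (!closed) throw new IllegalStateException("get() before close()");
        // Fill the caller's array instead of allocating and returning a new one.
        System.arraycopy(records.get(recordNumber), 0, record, 0, recordSize);
    }

    /** Hypothetical helper: sums field 0 of the first count records, reusing one buffer. */
    static long sumFirstField(SimpleRecordArray array, int count) {
        int[] buffer = new int[array.recordSize]; // allocated once, overwritten by every get()
        long sum = 0;
        for (int n = 0; n < count; n++) {
            array.get(n, buffer);
            sum += buffer[0];
        }
        return sum;
    }
}
```

Allocating the buffer once outside the loop is what lets a read loop avoid creating a new array for every record.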
    • memoryUsage

      public int memoryUsage()
      Give a rough estimate of how many bytes of storage we use. This is the actual storage allocated, so it may be more than what is in use at any one time.
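A gap between storage allocated and storage in use typically comes from growable buffers that reserve spare capacity. Whether CompressedRecordArray grows its storage this way is an assumption; the hypothetical GrowableIntBuffer below only illustrates why a capacity-based estimate can exceed what is in use.

```java
import java.util.Arrays;

/** Hypothetical growable buffer showing why allocated storage can exceed storage in use. */
final class GrowableIntBuffer {
    private int[] data = new int[16];
    private int size;

    void append(int value) {
        if (size == data.length) {
            // Doubling keeps appends cheap on average but leaves unused capacity behind.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    /** Bytes actually holding values. */
    int bytesInUse() {
        return size * Integer.BYTES;
    }

    /** Bytes allocated, including spare capacity: the kind of figure memoryUsage() describes. */
    int bytesAllocated() {
        return data.length * Integer.BYTES;
    }
}
```

After 17 appends this buffer holds 68 bytes of values but has 128 bytes allocated.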
    • main

      public static void main(String[] args) throws Exception
      This method is provided to test the CompressedRecordArray. We need to change this to use unit tests at some point.
      Throws:
      Exception - if anything bad happens