Module openj9.dtfj

Class CompressedRecordArray

  • All Implemented Interfaces:
    Serializable

    public final class CompressedRecordArray
    extends Object
    implements Serializable
    This class represents an array of records that are stored in a compressed format while still allowing random access to them. Each record is simply an array of ints, and all records must be the same length.

    To implement this, the array of records is divided into blocks. There is an index and a bit stream: the index gives the start of each block in the bit stream, and each block contains a set of records stored in an encoded format. A header at the beginning defines the encoding used, and the encoding is chosen dynamically to give the best compression. Deltas (i.e. the differences between values in adjacent records) are stored rather than the values themselves, which gives good results for certain types of data.

    The number of records per block is configurable, and there is a space/time trade-off: a larger number of records per block gives better compression at the cost of more time to extract each record, because extraction has to start at the beginning of the block and uncompress each record in turn until it reaches the one requested.
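
    The block and delta scheme described above can be illustrated with a toy sketch. This is not the DTFJ implementation: the class name DeltaBlockSketch is hypothetical, a list of int arrays stands in for the bit stream, no header or bit packing is modelled, and storing the first record of each block in full is an assumption made so that each block can be decoded on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of block-wise delta coding; NOT the DTFJ implementation. */
final class DeltaBlockSketch {
    private final int blockSizeLog2;
    private final int recordSize;
    // Stand-in for the bit stream: one int[] per record, holding either
    // absolute values (first record of a block) or deltas (all others).
    private final List<int[]> stored = new ArrayList<>();
    private int[] previous;

    DeltaBlockSketch(int blockSizeLog2, int recordSize) {
        this.blockSizeLog2 = blockSizeLog2;
        this.recordSize = recordSize;
        this.previous = new int[recordSize];
    }

    /** Append a record, storing deltas against the previous record within a block. */
    void add(int[] record) {
        boolean startsBlock = (stored.size() & ((1 << blockSizeLog2) - 1)) == 0;
        int[] out = new int[recordSize];
        for (int i = 0; i < recordSize; i++) {
            // Assumption: the first record of a block is kept in full.
            out[i] = startsBlock ? record[i] : record[i] - previous[i];
        }
        stored.add(out);
        previous = record.clone();
    }

    /** Rebuild a record by starting at its block and applying deltas in turn. */
    void get(int recordNumber, int[] record) {
        int blockStart = (recordNumber >>> blockSizeLog2) << blockSizeLog2;
        System.arraycopy(stored.get(blockStart), 0, record, 0, recordSize);
        for (int r = blockStart + 1; r <= recordNumber; r++) {
            int[] delta = stored.get(r);
            for (int i = 0; i < recordSize; i++) {
                record[i] += delta[i];
            }
        }
    }

    public static void main(String[] args) {
        DeltaBlockSketch sketch = new DeltaBlockSketch(2, 3); // 4 records per block
        for (int n = 0; n < 10; n++) {
            sketch.add(new int[] {n, 100 + 2 * n, 7});
        }
        int[] buffer = new int[3];
        sketch.get(6, buffer);
        System.out.println(java.util.Arrays.toString(buffer));
    }
}
```

    The sketch saves no space by itself; the point is that regularly changing data yields small, repetitive deltas, which a real encoder could pack into far fewer bits. It also shows why larger blocks cost more time: get has to walk from the start of the block to the requested record.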

    I wrote a test to measure the performance on some real-life data (in fact, this data is the reason I wrote this class in the first place). The data consists of a file containing z/OS fpos_t objects obtained by calling fgetpos sequentially for every block (4060 bytes) in an svcdump. Each fpos_t object is actually an array of 8 ints containing obscure information about the disk geometry or similar, but the important thing is that it changes in a reasonably regular fashion and so is a good candidate for compression via deltas. The original file had a length of 3401088 bytes. Here are the results, which suggest that a block size of 32 (a blockSizeLog2 of 5) is a good choice. The time is that taken to write the data and then read it back again to check:

    log2  block size  memory usage  time (ms)
       0           1       4191388        782
       1           2       2706992        691
       2           4       1217920        621
       3           8        790472        620
       4          16        516772        721
       5          32        340448        942
       6          64        334304       1362
       7         128        334304       2223
       8         256        340448       3966
       9         512        355808       7470
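
    The trade-off in these results can be made concrete with a little arithmetic. The sketch below uses the figures for block sizes 32 and 64 as read from the results above, together with the original file length; the class and method names are hypothetical. It suggests that a block size of 32 compresses the data roughly tenfold, and that moving to 64 saves about 2% more memory while taking about 45% longer.

```java
/** Worked arithmetic on the measured results; names are illustrative only. */
final class BlockSizeTradeOffSketch {
    /** Ratio of original size to compressed size; larger means better compression. */
    static double compressionRatio(long originalBytes, long compressedBytes) {
        return (double) originalBytes / compressedBytes;
    }

    /** Relative change from 'before' to 'after', e.g. 0.25 means 25% more. */
    static double relativeChange(long before, long after) {
        return (double) (after - before) / before;
    }

    public static void main(String[] args) {
        long original = 3401088L;   // length of the uncompressed file
        long mem32 = 340448L;       // memory usage at block size 32
        long time32 = 942L;         // time (ms) at block size 32
        long mem64 = 334304L;       // memory usage at block size 64
        long time64 = 1362L;        // time (ms) at block size 64

        System.out.printf("compression ratio at 32: %.2f%n",
                compressionRatio(original, mem32));
        System.out.printf("memory change 32 -> 64: %.1f%%%n",
                100 * relativeChange(mem32, mem64));
        System.out.printf("time change 32 -> 64: %.1f%%%n",
                100 * relativeChange(time32, time64));
    }
}
```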

    See Also:
    Serialized Form
    • Constructor Detail

      • CompressedRecordArray

        public CompressedRecordArray​(int blockSizeLog2,
                                     int recordSize)
        Create a new CompressedRecordArray. A size of 5 for blockSizeLog2 gives good results.
        Parameters:
        blockSizeLog2 - the number of records in each block expressed as a power of 2
        recordSize - the number of ints in each record
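
        Expressing the block size as a power of 2 means that a record number can be split into a block number and a position within the block using shifts and masks. The sketch below shows that arithmetic under the assumption that the class locates records this way; the class and method names are hypothetical and the source does not confirm the internal details.

```java
/** Index arithmetic for power-of-2 block sizes; names are illustrative only. */
final class BlockIndexSketch {
    /** Number of records per block, e.g. 32 for a blockSizeLog2 of 5. */
    static int blockSize(int blockSizeLog2) {
        return 1 << blockSizeLog2;
    }

    /** Which block a record falls into (division by the block size). */
    static int blockNumber(int recordNumber, int blockSizeLog2) {
        return recordNumber >>> blockSizeLog2;
    }

    /** Position of the record within its block (remainder of that division). */
    static int offsetInBlock(int recordNumber, int blockSizeLog2) {
        return recordNumber & ((1 << blockSizeLog2) - 1);
    }

    public static void main(String[] args) {
        int log2 = 5;
        int recordNumber = 100;
        System.out.println("block size: " + blockSize(log2));
        System.out.println("block number: " + blockNumber(recordNumber, log2));
        System.out.println("offset in block: " + offsetInBlock(recordNumber, log2));
    }
}
```

        With a blockSizeLog2 of 5, record 100 falls in block 3 at offset 4, so up to four records after the start of the block would need to be decoded to reach it.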
    • Method Detail

      • add

        public void add​(int[] record)
        Add a new record. Data is copied from the given array.
        Parameters:
        record - an array of ints which forms the record to be added
      • close

        public void close()
        Close this CompressedRecordArray. This must be called before any reading is done and no more records may be added afterwards.
      • get

        public void get​(int recordNumber,
                        int[] record)
        Get the record with the given record number. To reduce GC overhead, the caller supplies the int array that the record is copied into.
        Parameters:
        recordNumber - the sequential number of the record to read
        record - the array to copy the record into
      • memoryUsage

        public int memoryUsage()
        Give a rough estimate of how many bytes of storage we use. This is the actual storage allocated, so it may be more than what is in use at any one time.
      • main

        public static void main​(String[] args)
                         throws Exception
        This method is provided to test the CompressedRecordArray. We need to change this to use unit tests at some point.
        Throws:
        Exception - if anything bad happens
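
    The add, close and get methods documented above imply a write-then-read lifecycle with a caller-supplied buffer. The sketch below illustrates that calling pattern with a hypothetical, uncompressed stand-in class (PlainRecordArraySketch) that has the same method shapes, so it runs without the DTFJ module. Throwing IllegalStateException on misuse is an assumption; the documentation does not say how CompressedRecordArray reacts to calls made in the wrong order.

```java
import java.util.Arrays;

/** Hypothetical uncompressed stand-in mirroring the documented method shapes. */
final class PlainRecordArraySketch {
    private final int recordSize;
    private int[] data;
    private int count;
    private boolean closed;

    PlainRecordArraySketch(int recordSize) {
        this.recordSize = recordSize;
        this.data = new int[recordSize * 4];
    }

    /** Copy the record in; the caller keeps ownership of its array. */
    void add(int[] record) {
        if (closed) { // assumption: misuse is reported with an exception
            throw new IllegalStateException("no more records may be added after close()");
        }
        if ((count + 1) * recordSize > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        System.arraycopy(record, 0, data, count * recordSize, recordSize);
        count++;
    }

    /** Mark the end of writing; reading is allowed from here on. */
    void close() {
        closed = true;
    }

    /** Copy the requested record into the caller's buffer, allocating nothing. */
    void get(int recordNumber, int[] record) {
        if (!closed) { // assumption: misuse is reported with an exception
            throw new IllegalStateException("close() must be called before reading");
        }
        System.arraycopy(data, recordNumber * recordSize, record, 0, recordSize);
    }

    public static void main(String[] args) {
        PlainRecordArraySketch records = new PlainRecordArraySketch(2);
        int[] scratch = new int[2];
        for (int n = 0; n < 6; n++) {
            scratch[0] = n;
            scratch[1] = n * n;
            records.add(scratch); // safe to reuse: data is copied on add
        }
        records.close();

        int[] buffer = new int[2]; // one buffer reused for every read
        long sum = 0;
        for (int n = 0; n < 6; n++) {
            records.get(n, buffer);
            sum += buffer[1];
        }
        System.out.println("sum of squares: " + sum);
    }
}
```

    Reusing a single buffer for every read is what the caller-supplied array makes possible: a loop over many records allocates one array rather than one per record.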