A Better TList Class

Let's see how to write a class that overcomes the limitations of the TList class. In Delphi 2.0, TList has no size restrictions, but it still has the problem of static methods. Let's write a class, TS_List, that behaves just like TList, but with virtual methods (so you can create useful subclasses) and without any size restrictions. To more easily write useful, portable components and applications, you need an unrestricted list class. You also need to make sure it works identically in Delphi 1.0 and Delphi 2.0. This approach calls for a custom base class, TS_CustomList, which defines the virtual methods and serves as a base class for many different kinds of list classes. Thus, you can write TS_ObjectList (which you first saw in Listing 5-3) much more easily.

One concern is to be able to store more than 16380 items in the list. The list implementation must be able to cross segment boundaries in Delphi 1.0. You will use the huge pointer routines that you wrote in Chapter 3, Streams and File I/O.

For ease of use, it makes sense to define the public interface of the new class, called TS_List, to be identical to that of TList. To handle large lists in Delphi 1.0, the new class must use LongInt indexes instead of Integer. There is another problem, however.

To make the list class as useful as possible, it must be easy to create custom derived classes. For example, you might want to create a TS_ObjectList class that holds object references and frees the objects when the list is destroyed. To make TS_ObjectList most useful, the Items property should return a TObject reference, not a Pointer. Even more importantly, the default array property of TS_ObjectList should return a TObject value. Listing 5-4 shows one way to declare TS_ObjectList, assuming that the TS_CustomList class is defined appropriately.

Listing 5-4 Declaration of the TS_ObjectList class

type
  TS_ObjectList = class(TS_CustomList)
  protected
    function GetObject(Index: LongInt): TObject; virtual;
    procedure SetObject(Index: LongInt; Obj: TObject); virtual;
    procedure FreeItem(Index: LongInt); override;
  public
    property Objects[Index: LongInt]: TObject read GetObject
        write SetObject; default;
  end;
{ Return the object at Index. }
function TS_ObjectList.GetObject(Index: LongInt): TObject;
begin
  Result := TObject(Items[Index])
end;
{ Set the object at Index to Obj, freeing the object that was there. }
procedure TS_ObjectList.SetObject(Index: LongInt; Obj: TObject);
begin
  if Obj <> Items[Index] then
  begin
    { Items[Index], above, checks the index, so call low-level routines
      directly, to avoid redundant calls to CheckIndex. }
    FreeItem(Index);
    SetPointer(Index, Pointer(Obj));
  end;
end;
{ Free the item at Index. Index has already been checked, so call
  GetPointer directly, to avoid a redundant call to CheckIndex. }
procedure TS_ObjectList.FreeItem(Index: LongInt);
begin
  TObject(GetPointer(Index)).Free;
end;

The important feature to note in Listing 5-4 is the default property Items. If a base class defines a default property, then all derived classes inherit that default property and cannot override it. Thus, you must make sure that the base class does not define a default property, so that any derived class can define its own default property.

On the other hand, TList defines a default property, Items, and it is important to remain compatible with TList. This problem can be solved by splitting the list class into two classes: TS_CustomList and TS_List. TS_CustomList is the base class that does not define a default property. TS_List derives from TS_CustomList and simply makes the Items property the default property. To get a simple list of Pointer values, use TS_List, and to create a custom list class, use TS_CustomList as the base class, thereby satisfying all requirements.

In fact, it is simplest to think of TS_List as a custom derived class, just like TS_ObjectList. Thus, Listing 5-5 shows the declaration of TS_List, which depends on TS_CustomList. It doesn't matter how TS_CustomList is declared, so long as it does not have a default array property.

Listing 5-5 TS_List class

type
  TS_List = class(TS_CustomList)
  public
    property Items; default;
  end;

Now it is time to tackle TS_CustomList. The declaration for the TS_CustomList class is shown in Listing 5-6. Notice that the public methods are similar to those of TList, but the Items property is protected. This gives greater flexibility to a derived class.

Listing 5-6 Declaration of the TS_CustomListclass

type
  TS_PackProc = procedure(Index: LongInt) of object;
  TS_CustomList = class
  private
    fList: Pointer;               { Pointer to the base of the list }
    fCount: LongInt;      { Number of items in the list }
    fCapacity: LongInt;   { Number of available slots in the list }
  protected
    function GetPointer(Index: LongInt): Pointer; virtual;
    procedure ReAllocList(NewCapacity: LongInt); virtual;
    procedure CheckIndex(Index: LongInt); virtual;
    function ExpandSize: LongInt; virtual;
    function GetItem(Index: LongInt): Pointer; virtual;
    procedure SetItem(Index: LongInt; Item: Pointer); virtual;
    procedure SetCapacity(NewCapacity: LongInt); virtual;
    procedure SetCount(NewCount: LongInt); virtual;
    procedure DoPack(PackProc: TS_PackProc); virtual;
    property Items[Index: LongInt]: Pointer read GetItem write SetItem;
  public
    destructor Destroy; override;
    function Add(Item: Pointer): LongInt; virtual;
    property Capacity: LongInt read fCapacity write SetCapacity;
    procedure Clear; virtual;
    property Count: LongInt read fCount write SetCount;
    function Delete(Index: LongInt): Pointer; virtual;
    procedure Exchange(I1, I2: LongInt); virtual;
    function Expand: TS_CustomList;
    function First: Pointer;
    function IndexOf(Item: Pointer): LongInt; virtual;
    procedure Insert(Index: LongInt; Item: Pointer); virtual;
    function Last: Pointer;
    procedure Move(CurIndex, NewIndex: LongInt); virtual;
    procedure Pack; virtual;
    function Remove(Item: Pointer): LongInt; virtual;
  end;

Most of the methods of TS_CustomList are simple and straightforward. For example, Listing 5-7 shows the IndexOf method. This is a typical method of the TS_CustomList class, showing how to use HugeInc to make sure the pointer is correct.

Listing 5-7 IndexOf, which is a typical method of TS_CustomList

function TS_CustomList.IndexOf(Item: Pointer): LongInt;
var
  Ptr: ^Pointer;
begin
  Ptr := fList;
  for Result := 0 to Count-1 do
  begin
    if Ptr^ = Item then
      Exit;
    HugeInc(Ptr, SizeOf(Pointer));
  end;
  Result := -1;
end;

Of particular interest is how to implement the huge array where TS_CustomList stores its data. You already know how to use HugeInc to access the data, but what happens when items are inserted and deleted? The other items in the list must be moved within the huge array, and that's a little trickier.

Delphi has the Move procedure to copy data within a segment, or from one segment to another, but it does not move more than 64 KB at a time, and when copying data, it cannot cross a segment boundary. There is the Windows procedure hmemcpy, which can copy more than 64 KB at a time and can copy memory that crosses segment boundaries, but it does not handle overlapping memory correctly.

Overlapping memory occurs when the source and destination memory intersect, as shown in Figure 5-7. When memory overlaps, you need to be careful when copying, or else you can destroy the very memory you are trying to copy. You must start copying from the start of the memory or from the end, depending on how the memory overlaps. To determine how the memory overlaps, you need to compare pointers to see which pointer comes before the other. Comparing pointers the simple way (Ptr1 < Ptr2) does not work because Delphi compares only the offsets, not the segments. The huge array requires a comparison that uses the segments and the offsets together. In the specific case of a list class, this is easy because the class can compare list indexes, not pointers.

Figure 5-7 Diagram of how to copy overlapping memory

When inserting a new list item, the other items in the list need to be moved to make room for the new item. The existing items can be copied from lower memory addresses to higher, which requires starting at the end (higher addresses) and working toward the lower addresses at the beginning of the memory being copied. When deleting an item, copy from higher addresses to lower, which means starting at the lower addresses and working toward the higher addresses.

When copying the memory, be very careful about crossing segment boundaries. Before copying any memory, check how close the memory is to a segment boundary. If the amount of memory to copy is larger than the number of bytes left between the current pointer and the segment boundary, copy only the remaining bytes in the segment. Update the segment pointer and then start copying again in the new segment.

This computation is actually a little tricky. It depends on whether the copy is going up or down. When copying down, the difference between the maximum segment size ($10000 bytes) and the pointer offset determines how many bytes to copy in that segment. If the segment offset is zero, then truncating $10000 to a word yields zero, in which case you can copy the maximum number of words, $7FFF. These computations are shown in Listing 5-8. The SrcOffset and DstOffset arguments are the offset parts of the source and destination pointers.

Listing 5-8 Computing the number of bytes to copy

function ComputeDownMoveSize(SrcOffset, DstOffset: Word): Word;
begin
  { Determine the number of words remaining in the segment, to copy. }
  if SrcOffset > DstOffset then
    Result := Word($10000 - SrcOffset) div 2
  else
    Result := Word($10000 - DstOffset) div 2;
  { Copy the entire segment. }
  if Result = 0 then
    Result := $7FFF;
end;
function ComputeUpMoveSize(SrcOffset, DstOffset: Word): Word;
begin
  if SrcOffset = $FFFF then
    { Copy as many words as will fit in the destination. }
    Result := DstOffset div 2
  else if DstOffset = $FFFF then
    { Copy as many words as are available in the source. }
    Result := SrcOffset div 2
  else if SrcOffset > DstOffset then
    { Copy as many words as will fit in the destination. }
    Result := DstOffset div 2 + 1
  else
    { Copy as many words as are available in the source. }
    Result := SrcOffset div 2 + 1;
end;

To do the actual memory copying, you might think to use the Move procedure, but that copies memory one byte at a time. Any PC that can run Windows can copy memory a word at a time, which is faster. Since the list contains pointers (which are two words long), it can safely copy the memory by word instead of by byte. Listing 5-9 shows the new Move procedure, written in assembly language because that's the best way to use the MOVSW instruction.

Listing 5-9 New Move procedure, to copy memory by words

{ Copy Size words from SrcPtr to DstPtr, using the current direction flag. }
procedure MoveWords(SrcPtr, DstPtr: Pointer; Size: Word); assembler;
asm
  push ds                 { Save DS, which is modified below. }
  lds si, SrcPtr          { Get the source pointer. }
  les di, DstPtr          { Get the destination pointer. }
  mov cx, Size.Word[0]    { Get the number of words to copy. }
  rep movsw               { Copy the memory. }
  pop ds                  { Restore DS. }
end;

When copying memory with the MOVSW instruction, the direction in which the memory is copied depends on a direction flag. The STD instruction sets the direction flag (copying from higher to lower addresses) and the CLD instruction clears the direction flag (copying from lower to higher addresses). Listing 5-10 shows inline procedures to set or clear the direction flag without the overhead of actually calling any procedure. A single instruction is an ideal use of an inline procedure.

Listing 5-10 Inline procedures to set or clear the direction flag

procedure cld; inline ($fc);    { clear direction flag }
procedure std; inline ($fd);    { set direction flag }

Now it's time to put the pieces together into the HugeMove procedure, which is shown in Listing 5-11. Since this routine is specifically written for the huge list class, the HugeMove procedure takes list indexes as arguments, not arbitrary pointers. There is only one pointer, Base, which points to the base of the array, which must have a zero offset. (Just like all pointers returned by GlobalAlloc.)

Listing 5-11 HugeMove procedure

procedure HugeMove(Base: Pointer; Dst, Src, Size: LongInt);
var
  SrcPtr, DstPtr: Pointer;
  MoveSize: Word;
begin
  SrcPtr := HugeOffset(Base, Src * SizeOf(Pointer));
  DstPtr := HugeOffset(Base, Dst * SizeOf(Pointer));
  { Convert longword size to words. }
  Size := Size * (SizeOf(LongInt) div SizeOf(Word));
  if Src < Dst then
  begin
    { Start from the far end and work toward the front. }
    std;
    HugeInc(SrcPtr, (Size-1) * SizeOf(Word));
    HugeInc(DstPtr, (Size-1) * SizeOf(Word));
    while Size > 0 do
    begin
      { Compute how many bytes to move in the current segment. }
      MoveSize := ComputeUpMoveSize(Word(SrcPtr), Word(DstPtr));
      if MoveSize > Size then
        MoveSize := Word(Size);
      { Move the bytes. }
      MoveWords(SrcPtr, DstPtr, MoveSize);
      { Update the number of bytes left to move. }
      Dec(Size, MoveSize);
      { Update the pointers. }
      HugeDec(SrcPtr, MoveSize * SizeOf(Word));
      HugeDec(DstPtr, MoveSize * SizeOf(Word));
    end;
    cld;     { reset the direction flag }
  end
  else
  begin
    { Start from the beginning and work toward the end. }
    cld;
    while Size > 0 do
    begin
      { Compute how many bytes to move in the current segment. }
      MoveSize := ComputeDownMoveSize(Word(SrcPtr), Word(DstPtr));
      if MoveSize > Size then
        MoveSize := Word(Size);
      { Move the bytes. }
      MoveWords(SrcPtr, DstPtr, MoveSize);
   
      { Update the number of bytes left to move. }
      Dec(Size, MoveSize);
      { Advance the pointers. }
      HugeInc(SrcPtr, MoveSize * SizeOf(Word));
      HugeInc(DstPtr, MoveSize * SizeOf(Word));
    end;
  end;
end;

Now that you have seen how to move large amounts of memory in Delphi 1.0, it's time to use this knowledge. The Insert and Delete methods need to move memory. Insert makes room for the new item, and Delete eliminates an item and moves the rest of the list over. Listing 5-12 shows the Delete method and the CheckIndex method, which raises an exception if the list index is out of bounds. The SListIndexError resource identifier is the same one used by TList for its list index errors.

Listing 5-12 The Delete method of TS_CustomList

{ Delete an item from the list. Override this method if you need
  to do anything, such as free the pointer. }
function TS_CustomList.Delete(Index: LongInt): Pointer;
begin
  Result := Items[Index];
  Dec(fCount);
  HugeMove(fList, Index, Index+1, Count-Index);
end;
{ Check a list index and raise an exception if it is not valid. }
procedure TS_CustomList.CheckIndex(Index: LongInt);
begin
  if (Index < 0) or (Index >= Count) then
    raise EListError.CreateRes(SListIndexError)
end;

The Insert method makes room for the new item by moving every item with a higher index. You can also append an item by "inserting" it at the end. Thus, the check for a valid index needs to allow Index = Count as valid. Listing 5-13 shows the Insert method.

Listing 5-13 The Insert method of TS_CustomList

{ Insert Item at position, Index. Slide all other items
  over to make room. The user can insert to any valid index,
  or to one past the end of the list, thereby appending an
  item to the list. In the latter case, adjust the capacity
  if needed. }
procedure TS_CustomList.Insert(Index: LongInt; Item: Pointer);
var
  Ptr: ^Pointer;
begin
  if (Index < 0) or (Index > Count) then
    raise EListError.CreateRes(SListIndexError);
  if Count >= Capacity then
    Expand;
  { Make room for the inserted item. }
  Ptr := HugeOffset(fList, Index * SizeOf(Pointer));
  HugeMove(Ptr, 1, 0, Count-Index);
  Ptr^ := Item;
  Inc(fCount);
end;

The other methods are similar. To further illustrate how the TS_CustomList class manipulates its pointers and indexes, Listing 5-14 shows the Move and Exchange methods. Notice how Exchange calls CheckIndex on both index arguments, to make sure the indexes are valid. Every routine takes great pains to validate the indexes before using them. It is equally important, however, not to check an index more than once. TS_CustomList must be fast, since it often lies at the heart of many other classes and routines. It is important to eliminate as much waste as possible, even if that means making the code a little harder to read.

Listing 5-14 The Move and Exchange methods of TS_CustomList

{ Exchange the items at indexes I1 and I2. }
procedure TS_CustomList.Exchange(I1, I2: LongInt);
var
  Tmp: Pointer;
  P, Q: ^Pointer;
begin
  CheckIndex(I1);
  CheckIndex(I2);
  P := HugeOffset(fList, I1 * SizeOf(Pointer));
  Q := HugeOffset(fList, I2 * SizeOf(Pointer));
  Tmp := P^;
  P^ := Q^;
  Q^ := Tmp;
end;
{ Move an item from CurIndex to NewIndex. Only move the
  items that lie between CurIndex and NewIndex, leaving
  the rest of the list alone.
}
procedure TS_CustomList.Move(CurIndex, NewIndex: LongInt);
var
  Tmp: Pointer;
  Ptr: ^Pointer;
begin
  CheckIndex(NewIndex);
  if NewIndex <> CurIndex then
  begin
    Tmp := Items[CurIndex];
    if NewIndex < CurIndex then
    begin
      Ptr := HugeOffset(fList, NewIndex * SizeOf(Pointer));
      HugeMove(Ptr, 1, 0, CurIndex-NewIndex)
    end
    else if CurIndex < NewIndex then
    begin
      Ptr := HugeOffset(fList, CurIndex * SizeOf(Pointer));
      HugeMove(Ptr, 0, 1, NewIndex-CurIndex);
      Ptr := HugeOffset(fList, NewIndex * SizeOf(Pointer));
    end;
    Ptr^ := Tmp;
  end;
end;

The Pack method is particularly interesting. It packs nil slots by deleting them from the list. The simple approach is just to start at one end of the list, look for the next nil item, and delete it from the list. Repeat this process until the entire list has been processed. This is how Delphi's TList.Pack method works. The problem with this algorithm is that it is very inefficient if there are many nil items in a large list. A better approach would delete many adjacent nil items at once, instead of individually. Listing 5-15 shows this better way of writing the Pack method. A derived class might want to do something with the slots that are packed, so the PackProc procedure is called for each index. The default is not to have a PackProc procedure.

Listing 5-15 An efficient implementation of the Pack method

{ Pack the list by removing nil slots. After packing
  Count might be smaller. After each loop iteration,
  the following is invariant:
    Items[k] <> nil for all k <= i
  Thus, when at the end of the loop, the list is packed.
  The loop marches through the list, using the I index.
  Whenever Items[I] = nil, collect a maximal string of nil slots,
  and then shift down the remaining items, adjusting Count to match.
}
procedure TS_CustomList.DoPack(PackProc: TS_PackProc);
var
  I, J, K: LongInt;
  P, Q: ^Pointer;
begin
  { Instead of a for loop, use a while loop, and use the
    current value of Count for each iteration, since Count
    changes during the loop. }
  I := 0;
  P := fList;
  while I < Count do
  begin
    if P^ <> nil then
    begin
      Inc(I);
      P := HugeOffset(fList, I*SizeOf(Pointer));
    end
    else
    begin
      if Assigned(PackProc) then
        PackProc(I);
      { Collect a run of nil slots. }
      for J := I+1 to Count-1 do
      begin
        P := HugeOffset(fList, J * SizeOf(Pointer));
        if P^ <> nil then
          Break
        else if Assigned(PackProc) then
          PackProc(J)
        else
      end;
      { Shift slots if there is a non-nil value.
        If all the remaining slots are nil, then the loop is done. }
      if P^ = nil then
      begin
        fCount := I;
        Break;
      end;
      { Now shift the slots; setting the newly vacated slots to nil,
        as a safety measure. Stop at the next nil slot. }
      K := I;
      while J < Count do
      begin
        P := HugeOffset(fList, K * SizeOf(Pointer));
        Q := HugeOffset(fList, J * SizeOf(Pointer));
        P^ := Q^;
        { Check after assigning to P^, so the check for nil at
          the top of the loop is true. A small inefficiency for
          greater programming ease and maintainability. }
        if Q^ = nil then
          Break;
        Q^ := nil;
        Inc(K);
        Inc(J);
      end;
      { Adjust Count by the number of nil slots removed. }
      Dec(fCount, J-K);
      { Set the loop counter to the next nil slot. }
      I := K;
    end;
  end;
end;
procedure TS_CustomList.Pack;
begin
  DoPack(nil)
end;

TS_CustomList in Delphi 2.0

Now that you have gone through all that trouble to implement huge lists in Delphi 1.0, what about Delphi 2.0? The TList class can handle large lists simply because of the 32-bit environment, but it still has limitations that make it difficult to use as a base class for a custom list class. It seems obvious to use the same scheme of TS_CustomList and TS_List in Delphi 2.0, but the implementation of TS_CustomList is much simpler because there is no need to perform any segment shenanigans.

There are two ways to approach TS_CustomList in Delphi 2.0. One way is to keep as much code as possible from the Delphi 1.0 implementation, and simply remove the segment manipulations. The other way is to use the TList class for the underlying implementation. The examples in this section present both approaches, for comparison.

Listing 5-16 shows how to keep most of the same code for TS_CustomList between Delphi 1.0 and Delphi 2.0 by implementing the HugeMove method and HugeOffset function. The HugeOffset function is just a simple addition. In Delphi 2.0, HugeMove just calls Move, and in Delphi 1.0, it performs the magic from Listing 5-11. In Delphi 2.0, the Move procedure can copy any amount of memory, can handle overlapping memory, and is fast. What more can you ask for?

Listing 5-16 Implementing HugeOffset and HugeMove in Delphi 2.0

{$ifdef WIN32}
procedure HugeMove(Base: Pointer; Dst, Src, Size: LongInt);
var
  SrcPtr, DstPtr: Pointer;
  MoveSize: Word;
begin
  SrcPtr := Pointer( PChar(Base) + Src*SizeOf(Pointer) );
  DstPtr := Pointer( PChar(Base) + Dst*SizeOf(Pointer) );
  Move(SrcPtr^, DstPtr^, Size*SizeOf(Pointer));
end;
{$endif}

Another approach is to reuse as much of TList as possible by writing TS_CustomList methods as wrappers for their equivalent TList methods. Listing 5-17 shows how to implement the Delete and Insert methods of TS_CustomList by calling the corresponding methods of TList.

Listing 5-17 Implementing TS_CustomList with a TList object

procedure TS_CustomList.Insert(Index: LongInt; Ptr: Pointer);
begin
  fList.Insert(Index, Ptr);
end;
function TS_CustomList.Delete(Index: LongInt): Pointer;
begin
  Result := fList[Index];
  fList.Delete(Index);
end;

Given these two different ways to implement TS_CustomList in Delphi 2.0, how do you decide which is better? There is no single definition of better. You need to consider your particular situation. If you decide to add a new method to TS_CustomList, is it important that you add it to the Delphi 1.0 and Delphi 2.0 implementations? Must you maintain strict compatibility? If so, then you probably want to share as much of the source code as possible, which calls for the first solution.

If you tend to use Delphi 2.0, and want compatibility with Delphi 1.0 only in a few minor situations, you might prefer the second solution. When a future version of Delphi is released, you might need to change TS_CustomList again, and the simpler implementation of the second solution would be easier to manage.

The CD-ROM contains the full source code for both solutions in Listings\Chap05\S_List.pas and Listings\Chap05\S_List32.pas. Choose the one that works best for you.

Tempest Software
- Secrets of Delphi 2
  - Chapter 5 Dynamic Arrays: The TList Class
    - Dynamic Arrays with TList
    - TList Overview
    - Using the TList Class
    - Limitations of the TList Class
    - A Better TList Class
    - Summary