Wait… WHAT!?

Wait… WHAT!?

Exercise for the reader – how much faster would this be with a hash table or binary search?

unit Data.Db;

function TFields.FindField(const FieldName: string): TField;

var

  I: Integer;

  HashValue: Cardinal;

begin

  if FList.Count > 0 then

  begin

    HashValue := TNamedItem.HashName(FieldName);

    for I := 0 to FList.Count – 1 do

    begin

      Result := FList.Items[I];

      if (Result.FFieldNameHashValue = HashValue) and

         (AnsiCompareText(Result.FFieldName, FieldName) = 0) then

        Exit;

    end;

  end;

  Result := nil;

end;

36 thoughts on “Wait… WHAT!?


  1. How many fields do you have?


    If it’s dozens or hundreds, hashtable would win, if it’s half a dozen, the hashtable would lose (both in terms of performance and code complexity).


    Naive binary search would likely lose to both that snippet and a hastable (because AnsiCompareText is not cheap), but a binary search on the hash might work better than both if you have no more than dozens of fields.


    If you have dozens or hundreds of fields, I suspect this method is going to be more canary than problem 😉


    Also a bigger problem is if you’re calling that code multiple times per record (in which case you deserve the crappy performance), rather than once per field.

    Like


  2. Lars, I tend to disagree. This is a list that is not sorted. Sure you could have a sorted and hashed fields dictionary used to speed up searches, in case you need to search often (hint: use persistent fields).


    Oh, you can do that, just with a little code. And why it’s not in each and every TDataSet by default? Because in many cases it is not used, would imply an overhead in each and every application.


    I know this might sound as taking excuses, but a library has design principles in terms of being general purpose and balancing speed/memory/time for ALL users. Adding a hash for the field name for faster searches (like shown above) still costs extra to some, but probably is a good trade-off. Pushing FindField to the max might not suit every user as well… and adding that feature when needed is really at reasonably easy reach. 


    I’m not pointing this out for the specific example, but because while we have don and probably still do mistakes that we need to correct, there are scenarios in which there are design decisions that are not in the best interest of some of our users, but they tend to balance different usage scenarios. And you know developers use Delphi for the most diverse applications out there. Good enough you have the power to see (full source code) and to add your own “improvements”.


    I’ve witnessed other cases in which a speed up for a customer would result in a slow down for another, and even a complete crash!

    Like


  3. See below for my current quick fix which creates the indexed lookup the first time FieldByName function is called.  Note that I did this in a wrapper and not by modifying the code.  


    The 220+ views have from a few to 70+ columns.


    This significantly speeded up reading records and accessing by named field instead of by order – which IMO is the future-proof way of reading row values (i.e. no sequence dependencies).  


    If there is a better way to set property values from a query, than


    Obj.Prop1 := DataSet.FieldByName[‘PropField1’].AsPropertyType;


    Obj.Prop2 := DataSet.FieldByName[‘PropField2’].AsPropertyType;




    I would be willing to try it out.


    function TPSDRecordset_FD_Abstract.FieldByName(const FieldName:String):TField;


    var


      F : TField;


      ix : Integer;


    begin


      if not Assigned(IndexedFieldList)


      then begin


        IndexedFieldList := TFieldIndexList.Create;


        IndexedFieldList.Sorted := False;


        for F in DataSet.Fields


         do IndexedFieldList.AddObject(LowerCase(F.FieldName), TObject(F));


         IndexedFieldList.Sorted := True;


      end;


      if IndexedFieldList.Find(LowerCase(FieldName), ix)


       then Result := IndexedFieldList.Items[ix]


        else raise EPSDDbFieldNotFound.Create(‘Field “‘+FieldName+'” not found in record set!’);


    end;

    Like


  4. Furthermore…


    There are two more things I wonder about in this code.


    1. Why create a hash, but still compare both the hash and the field name?


    2. MSSQL supports unicode table and column names.  


    CREATE TABLE អតិថិជន


    (


     ឈ្មោះអ្នកខ្ចី INT


    )


    Is this supported?  (Thinking about that AnsiCompareText)

    Like


  5. Eric Grange – For most reads, we use the simple way of just passing the record set to the object instance and let it pick out the fields from the current row.  If it is necessary to read a large number of rows, we have routines that indeed do only fetch the field once.  

    Like


  6. Николай Зверев Case sensitivity can be turned off, and tends to be off by default – at least for MSSQL, MySQL, and Oracle.  We are about to move to PostgreSQL, where that is NOT the case – so that will get interesting.

    Like


  7. Eric Grange – I lower-cased the field names on TDictionary.Add and TryGetValue.  


    Any other alternative you could recommend that would beat a sorted TStringList with a custom compare string (SysUtils.CompareStr)?

    Like


  8. Lars Fosdal unless you have lots of accented and international character, an optimistic case-insensitive string comparison can be faster than LowerCase + CompareStr, it saves the memory allocation and most calls to the sluggish Unicode-aware functions


    http://www.delphitools.info/2011/05/04/optimistic-unicode-case-insensitive-comparetext/


    basically you start comparing under the assumption it’s ASCII, and fallback to Unicode comparison only if that is not the case

    Like


  9. Lars Fosdal since you’re creating a HashValue for the field name anyway, have you tried to use this as a custom hash for the TDictionary? 


    Also, how many fields are there? I would have thought TDictionary would have been faster once the number of fields goes up.

    Like


  10. Steve Maughan – I don’t make a hash, I just use the lower case field name.  Field counts: 75, 74, 73, 59, 59, 55, 54, 54, 50, 49, 46, 46, 45, 45, 43, 43, 41, 40, 39, 39, 39, 38, 38, 37, 36, 36, 36, 35, 35, 34, 32, 32, 32, 31, 31, 30, 29, 29, 29, 27, 27, 27, 27, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1

    Like


  11. Well, given how pervasive the Generics Collections have become, it makes you wonder if they should do more to ensure that they are super fast.  


    F.x. Why do we call BobJenkinsHash, when it’s just a wrapper for HashLittle?

    Like


  12. In one complex form we have I found that over half the time spent during its 15+ second loading time is due to FindField. Most of it was actually spent hashing the strings. If you look at the HashName function you should begin to cry (unless they’ve sorted it out since XE3).

    Like


  13. In a loop, always use a local variable with each field.


    It will be faster and more readable.


    The whole db.pas unit is far from optimized. For instance, TDataSet is a real bottleneck.

    Like


  14. A. Bouchez – If you need to read many rows, that really helps – but if you have many relations, and a large set of data – you still will need to look up a lot of fields, and if each of those are a linear enumeration through the list of fields – it really is seriously slow.


    To make an efficient lookup mechanism, there are already several constraints that helps.  We know there will not be duplicate field names.  We know the number of fields will be fairly moderate, and we know the list will constant once built, for the lifetime of the retrieval.  We could – in theory – cache the list between uses – if we make the assumption that the database schema doesn’t change between uses.

    Like


  15. Eric Grange Indeed. When profiling the slow loading form, I found like 70% of the time was spent in string allocation and Move. The first due to AnsiUpperCase and then the Move you mention.


    The reason FieldByName was called so much during startup on this form was due to our DevExpress grids, which apparently uses it to populate the grids, along the lines of


      col.Value := FieldByName(col.DataBinding.FieldName).Value;


    One of the grids contained rather a lot of records.


    Unfortunately our “main” table is rather denormalized (almost 2^9 columns, doh!), so in our case the hashing can still pay off, but I’m certain a FastCode-style rewrite of that hash function would do wonders.

    Like


  16. Jeroen Wiert Pluimers FWIW the hash table and hash set currently in Spring4D are just wrappers around Generics.Collections.TDictionary. TDictionary is fast on a good day, but requires a good hash function as it’s very prone to clustering, in which case you’re back to O(n).

    Like


  17. —-


    If there is a better way to set property values from a query, than


    Obj.Prop1 := DataSet.FieldByName[‘PropField1’].AsPropertyType;


    Obj.Prop2 := DataSet.FieldByName[‘PropField2’].AsPropertyType;




    I would be willing to try it out.


    —–


    Is this code in a loop? Or even just executed once, but executed again and again as some current rec somewhere else changes?


    Don’t want to use persistent fields at design time because they tie you down to much?


    Then set up “semi-persistent” fields: declare an instance member for every field you use, populate them using FieldByName once when you open the dataset and use the instance members in the rest of your code.


    But I am sure you know about that? So what am I missing here?

    Like


  18. Did anyone create his own dataset and added a new implementation of FindField? Would be interesting if the speed gains are mainly in benchmarks or if there is a noticable difference in real world applications. 

    Like


  19. Markus Müller 


    XE6, 385 queries,executed on local db on a Core i7. Where multi-row, TFields are retrieved once, and reused for each row.


    s.ms DB


    14.059 ADO


    13.715 ADO Optimized


    11.083 FireDAC


    10.701 FireDAC Optimized

    Like

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.