Wait… WHAT!?

Lars Fosdal Delphi Developers: Delphi Code Rants September 19, 2014 1 Minute

Wait… WHAT!?

Exercise for the reader – how much faster would this be with a hash table or binary search?

unit Data.Db;

function TFields.FindField(const FieldName: string): TField;

var

I: Integer;

HashValue: Cardinal;

begin

if FList.Count > 0 then

begin

HashValue := TNamedItem.HashName(FieldName);

for I := 0 to FList.Count – 1 do

begin

Result := FList.Items[I];

if (Result.FFieldNameHashValue = HashValue) and

(AnsiCompareText(Result.FFieldName, FieldName) = 0) then

Exit;

end;

Result := nil;

end;

Published by Lars Fosdal

Software Developer View all posts by Lars Fosdal

Published September 19, 2014

36 thoughts on “Wait… WHAT!?”

Arioch The says:

September 19, 2014 at 07:53

Do you remember what they did in xe2 with normalizing trect ?

Loading...

Reply
Eric Grange says:

September 19, 2014 at 07:58

How many fields do you have?

If it’s dozens or hundreds, hashtable would win, if it’s half a dozen, the hashtable would lose (both in terms of performance and code complexity).

Naive binary search would likely lose to both that snippet and a hastable (because AnsiCompareText is not cheap), but a binary search on the hash might work better than both if you have no more than dozens of fields.

If you have dozens or hundreds of fields, I suspect this method is going to be more canary than problem 😉

Also a bigger problem is if you’re calling that code multiple times per record (in which case you deserve the crappy performance), rather than once per field.

Loading...

Reply
Marco Cantù says:

September 19, 2014 at 08:07

Lars, I tend to disagree. This is a list that is not sorted. Sure you could have a sorted and hashed fields dictionary used to speed up searches, in case you need to search often (hint: use persistent fields).

Oh, you can do that, just with a little code. And why it’s not in each and every TDataSet by default? Because in many cases it is not used, would imply an overhead in each and every application.

I know this might sound as taking excuses, but a library has design principles in terms of being general purpose and balancing speed/memory/time for ALL users. Adding a hash for the field name for faster searches (like shown above) still costs extra to some, but probably is a good trade-off. Pushing FindField to the max might not suit every user as well… and adding that feature when needed is really at reasonably easy reach.

I’m not pointing this out for the specific example, but because while we have don and probably still do mistakes that we need to correct, there are scenarios in which there are design decisions that are not in the best interest of some of our users, but they tend to balance different usage scenarios. And you know developers use Delphi for the most diverse applications out there. Good enough you have the power to see (full source code) and to add your own “improvements”.

I’ve witnessed other cases in which a speed up for a customer would result in a slow down for another, and even a complete crash!

Loading...

Reply
Николай Зверев says:

September 19, 2014 at 09:23

This is not hash table. This is just a list with hash values. Code shows that the developers tried to reduce the number of AnsiCompareText calls.

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 09:38

See below for my current quick fix which creates the indexed lookup the first time FieldByName function is called. Note that I did this in a wrapper and not by modifying the code.

The 220+ views have from a few to 70+ columns.

This significantly speeded up reading records and accessing by named field instead of by order – which IMO is the future-proof way of reading row values (i.e. no sequence dependencies).

If there is a better way to set property values from a query, than

Obj.Prop1 := DataSet.FieldByName[‘PropField1’].AsPropertyType;

Obj.Prop2 := DataSet.FieldByName[‘PropField2’].AsPropertyType;

…

I would be willing to try it out.

function TPSDRecordset_FD_Abstract.FieldByName(const FieldName:String):TField;

var

F : TField;

ix : Integer;

begin

if not Assigned(IndexedFieldList)

then begin

IndexedFieldList := TFieldIndexList.Create;

IndexedFieldList.Sorted := False;

for F in DataSet.Fields

do IndexedFieldList.AddObject(LowerCase(F.FieldName), TObject(F));

IndexedFieldList.Sorted := True;

end;

if IndexedFieldList.Find(LowerCase(FieldName), ix)

then Result := IndexedFieldList.Items[ix]

else raise EPSDDbFieldNotFound.Create(‘Field “‘+FieldName+'” not found in record set!’);

end;

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 09:44

Furthermore…

There are two more things I wonder about in this code.

1. Why create a hash, but still compare both the hash and the field name?

2. MSSQL supports unicode table and column names.

CREATE TABLE អតិថិជន

(

ឈ្មោះអ្នកខ្ចី INT

)

Is this supported? (Thinking about that AnsiCompareText)

Loading...

Reply
Marco Cantù says:

September 19, 2014 at 09:48

Two strings might have the same hash. The Ansi I’m not convinced either.

Loading...

Reply
Николай Зверев says:

September 19, 2014 at 09:51

1. Hash function for a completely different strings can return the same value.

2. Usually when accessing such fields:

– names of the fields are enclosed in quotation marks (in SQL-query);

– names of the fields are case sensitive.

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 09:59

Eric Grange – For most reads, we use the simple way of just passing the record set to the object instance and let it pick out the fields from the current row. If it is necessary to read a large number of rows, we have routines that indeed do only fetch the field once.

Loading...

Reply
Steve Maughan says:

September 19, 2014 at 10:28

Is this an ideal candidate for TDictionary?

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 10:50

Steve Maughan, I just tested that on the same dataset. It added about 250ms to the load time. Plain generic TDictionary is not a speed demon.

Loading...

Reply
Eric Grange says:

September 19, 2014 at 10:51

Steve Maughan Lars Fosdal and TDictionary is case sensitive in that case no? so it actually had an easier job

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 11:01

Николай Зверев Case sensitivity can be turned off, and tends to be off by default – at least for MSSQL, MySQL, and Oracle. We are about to move to PostgreSQL, where that is NOT the case – so that will get interesting.

Loading...

Reply
Uwe Raabe says:

September 19, 2014 at 11:07

Marco Cantù The Ansi in the name is misleading. It actually calls the CompareStringW function in kernel32.

Loading...

Reply
Marco Cantù says:

September 19, 2014 at 11:12

Uwe Raabe Good catch, I should have known…

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 11:21

Uwe Raabe – That’s one less worry. Thanks!

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 11:25

Eric Grange – I lower-cased the field names on TDictionary.Add and TryGetValue.

Any other alternative you could recommend that would beat a sorted TStringList with a custom compare string (SysUtils.CompareStr)?

Loading...

Reply
Eric Grange says:

September 19, 2014 at 11:55

Lars Fosdal unless you have lots of accented and international character, an optimistic case-insensitive string comparison can be faster than LowerCase + CompareStr, it saves the memory allocation and most calls to the sluggish Unicode-aware functions

http://www.delphitools.info/2011/05/04/optimistic-unicode-case-insensitive-comparetext/

basically you start comparing under the assumption it’s ASCII, and fallback to Unicode comparison only if that is not the case

Loading...

Reply
Steve Maughan says:

September 19, 2014 at 13:09

Lars Fosdal since you’re creating a HashValue for the field name anyway, have you tried to use this as a custom hash for the TDictionary?

Also, how many fields are there? I would have thought TDictionary would have been faster once the number of fields goes up.

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 14:34

Steve Maughan – I don’t make a hash, I just use the lower case field name. Field counts: 75, 74, 73, 59, 59, 55, 54, 54, 50, 49, 46, 46, 45, 45, 43, 43, 41, 40, 39, 39, 39, 38, 38, 37, 36, 36, 36, 35, 35, 34, 32, 32, 32, 31, 31, 30, 29, 29, 29, 27, 27, 27, 27, 25, 24, 24, 24, 23, 23, 23, 22, 22, 22, 21, 21, 20, 20, 20, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 17, 17, 17, 17, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 12, 12, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1

Loading...

Reply
Steve Maughan says:

September 19, 2014 at 15:03

Lars Fosdal I’m amazed TDictionary is slower since it only costs the hashing of the FieldName and rest is a direct look-up.

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 15:25

Well, given how pervasive the Generics Collections have become, it makes you wonder if they should do more to ensure that they are super fast.

F.x. Why do we call BobJenkinsHash, when it’s just a wrapper for HashLittle?

Loading...

Reply
Asbjørn Heid says:

September 19, 2014 at 16:48

In one complex form we have I found that over half the time spent during its 15+ second loading time is due to FindField. Most of it was actually spent hashing the strings. If you look at the HashName function you should begin to cry (unless they’ve sorted it out since XE3).

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 23:33

The brute force simple binary search cuts the time quite a bit, but I think I’ll a proper hash list might speed it up further.

Loading...

Reply
A. Bouchez says:

September 19, 2014 at 23:37

In a loop, always use a local variable with each field.

It will be faster and more readable.

The whole db.pas unit is far from optimized. For instance, TDataSet is a real bottleneck.

Loading...

Reply
Lars Fosdal says:

September 19, 2014 at 23:42

A. Bouchez – If you need to read many rows, that really helps – but if you have many relations, and a large set of data – you still will need to look up a lot of fields, and if each of those are a linear enumeration through the list of fields – it really is seriously slow.

To make an efficient lookup mechanism, there are already several constraints that helps. We know there will not be duplicate field names. We know the number of fields will be fairly moderate, and we know the list will constant once built, for the lifetime of the retrieval. We could – in theory – cache the list between uses – if we make the assumption that the database schema doesn’t change between uses.

Loading...

Reply
Jeroen Wiert Pluimers says:

September 20, 2014 at 08:42

You might want to try Spring4D generic stuff and compare if it is faster than the built in stuff. It is better designed, but I’m not sure if it is faster.

Loading...

Reply
Eric Grange says:

September 20, 2014 at 09:34

Lars Fosdal had forgotten about it http://www.delphitools.info/2010/11/30/fieldbyname-or-why-a-profiler-is-your-friend/

Loading...

Reply
Asbjørn Heid says:

September 20, 2014 at 09:51

Eric Grange Indeed. When profiling the slow loading form, I found like 70% of the time was spent in string allocation and Move. The first due to AnsiUpperCase and then the Move you mention.

The reason FieldByName was called so much during startup on this form was due to our DevExpress grids, which apparently uses it to populate the grids, along the lines of

col.Value := FieldByName(col.DataBinding.FieldName).Value;

One of the grids contained rather a lot of records.

Unfortunately our “main” table is rather denormalized (almost 2^9 columns, doh!), so in our case the hashing can still pay off, but I’m certain a FastCode-style rewrite of that hash function would do wonders.

Loading...

Reply
Asbjørn Heid says:

September 20, 2014 at 10:12

Jeroen Wiert Pluimers FWIW the hash table and hash set currently in Spring4D are just wrappers around Generics.Collections.TDictionary. TDictionary is fast on a good day, but requires a good hash function as it’s very prone to clustering, in which case you’re back to O(n).

Loading...

Reply
Jeroen Wiert Pluimers says:

September 20, 2014 at 13:21

Asbjørn Heid thanks for clarifying that.

Loading...

Reply
Marjan Venema says:

September 21, 2014 at 19:41

—-

If there is a better way to set property values from a query, than

Obj.Prop1 := DataSet.FieldByName[‘PropField1’].AsPropertyType;

Obj.Prop2 := DataSet.FieldByName[‘PropField2’].AsPropertyType;

…

I would be willing to try it out.

—–

Is this code in a loop? Or even just executed once, but executed again and again as some current rec somewhere else changes?

Don’t want to use persistent fields at design time because they tie you down to much?

Then set up “semi-persistent” fields: declare an instance member for every field you use, populate them using FieldByName once when you open the dataset and use the instance members in the rest of your code.

But I am sure you know about that? So what am I missing here?

Loading...

Reply
Lars Fosdal says:

September 21, 2014 at 22:06

Marjan Venema – No offence, but you should have read all the previous comments.

Loading...

Reply
Markus Müller says:

September 22, 2014 at 06:54

Did anyone create his own dataset and added a new implementation of FindField? Would be interesting if the speed gains are mainly in benchmarks or if there is a noticable difference in real world applications.

Loading...

Reply
Marjan Venema says:

September 22, 2014 at 09:16

Lars Fosdal Non taken. I did skip over them a bit thinking they were mostly about hash tables/functions etc.

Loading...

Reply
Lars Fosdal says:

September 22, 2014 at 10:29

Markus Müller

XE6, 385 queries,executed on local db on a Core i7. Where multi-row, TFields are retrieved once, and reused for each row.

s.ms DB

14.059 ADO

13.715 ADO Optimized

11.083 FireDAC

10.701 FireDAC Optimized

Loading...

Reply

Wait… WHAT!?

Like this:

Related

Published by Lars Fosdal

36 thoughts on “Wait… WHAT!?”

Leave a Reply to Eric GrangeCancel reply

Share this:

Like this:

Related

Published by Lars Fosdal

36 thoughts on “Wait… WHAT!?”

Leave a Reply to Eric GrangeCancel reply

Discover more from Lars Fosdal