XE8 Findings
Found some minor tweaks in XE8. Will add to this post as I find more.
Breaking changes in FireDAC.UI.Intf
TFDScriptOutputKind used to be TFDScriptOuputKind
I actually QPd this, so blame me 😛
Breaking changes in FireDAC.Comp.Client
TFDConnectionLoginEvent = procedure (AConnection: TFDCustomConnection; AParams: TFDConnectionDefParams) of object;
used to be
TFDConnectionLoginEvent = procedure (AConnection: TFDCustomConnection; const AConnectionDef: IFDStanConnectionDef) of object;
TFDErrorEvent = procedure (ASender, AInitiator: TObject; var AException: Exception) of object;
used to be
TFDErrorEvent = procedure (ASender: TObject; const AInitiator: IFDStanObject; var AException: Exception) of object;
TFDConnectionRecoverEvent = procedure (ASender, AInitiator: TObject; AException: Exception; var AAction: TFDPhysConnectionRecoverAction) of object;
used to be
TFDConnectionRecoverEvent = procedure (ASender: TObject; const AInitiator: IFDStanObject; AException: Exception; var AAction: TFDPhysConnectionRecoverAction) of object;
This one is a bit annoying, as I was actually accessing the IFDStanObject.Name property. I guess .ClassName will have to do.
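If a handler only needs a readable label for the initiator, one option is to prefer the component name and fall back to the class name. This is a minimal sketch, not FireDAC code: the event type is redeclared locally (copied from the XE8 signature above) so it compiles without FireDAC, and TErrorEventSketch, TLogger and DescribeInitiator are hypothetical names. Whether the initiator FireDAC passes is actually a TComponent is an assumption to check in your own handlers.

```pascal
program InitiatorNameSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils,
  System.Classes;

type
  // Local copy of the XE8-style signature, so no FireDAC units are needed.
  TErrorEventSketch = procedure (ASender, AInitiator: TObject;
    var AException: Exception) of object;

  // Hypothetical owner of the handler.
  TLogger = class
    procedure HandleError(ASender, AInitiator: TObject;
      var AException: Exception);
  end;

// Hypothetical helper: prefer TComponent.Name, fall back to ClassName.
function DescribeInitiator(AInitiator: TObject): string;
begin
  if AInitiator = nil then
    Result := '(nil)'
  else if (AInitiator is TComponent) and
    (TComponent(AInitiator).Name <> '') then
    Result := TComponent(AInitiator).Name
  else
    Result := AInitiator.ClassName;
end;

procedure TLogger.HandleError(ASender, AInitiator: TObject;
  var AException: Exception);
begin
  Writeln(DescribeInitiator(AInitiator), ': ', AException.Message);
end;

var
  Logger: TLogger;
  Handler: TErrorEventSketch;
  Comp: TComponent;
  E: Exception;
begin
  Logger := TLogger.Create;
  Comp := TComponent.Create(nil);
  E := Exception.Create('connection lost');
  try
    Comp.Name := 'FDQuery1';
    Handler := Logger.HandleError;
    Handler(nil, Comp, E);   // named component: label comes from Name
    Handler(nil, Logger, E); // plain object: label comes from ClassName
  finally
    E.Free;
    Comp.Free;
    Logger.Free;
  end;
end.
```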
H2135 is not new at all.
I wonder why it didn’t hint me in XE7? Doh… I blame sleep deprivation…
Removed it.
The changes in System.Generics.Collections are interesting too. Stefan Glienke can probably comment much better than me – it’s very similar to this blog post: http://delphisorcery.blogspot.com.au/2014/10/new-language-feature-in-xe7.html
Lots of switches based on IsManagedType(T), SizeOf(T), etc., presumably with the other cases being optimised out at compile time, leading to fast special-cased behaviour. Look at TList<T>.Insert, for example.
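For illustration, the kind of switch described above might look like the following. This is only a sketch of the pattern, not the RTL's code: TStorageInfo<T> and Describe are hypothetical names. IsManagedType is the compiler intrinsic introduced in XE7, and because both it and SizeOf(T) are known for each instantiation, the compiler can in principle drop the branches that do not apply.

```pascal
program IntrinsicSwitchSketch;
{$APPTYPE CONSOLE}
uses
  System.SysUtils;

type
  // Hypothetical generic type, used only to show the branching pattern.
  TStorageInfo<T> = class
  public
    class function Describe: string; static;
  end;

class function TStorageInfo<T>.Describe: string;
begin
  // Both conditions are constant per instantiation of T.
  if IsManagedType(T) then
    Result := 'managed'
  else
    case SizeOf(T) of
      1: Result := 'unmanaged, 1 byte';
      2: Result := 'unmanaged, 2 bytes';
      4: Result := 'unmanaged, 4 bytes';
      8: Result := 'unmanaged, 8 bytes';
    else
      Result := 'unmanaged, other size';
    end;
end;

begin
  Writeln('Byte:    ', TStorageInfo<Byte>.Describe);
  Writeln('Integer: ', TStorageInfo<Integer>.Describe);
  Writeln('Int64:   ', TStorageInfo<Int64>.Describe);
  Writeln('string:  ', TStorageInfo<string>.Describe);
end.
```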
Also the ToolsAPI has some hooks that seem to be related to GetIt.
David Millington The changes in System.Generics.Collections are their attempt to fight binary size. I leave it to you to find out if that was successful.
As for “fast special-cased behavior”: execute this code in XE7 and in XE8:
program Project1;
{$APPTYPE CONSOLE}
uses
  Diagnostics,
  Generics.Collections,
  SysUtils;
var
  list: TList<Integer>;
  i: Integer;
  sw: TStopwatch;
begin
  list := TList<Integer>.Create;
  sw := TStopwatch.StartNew;
  // Time 10 million appends, including the list's own growth
  for i := 1 to 10000000 do
    list.Add(i);
  Writeln(sw.ElapsedMilliseconds);
  Readln;
end.
XE7 32-bit Release: 120-130ms
XE7 64-bit Release: 130-180ms
XE8 32-bit Release: 150-160ms
XE8 64-bit Release: 160-180ms
Also XE8 shows (briefly) a recompile dialog every single time it’s run. Running from inside the IDE (even with ‘Run Without Debugging’) seriously impacts the execution speed: it can take twice as long. For example, typical 32-bit Release times under XE8 are around 300ms when run without debugging from the IDE, dropping to 160ms the moment you run truly without debugging from Explorer. Goodness knows what it’s doing. That doesn’t happen in XE7.
Stefan Glienke Interesting benchmark there. Even with pre-allocated memory using Capacity, the difference is clear.
It also seems that basic array access perf has degraded in XE8. Try this variant:
program Project1;
{$APPTYPE CONSOLE}
uses
  Diagnostics,
  Generics.Collections,
  SysUtils;
const
  N = 400000000;
var
  list: TList<Integer>;
  i: Integer;
  sw: TStopwatch;
  arr: TArray<Integer>;
begin
  list := TList<Integer>.Create;
  // Pre-allocate so that growth is not part of the measurement
  list.Capacity := N;
  sw := TStopwatch.StartNew;
  for i := 1 to N do
    list.Add(i);
  Writeln(sw.ElapsedMilliseconds);
  // Same number of writes into a plain dynamic array
  sw := TStopwatch.StartNew;
  SetLength(arr, N);
  for i := 0 to N-1 do
    arr[i] := i;
  Writeln(sw.ElapsedMilliseconds);
  Readln;
end.
David Heffernan In release mode on Win32 it’s the same here (I had to reduce N to 100 million though, as it was giving me an out-of-memory exception)
It also creates the exact same code (O+, W-):
for i := 0 to N-1 do
xor ebx,ebx
arr[i] := i;
mov eax,[$004e03f0]
mov [eax+ebx*4],ebx
inc ebx
for i := 0 to N-1 do
cmp ebx,$05f5e100
jnz $004d67b5
One problem with the changes to TList is that now the extra TListHelper array is causing some additional cycles for some members (see TListHelper.FItems).
Also inlining is still not good enough – it does not inline some things that should be inlined to gain some performance. The main performance drop in this example is caused by not inlining TListHelper.InternalGrowCheck.
Stefan Glienke I was running under 64 bit
David Heffernan Same story – identical code generated.
for i := 0 to N-1 do
mov [rel $0002dbad],$00000000
arr[i] := i;
mov rax,[rel $0002dbc6]
movsxd rcx,[rel $0002db9f]
mov edx,[rel $0002db99]
mov [rax+rcx*4],edx
add dword ptr [rel $0002db8f],$01
for i := 0 to N-1 do
cmp [rel $0002db85],$05f5e100
jnz Project1 + $143
nop
Stefan Glienke Very odd. I can see the same code too. But definitely slower in XE8 for me. Presumably just a duff test case. Perhaps something about memory layout makes the difference. I don’t know.
Anyway, do you know why TList.Add is slower? I think that this size reducing helper just has slower code. So instead of this in XE7
System.Generics.Collections.pas.917: FItems[Count] := Value;
000000000054039B 488B4308 mov rax,[rbx+$08]
000000000054039F 48634B10 movsxd rcx,[rbx+$10]
00000000005403A3 893488 mov [rax+rcx*4],esi
We have this in XE8
System.Generics.Collections.pas.2419: PCardinal(FItems^)[FCount] := Cardinal(Value);
0000000000453CCA 488B442440 mov rax,[rsp+$40]
0000000000453CCF 488B4030 mov rax,[rax+$30]
0000000000453CD3 488B4C2440 mov rcx,[rsp+$40]
0000000000453CD8 486309 movsxd rcx,[rcx]
0000000000453CDB 8B13 mov edx,[rbx]
0000000000453CDD 891488 mov [rax+rcx*4],edx
Pretty sucky if you ask me.
David Heffernan I commented on that 2 posts ago (might have missed it because I edited).
Stefan Glienke No, I didn’t see that text before. Must have crossed with your edit. I’m dubious that lack of inlining TListHelper.InternalGrowCheck is a big issue. I’m much more suspicious of the extra indirection that can be seen in the access of the items array. That rather awful
function TListHelper.GetFItems: PPointer;
begin
Result := PPointer(PByte(@Self)+ SizeOf(Self));
end;
I cannot help feeling that Emba have thrown the baby out with the bath water.
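To make the GetFItems trick above concrete, here is a minimal sketch of the pattern as I read it: a non-generic helper record is embedded in a class, and it reaches the field declared immediately after itself by pointer arithmetic. THelperSketch and THostSketch are hypothetical names, and this is not the RTL's implementation. It relies on the assumption that the two fields are laid out back to back with no padding, which is why FCount is a NativeInt here.

```pascal
program TrailingFieldSketch;
{$APPTYPE CONSOLE}
{$POINTERMATH ON}
uses
  System.SysUtils;

type
  // Hypothetical non-generic helper: knows nothing about the element type.
  THelperSketch = record
    FCount: NativeInt; // pointer-sized, so no padding follows the record
    function GetFItems: PPointer;
    procedure AddInteger(Value: Integer);
  end;

  // Hypothetical host: FItems MUST be declared directly after FHelper.
  THostSketch = class
  public
    FHelper: THelperSketch;
    FItems: TArray<Integer>;
  end;

function THelperSketch.GetFItems: PPointer;
begin
  // Address of whatever field follows this record inside its owner.
  Result := PPointer(PByte(@Self) + SizeOf(Self));
end;

procedure THelperSketch.AddInteger(Value: Integer);
var
  P: PPointer;
begin
  // Two steps: find the items field, then load the array pointer it holds.
  P := GetFItems;
  PInteger(P^)[FCount] := Value; // no growth check: caller sets the length
  Inc(FCount);
end;

var
  Host: THostSketch;
begin
  Host := THostSketch.Create;
  try
    SetLength(Host.FItems, 4);
    Host.FHelper.AddInteger(10);
    Host.FHelper.AddInteger(20);
    Writeln(Host.FItems[0], ' ', Host.FItems[1]);
  finally
    Host.Free;
  end;
end.
```

The helper has to compute where the items live before it can load the array pointer, which is the extra work visible in the XE8 disassembly quoted earlier.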
David Heffernan You can be dubious all day, but I profiled it and I know it’s the main cause. I have been working on some optimizations in Generics.Collections for a while, trying to move some code from the generic type into a non-generic type to reduce the overhead. I came up with a slightly similar approach, and in my case too, not inlining the GrowCheck method caused a ~15% performance drop.
Of course the extra indirection is not making it any better. And the reduction in binary size is also insignificant (every TList<T> class where T is a class still adds to the binary size although it has 100% the same code).
Stefan Glienke You think that the asm that I posted performs just as well as the XE7 version?
I’m still pretty surprised that inlining that method could make such a difference.
Stefan Glienke In XE7, I can see a 15% difference in the 32 bit compiler with GrowCheck inlined, compared with it not inlined. For the 64 bit compiler, inlining makes no difference in my test.
David Heffernan I did not say that, but it’s not the main reason in this particular case. I think we can argue all day, and in the end we are both of the same opinion: these changes are bad and miss their goal. In fact I argued enough about the generic bloat issue in the past, so I will leave that to other people now.
I solved the problem in Spring4D 1.2 by kind of hardcasting a TObjectList to an IList<T> when T is a class, which only causes an overhead of a few hundred bytes for each list of T where T is a class type. Other generic types will follow when necessary. Also, using the new DynamicArray can reduce the binary bloat when you just need some syntax sugar on an array (more than what was added in XE7).
don’t mind me, I wanna follow the thread for Stefan Glienke and David Heffernan comments (:
David Millington About: “Also XE8 shows (briefly) a recompile dialog every single time it’s run”. Do you have IDE Fix Pack installed in XE7? If so then the missing IDE Fix Pack for XE8 may be the reason for the constant recompile 😉
Andreas Hausladen I do indeed have IDE Fix Pack installed! It’s an essential 🙂 (And while you’re here, thank you for making it.) I didn’t realise it affected the compile dialog when there were no changes to the project, though.
It fixes the IDE bug that is causing the recompile.
+Stefan The thing is that we are looking at different compilers. I focus on x64 ‘cos that’s where my code runs predominantly. You are looking at x86.
For x64 the grow check inlining seems not to matter.
It seems that Emba care more about executable size than perf. What they ought to do is solve the problem properly in the compiler/linker.
David Heffernan Agreed, the proper fix belongs in the compiler/linker. But we’ll be waiting for hell to freeze over first…