
Threads and UI
I have a feeling that sometimes, something, somewhere is trying to UI things from a non-UI thread. Naturally, this is something we strive to avoid – but with all the injection of handlers – and literally hundreds of places where someone could think “surely, it would be clever to update that status label now” – it is a bit of a grunt task to review all the code.
So, before embarking on such a review quest, my question is:
Which symptoms could arise from touching UI from a non-UI thread?
AV? Deadlocks? Temporary freezes? Donald Trump?
I am seeing temporary freezes (20+ seconds) in the UI. The background threads are all looking active, doing and logging their stuff.
My first thought was a lock contention – but I log all locks that fail – and the log shows nothing of the sort, so I am in a bit of a pickle here.
Not sure if it’s related, but refreshing a trackbar (VCL) causes it to start an animation to repaint itself, and another refresh while it is animating gets stuck. With FMX I haven’t seen any UI vs. non-UI thread issues.
In your threads, try the following:
procedure TMyShinyThread.Execute;
begin
  while NeedToStayAlive do
  begin
    if NeedToDoSomething then
    begin
      // Log before and after the work so the timestamps show how long it took
      Log('My name is Jeff, gonna do some work');
      TwerkEmBits;
      Log('My name is Jeff, I''m done!');
    end;
  end;
end;
See which thread typically does work for 20+ seconds, and then review from there.
I had a similar issue in the past, where a background thread was doing a lot of queries to bring in more work. For some reason, the thread managed to play ball with the main thread, which made the app freeze.
Alternatively, check for missing Synchronize() calls. Or, if you’re displaying data in a db-aware control from a large db with many connections, then most likely that’s your problem. I usually use a background thread for fetching data, and only display it when it’s fully available on the client side.
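For the "missing Synchronize()" case, here is a minimal sketch of what routing a UI update through the main thread can look like. It assumes Delphi XE2 or later with the VCL; TWorker and the label passed to it are made-up names for illustration, not code from the application discussed here. TThread.Queue and TThread.Synchronize are the RTL calls.

```pascal
unit WorkerSketch;

interface

uses
  System.Classes, Vcl.StdCtrls;

type
  // Hypothetical worker that reports progress to a label owned by a form.
  TWorker = class(TThread)
  private
    FStatusLabel: TLabel;
    procedure ReportStatus(const Msg: string);
  protected
    procedure Execute; override;
  public
    constructor Create(AStatusLabel: TLabel);
  end;

implementation

constructor TWorker.Create(AStatusLabel: TLabel);
begin
  FStatusLabel := AStatusLabel;
  inherited Create(False);
end;

procedure TWorker.ReportStatus(const Msg: string);
var
  Text: string;
begin
  // WRONG from a background thread: FStatusLabel.Caption := Msg;
  Text := Msg; // local copy that the closure below can capture
  // Queue hands the closure to the main thread and returns immediately.
  // Synchronize would also run it on the main thread, but block this
  // thread until the main thread has executed it.
  TThread.Queue(nil,
    procedure
    begin
      FStatusLabel.Caption := Text;
    end);
end;

procedure TWorker.Execute;
begin
  ReportStatus('Working...');
  // ... the long-running work stays here, off the main thread ...
  ReportStatus('Done');
end;

end.
```

Only the one-line caption update runs on the main thread; the work itself does not. The sketch does not guard against the label being destroyed before a queued update runs, which real code would have to handle.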
Dorin Duminica Good suggestion, but all the threads do pre-post twerk announcement, and they all appear to do their number in style. It is only the UI that is stuck in a corner. I am going to try a Synchronize in the BG thread event handlers next to see if it has any effect.
Synchronize works by sending a window message to the main thread, so I was thinking that you should be able to see that message if you hook the wndproc. I’d try TApplicationEvents first.
If your UI is non-responsive for 20+ seconds, use something like madExcept and its madTraceProcess tool. When the non-responsive time comes up, you have about 10-15+ seconds to start madTraceProcess, which will dump the “current” actions of all threads. Then see what the main thread is doing.
Then we have something more concrete to work on 😉
Whoops, sent too early. My point was that you could then log those messages and see how they correlate with the ones you expect.
Dorin Duminica That is our MO too. The BG thread picks up the notification of a change, does the read(s), then posts a “u got changes ready to pick up” message to the UI.
This is pretty easy to debug. Run under the debugger. When you get a freeze, pause the app in the debugger. This suspends all threads. Select the main UI thread from the threads list (you did name your threads, didn’t you?) and then look at the call stack.
Mircea Pfa Naturally, the symptom only happens for the user, some 5-15 times a day, so debugging is out of the question.
Lars Fosdal Depending on the user, I may or may not send them the madTraceProcess tool to get me that “bug report”. Some users can deal with it, some can’t. Where they can’t, I usually TeamViewer in (or use another remote app) and do it myself, as long as the issue is reproducible. If not, I “train” the user in how to use the tool (it’s really a simple thing, but some can’t even manage that without a demo).
Mircea Pfa Ah, never mind – now I see what you mean. We use EurekaLog – so I’ll have a look if I can set that up to do something similar. Good suggestion!
Asbjørn Heid – We hook the message loop, and fan out to handlers in response to notifications of changes. But – we don’t see notifications during the freeze. – that is – I can see them being posted – but they are not picked up. There are a lot of these handlers… which is why I am looking for clues to what typically happens on UI thread access from a BG thread.
We had something similar with one of our customers, there the file server would sometimes take 30+ seconds to open a file in a shared directory. Fairly randomly. We loaded pdfs and rtfs for display. This would then hang the app for 30+ seconds.
David Heffernan Threads are named. Activity is logged. The problem is that I can’t reproduce the problem when logged in against the same server/db as the client. Only an active user sees the problem, some 5-15 times a day. If I can debug it, I can solve it – but this one is a harder nut to crack, since I can’t make the decisions that the active user is making (based on his plans and priorities) – and if I were to slow down the work of a few dozen people, that could become costly in overtime pay.
Mircea Pfa EurekaLog indeed has a freeze detector. http://eurekalog.blogspot.no/2009/03/eurekalogs-anti-freeze-feature.html Now, I just need to grab call stacks from all the threads…
Edit: https://www.eurekalog.com/help/eurekalog/index.php?topic_unit_efreeze.php
Asbjørn Heid Let me guess – Antivirus software is scanning the files before it lets you access them?
Use a separate process that sends WM_NULL messages with a timeout. When they time out, use EurekaLog’s trace process functionality to dump all the stack traces.
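A rough sketch of such a watchdog, as a Delphi console program. The window caption, the timeout and the poll interval are placeholders, and the step that actually dumps the stack traces is left as a comment because it depends on the tool used. The Win32 function is imported under a local name (PingWindow, a name made up here) so the sketch does not depend on how a particular Delphi version declares SendMessageTimeout.

```pascal
program FreezeWatchdog;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils;

// Local import of the Win32 SendMessageTimeoutW function.
function PingWindow(Wnd: HWND; Msg: UINT; AWParam: WPARAM; ALParam: LPARAM;
  Flags, TimeoutMs: UINT; ResultPtr: Pointer): LRESULT; stdcall;
  external user32 name 'SendMessageTimeoutW';

const
  TargetTitle    = 'My Application'; // hypothetical main window caption
  PingTimeoutMs  = 5000;             // how long the UI thread may stay silent
  PollIntervalMs = 1000;

var
  Wnd: HWND;
begin
  repeat
    Wnd := FindWindow(nil, TargetTitle);
    if Wnd = 0 then
      Writeln('Target window not found')
    else if PingWindow(Wnd, WM_NULL, 0, 0, SMTO_NORMAL, PingTimeoutMs, nil) = 0 then
    begin
      // The thread that owns the window did not process WM_NULL in time
      // (or the call failed for another reason).
      Writeln(FormatDateTime('hh:nn:ss', Now),
        ' UI thread did not answer within ', PingTimeoutMs, ' ms');
      // Trigger the stack dump of the target process here.
    end;
    Sleep(PollIntervalMs);
  until False;
end.
```

WM_NULL does nothing in the receiving window, so the ping only measures whether the main thread is still pumping messages.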
Lars Fosdal Did you check the code in the Synchronize functions? Some days ago, I created a watchdog thread to check if the db connection is still alive. I did the test in a synchronized function. But trying to reconnect when the network was unstable blocked the UI thread if the user was opening a table (using the same db connection).
David Heffernan https://www.eurekalog.com/help/eurekalog/index.php?topic_unit_efreeze.php
Markus Ja Initially, there was no use of Synchronize. Trying a version now that has it.
I’m a madExcept user myself, I’m sure it comes to the same thing. Always best to debug problems with information as opposed to guesswork!
David Berneda If you are suggesting that UI work can be done away from the main thread in FMX, you are wrong
David Heffernan I absolutely agree – although getting at the right situational information can be a challenge at times.
Lars Fosdal That is our suspicion, but we haven’t found the root cause. We ended up working around the issue (caching webservice… says something).
Asbjørn Heid I’ve seen the same when programmatically trying to check a file version over a UNC path. It takes 30+ seconds when MS System Center Endpoint Protection is installed.
Update: Not a single client UI freeze for the last three hours, after adding Synchronize around the background processing. I won’t draw a final conclusion until end of day – but I have high hopes.
Now – do I just leave that Synchronize there – or go on a witch hunt for whoever added UI stuff in a background event handler? I think I’ll just leave it there. I guess since it happened once, it could happen again – forgetful as we humans are.
Synchronize makes the background thread pointless. Remove the background thread, and call the function directly from the main thread. Or, work out what is wrong with the background thread and fix it in a way that doesn’t push the whole task back on to the main thread.
Working on the second option, David Heffernan.
See you after Easter 😛
Let us know what you found Lars Fosdal
The GREAT thing about this community, is the endless opportunity to make a fool of yourself. This is a bit embarrassing, as it turned out my issues were not UI related after all.
I have a five-year-old class which is the foundation of our locking mechanism. It has various versions of locking, such as a Lock which fails with an exception if the lock attempt fails, and a TryToLock which will give up and return false if it fails.
Common to both is that they capture the stack of the initial lock, and if a second lock attempt fails, it will dump the initial lock class, time, thread id, and call stack to OutputDebugString, and it ends up in our logs.
Well, that was the theory, anyways.
I have a CustomDebugOut unit that I use to “replace”, i.e. outscope, OutputDebugString. All I need to do is include CustomDebugOut in the uses list, and it all goes to the log.
Guess what – that five-year-old locking class unit didn’t include the CustomDebugOut unit, so the lock failures never ended up in the log!
So – once that is fixed – I suddenly see distinct proof of a deadlock in the log. I should have seen it sooner – because the different thread IDs on a couple of consecutive log lines, were a dead giveaway.
On the bright side, there was no background thread code that diddled with the UI 🙂
There you are. When looking for a needle in a haystack, you start clutching at straws and end up with a pin prick 😛
So, basically, I guess I am using you guys as my teddy bear debugging substitute.
I hope you are well entertained ;)
Lars Fosdal Well at least you found it 🙂
You say the application froze for 20+ seconds at a time. This implies it unfroze at some point. That again implies the deadlock got resolved.
What I don’t quite understand is this. How did the deadlock get resolved?
That is, you say you have a Lock and a TryLock, the former raising an exception if it fails. This to me means you wait for some amount of time before giving up on the lock, and if so raise an exception. The TryLock should be implemented by not waiting at all.
So how did the deadlock resolve itself without locking exceptions leaving a trail?
Asbjørn Heid – The contested queue used TryToLock, which eventually timed out, and the thread relinquished the queue – skipping its work, allowing the other thread to continue.
This basically meant an update went unnoticed. No exceptions happened, and the error was not logged due to aforementioned reasons.
I’ve changed the queue access to use a pessimistic lock with retries, i.e. it doesn’t hold the lock – just retries after a short sleep for as long as I wish (sleep length and total try count are parameters) – so that if there is contention, it won’t deadlock.
If it never gets access, it now properly logs :P
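A minimal sketch of that kind of retry lock, assuming Delphi XE2 or later and a plain TCriticalSection. TryEnterWithRetries is a made-up name, not the locking class described above, and the failure is written straight to OutputDebugString here rather than through a custom logging unit.

```pascal
program RetryLockSketch;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.SyncObjs;

// Tries to enter Lock without ever blocking on it. Between attempts the
// caller just sleeps, so it never waits while holding anything.
// Returns False (and logs) if every attempt failed.
function TryEnterWithRetries(Lock: TCriticalSection;
  SleepMs, MaxTries: Integer): Boolean;
var
  Attempt: Integer;
begin
  for Attempt := 1 to MaxTries do
  begin
    if Lock.TryEnter then
      Exit(True);
    Sleep(SleepMs);
  end;
  OutputDebugString(PChar(Format(
    'Thread %d gave up on the lock after %d tries',
    [GetCurrentThreadId, MaxTries])));
  Result := False;
end;

var
  QueueLock: TCriticalSection;
begin
  QueueLock := TCriticalSection.Create;
  try
    if TryEnterWithRetries(QueueLock, 10, 50) then
    try
      Writeln('Got the queue, doing the work');
    finally
      QueueLock.Leave;
    end
    else
      Writeln('Queue busy, update skipped (and logged)');
  finally
    QueueLock.Free;
  end;
end.
```

The sleep length and the total try count are parameters, as described above; the caller still has to decide what a skipped update means for the application.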
It nags me that I haven’t spotted the missing logging before now. In my defense, it works in the debugger, since OutputDebugString then shows up in the event log in the IDE.
Lars Fosdal In my view TryToLock sounds like a disaster waiting to happen, such as the one you just experienced. Either you are fine waiting for a lock, or you are not.
I would remove it and replace it with TryLock which does not wait at all. Then I would evaluate each call to TryToLock to see if it should call Lock or TryLock. Maybe some code needs to be rewritten.
Also, from your description it sounds to me like you have too many locks. In my experience it’s usually a sign that the design is not good if you need to acquire several locks at a time.
We do have too many locks. That is for sure.
Lars Fosdal Multithreading: easy, efficient, robust, pick any two…
We are aiming for robust, but … well, yeah – it’s a learning process.
As an aside…
Coming from Delphi with a fair share of multithread experience moving to cross-platform C++, one of the issues I faced was the lack of recursive locks. At first I cursed and thought “what rubbish is this”.
However I quickly realized that the lack of recursive locks was in fact a good thing, because it forced me to actually think carefully about when and where I locked.
I’ve since become convinced that if I feel the need to use a recursive lock, my design is wrong.
You know what they say… if it doesn’t want to work, you’re not using enough “FreeAndNil” calls 😉
^ Not helpful, I know… hopefully it made you laugh a little at least?