My instance of Automagically seems to hang every now and then. It appears to be rather random, and the result is an unresponsive system.
When this happens, nothing shows up on the signal trace and the automation stops. Restarting the service brings everything back to normal.
Looking at the plugin system, there seems to be a built-in weakness that we should be able to overcome. The typical design pattern for a plugin is an execution thread fed by the signal handler, with a Queue passing the work from the handler to the thread.
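For reference, a minimal sketch of that pattern. It uses Python 3's `queue` module (called `Queue` in Python 2); the `Signal` class, the `worker()` function, the `processed` list and the command strings are hypothetical stand-ins, not Automagically's actual code.

```python
import queue
import threading


class Signal:
    """Hypothetical stand-in for an Automagically signal object."""

    def __init__(self, content):
        self.content = content


workQueue = queue.Queue(maxsize=10)  # bounded, so a full queue makes put() block
processed = []  # items handled by the execution thread


def worker():
    """Execution thread: handles work items until it gets the None sentinel."""
    while True:
        item = workQueue.get()
        if item is None:
            break
        processed.append(item)  # a real plugin would talk to hardware here


def signalHandler(signal):
    """Called by the signal system; only parses and enqueues."""
    if signal.content == 'terminate':
        workQueue.put(None)
    elif signal.content.startswith('tellstick,do:'):
        for i in signal.content[13:].split(';'):
            if i != '':
                workQueue.put(i)


t = threading.Thread(target=worker, daemon=True)
t.start()
signalHandler(Signal('tellstick,do:on,1;off,2;'))
signalHandler(Signal('terminate'))
t.join()
print(processed)  # ['on,1', 'off,2']
```

The handler stays cheap because all real work happens in `worker()`; the Queue is the only coupling between the two.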
Item #1 - Improvements to the design pattern
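If the execution thread bails out for some reason, the Queue eventually fills up (assuming it was created with a maxsize; an unbounded Queue never blocks and just keeps growing instead). Since all plugins use a blocking Queue.put(), the signal handler then blocks. This typically happens with plugins that are fed a steady stream of signals, such as sensor readings.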
The block propagates all the way up: SignalProcessingThread() in signals\models.py stalls too, since it reaches the handler through processSignal() in site\plugins\plugins.py.
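A minimal sketch of that failure mode, using Python 3's `queue` module and a hypothetical `worker()` that simply stops after its first item. The timeout on the last `put()` is only there so the example terminates; the plugins' plain `put()` would wait forever at that point.

```python
import queue
import threading

workQueue = queue.Queue(maxsize=3)  # bounded queue, as assumed above


def worker():
    """Hypothetical execution thread that bails out after its first item."""
    workQueue.get()
    # ...and then returns: nobody calls get() on this queue any more.


t = threading.Thread(target=worker)
t.start()
workQueue.put('first')  # the worker takes this one and exits
t.join()

for n in range(3):
    workQueue.put(n)  # fills the queue to maxsize
print(workQueue.full())  # True

blocked = False
try:
    # A plain put() here would wait forever, and so would signalHandler()
    # and everything above it; the timeout just makes that visible.
    workQueue.put('one more', timeout=0.5)
except queue.Full:
    blocked = True
print(blocked)  # True
```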
So, now to the discussion: would it make sense to use a non-blocking Queue.put() in our plugins?
Example from plugins\tellstick.py:
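```python
def signalHandler(signal):
    if signal.content == 'terminate':
        workQueue.put(None)  # None tells the execution thread to stop
    elif signal.content.startswith('tellstick,do:'):
        # Strip the 'tellstick,do:' prefix (13 characters) and queue each
        # ';'-separated command for the execution thread.
        for i in signal.content[13:].split(';'):
            if i != '':
                workQueue.put(i)  # blocking put: waits while the queue is full
```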
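```python
# Update the plugin's signalHandler to raise an exception instead of blocking.
# (PluginQueueFull would be a new exception class that both the plugins
# and signals\models.py import.)
try:
    workQueue.put(i, False)  # block=False: raise Queue.Full immediately
except Queue.Full:
    raise PluginQueueFull("The plugin's queue is full")

##
## Update signals\models.py to catch it.
##
try:
    plugins.processSignal(s)
except PluginQueueFull:
    # Kill the plugin's thread and restart it?
    pass
```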
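Putting the two fragments together, a self-contained sketch under Python 3. `Plugin`, `processSignal()` and `hang()` are hypothetical stand-ins for the real plugin and signals code. One caveat on "kill the plugin's thread": Python offers no supported way to kill a thread from outside, so this sketch "restarts" by abandoning the old queue and starting a fresh worker; a thread that is truly stuck stays alive in the background.

```python
import queue
import threading


class PluginQueueFull(Exception):
    """Raised by a plugin's signal handler when its work queue is full."""


class Plugin:
    """Hypothetical plugin: one bounded work queue plus one execution thread."""

    def __init__(self, handle_item, maxsize=10):
        self.handle_item = handle_item
        self.maxsize = maxsize
        self.start()

    def start(self):
        # Each (re)start gets a fresh queue; items stuck behind a dead or
        # hung worker are discarded along with the old queue.
        self.workQueue = queue.Queue(maxsize=self.maxsize)
        self.thread = threading.Thread(
            target=self._run, args=(self.workQueue,), daemon=True)
        self.thread.start()

    def _run(self, q):
        while True:
            item = q.get()
            if item is None:
                return
            self.handle_item(item)

    def signalHandler(self, item):
        try:
            self.workQueue.put(item, False)  # non-blocking put
        except queue.Full:
            raise PluginQueueFull("The plugin's queue is full")


def processSignal(plugin, item):
    # Stand-in for the caller in signals\models.py.
    try:
        plugin.signalHandler(item)
    except PluginQueueFull:
        # Drop this signal and replace the worker so later signals flow again.
        plugin.start()
        return False
    return True


started = threading.Event()
release = threading.Event()


def hang(item):
    started.set()   # the worker has taken an item...
    release.wait()  # ...and now hangs, like a plugin stuck on a socket


plugin = Plugin(hang, maxsize=2)
results = [processSignal(plugin, 'a')]
started.wait()  # make sure the worker is stuck on 'a'
results += [processSignal(plugin, x) for x in 'bcde']
release.set()
print(results)  # [True, True, True, False, True]
```

The signal system never blocks: 'd' is dropped when the queue overflows, and 'e' already goes to the replacement worker.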
I notice that some plugins, such as tellduslive, execute quite a lot of code in the signalHandler. I wonder whether that works well with this design pattern: while the handler waits on a socket, for example, the whole signalling system hangs.
I think at least the tellduslive plugin needs some refactoring to reduce the risk of a hanging system.
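One way to reduce that risk, sketched with hypothetical names: keep the signal handler to a non-blocking enqueue and do all network I/O in the execution thread, so a slow server stalls only that plugin. `slow_network_call()` stands in for whatever Telldus Live request the plugin actually makes, and dropping plus logging on a full queue is an assumed policy.

```python
import logging
import queue
import threading
import time

workQueue = queue.Queue(maxsize=100)
done = []  # commands the worker has finished


def slow_network_call(command):
    """Hypothetical stand-in for a Telldus Live request that takes a while."""
    time.sleep(0.2)
    done.append(command)


def worker():
    """Execution thread: the only place that performs network I/O."""
    while True:
        command = workQueue.get()
        if command is None:
            break
        slow_network_call(command)


def signalHandler(content):
    """Refactored handler: enqueue without blocking, never touch the network."""
    try:
        workQueue.put(content, False)
    except queue.Full:
        # Hypothetical policy: drop the command and log it rather than
        # stall the signal system.
        logging.warning("tellduslive queue full, dropping %r", content)


t = threading.Thread(target=worker, daemon=True)
t.start()

start = time.monotonic()
for n in range(3):
    signalHandler('cmd%d' % n)
elapsed = time.monotonic() - start  # time spent inside the handlers: tiny

workQueue.put(None)  # ask the worker to stop once the backlog is done
t.join()
print(done)  # ['cmd0', 'cmd1', 'cmd2']
```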
I think the items above only become "critical" once Automagically starts interacting with the outside world.
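Once plugins talk to remote services, a timeout on every blocking socket call also bounds how long even the execution thread can hang. A minimal sketch using only the standard `socket` module; `fetch()` is a hypothetical helper, and the local server that never replies stands in for an unresponsive remote end.

```python
import socket

# A local server that accepts connections but never answers,
# standing in for an unresponsive remote service.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
server.listen(1)
host, port = server.getsockname()


def fetch(host, port, timeout=1.0):
    """Hypothetical helper: read a reply, giving up after `timeout` seconds."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            return None  # caller decides whether to retry, log, or give up


reply = fetch(host, port, timeout=0.5)
print(reply)  # None
server.close()
```

A timeout alone still lets a worker stall for up to `timeout` seconds per call, which is why the I/O belongs in the execution thread rather than in the signal handler.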
A quick look tells me:
OK: 1-wire, email, httpreq, tellstick, datafetcher, timedevents
Questionable: hdmi-cec, tellduslive
What are your thoughts on the above?
/Marcus