DIO Module unstable behaviour

Post by Mathias »

Hi Enrico,

At some point the communication between the RevPi and the DIO stops working. The piControl driver counts the communication errors for each module; that is the increasing number you see in kern.log.
It would be interesting to see what happens before that. Are there any other error messages beforehand?

Which version of piControl are you using? The compile date is shown in kern.log at boot time.
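
A minimal sketch of that check, not an official Kunbus tool: the Python snippet below scans log lines for the piControl build message. The function name `find_picontrol_build` is hypothetical, and the message format `piControl: built: ...` is assumed from the boot log quoted further down in this thread.

```python
import re

# Build message of the driver; format assumed from the boot log in this thread.
BUILD_RE = re.compile(r"piControl: built: (.+)$")


def find_picontrol_build(lines):
    """Return the build string from the last matching line, or None."""
    build = None
    for line in lines:
        match = BUILD_RE.search(line.rstrip())
        if match:
            build = match.group(1)  # a later boot overrides an earlier one
    return build


# Sample line taken from the boot log posted in this thread.
sample = [
    "Sep 24 23:00:20 RevPi6733 kernel: piControl: built: Di 17. Jul 14:44:40 CEST 2018",
]
print(find_picontrol_build(sample))
```

To use it on a real system, pass an open file such as /var/log/kern.log instead of `sample`. That path is an assumption, and the boot messages may already sit in a rotated file.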

I think you should update your system, at least the two packages raspberrypi-kernel and raspberrypi-firmware. Then you should update the firmware of the DIO with 'piTest -f'. The current version is 1.4; you are using 1.3.

Please let me know if that helps.

Mathias

Post by mezz »

Hi Mathias,

thanks for the prompt reply. I have attached the journald kernel logs from after a fresh reboot, in case you can spot anything suspicious.
I did not notice any piControl errors before the first "too many communication errors" message. I can try again next time, but since the logs are rotated, the relevant entries may have disappeared.
In the latest attachments I added the full log sequence from /var/log/kern*, starting from a fresh piControl reboot, and I could not see any particular errors.
For instance:

Code:

tail kern.log.3_last.1000.lines_20180923.log
head kern.log.2_first.1000.lines_20180923.log

But entries may have gone missing, since I have not yet investigated how the logs are rotated.
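
On the question of rotated logs, here is a minimal sketch for reading them in chronological order. It assumes the common logrotate naming scheme (kern.log, kern.log.1, kern.log.2.gz, ...), where a higher suffix means an older file; the actual scheme on a given system may differ. The helper names `rotated_logs_oldest_first` and `read_lines` are hypothetical.

```python
import gzip
import re
from pathlib import Path


def rotated_logs_oldest_first(log_dir, base="kern.log"):
    """List base, base.1, base.2.gz, ... sorted from oldest to newest.

    Assumes the usual logrotate numbering, where a higher suffix is older.
    """
    pattern = re.compile(re.escape(base) + r"(?:\.(\d+))?(?:\.gz)?")
    found = []
    for path in Path(log_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            index = int(match.group(1)) if match.group(1) else 0
            found.append((index, path))
    found.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in found]


def read_lines(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", errors="replace") as handle:
        yield from handle
```

Iterating over `rotated_logs_oldest_first("/var/log")` and feeding each file to `read_lines` gives one continuous stream, which makes it easier to look for messages that precede the first communication error even if a rotation happened in between.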

Anyway, the boot log shows the piControl build date (17 Jul 2018) and MAJOR-No. 244:

Code:

Sep 24 23:00:20 RevPi6733 kernel: piControl: loading out-of-tree module taints kernel.
Sep 24 23:00:20 RevPi6733 kernel: piControl: built: Di 17. Jul 14:44:40 CEST 2018
Sep 24 23:00:20 RevPi6733 kernel: piControl: RevPi Core
Sep 24 23:00:20 RevPi6733 kernel: piControl: MAJOR-No.  : 244
Sep 24 23:00:20 RevPi6733 kernel: piControl: MAJOR-No.  : 244  MINOR-No.  : 0
Sep 24 23:00:20 RevPi6733 kernel: piControl: vfs_read returned 0: b92fcb00, 40477
Sep 24 23:00:20 RevPi6733 kernel: piControl: 5 devices found
Sep 24 23:00:20 RevPi6733 kernel: piControl: 612 entries in total
Sep 24 23:00:20 RevPi6733 kernel: piControl: cl-comp:  0 addr  6  bit ff  len   8
Sep 24 23:00:20 RevPi6733 kernel: piControl: cl-comp:  1 addr 81  bit ff  len   8
Sep 24 23:00:20 RevPi6733 kernel: piControl: cl-comp:  2 addr 82  bit ff  len   8
Sep 24 23:00:20 RevPi6733 kernel: piControl: cl-comp:  3 addr 194  bit ff  len   8
Sep 24 23:00:20 RevPi6733 kernel: piControl: cl-comp:  4 addr 195  bit ff  len   8
Sep 24 23:00:20 RevPi6733 kernel: piControl: filp_open -1190112000
Sep 24 23:00:20 RevPi6733 kernel: piControl: ksz8851HardwareReset
Sep 24 23:00:20 RevPi6733 kernel: piControl: ksz8851HardwareReset
Sep 24 23:00:20 RevPi6733 kernel: piControl: number of CPUs: 4
Sep 24 23:00:20 RevPi6733 kernel: piControl: mGate thread started
Sep 24 23:00:20 RevPi6733 kernel: piControl: piIO thread started
Sep 24 23:00:20 RevPi6733 kernel: piControl: set priority of spi0 to 54
Sep 24 23:00:20 RevPi6733 kernel: piControl: PADS 0 = 0x1b   slew=1  hyst=1  drive=3
Sep 24 23:00:20 RevPi6733 kernel: piControl: PADS 1 = 0x1b   slew=1  hyst=1  drive=3
Sep 24 23:00:20 RevPi6733 kernel: piControl: PADS 2 = 0x1b   slew=1  hyst=1  drive=3
Sep 24 23:00:20 RevPi6733 kernel: piControl: piControlInit done

I have noticed that every now and then at start-up the system behaves as if the PiBridge were not plugged in: everything flashes red. This also happens when the system I/Os are completely disconnected. It seems a bit touchy; on a second attempt (or after reseating the PiBridge) the system boots up.
So at times the initial handshake seems to fail too, even though I made sure that the bridge was firmly pushed into the slots.

Running apt list --upgradable after apt update does not list the raspberrypi-kernel or raspberrypi-firmware packages. The system was upgraded to Stretch recently, about a month ago.

For the module firmware upgrade I need to be physically there and this will happen next week.

That said, I have set up a RevPi Core 1 with 4 DIO modules on older firmware, and that system has been running flawlessly for months now. Truth be told, I did have to have a RevPi Core 3 replaced, because the SSH connection kept dropping for no reason and the system was crashing due to some instability.

Please let me know if you spot any errors at boot time. I will try to check daily and see if I can find any other errors in the meantime.
The firmware upgrade will follow next week.
Also, if you think that a downgrade to Jessie may be a solution, as someone else previously found, do let me know.

Thanks for your help, regards

Enrico
Attachments
journald_kernel_logs_20180924.tar.gz (6.76 KiB)

Post by mezz »

Hi Mathias,

This morning it happened again, for about 40 minutes. This time I managed to capture some lines that precede the piControl messages; an excerpt is below and the log files are attached.

Lines like this:
Sep 26 05:43:11 RevPi6733 kernel: [15721.542117] NOHZ: local_softirq_pending 80

Code:

Sep 24 23:00:32 RevPi6733 kernel: [   15.415769] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
Sep 26 05:43:11 RevPi6733 kernel: [15721.542117] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [28106.892644] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [41096.406798] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [41700.570155] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [57408.820494] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [73117.070592] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [79158.706926] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [85804.504318] NOHZ: local_softirq_pending 80
Sep 26 05:43:11 RevPi6733 kernel: [110576.268561] piControl: too many communication errors -> set inputs to default 0 11 0 0   0 0 0 0
Sep 26 05:43:11 RevPi6733 kernel: [110576.288580] piControl: too many communication errors -> set inputs to default 0 12 0 0   0 0 0 0
Sep 26 05:43:11 RevPi6733 kernel: [110576.308643] piControl: too many communication errors -> set inputs to default 0 13 0 0   0 0 0 0

Does this shed some more light on the issue?
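
One detail that may help when reading the excerpt above: the bracketed value is the kernel's timestamp in seconds since boot, while the leading date is the time the line was written to the log. The following minimal sketch estimates when each message was actually emitted. It assumes that the first parsable line was logged promptly (so its two timestamps can serve as a reference) and that the system clock was not adjusted in between. The function names and the `year` parameter are hypothetical; the year is needed because these timestamps do not carry one.

```python
import re
from datetime import datetime, timedelta

LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*(\d+\.\d+)\] (.*)$"
)


def parse(line, year):
    """Split a line into (log time, seconds since boot, message), or None."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    logged = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
    return logged, float(match.group(2)), match.group(3)


def estimate_event_times(lines, year):
    """Estimate wall-clock times, using the first parsable line as reference."""
    parsed = [item for item in (parse(line, year) for line in lines) if item]
    if not parsed:
        return []
    ref_logged, ref_uptime, _ = parsed[0]
    boot = ref_logged - timedelta(seconds=ref_uptime)  # estimated boot time
    return [(boot + timedelta(seconds=up), msg) for _, up, msg in parsed]
```

Under those assumptions, the first NOHZ line in the excerpt (15721 seconds after boot) maps to roughly 03:22 on Sep 25, about a day before the communication errors at roughly 05:43 on Sep 26, even though both carry the same log date.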

Thanks

Enrico
Attachments
kern.log.1_first.1000.lines_20180926.log.tar.gz (5.43 KiB)
kern.log.2_first.1000.lines_20180926.log.tar.gz (12.97 KiB)

Post by mezz »

Hi,
it happened again tonight, same sequence:

Code:

........
Sep 27 14:07:07 RevPi6733 kernel: [   14.856405] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Sep 27 14:07:09 RevPi6733 kernel: [   16.475928] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
Sep 28 23:40:11 RevPi6733 kernel: [26944.648122] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [29059.219880] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [34798.775392] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [39330.000907] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [45371.633423] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [63798.620850] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [74975.638637] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [78902.707030] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [90381.806193] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [93704.707449] NOHZ: local_softirq_pending 80
Sep 28 23:40:11 RevPi6733 kernel: [120799.562166] piControl: too many communication errors -> set inputs to default 0 0 11 0   0 0 0 0
Sep 28 23:40:11 RevPi6733 kernel: [120799.582178] piControl: too many communication errors -> set inputs to default 0 0 12 0   0 0 0 0
Sep 28 23:40:11 RevPi6733 kernel: [120799.602220] piControl: too many communication errors -> set inputs to default 0 0 13 0   0 0 0 0
............
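
Since the driver is said earlier in the thread to count communication errors per module, a small parser can show which counter is increasing. This is a minimal sketch with hypothetical function names. That each position in the number list corresponds to one module is an unconfirmed assumption.

```python
import re

ERROR_RE = re.compile(
    r"too many communication errors -> set inputs to default((?:\s+\d+)+)"
)


def error_counters(lines):
    """Return one list of integer counters per matching log line."""
    rows = []
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            rows.append([int(value) for value in match.group(1).split()])
    return rows


def active_positions(rows):
    """Map each zero-based position that is ever non-zero to its highest count."""
    peaks = {}
    for row in rows:
        for position, count in enumerate(row):
            if count:
                peaks[position] = max(count, peaks.get(position, 0))
    return peaks


# Lines taken from the excerpt above.
sample = [
    "Sep 28 23:40:11 RevPi6733 kernel: [120799.562166] piControl: too many communication errors -> set inputs to default 0 0 11 0   0 0 0 0",
    "Sep 28 23:40:11 RevPi6733 kernel: [120799.602220] piControl: too many communication errors -> set inputs to default 0 0 13 0   0 0 0 0",
]
print(active_positions(error_counters(sample)))  # {2: 13}
```

In this excerpt the increasing counter is at position 2, whereas in the Sep 26 excerpt it was at position 1. That may mean a different module was affected each time, but this interpretation is not confirmed.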

Post by mezz »

Hi Mathias,

I updated the DIO module firmware to version 1.4 and the system has been running with no kernel errors for a week now. Thanks, that fixed the instability for now; let's see what happens over the next month.

I guess this is why a downgrade to Jessie solved the problem for the other user (Lode): 1.3 is stable on Jessie but not on Stretch. Also, as far as I have seen, Jessie cannot even see DIO firmware 1.4; it stops at 1.3.
An update from 1.3 to 1.4 on Jessie is apparently not possible.

Running "piTest -d" on Stretch does say that the firmware should be updated, but I have not seen this anywhere in the documentation. If the RevPi Core 3 is now delivered with Stretch and one has DIO firmware older than 1.4, the system becomes unstable.
According to my tests, the wiring has nothing to do with the instability, unless time proves me wrong. The statement pointing to the wiring is actually misleading and made me spend quite some time checking my wiring.

Unless I have overlooked something in the documentation, this is not mentioned in the help. I think that once this error has been verified by Kunbus, the required DIO firmware update should be clearly stated somewhere; otherwise new Stretch-based cores may go wild for no obvious reason on many live DIO modules.

Enrico