Regular crashes with Ubuntu 10.04 Server

fanf
Posts: 10
Joined: Tue Jul 19, 2011 10:41 am

Regular crashes with Ubuntu 10.04 Server

Post by fanf »

Hi,

We bought a fit-pc2i to use as a small appliance for a product we are developing (to replace a "larger" nettop we were using).

The product seems great (small size, good capacity), but we are having random stability issues (once a day or once every few days).

The setup:
Ubuntu 10.04 server (stock install with a few additional packages)
No screen / No keyboard
2 USB devices (Phidget, Canon DSLR)
Reachable via SSH (over Internet)

The system runs a Python script via crontab at regular intervals; this script performs a few simple actions and sends data to the Internet over FTP.

So we know the system has crashed when we stop receiving data.

The system is currently "crashed": I cannot log in remotely.
If I try to log in over SSH, I see the password prompt and then nothing happens.
I can also telnet to port 80 (apache2 is configured). If I try to open the web page in a browser, I see the password prompt (there is an .htaccess), but the pages do not display afterwards.

A reboot solves the problem until the next "crash".

After the previous crash I saw kernel error messages in the logs which, if I recall correctly, were related to the disk.
I currently don't have physical access to the box, but I will update this post with more details (exact kernel version, syslog extracts) as soon as possible.

In the meantime, do you have any ideas about possible causes?

Until this problem is solved, is there a way to get access to an "emergency" terminal, reachable remotely, that would stay in RAM and allow very simple actions (such as a reboot) in case of an HDD crash?

Thanks for your assistance.

Regards,
F.

irads

Re: Regular crashes with Ubuntu 10.04 Server

Post by irads »

I cannot tell whether the problem is related to SW (e.g. a driver) or HW (e.g. disk or RAM). I suggest running the system for a while under laboratory conditions, which will make it easier to isolate the stability issues.

fanf

Re: Regular crashes with Ubuntu 10.04 Server

Post by fanf »

Hi,

I just got access back (after going on site to reboot the box). Here is a message from the logs at the time the computer crashed:

Jul 20 06:46:14 wcpse002 kernel: [64200.312090] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
Jul 20 06:46:14 wcpse002 kernel: [64200.312539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 06:46:14 wcpse002 kernel: [64200.313046] jbd2/sda1-8 D 0001310c 0 239 2 0x00000000
Jul 20 06:46:14 wcpse002 kernel: [64200.313075] f6e4fecc 00000046 00000000 0001310c 00000000 c088e5c0 f6e15be4 c088e5c0
Jul 20 06:46:14 wcpse002 kernel: [64200.313116] 526edeba 00003a47 c088e5c0 c088e5c0 f6e15be4 c088e5c0 c088e5c0 f6f17a40
Jul 20 06:46:14 wcpse002 kernel: [64200.313160] 526ebd8a 00003a47 f6e15940 f6e4ff58 f742d878 f6e4ff4c f6e4ff7c c02d0ef1
Jul 20 06:46:14 wcpse002 kernel: [64200.313200] Call Trace:
Jul 20 06:46:14 wcpse002 kernel: [64200.313234] [<c02d0ef1>] jbd2_journal_commit_transaction+0x171/0x1030
Jul 20 06:46:14 wcpse002 kernel: [64200.313256] [<c014dc59>] ? load_balance_newidle+0x99/0x340
Jul 20 06:46:14 wcpse002 kernel: [64200.313274] [<c0146ff3>] ? finish_task_switch+0x43/0xc0
Jul 20 06:46:14 wcpse002 kernel: [64200.313293] [<c0131008>] ? default_spin_lock_flags+0x8/0x10
Jul 20 06:46:14 wcpse002 kernel: [64200.313312] [<c016402c>] ? lock_timer_base+0x2c/0x60
Jul 20 06:46:14 wcpse002 kernel: [64200.313328] [<c0170940>] ? autoremove_wake_function+0x0/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.313347] [<c02d7a85>] kjournald2+0x95/0x1c0
Jul 20 06:46:14 wcpse002 kernel: [64200.313363] [<c0170940>] ? autoremove_wake_function+0x0/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.313378] [<c02d79f0>] ? kjournald2+0x0/0x1c0
Jul 20 06:46:14 wcpse002 kernel: [64200.313393] [<c01706b4>] kthread+0x74/0x80
Jul 20 06:46:14 wcpse002 kernel: [64200.313407] [<c0170640>] ? kthread+0x0/0x80
Jul 20 06:46:14 wcpse002 kernel: [64200.313424] [<c010a447>] kernel_thread_helper+0x7/0x10
Jul 20 06:46:14 wcpse002 kernel: [64200.313443] INFO: task rsyslogd:1526 blocked for more than 120 seconds.
Jul 20 06:46:14 wcpse002 kernel: [64200.313821] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 06:46:14 wcpse002 kernel: [64200.314291] rsyslogd D 0000bd5f 0 1526 1 0x00000000
Jul 20 06:46:14 wcpse002 kernel: [64200.314309] f7135cf4 00000082 f4eb6000 0000bd5f 00000000 c088e5c0 f5d38f64 c088e5c0
Jul 20 06:46:14 wcpse002 kernel: [64200.314337] 597f2e28 00003a47 c088e5c0 c088e5c0 f5d38f64 c088e5c0 c088e5c0 f63f6c40
Jul 20 06:46:14 wcpse002 kernel: [64200.314364] 00000001 00003a47 f5d38cc0 cb8c1e40 f742d800 cb8c1e40 f7135d54 c02cf683
Jul 20 06:46:14 wcpse002 kernel: [64200.314391] Call Trace:
Jul 20 06:46:14 wcpse002 kernel: [64200.314409] [<c02cf683>] start_this_handle+0x1f3/0x3f0
Jul 20 06:46:14 wcpse002 kernel: [64200.314426] [<c023a4bf>] ? generic_write_end+0x7f/0xb0
Jul 20 06:46:14 wcpse002 kernel: [64200.314443] [<c0170940>] ? autoremove_wake_function+0x0/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.314460] [<c02cf9f8>] jbd2_journal_start+0x98/0xd0
Jul 20 06:46:14 wcpse002 kernel: [64200.314478] [<c02af590>] ext4_journal_start_sb+0xc0/0xf0
Jul 20 06:46:14 wcpse002 kernel: [64200.314494] [<c022f565>] ? generic_getxattr+0x85/0x90
Jul 20 06:46:14 wcpse002 kernel: [64200.314510] [<c022f4e0>] ? generic_getxattr+0x0/0x90
Jul 20 06:46:14 wcpse002 kernel: [64200.314527] [<c0296924>] ext4_dirty_inode+0x24/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.314543] [<c02328a1>] __mark_inode_dirty+0x31/0x180
Jul 20 06:46:14 wcpse002 kernel: [64200.314560] [<c01d3cd4>] ? file_remove_suid+0x24/0x80
Jul 20 06:46:14 wcpse002 kernel: [64200.314577] [<c0228075>] file_update_time+0xb5/0x130
Jul 20 06:46:14 wcpse002 kernel: [64200.314593] [<c01d5aa8>] __generic_file_aio_write+0x1b8/0x510
Jul 20 06:46:14 wcpse002 kernel: [64200.314612] [<c035bf18>] ? __rb_erase_color+0x78/0x170
Jul 20 06:46:14 wcpse002 kernel: [64200.314628] [<c035c0c4>] ? rb_erase+0xb4/0x120
Jul 20 06:46:14 wcpse002 kernel: [64200.314646] [<c0174038>] ? lock_hrtimer_base+0x28/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.314662] [<c01d5e57>] generic_file_aio_write+0x57/0xc0
Jul 20 06:46:14 wcpse002 kernel: [64200.314679] [<c0290951>] ext4_file_write+0x41/0xd0
Jul 20 06:46:14 wcpse002 kernel: [64200.314697] [<c0213af4>] do_sync_write+0xc4/0x100
Jul 20 06:46:14 wcpse002 kernel: [64200.314714] [<c0170940>] ? autoremove_wake_function+0x0/0x50
Jul 20 06:46:14 wcpse002 kernel: [64200.314733] [<c0301244>] ? security_file_permission+0x14/0x20
Jul 20 06:46:14 wcpse002 kernel: [64200.314750] [<c0213c94>] ? rw_verify_area+0x64/0xe0
Jul 20 06:46:14 wcpse002 kernel: [64200.314768] [<c05b0258>] ? schedule+0x668/0x870
Jul 20 06:46:14 wcpse002 kernel: [64200.314784] [<c0213db2>] vfs_write+0xa2/0x1a0
Jul 20 06:46:14 wcpse002 kernel: [64200.314799] [<c0213a30>] ? do_sync_write+0x0/0x100
Jul 20 06:46:14 wcpse002 kernel: [64200.314815] [<c02146a2>] sys_write+0x42/0x70
Jul 20 06:46:14 wcpse002 kernel: [64200.314830] [<c01096c3>] sysenter_do_call+0x12/0x28

fanf

Re: Regular crashes with Ubuntu 10.04 Server

Post by fanf »

Regarding frequency, a quick grep on part of the error message gives the following:

kern.log:Jul 17 07:15:12 wcpse002 kernel: [157920.300093] INFO: task jbd2/sda1-8:237 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:12 wcpse002 kernel: [157920.301455] INFO: task rsyslogd:633 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:12 wcpse002 kernel: [157920.302858] INFO: task mandb:32606 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:12 wcpse002 kernel: [157920.304331] INFO: task updatedb.mlocat:32712 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:13 wcpse002 kernel: [157920.306067] INFO: task apache2:610 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:13 wcpse002 kernel: [157920.307438] INFO: task python:669 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:13 wcpse002 kernel: [157920.308880] INFO: task python:752 blocked for more than 120 seconds.
kern.log:Jul 17 07:15:13 wcpse002 kernel: [157920.310309] INFO: task date:757 blocked for more than 120 seconds.
kern.log:Jul 17 07:17:12 wcpse002 kernel: [158040.308087] INFO: task jbd2/sda1-8:237 blocked for more than 120 seconds.
kern.log:Jul 17 07:17:12 wcpse002 kernel: [158040.361452] INFO: task flush-8:0:266 blocked for more than 120 seconds.
kern.log:Jul 20 06:46:14 wcpse002 kernel: [64200.312090] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
kern.log:Jul 20 06:46:14 wcpse002 kernel: [64200.313443] INFO: task rsyslogd:1526 blocked for more than 120 seconds.
kern.log:Jul 20 06:46:14 wcpse002 kernel: [64200.314894] INFO: task mandb:1411 blocked for more than 120 seconds.
kern.log:Jul 20 06:46:14 wcpse002 kernel: [64200.316382] INFO: task gphoto2:1553 blocked for more than 120 seconds.
kern.log:Jul 20 06:46:14 wcpse002 kernel: [64200.317629] INFO: task convert:1557 blocked for more than 120 seconds.
kern.log:Jul 20 06:56:14 wcpse002 kernel: [64800.316092] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
kern.log:Jul 20 06:56:14 wcpse002 kernel: [64800.317453] INFO: task rsyslogd:1526 blocked for more than 120 seconds.
kern.log:Jul 20 06:56:14 wcpse002 kernel: [64800.319047] INFO: task mandb:1411 blocked for more than 120 seconds.
kern.log:Jul 20 06:56:14 wcpse002 kernel: [64800.320361] INFO: task python:1639 blocked for more than 120 seconds.
kern.log:Jul 20 06:56:14 wcpse002 kernel: [64800.355144] INFO: task convert:1793 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.304092] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.305454] INFO: task rsyslogd:1114 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.306880] INFO: task apache2:21965 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.308111] INFO: task apache2:22355 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.309407] INFO: task apache2:22706 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:06 wcpse002 kernel: [69600.310596] INFO: task apache2:24511 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:07 wcpse002 kernel: [69600.311778] INFO: task apache2:24512 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:07 wcpse002 kernel: [69600.313196] INFO: task python:1279 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:07 wcpse002 kernel: [69600.314444] INFO: task mandb:1344 blocked for more than 120 seconds.
kern.log.1:Jul 13 06:39:07 wcpse002 kernel: [69600.349806] INFO: task apache2:1345 blocked for more than 120 seconds.
syslog:Jul 20 06:46:14 wcpse002 kernel: [64200.312090] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
syslog:Jul 20 06:46:14 wcpse002 kernel: [64200.313443] INFO: task rsyslogd:1526 blocked for more than 120 seconds.
syslog:Jul 20 06:46:14 wcpse002 kernel: [64200.314894] INFO: task mandb:1411 blocked for more than 120 seconds.
syslog:Jul 20 06:46:14 wcpse002 kernel: [64200.316382] INFO: task gphoto2:1553 blocked for more than 120 seconds.
syslog:Jul 20 06:46:14 wcpse002 kernel: [64200.317629] INFO: task convert:1557 blocked for more than 120 seconds.
syslog:Jul 20 06:56:14 wcpse002 kernel: [64800.316092] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
syslog:Jul 20 06:56:14 wcpse002 kernel: [64800.317453] INFO: task rsyslogd:1526 blocked for more than 120 seconds.
syslog:Jul 20 06:56:14 wcpse002 kernel: [64800.319047] INFO: task mandb:1411 blocked for more than 120 seconds.
syslog:Jul 20 06:56:14 wcpse002 kernel: [64800.320361] INFO: task python:1639 blocked for more than 120 seconds.
syslog:Jul 20 06:56:14 wcpse002 kernel: [64800.355144] INFO: task convert:1793 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.304092] INFO: task jbd2/sda1-8:239 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.305454] INFO: task rsyslogd:1114 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.306880] INFO: task apache2:21965 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.308111] INFO: task apache2:22355 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.309407] INFO: task apache2:22706 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:06 wcpse002 kernel: [69600.310596] INFO: task apache2:24511 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:07 wcpse002 kernel: [69600.311778] INFO: task apache2:24512 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:07 wcpse002 kernel: [69600.313196] INFO: task python:1279 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:07 wcpse002 kernel: [69600.314444] INFO: task mandb:1344 blocked for more than 120 seconds.
syslog.4:Jul 13 06:39:07 wcpse002 kernel: [69600.349806] INFO: task apache2:1345 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.300086] INFO: task jbd2/sda1-8:238 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.301448] INFO: task rsyslogd:629 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.302843] INFO: task mandb:2039 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.304318] INFO: task sh:2110 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.305723] INFO: task sh:2114 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.306994] INFO: task sh:2115 blocked for more than 120 seconds.
syslog.7:Jul 9 07:10:29 wcpse002 kernel: [124560.308433] INFO: task sh:2116 blocked for more than 120 seconds.
syslog.7:Jul 9 07:12:29 wcpse002 kernel: [124680.308078] INFO: task jbd2/sda1-8:238 blocked for more than 120 seconds.
syslog.7:Jul 9 07:12:29 wcpse002 kernel: [124680.309423] INFO: task rsyslogd:629 blocked for more than 120 seconds.
syslog.7:Jul 9 07:12:29 wcpse002 kernel: [124680.344770] INFO: task mandb:2039 blocked for more than 120 seconds.
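
Because the same kernel message is written to both kern.log and syslog, the grep output above contains duplicates. A minimal sketch of how the lines could be grouped per day, with duplicates dropped, is shown here. The function name `summarize_hung_tasks` and the regular expression are my own assumptions for illustration, written against the line format seen in this output only.

```python
import re
from collections import defaultdict

# Matches hung-task lines as they appear in the grep output above, e.g.
# "kern.log:Jul 17 07:15:12 wcpse002 kernel: [157920.300093] INFO: task jbd2/sda1-8:237 blocked for ..."
# The optional "file:" prefix added by grep is simply skipped by search().
HUNG_TASK = re.compile(
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) blocked for more than"
)


def summarize_hung_tasks(lines):
    """Group blocked-task names by (month, day), ignoring duplicate reports.

    Hypothetical helper: two lines count as the same report when the date,
    kernel uptime stamp, task name and PID are all identical, which is the
    case when one message was logged to both kern.log and syslog.
    """
    seen = set()
    by_day = defaultdict(list)
    for line in lines:
        match = HUNG_TASK.search(line)
        if match is None:
            continue  # not a hung-task line
        day = (match.group("month"), int(match.group("day")))
        key = day + (match.group("uptime"), match.group("task"), match.group("pid"))
        if key in seen:
            continue  # already counted from another log file
        seen.add(key)
        by_day[day].append(match.group("task"))
    return dict(by_day)
```

Feeding it the lines of the grep output (for example from a saved text file) gives one list of blocked tasks per day, which makes it easier to see that jbd2 and rsyslogd are blocked in every incident.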

fanf

Re: Regular crashes with Ubuntu 10.04 Server

Post by fanf »

I just found this kind of error:

kern.log.1:Jul 10 18:59:18 wcpse002 kernel: [253488.942974] Free swap = 0kB
kern.log.1:Jul 10 18:59:18 wcpse002 kernel: [253488.942979] Total swap = 2975736kB

I need to find which process is causing this kind of issue.
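
One possible way to look for the culprit is to list the processes holding the most memory, run periodically (e.g. from cron) so that the growth is visible before swap runs out. This is only a sketch under assumptions: the helper names `parse_status` and `top_memory_processes` are hypothetical, and the VmSwap field of /proc/<pid>/status is not present on every kernel version, so it is treated as optional here.

```python
import os


def parse_status(text):
    """Extract the process name and memory figures (in kB) from the
    contents of a /proc/<pid>/status file. Missing fields default to 0."""
    info = {"Name": None, "VmRSS": 0, "VmSwap": 0}
    for line in text.splitlines():
        field, _, value = line.partition(":")
        if field == "Name":
            info["Name"] = value.strip()
        elif field in ("VmRSS", "VmSwap"):
            parts = value.split()  # e.g. ["2048", "kB"]
            if parts and parts[0].isdigit():
                info[field] = int(parts[0])
    return info


def top_memory_processes(proc_root="/proc", limit=5):
    """Return up to `limit` tuples (resident + swapped kB, pid, name),
    largest first, by scanning the numeric directories under proc_root."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                info = parse_status(handle.read())
        except (IOError, OSError):
            continue  # the process exited while we were scanning
        rows.append((info["VmRSS"] + info["VmSwap"], int(entry), info["Name"]))
    rows.sort(reverse=True)
    return rows[:limit]
```

Logging the result of `top_memory_processes()` every few minutes should show which process keeps growing in the hours before the swap is exhausted.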

fanf

Re: Regular crashes with Ubuntu 10.04 Server

Post by fanf »

Investigations are still in progress, but I'm almost sure the problem is not related to the fit-pc2i and is most likely caused by a program running on it.

I will update the post once I have found the cause; it might be useful to other people.

fanf

Re: Regular crashes with Ubuntu 10.04 Server

Post by fanf »

Investigations still in progress.

This is caused by a bug in a Python script, so it is not related in any way to the fit-pc.
