Frequent Lasso crashes (Linux)

Frequent Lasso crashes (Linux)

i
CONTENTS DELETED
The author has deleted this message.

Re: Frequent Lasso crashes (Linux)

Alan Linnenbank
In our case it had to do with the thread-killing process. We configured Site Admin to not kill threads at all. This fixed at least most of the crashes.

Alan

On Tuesday, 30 October 2007 16:06, noah williamsson <[hidden email]> wrote:
Hi!

I'm experiencing frequent crashes with Lasso 8.5.4 on RHEL 4.
I'm actually seeing crashes with other versions as well on Mac OS X,
but that's a different story and I'll focus on Linux for now.


These crashes cause "Internal Server Error" to be displayed all
over all our Lasso based services. I could probably try to minimize
the impact by starting up one Lasso Site for each of our services
but that's not fixing the problem.


I know I'm not alone here in Sweden in seeing crashes like this.
I mention this because it probably means we're all dealing with
non-ASCII characters.
UTF-8/Unicode related bugs tend to go unnoticed for a long time in
QA tests of software developed in ASCII countries so to speak. ;)


I'm having a hard time debugging this and could really use some help.

I know a reproducible case is key to tracking down bugs but due to
the nature of Lasso, being a multithreaded process taking care of
a lot of different requests, it's really hard to figure out which
request or format file is causing the error.

In the case of memory corruption, one request might corrupt memory,
another may make it worse and a third, coming three hours later,
might actually trigger a fatal problem within Lasso.

It might even be thread related. Or related to how signals are
handled. Both are hard to do right, especially when it comes to
cross-platform apps.

Speculating gets me nowhere so let's have a look at some hard
facts.



This is /usr/local/Lasso 8 Professional/LassoErrors.txt:
# tail -n20 LassoErrors.txt
[10/30/07 14:04:34] Unable to get pid from child process for site default
[10/30/07 14:04:34] Unable to restart site process: 1 default
[10/30/07 14:07:05] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 5704

[10/30/07 14:07:06] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 5704

[10/30/07 14:07:07] Restarted site process 1 default
[10/30/07 14:07:22] Unable to get pid from child process for site default
[10/30/07 14:07:22] Unable to restart site process: 1 default
[10/30/07 14:09:16] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:16] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:17] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:18] Restarted site process 1 default
[10/30/07 14:09:33] Unable to get pid from child process for site default
[10/30/07 14:09:33] Unable to restart site process: 1 default
[10/30/07 14:09:48] Unable to get pid from child process for site default
[10/30/07 14:09:48] Unable to restart site process: 1 default
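
To put numbers on a log like the one above, the three message types can be tallied with grep. This is only a rough sketch: the function name `count_log_events` is made up for illustration, and the match strings are simply copied from the log lines shown here.

```shell
#!/bin/sh
# Hypothetical helper: count the three kinds of messages seen in LassoErrors.txt.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the script going.
count_log_events() {
    log=$1
    restarted=$(grep -c 'Restarted site process' "$log" || true)
    restart_failed=$(grep -c 'Unable to restart site process' "$log" || true)
    socket_failed=$(grep -c 'Failed to hand socket to child' "$log" || true)
    echo "restarted=$restarted restart_failed=$restart_failed socket_failed=$socket_failed"
}
```

Calling `count_log_events LassoErrors.txt` from the Lasso directory prints one summary line, which makes it easy to compare one day against another.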


Ok, so apparently Lasso crashes frequently. Not cool.
I put in "ulimit -c unlimited" right after the "else" in line 35 of
/usr/sbin/lasso8ctl (a shellscript).

This means I now get core dumps of the crashing process.
After restarting Lasso and forgetting about it over the weekend, I
found 14 GB worth of core dumps from 175 crashes over a
four-day period.
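
For anyone who wants to try the same thing, a minimal sketch of the idea follows. It assumes a shell whose `ulimit` builtin accepts `-c` (bash and dash do), and the helper name `summarise_cores` and the `core.*` file pattern are only illustrative; the actual core file naming depends on the kernel settings.

```shell
#!/bin/sh
# Raise the soft core-file size limit for this shell and its children.
# This can fail if the hard limit is lower, so report instead of aborting.
if ulimit -c unlimited 2>/dev/null; then
    echo "core limit now: $(ulimit -c)"
else
    echo "could not raise core limit; current value: $(ulimit -c)"
fi

# Hypothetical helper: count core files directly inside a directory.
summarise_cores() {
    dir=${1:-.}
    count=$(find "$dir" -maxdepth 1 -type f -name 'core.*' | wc -l)
    # $((count)) strips any padding that wc adds on some systems.
    echo "core files in $dir: $((count))"
}
```

The limit only applies to processes started from the shell that set it, which is why it has to go into the script that launches the service rather than into an interactive session.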


I used to be fairly fluent in debugging non-threaded apps written in C,
but Lasso is multi-threaded and uses C++ code, so I'm a bit at a loss here.
And it's not like on Mac OS X either, where you get to know in which
thread the crash occurred, but either way, here goes:

[root@lasso coredumps]# gdb -q ../../../Lasso8Service core.10044
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

..boring symbol loading stuff removed..
(gdb) i threads
  25 process 10044  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  24 process 10045  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  23 process 10046  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  22 process 10047  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  21 process 10053  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  20 process 10054  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  19 process 10055  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  18 process 10056  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  17 process 10057  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  16 process 10058  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  15 process 10059  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  14 process 10060  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  13 process 10061  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  12 process 10062  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  11 process 10064  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  10 process 10070  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  9 process 10071  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  8 process 10072  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  7 process 11644  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  6 process 3253  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  5 process 3255  0x08123be1 in LPExecuteBytes ()
  4 process 3257  0x00381b65 in umtx_atomic_dec_3_6 () from /usr/local/lib/libicuuc.so.36
  3 process 3262  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 process 3263  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 process 3251  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x032bac89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x00c3814b in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#8  0x00c35e61 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#9  0x00c35e96 in std::terminate () from /usr/lib/libstdc++.so.6
#10 0x00c35fdf in __cxa_throw () from /usr/lib/libstdc++.so.6
#11 0x0816e486 in sigcatch ()
#12 <signal handler called>
#13 0x00809680 in malloc_consolidate () from /lib/tls/libc.so.6
#14 0x0080a643 in _int_malloc () from /lib/tls/libc.so.6
#15 0x0080c401 in malloc () from /lib/tls/libc.so.6
#16 0x00c363b7 in operator new () from /usr/lib/libstdc++.so.6
#17 0x081d85a7 in std::vector<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, var_ref_t_> >*, std::allocator<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, var_ref_t_> >*> >::reserve ()
#18 0x081c27d4 in Variant::SetType ()
#19 0x081acc15 in Variant::OpAssign ()
#20 0x0811938f in LPExecuteBytes ()
#21 0x081afbea in var_code_t_::Execute ()
#22 0x08113697 in LPInvokeInstance ()
#23 0x0824998e in template_sort::upper_bound<var_ref_t_*, var_ref_t_, cont_CMP> ()
#24 0x08249b79 in template_sort::__inplace_merge<var_ref_t_*, int, cont_CMP> ()
#25 0x08249be1 in template_sort::__inplace_merge<var_ref_t_*, int, cont_CMP> ()
#26 0x0824a20a in template_sort::__stable_sort<var_ref_t_*, cont_CMP> ()
#27 0x0824a746 in osSort<var_ref_t_*, cont_CMP> ()
#28 0x0824a7e9 in imp_sortwith<var_list_t_> ()
#29 0x081aeb5d in var_code_t_::Execute ()
#30 0x08113697 in LPInvokeInstance ()
#31 0x0811a80a in LPExecuteBytes ()
#32 0x08101e5a in lasso_runChildren2 ()
#33 0x0819d509 in SubstitutionTag::FormatChildren ()
#34 0x0827c4fa in InlineTag::Format ()
#35 0x081af7e7 in var_code_t_::Execute ()
#36 0x08113697 in LPInvokeInstance ()
#37 0x0811c44e in LPExecuteBytes ()
#38 0x081afbea in var_code_t_::Execute ()
#39 0x08113697 in LPInvokeInstance ()
#40 0x08106866 in LPDocumentRec::Execute ()
#41 0x08132282 in Lasso::FormulateResponse ()
#42 0x08131fb7 in Lasso::FormatBasedOnError ()
#43 0x081346d0 in Lasso::ProcessRequestNew ()
#44 0x08136b19 in Lasso::RunSession ()
#45 0x08137676 in Lasso::RunThread ()
#46 0x081376a4 in Lasso::thread_start ()
#47 0x0816e24e in os_entry ()
#48 0x009f6371 in start_thread () from /lib/tls/libpthread.so.0
#49 0x0086fffe in clone () from /lib/tls/libc.so.6


I have no idea if this backtrace is even interesting.

Lasso got a signal in frame #13.
Did the OS deliver the initial signal because
malloc_consolidate() did something bad, or did some other thread
do something bad and cause a signal to be delivered to all
threads? I have no idea.
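
One way to narrow that question down is to look at every thread's stack rather than only the current one. The sketch below runs gdb non-interactively with the standard `info threads` and `thread apply all bt` commands. The function names are made up for illustration, and it assumes the installed gdb supports `-batch` and `-x`.

```shell
#!/bin/sh
# Write a gdb command file that lists the threads and prints every backtrace.
write_gdb_commands() {
    cat > "$1" <<'EOF'
info threads
thread apply all bt
EOF
}

# Hypothetical helper: dump all thread backtraces for one core file.
# Usage: dump_all_threads /path/to/executable /path/to/core
dump_all_threads() {
    exe=$1
    core=$2
    cmds=$(mktemp) || return 1
    write_gdb_commands "$cmds"
    gdb -q -batch -x "$cmds" "$exe" "$core"
    status=$?
    rm -f "$cmds"
    return $status
}
```

Redirecting the output of `dump_all_threads ../../../Lasso8Service core.10044` to a text file gives something that can be searched for frames outside libc, and it avoids the pager prompts of an interactive session.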



Let's try another coredump.
# gdb -q ../../../Lasso8Service core.29502
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

..boring symbol loading stuff removed..

#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x04f20c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x009fafcd in unwind_cleanup () from /lib/tls/libpthread.so.0
#8  0x00a55470 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
#9  0x00c35364 in __cxa_end_catch () from /usr/lib/libstdc++.so.6
#10 0x0816e111 in sighupcatch ()
#11 <signal handler called>
#12 0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#13 0x009f8d9c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#14 0x0816df4c in osThread::SleepMillis ()
#15 0x0812dbf5 in busy_main_loop ()
#16 0x0812eb82 in Lasso::DoRun ()
#17 0x080edfab in lasso_run ()
#18 0x0815c006 in main ()
(gdb)



Here's yet another.

[root@lasso coredumps]# gdb -q ../../../Lasso8Service core.19416
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

...

#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x03ba9c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x00a807a5 in raise () from /lib/tls/libc.so.6
#6  0x00a82209 in abort () from /lib/tls/libc.so.6
#7  0x00ab471a in __libc_message () from /lib/tls/libc.so.6
#8  0x00abafbf in _int_free () from /lib/tls/libc.so.6
#9  0x00abb33a in free () from /lib/tls/libc.so.6
#10 0x00c34fd1 in operator delete () from /usr/lib/libstdc++.so.6
#11 0x081b1efc in Variant::Init ()
#12 0x081b7846 in Variant::SetReference ()
#13 0x081abfd9 in Variant::OpAssign ()
#14 0x0812c004 in ExecBinaryOp ()
#15 0x0811997f in LPExecuteBytes ()
#16 0x081afbea in var_code_t_::Execute ()
#17 0x08113697 in LPInvokeInstance ()
#18 0x0811a80a in LPExecuteBytes ()
#19 0x081afbea in var_code_t_::Execute ()
#20 0x08113697 in LPInvokeInstance ()
#21 0x08126450 in LPExecuteBytes ()
#22 0x081afbea in var_code_t_::Execute ()
#23 0x080bfa76 in Ops_Code::Format ()
#24 0x081aeb5d in var_code_t_::Execute ()
#25 0x08113697 in LPInvokeInstance ()
#26 0x0811a80a in LPExecuteBytes ()
#27 0x081afbea in var_code_t_::Execute ()
#28 0x08113697 in LPInvokeInstance ()
#29 0x0811c44e in LPExecuteBytes ()
#30 0x081afbea in var_code_t_::Execute ()
#31 0x08113697 in LPInvokeInstance ()
#32 0x08106866 in LPDocumentRec::Execute ()
#33 0x0827964d in include ()
#34 0x081aeb5d in var_code_t_::Execute ()
#35 0x08113697 in LPInvokeInstance ()
#36 0x0811c44e in LPExecuteBytes ()
#37 0x081afbea in var_code_t_::Execute ()
#38 0x08113697 in LPInvokeInstance ()
#39 0x08106866 in LPDocumentRec::Execute ()
#40 0x0827964d in include ()
#41 0x081aeb5d in var_code_t_::Execute ()
#42 0x08113697 in LPInvokeInstance ()
#43 0x0811c44e in LPExecuteBytes ()
#44 0x081afbea in var_code_t_::Execute ()
#45 0x08113697 in LPInvokeInstance ()
#46 0x08106866 in LPDocumentRec::Execute ()
#47 0x08132282 in Lasso::FormulateResponse ()
#48 0x08131fb7 in Lasso::FormatBasedOnError ()
#49 0x081346d0 in Lasso::ProcessRequestNew ()
#50 0x08136b19 in Lasso::RunSession ()
#51 0x08137676 in Lasso::RunThread ()
#52 0x081376a4 in Lasso::thread_start ()
#53 0x0816e24e in os_entry ()
#54 0x009f6371 in start_thread () from /lib/tls/libpthread.so.0
#55 0x00b20ffe in clone () from /lib/tls/libc.so.6



And here's a fourth, final, example.


#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x04816c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x0080371a in __libc_message () from /lib/tls/libc.so.6
#8  0x008097f4 in malloc_consolidate () from /lib/tls/libc.so.6
#9  0x0080a643 in _int_malloc () from /lib/tls/libc.so.6
#10 0x0080c401 in malloc () from /lib/tls/libc.so.6
#11 0x00c363b7 in operator new () from /usr/lib/libstdc++.so.6
#12 0x08160637 in std::vector<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, RefCountedTracker<Namespace> > >*, std::allocator<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, RefCountedTracker<Namespace> > >*> >::reserve ()
#13 0x0815dfa4 in Namespace::Namespace ()
#14 0x0815e07a in Namespace::NewFull ()
#15 0x0815f45f in Namespace::NewLocal ()
#16 0x081af8b5 in var_code_t_::Execute ()
#17 0x08113697 in LPInvokeInstance ()
#18 0x08126450 in LPExecuteBytes ()
#19 0x081afbea in var_code_t_::Execute ()





When I restart Lasso from the console I often get messages like
these after a while. They usually mean the process has used a
stale pointer from a malloc() call or called free() twice on
the same pointer from malloc().


*** glibc detected *** malloc(): memory corruption (fast): 0x892b88b8 ***
*** glibc detected *** double free or corruption (fasttop): 0x8c661088 ***
*** glibc detected *** malloc(): memory corruption (fast): 0x8ba014b8 ***
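
A possible way to catch this kind of corruption closer to where it happens is glibc's `MALLOC_CHECK_` environment variable, which makes malloc() and free() do extra consistency checks; with the value 2, glibc aborts as soon as it detects a problem. The wrapper below is only a sketch: the function name is made up, and whether the service tolerates the slower allocator under load is an open question.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command with glibc's malloc checking enabled.
# MALLOC_CHECK_=2 asks glibc to abort immediately when heap corruption is detected.
run_with_malloc_check() {
    env MALLOC_CHECK_=2 "$@"
}
```

The function takes the normal start command as its arguments, for example `run_with_malloc_check sh -c 'echo $MALLOC_CHECK_'` to confirm that the variable reaches the child process. Combined with core dumps, an abort at the first detected inconsistency should leave a backtrace nearer to the faulty code than one taken hours later.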




We're running Lasso 8.5.4 on RHEL 4, x86.

[root@lasso coredumps]# rpm -qa |grep -i lasso
Lasso-Documentation-8.5.4-1
Lasso-Apache2Connector-8.5.4-1
Lasso-Service-8.5.4-1

[root@lasso coredumps]# rpm -qa|egrep -e 'mysql|java|zlib'
zlib-1.2.1.2-1.2
gcc-java-3.4.6-3.1
java-1.4.2-gcj-compat-devel-1.4.2.0-27jpp
gcc4-java-4.1.0-18.EL4.3
java-1.4.2-gcj-compat-1.4.2.0-27jpp
mysqlclient14-4.1.14-4.el4s1.1
zlib-devel-1.2.1.2-1.2
mysqlclient10-3.23.58-4.RHEL4.1




Btw, I've seen the same thing as this guy too.
Not sure if it's related.
http://www.listsearch.com/Lasso/Message/index.lasso?230276

Right before crashes I've seen multiple threads of
a few pages running for a very long time (several minutes).
Usually these pages are delivered in less than a second.


  -- noah

--
This list is a free service of LassoSoft: http://www.LassoSoft.com/
Search the list archives: http://www.ListSearch.com/Lasso/Browse/
Manage your subscription: http://www.ListSearch.com/Lasso/





Re: Frequent Lasso crashes (Linux)

Alan Linnenbank
In reply to this post by i
Also I know LassoSoft is working on this issue.

Alan

On dinsdag, 30 oktober 2007 16:06, noah williamsson <[hidden email]> wrote:
Hi!

I'm experiencing frequent crashes with Lasso 8.5.4 on RHEL 4.
I'm actually seeing crashes with other versions aswell on Mac OS X,
but that's a different story and I'll focus on Linux for now.


These crashes cause "Internal Server Error" to be displayed all
over all our Lasso based services. I could probably try to minimize
the impact by starting up one Lasso Site for each of our services
but that's not fixing the problem.


I know I'm not alone here in Sweden with seeing crashes like this.
I mention this because it probably means we're all dealing with
non-ASCII characters.
UTF-8/Unicode related bugs tend to go unnoticed for a long time in
QA tests of software developed in ASCII countries so to speak. ;)


I'm having a hard time to debug this and could really use some help.

I know a reproducible case is key to tracking down bugs but due to
the nature of Lasso, being a multithreaded process taking care of
a lot of different requests, it's really hard to figure out which
request or format file causing the error.

In case of memory corruptions, one request might corrupt memory,
another may make it worse and a third, coming three hours later,
might actually trigger a fatal problem within Lasso..

It might even be thread related. Or related to how signals are
handled. Both are hard to do right, especially when it comes to
cross-platform apps.

Speculating gets me nowhere so let's have a look at some hard
facts.



This is /usr/local/Lasso 8 Professional/LassoErrors.txt:
# tail -n20 LassoErrors.txt
[10/30/07 14:04:34] Unable to get pid from child process for site default
[10/30/07 14:04:34] Unable to restart site process: 1 default
[10/30/07 14:07:05] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 5704

[10/30/07 14:07:06] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 5704

[10/30/07 14:07:07] Restarted site process 1 default
[10/30/07 14:07:22] Unable to get pid from child process for site default
[10/30/07 14:07:22] Unable to restart site process: 1 default
[10/30/07 14:09:16] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:16] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:17] Failed to hand socket to child. errno: 32 strerror: "Broken pipe" site id: 1 site pid: 6694

[10/30/07 14:09:18] Restarted site process 1 default
[10/30/07 14:09:33] Unable to get pid from child process for site default
[10/30/07 14:09:33] Unable to restart site process: 1 default
[10/30/07 14:09:48] Unable to get pid from child process for site default
[10/30/07 14:09:48] Unable to restart site process: 1 default


Ok, so apparently Lasso crashes frequently. Not cool.
I put in "ulimit -c unlimited" right after the "else" in line 35 of
/usr/sbin/lasso8ctl (a shellscript).

This means I'll now get coredumps of the crashing process.
After restarting Lasso and forgetting about it over the weekend I
found there was 14G worth of coredumps from 175 crashes over a
four day period.


I used to be fairly fluent with debugging non-threaded apps written in C
but Lasso is multi-threaded does use C++ code so I'm a bit at loss here.
And it's not like in Mac OS X either, where you get to know in what
thread the crash occurred but either way, here goes:

[root@lasso coredumps]# gdb -q ../../../Lasso8Service core.10044
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

..boring symbol loading stuff removed..
(gdb) i threads
  25 process 10044  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  24 process 10045  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  23 process 10046  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  22 process 10047  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  21 process 10053  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  20 process 10054  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  19 process 10055  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  18 process 10056  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  17 process 10057  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  16 process 10058  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  15 process 10059  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  14 process 10060  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  13 process 10061  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  12 process 10062  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  11 process 10064  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  10 process 10070  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  9 process 10071  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  8 process 10072  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  7 process 11644  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  6 process 3253  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  5 process 3255  0x08123be1 in LPExecuteBytes ()
  4 process 3257  0x00381b65 in umtx_atomic_dec_3_6 () from /usr/local/lib/libicuuc.so.36
  3 process 3262  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
  2 process 3263  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
* 1 process 3251  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x032bac89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x00c3814b in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#8  0x00c35e61 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#9  0x00c35e96 in std::terminate () from /usr/lib/libstdc++.so.6
#10 0x00c35fdf in __cxa_throw () from /usr/lib/libstdc++.so.6
#11 0x0816e486 in sigcatch ()
#12 <signal handler called>
#13 0x00809680 in malloc_consolidate () from /lib/tls/libc.so.6
#14 0x0080a643 in _int_malloc () from /lib/tls/libc.so.6
#15 0x0080c401 in malloc () from /lib/tls/libc.so.6
#16 0x00c363b7 in operator new () from /usr/lib/libstdc++.so.6
#17 0x081d85a7 in std::vector<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, var_ref_t_> >*, std::allocator<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, var_ref_t_> >*> >::reserve ()
#18 0x081c27d4 in Variant::SetType ()
#19 0x081acc15 in Variant::OpAssign ()
#20 0x0811938f in LPExecuteBytes ()
#21 0x081afbea in var_code_t_::Execute ()
#22 0x08113697 in LPInvokeInstance ()
#23 0x0824998e in template_sort::upper_bound<var_ref_t_*, var_ref_t_, cont_CMP> ()
#24 0x08249b79 in template_sort::__inplace_merge<var_ref_t_*, int, cont_CMP> ()
#25 0x08249be1 in template_sort::__inplace_merge<var_ref_t_*, int, cont_CMP> ()
#26 0x0824a20a in template_sort::__stable_sort<var_ref_t_*, cont_CMP> ()
#27 0x0824a746 in osSort<var_ref_t_*, cont_CMP> ()
#28 0x0824a7e9 in imp_sortwith<var_list_t_> ()
#29 0x081aeb5d in var_code_t_::Execute ()
#30 0x08113697 in LPInvokeInstance ()
#31 0x0811a80a in LPExecuteBytes ()
---Type <return> to continue, or q <return> to quit---
#32 0x08101e5a in lasso_runChildren2 ()
#33 0x0819d509 in SubstitutionTag::FormatChildren ()
#34 0x0827c4fa in InlineTag::Format ()
#35 0x081af7e7 in var_code_t_::Execute ()
#36 0x08113697 in LPInvokeInstance ()
#37 0x0811c44e in LPExecuteBytes ()
#38 0x081afbea in var_code_t_::Execute ()
#39 0x08113697 in LPInvokeInstance ()
#40 0x08106866 in LPDocumentRec::Execute ()
#41 0x08132282 in Lasso::FormulateResponse ()
#42 0x08131fb7 in Lasso::FormatBasedOnError ()
#43 0x081346d0 in Lasso::ProcessRequestNew ()
#44 0x08136b19 in Lasso::RunSession ()
#45 0x08137676 in Lasso::RunThread ()
#46 0x081376a4 in Lasso::thread_start ()
#47 0x0816e24e in os_entry ()
#48 0x009f6371 in start_thread () from /lib/tls/libpthread.so.0
#49 0x0086fffe in clone () from /lib/tls/libc.so.6


I have no idea if this backtrace is even interesting.

Lasso got a signal in frame #13.
Did the OS deliver the initial signal because
malloc_consolidate() did something bad, or did some other thread
do something bad and caused a signal to be delivered to all
threads? I have no idea.



Let's try another coredump.
# gdb -q ../../../Lasso8Service core.29502
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

..boring symbol loading stuff removed..

#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x04f20c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x009fafcd in unwind_cleanup () from /lib/tls/libpthread.so.0
#8  0x00a55470 in _Unwind_DeleteException () from /lib/libgcc_s.so.1
#9  0x00c35364 in __cxa_end_catch () from /usr/lib/libstdc++.so.6
#10 0x0816e111 in sighupcatch ()
#11 <signal handler called>
#12 0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#13 0x009f8d9c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#14 0x0816df4c in osThread::SleepMillis ()
#15 0x0812dbf5 in busy_main_loop ()
#16 0x0812eb82 in Lasso::DoRun ()
#17 0x080edfab in lasso_run ()
#18 0x0815c006 in main ()
(gdb)



Here's yet another.

[root@lasso coredumps]# gdb -q ../../../Lasso8Service core.19416
(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `./Lasso8Service --nolisten --ischild --siteid=1 --sitename=default --pipename=l'.
Program terminated with signal 6, Aborted.

...

#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x03ba9c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x00a807a5 in raise () from /lib/tls/libc.so.6
#6  0x00a82209 in abort () from /lib/tls/libc.so.6
#7  0x00ab471a in __libc_message () from /lib/tls/libc.so.6
#8  0x00abafbf in _int_free () from /lib/tls/libc.so.6
#9  0x00abb33a in free () from /lib/tls/libc.so.6
#10 0x00c34fd1 in operator delete () from /usr/lib/libstdc++.so.6
#11 0x081b1efc in Variant::Init ()
#12 0x081b7846 in Variant::SetReference ()
#13 0x081abfd9 in Variant::OpAssign ()
#14 0x0812c004 in ExecBinaryOp ()
#15 0x0811997f in LPExecuteBytes ()
#16 0x081afbea in var_code_t_::Execute ()
#17 0x08113697 in LPInvokeInstance ()
#18 0x0811a80a in LPExecuteBytes ()
#19 0x081afbea in var_code_t_::Execute ()
#20 0x08113697 in LPInvokeInstance ()
#21 0x08126450 in LPExecuteBytes ()
#22 0x081afbea in var_code_t_::Execute ()
#23 0x080bfa76 in Ops_Code::Format ()
#24 0x081aeb5d in var_code_t_::Execute ()
#25 0x08113697 in LPInvokeInstance ()
#26 0x0811a80a in LPExecuteBytes ()
#27 0x081afbea in var_code_t_::Execute ()
#28 0x08113697 in LPInvokeInstance ()
#29 0x0811c44e in LPExecuteBytes ()
#30 0x081afbea in var_code_t_::Execute ()
#31 0x08113697 in LPInvokeInstance ()
#32 0x08106866 in LPDocumentRec::Execute ()
---Type <return> to continue, or q <return> to quit---
#33 0x0827964d in include ()
#34 0x081aeb5d in var_code_t_::Execute ()
#35 0x08113697 in LPInvokeInstance ()
#36 0x0811c44e in LPExecuteBytes ()
#37 0x081afbea in var_code_t_::Execute ()
#38 0x08113697 in LPInvokeInstance ()
#39 0x08106866 in LPDocumentRec::Execute ()
#40 0x0827964d in include ()
#41 0x081aeb5d in var_code_t_::Execute ()
#42 0x08113697 in LPInvokeInstance ()
#43 0x0811c44e in LPExecuteBytes ()
#44 0x081afbea in var_code_t_::Execute ()
#45 0x08113697 in LPInvokeInstance ()
#46 0x08106866 in LPDocumentRec::Execute ()
#47 0x08132282 in Lasso::FormulateResponse ()
#48 0x08131fb7 in Lasso::FormatBasedOnError ()
#49 0x081346d0 in Lasso::ProcessRequestNew ()
#50 0x08136b19 in Lasso::RunSession ()
#51 0x08137676 in Lasso::RunThread ()
#52 0x081376a4 in Lasso::thread_start ()
#53 0x0816e24e in os_entry ()
#54 0x009f6371 in start_thread () from /lib/tls/libpthread.so.0
#55 0x00b20ffe in clone () from /lib/tls/libc.so.6



And here's a fourth, final, example.


#0  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x009fc6f4 in raise () from /lib/tls/libpthread.so.0
#2  0x04816c89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x0080371a in __libc_message () from /lib/tls/libc.so.6
#8  0x008097f4 in malloc_consolidate () from /lib/tls/libc.so.6
#9  0x0080a643 in _int_malloc () from /lib/tls/libc.so.6
#10 0x0080c401 in malloc () from /lib/tls/libc.so.6
#11 0x00c363b7 in operator new () from /usr/lib/libstdc++.so.6
#12 0x08160637 in std::vector<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, RefCountedTracker<Namespace> > >*, std::allocator<__gnu_cxx::_Hashtable_node<std::pair<icu_3_6::UnicodeString const, RefCountedTracker<Namespace> > >*> >::reserve ()
#13 0x0815dfa4 in Namespace::Namespace ()
#14 0x0815e07a in Namespace::NewFull ()
#15 0x0815f45f in Namespace::NewLocal ()
#16 0x081af8b5 in var_code_t_::Execute ()
#17 0x08113697 in LPInvokeInstance ()
#18 0x08126450 in LPExecuteBytes ()
#19 0x081afbea in var_code_t_::Execute ()





If restarting Lasso from the console I often get messages like
these after a while. They usually mean the process has used a
stale pointer from a malloc() call or called free() twice on
the same pointer from malloc().


*** glibc detected *** malloc(): memory corruption (fast): 0x892b88b8 ***
*** glibc detected *** double free or corruption (fasttop): 0x8c661088 ***
*** glibc detected *** malloc(): memory corruption (fast): 0x8ba014b8 ***




We're running Lasso 8.5.4 on RHEL 4, x86.

[root@lasso coredumps]# rpm -qa |grep -i lasso
Lasso-Documentation-8.5.4-1
Lasso-Apache2Connector-8.5.4-1
Lasso-Service-8.5.4-1

[root@lasso coredumps]# rpm -qa|egrep -e 'mysql|java|zlib'
zlib-1.2.1.2-1.2
gcc-java-3.4.6-3.1
java-1.4.2-gcj-compat-devel-1.4.2.0-27jpp
gcc4-java-4.1.0-18.EL4.3
java-1.4.2-gcj-compat-1.4.2.0-27jpp
mysqlclient14-4.1.14-4.el4s1.1
zlib-devel-1.2.1.2-1.2
mysqlclient10-3.23.58-4.RHEL4.1




Btw, I've seen the same thing as this guy too.
Not sure if it's related.
http://www.listsearch.com/Lasso/Message/index.lasso?230276

Right before the crashes I've seen multiple threads for a few
pages running for a very long time (several minutes).
Usually these pages are delivered in less than a second.


  -- noah

--
This list is a free service of LassoSoft: http://www.LassoSoft.com/
Search the list archives: http://www.ListSearch.com/Lasso/Browse/
Manage your subscription: http://www.ListSearch.com/Lasso/




Re: Frequent Lasso crashes (Linux)

Fletcher Sandbeck-3
In reply to this post by i
On 10/30/07 at 4:06 PM, [hidden email] (noah williamsson) wrote:

>These crashes cause "Internal Server Error" to be displayed all
>over all our Lasso based services. I could probably try to minimize
>the impact by starting up one Lasso Site for each of our services
>but that's not fixing the problem.

We'll look at the crash reports you provide in this message and
see if we can discern any information about why Lasso is crashing.

I would suggest that using multiple sites might help you track
down the problem.  If you have several domain names then
separating them into different processes will make it easier to
determine whether the issue is affecting all sites or one
specific site.

You can create a new site in Server Admin by duplicating an
existing site.  This will copy all the settings over so you
don't need to configure anything.  You could configure one site
per domain.  Or, use one site as a temporary sandbox and assign
each domain to it individually to see if you can determine
whether the instability is associated with one domain.

If you're seeing a lot of instability then using multiple sites
will also help mitigate the impact of a crash by ensuring that
only page loads for the affected domain show the Internal
Server Error message.

[fletcher]

--
Fletcher Sandbeck                         [hidden email]
LassoSoft, LLC                          http://www.lassosoft.com



Re: Frequent Lasso crashes (Linux)

Petrus Näslund-3
We are seeing the exact same behaviour as Noah on our RHEL4 server.
Let's hope that LassoSoft will be able to track and fix this soon.

//petrus



On 30 Oct 2007, at 16:30, Fletcher Sandbeck wrote:

> On 10/30/07 at 4:06 PM, [hidden email] (noah williamsson) wrote:
>
>> These crashes cause "Internal Server Error" to be displayed all
>> over all our Lasso based services. I could probably try to minimize
>> the impact by starting up one Lasso Site for each of our services
>> but that's not fixing the problem.
>
> We'll look at the crash reports you provide in this message and see  
> if we can discern any information about why Lasso is crashing.
>
> I would suggest that using multiple sites might help you track down  
> the problem.  If you have several domain names then separating them  
> into different processes will make it easier to determine whether  
> the issue is affecting all sites or one specific site.
>
> You can create a new site in Server Admin by duplicating an  
> existing site.  This will copy all the settings over so you don't  
> need to configure anything.  You could configure one site per  
> domain.  Or, use one site as a temporary sandbox and assign each  
> domain to it individually to see if you can determine whether the  
> instability is associated with one domain.
>
> If you're seeing a lot of instability then using multiple sites  
> will also help mitigate the impact of a crash by ensuring that only  
> page loads for the affected domain show the Internal Server Error
> message.
>
> [fletcher]
>
> --
> Fletcher Sandbeck                         [hidden email]
> LassoSoft, LLC                          http://www.lassosoft.com
>
>

Re: Frequent Lasso crashes (Linux)

i
In reply to this post by Fletcher Sandbeck-3
CONTENTS DELETED
The author has deleted this message.

Re: Frequent Lasso crashes (Linux)

Michael Coninx
A couple of months ago, our server also crashed randomly, or so it seemed.
In our case the problem was with ImageMagick, creating thumbnails
with an invalid path if I remember correctly.

Although the code was wrapped in protect tags, the server crashed,
leaving no clear sign of what the problem was. I don't remember exactly
how I figured out the problem, but seeing the following line in your crash dump,
maybe your crash is also related to ImageMagick?

#2  0x032bac89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828

Or is it just a coincidence that this line also shows up in the
examples you provided, and it has nothing to do with the image tags?
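
One way to approach this question: in a gdb backtrace, the frames listed before `<signal handler called>` belong to the signal handler, and the frames listed after it are the code that was running when the signal arrived. So a MagickSignalHandler frame may only mean that ImageMagick's handler caught the abort, not that ImageMagick triggered it. The sketch below (Python, with a made-up helper name `frames_after_signal`) lists the interrupted frames. The sample input is assembled from frames quoted in this thread and is not necessarily one contiguous dump.

```python
import re

# A gdb frame line: "#N", optional "0xADDR in ", then the rest of the frame.
FRAME = re.compile(r"^#(?P<num>\d+)\s+(?:0x[0-9a-fA-F]+ in )?(?P<rest>.*)$")


def frames_after_signal(backtrace):
    """Return function names of frames listed after '<signal handler called>'."""
    seen_marker = False
    names = []
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if not match:
            continue  # not a frame line (e.g. blank or wrapped text)
        rest = match.group("rest")
        if rest.startswith("<signal handler called>"):
            seen_marker = True
            continue
        if seen_marker:
            # Keep only the function name, dropping arguments and library path.
            names.append(rest.split(" (")[0])
    return names


# Sample frames taken from the posts in this thread (assembled for illustration).
SAMPLE = """\
#2  0x032bac89 in MagickSignalHandler (signal_number=6) at magick/magick.c:828
#3  <signal handler called>
#4  0x0078a7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#5  0x007cf7a5 in raise () from /lib/tls/libc.so.6
#6  0x007d1209 in abort () from /lib/tls/libc.so.6
#7  0x0080371a in __libc_message () from /lib/tls/libc.so.6
#8  0x008097f4 in malloc_consolidate () from /lib/tls/libc.so.6
"""

if __name__ == "__main__":
    for name in frames_after_signal(SAMPLE):
        print(name)
```

If none of the frames after the marker come from ImageMagick, the handler frame alone says little about the image tags.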

With kind regards,

Michael

noah williamsson wrote:

> Fletcher Sandbeck wrote:
>> On 10/30/07 at 4:06 PM, [hidden email] (noah williamsson) wrote:
>>
>>> These crashes cause "Internal Server Error" to be displayed all
>>> over all our Lasso based services. I could probably try to minimize
>>> the impact by starting up one Lasso Site for each of our services
>>> but that's not fixing the problem.
>>
>> We'll look at the crash reports you provide in this message and see
>> if we can discern any information about why Lasso is crashing.
>
> They are probably pretty useless and served just as an example of what
> was going on. Reply to me privately and I'll hook you up with the
> actual coredumps (compressed!) instead.
>
>
>> I would suggest that using multiple sites might help you track down
>> the problem.  If you have several domain names then separating them
>> into different processes will make it easier to determine whether the
>> issue is affecting all sites or one specific site.
>
> In this case the entire server is dedicated for one specific service.
> This service is in production and for public use and to me, the only
> sane workaround is to port the service to PHP unless there's a chance
> this will be fixed within reasonable time.
>
> Lasso's stability is a huge problem for us.
>
>
>  -- noah
>



Re: Frequent Lasso crashes (Linux)

Fletcher Sandbeck-3
In reply to this post by i
On 10/30/07 at 4:49 PM, [hidden email] (noah williamsson) wrote:

> >>These crashes cause "Internal Server Error" to be displayed all
> >>over all our Lasso based services. I could probably try to minimize
> >>the impact by starting up one Lasso Site for each of our services
> >>but that's not fixing the problem.
> >
> >We'll look at the crash reports you provide in this message and
> >see if we can discern any information about why Lasso is crashing.
>
> They are probably pretty useless and served just as an example of what
> was going on. Reply to me privately and I'll hook you up with the
> actual coredumps (compressed!) instead.

Send them to <[hidden email]>.

[fletcher]

--
Fletcher Sandbeck                         [hidden email]
LassoSoft, LLC                          http://www.lassosoft.com



Re: Frequent Lasso crashes (Linux)

i
In reply to this post by Michael Coninx
CONTENTS DELETED
The author has deleted this message.

Re: Frequent Lasso crashes (Linux)

Johan Solve
In reply to this post by Fletcher Sandbeck-3
On 10/30/07, Fletcher Sandbeck <[hidden email]> wrote:
> You can create a new site in Server Admin by duplicating an
> existing site.  This will copy all the settings over so you
> don't need to configure anything.

Apparently everything in the Site folder is duplicated, including more
than 14GB of coredumps... Lasso failed mid-copy (apparently it
crashed), so we ended up with an incomplete new site...
I'm copying the remaining files and directories manually. Is this OK?
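
For the manual copy itself, here is a rough Python sketch that copies only what is missing from the new site folder and leaves the coredumps behind. Everything in it is an assumption for illustration: the helper name `copy_missing`, the coredump name patterns (`core` and `core.*`), and the folder paths in the usage note. It says nothing about whether Lasso accepts a site completed this way.

```python
import fnmatch
import os
import shutil

# Assumed coredump file names; adjust to whatever the dumps are actually called.
CORE_PATTERNS = ("core", "core.*")


def is_coredump(name):
    """True if the file name looks like a coredump under the assumed patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in CORE_PATTERNS)


def copy_missing(src_root, dst_root):
    """Copy files that are absent or differ in size in dst_root, skipping coredumps.

    Returns the list of copied paths, relative to src_root.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if is_coredump(name):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # A size mismatch catches files that were cut off mid-copy.
            if not os.path.exists(dst) or os.path.getsize(dst) != os.path.getsize(src):
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

It would be called as `copy_missing("/path/to/original-site", "/path/to/new-site")`, with both paths being placeholders. The size check is only a cheap guard against truncated files and does not compare contents.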

--
     Johan Sölve    [FSA Partner, Lasso Partner]
     Web Application/Lasso/FileMaker Developer
     MONTANIA SOFTWARE & SOLUTIONS
http://www.montania.se   mailto:[hidden email]
 (spam-safe email address, replace '-' with 'a')
