Forum OpenACS Development: Naviserver on Windows - "select failed" when using ns_sockselect

Hi

I'm not having success sending email through Naviserver on Windows. I've tried both ns_sendmail and acs_mail_lite::send. The problem seems to be at a lower level, in opening sockets. I'm on Windows 2012. I can manually connect and send mails using a telnet session to my mail server, so I know it's definitely a Naviserver issue.

ns_sendmail is reporting a "select failed: no such file or directory" error. Digging into it, I have it narrowed down to the following test case:

set smtphost [ns_config -set ns/parameters mailhost "localhost"]
lassign [ns_sockopen -timeout 60 $smtphost 25] rfd wfd
set fds [ns_sockselect -timeout 60 $wfd {} {}]

I get this:
ERROR:
select failed: no such file or directory
while executing
"ns_sockselect -timeout 60 $wfd {} {}"
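
One way to separate a NaviServer-specific socket problem from a general connectivity problem is to repeat the check with core Tcl sockets in a plain tclsh, outside the server. The following is only a sketch under assumptions, not part of the original test case: the proc name smtp_greeting, the state array and the default timeout are made up for illustration, and it uses only core Tcl commands (socket, fileevent, vwait), not the ns_sock* API.

```tcl
# Hypothetical helper (not a NaviServer or OpenACS command): connect to an
# SMTP server with core Tcl sockets and return its greeting line.
proc smtp_greeting {host port {timeout_ms 5000}} {
    set sock [socket -async $host $port]
    fconfigure $sock -blocking 0

    # Wait until the socket becomes readable or the timeout fires.
    set ::smtp_greeting_state($sock) ""
    set timer [after $timeout_ms [list set ::smtp_greeting_state($sock) timeout]]
    fileevent $sock readable [list set ::smtp_greeting_state($sock) readable]
    vwait ::smtp_greeting_state($sock)
    after cancel $timer
    fileevent $sock readable {}
    set state $::smtp_greeting_state($sock)
    unset ::smtp_greeting_state($sock)

    if {$state eq "timeout"} {
        close $sock
        error "timed out waiting for a greeting from $host:$port"
    }

    # A failed asynchronous connect is reported through the -error option.
    set err [fconfigure $sock -error]
    if {$err ne ""} {
        close $sock
        error "connect to $host:$port failed: $err"
    }

    set n [gets $sock line]
    close $sock
    if {$n < 0} {
        error "no complete greeting line received from $host:$port"
    }
    return $line
}
```

Calling something like smtp_greeting localhost 25 from tclsh on the same machine would be expected to return the server's 220 greeting if plain Tcl sockets work; if that succeeds while ns_sockselect still fails inside the server, the problem is more likely in the server's socket layer than in the network setup.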

Using acs_mail_lite::send, the behaviour seems to be a bit more random. It never sends the mail, and usually there is nothing in the error log, but from time to time I see the following in the log:

[16/Mar/2017:09:32:21][4036.88][-sched:6-] Error: Could not send queued mail (message 1): error reading "sock768": socket is not connected
[16/Mar/2017:09:34:29][4036.8e8][-conn:openacs:4-] Error: acs-mail-lite::smtp: error error reading "sock788": socket is not connected while executing

Also I'm seeing this in the log:

[16/Mar/2017:16:11:07][4500.f60][-sched:11-] Notice: acs-mail-queue: failure: select failed: no such file or directory

I presume this means that ACS Mail Lite was scheduling the mail, not sending it immediately?

Dear Brian, the interaction with an SMTP server via acs-mail-lite has always been a problem in the Windows version.
I believe that in the "old" ]po[ installer, the system was using Cygwin sendmail. In my installer there's a binary called "blat", a command line SMTP mailer for which the sources are also available (https://sourceforge.net/projects/blat/).
It is very easy to modify acs-mail-lite so that it uses this tool.
I did that in the past as a paid consultancy for some of the people who used Windows-OpenACS for real.
Hope it helps,
Maurizio

Thanks Maurizio, especially for the pointer to blat.

I'm more concerned about ns_sendmail than ACS Mail Lite, as that is what we use (and it definitely works on AOLserver on Windows). Also, I would be concerned about the ns_sockselect problem, as we have used socket calls in the past.

Brian

Just a quick one. Is the server running with administrative privileges?

Maurizio

Yes, it is.

Brian, have you tried Tcllib's smtp::sendmessage? Back in 2006, I also encountered ns_sendmail failures in the C select() call on Windows (XP I think), so I replaced my use of ns_sendmail with a wrapper around smtp::sendmessage, which worked fine.
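
For readers who want to try this, a wrapper of that kind could look roughly like the sketch below. It is not Andrew's actual code: the proc names mail_options and send_plain_mail, the default server and port, and the choice of headers are assumptions made for illustration. It relies on Tcllib's mime and smtp packages being installed and needs Tcl 8.6 for try/finally.

```tcl
# Hypothetical helper: build the option list handed to smtp::sendmessage.
# The defaults (localhost, port 25) are assumptions; adjust them to your setup.
proc mail_options {from to subject {server localhost} {port 25}} {
    return [list \
        -servers    [list $server] \
        -ports      [list $port] \
        -originator $from \
        -recipients $to \
        -header     [list From $from] \
        -header     [list To $to] \
        -header     [list Subject $subject]]
}

# Hypothetical wrapper: send a plain-text mail through Tcllib's smtp package.
# Extra arguments (server, port) are passed on to mail_options.
proc send_plain_mail {from to subject body args} {
    package require mime
    package require smtp

    set token [mime::initialize -canonical text/plain -string $body]
    try {
        return [smtp::sendmessage $token \
                    {*}[mail_options $from $to $subject {*}$args]]
    } finally {
        # Always release the MIME token, even if sending fails.
        mime::finalize $token
    }
}
```

A call such as send_plain_mail sender@example.com recipient@example.com "Test" "Hello" mail.example.com 25 would then go through Tcllib only, without touching ns_sendmail or ns_sockselect. Adding -debug 1 to the option list makes smtp::sendmessage print the SMTP dialogue, which can help when tracking down a rejected value.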

Btw, also back then, I'd previously seen ns_sendmail work just fine on Windows XP running on VMware Workstation, but then fail every time after we switched to running XP in VMware Server on a different physical machine. Why, I do not know.

Thanks Andrew, will check it out!

Dear Brian,

the change to use acs_mail_lite::send (and therefore tcllib's smtp::* API by default) instead of ns_sendmail happened 7 years ago [1]. In case the error comes from tcllib, maybe [2] can help to debug.

-gn

[1] https://github.com/openacs/openacs-core/commit/0416b1113e92d6531ea87becf5c59864ecdc62c1
[2] https://openacs.org/forums/message-view?message_id=5346643

Dear Brian,
I believe you are not on the right path to track this problem.
This is what happened:

First we have the warning: Warning: ns_sendmail is no longer supported in OpenACS. Use acs_mail_lite::send instead.
Then we have the error:
Error: acs-mail-lite::smtp: error error reading "sock788": socket is not connected while executing
Please notice an SMTP error.

In other words: ns_sendmail is actually replaced by acs_mail_lite::send. This, in turn, uses SMTP, and on your system it cannot connect to an SMTP server, probably because you put no information about one in the configuration.

This is the cause of the error message you are seeing. It is not a socket problem, it is an SMTP problem: you are trying to connect to an undefined SMTP server...

If you configure the SMTP parameters properly (and on Windows, at the moment, sendimmediately must be set to 1/true), then a command like the following should work:
acs_mail_lite::send -from_addr Maurizio.Martignano@spazioit.con -to_addr Brian.Fenton@gmail.com -subject "acs_mail_lite::send_test" -body "Hello there, this is a test."

Then the last point, on BLAT. Naviserver is, mostly, a web server. BLAT is an SMTP client. There are more people working on BLAT than on Naviserver. So the chances that BLAT works, and properly follows all the INET/MAIL standards and their evolution, are much higher than for Naviserver.

Hope it helps,
Maurizio

Thanks to your advice, Gustaf and Maurizio, I was able to debug the problem and trace it to the -originator value being passed to smtp::sendmessage. Maurizio, you were correct that I was on the wrong track trying to debug ns_sendmail. So I have mails working successfully now, which is great news.

Having said that, I have been seeing some very strange stuff in the error log, binary characters and other strange chunks of data. For example, when I was logging the smtp commands, the following was logged:

Notice: send cmd: smtp::sendmessage ::mime::1 -originator bounce-1117-9C0499595A910E12E497926278F46FE356E9CE07-371@localhost -header {From mailto:brian.fenton@quest.ie} -header {Reply-To mailto:brian.fenton@quest.ie} -he meters.parameter_id
and apm_parameters.parameter_name =

There are 2 strange things there: 1) the 3rd -header is corrupted and contains binary characters (which I couldn't copy from the log), and 2) the SQL clause somehow appearing on the end of the smtp command!

I'm happy to help look into that if you think it's an issue.

thanks again!
Brian

Dear Brian,
at the moment this is my main concern, this sort of erratic behaviour...

Over the last few days I have "normalized" the treatment of socket error codes like EINTR, EWOULDBLOCK, EINPROGRESS.
Gustaf did some of it, but some other stuff was missing.
Basically, for new compilers, instead of the above values we need to use WSAEINTR, WSAEWOULDBLOCK, and so on...
This change alone made a lot of the crap in the log disappear.

But there's still a long way to go.

The strategy I'm following at the moment is, ehm, rather awful... Whenever I think I have identified something, some area that in the code could be the source of the problem, I check it against how it was done in AOLserver, hoping to find some differences....

For instance, the unbalanced bracket thing does not show up until the sweep-procs background processing starts...

There is something to do with threading and memory, but I haven't tracked it down yet. And I am doing this in my spare time...

Dear Brian, the problem with the broken entries in the log file should be fixed by [1]. See also [2]. -g

[1] https://bitbucket.org/naviserver/naviserver/commits/d7313607a25837faf18005b6fa48248b79fafec7
[2] https://openacs.org/forums/message-view?message_id=5355068

Thanks Gustaf,

will try to test as soon as possible.

Brian