Server (NES) / App Server (iPS)
1. How can I improve the performance of the servers with ndd?
2. What causes FIN_WAIT_2 and TIME_WAIT? Are they cause for alarm?
3. NAS Load Balancing and Clustering Questions/Answers!
4. How can I determine the number of concurrent active NAS sessions?
5. I have installed at least two Netscape Application Server instances as single instances and configured them with a common Directory Server under the same global configuration name. How can I configure them in a cluster so that they can participate in distributed data synchronization? 
6. What is the key to disable UDP ping in the registry? 
7. When accessing servlets, the Netscape Application Server (NAS) 4.x binds to NASApp. Is it possible to change the NASApp to another name?
8. Netscape Application Server: kcs.log file shows entries "10096 - Communication failure" or "unknown host IP address : myhost".
9. NAS Clustering - the web connector recognizes the NAS servers by different IP addresses?
10. How to see if your web server is using Persistent Connections (Keep-Alives).
10. How to see if your web server is using Persistent Connections (Keep-Alives).

Persistent Connections, often known as HTTP Keep-Alives, allow a client and a server to exchange more data with fewer connections. This feature can increase or decrease overall performance in certain contexts.

You can verify your server's behavior with two methods:

Telnet
You can simulate the behavior of a real browser by typing the
correct header lines. In the transcript below, the lines you type
are the GET line, the Connection line, and the final blank line
(each ending with [Return]); everything else is the server's
response.

telnet sampleserver 80
Trying 123.456.789.012...
Connected to sampleserver.
Escape character is '^]'.
GET /foobar.html HTTP/1.0[Return]
Connection: keep-alive[Return]
[Return]
HTTP/1.1 404 Not found
Server: Netscape-Enterprise/3.5.1G
Date: Fri, 13 Nov 1998 21:09:59 GMT
Content-type: text/html
Content-length: 207
Connection: keep-alive      <-- note this header line in the response

Not Found
The requested object does not exist on this server. The link you
followed is either outdated, inaccurate, or the server has been
instructed not to let you have it.

Because the server answered with "Connection: keep-alive", the
connection stays open for the next request:

GET /barfoo.html HTTP/1.0[Return]
Connection: keep-alive[Return]
[Return]
HTTP/1.1 404 Not found
Server: Netscape-Enterprise/3.5.1G
Date: Fri, 13 Nov 1998 21:10:18 GMT
Content-type: text/html
Content-length: 207
Connection: keep-alive

Not Found
The requested object does not exist on this server. The link you
followed is either outdated, inaccurate, or the server has been
instructed not to let you have it.

(The connection closes after the keep-alive timeout period.)

If a server has Persistent Connections turned off, the server
will respond with "Connection: close" and disconnect after
responding:

Trying 123.456.789.012...
Connected to sampleserver.
Escape character is '^]'.
GET /Foobar.html HTTP/1.0[Return]
Connection: keep-alive[Return]
[Return]
HTTP/1.1 404 Not found
Server: Netscape-Enterprise/3.5.1G
Date: Fri, 13 Nov 1998 21:19:38 GMT
Content-type: text/html
Content-length: 207
Connection: close

Not Found
The requested object does not exist on this server. The link you
followed is either outdated, inaccurate, or the server has been
instructed not to let you have it.
Connection closed by foreign host.

(The disconnect happens immediately.)

Packet trace
If Persistent Connections are in use, a single TCP/IP connection
will service more than one HTTP request. Reading packet traces is
a complex topic beyond the scope of this article. To learn more
about packet-level tracing, read "Troubleshooting TCP/IP"
(Network Troubleshooting Library) by Mark A. Miller.

Servers running in secure mode (HTTPS):
Servers using HTTPS present a problem because the methods above
do not work with encryption. If you are using the telnet method,
you can consider two solutions: temporarily restart the server
without encryption, or telnet through a utility that performs SSL
encryption. A version of telnet that supports SSL encryption is
necessary.

Note for Enterprise Server administrators:
Persistent connections exist both as a connection state and as an
open connection. Using "KeepAliveTimeout 0" in magnus.conf is
insufficient to fully disable this feature. KeepAliveTimeout
causes the Enterprise Server to disconnect a connection
immediately after sending an HTTP response, but the server still
sends a "Connection: keep-alive" header in that response. In
other words, it sends the same response as normal, then waits
zero seconds and disconnects. Since the client still receives
"Connection: keep-alive", the normal chain of events for handling
a persistent connection will occur on the client side, even
though the connection itself closes almost immediately. Some
clients may fail to properly support persistent connections, and
in some cases the reception of "Connection: keep-alive" may
create problems even after the network connection closes.
"KeepAliveTimeout 0" makes the network (TCP/IP connection)
behavior identical to a non-persistent connection, but it is not
the same from a system-wide (client-network-server) perspective.
To fully deactivate persistent connections in the Enterprise
Server, see article http://help.netscape.com/kb/server/980226-1.html.
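
For the HTTPS case above, one commonly available utility that can
stand in for an SSL-capable telnet is openssl s_client. A minimal
sketch, assuming the hypothetical host sampleserver and the
standard HTTPS port 443:

  openssl s_client -connect sampleserver:443
  GET /foobar.html HTTP/1.0
  Connection: keep-alive
  [Return]

After the blank line, check the response headers for
"Connection: keep-alive" or "Connection: close", exactly as in
the telnet transcripts above.
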
9. I want to form a cluster with 3 machines running NAS 4.0. The IP addresses are ip1, ip2 and ip3. The web connector is installed in a different network, and there is a NAT between the two networks, so the web connector sees the 3 NAS machines as ip4, ip5 and ip6. When forming the cluster, which IP addresses should I enter: all six, or just ip1, ip2 and ip3?
You have two separate topics here: the addresses used for the
cluster and the addresses used for the distributed application
components. Remember that the cluster merely handles the
distributed synchronization of session and state data. The
cluster communication is between the NAS servers only. The web
connector plays no part in the cluster and isn't even aware
that one exists. Therefore, the cluster addresses should be the
addresses that the NAS servers can use to talk to each
other: ip1, ip2, ip3.
When you deploy application components to NAS servers, and you
want the NAS servers to share the load (load balance), then you
have to configure a few more things. For one, you need to tell
each NAS server about the other NAS servers that host the same
application components. That way, one NAS server can bounce a
request to another if the first is too busy. This information is
also used by the web connector to know where it is possible to
send a particular request. This request bouncing is done only if
you're using "user defined" load balancing, and it is
accomplished by one NAS sending the request back to the web
connector and telling it which other NAS server to send the
request to. In that case, the IP address has to be one that the
web connector can use to reach a NAS server: ip4, ip5, ip6. The
trick here is that the NAS servers are sharing load statistics
with each other to determine which is the "best" for any specific
servlet/AppLogic during a period of time. The statistics are
shared via an IP multicast, but the servers know of each other by
their IP address - the same IP address that's used to tell the
web connector where to bounce a request to.

Each of your NAS servers has two IP addresses: one that they can
use themselves (ip1, ip2, ip3) and another that they can't use
(ip4, ip5, ip6) but still need to know about so they can
communicate it to the web connector. This makes the configuration
for load balancing a bit tough if you want to use the "user
defined" load balancing strategy. If you want to use any of the
web-connector-based response-time load balancing strategies, then
things get a lot easier. With web-connector-driven load
balancing, the NAS servers don't need to do anything except run
the requests that they're told to run by the web connectors. They
no longer bounce requests if they're too busy. They no longer
swap load statistics with each other. Note that they can still be
in a cluster sharing session and state data - that's a different
topic and it's configured with a different set of IPs.

So, what I would recommend is...
1) configure the clusters using the IP addresses that the NAS
servers can use to talk to each other (ip1, ip2, ip3).
2) mark your application components as being distributed between
application servers identified using the web-connector-usable IP
addresses (ip4, ip5, ip6), as sketched below.
3) use the web-connector-driven (response-time) load balancing
strategies - not the server-driven (user-defined) load balancing.
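
To make that concrete, here is a rough sketch of the cluster
portion of the registry for this setup. The key layout follows
the format shown in question 5 below; the cluster name
"my-cluster" and the KXS port 10818 are hypothetical stand-ins
for your own values:

  SOFTWARE\Netscape\Application Server\Clusters\my-cluster\SyncServers\
      ip1:10818=0       (addresses the NAS servers use among themselves)
      ip2:10818=0
      ip3:10818=0

The distributed-component configuration (done per component in
the NAS Administrator) would list ip4, ip5 and ip6 instead, since
those entries are read by the web connector.
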
8. Netscape Application Server: kcs.log file shows entries "10096 - Communication failure" or "unknown host IP address : myhost". When I examine the kcs.log file for my Netscape Application Server, I see entries that resemble the following:

[09/05/99 16:10:02:3] info: REQ-012: thread add 10096- - Communication failure - unknown host IP address : myhost

What does this mean? How do I prevent this error from happening?
Probable Cause:

This error means that the server running the Application Server
engine was unable to translate the text string "myhost" into a
full IP Address.

If you visually inspect the <NASROOT>/registry/reg.dat file, you
may see references to the "myhost" string within the entries,
along with a port number (for instance,
SOFTWARE\Kiva\Enterprise\2.0\Admin\Host most likely contains
the name of your host).

Solution:

Per the man page (man resolver), the operating system goes
through a series of steps to translate a host name into an
actual IP address. The most common first step in resolving the
problem listed in this technote is to verify whether a simple
interactive shell session (on the same machine that the
Application Server is running on) is able to resolve the
address, for example with the "ping myhost" command.

(If this succeeds, then additional steps will need to be
followed to determine why the address is resolvable within a
shell and not by the other processes on the operating system -
this is beyond the scope of this TechNote.)

Assuming that your interactive shell session is not able to
resolve the name into an IP address, you should check the
following:

Check that your /etc/resolv.conf file contains lines similar to
the following:

search mcom.com netscape.com
nameserver 192.168.1.1

(With appropriate domain name suffixes used instead
of "mcom.com" and "netscape.com" and a valid DNS server instead
of 192.168.1.1).

When you try to "ping myhost", the resolver will add each of the
suffixes that are listed in your resolv.conf file (i.e.: in the
sample config file given above, myhost.mcom.com and
myhost.netscape.com would both be sent as queries to the DNS
server at 192.168.1.1).

You should also try to manually ping your DNS servers to
validate that they are, indeed, accessible via the network from
your Application Server.
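
For example (standard commands, not NAS-specific; substitute your
own DNS server address and host name):

  ping 192.168.1.1              (is the DNS server reachable?)
  nslookup myhost 192.168.1.1   (ask that server to resolve the name)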

Even if this successfully resolves your problem, you may consider
adding the actual IP address of your host to the /etc/hosts file
to allow the resolver to convert the name to an IP address even
in the absence of your DNS server.

The appropriate entries would look something like this:
192.168.1.55 myhost myhost.mycompany.com

This would tell the operating system that if a request is made
to convert the string "myhost" or "myhost.mycompany.com" to an
IP address, then the operating system should use the address
192.168.1.55 as the address.

Note: For Solaris operating systems only, you should also
manually inspect your /etc/nsswitch.conf file to ensure that dns
and files are being referenced to translate strings to IP
addresses. A sample /etc/nsswitch.conf file is given:
 

#
# /etc/nsswitch.nis:
#
# An example file that could be copied over to /etc/nsswitch.conf;
# it uses NIS (YP) in conjunction with files.
#
# "hosts:" and "services:" in this file are used only if the
# /etc/netconfig file has a "-" for nametoaddr_libs of "inet"
# transports.
# The following two lines obviate the "+" entry in /etc/passwd
# and /etc/group.

passwd:     files nis
group:      files nis

# consult /etc "files" only if nis is down.
hosts:      files dns
networks:   nis [NOTFOUND=return] files
protocols:  nis [NOTFOUND=return] files
rpc:        nis [NOTFOUND=return] files
ethers:     nis [NOTFOUND=return] files
netmasks:   nis [NOTFOUND=return] files
bootparams: nis [NOTFOUND=return] files
publickey:  nis [NOTFOUND=return] files

netgroup:   nis

automount:  files nis
aliases:    files nis

# for efficient getservbyname() avoid nis
services:   files nis
sendmailvars:   files

For more information, consult the man pages for your operating system.
7. When accessing servlets, the Netscape Application Server (NAS) 4.x binds to NASApp. Is it possible to change the NASApp to another name? For example, is it possible to change the path from http://<web server instance>/NASApp/nsFortune/fortune to http://<web server instance>/abcd/nsFortune/fortune?
Yes. To change the path for accessing servlets, use the
following steps to change the two keys in the registry:

Open kregedit.
Go to 4.0 (or 6.0 for NAS 6.x), CCS0, HTTPAPI.

Change the key value from SSPL_APP_PREFIX=NASApp to
SSPL_APP_PREFIX=new_name (abcd in the example above).
Go down to the folder SSPL_APP_PREFIX.
Change the default from NASApp to the new name (abcd in the
example above).
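
For the example above, the net effect of the two changes is
roughly the following (an illustrative sketch; the exact key
layout can vary between NAS versions):

  Before:  ...\4.0\CCS0\HTTPAPI\SSPL_APP_PREFIX=NASApp
  After:   ...\4.0\CCS0\HTTPAPI\SSPL_APP_PREFIX=abcd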

Restart NAS, Netscape Enterprise Server, and Netscape Directory
Server. If you have a webless install, make sure you change the
registry settings in the web connector registry.
6. What is the key to disable UDP ping in the registry?
In NAS 2.1, the flag is set under the CONN key. In NAS 4.0, you
will have to add this key entry manually, as follows:

Application Server/4.0/CCS0/CONN/DisableEcho 1

Setting DisableEcho to 1 disables the UDP ping; setting it back
to 0 re-enables it.
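
In kregedit the added entry would appear roughly as follows (an
illustrative sketch; the root path shown matches the cluster keys
in question 5 below):

  SOFTWARE\Netscape\Application Server\4.0\CCS0\CONN\DisableEcho=1
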
5. I have installed at least two Netscape Application Server instances as single instances and configured them with a common Directory Server under the same global configuration name. How can I configure them in a cluster so that they can participate in distributed data synchronization?
Let's say there is one NAS instance running on host pss07 with
IP 206.222.250.10 and KXS port 13545, and a second NAS instance
running on host naspst3 with IP 206.222.250.25 and KXS port
10818. Assume both instances are configured with the same LDAP
server under the same global configuration name. To create a
cluster of naspst3 and pss07, we need to make the following
changes in the registry using kregedit:

(Note: Do not modify reg.dat directly with a text editor, as the
following changes must also be reflected in LDAP.)
Create a new key HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/Clusters/CLUSTER NAME, where CLUSTER NAME is the name of
the cluster (for example, "pgoel-cluster").

Create new key values MaxBackups, MaxHops, SyncPersChunkSz and
SyncTimerInterval, and a new subkey SyncServers, under the key
pgoel-cluster.
Create new key values under the key
HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/Clusters/CLUSTER NAME/SyncServers, in the format
HOST IP:KXS PORT=0, one for each NAS machine that will
participate in this cluster.
Modify the following in the pss07 Registry:
Delete HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/4.0/CCSO/ClusterName/pss07-NoDsync=0
Add HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/4.0/CCSO/ClusterName/pgoel-cluster=0

Modify the following in the naspst3 Registry:
Delete HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/4.0/CCSO/ClusterName/naspst3-NoDsync=0

Add HKEY_LOCAL_MACHINE/SOFTWARE/Netscape/Application
Server/4.0/CCSO/ClusterName/pgoel-cluster=0
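
With the example hosts above, the resulting cluster section of
the registry would look roughly like this (a sketch; MaxBackups=1
is an illustrative choice, and the other values must be set to
suit your site):

  SOFTWARE\Netscape\Application Server\Clusters\pgoel-cluster\
      MaxBackups=1
      MaxHops=...
      SyncPersChunkSz=...
      SyncTimerInterval=...
      SyncServers\
          206.222.250.10:13545=0
          206.222.250.25:10818=0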

To verify the modified configuration, follow these steps:

Deploy the FORTUNE sample application on both NAS machines.
Using the NAS administration tool, distribute the FORTUNE
application on both machines.

Start NAS on one of the machines. The KXS and KJS logs should
show the following messages:

KXS log :
[04/Oct/1999 16:25:47:6] info: PROT-004: protocol data added
[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
LoadB
[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
Request
[04/Oct/1999 16:25:47:6] info: PROT-004: protocol data added
[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
RequestP
[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
RequestStep
[04/Oct/1999 16:25:47:6] info: PROT-004: protocol data added
[04/Oct/1999 16:25:47:6] info: PROT-004: protocol data added

[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
ExecCli_NSAPI
[04/Oct/1999 16:25:47:6] info: PROT-004: protocol data added
[04/Oct/1999 16:25:47:6] info: SERVER-029: initialized DLM -
AdminCli
[04/Oct/1999 16:25:47:6] info: DSYNC-039: We(0xcedefa0a:13545),
are coming up as a Primary and max # of hot backup(s)=0
[04/Oct/1999 16:25:55:6] warning: CONN-OS: No such file or
directory
[04/Oct/1999 16:25:55:6] warning: CONN-025: connect failed
during MakeConn GXMGR  (host 0xcedefa19, port 10818, IsAlive)
[04/Oct/1999 16:25:55:6] warning: DSYNC-002: MakeConn failed in
Synch(SendMsg), prot=0x4
[04/Oct/1999 16:25:55:6] warning: DSYNC-011: RegisterServer:
SendRecv() failed; errno=0x2
[04/Oct/1999 16:25:55:6] info: PROT-004: protocol data added

KJS log :
Connected to LDAP server on pss07.mcom.com port 498
[06/Oct/1999 16:06:18:3] info: ENGINE-class_loader_created: New
class loader
com.kivasoft.engine.EngineClassLoaderNonVersionable@1dacd6d5 has
just been created
[06/Oct/1999 16:06:18:3] info: ENGINE-class_loader_created: New
class loader
com.kivasoft.engine.EngineClassLoader@1dace553 has just been
created
Initializing LDAP cache from server pss07.mcom.com port 498
LDAP cache initialization completed successfully.
[06/Oct/1999 16:06:18:8] info: PROT-004: protocol data added
[06/Oct/1999 16:06:18:8] info: PROT-004: protocol data added
[06/Oct/1999 16:06:18:8] info: PROT-004: protocol data added
[06/Oct/1999 16:06:18:8] info: PROT-004: protocol data added
[06/Oct/1999 16:06:18:8] info: PROT-006: new connection established
[06/Oct/1999 16:06:18:8] info: PROT-006: new connection
established
[06/Oct/1999 16:06:18:8] info: PROT-004: protocol data added
[06/Oct/1999 16:06:18:8] info: PROT-007: new acceptor spawned
[06/Oct/1999 16:06:18:8] info: PROT-006: new connection
established
[06/Oct/1999 16:06:19:0] info: PROT-004: protocol data added
[06/Oct/1999 16:06:19:0] info: EXTMGR-003: GXExtensionManager:
Extension service  ExtensionData is disabled
[06/Oct/1999 16:06:19:0] info: EXTMGR-003: GXExtensionManager:
Extension service  JavaExtData is disabled
[06/Oct/1999 16:06:19:0] info: EXTMGR-003: GXExtensionManager:
Extension service  LockManager is disabled
[06/Oct/1999 16:06:19:0] info: EXTMGR-003: GXExtensionManager:
Extension service  RLOPManager is disabled
[06/Oct/1999 16:06:19:3] info: PROT-004: protocol data added
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:19:7] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: PROT-006: new connection
established
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: REQ-012: thread add
[06/Oct/1999 16:06:20:0] info: ENGINE-ready: ready: 10820
Start NAS on the second machine. The KXS and KJS logs should
show the following messages:

KXS log :

[04/Oct/1999 16:34:15:5] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
LoadB
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
Request
[04/Oct/1999 16:34:15:5] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
RequestP
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
RequestStep
[04/Oct/1999 16:34:15:5] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:5] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
ExecCli_NSAPI
[04/Oct/1999 16:34:15:5] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:5] info: SERVER-029: initialized DLM -
AdminCli
[04/Oct/1999 16:34:15:5] info: DSYNC-039: We(0xcedefa0a:13545),
are coming up as  a Primary and max # of hot backup(s)=0
[04/Oct/1999 16:34:15:6] info: PROT-006: new connection
established
[04/Oct/1999 16:34:15:6] warning: DSYNC-009: RegisterServer:
yielding to another  Primary
[04/Oct/1999 16:34:15:6] info: PROT-004: protocol data added
[04/Oct/1999 16:34:15:6] info: SERVER-029: initialized DLM -
Synchronizer
[04/Oct/1999 16:34:15:6] info: SERVER-029: initialized DLM -
State
[04/Oct/1999 16:34:15:6] info: SERVER-029: initialized DLM -
Session

KJS log :

Connected to LDAP server on pss07.mcom.com port 498
[04/Oct/1999 18:10:04:6] info: ENGINE-class_loader_created: New
class loader
com.kivasoft.engine.EngineClassLoaderNonVersionable@1dacd6d5 has
just been created
[04/Oct/1999 18:10:04:6] info: ENGINE-class_loader_created: New
class loader com.kivasoft.engine.EngineClassLoader@1dace553 has
just been created
Initializing LDAP cache from server pss07.mcom.com port 498
LDAP cache initialization completed successfully.
[04/Oct/1999 18:10:05:2] info: PROT-004: protocol data added
[04/Oct/1999 18:10:05:2] info: PROT-004: protocol data added
[04/Oct/1999 18:10:05:2] info: PROT-004: protocol data added
[04/Oct/1999 18:10:05:2] info: PROT-004: protocol data added
[04/Oct/1999 18:10:13:2] warning: CONN-OS: No such file or
directory
[04/Oct/1999 18:10:13:2] warning: CONN-025: connect failed
during MakeConn GXMGR  (host 0xcedefa0a, port 13545, IsAlive)
[04/Oct/1999 18:10:13:2] warning: DSYNC-002: MakeConn failed in
Synch(SendMsg), prot=0x4
[04/Oct/1999 18:10:13:2] warning: DSYNC-011: RegisterServer:
SendRecv() failed; errno=0x2
[04/Oct/1999 18:10:13:2] info: PROT-006: new connection
established
[04/Oct/1999 18:10:13:2] info: PROT-004: protocol data added
[04/Oct/1999 18:10:13:2] info: PROT-006: new connection
established
[04/Oct/1999 18:10:13:2] info: PROT-007: new acceptor spawned
[04/Oct/1999 18:10:13:4] info: PROT-004: protocol data added
[04/Oct/1999 18:10:13:4] info: PROT-004: protocol data added
[04/Oct/1999 18:10:13:4] info: EXTMGR-003: GXExtensionManager:
Extension service  ExtensionData is disabled
[04/Oct/1999 18:10:13:4] info: EXTMGR-003: GXExtensionManager:
Extension service  JavaExtData is disabled
[04/Oct/1999 18:10:13:4] info: EXTMGR-003: GXExtensionManager:
Extension service  LockManager is disabled
[04/Oct/1999 18:10:13:4] info: EXTMGR-003: GXExtensionManager:
Extension service  RLOPManager is disabled
[04/Oct/1999 18:10:14:0] info: PROT-004: protocol data added
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:14:3] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: REQ-012: thread add
[04/Oct/1999 18:10:15:1] info: ENGINE-ready: ready: 10820
[04/Oct/1999 18:10:18:0] info: PROT-006: new connection
established
[04/Oct/1999 18:11:46:4] info: PROT-006: new connection
established

Stop NAS on the first machine. The KJS logs of the second should
show the following messages:

KJS log :

[04/Oct/1999 17:34:52:8] info: ENGINE-ready: ready: 10820
[04/Oct/1999 17:46:59:9] error: CONN-003: socket receive error
(RecvBuffer 1) (conn 0x460048, sock 38, host 0xcedefa0a, port
13545)
[04/Oct/1999 17:46:59:9] info: PROT-001: RecvBuffer failed
during GXRunConn::Main()
[04/Oct/1999 17:46:59:9] warning: DSYNC-045: We(0xcedefa19:10820), lost connection to the Primary,
0xcedefa0a:13545
[04/Oct/1999 17:46:59:9] warning: DSYNC-029: Primary died and no
backups are configured; we(0xcedefa19:10820), are going LOCAL
(Note: it says no backups are configured because MaxBackups is
set to 0 here; set it appropriately.)
[04/Oct/1999 17:51:32:8] info: ENGINE-ready: ready: 10820

Note: Your logs need not be exactly the same, since the
configuration may differ. These logs are for reference only.
4. How can I determine the number of concurrent active NAS sessions?
In NAS 2.1 SP7 and in NAS 4.0 SP1, a registry flag was added
that, when enabled, prints the number of active sessions to the
kxs log. Under the Clusters key, set the registry parameter
SyncNodeDebug to 1. (Set it to 0 to disable.)
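
A sketch of the entry as it might appear in kregedit, following
the text above (the exact placement under Clusters is an
assumption; verify against your own registry):

  SOFTWARE\Netscape\Application Server\Clusters\SyncNodeDebug=1
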
3. NAS Load Balancing and Clustering Questions/Answers!
Greetings,

This topic of clusters versus load balancing versus
configuration of the web connector has been discussed several
times on this mailing list.  Please refer to the archives.

For the most part, this information comes right out of the
manual.

First, there are three separate topics to keep clear. Clustering
has to do with the synchronization of session and state data;
application distribution/deployment has to do with which
components are available on which machines, and whether the other
machines know about it; load balancing has to do with routing
requests based on various forms of performance feedback. These
are three different topics. You want to plan them together, but
they are different topics. Before you can load balance a request,
you have to know which machines have the requested component, so
you know which machines you can choose to balance with
(application deployment/distribution); then you have to have some
idea of which of those machines is "preferred" (load balancing
statistics); and finally, once the request gets routed and starts
to run on that machine, you want to be sure that the servlet can
get access to the user's session data (cluster - distributed
session synchronization).
 

The cluster has to do with the distribution of session and state
data.  You don't get to say which server will be the primary.
You do get to give each server a priority that is used for
breaking ties when the servers need to find someone (a server)
to take on a job (be the one primary, be one of the backups,
etc.).   The servers will figure out for themselves which one
will take on the role of primary and backup.  The most you get
to influence that decision is in specifying the priorities and in
starting the servers in a particular order.  When you bring up
the first server it is the only one in the cluster at that point
and, therefore, has the best priority and so takes on the job of
Primary.  The next server to start up joins a cluster that
already has a Primary and so, of the servers available (it being
the only one without a job) it has the highest priority and so
takes on the job of Backup.  This continues as new servers come
up until the configured number of backup servers is reached and
then all other servers come up as Alternates.  This is a
simplified explanation - there are other variations.

The role of Primary, Backup, and Alternate has to do with
session data synchronization and has nothing to do with load
balancing. The roles the NAS servers play in the synchronization
of session data are in addition to their role of handling
requests. When a request runs on some NAS server and the running
component (e.g. a servlet) needs to access the user's session
data, that NAS server will retrieve the user's session from the
Primary NAS. The servlet will then use the session data as
necessary, and that NAS will send the updated session object back
to the Primary NAS, which will then send a copy of it to each of
the Backup NAS servers. All of the NAS servers (regardless of
cluster role) will be handling requests. If the Primary dies,
then the Backup with the best
(lowest) priority will take over as Primary. That leaves an
opening for a new Backup. The Alternate with the best priority
will get promoted to Backup by the new Primary and will start
getting updates of session and state data.  This shifting of
cluster roles has nothing to do with how requests are getting
routed.

When you deploy your application components you need to make
sure that each NAS server knows which other NAS servers have the
same components.  This can be accomplished when you use NAB or
the deployment manager to deploy the application to more than
one NAS at the same time.  Otherwise, you can use the NAS
Administrator to manually set this information for each
servlet.  You can choose to say that your application components
are distributed with a specific set of servers (listing each IP
address) or with any NAS server that can be discovered via load
balancing statistics broadcasts (see below).

There are three different kinds of load balancing strategies
that you can pick from for NAS 4.0: user defined (this was the
only choice in NAS 2.1 - iAS6 adds a round-robin strategy), per
component response time (as measured by each web connector), or
per server (regardless of component) response time (as measured
by each web connector).  Each has strengths and weaknesses,
costs and benefits.  The "User Defined" strategy provides the
greatest control but also places the highest demands on the NAS
servers since they must then monitor each other's performance
and calculate which is "preferred" based on weighted priorities
that you define.  The servers exchange load statistics with each
other via an IP multicast to a specific multicast port.  It's
possible to load balance with machines that aren't in your
cluster or to leave out machines that are in your cluster
because they may be listening on a different multicast
address/port.  By default, all NAS servers are installed with
the same multicast address and port.  You have to use kregedit
to change this setting - it's not available in the NAS
Administrator.

With "User Defined" load balancing, the web connector makes the
first attempt to figure out which NAS server to send a request to
(see below). If the receiving NAS doesn't think that it's the
best suited, then it can "hop" (bounce) the request to another
NAS server by bouncing it back to the web connector and telling
the web connector to send the request to a different specific NAS
server. The second NAS server can then check its load table to
see if it is really the best choice or if it, too, wants to hop
the request to another NAS server. Once a NAS server accepts the
request, it sends its response directly back through the web
connector and web server. The other NAS servers that had rejected
the request have nothing more to do with it. The web connector
keeps track of these hops so that it can make a better choice in
the future. This helps the web connector build up a load table
(see more below) to use for subsequent routing decisions. There's
a limit (you specify it for each NAS) on how many times a request
can be hopped, to keep it from getting endlessly bounced around.
Once a request has been hopped more than the limit, the NAS that
receives the request must accept it - it can't reject it.

With either of the "Response Time" load balancing strategies
it's entirely up to the web connector to decide where to send
the request.  The NAS servers no longer get a say.  They can't
reject a request.  The web connectors keep track of the response
time for requests and route the requests to the NAS servers that
have been servicing the requests the fastest.  This helps the
web connector build up a load table (see more below) to use for
subsequent routing decisions.

When you configure the web connector you give it the IP and Port
number of the KXS of one of the NAS servers.  The IP address you
pick has nothing to do with the cluster.  This is the "default"
NAS server for that web connector.  If the web connector can't
figure out where to send a request then it will send it to
the "default" NAS server.  Each web connector can be given a
different "default" NAS server.

When the web connector gets a request and tries to figure out
where to send it, it goes through a few steps.

First, it checks its local load table to see which NAS server to
send the request to. This load table is dynamically generated as
the web connector runs and routes requests. It's not stored
anywhere. When the web server restarts, the web connector starts
with a fresh (empty) load table. Since it's generated based on
the results of request routing by that web connector, each web
connector's load table will likely be different from any other
web connector's load table. The load table's purpose is to keep
track of which NAS server is the "best" choice for a request.
The different load balancing strategies produce different types
of load tables.

If the web connector can't figure out where to send the request
after checking its load table, then it tries to figure out
which servers the requested component (AppLogic, Servlet) lives
on.  It checks the CLASSDEF section of the NAS "registry".  The
CLASSDEF section is actually stored in the directory server.
The CLASSDEF section contains information about each registered
component and which NAS servers it's deployed to.  This doesn't
explain which server is the "best" choice at this time - only
which servers are candidates to receive a request for this
component.

If the web connector still can't figure out where to send the
request after checking the CLASSDEF section then it sends it to
the "default" NAS configured for that web connector.

There's no single NAS that does the load balancing for the
group.  You don't want that - single point of failure.  They
each are doing their own load balancing.  The exact activities
of the web connectors and NAS servers depend on which load
balancing strategy you've picked.  For "User Defined" it is the
NAS servers (each of them) which calculate which server is the
best, 2nd best, etc. for each component.
 

Regarding Q#4 - Generally, you cannot use host names when
configuring NAS; it wants IP addresses. Perhaps this was an
attempt by the engineers to avoid the cost of doing DNS lookups.
Even if you give host names for some of the installation
questions, it does a one-time DNS lookup and stores the IP
address in the configuration.
 

Regarding Q#5 - the web connector has no idea of which NAS is
Primary and doesn't need to.  Remember that "Primary" only
refers to which NAS is the Primary state synch server and has
nothing to do with which NAS would be best suited to execute a
particular servlet.
 

Related to articles about load balancing - You should read the
Administration Guide for a more complete discussion of the load
balancing.  Or - take the class - it includes this full
discussion along with some labs (I'm one of the instructors).
Or check the knowledge base on
http://www.iplanet.com/support/online/browse/
2. What causes FIN_WAIT_2 and TIME_WAIT? Are they cause for alarm?
TCP sockets have certain states that they go through. TIME_WAIT
and FIN_WAIT_2 are perfectly normal states for sockets to go
through, and can be safely ignored. Netscape's website routinely
sees thousands of sockets left in the TIME_WAIT state and
hundreds of them in the FIN_WAIT_2 state.
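
To see how many sockets are sitting in these states on your own
server, the standard netstat command is enough (not
NAS-specific):

  netstat -an | grep TIME_WAIT | wc -l
  netstat -an | grep FIN_WAIT_2 | wc -l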

TIME_WAIT is caused when the server closes the connection with
the client. Because the server is the initiator of the close, the
TCP specification states that it must keep the socket around in
the TIME_WAIT state for at least 120 seconds. Because of the high
volume of connections inherent in a webserver, and because the
webserver initiates the close, it is normal for this to lead to
lots of TIME_WAITs. These connections are fully closed but will
remain on the system for a minimum of 120 seconds.

The FIN_WAIT_2 connections are also normal for webservers. Many
web clients (including both Microsoft's Internet Explorer and our
own Netscape Navigator, but not Netscape Communicator) send RST
packets when closing connections. Using an RST packet is one of
two ways of closing a connection. The other is the "graceful
shutdown", which consists of a FIN/ACK/FIN/ACK sequence. When
closing via a graceful shutdown, both sides are able to
positively confirm the shutdown (the remote system will confirm
the FIN packet with an ACK packet, ensuring that both parties
have been notified that the socket is closing). An RST packet can
become lost in transit, and it will not be resent. If this
happens, the remote system is not notified that the socket is
being closed and will keep it open. In other words, RST is an
unreliable shutdown. If the RST packet gets lost in transit, the
server is left in the FIN_WAIT_2 state, waiting to receive the
last FIN. The OS should time out FIN_WAIT_2 connections after
several minutes (possibly as many as 20-30 minutes).
 

Both of these conditions are normal for an active webserver, and
can be safely ignored. Both states will eventually be timed out
by the OS.
1. How can I improve the performance of the servers with ndd?

A high-traffic web site can often be drowned in accumulated,
dead, idle connections which have not been properly closed.
The default keepalive timeout value is 2 hours (7200000 ms). If
you have a high-traffic website, you should lower this value:

  /usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 900000

Setting it to 900000 (15 minutes) will reduce the number of
FIN_WAIT_2 connections reported by netstat.
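
You can check the current value before changing it; running ndd
without -set reads a parameter, and listing all available
parameters is also possible:

  /usr/sbin/ndd /dev/tcp tcp_keepalive_interval
  /usr/sbin/ndd /dev/tcp \?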

Retransmissions: Monitor the number of retransmits using the
"netstat -s" command. Examine the following values:
tcpOutDataSegs, tcpRetransSegs, tcpOutDataBytes, tcpRetransBytes.
If the retransmission values are greater than 30-40% of the
total, you should delay the retransmission to accommodate slower
networks with the following command:

 /usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_min 1000

Setting tcp_rexmit_interval_min as high as 10000 may be
necessary.
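
As a worked example: if "netstat -s" reported tcpOutDataSegs =
100000 and tcpRetransSegs = 35000, then about 35% of the data
segments were retransmitted, which falls in the 30-40% range
where raising the minimum retransmission interval is worth
trying. (These figures are hypothetical.)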

These settings can be added at system boot, e.g., by adding them
to /etc/init.d/inetinit (values are in milliseconds):
  /usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 900000
  /usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_min 3000
  /usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
  /usr/sbin/ndd -set /dev/tcp tcp_rexmit_interval_max 10000
  /usr/sbin/ndd -set /dev/tcp tcp_close_wait_interval 60000
  /usr/sbin/ndd -set /dev/tcp tcp_ip_abort_interval 60000

By default, the maximum listen queue (listenQ) is 32 connections
on Solaris 2.5.1. This may not be large enough on high-traffic
sites and can be changed by setting:

  /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max 1024

Note:
  This procedure is NOT for 2.5.1 with patch 103582-11 or higher.
  Starting in Solaris 2.5.1 with patch 103582-11 or higher, and
  in Solaris 2.6, tcp_conn_req_max no longer exists. The ndd
  tcp_conn_req_max parameter has been replaced with two enhanced
  parameters: tcp_conn_req_max_q and tcp_conn_req_max_q0.

  The default values of these new parameters are equivalent (in
  performance terms) to a tcp_conn_req_max value of 1024 (before
  tcp patch 103582-11). These two variables were created by
  Solaris 2.5.1 patch 103582 (version 11 and up) and are built in
  to Solaris 2.6. This Solaris patch addressed the TCP SYN attack
  threat, which was a CERT advisory, and bug 1182957.

To increase the maximum number of connections with patch
103582-11 or higher, use this:

  /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 1024

Each server will behave differently, and there is no exact set
of settings that will work for all. As always, make sure you
have the latest recommended patch cluster. The above should be
used only as guidelines, and tried on your server as needed to
improve performance.