How come I get 'no server suitable for synchronization found' from my NTP client when the server is returning a valid NTP response to the client? · 4 May 2008, 12:44

This is one I hadn’t seen before last week. On our Solaris 9 clients, running

ntpdate -d -u ip.of.server

kept returning

no server suitable for synchronization found

even though the debug output showed UDP responses coming back from the server. The server in question runs in UDP unicast mode.
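If you want to reproduce the exchange without ntpdate, here is a minimal Python sketch of a unicast client query. It assumes an NTPv3 server on the standard UDP port 123. It builds the 48-byte request (LI=0, VN=3, Mode=3 for "client") with only the transmit timestamp filled in, and sends it from an ephemeral, unprivileged source port, which is roughly what ntpdate -u does on the wire. The names build_client_request and query are hypothetical helpers for illustration, not part of any library.

```python
import socket
import struct
import time
from typing import Optional

NTP_PORT = 123
# Seconds from 1900-01-01 (NTP epoch) to 1970-01-01 (Unix epoch).
NTP_EPOCH_OFFSET = 2208988800


def build_client_request(version: int = 3, now: Optional[float] = None) -> bytes:
    """Build a minimal 48-byte NTP client request (hypothetical helper).

    The first byte packs LI (2 bits) = 0, VN (3 bits) = version and
    Mode (3 bits) = 3 ("client"). Only the transmit timestamp
    (bytes 40-47) is filled in; every other field is zero.
    """
    if now is None:
        now = time.time()
    first = (0 << 6) | (version << 3) | 3
    seconds = int(now) + NTP_EPOCH_OFFSET
    fraction = int((now % 1) * 2**32)  # 32-bit binary fraction of a second
    return struct.pack("!B", first) + b"\x00" * 39 + struct.pack("!II", seconds, fraction)


def query(server: str, timeout: float = 5.0) -> bytes:
    """Send one request from an ephemeral (unprivileged) UDP port; return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_client_request(), (server, NTP_PORT))
        reply, _addr = sock.recvfrom(512)
        return reply
```

For example, query("ip.of.server") returns the raw reply bytes for further inspection. Note that a reply arriving at all says nothing about whether the server is actually synchronized.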

We used snoop to look at the NTP response

snoop -v -v port 123

(use -v -v to get the protocol decode output), and saw these suspicious field/value pairs:

NTP: Leap: 0x03 - clock unsynchronized
NTP: Reference clock: INIT
NTP: Reference time: 0x00000000.00000000

There were other headers, but they did not indicate problems. The 0x03 in the Leap field, the INIT reference clock, and a reference time of zero together indicated that the NTP server had never synchronized and was not configured properly. ntpdate refuses to use a server that reports itself as unsynchronized, which explains the error even though valid UDP replies were arriving. Further investigation confirmed this: the Sidewinder G2 NTP server was misconfigured.
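To make those three warning signs concrete, here is a Python sketch that decodes the relevant fields of an NTP header and flags a server as unsuitable. The field offsets follow the standard NTP packet layout: the leap indicator is the top two bits of byte 0, the stratum is byte 1, the reference ID is bytes 12 to 15 (an ASCII code such as INIT when the stratum is 0 or 1), and the reference timestamp is bytes 16 to 23. The suitability checks in unsuitable_reasons are a simplified assumption for illustration, not ntpdate's actual selection code, and both function names are hypothetical.

```python
import struct

# Meanings of the 2-bit leap indicator (LI) field.
LEAP_NAMES = {
    0: "no warning",
    1: "last minute has 61 seconds",
    2: "last minute has 59 seconds",
    3: "clock unsynchronized",
}


def decode_header(packet: bytes) -> dict:
    """Decode the NTP header fields that matter here (hypothetical helper)."""
    if len(packet) < 48:
        raise ValueError("an NTP header is at least 48 bytes long")
    first, stratum = struct.unpack("!BB", packet[:2])
    ref_secs, ref_frac = struct.unpack("!II", packet[16:24])
    return {
        "leap": first >> 6,             # top 2 bits
        "version": (first >> 3) & 0x7,  # next 3 bits
        "mode": first & 0x7,            # low 3 bits (4 = server reply)
        "stratum": stratum,
        # For stratum 0 or 1 this is four ASCII characters, e.g. b"INIT".
        "ref_id": packet[12:16],
        # Seconds and fraction since 1900-01-01; (0, 0) means never set.
        "ref_time": (ref_secs, ref_frac),
    }


def unsuitable_reasons(hdr: dict) -> list:
    """Simplified suitability checks (an assumption, not ntpdate's exact rules)."""
    reasons = []
    if hdr["leap"] == 3:
        reasons.append("leap indicator 3: " + LEAP_NAMES[3])
    if hdr["stratum"] == 0 or hdr["stratum"] >= 16:
        reasons.append("stratum %d: not a usable time source" % hdr["stratum"])
    if hdr["ref_time"] == (0, 0):
        reasons.append("reference time is zero: server has never synchronized")
    return reasons
```

Fed the raw bytes of a reply like the one decoded above, unsuitable_reasons(decode_header(reply)) would report the leap indicator and the zero reference time. Whether the stratum check also fires depends on a field the snoop excerpt above does not show.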

— Max Schubert



Solaris SVM: stuck in pre-maintenance mode: resolution · 21 April 2008, 18:24

Well, the Sun techs just recommended we restore from backups or upgrade our distribution... great, thanks for the in-depth technical insight... so we did restore one system from backups.

A coworker of mine, who is quite brilliant at bare-metal troubleshooting, was able to get the first system back online. With its root disk attached to another machine and mounted on /mnt, he ran the following:

rm -rf /mnt/dev/*

cp -Rp /dev/* /mnt/dev/
cp /etc/path_to_inst /mnt/etc/path_to_inst

He then unmounted the drive, moved it back to the original system, and voilà, we could get into single-user (maintenance) mode with

boot -m milestone=none

It turned out that the HBA card had coincidentally (no joke) gone bad during or after being removed. Replacing the HBA card fixed that issue and let us boot single-user. Great!

So I then re-initialized the metadevice state database, restored all mirrors and submirrors, and rebooted... and, whoops, the kernel started complaining about /etc/system being full of junk, and the system would not boot.

A boot from CD-ROM showed that the root partitions on both disks were now full of what appeared to be random garbage (2 MB worth!).

The Sun tech wrote back a day or two after this failed and proceeded, in essence, to "scold" us for trying to copy devices from one system to another. Well, at least that got the system to boot single-user! She then asked again about restoring, at which point I told her to just close the ticket, as we were making more progress on our own than we had with her (she is also responsible for the "9MB zip file" quote under the Humor section of my blog).

End result: we had to restore both systems, and we still have no idea why we could not recover from breaking the mirror on these systems the way SVM is supposed to let you.

Disappointing, especially since our tier 4 support (Sun) was not able to help us get through this without restoring; in fact, they started suggesting a restore after one call to their tier 2 people. So much for paying for support contracts and expecting expertise :(.

— Max Schubert



Solaris SVM: stuck in pre-maintenance mode · 9 April 2008, 17:52

I rarely have pleasant encounters with LVM/DiskSuite/SVM. Maybe that is because I just suck at Solaris :p. The problem this time:

All self-tests pass, then the system starts to boot from the disk. It gets to the point where the Sun copyright message and kernel revision are displayed, and then the following is displayed over and over in a loop:

Requesting Maintenance Mode (see /lib/svc/share/README for more information.)

After interrupting the boot process and rebooting with boot -m debug, the following additional information is displayed:

INIT: Executing svc.startd

INIT: Restarting svc.startd
Requesting Maintenance Mode
(see /lib/svc/share/README for more information.)

Root console services never start, and svc.startd does not dump core, write messages to the system logs, or print anything on STDOUT or STDERR.

Sun technicians are working with me to try to figure out what is causing svc.startd to die. (Boy, does phone support NOT want to escalate a ticket to tier 3! They suggested I restore the system from backups rather than continue helping me figure out what is causing this, even though the same thing is happening on two systems. Frustrating!)

Will post more when I get a resolution to this (hopefully more than just “I restored from backups”).

— Max Schubert