Jun
29
2009

Alternating Tri-Boot for Linux Workstations

For small office/home office (SOHO) Linux workstations, I buy name-brand PCs (HP/Compaq most recently) and install Linux myself.  I’ve tried various installation strategies including Linux-only (reformatting the entire hard disk) and various forms of dual-boot or multi-boot.

Alternating tri-boot is what I call my preferred configuration for a work environment where downtime or restore time due to a failed OS upgrade is costly schedule-wise.  Alternating tri-boot hosts three operating systems on a workstation: the as-delivered Windows XP partition, and two Linux distributions which both mount the same /home partition.

Description:

During normal use, two Linux distributions (e.g. openSUSE 11.0 and 10.2) and the original Microsoft OS are always bootable.  An extended partition containing four logical partitions is created. In the following diagram, the extended partition is aqua and the current Linux boot partition is the rightmost ext3 partition (labeled “root” and containing openSUSE 11.0):


Figure 1. Alternating Tri-boot Disk Partitions

When openSUSE 11.0 is booted, it mounts the common /home and swap partitions and also mounts the 10.2 partition under /oldOS.  Moreover, the openSUSE 10.2 partition remains bootable since it was untouched during the installation of 11.0.  If 10.2 is booted instead, it mounts the same /home and swap partitions.  Thus all user data is available through normal login under either Linux OS.


Figure 2. Installation Progression for Successive Linux Versions

Figure 2 shows how the contents of the partitions change as the newest Linux OS progresses from openSUSE 10.2 to 11.0 and finally 11.2.  The arrows show the new installations alternating between partitions sda6 and sda7, highlighting that the current release is not disturbed when the next one is installed.  The availability of each release (10.2, 11.0 and 11.2) always spans two full release cycles.

Thus there are always three bootable OSes and new Linux installations alternate between the two Linux OS partitions — “alternating tri-boot”.
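The alternation in Figure 2 can be sketched as a small Python helper that decides which root partition receives each new release and which release is left bootable under /oldOS. It assumes, for illustration only, that 10.2 was originally installed on sda6; `install_plan` is a hypothetical name, not part of YaST or any installer.

```python
from itertools import cycle


def install_plan(releases, partitions=("sda6", "sda7")):
    """Return (release, target_partition, previous_release) for each install.

    New releases alternate between the two Linux root partitions, so the
    partition holding the current release is never reformatted; that
    release stays bootable and is mounted under /oldOS by its successor.
    Assumption: the first release in the list went onto partitions[0].
    """
    plan = []
    previous = None
    for release, target in zip(releases, cycle(partitions)):
        plan.append((release, target, previous))
        previous = release
    return plan


if __name__ == "__main__":
    for release, target, old in install_plan(["10.2", "11.0", "11.2"]):
        print(f"install {release} on {target}; still bootable under /oldOS: {old}")
```

Only the target partition in each step is reformatted; /home and swap never appear in the plan because they are never touched.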

Benefits:

  • Never a “point of no return” during upgrades.  At any time during the installation of a new Linux distribution, you can stop and reboot into the most recent working configuration.
  • Mounting the previous distribution’s root partition under /oldOS solves “lost configuration file” anxiety.  Ever been working on your new OS distribution for a few weeks, then fire up a seldom-used application only to find out you forgot to copy/save its configuration file?  With the alternating OS partitions, that missing file is available somewhere under /oldOS.  It can be copied or diffed without having to restore from a tar file or other backup.
  • New hardware can be installed and debugged using the Windows utilities provided by the vendor.  Problems can be localized to either the hardware or the Linux driver.  No more calling the 800 number only to hear, “Load Windows and call me back.”
  • Present the machine as a Windows PC for on-site support and warranty issues.  For service calls, set the default OS back to Windows XP in the NT loader and let the technician have at it.
  • Alternating tri-boot retains the NT bootloader.  This is an integral part of avoiding point-of-no-return situations, since the original bootloader and the disk’s MBR are never overwritten or disabled during Linux installations.

Caveats:

  • In the midst of a new OS installation there is of course only one functional Linux partition.  During this window of vulnerability, you can’t fall back to booting the /oldOS partition if the current OS is damaged.  So don’t meddle with the configuration of the current OS while installing the new one.
  • Don’t let the /home partition be reformatted! Linux installer programs (e.g. YaST for openSUSE) usually scan the disks and partition tables and suggest an installation scenario.  Often they propose reformatting all of the Linux partitions.  This will wipe out the /home directories and all of your user data.  Only one partition (root) should be reformatted.

Coming soon – step-by-step instructions…

Jul
29
2008

"No input file specified" mod_rewrite Problem

I made several changes to some mod_rewrite rules which were working fine on my local Apache server.  I then published to a staging site hosted on GoDaddy for further testing. The rules which directed permalinks to a PHP program stopped working; “no input file specified” appeared in my browser instead. Having no access to the error logs on this bargain-basement hosting plan made debugging all but impossible.

I finally found this post which presented a solution:

“Turn off MultiViews. It seems when MultiViews is enabled there is confusion between MultiViews and the RewriteRules. So if you go to /user there will be no problem, MultiViews will translate it to /user.php. However when you go to /user/blah/login/blah or one of the other more complex clean URLs it gets confused.”

Adding “Options -MultiViews” to my .htaccess file fixed the problem. I’m not sure why this only happens on my GoDaddy account and not locally or at our other hosting accounts. Is this the one and only error that results from MultiViews and mod_rewrite colliding, or are there others? If there are others, maybe I’ll shut off MultiViews on all my web sites until I need to add multiple-language support.

May
01
2008

phpMyAdmin Installation, openSUSE 10.2

I installed the phpmyadmin package the other day using YaST and need to document some things:

  • Configuration area (config.inc.php file) ends up in /srv/www/htdocs/phpMyAdmin along with the rest of the installation.
  • Decided to use cookie-based authentication; I don’t think I need phpMyAdmin to remember the password.
  • The advanced features were turned on in config.sample.inc.php, but the YaST installer doesn’t load the schema required for this into the MySQL database (maybe it’s not possible unless done during MySQL installation). This resulted in endless error messages of “Table ‘phpmyadmin.pma_bookmark’ doesn’t exist”.
  • Learned that I needed to run create_tables.sql, which was not included in the openSUSE package. I downloaded it from the phpMyAdmin site, created user “pma”, defined corresponding controluser and controlpass entries in the config file. create_tables finally ran successfully and the bookmarks error messages went away.
  • Stopped annoying half-hour auto-logout with: $cfg['LoginCookieValidity'] = 3600 * 8;
Feb
15
2008

ATI Drivers for Radeon XPRESS 200, openSUSE 10.2

I didn’t have good results installing the ATI proprietary drivers in SUSE 10 using the ATI installer. The install itself was error-plagued and the X server was flaky afterwards. Thus when I upgraded to openSUSE 10.2 I chose not to install the proprietary drivers. For the past year I’ve been running with the non-3D Mesa/radeon drivers included with the release.

With ZENworks finally replaced with zypper on my system, package maintenance is fun once again, so I decided to try installing the proprietary ATI drivers. From among all of the different ways described on the web, I selected “The Easy Way” from the openSUSE ATI documentation, which uses rpm packages supplied and maintained by ATI (AMD).

Pre-installation Status:

  1. glxgears - Mesa GLX Indirect renderer, runs about 90 FPS.
  2. Konqueror sysinfo: - Model: Radeon XPRESS 200 5954 (PCIE), Driver: radeon (No 3D Support).
  3. /usr/lib{,64}/libGL.so.1.2 - made “before” copies, since some installation instructions call for hiding them after installation. I want to be able to undo any hand edits to return to the standard Linux driver if necessary.
  4. /etc/X11/xorg.conf - made a copy of the current file.

ATI Driver Installation Steps:

Based on the openSUSE ATI documentation, The Easy Way, openSUSE 10.3 10.2 10.1:

  1. zypper service-add http://www2.ati.com/suse/10.2 ATI – adds a YUM catalog containing the ATI proprietary drivers in rpm package format, enabling installation via YaST. I browsed here first, received a “no file found” message, and thought that the ati.com address had been retired by AMD. Fortunately, the instructions noted that the “above URLs are not browseable with a web browser, only by a YUM / REPO-MD capable packager manager”.
  2. Step 2 of the openSUSE instructions lists a zypper command to install 2 ATI packages, but I wanted to see what was available and check dependencies before installing, so I decided to use the YaST installer to carry out this step.
  3. Started YaST/Software Management, filtered for the new ATI catalog, and saw four packages: x11-video-fglrxG01 and 3 different versions of ati-fglrxG01-kmp (for bigsmp, debug, and default kernels). Choose the ati-fglrxG01-kmp package that matches your kernel flavor as shown by uname -r (default in my case). The dependency check was OK, so I clicked Accept. The X11 video driver package is 19MB, the kernel driver only 500KB.
  4. After the package installation was complete, I started YaST Software Management again to inspect the file lists of the new packages. The kernel module was installed into /lib/modules/2.6.18.8-0.8-default whereas my kernel is 2.6.18.8-0.9. The video driver package installs drivers into X11R6/lib{,64} and did not overwrite the original drivers.
  5. sax -r – per the instructions. I’ve never run this command before (except when installing the OS) and it worried me. The screen locks up, then it appears to kill the X server, but it reappears. Now I’ve got the SaX2 GUI and the Monitor tab’s Activate 3D acceleration option is active. After verifying that 1600×1200 resolution was set, I clicked OK without changing anything. Note: found the output from this command in /var/log/SaX.log.
  6. A box with a “Test” option appeared, I tested and the display was ok. Then Save, and an announcement that the changes will take effect when the graphics system is restarted. This should allow me to check xorg.conf. But no, the display started behaving badly, so I logged out and back in per the instructions.
  7. After completion, the accelerated 3D worked, but I’m now getting constant kernel errors:
    [fglrx:firegl_free_mutex] *ERROR* mutex id 0x.. not found in mutex list
    kernel: warning: many lost ticks.
    kernel: time source seems to be instable or some driver is hogging interupts
  8. Display didn’t look correct (see below) so I added a Modeline back in from the old xorg.conf file. This proved fatal to the X server. I managed to fix it by rebooting from an old SUSE 10.0 partition, otherwise it would have been time to rescue boot from DVD and attempt repairs. A vivid reminder of why I avoid X11 configuration whenever possible.
  9. The CRT Monitor parameters in xorg.conf switched from Modelines to Calculated and the display was being driven differently. There is now vertical compression at the top and bottom of the screen (as if the vertical scan rate was not constant).  I cannot completely compensate with the monitor’s adjustments.
  10. glxgears - ATI Radeon Xpress Series renderer, now runs 1300 FPS.
  11. None of the OpenGL screensavers work; I think they are the source of most of the kernel mutex errors.
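Steps 3 and 4 both come down to comparing the running kernel’s release string (uname -r) with what the KMP package installed. A minimal Python sketch of that check, assuming openSUSE’s VERSION-RELEASE-FLAVOR naming and that /lib/modules subdirectories are named after the kernel release; both helper names are hypothetical. It only flags a mismatch such as 0.8 vs. 0.9; whether the module still loads (openSUSE KMPs may be usable across compatible kernel updates) is a separate question.

```python
import os
import platform


def kernel_flavor(release):
    """Flavor suffix of a `uname -r` string, e.g. 'default' from
    '2.6.18.8-0.9-default' (assumes the openSUSE naming scheme)."""
    return release.rsplit("-", 1)[-1]


def module_dir_matches(release, module_dir):
    """True if a /lib/modules/<release> directory belongs to `release`,
    i.e. its last path component equals the `uname -r` string."""
    return os.path.basename(os.path.normpath(module_dir)) == release


if __name__ == "__main__":
    running = platform.release()  # same value as `uname -r` on Linux
    print("kernel flavor:", kernel_flavor(running))
    print("module dir matches:",
          module_dir_matches(running, "/lib/modules/2.6.18.8-0.8-default"))
```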

Conclusions:

  • YaST’s SaX2 is still an adventure and not up to the quality level of most other parts of YaST. There is little documentation (the openSUSE.org page is still a stub), and it runs without feedback, leaving me to wonder what choices are being made and where things are written. I would need to learn a lot of gory details about X11 configuration before I’d feel comfortable with SaX2 and the ATI drivers (or ATI/openSUSE would need to fix the frequent kernel errors).
  • Next release, I’ll install the 3D drivers first thing after installing the OS, then review the results and kernel errors. If I don’t like what I see, I can reload the OS, which is the only way I’ll feel confident that I’ve cleaned everything out.
  • Once I start using an OS release for production work, I’ll leave the graphics drivers unchanged for the duration. Updating midstream like I’ve done here is too risky (since I don’t need 3D for my projects).
Jan
06
2008

SMART smartd Configuration, openSUSE 10.2

Stumbled across smartd in YaST/System/System Services (runlevel) and turned it on. In addition to monitoring the long-term health trend of the disk, SMART also provides interesting real-time information about the hard drive: total hours, number of power cycles, current temperature, etc. I liked smartmontools so much that I also installed it on my Windows XP machines, along with HDD Health.

I configured /etc/smartd.conf after reading the smartd.conf and smartctl man pages, the smartd log entries in the messages file, and comments within the file itself. Current setup:

  /dev/sda -d sat \
  -a -o off -S on \
  -s (O/../.././07|S/../.[27]/./08|L/.[02468]/15/./08) \
  -m root@localhost -M test

Notes on the configuration parameters:

  1. -d sat - lots of testing to determine this should be sat (SCSI to ATA Translation) and not ata. The various man pages are vague, with one saying “follow the hints appearing in the log file.” Well, the hint in my log file said “use -d ata or -d sat” (sigh). Both seemed to work, so I began with sat since the libata library is present. Subsequent tests of ata produced command failure messages in the log, thus sat is correct.
  2. -a - turn on the default recommended set of monitoring functions.
  3. -S on - “S” is variously described as “turning on SMART” or “turning on Attribute Autosave”. I believe this is a modal parameter within the disk drive itself, and starting smartd turns this on by default. The manual recommends adding to the configuration to ensure it doesn’t get shut off. It echoes “enabled SMART Attribute Autosave” to messages, reassuring me that SMART is running.
  4. -o off – “o” controls “SMART Automatic Offline Testing” (data collection really, not testing). The man page says this command is obsolete and that the collection intervals are chosen by the disk manufacturer. The “O” offline test (discussed next) causes the same data to be collected.
  5. -s REGEXP – schedule various offline tests. My schedule: O (offline immediate), every day, 7am; S (Short), on days of the month ending in 2 or 7 (roughly every 5 days), 8am; L (Long), the 15th of even-numbered months, 8am. I haven’t discovered what the differences are between the short and long tests (for my disk the short test takes 2 minutes vs. 98 minutes for the long test).
  6. -m, -M - email address for important messages. “-M test” sends an email on startup to confirm everything is working.
  7. -i N – (command line only, not conf file) set the interval, in seconds, for polling the disk. Seems like this should be controllable from the conf file, but since it isn’t I’m staying with the default of 30 minutes.
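To see exactly which hours the -s expression selects, the schedule can be checked offline. This Python sketch builds the T/MM/DD/d/HH string described in the smartd.conf man page for every hour of a year and matches it against the regex from the configuration above. Two assumptions: smartd requires the whole string to match (modeled with re.fullmatch), and it numbers weekdays 1 (Monday) to 7 (Sunday), which is what isoweekday() returns. The helper name scheduled_tests is hypothetical.

```python
import re
from datetime import datetime, timedelta

# The -s REGEXP from the smartd.conf entry above.
SCHEDULE = r"(O/../.././07|S/../.[27]/./08|L/.[02468]/15/./08)"


def scheduled_tests(year, schedule=SCHEDULE, test_types="OSL"):
    """Map each test type to the hours in `year` whose T/MM/DD/d/HH
    string fully matches `schedule` (assumed smartd semantics)."""
    pattern = re.compile(schedule)
    hits = {t: [] for t in test_types}
    hour = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    while hour < end:
        for t in test_types:
            key = f"{t}/{hour:%m}/{hour:%d}/{hour.isoweekday()}/{hour:%H}"
            if pattern.fullmatch(key):
                hits[t].append(hour)
        hour += timedelta(hours=1)
    return hits


if __name__ == "__main__":
    for test, times in scheduled_tests(2009).items():
        first = times[0].strftime("%Y-%m-%d %H:00") if times else "never"
        print(test, len(times), "runs, first at", first)
```

For 2009 this reports 365 offline runs, 72 short tests (days 02, 07, 12, 17, 22 and 27 of each month) and 6 long tests, which is a cheap way to sanity-check a schedule edit before restarting smartd.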

Notes:

On Linux, “smartctl -c” always showed a failure in the offline data collection status:

(0x05) Offline data collection activity was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.

Examining the messages file shows auto-collection being enabled and the scheduled offline collection being started. Everything seems normal, but auto-collection becomes disabled at some point. On two Windows XP PCs, data collection activity completes normally and Offline Collection is enabled. I don’t know how auto-collection was enabled on these two machines; perhaps HDD Health did it.

Is the Linux collection failure connected with auto-collection becoming disabled? First attempt at solution: switched from “-o on” to off after realizing that my scheduled testing ran the same test. Changed short test schedule to one per week in case it was causing the interruption. The problem has gone away and smartctl now reports “data collection activity was completed without error”.
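The hex value at the start of the smartctl -c status line packs two things into one byte. A minimal Python decoder, assuming the standard layout that smartctl reports: the low seven bits give the offline data collection status, and bit 7 (0x80) is the Auto Offline Data Collection enable flag. The table lists only the common status codes; decode_offline_status is a hypothetical name.

```python
# Offline data collection status values (low 7 bits of the status byte).
STATUS_TEXT = {
    0x00: "never started",
    0x02: "completed without error",
    0x04: "suspended by an interrupting command from host",
    0x05: "aborted by an interrupting command from host",
    0x06: "aborted by the device with a fatal error",
}


def decode_offline_status(byte):
    """Return (activity description, auto offline collection enabled?)."""
    activity = STATUS_TEXT.get(byte & 0x7F, "reserved or vendor specific")
    return activity, bool(byte & 0x80)


if __name__ == "__main__":
    for value in (0x05, 0x82):
        activity, enabled = decode_offline_status(value)
        print(f"0x{value:02x}: {activity}; auto collection "
              f"{'enabled' if enabled else 'disabled'}")
```

0x05 decodes to the aborted-by-host status with auto-collection disabled, matching the Linux output above, while 0x82 means collection completed without error and auto-collection is enabled.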

References:

  • Wikipedia – best description of the SMART attributes that I found (other than the spec itself which is exhausting). The external link “Out SMART your hard drive” helped explain why the raw data was different between my Maxtor and Seagate hard drives.
  • smartmontools – home page for the SMART tools. They are installed by default in openSUSE Linux, and available for download for Windows XP.
  • HDD Health – a graphical user interface that presents current SMART information and failure alerts for Windows PCs. Still need smartmontools to trigger the offline tests.
Dec
12
2007

How To Disable ZENworks ZMD, openSUSE 10.2

The openSUSE 10.2 update repository that I was connected to stopped receiving updates from the mother ship in late November. I checked some other mirrors and found them to be in the same state. Only the main ftp.suse.com repository is current (as of this post). I could find no mention of this problem on openSUSE.org.

This problem plus the announcement of openSUSE 11.0 Alpha caused me to reconsider my upgrade plans. I am happy with 10.2 but was planning to upgrade to 10.3 solely to eliminate the nightly 1.5 hour zmd update runs (which I’ve complained about at length in a previous post).

Since I am now going to have to poke around with updating anyway, I decided to look once more for instructions on how to disable the ZENworks ZEN management daemon (zmd). If I succeed in turning off zmd, I may skip the 10.3 release entirely and go directly from 10.2 to 11.0.

How to Disable ZMD:

I found this openSUSE-Community.org article which provides simple instructions for disabling and removing zmd. Using it as a guide, I performed the following steps:

  1. rczmd stop – this stopped zmd. Hallelujah! I’ve never known how to do this before now.
  2. Start the YaST, Software, Installation Source GUI.
  3. Add ftp.suse.com/pub/suse/update/10.2 as a new catalog.
  4. Disable (status = Off, update = Off) the old (utah.edu) mirror catalog. This catalog can be removed once the ftp catalog is working.
  5. Uncheck the “Synchronize with ZENworks” check box.
  6. Click Finish.
  7. The catalog information was successfully downloaded from ftp.suse.com, new updates appeared in the updater applet, and were successfully applied. Everything seems to be working.
  8. I restarted the Installation Source GUI and was dismayed to see the “Synchronize with ZENworks” box still checked. Don’t know if it is still turned on, or if the GUI is just urging me to turn it back on.
  9. Leaving zmd turned off but still installed, I let the system run for a few days until I verified that newly released updates from openSUSE were reported in the updater applet.
  10. Once verified, I removed the zmd related packages as described in the article:
    rpm -e zmd libzypp-zmd-backend sqlite-zmd rug zen-updater
  11. Restarted the Installation Source GUI: the “Synchronize with ZENworks” box is now grayed out.
  12. Rebooted; start-up works fine without ZENworks zmd.

Updater Applet:

If you are running KDE, you may need to perform the following step:

  1. Switch the panel updater applet (controlled by /etc/sysconfig/sw_management) from zlm (the zen-updater update manager) to opensuse.

I had already done this manually during my previous battles with zmd, but I have some recollection that it will happen automatically if the zen-updater application isn’t found. For Gnome, I don’t know what applet (if any) will appear in the panel.

Aug
14
2007

NETDEV WATCHDOG transmit timed out, Realtek 8139

Periodically I restart my dual-boot workstation into Windows XP, run Microsoft Update and all other updaters to get everything current, then take a spin around the block just to make sure XP is still working fine. Among other updates this time, I accepted a new Realtek device driver from Microsoft Update. Normally, I only pull driver updates after they’ve been posted to Compaq support.

Win XP ran fine after all of the updates, but when I rebooted into openSUSE 10.2, it was unable to connect to the LAN. I found the following alarming entries in /var/log/messages:

  kernel: NETDEV WATCHDOG: eth0: transmit timed out
  kernel: eth0: Transmit timeout, status 0d 0000 c07f media 10.
  kernel: eth0: Tx queue start entry 4  dirty entry 0.
  kernel: eth0:  Tx descriptor 0 is 0008224e. (queue head)
  kernel: eth0:  Tx descriptor 1 is 0008224e.
  kernel: eth0:  Tx descriptor 2 is 0008224e.
  kernel: eth0:  Tx descriptor 3 is 0008024e.

Assuming that on-the-fly loading of the new driver by Windows XP plus a warm reboot had put the LAN chip in an undefined state, I did a full power-off reboot. No help. Perhaps my Linux configuration had been damaged coincidentally. To check, I rebooted a third time into an old, still functional SUSE 10.0 partition. It now failed with the same NETDEV errors. Anxiously, I rebooted into Windows XP, which still worked fine.

I was now faced with convincing evidence that a driver running under Windows XP was leaving the LAN interface firmware or hardware in a condition that could not be (or was not) properly re-initialized by Linux — something I would have dismissed as impossible had someone asked me prior to today.

Wanting my Linux back up as soon as possible, I ran Windows XP System Restore and rolled back out of the driver update. Holding my breath, I rebooted into openSUSE and found myself back on the LAN. Here are the before and after messages sections edited for easy comparison: messages-watchdog.txt and messages-normal.txt.

So Windows XP System Restore (which has saved my butt more than once) is the hero of this incident, with the problem likely in the Linux driver or the Realtek chip itself. I found 30K web hits on the error message, with this thread being the best match. No authoritative solution has appeared even though Linux users have been experiencing this problem consistently since 2005. Probably because debugging it would be a tedious, thankless task.

Aug
05
2007

Subversion – Subdirectory Branch and Merge

While rereading the SVN Book, I noticed instructions for creating a branch on a subdirectory of the trunk. So I decided to create a branch on just the php subdirectory of a project:

svn copy http://localhost/svn/ll/trunk/php http://localhost/svn/ll/branches/phpMVC

The branch development was adopted, thus requiring merging the code back into the trunk. Overall, using a branch on just the subdirectory I was working on made the development and merging easier. Commits by others could be retrieved by an update without needing a merge, minimizing disruptions during coding.

I screwed up the merge as follows:

  • svn merge -r 11:15 http:../branches/phpMVC – based on svn log --stop-on-copy output that started at 15 and ended at 11.  But other files in the branch had been modified up to revision 20, so from now on I will use the form svn merge -r 11:HEAD.
  • I kept issuing the svn merge and svn switch commands in the root directory instead of the php subdirectory.  These missteps were reported by SVN and I recovered, but I did scare myself a few times.

Also, I modified a file in a css subdirectory which I could not commit to the branch because it was not within the scope of the branch. Next time I will create the branch on the entire project and then svn switch only the subdirectory I’m working on. This leaves the option of switching other subdirectories if the work expands.
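The lesson from the first bullet (take the start revision from the branch’s creation and always end at HEAD) can be sketched in Python. It parses text produced by svn log --stop-on-copy on the branch, takes the oldest revision listed (the copy that created the branch), and builds the merge arguments. Both helper names are hypothetical, and the sketch assumes the default svn log header lines of the form "r15 | user | date | N lines"; it only builds the command, which would then be run inside the trunk’s php subdirectory.

```python
import re

# Header line that svn log prints for each revision, e.g. "r15 | user | ... | 1 line"
REV_LINE = re.compile(r"^r(\d+) \|", re.MULTILINE)


def branch_copy_revision(stop_on_copy_log):
    """Oldest revision in `svn log --stop-on-copy` output for a branch."""
    revs = [int(r) for r in REV_LINE.findall(stop_on_copy_log)]
    if not revs:
        raise ValueError("no revisions found in log text")
    return min(revs)


def merge_command(branch_url, copy_rev):
    """Arguments for merging every branch change since its creation."""
    return ["svn", "merge", "-r", f"{copy_rev}:HEAD", branch_url]


if __name__ == "__main__":
    sample_log = (
        "------------------------------------------------------------------------\n"
        "r15 | user | 2007-08-01 | 1 line\n\nlast branch commit\n"
        "------------------------------------------------------------------------\n"
        "r11 | user | 2007-07-20 | 1 line\n\ncreate phpMVC branch\n"
    )
    rev = branch_copy_revision(sample_log)
    print(merge_command("http://localhost/svn/ll/branches/phpMVC", rev))
```

Ending the range at HEAD rather than at the last revision seen in one log listing is what avoids missing later branch commits, such as the revision 20 changes mentioned above.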

Jul
12
2007

PHP Array Index Implicit Casts and uniqid()

In addition to performing implicit string-to-number casts for arithmetic expressions, PHP implicitly casts array keys in the same manner. Thus, even if one puts an integer in quotes as an array key, PHP will convert it back to an integer. However (as we learn from the response to Bug 21954) explicitly casting the index to a string works.

So even though PHP claims to offer associative and indexed arrays, the implicit casting will be busily converting associative keys to index keys behind the curtain. Things get particularly ugly with integers greater than 2^32 (above bug report plus Bug 34419). And notice the snippy resolution comment: “This hasn’t changed, will not change, and is not a bug.” So large integers that start off as strings are cast to integers, but being too large, are then converted to floats. That’s intuitive.

I was storing objects in an associative array, using the objects’ ids as the keys, and I was generating the ids with uniqid(). uniqid() generates 13-character strings that appear to be hexadecimal values formatted as a string. Most of these values will have an [a-f] character, but occasionally they contain only digits. In this unfortunate case, they are cast to long integers and then cast again to a float. Except that I serialized one such array and saw a negative integer listed as the array key! Whatever the exact mechanism, after unserialization I was unable to retrieve one object from an array, even using array_keys().

This is a PHP language design flaw, in my opinion. The default unique-id generator generates values that randomly map into two disjoint sets of array keys: associative string keys or numeric index keys. Given the schizophrenic array index casts, one would certainly expect uniqid() to consistently generate either strings or integers, not a mix of both. So at a cost of about 12 hours, I now know to use the optional prefix argument all the time.
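Why the prefix helps can be modeled in a few lines of Python. This is a sketch under stated assumptions, not PHP itself: php_array_key models the key rule of later 64-bit PHP versions (a string key that is a canonical decimal integer within the integer range becomes an integer key; anything else stays a string), and uniqid_like mimics the layout of uniqid() without more_entropy (prefix, then 8 hex digits of seconds, then 5 hex digits of microseconds). The 32-bit PHP 5 behavior described above (floats, a negative key after serialization) is not modeled. Both function names are hypothetical.

```python
import re

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
# Canonical decimal integer: "0", or an optional '-' then a non-zero digit and more digits.
CANONICAL_INT = re.compile(r"(0|-?[1-9][0-9]*)")


def php_array_key(key):
    """Model of PHP's string-key normalization (assumed 64-bit rule)."""
    if CANONICAL_INT.fullmatch(key):
        value = int(key)
        if INT64_MIN <= value <= INT64_MAX:
            return value  # becomes an integer index key
    return key  # stays an associative string key


def uniqid_like(seconds, microseconds, prefix=""):
    """Mimic uniqid()'s layout: prefix + %08x seconds + %05x microseconds."""
    return f"{prefix}{seconds:08x}{microseconds:05x}"


if __name__ == "__main__":
    unlucky = uniqid_like(0x46948123, 0x45678)           # all hex digits are 0-9
    print(repr(php_array_key(unlucky)))                  # integer key
    print(repr(php_array_key(uniqid_like(0x46948123, 0x45678, prefix="id"))))  # string key
```

With a non-numeric prefix the generated id can never be a canonical integer, so every key stays in the associative (string) set.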

Such problems are avoided in Perl because arrays are declared to be associative or numeric, so keys aren’t being cast behind one’s back. PHP is younger than Perl, which I judge to be largely free from these types of asymmetries. And I didn’t look through all of the remaining PHP bug reports after finding the info I needed: since two years have elapsed, maybe PHP has improved this by now.

Jul
01
2007

Generating Website Snapshots (Thumbnails)

How can I capture a snapshot of a web page and save it as a JPEG or PNG file? Today I briefly looked into this question.

On-line Snapshot Generators

Two classes of snapshot generators are available on the web. BrowserCam and NetRenderer support web site developers who need to verify their web designs on multiple platforms and browser versions. These snapshots are full-size and present exactly what would appear on a user’s screen. NetRenderer describes their solution as follows: “we use a proprietary C# application to control parallel rendering and to generate the virtual screenshot images.” BrowserCam allows you to connect to their rendering machine using VNC.

The second class of web-available snapshot generators offer their service to website authors as a way of enriching their site’s graphics. These are typically thumbnail images, in terms of size, detail, and function. This class of generator is more in line with my possible use for snapshots.

WordPress offers Snap’s snapshot generator on all links within these blogs (to go to their site, hover on any link then click on the SnapShot logo in the lower-right footer). New snapshots are generated relatively quickly (i.e. minutes, not hours).

A similar offering comes from ShrinkTheWeb. Let’s try it on the New York Public Library web site:

ShrinkTheWeb website thumbnail

Although NYPL might be cached, I tried it on an unlisted site and the thumbnail appeared within 10 seconds. The engine is usually available and offers parameters to control the size and quality of the snapshot. The image is directly embeddable in my web page, unlike Snap.

If you compare the actual web site to the various thumbnail engines, notice that Java, Flash, and JavaScript generated graphics may not appear. I would expect that BrowserCam would be most faithful in this regard, since one actually connects to a real browser.

Currently I can’t easily gain possession of the image file. I could capture the image with wget or curl, but that’s not the same as directly generating the image. And Snap displays the image only in a pop-up.

Command Line Snapshot Generation

Ideally, I’d like to replicate snapshot functionality on a local workstation. In Linux, I would hope to generate a thumbnail by providing a URL and a view-port size to Firefox or Konqueror, then requesting that it save the image to a file in my preferred format. Something like this:

   browser --background --size 800x600 --jpeg www.yahoo.com > yh.jpg

Alas, man pages and user documentation gave no evidence that Firefox, Opera, or Konqueror support a background mode of operation. Others have succeeded, however. In his Planet-PHP Website Thumbnails entry from 2005, J Eichorn describes using Mozilla in just such a fashion. Update: This functionality is available online at Bluga.net Webthumbs.

Things have evolved from 2005 — now I guess SeaMonkey is the suite offering. If one wants access to the rendering engine, the Mozilla web site suggests embedding the Gecko engine directly using the Toolkit API.

 