Aug 05, 2007

Subversion – Subdirectory Branch and Merge

While rereading the SVN Book, I noticed instructions for creating a branch on a subdirectory of the trunk. So I decided to create a branch on just the php subdirectory of a project:

svn copy http://localhost/svn/ll/trunk/php http://localhost/svn/ll/branches/phpMVC

The branch's changes were adopted, which required merging the code back into the trunk. Overall, branching just the subdirectory I was working on made development and merging easier. Commits by others elsewhere in the project could be retrieved with a plain update, without needing a merge, which minimized disruptions during coding.

I screwed up the merge as follows:

  • svn merge -r 11:15 http:../branches/phpMVC, based on svn log --stop-on-copy output that listed revisions from 15 down to 11. But other files in the branch had been modified up to revision 20, so those later changes were missed. From now on I will use the form svn merge -r 11:HEAD.
  • I kept issuing the svn merge and svn switch commands in the root directory instead of the php subdirectory. SVN reported these missteps and I recovered, but I did scare myself a few times.
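The fix for the first mistake is to start the merge range at the revision where the branch was created and to end it at HEAD, not at the newest revision that one file's log happens to show. Below is a minimal Python sketch of that bookkeeping. It assumes you have captured the XML output of svn log --stop-on-copy --xml run on the branch root, not on a single file. The helper names are hypothetical and the sample log is made up for illustration.

```python
import xml.etree.ElementTree as ET


def branch_creation_revision(log_xml: str) -> int:
    """Return the oldest revision in `svn log --stop-on-copy --xml` output.

    With --stop-on-copy the log stops at the copy that created the branch,
    so the smallest revision listed is the branch-creation revision.
    """
    root = ET.fromstring(log_xml)
    revisions = [int(entry.get("revision")) for entry in root.iter("logentry")]
    if not revisions:
        raise ValueError("log contains no entries")
    return min(revisions)


def merge_command(log_xml: str, branch_url: str) -> list:
    """Build the argument list that merges every branch change since creation."""
    start = branch_creation_revision(log_xml)
    # Ending at HEAD picks up changes to *any* file in the branch, not just
    # the revisions that appeared in one particular file's log.
    return ["svn", "merge", "-r", f"{start}:HEAD", branch_url]


# Hypothetical log of the branch root, newest entry first (as svn prints it).
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
<logentry revision="20"><msg>later fix</msg></logentry>
<logentry revision="15"><msg>controller work</msg></logentry>
<logentry revision="11"><msg>create phpMVC branch</msg></logentry>
</log>"""

print(" ".join(merge_command(SAMPLE_LOG, "http://localhost/svn/ll/branches/phpMVC")))
```

Run the resulting command from a working copy of the trunk's php subdirectory, not the project root, which also avoids the second mistake.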

Also, I modified a file in a css subdirectory which I could not commit to the branch because it was not within the scope of the branch. Next time I will create the branch on the entire project and then svn switch only the subdirectory I’m working on. This leaves the option of switching other subdirectories if the work expands.

Jul 12, 2007

PHP Array Index Implicit Casts and uniqid()

In addition to performing implicit string-to-number casts for arithmetic expressions, PHP implicitly casts array keys in the same manner. Thus, even if one puts an integer in quotes as an array key, PHP will convert it back to an integer. However (as we learn from the response to Bug 21954) explicitly casting the index to a string works.
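To make the key-casting rule concrete, here is a small Python model of how PHP normalizes string array keys. It follows the rule as the current PHP manual states it: a string that is a valid decimal integer (no leading zeros, no "+" sign) and fits in the platform's integer range is stored as an integer key. This is a sketch, not PHP itself. php_array_key is a hypothetical name, and the 2007-era PHP 5 handling of out-of-range values described below may differ from this model.

```python
import re

# A canonical decimal integer: "0", or an optional "-" followed by a nonzero
# digit and any further digits. "08", "+8", "-0", "1.0" and " 8" do not qualify.
_CANONICAL_INT = re.compile(r"-?[1-9][0-9]*|0")

PHP_INT_MAX_32 = 2**31 - 1  # PHP_INT_MAX on 32-bit builds
PHP_INT_MAX_64 = 2**63 - 1  # PHP_INT_MAX on 64-bit builds


def php_array_key(key: str, int_max: int = PHP_INT_MAX_64):
    """Return the key PHP would store for string `key`: an int, or `key` unchanged."""
    if _CANONICAL_INT.fullmatch(key):
        value = int(key)
        # Under the current rule, out-of-range numeric strings stay strings.
        if -int_max - 1 <= value <= int_max:
            return value
    return key


for k in ["8", "08", "+8", "4f2a1b3c5d6e7", "4695123456789"]:
    print(f"{k!r}: 64-bit {php_array_key(k)!r}, 32-bit {php_array_key(k, PHP_INT_MAX_32)!r}")
```

The model does not reproduce the float conversions and negative keys described below, which presumably reflect the PHP versions of the time.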

So even though PHP claims to offer associative and indexed arrays, the implicit casting is busily converting associative keys to index keys behind the curtain. Things get particularly ugly with integers beyond the platform's integer range, which is above 2^31 - 1 on 32-bit builds (see the bug report above plus Bug 34419). And notice the snippy resolution comment: “This hasn’t changed, will not change, and is not a bug.” So large integers that start off as strings are cast to integers, but, being too large, are then converted to floats. That’s intuitive.

I was storing objects in an associative array, using the objects’ ids as the keys, and I was generating the ids with uniqid(). uniqid() generates 13-character strings that appear to be hexadecimal values rendered as strings. Most of these values contain an [a-f] character, but occasionally they contain only digits. In that unfortunate case, they are cast to long integers and then cast again to floats. Yet when I serialized one such array, I saw a negative integer listed as the array key! Whatever the exact mechanism, after unserialization I was unable to retrieve one object from the array, even using array_keys().

This is a PHP language design flaw, in my opinion. The default unique-id generator produces values that randomly fall into two disjoint sets of array keys: associative string keys or numeric index keys. Given the schizophrenic array index casts, one would certainly expect uniqid() to consistently generate either strings or integers, not a random mix of both. So, at a cost of about 12 hours, I now know to use the optional prefix argument all the time.
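Only some uniqid() values are affected because of how the id is laid out. As far as I can tell from PHP's C source, uniqid() prints the optional prefix followed by the current time as 8 hex digits of seconds and 5 hex digits of microseconds. The Python sketch below imitates that layout to show that an id is all digits only when every hex digit happens to fall in 0-9, and that a prefix containing a letter, such as the hypothetical "obj_", rules this out. fake_uniqid and becomes_int_key are illustrative stand-ins, not PHP functions, and the digit test ignores signs and the integer-range check.

```python
import re
import time
from typing import Optional

# Simplified PHP numeric-key test: digits only, no leading zero
# (negative numbers and the integer-range check are ignored here).
_DECIMAL_INT = re.compile(r"[1-9][0-9]*|0")


def fake_uniqid(prefix: str = "", sec: Optional[int] = None, usec: Optional[int] = None) -> str:
    """Imitate uniqid()'s layout: prefix + 8 hex digits of seconds + 5 of microseconds."""
    if sec is None or usec is None:
        sec, usec = divmod(time.time_ns() // 1000, 1_000_000)
    return f"{prefix}{sec:08x}{usec:05x}"


def becomes_int_key(key: str) -> bool:
    """True if the key consists only of decimal digits with no leading zero."""
    return _DECIMAL_INT.fullmatch(key) is not None


for key in (fake_uniqid(), fake_uniqid("obj_"),
            fake_uniqid(sec=0x46951234, usec=0x56789)):
    print(key, "-> integer key" if becomes_int_key(key) else "-> string key")
```

With a letter in the prefix, every generated id stays a string key, whatever hex digits the clock happens to produce.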

Such problems are avoided in Perl because arrays are declared to be associative or numeric, so keys aren’t being cast behind one’s back. PHP is younger than Perl, which I judge to be largely free from these types of asymmetries. And I didn’t look through all of the remaining PHP bug reports after finding the info I needed: since two years have elapsed, maybe PHP has improved this by now.

Jul 01, 2007

Generating Website Snapshots (Thumbnails)

How can I capture a snapshot of a web page and save it as a JPEG or PNG file? Today I briefly looked into this question.

On-line Snapshot Generators

Two classes of snapshot generators are available on the web. BrowserCam and NetRenderer support web site developers who need to verify their web designs on multiple platforms and browser versions. These snapshots are full-size and present exactly what would appear on a user’s screen. NetRenderer describes their solution as follows: “we use a proprietary C# application to control parallel rendering and to generate the virtual screenshot images.” BrowserCam allows you to connect to their rendering machine using VNC.

The second class of web-available snapshot generators offer their service to website authors as a way of enriching their site’s graphics. These are typically thumbnail images, in terms of size, detail, and function. This class of generator is more in line with my possible use for snapshots.

WordPress offers Snap’s snapshot generator on all links within these blogs (to go to their site, hover on any link then click on the SnapShot logo in the lower-right footer). New snapshots are generated relatively quickly (i.e. minutes, not hours).

A similar offering comes from ShrinkTheWeb. Let’s try it on the New York Public Library web site:

[Image: ShrinkTheWeb website thumbnail of the NYPL web site]

Although NYPL might be cached, I tried it on an unlisted site and the thumbnail appeared within 10 seconds. The engine is usually available and offers parameters to control the size and quality of the snapshot. The image is directly embeddable in my web page, unlike Snap.

If you compare the actual web site with the thumbnails from the various engines, notice that Java-, Flash-, and JavaScript-generated graphics may not appear. I would expect BrowserCam to be the most faithful in this regard, since one actually connects to a real browser.

Currently I can’t easily obtain the image file itself. I could capture the image with wget or curl, but that’s not the same as generating the image directly. And Snap displays the image only in a pop-up.

Command Line Snapshot Generation

Ideally, I’d like to replicate snapshot functionality on a local workstation. In Linux, I would hope to generate a thumbnail by providing a URL and a view-port size to Firefox or Konqueror, then requesting that it save the image to a file in my preferred format. Something like this:

   browser --background --size 800x600 --jpeg www.yahoo.com > yh.jpg

Alas, man pages and user documentation gave no evidence that Firefox, Opera, or Konqueror support a background mode of operation. Others have succeeded, however. In his Planet-PHP Website Thumbnails entry from 2005, J Eichorn describes using Mozilla in just such a fashion. Update: This functionality is available online at Bluga.net Webthumbs.

Things have evolved since 2005; SeaMonkey now appears to be the Mozilla suite offering. For direct access to the rendering engine, the Mozilla web site suggests embedding the Gecko engine using the Toolkit API.

Jun 11, 2007

Canonical Web URLs using the Apache Rewrite Engine

With current browsers and typical ISP web hosting account http server setups, surfers can view a web site by typing either www.example.com or example.com into their browser’s URL field. Prevailing wisdom in the SEO community is that web administrators should redirect all visits to one or the other. They suggest that if your site’s visitors split half & half in their choice of URL, Google and other page rankers may see the two variants as separate sites and divide your page visits accordingly (or even drop one set of pages as duplicates). Matt Cutts from Google suggests canonicalization as a good idea, but doesn’t comment on any problems if you don’t. It seems to me that converting all visitors to a canonical URL has no downside and many benefits, so I decided to do it.

Using Apache’s Rewrite Engine to Force the use of one Server Name

We can consolidate all visitors to a single, canonical form of the server name portion of our URL using the Apache rewrite engine. The Apache documentation’s Rewriting Guide presents example code related to this case:

   RewriteCond %{HTTP_HOST}   !^www\.domain\.name [NC]
   RewriteCond %{HTTP_HOST}   !^$
   RewriteRule ^/(.*)         http://www.domain.name/$1 [L,R]

I tried this code on two different ISP accounts and could not get it to work. After checking all sorts of blind alleys, I realized that none of my other rewriting rules started with a slash, so I removed the slash from the ^/(.*) pattern, leaving ^(.*), and the rule started working. The manual’s example is written for the main server configuration. In a per-directory .htaccess file, mod_rewrite strips the directory prefix, including the leading slash, from the path before RewriteRule sees it, so a pattern that begins with a slash never matches there.

Although the example now worked, it required hand-editing for each new site. So I wrote the following rule set which can be inserted into any root .htaccess file:

   RewriteEngine On
   # Force top-level "domain.com" requests to "www.domain.com":
   RewriteCond %{HTTP_HOST}   !^www\. [NC]
   RewriteCond %{HTTP_HOST}   ^[^.]+\.(com|edu|net|org)$ [NC]
   RewriteRule (.*)           http://www.%{HTTP_HOST}/$1 [R=permanent,L]
   # NOTE: Other rules must follow the www rule:
   RewriteRule ^sitemap\.xml$ sitemap.php [NC]

The second rewrite condition restricts the rule to primary domain requests, preventing intranet, subdomain, and/or localhost accesses from being rewritten. I also set the force-redirect (R) flag to permanent (301). The last-rule (L) flag instructs the engine to stop rewriting and return the redirect at this point. The browser receives the 301 response and re-requests the page using the canonical form. Canonical page requests are then processed by the remaining rewriting rules, which should always follow the renaming rule.
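To check which hosts the www rule set redirects, here is a Python sketch that mirrors its two RewriteCond tests and the RewriteRule substitution. It assumes per-directory (.htaccess) context, where the path handed to RewriteRule has no leading slash. canonical_redirect is a hypothetical helper for reasoning about the patterns, not part of Apache, and query strings (which Apache carries over automatically here) are not modeled.

```python
import re
from typing import Optional

# [NC] in Apache means case-insensitive matching, hence re.IGNORECASE.
HAS_WWW = re.compile(r"^www\.", re.IGNORECASE)
BARE_DOMAIN = re.compile(r"^[^.]+\.(com|edu|net|org)$", re.IGNORECASE)


def canonical_redirect(http_host: str, path: str) -> Optional[str]:
    """Return the 301 target for a request, or None if no redirect applies.

    `path` is the per-directory path (no leading slash), as RewriteRule
    sees it in a root .htaccess file.
    """
    # RewriteCond %{HTTP_HOST} !^www\. [NC]
    if HAS_WWW.search(http_host):
        return None
    # RewriteCond %{HTTP_HOST} ^[^.]+\.(com|edu|net|org)$ [NC]
    if not BARE_DOMAIN.search(http_host):
        return None
    # RewriteRule (.*) http://www.%{HTTP_HOST}/$1 [R=permanent,L]
    return f"http://www.{http_host}/{path}"


for host in ["example.com", "www.example.com", "blog.example.com", "localhost"]:
    print(host, "->", canonical_redirect(host, "about/index.php"))
```

Subdomains such as blog.example.com and country-code domains such as example.co.uk fail the second condition and pass through unchanged, which matches the intended scope of the rule.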

To Prefix or Not to Prefix, That is the Question

My final rule set above chooses the “www” variant as the canonical form. However, the WWW is Deprecated folks advocate choosing the non-prefixed form — “use of the www subdomain is redundant and time consuming to communicate. The internet, media, and society are all better off without it.”

Excellent point, but small sites must follow the real-world trendsetters in matters like this. A quick survey shows that yahoo.com, google.com, and msn.com all choose “www” as their canonical URL format.

Jun 03, 2007

Web Hosting Notes, Requirements, Comparison

I now have 3 different Apache hosting environments: my openSUSE 10.2 workstation, GoDaddy, and A2Hosting. Signing up for a month is the only way to determine if a web application will work on an ISP’s service. Their hosting environments are all slightly different. Here are the issues I’ve encountered getting a PHP web application running on each system.

openSUSE 10.2

  • Apache2, PHP 5.2, mod_php, mod_rewrite.
  • PHP magic_quotes_gpc is set to off in openSUSE (the proper choice, but not the PHP default as I painfully learned when installing at the ISPs).
  • PEAR supported in the Apache PHP include path.
  • Subdirectories written into by the application (e.g. Smarty template compilation area) need world-write privilege.

GoDaddy Hosting

  • Apache 1.3, PHP 4 (5.1.4 optional), CGI/FastCGI, mod_rewrite.
  • Cannot use php_flag because PHP runs as CGI; must use a php5.ini file instead.
  • Switch from PHP 4 to 5 in .htaccess:
    AddHandler x-httpd-php5 .php
    AddHandler x-httpd-php .php4
  • PHP magic_quotes_gpc on by default, turn off with local php5.ini file.
  • AllowOverride Options is disabled, so Options directives can’t be used in .htaccess.
  • Files created by web site users (i.e. by the httpd user) have the same uid/gid as my ftp login user. An excellent configuration approach, as I don’t need to give world write privileges to local data subdirectories.
  • ini_set('session.cache_limiter', 'private') causes a server 500 error.

A2 Hosting

  • Apache 1.x (server_signature empty), PHP 5.2, mod_php, mod_rewrite.
  • PHP magic_quotes_gpc is on by default; turn it off using php_flag (the PHP manual page “Running PHP as an Apache module” describes how to set php.ini directives within .htaccess).
  • Subdirectories written to by the application need world-write privilege. My ftp area is a public_html directory, so A2 uses mod_userdir. The home directory looks similar to my local workstation.
  • ini_set('session.cache_limiter', 'private') works fine. Moved this setting to a php_value directive in .htaccess (php_flag only takes boolean values).
  • A2 mentions PEAR support on their web page, but they don’t add the PEAR directory to the PHP include path and only offer a handful of modules. They will add modules on customer request, but what will happen if they move my account to a different server?

Web Hosting Requirements

  • PHP Extensions: CURL, XMLWriter.
  • Apache Modules: rewrite.
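Since a trial account is the real test, a quick check against this list helps. The Python sketch below is a hypothetical helper, not something the hosts provide. It scans the output of php -m, which lists loaded extensions one per line under section headers such as [PHP Modules], and reports any required extensions that are missing. It assumes the extensions appear there under their usual names (curl, xmlwriter), compares names case-insensitively, and does not cover Apache modules.

```python
REQUIRED_EXTENSIONS = frozenset({"curl", "xmlwriter"})


def missing_extensions(php_m_output: str, required=REQUIRED_EXTENSIONS) -> set:
    """Return required PHP extensions absent from `php -m` output (case-insensitive)."""
    loaded = {
        line.strip().lower()
        for line in php_m_output.splitlines()
        # Skip blank lines and section headers like "[PHP Modules]".
        if line.strip() and not line.strip().startswith("[")
    }
    return {name for name in required if name.lower() not in loaded}


# Made-up sample output from a host that lacks XMLWriter.
SAMPLE = """[PHP Modules]
Core
curl
date
libxml
SimpleXML

[Zend Modules]
"""

print(sorted(missing_extensions(SAMPLE)))
```

Where shell access is available, the real listing can be captured with subprocess.run(["php", "-m"], capture_output=True, text=True).stdout and passed to missing_extensions.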
 