My first hours with PCLINUXOS

As I mentioned in my previous post, I spent a little time last night installing PCLINUXOS on my laptop. Other than my machine at work, most of my computer time is spent on this laptop, and dealing with Ubuntu Hardy Heron was starting to take its toll on me.

I installed the MiniMe version of the distro, the Live CD of which weighs in at just over 200MB. It’s meant to be just enough to get everything up and running so you can then, through APT or Synaptic, install just the software you want. This was one of the major appeals of this distro to me. The reason I gave up on Windows, the reason I doubt I’ll ever run OSX, and the reason I’m moving away from Ubuntu is that I don’t want so much extra crap built into the OS from the start. I want my operating system to operate my system; I’ll take care of the rest.

First Impressions

The first thing I noticed was that PCLINUXOS was able to start using the Broadcom wifi in my laptop right out of the gate. I opened up the network configuration tool and it asked me if I wanted to use the Windows driver via ndiswrapper for the card, since there is no Linux driver. I said yes and was done. In Ubuntu I had to use the wired ethernet to search Ubuntu’s forums for the right driver to use and instructions on how to set up ndiswrapper.
PCLINUXOS time: 30 seconds
Ubuntu time: 45 minutes

My graphics card installed and worked correctly right from the get-go. In Ubuntu I had to install the driver from Intel, which was a hassle, though the hassle was more Intel’s fault than Ubuntu’s.
PCLINUXOS time: 0
Ubuntu time: 20 minutes

Either MiniMe doesn’t come with klaptop or the install didn’t figure out that I was running a laptop, because the utility wasn’t installed. But since I already had a working ethernet connection, all I needed to do was install the utility via Synaptic and I was able to get all my power management working just fine. I had to tweak the sleep settings a bit to get the machine to return from being suspended properly, but I’ve had to do this with every Linux distro I’ve installed on it.
PCLINUXOS time: 2 minutes
Ubuntu time: 0

In the file /etc/acpi/events/sleep I changed:
action=/usr/sbin/pmsuspend memory
to
action=klaptop_acpi_helper --suspend

I then installed firefox3, thunderbird, openoffice, CD burning software, vlc, flash, and ksudoku. These packages represent about 99% of what I do on my laptop. Getting all the files downloaded and installed from Synaptic took about half an hour. I wouldn’t have had to install these packages in Ubuntu, but I would have spent a lot of time uninstalling others.
PCLINUXOS time: 30 minutes
Ubuntu time: 1 hour

I lost all ability to connect to my Windows desktop via Samba when I upgraded to Hardy Heron. From what I have gathered, this is a problem with the Nautilus file manager. I tried changing to Thunar, but I never could get things working. Ubuntu would hang and hang while trying to connect, then finally say it didn’t have an application to open an smb connection. In PCLINUXOS it connected right away. No hangs, no problems.
PCLINUXOS time: 0
Ubuntu time: weeks without success

Still to Do

I need to get Photoshop running, but I’m confident that the instructions I posted previously on the subject will work fine in PCLINUXOS. I also need to get the media buttons on the front of my laptop to work. I haven’t even attempted that yet so I’m not sure what it will take. It’s a minor thing, but they come in handy for changing volume.

So far I’m very happy with this Distro.



PCLINUXOS

How Hardy Heron is making me feel.

I think I’ve about had it with Ubuntu Hardy Heron. It’s a sad day when my Windows box is more stable than my Linux laptop. I also made the mistake of upgrading my work machine from Gutsy Gibbon to HH last week and I’ve been regretting it ever since. Three times today alone I had Nautilus crash on me without the ability to restart it. When I’d kill the existing Nautilus process, a new one would start automatically and crash, and that kept happening until I rebooted the box. Reboots like that are unacceptable to me on a Linux machine.

I’ve been playing with Ubuntu since the Warty Warthog days and this is the first upgrade that has not been a significant improvement. That makes me want to not give up on Ubuntu completely and instead downgrade back to Gutsy until the issues get worked out. Then there’s the part of me that wants to jump on the PCLINUXOS bandwagon.

Tomorrow after work I’m going to install PCLINUXOS on my home laptop, and if things go well I’ll consider putting it on my work machine too. PCLOS has its roots in Mandrake (which itself started out based on Red Hat), but it’s a full-fledged distro in its own right now and it uses APT. I can work with anything so long as I have APT.

Thanks to Hayden Simons for the photo


Goodbye image servers

Saying goodbye to a departed image server

A week ago tomorrow IDX decommissioned the last of its image servers. Over the last two and a half months I migrated a little over 20 million images, about 480 gigabytes, from our servers to Amazon’s S3 service. Most of that time was spent just occasionally checking in on the migration scripts that I had written or rewriting our image acquisition scripts to work with S3. We download images from about 190 sources every night as we gather MLS data on behalf of our clients.

The best part of the whole image migration and overhaul is that image acquisition is now tied into our data balancer system. Each MLS in our system has a time stored in the database that is the earliest we can reliably download data from that source. Once we reach that time in the day, the MLS goes through a series of steps triggered by a cronjob that runs once a minute. First the data is downloaded from whatever source makes it available. This can be FTP, HTTP, SOAP, RETS, or even direct SQL connections. Next the data is parsed and made ready for insertion into our database. Once processing is done, the data is geocoded so that we can easily map all the properties.
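The scheduling described above can be sketched roughly like this. It is an illustration in Python, not the app’s real code: the function name `tick`, the field names, and the sample times are all made up, and whether the real balancer advances one step per cron run or runs the steps back to back isn’t stated, so one step per run is an assumption.

```python
from datetime import time

# Step order as described in the post.
STEPS = ("download", "parse", "geocode")

def tick(sources, now):
    """Simulate one cron run: advance every due source by a single step."""
    ran = []
    for src in sources:
        if now < src["earliest"]:
            continue  # too early to download reliably from this MLS
        if src["step"] >= len(STEPS):
            continue  # every step has already run for today
        ran.append((src["name"], STEPS[src["step"]]))
        src["step"] += 1
    return ran

# Hypothetical sources with made-up earliest download times.
sources = [
    {"name": "mls_a", "earliest": time(1, 30), "step": 0},
    {"name": "mls_b", "earliest": time(4, 0), "step": 0},
]
first = tick(sources, time(2, 0))  # only mls_a is due at 2:00
```

In a real setup the per-source state would live in the database rather than in a list of dicts, so that each one-minute run can pick up where the last one left off.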

This was where the process stopped. When the app was first written, image scripting was rushed as we were trying to meet our launch deadline. The image scripts were on different servers, so the data balancer couldn’t act on them directly. Instead each was launched as its own cronjob on one of the image servers. Every MLS is unique in the way we acquire images, and those methods are constantly changing, so each must have its own acquisition script. Now each of those scripts is defined in our database.

Once per minute a script runs on our EC2 server that looks for image ready flags in our data balancing system. When it finds one, it checks the database for the specific file that should be run to gather images. The script runs and then resets the image ready flag. As with data, our image sources are varied. In some cases we generate URLs based on a known syntax; in others we’re given URLs by the MLS. In these cases we don’t need to store anything. Often we get images from some FTP source or via RETS, and in one case we download binary data stored as BLOBs on a remote SQL server. Needless to say, it’s complex to get all these images from 190 disparate sources, so anything we can do to automate things better is good.
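A minimal sketch of that flag-driven dispatch, again in Python and with invented names (`run_ready_image_jobs`, `flags`, `scripts`). The post says the database stores which file to run for each MLS; here plain Python callables stand in for those files, and leaving the flag set when no script is registered is an assumption.

```python
def run_ready_image_jobs(flags, scripts):
    """Run the acquisition script for every MLS whose image ready flag is set.

    flags:   dict of MLS id -> bool (the image ready flag)
    scripts: dict of MLS id -> callable standing in for that MLS's script
    """
    ran = []
    for mls_id, ready in flags.items():
        if not ready:
            continue
        script = scripts.get(mls_id)
        if script is None:
            continue  # nothing registered; leave the flag set so it gets noticed
        script()
        flags[mls_id] = False  # reset the flag once the script has run
        ran.append(mls_id)
    return ran

# Hypothetical usage: two sources, only one of which is ready.
downloaded = []
flags = {"mls_a": True, "mls_b": False}
scripts = {
    "mls_a": lambda: downloaded.append("mls_a images"),
    "mls_b": lambda: downloaded.append("mls_b images"),
}
ran = run_ready_image_jobs(flags, scripts)
```

Because the flag is reset only after the script returns, a script that raises an error leaves its flag set and gets retried on the next run.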

My next project is building WSDL web services using NuSOAP. This is uncharted territory for me, so I’m sure I’ll have more to say on this subject later.



I spoke a bit too soon

This post is a bit late as it describes things that happened last Thursday, but it was a busy weekend and all my “posting to stuff” energy got sucked up by twitter.

I knew it was risky to make a self-congratulatory post about a feature that had just launched. All in all it was pretty successful and took a decent load off the server, but two bugs cropped up that were severe enough to force me to revert the code.

The first issue will be easy enough to fix. Our app has several features, including a property slideshow, that are called remotely via JavaScript includes and that also rely on the results class. Because the caching mechanism needs a PHP session ID to avoid having one user contaminate another’s search results, these tools stopped returning any properties. Luckily I wrote the constructor of the results class to have an all-purpose override array as one of its parameters. So all I need to do to fix this issue is generate a session ID for the JavaScript includes to pass through to the results class, and they should work again.

The second issue is going to take more work and creativity. When I wrote the results class I built out the featured properties function to be generic. My thinking was that any property listing that is owned by one of our clients is a featured property and thus belongs on the featured properties page. Our clients, however, seem to have disagreed. They’ve figured out that they can append search variables to their featured property URLs and do things like make featured properties pages that are only for million+ dollar homes, or just commercial listings, or… whatever they want. This is all fine and good when the featured property search is reperformed each time the page is called and is user agnostic, but not so much when my caching mechanism was in place. The caching mechanism treats all featured properties searches the same, effectively ignoring any search terms added on.
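One common way to avoid this kind of collision is to fold the search terms into the cache key, so that two featured searches with different terms never share an entry. The Python sketch below shows the idea only; it is not the fix that was actually shipped, and `cache_key` and its parameters are hypothetical names.

```python
import hashlib
from urllib.parse import urlencode

def cache_key(search_type, params, session_id=None):
    """Build a cache key that changes whenever the search terms change.

    params is a dict of search variables taken from the URL. Sorting the
    items makes the key independent of the order the variables appear in.
    session_id is optional because featured searches are user agnostic.
    """
    raw = search_type + "?" + urlencode(sorted(params.items()))
    if session_id is not None:
        raw += "#" + session_id
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

# Hypothetical featured searches: one plain, one narrowed by price.
plain = cache_key("featured", {})
pricey = cache_key("featured", {"min_price": "1000000"})
```

With a key like this, the plain featured page and the million+ dollar page are cached as separate entries, while the same page requested twice still hits the cache.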

I reverted to the previous version of the results class from our SVN repository and all went back to normal. Once I tie up these last couple of loose ends I’ll be able to push the caching mechanism back out as part of Wednesday’s doubledot release. Here’s hoping it goes better this time. The caching mechanism did seem to take a noticeable load off the server, so it seems like a worthy endeavor to retry.



Fun with caching

In the last couple of days I did some work to complicate the IDX application a bit. I applied the patch today that contained the changes and so far all seems well. Here’s the story.

About nine months ago I completed a reworking (aka complete rewrite from the ground up) of the application’s results class. This is the code that assembles all the properties that meet the criteria of the search that has been performed and makes them available for whatever needs to be done with them. Once all the various data tables had been queried, the matching results were placed in a temporary heap table so that they could be sorted, filtered (based on client preferences and/or MLS rules), and truncated if need be. I decided to use temporary heap tables because they’re fast, and since they’re session specific I knew that I wouldn’t have to worry about one user contaminating another’s results.

The system has been working beautifully for these last nine months, but as our traffic has grown (now upwards of 44,000 hits a day) MySQL was having trouble keeping up. All the heap tables were using a lot of the server’s RAM, and since the heap tables were destroyed as soon as the page was delivered, searches had to be rerun completely just to move from page to page.

Today’s patch changed things. The heap tables are gone in favor of a searchCache table (one for each client in our system) where all search results end up. When the same search is run again (like when switching pages) the results can be pulled from the cache instead of all the data tables needing to be queried again. All results are tagged with the user’s PHP session ID to prevent result contamination, and every 4 hours the cache is cleaned to prevent the tables from getting too large. Featured property searches are also cached in our system and, because they are the slowest queries we perform*, they are cached for 24 hours until we get new data.
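The behaviour described above can be modelled in a few lines of Python. This is only a toy, in-memory stand-in for the per-client searchCache table in MySQL: the class name, the method names, and the choice to store featured results under a session ID of None are assumptions, and the post doesn’t say exactly how the cleanup decides which rows to drop.

```python
REGULAR_TTL = 4 * 3600    # seconds; matches the four-hour cleanup
FEATURED_TTL = 24 * 3600  # featured searches are kept for 24 hours

class SearchCache:
    """Toy model of a search result cache keyed by session and search."""

    def __init__(self):
        # (session_id, search_key) -> (stored_at, featured, results)
        self._rows = {}

    def put(self, session_id, search_key, results, now, featured=False):
        self._rows[(session_id, search_key)] = (now, featured, results)

    def get(self, session_id, search_key):
        """Return cached results, or None so the caller reruns the search."""
        row = self._rows.get((session_id, search_key))
        return None if row is None else row[2]

    def clean(self, now):
        """Drop rows that have outlived their time limit."""
        for key, (stored_at, featured, _) in list(self._rows.items()):
            ttl = FEATURED_TTL if featured else REGULAR_TTL
            if now - stored_at >= ttl:
                del self._rows[key]

# Hypothetical usage with timestamps in seconds.
cache = SearchCache()
cache.put("session-1", "3bed-2bath", ["listing 10", "listing 11"], now=0)
cache.put(None, "featured", ["listing 99"], now=0, featured=True)
page_two = cache.get("session-1", "3bed-2bath")  # served from the cache
```

Keying on the session ID is what keeps one visitor’s results from leaking into another’s: the same search under a different session is simply a cache miss.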

I’m pleased so far. The patch was uploaded to our server 8 hours ago and thus far there are no reports of problems.

Thanks to bob the lomond for the photo.

*Featured results are the slowest because of the number of tables that have to be queried. Normal results only have to query one table per MLS being searched because they are property type specific. Featured properties are property type independent, and thus upwards of nine tables per MLS may need to be queried.