PXE / Net booting Open Solaris 200805 release.

Introduction

After the disk in my laptop recently died I decided that it would be a good time to try and install the latest version of Open Solaris onto my laptop. Unfortunately the only option I have for installing new operating systems onto this laptop is via PXE boot, this has not been an issue in the past as I’ve been installing Debian / Ubuntu.

Solaris is fully capable of being network installed, in fact there are a multitude of tools out there that assist you in doing so. However, this is not true for the latest release of Open Solaris, support is planned but its currently not available. As a learning exercise I decided that I’d attempt to get this working on my laptop, turns out its entirely possible with a little bit of fiddling.

Note: This does not create a hands-off installer, it merely boots the livecd from the network. You can then install the system as you would normally from the gui installer.

Note 2: Part of this guide contains an ugly hack, its awful and may not work for you at all - you have been warned.

Requirements

You will need to the following to PXE / Net boot Open Solaris

  • Time
  • One copy of the Open Solaris CD Image.
  • One system already running Open Solaris (to manipulate the boot archive)
  • One PXE Boot Server (one part DHCP server, one part TFTP server and one part NFS server)
  • One PXE Boot Client (with sufficient hardware to run Open Solaris)
  • One network which both the client and server are connected to.

Setting up PXE Boot.

I’m going to assume that you know how to setup a PXE boot server, there are many documents regarding this for other operating systems which can be used to fill in the blanks here. Personally I used ubuntu Gutsy Gibbon as my PXE boot server.

The following steps explain which files need to be taken from the cdrom image and placed on your PXE boot server.

  • Mount the cdrom image somewhere on your system (I chose /mnt/cdromimage/)
  • From the cdrom image copy out the following files into your tftp directory (/tftpboot/ on my system)

# cp /mnt/cdromimage/boot/grub/pxegrub /tftpboot/boot/grub/
# cp /mnt/cdromimage/boot/x86.microroot /tftpboot/solaris/
# cp -r /mnt/cdromimage/platform/* /tftpboot/solaris/platform/

  • Note: you do not need all of the contents of of the /platform/ directory, however, its only 6mb in total its easier to include it all than remove what we don’t need.
  • Create the file /tftpboot/boot/grub/menu.lst with the following contents:

default=0
timeout=60
timeout=3
title Open Solaris PXE Install
kernel$ /solaris/platform/i86pc/kernel/$ISADIR/unix -m verbose -B install_server=[ip of your nfs server]:/export/opensolaris/
module /solaris/x86.microroot

  • When setting up your dhcpd boot entry for the Open Solaris client machine ensure that you include the following dhcp options (this is for the ubuntu dhcpd server, this is different under Solaris)

option grubmenu code 150 = text;

host laptop {
hardware ethernet 00:XX:00:XX:00:XX;
option grubmenu “/boot/grub/menu.lst”;
fixed-address 192.168.1.34;
option subnet-mask 255.255.255.0;
option routers 192.168.1.10;
next-server 192.168.1.20;
filename “/boot/grub/pxegrub”;
}

With your dhcpd and tftpd services up and running you should now be in a position to boot the client, this will get you to a point where the system has loaded the kernel + microroot. The boot will fail when we attempt to mount the cd rom.

Editing the miniroot

To complete the boot process we need to update the microroot and tell it to mount the cdrom from an nfs server instead of from a local cdrom drive. Now is the time to load up the Open Solaris machine you installed earlier, mount the open solaris iso and copy the x86.miniroot file somewhere handy (ready for editing).
The miniroot is a gziped ufs file system image, its relatively easy for us to open it up and manipulate as we please.

# mv x86.microroot x86.microroot.gz;gunzip x86.microroot.gz
# lofiadm -a /path/to/x86.microroot /dev/lofi/1
# mount /dev/lofi/1 /mnt/microroot

So now we have the microroot mounted under /mnt/microroot, lets make some changes. First of all there are three utilities I’ve added to the microroot to aid in net booting. These can be added in the following way:

# cp /usr/sbin/prtconf /mnt/microroot/usr/sbin/
# cp /usr/sbin/i86/prtconf /mnt/microroot/usr/sbin/i86/
# cp /usr/bin/cut /mnt/microroot/usr/bin/
# cp /usr/bin/sed /mnt/microroot/usr/bin

We now have all the utilities required on the microroot to mount the cd image over nfs, the next step is to update the script that mounts the cdrom image.

Note: Here is where the nasty hack is, as I couldn’t convince nwamd (network automagic daemon) or the default network service to actually bring up the network correctly I run perform a dhcp request in the script that mounts the cdrom image. Ideally we should bring networking online via the networking physical service (or nwamd if thats what its actually designed to be used for) and have the live-fs-root service depend on the networking. Anyway, as there is an official net boot utility in the pipeline I hope this will suffice until thats available.

On with the hacking, edit the /mnt/microroot/lib/svc/method/live-fs-root startup script. Find the following line:

echo “\rPreparing live image for use” >/dev/msglog

Just below this line add in our few lines to grab the install_server value, send out a dhcp request followed by mounting the network file system.

# grab the install_server setting from boot
install_server=`/usr/sbin/prtconf -v /devices|/usr/bin/sed -n ‘/install_server/{;n;p;}’|/usr/bin/cut -f 2 -d\’`
# Awful hack - configure dhcp for all the plumbed interfaces
/sbin/ifconfig -a dhcp
# now mount the install server onto /.cdrom
/sbin/mount $install_server /.cdrom

The above few lines do not perform any sanity checks, if any of the commands fail you’ll just get dumped to a single user login prompt.

Now that we are finished editing the micro root its time to unmount and re-compress it before dumping it back into the tftp directory on your PXE boot server.

# umount /mnt/microroot
# lofiadm -d /dev/lofi/1
# gzip x86.microroot;mv x86.microroot.gz x86.microroot

Finally..

The last thing to do is to export the contents of the cd image over nfs, I chose to export it from /export/opensolaris/. It may help you to know that Solaris has issues mounting nfs v4 from Linux servers (certainly from ubuntu gutsy release), I had to disable nfs v4 services under linux to mount the nfs file system from my Solaris system.

If everything has gone smoothly you should now be in a position to boot your client machine, which once the pxe boot has completed should run just as it would had it been booted from cdrom.

If you get dumped to a single user login prompt its safe to assume something has gone wrong, I’d recommend logging in (root pw: opensolaris) and verifying that each of the steps we take in the mount cdrom script works correctly.

code
hacks
networking
debug
Solaris
Open Solaris
PXE
Net Boot

Comments (2)

Permalink

ZFS Compression, when du and ls appear to disagree.

I have recently been migrated to a new ZFS based home directory at work, I’ve got hourly snapshots of my data and (as I’ve just found it) automatic on disk compression is enabled. Recently, when dealing with a corrupt .tar archive the on disk compression managed to suitably confuse me - enough to write this blog entry.

After saving my .tar file into my home directory I decided to verify the size on disk with the size on the ftp, the tar file on the ftp was roughly 20mb.

# du -hs download.tar
6.2M download.tar

That didn’t look right so I verified the file size with ls

# ls -ahl download.tar
-rw-r–r– 1 iain staff 20M May 9 17:41 download.tar

So, ls thinks my file is 20mb in size and du thinks its only 6mb, turns out that this is the expected behavior when you have compression enabled on the file system.
Both du and ls get their information about a file by performing the stat() system call against the file, /bin/stat under Solaris Nevada shows us the information returned from the stat() system call.

#/bin/stat download.tar
File: `download.tar’
Size: 20616912 Blocks: 12769 IO Block: 8192 regular file
Device: 5102d9ah/84946330d Inode: 76465 Links: 1
Access: (0644/-rw-r–r–) Uid: (50000/iain) Gid: ( 10/ staff)
Access: 2008-05-09 17:53:17.653543700 +0100
Modify: 2008-05-09 17:41:18.136951400 +0100
Change: 2008-05-09 17:41:18.136951400 +0100

The file attributes returned contain the files size in bytes, this is what ls returns to us when performing a directory listing. However, the same is not true for du. Based on the source of du.c one can see that du calculates the file size on disk by multiplying the number of blocks the file takes on disk up by 512.

12769 * 512 = 6537728 Bytes or 6.23Mb

Although there is a clear explanation for this behavior I think that it may cause some confusion. I am sure that many people rely on du returning the actual size of the file, this could cause problems if you are preparing to copy the files to cdrom or to any other file system that does not perform compression. It is certainly something that anyone administrating ZFS file systems should be aware of.

debug
Solaris
ZFS

Comments (0)

Permalink

Referential Integrity and Django.

Since late 2007 I’ve been working on a project involving Django, initially we started developing the software using 0.96 stable release, however recently made the decision to move to the SVN release. One of the main reasons behind this was the availability of new features and of course plenty of bug fixes.

During some initial testing of our Django app I noticed that I could break referential integrity right down at the database level with the sql tables that were created. There are two reasons for this:

  1. MySQL by default uses the MyISAM storage engine for its tables, this lacks support for transactions, row-level locking and foreign keys.
  2. Django bug 6374 - Foreign key constraints not added across apps when creating tables.

Fixing point number one was fairly straight forward, you can simply reconfigure MySQL to use InnoDB as the default table type which has support for foreign keys among other things. Realising that we were hitting a bug in Django took a little bit longer, so how do you know if you’re hitting the bug (it’s not explained very well in the ticket).

Imagine the following simple model (please excuse the formatting).

from django.contrib.auth.models import User

Class Item(models.Model):
id = models.AutoField(primary_key=True)
title = models.CharField()
desc = models.TextField()
owner = modesl.ForeignKey(User)

When the SQL is generated for this model we would expect to see a foreign key constraint created, unfortunately Django was not creating one thus allow us to assign owners to items who do not exist in the owners table. The foreign key constraint should look something like this:

ALTER TABLE `item_item` ADD CONSTRAINT user_refs_id_4ce20cfc FOREIGN KEY (`owner`) REFERENCES `auth_user` (`id`);

If you’re running 0.96 then you could be running into this bug, its certainly something to look out for. If you are running the SVN release you’re in luck, as I’m pleased to say this bug was fixed in Changeset 7215 (Committed to the trunk 10th March 2008).

Speaking of changesets, 7477 is a big step forward, this merges the Query Set Refactor Branch with the SVN Trunk, if you’re still running 0.96 there are literally hundreds of reasons to move on up to SVN - though watch out for those backward incompatible changes.

code
python
Linux
debug
Django
MySQL

Comments (0)

Permalink

When good hard drives go bad - 2008 remix.

Its that time of year, I had the same issue in 2007. This time I’m pleased(?) to say that only my laptop hard disk has died, I’m also pleased to say that this time I had a backup. Fingers crossed this is the only disk (this year).

hardware

Comments (0)

Permalink

High cpu usage from eog / gnome-panel ? blame gtk’s recent-manager.

After re-organising a large number of photos using eog I started to notice spikes of cpu usage while moving from image to image. At first this was just inconvenient but as time went on (and as more photos were viewed) the cpu usage spikes became longer and longer. Is my Ubuntu install getting slower? it feels like I’m back in Windows, time to investigate.

Quick Investigation.

Under Linux we can use strace to follow all the system calls a particular process makes, its very high level as we won’t be able to trace any library code. However, it may give us some a clue as to where we should start looking.
Using the command ’strace -o eog.strace eog *.jpg’ we can run eog for a little while and then look through the output for something interesting. After browsing through the output I found the following was appearing after we move from one image to the next but before the next image was displayed.

close(15) = 0
munmap(0xb634f000, 4096) = 0
time(NULL) = 1206991162
time(NULL) = 1206991162
time(NULL) = 1206991162

[ several hundred calls to the time() system call]

time(NULL) = 1206991162
time(NULL) = 1206991163
time(NULL) = 1206991163
gettimeofday({1206991163, 403748}, NULL) = 0
gettimeofday({1206991163, 404023}, NULL) = 0
gettimeofday({1206991163, 404288}, NULL) = 0

We call time() several hundred times in what appears to be a tight loop? something certainly seems broken or at the very least not optimal. Looking through the strace output which traced loading eog, displaying one image, moving to a new image and then exiting we called time() 25267 times.

Preparing To Go Deeper

Usually I debug apps under Solaris which has the awesome power of dtrace along with bundled binaries which are not stripped, this makes it really easy to trace library calls and user land calls. By default Ubuntu packages are shipped with all the binaries stripped which has upset some people I know. The Ubuntu devs are clearly not stupid and provide some debugging aids, not just debug builds (which aren’t really *that* useful in a production environment) but by providing separate packages which include the debugging symbols.

One can add the debug symbol packages to the Gutsy release of Ubuntu by adding the following to their /etc/apt/sources.list file:

deb http://ddebs.ubuntu.com/ gutsy main universe
deb http://ddebs.ubuntu.com/ gutsy-updates main universe

Once that has been done the debugging symbols packages will show up with ‘-dbgsym’ appended to them in apt, for now I am just going to install the eog-dbgsym package. I’m also making sure that I have gdb installed, as that is what I plan to dig deeper with.

Digging Deeper.

Firstly, lets start up eog as we did before but this time attach gdb and set a break point for the time() call.

# eog *.jpg
!! note I know eog’s pid is 23700 !!
# gdb -p 23700
!! loading gdb output….
0xffffe410 in __kernel_vsyscall ()
(gdb) b time
Breakpoint 1 at 0xb73fde25
(gdb) c
Continuing.

!! move to a new image with eog !!

[Switching to Thread -1228670288 (LWP 23700)]

Breakpoint 1, 0xb73fde25 in time () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb73fde25 in time () from /lib/tls/i686/cmov/libc.so.6
#1 0xb74fae7b in g_bookmark_file_set_mime_type () from /usr/lib/libglib-2.0.so.0
#2 0xb7a34499 in gtk_recent_manager_add_full () from /usr/lib/libgtk-x11-2.0.so.0
#3 0×08067c42 in eog_window_display_image (window=0×8124118, image=0×827ece8) at eog-window.c:870
#4 0×080683df in eog_job_load_cb (job=0×82c6400, data=0×8124118) at eog-window.c:1286
#5 0xb75b8c09 in g_cclosure_marshal_VOID__VOID () from /usr/lib/libgobject-2.0.so.0
#6 0xb75ab772 in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0
#7 0xb75bc323 in ?? () from /usr/lib/libgobject-2.0.so.0
#8 0×082d5450 in ?? ()
#9 0×00000000 in ?? ()
(gdb)

I repeated this a few times to see if the back trace changed, it did not (much). We’re calling some code in libgtk, specifically gtk_recent_manager_add_full(). What is gtk recent manager? what are we bookmarking with ‘g_bookmark_file_set_mime_type()’, all very relevant questions I feel. To dig deeper lets install the gtk and glib debugging symbols packages and look at an improved stack trace (note: here we have passed the first few ‘time()’ break points and are now stuck in the loop which calls time() an excessive number of times.

Breakpoint 1, 0xb74b6e25 in time () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb74b6e25 in time () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7aecacf in build_recent_info (bookmarks=0×8128a88, info=0×82c1728) at /build/buildd/gtk+2.0-2.12.0/gtk/gtkrecentmanager.c:1677
#2 0xb7aece1c in IA__gtk_recent_manager_get_items (manager=0×81189b0) at /build/buildd/gtk+2.0-2.12.0/gtk/gtkrecentmanager.c:1311
#3 0×08063f2a in eog_window_update_recent_files_menu (window=0×8124118) at eog-window.c:3468
#4 0xb7671c09 in IA__g_cclosure_marshal_VOID__VOID (closure=0×8279148, return_value=0×0, n_param_values=1, param_values=0xbfb05fac, invocation_hint=0xbfb05ebc,
marshal_data=0×80641c0) at /build/buildd/glib2.0-2.14.1/gobject/gmarshal.c:77
#5 0xb7664772 in IA__g_closure_invoke (closure=0×8279148, return_value=0×0, n_param_values=1, param_values=0xbfb05fac, invocation_hint=0xbfb05ebc)
at /build/buildd/glib2.0-2.14.1/gobject/gclosure.c:490
#6 0xb7675323 in signal_emit_unlocked_R (node=0×81255d8, detail=0, instance=0×81189b0, emission_return=0×0, instance_and_params=0xbfb05fac)
at /build/buildd/glib2.0-2.14.1/gobject/gsignal.c:2440
#7 0xb7676847 in IA__g_signal_emit_valist (instance=0×81189b0, signal_id=110, detail=0, var_args=0xbfb061ec “��˷�Ǯ���˷(b��!ծ�\210\212\022\b�9+\b”)
at /build/buildd/glib2.0-2.14.1/gobject/gsignal.c:2199
#8 0xb7676a09 in IA__g_signal_emit (instance=0×81189b0, signal_id=110, detail=0) at /build/buildd/glib2.0-2.14.1/gobject/gsignal.c:2243
#9 0xb7aec7cc in gtk_recent_manager_changed (recent_manager=0×8298778) at /build/buildd/gtk+2.0-2.12.0/gtk/gtkrecentmanager.c:1380
#10 0xb7aed521 in IA__gtk_recent_manager_add_full (manager=0×81189b0, uri=0×82b39d8 “file:///home/iain/Photo-0178.jpg”, data=0×82a2ba0)
at /build/buildd/gtk+2.0-2.12.0/gtk/gtkrecentmanager.c:1001
#11 0×08067c42 in eog_window_display_image (window=0×8124118, image=0×827ed70) at eog-window.c:870
#12 0×080683df in eog_job_load_cb (job=0×84d9dc0, data=0×8124118) at eog-window.c:1286
#13 0xb7671c09 in IA__g_cclosure_marshal_VOID__VOID (closure=0×82b8430, return_value=0×0, n_param_values=1, param_values=0xbfb0656c, invocation_hint=0xbfb0647c,
marshal_data=0×80680c0) at /build/buildd/glib2.0-2.14.1/gobject/gmarshal.c:77
#14 0xb7664772 in IA__g_closure_invoke (closure=0×82b8430, return_value=0×0, n_param_values=1, param_values=0xbfb0656c, invocation_hint=0xbfb0647c)
at /build/buildd/glib2.0-2.14.1/gobject/gclosure.c:490
#15 0xb7675323 in signal_emit_unlocked_R (node=0×83234e0, detail=0, instance=0×84d9dc0, emission_return=0×0, instance_and_params=0xbfb0656c)
at /build/buildd/glib2.0-2.14.1/gobject/gsignal.c:2440
#16 0xb7676847 in IA__g_signal_emit_valist (instance=0×84d9dc0, signal_id=207, detail=0,
var_args=0xbfb067ac “\b7\v\b\b7\v\b
#17 0xb7676a09 in IA__g_signal_emit (instance=0×84d9dc0, signal_id=207, detail=0) at /build/buildd/glib2.0-2.14.1/gobject/gsignal.c:2243
#18 0×08084c97 in eog_job_finished (job=0×84d9dc0) at eog-jobs.c:120
#19 0×0808454e in notify_finished (job=0×84d9dc0) at eog-job-queue.c:65
#20 0xb75ca551 in g_idle_dispatch (source=0xb4726718, callback=0×1a, user_data=0×84d9dc0) at /build/buildd/glib2.0-2.14.1/glib/gmain.c:4132
#21 0xb75cc11c in IA__g_main_context_dispatch (context=0×80e0010) at /build/buildd/glib2.0-2.14.1/glib/gmain.c:2061
#22 0xb75cf55f in g_main_context_iterate (context=0×80e0010, block=1, dispatch=1, self=0×80b75f8) at /build/buildd/glib2.0-2.14.1/glib/gmain.c:2694
#23 0xb75cf909 in IA__g_main_loop_run (loop=0×832feb0) at /build/buildd/glib2.0-2.14.1/glib/gmain.c:2898
#24 0xb7a919e4 in IA__gtk_main () at /build/buildd/gtk+2.0-2.12.0/gtk/gtkmain.c:1144
#25 0×08060c31 in main (argc=5, argv=Cannot access memory at address 0×4
) at main.c:221

Still not much closer but now we have some source files, functions and line numbers to go and look at, perhaps the problem isn’t just with eog but with gtk instead?

Read The Source

From the above stack trace and my tracing of eog I can see that we’re spending a lot of time in build_recent_info() gtkrecentmanager.c:1090, which takes two arguments, a GBookmarkFile and GtkRecentInfo. After some hunting through the source I found that the GBookmarkFile eventually references back to the real file “~/.recently-used.xbel” this file appears to be an xml formatted file for managing “recently” used documents.

Part of the gtk_recent_manager_add_full() function is to add the recently viewed file/image/movie/song to this history file, in this case its an image. Once we’ve added it (which includes adding a time stamp) we then update the last viewed images which are displayed in the file menu of eog. Also (though I found this out some what later) we send a signal to gnome-panel to update the ‘places -> recent documents’ menu.

Once eog (or gnome-panel) receives this ‘update’ signal they run through every item stored in the .recently-used.xbel file, calling time() for each entry allowing us to find the last five items viewed in eog (or for gnome-panel its the last ten). On the surface this seems like a fine idea, though somewhat inefficient (why not just call time() the once for each run through the file, not per entry) making the current implementation slightly flawed.

The Fatal Problem

There is no default limit to the amount of entries we store in the .recently-used.xbel file the more images you view in eog (or as I later found out any gtk app which has been coded to use gtk-recent-manager) the more entries there will be, however, eog will only ever display the last five. This coupled with when running through the list to display the last five entries, time() is called against each entry - this is what leads to the gradual slowdown I described at the beginning of this post.

A single entry in the .recently-used.xbel file is 14 lines long, based on my file I currently have 1734 entries (and growing each day), thus when I move from one image to the next in eog time() is called at least 1734 times in eog followed by gnome-panel performing a second 1734 calls. Clearly this is not acceptable, something must be done - on a slow 1.3ghz laptop this leads to about 5 - 10 seconds of 100% cpu time purely to display the last five images view in eog.

Possible Solutions

  • Add a cron entry to delete the .recently-used.xbel file once a week

This is clearly a hack / workaround. It does not address the root cause, it also does not address calling time() multiple times without any need.

  • Set a default limit of 25 entries per application to be stored and only call time() once per run through of the .recently-used.

A much better solution, this would limit the size of the .recently-used.xbel file and make sorting it more efficient. gnome-panel only displays the last ten viewed documents, eog only displays the last five. Therefore holding the last 25 entries is still overkill but its at least manageable.

Conclusion

I’d love to fix this my self but I just don’t have the time to dedicate to it, hopefully this blog entry will aid people who have found the same issue and possibly help the gtk developers to improve their software.

GTK Devs please fix this!

code
Linux
hacks
gtk
debug

Comments (0)

Permalink

Which Jython?

After reading about Sun hiring some Jython developers I decided to install it myself under Debian testing to see what all the fuss is about. As a big Python fan a world without the GIL (Gloabl Interpreter Lock) of CPython sounds just lovely.

After installing it I had a little poke about, along the way I hit upon this lovely gem.

iain@hostname ~>file `which jython`
/usr/bin/jython: perl script text executable

Shocking results! luckily its just a Debian wrapper script but theres a certain amount of irony involved thats for sure.

code
python
jython

Comments (0)

Permalink

I can hardly hear you say what I should do

Every so often I look through a list of search terms people use and end up at this site, lots of them make sense and are relevant. However, there are a few which I cannot explain, most of them are questions which clearly need answering, lets see if I can help.

I can hardly hear you say what I should do

An interesting question, one without context. Perhaps its more of a problem statement than a question. In such situations it is important to find the root cause of the problem before you can work round it. There are many reasons why you may not be able to hear someone speak, for example:

  1. You may be in a noisy environment.
  2. You may be separated by a great distance.
  3. You may be communicating via phone with a sub quality line.
  4. You may be using an ‘internet phone’ when the internet is busy.
  5. You may be starting to lose you hearing.

When taking orders or commands from someone it is important that you understand what they say. Poor communication can often lead to problems, wasted time and in situations where safety is paramount, death. Before completing any order you should first confirm that you have fully understood it. To do this you will most likely need to respond in some form, based on the situations above I have listed some steps which you could take.

  1. If in a noisy environment try pointing at your ear (either one will do), look confused, shake your head and clearly mouth the words “I can’t hear you”.
  2. If separated by a great distance perhaps consider moving closer to whom you are attempting to communicate with. The solution to point one can also be attempted if moving closer is not possible.
  3. If on the phone you can shout loudly “I can’t hear you” or “you’re breaking up”, this is to be followed by hanging up and calling back. When calling back make sure to explain that you hung up because you couldn’t hear them properly.
  4. If using an ‘internet phone’ I might recommend waiting until the internet is less busy, the internet is often busiest on Sunday nights when a lot of middle aged people are surfing ebay and checking email.
  5. If you think you are losing your hearing I would recommend consulting your doctor.

I hope this has helped somewhat, there are of course many other situations too numerous to list here where you may not be able to hear someone clearly. Finally, asking msn live search or google what to do is probably not the best idea.

nontech
entropy

Comments (0)

Permalink

Attention! Do not look at adverts: The anti-advertisement adveritsment

Here is something you don’t see often, a parody warning sticker. The original warning sticker is stuck on nearly every metro door on the Paris metro, the parody version is harder to find. First of all we have the original warning sticker:

Original paris metro warning advert

The sticker reads:

“Attention!: Ne mets pas tes mains sur la porte: tu risques de te faire pincer tres fort”

And in English (translated badly by myself):

“Danger! Do not trap your hands on the door: you risk being crushed strongly”

Clearly, this little sticker warns against putting your hands in the door. Luckly for you the pink rabbit has injured his or her hands to prove the inherent danger that the doors provide.

Other than doors the Paris metro also has a lot of adverts, some clever people have parodied the above sticker and put the pink rabbit in another dangerous situation (one which we all constantly face):

Danger! Do not put your eyes on the adverts

(click to make it larger)

The caption reads

“Attention! ne mets pas tes yeux sur les pubs: tu risques de te faire manipuler tres fort”

Or translated into English:

“Danger! do not put your eyes on the adverts: you risk being strongly manipulated”

The only remaining question is if more anti-advertisement advertisements should be introduced…..

France
nontech

Comments (0)

Permalink

Redundant bonding of wireless and wired interfaces in Ubuntu.

Recently while attending a networking course which covered redundant multipathing for high availability systems I got to thinking - could this be applied to my wired / wireless network at home? The end result being that removing the wired network from my laptop by unplugging the cable would instantly fail over to my wireless connection. I’m sure many people are thinking “but gnome network manager already does that” well, not quite.

With redundant bonding or multi-pathing both interfaces are connected to the same network and put into a special group. These interfaces are constantly connected to the network and are ready to work, however only one of them is active at any one time. Should the active interface fail (such as a network cable being removed) the mac address, ip address and all other configuration is almost instantly assigned to one of the other available network interfaces. This allows you to continue working as if nothing had happened, songs playing from network shares will continue playing, instant messenger conversations will continue to work, downloads will not be interrupted… the list goes on.

It turns out that under Linux this works incredibly well, read on for details.

Requirements.

In terms of networking you must have your wired and wireless networks on the same physical network segment, in this post I am also assuming that your wireless network is setup and that you have knowledge of networking.
As for packages you will need to install the ifenslave package under ubuntu.

# apt-get install ifenslave-2.6

Step One.

Warning: if you follow the steps I’ve included here, reboot and anything goes wrong, I accept no responsibility - thanks.

Under Linux high availability and link aggregation is handled by the bonding module, this can be enabled by adding the following line to your /etc/modules configuration file.

bonding mode=1 miimon=100 downdelay=200 updelay=200

This will load the bonding module upon next boot, however, it can be loaded at any time using the modprobe command the options are as follows.

  • mode=1 - This enables the active backup mode, this will provide link level redundancy but it does not allow for any kind of link aggregation.
  • miimon=100 - This enables link level monitoring of the connection (the default value is 0 - which disables link monitoring). The value passed is the frequency in milliseconds that the link is checked. Link level monitoring only takes into account the physical connection, not if the network is correctly configured or not.
  • downdelay=200 - This is the delay in milliseconds before the link is marked as failed, it must be a multiple of miimon
  • updelay=200 - This is the delay in milliseconds before the link is marked as active, it must be a multiple of miimon

Step Two.

Next the bonding interface (bond0) must be configured, this can be done in the /etc/network/interfaces file an entry such as this needs to be added:

auto bond0
iface bond0 inet static
address 192.168.1.34
netmask 255.255.255.0
gateway 192.168.1.24
broadcast 192.168.1.255
post-up ifenslave bond0 eth0 eth1
post-up echo “eth1″ > /sys/class/net/bond0/bonding/primary
pre-down ifenslave -d bond0 eth0 eth1

The network address, netmask, gateway and broadcast should be configured for your network. This is the one single address that your machine will be known as, assuming that one of the connections to your laptop is available. You should be able to use dhcp to configure this interface, however I have chosen a static address.

  • ‘post-up ifenslave bond0 eth0 eth1′ - This line assigns my wired (eth1) and wireless (eth0) interfaces into my failover group bond0. You should replace these with the interfaces you want to use.
  • ‘post-up echo “eth1″ > /sys/class/net/bond0/bonding/primary’ - This line specifies a primary slave, that is if this interface is available it will always be used in preference to the others. In this case eth1 is my wired interface, this ensures that when I return to my desk and plug my network cable back in I will be using the faster ethernet network and not my slower wireless.
  • ‘pre-down ifenslave -d bond0 eth0 eth1′ - This line removes the eth0 and eth1 interfaces from the bond group when networking is stopped.

Step Three.

Now reboot for the settings to take effect. It is of course possible to do all of this without rebooting, one would simply perform the following steps:

  1. modprobe the bonding driver
  2. ifconfig the bonding interface
  3. add the default route for the bonding interface
  4. ifenslave the network devices
  5. set the primary bonding interface

To Test that everything is working simply ping a host on your local network then once its going unplug the wired interface it should fail over to the wireless interface without dropping any packets. You can then re-insert the network cable and instantly switch back to the wired interface.
Help, its not working.

I had a few problems getting this working, firstly ifenslave would assign the bonding interface the mac address of my wired interface. As part of my wireless network security I have mac filtering enabled, however, only the mac address of the wireless card was allowed. Adding the mac of the bonding interface to my mac filter list cleared up that problem.

My other problem involved gnome network applet, I believe that this process is called nm-monitor. This would attempt to reconfigure my wireless network or wired network whenever I removed the network cable to my laptop. This appeared to prevent the failover from working as quickly as it should have. killing off the nm-monitor fixed this little issue.

I’ve now been using this for roughly two weeks without any issues, being able to unplug my laptop and take it to another room of the house without having to even consider whats going to happen to my network has been a real step forward.

Linux
hardware
networking

Comments (2)

Permalink

When Good Hard Drives Go Bad

Picture the scene, you return to your system and casually attempt to create a new file on one of your disks. Permission Denied. What?

  1. Check you’re logged in as the correct user - check
  2. Check that the user has write permissions - check

Following this you start to get a sinking feeling, something is not right. In fact, something is very wrong but you don’t know what that is just yet. Although, the situation feels vaguely farmilar. Without recalling why you decide to check if the partition you’re attempting to write to has been mounted read only. It has, your mind races and you check /var/log/messages for the worst.

ide: failed opcode was: unknown
end_request: I/O error, dev hdd, sector 61866003
Buffer I/O error on device hdd1, logical block 30933001
hdd: dma_intr: status=0×51 { DriveReady SeekComplete Error }
hdd: dma_intr: error=0×40 { UncorrectableError }, LBAsec= 63963157, high=3, low=1460081, sector=65011733

And now you realise why this felt so familiar, its because the same thing has happened to one of your disks at the beginning of summer three years in a row. You really should have made backups by now.

I’m sorry to say that this is a true story, three different branded disks in different systems in different houses all in different conditions. The only common factor is that these disks all ran 24/7 and were used as a ‘media repository’. Having been in this situation before I would like to share my experiences of simple disk recovery under Linux.
Step One.

Unmount the faulty disk (stop reading this, do it now). Forget about reading from it, forget writing to it and whatever happens do not fsck the faulty disk. Running fsck against a disk with physical errors will only make the problem worse. I recommend that you disconnect the disk and store it somewhere safe.
Step Two.

Buy/locate/steal/borrow a replacement disk, as you’re going to need somewhere to copy your data to. You should then connect both disks, you new disk and the faulty disk to your system. If possible try to get them on separate IDE buses, I’ve had one disk fail which created IDE bus resets when reading from it. This had a negative effect on all disks on the same IDE bus (including the good disk I was copying to). Format your new disk and check that its working, do not store anything important on it just yet as it will be overwritten.

It should also be noted that if you have a disk or array which is already in-use but has enough free space available, then you can write an image of your damaged disk to this space. The disk image can then be fscked and mounted via a loop back mount. The exact details of this method are not covered here but it should be easy to adjust the methods described here.
Step 3.

Install GNU DD Rescue onto your system ( http://www.gnu.org/software/ddrescue/ddrescue.html ) the package name under Debian / ‘bunty (Ubuntu) is gddrescue. This is possibly one of the most useful data recovery tools I’ve used. It is dd designed with data recovery in mind, it will allow you to read directly from the disk ignoring any errors which may exist in the file system. The website lists its most important features but most importantly to us its very easy to use and fully automatic. Once you’ve read the manual page for dd rescue its time to start on recovering your data.
Step 4.

Make sure that both your old faulty disk and your new disk are unmounted, once that is done invoke dd rescue.

# ddrescue /dev/hdd1 /dev/hdc1 /logfile

In this example, /dev/hdd1 is my old faulty disk (the source) and /dev/hdc1 is my new disk (the target) the /logfile will create a log file which dd rescue can use to resume its recovery should you need to stop it for any reason. This will take some time so sit back and wait, the output from dd rescue is very clear. It will keep you informed as to how much data it has copied, how much is unreadable and how many errors have occoured.

For my most recent disk failure on a 250 Gigabyte drive ddrescue was able to read all but ~2 Megabytes of data. Which is really quite fantastic when you consider that with the faulty disk mounted read only I was unable to copy any files without producing hundreds of errors.
Step 5.

The moment of truth, dd rescue has read all it can read and now its time to fsck the new disk.

# fsck -C /dev/hdc1

This will run fsck with a handy progress bar, If you’re very lucky fsck will complete without complaining and you have successfully rescued all of your data. If you’re not so lucky (as I have been this time) you may have suffered some fairly nasty file system corruption, fsck will go through asking you if you would like to fix the errors. You should make a note of which files have gone and let fsck do its thing, there may be better ways to handle this, if there are I would love to hear them. If you have only suffered damage to the file system structure and your actual data files are okay then you will end up with a load of files in the lost+found directory. Simply put these are files which exist on disk but no longer have a file name (and other things) associated with them, so fsck will move them into the lost+found directory.
Step 6.

Thats it really, you should have most or all of your data back, unless you’ve been really unlucky. Now is the time to create a backup plan, do it now while the fear is still in you. Otherwise six months down the line you’ll be thinking to yourself, “nahh disk failure wont happen to me (again)”.
It should be noted that there are variations on this method, if anyone knows of any better ways to handle a failed disk please let me know, I’d love to know if there are any steps or tools I’ve missed out upon.

Linux
hardware

Comments (1)

Permalink