From agshew at gmail.com Mon Jun 1 10:39:42 2009
From: agshew at gmail.com (Andrew Shewmaker)
Date: Mon, 1 Jun 2009 11:39:42 -0600
Subject: [Warewulf] debugging provisiond
Message-ID:

I'm having an issue with the passwdfile module where it isn't putting
the contents of /etc/perceus/modules/passwdfile/all into /etc/passwd.

I've double checked that the module is activated, and the link to the
nodescript is in place. I've also made sure that /etc/passwd is
writeable. I've restarted the perceus service, and I've recreated my
hybrid image.

I've passed in the enable-debug=2 kernel parameter, but I don't see
anything useful on the node's console while it is booting.

What should I be looking at? I must be doing something dumb.

Thank you.

--
Andrew Shewmaker

From agshew at gmail.com Mon Jun 1 12:53:38 2009
From: agshew at gmail.com (Andrew Shewmaker)
Date: Mon, 1 Jun 2009 13:53:38 -0600
Subject: [Warewulf] debugging provisiond
In-Reply-To:
References:
Message-ID:

On Mon, Jun 1, 2009 at 11:39 AM, Andrew Shewmaker wrote:
> I'm having an issue with the passwdfile module where it isn't putting
> the contents of /etc/perceus/modules/passwdfile/all into /etc/passwd
...
> What should I be looking at? I must be doing something dumb.

Like having perceus-provisiond installed in the vnfs, but with its
files listed in the hybridize file. I didn't get symlinks for
provisiond, but I did for other stuff.

I think I have it now.

--
Andrew Shewmaker

From gmkurtzer at gmail.com Mon Jun 1 13:42:36 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Mon, 1 Jun 2009 13:42:36 -0700
Subject: [Warewulf] debugging provisiond
In-Reply-To:
References:
Message-ID: <571f1a060906011342h20988a0bo27e9cbef6664c04b@mail.gmail.com>

Hello Andrew,

Good catch... Now I wonder why!?

If you notice anything further please let us know. Otherwise, please
send the hybridized entries for provisiond so we can try and
replicate.

Thanks,
Greg

On Mon, Jun 1, 2009 at 12:53 PM, Andrew Shewmaker wrote:
> On Mon, Jun 1, 2009 at 11:39 AM, Andrew Shewmaker wrote:
>> I'm having an issue with the passwdfile module where it isn't putting
>> the contents of /etc/perceus/modules/passwdfile/all into /etc/passwd
> ...
>> What should I be looking at? I must be doing something dumb.
>
> Like having perceus-provisiond installed in the vnfs, but with its
> files listed in the hybridize file. I didn't get symlinks for
> provisiond, but I did for other stuff.
>
> I think I have it now.
>
> --
> Andrew Shewmaker
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From ranjeet.rai at orkash.com Fri Jun 5 02:28:30 2009
From: ranjeet.rai at orkash.com (ranjeet.rai at orkash.com)
Date: Fri, 05 Jun 2009 14:58:30 +0530
Subject: [Warewulf] Upgrading Perceus to 1.5.2
Message-ID: <4A28E53E.60905@orkash.com>

Hi,

I have configured CAOS NSA with Perceus 1.5.0 for my 2-node test
cluster. I am trying to upgrade perceus to its latest available
version, i.e. 1.5.2. I downloaded the tar.gz package and configured and
installed (./configure, make & make install).

Now, when I run the command #/etc/init.d/perceus restart/start/stop etc.
I get the message "you need to initialize Perceus before starting the
services !"

Please help me to troubleshoot this message and run my perceus again.

Thanks in advance.

Best regards,

Ranjeet Rai

From griznog at gmail.com Fri Jun 5 03:56:41 2009
From: griznog at gmail.com (John Hanks)
Date: Fri, 5 Jun 2009 06:56:41 -0400
Subject: [Warewulf] Upgrading Perceus to 1.5.2
In-Reply-To: <4A28E53E.60905@orkash.com>
References: <4A28E53E.60905@orkash.com>
Message-ID:

On Fri, Jun 5, 2009 at 5:28 AM, ranjeet.rai at orkash.com wrote:
> Hi,
>
> I have configured CAOS NSA with Perceus 1.5.0 for my 2-node test
> cluster. I am trying to upgrade perceus to its latest available
> version, i.e. 1.5.2. I downloaded the tar.gz package and configured and
> installed (./configure, make & make install).
>
> Now, when I run the command #/etc/init.d/perceus restart/start/stop etc.
> I get the message "you need to initialize Perceus before starting the
> services !"
>

Hmm, this implies it wasn't just me being stupid. I went through this a
few days ago on CentOS 5 and had the same issue. My impression was that
the make install step somehow foobar'd the --prefix and put new database
(and possibly other) paths into play. Rather than troubleshoot, I just
started over as mine was a throwaway test system. But this does imply to
me that there is a problem with the "make install" step in 1.5.2 and
where it thinks things should go.

You might find your old databases in the old location. I'd try poking
around /usr/local/var/lib/perceus and /var/lib/perceus (or related
paths) and see if you have multiple database locations. In my case I
thought I did that correctly and backed it up, but it turned out those
were empty (thus my opting for a do-over). I can't prove now if I backed
up the wrong thing or if "make install" overwrote my db files, so other
than "me too" I'm about as little help as you can get.

jbh

From astevens at infiscale.com Fri Jun 5 07:38:13 2009
From: astevens at infiscale.com (astevens at infiscale.com)
Date: Fri, 5 Jun 2009 14:38:13 +0000
Subject: [Warewulf] Upgrading Perceus to 1.5.2
Message-ID: <391821016-1244212717-cardhu_decombobulator_blackberry.rim.net-419156927-@bxe1251.bisx.prod.on.blackberry>

On Caos NSA, to update the system, all you have to do is...

smart update
smart upgrade

from your root prompt, or you can update the system via sidekick as
well. No need to mess with the rpm's or do anything else :)

------Original Message------
From: ranjeet.rai at orkash.com
Sender: warewulf-bounces at caoslinux.org
To: warewulf at caoslinux.org
ReplyTo: The Warewulf Cluster Toolkit
Subject: [Warewulf] Upgrading Perceus to 1.5.2
Sent: Jun 5, 2009 2:28 AM

Hi,

I have configured CAOS NSA with Perceus 1.5.0 for my 2-node test
cluster. I am trying to upgrade perceus to its latest available
version, i.e. 1.5.2. I downloaded the tar.gz package and configured and
installed (./configure, make & make install).

Now, when I run the command #/etc/init.d/perceus restart/start/stop etc.
I get the message "you need to initialize Perceus before starting the
services !"

Please help me to troubleshoot this message and run my perceus again.

Thanks in advance.

Best regards,

Ranjeet Rai

Sent via BlackBerry from T-Mobile

From ranjeet.rai at orkash.com Sun Jun 7 21:13:07 2009
From: ranjeet.rai at orkash.com (ranjeet.rai at orkash.com)
Date: Mon, 08 Jun 2009 09:43:07 +0530
Subject: [Warewulf] Warewulf Digest, Vol 54, Issue 3
In-Reply-To:
References:
Message-ID: <4A2C8FD3.4020305@orkash.com>

Hi,

I tried upgrading via "smart update/upgrade", but it resolves to the
package perceus 1.5.1. I feel the latest 1.5.2 is not updated in the
image list or there may be some other issues.

Please help me to achieve this.

Best regards,

Ranjeet Rai

From ranjeet.rai at orkash.com Sun Jun 7 23:16:34 2009
From: ranjeet.rai at orkash.com (ranjeet.rai at orkash.com)
Date: Mon, 08 Jun 2009 11:46:34 +0530
Subject: [Warewulf] Testing application for HPC
In-Reply-To:
References:
Message-ID: <4A2CACC2.6030601@orkash.com>

Hi,

I have created a three node cluster with CAOS NSA and perceus 1.5.1. I
want to test this cluster by running some other application on these
parallel nodes. Please help me to test this HPC setup. I want to run
this test to visualize the working of our cluster environment with any
freely/openly available application.

Any support and guidance in this respect will be highly appreciated.

Best regards,

Ranjeet Rai
Orkash Services Pvt Ltd
Mob: +91 9810499844
Tel: +91 124 2345773
www.orkash.com

... ensuring Assurance in uncertainty and complexity

This message including the attachments, if any, is a confidential
business communication. If you are not the intended recipient it may be
unlawful for you to read, copy, distribute, disclose or otherwise use
the information in this e-mail. If you have received it in error or are
not the intended recipient, please destroy it and notify the sender
immediately.
Thank you

From ranjeet.rai at orkash.com Tue Jun 9 23:15:49 2009
From: ranjeet.rai at orkash.com (ranjeet.rai at orkash.com)
Date: Wed, 10 Jun 2009 11:45:49 +0530
Subject: [Warewulf] HPC testing with JAVA application
In-Reply-To:
References:
Message-ID: <4A2F4F95.3010207@orkash.com>

Hi,

I have created a 2 node cluster using perceus under Caos NSA. I want to
run a Java-based application in combination with the JPVM package. This
environment also requires a JDK on the cluster nodes. How can I
configure the JDK for the cluster nodes?

Also, guidelines on testing the performance of this cluster will be
appreciated.

Best regards,

Ranjeet Rai
Orkash Services Pvt Ltd
Mob: +91 9810499844
Tel: +91 124 2345773
www.orkash.com

... ensuring Assurance in uncertainty and complexity

From jsquyres at cisco.com Thu Jun 11 19:06:44 2009
From: jsquyres at cisco.com (Jeff Squyres)
Date: Thu, 11 Jun 2009 19:06:44 -0700
Subject: [Warewulf] Perceus 1.5.2 build problem on RHEL4u4
Message-ID: <7D1819AA-9FD1-4BE5-B728-6FE347214B86@cisco.com>

I was trying to upgrade from Perceus 1.3.4 (gasp!) to 1.5.2 today. My
cluster head node is RHEL4u4 (unfortunately, can't change it :-( ). The
perceus kernel failed to build -- here's a snippet of the failure (full
config.log, config.out, and make.out attached):

...
  LD      arch/x86/lib/built-in.o
  AR      arch/x86/lib/lib.a
  LD      vmlinux.o
  MODPOST vmlinux.o
  GEN     .version
  CHK     include/linux/compile.h
  UPD     include/linux/compile.h
  CC      init/version.o
  LD      init/built-in.o
  LD      vmlinux
mm/built-in.o(.text+0x6f16): In function `test_clear_page_writeback':
: undefined reference to `____ilog2_NaN'
make[2]: *** [vmlinux] Error 1
make[2]: Leaving directory `/home/jsquyres/perceus/perceus-1.5.2/3rd_party/_work/kernel/linux-2.6.28'
...
make[2]: Leaving directory `/home/jsquyres/perceus/perceus-1.5.2/3rd_party/_work/kernel/linux-2.6.28'
cp _work/kernel/linux-2.6.28/arch/x86_64/boot/bzImage ../
cp: cannot stat `_work/kernel/linux-2.6.28/arch/x86_64/boot/bzImage': No such file or directory
make[1]: *** [kernel] Error 1
make[1]: Leaving directory `/home/jsquyres/perceus/perceus-1.5.2/3rd_party'
make: *** [all-recursive] Error 1

Any ideas? Is Perceus 1.5.2 supported on RHEL4u4?

Thanks!

--
Jeff Squyres
Cisco Systems
-------------- next part --------------
A non-text attachment was scrubbed...
Name: perceus-1.5.2.tar.bz2
Type: application/x-bzip2
Size: 26197 bytes
Desc: not available
Url : http://altruistic.infiscale.org/pipermail/perceus/attachments/20090611/62aec7c4/attachment.bz2

From gmkurtzer at gmail.com Thu Jun 11 20:11:03 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Thu, 11 Jun 2009 20:11:03 -0700
Subject: [Warewulf] Perceus 1.5.2 build problem on RHEL4u4
In-Reply-To: <7D1819AA-9FD1-4BE5-B728-6FE347214B86@cisco.com>
References: <7D1819AA-9FD1-4BE5-B728-6FE347214B86@cisco.com>
Message-ID: <571f1a060906112011w49d8101h371a956a12e44df8@mail.gmail.com>

Heya Jeff!

We have seen this before and IIRC, the version of binutils is too old
on RHEL4 to build recent kernels. There are several kludges that I can
think of, none of which I would recommend on a public forum. ;)

Feel free to contact me off list and I can work with you directly on
this.

Greg

On Thu, Jun 11, 2009 at 7:06 PM, Jeff Squyres wrote:
> I was trying to upgrade from Perceus 1.3.4 (gasp!) to 1.5.2 today. My
> cluster head node is RHEL4u4 (unfortunately, can't change it :-( ). The
> perceus kernel failed to build -- here's a snippet of the failure (full
> config.log, config.out, and make.out attached):
>
> ...
>   LD      arch/x86/lib/built-in.o
>   AR      arch/x86/lib/lib.a
>   LD      vmlinux.o
>   MODPOST vmlinux.o
>   GEN     .version
>   CHK     include/linux/compile.h
>   UPD     include/linux/compile.h
>   CC      init/version.o
>   LD      init/built-in.o
>   LD      vmlinux
> mm/built-in.o(.text+0x6f16): In function `test_clear_page_writeback':
> : undefined reference to `____ilog2_NaN'
> make[2]: *** [vmlinux] Error 1
> make[2]: Leaving directory
> `/home/jsquyres/perceus/perceus-1.5.2/3rd_party/_work/kernel/linux-2.6.28'
> ...
> make[2]: Leaving directory
> `/home/jsquyres/perceus/perceus-1.5.2/3rd_party/_work/kernel/linux-2.6.28'
> cp _work/kernel/linux-2.6.28/arch/x86_64/boot/bzImage ../
> cp: cannot stat `_work/kernel/linux-2.6.28/arch/x86_64/boot/bzImage': No
> such file or directory
> make[1]: *** [kernel] Error 1
> make[1]: Leaving directory `/home/jsquyres/perceus/perceus-1.5.2/3rd_party'
> make: *** [all-recursive] Error 1
>
> Any ideas? Is Perceus 1.5.2 supported on RHEL4u4?
>
> Thanks!
>
> --
> Jeff Squyres
> Cisco Systems
>

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From bernard at vanhpc.org Mon Jun 15 17:59:44 2009
From: bernard at vanhpc.org (Bernard Li)
Date: Mon, 15 Jun 2009 17:59:44 -0700
Subject: [Warewulf] perceus module deactivate bug
Message-ID:

Hi all:

Using Perceus 1.5.2.
Activating and subsequently deactivating the
same module on a node gives me this:

# perceus module activate provision init/node/node01
Perceus Module 'provision' has been enabled in 'init/node/node01'
# perceus module deactivate provision init/node/node01
WARNING: Inappropriate role name: init/node/node01
Perceus Module 'provision' has been disabled in 'init/node/node01'

The warning seems like a bug to me, patch as follows:

---cut---
Index: scripts/lib/Perceus/Interface/Cmdline.pm
===================================================================
--- scripts/lib/Perceus/Interface/Cmdline.pm (revision 2114)
+++ scripts/lib/Perceus/Interface/Cmdline.pm (working copy)
@@ -1587,8 +1587,8 @@
     }

     foreach my $r ( @roles_req ) {
-        if ( $r =~ /^([a-zA-Z0-9]+\/[a-zA-Z0-9]+)$/ ) {
-            &dprint("Adding '$r' to untainted roles to activate");
+        if ( $r =~ /^([a-zA-Z0-9]+(\/[a-zA-Z0-9]+)+)$/ ) {
+            &dprint("Adding '$r' to untainted roles to deactivate");
             push(@roles, $1);
         } else {
             &wprint("Inappropriate role name: $r");
---cut---

I also fixed a typo: "activate" --> "deactivate".

Pretty simple patch, I could attach it if needed.

Thanks,

Bernard

From gmkurtzer at gmail.com Fri Jun 19 17:40:57 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Fri, 19 Jun 2009 17:40:57 -0700
Subject: [Warewulf] perceus module deactivate bug
In-Reply-To:
References:
Message-ID: <571f1a060906191740x4dc6dfbci4c91a15c365d9297@mail.gmail.com>

Got it. Thanks!

On Mon, Jun 15, 2009 at 5:59 PM, Bernard Li wrote:
> Hi all:
>
> Using Perceus 1.5.2. Activating and subsequently deactivating the
> same module on a node gives me this:
>
> # perceus module activate provision init/node/node01
> Perceus Module 'provision' has been enabled in 'init/node/node01'
> # perceus module deactivate provision init/node/node01
> WARNING: Inappropriate role name: init/node/node01
> Perceus Module 'provision' has been disabled in 'init/node/node01'
>
> The warning seems like a bug to me, patch as follows:
>
> ---cut---
> Index: scripts/lib/Perceus/Interface/Cmdline.pm
> ===================================================================
> --- scripts/lib/Perceus/Interface/Cmdline.pm (revision 2114)
> +++ scripts/lib/Perceus/Interface/Cmdline.pm (working copy)
> @@ -1587,8 +1587,8 @@
>      }
>
>      foreach my $r ( @roles_req ) {
> -        if ( $r =~ /^([a-zA-Z0-9]+\/[a-zA-Z0-9]+)$/ ) {
> -            &dprint("Adding '$r' to untainted roles to activate");
> +        if ( $r =~ /^([a-zA-Z0-9]+(\/[a-zA-Z0-9]+)+)$/ ) {
> +            &dprint("Adding '$r' to untainted roles to deactivate");
>              push(@roles, $1);
>          } else {
>              &wprint("Inappropriate role name: $r");
> ---cut---
>
> I also fixed a typo: "activate" --> "deactivate".
>
> Pretty simple patch, I could attach it if needed.
>
> Thanks,
>
> Bernard
>

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From agshew at gmail.com Mon Jun 22 09:01:28 2009
From: agshew at gmail.com (Andrew Shewmaker)
Date: Mon, 22 Jun 2009 10:01:28 -0600
Subject: [Warewulf] debugging provisiond
In-Reply-To: <571f1a060906011342h20988a0bo27e9cbef6664c04b@mail.gmail.com>
References: <571f1a060906011342h20988a0bo27e9cbef6664c04b@mail.gmail.com>
Message-ID:

On Mon, Jun 1, 2009 at 2:42 PM, Greg Kurtzer wrote:
> On Mon, Jun 1, 2009 at 12:53 PM, Andrew Shewmaker wrote:
> > On Mon, Jun 1, 2009 at 11:39 AM, Andrew Shewmaker wrote:
> >> I'm having an issue with the passwdfile module where it isn't putting
> >> the contents of /etc/perceus/modules/passwdfile/all into /etc/passwd
> > ...
> >> What should I be looking at? I must be doing something dumb.
> >
> > Like having perceus-provisiond installed in the vnfs, but with its
> > files listed in the hybridize file. I didn't get symlinks for
> > provisiond, but I did for other stuff.
>
> Hello Andrew,
>
> Good catch... Now I wonder why!?
>
> If you notice anything further please let us know. Otherwise, please
> send the hybridized entries for provisiond so we can try and
> replicate.

Thanks for your response ... I didn't see it for a while because my
email filter hid it from view.

The good news is I finally found my problem. It had to do with a
custom version of the umount script that we are using with our
perceus/cfengine integration. I'm not sure why, but

    HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE | sed -e 's/^\///g'`

was replaced with

    HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE`

and that isn't good because then the find used for symlinking is
looking at the master node file system's root instead of the vnfs.
Most of the files existed in both places, but we have the occasional
package that isn't in both.

--
Andrew Shewmaker

From chaz.whittemore at gmail.com Wed Jun 24 09:23:40 2009
From: chaz.whittemore at gmail.com (Chaz Whittemore)
Date: Wed, 24 Jun 2009 12:23:40 -0400
Subject: [Warewulf] First cluster - some general caos/perceus questions
Message-ID: <86079c030906240923g5fdb0417t52fa4dee974e506@mail.gmail.com>

Hello all! I am setting up my first (production) cluster, and I had some
general questions I was hoping you all could help me with. I am using the
Perceus/Warewulf/openmpi/slurm packages contained in caos NSA by default.

1) When running 1.0.8 on the master and all nodes, everything worked great.
However, after updating all of the packages via smart (including kernel,
slurm, and openmpi), some strange things happened. While slurm still works,
I now have to manually restart slurmd on each node after restarts to
get sinfo to see them as active. Also, mpirun seems to be broken now...
maybe this is due to a version mismatch b/c the vnfs didn't get updated?

2) Is there a way to run smart updates on the vnfs capsule? I am used to yum
and rpm with their root= and install-root parameters - is there an
equivalent for smart?

3) At the fancy new graphical Perceus splash - how do I reduce the wait time
from 30 seconds to something shorter?

4) I have configured the nodes to each have a scratch drive. I have
partitioned the drives into two partitions, a swap and an ext3 for /scratch
(formatted too) - and I uncommented the matching lines in the vnfs'
/etc/fstab. However, when the node starts up, I see the message "Special
device /dev/sda2 does not exist." - and upon further inspection, neither
partition gets mounted. I CAN, however, mount them immediately after boot
with "mount /dev/sda2 /scratch" and "swapon /dev/sda1" and they work fine.
Is there something I am missing?

5) This is more of a networking question - maybe someone here has some
input. I initially set up the small cluster on an unmanaged soho 8 port
gigabit switch. But, after convincing the higher ups that we need a managed
switch - I installed it, only to find that the nodes are taking twice as
long to boot. Over three minutes each! (Only about 1:45 on the unmanaged
switch). Is there something I should be doing with my managed switch out of
the box to get it to comparable performance?

Thanks everyone - especially for your patience!!!

-chaz

From gmkurtzer at gmail.com Wed Jun 24 10:15:28 2009
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Wed, 24 Jun 2009 10:15:28 -0700
Subject: [Warewulf] First cluster - some general caos/perceus questions
In-Reply-To: <86079c030906240923g5fdb0417t52fa4dee974e506@mail.gmail.com>
References: <86079c030906240923g5fdb0417t52fa4dee974e506@mail.gmail.com>
Message-ID: <571f1a060906241015x708507adj77ffd211425ed5ea@mail.gmail.com>

On Wed, Jun 24, 2009 at 9:23 AM, Chaz Whittemore wrote:
> Hello all! I am setting up my first (production) cluster, and I had some
> general questions I was hoping you all could help me with. I am using the
> Perceus/Warewulf/openmpi/slurm packages contained in caos NSA by default.

Hello, and welcome aboard! ;)

>
> 1) When running 1.0.8 on the master and all nodes, everything worked great.
> However, after updating all of the packages via smart (including kernel,
> slurm, and openmpi), some strange things happened. While slurm still works,
> I now have to manually restart slurmd on each node after restarts to
> get sinfo to see them as active. Also, mpirun seems to be broken now...
> maybe this is due to a version mismatch b/c the vnfs didn't get updated?

Yes, I think you're right about the VNFS not being at the same version
as the master.
Try this:

# perceus vnfs mount [vnfs name]
# smart -o rpm-root=/mnt/[vnfs name] upgrade
# perceus vnfs umount [vnfs name]

Please note that if the kernel version changes, you will have to
update the VNFS configuration in /etc/perceus/vnfs/[vnfs name]/config.

>
> 2) Is there a way to run smart updates on the vnfs capsule? I am used to yum
> and rpm with their root= and install-root parameters - is there an
> equivalent for smart?

Yep. See #1.

>
> 3) At the fancy new graphical Perceus splash - how do I reduce the wait time
> from 30 seconds to something shorter?

Edit the file /var/lib/perceus/tftp/pxelinux.cfg/default and change
300 to something smaller.

>
> 4) I have configured the nodes to each have a scratch drive. I have
> partitioned the drives into two partitions, a swap and an ext3 for /scratch
> (formatted too) - and I uncommented the matching lines in the vnfs'
> /etc/fstab. However, when the node starts up, I see the message "Special
> device /dev/sda2 does not exist." - and upon further inspection, neither
> partition gets mounted. I CAN, however, mount them immediately after boot
> with "mount /dev/sda2 /scratch" and "swapon /dev/sda1" and they work fine.
> Is there something I am missing?

My guess is that the device drivers are taking some time to initialize
and are not blocking. There are several solutions. The easiest is to
kludge it with a "mount -a" in /etc/rc.local.

>
> 5) This is more of a networking question - maybe someone here has some
> input. I initially set up the small cluster on an unmanaged soho 8 port
> gigabit switch. But, after convincing the higher ups that we need a managed
> switch - I installed it, only to find that the nodes are taking twice as
> long to boot. Over three minutes each! (Only about 1:45 on the unmanaged
> switch). Is there something I should be doing with my managed switch out of
> the box to get it to comparable performance?

Can you identify the point of the boot that is causing the slowness?
During the provisioning itself, or timing out and retrying? At first
guess, check to make sure the switch is setting duplexing and
auto-negotiating properly.

>
> Thanks everyone - especially for your patience!!!

Best of luck!

--
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/

From gwleong at gmail.com Wed Jun 24 15:51:20 2009
From: gwleong at gmail.com (Gary Leong)
Date: Wed, 24 Jun 2009 15:51:20 -0700
Subject: [Warewulf] pmod
Message-ID:

Does anybody know the format for pmod? I can insert the module
manually, but just wanted to know if it was better to use the actual
import functionality on perceus.

Gary

From WSChoong at lbl.gov Wed Jun 24 23:43:28 2009
From: WSChoong at lbl.gov (Woon-Seng Choong)
Date: Wed, 24 Jun 2009 23:43:28 -0700
Subject: [Warewulf] Problem registering perceus
Message-ID:

I am trying to install Caos NSA and perceus. Let me just say that I
have no problem registering with perceus, but I keep getting an error
registering perceus. I am trying to set up a cluster for a collaborator
in Malaysia, and initially I don't even connect to the outside world.
Every time I try to run a perceus command, it jumps into the
registration. And when I try to boot up a node, it doesn't come up
completely and I don't know if this is due to not registering. Is
there a way to bypass the registration and make the node boot up
properly?

Thank you

Seng

From stefan at mdy.univie.ac.at Thu Jun 25 00:42:13 2009
From: stefan at mdy.univie.ac.at (Stefan Boresch)
Date: Thu, 25 Jun 2009 09:42:13 +0200
Subject: [Warewulf] First cluster - some general caos/perceus questions
In-Reply-To: <86079c030906240923g5fdb0417t52fa4dee974e506@mail.gmail.com>
References: <86079c030906240923g5fdb0417t52fa4dee974e506@mail.gmail.com>
Message-ID: <20090625074213.GF19565@loop.mdy.univie.ac.at>

Hi Chaz,

Greg already gave you the authoritative answers ...

On Wed, Jun 24, 2009 at 12:23:40PM -0400, Chaz Whittemore wrote:
>
> 4) I have configured the nodes to each have a scratch drive. I have
> partitioned the drives into two partitions, a swap and an ext3 for /scratch
> (formatted too) - and I uncommented the matching lines in the vnfs'
> /etc/fstab. However, when the node starts up, I see the message "Special
> device /dev/sda2 does not exist." - and upon further inspection, neither
> partition gets mounted. I CAN, however, mount them immediately after boot
> with "mount /dev/sda2 /scratch" and "swapon /dev/sda1" and they work fine.
> Is there something I am missing?
>

I just want to confirm Greg's suspicion here as a fact. To speed up
the boot process on the nodes, they are booting in 'fast' mode, not
waiting for the dynamic devices to have "settled", hence your mount -a
comes too early. If you are in a hurry, definitely follow Greg's
advice and (mis)use rc.local. Otherwise look at the start up scripts
to locate the trouble area. I had this 'fixed' in a local version of
mine before I lost the changes in a complete reinstall :-( Since then
I am happy with the rc.local kludge ...

Good luck,

Stefan

--
Stefan Boresch
Institute for Computational Biological Chemistry
University of Vienna, Waehringerstr. 17 A-1090 Vienna, Austria
Phone: -43-1-427752715 Fax: -43-1-427752790
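
The rc.local kludge discussed in this thread could be made a little
less timing-dependent by waiting for the device node to show up before
mounting. What follows is only a sketch under stated assumptions, not
something shipped with Perceus or Caos NSA: wait_for_path is a
hypothetical helper, the 30 second timeout is an arbitrary choice, and
the device names are simply the ones from Chaz's message.

```shell
#!/bin/sh
# Sketch for a node's /etc/rc.local: poll for a device node before
# mounting, instead of calling "mount -a" and hoping it is there.
# wait_for_path is a hypothetical helper, not part of Perceus.

# wait_for_path PATH [TIMEOUT_SECONDS]
# Checks once per second whether PATH exists. Returns 0 as soon as it
# does, or 1 if TIMEOUT_SECONDS (default 30) pass without it appearing.
wait_for_path() {
    path=$1
    timeout=${2:-30}
    waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use in rc.local (device names assumed from the thread):
#   wait_for_path /dev/sda2 30 && mount -a
#   wait_for_path /dev/sda1 30 && swapon -a
```

If the device never appears, the helper gives up after the timeout and
the mount is skipped, so a node with a dead scratch disk would still
finish booting.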