From darylgrunau at gmail.com Mon Mar 3 12:27:37 2008
From: darylgrunau at gmail.com (Daryl Grunau)
Date: Mon, 3 Mar 2008 13:27:37 -0700
Subject: [Warewulf] Subtle hybridize bug in Perceus 1.3.6
Message-ID: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com>

Hi, I ran into a very subtle bug in the hybridize function of 1.3.6
(did not check earlier versions). The problem is that the rsync
--excludes which eventually get formed into symlinks back to the NFS
directory are formed from the hybridize file with their '/' anchor
stripped off. This appears to work 99% of the time however consider
the following example: Suppose the hybridize file contains

/opt

The resulting rsync line would turn into

# cd $VNFSDIR/rootfs && rsync -qaSH --exclude=opt . $TMPDIR

for some $TMPDIR. The problem arises if you have any other subdir in
your rootfs that has "/opt/" in the path. Rsync's behavior is to
exclude any such subdirectory if the pattern is not anchored at '/'
(see the comments about include/exclude pattern rules in the rsync
manpage). I stumbled onto this recently when I lost track of all my
vnfs files that should have been located in /var/opt/moab. To fix:
apply this patch to the umount script.

Daryl

------ cut here -----

--- umount.ORIG 2008-02-23 11:21:21.000000000 -0700
+++ umount 2008-03-03 10:57:59.000000000 -0700
@@ -27,7 +27,7 @@
 HYBRIDIZE_FILE="$VNFSDIR/hybridize"
 fi

-HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE | sed -e 's/^\///g'`
+HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE`

 EXCLUDES=`for i in $HYBRIDIZE; do echo "--exclude=$i "; done`

@@ -41,7 +41,7 @@
 cd $VNFSDIR/rootfs
 umount $VNFSDIR/rootfs/proc 2>/dev/null
 mkdir -p $TMPDIR
-rsync -qaSH $EXCLUDES . $TMPDIR
+rsync -qaRSH $EXCLUDES . $TMPDIR

 if [ -f "$TMPDIR/sbin/hotplug" ]; then
 mv $TMPDIR/sbin/hotplug $TMPDIR/sbin/hotplug.disabled

From astevens at gravitypark.com Mon Mar 3 12:36:09 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Mon, 3 Mar 2008 12:36:09 -0800
Subject: [Warewulf] Subtle hybridize bug in Perceus 1.3.6
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com>
Message-ID: <003f01c87d6e$367b3d60$cb00a8c0@terminal209>

Thanks Daryl. All the QA really helps. :)

----- Original Message -----
From: "Daryl Grunau"
To:
Sent: Monday, March 03, 2008 12:27 PM
Subject: [Warewulf] Subtle hybridize bug in Perceus 1.3.6

> Hi, I ran into a very subtle bug in the hybridize function of 1.3.6
> (did not check earlier versions). The problem is that the rsync
> --excludes which eventually get formed into symlinks back to the NFS
> directory are formed from the hybridize file with their '/' anchor
> stripped off. This appears to work 99% of the time however consider
> the following example: Suppose the hybridize file contains
>
> /opt
>
> The resulting rsync line would turn into
>
> # cd $VNFSDIR/rootfs && rsync -qaSH --exclude=opt . $TMPDIR
>
> for some $TMPDIR. The problem arises if you have any other subdir in
> your rootfs that has "/opt/" in the path. Rsync's behavior is to
> exclude any such subdirectory if the pattern is not anchored at '/'
> (see the comments about include/exclude pattern rules in the rsync
> manpage). I stumbled onto this recently when I lost track of all my
> vnfs files that should have been located in /var/opt/moab. To fix:
> apply this patch to the umount script.
>
> Daryl
>
> ------ cut here -----
>
> --- umount.ORIG 2008-02-23 11:21:21.000000000 -0700
> +++ umount 2008-03-03 10:57:59.000000000 -0700
> @@ -27,7 +27,7 @@
> HYBRIDIZE_FILE="$VNFSDIR/hybridize"
> fi
>
> -HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE | sed -e 's/^\///g'`
> +HYBRIDIZE=`grep -v "^#" $HYBRIDIZE_FILE`
>
> EXCLUDES=`for i in $HYBRIDIZE; do echo "--exclude=$i "; done`
>
> @@ -41,7 +41,7 @@
> cd $VNFSDIR/rootfs
> umount $VNFSDIR/rootfs/proc 2>/dev/null
> mkdir -p $TMPDIR
> -rsync -qaSH $EXCLUDES . $TMPDIR
> +rsync -qaRSH $EXCLUDES . $TMPDIR
>
> if [ -f "$TMPDIR/sbin/hotplug" ]; then
> mv $TMPDIR/sbin/hotplug $TMPDIR/sbin/hotplug.disabled
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>

From astevens at gravitypark.com Tue Mar 4 10:20:00 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Tue, 4 Mar 2008 10:20:00 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com>
Message-ID: <011e01c87e32$6fb16360$cb00a8c0@terminal209>

With great pleasure we release Perceus 1.3.7.

Some of the updates include....

  a.. Better integration with SLURM
  b.. Backported SLURM Perceus module from the 1.4 devel tree
  c.. Fixed bogus warnings
  d.. Updated DHCP and IB integration for better RFC compliance
  e.. Fixed some of the hw_unload debugging
  f.. Updated users guide
  g.. Updated chroot generation scripts
  h.. Fixed some bash_completion features
  i.. Enabled HIGH_MEM in the 32 bit kernel
  j.. Fixed minor hybridization bug

We are also busy working on the 1.4 branch with unheard of
scalability, handling tens of thousands of nodes.

Now is also the time to partner. Partners get priority access to
code, support, and our roadmap of offerings. They also help ensure
the survival of the project. A big thanks to all our existing
partners as well as our new ones.
A big welcome to our latest partners, R-Systems and Silicon Mechanics.

A double thump on the head goes out to all of you Perceus/Warewulf
resellers that have never contributed or only talk of partnering when
you need assistance. You guys know who you are and are the reason
places like BusyBox have halls of shame. Sign up now and get on the
right track.

Perceus is also available embedded on the new Intel EPSD gear
including alcolu, melstone, shoffner and others. Ask your systems
integrator for Perceus Embedded or contact us for a supplier
referral. Also the first Perceus Embedded InfiniBand Cards are in the
works! Availability from other motherboard vendors is expected by
Summer.

Please contact Infiscale for more information or to partner.

Thanks,
The Infiscale Team
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080304/ef936ca2/attachment.html

From jsquyres at cisco.com Tue Mar 4 12:05:25 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 4 Mar 2008 15:05:25 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <011e01c87e32$6fb16360$cb00a8c0@terminal209>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209>
Message-ID: <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com>

On Mar 4, 2008, at 1:20 PM, Arthur Stevens wrote:

> With great pleasure we release Perceus 1.3.7.

Congrats!

> Some of the updates include....
> * Better integration with SLURM
> * Backported SLURM Perceus module from the 1.4 devel tree

Can you describe what these bullets mean? What does Perceus have to
do with SLURM?
--
Jeff Squyres
Cisco Systems

From astevens at gravitypark.com Tue Mar 4 12:12:45 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Tue, 4 Mar 2008 12:12:45 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com>
Message-ID: <000e01c87e34$1c5598b0$cb00a8c0@terminal209>

Hi Jeff,

Basically it means Perceus and SLURM play together better with less
babysitting. They always have, it is just much easier to integrate
now.

A lot of us prefer or need features from SLURM that PBS and others
don't quite take care of. It is becoming more and more common to use
SLURM to make up for things missing in Maui and Torque too.

In other words, it fills a gap nicely while we continue to work on
our own scheduler subsystem. :)

Thanks,

Arthur

----- Original Message -----
From: "Jeff Squyres"
To: "Arthur Stevens" ; "The Warewulf Cluster Toolkit"
Sent: Tuesday, March 04, 2008 12:05 PM
Subject: Re: [Warewulf] Perceus 1.3.7 Released

On Mar 4, 2008, at 1:20 PM, Arthur Stevens wrote:

> With great pleasure we release Perceus 1.3.7.

Congrats!

> Some of the updates include....
> * Better integration with SLURM
> * Backported SLURM Perceus module from the 1.4 devel tree

Can you describe what these bullets mean? What does Perceus have to
do with SLURM?

--
Jeff Squyres
Cisco Systems

From jsquyres at cisco.com Tue Mar 4 12:20:28 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 4 Mar 2008 15:20:28 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <000e01c87e34$1c5598b0$cb00a8c0@terminal209>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209>
Message-ID: <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com>

On Mar 4, 2008, at 3:12 PM, Arthur Stevens wrote:

> Basically it means Perceus and SLURM play together better with less
> babysitting. They always have, it is just much easier to integrate
> now.
>
> A lot of us prefer or need features from SLURM that PBS and others
> don't quite take care of. It is becoming more and more common to use
> SLURM to make up for things missing in Maui and Torque too.
>
> In other words, it fills a gap nicely while we continue to work on
> our own scheduler subsystem. :)

Meaning what, specifically? I.e., how exactly do SLURM and Perceus
integrate together?

I ask because I use both SLURM and Perceus on my cluster -- but in my
setup, they are wholly separate systems that don't know about each
other's existence. How can I have them work together? What
features / benefits are available?
--
Jeff Squyres
Cisco Systems

From gmkurtzer at gmail.com Tue Mar 4 12:52:39 2008
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Tue, 4 Mar 2008 12:52:39 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209> <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com>
Message-ID: <571f1a060803041252q1c8ddc40y3595cfc72d65bb04@mail.gmail.com>

On Tue, Mar 4, 2008 at 12:20 PM, Jeff Squyres wrote:
> On Mar 4, 2008, at 3:12 PM, Arthur Stevens wrote:
>
> > Basically it means Perceus and SLURM play together better with less
> > babysitting. They always have, it is just much easier to integrate
> > now.
> >
> > A lot of us prefer or need features from SLURM that PBS and others
> > don't quite take care of. It is becoming more and more common to use
> > SLURM to make up for things missing in Maui and Torque too.
> >
> > In other words, it fills a gap nicely while we continue to work on
> > our own scheduler subsystem. :)
>
> Meaning what, specifically? I.e., how exactly do SLURM and Perceus
> integrate together?
>
> I ask because I use both SLURM and Perceus on my cluster -- but in my
> setup, they are wholly separate systems that don't know about each
> other's existence. How can I have them work together? What
> features / benefits are available?
>

There are many benefits of close integration between the cluster
management and provisioning subsystems. For example, this would
further enable green computing, grid and meta scheduling, matching
jobs to node resources and current state, among other possibilities.

This release includes as a starting point the ability for a Perceus
module to automatically configure new nodes within SLURM and keep all
nodes synchronized.

Without mentioning names, other scheduler vendors are also doing the
same with their own commercially available solutions. We provide a
simple and static mechanism for vendors and third parties to control
Perceus so check with your scheduler vendor if you are interested in
this.

--
Greg Kurtzer
http://www.runlevelzero.net/

From jsquyres at cisco.com Tue Mar 4 13:01:43 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 4 Mar 2008 16:01:43 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <571f1a060803041252q1c8ddc40y3595cfc72d65bb04@mail.gmail.com>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209> <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com> <571f1a060803041252q1c8ddc40y3595cfc72d65bb04@mail.gmail.com>
Message-ID: <3AD8EE0E-2A7E-4234-92AD-E84B1C050851@cisco.com>

On Mar 4, 2008, at 3:52 PM, Greg Kurtzer wrote:

> There are many benefits of close integration between the cluster
> management and provisioning subsystems. For example, this would
> further enable green computing, grid and meta scheduling, matching
> jobs to node resources and current state, among other possibilities.
>
> This release includes as a starting point the ability for a Perceus
> module to automatically configure new nodes within SLURM and keep all
> nodes synchronized.

That sounds great. How do I do that?

Are other integration Perceus+SLURM features available in this release?
--
Jeff Squyres
Cisco Systems

From glen at callident.com Tue Mar 4 13:58:20 2008
From: glen at callident.com (Glen Otero)
Date: Tue, 4 Mar 2008 13:58:20 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <000e01c87e34$1c5598b0$cb00a8c0@terminal209>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209>
Message-ID:

On Mar 4, 2008, at 12:12 PM, Arthur Stevens wrote:

> Hi Jeff,
>
> Basically it means Perceus and SLURM play together better with less
> babysitting. They always have, it is just much easier to integrate
> now.
>
> A lot of us prefer or need features from SLURM that PBS and others
> don't quite take care of. It is becoming more and more common to use
> SLURM to make up for things missing in Maui and Torque too.

Not having used SLURM, can you briefly outline what things SLURM
provides that Torque and PBS don't?

Thanks!

Glen

>
> In other words, it fills a gap nicely while we continue to work on
> our own scheduler subsystem. :)
>
>
> Thanks,
>
> Arthur
>
>
> ----- Original Message -----
> From: "Jeff Squyres"
> To: "Arthur Stevens" ; "The Warewulf Cluster Toolkit"
> Sent: Tuesday, March 04, 2008 12:05 PM
> Subject: Re: [Warewulf] Perceus 1.3.7 Released
>
>
> On Mar 4, 2008, at 1:20 PM, Arthur Stevens wrote:
>
>> With great pleasure we release Perceus 1.3.7.
>
> Congrats!
>
>> Some of the updates include....
>> * Better integration with SLURM
>> * Backported SLURM Perceus module from the 1.4 devel tree
>
> Can you describe what these bullets mean? What does Perceus have to
> do with SLURM?
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>

From gmkurtzer at gmail.com Tue Mar 4 13:59:33 2008
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Tue, 4 Mar 2008 13:59:33 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <3AD8EE0E-2A7E-4234-92AD-E84B1C050851@cisco.com>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209> <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com> <571f1a060803041252q1c8ddc40y3595cfc72d65bb04@mail.gmail.com> <3AD8EE0E-2A7E-4234-92AD-E84B1C050851@cisco.com>
Message-ID: <571f1a060803041359q36829b57k14d4b35a7648bfe3@mail.gmail.com>

On Tue, Mar 4, 2008 at 1:01 PM, Jeff Squyres wrote:
> On Mar 4, 2008, at 3:52 PM, Greg Kurtzer wrote:
>
> > There are many benefits of close integration between the cluster
> > management and provisioning subsystems. For example, this would
> > further enable green computing, grid and meta scheduling, matching
> > jobs to node resources and current state, among other possibilities.
> >
> > This release includes as a starting point the ability for a Perceus
> > module to automatically configure new nodes within SLURM and keep all
> > nodes synchronized.
>
> That sounds great. How do I do that?

In 1.3.7:

# perceus module activate slurm

> Are other integration Perceus+SLURM features available in this release?

Just the above module that integrates nodes automatically with SLURM
as they are introduced to Perceus.

In actuality, we have a third party integration library that we are
building. We are talking with partners and other developers as to
what features they would find useful and building it up accordingly.

The rest is really up to the scheduler developers and the community.
We are providing all of the hooks to make it as easy as possible to
integrate with Perceus, and we are happy to help further. But much of
this logic sits on the scheduler side, so we either wait for them or
the community to help with it, or, when we have resources available,
we work on it ourselves.

If there are people interested in being part of this and helping with
these tasks (or other Perceus related tasks) please let us know!

Thanks,
Greg

--
Greg Kurtzer
http://www.runlevelzero.net/

From gmkurtzer at gmail.com Tue Mar 4 14:53:04 2008
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Tue, 4 Mar 2008 14:53:04 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To:
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209>
Message-ID: <571f1a060803041453t4c720a73gbbd30228a6ec4c0e@mail.gmail.com>

On Tue, Mar 4, 2008 at 1:58 PM, Glen Otero wrote:
>
> On Mar 4, 2008, at 12:12 PM, Arthur Stevens wrote:
>
> > Hi Jeff,
> >
> > Basically it means Perceus and SLURM play together better with less
> > babysitting. They always have, it is just much easier to integrate
> > now.
> >
> > A lot of us prefer or need features from SLURM that PBS and others
> > don't quite take care of. It is becoming more and more common to use
> > SLURM to make up for things missing in Maui and Torque too.
>
> Not having used SLURM, can you briefly outline what things SLURM
> provides that Torque and PBS don't?
>

Scalability, clean, elegant, light weight, ease of use, simplicity,
speed, etc...

What it lacks is a decent configurable scheduling algorithm and
meta-scheduling. But for many of the clusters where they don't want or
need complex scheduling algorithms, it is a perfect solution.
--
Greg Kurtzer
http://www.runlevelzero.net/

From laytonjb at charter.net Tue Mar 4 18:20:53 2008
From: laytonjb at charter.net (laytonjb at charter.net)
Date: Tue, 4 Mar 2008 21:20:53 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <571f1a060803041453t4c720a73gbbd30228a6ec4c0e@mail.gmail.com>
Message-ID: <20080304212053.D1VOU.138604.root@fepweb02>

---- Greg Kurtzer wrote:
> On Tue, Mar 4, 2008 at 1:58 PM, Glen Otero wrote:
> >
> > On Mar 4, 2008, at 12:12 PM, Arthur Stevens wrote:
> >
> > > Hi Jeff,
> > >
> > > Basically it means Perceus and SLURM play together better with less
> > > babysitting. They always have, it is just much easier to integrate
> > > now.
> > >
> > > A lot of us prefer or need features from SLURM that PBS and others
> > > don't quite take care of. It is becoming more and more common to use
> > > SLURM to make up for things missing in Maui and Torque too.
> >
> > Not having used SLURM, can you briefly outline what things SLURM
> > provides that Torque and PBS don't?
> >
>
> Scalability, clean, elegant, light weight, ease of use, simplicity,
> speed, etc...

Not disbelieving you whatsoever, but can you give some examples of
the above for the scalability and speed?

I've just found SLURM to be kind of a pain. Part of it may be that
it's so different from Torque, SGE, etc. But I think part of it is
that the documentation is a little weak (needs more examples).

Plus, personally, I don't really like to have the job scheduler so
tied into the MPI. I would rather have them separate (at a distance)
and use the MPI job launch mechanism rather than rely on the
integration of the scheduler and the MPI. Believe it or not, I
actually think it complicates things.

Anyway, thanks for the comments.

Jeff

From laytonjb at charter.net Tue Mar 4 18:27:55 2008
From: laytonjb at charter.net (laytonjb at charter.net)
Date: Tue, 4 Mar 2008 18:27:55 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To:
Message-ID: <20080304212755.KY4PO.138876.root@fepweb02>

---- Glen Otero wrote:
>
> On Mar 4, 2008, at 12:12 PM, Arthur Stevens wrote:
>
> > Hi Jeff,
> >
> > Basically it means Perceus and SLURM play together better with less
> > babysitting. They always have, it is just much easier to integrate
> > now.
> >
> > A lot of us prefer or need features from SLURM that PBS and others
> > don't quite take care of. It is becoming more and more common to use
> > SLURM to make up for things missing in Maui and Torque too.

Could you explain a bit more about what SLURM has that Maui and
Torque do not? (I haven't exercised it enough to see the
differences up to now).

Thanks!

Jeff

From jsquyres at cisco.com Tue Mar 4 18:41:13 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Tue, 4 Mar 2008 21:41:13 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <20080304212053.D1VOU.138604.root@fepweb02>
References: <20080304212053.D1VOU.138604.root@fepweb02>
Message-ID: <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com>

On Mar 4, 2008, at 9:20 PM, wrote:

> Plus, personally, I don't really like to have the job scheduler so
> tied into the MPI. I would rather have them separate (at a distance)
> and use the MPI job launch mechanism rather than rely on the
> integration of the scheduler and the MPI. Believe it or not, I
> actually think it complicates things.

I know I'm devolving off-topic here, but I feel compelled to speak
up... :-)

First: separate resource manager from the scheduler. Most/all
resource managers include scheduling capabilities, but they are two
distinct functions. MPI's are typically integrated with the resource
manager functionality, not the scheduler.

Believe me when I tell you that rsh/ssh environments are *horrid* to
program for and support from an mpirun point of view (particularly
when dealing with scalability and run-time errors). Using a resource
manager's native mechanisms is almost always better -- they're usually
much more reliable and scalable. Plus, you get the nice side-effect
that users literally *cannot* run outside of their job-allocated nodes
(e.g., when using rsh/ssh, if a user accidentally uses the wrong
hostfile, they might be able to launch outside of their job-allocated
nodes if the sysadmin didn't disable rsh/ssh logins on a per-job basis
-- we've all seen it happen!).

Having written lots of code that integrates mpirun with resource
managers and with rsh/ssh, I can confidently state that having mpirun
integrated with the resource manager at best makes mpirun
significantly simpler, and at worst makes it no more complex than
trying to reasonably support rsh/ssh.

Now back to your regularly scheduled programming... :-)

--
Jeff Squyres
Cisco Systems

From jsquyres at cisco.com Wed Mar 5 04:20:05 2008
From: jsquyres at cisco.com (Jeff Squyres)
Date: Wed, 5 Mar 2008 07:20:05 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <571f1a060803041359q36829b57k14d4b35a7648bfe3@mail.gmail.com>
References: <62aac1510803031227u43487542y2e680ac83156842a@mail.gmail.com> <011e01c87e32$6fb16360$cb00a8c0@terminal209> <3A9935F9-98EC-4AE7-9B7D-222C36CA7AED@cisco.com> <000e01c87e34$1c5598b0$cb00a8c0@terminal209> <1D328447-3D1A-4D78-9F2F-41A1A51D03FB@cisco.com> <571f1a060803041252q1c8ddc40y3595cfc72d65bb04@mail.gmail.com> <3AD8EE0E-2A7E-4234-92AD-E84B1C050851@cisco.com> <571f1a060803041359q36829b57k14d4b35a7648bfe3@mail.gmail.com>
Message-ID: <3D1092EE-06E5-44E5-A4F5-D22F92C62411@cisco.com>

On Mar 4, 2008, at 4:59 PM, Greg Kurtzer wrote:

>>> # perceus module activate slurm

Cool! When I add some more nodes to my cluster, I'll give this a
whirl.

Thanks!
--
Jeff Squyres
Cisco Systems

From astevens at gravitypark.com Wed Mar 5 08:33:17 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Wed, 5 Mar 2008 08:33:17 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
References: <20080304212755.KY4PO.138876.root@fepweb02>
Message-ID: <006401c87ede$9e568950$6600a8c0@terminal209>

I am under NDA so I can't give names and case examples, but I have
personally seen it.

I can say that I have seen several instances lately where SLURM has been
used to compensate for scaling abilities with Maui and Torque on some very
large installs. It is kinda the apples and oranges to a point, but after
that point, I would go with SLURM. I am also more confident in the
developers of SLURM as we have never had a complaint and have always been
reachable, unlike the other devel camp where we tend to be treated like
leper children in the sandbox. Not creating new support nightmares or
giving people a bad taste in their mouth from our software due to ill
trained techs really goes a long way with us. Can I say that here?

It really would depend on the complexity of what you are setting up and
the exact needs. Bring Infiscale in on a job sometime and we will show
you more about what we are talking about ;)

Arthur

----- Original Message -----
From:
To: "The Warewulf Cluster Toolkit" ; "Arthur Stevens"
Cc: "Glen Otero"
Sent: Tuesday, March 04, 2008 6:27 PM
Subject: Re: [Warewulf] Perceus 1.3.7 Released

> ---- Glen Otero wrote:
>>
>> On Mar 4, 2008, at 12:12 PM, Arthur Stevens wrote:
>>
>> > Hi Jeff,
>> >
>> > Basically it means Perceus and SLURM play together better with less
>> > babysitting. They always have, it is just much easier to integrate
>> > now.
>> >
>> > A lot of us prefer or need features from SLURM that PBS and others
>> > don't quite take care of. It is becoming more and more common to use
>> > SLURM to make up for things missing in Maui and Torque too.
>
> Could you explain a bit more about what SLURM has that Maui and
> Torque do not? (I haven't exercised it enough to see the
> differences up to now).
>
> Thanks!
>
> Jeff
>

From astevens at gravitypark.com Wed Mar 5 08:34:34 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Wed, 5 Mar 2008 08:34:34 -0800
Subject: [Warewulf] Perceus 1.3.7 Released
References: <20080304212053.D1VOU.138604.root@fepweb02> <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com>
Message-ID: <006f01c87ede$cbe12ab0$6600a8c0@terminal209>

Well said, and fully agreed with :)

----- Original Message -----
From: "Jeff Squyres"
To: "The Warewulf Cluster Toolkit"
Sent: Tuesday, March 04, 2008 6:41 PM
Subject: Re: [Warewulf] Perceus 1.3.7 Released

> On Mar 4, 2008, at 9:20 PM, wrote:
>
>> Plus, personally, I don't really like to have the job scheduler so
>> tied into the MPI. I would rather have them separate (at a distance)
>> and use the MPI job launch mechanism rather than rely on the
>> integration of the scheduler and the MPI. Believe it or not, I
>> actually think it complicates things.
>
>
> I know I'm devolving off-topic here, but I feel compelled to speak
> up... :-)
>
>
>
> First: separate resource manager from the scheduler. Most/all
> resource managers include scheduling capabilities, but they are two
> distinct functions. MPI's are typically integrated with the resource
> manager functionality, not the scheduler.
>
> Believe me when I tell you that rsh/ssh environments are *horrid* to
> program for and support from an mpirun point of view (particularly
> when dealing with scalability and run-time errors). Using a resource
> manager's native mechanisms is almost always better -- they're usually
> much more reliable and scalable. Plus, you get the nice side-effect
> that users literally *cannot* run outside of their job-allocated nodes
> (e.g., when using rsh/ssh, if a user accidentally uses the wrong
> hostfile, they might be able to launch outside of their job-allocated
> nodes if the sysadmin didn't disable rsh/ssh logins on a per-job basis
> -- we've all seen it happen!).
>
> Having written lots of code that integrates mpirun with resource
> managers and with rsh/ssh, I can confidently state that having mpirun
> integrated with the resource manager at best makes mpirun
> significantly simpler, and at worst makes it no more complex than
> trying to reasonably support rsh/ssh.
>
>
>
> Now back to your regularly scheduled programming... :-)
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>

From landman at scalableinformatics.com Wed Mar 5 08:42:53 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 05 Mar 2008 11:42:53 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <006401c87ede$9e568950$6600a8c0@terminal209>
References: <20080304212755.KY4PO.138876.root@fepweb02> <006401c87ede$9e568950$6600a8c0@terminal209>
Message-ID: <47CECD8D.8030701@scalableinformatics.com>

Arthur Stevens wrote:
> I am under NDA so I can't give names and case examples, but I have
> personally seen it.
>
> I can say that I have seen several instances lately where SLURM has been
> used to compensate for scaling abilities with Maui and Torque on some very
> large installs. It is kinda the apples and oranges to a point, but after
> that point, I would go with SLURM. I am also more confident in the

SLURM was built for large scaling systems. Blue Gene et al.

> developers of SLURM as we have never had a complaint and have always been
> reachable, unlike the other devel camp where we tend to be treated like
> leper children in the sandbox.
> Not creating new support nightmares or

:(

W.r.t the comment (Jeff's?) about not integrating the launcher and
the MPI, my take is, it would be simpler for us to make the system
work if there is closer tying between the two. Using SGE with OpenMPI
is very nice now, try MPICH 1.2.x with SGE sometime to see an
operational definition of "broken". The qstat/qdel/qsub should just
work, with as little extra magic as possible.

I have played a bit with SLURM, and it does work ok. A bit different
than the others, but not too painful (requires a little
re-adjustment).

Some of the schedulers/DRMs out there are good, some are ok, some are
terrible. No comments as to which is which.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From laytonjb at charter.net Wed Mar 5 14:04:07 2008
From: laytonjb at charter.net (Jeffrey B. Layton)
Date: Wed, 05 Mar 2008 17:04:07 -0500
Subject: [Warewulf] Perceus 1.3.7 Released
In-Reply-To: <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com>
References: <20080304212053.D1VOU.138604.root@fepweb02> <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com>
Message-ID: <47CF18D7.6080003@charter.net>

Jeff Squyres wrote:
> On Mar 4, 2008, at 9:20 PM, wrote:
>
>> Plus, personally, I don't really like to have the job scheduler so
>> tied into the MPI. I would rather have them separate (at a distance)
>> and use the MPI job launch mechanism rather than rely on the
>> integration of the scheduler and the MPI. Believe it or not, I
>> actually think it complicates things.
>
> I know I'm devolving off-topic here, but I feel compelled to speak
> up... :-)
>

Compelled? Do you mean committed? :)

> First: separate resource manager from the scheduler. Most/all
> resource managers include scheduling capabilities, but they are two
> distinct functions. MPI's are typically integrated with the resource
> manager functionality, not the scheduler.
>

You know what I mean. :) (I get tired of explaining that Maui is a
scheduler not the entire resource manager). I usually don't change
the scheduler from the default (I've never gotten very sophisticated
in that regard and none of my users in the past have strangled me...)

> Believe me when I tell you that rsh/ssh environments are *horrid* to
> program for and support from an mpirun point of view (particularly
> when dealing with scalability and run-time errors). Using a resource
> manager's native mechanisms is almost always better -- they're usually
> much more reliable and scalable. Plus, you get the nice side-effect
> that users literally *cannot* run outside of their job-allocated nodes
> (e.g., when using rsh/ssh, if a user accidentally uses the wrong
> hostfile, they might be able to launch outside of their job-allocated
> nodes if the sysadmin didn't disable rsh/ssh logins on a per-job basis
> -- we've all seen it happen!).
>

I like the idea of users not being able to log into other nodes (not
their own). That has some nice implications though.

> Having written lots of code that integrates mpirun with resource
> managers and with rsh/ssh, I can confidently state that having mpirun
> integrated with the resource manager at best makes mpirun
> significantly simpler, and at worst makes it no more complex than
> trying to reasonably support rsh/ssh.
>
> Now back to your regularly scheduled programming... :-)
>

It's got to be me then. For some reason I just like to have the
packages separate. This way I can plug in whatever resource
manager/scheduler (what you want to call it) with whatever MPI I'm
using. I absolutely trust you that coding for ssh is a bear, but once
you do it, you can launch a job (theoretically) with any
scheduler/resource manager (whatever you want to call it).
So basically it's like having one API to code for (yep it's a pain, but it's only one). Going the other way you have to code for each one. While you have infinitely more experience coding for resource managers and I know there's the standard (is it DRMAA?) but do all resource managers adhere to it or do they like to modify it for grins? Thanks! Jeff From jsquyres at cisco.com Wed Mar 5 14:54:52 2008 From: jsquyres at cisco.com (Jeff Squyres) Date: Wed, 5 Mar 2008 17:54:52 -0500 Subject: [Warewulf] Perceus 1.3.7 Released In-Reply-To: <47CF18D7.6080003@charter.net> References: <20080304212053.D1VOU.138604.root@fepweb02> <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com> <47CF18D7.6080003@charter.net> Message-ID: <96127F58-0FBA-47F0-A370-443558809900@cisco.com> On Mar 5, 2008, at 5:04 PM, Jeffrey B. Layton wrote: > It's got to be me then. For some reason I just like to have the > packages > separate. This way I can plug in whatever resource manager/scheduler > (what you want to call it) with whatever MPI I'm using. I absolutely > trust you that coding for ssh is a bear, but once you do it, you can > launch > a job (theoretically) with any scheduler/resource manager (whatever > you > want to call it). So basically it's like having one API to code for > (yep > it's a pain, but it's only one). > > Going the other way you have to code for each one. While you have > infinitely more experience coding for resource managers and I know > there's the standard (is it DRMAA?) but do all resource managers > adhere to it or do they like to modify it for grins? I will resist the urge to reply here; it's really off-topic for this list. :D If you want to continue the discussion, I'd be happy to -- feel free to post to an OMPI list or mail me individually. -- Jeff Squyres Cisco Systems From laytonjb at charter.net Wed Mar 5 15:35:13 2008 From: laytonjb at charter.net (Jeffrey B. 
Layton) Date: Wed, 05 Mar 2008 18:35:13 -0500 Subject: [Warewulf] Perceus 1.3.7 Released In-Reply-To: <47CECD8D.8030701@scalableinformatics.com> References: <20080304212755.KY4PO.138876.root@fepweb02> <006401c87ede$9e568950$6600a8c0@terminal209> <47CECD8D.8030701@scalableinformatics.com> Message-ID: <47CF2E31.20606@charter.net> OK, I'll just say it and then take all the crap that goes along with it... I think SLURM sucks. (Whew - there I said it). We used it when I first joined LNXI (and one of the original developers worked at LNXI). We had tons of trouble with it. If you killed a job, SLURM would leave these things we called "Slurm-lets" on all of the nodes and we would have to manually log into each node and kill all of the slurm-lets (kind of "Whack-a-mole"). We finally gave up and went to PBS. But then again, our systems weren't very big at all. I guess another reason is that it just seems so foreign to me. I could figure out SGE pretty easily since I used PBS. I guess I just need to sit down and be SLURM-ized to figure it out. I do hear that it scales to massive levels better than PBS (I don't know about LSF or SGE at those scales). Jeff - care to help a PBS user learn SLURM? Jeff > Arthur Stevens wrote: > >> I am under NDA so I can't give names and case examples, but I have >> personally seen it. >> >> I can say that I have seen several instances lately where SLURM has been >> used to compensate for scaling abilities with Maui and Torque on some very >> large installs. It is kinda the apples and oranges to a point, but after >> that point, I would go with SLURM. I am also more confident in the >> > > SLURM was built for large scaling systems. Blue Gene et al. > > >> developers of SLURM as we have never had a complaint and have always been >> reachable, unlike the other devel camp where we tend to be treated like >> lepper children in the sandbox. Not creating new support nightmares or >> > > :( > > W.r.t the comment (Jeff's?)
about not integrating the launcher and the > MPI, my take is, it would be simpler for us to make the system work if > there is closer tying between the two. Using SGE with OpenMPI is very > nice now, try MPICH 1.2.x with SGE sometime to see an operational > definition of "broken". The qstat/qdel/qsub should just work, with as > little extra magic as possible. > > I have played a bit with SLURM, and it does work ok. A bit different > than the others, but not too painful (requires a little re-adjustment). > > Some of the schedulers/DRMs out there are good, some are ok, some are > terrible. No comments as to which is which. > > From laytonjb at charter.net Wed Mar 5 15:37:15 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Wed, 05 Mar 2008 18:37:15 -0500 Subject: [Warewulf] Perceus 1.3.7 Released In-Reply-To: <96127F58-0FBA-47F0-A370-443558809900@cisco.com> References: <20080304212053.D1VOU.138604.root@fepweb02> <106A8BAA-4858-4AD9-A7BB-3EE074A6F5C9@cisco.com> <47CF18D7.6080003@charter.net> <96127F58-0FBA-47F0-A370-443558809900@cisco.com> Message-ID: <47CF2EAB.8070808@charter.net> Jeff Squyres wrote: > On Mar 5, 2008, at 5:04 PM, Jeffrey B. Layton wrote: > >> It's got to be me then. For some reason I just like to have the packages >> separate. This way I can plug in whatever resource manager/scheduler >> (what you want to call it) with whatever MPI I'm using. I absolutely >> trust you that coding for ssh is a bear, but once you do it, you can >> launch >> a job (theoretically) with any scheduler/resource manager (whatever you >> want to call it). So basically it's like having one API to code for (yep >> it's a pain, but it's only one). >> >> Going the other way you have to code for each one. While you have >> infinitely more experience coding for resource managers and I know >> there's the standard (is it DRMAA?) but do all resource managers >> adhere to it or do they like to modify it for grins? 
> > > I will resist the urge to reply here; it's really off-topic for this > list. :D > > If you want to continue the discussion, I'd be happy to -- feel free > to post to an OMPI list or mail me individually. Consider yourself pinged :) Care to educate me (I definitely need it). Thanks! Jeff From bkyoung at gmail.com Thu Mar 6 08:25:07 2008 From: bkyoung at gmail.com (Brandon Young) Date: Thu, 6 Mar 2008 10:25:07 -0600 Subject: [Warewulf] New node not booting in perceus Message-ID: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> A few days ago, I wrote about a problem I was having in an old version of warewulf. I decided to upgrade to Perceus 1.3.6 and give that a try. I am still running into problems, and I hope people here can guide me as to where to look for clues. The scenario is: I have installed and configured perceus, and built a centos5 vnfs image using the included scripts. I used the Perceus User Guide, by the way, to do all this. I am running Perceus 1.3.6. The problem I have is that the client node never boots the vnfs. I am sure I haven't fully completed the configuration in some way, I just don't know what I've missed. I see the node boot into perceus, and see it get provisioned, then it pauses for about 30 seconds and says it's going to reboot ... and the whole process starts over, again. 
/var/log/messages shows:

Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) 00:1c:23:6e:f2:8e no address configured
Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) 00:19:b9:f5:c9:f9
Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) 172.0.10.199 00:19:b9:f5:c9:f9
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) 172.0.10.199 00:19:b9:f5:c9:f9
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) 172.0.10.199 00:19:b9:f5:c9:f9 node0000
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 TFTP Aborted received from 172.0.10.199
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed sending /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.cfg/default to 172.0.10.199
Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/kernel to 172.0.10.199
Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/initramfs.img to 172.0.10.199
Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) 00:19:b9:f5:c9:f9
Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) 172.0.10.199 00:19:b9:f5:c9:f9
Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) 172.0.10.199 00:19:b9:f5:c9:f9
Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) 172.0.10.199 00:19:b9:f5:c9:f9 node0000
Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning 'node0000' now...
Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount request from 172.0.10.199:720 for /var/lib/perceus (/var/lib/perceus)

[root at cluster04 ~]# showmount
Hosts on cluster04:
172.0.10.199

But no further progress is made before node0000 reboots.
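For anyone reading along in the archive: the log above seems to show two separate DHCP rounds from the same MAC, which would fit a PXE firmware stage followed by a provisioning-kernel stage (that reading is an assumption, not something the log states). A rough Python sketch for splitting such a log into stages follows. It is not part of Perceus, and the names `split_boot_stages` and `tftp_files` are made up for illustration.

```python
import re

# Log lines copied from the /var/log/messages excerpt in this thread.
LOG_LINES = [
    "Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) 00:1c:23:6e:f2:8e no address configured",
    "Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) 172.0.10.199 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) 172.0.10.199 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) 172.0.10.199 00:19:b9:f5:c9:f9 node0000",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 TFTP Aborted received from 172.0.10.199",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed sending /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/pxelinux.cfg/default to 172.0.10.199",
    "Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/kernel to 172.0.10.199",
    "Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: TFTP sent /var/lib/perceus//tftp/initramfs.img to 172.0.10.199",
    "Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) 172.0.10.199 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) 172.0.10.199 00:19:b9:f5:c9:f9",
    "Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) 172.0.10.199 00:19:b9:f5:c9:f9 node0000",
    "Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning 'node0000' now...",
    "Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount request from 172.0.10.199:720 for /var/lib/perceus (/var/lib/perceus)",
]

# syslog layout: "<month> <day> <time> <host> <daemon>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+([\w-]+)\[\d+\]:\s+(.*)$")


def split_boot_stages(lines, mac):
    """Group (daemon, message) pairs into stages.

    A new stage starts at every DHCPDISCOVER from `mac`; lines seen before
    the first such DHCPDISCOVER are ignored.
    """
    stages = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        _timestamp, _host, daemon, message = match.groups()
        if message.startswith("DHCPDISCOVER") and mac in message.lower():
            stages.append([])
        if stages:
            stages[-1].append((daemon, message))
    return stages


def tftp_files(stage):
    """Return the paths of files that were sent successfully over TFTP."""
    return [msg.split()[2] for _daemon, msg in stage if msg.startswith("TFTP sent ")]


stages = split_boot_stages(LOG_LINES, "00:19:b9:f5:c9:f9")
for number, stage in enumerate(stages, start=1):
    last_daemon, last_message = stage[-1]
    print(f"stage {number}: {len(stage)} messages, last from {last_daemon}: {last_message}")
    for path in tftp_files(stage):
        print(f"  tftp: {path}")
```

Read this way, the head node appears to have served everything it was asked for (boot loader, kernel, initramfs, then the NFS mount), which suggests that whatever fails happens on the node itself after the mount rather than on the server side.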
My defaults.conf file looks like:

[root at cluster04 perceus]# cat defaults.conf
# This is the template name for all new nodes as they are configured.

# Define the node name range. The '#' characters symbolize the node number
# in the order of initalized. If you don't allocate enough number spaces
# here for what you defined in 'Total Nodes' then it will be automatically
# padded.
Node Name = node####

# What is the default group for new nodes (this doesn't have to exist
# anywhere before hand)
Group Name = cluster

# Define the default VNFS image that should be assigned to new nodes
Vnfs Name = centos-5.0-1.stateless.x86_64

# Are new nodes automatically enabled and provisionined?
Enabled = 1

# What is the first node number that we should count at?
First Node = 0

# This is the total node count that Perceus would ever try and allocate a
# node to. It is safe to make this big, so you should leave it big.
Total Nodes = 10000

My perceus.conf looks like:

[root at cluster04 perceus]# cat perceus.conf
# This is the configuration file for Perceus

# Define the IP Address of the network file server
vnfs transfer master = 172.0.8.1

# What protocol should be used to retireve the VNFS information. Generally
# Supported options in this version of Perceus are: 'nfs' and 'http' but
# this maybe overridden by particular VNFS capsules.
vnfs transfer method = nfs

# Define the VNFS transfer location if it is different from the default
# ('statedir'). This gets used differently for different transfer methods
# (e.g. NFS this replaces the path to statedir, while with http it is gets
# prepended to the "/perceus" path).
vnfs transfer prefix =

# How long should we wait before considering a node as dead. Note, that if
# you are not running node client daemons, then after provisioning the node
# will never check in, and will no doubt expire.
node timeout = Ifconfig on the perceus head node is: [root at cluster04 perceus]# ifconfig eth0 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:98 inet addr:10.0.50.139 Bcast:10.0.50.255 Mask:255.255.255.0 inet6 addr: fe80::21c:23ff:fec7:8c98/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5101316 errors:0 dropped:0 overruns:0 frame:0 TX packets:4728351 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:490573750 (467.8 MiB) TX bytes:518375590 (494.3 MiB) Interrupt:169 Memory:da000000-da012100 eth1 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:9A inet addr:172.0.8.1 Bcast:172.0.11.255 Mask:255.255.252.0 inet6 addr: fe80::21c:23ff:fec7:8c9a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:22218 errors:0 dropped:0 overruns:0 frame:0 TX packets:18343 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:2541504 (2.4 MiB) TX bytes:26964022 (25.7 MiB) Interrupt:169 Memory:d6000000-d6012100 The node I'm attempting to boot is a Dell PowerEdge 1955 blade. I am brand new to perceus, and am not sure where to look for clues. Any guidance would be greatly appreciated. -- Brandon -------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080306/2b07d17b/attachment.html From gmkurtzer at gmail.com Thu Mar 6 08:35:21 2008 From: gmkurtzer at gmail.com (Greg Kurtzer) Date: Thu, 6 Mar 2008 08:35:21 -0800 Subject: [Warewulf] New node not booting in perceus In-Reply-To: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> References: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> Message-ID: <571f1a060803060835rcd5a371o336e595bff9c9bd0@mail.gmail.com> Can you describe what is happening on the node's console? On Thu, Mar 6, 2008 at 8:25 AM, Brandon Young wrote: > A few days ago, I wrote about a problem I was having in an old version of > warewulf. 
I decided to upgrade to Perceus 1.3.6 and give that a try. I am > still running into problems, and I hope people here can guide me as to where > to look for clues. The scenario is: I have installed and configured > perceus, and built a centos5 vnfs image using the included scripts. I used > the Perceus User Guide, by the way, to do all this. I am running Perceus > 1.3.6. The problem I have is that the client node never boots the vnfs. I > am sure I haven't fully completed the configuration in some way, I just > don't know what I've missed. I see the node boot into perceus, and see it > get provisioned, then it pauses for about 30 seconds and says it's going to > reboot ... and the whole process starts over, again. > > /var/log/messages shows: > > Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) > 00:1c:23:6e:f2:8e no address configured > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > 00:19:b9:f5:c9:f9 > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > 172.0.10.199 00:19:b9:f5:c9:f9 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > 172.0.10.199 00:19:b9:f5:c9:f9 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > 172.0.10.199 00:19:b9:f5:c9:f9 node0000 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 TFTP Aborted > received from 172.0.10.199 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed sending > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > /var/lib/perceus//tftp/pxelinux.cfg/default to 172.0.10.199 > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > /var/lib/perceus//tftp/kernel to 172.0.10.199 > Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: 
TFTP sent > /var/lib/perceus//tftp/initramfs.img to 172.0.10.199 > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > 00:19:b9:f5:c9:f9 > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > 172.0.10.199 00:19:b9:f5:c9:f9 > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > 172.0.10.199 00:19:b9:f5:c9:f9 > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) 172.0.10.199 > 00:19:b9:f5:c9:f9 node0000 > Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning 'node0000' now... > Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount request from > 172.0.10.199:720 for /var/lib/perceus (/var/lib/perceus) > > [root at cluster04 ~]# showmount > Hosts on cluster04: > 172.0.10.199 > > But no further progress is made before node0000 reboots. > > My defaults.conf file look like: > > > > [root at cluster04 perceus]# cat defaults.conf > # This is the template name for all new nodes as they are configured. > > # Define the node name range. The '#' characters symbolize the node number > # in the order of initalized. If you don't allocate enough number spaces > # here for what you defined in 'Total Nodes' then it will be automatically > # padded. > Node Name = node#### > > # What is the default group for new nodes (this doesn't have to exist > # anywhere before hand) > Group Name = cluster > > # Define the default VNFS image that should be assigned to new nodes > Vnfs Name = centos-5.0-1.stateless.x86_64 > > # Are new nodes automatically enabled and provisionined? > Enabled = 1 > > # What is the first node number that we should count at? > First Node = 0 > > # This is the total node count that Perceus would ever try and allocate a > # node to. It is safe to make this big, so you should leave it big. 
> Total Nodes = 10000 > > > > > > My perceus.conf looks like: > > [root at cluster04 perceus]# cat perceus.conf > # This is the configuration file for Perceus > > # Define the IP Address of the network file server > vnfs transfer master = 172.0.8.1 > > # What protocol should be used to retireve the VNFS information. Generally > # Supported options in this version of Perceus are: 'nfs' and 'http' but > # this maybe overridden by particular VNFS capsules. > vnfs transfer method = nfs > > # Define the VNFS transfer location if it is different from the default > # ('statedir'). This gets used differently for different transfer methods > # (e.g. NFS this replaces the path to statedir, while with http it is gets > # prepended to the "/perceus" path). > vnfs transfer prefix = > > # How long should we wait before considering a node as dead. Note, that if > # you are not running node client daemons, then after provisioning the node > # will never check in, and will no doubt expire. > node timeout = > > > > > Ifconfig on the perceus head node is: > > [root at cluster04 perceus]# ifconfig > > eth0 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:98 > inet addr:10.0.50.139 Bcast:10.0.50.255 Mask:255.255.255.0 > inet6 addr: fe80::21c:23ff:fec7:8c98/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:5101316 errors:0 dropped:0 overruns:0 frame:0 > TX packets:4728351 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:490573750 (467.8 MiB) TX bytes:518375590 (494.3 MiB) > Interrupt:169 Memory:da000000-da012100 > > eth1 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:9A > inet addr:172.0.8.1 Bcast:172.0.11.255 Mask:255.255.252.0 > inet6 addr: fe80::21c:23ff:fec7:8c9a/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > RX packets:22218 errors:0 dropped:0 overruns:0 frame:0 > TX packets:18343 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:2541504 (2.4 MiB) TX bytes:26964022 (25.7 MiB) > 
Interrupt:169 Memory:d6000000-d6012100 > > The node I'm attempting to boot is a Dell PowerEdge 1955 blade. > > I am brand new to perceus, and am not sure where to look for clues. Any > guidance would be greatly appreciated. > > -- > Brandon > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > -- Greg Kurtzer http://www.runlevelzero.net/ From bkyoung at gmail.com Thu Mar 6 08:50:40 2008 From: bkyoung at gmail.com (Brandon Young) Date: Thu, 6 Mar 2008 10:50:40 -0600 Subject: [Warewulf] New node not booting in perceus In-Reply-To: <571f1a060803060835rcd5a371o336e595bff9c9bd0@mail.gmail.com> References: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> <571f1a060803060835rcd5a371o336e595bff9c9bd0@mail.gmail.com> Message-ID: <824ffea00803060850g5a4d1d35l48cbf5198610f0b7@mail.gmail.com> I see the machine post, then PXE boot a kernel. At this point, I see the Infiscale Perceus screen. I then see it say: Etherlink found, requesting DHCP configuration via eth0 Provisioning from 172.0.8.1 ... now provisioning: node0000 vnfs: centos-5.0-1.stateless.x86_64.vnfs group: cluster Node ID: 00:19:B9:F5:C9:F9 Total provision time: 1 s Waiting 30 seconds, and rebooting ... Press [ENTER] to interrupt reboot and get a shell On Thu, Mar 6, 2008 at 10:35 AM, Greg Kurtzer wrote: > Can you describe what is happening on the node's console? > > > > > > On Thu, Mar 6, 2008 at 8:25 AM, Brandon Young wrote: > > A few days ago, I wrote about a problem I was having in an old version > of > > warewulf. I decided to upgrade to Perceus 1.3.6 and give that a try. I > am > > still running into problems, and I hope people here can guide me as to > where > > to look for clues. The scenario is: I have installed and configured > > perceus, and built a centos5 vnfs image using the included scripts. I > used > > the Perceus User Guide, by the way, to do all this. 
I am running > Perceus > > 1.3.6. The problem I have is that the client node never boots the vnfs. > I > > am sure I haven't fully completed the configuration in some way, I just > > don't know what I've missed. I see the node boot into perceus, and see > it > > get provisioned, then it pauses for about 30 seconds and says it's going > to > > reboot ... and the whole process starts over, again. > > > > /var/log/messages shows: > > > > Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) > > 00:1c:23:6e:f2:8e no address configured > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > > 00:19:b9:f5:c9:f9 > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 node0000 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 TFTP > Aborted > > received from 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed sending > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.cfg/default to 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/kernel to 172.0.10.199 > > Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/initramfs.img to 172.0.10.199 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > > 00:19:b9:f5:c9:f9 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 
10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > 172.0.10.199 > > 00:19:b9:f5:c9:f9 node0000 > > Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning 'node0000' > now... > > Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount request > from > > 172.0.10.199:720 for /var/lib/perceus (/var/lib/perceus) > > > > [root at cluster04 ~]# showmount > > Hosts on cluster04: > > 172.0.10.199 > > > > But no further progress is made before node0000 reboots. > > > > My defaults.conf file look like: > > > > > > > > [root at cluster04 perceus]# cat defaults.conf > > # This is the template name for all new nodes as they are configured. > > > > # Define the node name range. The '#' characters symbolize the node > number > > # in the order of initalized. If you don't allocate enough number > spaces > > # here for what you defined in 'Total Nodes' then it will be > automatically > > # padded. > > Node Name = node#### > > > > # What is the default group for new nodes (this doesn't have to exist > > # anywhere before hand) > > Group Name = cluster > > > > # Define the default VNFS image that should be assigned to new nodes > > Vnfs Name = centos-5.0-1.stateless.x86_64 > > > > # Are new nodes automatically enabled and provisionined? > > Enabled = 1 > > > > # What is the first node number that we should count at? > > First Node = 0 > > > > # This is the total node count that Perceus would ever try and allocate > a > > # node to. It is safe to make this big, so you should leave it big. > > Total Nodes = 10000 > > > > > > > > > > > > My perceus.conf looks like: > > > > [root at cluster04 perceus]# cat perceus.conf > > # This is the configuration file for Perceus > > > > # Define the IP Address of the network file server > > vnfs transfer master = 172.0.8.1 > > > > # What protocol should be used to retireve the VNFS information. 
> Generally > > # Supported options in this version of Perceus are: 'nfs' and 'http' but > > # this maybe overridden by particular VNFS capsules. > > vnfs transfer method = nfs > > > > # Define the VNFS transfer location if it is different from the default > > # ('statedir'). This gets used differently for different transfer > methods > > # (e.g. NFS this replaces the path to statedir, while with http it is > gets > > # prepended to the "/perceus" path). > > vnfs transfer prefix = > > > > # How long should we wait before considering a node as dead. Note, that > if > > # you are not running node client daemons, then after provisioning the > node > > # will never check in, and will no doubt expire. > > node timeout = > > > > > > > > > > Ifconfig on the perceus head node is: > > > > [root at cluster04 perceus]# ifconfig > > > > eth0 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:98 > > inet addr:10.0.50.139 Bcast:10.0.50.255 Mask:255.255.255.0 > > inet6 addr: fe80::21c:23ff:fec7:8c98/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:5101316 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:4728351 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:490573750 (467.8 MiB) TX bytes:518375590 (494.3 MiB) > > Interrupt:169 Memory:da000000-da012100 > > > > eth1 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:9A > > inet addr:172.0.8.1 Bcast:172.0.11.255 Mask:255.255.252.0 > > inet6 addr: fe80::21c:23ff:fec7:8c9a/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:22218 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:18343 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:2541504 (2.4 MiB) TX bytes:26964022 (25.7 MiB) > > Interrupt:169 Memory:d6000000-d6012100 > > > > The node I'm attempting to boot is a Dell PowerEdge 1955 blade. > > > > I am brand new to perceus, and am not sure where to look for clues. 
Any > > guidance would be greatly appreciated. > > > > -- > > Brandon > > > > _______________________________________________ > > Warewulf mailing list > > Warewulf at caoslinux.org > > http://lists.caosity.org/mailman/listinfo/warewulf > > > > > > > > -- > Greg Kurtzer > http://www.runlevelzero.net/ > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080306/d4eacf70/attachment.html From jgans at lanl.gov Thu Mar 6 10:08:29 2008 From: jgans at lanl.gov (jgans) Date: Thu, 06 Mar 2008 11:08:29 -0700 Subject: [Warewulf] New node not booting in perceus In-Reply-To: <824ffea00803060850g5a4d1d35l48cbf5198610f0b7@mail.gmail.com> References: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> <571f1a060803060835rcd5a371o336e595bff9c9bd0@mail.gmail.com> <824ffea00803060850g5a4d1d35l48cbf5198610f0b7@mail.gmail.com> Message-ID: <47D0331D.1000806@lanl.gov> Hi Brandon, Turning on the debug mode for the node in question will provide additional information in the node console: # perceus node set debug 1 n0000 This was suggested in a previous post (http://lists.caosity.org/pipermail/warewulf/2007-June/002921.html). It helped me track down a hardware incompatibility between a node and the perceus provisioning kernel that presented the same symptoms that you describe. Regards, Jason Brandon Young wrote: > I see the machine post, then PXE boot a kernel. At this point, I see > the Infiscale Perceus screen. I then see it say: > > Etherlink found, requesting DHCP configuration via eth0 > Provisioning from 172.0.8.1 ... > > now provisioning: node0000 > vnfs: centos-5.0-1.stateless.x86_64.vnfs > group: cluster > Node ID: 00:19:B9:F5:C9:F9 > Total provision time: 1 s > > > > Waiting 30 seconds, and rebooting ... 
> > Press [ENTER] to interrupt reboot and get a shell > > > On Thu, Mar 6, 2008 at 10:35 AM, Greg Kurtzer > wrote: > > Can you describe what is happening on the node's console? > > > > > > On Thu, Mar 6, 2008 at 8:25 AM, Brandon Young > wrote: > > A few days ago, I wrote about a problem I was having in an old > version of > > warewulf. I decided to upgrade to Perceus 1.3.6 and give that a > try. I am > > still running into problems, and I hope people here can guide me > as to where > > to look for clues. The scenario is: I have installed and configured > > perceus, and built a centos5 vnfs image using the included > scripts. I used > > the Perceus User Guide, by the way, to do all this. I am > running Perceus > > 1.3.6. The problem I have is that the client node never boots > the vnfs. I > > am sure I haven't fully completed the configuration in some way, > I just > > don't know what I've missed. I see the node boot into perceus, > and see it > > get provisioned, then it pauses for about 30 seconds and says > it's going to > > reboot ... and the whole process starts over, again. 
> > > > /var/log/messages shows: > > > > Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) > > 00:1c:23:6e:f2:8e no address configured > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > > 00:19:b9:f5:c9:f9 > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 node0000 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 > TFTP Aborted > > received from 172.0.10.199 > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed > sending > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/pxelinux.cfg/default to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/kernel to 172.0.10.199 > > Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: TFTP sent > > /var/lib/perceus//tftp/initramfs.img to 172.0.10.199 > > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: > DHCPDISCOVER(eth1) > > 00:19:b9:f5:c9:f9 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > 172.0.10.199 00:19:b9:f5:c9:f9 > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > 172.0.10.199 > > 00:19:b9:f5:c9:f9 node0000 > > Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning > 'node0000' now... 
> > Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount > request from > > 172.0.10.199:720 for /var/lib/perceus > (/var/lib/perceus) > > > > [root at cluster04 ~]# showmount > > Hosts on cluster04: > > 172.0.10.199 > > > > But no further progress is made before node0000 reboots. > > > > My defaults.conf file look like: > > > > > > > > [root at cluster04 perceus]# cat defaults.conf > > # This is the template name for all new nodes as they are > configured. > > > > # Define the node name range. The '#' characters symbolize the > node number > > # in the order of initalized. If you don't allocate enough > number spaces > > # here for what you defined in 'Total Nodes' then it will be > automatically > > # padded. > > Node Name = node#### > > > > # What is the default group for new nodes (this doesn't have to > exist > > # anywhere before hand) > > Group Name = cluster > > > > # Define the default VNFS image that should be assigned to new nodes > > Vnfs Name = centos-5.0-1.stateless.x86_64 > > > > # Are new nodes automatically enabled and provisionined? > > Enabled = 1 > > > > # What is the first node number that we should count at? > > First Node = 0 > > > > # This is the total node count that Perceus would ever try and > allocate a > > # node to. It is safe to make this big, so you should leave it big. > > Total Nodes = 10000 > > > > > > > > > > > > My perceus.conf looks like: > > > > [root at cluster04 perceus]# cat perceus.conf > > # This is the configuration file for Perceus > > > > # Define the IP Address of the network file server > > vnfs transfer master = 172.0.8.1 > > > > # What protocol should be used to retireve the VNFS information. > Generally > > # Supported options in this version of Perceus are: 'nfs' and > 'http' but > > # this maybe overridden by particular VNFS capsules. > > vnfs transfer method = nfs > > > > # Define the VNFS transfer location if it is different from the > default > > # ('statedir'). 
This gets used differently for different > transfer methods > > # (e.g. NFS this replaces the path to statedir, while with http > it is gets > > # prepended to the "/perceus" path). > > vnfs transfer prefix = > > > > # How long should we wait before considering a node as dead. > Note, that if > > # you are not running node client daemons, then after > provisioning the node > > # will never check in, and will no doubt expire. > > node timeout = > > > > > > > > > > Ifconfig on the perceus head node is: > > > > [root at cluster04 perceus]# ifconfig > > > > eth0 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:98 > > inet addr:10.0.50.139 > Bcast:10.0.50.255 Mask:255.255.255.0 > > > inet6 addr: fe80::21c:23ff:fec7:8c98/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:5101316 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:4728351 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:490573750 (467.8 MiB) TX bytes:518375590 > (494.3 MiB) > > Interrupt:169 Memory:da000000-da012100 > > > > eth1 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:9A > > inet addr:172.0.8.1 > Bcast:172.0.11.255 Mask:255.255.252.0 > > > inet6 addr: fe80::21c:23ff:fec7:8c9a/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > RX packets:22218 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:18343 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:1000 > > RX bytes:2541504 (2.4 MiB) TX bytes:26964022 (25.7 MiB) > > Interrupt:169 Memory:d6000000-d6012100 > > > > The node I'm attempting to boot is a Dell PowerEdge 1955 blade. > > > > I am brand new to perceus, and am not sure where to look for > clues. Any > > guidance would be greatly appreciated. 
> > > > -- > > Brandon > > > > _______________________________________________ > > Warewulf mailing list > > Warewulf at caoslinux.org > > http://lists.caosity.org/mailman/listinfo/warewulf > > > > > > > > -- > Greg Kurtzer > http://www.runlevelzero.net/ > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > > ------------------------------------------------------------------------ > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf >

From bkyoung at gmail.com Thu Mar 6 12:09:09 2008
From: bkyoung at gmail.com (Brandon Young)
Date: Thu, 6 Mar 2008 14:09:09 -0600
Subject: [Warewulf] New node not booting in perceus
In-Reply-To: <47D0331D.1000806@lanl.gov>
References: <824ffea00803060825w429253c3yb11b384e28322bfa@mail.gmail.com> <571f1a060803060835rcd5a371o336e595bff9c9bd0@mail.gmail.com> <824ffea00803060850g5a4d1d35l48cbf5198610f0b7@mail.gmail.com> <47D0331D.1000806@lanl.gov>
Message-ID: <824ffea00803061209v57cdcafse6cd6fe58ffe26ea@mail.gmail.com>

Yes! That's what I was looking for. I ran across that mail in the last week or so and couldn't find it again. I only vaguely remembered the command, and couldn't piece it back together from the man page. Thanks, Jason.

So, I did that, and now what I see on the console of the booting node is:

Etherlink found, requesting DHCP configuration via eth0
Provisioning from 172.0.8.1 ...
Beginning provisioning with debug level 1
now provisioning: node0000
vnfs: centos-5.0-1.stateless.x86_64.vnfs
group: cluster
Node ID: 00:19:B9:F5:C9:F9
Mounting via NFS 172.0.8.1:/var/lib/perceus/
Un-mounting 172.0.8.1:/var/lib/perceus/
Un-loading device drivers: \-> bnx2 mptsas
Total provision time: 1 s
Sleeping for 10 seconds for debug evaluation ...
Waiting 30 seconds, and rebooting ...

I don't understand why it mounts and then immediately unmounts the export. I see the mount request in /var/log/messages. Showmount, on the head node, indicates it is mounted by the node. Is there a higher debug level that would give more insight? Is the answer obvious and I'm just missing it?

On Thu, Mar 6, 2008 at 12:08 PM, jgans wrote:
> Hi Brandon,
>
> Turning on the debug mode for the node in question will provide additional
> information in the node console:
>
> # perceus node set debug 1 n0000
>
> This was suggested in a previous post
> (http://lists.caosity.org/pipermail/warewulf/2007-June/002921.html). It
> helped me track down a hardware incompatibility between a node and the
> perceus provisioning kernel that presented the same symptoms that you
> describe.
>
> Regards,
>
> Jason
>
> Brandon Young wrote:
>
> I see the machine post, then PXE boot a kernel. At this point, I see the
> Infiscale Perceus screen. I then see it say:
>
> Etherlink found, requesting DHCP configuration via eth0
> Provisioning from 172.0.8.1 ...
>
> now provisioning: node0000
> vnfs: centos-5.0-1.stateless.x86_64.vnfs
> group: cluster
> Node ID: 00:19:B9:F5:C9:F9
> Total provision time: 1 s
>
>
>
> Waiting 30 seconds, and rebooting ...
>
> Press [ENTER] to interrupt reboot and get a shell
>
>
> On Thu, Mar 6, 2008 at 10:35 AM, Greg Kurtzer wrote: > > > Can you describe what is happening on the node's console? > > > > > > > > > > > > On Thu, Mar 6, 2008 at 8:25 AM, Brandon Young wrote: > > > A few days ago, I wrote about a problem I was having in an old version > > of > > > warewulf.
I decided to upgrade to Perceus 1.3.6 and give that a try. > > I am > > > still running into problems, and I hope people here can guide me as to > > where > > > to look for clues. The scenario is: I have installed and configured > > > perceus, and built a centos5 vnfs image using the included scripts. I > > used > > > the Perceus User Guide, by the way, to do all this. I am running > > Perceus > > > 1.3.6. The problem I have is that the client node never boots the > > vnfs. I > > > am sure I haven't fully completed the configuration in some way, I > > just > > > don't know what I've missed. I see the node boot into perceus, and > > see it > > > get provisioned, then it pauses for about 30 seconds and says it's > > going to > > > reboot ... and the whole process starts over, again. > > > > > > /var/log/messages shows: > > > > > > Mar 6 10:06:28 cluster04 perceus-dnsmasq[11422]: BOOTP(eth1) > > > 00:1c:23:6e:f2:8e no address configured > > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > > > 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:29 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > > 172.0.10.199 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > > 172.0.10.199 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > > > 172.0.10.199 00:19:b9:f5:c9:f9 node0000 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP error 0 TFTP > > Aborted > > > received from 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP failed sending > > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > > /var/lib/perceus//tftp/pxelinux.0 to 172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > > /var/lib/perceus//tftp/pxelinux.cfg/default to 
172.0.10.199 > > > Mar 6 10:06:33 cluster04 perceus-dnsmasq[11422]: TFTP sent > > > /var/lib/perceus//tftp/kernel to 172.0.10.199 > > > Mar 6 10:06:34 cluster04 perceus-dnsmasq[11422]: TFTP sent > > > /var/lib/perceus//tftp/initramfs.img to 172.0.10.199 > > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPDISCOVER(eth1) > > > 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPOFFER(eth1) > > > 172.0.10.199 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPREQUEST(eth1) > > > 172.0.10.199 00:19:b9:f5:c9:f9 > > > Mar 6 10:06:45 cluster04 perceus-dnsmasq[11422]: DHCPACK(eth1) > > 172.0.10.199 > > > 00:19:b9:f5:c9:f9 node0000 > > > Mar 6 10:06:45 cluster04 perceusd[11426]: Provisioning 'node0000' > > now... > > > Mar 6 10:06:45 cluster04 mountd[11264]: authenticated mount request > > from > > > 172.0.10.199:720 for /var/lib/perceus (/var/lib/perceus) > > > > > > [root at cluster04 ~]# showmount > > > Hosts on cluster04: > > > 172.0.10.199 > > > > > > But no further progress is made before node0000 reboots. > > > > > > My defaults.conf file look like: > > > > > > > > > > > > [root at cluster04 perceus]# cat defaults.conf > > > # This is the template name for all new nodes as they are configured. > > > > > > # Define the node name range. The '#' characters symbolize the node > > number > > > # in the order of initalized. If you don't allocate enough number > > spaces > > > # here for what you defined in 'Total Nodes' then it will be > > automatically > > > # padded. > > > Node Name = node#### > > > > > > # What is the default group for new nodes (this doesn't have to exist > > > # anywhere before hand) > > > Group Name = cluster > > > > > > # Define the default VNFS image that should be assigned to new nodes > > > Vnfs Name = centos-5.0-1.stateless.x86_64 > > > > > > # Are new nodes automatically enabled and provisionined? > > > Enabled = 1 > > > > > > # What is the first node number that we should count at? 
> > > First Node = 0 > > > > > > # This is the total node count that Perceus would ever try and > > allocate a > > > # node to. It is safe to make this big, so you should leave it big. > > > Total Nodes = 10000 > > > > > > > > > > > > > > > > > > My perceus.conf looks like: > > > > > > [root at cluster04 perceus]# cat perceus.conf > > > # This is the configuration file for Perceus > > > > > > # Define the IP Address of the network file server > > > vnfs transfer master = 172.0.8.1 > > > > > > # What protocol should be used to retireve the VNFS information. > > Generally > > > # Supported options in this version of Perceus are: 'nfs' and 'http' > > but > > > # this maybe overridden by particular VNFS capsules. > > > vnfs transfer method = nfs > > > > > > # Define the VNFS transfer location if it is different from the > > default > > > # ('statedir'). This gets used differently for different transfer > > methods > > > # (e.g. NFS this replaces the path to statedir, while with http it is > > gets > > > # prepended to the "/perceus" path). > > > vnfs transfer prefix = > > > > > > # How long should we wait before considering a node as dead. Note, > > that if > > > # you are not running node client daemons, then after provisioning > > the node > > > # will never check in, and will no doubt expire. 
> > > node timeout = > > > > > > > > > > > > > > > Ifconfig on the perceus head node is: > > > > > > [root at cluster04 perceus]# ifconfig > > > > > > eth0 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:98 > > > inet addr:10.0.50.139 Bcast:10.0.50.255 Mask:255.255.255.0 > > > inet6 addr: fe80::21c:23ff:fec7:8c98/64 Scope:Link > > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > > RX packets:5101316 errors:0 dropped:0 overruns:0 frame:0 > > > TX packets:4728351 errors:0 dropped:0 overruns:0 carrier:0 > > > collisions:0 txqueuelen:1000 > > > RX bytes:490573750 (467.8 MiB) TX bytes:518375590 (494.3MiB) > > > Interrupt:169 Memory:da000000-da012100 > > > > > > eth1 Link encap:Ethernet HWaddr 00:1C:23:C7:8C:9A > > > inet addr:172.0.8.1 Bcast:172.0.11.255 Mask:255.255.252.0 > > > inet6 addr: fe80::21c:23ff:fec7:8c9a/64 Scope:Link > > > UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 > > > RX packets:22218 errors:0 dropped:0 overruns:0 frame:0 > > > TX packets:18343 errors:0 dropped:0 overruns:0 carrier:0 > > > collisions:0 txqueuelen:1000 > > > RX bytes:2541504 (2.4 MiB) TX bytes:26964022 (25.7 MiB) > > > Interrupt:169 Memory:d6000000-d6012100 > > > > > > The node I'm attempting to boot is a Dell PowerEdge 1955 blade. > > > > > > I am brand new to perceus, and am not sure where to look for clues. > > Any > > > guidance would be greatly appreciated. 
> > > > > > -- > > > Brandon > > > > > > _______________________________________________ > > > Warewulf mailing list > > > Warewulf at caoslinux.org > > > http://lists.caosity.org/mailman/listinfo/warewulf > > > > > > > > > > > > > > -- > > Greg Kurtzer > > http://www.runlevelzero.net/ > > _______________________________________________ > > Warewulf mailing list > > Warewulf at caoslinux.org > > http://lists.caosity.org/mailman/listinfo/warewulf > > > > ------------------------------ > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > >

From mbaxter2 at hotmail.com Thu Mar 6 16:42:57 2008
From: mbaxter2 at hotmail.com (Michael Baxter)
Date: Fri, 7 Mar 2008 11:42:57 +1100
Subject: [Warewulf] wwtop and perceus
Message-ID:

Hi,

I've recently installed perceus 1.3.6 (now I see 1.4 is out and I'll probably upgrade soonish). I grabbed hold of the warewulf-tools from https://www.perceus.org/svn/perceus/warewulf/3.0 and built it; however, wwtop seems to want to connect to port 9873 (and there is nothing running on that port). The usual display comes up in wwtop, but it can't see the nodes. Just before the display turns up I get "Could not connect to localhost:9873!". Otherwise everything is going great and you should be proud of such a cool piece of software.

Regards
Michael

From gmkurtzer at gmail.com Thu Mar 6 21:09:46 2008
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Thu, 6 Mar 2008 21:09:46 -0800
Subject: [Warewulf] wwtop and perceus
In-Reply-To:
References:
Message-ID: <571f1a060803062109g5b33ecadt53b5aba2bb615e39@mail.gmail.com>

Actually, Perceus 1.4 is still in active development; it was 1.3.7 that was just released.

Check that the warewulf daemon (warewulfd) is running on the master and the client daemons are running on the nodes (wulfd).

Not sure if you are already using it, but Caos NSA has both Perceus and Warewulf pre-integrated, with interactive setup tools, all in a 5-minute brain-dead install (it is even Intel Cluster Ready certified!). ;)

Thank you very much for the compliments!

On Thu, Mar 6, 2008 at 4:42 PM, Michael Baxter wrote:
>
> Hi,
>
> I've recently installed perceus 1.3.6 (now I see 1.4 is out and I'll probably
> upgrade soonish). I grabbed hold of the warewulf-tools from
> https://www.perceus.org/svn/perceus/warewulf/3.0
> and built it; however, wwtop seems to want to connect to port 9873 (and there
> is nothing running on that port). The usual display comes up in wwtop, but
> it can't see the nodes. Just before the display turns up I get "Could not
> connect to localhost:9873!". Otherwise everything is going great and you
> should be proud of such a cool piece of software.
>
> Regards
> Michael
>
>
> _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > -- Greg Kurtzer http://www.runlevelzero.net/

From poknam at gmail.com Sun Mar 9 23:57:02 2008
From: poknam at gmail.com (PN)
Date: Mon, 10 Mar 2008 14:57:02 +0800
Subject: [Warewulf] perceus with xen
Message-ID: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com>

Has anyone tried perceus with xen?
Previously I tried perceus 1.3.6 and 5.1 kernel without problems.
Now I use perceus 1.3.7 and the centos 5.1 xen kernel to make the image.
However, when the client node boots up, it shows:

Now provisiong: node0001
VNFS: GT8000
Group: cluster
Node ID: 00:14:25:00:04
+ cat /found_nics
+ ifconfig eth0 down
+ ifconfig eth1 down
+ ifconfig ib0 down
+ ifconfig ib1 down
+ [ ! -f /sbin/detect ]
+ . /etc/functions
+ . /etc/initramfs.conf
+ DEVS=eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7 ib0 ib1
+ MAX_TRIES=5
+ echo Un-loading device drivers:
+ echo -ne \->
\->+ unload_module scsi_mod
+ grep -q ^scsi_mod /proc/modules
+ PATH=/sbin rmmod scsi_mod
+ unload_module ib_ipoib
+ grep -q ^ib_ipoib /proc/modules
+ PATH=/sbin rmmod ib_ipoib
+ echo -n ib_ipoib
ib_ipoib+ cat /etc/modulerc
+ read i
+ /sbin/detect -q
+ read i
+ unload_module uhci_hcd
+ grep -q ^uhci_hcd /proc/modules
+ read i
+ unload_module uhci-hcd
+ grep -q ^uhci-hcd /proc/modules
+ read i
+ unload_module ehci-hcd
+ grep -q ^ehci-hcd /proc/modules
+ read i
+ unload_module ata_piix
+ grep -q ^ata_piix /proc/modules
+ PATH=/sbin rmmod ata_piix
+ echo -n ata_piix
ata_piix+ read i
+ unload_module piix
+ grep -q ^piix /proc/modules
+ PATH=/sbin remmod piix
+ read i
+ unload_module ib_mthca
+ grep -q ^ib_mthca /proc/modules
+ PATH=/sbin rmmod ib_mthca
+ echo -n ib_mthca
ib_mthca+ read i
+ unload_module tg3
+ grep -q ^tg3 /proc/modules
+ PATH=/sbin rmmod tg3
+ echo -n tg3
tg3+ read i
+ unload_module tg3
+ grep -q ^tg3 /proc/modules
+ read i
+ echo

Total provision time: 6 s

Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff
Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not .....

Has anyone seen this before? Any comment is appreciated.

thanks,
PN

From astevens at gravitypark.com Mon Mar 10 09:30:12 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Mon, 10 Mar 2008 09:30:12 -0700
Subject: [Warewulf] perceus with xen
References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com>
Message-ID: <004201c882cc$059c0770$6400a8c0@terminal209>

Perceus now fully supports KVM and VMware and still supports Xen. With Abstractual, you get full integration of VM's over the cluster with the ability to migrate. Sounds like a Centos issue here, and we try not to support Centos ;)

What happens when you try it with Caos NSA?

Arthur

----- Original Message -----
From: PN
To: The Warewulf Cluster Toolkit
Sent: Sunday, March 09, 2008 11:57 PM
Subject: [Warewulf] perceus with xen

has anyone tried perceus with xen? previously i tried perceus 1.3.6 and 5.1 kernel without problem. now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. however when the client node bootup, it shows Now provisiong: node0001 VNFS: GT8000 Group: cluster Node ID: 00:14:25:00:04 + cat /found_nics + ifconfig eth0 down + ifconfig eth1 down + ifconfig ib0 down + ifconfig ib1 down + [ ! -f /sbin/detect ] + . /etc/functions + .
/etc/initramfs.conf + DEVS=eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7 ib0 ib1 + MAX_TRIES=5 + echo Un-loading device drivers: + echo -ne \-> \->+ unload_module scsi_mod + grep -q ^scsi_mod /proc/modules + PATH=/sbin rmmod scsi_mod + unload_module ib_ipoib + grep -q ^ib_ipoib /proc/modules + PATH=/sbin rmmod ib_ipoib + echo -n ib_ipoib ib_ipoib+ cat /etc/modulerc + read i + /sbin/detect -q + read i + unload_module uhci_hcd + grep -q ^uhci_hcd /proc/modules + read i + unload_module uhci-hcd + grep -q ^uhci-hcd /proc/modules + read i + unload_module ehci-hcd + grep -q ^ehci-hcd /proc/modules + read i + unload_module ata_piix + grep -q ^ata_piix /proc/modules + PATH=/sbin rmmod ata_piix + echo -n ata_piix ata_piix+ read i + unload_module piix + grep -q ^piix /proc/modules + PATH=/sbin remmod piix + read i + unload_module ib_mthca + grep -q ^ib_mthca /proc/modules + PATH=/sbin rmmod ib_mthca + echo -n ib_mthca ib_mthca+ read i + unload_module tg3 + grep -q ^tg3 /proc/modules + PATH=/sbin rmmod tg3 + echo -n tg3 tg3+ read i + unload_module tg3 + grep -q ^tg3 /proc/modules + read i + echo Total provision time: 6 s Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not ..... Has anyone seen this before? any comment is appreciated. thanks, PN ------------------------------------------------------------------------------ _______________________________________________ Warewulf mailing list Warewulf at caoslinux.org http://lists.caosity.org/mailman/listinfo/warewulf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080310/38f2efa2/attachment.html From griznog at gmail.com Mon Mar 10 09:41:32 2008 From: griznog at gmail.com (John Hanks) Date: Mon, 10 Mar 2008 10:41:32 -0600 Subject: [Warewulf] perceus with xen In-Reply-To: <004201c882cc$059c0770$6400a8c0@terminal209> References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> Message-ID: If I want to boot a xen kernel, my grub.conf has two "module" lines, the kernel points to xen, then there is a module for the actual kernel and one for the initrd. Do you have an example of a working pxe config you can post to send xen + kernel + initrd? Does caos with a xen kernel handle this differently than centos? jbh On Mon, Mar 10, 2008 at 10:30 AM, Arthur Stevens wrote: > > > Perceus now fully supports KVM and VMware and still supports Xen. With > Abstractual, you get full integration of VM's over the cluster with the > ability to migrate. Sounds like a Centos issue here, and we try not to > support Centos ;) > > What happens when you try it with Caos NSA? > > Arthur > > > ----- Original Message ----- > From: PN > To: The Warewulf Cluster Toolkit > Sent: Sunday, March 09, 2008 11:57 PM > Subject: [Warewulf] perceus with xen > > > has anyone tried perceus with xen? > previously i tried perceus 1.3.6 and 5.1 kernel without problem. > now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. > however when the client node bootup, it shows > > Now provisiong: node0001 > VNFS: GT8000 > Group: cluster > Node ID: 00:14:25:00:04 > + cat /found_nics > + ifconfig eth0 down > + ifconfig eth1 down > + ifconfig ib0 down > + ifconfig ib1 down > + [ ! -f /sbin/detect ] > + . /etc/functions > + . 
/etc/initramfs.conf > + DEVS=eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7 ib0 ib1 > + MAX_TRIES=5 > + echo Un-loading device drivers: > + echo -ne \-> > \->+ unload_module scsi_mod > + grep -q ^scsi_mod /proc/modules > + PATH=/sbin rmmod scsi_mod > + unload_module ib_ipoib > + grep -q ^ib_ipoib /proc/modules > + PATH=/sbin rmmod ib_ipoib > + echo -n ib_ipoib > ib_ipoib+ cat /etc/modulerc > + read i > + /sbin/detect -q > + read i > + unload_module uhci_hcd > + grep -q ^uhci_hcd /proc/modules > + read i > + unload_module uhci-hcd > + grep -q ^uhci-hcd /proc/modules > + read i > > + unload_module ehci-hcd > + grep -q ^ehci-hcd /proc/modules > + read i > > + unload_module ata_piix > + grep -q ^ata_piix /proc/modules > + PATH=/sbin rmmod ata_piix > + echo -n ata_piix > ata_piix+ read i > > + unload_module piix > + grep -q ^piix /proc/modules > + PATH=/sbin remmod piix > + read i > > + unload_module ib_mthca > + grep -q ^ib_mthca /proc/modules > + PATH=/sbin rmmod ib_mthca > + echo -n ib_mthca > ib_mthca+ read i > + unload_module tg3 > > + grep -q ^tg3 /proc/modules > + PATH=/sbin rmmod tg3 > + echo -n tg3 > tg3+ read i > + unload_module tg3 > > > + grep -q ^tg3 /proc/modules > + read i > + echo > > Total provision time: 6 s > > Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff > Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not ..... > > Has anyone seen this before? any comment is appreciated. > > thanks, > PN > > ________________________________ > > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > From laytonjb at charter.net Mon Mar 10 14:12:07 2008 From: laytonjb at charter.net (Jeffrey B. 
Layton) Date: Mon, 10 Mar 2008 16:12:07 -0500 Subject: [Warewulf] perceus with xen In-Reply-To: <004201c882cc$059c0770$6400a8c0@terminal209> References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> Message-ID: <47D5A427.3060708@charter.net> Can you tell the studio office waht Abstractual is? What OS's will you support for in addition to Caos NSA? Thanks! Jeff > Perceus now fully supports KVM and VMware and still supports Xen. With > Abstractual, you get full integration of VM's over the cluster with > the ability to migrate. Sounds like a Centos issue here, and we try > not to support Centos ;) > > What happens when you try it with Caos NSA? > > Arthur > > ----- Original Message ----- > *From:* PN > *To:* The Warewulf Cluster Toolkit > *Sent:* Sunday, March 09, 2008 11:57 PM > *Subject:* [Warewulf] perceus with xen > > has anyone tried perceus with xen? > previously i tried perceus 1.3.6 and 5.1 kernel without problem. > now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. > however when the client node bootup, it shows > > Now provisiong: node0001 > VNFS: GT8000 > Group: cluster > Node ID: 00:14:25:00:04 > + cat /found_nics > + ifconfig eth0 down > + ifconfig eth1 down > + ifconfig ib0 down > + ifconfig ib1 down > + [ ! -f /sbin/detect ] > + . /etc/functions > + . 
/etc/initramfs.conf > [...] > Total provision time: 6 s > > Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff > Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could > not ..... > > Has anyone seen this before? any comment is appreciated.
> > thanks, > PN > > ------------------------------------------------------------------------ > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > > ------------------------------------------------------------------------ > > _______________________________________________ > Warewulf mailing list > Warewulf at caoslinux.org > http://lists.caosity.org/mailman/listinfo/warewulf > From laytonjb at charter.net Mon Mar 10 14:17:59 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Mon, 10 Mar 2008 16:17:59 -0500 Subject: [Warewulf] perceus with xen In-Reply-To: <47D5A427.3060708@charter.net> References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> <47D5A427.3060708@charter.net> Message-ID: <47D5A587.8040607@charter.net> That should have been "studio audience". Man it's a bad day for spelling. > Can you tell the studio office waht Abstractual is? What OS's will you > support > for in addition to Caos NSA? > > Thanks! > > Jeff > > >> Perceus now fully supports KVM and VMware and still supports Xen. With >> Abstractual, you get full integration of VM's over the cluster with >> the ability to migrate. Sounds like a Centos issue here, and we try >> not to support Centos ;) >> >> What happens when you try it with Caos NSA? >> >> Arthur >> >> ----- Original Message ----- >> *From:* PN >> *To:* The Warewulf Cluster Toolkit >> *Sent:* Sunday, March 09, 2008 11:57 PM >> *Subject:* [Warewulf] perceus with xen >> >> has anyone tried perceus with xen? >> previously i tried perceus 1.3.6 and 5.1 kernel without problem. >> now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. 
>> however when the client node bootup, it shows >> >> [...] >> >> Total provision time: 6 s >> >> Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff >> Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could >> not ..... >> >> Has anyone seen this before? any comment is appreciated.
>> >> thanks, >> PN

From astevens at gravitypark.com Mon Mar 10 15:55:34 2008 From: astevens at gravitypark.com (Arthur Stevens) Date: Mon, 10 Mar 2008 15:55:34 -0700 Subject: [Warewulf] perceus with xen References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> <47D5A427.3060708@charter.net> <47D5A587.8040607@charter.net> Message-ID: <00af01c88301$d9fd4800$6400a8c0@terminal209>

Hi Jeff,

Great article in Linux Magazine (March 2008, pg 46, "Perceus/Warewulf - Tres Cool Cluster Tool"). I suggest that anyone looking for a good article to give someone as an intro to Perceus pick up a copy. It is also a good way to show your support for Perceus and make sure all of the March issues get bought out :)

As for Abstractual, it is one of our new offerings. It creates what we call the 'Abstract Fabric', which is the mesh of assorted servers, appliances and services that amount to an enterprise's everything. :) This fabric usually consists of a series of layers, and what Abstractual does is create a virtual presence by enabling the enterprise to basically have one single point for maintaining it all.
By allowing users to simply add more servers and control how many are on and what they are doing, combined with virtualization and the ability to span applications and VMs over single or multiple systems, you no longer have to base things on node count, but on overall resources. You can also power down the nodes you're not using for amazing decreases in power consumption. Think of always having enough power when you need it, but not having rack after rack screaming away with nothing going on. Abstractual also allows you to sell off your extra cycles or to share cycles with other organizations and other grids. It will likely be available for download alongside Perceus this summer, along with a few other 'disruptive' new (and still open) products :)

For instance, say you are a lab or ISP with several users spread out everywhere. You need the ability to give them the horsepower you want, but you don't want to have 50 clusters set up for 10 groups. You also don't want the power bill for this when it is not in use. Here Abstractual steps in. It can control and manage tens of thousands of servers in multiple geographical locations and allows you to spread jobs, services and resources, and even move virtual machines around, to save power, reduce bandwidth cost, or increase performance. Think of it as the first app that takes advantage of the provisioning power of Perceus along with a new ease of use and HPC-style performance for the enterprise.

We actually plan on supporting all OSes (I was just joking about Centos), and we even support Windows. We currently support VNFS for Caos, RedHat Ent 4/5, SuSE 10.1, Ubuntu, Centos and Windows. We are also working on the install for Fedora and Gentoo. Experimental support for Solaris is under way as well. Our virtualization software is 100% compatible with any x86-compatible install (32 or 64 bit), and you also have the option to use VMware instead of the built-in virtualization capabilities, which are powered by a modded KVM.
Xen support is there with a little work and a kernel swap. I will make sure the Xen kernel I have is available for download if it is not already. Anyone with further questions can also mail me directly, either at this address or at the same username at Infiscale.com.

Thanks to everyone for their support and all the awesome encouragement. We hope the future releases help enable you, the users, because that's really what all of this is about.

Arthur

----- Original Message ----- From: "Jeffrey B. Layton" To: ; "The Warewulf Cluster Toolkit" Cc: "Arthur Stevens" Sent: Monday, March 10, 2008 2:17 PM Subject: Re: [Warewulf] perceus with xen

> That should have been "studio audience". Man it's a bad day for spelling.
>
>> Can you tell the studio office waht Abstractual is? What OS's will you
>> support
>> for in addition to Caos NSA?
>>
>> Thanks!
>>
>> Jeff
>>
>>
>>> Perceus now fully supports KVM and VMware and still supports Xen. With
>>> Abstractual, you get full integration of VM's over the cluster with the
>>> ability to migrate. Sounds like a Centos issue here, and we try not to
>>> support Centos ;)
>>> What happens when you try it with Caos NSA?
>>> Arthur
>>>
>>> ----- Original Message -----
>>> *From:* PN
>>> *To:* The Warewulf Cluster Toolkit
>>> *Sent:* Sunday, March 09, 2008 11:57 PM
>>> *Subject:* [Warewulf] perceus with xen
>>>
>>> has anyone tried perceus with xen?
>>> previously i tried perceus 1.3.6 and 5.1 kernel without problem.
>>> now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image.
>>> however when the client node bootup, it shows
>>> [...]
>>> Total provision time: 6 s
>>> Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff
>>> Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could
>>> not .....
>>> Has anyone seen this before? any comment is appreciated.
>>> thanks, >>> PN >>> >>> ------------------------------------------------------------------------ >>> _______________________________________________ >>> Warewulf mailing list >>> Warewulf at caoslinux.org >>> http://lists.caosity.org/mailman/listinfo/warewulf >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Warewulf mailing list >>> Warewulf at caoslinux.org >>> http://lists.caosity.org/mailman/listinfo/warewulf >>> >> >> _______________________________________________ >> Warewulf mailing list >> Warewulf at caoslinux.org >> http://lists.caosity.org/mailman/listinfo/warewulf >> >> > > From poknam at gmail.com Tue Mar 11 00:31:33 2008 From: poknam at gmail.com (PN) Date: Tue, 11 Mar 2008 15:31:33 +0800 Subject: [Warewulf] perceus with xen In-Reply-To: <004201c882cc$059c0770$6400a8c0@terminal209> References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> Message-ID: <92daa7bf0803110031t49d2a8t690cd0c1de462dd5@mail.gmail.com> i haven't tried Caos yet. i have a VM using xen, also centos 5.1, that can successfully boot up (stateful). i don't know why it can't boot up with perceus. anyway, i will try it with perceus 1.3.6 and see what happen. thanks, PN 2008/3/11, Arthur Stevens : > > Perceus now fully supports KVM and VMware and still supports Xen. With > Abstractual, you get full integration of VM's over the cluster with the > ability to migrate. Sounds like a Centos issue here, and we try not to > support Centos ;) > > What happens when you try it with Caos NSA? > > Arthur > > ----- Original Message ----- > *From:* PN > *To:* The Warewulf Cluster Toolkit > *Sent:* Sunday, March 09, 2008 11:57 PM > *Subject:* [Warewulf] perceus with xen > > > has anyone tried perceus with xen? > previously i tried perceus 1.3.6 and 5.1 kernel without problem. > now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. 
> however when the client node bootup, it shows > > [...] > > Total provision time: 6 s > > Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff > Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not ..... > > Has anyone seen this before? any comment is appreciated.
> > thanks, > PN

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080311/52fb33eb/attachment.html

From weizju at gmail.com Wed Mar 19 05:55:54 2008 From: weizju at gmail.com (weil) Date: Wed, 19 Mar 2008 20:55:54 +0800 Subject: [Warewulf] How to Install an OS to disk (stateful) using perceus? Message-ID: <9212b37b0803190555g1a3ab724reffca0afee40d564@mail.gmail.com>

Hi everybody,

I am really new to Perceus, so could anybody tell me how to install an OS to disk (stateful) using Perceus?

I can boot the client node successfully using Perceus 1.3.6 now, but the OS kernel and related applications are in the ramdisk. How can I persist all of them to disk, so that I do not need to fetch the OS image from the network when I reboot the client node?

Thanks! Wei,Li

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080319/0db7d676/attachment.html

From gmkurtzer at gmail.com Wed Mar 19 13:28:00 2008 From: gmkurtzer at gmail.com (Greg Kurtzer) Date: Wed, 19 Mar 2008 13:28:00 -0700 Subject: [Warewulf] How to Install an OS to disk (stateful) using perceus? In-Reply-To: <9212b37b0803190555g1a3ab724reffca0afee40d564@mail.gmail.com> References: <9212b37b0803190555g1a3ab724reffca0afee40d564@mail.gmail.com> Message-ID: <571f1a060803191328y2e107535q272d8326c5d11d3f@mail.gmail.com>

We have done some custom VNFS capsules that do this for people, but nothing that we have released, as each was specific to a particular setup and operating system.
When we have something suitable for release, are you open to testing?

Thanks, Greg

On Wed, Mar 19, 2008 at 5:55 AM, weil wrote:
> Hi everybody,
>
> I am really new to Perceus, so could anybody tell me how to install
> an OS to disk (stateful) using Perceus?
>
> I can boot the client node successfully using Perceus 1.3.6 now, but the OS
> kernel and related applications are in the ramdisk. How can I persist all
> of them to disk, so that I do not need to fetch the OS image from the
> network when I reboot the client node?
>
> Thanks!
> Wei,Li

-- Greg Kurtzer http://www.runlevelzero.net/

From weizju at gmail.com Wed Mar 19 20:00:18 2008 From: weizju at gmail.com (weil) Date: Thu, 20 Mar 2008 11:00:18 +0800 Subject: [Warewulf] How to Install an OS to disk (stateful) using Message-ID: <9212b37b0803192000k1b52722cm9a6cc127fb4643c8@mail.gmail.com>

Hi Greg,

Thanks very much for your kind reply. We are very eager to get some custom VNFS capsules for testing. Could you give me a link to download those capsules and tell me the specific setup and operating system?

Thanks! Weil

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080320/8c59f4fc/attachment.html

From weizju at gmail.com Wed Mar 19 20:47:41 2008 From: weizju at gmail.com (weil) Date: Thu, 20 Mar 2008 11:47:41 +0800 Subject: [Warewulf] How to Install an OS to disk (stateful) using perceus? Message-ID: <9212b37b0803192047l2cb83727x78079d646b977ecd@mail.gmail.com>

Hi Greg,

Thanks very much for your kind reply. We are very eager to get some custom VNFS capsules for testing. Could you give me a link to download those capsules and tell me the specific setup and operating system?

Thanks!
Weil

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080320/5abcb5ee/attachment.html

From pscadmin at avalon.umaryland.edu Thu Mar 20 05:03:43 2008 From: pscadmin at avalon.umaryland.edu (pscadmin) Date: Thu, 20 Mar 2008 08:03:43 -0400 Subject: [Warewulf] rocks cluster to perceus In-Reply-To: References: Message-ID: <47E2529F.1020600@avalon.umaryland.edu>

Hello, I heard that you guys have some utility that will roll over Rocks to Perceus. Where can I get it? Also, I wonder if there is support for Perceus on Solaris.

Thanks, psc

From astevens at gravitypark.com Thu Mar 20 10:35:03 2008 From: astevens at gravitypark.com (Arthur Stevens) Date: Thu, 20 Mar 2008 10:35:03 -0700 Subject: [Warewulf] rocks cluster to perceus References: <47E2529F.1020600@avalon.umaryland.edu> Message-ID: <001401c88ab0$bb1b60f0$cb00a8c0@terminal209>

We can help out with both. The conversion from Rocks to Perceus is pretty slick. I have been seeing a lot of it, even from past Rocks vendors, and to tell you the truth, we love it :) We can get you signed up for our partner program at Infiscale and get you access to the conversion scripts and support.

The Sun support is a bit tricky. We have not been doing as much with Sun since Sun decided not to support us (or apparently anyone else we have talked to), because we are based on Linux and they are only supporting 100% Solaris. Unfortunately, they did not give us the test gear or software support that was needed and promised. If you have an existing Sun setup we can still help, but if you are looking to acquire hardware, then unless you're getting gear that they got from Intel EPSD or a trusted vendor, I would look elsewhere, imho. We can support Solaris, but we currently do not recommend their hardware geared for Solaris/Windows, which often requires closed-source Nvidia drivers anyway. We still support Solaris the OS, just not the Sun hardware.
Solaris support is currently a commercial offering as everything involved is much more time consuming. Right now we are recommending running Solaris in a VM or if possible, converting to Linux. :) I will private mail Infiscale contact information to you.

Thanks, Arthur

----- Original Message ----- From: "pscadmin" To: Sent: Thursday, March 20, 2008 5:03 AM Subject: [Warewulf] rocks cluster to perceus

> Hello, I heard that you guys have some utility that will rollover rocks
> to perceus. Where I can get it? Also, I wonder if there is support for
> perceus on Solaris.
>
> Thanks,
> psc

From poknam at gmail.com Tue Mar 25 01:49:06 2008 From: poknam at gmail.com (PN) Date: Tue, 25 Mar 2008 16:49:06 +0800 Subject: [Warewulf] perceus with xen In-Reply-To: <92daa7bf0803110031t49d2a8t690cd0c1de462dd5@mail.gmail.com> References: <92daa7bf0803092357m680ef056la13e51e98f24b06b@mail.gmail.com> <004201c882cc$059c0770$6400a8c0@terminal209> <92daa7bf0803110031t49d2a8t690cd0c1de462dd5@mail.gmail.com> Message-ID: <92daa7bf0803250149o3bbaae96t502b14075336ecfe@mail.gmail.com>

perceus 1.3.6 provides the same result as 1.3.7.

after some investigations, i find out that perceus misuses the vmlinuz-2.6.18-53.el5xen as the vm's kernel.

here is the grub for the dom0 machine:
...
title CentOS (2.6.18-53.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-53.el5
        module /vmlinuz-2.6.18-53.el5xen ro root=LABEL=/ rhgb quiet
        module /initrd-2.6.18-53.el5xen.img
title CentOS-base (2.6.18-53.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.el5 ro root=LABEL=/ rhgb quiet
        initrd /initrd-2.6.18-53.el5.img

however i can't find the grub.conf under my compute node's image.
anyone can tell me how to change the boot settings?

thanks, PN

2008/3/11, PN : > > i haven't tried Caos yet. > i have a VM using xen, also centos 5.1, that can successfully boot up > (stateful).
> i don't know why it can't boot up with perceus. > anyway, i will try it with perceus 1.3.6 and see what happen. > > thanks, > PN > > > 2008/3/11, Arthur Stevens : > > > > Perceus now fully supports KVM and VMware and still supports Xen. With > > Abstractual, you get full integration of VM's over the cluster with the > > ability to migrate. Sounds like a Centos issue here, and we try not to > > support Centos ;) > > > > What happens when you try it with Caos NSA? > > > > Arthur > > > > ----- Original Message ----- > > *From:* PN > > *To:* The Warewulf Cluster Toolkit > > *Sent:* Sunday, March 09, 2008 11:57 PM > > *Subject:* [Warewulf] perceus with xen > > > > > > has anyone tried perceus with xen? > > previously i tried perceus 1.3.6 and 5.1 kernel without problem. > > now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. > > however when the client node bootup, it shows > > > > Now provisiong: node0001 > > VNFS: GT8000 > > Group: cluster > > Node ID: 00:14:25:00:04 > > + cat /found_nics > > + ifconfig eth0 down > > + ifconfig eth1 down > > + ifconfig ib0 down > > + ifconfig ib1 down > > + [ ! -f /sbin/detect ] > > + . /etc/functions > > + . 
/etc/initramfs.conf > > [...] > > Total provision time: 6 s > > > > Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff > > Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not > > ..... > > > > Has anyone seen this before? any comment is appreciated.
> > > > thanks, > > PN

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080325/91db9216/attachment.html

From tegner at nada.kth.se Tue Mar 25 04:37:38 2008 From: tegner at nada.kth.se (tegner at nada.kth.se) Date: Tue, 25 Mar 2008 12:37:38 +0100 (MET) Subject: [Warewulf] Costum made network driver Message-ID: <27625.150.227.15.253.1206445058.squirrel@webmail.csc.kth.se>

Hi all,

I'm new to this, so please forgive me if this is a stupid question that has already been answered (I didn't find a good way to search the archives).

I'm using Centos 5.1 (and Perceus 1.3.7). The driver module (r8169.ko, included in the distribution) for the NIC on my nodes is not working, and I need to build my own. For this to work, is it enough to build this driver (r8169) and put it in the VNFS image?

Regards, /jon

From pscadmin at avalon.umaryland.edu Tue Mar 25 05:06:29 2008 From: pscadmin at avalon.umaryland.edu (pscadmin) Date: Tue, 25 Mar 2008 08:06:29 -0400 Subject: [Warewulf] Warewulf Digest, Vol 39, Issue 14 In-Reply-To: References: Message-ID: <47E8EAC5.9090003@avalon.umaryland.edu>

I have a dumb question: why do people need Xen or VMware on clusters? Doesn't it defeat the purpose of clustering, which is to harvest computing power across compute nodes?
warewulf-request at caoslinux.org wrote: > Send Warewulf mailing list submissions to > warewulf at caoslinux.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.caosity.org/mailman/listinfo/warewulf > or, via email, send a message with subject or body 'help' to > warewulf-request at caoslinux.org > > You can reach the person managing the list at > warewulf-owner at caoslinux.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Warewulf digest..." > > > Today's Topics: > > 1. Re: perceus with xen (PN) > 2. Costum made network driver (tegner at nada.kth.se) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 25 Mar 2008 16:49:06 +0800 > From: PN > Subject: Re: [Warewulf] perceus with xen > To: "The Warewulf Cluster Toolkit" > Message-ID: > <92daa7bf0803250149o3bbaae96t502b14075336ecfe at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > perceus 1.3.6 provides the same result as 1.3.7. > > after some investigations, i find out that perceus misuses the > vmlinuz-2.6.18-53.el5xen as the vm's kernel. > > here is the grub for the dom0 machine: > ... > title CentOS (2.6.18-53.el5xen) > root (hd0,0) > kernel /xen.gz-2.6.18-53.el5 > module /vmlinuz-2.6.18-53.el5xen ro root=LABEL=/ rhgb quiet > module /initrd-2.6.18-53.el5xen.img > title CentOS-base (2.6.18-53.el5) > root (hd0,0) > kernel /vmlinuz-2.6.18-53.el5 ro root=LABEL=/ rhgb quiet > initrd /initrd-2.6.18-53.el5.img > > however i can't find the grub.conf under my compute node's image. > anyone can tell me how to change the boot settings? > > thanks, > PN > > 2008/3/11, PN : > >> i haven't tried Caos yet. >> i have a VM using xen, also centos 5.1, that can successfully boot up >> (stateful). >> i don't know why it can't boot up with perceus. >> anyway, i will try it with perceus 1.3.6 and see what happen. 
>> >> thanks, >> PN >> >> >> 2008/3/11, Arthur Stevens : >> >>> Perceus now fully supports KVM and VMware and still supports Xen. With >>> Abstractual, you get full integration of VM's over the cluster with the >>> ability to migrate. Sounds like a Centos issue here, and we try not to >>> support Centos ;) >>> >>> What happens when you try it with Caos NSA? >>> >>> Arthur >>> >>> ----- Original Message ----- >>> *From:* PN >>> *To:* The Warewulf Cluster Toolkit >>> *Sent:* Sunday, March 09, 2008 11:57 PM >>> *Subject:* [Warewulf] perceus with xen >>> >>> >>> has anyone tried perceus with xen? >>> previously i tried perceus 1.3.6 and 5.1 kernel without problem. >>> now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. >>> however when the client node bootup, it shows >>> >>> Now provisiong: node0001 >>> VNFS: GT8000 >>> Group: cluster >>> Node ID: 00:14:25:00:04 >>> + cat /found_nics >>> + ifconfig eth0 down >>> + ifconfig eth1 down >>> + ifconfig ib0 down >>> + ifconfig ib1 down >>> + [ ! -f /sbin/detect ] >>> + . /etc/functions >>> + . 
/etc/initramfs.conf >>> [...] >>> >>> Total provision time: 6 s >>> >>> Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff >>> Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not >>> ..... >>> >>> Has anyone seen this before? any comment is appreciated.
>>> >>> thanks, >>> PN >>> >>> ------------------------------ >>> >>> _______________________________________________ >>> Warewulf mailing list >>> Warewulf at caoslinux.org >>> http://lists.caosity.org/mailman/listinfo/warewulf >>> >>> >>> _______________________________________________ >>> Warewulf mailing list >>> Warewulf at caoslinux.org >>> http://lists.caosity.org/mailman/listinfo/warewulf >>> >>> >>> > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://lists.caosity.org/pipermail/warewulf/attachments/20080325/91db9216/attachment-0001.html > > ------------------------------ > > Message: 2 > Date: Tue, 25 Mar 2008 12:37:38 +0100 (MET) > From: tegner at nada.kth.se > Subject: [Warewulf] Costum made network driver > To: warewulf at caoslinux.org > Message-ID: > <27625.150.227.15.253.1206445058.squirrel at webmail.csc.kth.se> > Content-Type: text/plain;charset=iso-8859-1 > > Hi all, > > I'm new to this so please forgive me if this is a stupid question that has > already been answered (didn't find a good way to search the archives). > > I'm using centos 5.1 (and perceus 1.3.7). The driver-module (r8169.ko, > included in the distribution) for the nic on my nodes is not working, and > I need to build my own. In order for this to work, is it enough to build > this driver (r8169) and put it in the VNFS image? 
>
> Regards,
>
> /jon
>
> ------------------------------
>
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>
>
> End of Warewulf Digest, Vol 39, Issue 14
> ****************************************
>

From griznog at gmail.com Tue Mar 25 06:55:29 2008
From: griznog at gmail.com (John Hanks)
Date: Tue, 25 Mar 2008 07:55:29 -0600
Subject: [Warewulf] Warewulf Digest, Vol 39, Issue 14
In-Reply-To: <47E8EAC5.9090003@avalon.umaryland.edu>
References: <47E8EAC5.9090003@avalon.umaryland.edu>
Message-ID:

I can give some examples, some real and some hypothetical.

One of our users has a proprietary application that, given a 4-core node, performs better with 4 instances running in 4 VMs than it does on one 4-core system. After benchmarking this case, it was clear that VMs were the solution (and a special purpose cluster).

We have a faculty member who wants his problem to run on hundreds of nodes, but doesn't care how fast they are and needs very little memory. I only have 64 nodes but I do have 256 cores. If I could boot my nodes with xen, I could give him a higher number of nodes using only a fraction of the total cluster.

Suppose I have a mix of serial and parallel jobs, with the serial jobs being low priority but not written in such a way that they can do their own checkpointing. If my real nodes run xen and my compute nodes are virtual, I can suspend low priority serial jobs by suspending the serial VM nodes when a parallel job needs to run, then resume them when the parallel jobs are finished. Combine this with preemption and you have a very elegant solution that allows high priority jobs to always start right away AND can bring your total utilization closer to 100% because you never need to drain nodes to start a big parallel job.
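
The suspend/resume idea above can be sketched roughly as follows. This is only a minimal sketch, assuming the real nodes are Xen 3.x dom0 hosts with the xm tool available. The function names, the STATE_DIR location and the domain names are made up for illustration and are not part of Perceus or of any scheduler.

```sh
#!/bin/sh
# Sketch only: preempt low-priority serial VM nodes on a Xen dom0.
# XM and STATE_DIR are assumptions for illustration, not Perceus settings.
XM="${XM:-xm}"                                # Xen 3.x management tool
STATE_DIR="${STATE_DIR:-/var/lib/xen/save}"   # where checkpoint files go

# Save each named domain to a checkpoint file. "xm save" stops the
# domain and releases its memory, so a parallel job can use the host.
suspend_vms() {
    for dom in "$@"; do
        $XM save "$dom" "$STATE_DIR/$dom.chk" || return 1
    done
}

# Bring each named domain back from its checkpoint file.
resume_vms() {
    for dom in "$@"; do
        $XM restore "$STATE_DIR/$dom.chk" || return 1
    done
}
```

A scheduler prologue could call something like `suspend_vms serial01 serial02` before a parallel job starts, and the epilogue could call `resume_vms serial01 serial02` afterwards. Using `xm pause` and `xm unpause` instead would be faster, but a paused domain keeps its memory, so it only frees CPU.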

For those users who want a 3 month queue time, submitting their job as an entire VM that can be suspended/resumed at the end of each $YOURFAVORITEQUEUEMAXTIME makes both of you much happier. And they get checkpointing as a side effect.

Emergency maintenance: 1. Suspend all VMs. 2. Do maintenance. 3. Resume all VMs. (Note there was nothing there about losing jobs, notifying users of a prolonged emergency outage, waiting for queues to drain, etc...)

Last, with VMs I could more easily run a Windows-based cluster for the handful of users who need to run a Windows-only app (like a special proprietary DLL for MATLAB, for instance). This would make it easy to bring the Windows cluster up/down as needed.

I think the benefits of cluster virtualization far outweigh the performance penalty of virtualization, especially given that the performance penalty continues to drop as virtualization gets better. If I were the perceus developers I'd have an option for the perceus kernel to be a [xen|kvm|vmwareesx|???] kernel and not bother with the kexec at all on real nodes, only kexec and provision virtual nodes.

jbh

On Tue, Mar 25, 2008 at 6:06 AM, pscadmin wrote:
> I have a dumb questions: Why do people need xen or vmware on clusters --
> doesn't it defeat the purpose of clustering, which is harvest computing
> power across computing nodes?

From astevens at gravitypark.com Tue Mar 25 08:41:14 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Tue, 25 Mar 2008 08:41:14 -0700
Subject: [Warewulf] Warewulf Digest, Vol 39, Issue 14
References: <47E8EAC5.9090003@avalon.umaryland.edu>
Message-ID: <008b01c88e8e$a8a00a20$6400a8c0@terminal209>

Aside from over 75% of our userbase asking for it, there are a lot of reasons for it. As far as the Perceus/Infiscale team sees it, here are a few of the top reasons for adding it. I also think they need KVM not Xen (mainly because I prefer true virtualization as I can kill an entire Xen paravirtualization system from having a shell on a single VM). :)

So say you use the new IPMI.pmod, Green.pmod, virtualization.pmod, here is some slick stuff you can do.

You can start a cluster with only 100 nodes and start all instances in VMs. As the load increases, the cluster can turn up more nodes and migrate the VMs to their own server or span them over multiple. So you're not paying the power bill for 800 servers when only 100 are in use.

Perceus is hitting the Enterprise like a freight train. This means you are probably not even HPC anymore, yet still using Perceus. This might be an ISP that wants to stick 50 customers on the same box, like GoDaddy or one of those places.

You might also be a bank or a chemical plant that is using virtualization for remote terminals in multiple locations.

The virtualization is not a forced option, but a very well received and very much requested feature. We first showed it off at the last Intel Developer Forum in San Francisco and have seen a drastic increase in not only adoption but full replacement of commercial offerings with it.

Virtualization is here to stay, and Perceus/Abstractual take full advantage of what it can add to the bottom line :)

Arthur

----- Original Message -----
From: "pscadmin"
To:
Sent: Tuesday, March 25, 2008 5:06 AM
Subject: Re: [Warewulf] Warewulf Digest, Vol 39, Issue 14

>I have a dumb questions: Why do people need xen or vmware on clusters --
> doesn't it defeat the purpose of clustering, which is harvest computing
> power across computing nodes?

From pscadmin at avalon.umaryland.edu Wed Mar 26 05:42:14 2008
From: pscadmin at avalon.umaryland.edu (pscadmin)
Date: Wed, 26 Mar 2008 08:42:14 -0400
Subject: [Warewulf] cluster virtualization
In-Reply-To:
References:
Message-ID: <47EA44A6.7040404@avalon.umaryland.edu>

Hello,

would John or somebody please explain the difference between "real nodes run xen and compute nodes are virtual" / real node vs compute node? Does it mean that the "real" nodes run xen, which in turn runs instances of virtual nodes? So, instead of one node, the frontend sees two or three? ... Also, can I have only a frontend, with the compute nodes set up as virtual? ... Is there some good place to read about cluster virtualization?

I'm asking since the following seems to be what we can use, but I can't decipher it (I cannot see how the cluster would have to be set up):

"Suppose I have a mix of serial and parallel jobs, with the serial jobs being low priority but not being written in such a way as they can do their own checkpointing. If my real nodes run xen and my compute nodes are virtual, I can suspend low priority serial jobs by suspending the serial VM nodes when a parallel job needs to run, then resume them when the parallel jobs are finished. Combine this with preemption and you have a very elegant solution that allows high priority jobs to always start right away AND can bring your total utilization closer to 100% because you never need to drain nodes to start a big parallel job."
thx, psc > Date: Tue, 25 Mar 2008 07:55:29 -0600 > From: "John Hanks" > Subject: Re: [Warewulf] Warewulf Digest, Vol 39, Issue 14 > To: "The Warewulf Cluster Toolkit" > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > > I can give some examples, real and some hypothetical. > > One of our users has a proprietary application that, given a 4 core > node performs better with 4 instances running in 4 VMs than it does on > 1 4 core system. After benchmarking this case, it was clear that VMs > were the solution (and a special purpose cluster). > > We have a faculty who wants his problem to run on hundreds of nodes, > but doesn't care how fast they are and needs very little memory, I > only have 64 nodes but I do have 256 cores. If I could boot my nodes > with xen, I could give him a higher number of nodes using only a > fraction of the total cluster. > > Suppose I have a mix of serial and parallel jobs, with the serial jobs > being low priority but not being written in such a way as they can do > their own checkpointing. If my real nodes run xen and my compute nodes > are virtual, I can suspend low priority serial jobs by suspending the > serial VM nodes when a parallel job needs to run, then resume them > when the parallel jobs are finished. Combine this with preemption and > you have a very elegant solution that allows high priority jobs to > always start right away AND can bring your total utilization closer to > 100% because you never need to drain nodes to start a big parallel > job. > > For those users who want a 3 month queue time, submitting their job as > an entire VM that can be suspended/resumed at the end of each > $YOURFAVORITEQUEUEMAXTIME makes both of you much happier. And they get > checkpointing as a side effect. > > Emergency maintenance: 1. Suspend all VMs. 2. Do maintenance. 3. > Resume all VMs. (Note there was nothing there about losing jobs, > notifying users of a prolonged emergency outage, waiting for queues to > drain, etc...) 
> > Last, with VMs I could more easily run a Windows based cluster for the > handful of users who need to run a windows only app (like a special > proprietary DLL for matlab, for instance). This'd make it easy to > bring the windows cluster up/down as needed. > > I think the benefits of cluster virtualization far outweight the > performance penalty of virtualization, especially given that the > performance penalty continues to drop as virtualization gets better. > If I were the perceus developers I'd have an option for the perceus > kernel to be a [xen|kvm|vmwareesx|???] kernel and not bother with the > kexec at all on real nodes, only kexec and provision virtual nodes. > > jbh > > On Tue, Mar 25, 2008 at 6:06 AM, pscadmin wrote: > >> I have a dumb questions: Why do people need xen or vmware on clusters -- >> doesn't it defeat the purpose of clustering, which is harvest computing >> power across computing nodes? >> >> warewulf-request at caoslinux.org wrote: >> > Send Warewulf mailing list submissions to >> > warewulf at caoslinux.org >> > >> > To subscribe or unsubscribe via the World Wide Web, visit >> > http://lists.caosity.org/mailman/listinfo/warewulf >> > or, via email, send a message with subject or body 'help' to >> > warewulf-request at caoslinux.org >> > >> > You can reach the person managing the list at >> > warewulf-owner at caoslinux.org >> > >> > When replying, please edit your Subject line so it is more specific >> > than "Re: Contents of Warewulf digest..." >> > >> > >> > Today's Topics: >> > >> > 1. Re: perceus with xen (PN) >> > 2. 
Costum made network driver (tegner at nada.kth.se) >> > >> > >> > ---------------------------------------------------------------------- >> > >> > Message: 1 >> > Date: Tue, 25 Mar 2008 16:49:06 +0800 >> > From: PN >> > Subject: Re: [Warewulf] perceus with xen >> > To: "The Warewulf Cluster Toolkit" >> > Message-ID: >> > <92daa7bf0803250149o3bbaae96t502b14075336ecfe at mail.gmail.com> >> > Content-Type: text/plain; charset="utf-8" >> > >> > perceus 1.3.6 provides the same result as 1.3.7. >> > >> > after some investigations, i find out that perceus misuses the >> > vmlinuz-2.6.18-53.el5xen as the vm's kernel. >> > >> > here is the grub for the dom0 machine: >> > ... >> > title CentOS (2.6.18-53.el5xen) >> > root (hd0,0) >> > kernel /xen.gz-2.6.18-53.el5 >> > module /vmlinuz-2.6.18-53.el5xen ro root=LABEL=/ rhgb quiet >> > module /initrd-2.6.18-53.el5xen.img >> > title CentOS-base (2.6.18-53.el5) >> > root (hd0,0) >> > kernel /vmlinuz-2.6.18-53.el5 ro root=LABEL=/ rhgb quiet >> > initrd /initrd-2.6.18-53.el5.img >> > >> > however i can't find the grub.conf under my compute node's image. >> > anyone can tell me how to change the boot settings? >> > >> > thanks, >> > PN >> > >> > 2008/3/11, PN : >> > >> >> i haven't tried Caos yet. >> >> i have a VM using xen, also centos 5.1, that can successfully boot up >> >> (stateful). >> >> i don't know why it can't boot up with perceus. >> >> anyway, i will try it with perceus 1.3.6 and see what happen. >> >> >> >> thanks, >> >> PN >> >> >> >> >> >> 2008/3/11, Arthur Stevens : >> >> >> >>> Perceus now fully supports KVM and VMware and still supports Xen. With >> >>> Abstractual, you get full integration of VM's over the cluster with the >> >>> ability to migrate. Sounds like a Centos issue here, and we try not to >> >>> support Centos ;) >> >>> >> >>> What happens when you try it with Caos NSA? 
>> >>> >> >>> Arthur >> >>> >> >>> ----- Original Message ----- >> >>> *From:* PN >> >>> *To:* The Warewulf Cluster Toolkit >> >>> *Sent:* Sunday, March 09, 2008 11:57 PM >> >>> *Subject:* [Warewulf] perceus with xen >> >>> >> >>> >> >>> has anyone tried perceus with xen? >> >>> previously i tried perceus 1.3.6 and 5.1 kernel without problem. >> >>> now i use perceus 1.3.7 and centos 5.1 xen kernel to make the image. >> >>> however when the client node bootup, it shows >> >>> >> >>> Now provisiong: node0001 >> >>> VNFS: GT8000 >> >>> Group: cluster >> >>> Node ID: 00:14:25:00:04 >> >>> + cat /found_nics >> >>> + ifconfig eth0 down >> >>> + ifconfig eth1 down >> >>> + ifconfig ib0 down >> >>> + ifconfig ib1 down >> >>> + [ ! -f /sbin/detect ] >> >>> + . /etc/functions >> >>> + . /etc/initramfs.conf >> >>> + DEVS=eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7 ib0 ib1 >> >>> + MAX_TRIES=5 >> >>> + echo Un-loading device drivers: >> >>> + echo -ne \-> >> >>> \->+ unload_module scsi_mod >> >>> + grep -q ^scsi_mod /proc/modules >> >>> + PATH=/sbin rmmod scsi_mod >> >>> + unload_module ib_ipoib >> >>> + grep -q ^ib_ipoib /proc/modules >> >>> + PATH=/sbin rmmod ib_ipoib >> >>> + echo -n ib_ipoib >> >>> ib_ipoib+ cat /etc/modulerc >> >>> + read i >> >>> + /sbin/detect -q >> >>> + read i >> >>> + unload_module uhci_hcd >> >>> + grep -q ^uhci_hcd /proc/modules >> >>> + read i >> >>> + unload_module uhci-hcd >> >>> + grep -q ^uhci-hcd /proc/modules >> >>> + read i >> >>> + unload_module ehci-hcd >> >>> + grep -q ^ehci-hcd /proc/modules >> >>> + read i >> >>> + unload_module ata_piix >> >>> + grep -q ^ata_piix /proc/modules >> >>> + PATH=/sbin rmmod ata_piix >> >>> + echo -n ata_piix >> >>> ata_piix+ read i >> >>> + unload_module piix >> >>> + grep -q ^piix /proc/modules >> >>> + PATH=/sbin remmod piix >> >>> + read i >> >>> + unload_module ib_mthca >> >>> + grep -q ^ib_mthca /proc/modules >> >>> + PATH=/sbin rmmod ib_mthca >> >>> + echo -n ib_mthca >> >>> ib_mthca+ read i >> >>> + 
unload_module tg3 >> >>> + grep -q ^tg3 /proc/modules >> >>> + PATH=/sbin rmmod tg3 >> >>> + echo -n tg3 >> >>> tg3+ read i >> >>> + unload_module tg3 >> >>> + grep -q ^tg3 /proc/modules >> >>> + read i >> >>> + echo >> >>> >> >>> Total provision time: 6 s >> >>> >> >>> Invalid memory segment 0xffffffff80200000 - 0xffffffff804c3fff >> >>> Reguesting DHCP configuration via ib0 spoofing eth0:ERROR: Could not >> >>> ..... >> >>> >> >>> Has anyone seen this before? any comment is appreciated. >> >>> >> >>> thanks, >> >>> PN >> >>> >> >>> ------------------------------ >> >>> >> >>> _______________________________________________ >> >>> Warewulf mailing list >> >>> Warewulf at caoslinux.org >> >>> http://lists.caosity.org/mailman/listinfo/warewulf >> >>> >> >>> >> >>> _______________________________________________ >> >>> Warewulf mailing list >> >>> Warewulf at caoslinux.org >> >>> http://lists.caosity.org/mailman/listinfo/warewulf >> >>> >> >>> >> >>> >> > -------------- next part -------------- >> > An HTML attachment was scrubbed... >> > URL: http://lists.caosity.org/pipermail/warewulf/attachments/20080325/91db9216/attachment-0001.html >> > >> > ------------------------------ >> > >> > Message: 2 >> > Date: Tue, 25 Mar 2008 12:37:38 +0100 (MET) >> > From: tegner at nada.kth.se >> > Subject: [Warewulf] Costum made network driver >> > To: warewulf at caoslinux.org >> > Message-ID: >> > <27625.150.227.15.253.1206445058.squirrel at webmail.csc.kth.se> >> > Content-Type: text/plain;charset=iso-8859-1 >> > >> > Hi all, >> > >> > I'm new to this so please forgive me if this is a stupid question that has >> > already been answered (didn't find a good way to search the archives). >> > >> > I'm using centos 5.1 (and perceus 1.3.7). The driver-module (r8169.ko, >> > included in the distribution) for the nic on my nodes is not working, and >> > I need to build my own. In order for this to work, is it enough to build >> > this driver (r8169) and put it in the VNFS image? 
>> >
>> > Regards,
>> >
>> > /jon
>> >
>> > ------------------------------
>> >
>> > _______________________________________________
>> > Warewulf mailing list
>> > Warewulf at caoslinux.org
>> > http://lists.caosity.org/mailman/listinfo/warewulf
>> >
>> >
>> > End of Warewulf Digest, Vol 39, Issue 14
>> > ****************************************
>> >
>>
>> _______________________________________________
>> Warewulf mailing list
>> Warewulf at caoslinux.org
>> http://lists.caosity.org/mailman/listinfo/warewulf
>>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 25 Mar 2008 08:41:14 -0700
> From: "Arthur Stevens"
> Subject: Re: [Warewulf] Warewulf Digest, Vol 39, Issue 14
> To: "The Warewulf Cluster Toolkit"
> Message-ID: <008b01c88e8e$a8a00a20$6400a8c0 at terminal209>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=original
>
> Aside from over 75% of our userbase asking for it, there are a lot of
> reasons for it. As far as the Perceus/Infiscale team sees it, here are a
> few of the top choices for adding it. I also think they need KVM not Xen
> (mainly because I prefer true virtualization as I can kill an entire Xen
> paravirtualization system from having a shell on a single VM). :)
>
> So say you use the new IPMI.pmod, Green.pmod, virtualization.pmod, here is
> some slick stuff you can do.
>
> You can start a cluster with only 100 nodes and start all instances in VM's.
> As the load increases, the cluster can turn up more nodes and migrate the
> vm's to their own server or span them over multiple. So you're not paying the
> power bill for 800 servers when only 100 are in use.
>
> Perceus is hitting the Enterprise like a freight train. This means you
> probably are not even HPC anymore yet using Perceus. This might be an ISP
> that wants to stick 50 customers on the same box like a godaddy or one of
> those places.
>
> You might also be a bank or a chemical plant that is using virtualization
> for remote terminals in multiple locations.
>
> The virtualization is not a forced option, but a very well taken and very
> much requested feature. We first showed it off at the last Intel Developer
> Forum in San Francisco and have seen a drastic increase in not only adoption
> but full replacement of commercial offerings with it.
>
> Virtualization is here to stay, and Perceus/Abstractual take full advantage
> of what it can add to the bottom line :)
>
> Arthur
>
>
> ----- Original Message -----
> From: "pscadmin"
> To:
> Sent: Tuesday, March 25, 2008 5:06 AM
> Subject: Re: [Warewulf] Warewulf Digest, Vol 39, Issue 14
>
>
>
>> I have a dumb question: Why do people need xen or vmware on clusters --
>> doesn't it defeat the purpose of clustering, which is to harvest computing
>> power across computing nodes?
>>
>> warewulf-request at caoslinux.org wrote:
>>
>>> Send Warewulf mailing list submissions to
>>> warewulf at caoslinux.org
>>>
>>> To subscribe or unsubscribe via the World Wide Web, visit
>>> http://lists.caosity.org/mailman/listinfo/warewulf
>>> or, via email, send a message with subject or body 'help' to
>>> warewulf-request at caoslinux.org
>>>
>>> You can reach the person managing the list at
>>> warewulf-owner at caoslinux.org
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of Warewulf digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>> 1. Re: perceus with xen (PN)
>>> 2.
Costum made network driver (tegner at nada.kth.se) >>> >>> >>> ---------------------------------------------------------------------- >>> >>> Message: 1 >>> Date: Tue, 25 Mar 2008 16:49:06 +0800 >>> From: PN >>> Subject: Re: [Warewulf] perceus with xen >>> To: "The Warewulf Cluster Toolkit" >>> Message-ID: >>> <92daa7bf0803250149o3bbaae96t502b14075336ecfe at mail.gmail.com> >>> Content-Type: text/plain; charset="utf-8" >>> >>> perceus 1.3.6 provides the same result as 1.3.7. >>> >>> after some investigations, i find out that perceus misuses the >>> vmlinuz-2.6.18-53.el5xen as the vm's kernel. >>> >>> here is the grub for the dom0 machine: >>> ... >>> title CentOS (2.6.18-53.el5xen) >>> root (hd0,0) >>> kernel /xen.gz-2.6.18-53.el5 >>> module /vmlinuz-2.6.18-53.el5xen ro root=LABEL=/ rhgb quiet >>> module /initrd-2.6.18-53.el5xen.img >>> title CentOS-base (2.6.18-53.el5) >>> root (hd0,0) >>> kernel /vmlinuz-2.6.18-53.el5 ro root=LABEL=/ rhgb quiet >>> initrd /initrd-2.6.18-53.el5.img >>> >>> however i can't find the grub.conf under my compute node's image. >>> anyone can tell me how to change the boot settings? >>> >>> thanks, >>> PN >>> >>> 2008/3/11, PN : >>> >>> >>>> i haven't tried Caos yet. >>>> i have a VM using xen, also centos 5.1, that can successfully boot up >>>> (stateful). >>>> i don't know why it can't boot up with perceus. >>>> anyway, i will try it with perceus 1.3.6 and see what happen. >>>> >>>> thanks, >>>> PN >>>> >>>> >>>> 2008/3/11, Arthur Stevens : >>>> >>>> >>>>> Perceus now fully supports KVM and VMware and still supports Xen. With >>>>> Abstractual, you get full integration of VM's over the cluster with the >>>>> ability to migrate. Sounds like a Centos issue here, and we try not to >>>>> support Centos ;) >>>>> >>>>> What happens when you try it with Caos NSA? 
>>>>>
>>>>> Arthur
>>>>>
>

From griznog at gmail.com Wed Mar 26 06:21:24 2008
From: griznog at gmail.com (John Hanks)
Date: Wed, 26 Mar 2008 07:21:24 -0600
Subject: [Warewulf] cluster virtualization
In-Reply-To: <47EA44A6.7040404@avalon.umaryland.edu>
References: <47EA44A6.7040404@avalon.umaryland.edu>
Message-ID:

On Wed, Mar 26, 2008 at 6:42 AM, pscadmin wrote:
> Hello, would John or somebody please explain the difference between
> "real nodes run xen and compute nodes are virtual" / real node vs
> compute node? Does it mean that the "real" nodes running xen, which
> instances running virtual nodes? So, instead of one node, the frontend
> sees two or three?

Sorry, I more or less made up my own terminology as I went.

"real node" == physical hardware node
"xen node" or "virtual node" == node running in a virtual machine on a real node

So, in my view of this, perceus would boot each real node with a VNFS
running xen, kvm, vmware esx, etc. Then the scheduler or some other
application would boot virtual nodes as needed to fulfill job requests.

> also, can I have only frontend, and the compute
> nodes setup as virtual? ...

You know, I hadn't really thought of virtualizing the front end node as
far as being the perceus master goes, but I have had the master run VMs
for vnfs creation (when I was trying to build Redhat VNFS capsules).

> Is there some good place to read about
> cluster virtualization?

If you find an answer to this, please post it.

> I'm asking since the following seems to be what we can use, but I can't
> decipher it (I cannot see how the cluster would have to be setup),
> "Suppose I have a mix of serial and parallel jobs, with the serial jobs
> being low priority but not being written in such a way as they can do
> their own checkpointing. If my real nodes run xen and my compute nodes
> are virtual, I can suspend low priority serial jobs by suspending the
> serial VM nodes when a parallel job needs to run, then resume them
> when the parallel jobs are finished.
> Combine this with preemption and you have a very elegant solution that
> allows high priority jobs to always start right away AND can bring your
> total utilization closer to 100% because you never need to drain nodes
> to start a big parallel job."

Here's how I'd see this working.

Userclass A is a bunch of people doing monte carlo simulations, each
running on a single CPU. They will never stop submitting jobs, and will
happily let jobs run over and over accumulating data which they will
analyze at some later date.

Userclass B is a handful of people with large (relative to whatever
cluster size you have available) MPI jobs. In addition to needing a lot
of nodes/cores, they also like to run the app, immediately interpret the
results, then run again, needing near-interactive turnaround times.

So, the cluster starts (basically perceus starts and the scheduler is
running) and immediately users from A submit a thousand serial jobs,
each with max walltime. The scheduler powers on real nodes, they boot
xen/kvm/vmware, and it then spawns a virtual node for each CPU core and
starts as many serial jobs as will fit on the cluster.

A few hours later, someone from B submits a job asking for 32 cores. The
scheduler would then suspend-to-disk 32 of the virtual nodes running
serial jobs and start virtual nodes (either 32 single cpu virtual nodes
or some N = 32 / (number of cores per real node) number of virtual
nodes) and allow the parallel job to run right away. Once the parallel
job is done, the associated virtual nodes are destroyed and the
suspended virtual nodes are resumed.

Users from class A are happy, they are getting the absolute best
throughput possible. Users from B are happy because their jobs always
run right away without waiting for jobs/nodes to drain to make room. We
admins are happy because no one is yelling at us.

The only downside (or the one that keeps getting mentioned to me) is the
performance penalty.
I get this for a number of things, like "my code runs faster on dual
core than quad core" or "I lose 10% performance in a virtual machine",
etc. My usual answer is "You tell me how long your job has to sit in
the queue before the performance hit of waiting is worse than the
performance hit of running with virtualization or quad core." Usually
the answer is that running is better than waiting, so given a fixed
cost investment in a cluster, virtualization and more cores would make
more users happy (IMHO). Throughput seems to be the higher priority for
clusters I manage.

I've never had an original thought in my life, I have to assume someone
is already doing this or at a minimum the parts needed exist in perceus,
schedulers, etc. and it just needs to be tied together. If someone can
convince my boss that this is a high priority item, I'd be thrilled to
start figuring it out :)

jbh

>
>
> thx,
> psc
>
> > Date: Tue, 25 Mar 2008 07:55:29 -0600
> > From: "John Hanks"
> > Subject: Re: [Warewulf] Warewulf Digest, Vol 39, Issue 14
> > To: "The Warewulf Cluster Toolkit"
> > Message-ID:
> >
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > I can give some examples, real and some hypothetical.
> >
> > One of our users has a proprietary application that, given a 4 core
> > node performs better with 4 instances running in 4 VMs than it does on
> > 1 4 core system. After benchmarking this case, it was clear that VMs
> > were the solution (and a special purpose cluster).
> >
> > We have a faculty who wants his problem to run on hundreds of nodes,
> > but doesn't care how fast they are and needs very little memory, I
> > only have 64 nodes but I do have 256 cores. If I could boot my nodes
> > with xen, I could give him a higher number of nodes using only a
> > fraction of the total cluster.
> >
> > Suppose I have a mix of serial and parallel jobs, with the serial jobs
> > being low priority but not being written in such a way as they can do
> > their own checkpointing.
> > If my real nodes run xen and my compute nodes
> > are virtual, I can suspend low priority serial jobs by suspending the
> > serial VM nodes when a parallel job needs to run, then resume them
> > when the parallel jobs are finished. Combine this with preemption and
> > you have a very elegant solution that allows high priority jobs to
> > always start right away AND can bring your total utilization closer to
> > 100% because you never need to drain nodes to start a big parallel
> > job.
> >
> > For those users who want a 3 month queue time, submitting their job as
> > an entire VM that can be suspended/resumed at the end of each
> > $YOURFAVORITEQUEUEMAXTIME makes both of you much happier. And they get
> > checkpointing as a side effect.
> >
> > Emergency maintenance: 1. Suspend all VMs. 2. Do maintenance. 3.
> > Resume all VMs. (Note there was nothing there about losing jobs,
> > notifying users of a prolonged emergency outage, waiting for queues to
> > drain, etc...)
> >
> > Last, with VMs I could more easily run a Windows based cluster for the
> > handful of users who need to run a windows only app (like a special
> > proprietary DLL for matlab, for instance). This'd make it easy to
> > bring the windows cluster up/down as needed.
> >
> > I think the benefits of cluster virtualization far outweigh the
> > performance penalty of virtualization, especially given that the
> > performance penalty continues to drop as virtualization gets better.
> > If I were the perceus developers I'd have an option for the perceus
> > kernel to be a [xen|kvm|vmwareesx|???] kernel and not bother with the
> > kexec at all on real nodes, only kexec and provision virtual nodes.
> >
> > jbh
> >
> > On Tue, Mar 25, 2008 at 6:06 AM, pscadmin wrote:
> >
> >> I have a dumb question: Why do people need xen or vmware on clusters --
> >> doesn't it defeat the purpose of clustering, which is to harvest computing
> >> power across computing nodes?

From gmkurtzer at gmail.com Wed Mar 26 15:34:35 2008
From: gmkurtzer at gmail.com (Greg Kurtzer)
Date: Wed, 26 Mar 2008 15:34:35 -0700
Subject: [Warewulf] Costum made network driver
In-Reply-To: <27625.150.227.15.253.1206445058.squirrel@webmail.csc.kth.se>
References: <27625.150.227.15.253.1206445058.squirrel@webmail.csc.kth.se>
Message-ID: <571f1a060803261534o6de13caew89390e4bb1147b17@mail.gmail.com>

Yes, and don't forget to also run depmod in the VNFS chroot.

(sorry for the late response)

On Tue, Mar 25, 2008 at 4:37 AM, wrote:
> Hi all,
>
> I'm new to this so please forgive me if this is a stupid question that has
> already been answered (didn't find a good way to search the archives).
>
> I'm using centos 5.1 (and perceus 1.3.7). The driver-module (r8169.ko,
> included in the distribution) for the nic on my nodes is not working, and
> I need to build my own. In order for this to work, is it enough to build
> this driver (r8169) and put it in the VNFS image?
>
> Regards,
>
> /jon
>
>
> _______________________________________________
> Warewulf mailing list
> Warewulf at caoslinux.org
> http://lists.caosity.org/mailman/listinfo/warewulf
>

--
Greg Kurtzer
http://www.runlevelzero.net/

From poknam at gmail.com Thu Mar 27 03:00:26 2008
From: poknam at gmail.com (PN)
Date: Thu, 27 Mar 2008 18:00:26 +0800
Subject: [Warewulf] 3 questions in perceus
Message-ID: <92daa7bf0803270300l3eb4db3dr9b61137e972f9e6b@mail.gmail.com>

hi all,

i have 3 questions in perceus:

1) the syslog redirection does not seem to be working. the default is to
redirect to the master node, but i can't find any compute node
information in the master node's /var/log/messages, except the TFTP
messages.

2) it seems that perceus can use IB for dhcp and provisioning, but after
i set /etc/perceus/dnsmasq.conf to use ib0 and restart perceus, the
compute node still cannot boot from IB. it just cannot find any dhcp
server, am i missing something?

3) when using perceus-1.3.6, the compute node can boot up successfully.
however in 1.3.7, the compute node cannot boot. it seems that the driver
is not working in the new version.
my compute node uses the tg3 network driver.

Provisioning from 192.168.30.20    <= it is the internet IP, different
from version 1.3.6, which shows the internal IP 11.1.0.1.

Now provisiong: node0001
VNFS: GT8000
Group: cluster
Node ID: 00:14:25:00:04

+ cat /found_nics    <= these messages did not appear in perceus 1.3.6
+ ifconfig eth0 down
+ ifconfig eth1 down
+ ifconfig ib0 down
+ ifconfig ib1 down
+ [ ! -f /sbin/detect ]
+ . /etc/functions
+ . /etc/initramfs.conf
+ DEVS=eth0 eth1 eth2 eth3 eth4 eth5 eth6 eth7 ib0 ib1
+ MAX_TRIES=5
+ echo Un-loading device drivers:
+ echo -ne \->
\->+ unload_module scsi_mod
+ grep -q ^scsi_mod /proc/modules
+ PATH=/sbin rmmod scsi_mod
+ unload_module ib_ipoib
+ grep -q ^ib_ipoib /proc/modules
+ PATH=/sbin rmmod ib_ipoib
+ echo -n ib_ipoib
ib_ipoib+ cat /etc/modulerc
+ read i
+ /sbin/detect -q
+ read i
+ unload_module uhci_hcd
+ grep -q ^uhci_hcd /proc/modules
+ read i
+ unload_module uhci-hcd
+ grep -q ^uhci-hcd /proc/modules
+ read i
+ unload_module ehci-hcd
+ grep -q ^ehci-hcd /proc/modules
+ read i
+ unload_module ata_piix
+ grep -q ^ata_piix /proc/modules
+ PATH=/sbin rmmod ata_piix
+ echo -n ata_piix
ata_piix+ read i
+ unload_module piix
+ grep -q ^piix /proc/modules
+ PATH=/sbin remmod piix
+ read i
+ unload_module ib_mthca
+ grep -q ^ib_mthca /proc/modules
+ PATH=/sbin rmmod ib_mthca
+ echo -n ib_mthca
ib_mthca+ read i
+ unload_module tg3
+ grep -q ^tg3 /proc/modules
+ PATH=/sbin rmmod tg3
+ echo -n tg3
tg3+ read i
+ unload_module tg3
+ grep -q ^tg3 /proc/modules
+ read i
+ echo
....

thanks,
PN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080327/4c11dfcc/attachment.html

From astevens at gravitypark.com Thu Mar 27 10:22:41 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Thu, 27 Mar 2008 10:22:41 -0700
Subject: [Warewulf] 3 questions in perceus
References: <92daa7bf0803270300l3eb4db3dr9b61137e972f9e6b@mail.gmail.com>
Message-ID: <006101c8902f$29ee39c0$cb00a8c0@terminal209>

Replies to 1-3....

1) You can go into your confs and adjust that. We run skinny by default.

2) Yes you are :) IB by default does not support net booting without our
love. You need our Rapid Boot Payload or our Zepher, or our embedded usb
to boot directly from IB. Also the new Perceus embedded IB cards will
support that as well. 1 wire IB is a commercial offering at this time.

3) You need to add the driver you need to the perceus kernel/vnfs, then
restart stuff and all should be fine. Just add support for your tg3.
Infiscale offers that as a service if needed or if you just don't have
the time.

Arthur

----- Original Message -----
From: PN
To: The Warewulf Cluster Toolkit
Sent: Thursday, March 27, 2008 3:00 AM
Subject: [Warewulf] 3 questions in perceus

hi all,

i have 3 questions in perceus:

1) the syslog redirection seems not working, the default is redirected
to the master node, but i can't find any compute node information in the
master node's /var/log/messages, except the TFTP messages.

2) it seems that perceus can use IB as dhcp and provisioning, but after
i set the /etc/perceus/dnsmasq.conf using ib0 and restart perceus, the
compute node still cannot boot from IB. it just cannot find any dhcp
server, am i missing something?

3) when using perceus-1.3.6, the compute node can boot up successfully.
however in 1.3.7, the compute node cannot boot. it seems that the driver
is not working in the new version.
my compute node uses the tg3 network driver.

thanks,
PN

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080327/8d203857/attachment.html

From poknam at gmail.com Fri Mar 28 07:45:49 2008
From: poknam at gmail.com (PN)
Date: Fri, 28 Mar 2008 22:45:49 +0800
Subject: [Warewulf] 3 questions in perceus
In-Reply-To: <006101c8902f$29ee39c0$cb00a8c0@terminal209>
References: <92daa7bf0803270300l3eb4db3dr9b61137e972f9e6b@mail.gmail.com> <006101c8902f$29ee39c0$cb00a8c0@terminal209>
Message-ID: <92daa7bf0803280745m36defb42y2646a33c913d0d1c@mail.gmail.com>

Hi,

2008/3/28, Arthur Stevens :
> Replies to 1-3....
>
> 1) you can go into your confs and adjust that. we run skinny by default.
>

OK, I will try this later.

> 2) yes you are :) IB by default does not support net booting without our
> love. You need our Rapid Boot Payload or our Zepher, or our embedded usb to
> boot directly from IB. Also the new Perceus embedded IB cards will support
> that as well. 1 wire IB is a commercial offering at this time.
>

There is a Boot over IB software package provided by mellanox.
Is Rapid Boot Payload or Zepher similar to that?

> 3) You need to add your driver that is needed to the perceus kernel/vnfs
> then restart stuff and all should be fine. Just add support for your
> tg3. Infiscale offers that as a service if needed or you just don't have the
> time.
>

It's ok for me to add the module.
Actually the tg3 module comes along with the vnfs kernel.
So is simply editing /etc/perceus/modules/modprobe enough?

It seems that perceus 1.3.6 and 1.3.7 are using the same perceus kernel.
Is there any reason why the node cannot boot with the same vnfs image?
Thanks a lot,
PN
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080328/f2202836/attachment.html

From astevens at gravitypark.com Fri Mar 28 09:13:01 2008
From: astevens at gravitypark.com (Arthur Stevens)
Date: Fri, 28 Mar 2008 09:13:01 -0700
Subject: [Warewulf] 3 questions in perceus
References: <92daa7bf0803270300l3eb4db3dr9b61137e972f9e6b@mail.gmail.com> <006101c8902f$29ee39c0$cb00a8c0@terminal209> <92daa7bf0803280745m36defb42y2646a33c913d0d1c@mail.gmail.com>
Message-ID: <039501c890ee$98fd9170$cb00a8c0@terminal209>

1) cool. by default we run light.

2) Rapid Boot allows the full perceus client, ib drivers, etc to be embedded on the motherboard. Zepher is a different solution with lots of info available via google, and it supports the full perceus and can be used by a lot of vendors. Think of it as a compromise between embedded and a usb key. The Mellanox package I could not tell you about, you might give Mellanox a call. If I recall correctly, it lets you netboot over pxe but it is not the same as embedded perceus. The perceus embedded cards will have the full perceus client on them.

3) Looks like you have something else going on of which I do not have enough data to troubleshoot. Without even knowing the board vendor, it's hard to tell where to start. You might want to contact Infiscale if in a hurry with this one. Otherwise gather all your data and hit the list here with your pastebin ;)

hope this helps,
Arthur

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://altruistic.infiscale.org/pipermail/perceus/attachments/20080328/40d91c28/attachment.html
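
[Editor's sketch, not part of the original thread] The "sh -x" trace quoted in this thread suggests that each entry read from /etc/modulerc is passed to an unload_module helper, which greps /proc/modules for the name and calls rmmod only when it is found. The snippet below is a rough reconstruction of that pattern for experimenting on a workstation. It is not the actual Perceus /etc/functions code: the function bodies, the MODULES_FILE override and the normalise helper are assumptions made for illustration. One detail it makes visible: the kernel lists loaded modules in /proc/modules with underscores, so an entry spelled with a hyphen (uhci-hcd, ehci-hcd) will not match a plain grep unless the name is normalised first.

```shell
#!/bin/sh
# Hypothetical reconstruction of the unload step seen in the trace.
# MODULES_FILE is an assumed override so the logic can be tried without root;
# it defaults to the real /proc/modules.
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

# Convert hyphens to underscores, matching how /proc/modules spells names.
normalise() {
    printf '%s' "$1" | sed 's/-/_/g'
}

# Succeed if the named module is listed as loaded. The trailing space in the
# pattern stops "tg3" from matching a longer name that merely starts with tg3.
module_loaded() {
    grep -q "^$(normalise "$1") " "$MODULES_FILE"
}

# Dry run: print the rmmod command that would be issued instead of running it.
unload_module() {
    if module_loaded "$1"; then
        echo "rmmod $(normalise "$1")"
    fi
}
```

For example, with uhci_hcd listed in the modules file, "unload_module uhci-hcd" prints "rmmod uhci_hcd", while a module that is not listed prints nothing. Replacing the echo with a real rmmod call requires root and should only be tried on a machine you can afford to lose network or disk access on.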