{"id":100,"date":"2020-01-22T19:36:47","date_gmt":"2020-01-22T19:36:47","guid":{"rendered":"https:\/\/ni.cmu.edu\/computing\/?post_type=ht_kb&#038;p=100"},"modified":"2020-01-22T19:37:32","modified_gmt":"2020-01-22T19:37:32","slug":"lens-software","status":"publish","type":"ht_kb","link":"https:\/\/ni.cmu.edu\/computing\/knowledge-base\/lens-software\/","title":{"rendered":"Lens Software"},"content":{"rendered":"<h5>This page needs to be updated for the SLURM scheduler!<\/h5>\n<p>A sample PBS submit script can be found below. The setup has a few parts.<\/p>\n<p>First, set the following environment variables in the .bashrc file in your home directory.<\/p>\n<p>export LENSDIR=\/data2\/plautlab\/Lens<br \/>\nexport HOSTTYPE=x86_64<br \/>\nexport PATH=$PATH:$LENSDIR\/Bin\/$HOSTTYPE<br \/>\nexport LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LENSDIR\/Bin\/${HOSTTYPE}<br \/>\nexport TCL_LIBRARY=\/data2\/plautlab\/Lens\/Bin\/x86_64\/<\/p>\n<p>The next time you start a shell session, these environment variable declarations will be read and applied to the shell environment. To make your current session read the file now, type the following in your shell:<\/p>\n<p>source ~\/.bashrc<\/p>\n<p>Alternatively, log out of the cluster and back in so that the new settings take effect.<\/p>\n<p>The basic script is <em>test.qsub<\/em>. The two important lines in that script are the one that sets the number of processors to use and the one that calls the Lens software.<\/p>\n<p>Currently, the script reads <em>#PBS -l nodes=4:ppn=4<\/em>, which tells PBS to use 4 of the compute nodes (out of the current 7 on psych-o) and 4 processors on each node (out of the 8 processors per node), for 16 processors in total. Lens will use the first processor as the Lens server and the rest (15 in this case) as the clients for computation. You can change these numbers in the script before using it to set the number of nodes or processors you want. 
(For example, nodes=2:ppn=8 would also give you 16 processors by using all the processors on 2 of the nodes.)<\/p>\n<p>The other important line is: <em>lens -b server.qsub.tcl &gt; server.out<\/em><\/p>\n<p>You can alter this line to use different input or output files. server.qsub.tcl has been changed from the default server.tcl file so that you no longer need to edit it to list which machines to use for the job. That is now handled through PBS, by setting the number of processors in test.qsub. server.qsub.tcl relies on an additional small script, &#8216;tailnodelist&#8217;, which needs to be in the same location as the server.qsub.tcl file (see below), along with the rest of the scripts.<\/p>\n<p>To run Lens via PBS:<\/p>\n<ol>\n<li>Edit test.qsub to set the requested number of nodes\/processors, as needed.<\/li>\n<li>Edit test.qsub to change the input or output files, as needed.<\/li>\n<li>Run the command &#8216;qsub test.qsub&#8217;.<\/li>\n<\/ol>\n<p>You can always make different versions of test.qsub under different filenames and run each of them with the &#8216;qsub&#8217; command.<\/p>\n<hr \/>\n<h4><a name=\"Begin_test_qsub\"><\/a> Begin test.qsub<\/h4>\n<hr \/>\n<p>#!\/bin\/sh<br \/>\n# specify a jobname<br \/>\n#PBS -N test<\/p>\n<p># specify number of nodes (ppn should be 8 to reserve all cores on the node)<br \/>\n#PBS -l nodes=4:ppn=4<\/p>\n<p># misc other PBS settings:<br \/>\n#PBS -j eo<\/p>\n<p># echo &quot;Moving to plautlab directory.&quot;<br \/>\ncd \/data2\/plautlab\/Lens<br \/>\n# echo &quot;Beginning lens job.&quot;<br \/>\nlens -b server.qsub.tcl &gt; server.out<br \/>\n# echo &quot;Lens job completed.&quot;<\/p>\n<hr \/>\n<h4><a name=\"End_test_qsub\"><\/a> End test.qsub<\/h4>\n<hr \/>\n<h4><a name=\"Begin_server_qsub_tcl\"><\/a> Begin server.qsub.tcl<\/h4>\n<hr \/>\n<p><span class=\"TMLhtml\"># This script is run on the server with the following (assuming &quot;executable&quot; below is set to &quot;lens&quot;)<\/span><br 
\/>\n<span class=\"TMLhtml\">#<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>.\/lens -b server.tcl &gt; server.out &amp;<\/span><\/p>\n<p><span class=\"TMLhtml\"># LD_LIBRARY_PATH on both server and client machines must include current directory<\/span><br \/>\n<span class=\"TMLhtml\"># (containing libtcl8.3.so and libtk8.3.so)<\/span><\/p>\n<p><span class=\"TMLhtml\"># The base of the network file name (and its directory)<\/span><br \/>\n<span class=\"TMLhtml\">set filename<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>rand100<\/span><br \/>\n<span class=\"TMLhtml\">set workingDirectory \/data2\/plautlab\/Lens<\/span><br \/>\n<span class=\"TMLhtml\">set networkScript<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>$workingDirectory\/$filename.in<\/span><br \/>\n<span class=\"TMLhtml\">set clientScript<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>client.tcl<\/span><br \/>\n<span class=\"TMLhtml\">set fixedPort<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>2001<\/span><br \/>\n<span class=\"TMLhtml\">set executable<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>$workingDirectory\/lens<\/span><\/p>\n<p><span class=\"TMLhtml\"># Starting epoch, total number to run, and learning algorithm<\/span><br \/>\n<span class=\"TMLhtml\">set epoch<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>0<\/span><br \/>\n<span class=\"TMLhtml\">set nepochs<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>100<\/span><br \/>\n<span class=\"TMLhtml\">set algorithm<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>dougsMomentum<\/span><\/p>\n<p><span class=\"TMLhtml\"># number of epochs to run only steepestDescent before switching to the specified algorithm<\/span><br \/>\n<span class=\"TMLhtml\">set nsteepest<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>0<\/span><\/p>\n<p><span class=\"TMLhtml\"># Checkpointing of weight files<\/span><br \/>\n<span class=\"TMLhtml\">set checkpointInterval 1000<\/span><br \/>\n<span 
class=\"TMLhtml\">set minSaveInterval<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>100<\/span><br \/>\n<span class=\"TMLhtml\">set maxSaveInterval<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>100<\/span><\/p>\n<p><span class=\"TMLhtml\"># Here is where you list the client machines. <\/span><br \/>\n<span class=\"TMLhtml\"># To run two or more client processes on the same machine, just list it multiple times.<\/span><br \/>\n<span class=\"TMLhtml\">#set clientMachines<span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>{<\/span><br \/>\n<span class=\"TMLhtml\">#compute-0-10 compute-0-10<\/span><br \/>\n<span class=\"TMLhtml\">#}<\/span><br \/>\n<span class=\"TMLhtml\">set clientMachines [exec $workingDirectory\/tailnodelist $::env(PBS_NODEFILE)]<\/span><br \/>\n<span class=\"TMLhtml\">#echo &quot;List of client machines:&quot;<\/span><br \/>\n<span class=\"TMLhtml\">#echo $clientMachines<\/span><\/p>\n<p><span class=\"TMLhtml\">#############################################################################<\/span><br \/>\n<span class=\"TMLhtml\"># shouldn&#8217;t need to change anything below this<\/span><br \/>\n<span class=\"TMLhtml\">#############################################################################<\/span><\/p>\n<p><span class=\"TMLhtml\">proc sourceIfExists {file} {<\/span><br \/>\n<span class=\"TMLhtml\"> if { [file exists $file] } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts &quot;Reading parameter file $file&quot;<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts [source $file]<\/span><br \/>\n<span class=\"TMLhtml\"> }<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><\/p>\n<p><span class=\"TMLhtml\">proc checkpoint { filename } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>global checkpointInterval<\/span><br \/>\n<span class=\"TMLhtml\"><span 
class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>global minSaveInterval<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>global maxSaveInterval<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set epoch [getObj totalUpdates] <\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set saveInterval [expr int(pow(10,floor(log10($epoch))))]<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>if { $saveInterval &lt; $minSaveInterval } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set saveInterval $minSaveInterval<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>} elseif { $saveInterval &gt; $maxSaveInterval } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set saveInterval $maxSaveInterval<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>}<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>if { [expr $epoch % $saveInterval] == 0 } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts &quot;Saving weights to $filename.$epoch.wt.bz2&quot;<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>saveWeights $filename.$epoch.wt.bz2 -values 3<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>} elseif { [expr $epoch % $checkpointInterval] == 0 } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts &quot;Checkpointing to $filename.ckp.wt.bz2&quot;<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>saveWeights $filename.ckp.wt.bz2 -values 3<\/span><br \/>\n<span class=\"TMLhtml\"><span 
class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>}<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>sourceIfExists $filename.prm<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>sourceIfExists $filename.$epoch.prm<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><\/p>\n<p><span class=\"TMLhtml\"># Start the server<\/span><br \/>\n<span class=\"TMLhtml\">set port [startServer $fixedPort]<\/span><br \/>\n<span class=\"TMLhtml\">set hostname [exec hostname -f]<\/span><\/p>\n<p><span class=\"TMLhtml\"># Write the customized client script<\/span><br \/>\n<span class=\"TMLhtml\">set customClientScript $workingDirectory\/client[getSeed].tcl<\/span><br \/>\n<span class=\"TMLhtml\">regsub -all \/ $networkScript \\\\\/ netScript<\/span><br \/>\n<span class=\"TMLhtml\">sed &quot;s\/SCRIPT\/$netScript\/; s\/SERVER\/$hostname\/; s\/PORT\/$port\/&quot; \\<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>$clientScript &gt; $customClientScript<\/span><br \/>\n<span class=\"TMLhtml\">echo &quot;Copying script to clients&quot;<\/span><br \/>\n<span class=\"TMLhtml\">foreach client $clientMachines {<\/span><br \/>\n<span class=\"TMLhtml\"> echo &quot;$client&quot;<\/span><br \/>\n<span class=\"TMLhtml\"> scp $customClientScript $client:$workingDirectory<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><\/p>\n<p><span class=\"TMLhtml\"># Here we define a command for launching clients using ssh<\/span><br \/>\n<span class=\"TMLhtml\">proc launchClients {executable customClientScript machines} {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>global env<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set i 0<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>foreach client $machines {<\/span><br \/>\n<span 
class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts &quot; launching on $client...&quot;<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>exec \/usr\/bin\/ssh $client -n \\<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>&quot;$executable -batch $customClientScript &gt; \/dev\/null&quot; &amp;<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>incr i<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>}<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>return $i<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><\/p>\n<p><span class=\"TMLhtml\"># Now we use the command<\/span><br \/>\n<span class=\"TMLhtml\">puts &quot;Launching clients...&quot;<\/span><br \/>\n<span class=\"TMLhtml\">set numClients [launchClients $executable $customClientScript $clientMachines]<\/span><\/p>\n<p><span class=\"TMLhtml\"># Load the network and training set<\/span><br \/>\n<span class=\"TMLhtml\">source $networkScript<\/span><br \/>\n<span class=\"TMLhtml\">setObj postUpdateProc { checkpoint $filename }<\/span><\/p>\n<p><span class=\"TMLhtml\"># maybe load previously saved weights (to restart)<\/span><br \/>\n<span class=\"TMLhtml\">if { [file exists wts\/$filename.$epoch.wt.bz2] } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>loadWeights wts\/$filename.$epoch.wt<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>echo loadWeights wts\/$filename.$epoch.wt<\/span><br \/>\n<span class=\"TMLhtml\">} elseif { [file exists $filename.$epoch.wt.bz2] } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>loadWeights $filename.$epoch.wt<\/span><br \/>\n<span class=\"TMLhtml\"><span 
class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>echo loadWeights $filename.$epoch.wt<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><br \/>\n<span class=\"TMLhtml\">set epoch [getObj totalUpdates] <\/span><\/p>\n<p><span class=\"TMLhtml\"># Now wait for the clients to connect.<\/span><br \/>\n<span class=\"TMLhtml\">puts &quot;Waiting for $numClients clients...&quot;<\/span><br \/>\n<span class=\"TMLhtml\">waitForClients $numClients<\/span><\/p>\n<p><span class=\"TMLhtml\"># Start training and wait for it to finish,<\/span><br \/>\n<span class=\"TMLhtml\"># but don&#8217;t wait if it didn&#8217;t start correctly.<\/span><br \/>\n<span class=\"TMLhtml\">puts &quot;Training...&quot;<\/span><\/p>\n<p><span class=\"TMLhtml\">if { $epoch &lt; $nsteepest } {<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>puts [trainParallel [expr $nsteepest - $epoch] -algorithm steepest ]<\/span><br \/>\n<span class=\"TMLhtml\"><span class=\"WYSIWYG_HIDDENWHITESPACE\">\u00a0<\/span>set epoch $nsteepest<\/span><br \/>\n<span class=\"TMLhtml\">}<\/span><br \/>\n<span class=\"TMLhtml\">puts [trainParallel [expr $nepochs - $epoch] -algorithm $algorithm ]<\/span><\/p>\n<p><span class=\"TMLhtml\"># Now break the barrier holding the clients so they can exit.<\/span><br \/>\n<span class=\"TMLhtml\">puts &quot;Releasing clients...&quot;<\/span><br \/>\n<span class=\"TMLhtml\">waitForClients<\/span><br \/>\n<span class=\"TMLhtml\">exec rm [glob $customClientScript]<\/span><br \/>\n<span class=\"TMLhtml\">puts &quot;Ba-bye&quot;<\/span><br \/>\n<span class=\"TMLhtml\">exit<\/span><\/p>\n<hr \/>\n<h4><a name=\"End_server_qsub_tcl\"><\/a> End server.qsub.tcl<\/h4>\n","protected":false},"excerpt":{"rendered":"<p>This needs updated for the SLURM scheduler! A sample PBS submit script can be found below. There are a few parts. 
Firstly, you will need to set the following environment variables in your .bashrc file located in your home directory. export LENSDIR=\/data2\/plautlab\/Lens export HOSTTYPE=x86_64 export PATH=$PATH:$LENSDIR\/Bin\/$HOSTTYPE export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LENSDIR\/Bin\/${HOSTTYPE} export TCL_LIBRARY=\/data2\/plautlab\/Lens\/Bin\/x86_64\/&#8230;<\/p>\n","protected":false},"author":1,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","meta":{"footnotes":""},"ht-kb-category":[11],"ht-kb-tag":[],"class_list":["post-100","ht_kb","type-ht_kb","status-publish","format-standard","hentry","ht_kb_category-software"],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb\/100","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb"}],"about":[{"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/types\/ht_kb"}],"author":[{"embeddable":true,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/comments?post=100"}],"version-history":[{"count":2,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb\/100\/revisions"}],"predecessor-version":[{"id":102,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb\/100\/revisions\/102"}],"wp:attachment":[{"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/media?parent=100"}],"wp:term":[{"taxonomy":"ht_kb_category","embeddable":true,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb-category?post=100"},{"taxonomy":"ht_kb_tag","embeddable":true,"href":"https:\/\/ni.cmu.edu\/computing\/wp-json\/wp\/v2\/ht-kb-tag?post=100"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}