quick scripting question - finding occurrence in many lines

On Wed, Nov 29, 2006 at 02:32:37PM +0000, michael wrote:
> I guess a complete rephrase is best.
>
> What I want is "how many processors does each WAITING job in lsf queues
> require?". From 'bhist' I get outputs such as below (see whitespace
> anywhere in "num Processors") and cannot determine a sure way of always
> parsing it...

In the brute-force Perl solution shown previously, just add whitespace
to the character class, giving [\s\n-], which is inserted between every
target character in the regular expression (\s already covers \n, so the
\n is redundant but harmless). The same idea carries over to awk, sed,
grep, or any other tool that uses regular expressions.

#!/usr/bin/perl -w
use strict;
my $source = join '', <>;  # slurp all input into one string
my $t = '[\s\n-]';         # filler class allowed between target characters
# print the number preceding each "Processor", however the word is split
print "$1\n" while
    $source =~ m/(\d+)\s+P$t*r$t*o$t*c$t*e$t*s$t*s$t*o$t*r/msg;
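
For readers more comfortable in Python, the same brute-force idea can be sketched by building the pattern programmatically instead of typing the filler class between every letter. This is only an illustration: the names `tolerant_pattern` and `processor_counts` and the sample string are made up here, and the filler class simply mirrors the one in the Perl script.

```python
import re

FILLER = r"[\s-]*"  # same role as $t* in the Perl: optional whitespace or hyphens


def tolerant_pattern(word):
    """Build a regex that matches `word` even when filler splits it."""
    # re.escape keeps any regex metacharacters in the word literal.
    return FILLER.join(re.escape(ch) for ch in word)


def processor_counts(text):
    """Return every number that precedes a (possibly split) 'Processor'."""
    pattern = re.compile(r"(\d+)\s+" + tolerant_pattern("Processor"))
    return [int(n) for n in pattern.findall(text)]


# Hypothetical sample modelled on the wrapped bhist output quoted in this thread.
sample = "Output File <%J.out>, 128\n        Processo\n        rs Requested"
print(processor_counts(sample))
```

Generating the pattern from the word keeps the target readable and makes it easy to reuse the trick for other keywords.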

Other schemes previously shown would probably work with trivial changes,
e.g., using tr to delete (-d) or squeeze (-s) runs of spaces or newlines,
etc.
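
As a rough Python analogue of the tr approach (a sketch, not one of the schemes posted earlier in the thread): `str.split()` with no arguments splits on any run of whitespace, so rejoining the pieces with nothing behaves much like deleting whitespace with `tr -d`, and rejoining with a single space behaves much like squeezing with `tr -s`. The helper names and the sample fragment are invented for illustration.

```python
import re


def delete_whitespace(text):
    """Roughly `tr -d '[:space:]'`: drop every whitespace character."""
    return "".join(text.split())


def squeeze_whitespace(text):
    """Roughly `tr -s '[:space:]' ' '`: collapse each whitespace run to one space."""
    return " ".join(text.split())


# Hypothetical fragment shaped like the wrapped bhist output in this thread.
sample = "Output File <%J.out>, 128\n   Processo\n   rs Requested"

print(squeeze_whitespace(sample))
print(delete_whitespace(sample))

# After deleting all whitespace the split word is whole again, so a plain
# pattern is enough to pick out the count.
print(re.findall(r"(\d+)Processors", delete_whitespace(sample)))
```

Squeezing alone still leaves the word split ("Processo rs"); deleting heals it, at the cost that a number can fuse with digits just before it (`job1 128` becomes `job1128`), so it is worth checking against real output.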

Unless this is a one-off task (and it doesn't seem to be), I'd suggest
fixing whatever is generating the screwed-up output in the first place.
Failing that, use tr, sed, Python, Perl, Ruby, BASIC, or whatever to
filter the output into something more sensible, i.e., normalize it first
rather than trying to do everything in one step.
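
A minimal Python sketch of that two-step idea, under the assumption (not confirmed anywhere in this thread) that bhist wraps long lines by inserting a newline followed by indentation, so that removing each newline plus the blanks after it restores the original line. The function names and the sample record are hypothetical.

```python
import re


def unwrap(text):
    """Step 1: normalize. Rejoin wrapped lines by dropping each newline
    together with the indentation that follows it (an assumed wrap format)."""
    return re.sub(r"\n[ \t]*", "", text)


def processors_requested(text):
    """Step 2: parse the normalized text with an ordinary pattern.

    \\s* rather than \\s+ tolerates a space being lost at a wrap point.
    """
    match = re.search(r"(\d+)\s*Processors\s*Requested", unwrap(text))
    return int(match.group(1)) if match else None


# Hypothetical record shaped like the example quoted in this thread.
record = (
    "Tue Nov 28 21:35:48: Submitted from host, CWD <$\n"
    "        HOME/scratch/3d_newgc>, Output File <%J.out>, 128\n"
    "        Processo\n"
    "        rs Requested;\n"
)
print(processors_requested(record))
```

Keeping the normalizer separate means the parsing pattern stays simple, and only `unwrap` needs to change if the wrapping turns out to work differently.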

Ken

>
> Thanks, Michael
>
> EXAMPLES:
>
>
>
> ~/bin$ bhist -l 10418;bhist -l 10587;bhist -l 10601
>
> Job <10418>, Job Name <3d>, User , Project , Command
> <#BSUB
> -n 128;#BSUB -W 6:00;#BSUB -J 3d;#BSUB -o %
> J.out;#BSUB -w
> 'ended(10417)';./cont>
> Tue Nov 28 21:35:48: Submitted from host , to Queue ,
> CWD <$
> HOME/scratch/3d_newgc>, Output File <%J.out>, 128
> Processo
> rs Requested, Dependency Condition ;
>
> RUNLIMIT
> ...

--
Ken Irving, , 907-474-6152
Water and Environmental Research Center
Institute of Northern Engineering
University of Alaska, Fairbanks


quick scripting question - finding occurrence in many lines

Having all that whitespace in the 'wrong' spot breaks the idea of
splitting words based on their being surrounded by whitespace. So get
rid of __all__ whitespace. Then use other logic to find what you want.
E.g., if you want the 'word' following the 'word' processor, find the
first occurrence of 'processor' in the string (which contains the whole
file), then look at each following character one at a time to see if it
meets the criteria for being in the next word. E.g., if the following
word must be a number and the word after that is not a number, take each
successive character until it's not a number, and there you have your
target word.

This would be easy in Python but since I don't do RE I couldn't begin to
solve it using anything else.
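
The strip-then-scan approach described above can be sketched in Python without regular expressions. One deliberate difference from the description: in the bhist sample quoted in this thread the number comes before "Processors", so the loop walks backward from the keyword rather than forward; scanning forward would be the mirror image. The function name and the sample string are made up for illustration.

```python
def count_before_keyword(text, keyword="Processors"):
    """Return the run of digits immediately before `keyword`, or None."""
    squashed = "".join(text.split())  # get rid of all whitespace
    pos = squashed.find(keyword)      # first occurrence of the keyword
    if pos == -1:
        return None
    start = pos
    # Walk backward one character at a time while still on a digit.
    while start > 0 and squashed[start - 1] in "0123456789":
        start -= 1
    digits = squashed[start:pos]
    return int(digits) if digits else None


# Hypothetical fragment shaped like the wrapped bhist output in this thread.
sample = "Output File <%J.out>, 128\n   Processo\n   rs Requested"
print(count_before_keyword(sample))
```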

I still like the idea of fixing the source of these mangled files.

Doug.
