[Bioperl-l] new-person question

Clay Shirky clay@shirky.com
Sun, 24 Sep 2000 20:41:43 -0400 (EDT)


> I have a loop that takes each line, and if the line starts with '>',
> should store that line in an array that will contain only sequence names.
> However, the '>' is causing problems. The coding is:
> 
> while (<>)
> {
>         $templine = $_;
>         if ($templine =~ /\b>/)
>         {
>                 other stuff;
>         }
> }
> 
> and it works for perfectly if I use a character, such as '>', but not with
> '>'. 

I'm not sure I understand this last sentence, since you reference >
twice, but > is not a special character in Perl regular expressions.

It _is_ a special character, "STDOUT redirect with file overwrite", in
Unix shells, however, so if you are testing the program with input on
the command line, you may get problems there.

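Also, the \b in your pattern is a word-boundary assertion, so /\b>/
matches only a > that directly follows a word character, which never
happens at the start of a line. A better way to specify FASTA title
lines is /^>/, which is to say "Lines where > is the first character."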
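
To make the difference between the two patterns concrete, here is a
small sketch. The sub names and the test strings are made up for
illustration and are not part of your script; \b is Perl's
word-boundary assertion and ^ anchors to the start of the string.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
# True if the line contains a '>' directly preceded by a word character.
sub matches_boundary { my ($line) = @_; return $line =~ /\b>/ ? 1 : 0 }

# True if '>' is the very first character of the line.
sub matches_anchor   { my ($line) = @_; return $line =~ /^>/  ? 1 : 0 }

# Made-up test strings: a FASTA-style title, a '>' in mid-line, plain sequence.
for my $line (">seq1 test", "seq1>test", "ACGT") {
    printf "%-12s \\b>: %d   ^>: %d\n",
        $line, matches_boundary($line), matches_anchor($line);
}
```

Only the anchored pattern should pick out the line that begins with >.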

If you had a script like 

while (<>) {

    if (/^>/) { # no need to assign temp variable

        print;
    }
}

it should print the title lines from a FASTA format file and no others.
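
Since you said you want the names in an array rather than printed, a
variation along these lines might do it. This is only a sketch: the
sub name fasta_names and the sample lines are invented for the
example, and it assumes the name is everything after the > on the
title line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): given a list of
# lines, return the title lines with the leading '>' and newline removed.
sub fasta_names {
    my @lines = @_;
    my @names;
    for my $line (@lines) {
        # ^> anchors to the first character; (.*) stops before the newline
        if ($line =~ /^>(.*)/) {
            push @names, $1;
        }
    }
    return @names;
}

# Made-up sample input; in a real script you could pass <> instead.
my @sample = (">seq1 first test sequence\n", "ACGTACGT\n",
              ">seq2\n", "TTGA>CC\n");
my @names = fasta_names(@sample);
print "$_\n" for @names;
```

To read from a file or standard input instead, calling fasta_names(<>)
in list context should work the same way, at the cost of reading all
lines into memory at once.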

Bioperl obviously provides many more ways to deal with FASTA data, but
I hope this answers your Perl question.

-clay