First, lines are skipped until one is found whose whitespace-separated fields match the following:
( first is "MSF:" and third is "Type:", OR second is "MSF:" and fourth is "Type:" ) AND last is ".." and the field two positions before the last is "Check:".
The sequence length is then read from field 2 or field 3, depending on whether field 1 or field 2 contains the "MSF:" string.
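The header test above can be sketched in Python as follows. This is a minimal illustration, not the reader's actual code: the function name parse_msf_header is hypothetical, and it assumes the length field is a plain integer.

```python
def parse_msf_header(line):
    """Return the sequence length if `line` is an MSF header line, else None.

    Hypothetical helper; assumes the length field parses as an integer.
    """
    fields = line.split()  # split on runs of whitespace
    # The trailing fields must be: "Check:", <checksum>, ".."
    if len(fields) < 3 or fields[-1] != ".." or fields[-3] != "Check:":
        return None
    # Case 1: "MSF:" is the first field and "Type:" the third.
    if fields[0] == "MSF:" and fields[2] == "Type:":
        return int(fields[1])
    # Case 2: one field (typically a file name) precedes "MSF:".
    if len(fields) >= 4 and fields[1] == "MSF:" and fields[3] == "Type:":
        return int(fields[2])
    return None
```

A caller would apply this to each line in turn and stop at the first one that does not return None.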
Lines are then skipped until a line containing only "//" is encountered.
During this skip, the sequence count is incremented each time the first field of a line is "Name:".
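A minimal Python sketch of this counting step is shown here. The name count_sequences is hypothetical, and raising an error when no "//" line exists is an assumption, since the text does not say what happens in that case.

```python
def count_sequences(lines):
    """Consume lines up to and including the "//" line.

    Returns the number of lines whose first field is "Name:".
    `lines` should be an iterator so the caller can keep reading after "//".
    """
    count = 0
    for line in lines:
        fields = line.split()
        if fields == ["//"]:
            # Separator found: the alignment blocks start after this line.
            return count
        if fields and fields[0] == "Name:":
            count += 1
    # Assumption: a missing separator is treated as an error.
    raise ValueError('no "//" separator line found')
```

Passing an iterator (for example an open file object) means the lines after "//" are still available for the block-reading step.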
While input remains:
  Skip 1 line.
  Skip any further lines that start with at least 10 spaces " ".
  The next lines (as many as there are registered sequences) have their sequences
  concatenated to the result. The first field is assumed to be a name, the rest
  are assumed to be sequence data.
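The block-reading loop can be sketched in Python as follows. The name read_blocks is hypothetical, and the sketch assumes that the result keeps one string per sequence and that rows appear in the same order in every block; the text does not state either point explicitly.

```python
def read_blocks(lines, n_seqs):
    """Read interleaved alignment blocks following the "//" line.

    Returns a list of n_seqs strings (assumption: one string per sequence,
    with row i of every block appended to sequence i).
    """
    it = iter(lines)
    seqs = [""] * n_seqs
    while True:
        # Skip 1 line; stop when the input is exhausted.
        if next(it, None) is None:
            break
        # Skip optional lines that start with at least 10 spaces.
        line = next(it, None)
        while line is not None and line.startswith(" " * 10):
            line = next(it, None)
        # `line` now holds the first sequence row of the block (or None).
        for i in range(n_seqs):
            if i > 0:
                line = next(it, None)
            if line is None:
                break  # input ended in the middle of a block
            fields = line.split()
            # First field is the name; the remaining fields are sequence data.
            seqs[i] += "".join(fields[1:])
    return seqs
```

The sequence names are ignored here, which mirrors the description: only the position of a row within its block decides which sequence it extends.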
(The sequences here are from the SRP database.)
srp9_ali-msf MSF: 76 Type: N June 01, 1776 12:00 Check: 0 ..
Name: ZEA-MAY- Len: 76 Check: 18 Weight: 1.00
Name: CAE-ELE- Len: 76 Check: 3642 Weight: 1.00
Name: CAN-SPE- Len: 76 Check: 1656 Weight: 1.00
Name: MUS-MUS- Len: 76 Check: 2083 Weight: 1.00
Name: HOM-SAP- Len: 76 Check: 1657 Weight: 1.00
//
ZEA-MAY- MVYVDSWEEF VERSVQLFRG DPNATRYVMK YRHCEGKLVL KVTDDRECLK
CAE-ELE- MTYFTSWDEF AKAAERLHSA NPEKCRFVTK YNHTKGQLVL KLTDDVVCLQ
CAN-SPE- -AQYQTWEEF SRAAEKLYLA DPMKARVVLK YRHSDGSLCI KVTDDLVCLV
MUS-MUS- MPQFQTWEEF SRAAEKLYLA DPMKVRVVLK YRHVDGNLCI KVTDDLVCLV
HOM-SAP- -PQYQTWEEF SRAAEKLYLA DPMKARVVLK YRHSDGNLCV KVTDDLVCLV
ZEA-MAY- FKTDQAQDAK KMEKLNNIFF ALMTRG
CAE-ELE- YSTNQLQDVK KLEKLSSTLL RGIVTQ
CAN-SPE- YRTDQAQDVK KIEKFHSQLM RLMVAK
MUS-MUS- YRTDQAQDVK KIEKFHSQLM RLMVAK
HOM-SAP- YKTDQAQDVK KIEKFHSQLM RLMVAK