MSF format for proteins

The MSF format is produced by a number of multiple sequence alignment programs. However, because MSF output varies between programs, this page states what this program expects from the format.

First, any number of lines are skipped until a line is encountered whose fields (separated by whitespace) match the following:

(
   first is "MSF:",
   third is "Type:",
OR
   second is "MSF:",
   fourth is "Type:",
) AND
   last is "..",
   third from last is "Check:"

Then the sequence length is read from the field following "MSF:", that is, field 2 or 3, depending on whether field 1 or 2 contains the "MSF:" string.
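
The header test above can be sketched in Python as follows. This is only an illustration of the stated rules, not this program's actual code: the function name parse_msf_header is hypothetical, and returning None for a non-numeric length is an assumption, since the text does not say what happens in that case.

```python
def parse_msf_header(line):
    """Return the sequence length if `line` matches the header rules, else None."""
    fields = line.split()  # fields are separated by whitespace
    # The last field must be ".." and the third from last must be "Check:".
    if len(fields) < 3 or fields[-1] != ".." or fields[-3] != "Check:":
        return None
    # "MSF:" may be the first or the second field; "Type:" follows two fields later.
    for msf_at in (0, 1):
        if (len(fields) > msf_at + 2
                and fields[msf_at] == "MSF:"
                and fields[msf_at + 2] == "Type:"):
            try:
                # The length is the field right after "MSF:".
                return int(fields[msf_at + 1])
            except ValueError:
                return None  # assumption: a non-numeric length is not a header
    return None


# Illustrative header lines (made up for this sketch).
print(parse_msf_header(" example.msf  MSF: 76  Type: P  Check: 0 .."))
print(parse_msf_header("MSF: 120  Type: P  Check: 4711 .."))
print(parse_msf_header("some other line"))
```

A reader would call this on each line in turn and skip every line for which it returns None.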

Then lines are skipped until a line containing only "//" is encountered. Each time the first field of a skipped line is "Name:", the sequence count is incremented by one.

While input remains, the following steps are repeated:

   one line is skipped,
   any number of lines starting with at least 10 spaces "          " are skipped,
   the next lines (as many as there are counted sequences) have their sequences appended to the result.

In each of these sequence lines the first field is assumed to be a name, and the remaining fields are assumed to be sequence data.
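
The steps that follow the header line can be sketched in Python as below. This is a minimal illustration under stated assumptions, not this program's actual code: the function name read_msf_body is hypothetical, the rows of each block are assumed to appear in the same order every time, and blank rows (for example at the end of the file) are simply ignored, which the text does not specify.

```python
def read_msf_body(lines):
    """Read the lines after the header; return a list of (name, sequence) pairs."""
    pos = 0
    count = 0
    # Count "Name:" lines until a line containing only "//" is reached.
    while pos < len(lines) and lines[pos].strip() != "//":
        fields = lines[pos].split()
        if fields and fields[0] == "Name:":
            count += 1
        pos += 1
    pos += 1  # step past the "//" line itself

    names = [None] * count
    chunks = [[] for _ in range(count)]
    while pos < len(lines):
        pos += 1  # skip one line (the separator before each block)
        # Skip optional lines that start with at least 10 spaces.
        while pos < len(lines) and lines[pos].startswith(" " * 10):
            pos += 1
        # Read one row per counted sequence.
        for row in range(count):
            if pos >= len(lines):
                break
            fields = lines[pos].split()
            pos += 1
            if not fields:
                continue  # assumption: ignore blank rows, e.g. at end of input
            if names[row] is None:
                names[row] = fields[0]  # first field is the name
            chunks[row].extend(fields[1:])  # remaining fields are sequence data
    return [(name, "".join(parts)) for name, parts in zip(names, chunks)]
```

The sketch follows the description strictly, so an input with more than one separator line between blocks would be misread. The result keeps one entry per sequence, with all of its blocks joined in file order.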

(The sequences in the example below are from the SRP database.)



 srp9_ali-msf  MSF: 76  Type: N  June 01, 1776 12:00  Check: 0 ..

 Name: ZEA-MAY-         Len:    76  Check:   18  Weight:  1-00
 Name: CAE-ELE-         Len:    76  Check: 3642  Weight:  1-00
 Name: CAN-SPE-         Len:    76  Check: 1656  Weight:  1-00
 Name: MUS-MUS-         Len:    76  Check: 2083  Weight:  1-00
 Name: HOM-SAP-         Len:    76  Check: 1657  Weight:  1-00

//

       ZEA-MAY-  MVYVDSWEEF VERSVQLFRG DPNATRYVMK YRHCEGKLVL KVTDDRECLK
       CAE-ELE-  MTYFTSWDEF AKAAERLHSA NPEKCRFVTK YNHTKGQLVL KLTDDVVCLQ
       CAN-SPE-  -AQYQTWEEF SRAAEKLYLA DPMKARVVLK YRHSDGSLCI KVTDDLVCLV
       MUS-MUS-  MPQFQTWEEF SRAAEKLYLA DPMKVRVVLK YRHVDGNLCI KVTDDLVCLV
       HOM-SAP-  -PQYQTWEEF SRAAEKLYLA DPMKARVVLK YRHSDGNLCV KVTDDLVCLV

       ZEA-MAY-  FKTDQAQDAK KMEKLNNIFF ALMTRG
       CAE-ELE-  YSTNQLQDVK KLEKLSSTLL RGIVTQ
       CAN-SPE-  YRTDQAQDVK KIEKFHSQLM RLMVAK
       MUS-MUS-  YRTDQAQDVK KIEKFHSQLM RLMVAK
       HOM-SAP-  YKTDQAQDVK KIEKFHSQLM RLMVAK

