This is my pipeline for processing 454 sequencing data. I am not a biotech person, but after spending some time on it and getting help from a colleague, I came up with a way to do it. The tools I used are the NCBI SRA Toolkit (sff-dump) and the 454 SFF tools (sfffile and sffinfo) for handling multiplexed, MID-tagged reads.

Process

  • Downloaded SRA files from …

  • Converted these SRA files into SFF format using the sff-dump tool

    	$ sff-dump -A xxxx.sra
    
  • Rebuilt the scores of the converted SFF dataset with the sfffile tool
    	$ sfffile -o xxxxxxn.sff xxxxx.sff
    
  • Split the file according to the MID group name (GSMIDs or RLMIDs, depending on the MID set used)
    	$ sfffile -s GSMIDs xxxxx.sff      # or: sfffile -s RLMIDs xxxxx.sff
    
  • Calculated the total MID matches for each group:
    • Extract the sequences:
		 $ sffinfo -s 454Readsxx.sff > MIDx.fasta
		
    • Count the total number of sequences:
		 $ grep -c '^>' MIDx.fasta
		
  • Combined the SFF files into one main file
	$ sfffile -o combined.sff xxx1.sff xxx2.sff xxx3.sff ...
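
The counting step above can be repeated over all per-MID FASTA files with a small loop. This is only a sketch: the helper name count_reads and the MID*.fasta file naming are assumptions for illustration, and it relies on nothing more than grep counting the header lines that start with '>'.

```shell
#!/bin/sh
# Sketch: count FASTA records (header lines starting with '>') per file.
# count_reads is a hypothetical helper name, not part of the 454 tools.
count_reads() {
    total=0
    for f in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        # grep -c prints the number of matching lines; it exits non-zero
        # when there are none, so "|| true" keeps the loop going.
        n=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$f" "$n"
        total=$((total + n))
    done
    printf 'total\t%s\n' "$total"
}
```

Calling it as count_reads MID*.fasta prints one tab-separated line per file followed by a total line, which makes it easy to compare the per-MID counts against the read count of the unsplit file.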
	

Other useful commands

  • Get the quality scores from the sff file:
	$ sffinfo -q 454Readsxx.sff > MIDx.qual
	
  • Retrieve the flow intensities:
	$ sffinfo -f 454Readsxx.sff > MIDx.flow
	
  • View a file (with either more or less):
	$ less MIDx.fasta
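
Because the FASTA and quality files are extracted from the same SFF file, they should contain the same number of records. A quick sanity check can be sketched as follows. The helper name same_record_count is hypothetical, and the sketch assumes that both files mark each record with a header line starting with '>'.

```shell
#!/bin/sh
# Sketch: verify that a FASTA file and its quality file have the same
# number of records. same_record_count is a hypothetical helper name.
# Assumption: both files start every record with a '>' header line.
same_record_count() {
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "usage: same_record_count reads.fasta reads.qual" >&2
        return 2
    fi
    # grep -c exits non-zero on zero matches, hence "|| true".
    fasta_n=$(grep -c '^>' "$1" || true)
    qual_n=$(grep -c '^>' "$2" || true)
    if [ "$fasta_n" = "$qual_n" ]; then
        echo "OK: $fasta_n records in both files"
    else
        echo "MISMATCH: $1 has $fasta_n records, $2 has $qual_n" >&2
        return 1
    fi
}
```

For example, same_record_count MIDx.fasta MIDx.qual returns a non-zero status when the two counts differ, so it can be used as a guard before any downstream step.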