RosettaScript1

Published: 2023-06-13 19:15:23


0 XML skeleton

<ROSETTASCRIPTS>
    <SCOREFXNS>
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <SIMPLE_METRICS>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
    </MOVERS>
    <PROTOCOLS>
    </PROTOCOLS>
    <OUTPUT />
</ROSETTASCRIPTS>

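A handy tip: to get the empty template script above, run the rosetta_scripts application without the -parser:protocol flag. If the flag is omitted (that is, no input script is provided), the application prints the template script and exits. This is useful when you sit down to write a new script.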

1 XML file example

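The following modern example uses in:file:native and minimizes a CDR loop of a protein, calculating various metrics before and after minimization. All of these metrics are written to the scorefile, labeled with the given prefix/suffix and the name of each metric.

Rosetta carries out the steps in the order specified in PROTOCOLS. An important point is that SimpleMetrics and Filters never change the sequence or conformation of the structure.

Movers, by contrast, do change the pose, and the output file is the result of applying the movers of the PROTOCOLS section in sequence. The standard scores in the output carry over from whichever protocol step performed scoring, unless the OUTPUT tag is specified, in which case the corresponding score function from the SCOREFXNS block is used. You can use the name "commandline" as the score function in the OUTPUT tag. Note that this means that if your pose was never scored during the protocol, your output will contain no score information!

Additional XML script examples, including docking, protein interface design, and prepacking of protein complexes, can be found in the Rosetta/demos/public/rosetta_scripts/ directory.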

The following command line runs the above protocol, assuming the protocol file is named ala_scan.xml:

Rosetta/main/source/bin/rosetta_scripts.linuxgccrelease -s <input PDB file name> -use_input_sc -nstruct 20 -ex1 -ex2 -parser:protocol ala_scan.xml -parser:view

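Note that for RosettaScripts to actually take most command-line options into account, you need to use task operations.

The -parser:view flag can be used with Rosetta executables compiled with the extras=graphics switch, as follows (from the Rosetta root directory):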

scons mode=release -j3 bin extras=graphics

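When run with -parser:view, a graphical viewer opens and shows many of the steps in the trajectory. This is very useful for checking that sampling follows the intended trajectory.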

2 General RosettaScripts conventions

2.1 General Comments

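This document lists the movers and filters recognized by RosettaScripts, together with their defaults, meanings, and uses. Scripts are written in XML format; as long as the file extension is .xml, many free viewers (such as vi) will highlight the key XML notation.

Whenever an XML statement is shown, the following conventions are used:

<...> defines a branch statement (a statement that has more leaves).
<.../> defines a leaf statement.
"" defines input expected from the user, with & defining the expected type (string, float, etc.).
() defines the default value, which the parser uses if the protocol does not provide one.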

2.2 Specifying Residues

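Rosetta uses two residue numbering schemes: "pose numbering" and "PDB numbering". Pose numbering assigns the value 1 to the first residue of the first chain and counts up sequentially from there, ignoring the start of new chains and any missing residues. PDB numbering uses the chain/residue/insertion code designations present in the input PDB file. In general, a residue identifier given with a chain is interpreted as PDB numbering, and one without a chain as pose numbering.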

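For example, suppose you have a PDB file with two chains, where chain A has residues 12-62 and chain B has residues 5-20 and 32-70. PDB residue 12 of chain A has pose number 1, and PDB residue 62 of chain A has pose number 51. Residue 5 of chain B has pose number 52, and residue 32 of chain B has pose number 68.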
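The arithmetic in this example can be checked with a short Python sketch. It is only an illustration of the counting rule described above, not Rosetta code; the function name `build_pose_numbering` and the segment-list input format are assumptions made for this example.

```python
# Illustrative sketch only: reproduces the pose-numbering rule described
# in the text. build_pose_numbering and its input format are hypothetical.

def build_pose_numbering(segments):
    """Map (chain, pdb_resnum) -> pose number.

    `segments` is an ordered list of (chain, first, last) residue ranges,
    in the order they appear in the PDB file. Pose numbering starts at 1
    and keeps counting across chain breaks and missing residues.
    """
    mapping = {}
    pose_index = 0
    for chain, first, last in segments:
        for pdb_resnum in range(first, last + 1):
            pose_index += 1
            mapping[(chain, pdb_resnum)] = pose_index
    return mapping


# The two-chain example from the text: chain A 12-62, chain B 5-20 and 32-70.
segments = [("A", 12, 62), ("B", 5, 20), ("B", 32, 70)]
pose_of = build_pose_numbering(segments)

print(pose_of[("A", 12)])  # 1
print(pose_of[("A", 62)])  # 51
print(pose_of[("B", 5)])   # 52
print(pose_of[("B", 32)])  # 68
```

The gap between B20 and B32 does not consume any pose numbers, which is why B32 directly follows B20 (pose number 67) in pose numbering.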

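Many RosettaScripts tags that accept a residue identifier offer a joint option that lets you specify it in either pose numbering or PDB numbering, written as res_num/pdb_num or similar. For tags with this option you can specify res_num= or pdb_num=, but not both. The res_num option takes a residue in pose numbering, while the pdb_num option takes a residue in PDB numbering, in the form "42.A" or "42A", where A is the chain and 42 is the PDB residue number. At present it is not possible to specify an insertion code with the pdb_num option.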
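The "exactly one of res_num or pdb_num" rule and the two accepted spellings can be sketched in Python as follows. This is a hedged illustration, not Rosetta's parser: `parse_pdb_num`, `resolve_residue`, and the `pose_of` lookup table are hypothetical names, and the sketch assumes the chain is a single letter.

```python
import re

# Hypothetical helper names; this is not Rosetta's actual parser.
# Assumption: the chain identifier is a single letter, and insertion
# codes are not supported (as the text states for pdb_num).
_PDB_NUM = re.compile(r"^(-?\d+)\.?([A-Za-z])$")


def parse_pdb_num(spec):
    """Parse '42.A' or '42A' into (42, 'A')."""
    match = _PDB_NUM.match(spec.strip())
    if match is None:
        raise ValueError(f"not a valid pdb_num: {spec!r}")
    return int(match.group(1)), match.group(2)


def resolve_residue(res_num=None, pdb_num=None, pose_of=None):
    """Return a pose number from exactly one of res_num / pdb_num.

    `pose_of` maps (chain, pdb_resnum) -> pose number and is only
    needed when pdb_num is given.
    """
    if (res_num is None) == (pdb_num is None):
        raise ValueError("specify exactly one of res_num or pdb_num")
    if res_num is not None:
        return int(res_num)
    resnum, chain = parse_pdb_num(pdb_num)
    return pose_of[(chain, resnum)]


# Made-up lookup table for illustration: PDB residue 42 of chain A
# is assumed to be pose residue 31.
example_map = {("A", 42): 31}
print(parse_pdb_num("42.A"))                                   # (42, 'A')
print(resolve_residue(pdb_num="42A", pose_of=example_map))     # 31
print(resolve_residue(res_num=7))                              # 7
```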

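Take care when using PDB numbering with protocols that change the length of the pose. Inserting residues can invalidate the PDB information associated with the pose, leading to errors when PDB numbers are decoded. In addition, some RosettaScripts objects convert PDB numbers to pose numbers based on the numbering of the input structure, which can lead to misalignment if residues are added or deleted.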
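The misalignment risk can be made concrete with a toy Python sketch. It models a pose as an ordered list of residue labels, which is an assumption for illustration only; the `pose_numbers` helper and the "new" label are hypothetical and do not correspond to any Rosetta data structure.

```python
# Toy model (not Rosetta code): a pose is an ordered list of residue labels.

def pose_numbers(residues):
    """Number residues 1..N in order; each residue is a (chain, label) pair."""
    return {label: index for index, label in enumerate(residues, start=1)}


# Chain A, PDB residues 12-16.
residues = [("A", n) for n in range(12, 17)]

# An object converts PDB residue A15 to a pose number once, at parse time.
cached = pose_numbers(residues)[("A", 15)]

# Later, a length-changing step inserts a residue after A13.
# "new" is a placeholder label for the inserted residue.
residues.insert(2, ("A", "new"))
current = pose_numbers(residues)[("A", 15)]

print(cached, current)        # 4 5
print(residues[cached - 1])   # ('A', 14)
```

After the insertion, the cached pose number 4 silently refers to A14 instead of A15, which is the kind of misalignment the paragraph above warns about.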

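2.3 Getting help

Although this document is intended as the primary user manual for RosettaScripts, help is also available within the application. To get an empty template script, simply run the rosetta_scripts application without any input flags. For example: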

> ./bin/rosetta_scripts.default.linuxgccrelease

This produces the following output:

core.init: USEFUL TIP: Type -help to get the options for this Rosetta executable.
apps.public.rosetta_scripts.rosetta_scripts: No XML file was specified with the "-parser:protocol <filename>" commandline option. In order for RosettaScripts to do something, it must be provided with a script.
apps.public.rosetta_scripts.rosetta_scripts: The following is an empty (template) RosettaScripts XML file:

<ROSETTASCRIPTS>
    <SCOREFXNS>
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <SIMPLE_METRICS>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
    </MOVERS>
    <PROTOCOLS>
    </PROTOCOLS>
    <OUTPUT />
</ROSETTASCRIPTS>

At any point in a script, you can include text from another file using <xi:include href="filename.xml" />.
apps.public.rosetta_scripts.rosetta_scripts: Variable substituion is possible from the commandline using the -"parser:script_vars varname=value" flag. Any string of the pattern "%%varname%%" will be replaced with "value" in the script.
apps.public.rosetta_scripts.rosetta_scripts:
apps.public.rosetta_scripts.rosetta_scripts: The rosetta_scripts application will now exit.

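You can also use the -parser:info <name1> <name2> <name3> ... flag to get syntax help for any mover, filter, task operation, or residue selector. For example, the following command line provides information about the MutateResidue mover and the HbondsToAtom filter: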

./bin/rosetta_scripts.default.linuxgccrelease -info MutateResidue HbondsToAtom

The output is as follows:

The rosetta_scripts application was used with the -parser:info flag.
Writing options for the indicated movers/filters/task operations/residue selectors:
--------------------------------------------------------------------------------
INFORMATION ABOUT MOVER "MutateResidue":

DESCRIPTION:

Change a single residue or a given subset of residues to a different type. For instance, mutate Arg31 to an Asp, or mutate all Prolines to Alanine

USAGE:

<MutateResidue target=(string) new_res=(string) mutate_self=(bool,"false") perserve_atom_coords=(bool,"false") update_polymer_bond_dependent=(bool) preserve_atom_coords=(bool) residue_selector=(string) name=(string)>
</MutateResidue>

OPTIONS:

"MutateResidue" tag:

    target (string): The location to mutate. This can be a PDB number (e.g. 31A), a Rosetta index (e.g. 177), or an index in a reference pose or snapshot stored at a point in a protocol before residue numbering changed in some way (e.g. refpose(snapshot1,23)). See the convention on residue indices in the RosettaScripts Conventions documentation for details

    new_res (string): The name of the residue to introduce. This string should correspond to the ResidueType::name() function (eg ASP).

    mutate_self (bool,"false"): If true, will mutate the selected residue to itself, regardless of what new_res is set to (although new_res is still required). This is useful to "clean" residues when there are Rosetta residue incompatibilities (such as terminal residues) with movers and filters.

    perserve_atom_coords (bool,"false"): If true, then atoms in the new residue that have names matching atoms in the old residue will be placed at the coordinates of the atoms in the old residue, with other atoms rebuilt based on ideal coordinates. If false, then only the mainchain heavyatoms are placed based on the old atom's mainchain heavyatoms; the sidechain is built from ideal coordinates, and sidechain torsion values are then set to the sidechain torsion values from the old residue. False if unspecified.

    update_polymer_bond_dependent (bool): Update the coordinates of atoms that depend on polymer bonds

    preserve_atom_coords (bool): Preserve atomic coords as much as possible

    residue_selector (string): name of a residue selector that specifies the subset to be mutated

    name (string): The name given to this instance

--------------------------------------------------------------------------------
INFORMATION ABOUT FILTER "HbondsToAtom":

DESCRIPTION:

This filter counts the number of residues that form sufficiently energetically favorable H-bonds to a selected atom

USAGE:

<HbondsToAtom partners=(int) energy_cutoff=(real,"-0.5") bb_bb=(bool,"0") backbone=(bool,"0") sidechain=(bool,"1") pdb_num=(refpose_enabled_residue_number) atomname=(string) res_num=(int) name=(string) confidence=(real,"1.0")>
</HbondsToAtom>

OPTIONS:

"HbondsToAtom" tag:

    partners (int): H-bonding partner expectation, below which counts as failure

    energy_cutoff (real,"-0.5"): Energy below which a H-bond counts

    bb_bb (bool,"0"): Count backbone-backbone H-bonds

    backbone (bool,"0"): Count backbone H-bonds

    sidechain (bool,"1"): Count sidechain H-bonds

    pdb_num (refpose_enabled_residue_number): Particular residue of interest

    atomname (string): Atom name to which to examine H-bonds

    res_num (int): Residue number in Rosetta numbering (sequentially with the first residue in the pose being 1

    name (string): The name given to this instance

    confidence (real,"1.0"): Probability that the pose will be filtered out if it does not pass this Filter

--------------------------------------------------------------------------------

The rosetta_scripts application will now exit.

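3 Options available in the XML protocol file

3.1 Variable substitution

Occasionally we want to perform a series of runs with slightly different parameters. Instead of creating many slightly different XML files, we can use script variables.

If the -parser:script_vars option is set on the command line, then every string of the form "%%variable_name%%" encountered in the XML file is replaced with the corresponding value from the command line.

For example, a line in the XML such as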

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="%%cutoff%%" repeats="%%repeat%%"/>

can be turned into:

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="10.0" repeats="5"/>

with the command-line option:

-parser:script_vars repeat=5 cutoff=10.0
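The replacement just shown can be mimicked with a few lines of Python. This is a minimal sketch of plain text substitution under stated assumptions, not Rosetta's implementation: the helper names `parse_script_vars` and `substitute_script_vars` are hypothetical, variable names are assumed to consist of word characters, and raising an error for an undefined variable is a choice made for this sketch.

```python
import re

# Hypothetical helpers illustrating %%name%% text substitution.
# Assumptions: names match \w+, and an undefined variable is an error.


def parse_script_vars(args):
    """Turn ['repeat=5', 'cutoff=10.0'] into {'repeat': '5', 'cutoff': '10.0'}."""
    return dict(arg.split("=", 1) for arg in args)


def substitute_script_vars(xml_text, script_vars):
    """Replace every %%name%% in xml_text with script_vars[name]."""
    def replace(match):
        name = match.group(1)
        if name not in script_vars:
            raise KeyError(f"no value given for script variable {name!r}")
        return script_vars[name]

    return re.sub(r"%%(\w+)%%", replace, xml_text)


template = (
    '<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" '
    'interface_distance_cutoff="%%cutoff%%" repeats="%%repeat%%"/>'
)
script_vars = parse_script_vars(["repeat=5", "cutoff=10.0"])
result = substitute_script_vars(template, script_vars)
print(result)
```

With the values above, `result` is the same AlaScan line as the substituted one shown in the text, with interface_distance_cutoff="10.0" and repeats="5".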

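These values can then be varied freely from run to run, for example:

-parser:script_vars repeat=5 cutoff=15.0
-parser:script_vars repeat=2 cutoff=10.0
-parser:script_vars repeat=1 cutoff=9.0

Every instance of a "%%var%%" string is replaced, including those in any subroutine XML files. Note that although script_vars is currently implemented as pure macro text substitution, this may change in the future, and any use other than substituting tag values may stop working. In particular, using script variables to change the parsing structure of the XML file itself is explicitly *not supported*, and even considering it is a devious idea.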

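3.2 Including XML files

It can be convenient to keep commonly used pieces of XML in their own files and to direct a script to load XML from a pre-existing file, so that the user does not have to copy and paste XML by hand. The XML xi:include command serves this purpose, with "href=filename" specifying the file to include.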

<xi:include href="(&filename_string)" />

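The xi:include block is naively replaced with the contents of the file specified by "href=filename". Below is an example that uses xi:include. We assume that the user frequently uses the AlaScan and Ddg filters and wants to keep their setup in a separate file that can be included whenever he or she writes a new RosettaScripts XML file: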

file1.xml:

<ROSETTASCRIPTS>
    <SCOREFXNS>
        <ScoreFunction name="interface" weights="interface"/>
    </SCOREFXNS>
    <FILTERS>
        <xi:include href="file2.xml"/>
        <Sasa name="sasa" confidence="0"/>
    </FILTERS>
    <MOVERS>
        <Docking name="dock" fullatom="1" local_refine="1" score_high="soft_rep"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="dock" filter_name="scan"/>
        <Add filter_name="ddg"/>
        <Add filter_name="sasa"/>
    </PROTOCOLS>
    <OUTPUT scorefxn="interface"/>
</ROSETTASCRIPTS>

file2.xml:

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="10.0" repeats="5"/>
<Ddg name="ddg" confidence="0"/>

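Note that file inclusion happens recursively, so included files can themselves include other files. Circular dependencies (for example, file1.xml includes file2.xml, which includes file3.xml, which includes file1.xml) are prohibited and cause an error. Multiple inclusions of the same file are permitted, however (though this is rarely advisable). There is a limit on how many files can be included in this way: the recursion limit is 8, and it can be changed with the -parser:inclusion_recursion_limit command-line option. In some cases you may want to prevent the recursive search (for example, if the included file is very large). The optional "prevent_recursion" attribute of the include tag achieves this, as follows: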

<xi:include href="(&filename_string)" prevent_recursion="True"/>

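Variable substitution happens after file inclusion, which means that %%variable%% statements can appear in included files; however, it also means that an xi:include block cannot itself contain %%variable%% statements.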
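The inclusion rules and the ordering described above can be sketched in Python. This is a rough model under stated assumptions, not Rosetta's implementation: the in-memory `files` dictionary stands in for the filesystem, the function names are hypothetical, the way the limit of 8 is counted (as nesting depth) is a guess, and prevent_recursion is modeled as "insert the file's text without scanning it for further includes".

```python
import re

# Sketch only. Assumptions: files live in a dict, the recursion limit is
# counted as nesting depth, and prevent_recursion skips nested expansion.
INCLUDE = re.compile(
    r'<xi:include\s+href="([^"]+)"(?:\s+prevent_recursion="(\w+)")?\s*/>'
)


def expand_includes(name, files, limit=8, _stack=()):
    """Return the text of files[name] with xi:include tags expanded."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"circular inclusion: {chain}")
    if len(_stack) >= limit:
        raise RecursionError(f"inclusion recursion limit ({limit}) exceeded")
    stack = _stack + (name,)

    def replace(match):
        href, prevent = match.group(1), match.group(2)
        if prevent is not None and prevent.lower() == "true":
            return files[href]  # inserted verbatim, not scanned further
        return expand_includes(href, files, limit, stack)

    return INCLUDE.sub(replace, files[name])


def load_script(name, files, script_vars, limit=8):
    """Expand includes first, then substitute %%name%% variables."""
    text = expand_includes(name, files, limit)
    return re.sub(r"%%(\w+)%%", lambda m: script_vars[m.group(1)], text)


files = {
    "file1.xml": '<FILTERS>\n<xi:include href="file2.xml"/>\n</FILTERS>',
    "file2.xml": '<AlaScan name="scan" repeats="%%repeat%%"/>',
}
script = load_script("file1.xml", files, {"repeat": "5"})
print(script)
```

Because substitution runs on the already expanded text, the %%repeat%% in file2.xml is replaced, while an href such as "%%name%%.xml" would be looked up literally during expansion and fail. Tracking the current chain of includes (rather than the set of all files seen) is what allows the same file to be included more than once while still rejecting cycles.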