Help

This is the help page.

FAQ

Account FAQs

I don't have an account. How do I get one?

Go to the login page and click on 'Sign up'. If you need further help, please contact our support.

I forgot my password. What should I do?

Go to the login page and click on 'Forgot your password?'. If you need further help, please contact our support.

I did not receive the confirmation instructions. What should I do?

Go to the login page and click on 'Didn't receive confirmation instructions?'. If you need further help, please contact our support.

Publication FAQs

I can't find an ECF publication that should be imported into this database. What should I do?

If the publication is listed in PubMed, look up its PMID (e.g. for this publication the PubMed ID is 31398314, which can be found at the bottom of its PubMed entry). Then go to the publications page. In the bottom right, there is a field to suggest a publication to be imported into this database. See the publications section for further help. If the publication is not listed in PubMed, please contact our support.

Search

The search is accessible on the home page.

With the search, you can look up the most important ECF-related entries. With the basic search (selecting 'all categories'), you can search for ECF proteins, ECF groups, publications, authors, and taxa.

When you select a sub-category, an advanced search may be available. For example, searching for rpoE could return an overly long list of rpoE (and rpoE-like) proteins. Here, you can filter for specific taxa.

At the moment, the advanced search is under construction. If you are missing a filter option, do not hesitate to contact our support. This helps us improve the search for your needs.

Analyzing data

With this feature you can upload and analyze up to 100 protein sequences.

You can set a name for the analysis to find it later if you want. You can upload the sequences by file upload or by pasting the amino-acid sequences in the text field. After uploading, you get a table with the results.
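Before uploading as a guest, it can help to count the sequences in your input. The following is a minimal sketch that assumes the input is in FASTA format (this help text does not state the accepted format); the function names are hypothetical and not part of the database.

```python
GUEST_LIMIT = 100  # guest limit stated in this help text


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def within_guest_limit(text, limit=GUEST_LIMIT):
    """True if the input holds at most `limit` sequences."""
    return len(parse_fasta(text)) <= limit
```

If `within_guest_limit` returns False, you could split the input into several analyses or log in, since registered users are not limited to 100 sequences.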

Once the analysis has finished, the page reloads itself. We check whether each sequence contains a sigma2, sigma3, or sigma4 domain. If a sequence contains both a sigma2 and a sigma4 domain, we try to determine which ECF group it might belong to. Because we check for groups and subgroups independently, the predicted subgroup might not belong to the predicted ECF group. In addition, we provide a statistic indicating how likely it is that the protein belongs to the group/subgroup.
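The way to read a result row can be sketched as follows. This is only an illustration of the rules described above, not the database's implementation; the function, the mapping `subgroup_parent`, and all group and subgroup names are hypothetical.

```python
REQUIRED_DOMAINS = {"sigma2", "sigma4"}  # both are needed for a group prediction


def interpret_result(domains, group, subgroup, subgroup_parent):
    """Summarize one result row.

    domains: names of the domains found in the sequence
    group, subgroup: predicted names (subgroup may be None)
    subgroup_parent: hypothetical mapping from subgroup name to its group
    """
    if not REQUIRED_DOMAINS <= set(domains):
        return "no group prediction"
    # Groups and subgroups are predicted independently, so they can disagree.
    if subgroup is not None and subgroup_parent.get(subgroup) != group:
        return "group and subgroup disagree"
    return "group prediction available"


parents = {"subgroup_A1": "group_A", "subgroup_B1": "group_B"}
```

For example, a row with domains `["sigma2", "sigma4"]`, group `"group_A"`, and subgroup `"subgroup_B1"` would be flagged as a disagreement that deserves a closer look at the reported statistic.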

By clicking on the name of the protein, you can see its protein sequence, where the sigma2 (red) and sigma4 domains are located, and which other groups it might belong to.

The analyzed proteins are deleted after 2 weeks or when you delete your session. As a logged-in user, you are not limited to 100 protein sequences and your analyses are stored permanently.

Differences between users with and without accounts

Feature                        Guest user    Registered user
Number of uploaded sequences   100           unlimited
Private area                   no            yes
Permanently stored data        no            yes

Taxa

ECF Taxonomy

In the ECFexpressDatabase, the taxonomy conforms to the NCBI taxonomy. The taxon IDs, as well as the ancestry of each taxon, are the same as in the NCBI Taxonomy. We included the following taxonomic ranks:

  • superkingdom
  • phylum
  • class
  • order
  • family
  • genus
  • species
  • subspecies
  • no_rank (which is often a strain's rank)

In the original NCBI taxonomy, there are also ranks with the prefix "sub-", e.g. subphylum, subclass, etc. The ECFexpressDatabase excludes taxa with one of these ranks, or with the rank no_rank, if no assembly is directly assigned to them.
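The inclusion rule above can be sketched as a small predicate. This is one possible reading, not the database's code: it assumes that subspecies is treated like the other "sub-" ranks, and that ranks not mentioned in this section are left out. The function name is hypothetical.

```python
# Main ranks from the list above that are kept regardless of assemblies.
MAIN_RANKS = {
    "superkingdom", "phylum", "class", "order", "family", "genus", "species",
}


def keep_taxon(rank, has_direct_assembly):
    """Decide whether a taxon is kept, under the assumptions stated above."""
    if rank in MAIN_RANKS:
        return True
    if rank == "no_rank" or rank.startswith("sub"):
        # Kept only if an assembly is directly assigned to the taxon.
        return has_direct_assembly
    # Assumption: ranks not mentioned in this section are not included.
    return False
```

Under this reading, a strain with rank no_rank and its own genome assembly stays in the tree, while a subclass without any directly assigned assembly is dropped.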

Genome assemblies

Our data come from an ECF scan performed in February 2017. We included all predicted protein sequences from RefSeq and GenBank and scanned them for ECFs based on their primary structure. For the re-classification, only non-redundant sequences (based on sequence similarity) were considered. Each non-redundant sequence that belongs to the re-classification is included in this database with an auto-incremented ID. The naming convention is ECFp<auto-incremented ID>. Because redundant sequences share one entry, an ECF protein can come from multiple sources.
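The relationship between source records and ECFp IDs can be illustrated with a short sketch. It makes two assumptions that this help text does not confirm: redundancy is treated as exact sequence identity (the actual similarity criterion is not stated here), and numbering starts at 1. The function name and the source labels are hypothetical.

```python
def assign_ecf_ids(records):
    """Map each distinct sequence to one ECFp ID and collect its sources.

    records: iterable of (source_label, sequence) pairs
    Returns a dict: sequence -> {"id": "ECFp<n>", "sources": [...]}
    """
    by_sequence = {}
    for source_label, sequence in records:
        entry = by_sequence.get(sequence)
        if entry is None:
            # First time this sequence is seen: give it the next ID.
            entry = {"id": f"ECFp{len(by_sequence) + 1}", "sources": []}
            by_sequence[sequence] = entry
        entry["sources"].append(source_label)
    return by_sequence


records = [
    ("source_1", "MSEQAAA"),
    ("source_2", "MKTLLVV"),
    ("source_3", "MSEQAAA"),  # same sequence as source_1
]
proteins = assign_ecf_ids(records)
```

Here the first and third record end up under the same ECFp ID with two sources, which is why a single ECF protein page can list several origins.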

Representative genome assemblies

Each genome assembly belongs to exactly one taxon. In the hierarchical tree, however, a taxon can have multiple genome assemblies assigned to it; e.g. for Bacillus subtilis, there are currently 227 genome assemblies assigned to it or to its descendants. Because counting ECF proteins across all of these 227 genome assemblies would yield a meaningless number, we define one genome as representative for a species and its descendants.

For this purpose, we used the RefSeq assembly summaries for GenBank assemblies and RefSeq assemblies to sort all genome assemblies for a species and took the best one.

Because of its automated pipeline, we consider a genome assembly from RefSeq more reliable than one from GenBank. Genome assemblies defined as reference genomes are better than genome assemblies with the RefSeq category representative genome, which are better than genome assemblies with no (na) RefSeq category. After this check, we order the assemblies according to their genome representation, where Full is better than Partial. We further sort the genome assemblies based on their assembly level, where the order is as follows:

  1. Complete genome
  2. Chromosome
  3. Scaffold
  4. Contig

If there are multiple genome assemblies with the same criteria, we order them according to their release date, where newer ones are better than older ones.
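The ranking described in this section can be expressed as a single sort key. The sketch below is an illustration under assumptions: the dictionary keys (`source`, `refseq_category`, `genome_rep`, `assembly_level`, `release_date`) and the assembly names are hypothetical, and the criteria are applied in the order in which the text lists them.

```python
from datetime import date

# Lower number = better, following the order given in the text.
SOURCE_RANK = {"RefSeq": 0, "GenBank": 1}
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1, "na": 2}
REPRESENTATION_RANK = {"Full": 0, "Partial": 1}
LEVEL_RANK = {"Complete genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def assembly_sort_key(assembly):
    """Tuple that sorts better assemblies first."""
    return (
        SOURCE_RANK[assembly["source"]],
        CATEGORY_RANK[assembly["refseq_category"]],
        REPRESENTATION_RANK[assembly["genome_rep"]],
        LEVEL_RANK[assembly["assembly_level"]],
        -assembly["release_date"].toordinal(),  # newer release first
    )


def pick_representative(assemblies):
    """Return the best assembly according to the key above."""
    return min(assemblies, key=assembly_sort_key)


candidates = [
    {"name": "asm_A", "source": "GenBank", "refseq_category": "na",
     "genome_rep": "Full", "assembly_level": "Complete genome",
     "release_date": date(2016, 5, 1)},
    {"name": "asm_B", "source": "RefSeq", "refseq_category": "na",
     "genome_rep": "Full", "assembly_level": "Scaffold",
     "release_date": date(2015, 1, 10)},
    {"name": "asm_C", "source": "RefSeq", "refseq_category": "na",
     "genome_rep": "Full", "assembly_level": "Scaffold",
     "release_date": date(2016, 11, 3)},
]
best = pick_representative(candidates)
```

In this toy input, the two RefSeq assemblies outrank the GenBank one even though the latter is a complete genome, and the release date breaks the tie between them.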

TaXplorer

The TaXplorer lets you search and filter for taxa of interest and gain a first insight into taxa and their subtaxa. For example, you can search for 'Bacillus subtilis 168'. After this, you can click on Show Tree in the table to get the taxonomic tree of the reference genome of Bacillus subtilis. As this taxon has no taxa below it, you only see one entry on the right with the number of representative ECFs. By clicking on the up-arrow, you reset the tree to its parent, so you can explore the target taxon and its siblings. By clicking on the plus, you can expand the tree like a file browser.

Filtering for ECF groups lets you see how many ECFs per group are available for the selected taxa. To do this, check each ECF group that should be displayed. After clicking on Show Tree, you can see how many ECFs are available for each group of each taxon in the tree by clicking on the blue plus.

By clicking on Info or on the icon, you get to the taxon's page.

Taxon page

Lineage

On the taxon's page, you can find the lineage of the taxon.

Each ancestor can be clicked to get to that taxon. By clicking on children, you can go one level down.

ECF distributions

Here, you can find the ECF distribution of that taxon. On the left, there are the ECFs of the representative genome assembly (or assemblies, in the case of taxa with rank genus or higher).

On the right, all non-redundant ECF protein sequences that are assigned to the current taxon and all its descendants are counted.
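Counting non-redundant ECFs over a taxon and its descendants amounts to taking the union of the ECF sets in the subtree. The following sketch illustrates the idea on toy data; the function name, the taxon names, and the two input dictionaries are hypothetical.

```python
def ecfs_in_subtree(taxon, children, direct_ecfs):
    """Return the set of ECFp IDs assigned to `taxon` or any descendant.

    children: dict mapping a taxon to the list of its child taxa
    direct_ecfs: dict mapping a taxon to the set of ECFp IDs assigned to it
    """
    ecfs = set(direct_ecfs.get(taxon, ()))
    for child in children.get(taxon, ()):
        ecfs |= ecfs_in_subtree(child, children, direct_ecfs)
    return ecfs


children = {"genus_X": ["species_Y"], "species_Y": ["strain_1", "strain_2"]}
direct_ecfs = {
    "strain_1": {"ECFp1", "ECFp2"},
    "strain_2": {"ECFp2", "ECFp3"},  # ECFp2 also occurs in strain_1
}
count = len(ecfs_in_subtree("species_Y", children, direct_ecfs))
```

Because a set is used, a protein that occurs in several strains is counted only once for the parent taxon, which is what distinguishes this number from a per-assembly count.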

Genome Assembly inspection

Additionally, you can select a genome assembly that is directly assigned to this taxon or one of its descendants. By default, the best assembly is selected (see the help section on representative genome assemblies for further information).

Selecting a genome assembly lets you see all its ECFs with basic meta information and their genomic neighborhood.

ECF Groups

Overview page

The ECF groups overview page can be accessed at this page. Here, you can find the complete list of all ECF groups of the current classification. By default, old groups that are no longer part of the re-classification are excluded from the table.

By unchecking the checkbox "Exclude deprecated groups?", you get access to groups that disappeared during the re-classification. Most of the ECF group features are not available for old groups.

By clicking on the blue links, you get to the details pages of the ECF groups.

At the bottom, there is the groups' phylogenetic tree. Here, you can see which groups are related to each other. In the search field of the phylogenetic tree, you have to enter the complete name of the group. Only non-deprecated groups are available in this tree.

By selecting one group, you get a small overview of this group and its subgroups.

Details page

Description

Each group has a main page for its description (based on Casas-Pastor et al.). With the gray button below the name, you can select its subgroups. If you are logged in, you can mark a group as a favorite to get easy access to it.

Additionally, computed properties are displayed:

Download section

Sample Neighborhood

Promoter Motif

MSA Viewer

Figures

  • Distribution of representative ECFs in subgroups
  • Distribution of non-redundant ECFs in subgroups
  • Distribution of representative ECFs in the taxonomic tree
  • Overrepresented Pfam domain patterns
  • Pfam domains in the ECFs' neighborhoods

Publications

We imported ECF-related publications into our database with their abstracts and additional meta-information. We manually curated most of them and connected them to ECF proteins by tags. Each tag that is connected to a protein can also be seen on the groups' description pages (not done yet).

If you cannot find a publication that describes ECFs, do not hesitate to suggest one. On the main publications page, you can find the suggestion field in the bottom right. To suggest a publication, you have to be logged in and you have to provide the publication's PubMed ID.

After this, we try to retrieve the basic information of the publication. You get to a form where you can double-check the retrieved information and correct it if needed. Please leave a comment explaining why this publication is related to ECFs.

If you suggest a publication, an expert will have a look at it and connect the publication to the related group/subgroup/protein. While the publication is under review, you can see its status on its page.

Download

Here, you can bulk-download the most important ECFDB entities:

If you wish to bulk-download additional files, please contact our support so that we can provide them for you.