Metadata

Here we explain all metadata that may be associated with blobs, and their meaning.

As explained in the previous section, the metadata is associated with the UUID (and not with the specific blob, as specified by language and file type).

Environment

Before proceeding, though, we clarify what we mean by environment.

LaTeX uses environments to delimit text elements, as in this example:

\begin{Theorem}
  The hypothesis implies the thesis.
\end{Theorem}

Internally, ColDoc identifies such an environment as E_Theorem. The prefix E_ helps to identify environments and to avoid name collisions.

By passing the option --split-environment environment to blob_inator, you may specify which environments to split.

For example, E_document is the part between \begin{document} and \end{document}; note that this blob is always split, since the option --split-environment document is already present by default in blob_inator.

ColDoc also uses other environments:

  • main_file is the main blob, the root of the tree

  • preamble is the preamble, that is, the part between \documentclass and \begin{document}; this blob is always split, unless the argument --dont-split-preamble is passed to blob_inator (but this may break some parts of the portal).

  • input or include are used for blobs that contain text from a LaTeX file that was included using \input or \include

  • input_preamble is used for blobs that contain code from a LaTeX file that was included using \input while inside the preamble

  • usepackage is used for blobs that contain packages; these are copied if found in the same directory as the main file

  • bibliography is used for blobs that contain bibliography, as specified by the \bibliography command

  • section is used for sections

  • paragraph is used for long paragraphs of text (as specified by the --split-paragraph option)

  • graphic_file is used for blobs containing images (usually inserted using \includegraphics or other commands specified with the option --split-graphic of blob_inator)
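The naming convention described in this section can be sketched in Python. This is only an illustration: the helper names environ_for_latex and is_latex_environ, and the set INTERNAL_ENVIRONS, are hypothetical and are not part of ColDoc.

```python
# Hypothetical helpers illustrating the E_ naming convention; not ColDoc code.

# Environment names that ColDoc uses internally (taken from the list above).
INTERNAL_ENVIRONS = {
    "main_file", "preamble", "input", "include", "input_preamble",
    "usepackage", "bibliography", "section", "paragraph", "graphic_file",
}


def environ_for_latex(name):
    """Return the internal identifier for a LaTeX environment name."""
    return "E_" + name


def is_latex_environ(environ):
    """Return True if the identifier denotes a LaTeX environment (E_ prefix)."""
    return environ.startswith("E_")


# Because of the prefix, a LaTeX environment named "section" would not
# collide with the internal "section" environ.
print(environ_for_latex("Theorem"))   # E_Theorem
print(is_latex_environ("section"))    # False
print(is_latex_environ("E_section"))  # True
```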

Metadata key list

This is the list of all keys in the metadata storage, and the meaning of their values. Note that a key may be repeated multiple times.

These keys are static: they are instantiated when the blob is first added to the tree (e.g. by using blob_inator), but are not changed when the blob content is subsequently edited.

  • coldoc, the nickname of the ColDoc that this blob is part of

  • environ, the environment that contained this blob. See the previous section for details.

  • optarg, the optional argument of the environment, as in this example:

    \begin{Theorem}[Foobar's theorem]
      The hypothesis implies the thesis.
    \end{Theorem}
    

    where the optarg would be equal to Foobar's theorem.

  • lang, the languages available for this blob; more than one language may be available.

  • extension, the extensions available for this blob; more than one extension may be available, for example a graphical file may be available as .jpeg and .svg. For blobs containing LaTeX, only .tex is allowed.

  • author, the list of people that contributed to this blob (this does not distinguish whether somebody contributed only to a certain language version).

  • original_filename, the filename whose content was copied into this blob (and children of this blob) by blob_inator; the extension of the filename (if any) is stripped; the path is not absolute, but is relative to the directory where the main LaTeX file was located.

    An exception to the above are pseudo-filenames starting with '/' (currently either '/preamble.tex' or '/document.tex' or '/main.tex') that indicate the original preamble and document part of the input; the code will also create language symlinks for them.

  • uuid, the UUID of this blob

  • parent_uuid, the UUID of the parent of this blob; every blob has one, except for the blob with environ=main_file

  • child_uuid, the UUID of the children of this blob; there may be none, one, or more than one

  • access can be open, public or private. See the section on permissions.

  • creation_date

  • modification_date; this is updated when the blob content is edited (this does not distinguish which language version was edited).

  • latex_date; this is updated when the view (html and pdf) of this blob is compiled (this does not distinguish which language version was edited: the system automatically recompiles the language last edited).

  • replaces; the list of UUIDs (comma separated) that this blob replaces; to be used to mark duplicate material.
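Since the value of replaces is a comma separated list, code that consumes it has to split it. A minimal sketch follows; the function parse_replaces is hypothetical (not a ColDoc function), and the UUID strings used below are made-up placeholders.

```python
# Hypothetical helper, for illustration only; not part of ColDoc.

def parse_replaces(value):
    """Split a comma separated 'replaces' value into a list of UUID strings."""
    return [item.strip() for item in value.split(",") if item.strip()]


# "00A" and "00B" are placeholder strings, not real UUIDs.
print(parse_replaces("00A, 00B"))  # ['00A', '00B']
print(parse_replaces(""))          # []
```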

These keys are derived from the content of the blob. Any direct change to this database would be lost as soon as the blob is changed. (In Django, they are stored in a SQL database for convenience; this database is called ExtraMetadata.)

  • M_ followed by a name that was provided as --metadata-command name. For example, if blob_inator was invoked with the command

    blob_inator --metadata-command label --split-environment Theorem
    

    to parse this input

    \begin{Theorem}\label{tautol}
      The hypothesis implies the thesis.
    \end{Theorem}
    

    then the metadata for that blob would contain environ=E_Theorem and M_label={tautol}.

  • S_ followed by an environment and then followed by _M_name; this is used for metadata extracted from environments that are deeper in the tree than the current blob, but that are not split into a child blob. As in this example:

    blob_inator --metadata-command label --split-environment Theorem
    

    to parse this input

    \begin{Theorem}\label{tautol}
      The hypothesis implies the thesis.
      \begin{equation}\label{eq:forall}
        \forall x
      \end{equation}
    \end{Theorem}
    

    then a blob will contain this Theorem, and its metadata would contain M_label={tautol} and S_E_equation_M_label={eq:forall}.
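The way these derived key names are composed can be sketched as follows. This is an illustration under assumptions: the function metadata_key is hypothetical and is not how ColDoc builds the keys internally; it only reproduces the naming pattern described in this section.

```python
# Hypothetical helper, for illustration only; not part of ColDoc.

def metadata_key(command, inner_environ=None):
    """Build the key for a value extracted by --metadata-command.

    command       -- the command name, e.g. "label"
    inner_environ -- identifier of a nested environment that was not split
                     into a child blob (e.g. "E_equation"), or None if the
                     command appears directly in the blob's own environment
    """
    key = "M_" + command
    if inner_environ is not None:
        key = "S_" + inner_environ + "_" + key
    return key


# Metadata of the Theorem blob from the example above, as (key, value) pairs.
metadata = [
    ("environ", "E_Theorem"),
    (metadata_key("label"), "{tautol}"),
    (metadata_key("label", "E_equation"), "{eq:forall}"),
]
for key, value in metadata:
    print(key + "=" + value)
```

A list of pairs is used here, rather than a dictionary, because a key may be repeated multiple times.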

Metadata in source code

Metadata is represented and operated on by a Python class.

The class interface is described by the base class MetadataBase in ColDoc.classes.

This interface is implemented by the FMetadata class, which stores metadata in a file (this is independent of Django), and by DMetadata, which stores metadata in the Django databases.

To write code that works with both implementations, it is important to use the get method, that always returns a list of values (even for properties that are known to be single valued).

The keys coldoc, uuid, environ are known to be single valued, and for convenience there is a Python property that returns the single value (or None).
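The access pattern described above can be illustrated with a toy stand-in. The class ToyMetadata below is hypothetical: it does not reproduce MetadataBase, FMetadata or DMetadata, its add method is invented for the example, and returning an empty list for a missing key is an assumption of this sketch.

```python
# A stand-in for illustration only; not the real ColDoc classes.

class ToyMetadata:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """Append a value (hypothetical method); a key may be repeated."""
        self._data.setdefault(key, []).append(value)

    def get(self, key):
        """Always return a list of values, even for single valued keys."""
        return list(self._data.get(key, []))

    @property
    def environ(self):
        """Return the single value of 'environ', or None if it is not set."""
        values = self.get("environ")
        return values[0] if values else None


m = ToyMetadata()
m.add("environ", "E_Theorem")
m.add("lang", "en")
m.add("lang", "it")

print(m.get("environ"))  # ['E_Theorem']
print(m.get("lang"))     # ['en', 'it']
print(m.environ)         # E_Theorem
```

Code that only relies on get and treats every result as a list does not need to know whether a key is single valued or repeated.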

Note that in DMetadata some objects are not strings:

  • author is a models.ManyToManyField on the internal User class

  • coldoc is a models.ForeignKey on the DColDoc model.