Mathlingua
Mathlingua is a language specially designed to create encyclopedias of mathematical knowledge with the goal to make it easier to learn and understand mathematics, in particular to remove ambiguity, illustrate the relationship between mathematical concepts, and highlight aspects of definitions and theorems that are easy to overlook.
It is created to address the unique needs for recording mathematical knowledge, that include but are not limited to:
- that a Hilbert space extends the concept of a vector space with additional properties, and, although it is not directly a Banach space, it can be viewed as one
- that an integer is, strictly speaking, not a real number, but it can be viewed as one
- that in the expression \(x + n\) where \(x\) is a real number an \(n\) is an integer, \(+\) is interpreted as real addition because the integer \(n\) is viewed as a real number
- that a function can be though of abstractly as a standalone concept that is a mapping from an input to a unique output, but can also be encoded as a set of pairs, and thus thought of as a set if needed
- that a continuous function is an abstract concept while the real-valued function \(f(x) := \sin(x)\) is a concrete example of a continuous function on \(\mathbb{R}\)
- that a group can be thought of as a tuple \(G := (X, *, e)\), where a tuple-like object is different from a function like object, is different from a set-like object
- that when one says \(x \in G\) where \(G := (X, *, e)\) with \(X\) a set that one means \(x \in X\)
- that \(\int f\) can potentially represent the Riemann integral, Lebesgue integral, Daniel integral, etc. of a function \(f\) and thus, math symbols can be ambiguous. Mathlingua addresses this by separating the name defining a concept and how it is rendered.
- that in the expression \[ \int_a^b f(x) \: dx \] the exact symbols \(a\), \(b\), and \(f\) are important, but the symbol \(x\) can be replaced with any symbol (other than \(a\), \(b\), or \(f\)) without changing the meaning of the expression.
- that definitions and theorems expect certain types of mathematical objects as inputs. That is, if a theorem says given a monoid \(M\), one cannot use a continuous function \(f\) for \(M\) because a continuous function is not a monoid. However, one could use a group \(G\) because a group is a monoid.
- that when one says \(A \subset B\) for some sets \(A\) and \(B\) that it is implied that also \(B \supset A\).
- that in the expression \(x + y * z\), where \(x\), \(y\), and \(z\) are real numbers, the expression should be evaluated as \(x + (y * z)\).
- that to describe \(f\) verbally one could say that \(f\) is a function from \(A\) to \(B\), but in writing would write \(f \: : \: A \rightarrow B\).
- that if \(G := (X, *, e)\) is a group and \(x,y\) are elements of \(G\) then the \(*\) in \(x * y\) refers to the \(*\) in the tuple \((X, *, e)\).
- that when specifying a definition, theorem, axiom, conjecture, or proof, to remove ambiguity, it is essential to allow the type of each symbol used be determined. That is, do not just write \(x + n\) but specify that \(x\) is real and \(n\) is an integer.
- to fully record mathematical knowledge, it is important to not just record statements of definitions, theorems, etc. but also describe their history, importance, use, and how how they related to other math concepts
- in addition to definitions, theorems, etc. it is important to record information about math resources such as books, articles, etc. as well as people that discovered mathematical results
The Need for a Recording Mathematics
There are many examples in mathematics of concepts being rediscovered. This could be from the fact that it is often times difficult to find what has already been discovered. This is compounded by the observation that different authors can use very different nomenclature and notation for the same concepts.
Further, as more and more mathematics is discovered, it is becoming harder and harder for any individual to follow everything new. This is especially difficult since many discoveries may be in journals that are hard to search and require subscriptions, and thus are not accessible to most.
As a result, a standard repository of all mathematics in a clear, consistent format is needed to help this and future generations discover and record mathematics.
Existing Math Collections
Existing collections of mathematical knowledge typically fall into two categories. First are books, articles, journals, encyclopedias, etc. while the second are collections written in theorem proving languages such as Lean, Coq, Mizar, etc.
For the first category, different math concepts can be spread across different books or other resources. Even for encyclopedias, nomenclature and notation can vary across the encyclopedia.
Next, these resources are written in natural language, and are thus difficult for computers to search and understand. That is, it is not easy to make computer search capabilities to search math books, journals, and encyclopedias and find, for example, all theorems that describe when a function is measurable.
Further, mathematics written in natural language can be imprecise and difficult to read because statements can rely on context or conventions for information. For example, a theorem might say for all \(x\) and \(y\) there exists an \(n\) such that \(x*n > y\), where it is up to the reader to infer, based on convention and the fact that the theorem is in a real analysis book, that \(x\) and \(y\) refer to real numbers and \(n\) an integer.
Last, mathematics written in natural language do not have any automated checking of any aspects of the statements. In particular, a statement could have a typo in a symbol name, thus using a name that is not defined anywhere or refer to a definition that doesn’t exist, and it is up to manual review to catch those mistakes.
Getting Started
Mathlore is the site that hosts the community grown collection of mathematical knowledge encoded in the Mathlingua language. You can access it at mathlore.org. Contributions to Mathlore are welcome from everyone.
To get started first fork
the repository.
Next, download the latest release of the Mathlingua command line
tool from the releases
page, rename it to mlg
, and ensure it is in
your PATH
.
Last, you can use whatever editor you like to edit Mathlingua
code. However, if you use Visual Studio Code you
can install
the Mathlingua
Language Support extension that provides syntax
highlighting, autocompletion, and error diagnostics provided by
mlg
.
Mathlingua files have the
.math
extension and themathlingua-language-support
extension is designed to analyze any files with the.math
extension.
The Mathlingua Tooling
mlg
is the Mathlingua command line tools used to
interact with collections of Mathlingua documents. It uses the
format mlg <command>
where
<command>
is an action to perform:
mlg check
: Analyzes all.math
files in the current working directory recursively for any errors and prints them to the screen. Optionally, if directory or.math
file name(s) are specified, only errors for those files are printed to the screen.mlg view
: Serves the current directory’s Mathlingua files athttp://localhost:8080
so that they can be viewed.mlg version
: Prints the Mathlingua version.mlg help
: Prints information about how to usemlg
.
About the Author
Mathlingua is designed and implemented by Dominic Kramer.
Dominic has a PhD in mathematics from Iowa State University, has over ten years of extensive software design and engineering experience in top companies such as Google, Databricks, and Boeing, and has spent much time, both professionally and personally, working in the area of computer and natural language design and implementation.
Dominic’s interests focus on the intersection of mathematics and languages and includes all aspects of language design and usage ranging from traditional parser design and lambda calculus (ranging from simple to dependent type theory), all of the way to deep learning and large language models.
Dominic is particularly interested in how the design of a language’s syntax and semantics shapes how concepts or applications can or cannot be easily expressed or implemented in that language.
Any questions, comments, or feedback is greatly appreciated, and you can post feedback as an issue in the Mathlingua GitHub repo or via email at DominicKramer@gmail.com.
The Language
Formulation Forms
Names
x
x?
"a b"
Functions
f(x, y)
Infix Operators
x * y
Prefix Operator
+x
Suffix Operator
x!
Tuples
(x, y)
Conditional Sets
{x | ...}
{f(x, y) | ...}
Functional Literal
x |-> f(x)
(x, y) |-> f(x, y)
Conditional Set Id Form
[x]{x | f(x)...}
Colon Equals Form
X := (a, b)
f(x) := y
f(x) := (a, b)
Formulation Expressions
Function Expression
f(x + y, z)
(f + g)(x)
Function Literals
x |-> x + 1
(x, y) |-> x + y
Tuples
(x + y, z)
Groupings
(x + y)
Invisible Groupings
(:x + y:)
Conditional Set Expression
TODO: update this to use the second |
[x]{(x, x+1) | x is \real ; x > 0}
[x]{(x, x+1) | x is \real}
[x, y]{x + y | x, y is \real ; x > 0 ; y > 0}
Command Expressions
\function:on{A}:to{B}
\a.b.c{x, y}
\a.b.c(x, y)
\a.b.c[x, y]{x + y}
\a.b.c[x, y]{x + y | x, y is \real ; x > 0 ; y < 0}
Prefix Operator
-x
Postfix Operator
x!
Infix Operator
x + y
Is Expression
x is \y
x is \y & \z
Equality
x = y
x != y
Extends Expression
x extends \y
As Expression
x as \:y
x as (a, b, c)
The as
operator is used to cast a symbol to
another type or another form. The x as (a, b, c)
form is used to cast to another form.
x as <form>
the type of
x
is used to determine if it has a view as the
specified form. If so, that description is used to convert to
the form. Otherwise, if x is a tuple and