Converting arXiv into XHTML+MathML - access to scientific papers

This is the presentation that Michael Kohlhase gave at the @Science conference "Making Science Accessible".
He explains the group's work: translating the collection of scientific publications in the Cornell e-Print Archive (arXiv) into XHTML+MathML using LaTeXML, a LaTeX-to-XML converter that is currently under development.
The main technical task of the arXMLiv project is to supply LaTeXML bindings for the thousands of LaTeX classes and packages used in the arXiv collection. To this end, they developed a distributed build system that repeatedly runs LaTeXML over the arXiv collection, collects statistics about, e.g., the most sorely missing LaTeXML bindings, and clusters common error events. This gives valuable feedback both to the developers of LaTeXML and to the binding implementers.

The results of the conversion are impressive: the complete arXiv collection from 1993 to 2006, more than 400,000 documents, has been processed (one run is a processor-year-size undertaking), and the success rate is more than 56% (i.e., over 56% of the documents written in LaTeX were converted without LaTeXML reporting an error and are available as XHTML+MathML documents).
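The statistics described above (a success rate over the LaTeX documents, and a ranking of the most sorely missing bindings) can be illustrated with a minimal sketch. It is not the arXMLiv build system: the `ConversionResult` record, its fields, and the sample data are all hypothetical, chosen only to show how such figures could be tallied from per-document outcomes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConversionResult:
    """Hypothetical per-document outcome of one conversion run."""
    doc_id: str
    is_latex: bool  # non-LaTeX submissions are excluded from the success rate
    errors: list = field(default_factory=list)            # reported error messages
    missing_bindings: list = field(default_factory=list)  # classes/packages lacking a binding


def success_rate(results):
    """Share of LaTeX documents converted without any reported error."""
    latex_docs = [r for r in results if r.is_latex]
    if not latex_docs:
        return 0.0
    clean = sum(1 for r in latex_docs if not r.errors)
    return clean / len(latex_docs)


def most_missing_bindings(results, top=3):
    """Rank missing bindings by the number of documents that need them."""
    counts = Counter()
    for r in results:
        # Count each binding once per document, however often it is reported.
        counts.update(set(r.missing_bindings))
    return counts.most_common(top)


# Made-up sample data, for illustration only.
results = [
    ConversionResult("doc-1", True),
    ConversionResult("doc-2", True, ["undefined macro"], ["pkgA.sty"]),
    ConversionResult("doc-3", True, ["undefined macro"], ["pkgA.sty", "pkgB.cls"]),
    ConversionResult("doc-4", False),
    ConversionResult("doc-5", True),
]

print(f"success rate: {success_rate(results):.0%}")
print(most_missing_bindings(results))
```

Counting each binding once per document, rather than once per reported message, ranks bindings by how many papers they block, which is the figure a binding implementer would most likely want when deciding what to write next.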

These documents are directly accessible to blind and partially sighted users, because readers for XHTML+MathML are available.



Authors: Michael Kohlhase (Jacobs University, Bremen, Germany)
Resource URL: n/a

Taxonomy: 
Languages:
Keywords:
File Attachments: Converting arXiv into XHTML+MathML - access to science.mp3 (5.76 MB)
Converting arXiv into XHTML+MathML - access to scientific papers.pdf (1.04 MB)
