Wednesday, August 10, 2016

TexLexAn: Analyze, Classify and Summarize Any Text

This is an older post, originally written in 2010.

TexLexAn is a project to build an automatic text analyzer, classifier and summarizer.

It can be used to:
  • Estimate the reading time and the reading difficulty.
  • Categorize a text (automatic classifier).
  • List keywords.
  • Summarize by extraction.
  • Count repetitions and estimate the ratio of basic words.
  • Look for possible plagiarism.
  • Evaluate sentiments.
  • Archive & retrieve documents.
  • Build a knowledge base.
It works with:
  • URL links (uses wget).
  • Text and HTML files.
  • PDF, ODT, PPT and DOC files (requires pdftotext, odt2txt, ppthtml and antiword to be installed).
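
To illustrate how the input types above could be routed to the external tools, here is a minimal Python sketch. It is not TexLexAn's actual code: the function names, the extension table and the exact command-line arguments are assumptions made for illustration, and it assumes each tool writes the extracted text (or HTML, for ppthtml) to standard output.

```python
import os
import subprocess

# Hypothetical mapping: file extension -> (tool, extra arguments).
# "-" asks pdftotext to write to standard output instead of a file.
CONVERTERS = {
    ".pdf": ("pdftotext", ["-"]),
    ".odt": ("odt2txt", []),
    ".ppt": ("ppthtml", []),
    ".doc": ("antiword", []),
}

# Formats that can be read directly, without an external converter.
PLAIN_EXTENSIONS = {".txt", ".htm", ".html"}


def extraction_command(source):
    """Return the command that would fetch or convert `source`,
    or None if the file can be read as-is."""
    if source.startswith(("http://", "https://")):
        # Download quietly and send the page to standard output.
        return ["wget", "-q", "-O", "-", source]
    extension = os.path.splitext(source)[1].lower()
    if extension in PLAIN_EXTENSIONS:
        return None
    if extension not in CONVERTERS:
        raise ValueError("unsupported input type: " + source)
    tool, extra_args = CONVERTERS[extension]
    return [tool, source] + extra_args


def load_text(source):
    """Run the converter (which must be installed) and return its output."""
    command = extraction_command(source)
    if command is None:
        with open(source, encoding="utf-8", errors="replace") as handle:
            return handle.read()
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `load_text("report.pdf")` would fail with an error if pdftotext is not installed, which mirrors the requirement stated in the list.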
The summarizer extracts the most relevant sentences from the text. It simplifies them slightly by removing text between brackets and repeated sentences, and by replacing deadwood expressions with their shortest forms (mode VIII).
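
The extraction and simplification steps can be sketched in a few lines of Python. This is only an illustration of the general idea, not TexLexAn's algorithm: the word-frequency scoring, the stop-word list, the deadwood table and the function names are all assumptions chosen to keep the example short.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "are", "was"}

# Hypothetical deadwood table: long expression -> shorter form.
DEADWOOD = {
    "due to the fact that": "because",
    "in order to": "to",
    "at this point in time": "now",
}


def content_words(text):
    """Lower-case words of `text`, without stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOP_WORDS]


def simplify(sentence):
    """Drop bracketed text and shorten deadwood expressions."""
    sentence = re.sub(r"\s*[\(\[][^\)\]]*[\)\]]", "", sentence)
    for long_form, short_form in DEADWOOD.items():
        sentence = re.sub(re.escape(long_form), short_form, sentence,
                          flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", sentence).strip()


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    # Remove repeated sentences while keeping the first occurrence.
    seen, unique = set(), []
    for sentence in sentences:
        key = sentence.lower()
        if key not in seen:
            seen.add(key)
            unique.append(sentence)
    frequency = Counter(content_words(text))

    def score(sentence):
        # Average frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(unique)), key=lambda i: score(unique[i]),
                    reverse=True)
    return [simplify(unique[i]) for i in sorted(ranked[:max_sentences])]
```

For example, `simplify("We left early (at noon) in order to rest.")` returns `"We left early to rest."`. A real summarizer would use a much larger dictionary and a more careful relevance score.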

TexLexAn programs have been tested on Ubuntu 8.04/9.04/10.04 and FreeBSD 6.2. The binaries included in the package work on Ubuntu 10.04.

Author: Jean-Pierre Redonnet (last update: 2010/08/13)
