September 20, 1999

Page Decomposition and Signature Finding via Shape Classification and Geometric Layout

  • Hobby J.

The decomposition of a page image into text and various types of non-text elements is a challenging problem that is important to document analysis tasks such as OCR, storage and retrieval, and identifying the sender and recipient of a fax. A fast classifier based on a skeletonization of the image attempts to classify groups of related line segments as text, ruling lines, signatures, other line art, or miscellaneous. Everything classified as text is then processed by Baird's language-free layout analysis system, so that a post-processor can use the geometric layout to refine the decisions about what is text and what is non-text. The refined result could then be further processed to identify complex objects such as tables, signature blocks, and line drawings. To recognize signatures and separate them from ruling lines and components of line drawings, the line segments produced by skeletonization need to be strung together by a curve-fitting process. After long, fairly straight lines are found and set aside, a more lenient criterion is used for stringing together pairs of segments to form the groups on which the fast classifier is run.
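The two-pass grouping described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: a simple endpoint-gap and turn-angle test stands in for the curve-fitting process, and every name and threshold here (`chain_segments`, `two_pass_grouping`, `max_gap`, `max_turn`, `min_line_length`) is a hypothetical choice made for illustration. The sketch also assumes each segment is an oriented pair of points and that segments arrive roughly in drawing order, so chains are only extended forward from their tail.

```python
import math

def direction(seg):
    """Heading of an oriented segment ((x0, y0), (x1, y1)) in radians."""
    (x0, y0), (x1, y1) = seg
    return math.atan2(y1 - y0, x1 - x0)

def turn_angle(a, b):
    """Absolute change in heading from segment a to segment b, in [0, pi]."""
    d = abs(direction(b) - direction(a)) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def can_join(a, b, max_gap, max_turn):
    """True if b starts near the end of a and turns by at most max_turn."""
    return math.dist(a[1], b[0]) <= max_gap and turn_angle(a, b) <= max_turn

def chain_segments(segments, max_gap, max_turn):
    """Greedily string segments into chains; each segment is used once."""
    unused = list(segments)
    chains = []
    while unused:
        chain = [unused.pop(0)]
        extended = True
        while extended:
            extended = False
            for i, seg in enumerate(unused):
                if can_join(chain[-1], seg, max_gap, max_turn):
                    chain.append(unused.pop(i))
                    extended = True
                    break
        chains.append(chain)
    return chains

def chain_length(chain):
    """Total length of all segments in a chain."""
    return sum(math.dist(p, q) for p, q in chain)

def two_pass_grouping(segments, min_line_length=50.0):
    """Pass 1 sets aside long, nearly straight chains; pass 2 regroups the
    remaining segments under a more lenient criterion.
    All thresholds are assumed values, not taken from the paper."""
    strict = chain_segments(segments, max_gap=2.0, max_turn=math.radians(5))
    lines = [c for c in strict if chain_length(c) >= min_line_length]
    rest = [s for c in strict if chain_length(c) < min_line_length for s in c]
    groups = chain_segments(rest, max_gap=4.0, max_turn=math.radians(90))
    return lines, groups

# Usage: one long ruling line plus a short zigzag stroke.
ruling = [((0, 0), (30, 0)), ((30, 0), (60, 0.5))]
stroke = [((0, 10), (3, 13)), ((3, 13), (6, 11)), ((6, 11), (9, 14))]
lines, groups = two_pass_grouping(ruling + stroke)
```

In this toy input the strict pass joins only the two nearly collinear ruling segments, while the lenient pass strings the zigzag stroke into a single group that a downstream classifier could then label.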
