Ritesh Sarkhel

PhD Student

About Me

Hello! I am a PhD student in the Department of Computer Science and Engineering at The Ohio State University. I am grateful to be advised by Prof. Arnab Nandi. I develop tools and algorithms for information extraction from, and controllable enrichment of, multimodal data. An exhaustive list of my publications can be found here.

One of my current research foci is extracting structured information from Visually Rich Documents (VRD). Here is an annotated bibliography of recent papers on VRD that I regularly refer to.


  • 12/2021: I spent Fall ’21 as a Research Intern at Google.
  • 08/2021: I spent Summer ’21 as an Applied Scientist Intern at Amazon Science.
  • 04/2021: I am a Ph.D. candidate now!!
  • 04/2021: I was invited to present at The 35th Hayes Research Forum, 2021.
  • 12/2020: New findings from our ongoing effort on IE from multimodal documents to appear in VLDB 2021.

Selected Publications

C4. Improving Information Extraction from Visually Rich Documents using Visual Span Representations
TLDR: A visually rich document is a document in which visual features play an important role in its semantics. We investigate whether incorporating domain-specific knowledge to encode the context of a visual span helps the downstream performance of an IE task. Our results show that a context representation learned using a multimodal bi-LSTM network improves end-to-end performance on four separate IE tasks.

Conference: 47th International Conference on Very Large Data Bases (VLDB), 2021
Authors: Ritesh Sarkhel, Arnab Nandi
Full-text here, Slides here

C3. Interpretable Multi-Headed Attention for Abstractive Summarization at Controllable Lengths
TLDR: We propose a supervised method for generating abstractive summaries within a user-specified length. We develop a length-aware encoder-decoder network that constructs a representative summary by leveraging an interpretable attention mechanism. We obtain strong results on two low-resource domains.

Conference: 28th International Conference on Computational Linguistics (COLING), 2020
Authors: Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi, Srinivasan Parthasarathy

C2. Deterministic Routing between Layout Abstractions for Multi-Scale Classification of Visually Rich Documents
TLDR: We propose a fast, multi-scale classifier for visually rich documents. For fast inference, we define an attention-like operator that extracts visual features from a hierarchical abstraction defined for each document. We obtain state-of-the-art results on multiple benchmark datasets.

Conference: 28th International Joint Conference on Artificial Intelligence (IJCAI), 2019
Authors: Ritesh Sarkhel, Arnab Nandi
Full-text here, Slides here

C1. Visual Segmentation for Information Extraction from Heterogeneous Visually Rich Documents
TLDR: We hypothesize that every visually rich document is composed of a set of isolated, semantically coherent areas called logical blocks. We propose a divide-and-conquer approach for information extraction leveraging this bag-of-logical-blocks representation. Our end-to-end workflow does not use any type-specific or format-specific features, making it robust to heterogeneous documents.

Conference: International Conference on Management of Data (SIGMOD), 2019
Authors: Ritesh Sarkhel, Arnab Nandi
Full-text here, Slides here

Background and Experience

Before graduate school, I was a part of the CMATER Research Laboratory at Jadavpur University, India, where I developed a cost-effective optical character recognition system for Indian languages. Read about some of our findings here and here. Many moons ago, I was part of a research and development group at Samsung Research Institute with a focus on increasing the accessibility of mobile devices. Read about some of the products I have worked on here.

I have reviewed for journals including IEEE Transactions on Visualization and Computer Graphics, IEEE Transactions on Multimedia, Soft Computing (Springer), and Neurocomputing (Elsevier). I have also volunteered as an external reviewer for conferences including SIGMOD and The Web Conference.