How to Tag PDF Documents Using HTML Coordinates?

Are you a legal professional trying to find relevant cases in enormous case directories? With so much data stored as case files, finding the right case within a short time frame becomes critical, and that requires document tagging. But what is document tagging, exactly? Document tagging is the process of assigning tags, or references, to passages within documents so that they can be searched. Users can then search or filter information by these tags. In short, tagging groups or links similar passages of text in PDF or HTML documents for a faster search response.

The PDF tagging feature is available in Adobe Acrobat Pro DC. However, it only allows you to tag a whole page or particular destination points. To work around this limitation, we can tag data in HTML, where any text, word, or character can be tagged. This makes tagging in HTML more efficient and more useful for text search.

The experts at DEV IT help you find relevant cases by making a few changes in the backend. Do you wish to know how?

We have carefully curated this document to help you understand HTML tagging and how you can tag documents using HTML coordinates.

There are two ways to save HTML tags: in the HTML file itself, or in a database together with the x, y coordinates of the text. Both are described in detail below.

1. Save Tags in HTML File:

The simplest way to save tags is to wrap the data in an HTML tag that highlights it, assign that tag an ID or metadata that defines its characteristics, and save the updated file.

  • Advantages:
    • Easy to perform and less time-consuming.
  • Disadvantages:
    • The HTML grows more complex over time, since the tags are saved within the original HTML document itself.
    • Modifying the original text in the HTML file becomes laborious, because much of the file ends up consisting of tag markup rather than text.
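
As a rough illustration of this first approach, the sketch below wraps a phrase in a span that carries a tag ID. The function name tagPhraseInline, the class name tag, and the data-tag-id attribute are assumptions made for this example and are not part of the original workflow. It also assumes the input is plain text with no markup of its own.

```javascript
// Hypothetical helper: wraps the first occurrence of `phrase` in `text`
// in a <span> that carries a tag ID.
// Assumes `text` is plain text and `tagId` is a safe attribute value.
function tagPhraseInline(text, phrase, tagId) {
  const start = text.indexOf(phrase);
  if (phrase === "" || start === -1) {
    return text; // nothing to tag
  }
  const end = start + phrase.length;
  return (
    text.slice(0, start) +
    '<span class="tag" data-tag-id="' + tagId + '">' +
    text.slice(start, end) +
    "</span>" +
    text.slice(end)
  );
}

const sentence = "The court held that the contract was void.";
const tagged = tagPhraseInline(sentence, "contract", "case-42");
console.log(tagged);
```

Every tag added this way puts more markup into the stored text, which is the disadvantage listed above.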

2. Save Tags in Database with Coordinates:

A more modern way to save tags is to use a database together with the Selection API, a standard browser API documented on Mozilla's MDN Web Docs. A Selection represents the range of text selected by the user. To get the current selection, call the JavaScript method window.getSelection().

Properties

Selection.anchorNode
Returns the node in which the selection begins.

Selection.anchorOffset
Returns a number representing the offset of the selection’s anchor within the anchor node. If the anchor node is a text node, this is the number of characters in it that precede the anchor. If the anchor node is an element, this is the number of its child nodes that precede the anchor.

Selection.focusNode
Returns the node in which the selection ends.

Selection.focusOffset
Returns a number representing the offset of the selection’s focus within the focus node. If the focus node is a text node, this is the number of characters in it that precede the focus. If the focus node is an element, this is the number of its child nodes that precede the focus.

Selection.isCollapsed
Returns a Boolean indicating whether the selection’s start and end points are the same.

Methods

Selection.addRange()
Adds a Range object to the selection.

Selection.deleteFromDocument()
Deletes the selected content from the document.

Selection.getRangeAt()
Returns the Range object at the given index of the selection.

Selection.removeAllRanges()
Removes all ranges from the selection.

Selection.toString()
Returns the selected text as a string.
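
To make these properties and methods more concrete, here is a minimal sketch that copies them into a plain object whenever the user finishes a selection. The helper name describeSelection and the choice of the mouseup event are assumptions for this example. The listener is only registered when a document object exists, so the snippet does nothing outside a browser.

```javascript
// Hypothetical helper: summarizes a Selection (the object returned by
// window.getSelection()) as a plain object that is easy to log or inspect.
function describeSelection(sel) {
  return {
    anchorOffset: sel.anchorOffset,
    focusOffset: sel.focusOffset,
    isCollapsed: sel.isCollapsed,
    // True when the selection begins and ends in the same node.
    sameNode: sel.anchorNode === sel.focusNode,
    text: sel.toString(),
  };
}

// Browser-only wiring: log a summary each time the user finishes selecting.
if (typeof document !== "undefined") {
  document.addEventListener("mouseup", function () {
    const sel = window.getSelection();
    if (sel && !sel.isCollapsed) {
      console.log(describeSelection(sel));
    }
  });
}
```

When the user drags from right to left, the anchor comes after the focus in the document. A Range obtained with getRangeAt(0) always reports its start and end in document order, so it is the more convenient object to store.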

Each element in the original document must carry a unique data attribute, such as data-key with an integer value. This makes it possible to find the tagged text again later.

<div data-key="225" class="paratext">
    <div data-key="226" class="Parapadding"></div>
</div>
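
The article does not say how the data-key values are produced. One possible approach, sketched below under that assumption, is to walk the element tree once and number every element in document order. The function name assignKeys is hypothetical. It relies only on the children and dataset properties that DOM elements expose.

```javascript
// Hypothetical helper: gives `root` and every element below it a unique,
// increasing data-key, in document order. Returns the next unused key.
function assignKeys(root, firstKey = 1) {
  let next = firstKey;
  function visit(el) {
    el.dataset.key = String(next); // dataset values are always strings
    next += 1;
    for (const child of Array.from(el.children)) {
      visit(child);
    }
  }
  visit(root);
  return next;
}

// In a browser this could be called once while preparing a document:
//   assignKeys(document.querySelector(".paratext"));
```

The keys need to stay stable after tags have been saved, so a step like this would run once when the document is imported, and the resulting HTML would be stored with the keys in place.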

Next, get the Selection by using the getSelection() method, and get the range of that selection by using selection.getRangeAt(0).

From this range we can read startContainer, endContainer, startOffset, and endOffset.

startContainer is the node in which the selection begins, and endContainer is the node in which it ends. For each of them, we record the data-key of its parent element and its index among that parent's child nodes.

A general function that collects all of these values can be written as follows:

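function rangeToObj(range) {
    // The containers are usually text nodes; their parent elements carry data-key.
    const startParent = range.startContainer.parentNode;
    const endParent = range.endContainer.parentNode;
    return {
      startKey: startParent.dataset.key,
      // Index of the start node among its parent's child nodes.
      startTextIndex: Array.prototype.indexOf.call(startParent.childNodes, range.startContainer),
      endKey: endParent.dataset.key,
      // Index of the end node among its parent's child nodes.
      endTextIndex: Array.prototype.indexOf.call(endParent.childNodes, range.endContainer),
      // Offsets within the start and end nodes (character counts for text nodes).
      startOffset: range.startOffset,
      endOffset: range.endOffset
    };
}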
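
The Array.prototype.indexOf.call pattern in this function may look unusual. childNodes is a NodeList, which is array-like but has no indexOf method of its own, so the array method is borrowed. The sketch below isolates that step in a hypothetical helper, indexInParent, and exercises it with a hand-built array-like object standing in for a real NodeList.

```javascript
// Hypothetical helper: position of `node` among its parent's child nodes.
// Works on any array-like `childNodes`, including a browser NodeList.
function indexInParent(node) {
  return Array.prototype.indexOf.call(node.parentNode.childNodes, node);
}

// Stand-in for a paragraph such as: Hello <b>big</b> world
// (three child nodes: text, element, text). These are plain objects, not DOM nodes.
const fakeParent = { childNodes: { length: 3 } };
const fakeNodes = ["Hello ", "<b>", " world"].map(function (label) {
  return { label: label, parentNode: fakeParent };
});
fakeNodes.forEach(function (n, i) {
  fakeParent.childNodes[i] = n;
});

console.log(indexInParent(fakeNodes[2])); // index of the trailing text node
```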

Then save this object in the database with some additional information about the tag.
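
The article leaves the shape of the stored record open. As one possible sketch, the hypothetical function buildTagRecord below combines the range object with a document ID and a label before it is sent to the database. The documentId and label fields are assumptions made for illustration. Because dataset values are strings, the keys are converted to numbers, and a selection whose parent element has no data-key is rejected.

```javascript
// Hypothetical helper: turns the output of rangeToObj() into a flat record
// that can be serialized and stored. `documentId` and `label` are
// illustrative fields, not something the Selection API provides.
function buildTagRecord(rangeObj, documentId, label) {
  const record = {
    documentId: documentId,
    label: label,
    startKey: Number(rangeObj.startKey),
    startTextIndex: rangeObj.startTextIndex,
    startOffset: rangeObj.startOffset,
    endKey: Number(rangeObj.endKey),
    endTextIndex: rangeObj.endTextIndex,
    endOffset: rangeObj.endOffset,
  };
  // A missing data-key becomes NaN, which would make the tag impossible to restore.
  if (!Number.isInteger(record.startKey) || !Number.isInteger(record.endKey)) {
    throw new Error("selection is not inside an element with an integer data-key");
  }
  return record;
}

const savedRange = {
  startKey: "225", startTextIndex: 0, startOffset: 4,
  endKey: "225", endTextIndex: 0, endOffset: 12,
};
console.log(JSON.stringify(buildTagRecord(savedRange, "case-001", "precedent")));
```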

When you wish to reselect a saved tag, convert the saved object back into a Range.

function objToRange(rangeObj) {
    // Create an empty Range, then point it at the saved start and end positions.
    const range = document.createRange();
    const startParent = document.querySelector('[data-key="' + rangeObj.startKey + '"]');
    const endParent = document.querySelector('[data-key="' + rangeObj.endKey + '"]');
    range.setStart(startParent.childNodes[rangeObj.startTextIndex], rangeObj.startOffset);
    range.setEnd(endParent.childNodes[rangeObj.endTextIndex], rangeObj.endOffset);
    return range;
}

Then get the Selection object and set it to the restored range.

    // `each` is one saved tag object loaded from the database.
    var sel = window.getSelection();
    sel.removeAllRanges();
    sel.addRange(objToRange(each));

Using that range, we can create a highlight node, for example:

var newNode = document.createElement("span");
newNode.setAttribute("class", "search__highlight");

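After creating the highlight node, wrap the selected range in it by using the surroundContents() method. Note that surroundContents() throws an error if the range only partially covers a non-text node, so this step works when the selection lies within a single text node or fully contains any elements it touches.

sel.getRangeAt(0).surroundContents(newNode);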

Once the highlight is in place, remove the range from the Selection object by using removeAllRanges().

document.getSelection().removeAllRanges();
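
One practical detail when restoring several tags at once: surroundContents() splits the text node it wraps, which changes the child nodes of the parent element. A later tag whose saved index or offset points into the same parent may then no longer match. A possible workaround, sketched below, is to apply tags from the end of each element towards its beginning. The function name orderForHighlighting is hypothetical. The sketch assumes that tags do not overlap and that every tag was recorded against the document without highlights.

```javascript
// Hypothetical helper: returns a copy of `tags` sorted so that, within each
// parent element, later positions come first. Wrapping a later position
// does not disturb the indexes and offsets of earlier ones.
function orderForHighlighting(tags) {
  return tags.slice().sort(function (a, b) {
    return (
      Number(b.startKey) - Number(a.startKey) ||
      b.startTextIndex - a.startTextIndex ||
      b.startOffset - a.startOffset
    );
  });
}

const savedTags = [
  { id: "A", startKey: "225", startTextIndex: 0, startOffset: 2 },
  { id: "B", startKey: "225", startTextIndex: 0, startOffset: 20 },
  { id: "C", startKey: "226", startTextIndex: 0, startOffset: 0 },
];
const applyOrder = orderForHighlighting(savedTags).map(function (t) {
  return t.id;
});
console.log(applyOrder.join(", "));
```

Each tag in the sorted list can then go through the same steps as above: convert it with objToRange(), wrap it with surroundContents(), and clear the selection.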

The experts at DEV IT help you find the required case in no time. They tag your PDF documents using HTML coordinates so that the information you need can be found quickly. To know more about this solution, contact DEV IT.