How to extract hyperlinks from a PDF using iTextSharp?

PDFs are a great way to maintain your document’s formatting regardless of screen size or device when sharing it with others. That said, this also makes editing PDFs a hassle, especially if you’re trying to do it in code. 

Thankfully, libraries like iTextSharp allow for PDF manipulation using a series of simple methods. In this article, we’re talking about how to extract hyperlinks from PDFs using iTextSharp.

Extracting hyperlinks from PDFs

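iTextSharp has a lot of different functions for carrying out individual tasks. For extracting hyperlinks, we’ll use the getAnnotations() method, which returns all the annotations on a given page in a single List object. Hyperlinks are stored as annotations of the Link type, so we go through the document page by page and filter that list. Note that the snippets here use the Java-style method names of iText 7; in the .NET version, the same methods are written in PascalCase, for example GetAnnotations().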

The basic syntax looks something like this.

List<PdfAnnotation> annots = pdfPage.getAnnotations();

Of course, you’ll need a page from a loaded PDF document in the pdfPage object mentioned above. Additionally, not all links are actual URLs: some jump to another page in the same file or to a different document, so we also need to perform a few sanity checks on each link.
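One such sanity check can run on the extracted text itself, before you use it anywhere. The sketch below relies only on the Java standard library and accepts a string only if it parses as an http or https address with a host. The class name LinkSanityCheck and the method isWebUrl are made up for this illustration and are not part of iText, and limiting the check to http and https is an assumption you may want to relax.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper, not part of iText: decides whether a link target looks like a web URL. */
final class LinkSanityCheck {

    private LinkSanityCheck() {
    }

    /** Returns true only for syntactically valid http or https URIs that name a host. */
    static boolean isWebUrl(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            // Relative targets have no scheme; mailto: style targets have no host.
            if (scheme == null || uri.getHost() == null) {
                return false;
            }
            String lower = scheme.toLowerCase(Locale.ROOT);
            return lower.equals("http") || lower.equals("https");
        } catch (URISyntaxException e) {
            // Malformed targets, for example ones containing spaces, are rejected.
            return false;
        }
    }
}
```

Calling LinkSanityCheck.isWebUrl() on every string you pull out of a link lets you drop mail addresses, relative file names and broken entries in one place.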

Here’s what a basic script would look like.

//Needs java.util.List plus classes from com.itextpdf.kernel.pdf and com.itextpdf.kernel.pdf.annot
//Get the current PDF page (page numbers start at 1)
PdfPage pdfPage = pdfDoc.getPage(page);
//Get all of the annotations on the current page; hyperlinks are Link annotations
List<PdfAnnotation> annots = pdfPage.getAnnotations();
//Check if the page has any annotations at all
if ((annots == null) || annots.isEmpty()) {
    System.out.println("No hyperlinks on this page");
}
//Loop through each annotation
else {
    for (PdfAnnotation a : annots) {
        //Skip anything that is not a link annotation
        if (!PdfName.Link.equals(a.getSubtype()))
            continue;
        //Get the ACTION for the current link; not every link has one
        PdfDictionary annotAction = ((PdfLinkAnnotation) a).getAction();
        if (annotAction == null)
            continue;
        //The S entry holds the type of the action
        PdfName actionType = annotAction.getAsName(PdfName.S);
        //URI actions point to a web address
        if (PdfName.URI.equals(actionType)) {
            //Saving external links
            PdfString destination = annotAction.getAsString(PdfName.URI);
            if (destination != null) {
                String url1 = destination.getValue();
            }
        }
        //GoToR actions point to another file and have no URI entry
        else if (PdfName.GoToR.equals(actionType)) {
            //do smth with links to other documents
        }
        else if (PdfName.GoTo.equals(actionType) ||
            PdfName.GoToE.equals(actionType)) {
            //do smth with internal links
        }
    }
}
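The script stores each address in a local variable (url1) and then moves on, so nothing is kept once the loop ends. As a rough sketch of one way to hold on to the results, the class below groups URLs by page number and ignores repeats. PageLinkIndex and its methods are hypothetical names chosen for this example, and it uses only the Java standard library; you would call add() at the point where url1 is assigned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical container, not part of iText: remembers which URLs were found on which page. */
final class PageLinkIndex {

    // LinkedHashMap and LinkedHashSet keep pages and URLs in the order they were added.
    private final Map<Integer, Set<String>> urlsByPage = new LinkedHashMap<>();

    /** Records a URL for a 1-based page number; blank values and repeats are ignored. */
    void add(int pageNumber, String url) {
        if (url == null || url.trim().isEmpty()) {
            return;
        }
        urlsByPage.computeIfAbsent(pageNumber, k -> new LinkedHashSet<>()).add(url.trim());
    }

    /** Returns a copy of the URLs recorded for a page, or an empty list if there are none. */
    List<String> urlsOn(int pageNumber) {
        Set<String> urls = urlsByPage.get(pageNumber);
        return urls == null ? Collections.emptyList() : new ArrayList<>(urls);
    }

    /** Counts the distinct URLs per page, summed over all pages. */
    int totalLinks() {
        int total = 0;
        for (Set<String> urls : urlsByPage.values()) {
            total += urls.size();
        }
        return total;
    }
}
```

Keeping the page number next to each URL makes it easier to report where a broken or unexpected link sits in the document.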

Yadullah Abidi

Yadullah is a Computer Science graduate who writes/edits/shoots/codes all things cybersecurity, gaming, and tech hardware. When he's not, he streams himself racing virtual cars. He's been writing and reporting on tech and cybersecurity with websites like Candid.Technology and MakeUseOf since 2018. You can contact him here: yadullahabidi@pm.me.
