PDFs are a great way to maintain your document’s formatting regardless of screen size or device when sharing it with others. That said, this also makes editing PDFs a hassle, especially if you’re trying to do it in code.
Thankfully, libraries like iTextSharp allow for PDF manipulation using a series of simple methods. In this article, we’ll look at how to extract hyperlinks from PDFs using iTextSharp.
Extracting hyperlinks from PDFs
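iTextSharp has a lot of different methods for carrying out individual tasks. For extracting hyperlinks, we’ll use the getAnnotations() method, which returns every annotation on a page, hyperlinks included, in a single List object. The snippets below are written in Java against the iText 7 API; in the .NET version, the same methods are spelled in PascalCase, for example GetAnnotations().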
The basic syntax looks something like this.
List<PdfAnnotation> annots = pdfPage.getAnnotations();
Of course, the pdfPage object mentioned above has to come from a PDF document you’ve already opened. Additionally, not every annotation is a link, and not every link points to an actual URL, so we also need to perform a few sanity checks on our document.
Here’s what a basic script would look like.
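import java.util.List;
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfLinkAnnotation;

//Get the current PDF page (pdfDoc is an open PdfDocument, page numbers start at 1)
PdfPage pdfPage = pdfDoc.getPage(page);
//Get all of the annotations, hyperlinks included, from the current page
List<PdfAnnotation> annots = pdfPage.getAnnotations();
//Check if there were any annotations
if ((annots == null) || (annots.size() == 0)) {
    System.out.println("No hyperlinks on this page");
}
//Loop through each annotation
else {
    for (PdfAnnotation a : annots) {
        //Skip annotations that are not links
        if (!(a instanceof PdfLinkAnnotation))
            continue;
        //Get the ACTION for the current link (getAction() is defined on PdfLinkAnnotation)
        PdfDictionary annotAction = ((PdfLinkAnnotation) a).getAction();
        //Make sure this hyperlink has an ACTION
        if (annotAction == null)
            continue;
        //The S entry names the type of action
        PdfName actionType = annotAction.getAsName(PdfName.S);
        //Test if the found hyperlink is actually a URL
        if (PdfName.URI.equals(actionType)) {
            //Saving external links
            PdfString destination = annotAction.getAsString(PdfName.URI);
            if (destination != null) {
                String url1 = destination.toUnicodeString();
            }
        }
        else if (PdfName.GoToR.equals(actionType)) {
            //Link to another PDF file; the target sits under the F key, not URI
        }
        else if (PdfName.GoTo.equals(actionType) ||
                PdfName.GoToE.equals(actionType)) {
            //do something with internal links
        }
    }
}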
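Even after a string has been read from a URI action, it can still be malformed or use a scheme you don’t care about, such as mailto:. Below is a minimal sketch of one more sanity check, using only the JDK’s java.net.URI class. The LinkFilter class and its isWebUrl and keepWebUrls methods are hypothetical helpers made up for this example and are not part of iText; it also assumes you only want http and https links.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of iText: narrows raw link targets down to web URLs.
final class LinkFilter {

    // Returns true only for syntactically valid http/https URIs that have a host.
    static boolean isWebUrl(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(raw.trim());
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return false;
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            return scheme.equals("http") || scheme.equals("https");
        } catch (URISyntaxException e) {
            // Strings such as "not a url" cannot be parsed as a URI at all
            return false;
        }
    }

    // Keeps the valid web URLs in their original order and drops duplicates.
    static List<String> keepWebUrls(List<String> rawLinks) {
        List<String> result = new ArrayList<>();
        for (String raw : rawLinks) {
            if (isWebUrl(raw) && !result.contains(raw.trim())) {
                result.add(raw.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values standing in for strings collected from a PDF
        List<String> raw = Arrays.asList(
                "https://example.com/report.pdf",
                "mailto:someone@example.com",
                "not a url",
                "https://example.com/report.pdf");
        System.out.println(keepWebUrls(raw));
    }
}
```

In the sample run, only the first https link is kept: the mailto: entry has no host, the third string isn’t a valid URI, and the last one is a duplicate. You could call isWebUrl() on url1 in the script above before storing it.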