Extracting images from a PDF with PDFBox

This one also uses PDFBox, this time to extract the images from a PDF file.
Every image is extracted, including when a single page contains several images.

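After extracting, I found that some images with transparency come out as two images: one with transparency and one with a black background. The black-background copy is not needed, so it has to be deleted.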
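A minimal sketch of how "has transparency" can be detected on an extracted image, assuming it is a `BufferedImage` whose packed ARGB values come from `getRGB`. The class and method names (`TransparencyCheck`, `hasFullyTransparentPixel`) are hypothetical, chosen for illustration. It treats an image as transparent when at least one pixel has alpha 0, uses the unsigned shift `>>> 24` so the alpha byte comes out as 0 to 255, and skips the scan when the image has no alpha channel at all.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: detects whether an image contains a fully transparent pixel. */
final class TransparencyCheck {

    private TransparencyCheck() {}

    /** Returns true if at least one pixel has alpha == 0. */
    static boolean hasFullyTransparentPixel(BufferedImage img) {
        // An image without an alpha channel reports every pixel as opaque, so skip the scan.
        if (!img.getColorModel().hasAlpha()) {
            return false;
        }
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                // getRGB returns packed ARGB; >>> 24 yields the alpha byte as 0..255.
                int alpha = img.getRGB(x, y) >>> 24;
                if (alpha == 0) {
                    return true; // a single transparent pixel is enough
                }
            }
        }
        return false;
    }
}
```

Semi-transparent pixels (alpha between 1 and 254) do not count under this rule, which matches the `== 0` test in the program below.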

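So the extracted images are split into two groups: those with transparency (at least one fully transparent pixel) and those without.
Each transparent image is then drawn onto a black background and compared with the images in the non-transparent group. A match is the redundant black-background copy, so it is deleted; what remains are the images we actually want.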
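The grouping-and-comparison step can be tried on its own with plain `java.awt`, without a PDF. This is a sketch under stated assumptions: the class and method names (`BlackBackgroundDedup`, `flattenOnBlack`, `sameRgb`, `removeBlackCopies`) are hypothetical, it compares every pixel instead of sampling, and it assumes the black-background copy equals the transparent image drawn on black exactly. If semi-transparent edge pixels were blended differently in the PDF, a per-channel tolerance would be needed instead of exact equality.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper illustrating the "draw on black, then compare" de-duplication. */
final class BlackBackgroundDedup {

    private BlackBackgroundDedup() {}

    /** Draws a (possibly transparent) image onto an opaque black canvas of the same size. */
    static BufferedImage flattenOnBlack(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Transparent pixels end up showing the black background color.
            g.drawImage(src, 0, 0, Color.BLACK, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    /** Compares two images pixel by pixel on their RGB channels, ignoring alpha. */
    static boolean sameRgb(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if ((a.getRGB(x, y) & 0xFFFFFF) != (b.getRGB(x, y) & 0xFFFFFF)) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Returns the opaque images minus those matching a transparent image drawn on black. */
    static List<BufferedImage> removeBlackCopies(List<BufferedImage> transparentImages,
                                                 List<BufferedImage> opaqueImages) {
        List<BufferedImage> kept = new ArrayList<>(opaqueImages);
        for (BufferedImage t : transparentImages) {
            BufferedImage onBlack = flattenOnBlack(t);
            for (Iterator<BufferedImage> it = kept.iterator(); it.hasNext(); ) {
                if (sameRgb(onBlack, it.next())) {
                    it.remove(); // the redundant black-background copy
                    break;       // at most one copy removed per transparent image
                }
            }
        }
        return kept;
    }
}
```

Returning a new list leaves the caller's opaque group untouched; the program below instead removes matches in place through the iterator, which has the same effect.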

The following three libraries are used:
pdfbox-1.8.10.jar
commons-logging-1.2.jar
fontbox-1.8.10.jar

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

public class pdfimg 
{
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java pdfimg <input.pdf> <output-prefix>");
            return;
        }
        String filename = args[0];
        String savefile = args[1];

        PDDocument doc = PDDocument.load(filename);
        try {
            List<?> pages = doc.getDocumentCatalog().getAllPages();
            int i = 1; // running index used in the output file names

            for (Object p : pages) {
                PDPage page = (PDPage) p;
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue;
                }
                Map<String, PDXObject> xobjects = resources.getXObjects();
                if (xobjects == null) {
                    continue;
                }
                List<BufferedImage> transparentImages = new ArrayList<>();
                List<BufferedImage> opaqueImages = new ArrayList<>();

                for (PDXObject xobject : xobjects.values()) {
                    // XObjects can also be forms; only image XObjects are extracted
                    if (!(xobject instanceof PDXObjectImage)) {
                        continue;
                    }
                    BufferedImage bi = ((PDXObjectImage) xobject).getRGBImage();
                    if (bi == null) {
                        continue; // image that PDFBox could not decode
                    }
                    if (hasTransparentPixel(bi)) {
                        // Transparent images are always kept, so save them right away
                        i = save(bi, savefile, i);
                        transparentImages.add(bi);
                    } else {
                        opaqueImages.add(bi);
                    }
                }

                // Draw each transparent image on black; the opaque image it matches
                // is the redundant black-background copy and is dropped
                for (BufferedImage img : transparentImages) {
                    BufferedImage onBlack = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_INT_RGB);
                    Graphics2D g = onBlack.createGraphics();
                    g.drawImage(img, 0, 0, Color.BLACK, null);
                    g.dispose();

                    for (Iterator<BufferedImage> it = opaqueImages.iterator(); it.hasNext(); ) {
                        if (compareImage(onBlack, it.next())) {
                            it.remove();
                            break;
                        }
                    }
                }

                // The remaining opaque images are the ones we want
                for (BufferedImage bi : opaqueImages) {
                    i = save(bi, savefile, i);
                }
            }
        } finally {
            doc.close();
        }
    }

    /** True if at least one pixel is fully transparent (alpha == 0). */
    private static boolean hasTransparentPixel(BufferedImage bi) {
        for (int y = 0; y < bi.getHeight(); y++) {
            for (int x = 0; x < bi.getWidth(); x++) {
                if ((bi.getRGB(x, y) >>> 24) == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Writes the image as <prefix>_<index>.png and returns the next index. */
    private static int save(BufferedImage bi, String prefix, int index) throws IOException {
        String path = prefix + "_" + index + ".png";
        ImageIO.write(bi, "png", new File(path));
        System.out.println("path::" + path);
        return index + 1;
    }

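    /** Compares two images of the same size on a sampled grid of pixels (RGB only). */
    public static boolean compareImage(BufferedImage img1, BufferedImage img2) {
        int w = img1.getWidth();
        int h = img1.getHeight();
        if (w != img2.getWidth() || h != img2.getHeight()) {
            return false;
        }
        // Sample roughly a 200 x 200 grid for speed; the step is at least 1 so
        // images smaller than 200 px do not loop forever
        int stepX = Math.max(1, w / 200);
        int stepY = Math.max(1, h / 200);
        for (int y = 0; y < h; y += stepY) {
            for (int x = 0; x < w; x += stepX) {
                // Compare the RGB channels: both images are opaque here, so their
                // alpha bytes are always equal and say nothing about the content
                if ((img1.getRGB(x, y) & 0xFFFFFF) != (img2.getRGB(x, y) & 0xFFFFFF)) {
                    return false;
                }
            }
        }
        return true;
    }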
}