Number of pages in a MediaWiki site

I wanted to know how many A4 pages I would have to print if I wanted to print all the articles in a MediaWiki installation.

We have a MediaWiki installation with a total of 1,418 content pages.

I had already exported each article to its own PDF file.

I did this with another Bash script I wrote (I will cover it in a later post).

First I had to find the file names of all the PDF files located in my export directory:
find . -type f -name '*.pdf' | sed 's|^\./||' > /tmp/pdffiles.txt

Then I could use this PHP code to count the number of A4 pages a given PDF file would take to print. The page count is stored inside the PDF file itself, in a /Count entry; the script keeps the largest such value it finds.

[php]
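<?php
// Usage: php test.php "filename.pdf"
if ($argc < 2) {
    fwrite(STDERR, "Usage: php test.php filename.pdf\n");
    exit(1);
}
$filename = $argv[1];

$fp = @fopen($filename, 'rb');
if (!$fp) {
    // Report on stderr so the message never ends up in the page counts.
    fwrite(STDERR, 'failed opening file ' . $filename . "\n");
    exit(1);
}

// Every "/Count N" entry is a page total for part of the page tree;
// the largest N is taken as the page count of the whole document.
$max = 0;
while (($line = fgets($fp)) !== false) {
    if (preg_match_all('/\/Count ([0-9]+)/', $line, $matches)) {
        foreach ($matches[1] as $n) {
            if ((int) $n > $max) {
                $max = (int) $n;
            }
        }
    }
}
fclose($fp);

echo 'There ' . ($max == 1 ? 'is ' : 'are ') . $max . ' page' . ($max == 1 ? '' : 's') . ' in ' . $filename . ".\n";
?>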
[/php]

The PHP script takes the file name as its argument and is run on the command line with:
php test.php "filename.pdf"
The output looks something like this:
There are 2 pages in KlientRegnemaskiner.pdf.
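
The same /Count heuristic can be sketched in plain shell. This is only an illustration, not the script I used: the function name count_pages is made up, and it assumes the PDF stores its page tree uncompressed (files that pack it into compressed object streams would report 0).

```shell
# Print the largest N found in any "/Count N" entry of a PDF file.
# Heuristic: assumes the /Count entries are readable as plain text.
count_pages() {
  # -a treats the binary PDF as text, -o prints only the matching part
  grep -a -o '/Count [0-9][0-9]*' "$1" |
    awk '{ if ($2 + 0 > max) max = $2 + 0 } END { print max + 0 }'
}
```

Calling count_pages "filename.pdf" prints just the number, so no awk step is needed afterwards to pick it out of a sentence.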

I gathered all 1,418 of these commands, one per PDF file, in a Bash script named runme.sh:

php test.php "Uibrepo.pdf"
php test.php "Maskinen sender mailer til ukjente adresser.pdf"
php test.php "IP-telefon feilfinning.pdf"
php test.php "Puppetklient.pdf"
php test.php "WordPress.pdf"
php test.php "Basware - Overføre OM-lisens i Basware IP.pdf"
...

and ran it with:
./runme.sh | awk '{print $3}' > numberofpages.txt
The file numberofpages.txt then contains a list of numbers, one page count per PDF file:

1
2
1
1
3
2
7
9
2
...
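
Instead of writing out one line per file in runme.sh, the list in /tmp/pdffiles.txt could be fed to a loop. The following is a rough sketch under that assumption; count_all is a hypothetical helper name, and it expects the counting command to print a sentence whose third word is the page count, as test.php does.

```shell
# Run a page-counting command once for every file name in a list file
# and print one page count per line.
# $1 = list file (one PDF name per line); remaining args = the command.
count_all() {
  local list=$1
  shift
  while IFS= read -r pdf; do
    # stdin is redirected so the command cannot swallow the rest of the list
    "$@" "$pdf" < /dev/null | awk '{print $3}'
  done < "$list"
}
```

It would be called as: count_all /tmp/pdffiles.txt php test.php > numberofpages.txt. Reading the names line by line keeps file names with spaces intact.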

To sum everything up, I then ran this command, which joins the lines with + signs and lets bc evaluate the resulting expression:
paste -sd+ numberofpages.txt | bc

The result in my case was 4,109 pages.
This means that for the 1,418 wiki articles, I would have to print 4,109 A4 pages. Assuming a normal book contains 300 pages, that would fill about 14 books (4109 / 300 ≈ 13.7).
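
The last two steps, the sum and the book estimate, can also be sketched with awk and shell arithmetic. The function names sum_pages and books_needed are invented for this example, and the 300 pages per book is the same rough assumption as above.

```shell
# Sum a file with one page count per line, skipping non-numeric lines.
sum_pages() {
  awk '/^[0-9]+$/ { total += $1 } END { print total + 0 }' "$1"
}

# Books needed for $1 pages at $2 pages per book, rounded up
# (ceiling division using integer arithmetic only).
books_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

books_needed 4109 300   # prints 14
```

Rounding up matters here: 4109 / 300 is about 13.7, so the last book is only partly filled but still needed.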

* http://www.hotscripts.com/forums/php/23533-how-now-get-number-pages-one-document-pdf.html
* http://stackoverflow.com/questions/3096259/bash-command-to-sum-a-column-of-numbers
* http://wbilljohnson.com/journal/math/pagethickness.htm
* https://answers.yahoo.com/question/index?qid=20070926113958AAqZRMs
