What is Providence?
Providence is the core of CollectiveAccess. It includes a data modeling framework, a database, a media handling framework capable of manipulating and converting digital images, video, audio and documents, and a web-based user interface application for cataloguing, searching and managing your collections. If you are starting out with CollectiveAccess, Providence is the first (and most important) component you need to install. All other CollectiveAccess components are add-ons to Providence and require a functional Providence installation.
Providence is a web-based application that runs on a server. Users access the server from their own computers over a network using standard web browser software. As with any web-based application, Providence is designed to be accessed via the internet, enabling collaborative cataloguing of collections by widely dispersed teams. However, you do not have to make your Providence installation accessible on the internet. It will function just as well on a local network with no internet connectivity, or even on a single machine with no network connectivity at all. Who gets to access your system is entirely up to you.
Before attempting an installation, verify that your server meets the basic requirements for running Providence:
|Operating System||Linux, Mac OS X 10.9+, or Windows (Server 2012+, Windows 7, 8 and 10 verified to work).|
|Server Memory||4 GB of RAM minimum. If you intend to have CA handle large image files, your server should ideally have RAM equal to three times the uncompressed size of the largest image. In general more memory is always better, and 8 GB of RAM is a good baseline if it is not cost-prohibitive.|
|Data Storage||To estimate storage requirements you need the expected number of media items to be catalogued and the average size of those items. Once these quantities are known, an estimate can be derived with simple arithmetic: <storage required in MB> = (<# of media items> * <average storage per media item in MB>) + (<# of media items> * 5 MB). The 5 MB is the estimated overhead of storing derivatives (small JPEG, TilePic pan-and-zoom version, etc.). If practical, double the calculated storage requirement when acquiring hardware. Storage for your metadata and database indices is usually negligible compared to the storage required for media, even if your database is quite large.|
|Processor||Multiprocessor/multicore architectures are desirable for the improved scalability they provide, as well as their ability to speed up processing of uploaded media. Media processing is often CPU-bound (whereas database operations are often I/O-bound) and lends itself to multiprocessing. It is advisable to obtain a machine with at least 2 cores and, if possible, 4 or more.|
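As a rough check of the arithmetic above, the sketch below applies the storage formula and the three-times memory rule using shell integer arithmetic. It assumes that an image's uncompressed size is width × height × channels × bytes per channel. The item counts and image dimensions are hypothetical example figures, not recommendations.

```shell
#!/bin/sh
# Rough capacity estimates for a Providence server (integer arithmetic).

# Storage in MB: (items * average MB per item) + (items * 5 MB of derivatives).
estimate_storage_mb() {
  items=$1 avg_mb=$2
  echo $(( items * avg_mb + items * 5 ))
}

# RAM in MiB for image processing: about three times the uncompressed size.
# Assumption: uncompressed bytes = width * height * channels * bytes_per_channel.
estimate_image_ram_mib() {
  width=$1 height=$2 channels=$3 bytes_per_channel=$4
  uncompressed=$(( width * height * channels * bytes_per_channel ))
  echo $(( uncompressed * 3 / 1048576 ))
}

# Hypothetical example: 10,000 media items averaging 20 MB each.
need=$(estimate_storage_mb 10000 20)
echo "Estimated storage: ${need} MB (double it if practical: $(( need * 2 )) MB)"

# Hypothetical example: a 10,000 x 8,000 pixel, 16-bit-per-channel RGB TIFF.
echo "RAM for largest image: ~$(estimate_image_ram_mib 10000 8000 3 2) MiB"
```

For these example figures the script reports 250000 MB of storage (500000 MB when doubled) and roughly 1373 MiB of RAM for the largest image.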
Core software requirements
Providence requires three core open-source software packages to be installed beforehand. Without these packages Providence cannot run:
|Webserver||Apache 2.4 or NGINX 1.14 or later is recommended.|
|MySQL Database||Versions 5.5, 5.6, 5.7 and 8.0 are supported.|
|PHP programming language||PHP 7.0 or better is required; PHP 7.2 or later is strongly recommended. Note that CollectiveAccess 1.7.7 is the last version to support PHP 5.6.|
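Once PHP is installed, you can compare its version with these requirements using a small shell check like the one below. This is a sketch with two assumptions: the php command-line interpreter is on your PATH, and sort supports -V. GNU coreutils provides -V; older BSD and macOS versions of sort may not.

```shell
#!/bin/sh
# Succeeds if dotted version $1 >= $2. Uses `sort -V` (GNU coreutils).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed PHP with the minimum (7.0) and recommended (7.2) versions.
if command -v php >/dev/null 2>&1; then
  php_version=$(php -r 'echo PHP_VERSION;')
  if ! version_ge "$php_version" 7.0; then
    echo "PHP $php_version is too old: 7.0 or better is required"
  elif ! version_ge "$php_version" 7.2; then
    echo "PHP $php_version works, but 7.2 or later is strongly recommended"
  else
    echo "PHP $php_version meets the recommendation"
  fi
else
  echo "php not found on PATH"
fi
```

This checks the command-line PHP only. The version used by Apache or NGINX can differ, so confirm it with phpinfo() as well.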
All of these should be available as pre-compiled packages for most Linux distributions and as installer packages for Windows. For Macs, Homebrew is a highly recommended way to get all of CA’s prerequisites up and running quickly.
If setting up Apache, MySQL or PHP is daunting, you may want to consider pre-configured Apache/MySQL/PHP environments for Windows and Macintosh, such as MAMP and XAMPP. These can greatly simplify setup of CollectiveAccess and its requirements, and they are useful tools for experimentation and prototyping. They are not recommended for hosting live systems, however.
Required and Suggested Software Packages By Distribution
CentOS/RHEL 7
Some packages used by CollectiveAccess are available only from 3rd party repositories. Packages recommended here are from the following repositories:
- Nux: http://li.nux.ro/download/nux/dextop/el7/x86_64/nux-dextop-release-0-5.el7.nux.noarch.rpm
- Remi: http://rpms.remirepo.net/enterprise/remi-release-7.rpm
- EPEL: https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Required:
- mariadb-server [Database server]
- httpd [Web server]
- redis-server [Cache server]
- php php-mcrypt php-cli php-gd php-curl php-mysqlnd php-zip php-fileinfo php-devel php-gmagick php-opcache php-process php-xml php-mbstring php-redis [Runtime environment] (Remi, EPEL)
Suggested:
- GraphicsMagick-devel [Image processing]
- ghostscript-devel [PDF processing]
- ffmpeg-devel [Audio and video processing] (Nux)
- libreoffice [Microsoft Office file processing] (EPEL)
- dcraw [RAW image format support]
- mediainfo [Media metadata extraction]
- exiftool [Media metadata extraction]
- xpdf [Media metadata extraction]
You need only one of the media metadata extraction tools, although having several installed will not cause problems.
Ubuntu
Some packages used by CollectiveAccess are available only from 3rd party repositories. Packages recommended here are from the following repositories:
- ondrej/php: ppa:ondrej/php
- PECL: https://pecl.php.net
- php7.x libapache2-mod-php7.x php7.x-common php7.x-mbstring php7.x-xmlrpc php7.x-gd php7.x-xml php7.x-intl php7.x-mysql php7.x-cli php7.x-mcrypt php7.x-zip php7.x-curl php7.x-posix php7.x-dev php-pear php7.x-
- pecl.php.net/gmagick-2.0.5RC1 [pecl install channel://pecl.php.net/gmagick-2.0.5RC1]
- graphicsmagick libgraphicsmagick-dev [Image processing]
- ffmpeg [Audio and video processing]
- ghostscript [PDF processing]
- libreoffice [Microsoft Office file processing]
- dcraw [RAW image format support]
- mediainfo [Media metadata extraction]
- xpdf [Media metadata extraction]
- exiftool [Media metadata extraction]
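After installing the packages for your distribution, you can check that PHP actually loads the corresponding extensions. The sketch below reads the module list printed by php -m. The module names it checks are our own selection from the package lists above, written as we expect them to appear in php -m output, so adjust the list for your setup.

```shell
#!/bin/sh
# Print required PHP modules that are absent from a `php -m` listing read
# on standard input (case-insensitive, whole-line match).
missing_php_modules() {
  loaded=$(cat)
  for mod in gd curl mysqli zip fileinfo xml mbstring posix; do
    printf '%s\n' "$loaded" | grep -qix "$mod" || echo "$mod"
  done
}

if command -v php >/dev/null 2>&1; then
  missing=$(php -m | missing_php_modules)
  if [ -n "$missing" ]; then
    echo "Missing PHP modules:"
    echo "$missing"
  else
    echo "All checked PHP modules are loaded"
  fi
fi
```

php -m reports the modules loaded by the command-line interpreter. The web server may load a different set, which phpinfo() will show.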
If you are running Apache on Linux, the root of your CollectiveAccess installation will usually be located in /var/www/html.
Software requirements for media processing
Depending upon the types of media you intend to handle with CA, you will also need to install various supporting software libraries and tools. None of these is absolutely required for CA to install and operate, but without them specific types of media may not be supported (as noted below).
|Software Package||Media Types||Notes|
|GraphicsMagick||Images||Version 1.3.16 or better is required. GraphicsMagick is the preferred option for processing image files on all platforms and performs better than any other option. Be sure to compile or obtain a version of GraphicsMagick with support for the formats you need. Support for some image formats is contingent upon other libraries being present on your server (e.g. libTiff must be present for TIFF support). Some less common formats, such as PSD, may require special configuration and/or compilation.|
|ImageMagick||Images||Version 6.5 or better is required. ImageMagick can handle more image formats than any other option but is significantly slower than GraphicsMagick in most situations. Be sure to compile or obtain a version of ImageMagick with support for the formats you need! Support for some image formats is contingent upon other libraries being present on your server (e.g. libTiff must be present for TIFF support).|
|libGD||Images||A simple library for processing JPEG, GIF and PNG images, GD is a fallback for image processing when neither GraphicsMagick nor ImageMagick is available. This library is typically bundled with PHP, so you should not need to install it separately. In some cases you may need to perform a manual install or use a package provided by your operating system vendor. In addition to supporting a limited set of image formats, GD is typically slower than ImageMagick or GraphicsMagick for many operations. If at all possible, install GraphicsMagick on your server.|
|ffmpeg||Audio, Video||Required if you want to handle video or audio media. Be sure to compile or obtain a version of ffmpeg with support for the file formats and codecs you require.|
|Ghostscript||PDF Documents||Ghostscript 8.71 or better is required to generate preview images of uploaded PDF documents. PDF uploads will still work, but without preview images, if Ghostscript is not installed. If you require color management (if you are dealing with color PDF documents you do), then you must install Ghostscript 9.0 or better.|
|dcraw||Images||Required to support upload of proprietary CameraRAW formats produced by various higher-end digital cameras. Note that the Adobe DNG format, a newer RAW format, is supported by GraphicsMagick and ImageMagick.|
|PdfToText||PDF Documents||A utility to extract text from uploaded PDF files. If present, CA will use PdfToText to extract text for indexing. If PdfToText is not installed on your server, CA will not be able to search the content of uploaded PDF documents.|
|PdfMiner||PDF Documents||A utility to extract text and text locations from uploaded PDF files. If present, CA will use PdfMiner to extract text for indexing, and text locations to support highlighting of search results during PDF display. If PdfMiner is not installed on your server, CA will fall back to PdfToText for indexing, and highlighting of search results will be disabled.|
|MediaInfo||Images, Audio, Video, PDF Documents||A library for extraction of technical metadata from various audio and video file formats. If present CA can use MediaInfo to extract technical metadata, otherwise it will fall back to using various built-in methods such as GetID3.|
|ExifTool||Images||A library for extraction of embedded metadata from many image file formats. If present CA can use it to extract metadata for display and import.|
|WkHTMLToPDF||PDF Output||WkHTMLToPDF is an application that can perform high quality conversion of HTML code to PDF files. If present CollectiveAccess can use WkHTMLToPDF to generate PDF-format labels and reports. Version 0.12.1 is supported. Do not use version 0.12.2, which has bugs that prevent valid formatting of output. If WkHTMLToPDF is not installed CollectiveAccess will fall back to a slower built-in alternative.|
|LibreOffice||Office Documents||LibreOffice is an open-source alternative to Microsoft Office. CollectiveAccess can use it to index and create previews for Microsoft Word, Excel and PowerPoint documents. LibreOffice 4.0 or better is supported.|
At a minimum, most users will want GraphicsMagick and ffmpeg installed on their server, and should install other packages as needed. For image processing you need only one of the following: GraphicsMagick, ImageMagick or libGD.
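To see which of the optional tools above are already installed, a sketch like the following looks up each one on the PATH. The executable names used here are the usual ones, but they are assumptions about how your packages were built and installed. They include gm, convert, gs, pdf2txt.py for PdfMiner and soffice for LibreOffice.

```shell
#!/bin/sh
# Report which optional media-processing programs are on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1 ($2)"
  else
    echo "missing: $1 ($2)"
  fi
}

check_tool gm          "GraphicsMagick, images"
check_tool convert     "ImageMagick 6, images (ImageMagick 7 installs magick)"
check_tool ffmpeg      "audio and video"
check_tool gs          "Ghostscript, PDF previews"
check_tool dcraw       "CameraRAW images"
check_tool pdftotext   "PDF text extraction"
check_tool pdf2txt.py  "PdfMiner, PDF text locations"
check_tool mediainfo   "technical metadata"
check_tool exiftool    "embedded image metadata"
check_tool wkhtmltopdf "PDF labels and reports"
check_tool soffice     "LibreOffice, Office documents"

# Print version lines for tools with version constraints in the table above.
if command -v gm >/dev/null 2>&1; then
  gm version | head -n 1      # 1.3.16 or better is required
fi
if command -v wkhtmltopdf >/dev/null 2>&1; then
  wkhtmltopdf --version       # 0.12.1 is supported; avoid 0.12.2
fi
```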
PHP extensions for media processing (optional)
CA supports two different mechanisms for using GraphicsMagick or ImageMagick. The preferred option is a PHP extension that, when installed, provides a fast and efficient way for PHP applications such as CA to access GraphicsMagick or ImageMagick functionality. Alternatively, GraphicsMagick or ImageMagick can be invoked directly as a command-line program without any PHP extension.
In general you should try to use a PHP extension rather than the command-line mechanism, because the extensions provide much better performance. Unfortunately, the extensions have proven to be unstable in some environments and can be difficult to install on Windows systems. You may be running the PHP Gmagick (for GraphicsMagick) or Imagick (for ImageMagick) extension and seeing segmentation faults or incorrect image encoding, such as blank images. In that case, remove the extension, let the command-line mechanism take over and see if that improves things.
GraphicsMagick 1.3.32 and later break certain functions in the PHP Gmagick extension API. This causes all media processing to fail in CollectiveAccess versions prior to 1.7.9. If processing fails with GraphicsMagick 1.3.32 or later, upgrade to the current version of CollectiveAccess.
Both Gmagick and Imagick are available in the PHP PECL repository and often available as packages for various operating systems. They should be easy to install on Unix-y operating systems like Linux and Mac OS X. Installation on Windows is a waking nightmare.
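You can check for the version caveat above with a short script. This sketch reports which PHP image extension, if any, the command-line PHP loads. When Gmagick is loaded, it also checks whether the installed GraphicsMagick is 1.3.32 or later. It assumes that gm version prints the version number as the second word of its first line.

```shell
#!/bin/sh
# Succeeds if dotted version $1 >= $2. Uses `sort -V` (GNU coreutils).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Version number from the first line of `gm version`, e.g.
# "GraphicsMagick 1.3.35 2020-02-23 Q16 http://www.GraphicsMagick.org/".
gm_version() {
  gm version | head -n 1 | awk '{ print $2 }'
}

if command -v php >/dev/null 2>&1 && php -m | grep -qix gmagick; then
  echo "PHP Gmagick extension is loaded"
  # GraphicsMagick 1.3.32+ with Gmagick needs CollectiveAccess 1.7.9 or later.
  if command -v gm >/dev/null 2>&1 && version_ge "$(gm_version)" 1.3.32; then
    echo "GraphicsMagick $(gm_version): use CollectiveAccess 1.7.9 or later"
  fi
elif command -v php >/dev/null 2>&1 && php -m | grep -qix imagick; then
  echo "PHP Imagick extension is loaded"
else
  echo "No Gmagick/Imagick extension found; the command-line mechanism applies"
fi
```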
Configuring PHP prior to installation
With the core software requirements installed on your server, examine the newly installed PHP configuration file. A few settings may need adjustment.
Your PHP configuration file is usually named php.ini. On Linux systems the php.ini file is often in /etc/php.ini or /usr/local/lib/php.ini. If you cannot locate your php.ini file, look for its location in the output of phpinfo(). You can get this output either by running the PHP command line interpreter with the -i option (e.g. php -i) or by running a PHP script that looks like this:
<?php phpinfo(); ?>
The output from phpinfo() will include the precise location of the php.ini file used to configure PHP.
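You can script the lookup in php -i output. The sketch below assumes the line format “Loaded Configuration File => /path/to/php.ini” that the CLI prints. php --ini reports the same path in a different layout.

```shell
#!/bin/sh
# Extract the php.ini path from `php -i` output read on standard input.
# Expected line: "Loaded Configuration File => /etc/php.ini"
loaded_ini_path() {
  sed -n 's/^Loaded Configuration File => //p'
}

if command -v php >/dev/null 2>&1; then
  echo "CLI php.ini: $(php -i | loaded_ini_path)"
fi
```

On some distributions, Debian and Ubuntu for example, the command-line interpreter and the web server use separate php.ini files. A phpinfo() page served by your web server therefore remains the authoritative source for the web server's configuration.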
Once you’ve found your php.ini file verify and, if necessary, change the following values:
- post_max_size - sets the maximum size of a POST-style HTTP request. The default value is 8 megabytes. If you are uploading large media files (and most CollectiveAccess users are), you will need to raise this to a value larger than the largest file you are likely to encounter.
- upload_max_filesize - sets the maximum size of an uploaded file. Set this to the same large value set for post_max_size.
- memory_limit - sets the maximum amount of memory a PHP script may consume. The default is 128 megabytes. This should be enough for many systems unless you are (a) uploading large images, (b) reindexing the search index of a large database or (c) importing data. Even if you have not received memory limit exceeded errors, you may want to increase this limit to 196 or 256 megabytes.
- display_errors - determines whether errors are printed to the screen. In some installations this is set to “Off” by default. While this is a good security decision for public-facing systems, it can make debugging installation problems difficult. It is therefore suggested that you set this option to “On” while installing and testing CA.
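A small check can show the current values and catch a common mistake: an upload limit larger than the POST limit. It converts PHP's shorthand sizes to bytes; PHP interprets the K, M and G suffixes as powers of 1024. This is a sketch that queries the command-line PHP via ini_get(), whose values may differ from the web server's.

```shell
#!/bin/sh
# Convert a php.ini size such as "8M", "2G" or "512K" to bytes.
# Suffixes are case-insensitive powers of 1024; "-1" (no limit) passes through.
ini_bytes() {
  v=$1
  case $v in
    *[Kk]) echo $(( ${v%?} * 1024 )) ;;
    *[Mm]) echo $(( ${v%?} * 1024 * 1024 )) ;;
    *[Gg]) echo $(( ${v%?} * 1024 * 1024 * 1024 )) ;;
    *)     echo "$v" ;;
  esac
}

if command -v php >/dev/null 2>&1; then
  for key in post_max_size upload_max_filesize memory_limit; do
    echo "$key = $(php -r "echo ini_get('$key');")"
  done
  post=$(ini_bytes "$(php -r "echo ini_get('post_max_size');")")
  upload=$(ini_bytes "$(php -r "echo ini_get('upload_max_filesize');")")
  if [ "$post" -lt "$upload" ]; then
    echo "post_max_size is smaller than upload_max_filesize"
  fi
fi
```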
To install CollectiveAccess Providence, perform the following steps:
- Set up an empty MySQL database for your installation. Give the database a name and create a login for it with full read/write access. Note the login information - you’ll need it later. You can use the MySQL command line or web-based tools like phpMyAdmin to create the database and login.
- Copy the contents of the CollectiveAccess software distribution to the root of the web server instance in which your installation will run. You can obtain the latest release version from our download page. If you wish to obtain CollectiveAccess from the project’s GitHub repository run the following command from the parent of the directory into which you want to install CA:
git clone https://github.com/collectiveaccess/providence.git providence
where the trailing “providence” is the name of the directory you want your installation to be in. Git will create the directory for you.
- Copy the setup.php-dist file (in the root directory of the CA distribution) to a file named setup.php. Edit setup.php, changing the various directory paths and database login parameters to reflect your server setup.
- Make sure the permissions on the media directories are such that the web server can write to them. In the next step, the web-based installer will need access to create directories for uploaded media and to generate cached files. In most hosted environments these permissions will already be set correctly.
- In a web browser, navigate to the web-based installer. If the URL for your installation server is http://www.myCollectiveaccessSite.org, then the URL to the installer is http://www.myCollectiveaccessSite.org/install. Enter your email address and select the installation profile (a profile is a set of pre-configured values for your system) that best fits your needs. Then click on the “begin” button. If you don’t see a profile suitable for your project, you may want to ask on the support forum or look at our list of contributed profiles.
- The installer will give you login information for your newly installed system when installation is complete. Be sure to note this information in a safe place!
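The first four steps can be sketched as shell commands. This is an illustration under stated assumptions, not an official script. The database name, login, password, install path and web server user are all hypothetical, and the media directory is assumed to be media/ inside the installation. By default each command is only printed so you can review it; set DRY_RUN=0 to execute. Editing setup.php and running the web installer remain manual steps.

```shell
#!/bin/sh
# Print each command (default) or run it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

DB_NAME=collectiveaccess             # hypothetical database name
DB_USER=ca_user                      # hypothetical login
DB_PASS='change-me'                  # choose your own password
WEB_USER=www-data                    # Debian/Ubuntu Apache user; "apache" on CentOS
INSTALL_DIR=/var/www/html/providence # hypothetical install path

# 1. Empty database plus a login with full read/write access to it.
run mysql -u root -p -e "CREATE DATABASE $DB_NAME;
CREATE USER '$DB_USER'@'localhost' IDENTIFIED BY '$DB_PASS';
GRANT ALL PRIVILEGES ON $DB_NAME.* TO '$DB_USER'@'localhost';"

# 2. Fetch the code from GitHub into the web server root.
run git clone https://github.com/collectiveaccess/providence.git "$INSTALL_DIR"

# 3. Create setup.php from the template; edit paths and DB settings by hand.
run cp "$INSTALL_DIR/setup.php-dist" "$INSTALL_DIR/setup.php"

# 4. Let the web server write to the media directory (assumed location).
run chown -R "$WEB_USER" "$INSTALL_DIR/media"
```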
Optional post installation tasks
Set up for background encoding of media
By default, CollectiveAccess will process all uploaded media immediately at time of upload. For large media files this can make the user’s browser unresponsive for an extended period of time while CA performs large and complex media conversions. If you expect to upload many large media files, you can enable background processing of media by setting __CA_QUEUE_ENABLED__ to 1 in your setup.php (it is off by default).
Once background processing is enabled, all media files exceeding a specific size will be queued for later processing. Smaller files will still be processed “while you wait” unless you modify the media processing configuration. To actually process the media in the queue, you must run the script support/bin/caUtils process-task-queue. This script is typically run from a crontab (in Unix-like operating systems, at least).
You can run the queue processing script as often as you want. Only a single instance of the script is allowed to run at any given time, so you need not worry about out-of-control queue processing scripts running simultaneously and depleting server resources. Note that the queue processing script should always be run as a user with write access to the CA media directory.
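A crontab entry for the queue script might be generated as follows. The five-minute schedule, the install path and the choice of the web server account (www-data on Debian/Ubuntu) are all hypothetical. Any account with write access to the media directory will do.

```shell
#!/bin/sh
# Build a crontab entry that drains the CollectiveAccess media queue.
# Hypothetical choices: run every five minutes and discard script output.
cron_line() {
  echo "*/5 * * * * $1/support/bin/caUtils process-task-queue >/dev/null 2>&1"
}

CA_ROOT=/var/www/html/providence   # hypothetical install path
cron_line "$CA_ROOT"

# After setting __CA_QUEUE_ENABLED__ to 1 in setup.php, append the entry to
# the web server user's crontab (as root), for example:
#   (crontab -u www-data -l 2>/dev/null; cron_line "$CA_ROOT") | crontab -u www-data -
```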
What to do if something goes wrong?
If your CollectiveAccess installation fails, the first thing to do is examine error messages on screen or in the log (written to the app/log directory). If you receive a blank white screen, odds are that error messages are being suppressed by your php.ini configuration file. Try changing the display_errors option to “On” and then attempt to reinstall.
If you are totally stumped after reviewing the error messages and logs you can find help on the online support forum. Please include a full description of your problem as well as the operating system you are running, the version of CA you are running, the text of any error messages, the output of phpinfo() and the output of the CA “Configuration Check” (available in the “Manage” menu under “System Configuration”) - assuming you are able to log in. We will try our best to resolve your problems quickly.
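You can collect much of that information in one go. The sketch below prints the operating system, the PHP and MySQL client versions, and the newest files in app/log. The installation path in CA_ROOT is a hypothetical default. You still need to gather the phpinfo() output and the Configuration Check separately.

```shell
#!/bin/sh
# Gather system details worth including in a support forum post.
CA_ROOT=${CA_ROOT:-/var/www/html/providence}   # hypothetical install path

collect_info() {
  echo "OS: $(uname -srm)"
  if command -v php >/dev/null 2>&1; then
    echo "PHP: $(php -r 'echo PHP_VERSION;')"
  fi
  if command -v mysql >/dev/null 2>&1; then
    echo "MySQL client: $(mysql --version)"
  fi
  if [ -d "$CA_ROOT/app/log" ]; then
    echo "Most recent log files:"
    ls -t "$CA_ROOT/app/log" | head -n 5
  fi
}

collect_info
```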
You may also want to look at our list of OS specific Installation notes.