When benchmarking an algorithm it is recommendable to use a standard test data set for researchers to be able to directly compare the results. While there are many databases in use currently, the choice of an appropriate database to be used should be made based on the task given (aging, expressions, lighting etc). Another way is to choose the data set specific to the property to be tested (e.g. how algorithm behaves when given images with lighting changes or images with different facial expressions). If, on the other hand, an algorithm needs to be trained with more images per class (like LDA), Yale face database is probably more appropriate than FERET.
R. Gross, Face Databases, Handbook of Face Recognition, Stan Z. Li and Anil K. Jain, ed., Springer-Verlag, February 2005, 22 pages
- is a web-based automated evaluation system developed to evaluate biometric algorithms. Algorithms submitted to the Face Compliance Verification to ISO standard (FICV) are required to check the compliance of face images to ISO/IEC 19794-5 standard. To the best of our knowledge this is the first available benchmark that directly assesses the accuracy of algorithms to automatically verify the compliance of face images to the ISO standard, in the attempt of semi-automating the document issuing process.
P. Jonathon Phillips, A. Martin, C.l. Wilson, M. Przybocki, An Introduction to Evaluating Biometric Systems, IEEE Computer, Vol. 33, No. 2, February 2000, pp. 56-63
, 407 kB
A. J. Mansfield, J. L. Wayman, Best Practices in Testing and Reporting Performance of Biometric Devices, NPL Report CMSC 14/02, August 2002
, 406 kB
K. Delac, M. Grgic, S. Grgic, Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set, International Journal of Imaging Systems and Technology, Vol. 15, Issue 5, pp. 252-260
, 412 kB
Here are some face data sets often used by researchers:
The FERET program set out to establish a large database of facial images that was gathered independently from the algorithm developers. Dr. Harry Wechsler at George Mason University was selected to direct the collection of this database. The database collection was a collaborative effort between Dr. Wechsler and Dr. Phillips. The images were collected in a semi-controlled environment. To maintain a degree of consistency throughout the database, the same physical setup was used in each photography session. Because the equipment had to be reassembled for each session, there was some minor variation in images collected on different dates. The FERET database was collected in 15 sessions between August 1993 and July 1996. The database contains 1564 sets of images for a total of 14,126 images that includes 1199 individuals and 365 duplicate sets of images. A duplicate set is a second set of images of a person already in the database and was usually taken on a different day. For some individuals, over two years had elapsed between their first and last sittings, with some subjects being photographed multiple times. This time lapse was important because it enabled researchers to study, for the first time, changes in a subject's appearance that occur over a year.
SCface is a database of static images of human faces. Images were taken in uncontrolled indoor environment using five video surveillance cameras of various qualities. Database contains 4160 static images (in visible and infrared spectrum) of 130 subjects. Images from different quality cameras mimic the real-world conditions and enable robust face recognition algorithms testing, emphasizing different law enforcement and surveillance use case scenarios. SCface database is freely available to research community. The paper describing the database is available .
The database is comprised of 21 facial landmarks (from 4160 face images) from 130 users annotated manually by a human operator, as described in .
A close relationship exists between the advancement of face recognition algorithms and the availability of face databases varying factors that affect facial appearance in a controlled manner. The PIE database, collected at Carnegie Mellon University in 2000, has been very influential in advancing research in face recognition across pose and illumination. Despite its success the PIE database has several shortcomings: a limited number of subjects, a single recording session and only few expressions captured. To address these issues researchers at Carnegie Mellon University collected the Multi-PIE database. It contains 337 subjects, captured under 15 view points and 19 illumination conditions in four recording sessions for a total of more than 750,000 images. The paper describing the database is available .
Contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink.
Contains 5760 single light source images of 10 subjects each seen under 576 viewing conditions (9 poses x 64 illumination conditions). For every subject in a particular pose, an image with ambient (background) illumination was also captured.
A database of 41,368 images of 68 people, each person under 13 different poses, 43 different illumination conditions, and with 4 different expressions.
Capturing scenario mimics the real world applications, for example, when a person is going through the airport check-in point. Six cameras capture human faces from three different angles. Three out of the six cameras have smaller focus length, and the other three have larger focus length. Plan to capture 200 subjects in 3 sessions in different time period. For one session, both in-door and out-door scenario will be captured. User-dependent pose and expression variation are expected from the video sequences.
Ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
Subjects in the released portion of the Cohn-Kanade AU-Coded Facial Expression Database are 100 university students. They ranged in age from 18 to 30 years. Sixty-five percent were female, 15 percent were African-American, and three percent were Asian or Latino. Subjects were instructed by an experimenter to perform a series of 23 facial displays that included single action units and combinations of action units. Image sequences from neutral to target display were digitized into 640 by 480 or 490 pixel arrays with 8-bit precision for grayscale values. Included with the image files are "sequence" files; these are short text files that describe the order in which images should be read.
The MIT-CBCL face recognition database contains face images of 10 subjects. It provides two training sets: 1. High resolution pictures, including frontal, half-profile and profile view; 2. Synthetic images (324/subject) rendered from 3D head models of the 10 subjects. The head models were generated by fitting a morphable model to the high-resolution training images. The 3D models are not included in the database. The test set consists of 200 images per subject. We varied the illumination, pose (up to about 30 degrees of rotation in depth) and the background.
24 subjects are represented in this database, yielding between about 6 to 18 examples of the 150 different requested actions. Thus, about 7,000 color images are included in the database, and each has a matching gray scale image used in the neural network analysis.
395 individuals (male and female), 20 images per individual. Contains images of people of various racial origins, mainly of first year undergraduate students, so the majority of indivuals are between 18-20 years old but some older individuals are also present. Some individuals are wearing glasses and beards.
There are images of 1573 individuals (cases) 1495 male and 78 female. The database contains both front and side (profile) views when available. Separating front views and profiles, there are 131 cases with two or more front views and 1418 with only one front view. Profiles have 89 cases with two or more profiles and 1268 with only one profile. Cases with both fronts and profiles have 89 cases with two or more of both fronts and profiles, 27 with two or more fronts and one profile, and 1217 with only one front and one profile.
450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under with different lighting/expressions/backgrounds.
Database is made up from 37 different faces and provides 5 shots for each person. These shots were taken at one week intervals or when drastic face changes occurred in the meantime. During each shot, people have been asked to count from '0' to '9' in their native language (most of the people are French speaking), rotate the head from 0 to -90 degrees, again to 0, then to +90 and back to 0 degrees. Also, they have been asked to rotate the head once again without glasses if they wear any.
Contains four recordings of 295 subjects taken over a period of four months. Each recording contains a speaking head shot and a rotating head shot. Sets of data taken from this database are available including high quality colour images, 32 KHz 16-bit sound files, video sequences and a 3D model.
4,000 color images corresponding to 126 people's faces (70 men and 56 women). Images feature frontal view faces with different facial expressions, illumination conditions, and occlusions (sun glasses and scarf).
Contains 125 different faces each in 16 different camera calibration and illumination condition, an additional 16 if the person has glasses. Faces in frontal position captured under Horizon, Incandescent, Fluorescent and Daylight illuminant. Includes 3 spectral reflectance of skin per person measured from both cheeks and forehead. Contains RGB spectral response of camera used and spectral power distribution of illuminants.
The CAS-PEAL face database has been constructed under the sponsors of National Hi-Tech Program and ISVISION. The goals to create the PEAL face database include: providing the worldwide researchers of FR community a large-scale Chinese face database for training and evaluating their algorithms; facilitating the development of FR by providing large-scale face images with different sources of variations, especially Pose, Expression, Accessories, and Lighting (PEAL); advancing the state-of-the-art face recognition technologies aiming at practical applications especially for the oriental.
The database contains 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. Each image has been rated on 6 emotion adjectives by 60 Japanese subjects.
The dataset consists of 1521 gray level images with a resolution of 384x286 pixel. Each one shows the frontal view of a face of one out of 23 different test persons. For comparison reasons the set also contains manually set eye postions.
This is a collection of images useful for research in Psychology, such as sets of faces and objects. The images in the database are organised into SETS, with each set often representing a separate experimental study.
Consists of 564 images of 20 people. Each covering a range of poses from profile to frontal views. Subjects cover a range of race/sex/appearance. Each subject exists in their own directory labelled 1a, 1b, ... 1t and images are numbered consequetively as they were taken. The files are all in PGM format, approximately 220 x 220 pixels in 256 shades of grey.
This database contains short video sequences of facial Action Units recorded simultaneously from six different viewpoints, recorded in 2003 at the Max Planck Institute for Biological Cybernetics. The video cameras were arranged at 18 degrees intervals in a semi-circle around the subject at a distance of roughly 1.3m. The cameras recorded 25 frames/sec at 786x576 video resolution, non-interlaced. In order to facilitate the recovery of rigid head motion, the subject wore a headplate with 6 green markers. The website contains a total of 246 video sequences in MPEG1 format.
450 face images. 896 x 592 pixels. JPEG format. 27 or so unique people under with different lighting/expressions/backgrounds.
Human identification from facial features has been studied primarily using imagery from visible video cameras. Thermal imaging sensors are one of the most innovative emerging techonologies in the market. Fueled by ever lowering costs and improved sensitivity and resolution, our sensors provide exciting new oportunities for biometric identification. As part of our involvement in this effort, Equinox is collecting an extensive database of face imagery in the following modalities: coregistered broadband-visible/LWIR (8-12 microns), MWIR (3-5 microns), SWIR (0.9-1.7 microns). This data collection is made available for experimentation and statistical performance evaluations.
With the aim to facilitate the development of robust audio, face, and multi-modal person recognition systems, the large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy "real world" office scenario with no control on illumination or acoustic noise. The database consists of five recording sessions of 106 subjects over a period of one month. One session is recorded in a studio with controlled lighting and no background noise, the other 4 sessions are recorded in office type scenarios. The database contains uncompressed JPEG Images at resolution of 720x576 pixels.
The database has two parts. Part one contains colour pictures of faces having a high degree of variability in scale, location, orientation, pose, facial expression and lighting conditions, while part two has manually segmented results for each of the images in part one of the database. These images are acquired from a wide variety of sources such as digital cameras, pictures scanned using photo-scanner, other face databases and the World Wide Web. The database is intended for distribution to researchers.
The database contains images of 50 people and is stored in JPEG format. For each individual, there are 15 color images captured between 06/01/99 and 11/15/99. Most of the images were taken in two different sessions to take into account the variations in illumination conditions, facial expression, and appearance. In addition to this, the faces were captured at different scales and orientations.
The database contains a set of face images taken in February, 2002 in the IIT Kanpur campus. There are eleven different images of each of 40 distinct subjects. For some subjects, some additional photographs are included. All the images were taken against a bright homogeneous background with the subjects in an upright, frontal position. The files are in JPEG format. The size of each image is 640x480 pixels, with 256 grey levels per pixel. The images are organized in two main directories - males and females. In each of these directories, there are directories with name as a serial numbers, each corresponding to a single individual. In each of these directories, there are eleven different images of that subject, which have names of the form abc.jpg, where abc is the image number for that subject. The following orientations of the face are included: looking front, looking left, looking right, looking up, looking up towards left, looking up towards right, looking down. Available emotions are: neutral, smile, laughter, sad/disgust.
The VidTIMIT database is comprised of video and corresponding audio recordings of 43 people, reciting short sentences. It can be useful for research on topics such as multi-view face recognition, automatic lip reading and multi-modal speech recognition. The dataset was recorded in 3 sessions, with a space of about a week between each session. There are 10 sentences per person, chosen from the TIMIT corpus. In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down and finally return to center. The recording was done in an office environment using a broadcast quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. The corresponding audio is stored as a mono, 16 bit, 32 kHz WAV file.
Labeled Faces in the Wild is a database of face photographs designed for studying the problem of unconstrained face recognition. The database contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the database. The only constraint on these faces is that they were detected by the Viola-Jones face detector. Please see the database web page and the technical report linked there for more details.
LFWcrop is a cropped version of the Labeled Faces in the Wild (LFW) dataset, keeping only the center portion of each image (i.e. the face). In the vast majority of images almost all of the background is omitted. LFWcrop was created due to concern about the misuse of the original LFW dataset, where face matching accuracy can be unrealistically boosted through the use of background parts of images (i.e. exploitation of possible correlations between faces and backgrounds). As the location and size of faces in LFW was determined through the use of an automatic face locator (detector), the cropped faces in LFWcrop exhibit real-life conditions, including mis-alignment, scale variations, in-plane as well as out-of-plane rotations.
The "Labeled Faces in the Wild-a" image collection is a database of labeled, face images intended for studying Face Recognition in unconstrained images. It contains the same images available in the original data set, however, here we provide them after alignment using a commercial face alignment software. Some of our results were produced using these images. We show this alignment to improve the performance of face recognition algorithms. We have maintained the same directory structure as in the original LFW data set, and so these images can be used as direct substitutes for those in the original image set. Note, however, that the images available here are grayscale versions of the originals.
The 3D_RMA database is a collection of two sessions (Nov 1997 and Jan 1998) consisting of 120 persons. For each session, three shots were recorded with different (but limited) orientations of the head. Details about the population and typical problems affecting the quality are given in the referred link. 3D was captured thanks to a first prototype of a proprietary system based on structured light (analog camera!). The quality was limited but sufficient to show the ability of 3D face recognition. For privacy reasons, the texture images are not made available. In the period 2003-2008, this database has been downloaded by about 100 researchers. A few papers present recognition results with the database (like, of course, papers from the ).
GavabDB is a 3D face database. It contains 549 three-dimensional images of facial surfaces. These meshes correspond to 61 different individuals (45 male and 16 female) having 9 images for each person. The total of the individuals are Caucasian and their age is between 18 and 40 years old. Each image is given by a mesh of connected 3D points of the facial surface without texture. The database provides systematic variations with respect to the pose and the facial expression. In particular, the 9 images corresponding to each individual are: 2 frontal views with neutral expression, 2 x-rotated views (ą30o, looking up and looking down respectively) with neutral expression, 2 y-rotated views (ą90o, left and right profiles respectively) with neutral expression and 3 frontal gesture images (laugh, smile and a random gesture chosen by the user, respectively).
This database is formed by up to 109 subjects (75 men and 34 women), with 32 colour images per person. Each picture has a 320 x 240 pixel resolution, with the face occupying most of the image in an upright position. For one single person, all the photographs were taken on the same day, although the subject was forced to stand up and sit down again in order to change pose and gesture. In all cases, the background is plain and dark blue. The 32 images were classified in six groups according to the pose and lighting conditions: 12 frontal images, 4 15o-turned images, 4 30o-turned images, 4 images with gestures, 4 images with occluded face features and 4 frontal images with a change of illumination. This database is delivered for free exclusively for research purposes.
This database contains 106 subjects, with approximately one woman every three men. The data were acquired with a Minolta VIVID 700 scanner, which provides texture information (2D image) and a VRML file (3D image). If needed, the corresponding range data (2.5D image) can be computed by means of the VRML file. Therefore, it is a multimodal database (2D, 2.5D y 3D). During all time, a strict acquisition protocol was followed, with controlled lighting conditions. The person sat down on an adjustable stool opposite the scanner and in front of a blue wall. No glasses, hats or scarves were allowed. A total of 16 captures per person were taken in every session, with different poses and lighting conditions, trying to cover all possible variations, including turns in different directions, gestures and lighting changes. In every case only one parameter was modified between two captures. This is one of the main advantages of this database, respect to others. This database is delivered for free exclusively for research purposes.
The BJUT-3D is a three dimension face database including 500 Chinese persons. There are 250 females and 250 males in the database. Everyone has a 3D face data with neutral expression and without accessories. Original high-resolution 3D face data is acquired by the CyberWare 3D scanner in given environment, Every 3D face data has been preprocessed, and cut the redundant parts. Now the face database is available for research purpose only. The Multimedia and Intelligent Software Technology Beijing Municipal Key Laboratory in Beijing University of Technology is serving as the technical agent for distribution of the database and reserves the copyright of all the data in the database.
The Bosphorus Database is a new 3D face database that includes a rich set of expressions, systematic variation of poses and different types of occlusions. This database is unique from three aspects: (1) The facial expressions are composed of judiciously selected subset of Action Units as well as the six basic emotions, and many actors/actresses are incorporated to obtain more realistic expression data; (2) A rich set of head pose variations are available; (3) Different types of face occlusions are included. Hence, this new database can be a very valuable resource for development and evaluation of algorithms on face recognition under adverse conditions and facial expression analysis as well as for facial expression synthesis.
PUT Face Database consists of almost 10000 hi-res images of 100 people. Images were taken in controlled conditions and the database is supplied with additional data including: rectangles containing face, eyes, nose and mouth, landmarks positions and manually annotated contour models. Database is available for research purposes.
The Basel Face Model (BFM) is a 3D Morphable Face Model constructed from 100 male and 100 female example faces. The BFM consists of a generative 3D shape model covering the face surface from ear to ear and a high quality texture model. The model can be used either directly for 2D and 3D face recognition or to generate training and test images for any imaging condition. Hence, in addition to being a valuable model for face analysis it can also be viewed as a meta-database which allows the creation of accurately labeled synthetic training and testing images. To allow for a fair comparison with other algorithms, we provide both the training data set (the BFM) and the model fitting results for several standard image data sets (CMU-PIE, FERET) obtained with our fitting algorithm. The BFM web page additionally provides a set of registered scans of ten individuals, together with a set of 270 renderings of these individuals with systematic pose and light variations. These scans are not included in the training set of the BFM and form a standardized test set with a ground truth for pose and illumination.
The plastic surgery face database is a real world database that contains 1800 pre and post surgery images pertaining to 900 subjects. Different types of facial plastic surgeries have different impact on facial features. To enable the researchers to design and evaluate face recognition algorithms on all types of facial plastic surgeries, the database contains images from a wide variety of cases such as Rhinoplasty (nose surgery), Blepharoplasty (eyelid surgery), brow lift, skin peeling, and Rhytidectomy (face lift). For each individual, there are two frontal face images with proper illumination and neutral expression: the first is taken before surgery and the second is taken after surgery. The database contains 519 image pairs corresponding to local surgeries and 381 cases of global surgery (e.g., skin peeling and face lift). The details of the database and performance evaluation of several well known face recognition algorithms is available in .
The Iranian Face Database (IFDB), the first image database in middle-east, contains color facial imagery of a large number of Iranian subjects. IFDB is a large database that can support studies of the age classification systems. It contains over 3,600 color images. IFDB can be used for age classification, facial feature extraction, aging, facial ratio extraction, percent of facial similarity, facial surgery, race detection and other similar researches.
The Biometric Research Centre at The Hong Kong Polytechnic University developed a real time NIR face capture device and used it to construct a large-scale NIR face database. The NIR face image acquisition system consists of a camera, an LED light source, a filter, a frame grabber card and a computer. The camera used is a JAI camera, which is sensitive to NIR band. The active light source is in the NIR spectrum between 780nm - 1,100 nm. The peak wavelength is 850 nm. The strength of the total LED lighting is adjusted to ensure a good quality of the NIR face images when the camera face distance is between 80 cm - 120 cm, which is convenient for the users. By using the data acquisition device described above, we collected NIR face images from 335 subjects. During the recording, the subject was first asked to sit in front of the camera, and the normal frontal face images of him/her were collected. Then the subject was asked to make expression and pose changes and the corresponding images were collected. To collect face images with scale variations, we asked the subjects to move near to or away from the camera in a certain range. At last, to collect face images with time variations, samples from 15 subjects were collected at two different times with an interval of more than two months. In each recording, we collected about 100 images from each subject, and in total about 34,000 images were collected in the PolyU-NIRFD database.
The Biometric Research Centre at The Hong Kong Polytechnic University established a Hyperspectral Face database. The indoor hyperspectral face acquisition system was built which mainly consists of a CRI's VariSpec LCTF and a Halogen Light, and includes a hyperspectral dataset of 300 hyperspectral image cubes from 25 volunteers with age range from 21 to 33 (8 female and 17 male). For each individual, several sessions were collected with an average time space of 5 month. The minimal interval is 3 months and the maximum is 10 months. Each session consists of three hyperspectral cubes - frontal, right and left views with neutral-expression. The spectral range is from 400 nm to 720 nm with a step length of 10 nm, producing 33 bands in all. Since the database was constructed over a long period of time, significant appearance variations of the subjects, e.g. changes of hair style and skin condition, are presented in the data. In data collection, positions of the camera, light and subject are fixed, which allows us to concentrate on the spectral characteristics for face recognition without masking from environmental changes.
The MOBIO database consists of bi-modal (audio and video) data taken from 152 people. The database has a female-male ratio or nearly 1:2 (100 males and 52 females) and was collected from August 2008 until July 2010 in six different sites from five different countries. This led to a diverse bi-modal database with both native and non-native English speakers. In total 12 sessions were captured for each client: 6 sessions for Phase I and 6 sessions for Phase II. The Phase I data consists of 21 questions with the question types ranging from: Short Response Questions, Short Response Free Speech, Set Speech, and Free Speech. The Phase II data consists of 11 questions with the question types ranging from: Short Response Questions, Set Speech, and Free Speech. The database was recorded using two mobile devices: a mobile phone and a laptop computer. The mobile phone used to capture the database was a NOKIA N93i mobile while the laptop computer was a standard 2008 MacBook. The laptop was only used to capture part of the first session, this first session consists of data captured on both the laptop and the mobile phone.
Texas 3D Face Recognition database (Texas 3DFRD) contains 1149 pairs of facial color and range images of 105 adult human subjects. The images were acquired at the company Advanced Digital Imaging Research (ADIR), LLC (Friendswood, TX), formerly a subsidiary of Iris International, Inc. (Chatsworth, CA), with assistance from research students and faculty from the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. This project was sponsored by the Advanced Technology Program of the National Institute of Standards and Technology (NIST). The database is being made available by Dr. Alan C Bovik at UT Austin. The images were acquired using a stereo imaging system at a high spatial resolution of 0.32 mm. The color and range images were captured simultaneously and thus are perfectly registered to each other. All faces have been normalized to the frontal position and the tip of the nose is positioned at the center of the image. The images are of adult humans from all the major ethnic groups and both genders. For each face, is also available information about the subjects' gender, ethnicity, facial expression, and the locations 25 anthropometric facial fiducial points. These fiducial points were located manually on the facial color images using a computer based graphical user interface. Specific data partitions (training, gallery, and probe) that were employed at LIVE to develop the are also available.
The database contains both spontaneous and posed expressions of more than 100 subjects, recorded simultaneously by a visible and an infrared thermal camera, with illumination provided from three different directions. The posed database also includes expression images with and without glasses. The paper describing the database is available .
The FEI face database is a Brazilian face database that contains a set of face images taken between June 2005 and March 2006 at the Artificial Intelligence Laboratory of FEI in Sao Bernardo do Campo, Sao Paulo, Brazil. There are 14 images for each of 200 individuals, a total of 2800 images. All images are colourful and taken against a white homogenous background in an upright frontal position with profile rotation of up to about 180 degrees. Scale might vary about 10% and the original size of each image is 640x480 pixels. All faces are mainly represented by students and staff at FEI, between 19 and 40 years old with distinct appearance, hairstyle, and adorns. The number of male and female subjects are exactly the same and equal to 100.
ChokePoint video dataset is designed for experiments in person identification/verification under real-world surveillance conditions using existing technologies. An array of three cameras was placed above several portals (natural choke points in terms of pedestrian traffic) to capture subjects walking through each portal in a natural way. While a person is walking through a portal, a sequence of face images (ie. a face set) can be captured. Faces in such sets will have variations in terms of illumination conditions, pose, sharpness, as well as misalignment due to automatic face localisation/detection. Due to the three camera configuration, one of the cameras is likely to capture a face set where a subset of the faces is near-frontal. The dataset consists of 25 subjects (19 male and 6 female) in portal 1 and 29 subjects (23 male and 6 female) in portal 2. In total, the dataset consists of 54 video sequences and 64,204 labelled face images.
The University of Milano Bicocca 3D face database is a collection of multimodal (3D + 2D colour images) facial acquisitions. The database is available to universities and research centers interested in face detection, face recognition, face synthesis, etc. The UMB-DB has been acquired with a particular focus on facial occlusions, i.e. scarves, hats, hands, eyeglasses and other types of occlusion wich can occur in real-world scenarios.
The primary use of VADANA is for the problems of face verification and recognition across age progression. The main characteristics of VADANA, which distinguish it from current benchmarks, is the large number of intra-personal pairs (order of 168 thousand); natural variations in pose, expression and illumination; and the rich set of additional meta-data provided along with standard partitions for direct comparison and bench-marking efforts.
MORPH database is the largest publicly available longitudinal face database. The MORPH database contains 55,000 images of more than 13,000 people within the age ranges of 16 to 77. There are an average of 4 images per individual with the time span between each image being an average of 164 days. This data set was comprised for research on facial analytics and facial recognition.
LDHF database contains both visible (VIS) and near-infrared (NIR) face images at distances of 60 m, 100 m and 150 m outdoors and at a 1 m distance indoors. Face images of 100 subjects (70 males and 30 females) were captured; for each subject one image was captured at each distance in daytime and nighttime. All the images of individual subjects are frontal faces without glasses and collected in a single sitting.
This unique 3D face database is amongst the largest currently available, containing 3187 sessions of 453 subjects, captured in two recording periods of approximately six months each. The Photoface device was located in an unsupervised corridor allowing real-world and unconstrained capture. Each session comprises four differently lit colour photographs of the subject, from which surface normal and albedo estimations can be calculated (photometric stereo Matlab code implementation included). This allows for many testing scenarios and data fusion modalities. Eleven facial landmarks have been manually located on each session for alignment purposes. Additionally, the Photoface Query Tool is supplied (implemented in Matlab), which allows for subsets of the database to be extracted according to selected metadata e.g. gender, facial hair, pose, expression.
The Dataset consists of multimodal facial images of 52 people (14 females, 38 males) acquired with a Kinect sensor. The data is captured in two sessions at different intervals (of about two weeks). In each session, 9 facial images are collected from each person according to different facial expressions, lighting and occlusion conditions: neutral, smile, open mouth, left profile, right profile, occluded eyes, occluded mouth, side occlusion with a sheet of paper and light on. An RGB color image, a depth map (provided both as a bitmap depth image and a text file containing the original depth levels sensed by Kinect) as well as the associated 3D data are provided for all samples. In addition, the dataset includes 6 manually labeled landmark positions for every face: left eye, right eye, tip of the nose, left side of mouth, right side of mouth and the chin. Other information, such as gender, year of birth, ethnicity, glasses (whether a person wears glasses or not) and the time of each session are also available.
The data set contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames. In designing our video data set and benchmarks we follow the example of the 'Labeled Faces in the Wild' LFW image collection. Specifically, our goal is to produce a large scale collection of videos along with labels indicating the identities of a person appearing in each video. In addition, we publish benchmark tests, intended to measure the performance of video pair-matching techniques on these videos. Finally, we provide descriptor encodings for the faces appearing in these videos, using well established descriptor methods.
The dataset consists of 151 subjects, specifically Caucasian females, from YouTube makeup tutorials. Images of the subjects before and after the application of makeup were captured. There are four shots per subject: two shots before the application of makeup and two shots after the application of makeup. For a few subjects, three shots each before and after the application of makeup were obtained. The makeup in these face images varies from subtle to heavy. The cosmetic alteration is mainly in the ocular area, where the eyes have been accentuated by diverse eye makeup products. Additional changes are on the quality of the skin due to the application of foundation and change in lip color. This dataset includes some variations in expression and pose. The illumination condition is reasonably constant over multiple shots of the same subject. In few cases, the hair style before and after makeup changes drastically.
The VMU dataset was assembled by synthetically adding makeup to 51 female Caucasian subjects in the FRGC dataset. We added makeup by using a publicly available tool from Taaz. Three virtual makeovers were created: (a) application of lipstick only; (b) application of eye makeup only; and (c) application of a full makeup consisting of lipstick, foundation, blush and eye makeup. Hence, the assembled dataset contains four images per subject: one before-makeup shot and three aftermakeup shots.
The MIW dataset contains 125 subjects with 1-2 images per subject. Total number of images is 154 (77 with makeup and 77 without makeup). The images are obtained from the internet and the faces are unconstrained.
The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both real-access and spoofing attacks. Each frame consists of: (1) a depth image (640x480 pixels – 1x11 bits); (2) the corresponding RGB image (640x480 pixels – 3x8 bits); (3) manually annotated eye positions (with respect to the RGB image). The data is collected in 3 different sessions for all subjects and for each session 5 videos of 300 frames are captured. The recordings are done under controlled conditions, with frontal-view and neutral expression. The first two sessions are dedicated to the real access samples, in which subjects are recorded with a time delay of 2 weeks between the acquisitions. In the third session, 3D mask attacks are captured by a single operator (attacker). If you use this database please cite this publication: N. Erdogmus and S. Marcel. "", in IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), 2013. Source code to reproduce experiments in the paper: https://pypi.python.org/pypi/maskattack.lbp
The Senthilkumar Face Database contains 80 grayscale face images of 5 people (all are men), including frontal views of faces with different facial expressions, occlusions and brightness conditions. Each person has 16 different images. The face portion of the image is manually cropped to 140x188 pixels and then it is normalized. Facial images are available in both grayscale and colour images.
This database contains 18000 video frames of 640x480 resolution from 60 video sequences, each of which recorded from a different subject (31 female and 29 male). Each video was collected in a different environment (indoor or outdoor) resulting arbitrary illumination conditions and background clutter. Furthermore, the subjects were completely free in their movements, leading to arbitrary face scales, arbitrary facial expressions, head pose (in yaw, pitch and roll), motion blur, and local or global occlusions.
The SiblingsDB contains two different datasets depicting images of individuals related by sibling relationships. The first, called HQfaces, contains a set of high quality images depicting 184 individuals (92 pairs of siblings). A subset of 79 pairs contains profile images as well, and 56 of them have also smiling frontal and profile pictures. All the images are annotated with, respectively, the position of 76 landmarks on frontal images and 12 landmarks on profile images. For each individual the information on sex, birth date, age (the highest and average age differences between siblings are 30 and 4.6 years, respectively) and votes of the panel of human raters (who were asked to evaluate if the couples depict siblings or not) are also available. The second DB, called LQfaces, contains contains 98 pairs of siblings (196 individuals) found over the Internet, where most of the subjects are celebrities. The position of the 76 frontal facial landmarks are provided as well, but this dataset does not include the age information and human expert ratings were not collected since this dataset is composed mainly of well-known personages and, hence, likely to produce biased ratings.
The dataset consists of 26,580 images, portraying 2,284 individuals, classified for 8 age groups, gender and including subject labels (identity). It is unique in its construction: The sources of the images included in this set are Flickr albums, assembled by automatic upload from iPhone5 or later smartphone devices, and released by their authors to the general public under the Creative Commons (CC) license. This constitutes the largest, fully unconstrained collection of images for age, gender and subject recognition.
Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we developed an approach to building face datasets that detects faces in images returned from searches for public figures on the Internet, followed by automatically discarding those not belonging to each queried person. The FaceScrub dataset was created using this approach, followed by manually checking and cleaning the results. It comprises a total of 107,818 face images of 530 celebrities, with about 200 images per person. As such, it is one of the largest public face databases.
Frontalization is the process of synthesizing frontal facing views of faces appearing in single unconstrained photos. Recent reports have suggested that this process may substantially boost the performance of face recognition systems. This, by transforming the challenging problem of recognizing faces viewed from unconstrained viewpoints to the easier problem of recognizing faces in constrained, forward facing poses. Authors provide frontalized versions of both the widely used Labeled Faces in the Wild set (LFW) for face identity verification and the Adience collection for age and gender classification. These sets, (LFW3D and Adience3D) are made available along with our implementation of the method used for the frontalization.
Indian Movie Face database (IMFDB) is a large unconstrained face database consisting of 34512 images of 100 Indian actors collected from more than 100 videos. All the images are manually selected and cropped from the video frames resulting in a high degree of variability interms of scale, pose, expression, illumination, age, resolution, occlusion, and makeup. IMFDB is the first face database that provides a detailed annotation of every image in terms of age, pose, gender, expression and type of occlusion that may help other face related applications.
The goal of this project is to mine facial images and other important information for the Wikipedia Living People category. Currently, there are over 0.5 million biographic entries, and the number continues to grow. Unlike other data sets, such as the Labeled Faces in the Wild () and , the Labeled Wikipedia Faces (LWF) comes from Wikipedia, which is a creative common resource. In addition to these faces, useful meta data are released: the source images, image captions (if available), and person name detection results (through a named entity detector). So, mining experiments can also be performed. This is an unique property of this benchmark compared to others. The Labeled Wikipedia Faces (LWF) is a dataset with 8.5k faces for about 1.5k identities.
It is a database of 10,168 natural face photographs of all different individuals, and major celebrities removed. This database was made by randomly sampling Google Images for randomly generated names based on name distributions in the 1990 US Census. Because of this methodology, the distribution of the faces matches the demographic distribution of the US (e.g., age, race, gender). The database also has a wide range of faces in terms of attractiveness and emotion. Ovals surround each face to eliminate any background effects. Additionally, for a random set of 2,222 of the faces, we have demographic information, attribute scores (attractiveness, distinctiveness, perceived personality, etc), and memorability scores included with the images, to help researchers create their own stimulus sets.
Denver Intensity of Spontaneous Facial Action (DISFA) Database is a non-posed facial expression database for those who are interested in developing computer algorithms for automatic action unit detection and their intensities described by FACS. This database contains stereo videos of 27 adult subjects (12 females and 15 males) with different ethnicities. The images were acquired using PtGrey stereo imaging system at high resolution (1024×768). The intensity of AU’s (0-5 scale) for all video frames were manually scored by two human FACS experts. The database also includes 66 facial landmark points of each image in the database.
BU-3DFE (Binghamton University 3D Facial Expression) includes 100 subjects with 2,500 facial expression models. The BU-3DFE database is available to the research community (e.g., areas of interest come from as diverse as affective computing, computer vision, human computer interaction, security, biomedicine, law-enforcement, and psychology). The database contains 100 subjects (56% female, 44% male), ranging age from 18 years to 70 years old, with a variety of ethnic/racial ancestries, including White, Black, East-Asian, Middle-east Asian, Indian, and Hispanic Latino.
To analyze the facial behavior from a static 3D space to a dynamic 3D space, BU-3DFE Database is extended and a new database is formed: BU-4DFE (3D + time): A 3D Dynamic Facial Expression Database. A newly created high-resolution 3D dynamic facial expression database are presented, which is made available to the scientific research community. The 3D facial expressions are captured at a video rate (25 frames per second). For each subject, there are six model sequences showing six prototypic facial expressions (anger, disgust, happiness, fear, sadness, and surprise), respectively. Each expression sequence contains about 100 frames. The database contains 606 3D facial expression sequences captured from 101 subjects, with a total of approximately 60,600 frame models. Each 3D model of a 3D video sequence has the resolution of approximately 35,000 vertices. The texture video has a resolution of about 1040×1329 pixels per frame. The resulting database consists of 58 female and 43 male subjects, with a variety of ethnic/racial ancestries, including Asian, Black, Hispanic/Latino, and White.
Because posed and un-posed (aka “spontaneous”) 3D facial expressions differ along several dimensions including complexity and timing, well-annotated 3D video of un-posed facial behavior is needed. Therefore, newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults is introduced - BP4D-Spontanous: Binghamton-Pittsburgh 3D Dynamic Spontaneous Facial Expression Database. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action. The database includes 41 participants (23 women, 18 men). They were 18-29 years of age; 11 were Asian, 6 were African-American, 4 were Hispanic, and 20 were Euro-American. An emotion elicitation protocol was designed to elicit emotions of participants effectively. Eight tasks were covered with an interview process and a series of activities to elicit eight emotions. The database is structured by participants. Each participant is associated with 8 tasks. For each task, there are both 3D and 2D videos. As well, the metadata include manually annotated action units (FACS AU), automatically tracked head pose and 2D/3D facial landmarks. The database is in the size of about 2.6 TB (without compression).
The DMCSv1 database is a multimodal biometric database (DMCS stands for Department of Microelectronics and Computer Science and v1 indicates version number of the database). The database contains 3D face and hand scans. It was acquired using the structured light technology. According to our knowledge it is the first publicly available database where both sides of a hand were captured within one scan.
Although there is a large amount of research examining the perception of emotional facial expressions, almost all of this research has focused on the perception of adult facial expressions. There are several excellent stimulus sets of adult facial expressions that can be easily obtained and used in scientific research (i.e., NimStim, Ekman faces). However, there is no complete stimulus set of child affective facial expressions, and thus research on the perception of children making affective facial expression is sparse. In order to fully understand how humans respond to and process affective facial expressions, it is important to have this understanding across a variety of means. The Child Affective Facial Expressions Set (CAFE) is the first attempt to create a large and representative set of children making a variety of affective facial expressions that can be used for scientific research in this area. The set is made up of 1200 photographs of over 100 child models (ages 2-8) making 7 different facial expressions - happy, angry, sad, fearful, surprise, neutral, and disgust.
Unconstrained Facial Images (UFI) is a novel real-world database that contains images extracted from real photographs acquired by reporters of the Czech News Agency (ČTK). It is mainly intended to be used for benchmarking of the face identification methods, however it is possible to use this corpus in many related tasks (e.g. face detection, verification, etc.). Two different partitions of the database are available. The first one contains the cropped faces that were automatically extracted from the photographs using the Viola-Jones algorithm. The face size is thus almost uniform and the images contain just a small portion of background. The images in the second partition have more background, the face size also significantly differs and the faces are not localized. The purpose of this set is to evaluate and compare complete face recognition systems where the face detection and extraction is included. Each photograph is annotated with the name of a person.
This database contains IRTT (Institute of Road and Transport Technology) students of both colour and gray scale facial images. There are 317 facial images for 13 IRTT students. They are of same age factor around 23 to 24 years. The images along with background are captured by canon digital camera of 14.1 megapixels resolution. The actual size of cropped faces 550x780 and they are further resized to downscale factor 5. Out of 13, 12 male and one female. Each subject have variety of face expressions, little makeup, scarf, poses and hat also.
The database version 1.2 contains IRTT students of both colour and gray scale faces. There are 100 facial images for 10 IRTT girl students (all are female) with 10 faces per subject with age factor around 23 to 24 years. The colour images along with background are captured with a pixel resolution of 480x640 and their faces are cropped to 100x100 pixels.
This IRTT student video database contains one video in .mp4 format. Later more videos will be included in this database. The video duration is 55.938 seconds and contains 30 frames with resolution of 720x1280. This video is captured by smart phone. The faces and other features like eyes, lips and nose are extracted from this video separately.
Virginia Tech - Arab Academy for Science & Technology (VT-AAST) Bench-marking Dataset is a color face image database for benchmarking of automatic face detection algorithms and human skin segmentation techniques. It is named the VT-AAST image database, and is divided into four parts. Part one is a set of 286 color photographs that include a total of 1027 faces in the original format given by our digital cameras, offering a wide range of difference in orientation, pose, environment, illumination, facial expression and race. Part two contains the same set in a different file format. The third part is a set of corresponding image files that contain human colored skin regions resulting from a manual segmentation procedure. The fourth part of the database has the same regions converted into grayscale. The database is available on-line for noncommercial use.
The database is designed for providing high-quality HD multi-subject banchmarked video inputs for face recognition algorithms. The database is a useful input for offline as well as online (Real-Time) Video scenarios. The database has primarily three classified subjects (3) and two not-classified (unknown/un-labeled) subjects available in 30 fps - High Definition Video (Full HD - 1080p) video developed at School of Engineering & Applied Science, Ahmedabad University in 2016.
The IIIT-CFW is database for the cartoon faces in the wild. It is harvested from Google image search. Query words such as Obama + cartoon, Modi + cartoon, and so on were used to collect cartoon images of 100 public figures. The dataset contains 8928 annotated cartoon faces of famous personalities of the world with varying profession. Additionally, we also provide 1000 real faces of the public figure to study cross modal retrieval tasks, such as, Photo2Cartoon retrieval. The IIIT-CFW can be used for the study spectrum of problems, such as, face synthesis, heterogeneous face recognition, cross modal retrieval, etc. (Please use this database only for the academic research purpose)
Facial Expression Research Group Database (FERG-DB) is a database of stylized characters with annotated facial expressions. The database contains multiple face images of six stylized characters. The characters were modelled using the MAYA software and rendered out in 2D to create the images. The database contains facial expression images of six stylized characters. The images for each character is grouped into seven types of expressions - anger, disgust, fear, joy, neutral, sadness and surprise.
Large Age-Gap (LAG) dataset is a dataset containing variations of age in the wild, with images ranging from child/young to adult/old. The dataset contains 3,828 images of 1,010 celebrities. For each identity at least one child/young image and one adult/old image are present.
The SoF dataset is a collection of 42,592 (2,662×16) images for 112 persons (66 males and 46 females) who wear glasses under different illumination conditions. The dataset is FREE for reasonable academic fair use. The dataset presents a new challenge regarding face detection and recognition. It is devoted to two problems that affect face detection, recognition, and classification, which are harsh illumination environments and face occlusions. The glasses are the common natural occlusion in all images of the dataset. However, the glasses are not the sole facial occlusion in the dataset; there are two synthetic occlusions (nose and mouth) added to each image. Moreover, three image filters, that may evade face detectors and facial recognition systems, were applied to each image. All generated images are categorized into three levels of difficulty (easy, medium, and hard). That enlarges the number of images to be 42,592 images (26,112 male images and 16,480 female images). Furthermore, the dataset comes with a metadata that describes each subject from different aspects. The original images (without filters or synthetic occlusions) were captured in different countries over a long period. // Usage: 1 - Gender classification; 2 - Face detection; 3 - Facial landmark estimation; 4 - Emotion Recognition; 5 - Eyeglasses detection; 6 - Age classification.
The CyberExtruder Ultimate Face Matching Data Set contains 10,205 images of 1000 people scraped from the internet. The data set is unrestricted, as such, it contains large pose, lighting, expression, race and age variation. It also contains images which are artistic impressions (drawings, paintings etc.) and a multitude of occlusion (hats, glasses, makeup). All images have size 600 x 600 pixels and are stored with jpeg compression.
The IST-EURECOM Light Field Face Database includes data from 100 subjects, captured by a Lytro ILLUM camera in two 1-6 months separated sessions, with 20 samples per each person per session. To simulate multiple scenarios, the images are captured with several facial variations, covering a range of emotions, actions, poses, illuminations, and occlusions. The database includes the raw light field images, 2D rendered images and associated depth maps, along with a rich set of metadata. The first part of the database, captured at Instituto de Telecomunicações - Instituto Superior Técnico, Lisbon, Portugal can be accessed at http://www.img.lx.it.pt/LFFD/. The second part, captured at EURECOM, SophiaTech Campus, Nice, France can be accessed at
The Makeup Induced Face Spoofing (MIFS) dataset consists of 107 makeup-transformations taken from random YouTube makeup video tutorials. Each subject is attempting to spoof a target identity. Hence this dataset consists of three sets of face images: images of a subject before makeup; images of the same subject after makeup with the intention of spoofing; and images of the target subject who is being spoofed.
The RAVDESS is a validated multimodal database of emotional speech and song. The database is gender balanced consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity, with an additional neutral expression. All conditions are available in face-and-voice, face-only, and voice-only formats. The set of 7356 recordings were rated by 319 adult participants. High levels of emotional validity and test-retest intrarater reliability were reported, as described in our . All recordings are made freely available under a Creative Commons license, non-commerical license.
Face recognition research community has prepared several large-scale datasets captured in uncontrolled scenarios for performing face recognition. However, none of these focus on the specific challenge of face recognition under the disguise covariate. The Disguised Faces in the Wild (DFW) dataset has been prepared in order to address these limitations. The proposed DFW dataset consists of 11,157 images of 1,000 subjects. The dataset contains a broad set of unconstrained disguised faces, taken from the Internet. The dataset encompasses several disguise variations with respect to hairstyles, beard, mustache, glasses, make-up, caps, hats, turbans, veils, masquerades and ball masks. This is coupled with other variations with respect to pose, lighting, expression, background, ethnicity, age, gender, clothing, hairstyles, and camera quality, thereby making the dataset challenging for the task of face recognition. The paper describing the database and the protocols is available .