Voyant Software for Researchers

The Voyant software package is a powerful program that allows researchers to find common terms, and the relationships among them, in a large “corpus” (a collection of individual text files).  It can analyze word usage by document and across the corpus, revealing patterns and relationships that might not be readily apparent to a researcher who is considering using a collection.  Its main outputs are word clouds, which show the prevalence and relationship of key terms, and graphs, which show how frequently a term or terms are used across the corpus.  Searches can be narrowed to an individual file or a group of files.  Another useful feature is that individual words can be displayed in context in each document.
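To make the idea of counting terms and showing them in context more concrete, here is a minimal Python sketch of the same kind of analysis. It is not Voyant's own code and does not use Voyant in any way; the two short documents, the file names, and the function names are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(corpus):
    """Return per-document word counts and the combined corpus count."""
    per_doc = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    total = Counter()
    for counts in per_doc.values():
        total.update(counts)
    return per_doc, total


def keyword_in_context(text, keyword, width=3):
    """Return each occurrence of keyword with `width` words on either side."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append(" ".join(left + [tok.upper()] + right))
    return hits


# Hypothetical two-document corpus (invented text, not from any real collection)
corpus = {
    "doc_a.txt": "The river was high. We crossed the river at night.",
    "doc_b.txt": "The farm was near the river and the road.",
}

per_doc, total = term_frequencies(corpus)
print(total.most_common(3))                    # most frequent words overall
print(per_doc["doc_b.txt"]["river"])           # count within one document
print(keyword_in_context(corpus["doc_a.txt"], "river", width=2))
```

The corpus-wide count is roughly what drives a word cloud, the per-document counts are what a frequency graph plots, and the last function imitates the words-in-context display. A real tool would also remove common “stop words” such as “the.”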

I used the corpus of text files from the WPA Slave Narratives found in Project Gutenberg to analyze both the entire corpus and selected state collections.  I created word clouds around key terms to see how they were related to other terms in a specific collection and across the entire corpus.  I also examined individual documents to see key terms used in context, and I graphed the prevalence of terms across the corpus and within individual collections.  All of this helped me better understand some of the limitations of how these transcriptions were made, and let me compare the work of different transcribers in different regions.

While the Voyant software was relatively easy to use, it worked much better after I downloaded it to my computer rather than just using the online version.   Although my background is not in graphing, I did find the information helpful.


Metadata Review

This is a review of the metadata for an item in the New York Public Library’s digital collections:

What features of the item does the metadata describe?

  • Its title: Radio Corporation of America (RCA) – Bud Abbott (in costume as Father Time), Lou Costello (in costume as baby), and Grover Whalen with Bulova Talking Clock
  • Names: New York World’s Fair (1939-1940 : New York, N.Y.) (Creator)
  • Its location:  NY World’s Fair Collection, 1939-40
  • Dates/Origin: When it was created (within a decade; i.e. 1935-1945)
  • Library Locations: Archives and Manuscript Division, along with physical shelf locator
  • Topics: Three overall broad topics: Exhibits; NY World’s Fair; Radio Corporation of America
  • Type of resource: Still image
  • Languages: English
  • Identifiers: NYPL catalog number; MSS unit ID; Universal Unique ID (UUID)
  • Rights statement: Copyright undetermined
  • Item Timeline of Events: When created (approx.), when digitized, when “found by you”

What features does it not describe?

  • Birth/death dates of people portrayed in the image
  • Biographical identifiers for the people in the image (although some information is provided on the scan of the reverse of the image)
  • Name (Creator) attributes the image to the World’s Fair but not to its photographer or source

What questions does the metadata allow you to ask?

  • You can search by:
    • Name of individuals in the image
    • Physical items in the image (A search by “clock radio” produced this image; although not a “direct hit,” it was of interest in the history of “talking radios”)
    • Type of media
    • When image was created (broadly within a decade)
    • Location
    • Three types of catalog numbers (NYPL; NYPL’s MS #; UUID)
    • Rights
    • Topics (although somewhat limited; search by “film comedians” did not produce this image)

What questions does it not allow you to ask?

  • Searching was limited to 3 broad topics.


Database Review: ArtStor


  • Description: ArtStor was originally founded by the Andrew W. Mellon Foundation to support the digitization of art objects for education and research.  It has grown considerably to encompass “approximately 300 collections composed of over 2.5 million images (and growing).”  It is primarily used by teachers as a source of images for classroom and online pedagogy, as well as by scholars for research activities.  An annual membership fee is charged to a host institution to allow its faculty to access the site; most images are limited to classroom or pedagogic use.  The site does have a collection of open-access images that can be used freely.  A not-for-profit organization, ArtStor is now a part of ITHAKA, which also operates JSTOR.
  • Overview: Search Options/Information on Digitization:  ArtStor can be searched by collection or source of the imagery (at GMU, this includes GMU collections), by personal and public collections, and by “groups,” which are divided into “institutional” (indicating particular art classes taught at GMU) and “ArtStor curated,” which includes usage categories like “AP Art History” as well as broad subject areas (“Paintings”).  You can also search by broad categories of types of art, or “Classifications” (“Architecture and City Planning”; “Film, Audio, Video, and Digital Art”); by “Geography” (countries and regions); and by “Teaching Resources,” a grab bag of images selected around common teaching focuses, including individual artists, types of art, etc.

ArtStor also supports search by Keyword; Creative Commons Zero License; Public Collections (collections that are open for free usage); File Type; and Wildcards and Punctuation.

ArtStor provides only general information on the digitization of its contents, saying it provides “high-quality images for education and research.”  It notes that it provides “high-quality metadata from the collection catalogers, curators, institutions, and artists themselves.”  However, the term “high quality” is not defined in concrete terms for the overall collection.

  • Facts
    • Date range:  No date range is specified on the site, but digitized items range from prehistory to today.
    • Publisher: ITHAKA, a not-for-profit organization helping the academic community use digital technologies to preserve the scholarly record and to advance research and teaching in sustainable ways.
    • Publisher About page:
    • Object type: Digital files with metadata
    • Location of original materials:
    • Exportable image?: Yes, within the terms of usage that limit to use for educational purposes within the host institution.
    • Facsimile image?: Yes.
    • Full text searchable?: No.
    • Titles list links: Yes.
  • History / Provenance:
    • Original catalog?:  ArtStor was originated by the Mellon Foundation as a means of digitizing art images for use in arts education.  It was born as an online database.
    • Digitized from microfilm?:  Individual images come from various collections, and each may have been digitized from various sources, including microfilm.
    • Original sources:  See above; the sources digitized depend on the provenance of each collection.
  • Reviews: Although several articles discussing ArtStor and its features were located, most were 10 years old or older.  Because of the changing nature of both its features and the growth of its collections, these reviews have their limitations but still raise important issues.  The following are listed in reverse chronological order.
    • Slobuski, Teresa. Digital Image Databases: A Study from the Undergraduate Point of View. Art Documentation, Vol. 30, No. 2 (Fall 2011): 49-55.  Author compares ArtStor with the commercial art database Bridgeman Images.  While the reviewer found that ArtStor had more images, Bridgeman had better search capabilities for the content of images and often turned up more relevant images on the first results page.  Also critiqued was ArtStor’s “hands off” approach to metadata, which meant that the tagging of search terms varied greatly among its many collections.
    • Schroeder, Eunice. ARTstor Review.  Notes, Second Series, Vol. 65, No. 3 (Mar., 2009): 546-549.  Reviewer focuses on the musicological-related images in the database; criticizes the difficulty of using the site’s search interface due to inconsistent data provided by each collection, as well as the variation in the quality and resolution of scans; but praises its ease of use.
    • Caviness MH. Reproducing Works of Art Held in Museums: Who Pays, Who Profits? Diogenes. 2006;53(3):45-52. doi:10.1177/0392192106069011:  Caviness discusses the need for scholars to have free access to reproductions of artworks.  Caviness approves of ArtStor’s promotion of the idea that museums profit from scholars’ use of their images, although she does note that the service charges universities a usage fee by year.
    • Parsons, Sarah. Beyond the Slide Library?: Facing the Digital Future of Art History. RACAR: revue d’art canadienne / Canadian Art Review, Vol. 30, No. 1/2 (2005): 114-25.  This reviewer focuses on art professors’ initial resistance to switching from traditional slide libraries to online sources, including concerns over the quality, accessibility, and range of available images.  Author generally praises ArtStor, particularly the ability to download images for class usage, and finds that its breadth of material and teaching tools are useful.  Sustainability and non-commercial status of the resource were also praised.
  • Access:  The site’s terms and conditions can be found here: An annual subscription fee is required to access the full collection through a school or institution.
  • Info from Publisher:
  • Citing: ©2000-2020 ITHAKA. All Rights Reserved. JSTOR®, Artstor®, and ITHAKA® are registered trademarks of ITHAKA.

The Perils of Digitization

  • What can you capture, and not capture, when you digitize something?

Digitization currently is limited to capturing the appearance of an item (through a scan or photograph), its sound (through an audio file), or its motion (through a video file).  You are not able to capture the smell or texture of an item.  And no matter what format you use, the digital version is not an exact reproduction of an item, even when a photograph is scanned or an analog sound recording is digitized.  There are “losses” that may occur depending on the software used, and the user may also manipulate the image or audio file in ways that change it from its original state.

  • Which forms of digitization make the most sense for different types of items?

Texts are the most easily digitized, although you do lose the physical presence of the original object.  Photographs or 2-D objects can be scanned, with the caveat that the new scan is not the same as the original.  Objects that have a particular physical presence–a smell or texture as two examples–are less easily transferred to a digital representation.

  • To what extent does working with digitized representations impact how we understand different kinds of items, and/or our ability to use them for different purposes?

This depends on what kind of study you are making.  If you are analyzing text, you probably aren’t losing much by not having the physical object available to you.  However, if you are analyzing an audio recording, artwork, or photograph, you are losing key elements of the original item if you rely solely on its digital image.