personal-repository-space usage

ideas to encourage productive use of our new personal-repository-space

February 24, 2011

2023-October update: Due to the advent of easy-to-use network storage, and heavy usage of the Brown Digital Repository for digital-collections, the personal-upload feature, though still available, gets very little use.

At the 2011 Personal Digital Archiving conference, held at the Internet Archive in San Francisco on February 24 & 25, I presented an idea to help ensure the Brown Digital Repository's newly-released personal-upload-space is used well.

Notes on the conference are here and here.
The video is here.
Below are my notes for the talk.

Setup

Phone ringer off.
Phone backlight permanently on.
If computer, energy-saver longer than 15 minutes.
Backup:
- USB stick with presentation in pdf.
- Printout of talk.

Context

Hi. I'm Birkin Diana, a programmer for the Brown University Library.

This is a talk about how our Library repository team aims to promote productive usage of personal repository space that the Library is offering to students and faculty.

First, some context...

The Brown Digital Repository (the 'BDR'), developed by the Library, has multiple campus roles. It is

an institutional repository,
a platform for Library and departmental digital projects,
a personal digital archive for students and faculty.

It's this last piece -- the personal digital archive of the BDR -- that I want to focus on.

Uploader intent & concerns

We recently built the personal-upload component of the BDR, and plan to roll it out this semester. This will enable students and faculty to upload, describe, and manage levels of access to uploaded items. In an ideal world, students and faculty would store in their personal BDR space materials both personally important, and of academic relevance -- this is the 'productive usage' I mentioned previously.

Ignoring grey areas, and to give a simple usage example, we hope students will find this a useful space to store, for example, drafts or final copies of class-projects, and faculty members will find this a useful space to store, for example, in-progress or completed class presentations -- as opposed to these groups using their personal BDR-space to store, say, unedited personal vacation video.

So intended use is a concern. A related, but different concern crystallized after hearing a terrific keynote Cathy Marshall gave at the 2010 code4lib conference, in which she noted that, for far too many of us, the most common strategy for managing our personal digital assets is 'benign neglect' -- because it's time-consuming to organize and curate digital materials, it's easy to let stuff just build up. Because we've made it easy to deposit items, there is a real risk that the personal BDR-space could become a 'digital junk drawer' -- a mix of undifferentiated materials, some important, some disposable, in which items of value are difficult to recognize.

Working assumption

Our strategy for dealing with these concerns is based on a key working assumption...

If repository tools can

help users easily navigate their content -- and
help them assess its usefulness -- and
offer scholarly resources related to their content...

...the items in their personal BDR space will be more likely to constitute a useful personal archive which is more likely to mesh with the University's academic mission.

Metadata

This assumption itself assumes good metadata.

Quality item-metadata makes each of these goals -- navigation, utility-assessment, and related-resources discovery -- much more achievable. In the academic Library world, we talk endlessly about metadata -- but not everyone does, so to clarify the term, the 'metadata' about a digital photograph, to pick an example, might include the name of the photographer, the photo's title, its geo-location, and information about the camera -- as opposed to the photo data itself: numerical representations of color information.

Academic Libraries for a period of time built digital repositories requiring general users to enter extensive metadata for items -- long forms had arcane labels and required valid input -- only to see sparse use. "We built it; they didn't come." was a common refrain. Thus our decision to require only minimal metadata for personal uploads. Along the lines of Flickr, we require only a title & tag -- we're consciously trading away the guarantee of quality metadata for ease-of-submission (as an aside, we do offer the ability to add more extensive metadata) -- despite the fact that good metadata is central to achieving our goals.

Improving metadata

So, how to enrich the metadata for these personal items? A couple of approaches.

The first is automated behind-the-scenes data-mining to extract useful information from items or from other sources -- without the user having to do anything at all. One consumer-level example of this is Apple's 'Places' feature in iPhoto, which extracts geo-location data from photographs, and offers useful visuals based on that data. A good commercial example is Amazon's "You may also be interested in this item" feature, which uses background processes mining other users' purchases to display useful information.

A second approach is also founded on automated background processes, but with an explicit assumption that OPTIONAL user-interactivity can improve the results. Apple's 'Faces' feature in iPhoto is an example of this. Its pattern-recognition software 'recognizes' faces it thinks belong to the same individual, and passively offers the user the ability to easily correct its results, improving future pattern-recognition results. This paradigm of optional user input for better results is also found in TiVo's thumbs-up/thumbs-down buttons, the use of which improves TiVo's suggestions.

The experiment

The above two approaches to enriching metadata are obvious candidates to put in place. But we're particularly curious about a third approach, which is the focus of an experiment we plan to conduct. It, too, assumes the automated behind-the-scenes processes just discussed. But user-interaction is more central. We want to test the hypothesis that people will be willing to spend nano-blocks of time organizing and describing their materials. How?

Specifically, 'occasionally' -- only 'occasionally' -- when using the BDR, users will be asked a single question which will take no more than 5 seconds to answer -- and with dismissal always an option.

The questions will focus on three axes...

Information about the item
Information about the importance of the item
Information about relationships between items

Regarding information about the item, a question might ask, "Might any of these three tags be applicable?". These possible tags could be derived from the "first-approach" data-mining -- of, say, text extracted from doc and pdf files and run through a subject-extractor -- or from running user-supplied tags against authoritative subject APIs.
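As a rough illustration of where such candidate tags might come from, here is a minimal Python sketch. It assumes text has already been extracted from the doc or pdf file, and it stands in for a real subject-extractor with a simple word-frequency count. The function name `suggest_tags`, the stopword list, and the length cutoff are all hypothetical choices for illustration, not part of the BDR.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real subject-extractor
# would use a proper vocabulary or an authoritative subject API.
STOPWORDS = {"that", "this", "with", "from", "were", "have", "been", "their"}


def suggest_tags(text, existing_tags=(), count=3):
    """Return up to `count` candidate tags from extracted text.

    Candidates are the most frequent words longer than three letters that
    are not stopwords and are not already among the user's tags.
    """
    words = re.findall(r"[a-z]+", text.lower())
    existing = {tag.lower() for tag in existing_tags}
    frequencies = Counter(
        word
        for word in words
        if len(word) > 3 and word not in STOPWORDS and word not in existing
    )
    return [word for word, _ in frequencies.most_common(count)]


sample_text = "Egypt pyramids Egypt pharaoh pyramids Egypt the tomb"
candidates = suggest_tags(sample_text, existing_tags=["Egypt"])
```

The three candidates would then be shown in the "Might any of these three tags be applicable?" question, with the user free to accept some, none, or dismiss the question.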

Regarding information about the importance of the item, importance can depend on multiple factors. A simple direct question could be to ask for a low/medium/high ranking. A question to elicit temporal importance might be: "If you had to guess, how long do you envision this item being important to you?" If a student or professor is just working on a draft, the utility time-frame might be less than if the item is a final copy.

Regarding information about the relationships between items, if direct or derived metadata indicates a possible relationship between some items, and they're not already grouped via our 'collections' feature, a question might show a bit of information about two items, and ask "Are these two items related? If so, please optionally add a tag or two to describe the relationship."
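One simple way to find pairs worth asking about is to compare the tags of items that are not already in the same collection. The sketch below is only an assumption about how this could work: it uses Jaccard overlap of tag sets, and the names `tag_overlap` and `candidate_pairs`, the dictionary shapes, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def tag_overlap(tags_a, tags_b):
    """Jaccard similarity of two tag collections (0.0 when either is empty)."""
    a, b = set(tags_a), set(tags_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(item_tags, item_collection, threshold=0.5):
    """Return pairs of item ids that look related but are not yet grouped.

    item_tags: dict mapping item id -> iterable of tags
    item_collection: dict mapping item id -> collection name (missing if none)
    """
    pairs = []
    for id_a, id_b in combinations(sorted(item_tags), 2):
        collection_a = item_collection.get(id_a)
        if collection_a is not None and collection_a == item_collection.get(id_b):
            continue  # already grouped via a collection, so no need to ask
        if tag_overlap(item_tags[id_a], item_tags[id_b]) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


items = {
    "draft-1": {"egypt", "history"},
    "final-1": {"egypt", "history", "thesis"},
    "video-7": {"vacation"},
}
pairs_to_ask_about = candidate_pairs(items, item_collection={})
```

Each resulting pair would feed one "Are these two items related?" question; the user's answer, not the overlap score, decides whether a relationship is recorded.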

Feedback loop

Our plan is to highlight for users -- in a sidebar and an informational page:

how questions they've answered have enhanced their ability to navigate their personal repository. For example, via additional tags, and via possible collections which enable better browsing and searching,
how questions they've answered have improved their ability to determine the utility of their items. For example, that sidebar could display a section labeled "Do I still need these?" -- and offer a toggle to view their items color-coded or sorted by importance.
how questions they've answered have resulted in better, more relevant academic resources being displayed. For example, that sidebar could display a section labeled 'Relevant Journal Articles' -- and show that the journal articles are now for the topic 'ancient Egypt' instead of, previously, just 'Egypt'.

Experiment assessment

Finally, we plan to collect data to assess how users feel about this interactive process. Every question would have two additional options:

A "thumbs-up / thumbs-down" toggle indicating:
- "I see how you're trying to be helpful, but please ask these questions less often."
- or: "I see how these questions are useful; please ask them a bit more often."
An "I'm ignoring this" button -- essentially, a cancel button -- the 'dismissal' option I mentioned earlier.

From this, we can get a sense of users' experience of this interactive occasional process for enhancing their metadata -- as well as update the weighted randomization algorithms controlling the frequency and types of questions asked.
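To make the idea of feedback-driven weighted randomization concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the class name `QuestionScheduler`, the three question-type labels, the starting probability, and the adjustment factors are hypothetical, and the actual algorithm may differ considerably.

```python
import random


class QuestionScheduler:
    """Decides whether to ask a question, and which type, using weights."""

    def __init__(self, question_types, ask_probability=0.1, rng=None):
        self.weights = {question_type: 1.0 for question_type in question_types}
        self.ask_probability = ask_probability  # chance of asking on a visit
        self.rng = rng or random.Random()

    def next_question(self):
        """Return a question type, or None when no question is asked."""
        if self.rng.random() >= self.ask_probability:
            return None  # questions are only asked 'occasionally'
        types = list(self.weights)
        type_weights = [self.weights[t] for t in types]
        return self.rng.choices(types, weights=type_weights, k=1)[0]

    def record_feedback(self, question_type, feedback):
        """Adjust frequency and type weights from 'more', 'less', or 'ignore'."""
        if feedback == "more":
            self.weights[question_type] *= 1.25
            self.ask_probability = min(0.5, self.ask_probability * 1.1)
        elif feedback == "less":
            self.weights[question_type] *= 0.8
            self.ask_probability = max(0.01, self.ask_probability * 0.9)
        elif feedback == "ignore":
            # A dismissal is a weaker signal than an explicit thumbs-down.
            self.weights[question_type] *= 0.95


scheduler = QuestionScheduler(["item", "importance", "relationship"])
scheduler.record_feedback("item", "less")
```

In this sketch a thumbs-down both lowers the weight of that question type and reduces how often any question is asked, while the floor of 0.01 keeps the experiment from silently switching itself off.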

Conclusion

To end...

We're looking into whether any university repositories have implemented the obviously useful automated data-mining background-processes that have become commonplace in the consumer-commercial realm.

And outside of simple targeted educational 'flashcard' applications -- or annoying surveys -- we're not aware of much use at all of the required-interaction approach described.

We welcome feedback, and references to pre-existing work in these areas.

Thank you.