Wikidata professional development – Wiki Education

Improving equity on Wikipedia using Wikidata

Will Kent — Thu, 31 Aug 2023 17:34:12 +0000

Do you ever wonder where Wikipedia articles come from? With a world of knowledge to represent, it’s a big question. At Wiki Education, we are especially concerned with Wikipedia being an equitable and representative resource. Whether it’s a museum of paintings, a library full of volumes of books, or an online encyclopedia, systemic bias is inherent in every collection and Wikipedia is not immune to it. So when we think about where Wikipedia articles come from, another question we must answer is: how do we ensure Wikipedia has the articles it needs to be a more representative resource?

With support from the Nielsen Foundation’s Data for Good grants program, we have been developing a free and open Wikipedia resource that encourages editors to create articles to improve representation of diverse groups and topics on Wikipedia. There are some amazing projects that are working to address this issue on Wikipedia that have been around for a few years — Women in Red, Art + Feminism, Black Lunch Table — to name a few. It’s our hope that this tool can complement the work of these projects.

For example, Women in Red uses Wikidata, a linked data knowledge base that connects all language versions of Wikipedia, to generate lists of articles that could exist in English Wikipedia, but don’t yet. Taking a page out of their book, we are creating a resource that allows community members to do the same thing, but with a broader set of demographic variables. In addition to individuals who identify as women, we have constructed pages that list thousands of potential articles based around sexual orientation, nationality, disability status, and ethnicity.

A screenshot of the Gender page from the Equity lists showing a list of individuals without English Wikipedia articles.

These lists query the other language versions of Wikipedia and pull only the results that don’t have English language articles. From there, community members can select individuals and generate English language versions of the articles. Since these articles exist in other language versions of Wikipedia, the idea is they already pass notability – a major requirement for articles to exist – and have references. The article writing process will still take time, but it saves some effort not starting from scratch. Check out our resource here.
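As a rough illustration of the kind of query that can sit behind lists like these, the sketch below builds a SPARQL string for the Wikidata Query Service. It selects people who match one demographic property and have sitelinks somewhere, but no article on a chosen Wikipedia. The function name, its parameters, and the use of the sitelink count as a stand-in for "exists in another language version" are assumptions made for this example; the actual queries behind the Equity lists may look quite different.

```python
def build_missing_article_query(property_id, value_id, target_wiki="en", limit=100):
    """Return a SPARQL query for humans matching one property/value pair
    that have sitelinks somewhere but no article on the target Wikipedia.

    Hypothetical helper for illustration; not the query used by the Equity lists.
    """
    target_url = f"https://{target_wiki}.wikipedia.org/"
    return f"""
SELECT ?item ?itemLabel ?sitelinks WHERE {{
  ?item wdt:P31 wd:Q5 ;                  # instance of: human
        wdt:{property_id} wd:{value_id} ;
        wikibase:sitelinks ?sitelinks .  # counts links to all Wikimedia projects
  FILTER(?sitelinks > 0)                 # rough proxy: described on some wiki
  FILTER NOT EXISTS {{                    # but no article on the target Wikipedia
    ?article schema:about ?item ;
             schema:isPartOf <{target_url}> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{target_wiki},en". }}
}}
ORDER BY DESC(?sitelinks)
LIMIT {limit}
""".strip()


# P21 is "sex or gender" and Q6581072 is "female" on Wikidata.
query = build_missing_article_query("P21", "Q6581072")
print(query)
```

Passing a different `target_wiki` (for example `"fr"`) points the same query at another language version, and swapping the property and value changes which group the list covers.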

I know what you’re thinking — can this get any cooler? And the answer is yes! Wiki Education has been developing and maintaining the Dashboard for the past few years. The Dashboard allows instructors and individuals to create courses that are scoped to a set of students/Wikipedians/edit-a-thon attendees, etc. – basically any set of individuals that want to participate in whatever the course is. Another feature is the ability to frame a course around a list of articles. Using the same query from our resource, anyone using a Dashboard can scope it to one of the lists we’ve developed. The idea here is to encourage Dashboard users to select articles about underrepresented groups or individuals and write them for English Wikipedia. Follow this link for an example of an article-scoped Dashboard. Heads up — clicking the PSID list will take some time to load because it is large.

A list of individuals generated from PetScan

And this, my friends, is one place where Wikipedia articles come from.

To review: we’re building a tool that encourages community members to write articles to increase the visibility of diverse groups and topics on Wikipedia. We’re doing this using Wikidata, queries, a list tool called Listeria, article scoping on the Dashboard, and the hard work of anyone taking a Dashboard course or attending an event that uses the Dashboard. Although systemic bias and underrepresentation will remain a significant problem on Wikipedia and beyond, we hope this tool can push new and old users alike to edit in a way that helps to improve representation on the platform. As the community and these tools mature, we also hope others can refine and adapt them to their specific needs. An amazing thing about pulling from Wikidata is users can narrow and expand queries to generate new lists. For example, these lists are configured to improve English Wikipedia, but in a snap they can point to other language versions.

We’re still tinkering and ironing out the wrinkles, but we hope to have this up and running soon. Get ready to make some edits.

Thank you, Nielsen Foundation, for helping us leverage Wikidata for good!

Will Kent — Tue, 31 Jan 2023 22:09:39 +0000

At Wiki Education, we spend a lot of time working to make Wikipedia and Wikidata more representative of the world we live in. Many of our courses focus on content gaps about historically marginalized communities, so that our programs and the greater Wikipedia editing community can systematically tackle them at scale. Unfortunately, there have been few tools to assist in addressing this issue at scale – until now. Thanks to the Nielsen Foundation’s generous support through their 2022 Data for Good grants program, we are designing a portal focused on equity that will identify representation gaps on Wikipedia and Wikidata, and allow us to use our courses to help close them.

The instant availability of knowledge on your personal devices has revolutionized how we learn about the world around us. When you ask Google about a topic or pose a question to a virtual assistant like Alexa, the answer you get will likely come from Wikidata. That makes the open data repository an essential resource that we must make sure reflects the fullness of human knowledge. Limited coverage on Wikipedia and Wikidata of historically excluded populations and notable women has not reflected their historical importance. One of the potential causes of these gaps is that the majority of Wikipedia’s editing community are white and male. Wiki Education is committed to addressing these opportunities for growth and expanding both the editing population and coverage of historically marginalized communities on Wikidata and beyond.

Currently, groups of Wikipedia editors surface content gaps on Wikipedia manually, often through online common spaces called WikiProjects. We are inspired by the massive success of Women in Red, a WikiProject focused on expanding and adding articles about women on Wikipedia. Thanks to dedicated volunteer editors, the number of biographies about women has increased from 15% of all Wikipedia biographies to 19% since October 2014. Considering that there are almost 2 million biographies today on the English Wikipedia, 4 percentage points is quite a jump. While more progress needs to be made, the project has helped add much-needed visibility and credibility to women’s accomplishments that will inspire generations of leaders.
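To put those figures in perspective, here is a small back-of-the-envelope calculation. It rounds "almost 2 million" up to exactly 2,000,000, which is an assumption made only to keep the arithmetic simple, so the results are approximate.

```python
share_2014 = 0.15               # share of biographies about women in October 2014
share_now = 0.19                # share today
total_biographies = 2_000_000   # "almost 2 million", rounded up for simplicity

# Difference between the two shares, in percentage points.
point_change = (share_now - share_2014) * 100

# Growth of the share relative to where it started.
relative_change = (share_now - share_2014) / share_2014 * 100

# Approximate number of biographies about women today.
women_biographies_now = share_now * total_biographies

print(f"{point_change:.0f} percentage points")
print(f"{relative_change:.1f}% relative increase in share")
print(f"about {women_biographies_now:,.0f} biographies about women today")
```

Under that rounding, the share grew by roughly 27 percent relative to its 2014 level, and it corresponds to about 380,000 biographies about women today.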

Using Wikidata in concert with Wikipedia provides a place to build a tool that can scale this important work further. Using Women in Red as a model, our online portal will allow the Wikipedia community to use information queried from Wikidata to tackle the gaps in knowledge in an organized way. Women in Red relies heavily on Wikidata queries to generate lists of women who do not yet have Wikipedia articles. With this approach, we will scope the queries to different demographics and create new lists of articles that do not exist on Wikipedia. We will leverage our portal to provide insights into the types of courses that we offer in our Scholars & Scientists Program.

We will also add this portal to the “Finding your article” training module on our Dashboard’s library of resources for student editors participating in our Wikipedia Student Program. This tool would guide students to edit Wikipedia articles that need the greatest amount of attention. We believe that the broad community who looks to Wiki Education for tools and resources will also benefit from this portal for their own initiatives and across languages.

Wiki Education’s new transformative portal will deepen the engagement of new and current program participants by empowering them to quickly assess the topics and communities most in need of improvement and representation on Wikipedia.

At the same time, we want to acknowledge that data about the personal identity of prominent figures is extremely sensitive and personal. We want everyone to know that in order for this kind of data to exist on Wikipedia, it must have a reliable source backing up that fact. It’s our hope that this portal will encourage better sourcing and error correction, and make it easier to identify inaccurate or potentially harmful data and keep it from winding up (and staying) on Wikidata and Wikipedia.

Throughout this year, I’ll be developing a working prototype of the online portal and gathering feedback from the Wikimedia community. I’ll use Wikidata to test the functionality of the portal and add demographic properties that can be selected by Wikipedia editors to identify gaps in coverage of historically marginalized communities. We’re excited to leverage this portal to improve Wikipedia’s coverage of underrepresented groups and help volunteers provide millions of readers with more equitable information.

The Future of Data: a community that grows together stays together

Cassidy Villeneuve — Thu, 22 Dec 2022 21:41:33 +0000

Wiki Education hosted webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our fourth event. Watch the webinar in full on YouTube. And access the recordings and recaps of the other three events here.

For our fourth and final webinar celebrating Wikidata’s birthday, Hilary Thorsen, Julian Chambliss, Kate Topham, and Justin Wigard each shared how they invite newcomers into the linked open data fold. What does Wikidata allow that other platforms don’t? What advice do they have for getting people started? And what do we mean when we say we’re building a “community of practice”?

From upper left: Kate Topham, Will Kent, Julian Chambliss, Justin Wigard, and Hilary Thorsen in our webinar.

Hilary got her start with Wikidata as Wikimedian-in-Residence for the Linked Data for Production Project. While there, she helped library colleagues advance their own projects and had fun answering their linked data questions. She decided to capture that expertise and disseminate it even more widely through the LD4 Wikidata affinity group and has been doing so since April 2019.

Justin is a Postdoctoral Research Fellow in the Distant Viewing Lab at University of Richmond, where he works and teaches courses on comics and popular culture. For Justin, Wikidata provides fruitful ways of thinking about community engagement and facilitating open data work in the humanities classroom. He’s thinking about ways we can connect the dots between the classroom, the academy, and Wikidata’s global community of users.

Kate is a Digital Archivist at Michigan State University (MSU). She specializes in metadata, data migration, and digital collections. Kate got into Wikidata as a form of data cleaning through her work in OpenRefine, which she utilizes so often she refers to it as her “software spouse.” She’s interested in using Wikidata for research and making things that are hidden more visible to everyday people.

Julian is a professor of English and the Val Berryman Curator of History at the MSU Museum. He leads the Department of English Graphic Possibilities Research Workshop, a group that brings comic studies faculty and graduate students together to contribute to Wikidata. Justin and Kate have contributed extensively to Graphic Possibilities–Justin as a recent PhD graduate of MSU and Kate in her archivist role. Together, the group is creating a data set from MSU’s library of comic art metadata collection to share with the world.

What Wikidata allows that other platforms don’t

In Wikidata, you can describe collection items in more depth and nuance, disrupting library authority and traditional modes of collaboration. The possibilities are endless when you can crowd-source corrections to your data and share results with an audience that spans the globe.

Julian, Justin, and Kate often consult Wikipedia to fill in missing data in their catalogs, which is how they found their way to Wikidata. They appreciated the abundance of information already in the repository, but also saw the gaps. Filling them was a worthwhile pursuit, not only for the project but for the many researchers that would come after them. “We began to think of Wikidata as a means of providing that information about comics that would really enhance peoples’ ability to write about them,” Julian said. Wikidata allows you to provide detail and nuance to an item in an unparalleled way. That freedom was an attractive feature for Hilary, too. “With Wikidata, the sky’s the limit,” she added. “And you can find anything that interests you and add it if it’s notable enough. I found that exciting.”

Julian is curious about how we can make nuances around culture more visible in a data record. Wikidata is useful in surfacing the omissions in a record, especially related to race and gender. “We can’t change the library record [to be more inclusive],” Julian noted. “But we can do something in Wikidata that has a substantive impact in peoples’ ability to understand what the record is showing, or what it doesn’t show. Questions of race and metadata are linked in a way that’s a challenge, but it’s something we have to wrestle with.”

Other Wikidatans can help. As the Graphic Possibilities team were combing their collection, they discovered some errors in their bibliographic data. “All our Marmaduke comics were attributed to the wrong person. And that same problem existed in a lot of places,” Kate said. “By bringing together this community in Wikidata we could figure out where the errors were and that community of knowledge and practice allowed for us to improve.”

“I love the way Wikidata disrupts library authority,” Kate continued. “We can incorporate different expertise and ways of seeing the world. The way we structure things is better because we can draw on so many different communities.”

Wikidata is a tool for examining topics in new, multidisciplinary ways. Justin invites his humanities students to create visualizations about comics in the platform, where they see the instant ramifications of their work. “They think about how their work extends beyond the classroom, beyond the gated silo of academia. And for me, I can connect with colleagues I didn’t know before. It’s not just linked data, it’s linked people.”

The value of the Wikidata community has been a through-thread across our Speakers Series. “Before, cataloging had been internal and focused only on what I was working on at my institution,” Hilary shared. “But with Wikidata it becomes so much easier to collaborate with people around the world and contribute to other projects and learn something new. It broadens the way you can contribute and it’s a more accessible practice too. You can start participating in linked data immediately, which before was really hard to do. Overall, the community is what drew me to Wikidata and what makes all the contributions so worthwhile and keeps me coming back.”

Wikidata as a “community of practice”

Wikidata provides a forum for anyone to participate in discussions around data integrity. With archives of past discussions, decisions are transparent and up for friendly debate. And Wikidatans share a deep interest in adapting until we get it right. As the Graphic Possibilities team said more than once, it’s a community of practice.

Given that the platform can be a little overwhelming at first, it’s important to give newbies different modes of entry and participation. “That’s more sustainable for the long run,” Hilary said of her work with LD4. “People don’t always have time to join every call or working hour, but because we have consistent programming, people know that if they miss a week they can join the next week.” Justin, who helps lead Wikidata edit-a-thons with Graphic Possibilities, noted that the platform was great for both synchronous and asynchronous work as the pandemic forced them to transition to remote work. “We had to try to find ways to reach folks who were not fluent in comics or Wikidata or might not be digital experts, but still wanted to be part of a community.”

When asked what was most helpful in building community around the Graphic Possibilities project, Kate thought of two things. “Hilary Thorsen and Will Kent,” Kate said with a smile. “There’s so much within Wikidata and Wikipedia that we joke about satisfies the need for nerds to correct each other. And I feel like both of you have provided a model for this very generous, opening space that makes working with linked data, and Wikidata in particular, a lot easier. This whole thing is a big conversation and we get to decide what the best way forward is.”

Advice for bringing others into linked data

The Graphic Possibilities team has successfully invited comics-interested scholars from across institutions to join them in edit-a-thons and build their own capacity around linked open data. Having scaffolded events with clear, narrowly defined goals is helpful in fostering this community of learning. “It’s easy to get lost in the weeds, so we set firm boundaries about what to work on, what to avoid, and we have really clear tutorials and troubleshoot issues,” Justin shared. “Wikidata can be overwhelming if you’re not prepared for it. Having that support is helpful. And recognizing that smaller goals can be just as effective as something lofty. We actually started to scale our projects back so we can achieve more with less.”

Preparation is also key. Keep events focused and small, but have a back-up plan for what to work on in case you finish early. And be prepared to let people pursue their interests. “Allowing for creativity within your scoped event can be powerful and fun,” Kate added.

The future of Wikidata

Wikidata has grown so much over the last 10 years – it just hit 100 million items this year. We only see it becoming more important to library curricula, job training, and the World Wide Web as a whole.

“It’s a necessary skill,” Hilary said. “Five years from now, you’ll want to have that on your resume.”

“Wikidata and other open source repositories are going to become increasingly necessary and relevant as other avenues of data become more monitored, privatized, siloed,” said Justin. “There’s something really powerful and amazing about Wikidata and the fact that it’s grown so much over 10 years. … I want to see more of that, more projects, in more classrooms. I want to see what other people do with it that I haven’t thought about.”

“Understanding data becomes a fundamental question of civil society,” Julian added. “I’m no Wikidata expert, but I do recognize the tremendous potential in Wikidata to support really interesting conversations. How does a data description actually translate to how society operates? How do we tell stories with data? Students at some level have been born consumers of technology but explaining how it works is a real problem for them. Data especially is particularly complicated for them. I’ve said, you know, these platforms aren’t actually free. The thing they’re selling is you. If you don’t have a sense of data literacy, you’re going to be in trouble. If you get a little sense of it, you begin to understand that data is intrinsically connected to your life.”

Check out LD4 here and Graphic Possibilities here.

If you’re the kind of learner who seeks community and guidance on your journey, the Wikidata Institute has three upcoming training courses starting in January, March, and May 2023.

Another successful Wikidata Project with UVA

Will Kent — Thu, 08 Dec 2022 18:55:04 +0000

We have wrapped up another round of excellent work with the University of Virginia (UVA) Data Science capstone project. Capstone work entails having students collaborate with community partners using data science methodology and some powerful computing to provide new insights about a dataset. This is Wiki Education’s second round working with a UVA capstone group and I’m excited to share their hard work with you. I want to acknowledge the hours of processing, analyzing, and making sense of Wikidata’s data that the UVA team – Quinton Mays, Antoine Edelman, and Olu Omosebi – did. They were an excellent team and I’m proud of their work.

This group started with a classic challenge on Wikidata: how do we know what we are describing (given a little data, can we guess what a thing/entity is?) and what properties do we use to describe any given thing? Phrased differently – how do we know how complete or incomplete something is? This is hard to answer for many reasons.

There are millions of different kinds of things in Wikidata (people, countries, organizations)
There are multiple ways to describe these things (how do you describe an organization?)
Even if you know what something is, how do you know what’s missing or what to add to it? (is this a complete description of an organization?)

Sounds tough, but I’ve got good news. Even if we know very little about an item, a little data science magic can predict a lot about what your mystery item may be.

In their paper, “Review of Knowledge Graph Embedding Models for Link Prediction on Wikidata Subsets,” this group analyzed different subsets of items on Wikidata (countries, people, bridges, and films to name a few). They ran several algorithms through these sets to sort them and make guesses as to what the items may be. They found that some worked better than others and recommend them for future use in prediction tools. This could have an impact on evaluating data quality, consistency, and item completeness, which are some essential metrics on Wikidata. So how did they do this?

Let’s take a look at those subsets they selected. From this list you can start to guess how Wikidata describes these things. Countries and bridges have locations. Humans must have a place of birth. Films almost always have a director and actors. Bridges must start somewhere and end somewhere. This set of descriptions used to describe something is known as a schema or shape (don’t think geometry – think a specific set of things used to explicitly define or describe something). Their research also takes into account these shapes and considers how these items relate to other items. Sticking with humans as the example, a specific person has a two-way relationship with their parents. A date of birth would be a one-way relationship. And a teacher of a class of students would be a one-to-many relationship. For the information architecture superfans, these specific relationships are called cardinality. The group analyzed item cardinality and data models among these subsets within Wikidata.
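For readers who want to see cardinality in action, here is a minimal sketch that labels each relationship in a small set of (subject, property, value) triples as one-to-one, one-to-many, many-to-one, or many-to-many. The averaging approach and the 1.5 threshold are a convention borrowed from the knowledge graph embedding literature, not necessarily what the UVA team used, and the function name and sample data are made up for illustration.

```python
from collections import defaultdict


def classify_cardinality(triples, threshold=1.5):
    """Label each property by the average number of values per subject
    and subjects per value. Illustrative heuristic only."""
    values_by_subject = defaultdict(lambda: defaultdict(set))
    subjects_by_value = defaultdict(lambda: defaultdict(set))
    for subject, prop, value in triples:
        values_by_subject[prop][subject].add(value)
        subjects_by_value[prop][value].add(subject)

    labels = {}
    for prop, subjects in values_by_subject.items():
        values = subjects_by_value[prop]
        avg_values = sum(len(v) for v in subjects.values()) / len(subjects)
        avg_subjects = sum(len(s) for s in values.values()) / len(values)
        subject_side = "many" if avg_subjects >= threshold else "one"
        value_side = "many" if avg_values >= threshold else "one"
        labels[prop] = f"{subject_side}-to-{value_side}"
    return labels


# Hypothetical sample data, not real Wikidata items.
sample = [
    ("teacher_1", "teaches", "student_a"),
    ("teacher_1", "teaches", "student_b"),
    ("teacher_1", "teaches", "student_c"),
    ("person_a", "date of birth", "1950-01-01"),
    ("person_b", "date of birth", "1960-02-02"),
    ("person_a", "place of birth", "city_x"),
    ("person_b", "place of birth", "city_x"),
    ("person_c", "place of birth", "city_x"),
]
print(classify_cardinality(sample))
```

In this toy data, "teaches" comes out as one-to-many, "date of birth" as one-to-one, and "place of birth" as many-to-one, since several people share the same city.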

So can something as small as analyzing these basic relationships reveal that much? It turns out that this is foundational for identification and recommendation features. Adding complexity reveals more and more about data models and makes identification easier and more accurate. In their analysis, they ran fifty-four different algorithms to analyze and identify items. A major takeaway is that these different algorithms can successfully process Wikidata at this level, but selecting subsets (a set of humans, a set of countries) will likely yield better results since there is more consistency in those subsets. Subsets process faster, requiring fewer resources. The paper details their rationale for rating these different programs and they recommend a few for link prediction on Wikidata. Best of all? They share all of their findings as a set of analysis tools on a GitHub page for anyone to use.
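If "link prediction" sounds abstract, this toy sketch may help. It uses the scoring idea behind TransE, one widely used knowledge graph embedding model: every item and property gets a vector, and a statement is considered plausible when the subject vector plus the property vector lands close to the value vector. This post does not say which models the team evaluated, so the choice of TransE is an assumption, and the two-dimensional vectors below are set by hand rather than learned from data.

```python
import math


def transe_score(head, relation, tail):
    """Negative Euclidean distance between head + relation and tail.
    Scores closer to zero mean a more plausible statement."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hand-picked toy vectors; a real model would learn these from Wikidata.
entity_vectors = {
    "film_a": (0.0, 0.0),
    "director_x": (1.0, 0.0),
    "country_y": (0.0, 1.0),
}
relation_vectors = {"director": (1.0, 0.0)}


def rank_candidates(head, relation):
    """Rank every other entity as a candidate value for (head, relation, ?)."""
    h = entity_vectors[head]
    r = relation_vectors[relation]
    scored = [
        (name, transe_score(h, r, vector))
        for name, vector in entity_vectors.items()
        if name != head
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates("film_a", "director")
print(ranking)
```

Here the candidate `director_x` ranks above `country_y` as the missing director of `film_a`, which is the kind of guess a link prediction model makes at scale.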

Let’s return to our initial question: how do we know what we’re describing? It turns out analyzing basic relationships between sets of things can reveal a great deal about those things. Since Wikidata is machine readable, knowing about these relationships can allow for the creation of recommendation tools (like Recoin) so Wikidata community members can make better edits on Wikidata. These kinds of tools could also be used to identify erroneous information and take guesses at what an unknown item or entity may be. All of this encourages better data consistency, quality, and completeness.
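To make the recommendation idea concrete, here is a minimal sketch of how a completeness hint could work: look at which properties similar items commonly use, then suggest the common ones a given item lacks. It is written in the spirit of tools like Recoin but is not Recoin's actual implementation; the function name, the 50% cutoff, and the sample bridge data are all assumptions for illustration.

```python
from collections import Counter


def suggest_missing_properties(target_properties, peer_items, min_share=0.5):
    """Suggest properties used by at least min_share of peer items
    that the target item does not have yet."""
    counts = Counter(prop for props in peer_items for prop in set(props))
    total = len(peer_items)
    suggestions = [
        (prop, count / total)
        for prop, count in counts.items()
        if prop not in target_properties and count / total >= min_share
    ]
    # Most common first, then alphabetical for stable output.
    return sorted(suggestions, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical property sets for four bridge items.
bridges = [
    {"country", "located in", "crosses", "length"},
    {"country", "crosses", "length"},
    {"country", "located in", "crosses"},
    {"country", "architect"},
]
sparse_bridge = {"country"}

for prop, share in suggest_missing_properties(sparse_bridge, bridges):
    print(f"{prop}: used by {share:.0%} of similar items")
```

For the sparse bridge above, the sketch suggests "crosses" first, followed by "length" and "located in", and leaves out "architect" because too few similar items use it.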

As great as Wikidata is, it’s not perfect. The community regularly deals with inconsistency, missing descriptions, and data that’s misplaced, out of date, or just wrong. This kind of work from UVA is exactly what is needed to make Wikidata even better. There’s a lot of work left to do and tools like what this group produced are an important step in engaging with some towering Wikidata challenges. We hope that the Wikidata community (and others outside of it) find these tools and approaches helpful in analyzing other knowledge bases and using the results to improve the data even more.

A special thanks again to Olu, Quinton, and Antoine, and the UVA data science department for supporting this work.

Want to learn Wikidata or brush up on your skills? We have online training courses starting in January, March, and May 2023. Visit wikiedu.org/learn to learn more.

Scaling and sustaining a Wikidata Initiative

Cassidy Villeneuve — Thu, 27 Oct 2022 23:03:37 +0000

Wiki Education is hosting webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our third event. Watch the webinar in full on YouTube. And access the recordings or recaps of other events here.

So far, we’ve covered the state of Wikidata and cultural heritage 10 years in and what you need to know to kickstart a Wikidata Initiative of your own. Last week, Will Kent brought additional experts together to reflect on scaling and sustaining Wikidata work within cultural institutions. Dr. Anne Chen, an art historian and archaeologist, joined us from the International (Digital) Dura-Europos Archive. Ian Gill is a Collections Information Systems Specialist at SFMOMA. Dr. Stephanie Caruso is a Giorgi Family Foundation Curatorial Fellow at the Art Institute of Chicago. Previously, she was a Postdoctoral Fellow in Byzantine Art/Archaeology at the Dumbarton Oaks Research Library and Collection, where she worked with Bettina Smith, the current Manager of Image Collections and Fieldwork Archives. All four speakers completed a course through our Wikidata Institute at some point in the last three years, and we’ve loved watching their Wikidata Initiatives grow.

Illustrated notes featuring our speakers by Dr. Jojo Karlin via Twitter. Rights reserved.

What does Wikidata allow that makes it unique from other platforms?

For Anne, Wikidata provides an opportunity to collaborate across continents and languages in a way she and her archaeological colleagues have never been able to do. She can draw together disparate artifacts and rebuild archaeological contexts virtually. And because Wikidata’s interface is set up for translation into many different languages, Anne and her team can invite their global colleagues to interact with their records, some of whom will have access to these records in their native language for the very first time. “Because of the democratic nature of Wikidata, we can pull additional people from all over the world into the conversation about linked open data at a relatively early stage.”

For Bettina, Wikidata is the place where Dumbarton Oaks’ collections can be compared and contrasted with similar collections around the world. “That kind of aggregated search has been tantalizingly promised by linked open data for so many years,” said Bettina. “Wikidata is the first real manifestation of it.”

New research is possible from there, which is what Stephanie is particularly excited about. “With Wikidata you can work with much broader data in one consolidated place,” she said. “The questions you can ask of the material wouldn’t be possible if you had to go to each archive. That would be way too much work and too slow going.” When she and Bettina began cataloging collections of Syrian origin, they noticed that item names varied across different languages. Traditional repositories might ask to privilege one language over another. Not Wikidata. “Having a QID that is translatable between all these issues makes it possible to get a fuller depth of research.” And to that, Anne added: “For how many generations have researchers been reinventing the same research? If someone can point to a Q number and no one has to do that work again, imagine! It’s easier to build on each other’s research if we don’t have to reinvent the wheel.”

At SFMOMA, Ian populates Wikidata records based on their permanent collection as part of the Artist Identities Project. Wikidata helps him represent artist information more ethically, generating metrics about who his institution exhibits and acquires each year, potentially informing the institutions’ future decisions. “A lot of museums are trying to do this work, and Wikidata is the central repository for it,” Ian said.

And it’s not just museums who are interested in improving linked open data. “I noticed there were Wikidata users that were enhancing our records, saying ‘oh this exhibition actually went to this other venue too.’ I could then add that information to our records. It’s cool to interact with others with the same goals.” It goes both ways. When institutions make improvements to Wikidata, that information has the potential to start a ripple effect. And in return, the institution benefits from access to a more complete repository. “The idea that content generated by amazing editors within the Wikidata community could be reabsorbed into a collection database and used for collection ends in the future is really exciting,” Anne added.

Editing Wikidata is also personally satisfying. Seeing your work out there with immediate effects is metadata’s version of instant gratification! “And it challenges the paradigm that your work has to go through and be checked by traditional levels of authority,” Will chimed in. “I forget sometimes that doing something for the first time, having attained this new skill, is something tangible, compelling, and addictive,” Anne added. “It’s a rush!” Bettina added with a smile.

How did they convince others at their institution to support them?

Most of us are new to editing, and even as we learn, Wikidata evolves. So how do you convey the opportunities it presents to your institution if it’s not a static platform and the possibilities are limitless? For Anne, learning enough of the basics to convey the value of a larger project was huge. “As an art historian and archaeologist, I went into the linked open data sphere feeling uncomfortable in my technical knowledge,” she said. “So I had to start at the beginning and develop content in Wikidata that I could use to demonstrate the promise. Ultimately the thing that got traction was not just talking about abstract ideas, but pointing to a case study. From there, I can talk about all the things I haven’t done yet and how this could be better if we all contributed to it and if we had buy-in from the institution to go full scale.” And once she and her team had a case study, they were able to apply for larger scale funding from the National Endowment for the Humanities–which they received!

Although Wikidata is a strategic fit for cultural institutions, many are hesitant about participating in an open platform where anyone can change anything. Stephanie had some ideas for calming nerves: “I tell them, ‘You already did a good job creating a stable URL for each object in the collection. If someone clicks on it through Wikidata, they will go to your website. There’s a unique property for a Met ID, something that links back to the Met’s site and the owners’ explanation of the object. That can reassure people that regardless of what’s happening on Wikidata, you’re not changing the authority of the institution.’”

Presenting a “handbrake option” can also be reassuring. “Anything on Wikidata that is erroneous or disputed can be reverted,” Anne shared. “I’ve also found it useful to talk about the history of the edits that have been made to a particular object on Wikidata. Thinking from an archival perspective, the idea that there’s a record that there was a dispute about an object is an important facet for the next generation and for thinking about how we can more responsibly engage with multiple perspectives with the content we’re managing.” “It’s also worth making the point that if you don’t do it, someone else could,” Bettina added. “And they might not do it the way you would do it.”

What are the key elements for sustaining a project?

According to our speakers, the main elements for success are some combination of the following: Passion. Supervisor support for your time. Other colleagues’ help. Funding opportunities are also nice. And above all else, expertise and continual learning.

“The Wikidata Institute is probably the best possible resource,” Bettina shared. “There’s also things like LD4, the Wikidata interest group that meets every other week. And I’m a member of ARLIS, the Art Libraries Society of North America, and they have a Wikidata interest group that meets once a month. Those are useful ways to find out about tools and things that I would not otherwise have known about.” “Going through lists of tools that people have developed is also cool. That’s how I found QuickStatements!” Ian added. “I’ll also put in a plug for discussion pages and the Wikidata Telegram channel,” Anne said. “As a new user I was a little intimidated about revealing my ignorance on certain issues or how to do certain things. But at Will’s encouragement, and as part of the course, we got to realize that everyone is learning something and the community is helping each other grow.”

How do they see Wikidata influencing their field in the next 5 or 10 years?

Anne sees promise in the multilingual collaborative nature of Wikidata and the effect that it could have for equity in her field at large. “I’m doing work that deals with cultural heritage material from Syria and I would love to partner with other institutions and offer Wikidata trainings. The payoff of that could be huge. For a project like mine, we could get more diverse perspectives looking at the content that we’re creating.”

Ian pointed out that there’s a lot more internal work to be done within cultural institutions to make things public. “I expect wider adoption of Wikidata in five years for sure. In terms of the Artist Identities Project, a lot of other museums are working on that and it has come up in meetings where people say, ‘What if there were a central repository we could pull from?’ And I get to say, ‘That exists! It’s Wikidata!’”

“Innately as a librarian, archivist, and reformed cataloguer, Wikidata just makes sense,” Bettina added. “I didn’t know it existed before two years ago and now I’m presenting on it! I’m seeing that rapid increase in interest in a lot of my library colleagues and other institutions and I think it’s just gonna grow exponentially from here. If there are any cataloguers in the audience, you can do it—I promise!”

Check out Ian’s project here, Bettina and Stephanie’s here, and Anne’s here and here.

If you’re the kind of learner who seeks community and guidance on your journey, the Wikidata Institute has three upcoming training courses starting in November, January, and March.

What you need to know to kickstart a Wikidata Initiative

Cassidy Villeneuve — Wed, 19 Oct 2022 15:13:57 +0000

Wiki Education is hosting webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our second event. Watch the webinar in full on YouTube. Sign up for our next two events here.

After checking in on the state of Wikidata and cultural heritage last week, this week we wanted to explore how a Wikidata initiative comes into being. What possibilities do catalogers and metadata librarians see in Wikidata? How do they convince their institutions to get excited about it, too? And how can anyone start a Wikidata Initiative of their own? That’s what we aimed to find out in the second event of our Wikidata Birthday Speaker Series. “Wikidata Will” Kent gathered Joe Cera from Berkeley Law Library, Kiley Jolicoeur, a Metadata Strategies Librarian at Syracuse University Libraries, and Chris Long, the Director of the Resource Description Services Team at University of Colorado Boulder Libraries.

What problem was your institution trying to solve with Wikidata?

Our panelists each found Wikidata in a different way. Joe’s repository didn’t use a common identifier and wasn’t easily editable. He wanted a system where he could assign a consistent authority across data items and make it easier to identify them in the repository.

Kiley’s team was migrating from a local digital collection system to a new digital asset management system. The new system didn’t have a native authority control capability, so she was looking for other methods of authority control and a way to integrate her team’s information with the cataloging department.

For Chris, his department was already interested in transitioning to open linked data. Plus, as his team’s director, he wanted a project big enough to give 12 people linked data experience, while also producing some real results that would bring value to the institution at large. In all three of these cases, Wikidata provided their solution.

How did you get buy-in at an institutional level?

One of the biggest draws of Wikidata is its ability to make massive amounts of data public like never before. Compared to other platforms, the interface is user-friendly and accessible. A self-proclaimed “low-tech poster boy”, Chris shared that he believes contributing to Wikidata is really something anyone can do with a little preparation. “I would love to convince more and more librarians that linked data and Wikidata aren’t that hard,” Joe added. “We just have to jump in and do it. To show people that there’s this real benefit to the profession generally and to institutions, too, to be engaging with it. You don’t have to be comfortable with technology. You just have to be willing to jump in.”

“Because there’s such a low barrier to entry, there have been a lot of opportunities to pull in other people who are not familiar with Wikidata,” Joe continued. “They can clearly see the connections in my project, even if they’re not comfortable interacting with the data. The ability to include other people at any step of the process has been really useful. I found myself wondering, ‘Why isn’t everyone doing this?’ It’s an easy way to make your own information accessible to other people.” Joe’s internal goal–defining common identifiers for his own repository–had external implications. That data was ready to be shared! “When it was time to use resources, we had already set ourselves up to add value to Wikidata.”

Wikidata is not only low-barrier and low-risk, but even the smallest contributions have big pay-offs. “The fact that you’re contributing to both the institutional value and to a global information community is compelling,” Kiley shared. “It’s so much bigger than the specific project you run. You’re creating something that will persist and snowball into something bigger. Once information is created on Wikidata, a little bit more gets added, then a little bit more and more, and then it’s so much bigger than what you started with. The net gain of it is enormous.”

There’s something to be said for engaging with the public and de-siloing collections in this new, far-reaching way. “As a cataloger or metadata librarian, you know people are seeing what you’re producing but you don’t necessarily see them seeing it,” Kiley pointed out. “But with Wikidata and Wikipedia, you can see that participation where people either challenge what you said or add to it. You see the ecosystem from a very different point of view.”

Kiley had used information from Wikidata before, but Wiki Education’s Wikidata training course was what prepared her to contribute to the repository herself and use it in a new way. Preparing data to be added to Wikidata did require upfront work, but Kiley says the possibilities are worth it. “With the way our linked data is structured, moving it over requires us to be more specific about it. But that also gives us the ability to make all that data publicly accessible, allowing users to query it in a way that’s different from just searching in the digital collections.”

For Chris, on the other hand, his Wikidata Initiative was a natural progression from existing linked data projects at his institution. “I was able to get that buy-in from the other catalogers, who saw it was an extension of something they had been doing for a long time.” After taking Wiki Education’s intro to Wikidata course, Chris then had the tools to bring others along with this new system. “I could show them that hey, this linked data thing is not so scary. And we’re beginning to see how we can use this in our production environment. Wikidata enables you to code in relationships that other platforms don’t allow you to do. The querying feature is so powerful that you can find relationships that you might not be able to otherwise.”

The challenge, he said, wasn’t convincing his team that Wikidata is valuable. Instead, it was getting them (and himself!) to think outside of their typical workflow. “I had to say, ‘There are so many more things that we can put in a Wikidata item than a NACO record.’ But NACO records have very stringent rules, while Wikidata is a little bit of the ‘Wild West’ for metadata. Giving ourselves the freedom to do these things is a mindshift. I found myself encouraging my folks saying, we don’t have to do the same things we did in NACO records. Let’s embrace all the possibilities we can.”

Wikidata presents an opportunity to think outside the box and beyond the closed systems many of us are accustomed to working within. “It’s clear that everyone on Wikidata has a different way of approaching the same thing,” Joe pointed out. “It depends on where you’re coming from.” And through the systems of consensus-building that Wikidata is known for, we arrive somewhere great together.

I want to start a Wikidata Initiative. What do I do?

“When doing any kind of project, but especially a project that’s dealing with data, it has to be iterative,” Kiley pointed out. “You’re going to find stuff that you did that you absolutely hate, that didn’t work, or isn’t quite working well enough. You have to be willing to backtrack. Whether it’s working in Wikidata itself or a local instance of Wikibase, that willingness to understand that even if you’re already familiar with what you’re doing, there’s still a learning curve with everything. There’s going to be adaptive decisions along the way.”

Chris had some advice of his own. “When you’re starting out, you have to decide at the beginning, what’s the scope of the project? How many resources do you have? Don’t try to take on too much for your initial project. Keep it small. The thing I like about Wikidata is that you don’t have to do everything. Other people can add information to your items. Wikidata items beget other Wikidata items. You don’t have to take on the whole world with your project, just try to learn and realize you’re not going to get it right the first time. You can go back and fix it as you learn more things. It’s a learning journey.”

And there are resources to help you learn. Above all else, you’re not alone. For Kiley, the best resource is other people. “The whole community–constellations of people working in different areas on different things–is really invaluable, regardless of what you’re trying to do. They’re so friendly and so helpful and willing to weigh in on ideas, whether it’s offering something you haven’t even considered or just helping you solve a problem that you can’t figure out on your own.” Joe chimed in too: “I expected someone to say, ‘You don’t know what you’re doing!’” But they didn’t. Instead, Joe benefited from the wisdom of other Wikidatans while working in a public space, free of a silo.

“You can contribute to Wikidata without being responsible for the frankly massive undertaking of having the information be as complete as possible,” said Kiley. “I think people misunderstand that as a weakness of Wikidata, that it’s not ‘complete,’ but I think that’s what makes it such a wonderful opportunity.”

“I’m actually creating, I’m not just experimenting,” Chris added. “And I can share that. It’s hard to demonstrate what value your cataloging team is adding. You know you’re doing it, but it’s hard to make it visible and understandable to others on campus. Wikidata is one way to show that you’re adding value to the campus community. To me, that’s what’s exciting.”

In five years, Chris sees Wikidata being even more incorporated into the production environment, especially with the Library of Congress incorporating it. Kiley hopes to see more discussion of both Wikidata and Wikipedia in the context of information literacy and data literacy. Really, the possibilities are endless. Sure, the “choose your own adventure” nature of Wikidata can be a bit overwhelming (thanks Joe for the apt phrase). But there’s also beauty in that freedom. As a Wikidatan, you’re a data wrangler in the Wild West of metadata. And that’s pretty cool.

Check out Kiley’s project here; Joe’s project here; and Chris’ projects here, here, and here.

Interested in choosing your own adventure, but don’t know where to start? Wiki Education has a vision for the future, too: that all librarians, archivists, museum professionals, and other linked data enthusiasts can participate in Wikidata with as few barriers to entry as possible. If you’re the kind of learner who seeks community and guidance on your journey, the Wikidata Institute has three upcoming training courses starting in November, January, and March. Consider also signing up for our next 2 webinar events celebrating Wikidata’s birthday all of October.

The State of Wikidata and Cultural Heritage: 10 Years In

Cassidy Villeneuve — Fri, 07 Oct 2022 18:27:50 +0000

Wiki Education is hosting webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our first event. Watch Tuesday’s webinar in full on YouTube. Sign up for our next three events here.

Never before has the world had a tool like Wikidata. The semantic database behind Wikipedia and virtual assistants like Siri and Alexa is only ten years old this month, and yet with almost 100 million unique items, it’s the biggest open database ever. Wiki Education’s “Wikidata Will” Kent gathered key players in the Wikidataverse to reflect on the last ten years and set our sights on the next ten. Kelly Doyle, the Open Knowledge Coordinator for the Smithsonian Institution; Andrew Lih, Wikimedian at Large with Smithsonian Institution and Wikimedia strategist with the Metropolitan Museum of Art; and Lane Rasberry, Wikimedian in Residence at University of Virginia’s Data Science Institute discussed the “little database that could” (not so little anymore!).

Illustrated notes featuring our speakers by Dr. Jojo Karlin via Twitter. Rights reserved.

In our webinar (one of four this month celebrating Wikidata’s birthday), audience members joined us from libraries, universities, museums, galleries, and Wikimedia projects from all around the world. Kelly posed an important question to us: as knowledge professionals and stewards, what is our responsibility in building, curating, and tending to a database that reaches millions of people?

“We’ve really never had this opportunity,” said Andrew. “Folks from all different academic backgrounds, from different languages and cultures, can treat Wikidata’s taxonomy as a malleable lump of clay and try to converge on some version of consensus for how to model the world.” As the founder of Wikidata, Denny Vrandečić, and the Product Manager for Wikidata, Lydia Pintscher, have said, “Wikidata is an ontological playground.”

This playground is becoming more and more embedded in our online knowledge structures, connecting everyone to everything, everywhere. “Wikidata is the portal to the linked open web,” said Lane. “As soon as content gets into Wikidata, it reaches huge audiences around the world. Big tech companies index it. They start sending it in every direction. As does anyone else who wants access to a free and open database. Anyone can copy this stuff; anyone can recirculate it.”

Data science is a field still taking shape, and Wikidata is no different. As Andrew mentioned, it’s this malleability that makes the open repository so powerful. “If you get tapped into Wikidata, you get tapped into an ethical network,” Lane added. Even with its gaps and inaccuracies, there’s nothing else like it. “Who’s doing better at this?” Lane asked. “Who else has convened the global community to get together and have conversations about this? There is no ideal data set out there, but where are you going to find one better?”

Sure, we’re a long way from having the perfect repository. It will never exist, as Lane pointed out. But the radical beauty of Wikidata is how the community goes about striving for it anyway. As Will said, it’s the humanity inherent in Wikidata’s structure and culture that makes it different from other data repositories.

Even so, attempting to model the world through consensus is messy. “As anyone who dives into Wikidata knows, we’ve got a lot of inconsistencies, missing parts,” Andrew pointed out. “But boy, we’ve never had this opportunity before to try to do it collectively and collaboratively. When it works, it really works in ways that nothing else can. I think that’s one of the miracles of Wikidata.”

Becoming a Wikidata contributor enters you into this community that grapples with data ethics every day. The community, which spans countries and languages, discusses issues and precedents with transparency and openness. As problems appear, the community is designed to chew through them together. This is how Wikidata has come so far in a short time.

Will, our host, shared his own perspective as Wiki Education’s Wikidata Program Manager. Knowledge institutions, he suggested, are actually missing out if they’re not participating. “In my capacity, I teach a lot of courses and we work with a lot of professional institutions, and it might sound simplistic, but representation is huge. If you’re not on Wikidata, you can’t be linked to all these other things. So being more deliberate about what’s there versus what’s not is actually pretty radical. And being more thorough and accurate with all the data has a huge impact.” Knowledge institutions like the Met and MoMA consider it the authoritative place to disambiguate data. Their webpages feature Wikidata Q numbers now, rather than traditional powerhouses like Getty, because Wikidata is the biggest arts database out there. “The good news is that it wasn’t even hard to convince the Met of that,” said Andrew. “Now it’s just a matter of implementation.”

Kelly stressed that working on Wikidata is an efficient way for a single person or team to start a ripple effect in the informational stratosphere, especially since Wikidata is a semantic database. “For institutions like the Smithsonian or the Met who want to batch upload into Wikidata, that data can be read in over 200 languages with just one person doing the work,” Kelly shared. “Multilingual collaboration is real,” Andrew added. “It’s not just a theory. It actually happens with Wikidata. And it happens every second of every day.” “And it’s then impacting those language Wikipedias,” Kelly continued. “Especially in the gender gap space, where I primarily work, the question is why would we host an edit-a-thon if some of this content might be taken down or is not considered notable enough. We’re going to do all this research and it might not be able to stay on Wikipedia. But this is a great pivot to Wikidata because we can batch upload these lists of names and all of the biographical information behind it and have that in Wikidata because the notability threshold is lower. That’s really significant because we can then use what we put in Wikidata to build a case for later Wikipedia article creation.”

Wikidata and Wikipedia editing isn’t just beneficial to institutions. It’s a skillset that is becoming more relevant in the spheres of knowledge curation, creation, and archiving. “Wiki skills are professional skills,” Will chimed in. “For a lot of you attending this webinar, you do things that other people don’t do in your line of work. And that’s an asset.” One of Wiki Education’s goals with this Wikidata Speaker Series itself is to share innovative ways professionals are accomplishing their goals through Wikidata and hopefully inspire others to join this community as it influences more and more of the content people get on the internet.

And its impact will only continue to grow. “It’s important to remind folks how crucial Wikidata is to the fabric of knowledge now,” said Andrew. “Wikidata is being massively used in AI now for training, for trying to understand the world, for better or worse. It can be a little scary to think that they’re depending on Wikidata for the future of humanity. But it is the best assembly of human knowledge so far.”

So what’s next? The internet looks incredibly different now than it did ten years ago, and it will continue to adapt to meet people’s information needs. “Wikidata, Wikipedia, and the ecosystem gets a billion unique visitors a year,” Lane pointed out. “Big tech is doing some things in Wikidata. I’d like to counterbalance that with more museums and more universities getting involved.” That way, we can ensure a diverse group of experts will shape this ontological playground and share the best possible knowledge billions of times.

“When you add all of this together: all this attention on Wikidata, how Wikidata handles the social and ethical aspects of data, and all the data sets we can get from traditional and conventional resources, then you get absolute magic,” Lane continued. “You put all this in Wikidata, it mixes together, and you get new creative data, remixed data, things that would be unthinkable to create in any other way. It can only happen if you have everybody in the world, community representatives from all these institutions socializing in Wikipedia and Wikidata, remixing this, and then spreading it out. That’s the big cookie.”

Want to be a part of the big cookie? If you’d like Wikidata training, the Wikidata Institute has three upcoming training courses starting in November, January, and March. Consider also signing up for our next 3 webinar events celebrating Wikidata’s birthday all of October.

Watch Tuesday’s webinar in full on our YouTube channel.

Thumbnail image by Matt Britt CC BY 2.5, via Wikimedia Commons.

Speaker Series: Wikidata’s 10th Birthday

Will Kent — Tue, 27 Sep 2022 21:45:12 +0000

At the end of October, Wikidata, the-open-knowledge-base-that-could, turns ten! What better way to celebrate than by having a series of in-depth conversations all month long profiling Wikidata Initiatives and the impact that Wikidata has had on the world? Whether you’re a Wikidata newbie, a seasoned expert, or somewhere in between, join us as we reflect on kickstarting, growing, and sustaining Wikidata Initiatives. Just in time for Wikidata’s birthday!

The State of Wikidata and Cultural Heritage: 10 Years In
- Tuesday October 4, 2022 — Watch the recording on YouTube or read our summary blog.
- We’ll learn how Wikidata is (or is not) integrated into Wikipedia, how it helps an enormous cultural institution like the Smithsonian achieve its goals, and how Kelly Doyle, Andrew Lih, and their colleagues at the Smithsonian work to keep a Wikidata Initiative going. Lane Rasberry also joins us from the University of Virginia as the Wikimedian-in-Residence at the School of Data Science!

What You Need to Know to Kickstart a Wikidata Initiative
- Thursday Oct 13, 2022 — Watch the recording on YouTube or read our summary blog.
- We’ll hear from Wikidata’s biggest fans: librarians. Namely, Joe Cera from Berkeley Law Library, Kiley Jolicoeur, a Metadata Strategies Librarian at Syracuse University Libraries, and Chris Long, the Director of the Resource Description Services Team at University of Colorado Boulder Libraries. They’ll each share how they got involved with Wikidata at their respective institutions, how Wikidata projects align with libraries’ missions, and how you can start a Wikidata Initiative at your institution, too!

Scaling and Sustaining a Wikidata Initiative
- Thursday Oct 20, 2022 — Watch the recording on YouTube or read our summary blog.
- You’ve got a vision for a Wikidata Initiative that will amplify your work and make Wikidata more equitable and more complete. You may know how to get started, but how will you keep it going? Or foster community around this work? What does your institution need to do in order to support your Wikidata work? Join Bettina Smith from Dumbarton Oaks, Stephanie Caruso from the Art Institute of Chicago, Anne Chen of Dura-Europos and Bard College, and Ian Gill from SFMOMA. Let’s dive into what makes their projects successful. Their experiences may spark ideas for you as you develop your own Wikidata Initiative.

The Future of Data: A Community that Grows Together Stays Together
- Tuesday Oct 25, 2022 — Watch the recording on YouTube or read our summary blog.
- For our final birthday celebration, we’re looking to the future. Speakers Julian Chambliss, Kate Topham, Justin Wigard, and Hilary Thorsen are out there building Wikidata community. We want to know where they envision this work going over the next few years. What kinds of insights do they want their communities to have from their Wikidata Initiatives five years from now, and how do they approach their projects to achieve this?

We hope these talks present a nice forum for connecting you not only with knowledge, but also with other attendees who can build community around an idea or project you may have. These free conversations will happen once a week over Zoom for one hour. We’ll also record and post the sessions online for you to view in the event of a scheduling conflict.

We hope you’ll be able to join us (virtually) and hear all of the insights these community members have to share about Wikidata and their projects. I can’t think of a better way to celebrate Wikidata than to show what an impact it’s made in all of these fields. See you soon!

Reach out if you have any questions: will@wikiedu.org

Leveraging Wikidata for Wikipedia

Will Kent — Thu, 14 Jul 2022 16:08:57 +0000

We have spent time on this blog discussing some useful ways Wikidata can take advantage of Wikipedia’s data. In this post we’re going to spend some time exploring how Wikipedia can use Wikidata’s data. We will explore some ways Wikipedia can integrate Wikidata into articles, templates, and some other useful tools.

Before we jump into all of that, it’s important to remember that there are more than 300 different language versions of Wikipedia, all governed by their own language community. Wikidata is also its own community. This means that the rules and guidelines that all of these projects follow can differ from one another. One way they do differ is what you are and are not allowed to do with Wikidata’s data. So the resources I’ll be sharing may be activated on some versions of Wikipedia, they may not, and they may change in the near future. I will also include some resources where you can see why Wikidata is or is not allowed on Wikipedia (yet).

On to the most important part of the post: you can call (in the sense of use) various values and relationships from Wikidata onto any other Wiki-page (Wikipedia and any other Wikimedia projects). This is exciting because one value, like a city’s population, can be updated in Wikidata, and with that one edit, it will cascade across all language versions of Wikipedia. This has the potential to make data consistency better across Wikipedias, and it also makes updating all Wikipedias as easy as one edit in Wikidata.

Margaret Sanger’s infobox

A specific example of this is the Wikidata Bridge project. The aim of the Bridge project is to power infoboxes with data from Wikidata. Some language versions of Wikipedia, like Catalan, already have this feature turned on. In English Wikipedia, the use of this tool is not widespread yet due to concerns about data quality on Wikidata. Either way, the implications of this kind of resource will be far reaching.

There are other projects that have been leveraging Wikidata for years. The beloved WikiProject Women in Red relies on Wikidata to generate lists of women who do not yet have Wikipedia articles on English Wikipedia. Women in Red uses hundreds of Wikidata queries to generate and organize these lists of women from all the other language versions of Wikipedia. The query results are presented as tables on the Redlist index page (note: WD stands for Wikidata list) using a tool called Listeria. Listeria is a Wikidata tool that takes Wikidata query results and displays them as a table on a Wiki page. This is a powerful tool because the lists are dynamic — updated frequently if not in real time — and you can pull in customized slices of data thanks to the query service. This is one way Women in Red is able to take advantage of Wikidata’s vast dataset to advance an urgent cause on Wikipedia.
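For a sense of what sits behind a list like this, here is a rough Python sketch that builds the kind of SPARQL query one could paste into the Wikidata Query Service. It is not Women in Red's actual query. It assumes the standard Wikidata identifiers P31 (instance of), Q5 (human), P21 (sex or gender), and Q6581072 (female), and it treats "has no English Wikipedia article" as "no sitelink to en.wikipedia.org". The function name `missing_article_query` and its parameters are hypothetical.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def missing_article_query(gender_qid: str = "Q6581072",
                          wiki_url: str = "https://en.wikipedia.org/",
                          limit: int = 50) -> str:
    """Build a SPARQL query for people of one gender with no article on one wiki."""
    if not QID_PATTERN.fullmatch(gender_qid):
        raise ValueError(f"not a Q-id: {gender_qid!r}")
    # Doubled braces are literal braces inside an f-string.
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 wd:Q5 ;
        wdt:P21 wd:{gender_qid} .
  FILTER NOT EXISTS {{
    ?article schema:about ?item ;
             schema:isPartOf <{wiki_url}> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}"""


print(missing_article_query(limit=10))
```

A real redlist would probably narrow the query further, for example by occupation or nationality, both to keep each list focused and to stay within the query service's time limit.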

One more way Wikipedia is leveraging Wikidata is through citations. As you know, citations are central to Wikipedia articles. What you may not know is that you can import citations to Wikipedia using resource identifiers like DOIs, PMIDs, and ISBNs (instead of a title, the source is represented as a unique number; this helps avoid ambiguity and confusion). Now you can do the same thing with Wikidata Q-ids. If a source has an item on Wikidata, you can insert its Q-id into the “automatic” citation menu when you are editing with Wikipedia’s Visual Editor and it will generate a citation in a Wikipedia article. This is convenient, but it also comes with the added benefit of the Wikidata items being queryable. As more Wikipedia articles include citations represented in Wikidata, we will soon be able to query any number of Wikidata variables (gender gap, ethnicity, location) in the context of Wikipedia references.
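As a rough illustration of that last point, the Python sketch below builds a SPARQL query that counts the authors of a set of cited sources by gender. It assumes each source already has a Wikidata item with author statements (P50) and that those authors have a sex or gender statement (P21). The function name `cited_author_gender_query` is hypothetical, and the Q-ids in the example are placeholders rather than real sources.

```python
import re
from typing import Iterable

QID_PATTERN = re.compile(r"Q[1-9]\d*")


def cited_author_gender_query(source_qids: Iterable[str]) -> str:
    """Build a SPARQL query counting authors of the given sources by gender."""
    qids = list(source_qids)
    if not qids:
        raise ValueError("at least one Q-id is required")
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Q-id: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    # Doubled braces are literal braces inside an f-string.
    return f"""SELECT ?gender ?genderLabel (COUNT(DISTINCT ?author) AS ?authors) WHERE {{
  VALUES ?source {{ {values} }}
  ?source wdt:P50 ?author .
  ?author wdt:P21 ?gender .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?gender ?genderLabel"""


# Placeholder Q-ids standing in for the sources cited in one article.
print(cited_author_gender_query(["Q111", "Q222"]))
```

Swapping P21 for another property would give the same kind of breakdown for a different variable, as long as the author items carry that statement.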

We’re just scratching the surface of what Wikipedia will be able to do with Wikidata. Returning to what I described at the beginning of this post, Wikipedia will be able to call any snippet (or enormous data set) from Wikidata soon. This will have a huge impact on the community and change the nature of a lot of the workflows on Wikipedia and Wikidata. Ideally it will improve quality, representation, and how we can evaluate data on all projects. Catch a glimpse of these new features appearing by keeping an eye on what appears here. This particular page tracks any template on English Wikipedia that uses data from Wikidata. You can expect this list to grow and grow over the next few years.

It’s exciting to think of the potential of all of these new tools. To learn more about Wikidata and Wikipedia, follow this link to find more information about our Wikipedia and Wikidata courses.

This post expands on a presentation its author Will Kent, together with Rosie Stephenson-Goodknight, gave to the LD4 Wikidata Affinity Group in June 2022.