METHODOLOGY

Yemeni Archive aims to support human rights investigators, advocates, media reporters, and journalists in their efforts to document human rights violations in Yemen by developing new open source tools as well as providing a transparent and replicable methodology for collecting, preserving, verifying and investigating documentation in conflict areas. Yemeni Archive is a Mnemonic archive.

About the research

Included in this database are all conflict-related armed attacks against journalists and media infrastructure in Yemen that fall within the current conflict time period of 2014 to 2021 and that are verifiable following Yemeni Archive’s methodology of primarily open source tools and methods.

To reach the final dataset of 138 incidents, each thoroughly researched by the Yemeni Archive team, our collection and verification workflow for this project included the following steps:

Creating a preliminary allegation database, structured around initial allegations by Yemen Data Project and other open source data from organisations working on this topic, such as the Yemeni Journalists Syndicate.
Feeding the preliminary allegation database with open-source information.
Collecting testimonies from victims of and witnesses to the attacks.
Compiling corroborative and verifiable information to create a secondary allegation database.
Re-analysing collected information and its sources with open source tools and verification processes, while simultaneously tagging the data for key features.
Finalising the database of verified incidents of attacks against journalists and media infrastructure in Yemen.

Sources and types of information used

Yemeni Archive thoroughly researched each of the 138 incidents. This required extracting open source data observations based on dates and geographic location, as well as using content discovery tools, such as Twitter advanced search and Google advanced search, to search for content on social media platforms. This also required seeking additional information outside of open sources.

Compared to the open source information ecosystems for other Mnemonic projects like Syrian Archive, Yemen produces less open source visual content. Non-visual open source content derived from social media can be more difficult to verify using only open source tools and methodologies. Consequently, the Yemeni Archive team pulled from a wider variety of sources and types of information when seeking to verify or potentially disprove, in whole or in part, the data collected here on attacks against journalists and media infrastructure.

At the initial open source information collection stage of work, the research team used the Whopostedwhat tool to gather more accurate information about the nature and time of the attack by searching for the first published information about the attack and then analysing the timestamp of Facebook posts using the 4webhelp tool. To verify the attacks, we searched archived Google Earth Pro satellite images, analysed the visual documentation associated with each attack, and checked that it was consistent with the satellite imagery. The satellite images were also compared with the visual content of each attack, with locations identified by matching visible and prominent landmarks in photos and linked videos. Yemeni Archive also relied on EZsearch to identify all reports and videos connected to the attacks and posted in the days following the incident in order to make the search more accurate.
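As an illustration of the timestamp step, the sketch below converts a Unix epoch value, of the kind a timestamp converter accepts, into UTC and Yemen local time. It is a minimal sketch and not Yemeni Archive's tooling: the function name, the sample value, and the treatment of Yemen local time as a fixed UTC+3 offset are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: Yemen local time is modelled as a fixed UTC+3 offset.
YEMEN_TZ = timezone(timedelta(hours=3), name="UTC+3")


def epoch_to_times(epoch_seconds: int) -> Tuple[datetime, datetime]:
    """Return (UTC time, Yemen local time) for a Unix epoch timestamp."""
    utc_time = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc_time, utc_time.astimezone(YEMEN_TZ)


# Hypothetical sample value, not taken from the dataset.
utc_time, local_time = epoch_to_times(1_500_000_000)
print("UTC:  ", utc_time.isoformat())
print("Local:", local_time.isoformat())
```

Recording both values makes it easier to compare a post's publication time with attack times reported in local time.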

When reviewing and verifying the collected open source information, Yemeni Archive gave greater weight to visual content directly documenting the attack incident or its aftermath that could be verified for location, date, time, and content and whose source could be evaluated for reliability and credibility. All open source materials derived from social media were evaluated for their relevance and potential reliability against a number of factors, including:

Time of posting relative to the verified attack time;
Location of the source relative to the verified attack location;
The source’s dialect or accent relative to those heard in the attack area;
Historical online behaviors of the source (e.g., whether their tone implies strong biases or whether they appear to have contributed in some way to the spread of mis/disinformation);
Consistency of technology apparently used by the source (e.g., as discernible in footage quality); and
Whether the source is known to Yemeni Archive (i.e., whether they are included in Yemeni Archive’s established database of credible sources for online content).
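The factors above could be recorded for each source as a structured checklist. The following is a hypothetical sketch only: the class name, field names, and helper function are assumptions introduced for illustration, and it deliberately computes no score, since the methodology does not describe any numerical weighting of these factors.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SourceAssessment:
    """A researcher's notes on one social media source (hypothetical schema).

    Each field is True or False once assessed, or None if not yet reviewed.
    """
    posted_close_to_attack_time: Optional[bool] = None
    located_near_attack: Optional[bool] = None
    accent_matches_area: Optional[bool] = None
    no_history_of_bias_or_disinformation: Optional[bool] = None
    technology_consistent: Optional[bool] = None
    known_credible_source: Optional[bool] = None


def unassessed_factors(assessment: SourceAssessment) -> List[str]:
    """List the factors that still need a researcher's review."""
    return [f.name for f in fields(assessment)
            if getattr(assessment, f.name) is None]


# Hypothetical example: two factors reviewed, four still open.
example = SourceAssessment(posted_close_to_attack_time=True,
                           known_credible_source=False)
print(unassessed_factors(example))
```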

Turning to closed source fact-finding methodologies to build on the information gathered from open sources, Yemeni Archive also coordinated with local human rights organisations, local journalists, and an on-the-ground research team to conduct in-person interviews with survivors, victims, and other eyewitnesses.

Each of the 138 attacks included in this database has been documented and analysed with both open and closed source information.

Standard of information

Where judgment was required in the tagging process for this project, Yemeni Archive only assigned tags that met a ‘reasonable grounds to suspect’ standard of information. In other words, each individual tag was assigned only if the researcher was convinced by the available information that there are reasonable grounds to suspect the tag is applicable or accurate. We have chosen to point to and aim for this standard of information with the end goal of being as accurate as possible while also erring on the side of inclusion when assigning tags. For almost all subsequent uses, the facts established in the published dataset will require additional investigation and corroboration. Since Yemeni Archive does not have the resources or mandate to research and evaluate each incident to the highest possible evidentiary standard, use of the ‘reasonable grounds to suspect’ standard for the information available to us enables us to assign tags to incidents both in instances where we are very confident in the tag based on available open source information as well as in instances where we reasonably suspect that subsequent investigation and analysis will confirm the tag.

The methodology descriptions and examples provided in the tag definitions [link to WHAT portion of the findings page] illustrate how this discretion was exercised and this standard of information met in practice. Researchers also maintained open lines of communication, flagging the more challenging discretionary decisions they encountered and revisiting them after consultation. Further, each tag for each incident was reviewed multiple times, at multiple stages of the workflow, by multiple Yemeni Archive researchers. This helped to ensure that all tags were assigned as consistently as possible across the entire database.

Phrased differently, tagging decisions were an application of the standard of information to an open source verification process. This may mean, for example, that we have seen unverifiable claims in the source materials about certain alleged perpetrators, delivery methods, munitions, or other types of information. However, simply because we have not affirmatively tagged these claims does not mean that they are necessarily false. We simply could not verify them to the chosen standard. For this reason, the default tag identified in our methodology is often ‘unknown,’ and the final database includes numerous ‘unknowns.’
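The default-to-‘unknown’ behaviour described above can be sketched in a few lines. This is an illustrative assumption, not Yemeni Archive's actual tagging system: the function name, field names, and sample values are hypothetical, and the decision about whether a claim meets the standard is assumed to be made by a researcher, not by code.

```python
from typing import Dict, Set

UNKNOWN = "unknown"


def assign_tags(claims: Dict[str, str], meets_standard: Set[str]) -> Dict[str, str]:
    """Keep a claimed value only for fields a researcher judged to meet the
    'reasonable grounds to suspect' standard; tag every other field 'unknown'."""
    return {
        field: (value if field in meets_standard else UNKNOWN)
        for field, value in claims.items()
    }


# Hypothetical claims found in source material for one incident.
claims = {"alleged_perpetrator": "Party A", "delivery_method": "airstrike"}

# Only the delivery method could be verified to the chosen standard.
tags = assign_tags(claims, meets_standard={"delivery_method"})
print(tags)
```

Note that an ‘unknown’ tag here records only that the claim was not verified, not that it was found to be false.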

About the data

This database contains 1.2 GB of documentation of 138 attacks on journalists and media infrastructure in Yemen since the first documented incident in 2015. This data comes from 818 sources made up of individual citizen journalists, local and international media groups, as well as NGOs and civil society organisations. It is important to note that many if not all of these sources are partisan; their claims should be evaluated with caution.

In total, Yemeni Archive identified 2806 relevant videos, posts, and publications documenting attacks on media and journalists, the majority of which were published on social media pages. After evaluating this content for potential security risks associated with publicising the data in this format, Yemeni Archive has made the decision to keep private the portions of this database that contain non-open source material related to the verification of specific incidents, although some of this material is available upon request.

The sheer amount of content being created, and the near constant removals of materials from public channels, means that Yemeni Archive is in a race against time to preserve important documentation of crimes committed. Content preserved and verified by the Yemeni Archive might offer the only evidence to corroborate witness testimonies of attacks on media in Yemen and to implicate potential perpetrators.

Due to Yemeni Archive’s technical infrastructure and the constant monitoring of the status of videos and channels, we have been able to identify many examples of documentation of attacks on media and journalists in which users did not remove the content willingly. Of the 106 videos from YouTube included in the dataset, 11% have been made publicly unavailable. Of the 675 Twitter posts included in the dataset, 12.3% have been made publicly unavailable. Of the 801 Facebook posts included in the dataset, 4.2% have been made publicly unavailable.
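For readers who prefer counts to percentages, the figures above can be converted with simple arithmetic. The sketch below is illustrative only: because the reported percentages are rounded, the resulting counts are estimates and not figures published by Yemeni Archive, and the function name is an assumption.

```python
# (items in the dataset, percent made publicly unavailable), as reported above.
platforms = {
    "YouTube videos": (106, 11.0),
    "Twitter posts": (675, 12.3),
    "Facebook posts": (801, 4.2),
}


def estimated_unavailable(total: int, percent: float) -> int:
    """Estimate how many items a rounded percentage corresponds to."""
    return round(total * percent / 100)


for name, (total, percent) in platforms.items():
    count = estimated_unavailable(total, percent)
    print(f"{name}: roughly {count} of {total} unavailable")
```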

[Figure: number of YouTube videos, tweets, Facebook posts, and website articles in the dataset]

Errors, corrections, & feedback

Yemeni Archive strives for accuracy and transparency of process in our reporting and presentation. That said, the information publicly available for particular events can, at times, be limited. Our datasets are therefore organically maintained, and represent our best present understanding of alleged incidents. Although the range of incidents documented is comprehensive in scope, the availability of data varies and so while some incidents have been extensively documented, there are gaps in information for others.

Acknowledgements

Groups and individuals in Yemen are the pillars that hold this report together and made it possible for it to be seen by the world. Citizen journalists and media personnel are the real heroes behind this work. This report gathered at least 2806 relevant videos, posts, and publications documenting attacks on media and journalists. Special thanks to everyone who helped in collecting, verifying, analysing, and investigating the data. We would like to thank Dr Caroline Tynan for leading the project, students at Sana’a University, Yemen Data Project (YDP), and Professor Yvonne McDermott Rees for her expert review. Finally, we would like to thank each team member who helped in translating and editing the report, including Mansour Alamri and Nawar Mohra.

Yemeni Archive is committed to continuing to monitor attacks on journalists and media infrastructure and to updating the database as data becomes available. If you have new information about a particular event, if you find an error in our work, or if you have concerns about the way we are reporting our data, please do engage with us. You can reach us at info[@]yemeniarchive.org.