Global Terrorism. A Visual Analysis

Screenshot of the Tableau Dashboard. Available [here] and at the end of this post.

I recently wrote about my initial experiments in visualising data relating to terrorist attacks. My conclusions were that although the dashboard accurately portrayed the underlying data, I needed a better datasource to provide more detail and a broader context. After a bit of digging, I found a series of Wikipedia pages that attempt to collate all known terrorist attacks across the globe on a month by month basis. This data runs from the present time back to January 2015 [2015, 2016, & 2017]. Before this, Wikipedia has data available for the period from 2010-2014 broken into 6-month blocks and data from 1970-2009 is available in whole year pages. I’ve elected to restrict my choice of data to the period from 2015 onwards for a number of reasons. A brief perusal of the earlier data confirmed that while it was richer and more detailed than what I had used previously, it was still not fully comparable to the data from 2015-17. While I am usually interested in pushing the data back as far as possible, in this instance I really wanted to concentrate on the state of the world as it is now.

The datasource
My process here was to manually take a copy of the Wikipedia table and paste it into an Excel spreadsheet. As each Wiki page pertains to an individual month and lists attacks by Day, I only had to add Year and Month data manually as I went along. This allowed me to go with my preferred approach of keeping the date elements separate in the data source and combining them within Tableau with the MAKEDATE() function. My reason for this is simple – I was badly burned on an early project where Excel formatted the complete dates in one manner, but Tableau read them in another. The result was that I had a Viz where I claimed that the Battle of Fort Sumter happened on the 4th of December 1861, rather than on the actual date of April 12th 1861. It’s as true for an historical dashboard as a business one – if the user can’t trust one element of what they see, they’re unlikely to trust any of it! In my case, I didn’t notice until just before I went to promote it … luckily, I spotted it in time and corrected it, but it meant quite a bit of hasty reworking and being annoyed with myself. To avoid it, I just take the simple solution and let Tableau handle it.
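As a rough illustration of why keeping the date parts separate is the safer route, here is a minimal Python sketch. It is not part of the original workflow (that was done in Excel and Tableau); the `make_date` helper is a hypothetical stand-in that loosely mirrors what MAKEDATE(year, month, day) does.

```python
from datetime import date, datetime


def make_date(year: int, month: int, day: int) -> date:
    """Build a date from separate parts (hypothetical stand-in for MAKEDATE)."""
    return date(year, month, day)


# The same text, read under two different conventions, gives two dates.
raw = "12/04/1861"
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # 12 April 1861
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # 4 December 1861

# With separate Year, Month, and Day columns there is nothing to misread.
fort_sumter = make_date(1861, 4, 12)

print(as_day_first, as_month_first, fort_sumter)
```

The ambiguity only exists while the date is a single piece of text; once year, month, and day are stored in their own columns, there is only one way to assemble them.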

The ‘Dead’ and ‘Injured’ columns required a little manual cleaning to remove non-numeric data. For example, where the terrorists are killed or injured in their own attacks, their numbers are added in brackets after the number of victims (e.g. 27 (+2)); these have been removed. Where figures are given as ‘at least’ or ‘X+’ I’ve gone for the minimum in all cases. Similarly, where the numbers of casualties are listed as ‘Unknown’, I’ve listed them as 0. The ‘Details’ column has been cleaned of footnote references as I couldn’t make them work in Tableau, but has otherwise been left unchanged. Similarly, the columns for ‘Type’, ‘Perpetrator’, and ‘Part of’ have not been modified from the original. Each of these could do with further work to clean them up as, for example, ‘Type’ (on the dashboard as ‘Attack Type’) has both ‘Airstrike’ and ‘Airstrikes’ along with ‘Assassination’ and ‘Assassinated’ etc. Similarly, the ‘Part of’ column (on the dashboard as ‘Associated With’) has multiple inconsistencies. For example, it contains ‘Islamic terrorism in Europe’ as well as ‘Islamic terrorism in Europe (2014-present)’, an ‘Islamic terrorism in the West’ and a general ‘Islamic extremism’. Rather than take on the mammoth task of sorting this out, I’ve left it to the judicious skill of the user to select the correct data they’re looking for. The ‘Perpetrator’ column has similar inconsistency issues, though they are less frequent. For example, the entries contain both ‘Al-Shabaab’ and ‘Al Shabaab’. Once again, a bit of judicious data cleaning could be helpful here and may be something to pursue at some date in the future, but for now I’m leaving this as it is, merely advocating for caution on the part of the user.
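The cleaning described here was done by hand, but the rules for the ‘Dead’ and ‘Injured’ columns are simple enough to sketch in code. The following is a minimal Python illustration under stated assumptions: the function name `clean_casualties` is hypothetical, and the handling of thousands separators is an addition not mentioned above.

```python
import re


def clean_casualties(raw: str) -> int:
    """Reduce a casualty cell to a single minimum integer count."""
    text = raw.strip()
    # Drop bracketed perpetrator counts such as "(+2)".
    text = re.sub(r"\([^)]*\)", "", text)
    # Take the first run of digits; this covers "at least 30" and "12+".
    # Thousands separators ("1,200") are allowed (an assumption).
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        # "Unknown" and any other non-numeric entry is recorded as 0.
        return 0
    return int(match.group().replace(",", ""))


for cell in ["27 (+2)", "at least 30", "12+", "Unknown"]:
    print(repr(cell), "->", clean_casualties(cell))
```

Recording ‘Unknown’ as 0 keeps the column numeric, at the cost of understating totals wherever the true figure was simply not reported.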

These issues rather pale into insignificance when compared to the issues with the ‘Location’ column! It all looks well and good on the Wiki page … the column shows the town/city/region and the country, all prefaced by a delightful little flag of said country as a handy visual reference. Unfortunately, when that data appears in Excel the flag is translated into the name of the country, giving awkward-looking constructions such as ‘Afghanistan Jalalabad, Afghanistan’ and ‘Iraq Baghdad, Iraq’. There was no easy way to work with this … at least none that I was either aware of or technically able to pull off … First off, there’s a difficulty with inconsistent naming conventions, with several names relating to the same place. In part, this relates to manual data entry by a variety of Wiki editors. My solution has been very manual and relies on a second Excel sheet where the ‘Iraq Baghdad, Iraq’ format is preserved as a unique key and a copy is broken out into city and country using the text-to-columns functionality. This is trimmed, reorganised, and concatenated to a manageable ‘City, Country’ format (e.g. ‘Baghdad, Iraq’). Any errors and inconsistencies are then corrected by hand. Once again, I’ve used Chris Bell’s rather excellent Batch Geocoding service at doogal.co.uk to turn the ‘City, Country’ data into digital Latitude and Longitude. As excellent as it is, it (and the Google data the service relies on) still has difficulties. In many cases these are related to locating rather small, obscure places in rural areas, and my solution has been to manually do my best with Wikipedia and (if all else fails) general locations within the respective countries. The results have been tested to ensure that the locations at least fall within the borders of the countries they are associated with. For this reason I would suggest that, while I’ve tried to get the locational data as accurate as possible, it should be regarded as a guide at best and in no way definitive.
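For anyone who would rather script the first pass than use text-to-columns, the duplicated country prefix can be stripped with a few lines of Python. This is only a sketch of one possible approach, not the manual Excel process described above; `tidy_location` is a hypothetical name, and it assumes the country is always the text after the last comma.

```python
def tidy_location(raw: str) -> str:
    """Turn 'Iraq Baghdad, Iraq' into 'Baghdad, Iraq'."""
    # Assumption: the country is whatever follows the last comma.
    place, sep, country = raw.rpartition(",")
    if not sep:
        # No comma at all, so there is nothing to split; leave it alone.
        return raw.strip()
    place, country = place.strip(), country.strip()
    # The flag image pastes in as the country name in front of the place.
    prefix = country + " "
    if place.lower().startswith(prefix.lower()):
        place = place[len(prefix):].strip()
    return f"{place}, {country}"


for cell in ["Afghanistan Jalalabad, Afghanistan", "Iraq Baghdad, Iraq"]:
    print(cell, "->", tidy_location(cell))
```

This only deals with the duplicated prefix. Inconsistent spellings of the same place would still need correcting by hand, and the original string should be kept alongside the result as the unique key.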

The Dashboard
Anyone familiar with my dashboards will find few surprises here. Along the right-hand edge there’s the familiar list of filters etc. Right on top is a drop-down for Casualties. Here you can decide how data in the top two graphs (the map and the bar chart) is displayed. The user can choose to see the data for just those killed or only the injured, or a combined figure of both. Below this there are two date filters. The first is a Year dropdown where the user can select entire years. For those wishing for more granular control, the second Date Range filter is a calipers where periods down to single days can be selected. Below this there are dropdown filters that allow the user to select one or more complete Continents or Countries. The issues associated with the Attack Type, Associated With, and Perpetrator have already been discussed, and individual dropdown filters are available here with the appropriate caveats.

Turning to the dashboard, proper. In the top left corner there’s a map of the world (Terrorist Casualties Worldwide) with dots for every location, sized and coloured by the number of casualties (depending on whether the user has selected the Killed, Injured, or the Combined figure). The keys to the colours and sizes are tucked away on the right-hand margin, below the filters. The date range in the map title is dynamic and reflects the date selection choices. I saw this technique for creating dynamic headers as part of a Tableau training presentation and decided that it would be interesting and useful to use on one of my work projects, but I wanted to try it out first and know I could make it work in the way I needed. It’s a simple trick, but one that’s remarkably effective … and I like it!

Along the top of the right side are the three headline figures – the cold numbers of Attacks, Killed, and Injured. My original instinct was to make the figures large and bold, but I found that my eye was continually drawn to them, degrading my ability to concentrate on other aspects of the dashboard. Instead, I have gone with making them slightly larger (but not overwhelming) and giving them plenty of space to ‘breathe’. For me, they are no longer visually distracting, but are given sufficient prominence to make their point. In a dashboard where every pixel of visual real estate has to work for its keep, perhaps the most extravagant thing you can do is to lavish some white space on your data … well, that’s my thinking …

Below this is a bar chart breaking out the Killed/Injured/Combined figures by country. As a comprehensive list of all countries was too cramped and couldn’t easily be read, I’ve kept this to just the Top 20. Again, the rank and values are dependent on the Killed/Injured/Combined Casualties parameter chosen.
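The logic behind the bar chart (pick a measure, total it by country, keep the top 20) can be sketched outside Tableau as well. The Python below is an illustration only: the function name, the dictionary keys, and the sample rows are all made up for the example and are not taken from the dashboard or its data.

```python
from collections import defaultdict


def top_countries(attacks, measure="Combined", n=20):
    """Total the chosen measure per country and return the n largest."""
    totals = defaultdict(int)
    for attack in attacks:
        killed, injured = attack["killed"], attack["injured"]
        if measure == "Killed":
            value = killed
        elif measure == "Injured":
            value = injured
        elif measure == "Combined":
            value = killed + injured
        else:
            raise ValueError(f"unknown measure: {measure}")
        totals[attack["country"]] += value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up rows, purely to show how the ranking shifts with the measure.
sample = [
    {"country": "A", "killed": 10, "injured": 5},
    {"country": "B", "killed": 3, "injured": 30},
    {"country": "A", "killed": 2, "injured": 1},
]
print(top_countries(sample, "Killed"))
print(top_countries(sample, "Combined"))
```

Because the ranking is recomputed for each measure, a country can move up or down the list (or drop out of the top 20 altogether) when the selection changes.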

In the bottom left I’ve placed a line chart of all attacks. At the time of writing (August 2017), the chart shows a low of 14 attacks in February 2015 and a high in June 2016 with 235 attacks. The important thing about this graph is that I’ve incorporated an attempt at providing a broad three-month Forecast to the data. The number of terrorist attacks in any time period is dependent upon myriad factors that cannot easily be anticipated, much less the exact outcomes in terms of human lives impacted. For all that, there is something sobering about being able to read off forecast numbers of dead and wounded that haven’t yet come to pass. I deal with this aspect in a little more depth, below.

The final portion of the dashboard, in the bottom right corner, is a text table laying out all attacks by date. Rather than give the tallies of the dead and injured as ‘flat’ numbers, I’ve chosen to present them as bars. To save on that all-important visual real estate, I’ve included the Wikipedia description of the attack as a hover-over Tooltip.


Predictive Analytics
Everyone loves predictive analytics. The thought that you can take the data assets already in your possession and use them to forecast future states speaks to something primitive within us, no matter how we’d like to dress it up in sciencey-sounding language. Working with some people in the past, I’ve found that having some form of predictive capability could be viewed as an end in itself, without any form of assessment as to either its utility or accuracy. As an initial step towards my personal understanding of how the Tableau predictive algorithms work, I wanted to look at the predictions and how they stacked up against the actual events. To this end, I manually noted down the predicted and actual results. As stated previously, this type of data is by definition incredibly difficult to predict. The variety of actors, motivations, opportunities, and skill levels is impossible to quantify, even leaving aside issues of detection and pre-attack neutralisation, not to mention recording and reporting biases along with fundamentals as to how terrorist acts are defined in the first instance. It is, in so many ways, a chaotic system. For this reason it’s not surprising that Tableau frequently gets it wrong. I’ve attempted to graph the actual and predicted numbers of Attacks, along with the numbers of Killed and Injured (bars) against their % difference (lines). It’s still very much a work in progress and needs further work and thought. However, it’s clear that at the time of writing (August 2017) the extremes of over- and underestimation can be significant. Looking first at the Attacks, the worst overestimate, where 59 fewer attacks occurred than were forecast, was in December 2012 (-57.84% diff). This can be balanced against July 2016, when there were 118 more attacks than forecast (83.45% diff). This is markedly better than the actual vs predicted numbers for those killed.
The greatest overestimate was in October 2016, when 1020 more people were predicted to die than actually did (-65.57% diff). The opposite end of the scale was when the forecast fell short and estimated that 1346 fewer people would die than were killed in January 2017 (159.10% diff). It is a similar story for the injured. The most significant overestimate was in March 2016, where 953 fewer people suffered injury than the predictions indicated (-57.31% diff). Finally, the largest underestimate for injuries came in August 2016, when Tableau predicted 1793 fewer casualties than actually occurred (146.85% diff).
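The percentage differences quoted here appear to follow a signed convention in which a negative value means the forecast was too high and a positive value means it was too low. A minimal Python sketch of that calculation follows. The formula is an assumption inferred from the signs of the quoted figures, and the predicted and actual totals in the example are back-calculated illustrative values chosen to reproduce the quoted gaps and percentages, not numbers taken from the data.

```python
def percent_diff(actual: float, predicted: float) -> float:
    """Signed difference of actual from predicted, as a % of predicted.

    Assumed convention: negative = overestimate, positive = underestimate.
    """
    return (actual - predicted) / predicted * 100.0


# Illustrative (back-calculated) values: a gap of 1346 on a forecast of 846.
killed_example = percent_diff(actual=2192, predicted=846)

# Illustrative (back-calculated) values: 59 fewer attacks than a forecast of 102.
attacks_example = percent_diff(actual=43, predicted=102)

print(round(killed_example, 2), round(attacks_example, 2))
```

Dividing by the predicted value rather than the actual one is what allows underestimates to exceed 100%, as several of the figures above do.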

So there’s your story – big over and under estimates, this type of prediction isn’t to be trusted on this type of data! Except … except … except … that’s not the full story! I’m not quite sure how to go about quantifying it, but it’s clear to me that the forecast algorithm is fine so long as the data is predictable, either rising or falling at a reliable rate (#NoScheisseSherlock). Obviously, sudden changes in the numbers of attacks, deaths, and injuries cannot be accommodated by the forecast. However – and this is the important bit – the model takes this into consideration and adapts accordingly. Sometimes the process takes two months (two iterations) to correct the forecast, but frequently they are back on track by the following month … at least until the next big change. But that’s taking the forecasts on a month-by-month basis. If we look at the averages over this period from January 2015 to July 2017 (27 months) we see that the mean error for deaths is an underestimate of 11.75%, or 0.16 of a person – yep … not even a full person out! The mean difference for the injured is 3.66%, an underestimate equivalent to 25.6 people. In terms of mean percentage difference, the number of attacks is closest to the predicted values, with a value of 0.52% underestimate, or 4.08 attacks.
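To make the contrast between the month-by-month misses and the overall averages concrete, here is a small Python sketch using made-up monthly percentage differences (they are not the figures from the dashboard). It shows how large signed errors in opposite directions can average out to something close to zero. The mean of the absolute values is included for comparison; that is an addition here and not something calculated in this post.

```python
from statistics import mean

# Made-up monthly % differences: negative = overestimate, positive = underestimate.
monthly_pct_diff = [-60.0, 85.0, -20.0, 10.0, -15.0, 5.0]

# Signed mean: over- and underestimates cancel each other out.
signed_mean = mean(monthly_pct_diff)

# Largest single-month miss, ignoring direction.
largest_miss = max(abs(value) for value in monthly_pct_diff)

# Mean of absolute values: the typical size of a miss, ignoring direction.
mean_abs = mean([abs(value) for value in monthly_pct_diff])

print(f"signed mean: {signed_mean:.2f}%")
print(f"largest single-month miss: {largest_miss:.2f}%")
print(f"mean absolute miss: {mean_abs:.2f}%")
```

A small signed mean says the forecast is not consistently leaning one way, while the absolute figures say how far off an individual month can be. Both readings are compatible with the numbers above.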

I freely admit that this is very much a preliminary examination of the quality of predictions, using a peculiarly unsuitable dataset. For all that, the results are interesting and warrant further study.

Notes
Finally, the Notes tab provides links to the Wikipedia pages where I took my data, along with an appropriate recognition of Chris Bell’s rather wonderful Doogal geolocations service.

As always, my advice to the reader is to use the controls I’ve provided – the filters and clickable graphs – to explore the data and drill down to find the information that has most meaning and interest for them.


If there are issues with this embedded version, try the dashboard on my Tableau Public page [here]
