Jamie Gottlieb

DIGITAL MARKETING | CONTENT AND COMMUNICATIONS

TEDx Twitter Analysis Using NodeXL

Method

  1. I collected data on #TEDx because TEDx is the umbrella brand for all TEDx events across the world, including TEDxUGA. The data showed that 1,183 people tweeted using #TEDx on Oct. 23-24, 2013.

  2. Group the vertices by cluster with the Clauset-Newman-Moore algorithm. This created 494 clusters.

  3. Run Graph Metrics on the data and select:

    • Overall graph metrics

    • Vertex degree

    • Vertex in-degree

    • Vertex out-degree

    • Vertex betweenness and closeness centralities 

    • Group metrics

    • Words and word pairs: Select option and select Tweets in the drop-down menu under “Edges worksheet”

    • Top items

    • Twitter search network top items

  4. Click on Autofill Columns

    • Select the Edges tab

      • Use “Edge Weight” for Edge Color, Edge Width, Edge Style and Edge Opacity

    • Select the Vertices tab

      • Use “In-Degree” for Vertex Size and Vertex Shape

        • Click on the Vertex Shape options and set the vertex shape to Image

        • Click on the Vertex Size options and set the vertex size from 1.0 to 30.0

      • Use “Followers” for Vertex Opacity

      • Use “Vertex” for Vertex Label

      • Use “Followers” for Vertex Layout Order

  5. Lay out the graph using the Harel-Koren Fast Multiscale layout algorithm

    • Click on Layout Options and select “Lay out each of the graph’s groups in its own box and sort the boxes by group size” and press OK

  6. Rename the first 12 groups after the most popular and most connected user in each group

  7. Calculate the percentage of retweets by sorting “Tweets” from A-Z, counting how many tweets begin with “RT” and dividing that number by the total number of tweets.
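
The retweet calculation in step 7 can also be done without sorting by hand. The following is a minimal Python sketch, not the procedure used in NodeXL; the function name and the sample tweets are hypothetical, and it assumes a retweet is any tweet whose text starts with “RT” followed by a space.

```python
def retweet_share(tweets):
    """Return the fraction of tweets whose text begins with 'RT '."""
    if not tweets:
        return 0.0
    # Assumption: a retweet is marked by a leading "RT " in the tweet text.
    retweets = sum(1 for text in tweets if text.lstrip().startswith("RT "))
    return retweets / len(tweets)


# Hypothetical sample texts, not taken from the collected data.
sample_tweets = [
    "RT @TEDx: A new talk is online #TEDx",
    "Excited for the event this weekend #TEDx",
    "RT @tednews: Speaker lineup announced #TEDx",
    "Watching talks all afternoon #TEDx",
]

print(f"{retweet_share(sample_tweets):.0%} of the sample are retweets")
```

Applied to the full “Tweets” column, the same count divided by the total number of tweets gives the retweet percentage.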

Network Analysis


Interconnectedness of the entire network


After importing the data and separating the network into clusters, I found one large group, 15 medium-sized groups and 201 small groups. This shows that many people who mention TEDx aren’t talking or interacting with each other. The largest group holds 116 members, or about 9 percent of all the people who used #TEDx, while the medium-sized groups hold about 15-23 members each, or about 2 percent of all users each. The small groups each hold .2 percent of all users on average.


With 453 clusters – I removed the clusters with only one user from the graph – there isn’t much interconnectedness within the network. The modularity of the network is .701894, which shows that the groups do not overlap often. Additionally, the graph density is .000699518, which again shows the lack of interconnectedness among users. If the users interacted with each other heavily or more often, the density of the network would be greater because users would have multiple edges with multiple other users. Instead, many people are talking about TEDx, but not talking about it with each other.   
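
To make the density figure concrete: density is the share of possible connections between users that actually exist, so a value near .0007 means roughly 7 of every 10,000 possible connections are present. The sketch below shows that ratio in Python under stated assumptions: each distinct connection counts once, self-loops are ignored, and the toy edge list is hypothetical. NodeXL's own calculation may differ in these details.

```python
def graph_density(vertex_count, edges, directed=True):
    """Share of possible vertex pairs that are actually connected."""
    if vertex_count < 2:
        return 0.0
    # Assumption: count each distinct connection once and ignore self-loops.
    if directed:
        unique = {(a, b) for a, b in edges if a != b}
        possible = vertex_count * (vertex_count - 1)
    else:
        unique = {frozenset((a, b)) for a, b in edges if a != b}
        possible = vertex_count * (vertex_count - 1) // 2
    return len(unique) / possible


# Hypothetical toy network: five users and three retweet edges.
toy_edges = [("ann", "TEDx"), ("bob", "TEDx"), ("cat", "dan")]

print(graph_density(5, toy_edges))                  # 3 / 20 = 0.15
print(graph_density(5, toy_edges, directed=False))  # 3 / 10 = 0.3
```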


After separating the users into groups, I ran group metrics to determine in-degree, out-degree and betweenness centrality. This data tells me who the key players in the hashtag are and why. The user with the highest in-degree is @TEDx with 132, meaning this user gets retweeted the most. In-degree then drops to 20 for @tednews, 18 for @oitnb, 17 for @patrickklepek and 15 for @youtube. Because the number drops so drastically, it shows that most users get information about TEDx from the main account, @tedx, while fewer get information from related but less relevant accounts.


The overall out-degree of this network is low: the highest out-degree is 6, shared by @tedxtelfairst, @tedxbelfast, @pinkseasonhk and @tedxhappyvalley. Because these users have the highest out-degree, these accounts interact the most with other users in the network. Although @tedx has the highest in-degree, it has a fairly low out-degree of 3, showing that the account rarely interacts with other users in the network. Also, 59 percent of the conversation consists of retweets from users in the network.
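
In-degree and out-degree can be read straight off the edge list. The sketch below is an illustration only: it assumes each edge runs from the user who tweets to the user being retweeted or mentioned, counts distinct neighbours, and uses hypothetical handles rather than the collected data.

```python
from collections import defaultdict


def degree_counts(edges):
    """Return (in_degree, out_degree) dicts counting distinct neighbours."""
    pointing_in = defaultdict(set)   # users whose tweets point at each user
    pointing_out = defaultdict(set)  # users each user points at
    for source, target in edges:
        if source == target:
            continue  # ignore self-loops
        pointing_in[target].add(source)
        pointing_out[source].add(target)
    users = set(pointing_in) | set(pointing_out)
    in_degree = {u: len(pointing_in[u]) for u in users}
    out_degree = {u: len(pointing_out[u]) for u in users}
    return in_degree, out_degree


# Hypothetical edges: (user who tweets, user being retweeted).
toy_edges = [
    ("ann", "tedx"),
    ("bob", "tedx"),
    ("cat", "tedx"),
    ("tedx", "tednews"),
    ("ann", "tednews"),
]

in_degree, out_degree = degree_counts(toy_edges)
print("tedx in-degree:", in_degree["tedx"], "out-degree:", out_degree["tedx"])
```

In this toy network the hub account has a high in-degree and a low out-degree, the same pattern described for @tedx above.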


Users in important positions


  • @TEDx: This user holds an important position in the network because it connects most of the groups and remains the center of conversation for groups whose members interact with more than one or two other people. The visualization of the graph shows this. With a betweenness centrality of 45,974.667, @TEDx connects most of the users and groups because users share its content most. The higher a user’s betweenness centrality, the more that user connects otherwise separate users and groups. Also, @TEDx’s in-degree is 132, which means most users retweet and share its content, while its out-degree is 3, which means @TEDx doesn’t interact much with other users. Other users share its content, rather than @TEDx sharing other users’ content. The high betweenness centrality and in-degree mean that much of the users’ information comes from @TEDx.

  • @TEDNews: This user is another important user in the network because it has the second-highest betweenness centrality, 1,029. Although this number is far lower than that of @TEDx, @TEDNews still plays a role in connecting users in the network because many users in its cluster receive information from this account and share its content. @TEDNews’s in-degree is 20, which means a fair number of people retweet its content, but its out-degree is 2, which means it interacts with other users in the network even less than @TEDx does. With a high in-degree and fairly high betweenness centrality, many users in the network gather information from @TEDNews. Additionally, the users in the @TEDNews cluster rarely get information from @TEDx, but rather get information and content solely from @TEDNews.

  • @YouTube: This user sits in a unique and important position because many users are in its cluster, but the cluster isn’t connected to @TEDx in any way. The lack of connectedness @YouTube has with @TEDx and the other users in the network is shown visually and through the data. @YouTube’s out-degree is zero, which means it doesn’t interact with other users in the network, and because @YouTube acts this way, it isn’t connected to @TEDx in any way. However, its in-degree is 15, which shows that users in its cluster get information solely from @YouTube and share most of its content. These users only share #TEDx content from @YouTube, not from @TEDx; they do not share @TEDx’s content at all. These users have out-degrees of 1 or 2 because they share content, but their in-degrees are zero because their content is never shared. @YouTube’s betweenness centrality is 265, ranked 7th in the network, which shows it plays a role in connecting users. However, in this case, @YouTube only connects users in its cluster and doesn’t interact with other users in the network.
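
Betweenness centrality counts how many shortest paths between pairs of other users pass through a given user. The sketch below implements Brandes' algorithm in plain Python as an illustration; it assumes the graph is treated as undirected and unweighted, which may not match NodeXL's settings exactly, and the handles are hypothetical.

```python
from collections import deque


def betweenness(edges):
    """Brandes' algorithm for an undirected, unweighted graph."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    score = {v: 0.0 for v in neighbours}
    for start in neighbours:
        # Breadth-first search that counts shortest paths from start.
        order = []
        parents = {v: [] for v in neighbours}
        paths = {v: 0 for v in neighbours}
        distance = {v: -1 for v in neighbours}
        paths[start] = 1
        distance[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest vertices, passing credit to parents.
        credit = {v: 0.0 for v in neighbours}
        for w in reversed(order):
            for v in parents[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != start:
                score[w] += credit[w]
    # Each undirected pair was counted once from each end.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: a hub with three followers plus an isolated pair.
toy_edges = [("ann", "hub"), ("bob", "hub"), ("cat", "hub"), ("dan", "eve")]
result = betweenness(toy_edges)
print(result["hub"], result["ann"], result["dan"])
```

Here the hub sits on every shortest path between its three followers, so it scores 3.0, while the followers and the isolated pair score 0.0. A hub in a disconnected cluster can still score above zero in this way, because it connects the members of its own cluster.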

Clusters

 


In the network, most clusters feed directly from @TEDx, but @YouTube remains isolated from the rest of the network. For example, @TEDNews gets all of its content from @TEDx, while the @YouTube cluster only gets content from within itself. The individual TEDx events’ Twitter accounts are as isolated as @YouTube because they share their own content rather than TEDx’s content. They use #TEDx to get their information into the TEDx network, but because TEDx rarely shares other users’ content, they aren’t in that specific cluster.

 

Content Analysis

I classified Twitter users into types of information sources identified using a grounded theory method, where clusters emerged from the data through the use of the hashtag #TEDx. Specifically, I gathered users who used #TEDx during Oct. 23-24, 2013 in the last 500 tweets. I identified two primary information sources – the TEDx brand and non-news media sources – each with a list of subcategories. The final categories under the TEDx brand were: (1) the TEDx company itself (e.g. @TEDx, @TEDChris); (2) the news source that disperses TED information; (3) TEDx individual events: the TEDx-sponsored events around the globe (e.g. TEDxBrussels, @TEDxBelfast) that host and disperse TEDx content. Two categories of non-news media sources were identified: (1) grassroots: individuals or small groups not affiliated with a news outlet, TEDx or an official advocacy group; (2) video websites (e.g. YouTube): resources that distribute video content.

After coding the data, I found that 9.1 percent of users were affiliated with TEDx, while 31.8 percent of users did not use a link when they tweeted. The coded data was a random sample of the network: I sampled 110 users out of 1110, or 10 percent of the network.

Each user was classified into one of the five categories using the following data, in order: first, the short Twitter bio; next, the hyperlinks provided in the Twitter bio; and finally, if needed, the hyperlinks the user tweeted with #TEDx.

 

Coding Sheet

 

Introduction

This protocol defines variables and procedures for a content analysis of TEDx-related Twitter posts. The objective of the study is to provide useful information for TEDx users regarding where people in the TEDx network are getting information on TED News, upcoming TEDx events and past TEDx Talks. 

The codebook that follows is for a study to characterize available information about TEDx on Twitter. It focuses on the popularity of users, the size of clusters in the network and the source of information in the network. Understanding these elements will allow the research team to understand which Tweets are influential, where users receive information and whether the network truly interacts with users. 

The variables and process are described below in two sections. 

 

Variables to code

The first level of this analysis is to (1) mark the individual Tweets, (2) identify the influential users in the network and (3) create a metric for popularity.

The result will be the following variables:

  • Identify each Tweet: sender, date, time

  • TEDx affiliate

    • 0 = Related to TEDx, either as an individual event or as someone who works for TEDx

    • 1 = Not related to TEDx, either as an outside company or as an outside user

  • Link present: 1 = yes / 0 = no

 

Process

 

  • Start from the NodeXL Excel sheets with the data already entered. Copy and paste the relevant data out of the NodeXL sheet for readability, focusing on message content.

  • Sample data by choosing 119 (10%) random tweets. Sort the sampled data to move message-containing variables to the top. This sheet will include:

    • Line for each Tweet containing date and time, which creates the Tweet Identity 

    • Determine whether the user is affiliated with TEDx by looking at the Twitter handle and description on Twitter. If necessary, Google the user for more information. Then, note whether the user is affiliated or not.

    • Link presence (“URLs in Tweet” column), noting yes/no
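
The sampling and coding steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function names, the fixed random seed and the sample data are hypothetical, and link presence is approximated by looking for “http://” or “https://” in the tweet text instead of reading the NodeXL column.

```python
import random


def sample_rows(rows, fraction=0.10, seed=2013):
    """Draw a reproducible simple random sample of the given fraction."""
    size = round(len(rows) * fraction)
    return random.Random(seed).sample(rows, size)


def has_link(tweet_text):
    """Code the 'Link present' variable: 1 = yes, 0 = no."""
    return int("http://" in tweet_text or "https://" in tweet_text)


def percent_coded(codes, value):
    """Percentage of coded rows that received the given code."""
    if not codes:
        return 0.0
    return 100 * sum(1 for code in codes if code == value) / len(codes)


# Hypothetical rows standing in for the copied NodeXL data.
rows = [f"user{i}" for i in range(50)]
sample = sample_rows(rows)
print(len(sample), "rows sampled out of", len(rows))

# Hypothetical codes: affiliation uses 0 = related to TEDx, 1 = not related.
affiliation_codes = [0, 1, 1, 1]
link_codes = [has_link("Watch https://t.co/abc #TEDx"), has_link("Great talk #TEDx")]
print(percent_coded(affiliation_codes, 0), "percent affiliated with TEDx")
print(percent_coded(link_codes, 0), "percent tweeted without a link")
```

Fixing the seed makes the sample repeatable, so a second coder can work from the same rows.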