Wednesday, 12 September 2012

MCFC Analytics - Some thoughts about data


The release by Opta/Manchester City of player data for the 2011/12 season is something that could potentially open up a whole new area of analytics to the wider public which previously would have been restricted to those working at the clubs.

More details are available here, but essentially what has been released is a dataset of one record per player per match for all games of the 2011/12 season, with details such as goals scored, passes attempted/made etc.

This post is more concerned with the data aspect of the project than with the practical application of the data (which will come in later posts and on my Swansea City blog www.wearepremierleague.com).

The raw supplied Excel file contains 10,369 records (excluding column headings) and 210 columns, so even though it’s just a summary of a player’s activity within a game it’s still a sizeable file.

Most of my initial toying with the data has been done in Excel (in particular pivot tables), but I’m using Access for the more detailed manipulation as it is often easier to manipulate data in a database than in a spreadsheet.

Below are a few details around changes and derivations I have made from the initial file.  Apparently over 5,000 people have requested the file.  This shows a huge level of interest but also means that without a sufficiently quick feedback loop for the data, there will be a lot of people doing the same sort of processing that could have just been done once and also leaves the data open to different interpretations rather than one true set of metrics.
Own Goals
An example of this is that the dataset doesn’t directly contain information on own goals.  It would be an easy mistake (one I made initially) to think you could just sum the ‘Goals’ column to get total goals scored by team.

What the data does have, however, is the total goals conceded by the goalkeeper on the pitch at that time.  So if you know the number of goals the opposition team has conceded in a game and the number of goals ‘your’ team has scored, then:

Goals for your team coming from opposition own goals = Total Goals Conceded by Opposition in match – Total Goals Scored by your team

To do this I have created a summary table of one record per team per match from the initial data, with the image below showing some of the fields for the Swansea – Chelsea game where Neil Taylor of Swansea scored an own goal:



Where Total_Goals_exc_own_goals is the sum of the ‘Goals’ field in the raw dataset.

I then created a second table which has details of goals conceded per team:



From these two tables you can see that no Chelsea player scored a goal in this game but that Swansea conceded one.

I then updated a ‘Total Goals Scored’ field in the first table by matching the ‘Team’ in the original table to the ‘Opposition’ in the second table and also the ‘Opposition’ in the first table with the ‘Team’ in the second.  

As an extra measure, in case at some point in the future the data has more than one match where Swansea were home to Chelsea, I also matched on date.
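The matching described above was done in Access, but the same logic can be sketched in plain Python. This is only an illustration under stated assumptions: the column names (`date`, `team`, `opposition`, `position`, `goals`, `goals_conceded`) and the sample rows are hypothetical, not taken from the Opta file, and goals conceded are assumed to be counted from goalkeeper rows only.

```python
from collections import defaultdict


def team_match_totals(player_rows):
    """Collapse player-level rows to one record per (date, team, opposition)."""
    totals = defaultdict(lambda: {"goals": 0, "conceded": 0})
    for row in player_rows:
        key = (row["date"], row["team"], row["opposition"])
        totals[key]["goals"] += row["goals"]
        # Assumption: only goalkeeper rows carry the goals conceded figure.
        if row["position"] == "Goalkeeper":
            totals[key]["conceded"] += row["goals_conceded"]
    return dict(totals)


def add_own_goals(totals):
    """Match each team's record to its opponent's record on the same date."""
    summary = {}
    for (date, team, opposition), record in totals.items():
        # Team <-> Opposition are swapped to find the other side's record;
        # a missing opponent record would raise KeyError in this sketch.
        opponent = totals[(date, opposition, team)]
        summary[(date, team, opposition)] = {
            "total_goals_exc_own_goals": record["goals"],
            "total_goals_scored": opponent["conceded"],
            "own_goals_for": opponent["conceded"] - record["goals"],
        }
    return summary


# Hypothetical match: Team A's players score once, but Team B concede twice,
# so one of Team A's goals must have been a Team B own goal.
rows = [
    {"date": "2012-01-01", "team": "Team A", "opposition": "Team B",
     "position": "Goalkeeper", "goals": 0, "goals_conceded": 0},
    {"date": "2012-01-01", "team": "Team A", "opposition": "Team B",
     "position": "Forward", "goals": 1, "goals_conceded": 0},
    {"date": "2012-01-01", "team": "Team B", "opposition": "Team A",
     "position": "Goalkeeper", "goals": 0, "goals_conceded": 2},
    {"date": "2012-01-01", "team": "Team B", "opposition": "Team A",
     "position": "Defender", "goals": 0, "goals_conceded": 0},
]
summary = add_own_goals(team_match_totals(rows))
```

Keying on date as well as team and opposition mirrors the extra safeguard mentioned above for repeated fixtures.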

This then gives the following information:





From this 1 record per team per match summary, you can then create an overall summary of goals scored:

This is interesting as much was made of Liverpool's 'bad luck' in hitting the woodwork so many times last season, but I have not heard as much about their 'good luck' regarding own goals.

Derived Fields
In addition to the fields supplied by Opta, it’s likely that you’d want a number of extra derived fields added.  As mentioned previously, it could be beneficial to have a process whereby a latest approved version of the file, with a number of agreed extra fields, is available for people to use, to avoid everyone having to create these themselves.

One example of this would be having a ‘Total Shots’ field as with the raw data there is no total but only the constituent parts (On Target/Off Target/Blocked).  
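As a minimal sketch of such a derived field, the total could be computed per record as below. The three column names are hypothetical stand-ins for the On Target/Off Target/Blocked columns; the exact headings in the Opta file may differ.

```python
def total_shots(row):
    """Derive a 'Total Shots' value from its three constituent parts."""
    # Hypothetical column names; adjust to match the headings in the raw file.
    return row["shots_on_target"] + row["shots_off_target"] + row["blocked_shots"]


# Made-up record for illustration only.
example_row = {"shots_on_target": 2, "shots_off_target": 3, "blocked_shots": 1}
example_total = total_shots(example_row)
```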

As with anything of this nature there’s the balance between everyone using consistent definitions/data and the fact that extra fields mean bigger file sizes.

Another example of a derived field might be having a standardised name format: If you want to be able to filter by name, it makes more sense to have the name in a single field rather than having forename and surname separately.  It also removes the strange anomaly in the data that ‘Adam Johnson’ is listed in the surname field rather than ‘Adam’ as forename and ‘Johnson’ as Surname.

There are a few cases where a player is genuinely only known by one name (e.g., Alex at Chelsea), so using Excel/Access we can create a Player Name field by taking Forename and Surname where both are supplied, or just Surname where only Surname is supplied.

This however doesn’t create a unique field to filter on, as there was a Paul Robinson at both Bolton and Blackburn last season and also cases where the same player played for multiple clubs.  The easiest way around this is to add the player’s club on to the name when creating the field that will be used for filtering by name.  This gives the option to then filter by person or by person at a specific club.

Also, the raw data contains a player ID which you can use to differentiate between where it’s the same person for two teams or two different players.
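The naming rules above can be sketched as two small helpers. The function names and the "Name (Club)" format are assumptions made for illustration; the post does not prescribe a particular format.

```python
def player_name(forename, surname):
    """Forename plus Surname where both are supplied, otherwise just Surname."""
    forename = (forename or "").strip()
    surname = (surname or "").strip()
    return f"{forename} {surname}" if forename else surname


def player_at_club(forename, surname, team):
    """Append the club so that same-named players at different clubs differ."""
    return f"{player_name(forename, surname)} ({team})"


single_name = player_name("", "Alex")
surname_only = player_name(None, "Adam Johnson")
bolton = player_at_club("Paul", "Robinson", "Bolton")
blackburn = player_at_club("Paul", "Robinson", "Blackburn")
```

A name-plus-club field still cannot tell whether two entries are the same person at two clubs, which is where the player ID comes in.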

Metadata
The raw dataset contains only instances where a player gets on the pitch, so it needs a bit of rejigging to fill in any potential blanks.

I’ve done this by using the raw data to create a summary table of all matches (grouping the table by Team/Opposition/Venue/Date), e.g.:




This then gives a list of all the fixtures for all the teams (20 teams playing 38 matches = 760 records).

The next step was to create a deduplicated list of all players for each team, e.g., Joshua (Josh) McEachran has an entry for both Chelsea and Swansea.  This gives a list of 561 players; matching this to the fixture table (matching by team) gives a total of 21,318 records (561 players for each of 38 matches).

This gives the ability to create a dataset which includes details of where a player takes no part in a game such as the chart below showing shots by game for Wayne Rooney.  The blanks are where he didn’t play (as opposed to the zero values which are where he had no shots).
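The fixture list, the deduplicated player list and the match between them can be sketched in a few lines of Python. The column names and sample rows are hypothetical; `None` stands for the blanks (did not play) and `0` for games where the player appeared but had no shots.

```python
def build_full_grid(player_rows, stat):
    """One record per player per fixture of their team, played or not."""
    # Fixture table: one row per Team/Opposition/Venue/Date.
    fixtures = sorted({(r["team"], r["opposition"], r["venue"], r["date"])
                       for r in player_rows})
    # Deduplicated list of players for each team.
    squads = sorted({(r["team"], r["player_id"]) for r in player_rows})
    # Lookup of the appearances that actually exist in the raw data.
    played = {(r["team"], r["player_id"], r["date"]): r[stat]
              for r in player_rows}

    grid = []
    for team, opposition, venue, date in fixtures:
        for squad_team, player_id in squads:
            if squad_team != team:
                continue
            grid.append({
                "team": team, "opposition": opposition,
                "venue": venue, "date": date, "player_id": player_id,
                # None = took no part; 0 = played but recorded none.
                stat: played.get((team, player_id, date)),
            })
    return grid


# Made-up rows: player 1 plays both games, player 2 only the first.
rows = [
    {"team": "Team A", "opposition": "Team B", "venue": "Home",
     "date": "2012-01-01", "player_id": 1, "shots": 3},
    {"team": "Team A", "opposition": "Team B", "venue": "Home",
     "date": "2012-01-01", "player_id": 2, "shots": 1},
    {"team": "Team A", "opposition": "Team C", "venue": "Away",
     "date": "2012-01-08", "player_id": 1, "shots": 0},
]
grid = build_full_grid(rows, "shots")
shots = {(g["player_id"], g["date"]): g["shots"] for g in grid}
```

Here two fixtures and two players give four records, one of which is a blank for the game player 2 missed.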

All of the above is still just scratching the surface (even before the more detailed release of within game player actions) but hopefully begins to make the point about creating open source (and approved) modified datasets to avoid large scale duplication of work as well as issues around differing definitions.

Update - 14th Sep.

I have now created a spreadsheet in the same format as the original dataset which has 40 records containing own goals data.  If you add this to the original spreadsheet then the 'Goals' and 'Goals Conceded' totals will now tally.

Spreadsheet is available at: https://skydrive.live.com/redir?resid=A1BA00769DC2D906!105

Dan Barnett

Director of Analytics

Wednesday, 22 August 2012

Sweating the Assets – Making the most of what you’ve got with Twitter


As mentioned in previous posts, Twitter is a great way to get your message out to a wide audience, but its quick-moving nature means your message will probably not be seen by the vast majority of those who follow you.  Using examples from a number of major websites, in this (and subsequent) blogs I’ll show how most have plenty of scope to make more of what they have.

Bitly is one of the most used URL shortening services and one great feature it has is the ability for anyone to be able to track the performance of a link simply by adding a ‘+’ to that link.  You don’t get the full functionality that you would have if it was your Bitly link but there’s more than enough to get a feel for how a link is performing.

As with any business where their product is content based, The Times are trying to make the most of monetising their product.  Where most go for an ad-funded model, the Times have set up a Paywall to make their online content subscriber-only.

Presumably as a means of quantifying the kind of volume of visitors they could reach and/or to show potential readers what they are missing out on (and encourage them to subscribe), they have recently started 1 hour ‘freeviews’ of particular content.

An example of this was an article about Joe Cole and Liverpool that was made free to view between 12pm and 1pm on Tue 21st August.  The chart below shows the click figures during this hour for a link to the piece tweeted by the journalist who wrote the article - Tony Barrett @tonybarrettimes (92k followers):

Tweets from @tonybarrettimes regarding the article
Clicks by Minute between 12:00-12:59pm
This second post was retweeted at 12:15pm by Phil McNulty @philmcnulty who is the Chief Football Writer for the BBC Sports Website (160k followers) and 12:16pm by Oliver Kay @oliverkaytimes who is the Chief Football Correspondent for The Times (147k followers).

Other people of note such as Rory Smith @rorysmithtimes (56k followers) mention the article via Twitter but put a direct link to the Times website URL (fine for them for their back end analysis using a tool such as Google Analytics but won’t show up on the Bitly chart above).

From the figures above there are a few points worth making:
  • It may not be the most important marketing channel, but Twitter can deliver a sizeable audience within a short space of time
  • The retweets from Tony’s followers (especially from well-followed accounts) are vital to the overall click volume
  • By using the same Bitly link, it’s not possible to separate out the impact of Tony’s two tweets at 10:56am and 12:11pm (the latter is likely to have driven the vast majority but we can’t be sure)
  • There was scope from around 12:30pm onwards for another push of the message from Tony’s feed, as anyone logging in to Twitter around then would probably have missed any previous mention of the link
  • If you had a number of Times journalists tweeting the link every 5 minutes or so (e.g., 12 journalists tweet the link once each over the hour), would that be classed as good marketing or would it be felt that it goes against the organic, free-for-all ‘spirit’ of Twitter?  Nobody likes the feeling they’re being marketed to, even though the truth is you are most of the time


Dan Barnett

Director of Analytics

Wednesday, 18 July 2012

Make the most of Tracking


"If you can't measure it, you can't manage it"

This phrase for me is at the heart of everything related to database marketing, to avoid the guesswork around what activity does and doesn't work.

Following on from my previous piece around Twitter, it's good to know which links from your Twitter feed are being clicked on (we'll look at how to track retweets in a future post).

As mentioned previously, you may post a Tweet to x followers but the number who actually see that Tweet will be significantly less. This will depend on when your followers visit Twitter and how inclined they may be to scroll through their timeline, but it's likely that unless they are looking within 10 minutes of you posting your Tweet it won't get seen.

This then suggests that it's a waste of your hard work in producing your article/blog/promotion to only Tweet it once.  There are numerous things that can impact the response you get to a Tweet that you may want to test e.g., 
  • Time of Day/Day of week - There will be peaks in when people are on Twitter e.g., Early Morning/Lunchtime etc., but this also means that there'll be more Tweets in their timeline so you are fighting for their attention.  Also what you are asking the reader to do will impact on likelihood of click-through e.g., is it a quick or an in-depth piece, is it business related (Marketing Advice) or personal (Hotel Breaks).
  • Tone/Content - How much of the final message should you put in the Tweet e.g., does "Great Twitter Tips here" work better than "Our latest blog on how you can use Twitter to improve your marketing"
  • Frequency - How much is too much?  This is likely to depend on the kind of followers you have: consumers will generally follow a few hundred accounts and businesses a higher number, partly to take part in the follow you - follow me ritual to increase follower numbers and therefore appear more important.

With so many variables it's impossible to test everything at once.  Any learning should be an iterative process, with the aim of continuously refining what you do until you have the style that works best for you (remembering to occasionally test that this still holds true).


URL shortening services such as the one within Twitter or bit.ly give you a link that can then be used to track visits to your website.  The only problem with these is that the code is unique to that web address rather than to that tweet, e.g., if I wanted to promote http://www.analysismarketing.com/ then every tweet I send with a link to this address will have the same shortened URL.

A way to get around this is to add an extra part to the address which still takes you to the same page, e.g.,
http://www.analysismarketing.com/#This_is_a_test_to_show_what_can_be_done
The important thing is to start the extra part that you are adding with a # (other characters such as ? also work).

If you name these links with a bit more thought than the one above e.g. http://www.analysismarketing.com/#TW120871801 (for Twitter Link, on 18th July Link number 1) or   http://www.analysismarketing.com/#0001 where you keep track in Excel what each link relates to then you will know the response for every tweet not just every article you have linked to.
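Generating the tagged links and keeping the record of what each one relates to could be sketched as below. The tag layout (channel code, then date as YYMMDD, then a two-digit link number) is an assumed scheme in the spirit of the example above, and the function name and log structure are hypothetical.

```python
from datetime import date


def tagged_url(base_url, channel, sent_on, link_number):
    """Append a per-tweet tag to the address as a # fragment."""
    # Assumed tag layout: channel code + YYMMDD + two-digit link number.
    tag = f"{channel}{sent_on:%y%m%d}{link_number:02d}"
    return f"{base_url}#{tag}"


# A simple log of what each link relates to (the post suggests Excel for this).
link_log = {}
for number, tweet_text in enumerate(
        ["Great Twitter Tips here",
         "Our latest blog on how you can use Twitter to improve your marketing"],
        start=1):
    url = tagged_url("http://www.analysismarketing.com/", "TW",
                     date(2012, 7, 18), number)
    link_log[url] = tweet_text

first_link = tagged_url("http://www.analysismarketing.com/", "TW",
                        date(2012, 7, 18), 1)
# first_link is "http://www.analysismarketing.com/#TW12071801"
```

Each tagged address is then shortened separately. Note that browsers do not send the # part to the web server, so the per-tweet split would show up in the shortener's click statistics rather than in server logs.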

Dan Barnett

Director of Analytics

Friday, 6 July 2012

A season on Twitter - Tips to improve usage


Anyone looking at the gap in time between this post and my previous blog (back in Sep 2011) may wonder why I’m coming back after such a long time away.

The reason for the gap is that Swansea City gained promotion to the Premier League and my blogging exploits rather than involving marketing and data have been taking place at www.wearepremierleague.com. It is however a bit of a busman’s holiday in that the focus is still very much on the use of data, just in a different context.

As well as the website, I set up an accompanying Twitter account @we_r_pl. This blog details some of the things I’ve learnt over the course of the season using Twitter which hopefully provide a useful guide as to things to bear in mind when using a Twitter account to give added exposure to your business.

A good account name helps, but can always be changed
Twitter now has over 600m users so there’s a fair chance your first choice is already taken. With a little imagination, however, it should be possible to get close, for example by adding a relevant suffix, e.g., @danbmarketing

A lot of the time, users will be clicking on a link to get to your Twitter account, so in that sense it could be anything, but bear in mind someone who sees one of your tweets second hand, e.g., via a Retweet. Hopefully the content will encourage them to follow, but a relevant name could also help convince them.

The other factor is length of name as if someone is replying to you or mentioning your account that is using up some of the 140 characters.

If at some point you want to change the account name e.g., going from a personal name to a company focused account then as long as the new name is available you can transfer your followers over rather than starting over.

Consider your tone of voice carefully
As with any kind of communication (email/Direct Marketing/Websites etc.,) it’s important that it fits within the context of your brand. Most businesses probably wish they had the carefree, relaxed attitude such as that displayed by Innocent but in reality that would be terrible for a lot of them.

This also relates to what to comment on; the easiest way to gauge it is to consider why people are following you. For my Swansea account it’d probably look strange for me to give my thoughts on Quantitative Easing or the new Spice Girls musical.

On the other hand, Twitter is best when it is a two-way conversation, not just a place to drop a link to your latest press release and run away, so it’s good to let your personality through as well. It’s also one of the hardest things to get your head around when you start; the analogy I use is that it’s like being in a pub where you can hear everyone’s conversations and where (generally) nobody gets annoyed if you butt in halfway through to give your own opinion.

When to Tweet/Direct Message/Email
This links in with the point above: your Twitter feed should hopefully inform and/or entertain, so a stream of tweets to various people discussing where you are going when you meet up at lunchtime isn’t likely to be of interest to most of your followers (although discussing where to go and opening it out to your followers may fit in well with your style).

Sometimes it’s better to send a Direct Message to that person or even interact outside of Twitter but it’s useful to remember the key to Twitter is that everyone who is following the person tweeting you will ‘see’ the message and some may then choose to follow you.


When someone clicks on your account name from someone's tweet they will see a summary of your account, from which they will probably decide whether you are worth following.  If the last 3 tweets are along the lines of 'See you later' or 'Semi skimmed please' then this may not convince people to follow:


A summary profile has your last 3 tweets, your personal description (Bio), the number of Tweets you've made and how many accounts you follow/follow you.  Will there be enough in here for someone to think you are worth following?

Don’t be afraid to post the same thing more than once
Although that tweet you make appears on the timeline of everyone who follows you, the proportion of people who actually see it will be far less.

It will depend on the number of people they are following but someone following 500+ accounts could easily only view tweets posted in the last few minutes before they have to tap to load extra tweets or scroll through a significant number of tweets to get to those posted earlier.

This means if the person who is ideal for your tweet wasn’t looking at Twitter within that small window of opportunity after you posted the tweet, then the message doesn’t get seen by them.

It may feel like spam to mention your new blog, promotion etc., several times over the course of a couple of days but that would only be the case if they were following just a handful of people.

If you want retweets, be eye-catching
If you want your message to spread it’s important for your tweet to be interesting not just any final content that you may be linking to. 

An example of this is one post I did looking at the Twitter following of Premier League clubs, where I noticed that most of them had fewer followers than @anfieldcat (set up by a quick-witted individual when a cat appeared on the pitch during a live Liverpool game; it quickly reached 60k followers and now has over 75k).

My tweets about the blog generally get a few retweets but the one where I mentioned @anfieldcat got retweeted by @anfieldcat and overall retweeted 80 times (excluding any times the tweet would have been edited before retweeting).

As well as your own content look to others, but credit where it’s due
Any tweets you make should ideally provoke some sort of response, either directly back to you or in the form of others retweeting your content as it’s something they feel worth sharing.

Similarly when you find something of interest and want to share it, you have 3 options:
o   Straight retweet
o   Edited retweet with accreditation e.g., Great link on x here (via @we_r_pl)
o   Edited retweet with no accreditation e.g., Great link for x here

For a straight Retweet, your followers will see the original tweet as coming from the original source, with a note stating that it’s been retweeted by you.

If you edit a tweet and that then gets retweeted, then your name is still linked to the content, whereas if someone straight retweets something you straight retweeted then you are not mentioned.

If you’re editing a tweet before retweeting, then as long as you’re adding value or context that’s fine.  Where you’re just doing it to get the ‘credit’ it’s a different story, and even more so if you don’t even mention where you originally got the information from.

A good example of the different types can be seen from the image below, where the tweet from @anfieldcat has been both straight retweeted and also edited.  There’s also every chance that the joke itself was lifted from elsewhere by @anfieldcat.
An example of a tweet spreading out from its original source 
I haven’t necessarily followed my own advice all the time.  The biggest thing I’ve done wrong is avoiding getting too involved in interacting with other users, so the stream has been more like a broadcast than a conversation.

You don’t want to annoy people with constant messages but to go back to the pub analogy, if you just sit in the corner nursing your pint then people will pass you by and you’ll miss out.

Thursday, 22 September 2011

The Importance of Being 'Liked'

If a tree falls in a forest, does it make a sound? If someone writes a blog and nobody reads it, does it exist?

Getting people to hear your message has always been a challenge but as well as the traditional methods such as Press/Direct Mail/TV there are now a number of other channels which bring with them a different set of rules.

I recently saw a connection of mine on LinkedIn had a status update that referred to a ‘Like’ they had for an update made by a connection of theirs:


Over 6,500 Likes and Over 1,300 Comments!

The sheer volume of likes and comments on this update shows how a comment or update can travel far beyond your own sphere of influence.  After all, one of the principles of LinkedIn is the 6 degrees of separation, where we are all connected somehow, and the ‘my connection used to work with you’ sort of warmish leads.

Each person who is connected to one of the 6,526 people who liked the post would have seen that as an update from that connection (as I did when my connection liked the update), so if you think of all the people who saw it, but didn't bother liking the update there were probably tens of thousands of people who saw that update.

Getting a Like on a status update such as the one above reminds me of an episode of the Simpsons where Bart’s teacher Mrs. Krabappel gets passing motorists to honk their horns, seemingly in support for the teacher’s strike but in reality the placard she’s holding says ‘Honk if you love Cookies’.

Getting a Like (or a retweet) on a more sober business related message is harder but if it’s interesting enough people will pass it on as they’ll want to be generating useful content for their followers and there’s only so much content they can generate themselves.

If you can prove to be a useful filter of all the noise that’s out there then people will value what you say whether it’s related to your own or someone else’s content.

The most important thing I’ve learnt though is if you don’t ask you don’t get, so please like/retweet this blog and help to prove that point.

Our next blog will look at the ‘half-life’ of a tweet/status update and how to maximise the value of your activity to increase the proportion of your followers that see your message.

Dan Barnett

LinkedIn: http://www.linkedin.com/in/danjbarnett

Twitter: @analysismktg http://www.twitter.com/analysismktg

Wednesday, 7 September 2011

What train prices can teach you about Optimisation

As a Swansea City fan living in Hertfordshire, getting back to watch games is a relatively lengthy process with myriad options involving car, train and bus.

In terms of optimal solutions if you had an unlimited amount of time and wanted to minimise costs then you could walk or cycle, if you had unlimited budget and wanted the quickest time possible between the two points you could charter a helicopter.

In this blog I look at the pricing structure of train fares between London and Swansea as an example of how there are potentially hundreds of possible solutions and the difference in cost between a simple and an optimised solution.

If we look at the prices of day returns on a Saturday between London and Swansea, for just one mode of transport, one journey and one train operator there are 512 different combinations of train tickets to get between the two stations (e.g., Paddington-Cardiff-Swansea, Paddington-Reading-Cardiff-Swansea etc.,).


When looking at the price matrix the main thing that is apparent is the sharp drop in cost to travel to Swansea from Swindon onwards compared to stations closer to London.
e.g. Paddington-Swansea is £69 and Paddington-Swindon-Swansea is £63.20 (£39+£24.20).

Breaking down the Paddington-Swindon part of the journey even further, this can be reduced from £39 to £37 by breaking the ticket at Reading (or £30.70 if also breaking the ticket at Didcot Parkway, with the proviso that only every other train stops at Didcot Parkway, limiting your options).

In short you can pay £69 for London-Swansea or pay £61.20 for exactly the same service by splitting the journey into several tickets (without having to get off the train) or £54.90 if you get on trains that stop at Didcot Parkway.
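The split-ticket comparison above can be sketched as a small search over where to break the journey. This is an illustration only: it uses just the three fares quoted in the post (in pence, to avoid rounding issues), and reading the 512 combinations as 2 to the power of 9 (nine possible break points, each either used or not) is my inference rather than something the post states.

```python
def count_splits(break_points):
    """Each possible break point is either used or not, giving 2**n splits."""
    return 2 ** break_points


def cheapest_split(stations, fares_pence):
    """Cheapest way to cover the route using any sequence of tickets.

    stations: station names in route order.
    fares_pence: {(origin, destination): fare in pence} for available tickets.
    Returns (total cost in pence, list of (origin, destination) tickets).
    """
    best = [None] * len(stations)  # best[i] = cheapest way to reach stations[i]
    best[0] = (0, [])
    for j in range(1, len(stations)):
        for i in range(j):
            fare = fares_pence.get((stations[i], stations[j]))
            if fare is None or best[i] is None:
                continue
            cost = best[i][0] + fare
            if best[j] is None or cost < best[j][0]:
                best[j] = (cost, best[i][1] + [(stations[i], stations[j])])
    return best[-1]


# Only the fares quoted in the post: direct, and split at Swindon.
stations = ["Paddington", "Swindon", "Swansea"]
fares_pence = {
    ("Paddington", "Swansea"): 6900,
    ("Paddington", "Swindon"): 3900,
    ("Swindon", "Swansea"): 2420,
}
cost, tickets = cheapest_split(stations, fares_pence)
```

With these three fares the sketch returns 6320 pence (£63.20) by splitting at Swindon, against 6900 pence for the direct ticket. Adding the individual Reading and Didcot Parkway fares to the table would let the same function find the cheaper splits.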

“So what?” you may say, but if a business could get the same results but for 10-20% less spend then it could transform their fortunes.

What does all this have to do with Marketing? Well, if you consider your starting station as where you are now with your business and the end station as where you want to get to e.g., 1,000 sales in the next month, all the relative journeys you could take to get there could include one or multiple marketing channels each with their own levels of ROI and points at which that ROI dramatically starts to flag.

For example, from previous experience you find it costs £5,000 to get 100 sales via Direct Mail or £4,000 to get 100 sales via email. Notwithstanding the fact that not all new sales are made equal, at face value you’d say you should put your money into email marketing.

In reality there’s likely to be a point at which the cost per sale is far above the average £40 that you’re getting from email marketing at which point you should be sending people your brochure in the post.

With optimisation you can make things as complicated as you like with the number of channels, constraints on who gets targeted when (and how often) etc., but the key thing is the more you understand about who responds well to what the smarter your targeting will become.

On our website we talk about the ‘binary thinking’ that often prevails e.g., “We tried email marketing and it didn’t work”, “Door Drops are too expensive” etc., and how the mix of right medium and right message will outperform a one size fits all policy.

Dan Barnett

Director of Analytics


LinkedIn: http://www.linkedin.com/in/danjbarnett

Friday, 17 June 2011

Can Swansea City be Premier League off the pitch too?

Today saw the publication of football fixtures for next season and so the excitement from Swansea’s play-off victory is ratcheted up another notch as fans (including myself) start to pencil in visits to Eastlands first then the Emirates, Stamford Bridge and beyond.

With the riches that accompany the Premier League comes a greatly increased profile and an opportunity to maximise off-field earnings as well as those from TV and gate receipts.

This isn’t about bleeding supporters dry, it’s about engaging with supporters through the means of targeted and relevant content (both Online and Offline). Although it feels like treason to say it, I don’t think Swansea have done great things on this front so far.

Regardless of what’s been going on this season, the only contact has been a weekly email every Friday afternoon which has a couple of one line teasers to content on the website with the rest of it being advertisements.

It could be argued that the aim of a newsletter is to drive people to the website and so content in the newsletter isn’t crucial. Unfortunately, in my opinion the content in the newsletter is so bland that I rarely bother opening them, as I know that I’m better off just looking at the website directly now and again (I’m sure in plenty of cases however people will neither open the email nor visit the website).

For successful off-field business, a club needs to be able to identify who is doing what with them. The obvious starting point is Season Ticket holders, but then beyond that there’s everyone who subscribes to the email newsletter, everyone who buys Match Day tickets (both Home and Away), Club Shop sales etc.

One example of where this may have helped is after sealing promotion they announced the sale of a final batch of 2,000 Season Tickets. Rather than being able to determine those who may be more deserving of the opportunity of buying these tickets, it was announced on a Thursday afternoon that they were going on-sale at the club ticket office at 10 a.m. the next day, in-person sales only.

This meant that those season tickets just went to the first people who could get down to the ticket office and didn’t have to worry about work the next day. Some would say that real fans would have bought their Season Tickets already but there will be others who would argue they have been forced out by those jumping on the bandwagon.

It doesn’t have to be like this though.  By combining all customer interactions you can have a more solid understanding of:
• Who your key supporters are
• Who to target to try and increase their interaction with the club
• Which offers/communications to deliver to different segments e.g., those who bring children to the game, those who travel to away games, exile fans etc.

This approach could be a cost effective strategy for a non-league team let alone a Premier League side, but all too often it is overlooked as being too far removed from the day-to-day running of a football club.  In practice it is about:

• Capturing data
• Consolidating data into a single area
• Analysing the data to gain insights on activity

Which of course is where we come in, so if there are any football clubs out there who are looking to get more from their data then get in touch.

Dan Barnett

Director of Analytics
blog@analysismarketing.com
LinkedIn: http://www.linkedin.com/in/danjbarnett