Wednesday, 9 April 2014

What is your biggest problem? #letstest2013

I got asked the following question at the Let's Test conference in Sweden last year.

What is your biggest problem?

I answered the question by saying something along the lines of:
Getting access to the system to try some tests out. I also started to talk about some of the tools I was using: "python and google spreadsheets to probe the system".

 Does that seem ok? (Better than saying the trots or something, but....)

Not to me because I don't think I answered the question correctly... In fact as soon as I answered the question I thought I hadn't answered it very well.

1. I made an assumption about the question - that it meant the 'biggest' problem in my current test activity
2. I didn't really ask many questions about the question, which is what annoys me most... the question's context!

Let's assume that the question was indeed about the biggest problem in my current context. I answered about running tests on the system, i.e. 'Controlling and Observing the System'. Even then, I don't think that's what I really meant. So what did I really mean...

I may have answered the question as a current problem, not necessarily the biggest problem.

 Well, actually I think my biggest problem as a tester is trying to identify problems!
 Ermmm... Well, thanks Peter! That's really helpful, I hear you cry...

At the time the question was asked I had been working with 'Big Data'. (Note: I feel 'Big Data' is rather like 'Quality': subjective, i.e. what 'Big Data' means for one person is probably not the same as for another.) For you to understand my context, I probably need to explain what it meant to me.

 The following presentation may give you a flavour: http://files.meetup.com/1789394/Mike%20Keating%20-%20News%20Int%20-%2018th%20BDL%20meetup.pdf

A brief summary of the system I was testing:
A data pipeline:

  1. Collect data from various internal and external sources (in the big data world this is sometimes called the ingest process)
  2. Do any initial formatting or transformations to a schema (transform)
  3. Enrich the data
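To make those three steps a bit more concrete, here is a minimal sketch in python. It is not the real system (that ran on Hadoop); the source names, field names and the "email domain" enrichment are all made up for illustration.

```python
# Hypothetical raw records standing in for internal and external sources.
RAW_SOURCES = {
    "web_form": [{"Email": " Ann@Example.com ", "dob": "1980-05-17"}],
    "crm_export": [{"email": "bob@example.com", "dob": "1900-01-01"}],
}


def ingest(sources):
    """Step 1: collect records from every source, noting where each came from."""
    for name, records in sources.items():
        for record in records:
            yield {"source": name, **record}


def transform(records):
    """Step 2: normalise field names and values to one (assumed) schema."""
    for record in records:
        lowered = {key.lower(): value for key, value in record.items()}
        yield {
            "source": lowered["source"],
            "email": lowered["email"].strip().lower(),
            "dob": lowered["dob"],
        }


def enrich(records):
    """Step 3: add a derived field; here, the email domain (an invented example)."""
    for record in records:
        yield {**record, "email_domain": record["email"].split("@")[1]}


def run_pipeline(sources):
    """Chain the three steps and materialise the result as a list."""
    return list(enrich(transform(ingest(sources))))
```

Each step is a place where a tester can control the input and observe the output, which is far easier on a sketch like this than on a distributed system.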

Here is a brief overview of some problems I have encountered:

PIST UP

Purpose:

Why are we doing this? / Why are we doing this NOW?/ Why did we do this? /Should we do this? (or is this just framing)
   A few people asked me this at Let's Test. I had to stop and think about it: why were we doing this?
To understand our customers, for:
  • Improving customer experience
  • Maximizing revenue


Interpretation:

I had been struggling to explain this until I stumbled back onto something I favourited on Twitter:

'Combating "about-itis" by thinking with and through what you know '

I believe this is hard: translating what you have observed and making sense of it. What is the veracity of the data?
Some interesting observations:

Data skewing:
  • Data from web forms: values can come from the quickest way to fill in the form, such as the first or last items in a list, e.g. 01, January, 1900. Real or fake?
  • Different source systems can set their own default values.
  • Test Data - Is there test data in the system? Where is it? Where did it come from?
  • Ludic fallacy? Relevance paradox?
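One way to start probing for this kind of skew is to count how often each value turns up and look at anything that is suspiciously common. The sketch below is only an illustration: the function name, the sample dates and the threshold are my own assumptions, and a flagged value is a prompt to ask questions, not proof that the data is fake.

```python
from collections import Counter


def overrepresented(values, threshold):
    """Return the values whose share of all values is greater than threshold.

    threshold is a fraction between 0 and 1; picking it is a judgement call.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return sorted(
        value for value, count in counts.items() if count / total > threshold
    )


# Hypothetical dates of birth: four "first item in every drop-down" entries
# mixed in with six ordinary-looking ones.
sample_dobs = ["1900-01-01"] * 4 + [
    "1975-03-12", "1982-11-30", "1990-07-04",
    "1968-02-21", "1988-09-09", "1979-12-25",
]

suspicious = overrepresented(sample_dobs, threshold=0.2)
```

Here `suspicious` holds only "1900-01-01". Whether that is a real cluster of birthdays, a form default or leftover test data is exactly the interpretation problem.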


Scale:

Large amounts of production data / Prod data vs Test Data?
Ok so how do you test at scale?
There are a number of options, and of course it depends on your scale and what you are testing
  • Do you create large amounts of test data?
  • Do you use production data?
  • Do you use small amounts of test data for targeted testing?
Sorry to say there is no silver bullet. I think all 3 in combination can help, although that can cause problems of its own.
How do you generate? - what you want, when you want it.
How do you anonymize? - totally cleanse vs data that is meaningful for a test
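As a rough illustration of both questions, here is a small python sketch. The seeded generator gives you the same data every time you ask for it, and the hashing function hides the personal part of an email while keeping the domain, so a test that depends on the domain still means something. The field names and the salt are invented, and note that salted hashing is really pseudonymisation rather than full anonymisation.

```python
import hashlib
import random


def generate_customers(count, seed=0):
    """Generate repeatable fake customers; the same seed gives the same data."""
    rng = random.Random(seed)
    return [
        {
            "customer_id": i,
            "email": f"user{i}@example.com",
            "spend": rng.randint(0, 500),
        }
        for i in range(count)
    ]


def pseudonymise_email(email, salt="test-salt"):
    """Replace the local part with a salted hash, keeping the domain.

    The same input always maps to the same output, so joins across
    data sets still work after cleansing.
    """
    local, domain = email.split("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:12]
    return f"{digest}@{domain}"
```

For example, `generate_customers(1000, seed=42)` gives a repeatable data set for a targeted test, and `pseudonymise_email` could be applied to a copy of production-like data before it goes anywhere near a test environment.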


Technology:

New technology, constantly evolving, bleeding edge
  • Hadoop distributed infrastructure / abstractions e.g. Apache Crunch - http://bigdatauniversity.com/
  • HBase - http://bigdatauniversity.com/
  • JavaScript front end / Backbone / RequireJS / Yeoman / Highcharts
Using technology that there is not a lot of information about led me to a steep and bumpy learning curve.

Unknowns:  

What don't I know about Purpose, Interpretation, Scale, Technology and Platform....

Platform: 

Distributed architecture / making observations / testing / logging / access / speed

What I learnt from that question was: I must become better at explaining things.

I must become a better tester.

Also note: Having a conversation while a band is playing is hard....