This is a post about my experience at PyData London 2017: what I liked, what I learnt... Note that with four tracks and so many people, my opinions are very biased. If you're wondering what your experience would be like, it'll be amazing, but different from mine. :)
On the organization side, I think it's been excellent. Everything worked as expected, and when I had a problem with the wifi, the organizers fixed it in literally a couple of minutes. It was also great to have sushi and burritos instead of last year's sandwiches. The Slack channels were quite useful and well organized. I think the organizers deserve a 10, and that's very challenging when organizing a conference.
On the content side, I used to attend conferences mainly for the talks, but this year I decided to try the other things a conference can offer (networking, sprints, unconference sessions...). Some random notes:
Bayesian stuff
I think probabilistic models are the area of data science with the highest entry barrier. This is a personal opinion, but one shared by many others, including authors:
The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.
Cameron Davidson-Pilon
It looks like there is even terminology to describe whether the approach used is mathematical (formulae and proofs, quite cryptic to me) or computational (more focused on the implementation).
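As a taste of the computational approach, here is a minimal sketch (the data and numbers are made up by me) that applies Bayes rule to estimate a coin's bias with a simple grid approximation, with no closed-form maths involved:

```python
import numpy as np

# Made-up observed data: 9 heads out of 12 flips
heads, flips = 9, 12

# Grid of candidate values for the coin's bias
grid = np.linspace(0, 1, 1001)

# Uniform prior over the grid
prior = np.ones_like(grid)

# Likelihood of the data for each candidate bias (binomial kernel)
likelihood = grid**heads * (1 - grid)**(flips - heads)

# Bayes rule: posterior is proportional to prior times likelihood
posterior = prior * likelihood
posterior /= posterior.sum()

print("Posterior mean bias:", (grid * posterior).sum())  # ~0.71
```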
It was a luxury to have Vincent Warmerdam, from the PyData Amsterdam organization, at PyData once more. He has been one step ahead of most of us, who are more focused on machine learning (I haven't met any frequentists at PyData conferences so far). He already gave a talk about this topic last year, The Duct Tape of Heroes: Bayes Rule, which was quite inspiring and made probabilistic models feel more approachable, and this year we got another amazing talk, SaaaS: Sampling as an Algorithm Service.
After that, we managed to have an unconference session with him, where we went through the examples presented in the talk in more detail. While Markov chain Monte Carlo and Gibbs sampling aren't straightforward to learn, I think we all learnt a lot, enough to finish working out the details by ourselves.
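To give a flavour of what this sampling business is about, here is a minimal Metropolis sampler (a toy sketch of my own, not code from the talk) that draws from a standard normal distribution using only its unnormalised density:

```python
import numpy as np

rng = np.random.default_rng(42)

def unnormalised_density(x):
    # Target: standard normal, up to a constant factor
    return np.exp(-0.5 * x**2)

samples = []
x = 0.0  # starting point of the chain
for _ in range(10_000):
    proposal = x + rng.normal(scale=1.0)  # symmetric random-walk proposal
    # Accept the proposal with probability min(1, density ratio)
    if rng.random() < unnormalised_density(proposal) / unnormalised_density(x):
        x = proposal
    samples.append(x)

print("Sample mean:", np.mean(samples))  # ~0
print("Sample std: ", np.std(samples))   # ~1
```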
There were other sessions about Bayesian stuff too:
- Bayesian optimisation with scikit-learn - Thomas Huijskens
- Variational Inference and Python - Peadar Coyle
- Bayesian Deep Learning with Edward (and a trick using Dropout) - Andrew Rowan
- Segmenting Channel 4 Viewers using LDA Topic Modelling - Thomas Nuttall
I got good recommendations for books related to probabilistic models and Bayesian stuff that shouldn't take the tough mathematical approach:
- Bayesian Methods for Hackers
- Information Theory, Inference and Learning Algorithms
- Computer Age Statistical Inference
- Statistical Rethinking: A Bayesian course with examples in R and Stan
There is also a meetup in London, which is the place to be to meet other Bayesians.
Frequentist stuff
It was also amazing to see how Lev Konstantinovskiy managed to run a tutorial, a talk, a sprint and a lightning talk during the conference.
From theory to practice
It may just be my impression, but I'd say there were more, and more diverse, talks on applications of data science. While I remember talks on common applications like recommender systems in previous editions, this year I noticed an increase in talks applying all these techniques in different areas.
To name a few:
That awkward moment when you thought you knew Python, but James Powell is your interviewer...
OK, it wasn't an interview, it was a pub quiz, but the feeling was somehow similar. After 10 years working with Python, and having passed challenging technical interviews at companies such as Bank of America and Google, at some point you start to think you know what you're doing.
Then, when you're relaxed in a pub after an amazing but exhausting day, James Powell starts running the pub quiz, and you feel that you don't know anything about Python. Some new Python 3 syntax, all-time namespace tricks, and so many atypical cases...
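To give an idea of the kind of corner cases we faced, here is a classic Python 3 namespace gotcha (an example of my own, not one of the actual quiz questions):

```python
x = "module"

class Demo:
    x = "class"
    # Generator expressions run in their own scope, which skips
    # the class namespace, so this x is the module-level one
    values = list(x for _ in range(2))

print(Demo.values)  # ['module', 'module'], not ['class', 'class']
```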
Luckily, the dots started to connect, and I realized that a few hours earlier I had been discussing with Steve Holden the new edition of his book Python in a Nutshell. It sounded like an introduction to me, but it looks like it covers all the Python internals.
Going back to the pub quiz, I think it was one of my most memorable moments at a conference. Great people, loads of laughs, and an amazing set of questions, perfectly executed.
Big Data becoming smaller
As I mentioned before, my experience of the conference is very biased, and very influenced by the talks I attended, the people I met... But my impression is that the boom in big data (large deep networks, Spark...) is not a boom anymore.
Of course there are a lot of people working with Spark and researching deep neural networks, but instead of growing, I felt these areas are losing momentum, and people are focusing on other technologies and topics.
Meetup groups
One of the things I was interested in was finding interesting new meetups. I think these are among the most popular ones in data science:
- https://www.meetup.com/PyData-London-Meetup/
- https://www.meetup.com/London-Machine-Learning-Meetup/
- https://www.meetup.com/London-ODSC/