Robert Picard

Robert's blog.

I’m Writing a Book on Flask

I’ve been looking for a project to do this summer. I’m going to have a little over two months off of school, and I don’t have a lot of time-consuming responsibilities at work this summer. Yesterday, I found my project. I’m going to write a book on web development with Flask.

Why me?

I’m not an expert on web development, or Flask. I am, however, fairly knowledgeable about both. I’ve spent countless hours looking for the best practices for coding various tasks that are expected of a modern web application. I think there is some real value in a book that lays out more advanced examples than are within the scope of the Flask documentation, with an emphasis on current best practices.

Now, I write. I have a lot of ideas for sections to include, such as “A/B Testing your Flask app.” I’m going to need to do a lot more research into what the best practices really are, but I’m setting the goal of publishing the book on August 1, 2013. I have June and July to write, edit, market, and launch the book. It may be slightly ambitious, but I plan to have a lot of time on my hands.

I’ll be posting to this blog with updates on my progress as well as samples from the book as I write it. If you want to keep up with things, I encourage you to join my mailing list by entering your email in the form in the sidebar of this page.

Also, if you have suggestions for subjects you’d like to see in the book, send me an email at mail@ this domain, or leave a comment below.

Getting DuckDuckGo Instant Answers by SMS With Twilio

The Problem

I don’t have a smartphone, but sometimes when I’m reading I come across an unfamiliar word. I want to define words (and answer similarly simple questions) without having to get to a computer or find the right book (e.g. a dictionary or encyclopedia).

The Solution

I built a small web app that receives requests from Twilio and returns a response after querying the DuckDuckGo API. You text your search query to a phone number, and it texts back the answer from DuckDuckGo.

I wasn’t really expecting to use it much when I first made it. I really just thought it was an interesting idea, and I hadn’t made anything with Twilio before. It has turned out to be super useful though.

The Stack

I originally built this back in October of 2012, but on seeing a recent Hacker News post for whit, a small app that solves a similar problem, I thought that I should open source what I had written.

I first wrote the app using Perl and the Dancer web framework. After seeing the post, I took a look at the code for the first time since last year. There were a lot of cruft files for Dancer that I just didn’t need for the 30 or so — well spaced — lines of code that actually did the work.

I decided to rewrite the app in Python using Flask, my current framework of choice.

Now it’s a small Flask app living in a single file. The app is being served via Gunicorn on the Amazon EC2 instance that I use for small projects like this.

When you add a phone number to Twilio, you can specify an SMS URL. When somebody sends a text message to that phone number, Twilio sends a GET or POST request (depending on your settings) to the specified URL, then parses the TwiML (Twilio Markup Language) that is returned. This TwiML allows you to respond to the request by answering the phone, sending a text message, playing a sound, etc.

When my Flask app receives the request, it queries the DuckDuckGo API for an instant answer, shortens it to fit into one text message (160 characters), and returns the answer in Twilio’s TwiML XML format. Twilio takes that response and texts it back to the sender.
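The shorten-and-wrap step can be sketched with nothing but the standard library. This isn't the app's actual code, just a minimal illustration under a few assumptions: the function names are made up, the DuckDuckGo field names ("Answer", "Definition", "AbstractText") are the ones I'd expect in the API's JSON response, and I'm using the `<Message>` TwiML verb (older Twilio apps used `<Sms>`). The HTTP request to DuckDuckGo and the Flask route are left out.

```python
import xml.etree.ElementTree as ET

SMS_LIMIT = 160  # one text message, as described above


def pick_answer(ddg_json):
    """Return the first non-empty answer-like field from a DuckDuckGo response dict.

    The field names are assumptions about the API's JSON, not taken from the app.
    """
    for field in ("Answer", "Definition", "AbstractText"):
        text = ddg_json.get(field, "")
        if isinstance(text, str) and text:
            return text
    return "No instant answer found."


def shorten(text, limit=SMS_LIMIT):
    """Trim text so it fits in a single SMS, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


def twiml_reply(text):
    """Wrap text in a TwiML document asking Twilio to text it back to the sender."""
    response = ET.Element("Response")
    message = ET.SubElement(response, "Message")
    message.text = shorten(text)
    # ElementTree escapes characters like & and < in the message text.
    return ET.tostring(response, encoding="unicode")
```

In a Flask view you would return the string from `twiml_reply(pick_answer(data))` with an XML content type, where `data` is the parsed JSON from DuckDuckGo.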

The Code

The code is available on GitHub: https://github.com/rpicard/text-ddg

You can try the one running on my server by texting 813-419-1902.

FindALib - a Big List of Free JavaScript Libraries

I created FindALib to bookmark interesting JavaScript libraries a while ago. I never did a write-up about it, so I figured I would take the opportunity to do so now.

The Problem

A while ago I wanted to display some images on a web page, and I knew that there were plenty of JS libraries to create a cool image modal, but everything I found was either really crappy or just not free. I ended up finding a really good library, but I wanted to make that sort of thing easier in the future.

The Solution

My solution wasn’t particularly creative, but it works. It’s just a database of libraries. When I find an interesting one, I log in to the admin panel and add it.

The Stack

FindALib was the first project I created with Flask, and Python for that matter. The stack is pretty much:

  • A Flask app to show the libraries that are listed in the database and give me an admin panel I can use to add more.
  • A Postgres database holds all of the library information. I originally used SQLite, but I kept feeling like it was too ephemeral.
  • The application is served using Gunicorn and nginx. Requests hit nginx and are reverse-proxied to the Gunicorn server.
  • The servers are running on an Amazon EC2 micro instance along with some other projects. I don’t expect any of these to blow up any time soon so there’s no need to dedicate a machine to them.

The Future

I recently made it possible to create an account on FindALib, but the accounts don’t do anything yet. The plan is to let users rate libraries that they’ve used and add libraries to the database themselves. This would mean having some sort of moderation feature as well to make sure the quality of the listings stays high. Users could also add links to projects that are using the libraries, but keeping up with broken links would be a nightmare.

None of this is a major priority while I’m the only person using the site, but it’s something fun to work on every now and again.

Check out FindALib

Do the Police Give More Tickets Towards the End of the Month to Meet Quotas?

Update 10/21: I calculated some more conventional statistics and weighted the data to account for days of the week. See more here.

Disclaimer: I am not a statistician. This is just a little experiment I ran based on the data that I could get my hands on. I’ve tried to give a comprehensive explanation of my methodology, so if you notice any flaws, please let me know!

A recent discussion with a friend left me wondering how much truth there is to the rumor that the police give more tickets towards the end of the month to meet quotas. As a fan of the Steven Levitt / Stephen Dubner team behind Freakonomics I decided to find some data and see what it had to say.

The Data

The data I’m using is from the City of Baltimore’s Open Data Catalog. [1] It contains information on almost two million traffic citations issued in the last decade. [2][3]

Is the data comprehensive?

The first thing I wanted to know about the data is whether or not it was comprehensive. It was obvious that certain earlier years weren’t complete; there is a single citation listed for 1999.

Unfortunately, I have not been able to deduce whether or not the set is complete. This brings me to my big assumption:

The data is either complete, or a sufficiently representative sample of all tickets in 2009, 2010, and 2011.

I don’t like making that assumption, but I’m not writing a paper for journal publication here, so I’ll just work with what I can get.

Does it accurately represent the question?

The question here is whether or not the police give out more citations towards the end of the month. This data includes citations given by automated speed cameras, but I can remove those from the data set. I think that the remaining citations will give an accurate record of manually issued citations.

Processing the data

There are several things that need to be corrected for before the raw data will show us what we want to know:

  • The start and stop points of the data are a little blurry.
  • The data includes citations issued by automated cameras.
  • There aren’t as many 31sts, 30ths, or 29ths in a year as other days of the month.

We’ll start with a graph of all of the data in the set[4]:

Distribution among days of the month for all citations in the data set

Limiting the data to a certain date range

Now we want to firm up the edges of the data set. I’m going to alter my processing script so that it only counts citations issued in 2011.

Distribution among days of the month for 2011 citations

Removing automated cameras from the data

Now we want to skip citations issued by automated cameras. To do this, I filtered out violation codes 32 and 33, i.e. fixed and mobile speed cameras.

Distribution among days of the month for 2011 citations - speed cameras removed

Only a small number of tickets were removed by filtering out speed cameras, so the graph is virtually unchanged.
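For anyone who wants to reproduce the counting, the two filters above (one year only, no speed cameras) might look something like this. It's a sketch, not my original processing script: the column names `ViolDate` and `ViolCode`, the date format, and the sample rows are all made up for illustration, so adjust them to whatever the real export uses.

```python
import csv
import io
from collections import Counter
from datetime import datetime

CAMERA_CODES = {"32", "33"}  # fixed and mobile speed cameras, per the text


def count_by_day(rows, year, skip_codes=CAMERA_CODES):
    """Count citations per day of the month for one year, skipping some codes.

    `rows` is any iterable of dicts with (hypothetical) "ViolDate" and
    "ViolCode" keys, e.g. a csv.DictReader over the downloaded file.
    """
    counts = Counter()
    for row in rows:
        issued = datetime.strptime(row["ViolDate"], "%m/%d/%Y")
        if issued.year != year:
            continue  # outside the date range we care about
        if row["ViolCode"] in skip_codes:
            continue  # automated camera, not a manually issued citation
        counts[issued.day] += 1
    return counts


# Made-up rows, only to show the shape of the input.
sample = io.StringIO(
    "ViolDate,ViolCode\n"
    "01/31/2011,32\n"
    "01/31/2011,18\n"
    "02/15/2011,18\n"
    "03/15/2010,18\n"
    "03/15/2011,33\n"
)
day_counts = count_by_day(csv.DictReader(sample), 2011)
```

Here `day_counts` maps a day of the month to the number of remaining citations on that day.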

Correcting for more frequent dates

In the update below, I also corrected for the effects of more frequent days of the week, so many of the charts have changed. Some points made in the analysis may now be moot, but I’m leaving this part in its original form. Make sure you read the update too!

Only seven months in a year have a 31st. To account for this variation among different days of the month, I used the following formula to figure out what an “even” distribution of citations would look like for a given day of the month[5]:

(((appearances in year * 100) / # days in year ) * total citations) / 100 = expected # of citations
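The same formula can be sketched in Python. The multiplication and division by 100 cancel out, so this version just uses appearances / days in year * total. The function names are mine, chosen for illustration.

```python
import calendar


def appearances(year):
    """How many times each day of the month (1-31) occurs in the given year."""
    counts = {day: 0 for day in range(1, 32)}
    for month in range(1, 13):
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            counts[day] += 1
    return counts


def expected_citations(year, total):
    """Expected citations per day of month if tickets were spread evenly over the year."""
    days_in_year = 366 if calendar.isleap(year) else 365
    return {
        day: n / days_in_year * total
        for day, n in appearances(year).items()
    }
```

For 2011, `appearances(2011)` gives 12 for the 1st through the 28th, 11 for the 29th and 30th, and 7 for the 31st, so the expected count for the 31st is a little over half that of an ordinary day.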

Here’s what an even distribution of citations would look like:

Expected distribution among days of month for 2011 citations

Now, to compare the expected distribution with the actual distribution, here’s a chart of the difference between the two for each day of the month:

Actual - Expected tickets vs Day of Month (2011 - Speed cameras removed)

When the bar is positive, more tickets were given than would be expected, and when it’s negative, fewer.

Here’s the same graph recalculated with the citations from 2010 and 2009:

Actual - Expected tickets vs Day of Month (2010 - Speed cameras removed)

Actual - Expected tickets vs Day of Month (2009 - Speed cameras removed)

Finally, I’ve averaged out the data from those three years. I first normalized the data for each year to account for different total numbers of citations in each year with this formula:

(actual - expected) / total * 1000 = normalized number

Actual - Expected tickets vs Day of Month (3 yr. avg. - Speed cameras removed)

This is the key graph here. It shows that the citation rates from the 10th to the 27th are, for the most part, lower than we’d expect with an even distribution, and that on the 28th, 29th, and 30th those rates jump up to a much higher level than we’d expect. Interestingly, it also shows that the rates drop to well below their expected levels on the 31st.

Analysis

There are a few things about this graph that I find interesting:

  • The rates jump on the 28th, 29th, and 30th.
  • The rates dive back down for one day on the 31st.
  • The first nine days have higher-than-expected rates.
  • The period from the 10th to the 27th has lower-than-expected rates.

I can only guess what the cause of these patterns might be. Since we want to know about quotas, let’s see if we can explain things with that.

  • “I need to have X tickets written in 4 days!”
  • No good explanation
  • “I don’t want to be rushing like that again this month.”
  • Falls behind after the fear from the last rush fades

So what does this mean? Are the police giving out more tickets at the end of the month to meet quotas? Not exactly. It may look like a quota system would explain the patterns we see in the data, but you could probably come up with another explanation that seems to fit the data too. It’s also worth noting that a quota system wouldn’t really explain the drop on the 31st, and my quota-based explanation for the high rates in the first week of the month is kind of thin.

Conclusion

Well, it’s been fun playing economist, but I don’t know if I really have enough here to draw a conclusion. Do the police in Baltimore give out more tickets at the end of the month to meet quotas? It seems plausible, but I don’t know more than that.

Updates

Update 10/21: I’ve done a little studying, and I’d like to provide some more conventional statistics.

Here’s some data from each year:

Year    Sample Size    Mean Diff. From Expected    Standard Deviation
2009         68,396                      -5.452                   152
2010        124,031                     -10.226                   352
2011        447,846                     -38.871                  2467

Z-score is a statistic used to normalize data. It is calculated with this formula:

Z-score = (data point - mean) / standard deviation
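In code, the Z-score step might look like the sketch below. One assumption to flag: I'm using the population standard deviation (`pstdev`) here; swap in `stdev` if you'd rather treat the days as a sample. The function names are just for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value relative to the mean and standard deviation of the list."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


def average_across_years(per_year_scores):
    """Average the Z-scores day by day, given one equal-length list per year."""
    return [mean(day) for day in zip(*per_year_scores)]
```

You'd call `z_scores` once per year on that year's 31 actual-minus-expected differences, then pass the three resulting lists to `average_across_years`.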

Here’s a graph of the average of Z-scores for the three years:

Z-scores (3 year average)

Several dates — such as the 6th and 28th — appear very different in this graph than in the other average.

Here we can directly compare the two. The graph from the old formula has been re-scaled, so the numbers along the y-axis are not accurate for that data.

Z-scores and Old Formula

Day of week

Several people suggested that I take days of the week into consideration, since it’s likely that the number of tickets given out on weekends is very different from those on weekdays, and that if any day of the month should fall on a certain day of the week more often than others, it would skew the results.

To adjust for this, I’m using a formula suggested by one of those readers:

Day Weight = Total on day / Total in year

Here are the weights for days of the week in each year expressed as percentages of the total number of tickets for that year. They’ve been rounded, so they may not add up to 100%:

Year Mon Tue Wed Thu Fri Sat Sun
2009 15 16 15 17 16 11 09
2010 18 18 17 18 19 06 04
2011 19 18 17 18 20 05 04

It looks like there are far fewer tickets on weekends than weekdays in this data. How will I account for this?

Remember that chart that showed the expected tickets vs day of month? Here’s a little reminder:

Expected distribution among days of month for 2011 citations

Well I’m going to re-calculate the expected tickets for each day based on the distribution of those days among days of the week. Here’s the result:

Expected distribution adjusted for days of the week
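Here is one plausible way to do that adjustment in Python. Treat it as a sketch of the idea rather than the exact calculation behind the chart: it assumes each calendar date gets an equal share of its weekday's tickets (weight * total / number of times that weekday occurs in the year), and then sums those shares by day of month. The function name and that per-date split are my assumptions.

```python
from datetime import date, timedelta


def adjusted_expected(year, total, weekday_weights):
    """Expected citations per day of month, adjusted for day of week.

    weekday_weights: seven fractions, Monday first, that sum to roughly 1
    (the "Day Weight" values, as fractions rather than percentages).
    """
    # Walk every date in the year and count how often each weekday occurs.
    dates = []
    weekday_counts = [0] * 7
    current = date(year, 1, 1)
    while current.year == year:
        dates.append(current)
        weekday_counts[current.weekday()] += 1  # Monday is 0
        current += timedelta(days=1)

    # Each date is expected to get an equal share of its weekday's tickets.
    expected = {day: 0.0 for day in range(1, 32)}
    for current in dates:
        wd = current.weekday()
        expected[current.day] += total * weekday_weights[wd] / weekday_counts[wd]
    return expected
```

With the rounded 2011 weights from the table (0.19, 0.18, 0.17, 0.18, 0.20, 0.05, 0.04), the expected values add up to slightly more than the total, because the rounded percentages sum to 101%.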

Now I’m going to recalculate the differences based on these new expectations. Here’s the unadjusted chart, followed by our newly adjusted chart:

Z-scores (3 year average)

Z-scores (3 year average - adjusted for day of week)

The first thing that I notice on this chart is that the previous drop that occurred on the 31st is now reversed. When we adjust for the effects of different days of the week, each of the last four days of the month is above what you would expect from an even distribution of tickets.

That doesn’t necessarily change my conclusion, but it does let us look a little closer at the true effects of day of the month on ticketing rates.

Notes

1 https://data.baltimorecity.gov/

2 The earliest data point in the set is actually a red light violation from 1999, but it’s the only citation listed before 2002.

3 The data set is labeled “Parking Citations,” which is a little confusing because it seems to include several moving violations, such as citations issued by automated speed cameras. In any case, I don’t see any reason to assume that this would skew the data distribution among days of the month.

4 These charts were generated with this tool.

5 I used “# days in year” rather than specifying 365 to account for leap years.

The Tale of Dr. Edward and Mr. Allen

This is the tale of how an interesting email found its way to my SPAM folder.

“Andrew, the Edward estate has been on your desk for almost four years. Every quarter I ask about it, and every time you tell me you’ve gotten nowhere. What am I supposed to make of this?” Dick Goodman was a short man with a circumference that made up for what he lacked in height. He dressed like an actor in a 1960s detective movie. Goodman was a partner at the firm where Andrew worked and he’d been in law since before Andrew was born.

“I’ve had associates re-checking for a next-of-kin every couple of months, but they always come up empty-handed. All of the usual channels for this sort of thing are dry.” Dr. Edward was a client of the firm, who had passed away several years earlier. He left behind a sizable chunk of money in an offshore bank account, but never gave the names of any relatives, so when it came time to execute his estate there was a major loose end. The firm couldn’t take their commission until the estate was closed, as Goodman was keen to remind Andrew.

“One more year and we’ll take the commission out of your bonus.” Andrew knew the threat was empty; there were several partners more level-headed than Goodman, and they understood that problems often arise with large estates. Still though, the case of Dr. Edward was particularly troublesome. Andrew just wanted to get it – and Goodman – out of his hair. “If that’s all, I have work to do.” Goodman’s face showed that he wasn’t happy with that response, but he waved Andrew away and sat back down at his desk.

Andrew left Goodman’s office through a door frame that was slightly larger than the others on the fifteenth floor of 49 Billy Lows Lane. Goodman had it widened when he moved from the eighth floor after being made partner. Unfortunately, that move meant that everyone who worked for him – still stuck on floor number eight – had to ride the elevator up and down several times a day to talk to him. Nobody was surprised by it though; Dick Goodman was a vain man. Having an office on a corner of the building, with windows on two walls, was as much about appearances as views.

Andrew caught the elevator just before it closed. He stepped inside the retreating doors and took his place among a group of audibly impatient colleagues. Andrew saw that the light around the button for the eighth floor was already lit, but a meeting with Goodman was as good an excuse as any for a break, so he hit the button labeled “1.”

Andrew Allen had been working at Johnston, Murphy, and Schwartz for a little over six years. He was recruited right out of Harvard Law School. His work wasn’t too stressful, but it wasn’t as fulfilling as he’d expected either. He mostly dealt with wills and trusts, but sometimes he helped clients who wanted a familiar face on their side when dealing with general legal matters. The only interesting things that happened were heated arguments over inheritance, generally led by people who couldn’t care less that the population of their family had just decreased.

On his way through the lobby, Andrew nodded politely at a partner having a conversation with someone whose face he couldn’t see. His display of courtesy was ignored, so he continued out of the building. It was a ten-minute walk to the coffee shop where Andrew liked to relax during work. When he walked in, he was greeted with a faux-cheer filled “Welcome.” At least they pretend to care. He got his coffee and sat at a table in the back of the room.

Since he had come straight from his meeting with Goodman, he still had the Edward file in hand. He opened it on the table in front of him. There was about a magazine’s worth of paper in the manilla folder, mostly account statements and forms the late doctor filled out to open those accounts. In every one of the latter forms, the “NEXT OF KIN” column was blank. You’d think a doctor could fill out some paperwork.

Andrew hadn’t come up with any ground-breaking insights by the time he realized he’d been gone for half-an-hour, so he headed back to his office. One perk of his boss moving seven floors up was that he didn’t notice when Andrew was gone a little too long. Back in his office, he sat down to tend to some more fruitful matters. John Wilson, a young college student in Utah, had been left a seven-figure sum by a distant relative he’d never even met. This wasn’t the first such case the firm had handled, but it was the first that Andrew worked on himself. It was a nice break from the bitter-sweet mix of emotions heirs usually feel, or at least pretend to feel. One day he’s an average college student and the next he’s a multi-millionaire. It’s like a loophole in the rules of life.

Suddenly Andrew was struck by a solution to the problem with the Edward estate. He dismissed the thought as crazy, and continued on with his other work, but the idea kept coming back to the front of his mind for the rest of the day. When he woke up the next morning he still hadn’t shaken it. He got into the elevator at 49 Billy Lows Lane that morning, and he passed the eighth floor and headed for floor fifteen.

The door to Goodman’s office was closed when he arrived. He pulled up his fist to knock, but paused. If anyone will go for this, it’s Goodman. He brought his knuckles down on the wooden door three times. The response from inside wasn’t entirely clear, but Andrew didn’t hear anyone else in there so he opened the door and stepped through the widened frame.

“Make it quick, I’m meeting a client at the golf course in half-an-hour.” Andrew closed the door behind him. “I think I may have a solution to the Edwar–” “It’s about time! Let’s hear it,” Goodman interrupted. “Well, in a little over a year, everything in his accounts will be collected by the government if we don’t find an heir. My suggestion is that we…” Andrew hesitated, “…that we find an heir.” His subtlety was lost on Goodman. “No shit. What do you think we’ve been paying you to do? Go back to work and quit wasting my time.” Andrew persisted, “No, I’m saying that we should just find someone, and make them the heir.” Goodman was finally catching on. He let out a low chuckle, “You sly son of a bitch. I’ll tell you what. You find me an heir,” Goodman started the sentence with an exaggerated wink, “and I’ll see to it that your bonus reflects your hard work.” Andrew was relieved. With a smile and a nod, he left Goodman’s office.

As he waited for the elevator, Andrew became aware of the fact that he didn’t feel at all guilty for his plan. Lawyers defrauding clients was one thing, but keeping some money out of the government’s reach wasn’t really hurting anyone. Plus, some lucky kid is about to get the call of his lifetime.

When he got back into his office, he immediately pulled open his laptop and started his hunt for a good candidate. After searching all day, and seeing the bottom of several cups of coffee, he had narrowed the search to two people. One was a woman in Missouri and the other a student in Florida. They both seemed like they’d be fairly responsible with the money, something Andrew made sure to look for, and were prime candidates in most every other way. Finally, he decided to flip a coin. Heads is Florida, tails Missouri. He found a coin in the drawer of his desk, gave it a flick with his thumb, and watched it land on the desk in front of him. It rolled for a few seconds before falling flat. Heads it is.

With the identity of the “heir” decided, all that was left was to make contact. He had to decide whether to tell Robert from Florida the truth, or to try and make him believe that he had some long-forgotten relative who named him in his will. The likelihood of anyone believing that story seemed pretty low, so Andrew decided that it would be best to have Robert on the same page. He took a deep breath, and tried the only phone number he was able to find. His call was answered with an automated message, “The number you are trying to reach has been disconnected.” The number was a dead-end, but he still had an email address to try, so he typed up an email, hit “send,” and waited.

Mr. Andrew Allen
49 Billy Lows Lane, Potters Bar,
Hertfordshire EN6 1UX.
                                            
My Dear Sir,    
                
I will like to seek your help in a business proposal, which although
is sensitive by nature and not what I should discuss with someone I
don’t know and have not met using a medium such as this but I do not
have a choice.
 
I am Mr. Andrew Allen, personal attorney to late Dr. Edward, who died
of a cardiac arrest a few years ago leaving behind a large sum of
money with a commercial bank in the Island of Seychelles which is
a tax free zone, a place where plenty of rich people tend to hide
away funds not ready to be used or invested, I am also the Client’s
manager. I will not mention the amount of money which runs into
several millions in United States Dollars and name of bank presently
until we have agreed to deal. I trust you will understand the need
for such precautions.
 
So far, valuable efforts has been made to get to his people but
to no avail, as he had no known relatives more because he left his
next of kin column in his account opening forms blank and he has
no known relative. Due to this development the bank has been expecting
someone to come forward as a close relative to claim the funds otherwise
as the Seychelles national laws would have it, any dormant account
for five years will be declared unclaimed and then paid into the
government purse.
 
To avert this negative development my colleagues and I have decided
to look for a reputable person to act as the next of kin to late
Dr. Edward, So that, the funds could be processed and released into
his account, which is where you come in. We shall make arrangements
with a qualified and a reliable attorney to represent you locally
to avoid any inconvenience of you coming down to claim the funds.
 
All legal documents to aid your claim for this fund and to prove
your relationship with the deceased will be provided by us. Your
help will be appreciated with 30% of the total sum which I would
disclose in my next email Please accept my apologies, keep my confidence
and disregard this letter if you do not appreciate this proposition
I have offered you.
 
I wait anxiously for your response.
 
Yours Faithfully,
Mr. Andrew Allen

How I Came to Be an Intern at DuckDuckGo

From humble beginnings

I’ll start with a little background about myself. I’m currently eighteen years old. Wanting to make a plugin for a MyBB forum, I started programming with PHP around two years ago (judging from old forum posts). I cringe when I recall my plans to “make it big” with a cool forum, but we all have to start somewhere.

The plugin, created mostly by looking at similar plugins made by other people, was my first attempt at programming. Despite this, I managed to get on the MyBB development team as a “Junior Developer;” they wanted me to help with the next release. I quickly realized that I had no idea what I was doing and left within a week. They didn’t ask me to leave. I thought, “They must not understand how little I actually bring to the table. I can’t do anything for them. It’ll be so awkward when they realize that. I’m going to look like an idiot,” and left before attempting a single line of code.

A real project

Over the next couple of years, I went through a few phases of learning more PHP and stopping. I enjoyed it, but never really had a project; I needed a reason to learn.

Back in September 2009, in an earlier attempt at “making it big,” I started buying and selling domains on NamePros. Aside from a couple of sales that one might put in the “win” category, including a net of over three-hundred dollars on a batch of adult names picked up for free on the Digital Point forums, I wasn’t very successful. Around the same time, I tried a number of pre-made PHP scripts for creating various types of websites, e.g. forums, link directories, etc., in still other ill-fated attempts at getting rich quick.

In August of 2011, after moving into my dorm at UNF, I decided that it was time for a real project. Drawing from my failed attempts at domaining and using pre-made scripts, I created DNP Script, a PHP script that lets you create a site to display your portfolio of domain names. It’s not amazing, but it works. It was a fun project and it felt great to have finally completed something.

Out on a limb

In November of 2011, while looking up information on the hiring processes of various startups (I’ve always enjoyed thinking about my future career), I came across Gabriel Weinberg’s post on inbound hiring. I read a little more on his blog, as well as Caine Tighe’s post on working at DuckDuckGo, and eventually ended up on Gabe’s Twitter. A recent tweet caught my eye:

Looking for paid student interns for @duckduckgo around Philly. Lots of software projects. Please let me know if you’re interested.— Gabriel Weinberg (@yegg) November 12, 2011

Struck by the possibility that I could be an intern for a real startup like DuckDuckGo, I emailed him. I’d previously emailed Gabe about another project that I was going to work on. He was very helpful, referring me to someone who had a startup doing what I was planning to do. Maybe this made me less timid with regard to inquiring about the internship, even though I knew it was a long shot; I figured, “I’m in my first semester of college and just finished my first actual project. I’m not ready to contribute to a real startup.” I was actually a little relieved when he replied to say that they were looking for local interns, i.e. interns in the Philadelphia area, who are a little more experienced with Perl (the primary language used to build DuckDuckGo). He said that I should learn some Perl and think about contributing to the open source projects.

A second chance

I became enveloped in my classes and didn’t do much programming until I started my first computer science class (Computer Science I) in January, 2012. Then, late in February, I opened my inbox to find an email from Gabe. He was following up on my progress with “learning Perl and what not.” I told him that I’d been busy with school but that I’d love to get involved and would start learning. He said to keep him posted. Oh shit! I might actually have a chance at this internship.

Three weeks later, my first goodie was merged into the master GitHub repo. I rushed to my email to tell Gabe that I’d gotten started. I said thanks for the follow-up email and told him about the Goodies that I’d contributed. He suggested we talk via Skype. Excited that I’d be video chatting with the Gabriel Weinberg, I was expecting to talk about contributing to the open source projects. It quickly became clear, however, that this was about the internship; I didn’t even know that it was still on the table!

After a great chat with him, he wanted me to talk to Caine. After that, I was pretty confident that I had enough experience for what I’d be doing. They must’ve thought so too because less than a week later I was officially a DuckDuckGo intern. At this point, though, I started to have flashbacks to joining the MyBB development team. Despite the fact that I was completely transparent about my level of experience with the two of them, the thought of backing out “before they see that I don’t know what I’m doing” flashed across my mind. I couldn’t do that though – this was too big of an opportunity. Then I started working. “Wait a second, I can actually do this.” I realized that I’d been getting in my own way the whole time.

7/29/2012 – Removed the “Lessons Learned” section because it was way too cheesy.