https://sedimental.org/ Sedimental Thoughts on FOSS and fintech, layered by Mahmoud Hashemi 2026-05-15T18:10:57Z hourly 1 &copy; 2026 <a href="https://sedimental.org/about.html">Mahmoud Hashemi</a> <a title="FinFam" href="https://finfam.app"><img src="/uploads/ff_icon_sm.png" alt="FinFam" /></a> <img height="14" src="/img/by-sa.png" /> Chert 0.1 https://sedimental.org/product_hunt_is_dead.html Mahmoud Hashemi https://sedimental.org/ Product Hunt is Dead 2025-09-24T00:00:00Z 2025-09-24T00:00:00Z <p><p>First, the good news.</p> <p>It's been one week since <a href="https://finfam.app">FinFam</a>'s <a href="https://sedimental.org/announcing_finfam.html">beta launch</a>! The <a href="https://news.ycombinator.com/item?id=45252031">Show HN post</a> trended nicely, netting enough eyeballs to make me confident that FinFam is the world's first and only <a href="https://finfam.app">collaborative financial planner with a marketplace of interactive, open-source expert opinions</a>. I'm especially gratified by the users I'm meeting <em>through</em> the product. Nothing like it.</p> <p>So, launch is going great, no regrets, right?</p> <h3 id="my_one_regret"><a href="#my_one_regret" class="toclink">My one regret</a></h3> <p><a href="https://sedimental.org/uploads/ph_awardjack.jpg" target="_blank"><img src="https://sedimental.org/uploads/ph_awardjack.jpg" alt="You too can win the award, jack." align="right" width="30%" /></a></p> <p>That brings us to the subject of today's PSA.</p> <p><strong><a href="https://en.wikipedia.org/wiki/Product_Hunt">Product Hunt</a> is <em>dead</em>.</strong></p> <p>I wasn't planning this post. PH wasn't even much of a launch priority for FinFam. But after seeing what I saw, I knew this had to skip the queue. The world had to know.</p> <p>After all, <a href="https://www.linkedin.com/feed/update/urn:li:activity:7373298636744617984/">my launch post on LinkedIn</a> mentioned our Product Hunt launch. 
And now I'm cringing thinking about how I even sent an email out to a few product-oriented friends linking them to our launch, perpetuating the myth.</p> <p>Hours later I would realize that Product Hunt is sadly no more. Gone was the site I knew from my days on <a href="https://www.producthunt.com/products/stripe/launches/stripe-invoicing">Stripe Invoicing</a>. What's left is a husk, active in appearance alone.</p> <h3 id="i_missed_the_memo"><a href="#i_missed_the_memo" class="toclink">I missed the memo</a></h3> <p>Turns out this has been happening for a while. Just last year, Fabian Maume asked, <a href="https://www.tetriz.io/blog/is-product-hunt-dying/">"Is Product Hunt Dying?"</a> He's got lots of data and background, so I'll stick to filling in the now-obvious answer: <strong>Yes. Product Hunt is dead.</strong></p> <p><a href="https://sedimental.org/uploads/ph_cat_zombie.jpg" target="_blank"><img src="https://sedimental.org/uploads/ph_cat_zombie.jpg" alt="Product Hunt has zombified" align="right" width="30%" /></a></p> <p>And Fabian's not alone. A quick search will <a href="https://webdesignerdepot.com/how-producthunt-com-became-overrun-with-ai-products/">reveal</a> <a href="https://www.reddit.com/r/SaaS/comments/1mnc3nu/i_analyzed_500_product_hunt_saas_launches_487_are/">dozens</a> of <a href="https://www.reddit.com/r/startups/comments/1mstags/is_product_hunt_losing_its_value_for_real/">nails</a> in <a href="https://news.ycombinator.com/item?id=41679768">the</a> <a href="https://creativerly.com/what-happened-to-product-hunt/">coffin</a>. 
I guess that's inevitable when the founder exits and in 2022 <a href="https://a16z.com/announcement/investing-in-prologue/">a16z merges your mature platform</a> with a crypto venture that no one remembers.</p> <p>But how does a dead platform appear to live on?</p> <h3 id="the_zombie_grift"><a href="#the_zombie_grift" class="toclink">The Zombie Grift</a></h3> <p>Product Hunt has a weird quirk where it resets every day at midnight Pacific time. Unlike Hacker News, Reddit, etc., PH doesn't have a rolling front page. This fixed daily scheduling idiosyncrasy leads to <a href="https://byvi.co/2022/04/19/how-to-launch-on-product-hunt-step-by-step-guide-with-tips-and-best-practices/">all-nighters as launch best practice</a>, and systemically, this means a platform originating in Silicon Valley is unlikely to have its front page content meaningfully decided by anyone in the western hemisphere.</p> <p>Much like with Hacker News, the first few hours of a post determine its impact. On PH, those hours fall in the middle of the night for the Americas, so Europe, APAC, and in particular India have an outsized influence instead.</p> <p>So what really happens when you launch on Product Hunt?</p> <p>Well, your LinkedIn inbox turns into this:</p> <div style="text-align: center;"> <a href="https://sedimental.org/uploads/ph_linkedin_invites_redacted.png" target="_blank"> <img src="https://sedimental.org/uploads/ph_linkedin_invites_redacted.png" alt="My LinkedIn inbox, full of product hunt vote solicitors" width="65%" /> </a> <p><i>None of them signed up for FinFam, even</i></p> </div> <p>I was taken by surprise. What hurt the most was these midnight solicitors sharing screenshots of success stories from companies I recognized. They'd been instrumental in "launching" apps that I respect, and I'd hoped they wouldn't have to stoop to this. I even had personal connections to some of these founders.</p> <p>It was 4am, but I put on my investigative hat and I engaged with a couple. 
Here's how their process looks:</p> <div style="text-align: center;"> <a href="https://sedimental.org/uploads/ph_the_price_and_process_redacted.png" target="_blank"> <img src="https://sedimental.org/uploads/ph_the_price_and_process_redacted.png" width="45%" alt="100 bucks for 200 PH upvotes, enough to get into the top 5 for a weekday" /> </a> </div> <p>$100 is all it takes to make it into the Top 5 for a weekday. One has to admit, it's tempting. If you've spent months building, $100 feels like nothing.</p> <p><strong>It <em>is</em> nothing.</strong> These aren't real users and PH's audience has never been a source of sticky users. $100 is too much to spend on vanity. And it's predatory to foster a "community" where clout peddlers can prey on susceptible, good-faith founders.</p> <p>If you're curious, you can see the paid votes landing via spikes in upvote speed on <a href="https://hunted.space/">hunted.space</a>. It's not hard to eyeball products which get more upvotes in the first two hours than they do in the next twenty-two.</p> <p>Suffice to say I didn't get any emails or LinkedIn invites from HN vote peddlers, despite HN sending us more than 10x the traffic.</p> <h3 id="can_product_hunt_be_revived"><a href="#can_product_hunt_be_revived" class="toclink">Can Product Hunt be revived?</a></h3> <p>To be fair, PH tries to mitigate front page manipulation. They "feature" certain launches to curate the front page. The main outcome is that the majority of launches are simply never shown to most users. No non-featured launches appear on the mobile app. The process is <a href="https://help.producthunt.com/en/articles/9883485-product-hunt-featuring-guidelines">documented</a>, but still opaque and inconsistently applied. Almost certainly ties into revenue somehow.</p> <p>A better question is "<em>Should</em> Product Hunt be revived?"</p> <p>This is far from PH's only problem. 
They've killed Ship and <a href="https://www.producthunt.com/p/general/product-hunt-discontinued-coming-soon-teaser-pages-did-they-work-for-you">other features</a> without replacements.</p> <p>At the crux, I just don't think a "launch" or a "product" is enough to tie together a community to develop a healthy ecosystem. The focus on the <em>new</em> draws a fast flow of products and builders that erodes the core community.</p> <p><a href="https://sedimental.org/uploads/ph_cat_x_eyes.jpg" target="_blank"><img src="https://sedimental.org/uploads/ph_cat_x_eyes.jpg" alt="RIP kitty" align="right" width="30%" /></a></p> <p>Alternatives exist, but if Product Hunt suffers from the above, I suspect these do, too:</p> <ul> <li><a href="https://betalist.com/">Betalist</a></li> <li><a href="https://peerlist.io/">Peerlist</a></li> <li><a href="https://pitchwall.co/">Pitchwall</a></li> <li><a href="https://www.sideprojectors.com/">Side Projectors</a></li> <li><a href="http://uneed.best/">uneed.best</a></li> </ul> <p><em>Edit: Someone even made a <a href="https://launchdirectories.com/">directory of directories</a>. 
Early reports are not promising!</em></p> <p>Contrast this with <a href="https://www.indiehackers.com/">Indie Hackers</a>, which is united by at least one value / work ethic.</p> <p>Or contrast to one of my personal faves: <a href="https://alternativeto.net/">AlternativeTo</a>, which takes a wiki approach toward the mission of cataloging all software, not just the newest.</p> <h3 id="goodbye_product_hunt"><a href="#goodbye_product_hunt" class="toclink">Goodbye Product Hunt</a></h3> <p>I guess if this ends up being PH's epitaph I should get this out of my system:</p> <p>Google Glass Kitty has always been a terrible mascot.</p> <p>The obvious choice for an iconic hunt has always been the duck:</p> <p><a href="https://sedimental.org/uploads/ph_produckhunt.jpg" target="_blank"><img src="https://sedimental.org/uploads/ph_produckhunt.jpg" alt="Pro-Duck Hunt" width="100%" /></a></p> <p><em>Edit (2025-09-25): <a href="https://news.ycombinator.com/item?id=45362569">Hacker News seems to agree</a>.</em><p></p> <hr /> https://sedimental.org/announcing_finfam.html Mahmoud Hashemi https://sedimental.org/ Announcing FinFam 2025-09-19T12:00:00Z 2025-09-19T12:00:00Z <p><p>In my <a href="https://sedimental.org/what_ive_been_up_to_2025_pt1.html">last post</a>, I mentioned founding a startup.</p> <p>It's called <a href="https://finfam.app">FinFam</a>, and we're building collaborative financial planning. The GitHub of money, if you will. Enough with the telling, time for the show!</p> <p>Here's a 3-minute demo:</p> <div style="position: relative; padding-bottom: 56.25%; height: 0;"><iframe src="https://www.loom.com/embed/8acf7fe800024256937dfb4516604e54?sid=e564f57e-f851-4d01-936d-2273e7caf248" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen="" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div> <p>We launched beta this week and I couldn't be more excited. 
Let me tell you why.</p> <h3 id="where_s_this_coming_from"><a href="#where_s_this_coming_from" class="toclink">Where's this coming from?</a></h3> <p>Growing up as the son of starving grad students, when it comes to money, I've been known to default to what we affectionately call "poor man brain." Cautious to a fault. But my student parents turned into scientists, so I'm eminently convinceable. I just need to see the math, or better yet, a spreadsheet.</p> <p>The problem? Making those spreadsheets. And more importantly, trusting them.</p> <p>A few years back, a friend was house hunting in the Bay Area. They came to me to talk through the famous rent-vs-buy problem. They didn't want an advisor to manage money or sell them products. They came to me because I worked in fintech and thus "knew money", even though I've only been involved in two home purchases, and wouldn't consider myself an expert.</p> <p>That experience crystallized something I'd been noticing everywhere:</p> <ul> <li>People trust their friends and family to have their best interests in mind...<ul> <li>But can't trust them to have the best information.</li> </ul> </li> <li>We can trust experts to have knowledge...<ul> <li>but can't always trust their incentive alignment.</li> </ul> </li> </ul> <p>In an era of ever-advancing information (and misinformation) <a href="https://en.wikipedia.org/wiki/Information_overload">glut</a>, how do we get to a place of confidence in our hard-won <a href="https://en.wikipedia.org/wiki/Definitions_of_knowledge#Justified_true_belief">justified true belief</a>?</p> <h3 id="trust_is_social"><a href="#trust_is_social" class="toclink">Trust is social</a></h3> <p>After 15+ years building fintech at PayPal and Stripe, I saw money movement simplified and commoditized. But the difficulty of transacting moved upstream. 
We made the how of buying easier, while advances in technology made the what, when, and why so much harder.</p> <p>From BNPL to crypto to <a href="https://en.wikipedia.org/wiki/Vibecession">vibecession</a>, our economic realities aren't getting simpler. To compensate, 79% of young adults get financial guidance from social media (<a href="https://web.archive.org/web/20240925081010/https://www.forbes.com/advisor/investing/financial-advisor/adults-financial-advice-social-media/">Forbes</a>). Not because TikTok or YouTube has better models than Morgan Stanley, but because they trust the people sharing their stories.</p> <p>Millions of people are already collaborating on financial decisions. Privately on WhatsApp, obscurely on Discord, and full-blown publicly on Reddit:</p> <ul> <li><a href="https://reddit.com/r/personalfinance">/r/personalfinance</a> - 21 million members</li> <li><a href="https://reddit.com/r/financialindependence">/r/financialindependence</a> - 2.3 million</li> <li><a href="https://reddit.com/r/financialplanning">/r/financialplanning</a> - 1 million</li> </ul> <p>And dozens more subreddits and Internet forums (shoutout Bogleheads and <a href="https://www.refinery29.com/en-us/money-diary">Refinery29's Money Diaries</a>). They're <a href="https://finfam.app/blog/2025-07-15-sharing-financial-information-online">sharing detailed financial profiles</a> with strangers on the internet, seeking advice from generous folks in full view.</p> <p>There's a fast-emerging story about <a href="https://finfam.app/blog/tag/ai">AI</a> here, and I've got whole posts dedicated to that coming soon.</p> <p>For now, suffice to say, we need the tools to catch up to the times.</p> <h3 id="enter_finfam"><a href="#enter_finfam" class="toclink">Enter FinFam</a></h3> <p>FinFam<sup id="fnref:name"><a class="footnote-ref" href="https://sedimental.org/announcing_finfam.html#fn:name">1</a></sup> lets families and friends collaborate on financial decisions with each other, using expert information without any commitments to said experts.</p> <p><img width="100%" alt="The FinFam logo." 
src="https://sedimental.org/uploads/finfam_logo_medium.png" /></p> <p>We want to holistically solve the problem of financial decision-making using interaction models proven by GitHub + StackOverflow + app stores. How?</p> <p>First, creators publish interactive models to <a href="https://finfam.app/explore/views">FinFam's View marketplace</a>. Then you, a user with a financial question or decision to make:</p> <ol> <li>Pick up a relevant expert view. If one doesn't exist, ask in <a href="https://finfam.app/qa/">FinFam's Q&amp;A board</a>.</li> <li>Plug in your own numbers and save it to a private workspace for your inner circle.</li> <li>Discuss the results, with optional AI-assisted guidance as needed.</li> <li>Move forward with confidence.</li> </ol> <p>We scale the expert knowledge while embracing the fundamentals of human social trust. Users get better decisions and peace of mind.</p> <h3 id="open_source_meets_fintech"><a href="#open_source_meets_fintech" class="toclink">Open-source meets fintech</a></h3> <p>My years of work in open-source and wiki ecosystems showed me the power of collaborative, transparent tools. FinFam brings that same philosophy to personal finance.</p> <p>To further scale the knowledge, we make it possible for anyone to create a View. <a href="https://docs.finfam.app/guides/basic_view/">The View "source" format</a> is XLSX, and can be edited with Google Sheets, Excel, or LibreOffice. Any published View can also be open-sourced. Just like with code, financial models are now reviewable, forkable, and improvable by the community.</p> <p>Curious users can check the community's math. Numbers and discussions happen with people you trust. Everything is private by default, shareable by design.</p> <h3 id="what_s_next"><a href="#what_s_next" class="toclink">What's next</a></h3> <p>We launched beta this week and there are now daily spots available as we add capacity. 
<a href="https://finfam.app">You can sign up for early access here</a>.</p> <p>I've been using it with friends and family for months now, and it has replaced Google Sheets for the financial decisions we face. There's so much more I want to share about the vision, the technology, and the journey so far.</p> <p>But this feels like we're off to a good start.<p></p> <hr /> <p><p><em>Want to follow along? <a href="https://finfam.app/blog">Subscribe to FinFam here</a> and <a href="https://buttondown.com/sedimental">Sedimental here</a>.</em></p> <div class="footnote"> <hr /> <ol> <li id="fn:name"> <p>If you're wondering about the name, just log in, go to your default space, create a thread, and ask Finn. <a class="footnote-backref" href="https://sedimental.org/announcing_finfam.html#fnref:name" title="Jump back to footnote 1 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/what_i_ve_been_up_to_in_2025.html Mahmoud Hashemi https://sedimental.org/ What I've been up to in 2025 2025-08-25T00:00:00Z 2025-08-25T00:00:00Z <p><p>Been quiet around here. Time to change that!</p> <p><img src="https://sedimental.org/uploads/illo/rocketing_sm.png" alt="We're going on a ride." align="right" width="30%" /></p> <p>The short version up front: Since starting a family and leaving Stripe, I've pursued the dream that brought me to Silicon Valley. I've founded a startup.</p> <p>After taking some parental leave, helping found <a href="http://bapya.org/">a Python non-profit</a>, and a nice long visit back home, I was raring for a challenge. 
So these days, outside of family, I'm all in on something new.</p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#why_now">Why now?</a><li><a href="#applications">Applications</a><li><a href="#monetary_misunderstandings">Monetary misunderstandings</a><li><a href="#showing_vs_telling">Showing vs Telling</a></ul></div><h3 id="why_now"><a href="#why_now" class="toclink">Why now?</a></h3> <p>I've wanted to start my own business since building <a href="https://en.wikipedia.org/wiki/Microsoft_Access">Access</a> apps in high school. But, the reality of leaving my family and moving to study in the USA, combined with the technical and creative fulfillment of the software industry, took me on a scenic route through <a href="https://sedimental.org/about.html">enterprise software</a>, <a href="https://sedimental.org/hatnote_projects.html">free culture</a>, and <a href="https://sedimental.org/open_source_projects.html">open-source</a>.</p> <p>That very same reality has since conspired to convince me to return to my original aspirations. I've lived through some exciting times in software, but nothing like now. This isn't something I imagined I'd be working on 10 years ago, but then again it's not something I thought possible even 3 years ago. What better time to be building and launching my most ambitious project ever?</p> <p>Full details on that are coming soon<sup id="fnref:soon"><a class="footnote-ref" href="https://sedimental.org/what_i_ve_been_up_to_in_2025.html#fn:soon">1</a></sup>. For now, here is a post about why.</p> <h3 id="applications"><a href="#applications" class="toclink">Applications</a></h3> <p>To start my career, I worked on software infrastructure, security, observability, and developer productivity. But after eight years, around 2016, I started longing for something more human.</p> <p>You can see this start to come out in <a href="https://sedimental.org/the_packaging_gradient.html">The Packaging Gradient</a>. 
At a time where it seemed like everyone around me was talking about <code>pip</code>, <code>pipenv</code>, and <code>PyPI</code>, I couldn't help but remind people that the real end goal of software has always been the application (or even the appliance). This impulse came to a head with <a href="https://sedimental.org/awesome_python_applications.html">APA</a>.</p> <p>Perhaps you, dear reader, have also been "lost in the sauce" of software: When you love computers and it dominates your thoughts, you might also spend most of your time thinking about the software that makes the software possible.</p> <p>Don't get me wrong. Languages, libraries, compilers, devtools, we need every bit of help we can get. But I fell in love with software for its potential to effect change in the world writ large. I started eyeing product. The famous full stack.</p> <p>That meant moving on from big tech, to a big startup, to a seed startup. One pandemic-fueled detour through a startup factory later, here we are. Finally, founding the startup. My own full stack.</p> <h3 id="monetary_misunderstandings"><a href="#monetary_misunderstandings" class="toclink">Monetary misunderstandings</a></h3> <p>My 15+ year software engineering career can be summed up as:</p> <ul> <li>Building fintech software for pay</li> <li>Shipping open-source Python/wiki for free</li> </ul> <p>Professionally enabling commerce while avoiding it in my personal time. I was young and conflicted. Truthfully, I still harbor some reservations, but I have to build what I know. I know about software and money.</p> <blockquote> <p>"Money is the root of all evil."</p> </blockquote> <p>If you look at the state of say, open banking in the USA, or <a href="https://www.web3isgoinggreat.com/">web3isgoinggreat</a>, or just read <a href="https://www.bloomberg.com/account/newsletters/money-stuff">Money Stuff</a>, you probably agree something's off. Money changes people. 
But so does the lack thereof.</p> <p>I've watched more talented and deserving developers than myself befall a variety of fates. Hollowed out by monetary excess, blinded by greed, burned out by FOSS, literally working Doordash to keep the lights on. Dropping out of software completely. Shunning the world's favorite fungible has bad outcomes for individuals.</p> <p>Bless my friends at <a href="https://support.tidelift.com/hc/en-us">Tidelift</a>, <a href="https://ostif.org/">OSTIF</a>, and other orgs working to sustain the maintainers. Paying maintainers is a worthy battle. We just need to open more fronts to navigate what's in store.</p> <h3 id="showing_vs_telling"><a href="#showing_vs_telling" class="toclink">Showing vs Telling</a></h3> <p>Lately I've been thinking a lot about my favorite David Lynch (RIP) scene. It isn't from one of his films, it's this quote:</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/oQqp8DZ0pSg?si=dhsCqqxPMbEIsbPZ" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe> <blockquote> <p>"The film is the talking."</p> </blockquote> <p>I think it perfectly captures the auteur mindset. Words are extraneous. The consummate creative expresses themselves better in their native medium.</p> <p>Not that I mind words as a medium. After years of blogging and speaking, I've grown confident in my ability to tell.</p> <p>But now it's time for the show.</p> <div class="footnote"> <hr /> <ol> <li id="fn:soon"> <p>For friends who can't wait a couple weeks, shoot me an email for early access. 
<a class="footnote-backref" href="https://sedimental.org/what_i_ve_been_up_to_in_2025.html#fnref:soon" title="Jump back to footnote 1 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/cruising_through_data.html Mahmoud Hashemi https://sedimental.org/ Cruising through complex data 2023-01-19T06:00:00Z 2023-01-19T06:00:00Z <p><p><em>This post is a showcase of data wrangling techniques in Python, using <a href="https://glom.readthedocs.io/en/latest/">glom</a>. If you haven't heard of glom, it's a data transformation library and CLI designed for Python. Think HTML templating, but for objects, dicts, and other data structures.</em></p> <p><img src="https://sedimental.org/uploads/illo/comet_multi.png" align="right" width="30%" /></p> <p>It's been almost five years since <a href="https://sedimental.org/glom_restructured_data.html">the first release of glom</a>. That version now looks quaint in comparison to the just-released glom 23. Out of all the new functionality, we're going to take a look at six techniques that'll level up your complex data handling.</p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#star_path_selectors">Star path selectors</a><li><a href="#deep_assignment_and_deletion">Deep assignment and deletion</a><li><a href="#the_data_trace">The Data Trace</a><li><a href="#pattern_matching">Pattern matching</a><li><a href="#streaming">Streaming</a><li><a href="#flattening_and_merging">Flattening and Merging</a><li><a href="#other_core_updates">Other core updates</a><ul><li><a href="#scope">Scope</a><li><a href="#modes">Modes</a><li><a href="#extensions">Extensions</a></ul></ul></div><p><em>NB: Throughout the post, you'll note examples linking to a site called <a href="https://yak.party/glompad/">glompad</a>. Like so many regex and JS playgrounds, glompad is glom in the browser. Very much an alpha, I'll save the details for another post. 
In the meantime, try it out and let me know how it goes!</em></p> <h2 id="star_path_selectors"><a href="#star_path_selectors" class="toclink">Star path selectors</a></h2> <p>Years in the making, glom's newest feature is one of the longest anticipated. Since its first release, glom's deep get has excelled at fetching single values:</p> <pre class="codehilite"><code class="language-python">target = {'a': {'b': {'c': 'd'}}} glom(target, 'a.b.c') # 'd' </code></pre> <p>As of the latest release, glom now does <a href="https://docs.python.org/3/library/glob.html">glob</a>-style <code>*</code> and <code>**</code> as path segments, aka wildcard expansion:</p> <pre class="codehilite"><code class="language-python">glom({'a': [{'k': 'v1'}, {'k': 'v2'}]}, 'a.*.k') # * is single-level # ['v1', 'v2'] glom({'a': [{'k': 'v3'}, {'k': 'v4'}]}, '**.k') # ** is recursive # ['v3', 'v4'] </code></pre> <p>Notably, this is one of the only breaking features in glom's history. Star selectors were added as an option in glom 22, and baked for a year (with warnings for any users with stars in their paths) before becoming the default in glom 23.</p> <h2 id="deep_assignment_and_deletion"><a href="#deep_assignment_and_deletion" class="toclink">Deep assignment and deletion</a></h2> <p>By default, glom makes and returns new data structures. But glom's default immutable approach isn't always a perfect fit for the messy, deeply-nested structures one gets from scraped DOMs, ancient XML, or idiosyncratic API wrappers.</p> <p>So one of glom's earliest additions, way back in 2018, enabled declarative deep assignments that would work across virtually all mutable Python objects. 
First with <code>Assign()</code> and the <code>assign()</code> <a href="https://glom.readthedocs.io/en/latest/faq.html#what-s-a-convenience-function">convenience function</a> (<a href="https://yak.party/glompad/#spec=%23+Modify+a+dictionary+in-place.%0AAssign%28Path%28%22a%22%2C+%22e%22%29%2C+%22new+value%22%29%0A&amp;target=%7B%22a%22%3A+%7B%22b%22%3A+%7B%22c%22%3A+%22d%22%7D%7D%7D%0A&amp;v=1">example</a>, <a href="https://glom.readthedocs.io/en/latest/mutation.html#assignment">docs</a>):</p> <pre class="codehilite"><code class="language-python">target = {'a': [{'b': 'c'}, {'d': None}]} assign(target, 'a.1.d', 'e') # let's give 'd' a value of 'e' # {'a': [{'b': 'c'}, {'d': 'e'}]} </code></pre> <p><code>Assign</code> also unlocked a super useful pattern of automatically creating nested objects without the need for <code>defaultdict</code> and friends (<a href="https://yak.party/glompad/#spec=%23+Automatically+create+dicts+for+missing+keys.%0AAssign%28Path%28%22user%22%2C+%22contact%22%2C+%22email%22%29%2C+%22foobar%40example.com%22%2C+missing%3Ddict%29%0A&amp;target=%7B%0A++++%22user%22%3A+%7B%0A++++++++%22location%22%3A+%7B%22city%22%3A+%22Berlin%22%2C+%22country%22%3A+%22DE%22%7D%2C%0A++++++++%22username%22%3A+%22foobar%22%2C%0A++++++++%22created%22%3A+1672950417%2C%0A++++%7D%0A%7D%0A&amp;v=1">example</a>):</p> <pre class="codehilite"><code class="language-python">target = {} assign(target, 'a.b.c', 'hi', missing=dict) # {'a': {'b': {'c': 'hi'}}} </code></pre> <p>And for something more destructive, there's <code>Delete()</code> and <code>delete()</code> (<a href="https://yak.party/glompad/#spec=Delete%28%27a.b.1%27%29&amp;target=%7B%27a%27%3A+%7B%27b%27%3A+%5B5%2C+6%2C+7%5D%7D%7D&amp;v=1">example</a>, <a href="https://glom.readthedocs.io/en/latest/mutation.html#deletion">docs</a>):</p> <pre class="codehilite"><code class="language-python">target = {'a': [{'b': 'c'}, {'d': None}]} delete(target, 'a.0.b') # {'a': [{}, {'d': None}]} </code></pre> 
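<p>For a rough mental model of what deep assignment with <code>missing=</code> is doing, here is a minimal plain-Python sketch. The <code>deep_assign</code> helper is hypothetical and written only for illustration. It is not glom's implementation, and it assumes dotted string paths over nothing but dicts and lists:</p>

```python
# Hypothetical helper, NOT part of glom: a simplified model of deep assignment.
# Assumes a dotted string path and a target built only from dicts and lists.

def deep_assign(target, path, value, missing=None):
    """Set `value` at dotted `path` inside `target`, mutating it in place."""
    *parents, last = path.split('.')
    cur = target
    for seg in parents:
        # list positions arrive as strings, so convert them to ints
        key = int(seg) if isinstance(cur, list) else seg
        try:
            cur = cur[key]
        except KeyError:
            if missing is None:
                raise
            # create the missing container, then keep descending into it
            cur[key] = missing()
            cur = cur[key]
    key = int(last) if isinstance(cur, list) else last
    cur[key] = value
    return target


nested = deep_assign({'a': [{'b': 'c'}, {'d': None}]}, 'a.1.d', 'e')
created = deep_assign({}, 'a.b.c', 'hi', missing=dict)
print(nested)
print(created)
```

<p>The sketch only auto-creates missing dict keys. The real <code>assign()</code> also has to cope with object attributes and other mutable types, which is the bulk of the work a library saves you.</p>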
<p><code>Assign()</code> and <code>Delete()</code> both shine when manipulating ElementTree-style documents from <a href="https://docs.python.org/3/library/xml.etree.elementtree.html">etree</a>, <a href="https://lxml.de/">lxml</a>, <a href="https://github.com/html5lib/html5lib-python">html5lib</a>, and the like.</p> <p>Like glom's other path-based functionality, the nuances of assigning Python <code>dict</code> keys, object attributes, and sequence indices are handled for you. There's also an <a href="https://glom.readthedocs.io/en/latest/api.html#setup-and-registration">extension system</a> for adding support for especially unique types.</p> <h2 id="the_data_trace"><a href="#the_data_trace" class="toclink">The Data Trace</a></h2> <p>The main appeal of glom has always been succinct and robust data access and transformation. No single glom feature showcases this quite as much as the <em>data trace</em>.</p> <p>Data traces make glom's errors far more debuggable than Python's default exceptions. You don't see internal glom or Python stack frames; just you, your code, and your data:</p> <pre class="codehilite"><code class="language-python">&gt;&gt;&gt; target = {'planets': [{'name': 'earth', 'moons': 1}]} &gt;&gt;&gt; spec = ('planets', ['rings']) # a spec we expect to fail &gt;&gt;&gt; glom(target, spec) Traceback (most recent call last): File "&lt;stdin&gt;", line 1, in &lt;module&gt; File "/home/mahmoud/projects/glom/glom/core.py", line 1787, in glom raise err glom.core.PathAccessError: error raised while processing, details below. 
Target-spec trace (most recent last): - Target: {'planets': [{'name': 'earth', 'moons': 1}]} - Spec: ('planets', ['rings']) - Spec: 'planets' - Target: [{'name': 'earth', 'moons': 1}] - Spec: ['rings'] - Target: {'name': 'earth', 'moons': 1} - Spec: 'rings' glom.core.PathAccessError: could not access 'rings', part 0 of Path('rings'), got error: KeyError('rings') </code></pre> <div style="float: right; width: 40%; margin-left: 10px; padding: 5px; border: 1px solid silver;"> <div style="width: 100%"> <a target="_blank" href="https://sedimental.org/uploads/data_trace_before_after.png"><img src="https://sedimental.org/uploads/data_trace_before_after.png" align="right" width="100%" /></a> </div> <div><i>Failures before and after the data trace. Full text <a href="https://gist.github.com/mahmoud/a0923541c2c59c7cb167802c0d09a895">here</a>.</i></div> </div> <p>One day I'll write a post about how tracebacks are an oft-neglected part of a library's interface. The right traceback can turn an all-night debugging session into a quick fix anyone can push. </p> <p>For now, see the doc with examples and more explanation <a href="https://glom.readthedocs.io/en/latest/debugging.html#reading-a-glom-exception">here</a>.</p> <h2 id="pattern_matching"><a href="#pattern_matching" class="toclink">Pattern matching</a></h2> <p>While glom started as a data transformer, you often need to validate data before transforming it. 
Data validation fits nicely into spec format, and so glom's <a href="https://glom.readthedocs.io/en/latest/matching.html#validation-with-match"><code>Match</code> specifier</a> was born:</p> <pre class="codehilite"><code class="language-python"># load some data target = [{'id': 1, 'email': 'alice@example.com'}, {'id': 2, 'email': 'bob@example.com'}] # let's validate that the data has the types we expect spec = Match([{'id': int, 'email': str}]) result = glom(target, spec) # result here is equal to the data itself </code></pre> <p>Glom's pattern matching now features its own shorthand <a href="https://glom.readthedocs.io/en/latest/matching.html#m-expressions"><code>M</code> spec</a>, which is great for quick guards, and a <code>Regex</code> helper, too:</p> <pre class="codehilite"><code class="language-python"># using the example data above, we can also validate the contents of the data spec = Match([{'id': And(M &gt; 0, int), 'email': Regex('[^@]+@[^@]+')}]) result = glom(target, spec) # result here is again equal to the target data </code></pre> <p>Even a simple pattern matching example shows the power of the glom data trace. Check out the error message when some bad data gets added:</p> <pre class="codehilite"><code class="language-python">&gt;&gt;&gt; target.append({'id': '3', 'email': 'charlie@example.com'}) &gt;&gt;&gt; result = glom(target, spec) Traceback (most recent call last): File "&lt;stdin&gt;", line 1, in &lt;module&gt; File "../glom/core.py", line 2294, in glom raise err glom.matching.TypeMatchError: error raised while processing, details below. Target-spec trace (most recent last): - Target: [{'email': 'alice@example.com', 'id': 1}, {'email': 'bob@example.com', 'id': 2}, {'ema... 
(len=3) - Spec: Match([{'email': str, 'id': int}]) - Spec: [{'email': str, 'id': int}] - Target: {'email': 'charlie@example.com', 'id': '3'} - Spec: {'email': str, 'id': int} - Target: 'id' - Spec: 'id' - Target: '3' - Spec: int glom.matching.TypeMatchError: expected type int, not str </code></pre> <p>The data trace gets even sweeter when we introduce flow control with Switch. See the data trace in action in <a href="https://yak.party/glompad/#spec=%23+let%27s+classify+vowels+vs+consonants+to+show+off+Switch%27s+error+handling%0AMatch%28Switch%28%5B%28Or%28%27a%27%2C+%27e%27%2C+%27i%27%2C+%27o%27%2C+%27u%27%29%2C+Val%28%27vowel%27%29%29%2C%0A++++++++++++++%28And%28str%2C+M%2C+M%28T%5B2%3A%5D%29+%3D%3D+%27%27%29%2C+Val%28%27consonant%27%29%29%5D%29%29&amp;target=%23+An+integer+will+cause+the+expected+failure%0A3&amp;v=1">this example</a>. Users of shape-based typecheckers like <a href="https://flow.org/">Flow</a> will especially appreciate the specificity of glom's error messages in these validation cases.</p> <!-- In mid-2020, some combination of PEG and Python 2's deprecation lit a fire of innovation, one of the most ambitious of which is now captured in [PEP 634](https://peps.python.org/pep-0634/), and implemented in Python 3.10's [Structural Pattern Matching](https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching). --> <h2 id="streaming"><a href="#streaming" class="toclink">Streaming</a></h2> <p>For datasets too large to fit in memory, glom grew an <code>Iter()</code> specifier in 2019 (<a href="https://yak.party/glompad/#spec=Iter%28%29.split%28%29.flatten%28%29.unique%28%29.all%28%29&amp;target=%5B1%2C+2%2C+None%2C+None%2C+3%2C+None%2C+3%2C+None%2C+2%2C+4%5D&amp;v=1">example</a>, <a href="https://glom.readthedocs.io/en/latest/streaming.html#streaming-iteration">docs</a>). 
<code>Iter()</code> offers a readable chaining API that lazily creates nesting generators.</p> <pre class="codehilite"><code class="language-python">target = [1, 2, None, None, 3, None, 3, None, 2, 4] spec = Iter().filter().unique() # this gives a streaming generator when evaluated glom(target, spec.all()) # .all() converts the generator to a list # [1, 2, 3, 4] </code></pre> <p><code>Iter()</code>'s built-in methods also include <a href="https://glom.readthedocs.io/en/latest/streaming.html#glom.Iter.split"><code>.split()</code></a>, <a href="https://glom.readthedocs.io/en/latest/grouping.html#glom.flatten"><code>.flatten()</code></a>, <a href="https://glom.readthedocs.io/en/latest/streaming.html#glom.Iter.chunked"><code>.chunked()</code></a>, <a href="https://glom.readthedocs.io/en/latest/streaming.html#glom.Iter.slice"><code>.slice()</code></a>, <a href="https://glom.readthedocs.io/en/latest/streaming.html#glom.Iter.limit"><code>.limit()</code></a> among others. In short, endless possibilities for endless data.</p> <h2 id="flattening_and_merging"><a href="#flattening_and_merging" class="toclink">Flattening and Merging</a></h2> <p>So much data revolves around iterables that in 2019 glom introduced the ability to "reduce" those iterables to flatter values, with the introduction of <code>Flatten</code> (<a href="https://yak.party/glompad/#spec=Flatten%28%29&amp;target=%5B%7B0%7D%2C+%5B1%2C+2%2C+3%5D%2C+%284%2C+5%29%5D&amp;v=1)">example</a>, <a href="https://glom.readthedocs.io/en/latest/grouping.html#glom.flatten">docs</a>):</p> <pre class="codehilite"><code class="language-python">list_of_iterables = [{0}, [1, 2, 3], (4, 5)] flatten(list_of_iterables) # [0, 1, 2, 3, 4, 5] </code></pre> <p>Even a mix of iterables (iterators, lists, tuples) combines nicely.</p> <p>With <code>Flatten</code> came the numeric <a href="https://glom.readthedocs.io/en/latest/grouping.html#glom.Sum"><code>Sum</code></a>, not unlike the builtin:</p> <pre class="codehilite"><code 
class="language-python">glom(range(5), Sum()) # 10 </code></pre> <p>And the generic <a href="https://glom.readthedocs.io/en/latest/grouping.html#glom.Fold"><code>Fold</code></a>, useful for some rare cases:</p> <pre class="codehilite"><code class="language-python">target = [set([1, 2]), set([3]), set([2, 4])] result = glom(target, Fold(T, init=frozenset, op=frozenset.union)) # frozenset([1, 2, 3, 4]) </code></pre> <p>A later release brought flattening to mappings, via <code>Merge</code> (<a href="https://yak.party/glompad/#spec=Merge%28%29&amp;target=%5B%7B%27a%27%3A+%27alpha%27%7D%2C+%7B%27b%27%3A+%27B%27%7D%2C+%7B%27a%27%3A+%27A%27%7D%5D&amp;v=1">example</a>, <a href="https://glom.readthedocs.io/en/latest/grouping.html#glom.merge">docs</a>):</p> <pre class="codehilite"><code class="language-python">target = [{'a': 'alpha'}, {'b': 'B'}, {'a': 'A'}] merge(target) # {'a': 'A', 'b': 'B'} </code></pre> <p><code>Merge()</code> is great for deduping documents with a simple last-value-wins strategy.</p> <h2 id="other_core_updates"><a href="#other_core_updates" class="toclink">Other core updates</a></h2> <p>The features above, and myriad others from <a href="https://github.com/mahmoud/glom/blob/master/CHANGELOG.md">the changelog</a>, required multiple evolutions of the glom core. Underneath glom's hood is a loop that interprets the spec against the target. A simple, early version is preserved <a href="https://glom.readthedocs.io/en/latest/faq.html#how-does-glom-work">here in the docs</a>. </p> <p>However, the inner workings of the core were not part of glom's API, which limited extensibility. A lot of progress has been made in opening up glom internals for those use cases we couldn't predict.</p> <h3 id="scope"><a href="#scope" class="toclink">Scope</a></h3> <p>Most transformations only require a target and spec. Most...
but not all.</p> <p>For cases that needed additional state, like aggregation and multi-target glomming, we added the glom <code>Scope</code> (<a href="https://yak.party/glompad/#spec=T.count%28S.search%29&amp;target=%5B%27a%27%2C+%27c%27%2C+%27a%27%2C+%27b%27%5D&amp;scope=%7B%27search%27%3A+%27a%27%7D&amp;v=1">example</a>, <a href="https://glom.readthedocs.io/en/latest/api.html#the-glom-scope">docs</a>):</p> <pre class="codehilite"><code class="language-python"># Make a spec that uses the T singleton to call # the target's count method using the search value in the scope (S) count_spec = T.count(S.search) scope = {'search': 'a'} # additional context we'll pass in glom(['a', 'c', 'a', 'b'], count_spec, scope=scope) # 2 </code></pre> <p>Here, the scope is used to pass in a <code>search</code> parameter which will be used against the target (<a href="https://glom.readthedocs.io/en/latest/api.html#object-oriented-access-and-method-calls-with-t"><code>T</code></a>). Usage can get quite advanced, including specs that write to the scope (<a href="https://yak.party/glompad/#spec=%23+save+val+to+the+scope%2C+then+build+a+new+result+dict%0A%28S%28value%3DT%5B%27data%27%5D%5B%27val%27%5D%29%2C+%7B%27result%27%3A+S%5B%27value%27%5D%7D%29&amp;target=%7B%27data%27%3A+%7B%27val%27%3A+9%7D%7D&amp;v=1">example</a>):</p> <pre class="codehilite"><code class="language-python">target = {'data': {'val': 9}} spec = (S(value=T['data']['val']), {'result': S['value']}) glom(target, spec) # {'result': 9} </code></pre> <p>Here we grab <code>'val'</code>, save it to the scope as <code>'value'</code>, then use it to build our new result.</p> <h3 id="modes"><a href="#modes" class="toclink">Modes</a></h3> <p>As discussed in <a href="https://sedimental.org/cruising_through_data.html#pattern-matching">pattern matching</a> above,<br /> some applications outgrew glom's initial data transformation behavior. 
To handle these diverging behaviors, glom introduced the concept of <em>modes</em>.</p> <p>Glom specs stay succinct by using Python literals, and modes allow changing the interpretation of those objects. Glom comes with two documented modes, the default <code>Auto()</code> and <code>Match()</code> (<a href="https://yak.party/glompad/#spec=Auto%28%5BMatch%28int%2C+default%3DSKIP%29%5D%29&amp;target=%5B1%2C+%27a%27%2C+2%2C+%27c%27%2C+%27a%27%2C+%27b%27%5D&amp;v=1">example</a>), which can be interleaved as necessary:</p> <pre class="codehilite"><code>spec = Auto([Match(int, default=SKIP)]) target = [1, 'a', 2, 'c', 'a', 'b'] glom(target, spec) # [1, 2] </code></pre> <p>We're working on adding more. You can easily <a href="https://glom.readthedocs.io/en/latest/modes.html">add your own</a>, too.</p> <h3 id="extensions"><a href="#extensions" class="toclink">Extensions</a></h3> <p>We strive to make glom as widely applicable as possible, but data takes too many forms to count. We solve this by making glom extensible in several ways:</p> <ul> <li><a href="https://glom.readthedocs.io/en/latest/api.html#setup-and-registration">Registering</a> new target types and new operations on the target</li> <li><a href="https://glom.readthedocs.io/en/latest/custom_spec_types.html">Creating new Spec types</a></li> <li><a href="https://glom.readthedocs.io/en/latest/modes.html">Adding new modes</a></li> </ul> <p>By understanding glom's scope and <a href="https://glom.readthedocs.io/en/latest/custom_spec_types.html#the-glom-scope">its internals</a>, it becomes clear that most built-in glom functionality is implemented through these public interfaces. So while glom can feel magical at times, now you can extend glom without touching the core, and be a part of the magic, too. ☄️</p> <hr /> <p>Not bad for five years, and we haven't even scratched all the surfaces, yet. Hopefully the next showcase won't be quite so far out. 
<p></p> <hr /> https://sedimental.org/intentional_creation.html Mahmoud Hashemi https://sedimental.org/ Intentional Creation 2023年01月04日T00:00:00Z 2023年01月04日T00:00:00Z <p><p><em>Reliably tap into your creativity with the 4 Cs: Consume, critique, curate, create.</em></p> <p><em>This is one of my oldest ideas, finally published on <a href="https://github.com/readme/guides/intentional-creation">the GitHub ReadME Project blog</a>, along with <a href="https://github.com/readme/stories/mahmoud-hashemi">a profile</a>, in June 2022. For more like this, follow me on <a href="https://twitter.com/mhashemi">Twitter</a> or <a href="https://qoto.org/@mahmoud">Mastodon</a>. (You can also read it <a href="https://sedimental.org/intentional_creation_zh.html">in 中文 here</a>. Thanks Dominic Huang!)</em></p> <p>We all have creative potential. Whether it gets you up in the morning or keeps you up at night, you've felt its gnaw. </p> <p>Turning that potential into productivity can prove challenging in an internet-connected environment that offers a constant stream of consumables. How do we pick a direction? </p> <p>In this guide, you'll see how to distill the elements of creativity into four deliberate stages, and how to put the process to use: </p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#consume">Consume</a><li><a href="#critique">Critique</a><li><a href="#curate">Curate</a><li><a href="#create">Create</a><ul><li><a href="#debugging_the_process">Debugging the process</a></ul><li><a href="#putting_it_into_practice">Putting it into practice</a></ul></div><p>These 4 Cs comprise a straightforward, adaptable approach that works well in both group and solo settings. You've already started on step 1. Read on to find out what to do next.</p> <p><a target="_blank" href="https://sedimental.org/uploads/cccc_2019.png"><img title="An image of the consumption-to-creation gradient, shaped as a pyramid. CC-BY-SA." 
width="100%" src="https://sedimental.org/uploads/cccc_gh.png" /></a></p> <h2 id="consume"><a href="#consume" class="toclink">Consume</a></h2> <p><strong>Step 1: Turn passive consumption into active research.</strong> </p> <p>From the moment you open your eyes in the morning, you're accosted with calls to consume. Articles, videos, podcasts, the newest Wordle variant. When consumption is the default mode of our modern computing environment, how is a builder supposed to build?</p> <p>To create, we must first recognize its inverse: consumption. Consumption is a useful stage, but can be dangerous if it’s terminal. An infinite loop in this stage kills any chance of creation. </p> <p>Little is created in a vacuum. Creation still starts with consumption, albeit consumption disarmed with an intention. Turn pure consumption into active research, punctuated with critique.</p> <h2 id="critique"><a href="#critique" class="toclink">Critique</a></h2> <p><strong>Step 2: Capture your reactions in critiques, and research no faster than you can react.</strong></p> <p>As with so many software problems of our day, the answer is simple: React. No, not the JavaScript framework, but the human act of reaction. Intentional creation starts with giving yourself pause on new inputs. Seek a reaction from yourself. A semi-structured reaction, or critique, is a time-honored practice in creative fields, like architecture.</p> <p>Infinite scroll may prove challenging to overcome, but before turning your consideration to the next item in your feed, activate your critical senses. Draw some conclusions. Even unvetted, they're yours. </p> <p>If you're finding it hard to summon a critique, this is a clear sign you're consuming faster than you can reflect. If you're not reflecting, you're not learning. 
You may need to go deeper on individual items, or just take a break.</p> <p>Your critiques have never been easier to capture, whether typed in markdown, dictated to automatic transcription, or written down in a notepad on your desk. Try opening your editor or critique tool before opening any new resources. Feel free to open one now. </p> <center><big><em>If you're not reflecting, you're not learning.</em></big></center> <h2 id="curate"><a href="#curate" class="toclink">Curate</a></h2> <p><strong>Step 3: Curate critiques into collections that act as reservoirs of creative reference.</strong></p> <p>Critiques are only proto-creative output. Writing anything helps prime the creative pump, but criticism is raw reaction. You want a refined synthesis. Once you've got enough critiques under your belt, curate the positive examples into a collection. </p> <p>From interior designers to lab researchers to club DJs, creators recognize the value of a structured, referenceable collection. Sometimes a situation calls for urgency or direction, and an organized, well-researched collection can offer an existing solution. Sometimes, in the context of a comprehensive collection, the lack of a referenceable solution is itself a signal that it's time to invent.</p> <p>Curated collections become artifacts unto themselves. I've helped create a few, including <a href="https://0ver.org/">0ver.org</a>, <a href="https://seealso.org/">seealso.org</a>, and <a href="https://github.com/mahmoud/awesome-python-applications">the Awesome Python Applications list</a>. There’s more awesome out there beyond <a href="https://github.com/sindresorhus/awesome">Awesome Lists</a>, like <a href="https://explorabl.es/">explorabl.es</a>, the <a href="https://cooperpress.com/publications/">Cooperpress newsletters</a>, or <a href="https://en.wikipedia.org/wiki/Swipe_file">the "swipe file" phenomenon</a> used among designers and content creators. There's respectable work in curation. 
Still, curation is more important as a stepping stone to our original higher calling. Less is more.</p> <h2 id="create"><a href="#create" class="toclink">Create</a></h2> <p><strong>Step 4: Return to your curations regularly to discover your creative path forward.</strong></p> <p>Collections of a certain size tend to produce interesting findings. Patterns and gaps emerge that inspire creative next steps. As an example, while researching approaches to Python packaging, a pattern emerged that led to one of my most popular concepts/blog posts/talks, <a href="https://sedimental.org/the_packaging_gradient.html">The Packaging Gradient</a>.</p> <p>Whole projects can be born out of connections made with collections. My framework <a href="http://github.com/mahmoud/clastic">Clastic</a>, which was eventually used by teams at PayPal and <a href="https://www.wikilovesmonuments.org">Wiki Loves Monuments</a>, came out of the curated combination of <a href="http://pytest.org/">pytest</a> dependency-injection semantics with <a href="https://werkzeug.palletsprojects.com/">werkzeug</a> primitives.</p> <p>Realistically, the majority of creation happens below the threshold of standalone artifacts. For instance, when adding a feature to an existing system, a parallel approach in a different project serves as a useful guide. I've lost track of the number of times I've swiped techniques from Awesome Python Applications, including ones used to <a href="https://sedimental.org/tech_refresh.html">port my dayjob's 300k SLOC codebase from Python 2 to 3</a>.</p> <p>Most creative outputs have a similar lineage. Only now we have an explicit process.</p> <h3 id="debugging_the_process"><a href="#debugging_the_process" class="toclink">Debugging the process</a></h3> <p>It's easy to see creations we appreciate as towering achievements that sprung fully-formed from their creators' genius. But creation comes in fits and starts. 
If creation comes slowly, here are a few strategies to consider: </p> <ul> <li>Search for a natural split in an existing collection that's getting too big, and explore what makes it interesting. </li> <li>Revisit an old, contentious critique and re-react. What did you get right/wrong?</li> <li>Pick a particular exemplar and turn it into a case study. One beautiful aspect of FOSS projects is that going deep can mean getting involved. There's nothing like proximity to a problem to inspire creative thinking.</li> </ul> <p>More generally, be wary of one-size-fits-all solutions; while prescriptive techniques such as <a href="https://zettelkasten.de/">the Zettelkasten Method</a> may work for some, creation is idiosyncratic. Embrace your own process.</p> <h2 id="putting_it_into_practice"><a href="#putting_it_into_practice" class="toclink">Putting it into practice</a></h2> <p>When inspiration hits, connections can form so quickly that we take for granted what goes on. When inspiration proves less willing to strike, we can keep ourselves primed for creativity by ensuring all four activities continue in balance.</p> <p>There are a few notable benefits of intentional creation:</p> <ul> <li>When you've built something, the influences are well-documented. It can be easier to involve others when there's a clear creative thread to pull on.</li> <li>Sharing your critiques and curations invites collaboration with other creators and curators. </li> <li>Self-awareness. If you're not finding your critiques crystallizing into new thoughts and ideas for projects, that's a sign you're looking at the wrong stuff. Are you following your interests or passively consuming trending content?</li> </ul> <p>Practically, intentional creation means consciously spending less time on consumer sites, from Twitter to Hacker News, and more time taking notes, tagging bookmarks, and creating your own knowledge base. 
Attempt activities that are less entertainment and more you, ultimately closing the gap between you and your creative goals.</p> <p>If it sounds too simple, that's because it is. You're still accountable to you, that's the hard part. But hopefully you'll find some value in this simple hierarchy that lets you check in on your own activities and make adjustments toward a more creative end. Spend less time consuming, and more time on the other three Cs. Consume only enough to allow yourself to critique, curate, and create.</p> <p>If you made it this far, then start now. Step 2. Use any tool or service you like, from <a href="https://docs.google.com/spreadsheets/d/1Vvrnp83PDkIGKq6pYJqPGf99xqRn8Cg4MJiTGOxgS0U/edit#gid=1010469400">spreadsheets</a> to <a href="https://github.com/mahmoud/awesome-python-applications/blob/master/projects.yaml">YAML</a>, and answer this: What's your critique? <p></p> <hr /> https://sedimental.org/tech_refresh.html Mahmoud Hashemi https://sedimental.org/ Changing the Tires on a Moving Codebase 2021年03月10日T08:30:00Z 2021年03月10日T08:30:00Z <p><p><em>2020 was a year of reckonings. And for all that was beyond one’s control, as the year went on, I found myself pouring more and more into the one thing that felt within reach: futureproofing of the large enterprise web application I helped build, <a href="https://simplelegal.com">SimpleLegal</a>.</em></p> <p><em>Now complete, this replatforming easily ranks in my most complex projects, and right now, holds the top spot for the happiest ending. 
That happiness comes at a cost, but with the right approach that cost may not be as high as you think.</em></p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#the_bottom_line">The Bottom Line</a><li><a href="#the_setup">The Setup</a><li><a href="#the_outset">The Outset</a><li><a href="#the_traction_issues">The Traction Issues</a><li><a href="#the_sentry_pivot">The Sentry Pivot</a><li><a href="#the_new_road">The New Road</a><ul><li><a href="#committing_to_transactions">Committing to transactions</a><ul><li><a href="#the_truly_atomic_request">The truly atomic request</a><li><a href="#transactional_test_setup">Transactional test setup</a></ul><li><a href="#better_than_best_practices">Better than best practices</a><ul><li><a href="#the_utility_of_namespaces">The utility of namespaces</a><li><a href="#coverage_tools">Coverage tools</a><li><a href="#flattening_database_migrations">Flattening database migrations</a></ul><li><a href="#easing_onto_the_stack">Easing onto the stack</a></ul><li><a href="#the_rollout">The Rollout</a><li><a href="#the_aftermath">The Aftermath</a></ul></div><h2 id="the_bottom_line"><a href="#the_bottom_line" class="toclink">The Bottom Line</a></h2> <p>We took <a href="https://simplelegal.com">SimpleLegal</a>’s primary product, a 300,000-line Django-1.11-Python 2.7-Redis-Postgres-10 codebase, to a Django 2.2-Python 3.8-Postgres-12 stack, on schedule and without major site incidents. And it feels amazing.</p> <p>Speaking as tech lead on the project, what did it look like? For me, something like this:</p> <p><img width="100%" src="https://sedimental.org/uploads/tr_img/2020_commit_graph.png" title="You can see the last vestiges of normal all the way on the left." /></p> <p>But as Director of Engineering, what did it cost?
<strong>3.5 dev years</strong> and just about <strong>$2 per line of code</strong>.</p> <p>And I'm especially proud of that result, because along the way, we also substantially improved the speed and reliability of both the site and the development process itself. The product now has a bright future ahead, ready to shine in sales RFPs and compliance questionnaires. Most importantly, there’ll be no worrying about when to delicately break it to a candidate that they’ll be working with unsupported technology.</p> <p>In short, a large, solid investment that’s already paying for itself. If you just came here for the estimate we wish we had, you've got it. This post is all about how your team can achieve the same result, if not better.</p> <h2 id="the_setup"><a href="#the_setup" class="toclink">The Setup</a></h2> <p>The story begins in 2013, when a freshly YC-incubated SimpleLegal made all the right decisions for a new SaaS LegalTech company: Python, Django, Postgres, Redis. In classic startup fashion, features came first, unless technology was a blocker. Packages were only upgraded incidentally.</p> <p><img width="40%" src="https://sedimental.org/uploads/tr_img/tech_debt.png" align="right" title="Masks are effective, except when it comes to dusty code." /></p> <p>By 2019, the end of this technical runway had drawn near. While Python 2 may be getting extended support from various vendors, there were precious few volunteers in sight to do Django 1 CVE patches in 2021. A web framework’s a riskier attack surface, so we finally had our compliance forcing function, and it was time to pay off our tech debt.</p> <h2 id="the_outset"><a href="#the_outset" class="toclink">The Outset</a></h2> <p>So began our Tech Refresh replatforming initiative, in Q4 2019. The goal: Upgrade the stack while still shipping features, like changing the tires of a moving car. We wanted to do it carefully, and that would take time.
Here are some helpful ground rules for long-running projects:</p> <ol> <li>Any project that gets worked on 10+ hours per week deserves a 30-minute weekly sync.</li> <li>Every recurring meeting deserves a log. Put it in the invite. Use that Project Log to record progress, blockers, and decisions.</li> <li>It’s a marathon, not a sprint. Avoid relying on working nights, weekends, and holidays.</li> </ol> <p>We started with a sketch of a plan that, generously interpreted, ended up being about halfway correct. Some early guesses that turned into successes:</p> <ol> <li>Move to <a href="https://github.com/jazzband/pip-tools">pip-tools</a> and unpin dependencies based on extensive changelog analysis. Identify packages without py23 compatible versions. (Though we’ve since moved to <a href="https://github.com/python-poetry/poetry">poetry</a>.)</li> <li>Add line coverage reporting to CI</li> <li>Revamp internal testing framework to allow devs to quickly write tests</li> </ol> <p>More on these below. Other plans weren’t so realistic:</p> <ol> <li>Take our CI from ~60% to 95% line coverage in 6 months</li> <li>Parallelized conversion of app packages over the course of 3 months</li> <li>Use low traffic times around USA holidays (Thanksgiving, Christmas, New Years) to gradually roll onto the new app before 2021.</li> </ol> <p>We were young! As naïve as we were, at least we knew it would be a lot of work. To help shoulder the burden, we scouted, hired, and trained three dedicated off-shore developers.</p> <h2 id="the_traction_issues"><a href="#the_traction_issues" class="toclink">The Traction Issues</a></h2> <p>Even with added developers, by mid-2020 it was becoming obvious we were dreaming about 95% coverage, let alone 100%. Total coverage may be best practice, but 3.5 developers couldn’t cover enough ground. We were getting valuable tests, and even finding old bugs, but if we stuck with the letter of the plan, Django 2 would end up being a 2022 project. 
At 70%, we decided it was time to pivot.</p> <p>We realized that CI is more sensitive than most users for most of the site. So we focused in on testing the highest impact code. What’s high-impact? 1) the code that fails most visibly and 2) the code that’s hardest to retry. You can build an inventory of high-impact code in under a week by looking at traffic stats, batch job schedules, and asking your support staff.</p> <p>Around 80% of the codebase falls outside that high-traffic/high-impact list. What to do about that 80%? Lean in on error detection and fast time-to-fix.</p> <h2 id="the_sentry_pivot"><a href="#the_sentry_pivot" class="toclink">The Sentry Pivot</a></h2> <p><img width="25%" src="https://sedimental.org/uploads/tr_img/magnifyingglass.png" align="right" title="You haven't seen your code until you've seen it in a stack trace." /></p> <p>One nice thing about startup life is that it’s easy to try new tools. One practice we’ve embraced at SimpleLegal is to reserve every 5th week for developers to work on the development process itself, like a coordinated 20% time. Even the best chef can’t cook five-star food in a messy kitchen. This was our way of cleaning up the shop and ultimately speeding up the ship.</p> <p>During one such period, someone had the genius idea to add dedicated error reporting to the system, using <a href="https://sentry.io/">Sentry</a>. Within a day or two, we had a site you could visit and get stack traces. It was pretty magical, and it wasn’t until Tech Refresh that we realized that while integration takes one dev-day, full adoption can take a team months.</p> <p>You see, adding Sentry to a mature-but-fast-moving system means one thing: noise. Our live site was erroring all the time. Most errors weren’t visible or didn’t block users, who in some cases had quietly learned to work around longstanding site quirks. Pretty quickly, our developers learned to treat Sentry as a repository of debugging information. 
A Sentry event on its own wasn’t something to be taken seriously in 2019. That changed in 2020, with the team responsible for delivering a seamless replatform needing Sentry to be something else: a responsive site quality tool.</p> <p>How did we get there? First step, enhance the data flowing into Sentry by following <a href="https://docs.sentry.io/product/sentry-basics/guides/getting-started/#-how-many-projects-should-i-create">these best practices</a>:</p> <ol> <li>Split up your products into <a href="https://docs.sentry.io/product/sentry-basics/guides/getting-started/#-how-many-projects-should-i-create">separate Sentry projects</a>. This includes your frontend and backend.</li> <li>Tag your releases. Don’t tag dev env deployments with the branch, it clutters up the Releases UI. Add a separate branch tag for searches.</li> <li>Split up your environments. This is critical for directing alerts. Our Sentry client environment is configured by domain conventions and Django’s <a href="https://docs.djangoproject.com/en/3.1/ref/contrib/sites/">sites framework</a>. If it helps, here's a baseline, we use these environments:<ul> <li>Production: Current official release. DevOps monitored.</li> <li>Sandbox: Current official release (some companies do next release). Used by customers to test changes. DevOps monitored.</li> <li>Demo/Sales: Previous official release. Mostly internal traffic, but external visibility at prospect demo time. DevOps monitored.</li> <li>Canary: Next official release. Otherwise known as staging. Internal traffic. Dev monitored.</li> <li>ProdQA: Current official release. Used internally to reproduce support issues. Dev monitored.</li> <li>QA: Dev branches, dev release, internal traffic. 
Unmonitored debugging data.</li> <li>Local test/CI: Not published to Sentry by default.</li> </ul> </li> </ol> <p>With issues finally properly tagged and searchable, we used Sentry’s new <a href="https://docs.sentry.io/product/discover-queries/">Discover tool</a> to export issues weekly, and prioritize legacy errors. To start, we focused on high-visibility production errors with non-internal human users. Our specific query: <code>has:user !transaction:/api/* event.type:error !user.username:*@simplelegal.*</code></p> <p>We triaged into 4 categories: Quick fix (minor bug), Quick error (turn an opaque 500 error into an actionable 400 of some form), <a href="http://agiledictionary.com/209/spike/">Spike</a> (larger bug, requires research), and Silence (using Sentry’s ignore feature). Over 6 weeks we went from over 2500 weekly events down to fewer than 500.</p> <p>Further efforts have gotten us under 100 events per week, spread across a handful of issues, which is more than manageable for even a lean team. While "Sentry Zero" remains the ideal, we achieved and maintained the real goal of a responsive flow, in large part thanks to <a href="https://sentry.io/integrations/slack/">the Slack integration</a>. Our team no longer hears about server errors from our Support team. In fact, these days, we let them know when a client is having trouble and we’ve got a ticket underway.</p> <p>And it really is important to develop close ties with your support team. Embedded in our strategy above was that CI is much more sensitive than a real user. While perfection is tempting, it’s not unrealistic to ask a bit of patience from an enterprise user, provided your support team is prepared. Sync with them weekly so surprise is minimized.
If they’re feeling ambitious, you can teach them some Sentry basics, too.</p> <h2 id="the_new_road"><a href="#the_new_road" class="toclink">The New Road</a></h2> <p><img width="28%" src="https://sedimental.org/uploads/tr_img/tachometer.png" align="right" title="Pedal to the metal." /></p> <p>With noise virtually eliminated, we were ready to move fast. While the lean-in on fast-fixing Sentry issues was necessary, a strong reactive game is only useful if there are proactive changes being pushed. Here are some highlights we learned when making those changes:</p> <h3 id="committing_to_transactions"><a href="#committing_to_transactions" class="toclink">Committing to transactions</a></h3> <p>Used properly, rollbacks can make it like errors never happened, the perfect complement to a fast-fix strategy.</p> <h4 id="the_truly_atomic_request"><a href="#the_truly_atomic_request" class="toclink">The truly atomic request</a></h4> <p>Get as much as possible into the transactions. Turn on <a href="https://docs.djangoproject.com/en/3.1/topics/db/transactions/#tying-transactions-to-http-requests">ATOMIC_REQUESTS</a>, if you haven’t already. Some requests do more than change the database, though, like sending notifications and enqueuing background tasks.</p> <p>At SimpleLegal, we rearchitected to defer all side effects (except logging) until a successful response was being returned. Middleware can help, but mainly we achieved this by getting rid of our Redis queue, and switching to a PostgreSQL-backed task queue/broker. This arrangement ensures that if an error occurs, the transaction is rolled back, no tasks are enqueued, and the user gets a clean failure. We spot the breakage in Sentry, toggle over to the old site to unblock, and their next retry succeeds.</p> <h4 id="transactional_test_setup"><a href="#transactional_test_setup" class="toclink">Transactional test setup</a></h4> <p>Transactionality also proved key to our testing strategy. 
SimpleLegal had long outgrown Django’s primitive fixture system. Most tests required complex Python to set up, making tests slow to write and slow to run. To speed up both writing and running, we wrapped the whole test session in a transaction, then, before any test cases run, we set up exemplary base states. Test cases used these base states as <a href="https://docs.pytest.org/en/stable/fixture.html">fixtures</a>, and rolled back to the base state after every test case. See <a href="https://gist.github.com/mahmoud/10f6b6b0a9c5860030693357124131df">this conftest.py excerpt</a> for details.</p> <h3 id="better_than_best_practices"><a href="#better_than_best_practices" class="toclink">Better than best practices</a></h3> <p>Software scenarios vary so widely, there’s an art to knowing which advice isn’t for you. Here’s an assortment of cul de sacs we learned about firsthand.</p> <h4 id="the_utility_of_namespaces"><a href="#the_utility_of_namespaces" class="toclink">The utility of namespaces</a></h4> <p>Given how code is divided into modules, packages, Django apps, etc., it may be tempting to treat those as units of work. Don’t start there. Code divisions can be pretty arbitrary, and it’s hard to know when you’ve pulled on a risky thread.</p> <p>Assuming there are automated refactorings, as in <a href="https://portingguide.readthedocs.io/en/latest/">a 2to3 conversion</a>, start by porting by type of transformation. That way, one need only review a command and a list of paths affected. Plus, automated fixes necessarily follow a pattern, meaning more people can fix bugs arising from the refactor.</p> <h4 id="coverage_tools"><a href="#coverage_tools" class="toclink">Coverage tools</a></h4> <p><img width="25%" src="https://sedimental.org/uploads/tr_img/scantron.png" align="right" title="Grade those tests carefully." /></p> <p>Coverage was a mixed bag for us. Obviously our coverage-first strategy wasn’t tenable, but it was still useful for prioritization and status checks. 
On a per-change basis, we found coverage tools to be somewhat unreliable. We never got to the bottom of why coverage acted nondeterministically, and we left the conclusion at, "off-the-shelf tools like codecov are probably not targeted at monorepos of our scale."</p> <p>In running into coverage walls, we ended up exploring many other interpretations of coverage. For us, much higher-priority than line coverage were "route coverage" (i.e., every URL has at least one integration test) and "model repr coverage" (i.e., every model object had a useful text representation, useful for debugging in Sentry). With more time, we would have liked to build tools around those, and even around online-profiling based coverage statistics, to prioritize the highest traffic lines, not just the highest traffic routes. If you’ve heard of approaches to these ends, we’d love to discuss them with you.</p> <h4 id="flattening_database_migrations"><a href="#flattening_database_migrations" class="toclink">Flattening database migrations</a></h4> <p>On the surface, reducing the number of files we needed to upgrade seems logical. Turns out, flattening <a href="https://docs.djangoproject.com/en/3.1/topics/migrations/">migrations</a> is a low-payoff strategy to get rid of files. Changing historical migration file structure complicated our rollout, while upgrading migrations we didn’t flatten was straightforward. 
Not to mention, if you just want the CI speedup, you can take the same page from <a href="https://openedx.atlassian.net/wiki/spaces/AC/pages/23003228/Everything+About+Database+Migrations#EverythingAboutDatabaseMigrations-SquashingMigrations">the Open EdX Platform</a> that we did: <a href="https://github.com/edx/edx-platform/blob/66f0f9891f00994f77604a51dbb29736aa605fa8/scripts/reset-test-db.sh#L75">build a base DB cache that you check in every couple months</a>.</p> <p>Turns out, <a href="https://sedimental.org/awesome_python_applications.html#goal-1-a-better-development-cycle">you can learn a lot from open-source applications</a>.</p> <h3 id="easing_onto_the_stack"><a href="#easing_onto_the_stack" class="toclink">Easing onto the stack</a></h3> <p>If you have more than one application, use the smaller, simpler application to pilot changes. We were lucky enough to have a separate app whose tests ran faster, making for a tighter development loop we could learn from. Likewise, if you have more than one production environment, start rollouts with the one with the least impact.</p> <p>Clone your CI jobs for the new stack, too. They’ll all fail, but resist the urge to mark them as optional. Instead, build a single-file inventory of all tests and their current testing state. We built a small extension for our test runner, <a href="https://docs.pytest.org/en/stable/">pytest</a>, which bulk-skipped tests based on a status inventory file. Then, ratchet: unskip and fix a test, update the file, check that tests pass, and repeat. Much more convenient and scannable than <a href="https://docs.pytest.org/en/latest/skipping.html#skipping-test-functions">pytest mark</a> decorators spread throughout the codebase. 
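</p> <p>For readers who want a starting point, here is a minimal sketch of an inventory-driven bulk skip. It is not the SimpleLegal extension itself: the file name <code>test_inventory.txt</code>, its one-test-per-line "node ID, then status" format, and the helper <code>parse_inventory</code> are all assumptions made for illustration. The only pytest APIs it leans on are the <code>pytest_collection_modifyitems</code> hook and <code>pytest.mark.skip</code>.</p>

```python
# conftest.py -- illustrative sketch only; the inventory file name and
# its "<node id> <status>" line format are assumptions, not a real tool.
from pathlib import Path

INVENTORY_PATH = Path("test_inventory.txt")  # hypothetical file name


def parse_inventory(text):
    """Map each test node ID to its status ("pass" or "skip")."""
    statuses = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # allow blank lines and comments in the inventory
        # The status is the last token; node IDs may contain spaces.
        node_id, _, status = line.rpartition(" ")
        if node_id:
            statuses[node_id] = status
    return statuses


def pytest_collection_modifyitems(config, items):
    """Bulk-skip every collected test the inventory marks as "skip"."""
    # Imported here so parse_inventory() stays usable from plain scripts.
    import pytest

    if not INVENTORY_PATH.exists():
        return
    statuses = parse_inventory(INVENTORY_PATH.read_text())
    for item in items:
        if statuses.get(item.nodeid) == "skip":
            item.add_marker(pytest.mark.skip(reason="not yet ported, see inventory"))
```

<p>With something like this in place, the ratchet is a one-line change per test: flip a status from <code>skip</code> to <code>pass</code> in the inventory, run the suite, and commit the file along with the fix.</p> <p>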
See <a href="https://gist.github.com/mahmoud/10f6b6b0a9c5860030693357124131df">this conftest.py excerpt</a> for details.</p> <h2 id="the_rollout"><a href="#the_rollout" class="toclink">The Rollout</a></h2> <p>In Q4 2020, we doubled up on infrastructure to run the old and new sites in parallel, backed by the same database. We got into a loop of enabling traffic to the new stack, building a queue of Sentry issues to fix, and switching it back off, while tracking the time. After around 120 hours of new stack, strategically spread around the clock and week, enough organizational confidence had been built that we could leave the site on during our most critical hours: Mondays and Tuesdays at the beginning of the month.</p> <p>The sole hiccup was <a href="https://www.zdnet.com/article/aws-outage-impacts-thousands-of-online-services/">an AWS outage</a> Thanksgiving week. At this point we were ahead of schedule, and enough confidence had been built in our fast-fix workflow that we didn’t need our original holiday testing windows. And for that, many thanks were given.</p> <p>We kept at the fast-fix crank until we were done. Done isn't when the new system has no errors, it's when traffic on the new system has fewer events than the old system. Then, fix forward, and start scheduling time to delete the scaffolding.</p> <p><img width="60%" src="https://sedimental.org/uploads/tr_img/baton.png" title="Finally, that tired old stack can rest." /></p> <h2 id="the_aftermath"><a href="#the_aftermath" class="toclink">The Aftermath</a></h2> <p>So, once you’re on current LTS versions of Django, Python, Linux, and Postgres, job complete, right?</p> <p>Thankfully, tech debt never quite hits 0. While updating and replacing core technologies on a schedule is no small feat, replacing a rusty part with a shiny one doesn’t change a design. Architectural tech debt -- mistakes in abstractions, including the lack thereof -- can present an even greater challenge. 
Solutions to those problems don’t generalize between projects as cleanly, but they do benefit from up-to-date and error-free foundations.</p> <p>For all the projects looking to add tread to their technical tires, we hope this retrospective helps you confidently and pragmatically retrofit your stack for years to come.</p> <p>Finally, big thanks to <a href="https://uvik.net/">Uvik</a> for the talent connection, and the talent: <a href="https://github.com/ypankovych">Yaroslav</a>, <a href="https://github.com/skhortiuk">Serhii</a>, and Oleh. Shoutouts to <a href="https://github.com/kurtbrose/">Kurt</a>, <a href="https://github.com/justinvanwinkle">Justin</a>, and Chris, my fellow leads. And the cheers to business leadership at SimpleLegal and everywhere, for seeing the value in maintainability.<p></p> <hr /> https://sedimental.org/thanks_201x.html Mahmoud Hashemi https://sedimental.org/ Thanks, 201X! 2019年12月02日T10:00:00Z 2019年12月02日T10:00:00Z <p><p><em>Thought I'd take a Sunday afternoon to reflect on, oh I don't know, a decade.</em></p> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/legatree_med.png" /> Been a long ten years, but it's flown past. This particular decade happens to coincide with my first years of full-time professional software engineering.</p> <h2 id="the_quantity"><a href="#the_quantity" class="toclink">The Quantity</a></h2> <p>I can't possibly summarize it all, and if I tried, it'd still be colored by what's on my mind right now. 
But I can point to the artifacts I tried to leave along the way:</p> <ul> <li><a href="https://twitter.com/mhashemi">Twitter</a> FWIW<sup id="fnref:1"><a class="footnote-ref" href="https://sedimental.org/thanks_201x.html#fn:1">1</a></sup> (2008+)</li> <li><strong>~20</strong> <a href="https://sedimental.org/open_source_projects.html">Open-Source Projects</a> (2012+)</li> <li><strong>~15</strong> <a href="https://sedimental.org/hatnote_projects.html">Hatnote Projects</a> (2013+, <a href="https://twitter.com/hatnotable">follow us</a>)</li> <li><strong>~25</strong> entries on this blog (2015+)</li> <li><strong>+7</strong> <a href="https://medium.com/paypal-tech/search?q=python">here</a> (2014-2016)</li> <li>Not including <a href="https://www.pythondoeswhat.com/">pythondoeswhat.com</a> or <a href="https://blog.hatnote.com/">blog.hatnote.com</a></li> <li>(or other posts on the blogs only real heads know)</li> <li><strong>~10</strong> <a href="https://sedimental.org/talks.html">Talks</a> (2016+)</li> <li>Lest I forget: <a href="http://shop.oreilly.com/product/0636920047346.do?code=authd">O'Reilly's Enterprise Software with Python</a> (2016)</li> <li>And <a href="https://sedimental.org/appearances.html">several podcast/media appearances</a></li> <li><a href="https://calver.org/">calver.org</a> (2016) and <a href="https://0ver.org/">0ver.org</a> (2018) (Versioning is a fun pastime)</li> <li><a href="https://pyninsula.org/">Pyninsula</a> (2017+) - <a href="https://www.youtube.com/c/Pyninsula">YouTube</a>, <a href="https://www.meetup.com/Pyninsula-Python-Peninsula-Meetup/">Meetup</a>, <a href="https://mail.python.org/mailman3/lists/pyninsula-announce.python.org/">Email Announce</a></li> </ul> <p>Taking a chronological look at each of the above, I'm relieved to see obvious growth.</p> <p>If I were to highlight one resource, it would probably be the <a href="https://sedimental.org/talks.html">talks</a>. 
Despite the stress of preparation and delivery, I'm least concerned with having a massive miscommunication when we're all in the room and I can see the points hitting home. It's impossible to pick a favorite, but <a href="https://sedimental.org/talks.html#ask-the-ecosystem-lessons-from-350-foss-python-applications">Ask the Ecosystem</a> (2019), the <a href="https://sedimental.org/talks.html#restructuring-data-in-python">Restructuring Data</a> lightning talk (2018), and <a href="https://www.youtube.com/watch?v=iLVNWfPWAC8">The Packaging Gradient</a> (2017) seem like audience faves from where I'm sitting.</p> <h2 id="the_quality"><a href="#the_quality" class="toclink">The Quality</a></h2> <p>Each project, post, and talk had its own reward, but I guess I've got more than just those to show for the decade.</p> <p>On the more profit-driven side, I built tools and teams at PayPal, but once I could manage the risk, I got to dip into startups for the last few years. Lucky for me, it wasn't a total bust, and the wife and I bought a place in <a href="https://en.wikipedia.org/wiki/Japantown,_San_Jose">my favorite neighborhood (in the USA)</a>. Not a millionaire, but I'm hoping and working for a world where no one has to be.</p> <p>More recently, the Python Software Foundation <a href="http://pyfound.blogspot.com/2019/11/python-software-foundation-fellow.html">made me a Fellow</a>. This isn't something I can be nonchalant about, and I'm not going to understate how much this means, to me, working in a field like software, where concrete symbols of progress are alternatingly elusive and vanishing. Plus it's Python, and reciprocated love is nice. I have hundreds of people to thank for helping me reach this point, and I have to thank the PSF for dedicating the time to ramping up these awards. 
They've convinced me more than ever that we need more institutions to build this sort of advancement.</p> <p>To all of you, thank you.</p> <h2 id="the_struggle"><a href="#the_struggle" class="toclink">The Struggle</a></h2> <p>I like to think I managed to do all of the above while staying away from industry hype, on the principle that massive speculative capital influx isn't where real value is added to society, and doesn't generate the kind of innovation that excites me.</p> <p>I may have been naïve, but I came to Silicon Valley with an idea about the transformative power of software. Changing times may illustrate a grittier interpretation than the one I had and have, but I continue to hold dear software's potential for positive impact. If you've felt that vision waver, let me tell you, you're not alone.</p> <p>In the past decade, I've seen too many engineers sucked in by new technologies and ventures, only to find themselves alienated from their work. Episodes ranging from an afternoon lost to debugging Docker/k8s clusters, to years of work disappearing at the end of a VC runway. Nothing has been harder to watch than those bedraggled-but-persistent idealists regroup, each time a bit more cynical than the last.</p> <p>Even if its seeming intractability has taken it from the center stage, the burnout conversation continues to smolder, because there's no issue realer. 
I know; I released more ceramics than software <a href="https://www.flickr.com/photos/mahmoudhashemi/albums/72157648555341327">back in 2014</a>.</p> <p>Some problems can be solved by <a href="https://opensource.com/article/18/9/its-time-pay-maintainers">paying the maintainers</a>, but I think the vastly bigger issue is around losing the human connection between the real effort software takes and the real benefits it brings, combined with FOSS's dearth of collaborators in supporting roles (QA, product/project/release management).</p> <p>That's why I'm incredibly thankful for the Wikimedia community for always being there, patient with schedules and issues, as long as the software got the job done. It can be a challenge to juggle projects, but I tell every budding engineer: find that direct connection to people who will appreciate your work, and avoid cynicism at all costs.</p> <p><img width="34%" align="right" src="https://sedimental.org/uploads/illo/green_field_med.png" /></p> <p>There are some interesting prospects in the works, but I'm keeping this post retro. Besides, if 2029 rolls around and all I did was break even with 2009-19, I don't see how I can be disappointed.</p> <p>Thanks again for everything in 201X, and for sticking with me in 202X.</p> <div class="footnote"> <hr /> <ol> <li id="fn:1"> <p>Despite using <a href="https://twitter.com/mhashemi">Twitter</a> for over a decade, the process of tweeting feels so perfunctory, and the service itself so tenuous, that I still can't bring myself to invest the time. I mostly use it to crosspost my blog posts or help friends promote their posts/projects.</p> <p>But until I start an email newsletter, or really get on top of <a href="https://yak.party">yak.party</a>, it's still the best I got for announcing where I'm speaking next. 
<a class="footnote-backref" href="https://sedimental.org/thanks_201x.html#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/awesome_python_applications.html Mahmoud Hashemi https://sedimental.org/ Awesome Python Applications 2018年12月20日T11:20:00Z 2018年12月20日T11:20:00Z <p><p><em>What we can learn from 180+ case studies on successfully shipping Python software.</em></p> <p>If you're reading this (or hearing <a href="https://testandcode.com/55">this</a>), you read and write code, probably Python. And for all the code you've shipped, you've probably had your share of missed requirements. Somewhere in the excitement of software abstraction, we can lose sight of what really matters, what makes our well-factored modules and packages and frameworks turn into real-world applications.</p> <p>That's why I'm announcing <a href="https://github.com/mahmoud/awesome-python-applications"><strong><em>Awesome Python Applications</em></strong></a>, a hand-curated list of 180+ projects, all of which are: <img align="right" width="40%" src="https://sedimental.org/uploads/illo/snake_cd.png" /></p> <ol> <li>Free software with an online source repository.</li> <li>Using Python for a considerable part of their functionality.</li> <li>Well-known, or at least prominently used in an identifiable niche.</li> <li>Maintained or otherwise demonstrably still functional on relevant platforms.</li> <li>Shipped applications, not libraries or frameworks.</li> </ol> <p>The result is a list predominantly focused on software that installs without <code>pip</code> or PyPI, and whose audience is mostly <em>not</em> developers. 
There's still plenty of that in there, too, with other exceptions, but the breadth of the list <a href="https://github.com/mahmoud/awesome-python-applications#awesome-python-applications">speaks for itself</a>.</p> <p>So why spend weeks cataloguing open-source Python applications?</p> <p>Aside from holiday cheer, three big reasons.</p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#goal_1_a_better_development_cycle">Goal #1: A Better Development Cycle</a><li><a href="#goal_2_a_complete_python_production_loop">Goal #2: A Complete Python Production Loop</a><li><a href="#goal_3_grounding_for_the_python_ecosystem">Goal #3: Grounding for the Python Ecosystem</a><li><a href="#next_steps">Next steps</a></ul></div><h2 id="goal_1_a_better_development_cycle"><a href="#goal_1_a_better_development_cycle" class="toclink">Goal #1: A Better Development Cycle</a></h2> <p>Ever since I started <a href="https://www.youtube.com/watch?v=iLVNWfPWAC8">talking</a> <a href="https://www.youtube.com/watch?v=tfI2hdK6vVY">about</a> <a href="https://sedimental.org/the_packaging_gradient.html">Python packaging</a>, people have been asking me questions about which packaging technique is best for their software. I was struck, over and over again, how far people can get in developing an application before reaching the fundamental question of delivery. Exploring this, I landed on a more basic question:</p> <blockquote> <p>Why are so many people building applications from first principles (blog posts and Stack Overflow)?</p> </blockquote> <p>Isn't Python one of the biggest names in the software world? Aren't there dozens of successful, real-world applications written in Python? 
What are the chances your application is totally unique?</p> <p><em>Awesome Python Applications</em> attempts to open up a new flow for answering the toughest development questions.</p> <p>When building an application, scan the list to find projects which most closely match your project's requirements. Then, use that application as a guide for answering your own questions. This works especially well for abstract questions surrounding architecture, deployment, and testing.</p> <p>Back in school, I learned more about architecture and software development from <a href="https://github.com/wikimedia/mediawiki">the MediaWiki source code</a> than I did from any class. It continues to inspire me <a href="https://sedimental.org/hatnote_projects.html">to this day</a>. APA is the next step in enabling the holistic education of a working application with real users.</p> <p>In short, while we may lack the time <a href="http://aosabook.org/en/index.html">to write them</a>, each production application is worth a thousand blog posts.</p> <h2 id="goal_2_a_complete_python_production_loop"><a href="#goal_2_a_complete_python_production_loop" class="toclink">Goal #2: A Complete Python Production Loop</a></h2> <p>We Python programmers are also software <em>users</em>. But unlike other software users, we know how to file issues and may even make significant contributions back to our applications of choice.</p> <p><img align="right" width="30%" src="https://sedimental.org/uploads/illo/network_sm.png" /></p> <p>By choosing Python software when possible, we take one step closer to pitching in. What better way for a future application developer to get started?</p> <p>I would love to see more developers connect with software they didn't realize was Python. 
My (minor) contributions to <a href="https://github.com/twisted/twisted">Twisted</a> were greatly energized by the knowledge that one of my favorite applications, <a href="https://github.com/deluge-torrent/deluge">Deluge</a>, heavily used the library. Using free software leads to creating more free software.</p> <h2 id="goal_3_grounding_for_the_python_ecosystem"><a href="#goal_3_grounding_for_the_python_ecosystem" class="toclink">Goal #3: Grounding for the Python Ecosystem</a></h2> <p>With the pace and cerebrality of technology, it can be easy to get ahead of ourselves and our end users. Infrastructure devs get disconnected from application devs, and that makes for worse software over time. This problem is compounded when applications get less developer attention. Most APA entries have three- and even two-digit starcounts, unless users are highly technical. Few major Python applications are distributed with PyPI, so <a href="https://pypistats.org">download statistics</a> can't help us either. Even if they did, lower-level libraries have way more fanout. And of course free software projects can't lay down big donations or conference sponsorships, so representation tends to be pretty sparse all around.</p> <p>These applications represent the best of the free and living portion of Python. Not only are they a source of utility and pride, but they need our support, in spirit and in practice. It is my sincere hope that the APA will help to anchor the Python community in its real-world applications.</p> <p>What does this mean, concretely? A keen eye will notice <a href="https://github.com/mahmoud/awesome-python-applications/blob/master/projects.yaml">how the list is structured</a>. This isn't just for consistent rendering, but an attempt at an API for the dataset. 
We must explore our ecosystem with the relationship between libraries and applications in mind.</p> <p>I know I'm going out on a limb here, and metrics aren't everything, but it would be very interesting to see the Python FOSS ecosystem explored as an analogue of the scientific publishing framework. Can we get some sort of developer <a href="https://en.wikipedia.org/wiki/H-index"><em>h</em>-index</a> by treating libraries as "<a href="https://en.wikipedia.org/wiki/Article-level_metrics">articles</a>" and applications as "<a href="https://en.wikipedia.org/wiki/Journal-level_metrics">journals</a>"? Adding in some application userbase approximations (via social <a href="https://en.wikipedia.org/wiki/Altmetrics">altmetrics</a> and other means) can give us much deeper insight into real-world impact.</p> <h2 id="next_steps"><a href="#next_steps" class="toclink">Next steps</a></h2> <p>If this essay seems shorter than <a href="https://sedimental.org/archive.html">my usual</a>, that's because it's really an introduction to <a href="https://github.com/mahmoud/awesome-python-applications">the list itself</a>. I got caught up in several projects' codebases while doing the research, and you will, too.</p> <p>If we've missed a project, please open <a href="https://github.com/mahmoud/awesome-python-applications/issues/new">an issue</a> or PR. If you're as excited about this as I am, consider helping with some of <a href="https://github.com/mahmoud/awesome-python-applications/issues">the open issues</a>. There are still a lot of application features to survey: licenses, Python versions, frameworks, and more. 
And as always, watch this space (and the repo) for updates as we make more discoveries!<p></p> <hr /> https://sedimental.org/glom_restructured_data.html Mahmoud Hashemi https://sedimental.org/ Announcing glom: Restructured Data for Python 2018年05月09日T10:00:00Z 2018年05月09日T10:00:00Z <p><p><em>This post introduces <a href="https://github.com/mahmoud/glom"><strong>glom</strong></a>, Python's missing operator for nested objects and data.</em></p> <p><em>If you're an easy sell, <a href="http://glom.readthedocs.io/en/latest/api.html">full API docs</a> and <a href="http://glom.readthedocs.io/en/latest/tutorial.html">tutorial</a> are already available at <a href="https://glom.readthedocs.io/">glom.readthedocs.io</a>. <br /> Harder sells, this 5-minute post is for you.<br /> Really hard sells <a href="https://twitter.com/mhashemi/status/994111054702522369">met me at PyCon</a>,<br /> where I gave <a href="https://www.youtube.com/watch?v=3aREXkfeWek">this 5-minute talk</a>.</em></p> <p><img src="https://sedimental.org/uploads/illo/comet.png" align="right" width="30%" /></p> <h2 id="the_spectre_of_structure"><a href="#the_spectre_of_structure" class="toclink">The Spectre of Structure</a></h2> <p>In the Python world, there's a saying: <em>"Flat is better than nested."</em></p> <p>Maybe times have changed or maybe that adage just applies more to code than data. In spite of the warning, nested data continues to grow, from document stores to RPC systems to structured logs to plain ol' JSON web services.</p> <p>After all, if "flat" was the be-all-end-all, why would namespaces be <a href="https://en.wikipedia.org/wiki/Zen_of_Python">one honking great idea</a>? Nobody likes artificial flatness, nobody wants to call a function with 40 arguments.</p> <p>Nested data is tricky though. Reaching into deeply structured data can get you some ugly errors. 
Consider this simple line:</p> <pre class="codehilite"><code class="language-python">value = target.a['b']['c']
</code></pre> <p>That single line can result in at least four different exceptions, each less helpful than the last:</p> <pre class="codehilite"><code class="language-python">AttributeError: 'TargetType' object has no attribute 'a'
KeyError: 'b'
TypeError: 'NoneType' object has no attribute '__getitem__'
TypeError: list indices must be integers, not str
</code></pre> <p>Clearly, we need our tools to catch up to our nested data.</p> <p>Enter <strong>glom</strong>.</p> <h2 id="restructuring_data"><a href="#restructuring_data" class="toclink">Restructuring Data</a></h2> <p><a href="https://github.com/mahmoud/glom">glom</a> is a new approach to working with data in Python, featuring:</p> <ul> <li><a href="http://glom.readthedocs.io/en/latest/tutorial.html#access-granted">Path-based access</a> for nested structures</li> <li><a href="http://glom.readthedocs.io/en/latest/api.html#glom-func">Declarative data transformation</a> using lightweight, Pythonic specifications</li> <li>Readable, meaningful <a href="http://glom.readthedocs.io/en/latest/api.html#exceptions">error messages</a></li> <li>Built-in <a href="http://glom.readthedocs.io/en/latest/api.html#debugging">data exploration and debugging features</a></li> </ul> <p>A tool as simple and powerful as glom <a href="http://glom.readthedocs.io/en/latest/by_analogy.html">attracts many comparisons</a>.</p> <p>While similarities exist, and are often intentional, glom differs from other offerings in a few ways:</p> <h3 id="going_beyond_access"><a href="#going_beyond_access" class="toclink">Going Beyond Access</a></h3> <p>Many nested data tools simply perform deep gets and searches, stopping short after solving the problem posed above. 
Realizing that access almost always precedes assignment, glom takes the paradigm further, enabling total declarative transformation of the data.</p> <p>By way of introduction, let's start off with space-age access, the classic "deep-get":</p> <pre class="codehilite"><code class="language-python">from glom import glom

target = {'galaxy': {'system': {'planet': 'jupiter'}}}
spec = 'galaxy.system.planet'

output = glom(target, spec)
# output = 'jupiter'
</code></pre> <p><img src="https://sedimental.org/uploads/illo/mjc/jupiter_med.png" align="right" width="30%" /></p> <p>Some quick terminology:</p> <ul> <li><em>target</em> is our data, be it dict, list, or any other object</li> <li><em>spec</em> is what we want <em>output</em> to be</li> </ul> <p>With <code>output = glom(target, spec)</code> committed to memory, we're ready for some new requirements.</p> <p>Our astronomers want to focus in on the Solar system, and represent planets as a list. Let's restructure the data to make a list of names:</p> <pre class="codehilite"><code class="language-python">target = {'system': {'planets': [{'name': 'earth'}, {'name': 'jupiter'}]}}

glom(target, ('system.planets', ['name']))
# ['earth', 'jupiter']
</code></pre> <p>And let's say we want to capture a parallel list of moon counts with the names as well:</p> <pre class="codehilite"><code class="language-python">target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}

spec = {'names': ('system.planets', ['name']),
        'moons': ('system.planets', ['moons'])}

glom(target, spec)
# {'names': ['earth', 'jupiter'], 'moons': [1, 69]}
</code></pre> <p>We can react to changing data requirements as fast as the data itself can change, naturally restructuring our results, despite the input's nested nature. 
Like a list comprehension, but for nested data, our code mirrors our output.</p> <p>And we're just getting started.</p> <h3 id="true_python_native"><a href="#true_python_native" class="toclink">True Python-Native</a></h3> <p>Most other implementations are limited to a particular data format or pure model, be it <a href="http://jmespath.org/">jmespath</a> or <a href="https://en.wikipedia.org/wiki/XPath">XPath</a>/<a href="https://en.wikipedia.org/wiki/XSLT">XSLT</a>. glom makes no such sacrifices of practicality, harnessing the full power of Python itself.</p> <p>Going back to our example, let's say we wanted to get an aggregate moon count:</p> <pre class="codehilite"><code class="language-python">target = {'system': {'planets': [{'name': 'earth', 'moons': 1}, {'name': 'jupiter', 'moons': 69}]}}

glom(target, {'moon_count': ('system.planets', ['moons'], sum)})
# {'moon_count': 70}
</code></pre> <p>With glom, you have full access to Python at any given moment. Pass values to functions, whether built-in, imported, or defined inline with <code>lambda</code>. But <code>glom</code> doesn't stop there.</p> <p>Now we get to one of my favorite features by far. Leaning into Python's power, we unlock the following syntax:</p> <pre class="codehilite"><code class="language-python">from glom import T

spec = T['system']['planets'][-1].values()

glom(target, spec)
# ['jupiter', 69]
</code></pre> <p>What just happened?</p> <p><code>T</code> stands for <em>target</em>, and <a href="http://glom.readthedocs.io/en/latest/api.html#object-oriented-access-and-method-calls-with-t">it acts as your data's stunt double</a>. <code>T</code> records every key you get, every attribute you access, every index you index, and every method you call. And out comes a spec that's usable like any other.</p> <p>No more worrying if an attribute is <code>None</code> or a key isn't set. Take that leap with <code>T</code>. 
<code>T</code> never raises an exception, so worst case you get a <a href="http://glom.readthedocs.io/en/latest/api.html#exceptions">meaningful error message</a> when you run <code>glom()</code> on it.</p> <p>And if you're ok with the data not being there, just set a default:</p> <pre class="codehilite"><code class="language-python">glom(target, T['system']['comets'][-1], default=None)
# None
</code></pre> <p>Finally, <a href="https://en.wikipedia.org/wiki/Null_coalescing_operator">null-coalescing operators</a> for Python!</p> <p>But so much more. This kind of dynamism is what made me fall in love with Python. No other language could do it quite like this.</p> <p>That's why glom will always be a Python library first and a CLI second. Oh, didn't I mention there was a CLI?</p> <h3 id="library_first_then_cli"><a href="#library_first_then_cli" class="toclink">Library first, then CLI</a></h3> <p>Tools like <a href="https://stedolan.github.io/jq/">jq</a> provide a lot of value on the console, but leave a dubious path forward for further integration. glom's full-featured command-line interface is only a stepping stone to using it more extensively inside application logic.</p> <pre class="codehilite"><code class="language-bash">$ pip install glom
$ curl -s https://api.github.com/repos/mahmoud/glom/events \
  | glom '[{"type": "type", "date": "created_at", "user": "actor.login"}]'
</code></pre> <p>Which gets us:</p> <pre class="codehilite"><code class="language-json">[
  {
    "date": "2018-05-09T03:39:44Z",
    "type": "WatchEvent",
    "user": "asapzacy"
  },
  {
    "date": "2018-05-08T22:51:46Z",
    "type": "WatchEvent",
    "user": "CameronCairns"
  },
  {
    "date": "2018-05-08T03:27:27Z",
    "type": "PushEvent",
    "user": "mahmoud"
  },
  {
    "date": "2018-05-08T03:27:27Z",
    "type": "PullRequestEvent",
    "user": "mahmoud"
  }
  ...
]
</code></pre> <p>Piping hot JSON into <code>glom</code> with a cool Python literal spec, with pretty-printed JSON out. A great way to process and filter API calls, and explore some data. 
Something genuinely enjoyable, because you know you won't be stuck in a pipe dream.</p> <p>Everything on the command line ports directly into production-grade Python, complete with better error handling and limitless integration possibilities.</p> <p><img src="https://sedimental.org/uploads/illo/comet_multi.png" align="right" width="40%" /></p> <h2 id="next_steps"><a href="#next_steps" class="toclink">Next steps</a></h2> <p>Never before glom have I put a piece of code into production so quickly.</p> <p>Within two weeks of the first commit, glom has paid its weight in gold, with glom specs replacing <a href="http://www.django-rest-framework.org/">Django Rest Framework</a> code 2x to 5x their size, making the codebase faster and more readable. Meanwhile, glom's core is so tight that we're on pace to have more docs and tests than code very soon.</p> <p>The <code>glom()</code> function is stable, along with the rest of the API, unless otherwise specified.</p> <p>A lot of other features are baking or in the works. For now, we'll be focusing on the following growth areas:</p> <ul> <li><a href="https://github.com/mahmoud/glom/issues/7">Validation functionality</a>, in the vein of schema and cerberus</li> <li><a href="https://github.com/mahmoud/glom/issues/8">CLI robustness</a>, better error messages, etc.</li> <li><a href="https://github.com/mahmoud/glom/issues/9">Extension API</a>, clean up some internal code, open up extensions</li> <li><a href="https://github.com/mahmoud/glom/issues/10">Automatic default registration</a> of default behaviors for co-installed packages (e.g., Django)</li> </ul> <p>We'll be talking about all of this and more <a href="https://twitter.com/mhashemi/status/994111054702522369">at PyCon</a>, so swing by if you can. 
In either case, I hope you'll try glom out and let us know how it goes!</p> <!-- # The Story of glom * A couple years ago I built `remap`, a `map()` function for trees of Python objects * It didn't solve all my problems because it's mostly for cases where you don't know much about the structure of data * While building Montage, we tried using the "fat model" approach of teaching objects to serialize themselves, but this didn't compose well. Every API endpoint needed slightly different data * Then it dawned on me, what we needed was templating, but for basic objects like dicts, lists, etc., so that we could declaratively create JSON-serializable API responses. * Taking inspiration from lightweight templating languages like `gofmt` and `ashes`, we built the first version of glom. --><p></p> <hr /> https://sedimental.org/maintainerati_2017_github_design.html Mahmoud Hashemi https://sedimental.org/ Maintainerati 2017: GitHub Design 2017-10-17T22:00:00Z 2017-10-17T22:00:00Z <p><p>Last week I attended a <a href="https://twitter.com/maintainerati">Maintainerati</a> event, an unconference/mini-summit for maintainers of popular software, run as a prelude to the <a href="https://githubuniverse.com/">GitHub Universe</a> conference. After being brought up to speed on this year's secret handshake of the software elite, I had a great time in the documentation breakout group, as well as moderating a lively discussion on diversity in open-source, both of which deserve their own write-ups at some point.</p> <p><img title="One of about a dozen minitalks summarizing one of the breakout groups' discussions." 
width="100%" src="https://sedimental.org/uploads/maintainerati_2017_summaries.jpg" /></p> <p>Once those were through and coffee breaks were had, what I consider the main event was upon us: An opportunity to discuss with GitHub designers and developers all the different ways projects use <a href="https://en.wikipedia.org/wiki/GitHub">GitHub</a>, and how GitHub might improve to match those use cases. I think these interactions have the most direct potential to bear fruit, so in my excitement I wrote a bunch of the proceedings down:</p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#digested_emails">Digested emails</a><ul><li><a href="#star_spectrum">Star spectrum</a></ul><li><a href="#code_review_permissions">Code review permissions</a><li><a href="#dashboard_improvements">Dashboard improvements</a><li><a href="#thanks">Thanks!</a></ul></div><h2 id="digested_emails"><a href="#digested_emails" class="toclink">Digested emails</a></h2> <p>Discussion is one of the greatest things about GitHub, but as Jupyter developer <a href="https://github.com/ian-r-rose">Ian Rose</a> brought up, an email for every comment can be overwhelming. Daily or weekly digests for issues, or even for all your GitHub activity would be a huge improvement, especially for more lightweight users of GitHub. 
This may not strike subscribers of <a href="http://weekly.hatnote.com/">The Weeklypedia</a> as a surprise, but I am a big fan of email digests.</p> <p>More control over engagement levels could open a great new avenue for driving traffic for GitHub, too.</p> <h3 id="star_spectrum"><a href="#star_spectrum" class="toclink">Star spectrum</a></h3> <p>Right now you can either <a href="https://help.github.com/articles/about-stars/">star</a> repos or <a href="https://help.github.com/articles/watching-and-unwatching-repositories/">watch</a> them, effectively getting no notifications or <em>all</em> of them.</p> <p>I'd estimate about a third of the repos I star look interesting, but haven't yet reached the point where I'd use or contribute to them. So they mostly get starred and forgotten.</p> <p><a href="https://starminder.xyz"><img title="A screenshot of the simple but useful Starminder" align="right" width="40%" src="https://sedimental.org/uploads/starminder_20171017.png" /></a></p> <p>A friend of mine started a little project called <a href="https://starminder.xyz/"><em>Starminder</em></a>, which emails a nightly selection of five of my starred repos. 
I've been having a grand time revisiting these old stars and seeing how far they've come, even reminding me of features I was waiting to build.</p> <p>And while I love <a href="https://github.com/nkantar">Nik's work</a>, instead of relying on Starminder, it would be way better if I could tell GitHub roughly how often I'd like updates on a project, and then get <a href="https://github.com/twisted/twisted/pulse/monthly">Pulse-like info</a> delivered to my inbox on a weekly or monthly basis.</p> <p>Commit activity, high-traffic issues, and especially new tags/releases are all things I'd be very excited to get personalized, periodic updates on, without having to get every single notification as a separate email.</p> <p>One off-the-cuff idea I had was to establish some sort of star gradient, with the basic star without notifications being an option on one end of the opt-in engagement spectrum, and full-blown, every-notification "Watching" on the other. Could there be one Star dropdown to rule them all?</p> <h2 id="code_review_permissions"><a href="#code_review_permissions" class="toclink">Code review permissions</a></h2> <p>Requiring code review before merging is a pretty smart idea for any rigorous project, and <a href="https://help.github.com/articles/enabling-required-reviews-for-pull-requests/">now GitHub supports it natively</a>. However, only developers with write permissions can actually perform a code review. Here are some real-life use cases that demonstrate why this is less than ideal:</p> <ul> <li>A senior developer not involved with the project files an issue requesting a feature. I would like them to review the implementation to ensure it does what they want. The senior developer has a busy schedule and doesn't want to join the project and get a bunch of notifications, but would be qualified to review the code.</li> <li>A novice developer finds a problem in the documentation. They could review the new documentation for clarity. 
Their lack of experience makes them best qualified to review.</li> <li>The core maintainer implements a feature, but is actually the only developer on the project. Requesting a code review from non-project-member peers is a great way to get them to look at the code and become more involved with the project going forward.</li> </ul> <p>For bonus points all permissions could have an option to be time-limited. Designated reviewer and expiration possibilities notwithstanding, I think the best flow would include the ability to add someone to a specific PR as reviewer, without giving them any project-wide permissions.</p> <h2 id="dashboard_improvements"><a href="#dashboard_improvements" class="toclink">Dashboard improvements</a></h2> <p>I'm probably weird for doing this, but I habitually visit the normal <a href="https://github.com/">github.com</a> logged-in landing page, aka the dashboard, several times a day. Now, as the few of you who share my habit have probably noticed, there's a new Discover tab, offering personalized suggestions of repos to star.</p> <p>The event stream stayed mostly the same, however, as it has for many years. But despite its maturity there are a couple events that surprisingly don't show up anywhere, even when the dashboard seems like a natural fit:</p> <ul> <li><strong>Follows</strong> - I have the better part of a thousand followers, but I can't remember if I've ever seen a notification about this. <a href="https://github.com/mahmoud?tab=followers">They seem like nice folks!</a></li> <li><strong>Stars on org-owned repos</strong> - <a href="https://github.com/python-attrs/attrs">There</a> <a href="https://github.com/hatnote/montage">are</a> <a href="https://github.com/python-hyper/hyperlink">several</a> <a href="https://github.com/kurtbrose/pyjks">repos</a> I maintain and watch, but for which I've never seen on-dash notifications. What do they all have in common? 
They're all owned by organizations (e.g., <a href="http://github.com/python-hyper">python-hyper</a>). Other types of notifications show up, but not stars.</li> <li><strong>Watches</strong> - Not sure I've ever gotten a notification for someone watching one of my repos, even though they're probably more interested in collaboration than the average stargazer.</li> </ul> <p>Any or all of these would certainly make my <a href="https://github.com/">github.com</a> itch yield more interesting results, and I'm sure there are some enhancements I've missed, too!</p> <h2 id="thanks"><a href="#thanks" class="toclink">Thanks!</a></h2> <p>Just wanted to say thanks to GitHub for putting together such a great event. Whether or not any of these features materializes in the near future, it was so nice to meet up with old friends and make some new ones, too.</p> <p>Focused, cross-technology encounters like these are all too rare. For my Python readers, let this serve as a reminder to get out and interact with other stacks. Python's strength is its integrative nature, and I think that can be a strength for us Pythonists as well.</p> <p>In any case, thanks for the event, GitHub! Hope to see you again next year!</p><p></p> <hr /> https://sedimental.org/plugin_systems.html Mahmoud Hashemi https://sedimental.org/ Plugin Systems 2017-07-11T11:00:00Z 2017-07-11T11:00:00Z <p><p><em>"What are plugins?" and other proceedings of the inaugural PyCon Comparative Plugin Systems <a href="https://en.wikipedia.org/wiki/Birds_of_a_feather_(computing)">BoF</a>.</em></p> <p><em>Update: This BoF and post inspired a talk I gave at PyGotham 2017.</em></p> <p>Within the programming world, and the Python ecosystem in particular, there are a lot of presumptions around plugins. Specifically, we take them for granted. "It's <em>just</em> a plugin." 
"Oh, <em>another</em> plugin library?"</p> <p>So for PyCon 2017, I resolved to dismiss the dismissals by revisiting plugins, and it may have been the best programming decision I've made all year.</p> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/snake_plugin_sm.png" /></p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#why_plugins">Why plugins?</a><li><a href="#setting_examples">Setting examples</a><li><a href="#taxonomizing">Taxonomizing</a><ul><li><a href="#generalizability">Generalizability</a><li><a href="#discovery">Discovery</a><li><a href="#install_location">Install location</a><li><a href="#plugin_independence">Plugin independence</a><li><a href="#dependency_registration">Dependency registration</a></ul><li><a href="#drawing_a_line">Drawing a line</a><li><a href="#a_definition">A definition</a><li><a href="#motivation">Motivation</a><li><a href="#in_conclusion">In conclusion</a></ul></div><h2 id="why_plugins"><a href="#why_plugins" class="toclink">Why plugins?</a></h2> <p>For all types of software, open-source or otherwise, the scalability of development poses a problem long before scalability of performance and other technical challenges. Engaging more developers creates code contention and bugs. Too many cooks is all it takes to spoil the broth.</p> <blockquote> <p><strong>All growing projects need an API for code integration.</strong></p> </blockquote> <p>Call them plugins, modules, or extensions, from your browser to your kernel, they are <em>the</em> widely successful solution. Tellingly, the only thing wider than the success of plugin-based architecture is the variety of implementations.</p> <p>Python's dynamic nature in particular seems to encourage inventiveness. The more the merrier, usually, but at some point we cloud a tricky space. How different could these plugin systems be? How wide is the range of functionalities, really? 
How does a developer choose the right plugin system for a given project? For that matter, what is a plugin system anyway? No one I talked to had clear answers.</p> <p>So when <a href="https://us.pycon.org/2017/about/">PyCon 2017</a> rolled around, I knew exactly what I wanted to do: call together a team of developers to get to the bottom of the above, or at the very least, answer the question,</p> <blockquote> <p><em>"What happens when you ask a dozen veteran Python programmers to spill their guts about plugins?"</em></p> </blockquote> <p><img title="Our fearless band of extensionists" width="100%" src="https://sedimental.org/uploads/pycon_2017_plugin_bof_crop.jpg" /></p> <h2 id="setting_examples"><a href="#setting_examples" class="toclink">Setting examples</a></h2> <p>Our group leapt into action by listing off plugin systems as fast as we could:</p> <ul> <li><a href="https://docs.openstack.org/stevedore/latest/">stevedore</a></li> <li><a href="https://docs.twisted.org/en/stable/core/howto/plugin.html">twisted.plugin</a></li> <li><a href="https://www.mercurial-scm.org/wiki/WritingExtensions">Mercurial extensions</a></li> <li><a href="https://docs.pytest.org/en/latest/plugins.html">pytest plugins</a> (<a href="https://github.com/pytest-dev/pluggy">pluggy</a>)</li> <li><a href="http://gather.readthedocs.io/en/latest/">gather</a></li> <li><a href="https://docs.pylonsproject.org/projects/venusian/en/latest/">venusian</a></li> <li><a href="http://pluginbase.pocoo.org/">pluginbase</a></li> <li><a href="https://straightplugin.readthedocs.io/en/latest/">straight.plugin</a></li> <li><a href="https://docs.pylint.org/en/1.6.0/plugins.html">pylint plugins</a></li> <li><a href="http://flake8.pycqa.org/en/latest/plugin-development/">flake8 plugins</a></li> <li><a href="http://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins">raw setuptools entrypoints</a></li> <li><a 
href="https://zopecomponent.readthedocs.io/en/latest/">zope.component</a></li> <li><a href="https://django-extensions.readthedocs.io/en/latest/">Django command extensions</a></li> <li><a href="http://docs.sqlalchemy.org/en/latest/dialects/index.html">SQLAlchemy dialects/DBAPIs</a></li> <li><a href="http://www.sphinx-doc.org/en/stable/extdev/index.html#dev-extensions">Sphinx extensions</a></li> <li><a href="http://docs.buildout.org/en/latest/topics/extensions.html">Buildout extensions</a></li> <li><a href="http://pyarmory-pike.readthedocs.io/en/latest/">Pike</a></li> <li><a href="http://dectate.readthedocs.io/en/latest/">Dectate</a> and <a href="http://reg.readthedocs.io/en/latest/index.html">Reg</a></li> <li>Others that came and went a little too fast to jot down</li> </ul> <p>With our plate heaping with examples like these, we all felt ready to dig into our big questions.</p> <h2 id="taxonomizing"><a href="#taxonomizing" class="toclink">Taxonomizing</a></h2> <p>For our first bit of analysis, we asked: What practical and fundamental attributes differentiate these approaches? If we had to create a taxonomy, what characteristics would we look for?</p> <h3 id="generalizability"><a href="#generalizability" class="toclink">Generalizability</a></h3> <p>You'll notice our list of example plugin systems included several very specialized examples, from pylint to SQLAlchemy. Many projects even use totally internal plugin systems to achieve better factoring.</p> <p>Bespoke plugin systems like pylint's are a valuable reference for anyone looking to account for patterns in their own system, especially generic systems like <a href="http://www.giantflyingsaucer.com/blog/?p=5858">pike and stevedore</a>.</p> <h3 id="discovery"><a href="#discovery" class="toclink">Discovery</a></h3> <p>A plugin system's first job is locating the plugins to load. 
The split here is whether plugins are individually specified, or automatically discovered based on paths and patterns.</p> <p>In either case, we need paths. Some systems provide search functionality, exchanging explicitness for convenience. This can be a good trade, especially when plugins number in the double digits, or whenever less technical users are concerned.</p> <h3 id="install_location"><a href="#install_location" class="toclink">Install location</a></h3> <p>Closely related to discovery, our next differentiator was the degree to which the plugin system leveraged Python's own package management facilities. Some systems, like <a href="https://docs.pylonsproject.org/projects/venusian/en/latest/">venusian</a>, were designed to encourage <code>pip install</code>-ing plugins, searching for them in <code>site-packages</code>, alongside the application itself.</p> <p>Other systems have their own search paths, locating plugins in the user directory and elsewhere on the filesystem. Still other systems are designed for plugins inside the application tree, as is the case with <a href="https://docs.djangoproject.com/en/1.11/ref/applications/">Django apps</a>.</p> <h3 id="plugin_independence"><a href="#plugin_independence" class="toclink">Plugin independence</a></h3> <p>One of the most challenging parts of plugin development is finding ways of independently reusing and testing code, while keeping in mind the code's role as an optional component of another application.</p> <p>In some systems, like Django's, the tailoring is so tightly coupled that reusability doesn't make sense. But other approaches, like <a href="http://gather.readthedocs.io/en/latest/">gather</a>'s, keep plugin code independently usable.</p> <h3 id="dependency_registration"><a href="#dependency_registration" class="toclink">Dependency registration</a></h3> <p>Almost all plugins work by providing some set of <em>hooks</em> which are findable and callable by the core. 
We found another differentiator in whether and how plugins could gain access to resources from the core, and even other plugins.</p> <p>Not all systems support this, preferring to keep plugins as leaf participants in the application. Those simplistic setups hit limits fast. The next best, and most common, solution is to simply pass the whole core state at the time of hook invocation, providing plugins with the same access as the core. It works, but the API becomes the whole system state.</p> <p>More advanced systems allow plugins to publish an inventory of dependencies, which the core then injects. Higher granularity enables lazier evaluation for a performance boost, and more explicit structure helps create a more maintainable application overall.</p> <h2 id="drawing_a_line"><a href="#drawing_a_line" class="toclink">Drawing a line</a></h2> <p>With our group feeling like we were approaching the nature of things, we reversed direction, asking instead: What <em>isn't</em> a plugin system?</p> <p>Establishing explicit boundaries and specific counterexamples proved instrumental to producing a final definition.</p> <p>Is <a href="https://docs.python.org/2/library/functions.html#eval"><code>eval()</code></a> a plugin system? We thought maybe, at first. But the more we thought about it, no, because the code itself was not sufficiently abstracted through a loading or namespacing system.</p> <p>Is <a href="https://en.wikipedia.org/wiki/Domain_Name_System">DNS</a> a plugin system? It has names and namespaces galore. But no, because code is not being loaded <em>in</em>. Remote services in general are beyond the boundary of what a plugin can be. They exist out there, and we call out to them. 
They're callouts, not plugins.</p> <h2 id="a_definition"><a href="#a_definition" class="toclink">A definition</a></h2> <p>So with our boundaries established, we were ready to offer a definition:</p> <blockquote> <p><em>A plugin system is a software facility used by a running program to discover and load code, often containing hooks called by the host application</em></p> </blockquote> <p>But, by this definition, isn't Python's built-in <code>import</code> functionality a plugin system? Mostly, yes! Python's import system is a plugin system.</p> <ul> <li>For discovery it uses <a href="https://docs.python.org/2/library/sys.html#sys.path"><code>sys.path</code></a>, various "site" directories and ".pth" files, and <a href="https://docs.python.org/2/library/sys.html#sys.path_hooks">much more</a>.</li> <li>For installation, it uses <code>site-packages</code>, <a href="https://pip.readthedocs.io/en/latest/user_guide/#user-installs">user <code>.local</code> directories</a>, and more.</li> <li>As far as independent reusability, virtually every module <a href="https://docs.python.org/2/using/cmdline.html#cmdoption-m">can be made its own entrypoint</a>.</li> <li>As for dependency registration, every module is tossed into <a href="https://docs.python.org/2/library/sys.html#sys.modules"><code>sys.modules</code></a> with the others, but also has access to <code>import</code> and <code>sys</code>, making roughly every module an equal partner in application state.</li> </ul> <p>Python's import system is a powerful one, with a <a href="https://docs.python.org/3/reference/import.html#finders-and-loaders">plugin system</a> of its own. But finders, loaders, and import hooks aren't <em>Python's</em> plugin system. 
For that, you need to look to <a href="https://docs.python.org/2/library/site.html">the <code>site</code> module</a>.</p> <h2 id="motivation"><a href="#motivation" class="toclink">Motivation</a></h2> <p>With our hour nearly up, all these proximate details still needed to be distilled into an ultimate motivation behind plugins. To this end, we returned to one of software engineering's fundamental principles: <a href="https://en.wikipedia.org/wiki/Separation_of_concerns">Separation of concerns</a>.</p> <p>We want to reason about our software. We want to know what state it is in. What we all want is the ability to say, "the core is ready, proceeding to load modules/extensions/plugins." We want to defer loading <em>some</em> code so that we can add extra instrumentation, checks, resiliency, and error messages to that loading process. If something misbehaves, we can do better than a stack trace and an <code>ImportError</code>.</p> <p>Python's import system is a plugin system of sorts, but because we use it all the time, we've already used up most of the concern separation potential of <code>import</code>. Hence, all the creativity around plugin systems, seeking a balance between feeling native to Python and still successfully separating concerns.</p> <h2 id="in_conclusion"><a href="#in_conclusion" class="toclink">In conclusion</a></h2> <p>So now we have achieved a complete view of the Python plugin system ecosystem, from motivation to manifestation.</p> <p>By numbers alone, it may seem on the face like there are more than enough Python plugin solutions. 
But looking at the motivation and taxonomy above, it's clear that there are still several gaps waiting to be filled.</p> <p>By taking a holistic look at the implementations and motivations, the PyCon 2017 Plugins Open Session ended with the conclusion that even <a href="https://sedimental.org/plugin_systems.html#setting_examples">Python's wide selection</a> could use expansion.</p> <p>So, until next year, go forth and continue to build! The future of well-factored code depends on it.<sup id="fnref:further"><a class="footnote-ref" href="https://sedimental.org/plugin_systems.html#fn:further">1</a></sup></p> <p><img width="50%" src="https://sedimental.org/uploads/illo/snake_puzzle_sm.png" /></p> <div class="footnote"> <hr /> <ol> <li id="fn:further"> <p>For additional reading, I recommend doing what we did after our discussion, finding and reading <a href="http://eli.thegreenplace.net/2012/08/07/fundamental-concepts-of-plugin-infrastructures">this post from Eli Bendersky</a>. While it focuses more on specific implementations and less about generalized systems, Eli's post overlaps in many very reaffirming ways, much to our relief and gratification. The worked example of building ReStructured Text plugins is a perfect complement to the post above. <a class="footnote-backref" href="https://sedimental.org/plugin_systems.html#fnref:further" title="Jump back to footnote 1 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/the_packaging_gradient.html Mahmoud Hashemi https://sedimental.org/ The Many Layers of Packaging 2017-05-09T13:47:00Z 2017-05-09T13:47:00Z <p><p><em>The packaging gradient, and why PyPI isn't an app store.</em></p> <p><em>Update: I turned this post into a talk. The <a href="https://www.youtube.com/watch?v=iLVNWfPWAC8">video from PyBay is here</a>, the <a href="https://speakerdeck.com/mhashemi/the-packaging-gradient">slides are available here</a>. 
The long-cut video from <a href="https://www.meetup.com/BAyPIGgies/events/242072266/">BayPiggies</a> is coming, but <a href="https://speakerdeck.com/mhashemi/the-packaging-gradient-extended-edition">the "Extended Edition" slides are here</a>.</em></p> <p>One lesson threaded throughout <a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/"><em>Enterprise Software with Python</em></a> is that deployment is not the last step of development. The mark of an experienced engineer is to work backwards from deployment, planning and designing for the reality of production environments.</p> <p>You could learn this the hard way. Or you could come on a journey into what I call <em>the packaging gradient</em>. It's a quick and easy decision tree to figure out what you need to ship. You'll gain a trained eye, and an understanding as to why there seem to be so many conflicting opinions about how to package code.</p> <p>The first lesson on our adventure is:</p> <blockquote> <p><em>Implementation language does not define packaging solutions.</em></p> </blockquote> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/snake_box_sm.png" /></p> <p>Packaging is all about target environment and deployment experience. Python will be used in examples, but the same decision tree applies to most general-purpose languages.</p> <p>Python was designed to be cross-platform and runs in countless environments. But don't take this to mean that Python's built-in tools will carry you anywhere you want to go. I can <a href="https://kivy.org/docs/guide/android.html">write a mobile app in Python</a>, but does it make sense to install it on my phone with <a href="https://en.wikipedia.org/wiki/Pip_(package_manager)">pip</a>? 
As you'll see, a language's built-in tools only scratch the surface.</p> <p>So, one by one, I'm going to describe some code you want to ship, followed by the simplest acceptable packaging process that provides that repeatable deployment process we crave. We save the most involved solutions for last, right before <a href="https://sedimental.org/the_packaging_gradient.html#closing">the short version</a>. Ready? Let's go!</p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#prelude_the_humble_script">Prelude: The Humble Script</a><li><a href="#the_python_module">The Python Module</a><li><a href="#the_pure_python_package">The pure-Python Package</a><li><a href="#the_python_package">The Python Package</a><li><a href="#milestone_outgrowing_our_roots">Milestone: Outgrowing our roots</a><li><a href="#depending_on_pre_installed_python">Depending on pre-installed Python</a><li><a href="#depending_on_a_new_python_ecosystem">Depending on a new Python/ecosystem</a><li><a href="#bringing_your_own_python">Bringing your own Python</a><ul><li><a href="#servers_ride_the_bus">Servers ride the bus</a></ul><li><a href="#bringing_your_own_userspace">Bringing your own userspace</a><ul><li><a href="#in_our_own_image">In our own image</a><li><a href="#an_image_by_any_other_name">An image by any other name</a><li><a href="#the_whale_in_the_room">The whale in the room</a></ul><li><a href="#bringing_your_own_kernel">Bringing your own kernel</a><li><a href="#bringing_your_own_hardware">Bringing your own hardware</a><li><a href="#but_what_about">But what about...</a><ul><li><a href="#os_packages">OS packages</a><li><a href="#virtualenv">virtualenv</a><li><a href="#security">Security</a></ul><li><a href="#closing">Closing</a></ul></div><h2 id="prelude_the_humble_script"><a href="#prelude_the_humble_script" class="toclink">Prelude: The Humble Script</a></h2> <p>Everyone's first exposure to Python deployment was something so innocuous you probably wouldn't remember. 
You copied a script from point <em>A</em> to point <em>B</em>. Chances are, whether <em>A</em> and <em>B</em> were separate directories or computers, your days of "just use <code>cp</code>" didn't last long.</p> <p>Because while a single file is the ideal format for copying, it doesn't work when that file has unmet dependencies at the destination.</p> <p>Even simple scripts end up depending on:</p> <ul> <li>Python libraries - <a href="https://github.com/mahmoud/boltons">boltons</a>, <a href="https://github.com/kennethreitz/requests">requests</a>, <a href="https://github.com/numpy/numpy">NumPy</a></li> <li>Python, the runtime - <a href="https://en.wikipedia.org/wiki/CPython">CPython</a>, <a href="https://en.wikipedia.org/wiki/PyPy">PyPy</a></li> <li>System libraries - <a href="https://en.wikipedia.org/wiki/GNU_C_Library">glibc</a>, <a href="http://zlib.net/">zlib</a>, <a href="https://anaconda.org/anaconda/libxml2">libxml2</a></li> <li>Operating system - <a href="https://en.wikipedia.org/wiki/Ubuntu_(operating_system)">Ubuntu</a>, <a href="https://www.freebsd.org/">FreeBSD</a>, <a href="http://68.media.tumblr.com/e846f7ed786ead5cee6e4097b254b181/tumblr_mqfh4b0rV61sydj82o1_250.gif">Windows</a></li> </ul> <p>So every good packaging adventure always starts with the question:</p> <blockquote> <p><strong>Where is your code going, and what can we depend on being there?</strong></p> </blockquote> <p>First, let's look at libraries. Virtually every project these days begins with library package management, a little <code>pip install</code>. It's worth a closer look!</p> <h2 id="the_python_module"><a href="#the_python_module" class="toclink">The Python Module</a></h2> <p>Python library code comes in two sizes, <a href="https://docs.python.org/2/tutorial/modules.html">module</a> and package, practically corresponding to files and directories on disk. Packages can contain modules and packages, and in some cases can grow to be quite sprawling. 
The module, being a single file, is much easier to redistribute.</p> <p>In fact, if a pure-Python module imports nothing but the standard library itself, you have the unique option of being able to distribute it by simply copying the single file into your codebase.</p> <p>This type of inclusion, known as vendoring, is often glossed over, but bears many advantages. <a href="https://www.python.org/dev/peps/pep-0020/">Simple is better than complex</a>. No extra commands or formats, no build, no install. Just copy the code<sup id="fnref:licenses"><a class="footnote-ref" href="https://sedimental.org/the_packaging_gradient.html#fn:licenses">1</a></sup> and roll.</p> <p>For examples of libraries doing this, see <a href="https://bottlepy.org/docs/dev/">bottle.py</a>, <a href="https://github.com/mahmoud/ashes">ashes</a>, <a href="https://github.com/keleshev/schema">schema</a>, and, of course, <a href="https://github.com/mahmoud/boltons">boltons</a>, which also has <a href="http://boltons.readthedocs.io/en/latest/architecture.html">an architectural statement</a> on the topic.</p> <h2 id="the_pure_python_package"><a href="#the_pure_python_package" class="toclink">The pure-Python Package</a></h2> <p>Packages are the larger unit of redistributable Python. Packages are directories of code containing an <code>__init__.py</code>. Provided they contain only pure-Python modules, they can also be vendored, similar to the module above. Even very popular packages <a href="https://github.com/pypa/pip/#TODO">like pip itself</a> can be found with <code>vendor</code>, <code>lib</code>, and <code>packages</code> directories.</p> <p>Because these packages nest and sprawl, vendoring can lead to codebases that feel unwieldy. While it may seem awkward to have <code>lib</code> directories many times larger than your application, it's more common than some less-experienced devs might expect. 
That said, having worked on some very large codebases, I can definitely understand why core Python developers created other options for distributing Python libraries.</p> <p>For libraries that only contain Python code, whether single-file or multi-file, Python's original built-in solution still works today: <a href="https://docs.python.org/2/distutils/sourcedist.html">sdists</a>, or "source distributions". This early format has worked for well over a decade and is still supported by <code>pip</code> and the <a href="https://pypi.org/pypi">Python Package Index</a> (PyPI)<sup id="fnref:pypi"><a class="footnote-ref" href="https://sedimental.org/the_packaging_gradient.html#fn:pypi">2</a></sup>.</p> <h2 id="the_python_package"><a href="#the_python_package" class="toclink">The Python Package</a></h2> <p>Python is a great language, and one which is made all the greater by its power to integrate.</p> <p>Many libraries contain <a href="https://sedimental.org/python_by_the_c_side.html">C</a>, <a href="https://en.wikipedia.org/wiki/Cython">Cython</a>, and other statically-compiled languages that need build tools. If we distribute such code using sdists, installation will trigger a build that will fail without the tools, will take time and resources if it succeeds, and generally involve more intermediary languages and four-letter keywords than Python devs thought should be necessary.</p> <p>When you have a library that requires compilation, then it's definitely time to look into <a href="http://pythonwheels.com/">the wheel format</a>.</p> <p>Wheels are named after wheels of cheese, found in <a href="https://wiki.python.org/moin/CheeseShop">the proverbial cheese shop</a>. Aptly named, wheels really help get development rolling. Unlike source distributions like sdists, the publisher does all the building, resulting in a system-specific binary.</p> <p>The install process just decompresses and copies files into place. 
It's so simple that even pure-Python code gets installed <a href="https://hynek.me/articles/sharing-your-labor-of-love-pypi-quick-and-dirty/">faster</a> when packaged as a wheel instead of an sdist.</p> <p>Now even when you upload wheels, I still recommend uploading sdists as a fallback solution for those occasions when a wheel won't work. It's simply not possible to prebuild wheels for all configurations in all environments. If you're curious what that means, check out <a href="https://github.com/pypa/manylinux/blob/master/pep-513.rst#rationale">the design rationale behind manylinux1 wheels</a>.</p> <h2 id="milestone_outgrowing_our_roots"><a href="#milestone_outgrowing_our_roots" class="toclink">Milestone: Outgrowing our roots</a></h2> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/legatree_med.png" /></p> <p>Now, three approaches in, we've hit our first milestone. So far, everything has relied on built-in Python tools. pip, PyPI, the wheel and sdist formats, all of these were designed <em>by</em> developers, <em>for</em> developers, to distribute code and tools <em>to</em> other developers.</p> <p>In other words:</p> <blockquote> <p><em>PyPI is not an app store.</em></p> </blockquote> <p>PyPI, pip, wheels, and the underlying setuptools machinations are all designed for <em>libraries</em>. Code for developer reuse.</p> <p>Going back to our first example, a "script" is more accurately described as a command-line <em>application</em>. Command-line applications can have a Python-savvy audience, so it's not totally unreasonable to host them on PyPI and install them with pip (or <a href="https://github.com/mitsuhiko/pipsi">pipsi</a>). But understand that we're approaching the limit for a good production and user-facing experience.</p> <p>So let's get explicit. 
By default, the built-in packaging tools are designed to depend on:</p> <ul> <li>A working Python installation</li> <li>A network connection, probably to the Internet</li> <li>Pre-installed system libraries</li> <li>A developer who is willing to sit and watch dependencies recursively download at install-time, and debug version conflicts, build errors, and myriad other issues.</li> </ul> <p>These are fine, and expected, for development environments. Professionals are paid to do it, students pay to learn it, and there are even a few oddballs who enjoy this sort of thing.</p> <p>Going into our next options, notice how we have shifted gears to support <em>applications</em>. Remember that distributing applications is more a function of target platform than of implementation language. This is harder than library distribution because we stop depending on layers of the stack, and on the developer who would be there to ensure the setup works.</p> <h2 id="depending_on_pre_installed_python"><a href="#depending_on_pre_installed_python" class="toclink">Depending on pre-installed Python</a></h2> <p>For our first foray into application distribution, we're going to maintain the assumption that Python exists in the target environment. This isn't the wildest assumption: CPython 2 is available on virtually every Linux and Mac machine.</p> <p>Taking Python for granted, we can turn to bundling up all of the Python libraries on which our code depends. We want a single executable file, the kind that you can double-click or run by prefixing with a <code>./</code>, anywhere on a Python-enabled host. <a href="https://pex.readthedocs.io/en/latest/">The PEX format</a> gets us exactly this.</p> <p>The PEX, or Python EXecutable, is a carefully-constructed ZIP archive, with just a hint of bootstrapping. PEXs can be built for Linux, Mac, and Windows. Artifacts rely on the system Python, but unlike pip, a PEX does not install itself or otherwise affect system state. 
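</p> <p>To make the "ZIP archive with a hint of bootstrapping" idea concrete, here is a minimal sketch built on the standard library's <code>zipapp</code> module, not on PEX itself. The <code>hello_app</code> directory, the <code>hello.pyz</code> filename, and the <code>build_and_run_demo</code> helper are all made up for illustration, and the sketch assumes Python 3.7 or newer.</p>

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_demo():
    """Build a tiny executable ZIP archive and run it with the current Python."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello_app"  # hypothetical application directory
        src.mkdir()
        # __main__.py is the entry point Python looks for inside the archive.
        (src / "__main__.py").write_text('print("hello from a zip")\n')

        target = Path(tmp) / "hello.pyz"  # hypothetical artifact name
        # The interpreter argument prepends a shebang line to the archive:
        # that is the "hint of bootstrapping" in this sketch.
        zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

        # Run the archive directly; nothing gets installed along the way.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_demo())
```

<p>The archive runs straight from disk, and nothing is installed or written outside the temporary directory. PEX layers dependency bundling on top of this same run-from-a-ZIP mechanism. 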
It uses mature, <a href="https://www.python.org/dev/peps/pep-0273/">standard features</a> of Python, successfully iterating on a <a href="https://docs.python.org/3/library/zipapp.html">broadly</a>-<a href="https://github.com/brownhead/superzippy">used</a> approach.</p> <p>A lot can be done with Python and Python libraries alone. If your project follows this approach, PEX is an easy choice. <a href="https://www.youtube.com/watch?v=NmpnGhRwsu0">See this 15-minute video for a solid introduction</a>.</p> <h2 id="depending_on_a_new_python_ecosystem"><a href="#depending_on_a_new_python_ecosystem" class="toclink">Depending on a new Python/ecosystem</a></h2> <p>Plain old vanilla Python leaving you wanting? That factory-installed system software can leave a lot to be desired. Lucky for us there's an upgrade well within grasp.</p> <p><a href="https://www.anaconda.com/download">Anaconda</a> is a Python distribution with expanded support for distributing libraries and applications. It's cross-platform, and has supported binary packages since before the wheel. Anaconda packages and ships system libraries like <a href="https://anaconda.org/anaconda/libxml2">libxml2</a>, as well as applications like <a href="https://anaconda.org/anaconda/postgresql">PostgreSQL</a>, which fall outside the purview of default Python packaging tools. 
That's because while Anaconda might seem like an innocent Python distribution from the outside, internally Anaconda blends in characteristics of a full-blown operating system, complete with its own package manager, <a href="https://conda.io/docs/">conda</a>.</p> <p>If you look inside of an Anaconda installation, or at the screenshot below, you'll find something that looks a lot like <a href="https://en.wikipedia.org/wiki/Unix_File_System">a root Linux filesystem</a> (<code>lib</code>, <code>bin</code>, <code>include</code>, <code>etc</code>), with some extra Anaconda-specific directories.</p> <p><img width="75%" src="https://sedimental.org/uploads/anaconda_internals.png" /></p> <p>What's remarkable is that the underlying operating system can be Windows, Mac, or basically any flavor of Linux. Just like that, Anaconda unassumingly blends Python libraries and system libraries, convenience and power, development and data science. And it does it all by using features built into Python and target operating systems.</p> <p>Consider that the list of cross-platform and language-agnostic package managers includes only <a href="https://en.wikipedia.org/wiki/Steam_(software)">Steam</a>, <a href="https://en.wikipedia.org/wiki/Nix_package_manager">Nix</a>, and <a href="https://en.wikipedia.org/wiki/Pkgsrc">pkgsrc</a>, and you can start to understand why conda is <a href="https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/">often misunderstood</a>. Adding onto that, conda is adding features fast. For instance, conda is the first Python-centric package manager to do its dependency resolution up front (using <a href="https://github.com/ContinuumIO/pycosat">a SAT solver</a>), <a href="https://github.com/pypa/pip/issues/988">unlike pip</a>. 
More recently, <a href="https://github.com/conda/conda/blob/master/CHANGELOG.md#430-2016-12-14--safety">conda 4.3</a> fulfilled the wishes of many by matching <a href="https://en.wikipedia.org/wiki/Advanced_Packaging_Tool">apt</a> and <a href="https://en.wikipedia.org/wiki/Yellowdog_Updater,_Modified">yum</a> with transactional package installation. Now conda matches operating system package managers in critical technical respects, except the wide-open social components of <a href="https://anaconda.org/">anaconda.org</a> make it even easier to use than, say, <a href="https://askubuntu.com/questions/4983/what-are-ppas-and-how-do-i-use-them/4990#4990">PPAs</a>.</p> <p>In short, Anaconda makes a compelling and effective case, both as a development environment <a href="https://docs.conda.io/projects/conda/en/latest/commands/index.html#conda-vs-pip-vs-virtualenv-commands">comparable to pip + virtualenv</a>, and even as part of the equation <a href="https://medium.com/paypal-tech/python-packaging-at-paypal-4a90352a7ca2">in production server environments</a>. Python is lucky to host such a rare breed.</p> <h2 id="bringing_your_own_python"><a href="#bringing_your_own_python" class="toclink">Bringing your own Python</a></h2> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/snake_freeze_sm.png" /></p> <p>Can you imagine deploying to an environment without Python? It's a hellish scenario, I know. Luckily, your code can still bring your own, and it's ice cold. Freezing, in fact.</p> <p>When I wrote my first Python program, I naturally shared news of the accomplishment with my parents, who naturally wanted to experience this taste of <em>The Future</em> firsthand.</p> <p>Of course, all I had was a .py file I wrote on <a href="https://en.wikipedia.org/wiki/Knoppix">Knoppix</a>, and they were halfway around the world on a Windows 2000 machine. 
Luckily, this new software called <a href="https://github.com/marcelotduarte/cx_Freeze">cx_Freeze</a> had <a href="https://mail.python.org/pipermail/python-announce-list/2002-November/001824.html">just been announced a couple of months earlier</a>. Unluckily, no one told me, and the better part of a decade would pass before I learned how to use it.</p> <p>Fifteen years later, the process has evolved, but retained the same general shape. <a href="http://www.openwall.com/presentations/WOOT13-Security-Analysis-of-Dropbox/">Dropbox</a>, <a href="https://en.wikipedia.org/wiki/Eve_Online">EVE Online</a>, <a href="https://en.wikipedia.org/wiki/Civilization_IV">Civilization IV</a>, <a href="https://kivy.org/">kivy</a>, and countless other applications and frameworks rely on freezing to ship applications, generally to personal computing devices. Interpreter, libraries, and application logic, all rolled into an independent artifact.</p> <p>These days the list of open-source tools has expanded beyond <a href="https://github.com/marcelotduarte/cx_Freeze">cx_Freeze</a> to include <a href="http://www.pyinstaller.org/">PyInstaller</a>, <a href="https://github.com/jamesabel/osnap">osnap</a>, <a href="https://pypi.org/pypi/bbfreeze">bbFreeze</a>, <a href="http://www.py2exe.org/">py2exe</a>, <a href="https://py2app.readthedocs.io/en/latest/">py2app</a>, <a href="https://pypi.org/pypi/pynsist">pynsist</a>, <a href="http://nuitka.net/pages/overview.html">nuitka</a>, and more. There is even a conda-native option called <a href="https://github.com/conda/constructor">constructor</a>. A partial feature matrix can be found <a href="http://python-guide.readthedocs.io/en/latest/shipping/freezing/">here</a>.</p> <p>Most of these systems give you some latitude to determine exactly how independent an executable to generate. Frozen artifacts almost always end up depending somewhat on the host operating system. 
See <a href="http://www.py2exe.org/index.cgi/Tutorial#Step5">this py2exe tutorial discussion of Windows system libraries</a> for a taste of the fun.</p> <p>If you're wondering about the chilly moniker, freezers owe their name to their reliance on the "frozen module" functionality built into Python. It's <a href="https://docs.python.org/2/c-api/import.html#c.PyImport_ImportFrozenModule">sparsely</a> <a href="https://docs.python.org/2/library/imp.html#imp.init_frozen">documented</a>, but basically Python code is precompiled into bytecode and frozen into the interpreter. <a href="https://docs.python.org/3/whatsnew/3.3.html">As of Python 3.3</a>, Python's import system <a href="http://sayspy.blogspot.com/2012/02/how-i-bootstrapped-importlib.html">was ported</a> from C to a frozen pure-Python implementation.</p> <h3 id="servers_ride_the_bus"><a href="#servers_ride_the_bus" class="toclink">Servers ride the bus</a></h3> <p><img align="right" width="40%" src="https://sedimental.org/uploads/illo/omnibus_med.jpg" /></p> <p>Freezing tends to be targeted more toward client software. Freezers are great for GUIs and CLI applications run by a single user on a single machine at a time. When it comes to deploying server software bundled with its own Python, there is a very notable alternative: the <a href="https://github.com/chef/omnibus">Omnibus</a>.</p> <p>Omnibus builds "full-stack" installers designed to deploy applications to servers. It supports RedHat and Debian-based Linux distros, as well as Mac and Windows. A few years back, DataDog saw the light and <a href="https://www.datadoghq.com/blog/new-datadog-agent-omnibus-ticket-dependency-hell/">made the switch</a> for their Python-based agent. <a href="https://about.gitlab.com/">GitLab</a>'s <a href="https://about.gitlab.com/install/">on-premise solution</a> is perhaps the largest open-source usage, and has been a joy to install and upgrade.</p> <p>Unlike our multitude of freezers, Omnibus is uniquely elegant and mature. 
No other system has natively shipped multi-component/multi-service packages as sleekly for as long.</p> <h2 id="bringing_your_own_userspace"><a href="#bringing_your_own_userspace" class="toclink">Bringing your own userspace</a></h2> <p>Probably the newest and fastest-growing class of solution has actually been a long time coming. You may have heard it referenced by its buzzword: containerization, sometimes crudely described as "lightweight virtualization".</p> <p><a href="https://blog.jessfraz.com/post/containers-zones-jails-vms/">Better descriptions exist</a>, but the important part is this: Unlike other options so far, these packages establish a firm border between their dependencies and the libraries on the host system. This is a huge win for environmental independence and deployment repeatability.</p> <h3 id="in_our_own_image"><a href="#in_our_own_image" class="toclink">In our own image</a></h3> <p>Let's illustrate with one of the simplest and most mature implementations, <a href="https://en.wikipedia.org/wiki/AppImage">AppImage</a>.</p> <div style="text-align:center;"><img width="60%" src="https://sedimental.org/uploads/illo/snake_image_sm.png" /></div> <p>Since 2004, the aptly-named AppImage (and its predecessor <a href="https://en.wikipedia.org/wiki/AppImage#klik">klik</a>) have been providing distro-agnostic, installation-free application distribution to Linux end users, without requiring root or touching the underlying operating system. AppImages only rely on the kernel and CPU architecture.</p> <p>An AppImage is perhaps the most aptly-named solution in this whole post. It is literally an <a href="https://en.wikipedia.org/wiki/ISO_9660">ISO9660</a> image containing an entrypoint executable, plus a snapshot of a filesystem comprising a userspace, full of support libraries and other dependencies. 
Looking inside a mounted <a href="https://kdenlive.org/">Kdenlive</a> image, it's easy to recognize the familiar structure of a Unix filesystem:</p> <p><img width="75%" src="https://sedimental.org/uploads/kdenlive_appimage_internals.png" /></p> <p>Dozens of headlining Linux applications ship like this now. Download the AppImage, make it executable, double-click, and voila.</p> <p>If you're reading this on a Mac, you've probably had a similar experience. This is one of those rare cases where there's some consensus: Apple was one of the pioneers in image-based deployments, with <a href="https://en.wikipedia.org/wiki/Apple_Disk_Image">DMGs</a> and <a href="https://en.wikipedia.org/wiki/Bundle_(macOS)">Bundles</a>.</p> <h3 id="an_image_by_any_other_name"><a href="#an_image_by_any_other_name" class="toclink">An image by any other name</a></h3> <p>No class of formats would be complete without <a href="https://en.wikipedia.org/wiki/Format_war">a war</a>. AppImage <a href="https://en.wikipedia.org/wiki/AppImage#Reception_and_usage">inspired</a> the <a href="https://en.wikipedia.org/wiki/Flatpak">Flatpak</a> format, which was adopted by RedHat/Fedora, but was of course insufficient for Canonical/Ubuntu, who were also targeting mobile, and created <a href="https://en.wikipedia.org/wiki/Snappy_(package_manager)">Snappy</a>. A shiny update to our deb-rpm split tradition.</p> <p>Both of these formats introduce more features, as well as more complexity and dependence on the operating system. Both Snaps and Flatpaks expect the host to support their runtime, which can include <a href="https://flatpak.org/faq/#Can_Flatpak_be_used_on_servers_too_">dbus, a systemd user session, and more</a>. 
A lot of work is put into increased <a href="https://en.wikipedia.org/wiki/Linux_namespaces#Mount_.28mnt.29">namespacing</a> to isolate running applications into separate <a href="https://blog.jessfraz.com/post/getting-towards-real-sandbox-containers/">sandboxes</a>.</p> <p>I haven't actually seen these formats used for deploying server software. Flatpak might never support servers, Snappy is trying, but personally, I would really like to hear about or experiment with server-oriented AppImages.</p> <h3 id="the_whale_in_the_room"><a href="#the_whale_in_the_room" class="toclink">The whale in the room</a></h3> <p>Some call the technology sphere a marketplace of ideas, and that metaphor is certainly felt in this case. Whether you've heard good things or bad, we can all agree <a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker</a> is the format sold the hardest. What else would you do when you've got <a href="https://www.crunchbase.com/organization/docker">$180 million of VC</a> breathing down your neck?</p> <p>Docker lets you make an application as self-contained as AppImage, but exceeds even Snapcraft and Flatpak in the assumptions it makes. Images are managed and run by <a href="https://docs.docker.com/engine/reference/commandline/dockerd/">yet another service</a> with a lot of capabilities and tightly coupled components.</p> <p>Docker's packaging abstraction reflects this complexity. Take for instance how Docker applications default to running as <code>root</code>, despite <a href="https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/#user">their documentation</a> recommending against this. Default <code>root</code> is particularly unfriendly because namespacing is still not a reliable guard against malicious actors attacking the host system. <a href="http://blog.dscpl.com.au/2015/12/don-run-as-root-inside-of-docker.html">Root inside the container is root outside the container</a>. 
Always check the <a href="https://www.google.com/search?q=namespace+site:cve.mitre.org&amp;client=ubuntu&amp;hs=55V&amp;channel=fs&amp;source=lnt&amp;tbs=qdr:y&amp;sa=X">CVEs</a>. The Docker <a href="https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface">security documentation</a> also includes some good, frank discussion of what one is getting into.</p> <p>Checking in with our trendline, so far we have been shipping larger and larger, more-inclusive artifacts for more independent, reliable deployments. Some container systems present us with our first clear departure from this pattern. We no longer have a single executable that runs or installs our code. Technically we have a self-contained application, but we're also back to requiring an interpreter other than the OS and CPU.</p> <p>It's not hard to imagine instances where the complexity of a runtime can overrun the advantages of self-containment. To quote Jessie Frazelle's <a href="https://blog.jessfraz.com/post/containers-zones-jails-vms/">blog post</a> again, <strong>"Complexity == Bugs"</strong>. This dynamic leads some to skip straight to our next option, but as AppImage simply demonstrates, this is not an impeachment of all image-based approaches.</p> <!-- If I have time: 2D scatterplot of relative inclusivity and execution dependability. --> <h2 id="bringing_your_own_kernel"><a href="#bringing_your_own_kernel" class="toclink">Bringing your own kernel</a></h2> <p>Now we're really packing heavy. 
If having your Python code, libraries, runtime, and necessary system libraries isn't enough, you can add one more piece of machinery: the operating system <a href="https://en.wikipedia.org/wiki/Kernel_(operating_system)">kernel</a> itself.</p> <p>While this type of distribution never really caught on for consumers, there is a rich ecosystem of tools and formats for VM-based server deployment, from <a href="https://en.wikipedia.org/wiki/Vagrant_(software)">Vagrant</a> to <a href="https://en.wikipedia.org/wiki/Amazon_Machine_Image">AMIs</a> to <a href="https://en.wikipedia.org/wiki/OpenStack">OpenStack</a>. The whole dang cloud.</p> <p>Like our more complex container examples above, the images used to run virtual machines are not runnable executables, and require a mediating runtime, called a <a href="https://en.wikipedia.org/wiki/Hypervisor">hypervisor</a>. These days hypervisor machinery is very mature, and may even come standard with the operating system, as is the case with <a href="https://en.wikipedia.org/wiki/Hyper-V">Windows</a> and <a href="https://developer.apple.com/reference/hypervisor">Mac</a>. The images themselves come in a <a href="https://en.wikipedia.org/wiki/VMDK">few</a> <a href="https://en.wikipedia.org/wiki/Open_Virtualization_Format">formats</a>, all of which are mature and dependable, if large. Size and build time may be the only deterrent for smaller projects prioritizing development time. Thanks to years of kernel and <a href="https://en.wikipedia.org/wiki/Hardware-assisted_virtualization">processor advancement</a>, virtualization is not as slow as many developers would assume. 
If you can get your software shipped faster on images, then I say go for it.</p> <p>Larger organizations save a lot from even small reductions to deployment and runtime overhead, but have to balance that against half a dozen other concerns worthy of <a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/">a much longer discussion elsewhere</a>.</p> <h2 id="bringing_your_own_hardware"><a href="#bringing_your_own_hardware" class="toclink">Bringing your own hardware</a></h2> <p>In a software-driven Internet obsessed with lighter and lighter weight solutions, it can be easy to forget that a lot of software is <a href="https://en.wikipedia.org/wiki/Computer_appliance">literally packaged</a>.</p> <p>If your application calls for it, you can absolutely slap it on a rackable server, <a href="https://www.raspberrypi.org/">Raspberry Pi</a>, or even a <a href="https://micropython.org/">MicroPython</a> board and physically ship it. It may seem absurd at first, but hardware is the most sensible option for countless cases. And not limited to just consumer and IoT use cases, either. Especially where infrastructure and security are concerned, hardware is made to fit software like a glove, and can minimize exposure for all parties.</p> <div style="text-align:center;"><img width="40%" src="https://sedimental.org/uploads/illo/snake_esc_sm.png" /></div> <h2 id="but_what_about"><a href="#but_what_about" class="toclink">But what about...</a></h2> <p>Before concluding, there are some usual suspects that may be conspicuously absent, depending on how long you've been packaging code.</p> <h3 id="os_packages"><a href="#os_packages" class="toclink">OS packages</a></h3> <p>Where do OS packages like <a href="https://en.wikipedia.org/wiki/Deb_(file_format)">deb</a> and <a href="https://en.wikipedia.org/wiki/RPM_Package_Manager">RPM</a> fit into all of this? They can fit anywhere, really. 
If you are very sure what operating system(s) you're targeting, these packaging systems can be powerful tools for distributing and installing code. There are reasons beyond popularity that almost all production container and VM workflows rely on OS package managers. They are mature, robust, and capable of doing dependency resolution, transactional installs, and custom uninstall logic. Even systems as powerful as <a href="https://sedimental.org/the_packaging_gradient.html#servers_ride_the_bus">Omnibus</a> target OS packages.</p> <p>In <a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/">ESP</a>'s packaging segment, I touch on how we leveraged RPMs as a delivery mechanism for Python services in PayPal's production RHEL environment. One detail that would have been minor and confusing in that context, but should make sense to readers now, is that PayPal didn't use the vanilla operating system setup. Instead, all machines used a separate rpmdb and install path for PayPal-specific packages, maintaining a clear divide between application and base system.</p> <h3 id="virtualenv"><a href="#virtualenv" class="toclink">virtualenv</a></h3> <p>Where do <a href="http://python-guide.readthedocs.io/en/latest/dev/virtualenvs/">virtualenvs</a> fit into all of this? Virtualenvs are indispensable for many Python development workflows, but I discourage direct use of virtualenvs for deployment. Virtualenvs can be a useful packaging primitive, but they need additional machinery to become a complete solution. The <a href="http://dh-virtualenv.readthedocs.io/en/1.0/tutorial.html">dh-virtualenv package</a> demonstrates this well for deb packaging, but you can also make a virtualenv in an RPM post-install step, or by virtue of using an installer like <a href="https://github.com/jamesabel/osnap">osnap</a>. 
The key is that the artifact and its install process should be self-contained, minimizing the risk of partial installs.</p> <p>This isn't virtualenv-specific, but lest it go unsaid, do not pip-install things, especially from the Internet, during production deploys. <a href="https://sedimental.org/the_packaging_gradient.html#depending_on_pre_installed_python">Scroll up and read about PEX</a>.</p> <h3 id="security"><a href="#security" class="toclink">Security</a></h3> <p>The further down the gradient you come, the harder it gets to update components of your package. Everything is more tightly bound together. This doesn't necessarily mean that it's harder to update in general, but it is still a consideration, when for years the approach has been to have system administrators and other technicians handle certain kinds of infrastructure updates.</p> <p>For example, if a kernel security issue emerges, and you're deploying containers, the host system's kernel can be updated without requiring a new build on behalf of the application. If you deploy VM images, you'll need a new build. Whether or not this dynamic makes one option more secure is still a bit of an old debate, going back to the still-unsettled matter of <a href="https://www.google.com/search?channel=fs&amp;q=static+vs+dynamic+linking">static versus dynamic linking</a>.</p> <h2 id="closing"><a href="#closing" class="toclink">Closing</a></h2> <p>Packaging in Python has a bit of a reputation for being a bumpy ride. This is mostly a confused side effect of Python's versatility. Once you understand the natural boundaries between each packaging solution, you begin to realize that the varied landscape is a small price Python programmers pay for using the most balanced, flexible language available.</p> <p>A summary of our lessons along the way:</p> <ol> <li>Language does not define packaging, environment does. 
Python is general-purpose, PyPI is not.</li> <li>Application packaging must not be confused with library packaging. Python is for both, but pip is for libraries.</li> <li>Self-contained artifacts are the key to repeatable deploys.</li> <li>Containment is a spectrum, from executable to installer to userspace image to virtual machine image to hardware. "Containers" are not just one thing, let alone the only option.</li> </ol> <p>Now, with map in hand, you can safely navigate the rich terrain. The Python packaging landscape is converging, but don't let that narrow your focus. Every year seems to open new frontiers, challenging existing practices for shipping Python.</p> <div style="text-align:center;"><img width="60%" src="https://sedimental.org/uploads/illo/snake_c_med.png" /></div> <div class="footnote"> <hr /> <ol> <li id="fn:licenses"> <p>Don't forget to include respective free software licenses, where applicable. <a class="footnote-backref" href="https://sedimental.org/the_packaging_gradient.html#fnref:licenses" title="Jump back to footnote 1 in the text">↩</a></p> </li> <li id="fn:pypi"> <p>Despite being called the Python Package Index, PyPI does not index packages. PyPI indexes distributions, which can contain one or more packages. For instance, pip installing <a href="https://pillow.readthedocs.io/en/stable/">Pillow</a> allows you to import PIL. Pillow is the distribution, PIL is the package. The Pillow-PIL example also demonstrates how the distribution-package separation enables multiple implementations of the same API. Pillow is a fork of <a href="https://pypi.org/pypi/PIL">the original PIL package</a>. Still, as most distributions only provide one package, please name your distribution after the package for consistency's sake. 
<a class="footnote-backref" href="https://sedimental.org/the_packaging_gradient.html#fnref:pypi" title="Jump back to footnote 2 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/developer_variants.html Mahmoud Hashemi https://sedimental.org/ Developer variants 2016年08月09日T03:00:00Z 2016年08月09日T03:00:00Z <p><p><img width="34%" src="https://sedimental.org/uploads/illo/snowflake_med.png" align="right" /> Software development takes all kinds. I'm not talking about appearances or job titles. I'm talking about motivations and fulfillment.</p> <p>In my years of writing code and leading projects, I've come to learn a bit about how my teammates, and I, experience success, through a few manifest archetypes.</p> <h2 id="the_developer_mathematician"><a href="#the_developer_mathematician" class="toclink">The Developer-Mathematician</a></h2> <p>Always a source of conversation, the Developer-Mathematician, seeks truth, pure and provable. They don't want to create software. They want to unearth timeless, universal absolutes that happen to be in the neighborhood of computers.</p> <p>Catch them crafting functional code, writing property-based tests, or exhaustively searching their bookmarks for that one paper on arXiv.</p> <p>To be honest, purity and formalism can chafe when building most software. Proofs are still more suited to dissertations than development. Still, it's good to strike a healthy balance between research and development. Make time to try new testing strategies, start a weekly paper club, and keep those fundamentals sharp.</p> <h2 id="the_developer_architect"><a href="#the_developer_architect" class="toclink">The Developer-Architect</a></h2> <p>Less formal than the mathematician, but not always more practical, the Developer-Architect is brimming with potential. They want to create something original, important, and particularly elegant. They want to create something that outlasts them, something worthy of use, maintenance, and study. 
The creation need not be immortal or universal; the more of their mark that is left on it the better.</p> <p>Find them making high-concept pitches in response to clear gaps in the open-source ecosystem, or discussing best practices that are suspiciously similar to their own practices. If your Developer-Architect is low on ideas or recently saw one of their ideas superseded or implemented without them, they may become despondent.</p> <p>Software designers derive a lot of pleasure from the design process, but need to be reminded that architecture is far from the hardest part. To avoid turmoil and despondency, Developer-Architects must code their own implementations and design only a few steps ahead. Creative code can be very good code, and may well be worth the risk and wait.</p> <h2 id="the_developer_engineer"><a href="#the_developer_engineer" class="toclink">The Developer-Engineer</a></h2> <p>Least formal, but no less professional, the Developer-Engineer is the workhorse of the software industry. Engineers build for the sake of building. Recognize them by their willingness to experiment with code, and their lack of attachment to code. If it doesn't work, the engineer has confidence: Toss it, we can build it better again.</p> <p>For motivation, the engineer needs clear requirements and a modicum of appreciation for a spec well-met. For fulfillment, the build itself often suffices, so avoid process and interruptions.</p> <p>Proofs and designs aside, I still believe when we channel the Developer-Engineer, we channel our best selves. A sense of confident understanding of the problem, married with unbounded pragmatism, leading to working, shippable code. It will have bugs, and it may not be abstracted quite right for future extensibility, but it will work.</p> <h2 id="a_winning_combination"><a href="#a_winning_combination" class="toclink">A Winning Combination</a></h2> <p>We all go through phases, play different roles, and work with all sorts.
Embracing the mathematician, architect, and engineer, as well as others, from tinkerers to hustlers, has taught me more than I could have learned by my undifferentiated self.</p> <p>The key is recognizing your current motivations and finding alignment of these angles within a company, within a team, and within oneself.<p></p> <hr /> https://sedimental.org/calver.html Mahmoud Hashemi https://sedimental.org/ Announcing CalVer 2016-06-22T10:30:00Z 2016-06-22T10:30:00Z <p><p><img align="right" width="20%" src="https://sedimental.org/uploads/illo/caltree_med.png" /> <em>It's about time.</em></p> <p>Technologists expect things to get better with time. Your current laptop has more RAM than the last, your current car is safer than its predecessor, and the latest version of your code is certainly the best ever.</p> <p>What if the same could be said of versioning systems?</p> <p>Software versioning systems also get better with time. That's why today I'm pleased to announce <strong>CalVer</strong>, a calendar versioning convention based on project release dates, formally hosted on <strong><a href="http://calver.org">calver.org</a></strong>.</p> <p>Calendar versioning represents a powerful alternative to Semantic Versioning (<a href="http://semver.org">SemVer</a>).
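</p> <p>To make the idea concrete, here is a minimal sketch of deriving a calendar version from a release date. It assumes the <code>YY.0M.MICRO</code> scheme that calver.org lists for Ubuntu, with <code>YY</code> taken as the year minus 2000. The <code>calver_string</code> helper is hypothetical, written for illustration and not part of any CalVer tooling:</p>
<pre class="codehilite"><code class="language-python">from datetime import date


def calver_string(release_date, micro=0):
    """Format a release date as YY.0M.MICRO (hypothetical helper).

    YY is the short year (year minus 2000, not zero-padded),
    0M is the zero-padded month, and MICRO counts further
    releases made within the same month.
    """
    short_year = release_date.year - 2000
    return "%d.%02d.%d" % (short_year, release_date.month, micro)


print(calver_string(date(2016, 6, 22)))     # 16.06.0
print(calver_string(date(2016, 6, 22), 1))  # 16.06.1
</code></pre>
<p>Because the version falls straight out of the release date, a release script can compute it with no judgment call about what counts as a major or minor change.</p> <p>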
CalVer combines with or even replaces SemVer versioning systems, based on the needs of the project.</p> <h2 id="features"><a href="#features" class="toclink">Features</a></h2> <p>The <a href="http://calver.org">calver.org</a> site speaks for itself, but there you'll find:</p> <ul> <li><a href="http://calver.org/#scheme">Terms and definitions</a></li> <li>Case studies, including <a href="http://calver.org/#ubuntu">Ubuntu</a>, <a href="http://calver.org/#twisted">Twisted</a>, <a href="http://calver.org/#teradata">Teradata</a>, and <a href="http://calver.org/#other_notable_projects">more</a></li> <li>And <a href="http://calver.org/#when_to_use_calver">a short guide</a> on when to use CalVer for your future projects</li> </ul> <p>Case studies feature badges like this one, for <a href="http://calver.org/#ubuntu">Ubuntu</a>'s versioning scheme:</p> <blockquote> <p><img src="https://img.shields.io/badge/calver-YY.0M.MICRO-22bfda.svg" /></p> </blockquote> <p>You'll also find <a href="http://calver.org/users.html">a project list</a>, always seeking new additions.</p> <h2 id="rationale"><a href="#rationale" class="toclink">Rationale</a></h2> <p>Many projects have designed their version schemes to better match the needs of their developers and customers. CalVer formalizes those practices. <a href="http://calver.org">calver.org</a> began as a resource to help maintainers communicate the design choices in their versioning scheme.</p> <p>CalVer has grown to showcase prominent uses and provide a way for more projects to adopt calendar versioning in their projects. 
It even hosts a community-curated <a href="http://calver.org/users.html">list of projects</a> using calendar versioning.</p> <p>Even more background on the project can be found on the <a href="http://calver.org/about.html">calver.org About page</a>, as well as my previous versioning essay, <em><a href="https://sedimental.org/designing_a_version.html">Designing a version</a></em>.</p> <h2 id="compared_to_semver"><a href="#compared_to_semver" class="toclink">Compared to SemVer</a></h2> <p>Some comparisons are inevitable. SemVer, hosted at <a href="http://semver.org">semver.org</a>, is a big name in software versioning conventions. CalVer combines well with incremental-number schemes, so it's not strictly a competition. That said, here is how CalVer outshines SemVer.</p> <p>🕐 CalVer integrates objective, intuitive calendar dates. <br /> ⊠ SemVer subjectively increments numbers.</p> <p>🕑 CalVer encompasses real-world usage through a formal vocabulary. <br /> ⊠ SemVer imitates the form of a specification, albeit a confrontational one. Unlike real specifications, SemVer lacks objective verifiability, exemplars, or reference implementations.</p> <p>🕒 CalVer makes maintenance easier through powerful, objective semantics. Look at a library's version number, immediately know how recent your copy is. Compare across libraries, checking that dependencies are in sync. Deprecate versions based on time. <br /> ⊠ SemVer has Tom Preston-Werner's semantics.</p> <p>🕓 CalVer's use of release dates allows for automatable, immutable versions on which everyone can agree. <br /> ⊠ SemVer introduces one more place a bug can enter a project. Versions only go up, and a release which violates SemVer guidelines cannot be undone. That pressure means more projects <a href="https://sedimental.org/designing_a_version.html#semver_and_release_blockage">perpetually stuck in 0.x</a>.</p> <p>The <a href="https://sedimental.org/designing_a_version.html">list goes on</a>, but the message is clear.
There is an alternative to SemVer, and it's about time!</p> <h2 id="next_steps"><a href="#next_steps" class="toclink">Next steps</a></h2> <p>Have a look at the <a href="http://calver.org/users.html">Users</a> list and help add any projects I may have missed. It's a big ecosystem out there, and the initial list reflects my own Linux and Python tendencies.</p> <p><img align="right" width="25%" src="https://sedimental.org/uploads/illo/calver_cal_med.png" /></p> <p>For current maintainers using calendar versioning, next time you get a raised eyebrow, just let them know: It's CalVer. Or save yourself a step and add one of <a href="http://calver.org/overview.html#case_studies">the badges</a>, linking to <a href="http://calver.org">calver.org</a>.</p> <p>For developers of new libraries, CalVer is here to stay, and <a href="http://calver.org">calver.org</a> will be there next time you're designing your versioning scheme. It's a big ecosystem out there, and once you try CalVer, I think you'll agree. Software versioning gets better with time.<p></p> <hr /> https://sedimental.org/running_from_software.html Mahmoud Hashemi https://sedimental.org/ Running from software 2016-05-27T04:11:00Z 2016-05-27T04:11:00Z <p><p>So while PyCon 2016 starts in less than 48 hours, some kind of anticipation compelled me to polish off the last of <a href="https://www.youtube.com/channel/UCgxzjK6GuOHVKR_08TT4hJQ">the talks from last year</a>. For some reason I went for a keynote. I'm not typically a keynote attendee, and this time I'd missed something big.<sup id="fnref:pycon2016"><a class="footnote-ref" href="https://sedimental.org/running_from_software.html#fn:pycon2016">1</a></sup></p> <p><a href="https://twitter.com/jacobian">Jacob Kaplan-Moss</a>, the herald of <a href="https://www.djangoproject.com/">Django</a>, really laid something out.
I'll give you the short version, but here's a video in case you want a look:</p> <iframe width="560" height="315" src="https://www.youtube.com/embed/hIJdFxYlEKE" frameborder="0" allowfullscreen=""></iframe> <p>To summarize, Jacob sets out to explain why mediocrity is acceptable. Bell curves rule everything around us. He holds up his record as a middling ultramarathon runner as proof. He surmises that lack of passion for work is leading people to feel untalented. This, combined with "brilliant asshole" programmers, is shaming people out of the industry. He wraps up with a message of inclusivity, especially toward women. Now, you can probably make sense of any other details with <a href="https://sedimental.org/uploads/jacobian_pycon2015.pdf">the slides</a>.</p> <p>Above all, Jacob and I are in complete agreement with his opening and closing. If you consider yourself an average programmer, that is fine and probably better than the alternatives. Also, as a field, software must continue reaching out to and integrating more underrepresented groups, especially women.</p> <p>That said, I'm not sure how one could have put more missteps between those two points.<sup id="fnref:1"><a class="footnote-ref" href="https://sedimental.org/running_from_software.html#fn:1">2</a></sup></p> <h2 id="the_10x_programmer"><a href="#the_10x_programmer" class="toclink">The 10x Programmer</a></h2> <p>If Jacob makes one thing clear from the keynote, it's that years of being called a 10x programmer has made him very uncomfortable. He rejects the concept, as many have. Now I, too, have at various points been called a rockstar, ninja, and 10xer, and even though I also don't identify with those labels, I will tell you that the 10x programmer is very real.<sup id="fnref:tptm"><a class="footnote-ref" href="https://sedimental.org/running_from_software.html#fn:tptm">3</a></sup></p> <p>Every 10x programmer I know spends most days as a 1x something else. 
Most 10x code is the result of observing and accumulating 10x more domain knowledge, then being in the right place at the right time. You do what ten developers off the street could never. I've been there, and I have the commits to prove it. And when other aspects of my life take priority, I'm an average programmer, focusing on my job and its share of 1x work.</p> <p>10x programming is a matter of insight and inspiration, confidence and autonomy. This is a circumstance so unique that it creates an obligation to teach software to the world. You never know when the right 1x programmer is going to be in the right place to transform their surroundings with a 10x moment. Many of the most creative people I know understand very little about programming, and one can't help but wonder what programming skills or insight might bring to their process.</p> <p>The great thing about Python is that you can teach so much programming with so little overhead. You give those highly creative people even a taste of programming and it opens up vast opportunities. Even just the shared vocabulary is a huge boost to cross-pollination of ideas between disciplines.</p> <p>Look at Python use among biologists, neuroscientists, and other academics and analysts. Their amazing results speak volumes. Yet by strict accounts their engineering skill wilts next to experienced Python systems engineers working at YouTube, PayPal, Dropbox, Continuum Analytics, etc.</p> <p>It's inexcusable to put such a diverse group on this single bell curve when their goals and disciplines are so different. Our language is the same and our cultures are mutually beneficial. Seeing people measured along this single dimension keeps me up at night.</p> <p>Putting it all in terms of employment is harmful. Maximizing employee utilization only creates more 1x programming. Software is more than the industry of churning out code. A programmer is more than someone who is paid to write software. 
A person is more than their profession.</p> <!-- * Physical tasks like labor and exercise are infinitely more quantifiable than programming tasks. --> <h2 id="the_privilege"><a href="#the_privilege" class="toclink">The Privilege</a></h2> <p>It's said that the most sure sign of privilege is ignorance. Jacob drives this all the way home, but not for lack of trying.</p> <p>From the beginning of the talk, he considers the immediate situation. He disclaims most of his reputation, describes his origins as unremarkable, and points out that his biggest contributions weren't actually his. Later on in the talk, while showcasing the face of the privileged programmer, the 10x archetype, the person most likely to be able to ride on their identity, he shares a chuckle at his own resemblance.</p> <p>Moving into Jacob's running-programming analogy, the anecdote got off to a false start, but just kept going. Nobody stopped him to point out that by virtue of simply <em>being</em> an ultra-runner, he <em>is</em> the top tier. If you're in the 68th percentile of ultrarunners, then you're in the top 1% of people who run, period. Even finishing a normal marathon faster than the median time demonstrates talent and tremendous physical gifts.</p> <p>Jacob trimmed the y-axis, measured himself among the top tier, and found himself only slightly better than mediocre. The sort of guilt-inducing behavior that he claims leads people to leave the field, unfolding right on stage.</p> <h2 id="the_corporatism"><a href="#the_corporatism" class="toclink">The Corporatism</a></h2> <p>Throughout the talk, Jacob cites some statistics. The one that stuck with me was about an impending employment deficit. The U.S. government projects 1.5 million unfilled programming jobs in the year 2020. This becomes a central motivation for Jacob encouraging people to go into software<sup id="fnref:3"><a class="footnote-ref" href="https://sedimental.org/running_from_software.html#fn:3">4</a></sup>.
Programming is immediately linked to coding for money.</p> <p>Jacob says software is a skill, like any other. Programming is like running marathons. Individuals are responsible for their own training. But Jacob bears a message of hope: bosses will pay you to run, even if you're not the fastest.</p> <p>Too many managers are like Jacob, subtly redirecting the creative potential of software into commodity labor. "We" need as many people as possible to learn and teach programming because a small portion of society has decided to gamble money on software eating everything in a very particular way.</p> <p>On the contrary, people need exposure to programming for its fundamental concepts. Software offers new ways of decomposing problems and creating solutions, new approaches that are necessary to understand an increasingly fast-paced and connected world. That is totally irrespective of employment. Software design is a new way of thinking, for all people, employed as programmers or not.</p> <h2 id="in_short"><a href="#in_short" class="toclink">In short</a></h2> <p>Jacob is a much better runner than he gives himself credit for, but programming is not running.</p> <p>Software is much more than an industry. You don't need a programming job to be a good programmer.</p> <p>This brings me back to reiterate the central thought we share: One doesn't need to compare favorably to other programmers in order to make a difference with software. So, we must accept and support programmers of all walks and skill levels.</p> <div class="footnote"> <hr /> <ol> <li id="fn:pycon2016"> <p>Suffice to say, I'm already subscribed to <a href="https://www.youtube.com/channel/UCwTD5zJbsQGJN75MwbykYNw">PyCon 2016</a>. <a class="footnote-backref" href="https://sedimental.org/running_from_software.html#fnref:pycon2016" title="Jump back to footnote 1 in the text">↩</a></p> </li> <li id="fn:1"> <p>Dear Jacob, if you are reading this, I just wanted to say no harsh feelings.
It was a moving talk and I'm sure that most people got the good messages that bookended the talk. I hope you don't mind the criticism and still find it as interesting as you mentioned on stage. Hope it helps with future keynotes, and I'll be <a href="https://twitter.com/mhashemi">right here</a> if you have any followups. <a class="footnote-backref" href="https://sedimental.org/running_from_software.html#fnref:1" title="Jump back to footnote 2 in the text">↩</a></p> </li> <li id="fn:tptm"> <p>This also came up in <a href="https://talkpython.fm/episodes/show/54/enterprise-software-with-python">Episode #54 of Talk Python to Me</a>, while discussing my course, <a href="http://shop.oreilly.com/product/0636920047346.do">Enterprise Software with Python</a>. <a class="footnote-backref" href="https://sedimental.org/running_from_software.html#fnref:tptm" title="Jump back to footnote 3 in the text">↩</a></p> </li> <li id="fn:3"> <p>"The US Bureau of Labor Statistics estimates that by 2020 there will be a 1.5 million programming job gap, which means there will be that many jobs unfilled. That's in five years. The EU has published similar numbers, 1.2 million in 2018—three years. That means we need to be doing something to get more people into our industry." <a class="footnote-backref" href="https://sedimental.org/running_from_software.html#fnref:3" title="Jump back to footnote 4 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/managing_python_ecosystems.html Mahmoud Hashemi https://sedimental.org/ Managing Python Ecosystems 2016-05-24T10:00:00Z 2016-05-24T10:00:00Z <p><p><img width="40%" align="right" src="https://sedimental.org/uploads/illo/koi_fish_med.png" title="Ecosystems as shimmery, shiny, scaly, and fishy as a koi." /></p> <p>You know that old quote:</p> <blockquote> <p><em>The wider the net you cast, the wider the variety you catch.</em></p> </blockquote> <p>Was it a wise old fisherman? Or a dogged Python programmer?
Either way, words don't come much truer than those.</p> <p>Few, if any, programming languages have embodied the description "general-purpose" as wholly as Python. And with the wide net of that applicability comes a wide variety in use -- and environments.</p> <p>Library and framework developers rarely get to control how their code is used, and thus have to think about how their code fits into the whole ecosystem. From writing hybrid code for Python 2 <em>and</em> 3 to inserting shims for Pythons without threading support, there's no rest for the rigorous. Until now.</p> <h3 id="announcing_ecoutils"><a href="#announcing_ecoutils" class="toclink">Announcing <code>ecoutils</code></a></h3> <p>Ecosystems differ. Widely. Academic Python tends to be more Windows-heavy, corporate Python will probably forever be entrenched in Python 2, and one can never predict the arrival of that oddball user with the super old version of Python on <a href="https://en.wikipedia.org/wiki/Cygwin">Cygwin</a>. But these are generalities and we can do better.</p> <p>Enter <a href="http://boltons.readthedocs.io/en/latest/ecoutils.html"><code>ecoutils</code></a>. <code>ecoutils</code> is a pure-Python module that, using nothing but builtins, generates a semantic, Python-centric profile of the environment that's running it. 
This includes:</p> <ul> <li><strong>Host operating system</strong>: Windows, OS X, Ubuntu, Debian, CentOS, RHEL, etc.</li> <li><strong>Language version</strong>: 2.5, 2.6, 2.7, ..., 3.4, 3.5, ..., etc.</li> <li><strong>Executable runtime</strong>: CPython, PyPy, Jython, etc., (plus build date and compiler)</li> <li><strong>Features</strong>: 64-bit, IPv6, Unicode character support (UCS-2/UCS-4)</li> <li><strong>Built-in library support</strong>: OpenSSL, threading, SQLite, zlib, and more</li> <li><strong>User environment</strong>: umask, ulimit, working directory</li> <li><strong>Machine info</strong>: CPU count, hostname, filesystem encoding</li> </ul> <p><img width="40%" align="right" src="https://sedimental.org/uploads/illo/green_field_med.png" title="If only all fields were so green in software ecosystems." /></p> <p>Now, instead of crossing platform support bridges when users bring them to you, you can be proactive. Now, instead of guessing how developers are using the code, you can design for their needs and watch those needs change.</p> <p><code>ecoutils</code> only gets more valuable when code goes to production. If you manage your own machines, you know the risk of version drift and missed boxes only goes up with machine number and time. If you don't manage your machines, it's just a matter of time until someone is being trained on your boxes.</p> <p>So what does a profile look like?</p> <h3 id="generating_a_profile"><a href="#generating_a_profile" class="toclink">Generating a profile</a></h3> <p>Profiles are generated by <a href="http://boltons.readthedocs.io/en/latest/ecoutils.html#boltons.ecoutils.get_profile"><code>ecoutils.get_profile()</code></a>.</p> <p>When run as a module, <code>ecoutils</code> calls <code>get_profile()</code> and prints a JSON-formatted profile. 
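</p> <p>For a feel of what goes into such a profile, here is a rough standard-library sketch that gathers a handful of comparable fields. It is not the <code>ecoutils</code> implementation: the <code>mini_profile</code> function and its choice of keys are assumptions made for illustration, and the real module collects far more.</p>
<pre class="codehilite"><code class="language-python">import json
import os
import platform
import socket
import struct
import sys


def mini_profile():
    """Collect a few environment facts (hypothetical, not ecoutils)."""
    return {
        "cwd": os.getcwd(),
        "fs_encoding": sys.getfilesystemencoding(),
        "hostname": socket.gethostname(),
        "system": platform.system(),
        "python": {
            "bin": sys.executable,
            "implementation": platform.python_implementation(),
            "version_info": list(sys.version_info),
            # An 8-byte C pointer indicates a 64-bit interpreter build.
            "64bit": struct.calcsize("P") == 8,
        },
    }


# Print the profile as JSON, sorted for stable, diffable output.
print(json.dumps(mini_profile(), indent=2, sort_keys=True))
</code></pre>
<p>Keeping every value JSON-serializable is the important design choice here, since it lets a profile be logged, shipped, and queried without custom parsing.</p> <p>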
On my fully-updated Ubuntu 14.04 LTS machine, <code>python -m boltons.ecoutils</code> yields:</p> <pre class="codehilite"><code class="language-json">{
  "_eco_version": "1.0.0",
  "cpu_count": 4,
  "cwd": "/home/mahmoud/projects/boltons",
  "fs_encoding": "UTF-8",
  "guid": "6b139e7bbf5ad4ed8d4063bf6235b4d2",
  "hostfqdn": "mahmoud-host",
  "hostname": "mahmoud-host",
  "linux_dist_name": "Ubuntu",
  "linux_dist_version": "14.04",
  "python": {
    "argv": "boltons/ecoutils.py",
    "bin": "/usr/bin/python",
    "build_date": "Jun 22 2015 17:58:13",
    "compiler": "GCC 4.8.2",
    "features": {
      "64bit": true,
      "expat": "expat_2.1.0",
      "ipv6": true,
      "openssl": "OpenSSL 1.0.1f 6 Jan 2014",
      "readline": true,
      "sqlite": "3.8.2",
      "threading": true,
      "tkinter": "8.6",
      "unicode_wide": true,
      "zlib": "1.2.8"
    },
    "version": "2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2]",
    "version_info": [2, 7, 6, "final", 0]
  },
  "time_utc": "2016-05-24 07:59:40.473140",
  "time_utc_offset": -8.0,
  "ulimit_hard": 4096,
  "ulimit_soft": 1024,
  "umask": "002",
  "uname": {
    "machine": "x86_64",
    "node": "mahmoud-host",
    "processor": "x86_64",
    "release": "3.13.0-85-generic",
    "system": "Linux",
    "version": "#129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016"
  },
  "username": "mahmoud"
}
</code></pre> <p>Weighing in at just over 1KB, it's not too daunting! ecoutils is part of <a href="http://boltons.readthedocs.io/en/latest/">the boltons package</a>, so <code>pip install boltons</code> and see how yours compares.</p> <p>By virtue of being in boltons, the <code>ecoutils</code> module is also fully standalone, and can be used without the rest of the boltons package. ecoutils has been tested with Python 2.6, 2.7, 3.4, 3.5, and PyPy on Ubuntu, Debian, RHEL, OS X, FreeBSD, and Windows. <a href="https://github.com/mahmoud/boltons/issues">File an issue</a> if something seems to be broken.
Compatibility is the goal.</p> <h3 id="transmission_and_collection"><a href="#transmission_and_collection" class="toclink">Transmission and collection</a></h3> <p>Now, ecoutils is really just part of the solution. Sure, you can write out a quick profile at the top of every log file, and you won't regret it. However, real ecosystem management means running a sort of Python analytics shop.</p> <p>For those familiar with browsing the Internet, your browser is a virtual machine that has likely been participating in a similar arrangement all day today. Like Google Analytics or <a href="http://piwik.org/">Piwik</a>, the setup involves collecting relevant data, and then sending it to a central server for storage and querying.</p> <p>Collection is handled by <code>ecoutils</code>. As far as transmission is concerned, in development environments, we have a dead-simple, side-effect-minimizing, single-file HTTP client that sends <code>ecoutils</code> profiles to a central analytics server on application startup.</p> <p>In production environments, our framework serves this information for queries on a special port, through <a href="https://web.archive.org/web/20201129194705/https://github.com/paypal/support">SuPPort</a>'s MetaService, through <a href="https://github.com/mahmoud/clastic#clastic">clastic</a>'s <a href="https://github.com/mahmoud/clastic/blob/master/clastic/meta.py">MetaApplication</a>, where this all started.
Here's <a href="https://hashtags.toolforge.org/">an example of it</a> running in <a href="https://hashtags.toolforge.org/">Wikipedia Hashtags Search</a>, on a <a href="https://www.mediawiki.org/wiki/Wikimedia_Labs">managed Wikimedia environment</a>, over which I have minimal control, and need maximum information.<sup id="fnref:1"><a class="footnote-ref" href="https://sedimental.org/managing_python_ecosystems.html#fn:1">1</a></sup></p> <p>Push or pull, all the data is stored in a simple SQL (or JSONL) format, as demonstrated by <a href="https://github.com/mahmoud/espymetrics/">espymetrics</a>, the example project for my <a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/">Enterprise Software with Python</a> course. Nothing more enterprise than having literally dozens of environments by design, and even more than that by debt.</p> <p>One last note, data management is all about audience and context. If you're an administrator in a professional setting, the data above is great. But there are understandably some cases where you might want something less identifiable. <code>get_profile</code> has a <code>scrub</code> flag that handles that. See <a href="http://boltons.readthedocs.io/en/latest/ecoutils.html#boltons.ecoutils.get_profile">the docs</a> for details.</p> <h3 id="success_stories"><a href="#success_stories" class="toclink">Success stories</a></h3> <p>Originally designed for easier remote administration across multiple environments, a little bit of info has had far-reaching impacts. 
For a few examples from my work at PayPal, this approach enabled us to:</p> <ul> <li>Deprecate and remove production Python 2.6 support from our framework, simplifying our build matrix without customer impact.</li> <li>Actively engage new users attempting to use our framework with unsupported Pythons or OSes.</li> <li>Improve utilization through designing for observed CPU counts.</li> </ul> <p>In practice, <code>ecoutils</code> combines well with <a href="https://github.com/giampaolo/psutil">psutil</a> data to go even further in utilization.</p> <h3 id="building_for_variation"><a href="#building_for_variation" class="toclink">Building for variation</a></h3> <p>Some of you probably came here expecting to read yet another great post about <a href="https://virtualenv.pypa.io/en/stable/">virtualenv</a>, <a href="https://tox.readthedocs.io/en/latest/">tox</a>, and maybe even <a href="http://conda.pydata.org/docs/using/envs.html">conda envs</a>. I'm glad you've already heard of them, because they're a big part of the story. If you haven't yet explored these tools, check them out, because they are invaluable for cross-version Python testing and packaging.</p> <p>Also, if you're working on an open-source library, I can vouch for <a href="https://travis-ci.org/">Travis CI</a> (Linux) and <a href="https://www.appveyor.com/">Appveyor</a> (Windows) as very valuable providers for cross-platform testing. I use both of them on <a href="https://github.com/mahmoud/boltons">boltons</a>, and it makes it easier, not harder, for contributors to submit pull requests with confidence. Most outfits can't afford to have a team member leading support for each platform, like we do at PayPal.</p> <h3 id="conclusion"><a href="#conclusion" class="toclink">Conclusion</a></h3> <p>Python is more than just an expressive, succinct programming language. In a diverse world, Python is a tremendous force, made so by its wide deployment, cross-platform support, and external library integrations. 
Python gives you SQLite, JSON, SSL, Unicode, and much more, but with many necessary strings attached to Python version, build, or environment. <code>ecoutils</code> offers an experienced look at the real features that affect the value of Python components and teams.</p> <p>Don't leave ecosystems and their constituents to chance, whim, or fad. Collect the data that makes your ecosystem unique, and make measured decisions based on the realest demand: actual usage.</p> <div class="footnote"> <hr /> <ol> <li id="fn:1"> <p>When that server seems slow, remember to <a href="https://donate.wikimedia.org/wiki/Ways_to_Give">donate to Wikipedia</a>. And maybe volunteer, because money alone does not make servers run fast. <a class="footnote-backref" href="https://sedimental.org/managing_python_ecosystems.html#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/esp.html Mahmoud Hashemi https://sedimental.org/ Enterprise Software with Python 2016-03-22T04:04:00Z 2016-03-22T04:04:00Z <p><p>When I first published <a href="https://sedimental.org/10_myths_of_enterprise_python.html">10 Myths of Enterprise Python</a> on <a href="https://medium.com/paypal-tech/10-myths-of-enterprise-python-8302b8f21f82">the PayPal Engineering blog</a>, there were a lot of reactions.
Some I expected:</p> <p><img width="40%" align="right" src="https://sedimental.org/uploads/illo/ncc_1701d_med.png" title="The NCC-1701D, one of many illustrations from Enterprise Software with Python" /></p> <ol> <li><a href="https://twitter.com/michahell/status/544109301401661440">Surprise</a> at Python in the enterprise space.</li> <li><a href="https://twitter.com/jpheasly/status/543882786419908611">Relief</a> at more attestation of Python's use in the enterprise.</li> <li>And, as with all the best, <a href="https://news.ycombinator.com/item?id=9256082">a few flamewars</a>.</li> </ol> <p>But there was one I missed: new developers interested in professional software development.</p> <p>Really I should have seen it coming. For the better part of a decade, Python has provided me the best vocabulary for answering questions from motivated individuals looking for programming productivity. It's only logical that once they got the basics down, they'd want to take it to the next level.</p> <p>With this end in mind, I'm pleased to announce <strong><a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/">Enterprise Software with Python</a></strong> (ESP), a bridging class from beginner to pro<sup id="fnref:1"><a class="footnote-ref" href="https://sedimental.org/esp.html#fn:1">1</a></sup>, brought to you by <a href="http://www.oreilly.com/">O'Reilly Media</a> and yours truly.</p> <p>It's got something for everyone, but really it's designed with three groups in mind:</p> <ul> <li><strong>Recently-graduated and self-taught developers</strong>, looking for a holistic introduction to enterprise software.</li> <li><strong>Experienced developers at large organizations</strong>, looking for a relatable orientation to Python industry standards.</li> <li><strong>Technical team leaders with priorities</strong>, looking to quickly get groups on the same page of vocabulary, expectations, and practice.<sup id="fnref:4me"><a class="footnote-ref" 
href="https://sedimental.org/esp.html#fn:4me">2</a></sup></li> </ul> <p>As the title suggests, ESP is more than a Python class. While the perspective is Pythonic and there are several examples in Python, this is a full software development course. You will find a serious effort has been made to set expectations and develop the soft skills large organizations demand. You need architectural skills to form a technical opinion, engineering skills to implement and maintain it, and managerial skills to defend it all along the way. I can't resist a good table of contents, so this is how the course is factored to address all of these:</p> <ol> <li><strong>Introductions and definitions</strong> - A bit about me, a bunch about the course.</li> <li>Overview</li> <li>Prerequisites and viewing guide</li> <li><strong>Definitions and foundations</strong> - Know your domain, know your platform.</li> <li>What is Enterprise Software? - 9 Hallmarks of the Enterprise</li> <li>What is Python? 3 Perspectives for the Organization</li> <li>What is Python <em>Not</em>? 4 Common Misconceptions</li> <li>When to Use Python? 
Motivations and Applications</li> <li><strong>Architecture and design</strong> - Do your research, present your findings.</li> <li>Designing Architectures: Professional Planning</li> <li>Gathering Requirements: Understanding the 6 Aspects of Software</li> <li>Researching Environments: From Production to Development</li> <li>Choosing Dependencies: Evaluating Building Blocks</li> <li>Getting Assistance: Finding Help in the Software World</li> <li>Presenting Designs: Navigating the Organizational and Interpersonal</li> <li><strong>Engineering practices</strong> - Execution and delivery with minimal regret.</li> <li>Development Environments: Editors and Dev Tools</li> <li>Source Control, Issue Tracking, and Continuous Integration</li> <li>Workflow: Starting a Python Project</li> <li>Design Patterns: Idioms for Python Projects</li> <li>Debugging: Solving Problems in Python projects</li> <li>Security: Software Risk Management Fundamentals</li> <li>Code Review: Python Antipatterns and Collaboration</li> <li>Testing: Practical Python Quality Engineering</li> <li>Logging and Monitoring: Introspectable Python Projects</li> <li>Profiling and Performance: Strategies for High-Speed Python</li> <li>Documentation: Preserving the Legacy</li> <li>Packaging and Deployment: Going Live</li> <li><strong>Career development and further study</strong> - A good end offers a dozen new beginnings.</li> <li>Project Ideas: Building Experience</li> <li>Technology Evangelism: Building a Community</li> <li>Other Resources: Building Skills</li> <li>Closing</li> </ol> <p>Yes, it is a lot. I never pass on an opportunity to give a comprehensive treatment, but I'll save the whole motivation and process essay for later. For now, keep in mind that most segments are under 20 minutes, and the longest, <em>Profiling and Performance</em>, is only 45 minutes — shorter than most orgs' tech talks. 
It's all compact and practical, right down to <a href="https://github.com/mahmoud/espymetrics/">the example repo</a>.</p> <center> <a href="http://shop.oreilly.com/product/0636920047346.do?code=authd"> <img width="70%" src="https://sedimental.org/uploads/esp_01.jpg" /> </a><br /> <em>Actual footage from the intro. Not a prerelease render.</em> </center> <p>The first three parts are free, and will give you a good sense of the format, tone, and content. I kept it pretty light and approachable, complete with dozens of illustrations. Purchasers can stream the rest and download DRM-free copies whenever they want (my personal favorite). If you have any questions or concerns, don't hesitate to reach out <a href="https://twitter.com/mhashemi">to me</a>, <a href="https://sedimental.org/about.html">personally</a>, or <a href="https://twitter.com/OReillyMedia">O'Reilly Media</a>.</p> <p><a href="https://www.oreilly.com/library/view/enterprise-software-with/9781491943755/">I hope you'll take a look</a>! It's already making waves at PayPal, and chances are there's someone you know who could use it, too.</p> <!-- Various primary-source clippings I wrote related to the class. Don't mind these. I've found even many experienced developers have a lot of skill gaps that leech at their development confidence and effectiveness in a corporate setting. This course seeks to address that. I'm exactly designing it to be an enterprise followup to courses like Jess's. The largest contingent of Python programmers I've worked with are those who know the basics of Python as a programming language, but don't know how to apply it day-to-day. Enterprise is indeed about scaling, but much more about scaling development than scaling performance (though I intend to cover both). This will appeal to all developers looking to turn professional with Python. Most instructive programming videos don't cover the expectations and best practices used within companies. 
Examples would be when and how to add tests, automation, source control, etc. Enterprise development is all about meetings, priorities, budgets, and compromises, and reaching scale is a matter of earning trust and proving approaches. As for the topic feedback, I wholeheartedly agree. Enterprise development is all about risk management, so my #1 priority is avoiding failure. If new developers fail, they may blame Python, or worse, their managers might blame them (and Python)! The TL;DR on my opinion of when _not_ to use Python is for web frontend development, and possibly for mobile (Kivy is really coming along, though). Python's web frontend offerings would be very hard for a new developer to sell against JavaScript unfortunately. My overarching message of when to use Python is one of positivity: we have used Python for positively everything, high performance, high reliability, high security, high accuracy (i.e., data science), you name it. There'll be a bit about Python 2 and 3, too. It's a key architectural decision, after all. :) ## Description What's makes the difference between a casual coder and a professional software engineer? How do beginner Pythonists become intermediate developers? One part masterclass, one part crash course, Enterprise Software with Python answers this question by touching on every element of the enterprise software development. PayPal's Lead Developer of Python Infrastructure Mahmoud Hashemi busts myths and offers guidance, using Python to demonstrate standard patterns and practices that apply across the software industry. Python is renowned for making it easy to get started with programming, but a lot of Python programmers are set adrift after learning the language basics. 
Enterprise Software with Python gives you an insider's introduction to: Defining software and software requirements for professional practice Fortifying your corporate environments with the power of open source Implementing, debugging, and reviewing project implementations Measuring, optimizing, and scaling applications at the enterprise level Preventing availability and security disasters with simple, practical changes Testing and documenting codebases for long-term maintenance Packaging and deploying optimally within your organization Winning autonomy by earning the confidence of your management and teammates Whether you are currently at a large organization, hope to work in the enterprise, or are just looking to further develop your skills, Enterprise Software with Python will help you take your craft to the next level. ## Proposal ### Who is this for? Budding Python developers looking to turn pro. Beginners who know the language and need direction in applying it in an organization. ### How does this help solve a problem or group of problems? Python itself is very easy to learn in a hobby or academic setting, but a lot of Python developers are set adrift after learning the language. If they’re lucky they have some professional development experience in other languages, but even then many of the paradigms don’t translate well. Scaling, both development practices and application architectures, is specific to every language. Illuminating that dark art starts with teaching best practices that move a beginner Pythonist to an intermediate Python engineer. --> <div class="footnote"> <hr /> <ol> <li id="fn:1"> <p>This link has a 50% off coupon code, applied at checkout. Check if your organization has Safari, first. If not, use <a href="http://shop.oreilly.com/product/0636920047346.do">this coupon-less link</a> and expense it! :) Safari users, try <a href="https://www.safaribooksonline.com/library/view/enterprise-software-with/9781491943755/">the SBO site</a>. 
If you're not sure if you have Safari access, contact your technology education and training department. <a class="footnote-backref" href="https://sedimental.org/esp.html#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p> </li> <li id="fn:4me"> <p>This target audience is me, but I know there are others out there. <em>Send me your tiring, huddled masses yearning to learn Python.</em> Seriously though, I can't fully quantify how much time it saves me to send a new Python initiate to a video, then have them come back with the foundations necessary to have a productive conversation. <a class="footnote-backref" href="https://sedimental.org/esp.html#fnref:4me" title="Jump back to footnote 2 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/designing_a_version.html Mahmoud Hashemi https://sedimental.org/ Designing a version 2016-02-23T10:27:00Z 2016-02-23T10:27:00Z <p><p>In modern software development, a project isn't a project without a proper versioning scheme.</p> <p><img alt="The legatree" align="right" width="150px" src="https://sedimental.org/uploads/illo/legatree_med.png" /> Weak version management neglects clients like lack of source control neglects collaborators. Dependency management and migration rely on versions. Beyond the technical, a project's version bears a huge impact on the perception of the project. It informs adoption and entices users to upgrade. The version is attached to the name of the project, appearing closer and more often than the names of the maintainers. Versions are how a project builds a legacy.</p> <p>So why do projects leave versioning as an afterthought? What do clients expect and what do projects need?</p> <p><em>Followup: This post culminated in <a href="https://sedimental.org/calver.html">announcing CalVer</a> and launching <a href="http://calver.org">calver.org</a>. 
This page provides a thorough background to the CalVer best practices.</em></p> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#semantic_versioning">Semantic Versioning</a><ul><li><a href="#semver_and_code_breakage">SemVer and code breakage</a><li><a href="#semver_and_release_blockage">SemVer and release blockage</a><li><a href="#semver_and_certifiability">SemVer and certifiability</a></ul><li><a href="#collective_expectations">Collective Expectations</a><ul><li><a href="#1_versions_go_up">#1 Versions go up</a><li><a href="#2_versions_correlate_to_software_quality">#2 Versions correlate to software quality</a><li><a href="#3_versions_are_numeric_except_when_they_re_not">#3 Versions are numeric, except when they're not</a></ul><li><a href="#case_study_chrome_vs_firefox">Case Study: Chrome vs. Firefox</a><li><a href="#calendar_versioning">Calendar Versioning</a><ul><li><a href="#calver_leverages_natural_understanding">CalVer leverages natural understanding</a><li><a href="#calver_has_better_semantics">CalVer has better semantics</a><li><a href="#calver_protects_projects">CalVer protects projects</a></ul><li><a href="#summary">Summary</a></ul></div><h2 id="semantic_versioning"><a href="#semantic_versioning" class="toclink">Semantic Versioning</a></h2> <p>Currently, the go-to versioning system for open-source software is referred to as <em>Semantic Versioning</em>, or <a href="http://semver.org/">SemVer</a>.</p> <p>Take a quick look at the <a href="https://pypi.org/pypi?%3Aaction=rss">40 most recent updates on the Python Package Index</a> (<a href="https://pypi.org/pypi">PyPI</a>). My glance showed all but <em>six</em> packages had the comfortable three-part versioning scheme, <code>major.minor.micro</code>. Among those packages the highest minor version was 108. 
The highest micro version was all the way up to 595.</p> <!-- # PyPI recent 40 * Highest minor: 108 * Highest micro: 595 * Five 4-part versions * One calendar version --> <p>So, if SemVer is so popular, it must be easy, right? Follow <a href="http://semver.org/#semantic-versioning-specification-semver">a couple straightforward steps</a>. Pick a number, add one to it. With arithmetic that simple, what could go wrong?</p> <h4 id="semver_and_code_breakage"><a href="#semver_and_code_breakage" class="toclink">SemVer and code breakage</a></h4> <p>Everyone knows it's more exciting to announce 2.0 than 1.7.0, even if there's more user demand for the latter than the former. This is especially true with SemVer, because a SemVer major version change implies breaking the API.</p> <p>As we will see, there are consequences to this. People judge value based on version number. SemVer supports this opaque apples-and-oranges comparison, punishing libraries that get it right on the first try, and encouraging libraries to break APIs to appear more mature and get that coveted 2.0.</p> <h4 id="semver_and_release_blockage"><a href="#semver_and_release_blockage" class="toclink">SemVer and release blockage</a></h4> <p>More damaging than the fatuous 2.0 is the epidemic of <strong><a href="https://en.wikipedia.org/wiki/Zeno's_paradoxes#Dichotomy_paradox">Zeno's 1.0</a></strong>.</p> <p><img width="99%" src="https://sedimental.org/uploads/illo/zeno_one_dot_oh.png" /><br /> <em>Witness the version, racing to numeric motionlessness. (Image based on <a href="https://commons.wikimedia.org/wiki/File:Zeno_Dichotomy_Paradox.png">Martin Grandjean's</a>.)</em></p> <p>To quote the <a href="http://semver.org/#how-do-i-know-when-to-release-100">second answer in SemVer's own FAQ</a>:</p> <blockquote> <p>If your software is being used in production, it should probably already be 1.0.0. If you have a stable API on which users have come to depend, you should be 1.0.0. 
If you’re worrying a lot about backwards compatibility, you should probably already be 1.0.0.</p> </blockquote> <p>On this count, SemVer might be found not guilty.<sup id="fnref:2"><a class="footnote-ref" href="https://sedimental.org/designing_a_version.html#fn:2">2</a></sup> If so, it's the SemVer users that didn't get the memo — myself included. Maybe if it had been in the spec itself.</p> <p>The problem is the heavy emphasis on "public API" breakage. Conservative library authors end up indefinitely preferring the <em>semantic power</em> of 0.x: <a href="http://semver.org/#spec-item-4">The ability to break APIs</a>. Whether the cause is conservatism, humility, or misunderstanding, the effect is misrepresenting the release state of many major libraries.</p> <p>A more practical scheme might help represent accurate versions for mature, production libraries like <a href="http://cython.org/">Cython</a> (0.23) and <a href="http://www.scipy.org/">SciPy</a> (0.17), both of which <a href="http://shop.oreilly.com/product/0636920033431.do">have</a> <a href="http://shop.oreilly.com/product/9781783984749.do">books</a> and nearly a <em>decade</em> of release history still on PyPI.</p> <h4 id="semver_and_certifiability"><a href="#semver_and_certifiability" class="toclink">SemVer and certifiability</a></h4> <p>Appealing to engineering aesthetics, SemVer is presented as a "specification". But, unlike the vast majority of successful <a href="https://en.wikipedia.org/wiki/Request_for_Comments">RFCs</a>, there is no validation or certification that can determine whether a project has a correct implementation. Yes, if a project API changes, but the major version is not incremented, the SemVer specification has been violated. But there's no way to test that generally, and no one does it specifically.</p> <p>SemVer is a detailed suggestion. Software breaks as quickly as SemVer's promise. 
The <a href="http://semver.org/#what-do-i-do-if-i-accidentally-release-a-backwards-incompatible-change-as-a-minor-version">remediations</a> do <em>not</em> happen. Better to embrace the realities of versioning, rather than argue over the MUSTs and MUST NOTs of an unenforceable specification.</p> <h2 id="collective_expectations"><a href="#collective_expectations" class="toclink">Collective Expectations</a></h2> <p>Let's take a brief moment to reconsider the humble version.</p> <p>We encounter far more software than we write. Few, if any, expect compliance with all the suggestions in SemVer. So what do we expect from our versions?</p> <p>There are three main expectations driving modern software versioning:</p> <h3 id="1_versions_go_up"><a href="#1_versions_go_up" class="toclink">#1 Versions go up</a></h3> <p>The later the release, the greater the version. Software should not change without a version change, and the version must go up, and never come down.</p> <h3 id="2_versions_correlate_to_software_quality"><a href="#2_versions_correlate_to_software_quality" class="toclink">#2 Versions correlate to software quality</a></h3> <p>A project name communicates an ideal. The project version communicates current progress toward that ideal. Vision pursued by version: The greater the version, the greater the software.</p> <h3 id="3_versions_are_numeric_except_when_they_re_not"><a href="#3_versions_are_numeric_except_when_they_re_not" class="toclink">#3 Versions are numeric, except when they're not</a></h3> <p>Here's where things get hairy. Numeric versions are the default, but non-numeric versions and version components abound.</p> <p>Version vernacular is now thoroughly mainstream: "alpha", "beta", "dev", "nightly", "stable", and so on. There are also named project versions, like those used in Linux distributions, such as Debian's "jessie", Ubuntu's "trusty", and Windows' "longhorn". Non-numeric versions are often hijacked for branding purposes. 
Numerical versions' technical utility is much more important to preserve.</p> <h2 id="case_study_chrome_vs_firefox"><a href="#case_study_chrome_vs_firefox" class="toclink">Case Study: Chrome vs. Firefox</a></h2> <p>We take our version expectations for granted, but a convention this fundamental has profound effects at scale. As mentioned above, higher versions are expected to be better, especially within a project. But there is at least one case where this impact very publicly spilled out across projects: <em>The Chrome-Firefox Version Wars</em>.</p> <p>When <a href="https://en.wikipedia.org/wiki/Google_Chrome">Google Chrome</a> entered the browser race, it brought with it a fast feature release schedule and a versioning system to match. This versioning system had Chrome see a dozen major releases while <a href="https://en.wikipedia.org/wiki/Firefox">Firefox</a> was still 3.x. Firefox looked like it was being left in the dust, despite the fact that Chrome was less mature and, as anyone who used it at the time can attest, Chrome 4 wasn't half the browser Firefox 4 ended up being.</p> <p>After a couple years of <a href="http://lowendmac.com/musings/11mm/version-numbers.html">this onslaught</a>, Firefox switched its versioning system to match. Now, despite browsing for hours a day, few users or even developers <a href="http://www.extremetech.com/internet/92792-mozilla-takes-firefox-version-number-removal-a-step-further">could tell you</a> off the top of their heads what version of Firefox/Chrome they use.<sup id="fnref:3"><a class="footnote-ref" href="https://sedimental.org/designing_a_version.html#fn:3">3</a></sup></p> <p>SemVer ignored this huge precedent, <a href="http://semver.org/#if-even-the-tiniest-backwards-incompatible-changes-to-the-public-api-require-a-major-version-bump-wont-i-end-up-at-version-4200-very-rapidly">harshly judging fast-moving projects</a>. 
Let's call that our last straw and look at an alternative.</p> <h2 id="calendar_versioning"><a href="#calendar_versioning" class="toclink">Calendar Versioning</a></h2> <p><img align="right" width="110px" src="https://sedimental.org/uploads/illo/caltree_med.png" /> If you're an earnest engineer with honest intents of creating, releasing, and maintaining a project, then calendar versioning may be for you. <a href="http://calver.org">CalVer</a> fulfills all of <a href="https://sedimental.org/designing_a_version.html#collective_expectations">the versioning expectations</a>, so what advantages does it bring?</p> <h3 id="calver_leverages_natural_understanding"><a href="#calver_leverages_natural_understanding" class="toclink">CalVer leverages natural understanding</a></h3> <p>People are calendar-oriented. Practically, it's just easier to remember that a library was causing a live issue back in 2013 than it is to remember that up until version 1.6.18 that library had a lot of bugs.</p> <p>Furthermore, in long-term development, releases pile up and increasingly large major versions blur together. Browser versions have been rendered meaningless. But the calendar is one construct where numbers increase and cycle regularly. Leveraging that natural understanding anchors otherwise arbitrary versions.</p> <h3 id="calver_has_better_semantics"><a href="#calver_has_better_semantics" class="toclink">CalVer has better semantics</a></h3> <p>Ironically yes.</p> <p>"Semantic" Versioning is all relative. One developer's 1.0.0 is another's 0.0.1alpha. As authors, we try to ignore this and write others off as wrong. But as clients, we make snap judgments, and SemVer lets us forget and pretend. Calendar versioning is absolute and neutral, with practical advantages to boot.</p> <p>As application developers adding functionality, evaluating a new library means ascertaining maintenance status, usually by looking at the most recent release date. CalVer puts us in the ballpark right away. 
As maintainers depending on many libraries, calendar versioning allows us to look at the dependency list and quickly ascertain which libraries are good candidates for updating. CalVer even lets us take that a step further, with date-based deprecation.</p> <p>Many might not realize it, but the oh-so ubiquitous <a href="http://www.ubuntu.com/">Ubuntu</a> is in fact calendar versioned. For example, version 15.04 came out in April, 2015. It gets better when you remember Long-Term Support. Ubuntu's LTS support lasts for five years. So, <code>14 + 5</code>: Ubuntu 14.04's end of life will be in 2019. You don't have to look anything up. It's all right there in the CalVer semantics.<sup id="fnref:4"><a class="footnote-ref" href="https://sedimental.org/designing_a_version.html#fn:4">4</a></sup></p> <h3 id="calver_protects_projects"><a href="#calver_protects_projects" class="toclink">CalVer protects projects</a></h3> <p>If you care about the future of the project, then guard it against one of the worst fates: the fatuous 2.0. Give your project a future. Guard against the learned expectation of 2.0 or death.</p> <p>A 1.x <em>always</em> carries one advantage over a 2.0: the code is deployed and working. Avoid contempt for past decisions and current users. In engineering, utility is half of correctness.</p> <p>SemVer is set up so that every major release implies a minimum threshold of change. If the project is founded on and aiming for correctness, fewer and fewer changes are required. <a href="https://en.wikipedia.org/wiki/Donald_Knuth">Donald Knuth</a> embraced this in the extreme by having <a href="https://en.wikipedia.org/wiki/TeX#History">TeX's version approach π asymptotically</a>. Suffice to say with CalVer, you are safe to add as much or as little functionality as needed.</p> <p><a id="successors"></a> Too often projects become a victim of versioning. New projects end up masquerading as new versions. 
<a href="https://d3js.org/">D3</a> could have been <a href="http://mbostock.github.io/protovis/">Protovis</a> 2.0, but instead, a successor was created. Both projects coexisted and we are all the better for it. Same with <a href="https://attrs.readthedocs.org/en/stable/why.html#characteristic">characteristic and attrs</a>. Successors and CalVer protect projects and do justice by clients and code.</p> <h2 id="summary"><a href="#summary" class="toclink">Summary</a></h2> <p>Consider adding a calendar component to your next library's versioning schemes. As for my opinion, I've joined <a href="https://twitter.com/hynek">other</a> <a href="https://twitter.com/glyph">maintainers</a> in doing so for <a href="https://github.com/mahmoud/boltons/blob/master/CHANGELOG.md">boltons</a> and <a href="https://github.com/mahmoud/ashes/">ashes</a>. I've found it makes a lot of sense for libraries, and a little less sense for protocols and services.<sup id="fnref:5"><a class="footnote-ref" href="https://sedimental.org/designing_a_version.html#fn:5">5</a></sup></p> <p>Either way, think about project versions. The version is part of your project's face and your clients' integration. After spending days, weeks, and months on a project, it's worthwhile to spend a few minutes or hours designing a versioning system tailored to the needs of project users and maintainers.</p> <p><em>If you're into enterprise software considerations like these, <a href="https://sedimental.org/atom.xml">subscribe</a> or <a href="https://twitter.com/mhashemi">follow me on Twitter</a> for some details about my upcoming O'Reilly project.</em></p> <!-- If you don't have time to think about the version of the library you are writing or including, then maybe you shouldn't be writing including it. --> <!-- ============== !--> <div class="footnote"> <hr /> <ol> <li id="fn:1"> <p>Astute readers will note that it's Semantic Versioning 2.0.0. 
<em>"Oh, cute, Tom used his own scheme for the document."</em> But did you wonder what public API changed to trigger that major version bump? SemVer's public API has been semver.org <strong><a href="https://web.archive.org/web/20111207065319/http://semver.org/">since before 1.0</a></strong>. How about those semantics? <a class="footnote-backref" href="https://sedimental.org/designing_a_version.html#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p> </li> <li id="fn:2"> <p>I've actually been saying something similar, but more practical, for a long time:</p> <blockquote> <p>If both you (or your team) <strong>and</strong> a stranger (someone not directly advised) are both using a library in a production environment, the time for a major version has come.</p> </blockquote> <p>If it's just you and yours, that's understandable. Many great scientists took <a href="https://en.wikipedia.org/wiki/Self-experimentation_in_medicine">great risks with themselves</a> for the sake of progress. If it's just a stranger going against your explicit advice, then there's no accounting for such wildcards. But, if both groups are using something in production, then it's time to face the facts. Tie up the loosest of ends and give it a major version. 
<a class="footnote-backref" href="https://sedimental.org/designing_a_version.html#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p> </li> <li id="fn:3"> <p>Here are some more resources for those interested in the Firefox release switch up:</p> <ul> <li><a href="https://support.mozilla.org/en-US/questions/896705">Support forum discussion on FF major releases</a></li> <li><a href="http://www.pcworld.com/article/224842/why_firefox_rapid_release_schedule_is_a_bad_idea.html">Firefox Rapid Release Criticized</a></li> <li><a href="http://www.theverge.com/2012/7/9/3147445/mozilla-jono-dicarlo-rapid-releases-firefox">Former Mozilla dev Jono DiCarlo on Firefox Rapid Release</a></li> <li><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=678775">The Bugzilla bug for hiding the version number</a></li> </ul> <p>At the very least this should illustrate that versions matter. They're part of your project's identity. Design them to help your user. <a class="footnote-backref" href="https://sedimental.org/designing_a_version.html#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p> </li> <li id="fn:4"> <p>To illustrate the prevalence, there are actually many other examples of calendar versioning we take for granted. Off the top of my head I could think of Twisted, Windows 95/98/2000, and probably most ubiquitous: every mainstream car in circulation. <a href="https://sedimental.org/about.html" target="_blank">Email me</a> with more examples and I'll compile them somewhere. <a class="footnote-backref" href="https://sedimental.org/designing_a_version.html#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p> </li> <li id="fn:5"> <p>To illustrate, if I could have it my way, we'd have OpenSSL 16.x.x. That way I can easily complain if I find someone using 10.x.x in production. 
That said, TLS/1.3 seems better than TLS/16.0.</p> <p>My current thought is that protocols live outside of time, because I believe it's possible to complete a protocol, but an implementation is never done. <a class="footnote-backref" href="https://sedimental.org/designing_a_version.html#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p> </li> </ol> </div><p></p> <hr /> https://sedimental.org/getting_a_python_job.html Mahmoud Hashemi https://sedimental.org/ Getting a Python job 2016-01-25T13:02:00Z 2016-01-25T13:02:00Z <p><p>Every day, Python is the primary programming language for tens if not hundreds of thousands of professional engineers, analysts, and researchers, including yours truly. Given Python's "language of choice" status, what can you do to join those lucky ranks?</p> <p>It's a good question, and one I get often. Recently I was asked more publicly than usual. <a href="https://twitter.com/mkennedy">Michael Kennedy</a>, host of the <a href="https://talkpython.fm/">Talk Python to Me podcast</a>, asked me five questions on behalf of people early in their Python/programming careers:</p> <ol> <li>What kind of Python devs do you work with and interview?</li> <li>What is the most important piece of experience that you look for in a candidate?</li> <li>If someone is applying for their first job with you, what can they present to show they have the right skillset/education? <ul> <li>Open-source contributions</li> <li>Side projects</li> <li>Mobile phone apps</li> <li>Websites</li> <li>Code competitions</li> </ul> </li> <li>If you are presented with two candidates, one with a solid CS degree, and the other with 1-2 years of experience, which would you value more?</li> <li>Why did you hire the last person you hired?</li> </ol> <p>Here are my answers, the enterprise hiring perspective, as transcribed from my parts of <a href="https://talkpython.fm/episodes/show/41/getting-your-first-dev-job-as-a-python-developer-part-2">the panel discussion</a>.</p> <div 
class="toc"><span class="toctitle">Contents</span><ul><li><a href="#intro">Intro</a><li><a href="#my_type_of_hiring">My type of hiring</a><li><a href="#the_most_important_experience">The most important experience</a><li><a href="#side_experience">Side experience</a><li><a href="#formal_education">Formal education</a><li><a href="#last_hire">Last hire</a><li><a href="#takeaways">Takeaways</a></ul></div><h2 id="intro"><a href="#intro" class="toclink">Intro</a></h2> <p>Hi my name is Mahmoud Hashemi. I'm lead developer of Python Infrastructure <a href="https://medium.com/paypal-tech/search?q=python">at PayPal</a>, and I'm also the presenter of Enterprise Software with Python, coming soon <a href="http://www.oreilly.com/pub/au/6849">from O'Reilly</a>. Dedicated listeners may recognize my voice from <a href="https://talkpython.fm/episodes/show/4/enterprise-python-and-large-scale-projects">episode #4 of Talk Python to Me</a>, and it's great to be back on the show.</p> <h2 id="my_type_of_hiring"><a href="#my_type_of_hiring" class="toclink">My type of hiring</a></h2> <p><strong><em>What kind of Python devs do you work with and interview?</em></strong></p> <p>I work with Python infrastructure engineers. Software infrastructure is the foundation of all sorts of software development, from web to backend to batch to automation and tools. To do it well you have to have personal experience developing in two or more of those categories. For the past year or so, my team has been adjunct to the PayPal application security team, and that's who I'm hiring for right now. 
So a little plug, if you have at least five years of industry experience and want to get into some ultrahigh performance Python security work, shoot me an email at <a href="mailto:mahmoud@paypal.com">mahmoud@paypal.com</a>.</p> <p>All that said, one of the services the Python infrastructure team also performs is to do phone and in-person interviews for PayPal teams looking to expand their Python talent through hiring.</p> <h2 id="the_most_important_experience"><a href="#the_most_important_experience" class="toclink">The most important experience</a></h2> <p><strong><em>What is the most important piece of experience that you look for in a candidate?</em></strong></p> <p>The most important fundamental skills I look for are closely related to experience: environmental fluidity and personal learning ability.</p> <p>Wait, not Python? That's right. The fact is, for more junior jobs, the Python is going to be the easiest part of the job, and new hires have plenty of time to learn, plus the team is there to help. New developers will come up to speed quickly provided they're comfortable learning in the environment.</p> <p>As for environmental fluidity, specifically, PayPal uses a lot of Linux, so I look for candidates that can demonstrate familiarity at the console, interacting with the operating system. So while I don't usually give candidates complex algorithmic questions on the spot, I <em>do</em> log them into one of PayPal's test servers and have them do some basic debugging. For the experienced, you can almost feel them relaxing into a familiar environment. For the inexperienced, the terminal can be an aptly named dark and scary place. Either way, the command line is a foundational technology critical to enterprise work, and is not going away anytime soon. Lack of command line comfort is a big yellow flag, especially when Linux is so widespread and easy to experiment with on your own.</p> <p>The other characteristic I look for is learning ability. 
The skills to read and research naturally, absorbing and arranging information automatically. I've been burned once or twice by talented people who were too lazy to read the docs, or too intimidated to read the source code. You don't have to do it in big gulps, but you do need to do it consistently. So I usually look at what candidates have done to learn lately, and the sources they've been consulting. Show me some code you've written and what you learned during the process. Tell me about a project that sounds much simpler than it was. What sites taught you the web? Seen any noteworthy source code lately?</p> <p>On the other hand, I watch out for HackerNewsy types. My projects have topped HN several times in the last couple years, and some lurking is fine, but I want someone ready to outgrow that consumption and commodification of creative work interleaved with press releases. Someone ready to dedicate time to actually create the sorts of things that others will upvote.</p> <h2 id="side_experience"><a href="#side_experience" class="toclink">Side experience</a></h2> <p><strong><em>If someone is applying for their first job with you, what can they present to show they have the right skillset/education?</em></strong></p> <p>When it comes to first jobs and concrete projects, I'll look at anything and everything. With new developers it's just so rare to get someone with anything interesting in their GitHub or Bitbucket account, but that is definitely my first stop. Software is increasingly portfolio driven, and I do get a bit discouraged when I see a developer who doesn't have a GitHub, or a site, or even a blog. You can cram for an interview, and you can exaggerate on a resume, but you can't really fake a meaningful commit timeline going back a year or two. Even if it's just school projects, at least I could see you've tried and you have some basic git skills. Contributions to other projects tell a good story, too. 
You were probably using the project for something, being productive. You took the time to understand how it worked, you were able to communicate, and lived up to someone else's standards. That's stressful for a lot of people, but that's got a lot in common with enterprise development, too.</p> <p>Side projects and apps that run in environments similar to our own are very interesting. Mobile phone apps not as much. Code competitions and scores from reddit/stackoverflow/HN are OK, but honestly those skills don't apply that well internally. This may make me unpopular, but people who have high scores on all those sites are playing games that can lead them to be impatient and unhelpful with internal people and processes. That said, if you're someone who helps out with mentorship or even gets on IRC and answers questions, that could be great!</p> <h2 id="formal_education"><a href="#formal_education" class="toclink">Formal education</a></h2> <p><strong><em>If you are presented with two candidates, one with a solid CS degree, and the other with 1-2 years of experience, which would you value more?</em></strong></p> <p>Of the three hires I'd truly consider my "star" hires, none of them had a CS degree. Electrical engineering, math, and comparative literature. The things they had in common were voracious reading and extensive hours spent in some Python or POSIX environment.</p> <p>Computer science degrees aren't really necessary for the majority of enterprise work. Like I said before, environmental fluidity and willingness to read docs are far more important. A couple of CS classes get you some useful vocabulary and teach you time complexity.</p> <p>As for the concept of a degree in general, if you want to work at a big company, it's a lot easier to get in with a bachelor's. You don't need much more than that. The right two years of experience can go a long way in terms of skills development, but in terms of management marketability, no degree raises eyebrows in many cases. 
So, in short, for enterprise software, my observation is that a computer science degree is about as good as a non-CS degree plus 2 years' experience, which is about as good as no degree plus 4-5 years' experience, at least.</p> <p>Most professors and academic programs don't give you all that much pragmatic knowledge, even if it's pretty old stuff like emacs and terminal usage. Basically everything is about how you approach your assignments and free time. If you push beyond the requirements, you will learn much more.</p> <p>So, if you're in school, take an operating systems class. Take a networking class. Maybe a crypto class. You'll learn almost as much as running a shared server in your dorm. No, those are different types of knowledge, so consider doing both. If you're not in school, Coursera and other options are far better than nothing, and I'd like to hear about those experiences in interviews.</p> <h2 id="last_hire"><a href="#last_hire" class="toclink">Last hire</a></h2> <p><strong><em>Why did you hire the last person you hired?</em></strong></p> <p>I gave my most recent thumbs up to a developer who knew Django and was willing to continue working with it, but most importantly he could start on-site before the req closed. In large companies, empty seats have expiration dates, and everyone is willing to gamble. Because somebody is better than nobody, and even if they're worse than nobody, then you still get a backfill when they leave or are pushed out. This developer seems to be working out, though I only helped hire him for another team.</p> <p>The last engineer I hired onto my team was recruited over the course of two years. I met him at PyCon 2012 and we collaborated on a few open-source projects. Real recruiting can be a long process, not least because of weird budgeting and bureaucracy. So please don't get frustrated if you're still waiting on an email reply from me! 
:)</p> <h2 id="takeaways"><a href="#takeaways" class="toclink">Takeaways</a></h2> <p>Reduced to a few bullet points, here are the key characteristics:</p> <ul> <li>Environmental fluidity</li> <li>Reading ability and conceptual familiarity</li> <li>Command line comfort</li> <li>Not HackerNewsy</li> <li>Dedication. No technical butterflies here, please.</li> <li>Pragmatism, lack of frustration</li> <li>Management marketability</li> <li>Ability/willingness to work/train/visit onsite</li> </ul> <p>The other interviewees had some interesting things to say, as well. I recommend checking out <a href="https://talkpython.fm/episodes/show/41/getting-your-first-dev-job-as-a-python-developer-part-2">the full podcast</a>, now featuring <a href="https://talkpython.fm/episodes/transcript/41/getting-your-first-dev-job-as-a-python-developer-part-2">transcripts for everyone, not just me</a>. Thanks again to Michael for having me back!<p></p> <hr /> https://sedimental.org/rwc_2016_lightning_talk.html Mahmoud Hashemi https://sedimental.org/ RWC 2016 Lightning Talk 2016-01-07T12:20:00Z 2016-01-07T12:20:00Z <p><blockquote> <p><em>Today I had the pleasure of talking on stage for ~2 minutes at the <a href="http://www.realworldcrypto.com/">Real World Crypto 2016 conference</a> in Stanford, CA. This is a pseudotranscript of that lightning talk.</em></p> </blockquote> <p>I'm Mahmoud Hashemi and I work as a Lead Developer at PayPal. I mostly focus on <a href="https://medium.com/paypal-tech/search?q=python">Python frameworks and software infrastructure</a>, but for the last couple years I've been working on Application Security. In fact, my first assignment, back in late 2012, was reverse engineering and reimplementing Max Levchin's Certicom elliptic curve integration, in Python.</p> <p>These days I work on PayPal's comprehensive key management (and HSM integration) system. Suffice to say, we work a lot with encryption and secure sockets. 
<em>Also</em> suffice to say, we're a bit nervous about <a href="https://www.openssl.org/">OpenSSL</a>. With all the news lately we've started design discussions with regard to how we can hedge our OpenSSL bets.</p> <p>In Python, this translates to a <a href="https://www.python.org/dev/peps/pep-0249/">DBAPI 2.0</a>-like abstraction layer to enable swapping out security implementations. Like many <a href="https://en.wikipedia.org/wiki/Object-relational_mapping">ORMs</a> (e.g., <a href="http://www.sqlalchemy.org/">SQLAlchemy</a>), but for security. Honestly, there are usually better/more reasons to switch SSL implementations than relational databases. We want an API that allows us to leverage other great SSL implementations, including OpenSSL-derivatives like <a href="http://www.libressl.org/">LibreSSL</a>, as well as other implementations like <a href="https://www.wolfssl.com/wolfSSL/Home.html">WolfSSL</a>. PayPal already has a diverse SSL ecosystem, with multiple versions of OpenSSL and tons of JVM-based implementations, making it a great testbed ecosystem.</p> <p>To achieve this we're hoping to have some productive discussions with the experienced engineers and cryptographers that attend RWC. It's still very early days, and there are a lot of corner cases, so we'll need all the advice we can get. Help us invest in the algorithms, not the implementations. Design for replaceability, to avoid having 17-year-old libraries serving today's security-hungry Internet. 
You can contact me at <a href="https://github.com/mahmoud">github.com/mahmoud</a>, <a href="https://twitter.com/mhashemi">twitter.com/mhashemi</a>, or <a href="mailto:mahmoud@paypal.com">mahmoud@paypal.com</a>.</p> <p><img title="A partially obfuscated view from the stage of RWC2016" width="70%" src="https://sedimental.org/uploads/rwc2016_stage.jpg" /></p> <p><em>A partially obfuscated view from the stage of RWC2016</em><p></p> <hr /> https://sedimental.org/enterprise_overhaul_resolving_dns.html Mahmoud Hashemi https://sedimental.org/ Enterprise Overhaul: Resolving DNS 2015-12-21T03:13:00Z 2015-12-21T03:13:00Z <p><!-- Enterprise Overhaul: Resolving DNS --> <!-- Overhaul your DNS Resolution, Enterprise-style --> <!-- Overhauling DNS Resolutions for Enterprise Environments --> <!-- aka Using DNS in Enterprise Environments --> <!-- aka "In With The Old: Enterprise DNS Considerations" --> <p><em>Originally published on <a href="https://medium.com/paypal-tech/enterprise-overhaul-resolving-dns-521dac3ab601">the PayPal Engineering blog</a>. Republished here with minor modifications and updates.</em></p> <p>Everyone assumes all software engineers are great with numbers. If only they knew the truth. How many people's phone numbers can you recite? No peeking and emergency numbers don't count! Don't worry if you couldn't name that many. Here's the real embarrassing test of the day: How many sites' IP addresses can you name? No pinging and local subnets don't count!</p> <p><img width="50%" title="Most telephones still looked like this when DNS was invented." src="https://sedimental.org/uploads/illo/mjc/telephone.png" /><br /><em>Most telephones still looked like this when DNS was invented. 
Not pictured: the phonebook.</em></p> <p>Back in the mid-1980s, the first Domain Name System (<a href="https://en.wikipedia.org/wiki/Domain_Name_System">DNS</a>) implementations started putting our IP addresses into server-based contact lists and the Internet has never looked the same since. These days, we may associate DNS with large-scale networks, but it's important to remember that DNS really came from a very human distaste for numbers. Thirty years later, we engineers use it so much in normal Internet usage that it's easy to take for granted.</p> <p>DNS may be mature, but the fact of networks is that it always takes at least two to tango. As new technologies and deployments emerge, the implications of integrating with DNS must still be revisited. Your datacenter is not the Internet, even if it's in the cloud. This post looks at how to resolve a few of the DNS pitfalls preying on precious reliability and performance.</p> <!-- prevent potential pitfalls that prey on projects' precious performance and predictability. --> <h3 id="a_protocol_precaution"><a href="#a_protocol_precaution" class="toclink">A protocol precaution</a></h3> <p>The client side of DNS, <em>resolution</em>, is virtually all <a href="https://en.wikipedia.org/wiki/User_Datagram_Protocol">UDP</a>. This is interesting because UDP is designed as a lightweight, <a href="https://en.wikipedia.org/wiki/Reliability_%28computer_networking%29">unreliable</a> transport. However, in many of the most common use cases, DNS calls precede <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</a>-backed <a href="https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">HTTP</a> and other protocols based on reliable transports. This fundamental difference changes many things. Looking upstream, UDP does not load-balance like TCP. 
Because UDP is not connection-oriented or congestion-controlled, DNS traffic will act very differently at scale.</p> <p>So our first lesson is to stay true to the stateless nature of UDP and avoid putting <a href="https://www.f5.com/pdf/deployment-guides/dns-load-balancing-dg.pdf">stateful load balancers</a> in front of DNS infrastructure. Instead, configure clients and servers to conform to the built-in load-handling architecture of DNS. The Internet's DNS "deployment" is load balanced via its <a href="https://www.novell.com/documentation/dns_dhcp/?page=/documentation/dns_dhcp/dhcp_enu/data/behdbhhj.html">inherent hierarchy</a> and <a href="https://en.wikipedia.org/wiki/Anycast">IP Anycast</a>.</p> <h3 id="client_integration"><a href="#client_integration" class="toclink">Client integration</a></h3> <p>Back on the client side, you can do a lot to optimize and robustify your application's DNS integration. The first step is to take a hard look at your stack. Whether you're running Python, Java, JavaScript, or C++, the defaults may not be for you, especially when working with traffic within the datacenter.</p> <p>For example, while not supported here at PayPal, it's safe to say <a href="https://github.com/tornadoweb/tornado">Tornado</a> is a popular Python web framework, with many asynchronous networking features. But, silently and subtly, <a href="https://twitter.com/etrepum/status/585544395006550016">DNS is not one of them</a>. Tornado's <a href="http://tornado.readthedocs.org/en/latest/netutil.html#tornado.netutil.BlockingResolver">default DNS resolution behavior</a> will block the entire IO event loop, leading to big issues at scale.</p> <p>And that's just one example of library DNS defaults jeopardizing application reliability. Third-party packages and sometimes even builtins in Java, Node.js, Python, and other stacks are full of hidden DNS faux pas.</p> <p>For instance, the average off-the-shelf HTTP client seems like a neutral-enough component. 
Where would we be without reliable standbys like <a href="https://en.wikipedia.org/wiki/Wget">wget</a>? And that is how the trouble starts. The DNS defaults in most tools are designed to make for good Internet citizens, not reliable and performant enterprise foundations.</p> <p><a target="_blank" href="https://en.wikipedia.org/wiki/Domain_Name_System#Client_lookup"><img width="50%" title="The hops Internet applications make for you." src="https://sedimental.org/uploads/DNS_in_the_real_world.svg.png" /></a><br /><em>The hops Internet-connected applications make for you. It's no wonder the default timeout is 5000 milliseconds.</em></p> <p>The first difference is name resolution timeouts. By default, <a href="http://linux.die.net/man/5/resolv.conf">resolv.conf</a>, <a href="https://github.com/netty/netty/blob/1b8086a6c16319c93724d65af1c805363c03b6d0/resolver-dns/src/main/java/io/netty/resolver/dns/DnsNameResolver.java#L310">netty</a>, and <a href="http://c-ares.haxx.se/ares_init.html">c-ares</a> (gevent, Node.js, curl) are all configured to a whopping <strong>5 seconds</strong>. But this is your enterprise, your service, and your datacenter. Look at the <a href="https://en.wikipedia.org/wiki/Service-level_agreement" title="Service-Level Agreement">SLA</a> of your service and the reliability of your DNS. If your service can't take an extra 5000 milliseconds some percentage of the time, then you should lower that timeout. I've usually recommended 200 milliseconds or less. If your infrastructure can't resolve DNS faster than that, do one or more of the following:</p> <ol> <li>Put the authoritative DNS servers topologically closer.</li> <li>Add caching DNS servers, maybe even on the same machine.</li> <li>Build application-level DNS caching.</li> </ol> <p>Option #1 is purely a network issue, and a matter for network operations to discuss. 
For brevity's sake, option #2 <a href="https://www.digitalocean.com/community/tutorials/how-to-configure-bind-as-a-caching-or-forwarding-dns-server-on-ubuntu-14-04">is outside</a> <a href="https://help.ubuntu.com/community/Dnsmasq">the scope</a> <a href="http://cr.yp.to/djbdns/dnscache.html">of</a> <a href="https://www.unbound.net/">this article</a>. But option #3 is the one we recommend most, because it is bureaucracy-free and relatively easy to implement, even with enterprise considerations.</p> <h4 id="application_level_dns_caching"><a href="#application_level_dns_caching" class="toclink">Application-level DNS caching</a></h4> <p>When designing an enterprise application-level DNS cache, we must recognize that we are not discussing standard-issue web components like scrapers and browsers. Most enterprise services talk to a fixed set of relatively few machines. Even the most powerful and complex production PayPal services communicate with fewer than 200 addresses, partly due to the prevalence of load balancing LTMs in our architecture.</p> <p>For our <a href="https://medium.com/paypal-tech/introducing-support-98945f023a8e">gevent-based Python stack</a>, we use an asynchronous DNS cache that refreshes those addresses every five minutes. Plus, the stack warms up our application's DNS cache by kicking off preresolution of many known DNS-addressed hosts at startup, ensuring that the first requests are as fast as later ones.</p> <!-- Linux's DNS behavior is provided via glibc. The same library that brought you string formatting and basic time functions, also nonchalantly provides DNS capabilities, with all its nuances. (TODO: how well does this resolver/cache play with TTLs?)--> <p>Some may be asking, why use a custom, application-level DNS cache when virtually every operating system caches DNS automatically? In short, when the OS cache expires, the next DNS resolution will block, causing stacks without this asynchronous DNS cache to block on the next resolution. 
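</p> <p>To make the idea concrete, here is a minimal sketch of a stale-while-refresh DNS cache. It is not PayPal's implementation: the <code>StaleDNSCache</code> name, its parameters, and the use of plain threads in place of gevent are all assumptions made for illustration. It shows the two behaviors this section describes, warming the cache at startup and answering from a mildly stale entry while a background refresh runs.</p> <pre class="codehilite"><code class="language-python">import socket
import threading
import time


class StaleDNSCache(object):
    """Serve cached addresses and refresh expired ones in the background.

    Hypothetical sketch: the class name, the ttl default, and the
    resolver/clock hooks are illustrative assumptions.
    """

    def __init__(self, ttl=300.0, resolver=socket.gethostbyname,
                 clock=time.time):
        self.ttl = ttl
        self.resolver = resolver
        self.clock = clock
        self._lock = threading.Lock()
        self._entries = {}        # host -> (address, resolved_at)
        self._refreshing = set()  # hosts with a refresh in flight

    def _store(self, host):
        # Blocking lookup; the result is recorded with its resolution time.
        address = self.resolver(host)
        with self._lock:
            self._entries[host] = (address, self.clock())
        return address

    def _refresh(self, host):
        try:
            self._store(host)
        except Exception:
            pass  # DNS is failing: keep serving the stale entry
        finally:
            with self._lock:
                self._refreshing.discard(host)

    def warm_up(self, hosts):
        # Preresolve known hosts at startup so first requests are fast.
        for host in hosts:
            self._store(host)

    def resolve(self, host):
        start_refresh = False
        with self._lock:
            entry = self._entries.get(host)
            if entry is not None:
                stale = self.clock() - entry[1] >= self.ttl
                if stale and host not in self._refreshing:
                    self._refreshing.add(host)
                    start_refresh = True
        if entry is None:
            # Only a host that was never resolved blocks the caller.
            return self._store(host)
        if start_refresh:
            worker = threading.Thread(target=self._refresh, args=(host,))
            worker.daemon = True
            worker.start()
        return entry[0]  # possibly stale, but returned immediately
</code></pre> <p>Passing in the resolver and clock keeps the sketch testable without a network, and is also a natural place to hang the logging and instrumentation mentioned in this section. The five-minute default simply mirrors the refresh interval above; in a gevent stack the thread would presumably be a greenlet.</p> <p>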
Our DNS cache allows us to use mildly stale addresses while the cache is refreshing, making us robust to many DNS issues. For our use cases both the chances and consequences of connecting to the wrong server are so minute that it's not worth inflating outlier response times by inlining DNS. This arrangement also makes services much more robust to network glitches and DNS outages, as well as allowing for more logging and instrumentation around the explicit DNS resolution so you can see when DNS is performing badly.</p> <h3 id="denecessitizing_dns"><a href="#denecessitizing_dns" class="toclink">Denecessitizing DNS</a></h3> <p>The overhaul wouldn't be complete without exploring one final scenario. What's it like to not use DNS at all? It may sound odd, given the number of technologies built on DNS in the last 30 years. But even today, PayPal production services still communicate to each other using a statically generated IP-address-based system, like a souped-up <a href="https://en.wikipedia.org/wiki/Hosts_%28file%29">hosts file</a>. This design decision long predates my tenure here, and for a long time I considered it technical debt. But after collaborating with architects here and at other enterprise datacenters, I've come to appreciate the advantages of skipping DNS. DNS was designed for multi-authority, federated, eventually-consistent networks, like the Internet. Even the biggest datacenters are not the Internet. A datacenter is topologically smaller, has only one operational authority, and must meet much tighter reliability requirements.</p> <p><img title="A little peek at PayPal's midtier-to-midtier traffic." width="50%" src="https://sedimental.org/uploads/pp_midtier.png" /><br /> <em>A little peek at PayPal's midtier-to-midtier traffic. Each shrunken line of text is a service endpoint. 
It looks like a lot, but each endpoint only talks to a few others.</em></p> <p>Whether or not your system uses DNS, when you own the entire network it's still best practice to maintain a central, version-controlled, "single source of truth" repository for networking configurations. After all, even DNS server configurations have to come from somewhere. If it were possible to efficiently and reliably push that same information to every client, would you? Explicit preresolution of all service names reduces the window of inconsistency while saving the datacenter billions of network requests. If you already have a scalable deployment system, could it also fill the network topology gap, saving you the trouble of overhauling, scaling, and maintaining an Internet system for enterprise use? There's a lot packed in a question like that, but it's something to consider when designing your service ecosystem.</p> <h3 id="in_short"><a href="#in_short" class="toclink">In short</a></h3> <p>So, to sum it all up, here are the key takeaways:</p> <ul> <li>Beware the pitfalls of stateful load-balancing for DNS and UDP.</li> <li>Tighten up your timeouts according to your SLAs.</li> <li>Consider an in-application DNS cache with explicit resolution.</li> <li>The fastest and most reliable request is the request you don't have to make.</li> <li>A datacenter is not the Internet.</li> </ul> <p>If you're not careful, out-of-box solutions will fill your inbox with avoidable problems. Quality enterprise engineering means taking a microscope to libraries, with deliberate overhauling for your organization's needs.</p> <!-- "DNS + HTTP: The Reliability and Performance of the Internet, Inside the Datacenter!" - Too many might not get the joke. 
--><p></p> <hr /> https://sedimental.org/announcing_the_hatnote_top_100.html Mahmoud Hashemi https://sedimental.org/ Announcing the Hatnote Top 100 2015-12-14T05:00:00Z 2015-12-14T05:00:00Z <p><p><em>Originally published on <a href="http://blog.hatnote.com/post/135182048397/announcing-the-hatnote-top-100">the Hatnote blog</a>.</em></p> <p>More so than any other major site, <a href="http://wikipedia.org/">Wikipedia</a> is centered around knowledge, always growing, and brimming with information. It's important to remember that the insight of our favorite community-run encyclopedia often follows the focus of its massive readership. Here at Hatnote, we've often wondered, what great new topics is the community learning about now?</p> <p>To shed more light on Wikipedia's reading habits, we're pleased to announce the newest addition to the Hatnote family: <strong>The Hatnote Top 100</strong>, available at <strong><a href="http://top.hatnote.com">top.hatnote.com</a></strong>. Because we can't pass up a good headwear-based pun.</p> <p>Updated daily, the Top 100 is a chart of the most-visited articles on Wikipedia. Unlike the edit-oriented <a href="http://listen.hatnote.com">Listen to Wikipedia</a> and <a href="http://weekly.hatnote.com">Weeklypedia</a>, Top 100 focuses on the biggest group of Wikipedia users: the readers. Nearly 20 billion times per month, <a href="http://blog.wikimedia.org/2013/04/19/wikimedia-projects-500-million/">around 500 million people</a> read articles in over 200 languages. Top 100's daily statistics offer a window into where Wikipedia readers are focusing their attention. 
It also makes for a great way to discover great chapters of Wikipedia one wouldn't normally read or edit.</p> <p><a href="http://top.hatnote.com" target="_blank"><img width="40%" title="A screenshot of the Hatnote Top 100 from December 10, 2015" src="https://41.media.tumblr.com/85ece35a58888f09b20733d6f0f3d0c2/tumblr_nzclixxtTV1s4aev9o1_1280.png" /></a></p> <p>Clear rankings, day-to-day differences, social media integration, permalinks, and other familiar simple-but-critical features were designed to make popular Wikipedia articles as relatable as albums on a pop music chart. In practice, popular news stories and celebrities definitely make the Top 100, but it is satisfying to see interesting corners of history and other educational topics sharing, if not dominating, the spotlight.</p> <p>In addition to a clear and readable report, Top 100 is also a machine-readable archive, with reports dating back to November 2015, including JSON versions of the metrics, as well as <a href="http://top.hatnote.com/about.html#feeds">RSS feeds for all supported languages and projects</a>. It's all available in over a dozen languages (and we <a href="https://github.com/hatnote/top/issues">take requests for more</a>). The data comes from a variety of sources, most direct from Wikimedia, including <a href="https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews">a new pageview statistics API endpoint</a> that we've been proud to pilot and continue to use. 
And yes, as with all our projects <a href="https://github.com/hatnote/top/">the code is open-source</a>, too.</p> <p>For those of you looking to dig deeper than Wikipedia chart toppers, there are several other activity-based projects worth mentioning:</p> <ul> <li><a href="http://stats.grok.se">stats.grok.se</a> - The original, venerable pageview grapher and API</li> <li><a href="https://reportcard.wmflabs.org/">Wikimedia Report Card</a> - Advanced metrics and data used by the Wikimedia Foundation</li> <li><a href="http://wikirank.di.unimi.it/">The Open Wikipedia Ranking</a> - Traffic stats and more</li> <li><a href="https://twitter.com/WikipediaTrends">@WikipediaTrends</a> - A bot posting notable upward traffic spikes</li> <li><a href="https://en.wikipedia.org/wiki/Wikipedia:Top_25_Report">The Top 25 Report</a> - A manually-compiled weekly report of views and likely reasons</li> <li><a href="http://weekly.hatnote.com">The Weeklypedia</a> - Weekly edit statistics, emailed and archived by Hatnote</li> </ul> <p>And there are other visualizations on <a href="http://seealso.org">seealso.org</a> as well. But for those who like to keep it simple, hit up the <a href="http://top.hatnote.com">Hatnote Top 100</a>, <a href="http://top.hatnote.com/about.html#feeds">subscribe to a feed</a>, and/or <a href="https://twitter.com/hatnotable">follow us on Twitter</a>. See you there!<p></p> <hr /> https://sedimental.org/repeat_the_obvious.html Mahmoud Hashemi https://sedimental.org/ Repeat the obvious 2015-11-09T00:05:00Z 2015-11-09T00:05:00Z <p><!-- aka "Disclaimer: You may have read this before"--> <p>Bad things happen when we don't repeat the obvious.</p> <!-- <a href="https://sedimental.org"><img height="300px" src="https://sedimental.org/uploads/repetition/blocks.jpg"></a> --> <p>It's 9pm and I'm writing a post for <a href="https://medium.com/paypal-tech/search?q=python">the company engineering blog</a>. Every sentence is a slog. 
Not because I'm exacting and conciseness isn't my strong suit. My writing is slow because every word is obvious, almost patronizing.</p> <p><a href="https://en.wikipedia.org/wiki/Sierpinski_triangle"><img width="80%" src="https://sedimental.org/uploads/repetition/800px-Sierpinski_triangle_evolution.svg.png" /></a></p> <p>Obvious realities bear repetition, and so must you. Common sense is not so common. The majority of ideas floating around try too hard. They're designed to confuse, seduce, and sell. Press releases and ads push to the forefront, while reviewed articles and texts sit on shelves and in queues.</p> <!--<a href="https://en.wikipedia.org/wiki/Repeat_sign"><img height="50px"src="https://sedimental.org/uploads/repetition/YB0340_Repetition_reprise_debut.png"></a> --> <p>Repeat the obvious, so we stay on the same page. <a href="http://learncodethehardway.org/">The ways</a> we <a href="https://web.archive.org/web/2023/http://recode.net/2015/02/14/obama-everybodys-got-to-learn-how-to-code/">rush people</a> into technology leave <a href="http://lifehacker.com/how-i-taught-myself-to-code-in-eight-weeks-511615189">little time</a> for foundations. Software is so new and developers so in demand, every wave brings more fresh minds than the last. Developers are arriving faster than knowledge can diffuse.</p> <p>Repeat the obvious, to keep perspective. Technology may favor the new, but fundamentals do exist. Without reminders, time buries working technologies in the dust of silence.</p> <p>Repeat the obvious, to avoid bizarre dark ages. Take functional programming's disappearance in the 1990s/2000s, cast aside in favor of object-oriented hype. 
Or that one time when not enough programmers talked about and taught <a href="https://en.wikipedia.org/wiki/Event_loop">event-driven server</a> programming and <a href="https://en.wikipedia.org/wiki/Node.js">Frankenstein</a> was cast as revolutionary.</p> <!-- <a href="https://en.wikipedia.org/wiki/Da_capo"><img height="75px"src="https://sedimental.org/uploads/repetition/YB0335_Repetition_dacapo.png"></a> --> <p>So I hope you'll forgive the repetition. It hurts me more than it hurts you, and believe me when I say it helps many. Documentation does not equal discussion. The modern media landscape demands a technology have both docs and discourse to remain useful.</p> <p>Until we live in a world where reference rules over repetition, you can help by writing about something painfully obvious to you. Bad things happen when we don't repeat the obvious.</p> <!-- https://www.flickr.com/photos/ryan_orr/467847865 --> <!-- TODO: --><p></p> <hr /> https://sedimental.org/remap.html Mahmoud Hashemi https://sedimental.org/ Remap: Nested Data Multitool for Python 2015-09-24T12:25:00Z 2015-09-24T12:25:00Z <p><blockquote> <p><em>This entry is the first in a series of "cookbooklets" showcasing more advanced <a href="https://boltons.readthedocs.org">Boltons</a>. 
If all goes well, the next 5 minutes will literally save you 5 hours.</em></p> </blockquote> <div class="toc"><span class="toctitle">Contents</span><ul><li><a href="#intro">Intro</a><li><a href="#normalize_keys_and_values">Normalize keys and values</a><li><a href="#drop_empty_values">Drop empty values</a><li><a href="#convert_dictionaries_to_ordereddicts">Convert dictionaries to OrderedDicts</a><li><a href="#sort_all_lists">Sort all lists</a><li><a href="#collect_interesting_values">Collect interesting values</a><li><a href="#add_common_keys">Add common keys</a><li><a href="#corner_cases">Corner cases</a><li><a href="#wrap_up">Wrap-up</a></ul></div><h2 id="intro"><a href="#intro" class="toclink">Intro</a></h2> <p>Data is everywhere, especially within itself. That's right, whether it's public APIs, document stores, or plain old configuration files, data <em>will</em> nest. And that nested data will find you.</p> <p><a href="https://en.wikipedia.org/wiki/Flat_design">UI fads</a> aside, developers have always liked "flat". Even Python, so often turned to for data wrangling, only has succinct built-in constructs for dealing with flat data. <a href="https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions">List comprehensions</a>, <a href="https://docs.python.org/2/reference/expressions.html#generator-expressions">generator expressions</a>, <a href="https://docs.python.org/2/library/functions.html#map">map</a>/<a href="https://docs.python.org/2/library/functions.html#filter">filter</a>, and <a href="https://docs.python.org/2/library/itertools.html">itertools</a> are all built for flat work. In fact, the allure of flat data is likely a direct result of this common gap in most programming languages.</p> <p><a href="https://commons.wikimedia.org/wiki/File:Russian-Matroshka2.jpg"> <img width="45%" src="https://sedimental.org/uploads/Russian-Matroshka2.jpg" /> </a></p> <p><strong>Let's change that.</strong> First, let's meet this nested adversary. 
Provided you overlook my taste in media, it's hard to fault nested data when it reads as well as this <a href="https://en.wikipedia.org/wiki/YAML">YAML</a>:</p> <pre class="codehilite"><code class="language-yaml">reviews:
  shows:
    - title: Star Trek - The Next Generation
      rating: 10
      review: Episodic AND deep. &lt;3 Data.
      tags: ['space']
    - title: Monty Python's Flying Circus
      rating: 10
      tags: ['comedy']
  movies:
    - title: The Hitchhiker's Guide to the Galaxy
      rating: 6
      review: So great to see Mos Def getting good work.
      tags: ['comedy', 'space', 'life']
    - title: Monty Python's Meaning of Life
      rating: 7
      review: Better than Brian, but not a Holy Grail, nor Completely Different.
      tags: ['comedy', 'life']
      prologue:
        title: The Crimson Permanent Assurance
        rating: 9
</code></pre> <p>Even this very straightforwardly nested data can be a real hassle to manipulate. How would one add a default review for entries without one? How would one convert the ratings to a 5-star scale? And what does all of this mean for more complex real-world cases, exemplified by this excerpt from <a href="https://api.github.com/users/mahmoud/events">a real GitHub API</a> response:</p> <p><a id="github_event_data"></a></p> <pre class="codehilite"><code class="language-json">[{
  "id": "3165090957",
  "type": "PushEvent",
  "actor": {
    "id": 130193,
    "login": "mahmoud",
    "gravatar_id": "",
    "url": "https://api.github.com/users/mahmoud",
    "avatar_url": "https://avatars.githubusercontent.com/u/130193?"
  },
  "repo": {
    "id": 8307391,
    "name": "mahmoud/boltons",
    "url": "https://api.github.com/repos/mahmoud/boltons"
  },
  "payload": {
    "push_id": 799258895,
    "size": 1,
    "distinct_size": 1,
    "ref": "refs/heads/master",
    "head": "27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260",
    "before": "0d6486c40282772bab232bf393c5e6fad9533a0e",
    "commits": [
      {
        "sha": "27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260",
        "author": {
          "email": "mahmoud@hatnote.com",
          "name": "Mahmoud Hashemi"
        },
        "message": "switched reraise_visit to be just a kwarg",
        "distinct": true,
        "url": "https://api.github.com/repos/mahmoud/boltons/commits/27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260"
      }
    ]
  },
  "public": true,
  "created_at": "2015-09-21T10:04:37Z"
}]
</code></pre> <p>The astute reader may spot some inconsistency and general complexity, but don't run away.</p> <p><big><strong>Remap</strong>, the <a href="https://en.wikipedia.org/wiki/Recursion_(computer_science)">recursive</a> <a href="https://docs.python.org/2/library/functions.html#map">map</a>, is here to save the day.</big></p> <p>Remap is a Pythonic traversal utility that creates a transformed copy of your nested data. It uses three callbacks -- <code>visit</code>, <code>enter</code>, and <code>exit</code> -- and is designed to accomplish the vast majority of tasks by passing only one function, usually <code>visit</code>. <a href="http://boltons.readthedocs.org/en/latest/iterutils.html#boltons.iterutils.remap">The API docs have full descriptions</a>, but the basic rundown is:</p> <ul> <li><code>visit</code> transforms an individual item</li> <li><code>enter</code> controls how container objects are created and traversed</li> <li><code>exit</code> controls how new container objects are populated</li> </ul> <p>It may sound complex, but the examples shed a lot of light. 
So let's get remapping!</p> <h2 id="normalize_keys_and_values"><a href="#normalize_keys_and_values" class="toclink">Normalize keys and values</a></h2> <p>First, let's import the modules and data we'll need.</p> <pre class="codehilite"><code class="language-python">import json
import yaml  # https://pypi.org/pypi/PyYAML
from boltons.iterutils import remap  # https://pypi.org/pypi/boltons

# media_reviews and github_events hold the YAML and JSON text shown above
review_map = yaml.safe_load(media_reviews)
event_list = json.loads(github_events)
</code></pre> <p>Now let's turn back to that GitHub API data. Earlier one may have been annoyed by the inconsistent type of <code>id</code>. <code>event['repo']['id']</code> is an integer, but <code>event['id']</code> is a string. When sorting events by ID, you would not want <a href="https://en.wikipedia.org/wiki/Lexicographical_order">string ordering</a>.</p> <p>With <code>remap</code>, fixing this sort inconsistency couldn't be easier:</p> <pre class="codehilite"><code class="language-python">from boltons.iterutils import remap

def visit(path, key, value):
    if key == 'id':
        return key, int(value)
    return key, value

remapped = remap(event_list, visit=visit)
assert remapped[0]['id'] == 3165090957

# You can even do it in one line:
remap(event_list, lambda p, k, v: (k, int(v)) if k == 'id' else (k, v))
</code></pre> <p>By default, <code>visit</code> gets called on every item in the root structure, including <a href="https://docs.python.org/2/tutorial/datastructures.html#more-on-lists">lists</a>, <a href="https://docs.python.org/2/tutorial/datastructures.html#dictionaries">dicts</a>, and other containers, so let's take a closer look at its signature. 
<code>visit</code> takes three arguments we're going to see in all of remap's callbacks:</p> <ul> <li><code>path</code> is a <a href="https://docs.python.org/2/tutorial/datastructures.html#tuples-and-sequences">tuple</a> of keys leading up to the current item</li> <li><code>key</code> is the current item's key</li> <li><code>value</code> is the current item's value</li> </ul> <p><code>key</code> and <code>value</code> are exactly what you would expect, though it may bear mentioning that the <code>key</code> for a list item is its index. <code>path</code> refers to the keys of all the parents of the current item, not including the <code>key</code>. For example, looking at <a href="https://sedimental.org/remap.html#github_event_data">the GitHub event data</a>, the commit author's name's path is <code>(0, 'payload', 'commits', 0, 'author')</code>, because the key, <code>name</code>, is located in the author of the first commit in the payload of the first event.</p> <p>As for the return signature of <code>visit</code>, it's very similar to the input. Just return the new <code>(key, value)</code> you want in the remapped output.</p> <h2 id="drop_empty_values"><a href="#drop_empty_values" class="toclink">Drop empty values</a></h2> <p>Next up, GitHub's move away from <a href="https://en.wikipedia.org/wiki/Gravatar">Gravatars</a> left an artifact in their API: a blank <code>'gravatar_id'</code> key. We can get rid of that item, and any other blank strings, in a jiffy:</p> <pre class="codehilite"><code class="language-python">drop_blank = lambda p, k, v: v != ""

remapped = remap(event_list, visit=drop_blank)
assert 'gravatar_id' not in remapped[0]['actor']
</code></pre> <p>Unlike the previous example, instead of a <code>(key, value)</code> pair, this <code>visit</code> is returning a <code>bool</code>. For added convenience, when <code>visit</code> returns <code>True</code>, <code>remap</code> carries over the original item unmodified. 
Returning <code>False</code> drops the item from the remapped structure.</p> <p>With the ability to arbitrarily transform items, pass through old items, and drop items from the remapped structure, it's clear that the <code>visit</code> function makes the majority of recursive transformations trivial. So many tedious and error-prone lines of traversal code turn into one-liners that usually <code>remap</code> with a <code>visit</code> callback is all one needs. With that said, the next recipes focus on <code>remap</code>'s more advanced callable arguments, <code>enter</code> and <code>exit</code>.</p> <h2 id="convert_dictionaries_to_ordereddicts"><a href="#convert_dictionaries_to_ordereddicts" class="toclink">Convert dictionaries to OrderedDicts</a></h2> <p>So far we've looked at actions on remapping individual items, using the <code>visit</code> callable. Now we turn our attention to actions on containers, the parent objects of individual items. We'll start doing this by looking at the <code>enter</code> argument to <code>remap</code>.</p> <pre class="codehilite"><code class="language-python"># from collections import OrderedDict
from boltons.dictutils import OrderedMultiDict as OMD
from boltons.iterutils import remap, default_enter

def enter(path, key, value):
    if isinstance(value, dict):
        return OMD(), sorted(value.items())
    return default_enter(path, key, value)

remapped = remap(review_map, enter=enter)
assert list(remapped['reviews'].keys())[0] == 'movies'
# True because 'reviews' is now ordered and 'movies' comes before 'shows'
</code></pre> <p>The <code>enter</code> callable controls both if and how an object is traversed. Like <code>visit</code>, it accepts <code>path</code>, <code>key</code>, and <code>value</code>. But instead of <code>(key, value)</code>, it returns a tuple of <code>(new_parent, items)</code>. <code>new_parent</code> is the container that will receive items remapped by the <code>visit</code> callable. 
<code>items</code> is an iterable of <code>(key, value)</code> pairs that will be passed to <code>visit</code>. Alternatively, <code>items</code> can be <code>False</code>, to tell remap that the current value should not be traversed, but that's getting pretty advanced. The API docs have some other <code>enter</code> details to consider.</p> <p>Also note how this code builds on the default remap logic by calling through to the <code>default_enter</code> function, imported from the same place as <code>remap</code> itself. Most practical use cases will want to do this, but of course the choice is yours.</p> <h2 id="sort_all_lists"><a href="#sort_all_lists" class="toclink">Sort all lists</a></h2> <p>The last example used <code>enter</code> to interact with containers before they were traversed. This time, to sort all lists in a structure, we'll use <code>remap</code>'s final callable argument: <code>exit</code>.</p> <pre class="codehilite"><code class="language-python">from boltons.iterutils import remap, default_exit

def exit(path, key, old_parent, new_parent, new_items):
    ret = default_exit(path, key, old_parent, new_parent, new_items)
    if isinstance(ret, list):
        try:
            ret.sort()
        except TypeError:
            pass  # Python 3 refuses to order dicts; leave such lists as-is
    return ret

remap(review_map, exit=exit)
</code></pre> <p>Similar to the <code>enter</code> example, we're building on <code>remap</code>'s default behavior by importing and calling <code>default_exit</code>. Looking at the arguments passed to <code>exit</code> and <code>default_exit</code>, there's the <code>path</code> and <code>key</code> that we're used to from <code>visit</code> and <code>enter</code>. <code>value</code> is there, too, but it's named <code>old_parent</code>, to differentiate it from the new value, appropriately called <code>new_parent</code>. 
At the point <code>exit</code> is called, <code>new_parent</code> is just an empty structure as constructed by <code>enter</code>, and <code>exit</code>'s job is to fill that new container with <code>new_items</code>, a list of <code>(key, value)</code> pairs returned by <code>remap</code>'s calls to <code>visit</code>. Still with me?</p> <p>Either way, here we don't interact with the arguments. We just call <code>default_exit</code> and work on its return value, <code>new_parent</code>, sorting it in-place if it's a <code>list</code>. Pretty simple! In fact, <em>very</em> attentive readers might point out this can be done with <code>visit</code>, because <code>remap</code>'s very next step is to call <code>visit</code> with the <code>new_parent</code>. You'll have to forgive the contrived example and let it be a testament to the rarity of overriding <code>exit</code>. Without going into the details, <code>enter</code> and <code>exit</code> are most useful when teaching <code>remap</code> how to traverse nonstandard containers, such as non-iterable Python objects. As mentioned in the <a href="https://sedimental.org/remap.html#drop_empty_values">"drop empty values"</a> example, <code>remap</code> is designed to maximize the mileage you get out of the <code>visit</code> callback. Let's look at an advanced use that shows why that's true.</p> <h2 id="collect_interesting_values"><a href="#collect_interesting_values" class="toclink">Collect interesting values</a></h2> <p>Sometimes you just want to traverse a nested structure, and you don't need the result. For instance, if we wanted to collect the full set of tags used in media reviews. 
Let's create a <code>remap</code>-based function, <code>get_all_tags</code>:</p> <pre class="codehilite"><code class="language-python">def get_all_tags(root):
    all_tags = set()

    def visit(path, key, value):
        all_tags.update(value['tags'])
        return False

    remap(root, visit=visit, reraise_visit=False)

    return all_tags

print(get_all_tags(review_map))
# {'space', 'comedy', 'life'} (set order may vary)
</code></pre> <p>Like the first recipe, we've used the <code>visit</code> argument to <code>remap</code>, and like the second recipe, we're just returning <code>False</code>, because we don't actually care about the contents of the resulting structure.</p> <p>What's new here is the <code>reraise_visit=False</code> keyword argument, which tells <code>remap</code> to <strong>keep</strong> any item that causes a <code>visit</code> exception. This practical convenience lets <code>visit</code> functions be shorter, clearer, and just more <acronym title="Easier to Ask Forgiveness than Permission"><a href="https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Exceptions">EAFP</a></acronym>. Reducing the example to a one-liner is left as an exercise to the reader.</p> <h2 id="add_common_keys"><a href="#add_common_keys" class="toclink">Add common keys</a></h2> <p>As a final advanced <code>remap</code> example, let's look at adding items to structures. Through the examples above, we've learned that <code>visit</code> is best-suited for 1:1 transformations and dropping values. This leaves us with two main approaches for addition. 
The first uses the <code>enter</code> callable and is suitable for making data consistent and adding data which can be overridden.</p> <pre class="codehilite"><code class="language-python">base_review = {'title': '', 'rating': None, 'review': '', 'tags': []}

def enter(path, key, value):
    new_parent, new_items = default_enter(path, key, value)
    try:
        new_parent.update(base_review)
    except AttributeError:
        pass  # lists, strings, and numbers have no update()
    return new_parent, new_items

remapped = remap(review_map, enter=enter)
assert remapped['reviews']['shows'][1]['review'] == ''
# True, the placeholder review is holding its place
</code></pre> <p>The second method uses the <code>exit</code> callback to override values and calculate new values from the new data.</p> <pre class="codehilite"><code class="language-python">def exit(path, key, old_parent, new_parent, new_items):
    ret = default_exit(path, key, old_parent, new_parent, new_items)
    try:
        ret['review_length'] = len(ret['review'])
    except (KeyError, TypeError):
        pass  # no 'review' key, or not a dict at all
    return ret

remapped = remap(review_map, exit=exit)
assert remapped['reviews']['shows'][0]['review_length'] == 27
assert remapped['reviews']['movies'][0]['review_length'] == 42
# True times two.
</code></pre> <p>By now you might agree that <code>remap</code> is making such feats positively routine. Come for the nested data manipulation, stay for the <a href="https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker's_Guide_to_the_Galaxy#Answer_to_the_Ultimate_Question_of_Life.2C_the_Universe.2C_and_Everything_.2842.29">number jokes</a>.</p> <h2 id="corner_cases"><a href="#corner_cases" class="toclink">Corner cases</a></h2> <p>This whole guide has focused on data that came from "real-world" sources, such as JSON API responses. But there are certain rare cases which typically only arise from within Python code: <a href="http://pythondoeswhat.blogspot.com/2015/09/loopy-references.html">self-referential objects</a>. These are objects that contain references to themselves or their parents. 
Have a look at this trivial example:</p> <pre class="codehilite"><code class="language-python">self_ref = []
self_ref.append(self_ref)
</code></pre> <p>The experienced programmer has probably seen this before, but most Python coders might even think the second line is an error. It's a list containing itself, and it has the rather cool <a href="https://docs.python.org/2/reference/datamodel.html#object.__repr__">repr</a>: <code>[[...]]</code>.</p> <p>Now, this is pretty rare, but reference loops do come up in programming. The <em>good</em> news is that remap handles these just fine:</p> <pre class="codehilite"><code class="language-python">print(repr(remap(self_ref)))
# prints "[[...]]"
</code></pre> <p>The more common corner case that arises is that of duplicate references, which remap also handles with no problem:</p> <pre class="codehilite"><code class="language-python">my_set = set()

dupe_ref = (my_set, [my_set])
remapped = remap(dupe_ref)

assert remapped[0] is remapped[-1][-1]
# True, of course
</code></pre> <p>Two references to the same set go in, two references to a copy of that set come out. That's right: only one copy is made, and then used twice, preserving the original structure.</p> <h2 id="wrap_up"><a href="#wrap_up" class="toclink">Wrap-up</a></h2> <p>If you've made it this far, then I hope you'll agree that <code>remap</code> is useful enough to be your new friend. If that wasn't enough detail, then <a href="http://boltons.readthedocs.org/en/latest/iterutils.html#boltons.iterutils.remap">there are the docs</a>. <code>remap</code> is <a href="https://github.com/mahmoud/boltons/blob/master/tests/test_iterutils.py">well-tested</a>, but making something this general-purpose is a tricky area. Please <a href="https://github.com/mahmoud/boltons/issues">file bugs and requests</a>. 
Don't forget about <a href="https://docs.python.org/2/library/pprint.html">pprint</a> and <a href="https://docs.python.org/2/library/repr.html">repr</a>/<a href="https://docs.python.org/3/library/reprlib.html">reprlib</a>, which can help with reading large structures. As always, <a href="https://twitter.com/mhashemi">stay tuned</a> for <a href="https://sedimental.org/tagged/boltons/">future boltons cookbooklets</a>, and much much more.</p> <p><a href="https://commons.wikimedia.org/wiki/File:First_matryoshka_museum_doll_open.jpg"> <img src="https://sedimental.org/uploads/First_matryoshka_museum_doll_open.jpg" /> </a></p> <!-- TODO: closing matroska image --> <!-- """The marker approach to solving self-reference problems in remap won't work because we can't rely on exit returning a traversable, mutable object. We may know that the marker is in the items going into exit but there's no guarantee it's not being filtered out or being made otherwise inaccessible for other reasons. On the other hand, having enter return the new parent instance before it's populated is a pretty workable solution. The division of labor stays clear and exit still has some override powers. Also note that only mutable structures can have self references (unless getting really nasty with the Python C API). The downside is that enter must do a bit more work and in the case of immutable collections, the new collection is discarded, as a new one has to be created from scratch by exit. The code is still pretty clear overall. Not that remap is supposed to be a speed demon, but here are some thoughts on performance. Memorywise, the registry grows linearly with the number of collections. The stack of course grows in proportion to the depth of the data. Many intermediate lists are created, but for most data list comprehensions are much faster than generators (and generator expressions). The ABC isinstance checks are going to be dog slow. 
As soon as a couple of large enough use cases cross my desk, I'll be sure to profile and optimize. It's not a question of if isinstance+ABC is slow, it's which pragmatic alternative passes tests while being faster. ## Remap design principles Nested structures are common. Virtually all compact Python iterative interaction is flat (list comprehensions, map/filter, generator expressions, itertools, even other iterutils). remap is a succinct solution to both quick and dirty data wrangling, as well as expressive functional interaction with nested structures. * visit() should be able to handle 80% of my pragmatic use cases, and the argument/return signature should be similarly pragmatic. * enter()/exit() are for more advanced use cases and the signature can be more complex. * 95%+ of applications should be covered by passing in only one callback. * Roundtripping should be the default. Don't repeat the faux pas of HTMLParser where, despite the nice SAX-like interface, it is impossible (or very difficult) to regenerate the input. Roundtripped results compare as equal, realistically somewhere between copy.copy and copy.deepcopy. * Leave streaming for another day. Generators can be handy, but the vast majority of data is of easily manageable size. Besides, there's no such thing as a streamable dictionary. """ --><p></p> <hr /> https://sedimental.org/python_community_intro.html Mahmoud Hashemi https://sedimental.org/ Python Community Intro 2015-09-22T00:00:00Z 2015-09-22T00:00:00Z <p><p>The <a href="https://www.python.org/psf/" title="Python Software Foundation">PSF</a> just created a new mailing list, <a href="https://mail.python.org/mailman/listinfo/psf-community">"PSF-Community"</a>, then autosubscribed a bunch of people and solicited introductions. At first I was surprised, but I was quickly charmed by <a href="https://mail.python.org/pipermail/psf-community/2015-September/thread.html">the response</a> and joined in on the action. 
Here's what I <a href="https://mail.python.org/pipermail/psf-community/2015-September/000084.html">wrote</a>:</p> <blockquote> <p>If Alex Martelli <a href="https://mail.python.org/pipermail/psf-community/2015-September/000081.html">is doing it</a>, then brace yourselves because the floodgates are open.</p> <p>I first used Python as a junior in a South Dakota high school, off a Knoppix CD because "<a href="https://en.wikipedia.org/wiki/Live_CD">Live CDs</a>" were all the rage then. It was a good fad because I didn’t have a computer, and the Windows machines at school weren’t writable and didn’t have Python (2.2 at the time). I read a bit of the tutorial and wrote a really bad <a href="https://en.wikipedia.org/wiki/Generating_primes#Prime_sieves">prime number sieve</a>.</p> <p>After a professional loop through Java, C++, C#, and finally PHP, I resumed Python development in 2009 as a full-stack web developer at PayPal. I wrote the tool that (still) manages all the pricing arrangements.</p> <p>From there I hired my first teammate and we wrote a couple other business-critical components before standardizing out PayPal’s first grassroots alternative stack. That was early 2011 and since then we’ve had <a href="https://medium.com/paypal-tech/10-myths-of-enterprise-python-8302b8f21f82">a lot of fun</a> and <a href="https://medium.com/paypal-tech/introducing-support-98945f023a8e">come so far</a>. Now we’re focusing on PayPal’s security offerings: putting Python at the very heart of PayPal’s availability model, handling <em>billions</em> of requests per day. And believe me when I say that’s it’s the best thing that’s happened to PayPal’s security in a long time! The details will have to wait for a future blog post (and upcoming O’Reilly project). Or, if you’re remotely as excited as I am, you can email me directly. 
:)</p> <p>On the side, I really enjoy working on <a href="http://listen.hatnote.com/#en">Wikipedia</a>-<a href="http://weekly.hatnote.com/">based</a> <a href="http://rcmap.hatnote.com/#en,de,ru,ja,es,fr">projects</a> <a href="http://seealso.hatnote.com/">under</a> <a href="https://twitter.com/hatnotable">the banner of</a> <a href="http://blog.hatnote.com/">Hatnote</a>, all Python. Most recently, we did <a href="http://blog.hatnote.com/post/124069724187/wikipedia-and-ifttt-a-technical-guide">the official Wikipedia IFTTT channel</a> (handling 1.3 million requests per day). And because I can’t get enough, a bunch of <a href="https://github.com/mahmoud">open-source stuff</a>, most notably <a href="http://boltons.readthedocs.org/en/latest/">Boltons</a>, where I’ve been particularly busy lately.</p> <p>If you’re in the Bay Area, do <em>not</em> hesitate to reach out to talk about Python, Wikipedia, security, federated and open systems (like BBS stuff), or even PayPal!</p> <p>Specifically, this is sort of odd, but October 14th at 1pm, I'm doing an overview of Python usage at PayPal, and would like to invite anyone senior and curious to be my guest and come to PayPal in San Jose to check it out. Guido came in 2012 and <a href="https://www.flickr.com/photos/mahmoudhashemi/16860083512/in/album-72157651024763880/">he loved it</a>. And stuff now is waaaay cooler!</p> <p>Anyways, I just wanted to end by saying thanks to you all. If you hadn't been so numerous and out there, I probably would have gotten myself fired long before any of this bore fruit. 
;)</p> <p>THANKS!</p> <p>Mahmoud</p> </blockquote> <p>There were a lot of autosubscribed folks deploring the spammish inquisition and threatening unsubscription, so here's hoping my straw didn't break too many camels' backs.<p></p> <hr /> https://sedimental.org/10_myths_of_enterprise_python.html Mahmoud Hashemi https://sedimental.org/ 10 Myths of Enterprise Python 2015-08-25T00:00:00Z 2015-08-25T00:00:00Z <p><p><em>(Originally posted <a href="https://medium.com/paypal-tech/search?q=python">on the PayPal Engineering blog</a>, reproduced here with minor updates, link fixes, etc.)</em></p> <p>PayPal enjoys a remarkable amount of linguistic pluralism in its programming culture. In addition to the long-standing popularity of C++ and Java, an increasing number of teams are choosing JavaScript and Scala, and <a href="https://www.braintreepayments.com/">Braintree</a>'s acquisition has introduced a sophisticated Ruby community.</p> <p>One language in particular has both a long history at eBay and PayPal and a growing mindshare among developers: <a href="https://www.python.org/">Python</a>.</p> <p>Python has enjoyed many years of grassroots usage and support from developers across eBay. Even before official support from management, technologists of all walks went the extra mile to reap the rewards of developing in Python. 
I joined PayPal a few years ago, and chose Python to work on internal applications, but I've personally found production PayPal Python code from nearly <strong>15 years ago</strong>.</p> <p>Today, Python powers <strong>over 50 projects</strong>, including:</p> <ul> <li><strong>Features and products</strong>, such as <strong>eBay Now</strong> and <a href="https://www.crunchbase.com/organization/redlaser">RedLaser</a></li> <li><strong>Operations and infrastructure</strong>, both <a href="http://www.openstack.org/">OpenStack</a> and proprietary</li> <li><strong>Mid-tier services and applications</strong>, like the one used to set PayPal's prices and check customer feature eligibility</li> <li><strong>Monitoring agents and interfaces</strong>, used for several deployment and security use cases</li> <li><strong>Batch jobs for data import</strong>, price adjustment, and more</li> <li>And far too many developer tools to count</li> </ul> <p>In the coming series of posts I'll detail the initiatives and technologies that led the eBay/PayPal Python community to grow from just under 25 engineers in 2011 to <strong>over 260</strong> in 2014. For this introductory post, I'll be focusing on the 10 myths I've had to debunk the most in eBay and PayPal's enterprise environments.</p> <p><a name="myth-1"></a></p> <h3 id="myth_1_python_is_a_new_language"><a href="#myth_1_python_is_a_new_language" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-new" name="python-is-new">Myth #1</a>: Python is a new language</a></h3> <p>What with all the startups using it and <a href="http://www.nostarch.com/pythonforkids">kids learning it these days</a>, it's easy to see how this myth still persists. Python is actually <a href="https://en.wikipedia.org/wiki/Python_(programming_language)#History">over 23 years old</a>, originally released in 1991, 4 years before Java. 
A now-famous early usage of Python was in 1996: <a href="https://news.ycombinator.com/item?id=8587697">Google's first successful web crawler</a>.</p> <p>If you're curious about the long history of Python, <a href="https://en.wikipedia.org/wiki/Guido_van_Rossum">Guido van Rossum</a>, Python's creator, <a href="http://python-history.blogspot.com/2009/01/introduction-and-overview.html">has taken the care to tell the whole story</a>.</p> <p><a name="myth-2"></a></p> <h3 id="myth_2_python_is_not_compiled"><a href="#myth_2_python_is_not_compiled" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-not-compiled" name="python-is-not-compiled">Myth #2</a>: Python is not compiled</a></h3> <p>While not requiring a separate compiler toolchain like C++, Python is in fact compiled to bytecode, much like Java and many other compiled languages. Further compilation steps, if any, are at the discretion of the runtime, be it CPython, PyPy, Jython/JVM, IronPython/CLR, or some other process virtual machine. See <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-6">Myth #6</a> for more info.</p> <p>The general principle at PayPal and elsewhere is that the compilation status of code should not be relied on for security. It is much more important to secure the runtime environment, as virtually every language <a href="https://docs.python.org/2/library/dis.html">has</a> <a href="http://boomerang.sourceforge.net/">a</a> <a href="http://jd.benow.ca/">decompiler</a>, or <a href="https://docs.python.org/2/library/site.html">can</a> <a href="http://www.opensourceforu.com/2011/08/lets-hook-a-library-function/">be</a> <a href="http://docs.oracle.com/javase/6/docs/api/java/lang/instrument/Instrumentation.html">intercepted</a> to dump protected state. 
See the next myth for even more Python security implications.</p> <p><a name="myth-3"></a></p> <h3 id="myth_3_python_is_not_secure"><a href="#myth_3_python_is_not_secure" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-not-secure" name="python-is-not-secure">Myth #3</a>: Python is not secure</a></h3> <p>Python's affinity for the lightweight may not make it seem formidable, but the intuition here can be misleading. One central tenet of security is to present as small a target as possible. Big systems are anti-secure, as they tend to <a href="http://www.jwz.org/xscreensaver/toolkits.html">overly centralize behaviors</a>, as well as <a href="https://www.schneier.com/essays/archives/1999/11/a_plea_for_simplicit.html">undercut developer comprehension</a>. Python keeps these demons at bay by encouraging simplicity. Furthermore, <a href="https://en.wikipedia.org/wiki/CPython">CPython</a> addresses these issues by being a simple, stable, and easily-auditable virtual machine. In fact, a recent analysis by <a href="http://www.coverity.com/why-coverity/">Coverity</a> Software <a href="http://www.coverity.com/press-releases/coverity-finds-python-sets-new-level-of-quality-for-open-source-software/">resulted in CPython receiving their highest quality rating</a>.</p> <p>Python also features an extensive array of open-source, industry-standard security libraries. At PayPal, where we take security and trust very seriously, we find that a combination of <a href="https://docs.python.org/2/library/hashlib.html">hashlib</a>, <a href="https://github.com/dlitz/pycrypto">PyCrypto</a>, and <a href="https://www.openssl.org/">OpenSSL</a>, via <a href="https://github.com/pyca/pyopenssl">PyOpenSSL</a> and our own custom bindings, cover all of PayPal's diverse security and performance needs.</p> <p>For these reasons and more, Python has seen some of its fastest adoption at PayPal (and eBay) within the application security group. 
Here are just a few security-based applications utilizing Python for PayPal's security-first environment:</p> <ul> <li>Creating security agents for facilitating key rotation and consolidating cryptographic implementations</li> <li>Integrating with industry-leading <acronym title="Hardware Security Module"><a href="https://en.wikipedia.org/wiki/Hardware_security_module">HSM</a></acronym> technologies</li> <li>Constructing TLS-secured wrapper proxies for less-compliant stacks</li> <li>Generating keys and certificates for our internal mutual-authentication schemes</li> <li>Developing active vulnerability scanners</li> </ul> <p>Plus, myriad Python-built operations-oriented systems with security implications, such as firewall and connection management. In the future we'll definitely try to put together a deep dive on PayPal Python security particulars.</p> <p><a name="myth-4"></a></p> <h3 id="myth_4_python_is_a_scripting_language"><a href="#myth_4_python_is_a_scripting_language" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-for-scripting" name="python-is-for-scripting">Myth #4</a>: Python is a scripting language</a></h3> <p>Python can indeed be used for scripting, and is one of the forerunners of the domain due to its simple syntax, cross-platform support, and ubiquity among Linux, Macs, and other Unix machines.</p> <p>In fact, Python may be one of the most flexible technologies among general-use programming languages. 
To list just a few:</p> <ol> <li>Telephony infrastructure (<a href="https://en.wikipedia.org/wiki/Twilio">Twilio</a>)</li> <li>Payments systems (<a href="https://en.wikipedia.org/wiki/PayPal">PayPal</a>, Venmo)</li> <li>Neuroscience and psychology (<a href="http://www.frontiersin.org/neuroinformatics/researchtopics/Python_in_neuroscience/8">citation</a>)</li> <li>Numerical analysis and engineering (<a href="https://en.wikipedia.org/wiki/NumPy">numpy</a>, <a href="https://numba.pydata.org/">numba</a>, and <a href="https://wiki.python.org/moin/NumericAndScientific">many more</a>)</li> <li>Animation (<a href="https://en.wikipedia.org/wiki/LucasArts">LucasArts</a>, <a href="https://disneyanimation.com/open-source/">Disney</a>, <a href="https://en.wikipedia.org/wiki/DreamWorks_Animation">Dreamworks</a>)</li> <li>Gaming backends (<a href="https://en.wikipedia.org/wiki/Eve_Online">Eve Online</a>, <a href="https://en.wikipedia.org/wiki/Second_Life">Second Life</a>, <a href="https://en.wikipedia.org/wiki/Battlefield_(series)">Battlefield</a>, and <a href="https://wiki.python.org/moin/PythonGames">so many others</a>)</li> <li>Email infrastructure (<a href="https://en.wikipedia.org/wiki/GNU_Mailman">Mailman</a>, <a href="https://www.mailgun.com/">Mailgun</a>)</li> <li>Media storage and processing (<a href="https://en.wikipedia.org/wiki/YouTube">YouTube</a>, <a href="http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances">Instagram</a>, <a href="https://tech.dropbox.com/">Dropbox</a>)</li> <li>Operations and systems management (<a href="https://en.wikipedia.org/wiki/Rackspace">Rackspace</a>, <a href="http://www.openstack.org/">OpenStack</a>)</li> <li>Natural language processing (<a href="http://www.nltk.org/">NLTK</a>)</li> <li>Machine learning and computer vision (<a href="http://scikit-learn.org/stable/">scikit-learn</a>, <a href="http://orange.biolab.si/">Orange</a>, <a href="http://simplecv.org/">SimpleCV</a>)</li> 
<li>Security and penetration testing (<a href="https://github.com/dloss/python-pentest-tools">so many</a>)</li> <li>Big Data (<a href="http://discoproject.org/">Disco</a>, <a href="http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/">Hadoop support</a>)</li> <li>Internet infrastructure (DNS) (BIND 10)</li> </ol> <p>Not to mention websites and web services aplenty. In fact, PayPal engineers seem to have a penchant for going on to start Python-based web properties. <a href="https://en.wikipedia.org/wiki/YouTube">YouTube</a> and <a href="http://yelp.com">Yelp</a>, for instance.</p> <p><a name="myth-5"></a></p> <h3 id="myth_5_python_is_weakly_typed"><a href="#myth_5_python_is_weakly_typed" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-weakly-typed" name="python-is-weakly-typed">Myth #5</a>: Python is weakly-typed</a></h3> <p>Python's type system is characterized by strong, dynamic typing. <a href="https://en.wikipedia.org/wiki/Type_system">Wikipedia can explain more</a>.</p> <p>Not that it is a competition, but as a fun fact, Python is more strongly-typed than Java. Java has a split type system for primitives and objects, with <code>null</code> lying in a sort of gray area. On the other hand, modern Python has a unified strong type system, where the type of <code>None</code> is well-specified. 
Furthermore, the JVM itself is also dynamically-typed, as it <a href="https://en.wikipedia.org/wiki/HotSpot#History">traces its roots back</a> to an implementation of a Smalltalk VM acquired by Sun.</p> <p><a href="https://docs.python.org/2/reference/datamodel.html">Python's type system</a> is very nice, but for enterprise use there are much bigger concerns at hand.</p> <p><a name="myth-6"></a></p> <h3 id="myth_6_python_is_slow"><a href="#myth_6_python_is_slow" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-is-slow" name="python-is-slow">Myth #6</a>: Python is slow</a></h3> <p>First, a critical distinction: Python is a programming language, not a runtime. There are several Python implementations:</p> <ol> <li><a href="https://en.wikipedia.org/wiki/CPython"><strong>CPython</strong></a> is the reference implementation, and also the most widely distributed and used.</li> <li><a href="https://en.wikipedia.org/wiki/Jython"><strong>Jython</strong></a> is a mature implementation of Python for usage with the JVM.</li> <li><a href="https://en.wikipedia.org/wiki/IronPython"><strong>IronPython</strong></a> is Microsoft's Python for the Common Language Runtime, aka .NET.</li> <li><a href="http://pypy.org/"><strong>PyPy</strong></a> is an up-and-coming implementation of Python, with advanced features such as JIT compilation, incremental garbage collection, and more.</li> </ol> <p>Each runtime has its own performance characteristics, and none of them are slow per se. The more important point here is that it is a mistake to assign performance assessments to a programming language. 
Always assess an application runtime, most preferably against a particular use case.</p> <p>Having cleared that up, here is a small selection of cases where Python has offered significant performance advantages:</p> <ol> <li>Using <a href="https://en.wikipedia.org/wiki/NumPy">NumPy</a> as <a href="https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl">an interface to Intel's MKL SIMD</a></li> <li><a href="http://pypy.org/">PyPy</a>'s JIT compilation <a href="http://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-again-string.html">achieves faster-than-C performance</a></li> <li><a href="https://disqus.com/">Disqus</a> scales from <a href="http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views">250 to 500 million users on the same 100 boxes</a></li> </ol> <p>Admittedly these are not the newest examples, just my favorites. It would be easy to get side-tracked into the wide world of high-performance Python and the unique offerings of runtimes. Instead of addressing individual special cases, attention should be drawn to the generalizable impact of developer productivity on end-product performance, especially in an enterprise setting.</p> <p>Given enough time, a disciplined developer can execute the only proven approach to achieving accurate and performant software:</p> <ol> <li><strong>Engineer</strong> for correct behavior, including the development of respective tests</li> <li><strong>Profile</strong> and measure performance, identifying bottlenecks</li> <li><strong>Optimize</strong>, paying proper respect to the test suite and <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's Law</a>, and taking advantage of Python's strong roots in C.</li> </ol> <p>It might sound simple, but even for seasoned engineers, this can be a very time-consuming process. Python was designed from the ground up with developer timelines in mind. 
In our experience, it's not uncommon for Python projects to undergo three or more iterations in the time it takes C++ and Java projects to do just one. Today, PayPal and eBay have seen multiple success stories wherein Python projects outperformed their C++ and Java counterparts, all thanks to fast development times enabling careful tailoring and optimization. You know, the fun stuff.</p> <p><a name="myth-7"></a></p> <h3 id="myth_7_python_does_not_scale"><a href="#myth_7_python_does_not_scale" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-does-not-scale" name="python-does-not-scale">Myth #7</a>: Python does not scale</a></h3> <p>Scale has many definitions, but by any definition, <a href="https://www.youtube.com/yt/press/statistics.html">YouTube is a web site at scale</a>. More than 1 billion unique visitors per month, over 100 hours of uploaded video per minute, and going on 20% of peak Internet bandwidth, all with Python as a core technology. <a href="http://techcrunch.com/2013/07/11/how-did-dropbox-scale-to-175m-users-a-former-engineer-details-the-early-days/">Dropbox</a>, <a href="http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views">Disqus</a>, <a href="http://www.infoworld.com/article/2608078/application-development/expert-interview--how-to-scale-django.html">Eventbrite</a>, <a href="http://highscalability.com/blog/2013/8/26/reddit-lessons-learned-from-mistakes-made-scaling-to-1-billi.html">Reddit</a>, <a href="http://www.slideshare.net/twilio/asynchronous-architectures-for-implementing-scalable-cloud-services-evan-cooke-gluecon-2012">Twilio</a>, <a href="http://www.slideshare.net/twilio/asynchronous-architectures-for-implementing-scalable-cloud-services-evan-cooke-gluecon-2012">Instagram</a>, <a href="http://www.slideshare.net/YelpEngineering/scale-presentation-michael-stoppelman-oct-2014">Yelp</a>, <a href="http://highscalability.com/eve-online-architecture">EVE Online</a>, <a 
href="http://highscalability.com/second-life-architecture-grid">Second Life</a>, and, yes, eBay and PayPal all have Python scaling stories that prove scale is more than just possible: it's a pattern.</p> <p>The key to success is simplicity and consistency. CPython, the primary Python virtual machine, maximizes these characteristics, which in turn makes for a very predictable runtime. One would be hard pressed to find Python programmers concerned about garbage collection pauses or application startup time. With strong platform and networking support, Python naturally lends itself to smart horizontal scalability, as manifested in systems like <a href="http://bittorrent.cvs.sourceforge.net/viewvc/bittorrent/BitTorrent/">BitTorrent</a>.</p> <p>Additionally, scaling is all about measurement and iteration. Python is built with <a href="https://docs.python.org/2/library/profile.html">profiling</a> and optimization in mind. See <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-6">Myth #6</a> for more details on how to vertically scale Python.</p> <p><a name="myth-8"></a></p> <h3 id="myth_8_python_lacks_good_concurrency_support"><a href="#myth_8_python_lacks_good_concurrency_support" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-lacks-concurrency" name="python-lacks-concurrency">Myth #8</a>: Python lacks good concurrency support</a></h3> <p>Occasionally, while I'm debunking <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-6">performance</a> and <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-7">scaling</a> myths, someone tries to get technical: "Python lacks concurrency," or, "What about the GIL?" 
If dozens of counterexamples are insufficient to bolster one's confidence in Python's ability to scale vertically and horizontally, then an extended explanation of a <a href="https://en.wikipedia.org/wiki/CPython">CPython</a> implementation detail probably won't help, so I'll keep it brief.</p> <p>Python has great concurrency primitives, including generators, <a href="https://greenlet.readthedocs.org/en/latest/">greenlets</a>, <a href="https://twistedmatrix.com/documents/14.0.0/core/howto/defer.html">Deferreds</a>, and <a href="http://pythonhosted.org/futures/">futures</a>. Python has great concurrency frameworks, including <a href="https://eventlet.readthedocs.io/en/latest/">eventlet</a>, <a href="http://www.gevent.org/">gevent</a>, and <a href="https://twisted.org/">Twisted</a>. Python has had some amazing work put into customizing runtimes for concurrency, including <a href="http://www.stackless.com/">Stackless</a> and <a href="http://pypy.org/">PyPy</a>. All of these and more show that there is no shortage of engineers effectively and unapologetically using Python for concurrent programming. Also, all of these are officially supported and/or used in enterprise-level production environments. For examples, refer to <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-7">Myth #7</a>.</p> <p>The Global Interpreter Lock, or GIL, is a performance optimization for most use cases of Python, and a development ease optimization for virtually all CPython code. The GIL makes it much easier to use OS threads or <a href="https://en.wikipedia.org/wiki/Green_threads">green threads</a> (greenlets usually), and does not affect using multiple processes. 
For more information, <a href="http://programmers.stackexchange.com/questions/186889/why-was-python-written-with-the-gil">see this great Q&amp;A on the topic</a> and <a href="https://docs.python.org/3/library/concurrency.html">this overview from the Python docs</a>.</p> <p>Here at PayPal, a typical service deployment entails multiple machines, with multiple processes, multiple threads, and a very large number of greenlets, amounting to a very robust and scalable concurrent environment. In most enterprise environments, parties tend to prefer a fairly high degree of overprovisioning, for general prudence and disaster recovery. Nevertheless, in some cases Python services still see millions of requests per machine per day, handled with ease.</p> <p><a name="myth-9"></a></p> <h3 id="myth_9_python_programmers_are_scarce"><a href="#myth_9_python_programmers_are_scarce" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-programmers-scarce" name="python-programmers-scarce">Myth #9</a>: Python programmers are scarce</a></h3> <p>There is some truth to this myth. There are not as many Python web developers as PHP or Java web developers. This is probably mostly due to a combined interaction of industry demand and education, though <a href="http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-us-universities/fulltext">trends in education suggest that this may change</a>.</p> <p>That said, Python developers are far from scarce. There are millions worldwide, as evidenced by the dozens of Python conferences, tens of thousands of StackOverflow questions, and companies like YouTube, Bank of America, and LucasArts/Dreamworks employing Python developers by the hundreds and thousands. At eBay and PayPal we have hundreds of developers who use Python on a regular basis, so what's the trick?</p> <p>Well, why scavenge when one can create? 
Python is exceptionally easy to learn, and is a first programming language <a href="http://www.nostarch.com/pythonforkids">for children</a>, <a href="http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-189-a-gentle-introduction-to-programming-using-python-january-iap-2011/">university students</a>, and <a href="https://developers.google.com/edu/python/?csw=1">professionals</a> alike. At eBay, it only takes one week to show real results for a new Python programmer, and they often really start to shine as quickly as 2-3 months, all made possible by the Internet's rich cache of interactive tutorials, books, documentation, and open-source codebases.</p> <p>Another important factor to consider is that projects using Python simply do not require as many developers as other projects. As mentioned in Myth #7, lean, effective teams like Instagram are a common trope in Python projects, and this has certainly been our experience at eBay and PayPal.</p> <p><a name="myth-10"></a></p> <h3 id="myth_10_python_is_not_for_big_projects"><a href="#myth_10_python_is_not_for_big_projects" class="toclink"><a href="https://sedimental.org/10_myths_of_enterprise_python.html#python-not-for-big-projects" name="python-not-for-big-projects">Myth #10</a>: Python is not for big projects</a></h3> <p><a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-7">Myth #7</a> discussed running Python projects at scale, but what about <em>developing</em> Python projects at scale? As mentioned in <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-9">Myth #9</a>, most Python projects tend not to be people-hungry. 
While Instagram reached hundreds of millions of hits a day at the time of their <a href="http://www.slate.com/blogs/business_insider/2013/11/14/facebook_s_1_billion_instagram_buy_did_kevin_systrom_sell_too_soon.html">billion dollar acquisition</a>, the whole company was <a href="http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances">still only a group of a dozen or so people</a>. Dropbox in 2011 <a href="http://www.forbes.com/sites/victoriabarret/2011/10/18/dropbox-the-inside-story-of-techs-hottest-startup/">only had 70 engineers</a>, and other teams were similarly lean. So, can Python scale to large teams?</p> <p>Bank of America actually has <a href="http://news.efinancialcareers.com/us-en/173476/investment-banking-tech-guru-quits-starts-firm/">over 5,000 Python developers, with over 10 million lines of Python in one project alone</a>. JP Morgan underwent <a href="http://www.quora.com/When-why-and-to-what-extent-did-Bank-of-America-rebuild-its-entire-tech-stack-with-Python">a similar transformation</a>. YouTube also has engineers in the thousands and lines of code <a href="http://highscalability.com/blog/2012/3/26/7-years-of-youtube-scalability-lessons-in-30-minutes.html">in the millions</a>. Big products and big teams use Python every day, and while it has excellent modularity and packaging characteristics, beyond a certain point much of the general development scaling advice stays the same. Tooling, strong conventions, and code review are what make big projects a manageable reality.</p> <p>Luckily, Python starts with a good baseline on those fronts as well. 
We use <a href="https://github.com/pyflakes/pyflakes/">PyFlakes</a> and <a href="https://pypi.org/pypi/flake8">other tools</a> to perform static analysis of Python code before it gets checked in, as well as adhering to <a href="https://www.python.org/dev/peps/pep-0008/">PEP8</a>, Python's language-wide base style guide.</p> <p>Finally, it should be noted that, in addition to the scheduling speedups mentioned in <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-6">Myth #6</a> and <a href="https://sedimental.org/10_myths_of_enterprise_python.html#myth-7">#7</a>, projects using Python generally require fewer developers, as well. Our most common success story starts with a Java or C++ project slated to take a team of 3-5 developers somewhere between 2-6 <em>months</em>, and ends with a single motivated developer completing the project in 2-6 <strong>weeks</strong>. It's not unheard of for some projects to take hours instead of weeks, as well.</p> <p>A miracle for some, but a fact of modern development, and often a necessity for a competitive business.</p> <h3 id="a_clean_slate"><a href="#a_clean_slate" class="toclink">A clean slate</a></h3> <p>Mythology can be a fun pastime. Discussions around these myths remain some of the most active and educational, both internally and externally, because implied in every myth is a recognition of Python's strengths. Also, remember that the appearance of these seemingly tedious and troublesome concerns is a sign of steadily growing interest, and with steady influx of interested parties comes the constant job of education. Here's hoping that this post manages to extinguish a flame war and enable a project or two to talk about the real work that can be achieved with Python.</p> <p>Keep an eye out for future posts where I'll dive deeper into the details touched on in this overview. If you absolutely must have details before then, shoot me an email at mahmoud@paypal.com. 
Until then, happy coding!<p></p> <hr /> https://sedimental.org/designing_a_fast.html Mahmoud Hashemi https://sedimental.org/ Designing a fast 2015年07月16日T00:00:00Z 2015年07月16日T00:00:00Z <p><p>I wake up with a jolt, spilling most of my breakfast cereal onto a thirsty couch. My eyes find the clock. Cleaning will have to wait. I'm downing water like there's no tomorrow, but really tomorrow starts in one minute. Still drinking. All work is thirsty work if the day is long enough, and engineering is no exception. Time's up.</p> <p>From the literal break of dawn to sunset, no food, drink, or other respite. It's <a href="https://en.wikipedia.org/wiki/Ramadan">Ramadan</a>. What does this mean, practically? Well, summertime here in Silicon Valley, it means from 4am to 9pm, I battle human nature while writing emails and software. But, far from an antiquated ritual, I see Ramadan as an exercise in lifestyle design.</p> <p><a href="https://www.flickr.com/photos/mahmoudhashemi/15900668295/in/album-72157647187331183/"> <img src="https://sedimental.org/uploads/silicon_valley_pano_1_med.jpg" width="100%" title="The South Bay packs nearly 500 hours of summertime sun into one month. Oh, goodie." /></a></p> <p>As we near the end of Ramadan <a href="https://en.wikipedia.org/wiki/Islamic_calendar">1436</a>, this year has proven that even in modern and diverse environs, every year brings the same reactions and questions as 1435. Mostly boiling down to:</p> <ul> <li><a href="https://sedimental.org/designing_a_fast.html#what-not-even-water">"What? Not even water?"</a></li> <li><a href="https://sedimental.org/designing_a_fast.html#why">"Why?"</a></li> <li><a href="https://sedimental.org/designing_a_fast.html#how">"How?"</a></li> </ul> <h2 id="what_not_even_water"><a href="#what_not_even_water" class="toclink">What? Not even water?</a></h2> <p>A bit facetious, but this really is the most common question I get. 
So just to be clear, traditional interpretation calls for no food, drink (including water), or drugs. From the crack of dawn to sunset. Or <a href="http://www.wunderground.com/sky/ShowSky.asp?TheLat=37.34486389&amp;TheLon=-121.88478088&amp;TimeZoneName=America/Los_Angeles">in the technical terms</a>, the beginning of sunrise's <a href="https://en.wikipedia.org/wiki/Twilight#Astronomical_twilight">astronomical twilight</a> to the beginning of sunset's <a href="https://en.wikipedia.org/wiki/Twilight#Civil_twilight">civil twilight</a>.</p> <p>Individuals adjust according to limitations. If you're not healthy enough to fast, you don't fast. If you feel like you can't complete a fast, you don't. If the <a href="http://www.theatlantic.com/international/archive/2013/07/how-to-fast-for-ramadan-in-the-arctic-where-the-sun-doesnt-set/277834/">sun doesn't set</a>, just do something reasonable. Your intentions are your own, and self-harm does not enter into the purposes of Ramadan.</p> <h2 id="why"><a href="#why" class="toclink">Why?</a></h2> <p>Everyone has their reasons, but first off Ramadan is not some sort of collective diet. Yes, Ramadan is used by many as a springboard to stymie smoking, overeating, and other unhealthy physical habits. But for me, fasting is about building four virtues:</p> <ul> <li>Empathy</li> <li>Reflection</li> <li>Discipline</li> <li>Confidence</li> </ul> <p>Not exactly the stuff of classrooms and annual compliance trainings. And yet people are expected to just find these characteristics within themselves, even in environments most antithetical. Countless well-compensated designers and engineers know about the limits of limitless life. 
We almost immediately <a href="http://www.fastcompany.com/3027379/work-smart/the-psychology-of-limitations-how-and-why-constraints-can-make-you-more-creative">pine</a> <a href="http://tympanus.net/codrops/2011/10/28/be-more-creative-through-design-constraints/">for</a> <a href="https://medium.com/the-year-of-the-looking-glass/constraints-are-hard-23a05df9bdce">constraints</a>. <a href="https://en.wikipedia.org/wiki/Negative_liberty">Negative liberty</a> only goes so far, then real freedom becomes about the ability to formulate and follow the orders you give yourself. Design grants creative autonomy, but design tools offer a hundred possibilities draped in a thousand distractions.</p> <p>Empathy is the most obvious trait built by fasting, and the one promoted most when I was younger. There are poor people in the world, and all should experience their hunger and thirst to understand. Fasting puts you on the path closest to the one they walk, building a visceral empathy that simple imagination can't match. When was the last time you were <a href="https://www.youtube.com/watch?v=oOg5VxrRTi0">hungry like the wolf</a>? One month of senses too sharp for civil society. One month of feeling the natural appetites object and interrupt your every thought. But it keeps one connected to so many people, from the most <a href="https://en.wikipedia.org/wiki/List_of_hunger_strikes">intense protesters</a> to as many as <a href="http://www.feedingamerica.org/hunger-in-america/impact-of-hunger/child-hunger/child-hunger-fact-sheet.html">a fifth of</a> <a href="http://people.uwec.edu/jamelsem/papers/healthy_lunch/taras_nutrition_paper.pdf">American students</a>.</p> <p>Reflection is critical to the Ramadan fast. Take away food and water, and within a few hours you're transported to the banks of a personal <a href="https://en.wikipedia.org/wiki/Walden">Walden Pond</a>. 
In much the same way that exercise burns off dirty, anxious energy, fasting stops it from being produced in the first place. It quiets the shores of one's psyche and in the stillness, all is clear. This is the part of Ramadan I look forward to most: a staycation from my usual self-imposed obligations. The line between essential and unnecessary is bright. I don't know much about meditation, but most days of the month, around sunset, I find a certain peaceful state, every thought sorted away in its right place.</p> <p>Midday is another story. Shouldering a normal workload with the added constraint of a fast is the definition of a stress test. Except unlike software and other commonly-tested constructs, the systems involved here grow and strengthen naturally. During Ramadan, I stockpile this discipline to burn over the next 11 months. Discipline complements motivation, especially with creative work like software and architecture. Whereas frustration obviates motivation, <a href="http://www.wisdomination.com/screw-motivation-what-you-need-is-discipline/">discipline rises to the occasion</a>, grateful for the opportunity to push through and grow.</p> <!-- It's a bit crude for direct linking, but I didn't have time to find a better one --> <p>All of the above pours into the last attribute. Confidence is deeply linked to feelings of sufficiency: the ability to say, "What I have is enough to do what I want to do." I'm a big fan of water myself, but even something as essential as hydration isn't as <a href="https://www.kickstarter.com/projects/905031711/trago-the-worlds-first-smart-water-bottle">big</a> <a href="https://www.kickstarter.com/projects/582920317/hidrateme-smart-water-bottle?ref=video">a deal</a> <a href="https://en.wikipedia.org/wiki/Vessyl">as we make it</a>. 
My adolescent fascination with basketball was rooted in <a href="http://www.thenational.ae/sport/north-american-sport/ramadan-or-not-hakeem-olajuwon-a-dominant-force-in-nba">Hakeem Olajuwon playing whole NBA games</a> against the Chicago Bulls, 12 hours into a fast. More recently, <a href="http://www.ibtimes.co.uk/ramadan-2014-did-algeria-lose-germany-because-their-players-were-fasting-1454792">a fasting Algeria played a strong World Cup game</a> against winners-to-be Germany. People thirst for confidence, not water. Ramadan is a reminder that personal excess breeds anxiety. Consumerism's advertising immerses us in false dependence. Ramadan is the gentle reaffirmation you send yourself that, yes, <em>you</em> can do more with less.</p> <h2 id="how"><a href="#how" class="toclink">How?</a></h2> <p>At this point, the <em>how</em> is more of a logistical appendix, but this year's approach was particularly successful. Each year, Ramadan's approach gets me nervous. No matter how many times I fast, despite having survived and thrived not one year ago, I still get skittish at the thought of it. I focus in on the circumstances new to the year, and can't help tweaking my design.</p> <p>Everyone has different lives and schedules, but my Ramadan unfolds in three phases:</p> <ul> <li>Phase 1: Just make it through in one piece. The first 4-5 days.</li> <li>Phase 2: Requires a conscious and concerted effort. The middle twenty days or so.</li> <li>Phase 3: The fast is the new normal. Usually just the last few days of the month.</li> </ul> <p>My Ramadan technique goes into effect from day 1. It can be a rough transition, involving some falling asleep while eating cereal, but the long-day summer technique has been perfected over years. Granted, its design leans on the unique schedule afforded a young software engineer. Not everyone can switch away from a standard work-a-day-sleep-at-night schedule. 
The median practicing Western Muslim probably approaches Ramadan like this:</p> <ul> <li>Get to work at 9am.</li> <li>Work til 5pm.</li> <li>Get home at 6pm. Cook, clean, tend to kids.</li> <li>Eat at 9pm.</li> <li>Sleep around midnight.</li> <li>Wake up before 4am, eat again.</li> <li>Sleep until 6-8am.</li> </ul> <p>Straightforward enough, but far from optimal. There's no period of sleep longer than 4 hours, which leaves my energy on a different valence altogether. For the last three years, I've improved on the naïve solution, by switching to a <a href="https://en.wikipedia.org/wiki/Segmented_sleep">bimodal sleep schedule</a>:</p> <ul> <li>Get to work around 11am.</li> <li>Skip lunch, hit the books til 5-6pm.</li> <li>Get home, take a long nap at 7pm. This last bit would just be clockwatching anyways.</li> <li>Wake up at 9pm. Dinner for breakfast!</li> <li>Read, write, and code for the next 6 hours.</li> <li>3:45am. Eat breakfast, taking care not to fall asleep.</li> <li>Sleep through til 10am and repeat.</li> </ul> <p>It's a fun change of pace. If the workday seems short, keep in mind that there are no meal or snack breaks, so it evens out. Similarly, there's a lot of new time discovered in these quiet, contemplative nights. Overall my energy, while restricted, stays predictable and manageable. I'm no Hakeem Olajuwon or Algerian footballist, but this year I managed to continue to bike everywhere, several times riding 6 to 15 miles per day. Other innovations this year have included playing violin to stay awake and just eating a small bowl of raisin bran for breakfast. Eating less is unintuitive, but I wake up less thirsty than trying to cram in more calories, and hunger is easier to manage than thirst. 
Oh, and <span title="aka sparkling water aka the original 0-calorie beverage">bubble water</span>.</p> <p><a href="https://sedimental.org/uploads/bubble_water.jpg"> <img src="https://sedimental.org/uploads/bubble_water.jpg" width="100%" title="Bubble water: It's good for sippin!TM" /></a></p> <!-- After a 17-hour day, your body greets food and water like parched earth does rain. Unfamiliar, no matter how much water you drink, it takes at least an hour before you begin to feel hydrated again. --> <p>Sometimes during the day I'd find myself impatient, checking the calendar to see how many days are left. But just as many times at night I've caught myself lamenting the quickness with which my split days have slid past. With <a href="https://en.wikipedia.org/wiki/Eid_al-Fitr">Eid-ul-Fitr</a> right around the corner, I must admit I am pleased with the special satisfaction brought by another year, another fast well designed.<p></p> <hr /> https://sedimental.org/colophon.html Mahmoud Hashemi https://sedimental.org/ Colophon 2015年05月01日T00:00:00Z 2015年05月01日T00:00:00Z <p><p>Most blogs, like this one, are reverse-chronological, causing the first post to appear last in the archive. 
This convention makes a <a href="https://en.wiktionary.org/wiki/colophon">colophon</a> the <a href="https://en.wikipedia.org/wiki/King's_Pawn_Game">King's Pawn Game</a> of web authorship; there's no better place to showcase certain implementation details than the first post of a blog.</p> <p>This site is generated with <a href="https://github.com/mahmoud/chert">Chert</a><sup id="fnref:pronounce"><a class="footnote-ref" href="https://sedimental.org/colophon.html#fn:pronounce">1</a></sup>, an open-source static site generator built with <a href="http://python.org">Python</a>, <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a><sup id="fnref:emd"><a class="footnote-ref" href="https://sedimental.org/colophon.html#fn:emd">2</a></sup>, <a href="https://github.com/mahmoud/ashes">ashes</a>, <a href="http://pygments.org/">pygments</a>, and <a href="https://en.wikipedia.org/wiki/YAML">YAML</a>. Chert is named for a very common <a href="https://en.wikipedia.org/wiki/Chert">fine-grained sedimentary rock</a>, often referred to as <em>flint</em>, which has been of critical use to firestarters through the ages.</p> <p><em>(Keep an eye out for a forthcoming, longer entry on why I built Chert and what makes it different.)</em></p> <div class="footnote"> <hr /> <ol> <li id="fn:pronounce"> <p>English pronunciation rhymes with <em>dirt</em>, maintainer/Farsi pronunciation: <em>chair</em> with a <em>t</em> at the end. <a class="footnote-backref" href="https://sedimental.org/colophon.html#fnref:pronounce" title="Jump back to footnote 1 in the text">↩</a></p> </li> <li id="fn:emd"> <p>Enhanced Markdown, including support for <a href="https://python-markdown.github.io/extensions/footnotes/">footnotes</a>, <a href="https://python-markdown.github.io/extensions/definition_lists/">definition lists</a>, and <a href="https://python-markdown.github.io/extensions/toc/">tables of contents</a>. 
<a class="footnote-backref" href="https://sedimental.org/colophon.html#fnref:emd" title="Jump back to footnote 2 in the text">↩</a></p> </li> </ol> </div><p></p> <hr />
