• Posted on

    Learning Dutch online in 2023

    Back in 2020, I left France to live in the Netherlands. Since then, I have spent some time learning the language and reached an intermediate (B1-B2) level. While learning Dutch is not a hard requirement to live and work in the country, it does make day-to-day life a bit more convenient. There are plenty of resources out there to learn languages, so here’s my pick of tools that are worth spending your time on.

    A Dutch windmill
    Photo by Artem Shuba

    Duolingo

    Duolingo is probably the most famous language-learning app around. Despite its popularity, many people in the language learning community dismiss it as just a toy. In my opinion, it is actually great for beginners. One of its main benefits is that it focuses on complete sentences, and will drill the sentence structure of your target language into your head.

    A Duolingo translation exercise

    However, once you have acquired a basic idea of your target language’s grammar, the application loses its usefulness. Duolingo is sadly extremely slow at introducing new vocabulary, and going through the whole course is unnecessarily time-consuming, which I guess is a consequence of its subscription business model. The gamification makes it addictive enough that I reached a 300-day learning streak, but I would honestly not recommend using it for more than six months.

    Anki

    Anki is frequently recommended in language learning communities. It is a spaced repetition app that shows you flashcards and asks whether you know the correct translation. Unlike Duolingo, Anki is essentially focused on vocabulary.

    The Anki Mac interface

    It is free and open-source software, and the application gives you a lot of customization options, which let you introduce new words at the rhythm you want. A few Dutch decks are available online, but many Anki users prefer creating their own decks with words they encounter in target-language content.

    Anki didn’t stick with me. I think the issue was the lack of variety in exercises: doing the same thing with every flashcard turned the activity into a chore (instead of the addictive game that Duolingo was).

    Memrise

    Memrise is my personal favorite. Like Anki, it is a spaced repetition app that focuses on memorizing vocabulary, but it is much more enjoyable because it alternates between different exercise variations. On top of that, its repetition algorithm seems smarter, making for a more efficient learning experience.

    The Memrise interface

    For Dutch, it contains seven official courses, for a total of more than 2000 words, which is more than enough to start engaging with native content. Additional content is provided by the community.

    Just like Duolingo, some features are locked behind a subscription.

    Italki

    Unlike the previous apps, Italki is not focused on providing exercises but on matching students with language teachers. Some could say it is the “Uber” of language learning, providing cheap and personalized courses. It is a pretty good way of getting into some conversation practice, which is especially useful if you are not in the country. The experience will vary from teacher to teacher, but I had a great time using it.

    The italki teacher selection

    Native content

    Of course, apps and courses are a great help when learning a language, but exposure to native content is what will help the most in the journey toward fluency. In the beginning, I found NOS Jeugdjournaal to be very helpful. It is the kids’ version of the NOS news and comes with shorter articles, simpler vocabulary, and less depressing news. Most articles are accompanied by video clips, which also help build your listening skills, and their daily “Ochtendjournaal” can easily fit into a routine.

    The NOS Jeugdjournaal home page

    For more advanced learners, there are quite a few Dutch TV shows and movies around. Amongst my personal favorites are the show Dirty Lines, which is about two brothers creating the first Dutch phone-sex line, and the movie De Oost, which covers the story of a Dutch soldier in the Indonesian War of Independence. You can find a list of Dutch movies on IMDb.

  • Posted on

    How much can you really get out of a 4$ VPS?

    When starting a new project, estimating the budget needed for cloud hosting can be tricky. On one hand, you will hear horror stories of people waking up to an unexpected 100k$ bill from their provider. On the other hand, you will see providers advertising costs sometimes as low as 4$ per month for a virtual machine. In this article, I will load test one of those 4$ VPS (from an unnamed provider) to figure out if the promise of running your production on such a low budget is realistic.

    A unicorn toy in front of a small stack of coins
    Photo by Annie Spratt

    The test application

    For this test, I designed a simple CRUD application in Go. It mimics a blogging application and lets users create posts, list the latest posts, and display a single post. In other words, it has the following three routes:

    • a GET / route that renders an HTML template and shows the title of the 10 latest posts
    • a GET /<post_id> route that renders an HTML template and shows the title and body of the selected post
    • a POST / route that accepts a JSON with the post title and body, timestamps it, stores it in the database and redirects to GET /
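
    As a rough illustration of what those three routes do, here is a minimal in-memory model. It is a sketch under loud assumptions, not the article's code: the real application is written in Go and talks to a real database, while this version is plain Python with a dictionary standing in for storage. The names `PostStore`, `create_post`, `latest_posts` and `get_post` are hypothetical, and HTML rendering and the redirect are left out.

```python
import itertools
import time


class PostStore:
    """In-memory stand-in for the blog's database (hypothetical helper)."""

    def __init__(self):
        self._posts = {}                # post_id -> post dict
        self._ids = itertools.count(1)  # simple incrementing ids (assumption)

    def create_post(self, title, body, now=None):
        """POST /: timestamp the post and store it; returns the new id."""
        post_id = next(self._ids)
        self._posts[post_id] = {
            "id": post_id,
            "title": title,
            "body": body,
            "created_at": time.time() if now is None else now,
        }
        return post_id

    def latest_posts(self, limit=10):
        """GET /: titles of the most recent posts, newest first."""
        newest = sorted(self._posts.values(),
                        key=lambda p: p["created_at"], reverse=True)
        return [p["title"] for p in newest[:limit]]

    def get_post(self, post_id):
        """GET /<post_id>: title and body of one post, or None if missing."""
        post = self._posts.get(post_id)
        if post is None:
            return None
        return {"title": post["title"], "body": post["body"]}
```

    Note that `latest_posts` sorts every post on each call, which is only acceptable for a toy example.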

    For the database, I chose to use MongoDB. I picked it because it is simple to set up, popular, and claims to be web scale.

    The application was developed without any particular performance optimizations. The only database-specific optimization I made was to create an index on the post timestamps, which allows listing the latest posts decently fast.

    Both the application and MongoDB were deployed using Docker with docker-compose.

    The load test

    I used k6 to perform a load test. k6 is a tool that generates “virtual users” who continuously run test scenarios defined in JavaScript.

    I defined two scenarios:

    • 10% of users create posts
    • 90% of users display the latest 10 posts, and then open one of those posts (picked at random)

    The test would progressively ramp up until we reach 50 virtual users and then come back down, for a total test duration of 1min30s.
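
    To make the load shape concrete, the sketch below models a linear ramp to 50 virtual users over 90 seconds, along with the 10% / 90% split between the two scenarios. The article does not give the exact stage durations, so the symmetric 45-second ramp up and ramp down are assumptions, as are the function names. This is plain Python for illustration, not the k6 script itself.

```python
PEAK_USERS = 50        # from the article
TOTAL_SECONDS = 90     # from the article (1min30s)
RAMP_UP_SECONDS = 45   # assumption: symmetric ramp, exact stages are not given


def virtual_users_at(t):
    """Number of virtual users t seconds into the test (linear ramp)."""
    if t < 0 or t > TOTAL_SECONDS:
        return 0
    if t <= RAMP_UP_SECONDS:
        fraction = t / RAMP_UP_SECONDS
    else:
        fraction = (TOTAL_SECONDS - t) / (TOTAL_SECONDS - RAMP_UP_SECONDS)
    return round(PEAK_USERS * fraction)


def split_users(total, writer_share=0.10):
    """Split users into (writers, readers) using the 10% / 90% mix."""
    writers = round(total * writer_share)
    return writers, total - writers
```

    With these assumptions, the peak of 50 virtual users corresponds to 5 users creating posts and 45 users reading them.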

    k6 output showing 94091 successful tests

    Launching the test from my local computer, k6 managed to successfully run more than 94k requests, with an average duration of 21ms per request and a 95th percentile duration of 33ms. While this test didn’t reach the point where the server would start failing, a closer look at the data already gives more insight. I exported the k6 metrics to a CSV and used pandas to analyze the data. Plotting the request durations against the number of requests per second, we can observe that the duration starts spiking when k6 sends around 1300 requests/second.
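
    Those figures boil down to a few simple aggregations. As a sketch of that kind of analysis, using only the standard library instead of pandas, the block below computes the average and 95th percentile duration and counts requests per one-second bucket. The input format (a list of `(timestamp_seconds, duration_ms)` tuples) and the function names are assumptions, since the actual k6 CSV export has its own layout. The percentile uses the nearest-rank method, which may differ slightly from what k6 reports.

```python
from collections import Counter


def summarize(samples):
    """Return (average_ms, p95_ms) for (timestamp_s, duration_ms) samples."""
    durations = sorted(duration for _, duration in samples)
    if not durations:
        raise ValueError("no samples")
    average = sum(durations) / len(durations)
    # Nearest-rank percentile: rank = ceil(0.95 * n), done in integer math
    # to avoid floating-point rounding surprises.
    rank = (95 * len(durations) + 99) // 100
    return average, durations[rank - 1]


def requests_per_second(samples):
    """Count how many requests fall into each one-second bucket."""
    return Counter(int(timestamp) for timestamp, _ in samples)
```

    Pairing each bucket's request count with the durations observed in that bucket gives the data needed for a duration-versus-request-rate plot.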

    The request duration and request rate graph

    During the tests, we can identify a potential bottleneck: the CPU load increases with the number of virtual users and quickly reaches 100%. This is shown in the htop screenshot below, with both mongo and the application itself consuming most of the available CPU. In contrast, both RAM usage and disk throughput stayed consistently below the system’s capacity.

    Htop showing 100% CPU usage

    Conclusion and limitations

    This test shows that, as long as you don’t plan on building the next Twitter, a very cheap VPS might be fine for the start of a project. However, this result might differ seriously from real-life applications, because those contain complex business logic requiring more resources than a simple CRUD application. On top of that, more networking overhead is bound to occur when clients connect from different IPs and use TLS, which I did not do in the above test.

    The best way to determine the hosting budget for a real application would be to test it until failure using a distributed k6 setup. This can be done using the k6 operator for Kubernetes or the (somewhat expensive) k6 Cloud.

    You can find the source code for the application, k6 scenarios and analysis script used in this article on my GitHub.

  • Posted on

    Does ChatGPT dream about cryptographic cats?

    Back in 2017, the tech world seemed to be constantly talking about a single subject: blockchains. Two years earlier, Vitalik Buterin had revolutionized the nascent field by creating Ethereum. Ethereum was at the time a cryptographic protocol that would allow people to run distributed computations without having to blindly trust the entire network. It was built on Bitcoin’s concept of blockchain: a distributed, unalterable public ledger of transactions secured through enormous amounts of computing power. But it was not just an alternative currency like Bitcoin. Ethereum was able to run any sort of computation, and could be used not only as a ledger for ETH, but also as a database for distributed applications.

    A cat stepping on a paper sheet with ideogram-looking symbols
    A "cryptographic cat" generated by Midjourney

    This technological breakthrough led to a lot of promises being made at the time:

    Sadly, Ethereum had a few issues. First, it was extremely slow. Unlike traditional distributed systems, its fault-tolerant nature required every node to run every computation. This also made the system costly, as node owners had to be compensated with a “gas fee” for running computations. In addition, a public, immutable ledger is by definition a privacy nightmare, and would not fare well with laws like the GDPR or HIPAA. Everybody wanted a secure, decentralized internet, but nobody wanted a slow, costly, and overly public internet.

    While Ethereum has improved on many of its past technical issues, almost none of the wild promises made in 2017 have been fulfilled. However, for the past few years Ethereum has been one of the wildest avenues for financial speculators. Not only can people speculate on the price of ETH, but also on derivative tokens. After a wave of ICOs (the crypto equivalent of a penny stock entering the market), speculators decided to stop pretending to bet on anything that made sense and started trading JPEG files. The “CryptoKitties” application soon became the biggest hit on the Ethereum network.

    Just an ugly drawing of a cat
    A cryptokitty (I didn't pay for it)

    In 2023, everyone in the tech world is talking about generative AI. Last year saw the release of multiple image generation models such as DALL-E 2, Stable Diffusion, and Midjourney. Also notably, OpenAI decided to tease the upcoming GPT-4 text generation model by repackaging its GPT-3 model in the form of a chatbot: ChatGPT.

    Just like Ethereum, those are truly impressive technological breakthroughs. In less than a decade, image generation models went from being able to create vaguely psychedelic patterns (DeepDream) to generating complete paintings in the style of any popular artist. In the same way, text generation models are now good enough that they could easily be mistaken for a human. Thankfully, ChatGPT kindly discloses that it is an AI language model when asked.

    This progress in AI also comes with its set of crazy promises:

    This optimism seems to forget a few limitations of this technology, which I asked ChatGPT itself to list:

    1. Lack of understanding: Generative models may generate outputs that lack context, meaning or coherence.
    2. Computational costs: Training generative models can be computationally expensive and require large amounts of data and computing power.
    3. Bias in data: If the training data is biased, the model may generate outputs that reflect this bias.
    4. Mode collapse: Models may generate limited outputs and fail to cover the diversity of the data.
    5. Overfitting: Generative models may memorize the training data and fail to generalize to new data.
    6. Explainability: Generative models are often complex and difficult to understand or explain, making it challenging to assess their decision-making processes.

    But just like Ethereum, it looks like ChatGPT is already attracting financial speculators. Last week, BuzzFeed, a usually unremarkable penny stock, rose by more than 300% on the simple news that the company would make use of generative AI. As the quote goes, “History never repeats itself, but it does often rhyme”.

  • Posted on

    Plaid Layoffs and beyond

    Last week, Plaid announced that it was laying off 20% of its workforce (260 people). Today I signed my termination agreement, which makes this week my last as part of Plaid’s infrastructure team.

    An unsigned contract
    Photo by Kelly Sikkema

    Moving forward

    While being laid off and having to leave a very talented team is never a happy moment, I have decided to use the time ahead of me as an opportunity to focus on personal growth and figure out where I want my career to go next. I will not be seeking new full-time employment immediately.

    First, I want to get back into actually building things by myself. My programming skills and technology knowledge have increased immensely since I left college, but since then I have not had the time and energy to get into serious side projects. I intend to use the next few months to fix that, although I am not sure exactly what form my new projects will take.

    Second, I need some time to explore infrastructure tooling outside of Kubernetes. I enjoy working with Kubernetes, but I am also starting to suspect that we are close to the peak of the hype cycle. If that’s the case, having a career focused on k8s might become as attractive as being an Enterprise JavaBeans expert. To avoid that, I intend to seek out “what’s new” in the infrastructure space and explore new technologies.

    While I’m talking about diversifying my skill portfolio, I also intend to explore subjects that are way outside of my comfort zone, mainly regarding communication. I think effective communication is an essential skill for anyone working in tech, and it is currently my weak point. To fix that, I will try to learn about writing and visual communication techniques. I intend to practice by writing and talking more about what I do. And while I’m at it, I might try to finish learning Dutch.

    So with all that, expect more activity on this blog in the next few months. I will do my best to talk about the projects I’m creating, the technologies I’m trying and everything I discover along the way.

  • Posted on

    DEFCON 30

    This summer I had the opportunity to attend DEFCON 30, a cybersecurity conference gathering around 27000 hackers in the fabulous city of Las Vegas, Nevada. With more than 30 villages and 3 main conference tracks, the event managed to cover pretty much every subject from malware analysis to online drug dealing.

    The welcome to Las Vegas sign
    Photo by Grant Cai

    Best talks

    Roger Dingledine from the Tor Project gave a timely talk explaining how Russia is trying to block Tor. He presents the software produced by the Tor Project, such as Tor, Tor Browser, and pluggable transports (like meek). The latter are the most important here since they can help bypass attempts made by dictatorships to block Tor. The talk then dives deeper into Russia’s censorship of Tor and explains its numerous flaws and shortcomings.

    Another very interesting talk was from Nikita Kurtin, about bypassing Android permissions. This talk shows perfectly how thinking outside the box can lead you to completely break complex permission systems. In this case, he uses a mix of UX and system tricks to get users to agree to anything, all the time.

    And lastly, Minh Duong gave the most fun talk of this conference by explaining how he Rick Roll’d his entire school district. He explains how he managed to take over his school network, using known vulnerabilities and software misconfigurations, and progressively escalated his position until he was able to play “Never gonna give you up” everywhere. Definitely a good example of realistic hacking, far away from academic papers and armchair exploit development.

    The villages

    Each village offered a set of talks, activities, or both. I didn’t stay long in the Cloud and AppSec villages, as I also wanted to use the conference to discover subjects I am less used to. The physical security, tamper-evident, and lockpicking villages were particularly interesting to me, as I had not really explored non-computer security topics before. And honestly, they almost made me think picking locks was going to be easy!

    The car hacking and voting machine villages also allowed me to have a glimpse into topics that will probably become quite important to the industry in the near future. The biohacking village was also interesting as it provided a few medical devices to try and break, although I am not sure if anyone managed to actually root anything during the conference.

    The other stuff

    At night, the talks and villages left room for parties. Not only did this make for a good socializing opportunity, but we also managed to see an absolutely awesome show by Taiko Project.

    I didn’t really take the time to solve the badge challenges, but I still found it very cool that the badge contains an actual playable keyboard.

    The DEFCON 30 keyboard badges

    And I almost forgot: Vegas was strange, but actually also a nice city. I don’t think I would mind facing the desert heat once more if I get the chance.
