
Planet NUUG

December 09, 2019

Petter Reinholdtsen

Article about Nikita in Arkheion issue 2019/2

Today I had the pleasure of discovering that an article about the archive system Nikita that we wrote this summer has now been published in Arkheion, the trade journal for the municipal archive sector. You will find the article on pages 30-33 of issue 2019/2; the PDF can be downloaded from the journal's web site. Perhaps the publication will help a few more people open their eyes to the value of an open, standardized API for archiving and searching the archive.

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b. Note that payments with Bitcoin are not anonymous. :)

December 09, 2019 02:05 PM

December 08, 2019

Nicolai Langfeldt

Bluray with menus on Linux - on Ubuntu

For the longest time it was impossible to play BluRay discs on Linux due to the lack of players that could handle them. VLC has been the most capable video player on Linux, and some time ago it gained that ability.

I run Ubuntu at home. I can easily install VLC, but some parts were missing to get it working:
  1. Be root: sudo -i
  2. libaacs decodes Blurays: apt-get install libaacs0
  3. BluRays, or at least VLC, need Java 8: apt-get install openjdk-8-jre
  4. Ubuntu and VLC do not agree on the right directory name: cd /usr/lib/jvm/
  5. Link the right one: ln -s java-1.8.0-openjdk-amd64 java-8-openjdk
  6. This library implements BD-J menus: apt-get install libbluray-bdj
Now insert a BluRay disk and play it: vlc bluray://

It should start up with menus. Use arrow keys to navigate and Enter to choose.
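
Condensed into one go, the same setup looks roughly like this (a sketch that assumes VLC itself is already installed and that the Java directory is named as above):

sudo apt-get install libaacs0 openjdk-8-jre libbluray-bdj
sudo ln -s /usr/lib/jvm/java-1.8.0-openjdk-amd64 /usr/lib/jvm/java-8-openjdk
vlc bluray://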

by nicolai (noreply@blogger.com) at December 08, 2019 09:51 PM

December 07, 2019

Petter Reinholdtsen

When terms and policy turn users away

When asked to accept terms of use and privacy policies that state they will remove rights I otherwise had, or to accept unreasonable terms undermining my privacy, I opt out of the service. I simply do not have the conscience to accept terms I have no intention of upholding. But how are the system and service providers to know how many people they scared away? Normally I just quietly walk away. But today I tried a new approach. I sent the following email (with the specifics removed, as I am not out to single out the specific service in question) to the service provider I decided not to use, to at least give them one data point on how many users are unhappy with their terms:

From: Petter Reinholdtsen
Subject: When terms of use turn users away
To: [contact@some.site]
Date: Sat, 07 Dec 2019 16:30:56 +0100

Dear [Site Owner],

I was eager to test the system, as it seemed like a fun and interesting application of [some] technology, but after reading the terms of use and privacy policy on <URL: https://www.[some.site]/terms-of-use > and <URL: https://www.[some.site]/privacy-policy > I want you to know that I decided to turn away. There were several provisions in the terms and policy turning me off, but the final term that convinced me was being asked to sign away my right to reverse engineer.

--
Happy hacking
Petter Reinholdtsen

I do not expect much to come out of it, but I am sharing it here in case others want to give something similar a try too. If companies discover their terms scare away enough people, perhaps they will be improved...

As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

December 07, 2019 08:15 PM

October 03, 2019

Peter Hansteen (That Grumpy BSD Guy)

Badness, Enumerated by Robots

A condensed summary of the blacklist data generated from traffic hitting bsdly.net and cooperating sites.

After my runbsd.info entry (previously bsdjobs.com) was posted, there has been an uptick in interest in the security-related data generated at the bsdly.net site. I have written quite extensively about these issues earlier, so I'll keep this piece short. If you want to go deeper, the field-note-like articles I reference, and the links therein, will offer some further insights.

There are three separate sets of downloadable data, all automatically generated and with only very occasional manual intervention.


Known spam sources during the last 24 hours

This is the list directly referenced in the BSDjobs.com piece.

This is a greytrapping based list, where the conditions for inclusion are simple: Attempts at delivery to known-bad addresses in domains we handle mail for have happened within the last 24 hours.

In addition there will occasionally be some addresses added by cron jobs I run that pick the IP addresses of hosts that sent mail that made it through greylisting performed by our spamd(8) but did not pass the subsequent spamassassin or clamav treatment. The bsdly.net system is part of the bgp-spamd cooperation.

The traplist has a home page and at one point was furnished with a set of guidelines.

A partial history (the log starts 2017-05-20) of when spamtraps were added and from which sources can be found in this log (or at this alternate location). Read on for a bit of information on the alternate sources.

Misc other bots: SSH Password bruteforcing, malicious web activity, POP3 Password Bruteforcing.

The bruteforcers list is really a combination of several things, delivered as one file, but with minimal scripting ability you should be able to dig out the distinct elements, described in this piece.

The (usually) largest chunk is a list of hosts that hit the rate limit for SSH connections described in the article, or that were caught trying to log on as a non-existent user or engaging in other undesirable activity aimed at my sshd(8) service. Some as yet unpublished scriptery helps me feed the miscreants that the automatic processes do not catch into the table after a manual quality check.

The second part is a list of IP addresses that tried to access our web service in undesirable ways, including trying for specific URLs or files that will never be found at any world-facing part of our site.

After years of advocating short lifetimes (typically 24 hours) for blacklist entries, only to see my logs fill up with attempts made at slightly slower speeds, I set the lifetime for entries in this data set to 28 days. The background, including some war stories of monitoring SSH password groping, can be found in this piece, while the more recent piece here covers some of the weeding out of bad web activity.

The POP3 gropers list comes in two variations. Again these are lists of IP addresses caught trying to access a service; most of those accesses are to non-existent user names with an almost perfect overlap with the spamtraps list, local-part only (the part before the @ sign).

The big list is a complete corpus of IP addresses that have tried these kinds of accesses since I started recording and trapping them (see this piece for some early experience and this one for the start of the big collection).

There is also a smaller set, produced from the longterm table described in this piece. For much the same reason I did not stick to 24-hour expiry for the SSH list, this one has six-week expiry. With some minimal scriptery I run by hand once or twice per day, any invalid POP3 accesses to valid accounts get their IP addresses added to the longterm table and the exported list.
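
These exported data sets are plain lists of IP addresses, so consuming them elsewhere can be as simple as loading them into a PF table. A minimal sketch, with assumed file, table and rule names that you would adjust to your own ruleset:

# pf.conf is assumed to contain the two lines
#   table <spamsources> persist file "/etc/mail/traplist.txt"
#   block in quick from <spamsources>
# after fetching a fresh copy of the list, reload the table:
pfctl -t spamsources -T replace -f /etc/mail/traplist.txt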

If you're wondering about the title, the term "enumerating badness" stems from Marcus Ranum's classic piece The Six Dumbest Ideas in Computer Security. Please do read that one.

Here are a few references, in addition to those already mentioned in the paragraphs above, that you might find useful:

The Book of PF, 3rd edition
Hey, spammer! Here's a list for you! which contains the announcement of the bsdly.net traplist.
Effective Spam and Malware Countermeasures, a more complete treatment of those keywords

If you're interested in further information on any of this, the most useful contact information is in the comment blocks in the exported lists.

by Peter N. M. Hansteen (noreply@blogger.com) at October 03, 2019 01:43 PM

September 22, 2019

Ole Aamot GNOME Development Blog

GStreamer Conference 2019

I have published a lightning talk on GNOME Radio for GUADEC 2019 in Thessaloniki, Greece, on August 23, 2019.

On September 10, 2019 I released GNOME Radio (gnome-radio) version 0.2.0, and on September 22, 2019 I released GNOME Internet Radio Locator (gnome-internet-radio-locator) version 2.0.6 with support for the Middle East Broadcasting Center in Dubai, United Arab Emirates.

On October 29, 2019 I am travelling via Paris on Air France to the GStreamer Conference 2019, held between October 30, 2019 and November 1, 2019 in Lyon, France, to give a 5-minute lightning talk on GNOME Radio as part of my Bachelor thesis in Electrical Engineering at Oslo Metropolitan University in Oslo, Norway, with the earliest delivery on June 30, 2020.

by oleaamot at September 22, 2019 10:53 PM

August 10, 2019

Ole Aamot GNOME Development Blog

GNOME Radio Project

On August 8, 2019 I announced my position paper on GNOME Radio, gnome-radio-0.1.0 and gnomeradio.org to the GNOME Community on gnome-announce-list.

GNOME Radio is the Public Network Radio Software for Accessing Free Audio Broadcasts from the Internet on GNOME.

Visit gnomeradio.org and wiki.gnome.org/Apps/Radio for details on GNOME Radio.

by oleaamot at August 10, 2019 10:15 AM

August 07, 2019

Peter Hansteen (That Grumpy BSD Guy)

Goodness, Enumerated by Robots. Or, Handling Those Who Do Not Play Well With Greylisting

SMTP email is not going away any time soon. If you run a mail service, when and to whom you present the code signifying a temporary local problem is well worth your attention.

SMTP email is everywhere and is used by everyone.

If you are a returning reader, you are more likely than the general population to run a mail service yourself.

This in turn means that you will be aware that one of the rather annoying oversights in the original and still-current specifications of the SMTP based mail system is that while it's straightforward to announce which systems are supposed to receive mail for a domain, specifying which hosts would be valid email senders was not part of the original specification at all.

Any functioning domain MUST have at least one MX (mail exchanger) record published via the domain name system, and registrars will generally not even let you register a domain unless you have set up somewhere to receive mail for the domain.

But email worked most of the time anyway, and while you would occasionally hear about valid mail not getting delivered, it was a rarer occurrence than you might think.

Then a few years along, the Internet grew out of the pure research arena and became commercial, and spam started happening. Even in the early days of spam it seems that a significant subset of the messages, possibly even the majority, was sent with faked sender addresses in domains not connected to the actual senders.

Over time people have tried a number of approaches to the problems involved in getting rid of unwanted commercial and/or malware carrying email. If you are interested in a deeper dive into the subject, you could jump over to my earlier piece Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools.

Two very different methods of reducing spam traffic were originally formulated at roughly the same time, and each method's adherents are still duking it out over which approach is the better one.

One method consists simply of implementing a strict interpretation of a requirement that was already formulated in the SMTP RFC at the time.

The other is a complicated extension of the SMTP-relevant data that is published via DNS, and full implementation would require reconfiguration of every SMTP email system in the world.

As you might have guessed, the first is what is commonly referred to as greylisting, where we point to the RFC's requirement that on encountering a temporary error, the sender MUST (RFC language does not get stronger than this) retry delivery at a later time and keep trying for a reasonable amount of time.

Spammers generally did not retry as per the RFC specifications, and even early greylisting adopters saw a huge drop in the volume of spam that actually made it to mailboxes.
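
For illustration, a greylisted delivery attempt looks roughly like this made-up SMTP exchange (the exact wording of the 451 line varies between implementations):

220 mx.example.org ESMTP
HELO mail.example.net
250 Hello
MAIL FROM:<sender@example.net>
250 Ok
RCPT TO:<user@example.org>
451 Temporary failure, please try again later.

A well-behaved sender queues the message and retries a little later.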

On the other hand, end users would sometimes wonder why their messages were delayed, and some mail administrators did not take well to seeing the volume of data sitting in the mail spool directories grow measurably, if not usually uncontrollably, while successive retries after waiting were in progress.

In what could almost appear as a separate, unconnected universe, other network engineers set out to fix the now glaringly obvious omission in the existing RFCs.

A way to announce valid senders was needed, and the specification that was to be known as the Sender Policy Framework (SPF for short) was offered to the world. SPF offered a way to specify which IP addresses valid mail from a domain were supposed to come from, and even included ways to specify how strictly the limitations it presented should be enforced at the receiving end.

The downsides were that all mail handling would need to be upgraded with code that supported the specification, and as it turned out, traditional forwarding such as performed by common mailing list software would not easily be made compatible with SPF.

Then came the flame wars over both methods. You either remember them or should be able to imagine how they played out.

And while the flames grew less frequent and generally less fierce over time, mail volumes grew to the level where operators would have a large number of servers for outgoing mail, and while the site would honor the requirement to retry delivery, the retries would not be guaranteed to come from the same IP address as the original attempt.

It was becoming clear to greylisting practitioners that interpreting published SPF data as known good senders was the most workable way forward. Several of us had already started maintaining nospamd tables (see e.g. this slide and this), and using the output of

$ host -ttxt domain.tld

(sometimes many times over because some domains use include statements), we generally made do. I even made a habit of publishing my nospamd file.
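
To illustrate, the output looks something like this for a made-up domain (both the domain and the record here are invented):

$ host -ttxt example.org
example.org descriptive text "v=spf1 ip4:192.0.2.0/24 include:spf.example.net ~all"

The ip4: and ip6: blocks can go straight into the nospamd table, while every include: means yet another lookup.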

As hinted in this slide, smtpctl (part of the OpenSMTPD system and in your OpenBSD base system) has, since OpenBSD 6.3, been able to retrieve the entire contents of the published SPF information for any domain you feed it.

Looking over my old nospamd file during the last week or so I found enough sedimentary artifacts there, including IP addresses for which there was no explanation and that lacked a reverse lookup, that I turned instead to deciphering which domains had been problematic and wrote a tiny script to generate a fresh nospamd on demand, based on fresh SPF lookups on those domains.

For those wary of clicking links to scripts, it reads like this:

#!/bin/sh
domains=`cat thedomains.txt`
outfile=nospamd
generatedate=`date`
operator="Peter Hansteen <peter@bsdly.net>"
locals=local-additions

echo "##############################################################################################">$outfile;
echo "# This is the `hostname` nospamd generated from domains at $generatedate. ">>$outfile;
echo "# Any questions should be directed to $operator. ">>$outfile;
echo "##############################################################################################">>$outfile;
echo >>$outfile;

for dom in $domains; do
echo "processing $dom";
echo "# $dom starts #########">>$outfile;
echo >>$outfile;
echo $dom | doas smtpctl spf walk >>$outfile;
echo "# $dom ends ###########">>$outfile;
echo >>$outfile;
done

echo "##############################################################################################">>$outfile;
echo "# processing done at `date`.">>$outfile;
echo "##############################################################################################">>$outfile;

echo "adding local additions from $locals";
echo "# local additions below here ----" >>$outfile;
cat $locals >> $outfile;

If you have been in the habit of fetching my nospamd, you have been fetching the output of this script for the last day or so.

What it does is simply read a prepared list of domains, run them through smtpctl spf walk and slap the results in a file which you would then load into the pf configuration on your spamd machine. You can even tack on a few local additions that for whatever reason do not come naturally from the domains list.
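
What "load into the pf configuration" amounts to depends on your setup, but a minimal sketch, assuming the generated file ends up as /etc/mail/nospamd and the conventional table name is used, could be:

# pf.conf on the spamd machine is assumed to contain
#   table <nospamd> persist file "/etc/mail/nospamd"
#   pass in on egress proto tcp from <nospamd> to any port smtp
# after regenerating the file, refresh the table:
doas pfctl -t nospamd -T replace -f /etc/mail/nospamd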

But I would actually recommend you do not fetch my generated data, and rather use this script or a close relative of it (it's a truly trivial script and you probably can create a better version) and your own list of domains to generate a nospamd tailored to your local environment.

The specific list of domains is derived from more than a decade of maintaining my setup and the specific requests for whitelisting I have received from my users or quick fixes to observed problems in that period. It is conceivable that some domains that were problematic in the past no longer are, and unless we actually live in the same area, some of the domains in my list are probably not relevant to your users. There is even the possibility that some of the larger operators publish different SPF information in specific parts of the world, so the answers I get may not even match yours in all cases.

So go ahead, script and generate! This is your chance to help the robots generate some goodness, for the benefit of your users.

In related news, a request from my new colleagues gave me an opportunity to update the sometimes-repeated OpenBSD and you presentation so it now has at least some information on OpenBSD 6.4. You could call the presentation a bunch of links in a thin wrapper of advocacy and you would not be very wrong.

If you have comments or questions on any of the issues raised in this article, please let me know, preferably via the (moderated) comments field, but I have also been known to respond to email and via various social media message services.

Update 2018-11-11: A few days after I had posted this article, an incident happened that showed the importance of keeping track of both goodness and badness for your services. This tweet is my reaction to a few quick glances at the bsdly.net mail server log:

The downside of maintaining a 55+ thousand entry spamtrap list and whitelisting by SPF is seeing one of the whitelisted sites apparently trying to spam every one of your spamtraps (see https://t.co/ulWt1EloRp). Happening now. Wondering is collecting logs and forwarding worth it?
— Peter N. M. Hansteen (@pitrh) November 9, 2018
A little later I'm clearly pondering what to do, including doing another detailed writeup.
Then again it is an indication that the collected noise is now a required part of the spammer lexicon. One might want to point sites at throwing away outgoing messages to any address on https://t.co/3uthWgKWmL (direct link to list https://t.co/mTaBpF5ucU - beware of html tags!).
— Peter N. M. Hansteen (@pitrh) November 9, 2018
Fortunately I had had some interaction with this operator earlier, so I knew roughly how to approach them. I wrote a couple of quick messages to their abuse contacts and made sure to include links to both my spamtrap resources and a fresh log excerpt that indicated clearly that someone or someones in their network was indeed progressing from top to bottom of the spamtraps list.
I ended up contacting their abuse@ with pointers to the logs that showed evidence of several similar campaigns over the last few days (the period I cared to look at) plus pointers to the spamtrap list and articles. About 30m after the second email to abuse@ the activity stopped.
— Peter N. M. Hansteen (@pitrh) November 10, 2018
As the last tweet says, delivery attempts stopped after progressing to somewhere in the Cs. The moral might be that a list of spamtraps like the one I publish could be useful for other sites for filtering their outgoing mail. Any activity involving the known-bad addresses would be a strong indication that somebody made a very unwise purchasing decision involving address lists.

Update 2019-08-07: Gmail seems to be stuck on considering bsdly.net mail spam these days. If you are using a Google-attached mail service and have not received mail you were expecting from me, please check your spam folder and if you find anything, please use the "Report as not spam" feature.

by Peter N. M. Hansteen (noreply@blogger.com) at August 07, 2019 09:56 AM

December 02, 2018

NUUG Foundation

Travel grants - 2019

NUUG Foundation is announcing travel grants for 2019. Applications can be submitted at any time.

December 02, 2018 04:10 PM

November 14, 2018

Dag-Erling Smørgrav

Bump

Time for my annual “oh shit, I forgot to bump the copyright year again” round-up!

In the F/OSS community, there are two different philosophies when it comes to applying copyright statements to a project. If the code base consists exclusively (or almost exclusively) of code developed for that specific project by the project’s author or co-authors, many projects will have a single file (usually named LICENSE) containing the license, a list of copyright holders, and the copyright dates or ranges. However, if the code base incorporates a significant body of code taken from other projects or contributed by parties outside the project, it is customary to include the copyright statements and either the complete license or a reference to it in each individual file. In my experience, projects that use the BSD, ISC, MIT, or adjacent licenses tend to use the latter model regardless.

The advantage of the second model is that it’s hard to get wrong. You might forget to add a name to a central list, but you’re far less likely to forget to state the name of the author when you add a new file. The disadvantage is that it’s really, really easy to forget to update the copyright year when you commit a change to an existing file that hasn’t been touched in a while.

So, how can we automate this?

One possibility is to have a pre-commit hook that updates it for you (generally a bad idea), or one that rejects the commit if it thinks you forgot (better, but not perfect; what if you’re adding a file from an outside source?), or one that prints a big fat warning if it thinks you forgot (much better, especially with Git since you can commit --amend once you’ve fixed it, before pushing).
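
For the third option, a minimal sketch of such a hook (hypothetical, written for Git, and assuming file names without whitespace) could look like this:

#!/bin/sh
# .git/hooks/pre-commit: print a warning, but do not block the commit, when a
# staged, modified file has a Copyright line that does not mention this year.
year=$(date +%Y)
for f in $(git diff --cached --name-only --diff-filter=M); do
    if git show ":$f" | grep -q Copyright &&
       ! git show ":$f" | grep Copyright | grep -q "$year"; then
        echo "warning: $f: Copyright line does not mention $year" >&2
    fi
done
exit 0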

But how do you fix the mistake retroactively, without poring over commit logs to figure out what was modified when?

Let’s start by assuming that you have a list of files that were modified in 2017, and that each file only has one copyright statement that needs to be updated to reflect that fact. The following Perl one-liner should do the trick:

perl -p -i -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'

It should be fairly self-explanatory if you know regular expressions. The first substitution handles the case where the existing statement contains a range, in which case we extend it to include 2017, and the second substitution handles the case where the existing statement contains a single year, which we replace with a range starting with the original year and ending with 2017. The complexity stems mostly from having to take care not to replace 2018 (or later) with 2017; our regexes only match years in the range 2000-2016.

OK, so now we know how to fix the files, but how do we figure out which ones need fixing?

With Git, we could try something like this:

git diff --name-only 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

This is… imperfect, though. The first problem is that it will list every file that was touched, including files that were added, moved, renamed, or deleted. Files that were added should be assumed to have had a correct copyright statement at the time they were added; files that were only moved or renamed should not be updated, since their contents did not change; and files that were deleted are no longer there to be updated.¹ We should restrict our search to files that were actually modified:

git diff --name-only --diff-filter M 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

Some of those changes might be too trivial to copyright, though. This is a fairly complex legal matter, but to simplify, if the change was inevitable and there was no room for creative expression — for instance, a function in a third-party library you are using was renamed, so both the reason for the change and the nature of the change are external to the work itself — then it is not protected. So perhaps you should remove --name-only and review the diff, which is when you realize that half those files were only modified to update their copyright statements because you forgot to do so in 2016. Let’s try to exclude them mechanically, rather than manually. Unfortunately, git diff does not have anything that resembles diff -I, so we have to write our own diff command which does that, and ask git to use it:

$ echo 'exec diff -u -ICopyright "$@"' >diff-no-copyright
$ chmod a+rx diff-no-copyright
$ git difftool --diff-filter M --extcmd $PWD/diff-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

This gives us a diff, though, not a list of files. We can try to extract the names as follows:

$ git difftool --diff-filter M --no-prompt --extcmd $PWD/diff-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}' | awk '/^---/ { print $2 }'

Wait… no, that’s just garbage. The thing is, git difftool works by checking out both versions of a file and diffing them, so what we get is a list of the names of the temporary files it created. We have to be a little more creative:

$ echo '/usr/bin/diff -q -ICopyright "$@" >/dev/null || echo "$BASE"' >list-no-copyright
$ chmod a+rx list-no-copyright
$ git difftool --diff-filter M --no-prompt --extcmd $PWD/list-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}'

Much better. We can glue this together with our Perl one-liner using xargs, then repeat the process for 2018.
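
As a sketch (again assuming file names without whitespace), the glue could look like this:

$ git difftool --diff-filter M --no-prompt --extcmd $PWD/list-no-copyright 'HEAD@{2017-01-01}..HEAD@{2018-01-01}' | xargs perl -p -i -e 'if (/Copyright/) { s/ ([0-9]{4})-20(?:0[0-9]|1[0-6]) / $1-2017 /; s/ (20(?:0[0-9]|1[0-6])) / $1-2017 /; }'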

Finally, how about Subversion? On the one hand, Subversion is far simpler than Git, so we can get 90% of the way much more easily. On the other hand, Subversion is far less flexible than Git, so we can’t go the last 10% of the way. Here’s the best I could do:

$ echo 'exec diff -u -ICopyright "$@"' >diff-no-copyright
$ chmod a+rx diff-no-copyright
$ svn diff --ignore-properties --diff-cmd $PWD/diff-no-copyright -r'{2017-01-01}:{2018-01-01}' | awk '/^---/ { print $2 }'

This will not work properly if you have files with names that contain whitespace; you’ll have to use sed with a much more complicated regex, which I leave as an exercise.


¹ I will leave the issue of move-and-modify being incorrectly recorded as delete-and-add to the reader. One possibility is to include added files in the list by using --diff-filter AM, and review them manually before committing.

by Dag-Erling Smørgrav at November 14, 2018 07:11 PM

October 22, 2018

Dag-Erling Smørgrav

DNS over TLS in FreeBSD 12

With the arrival of OpenSSL 1.1.1, an upgraded Unbound, and some changes to the setup and init scripts, FreeBSD 12.0, currently in beta, now supports DNS over TLS out of the box.

DNS over TLS is just what it sounds like: DNS over TCP, but wrapped in a TLS session. It encrypts your requests and the server’s replies, and optionally allows you to verify the identity of the server. The advantages are protection against eavesdropping and manipulation of your DNS traffic; the drawbacks are a slight performance degradation and potential firewall traversal issues, as it runs over a non-standard port (TCP port 853) which may be blocked on some networks. Let’s take a look at how to set it up.

Basic setup

As a simple test case, let’s set up our 12.0-ALPHA10 VM to use Cloudflare’s DNS service:

# uname -r
12.0-ALPHA10
# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="YES"
local_unbound_forwarders="1.1.1.1@853 1.0.0.1@853"
EOF
# service local_unbound start
Performing initial setup.
destination:
/var/unbound/forward.conf created
/var/unbound/lan-zones.conf created
/var/unbound/control.conf created
/var/unbound/unbound.conf created
/etc/resolvconf.conf not modified
Original /etc/resolv.conf saved as /var/backups/resolv.conf.20181021.192629
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
www.freebsd.org is an alias for wfe0.nyi.freebsd.org.
wfe0.nyi.freebsd.org has address 96.47.72.84
wfe0.nyi.freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
wfe0.nyi.freebsd.org mail is handled by 0 .

Note that this is not a configuration you want to run in production—we will come back to this later.

Performance

The downside of DNS over TLS is the performance hit of the TCP and TLS session setup and teardown. We demonstrate this by flushing our cache and (rather crudely) measuring a cache miss and a cache hit:

# local-unbound-control reload
ok
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.553 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.005 total

Compare this to querying our router, a puny Soekris net5501 running Unbound 1.8.1 on FreeBSD 11.1-RELEASE:

# time host www.freebsd.org gw >x
host www.freebsd.org gw > x 0.00s user 0.00s system 0% cpu 0.232 total
# time host www.freebsd.org 192.168.144.1 >x
host www.freebsd.org gw > x 0.00s user 0.00s system 0% cpu 0.008 total

or to querying Cloudflare directly over UDP:

# time host www.freebsd.org 1.1.1.1 >x      
host www.freebsd.org 1.1.1.1 > x 0.00s user 0.00s system 0% cpu 0.272 total
# time host www.freebsd.org 1.1.1.1 >x
host www.freebsd.org 1.1.1.1 > x 0.00s user 0.00s system 0% cpu 0.013 total

(Cloudflare uses anycast routing, so it is not so unreasonable to see a cache miss during off-peak hours.)

This clearly shows the advantage of running a local caching resolver—it absorbs the cost of DNSSEC and TLS. And speaking of DNSSEC, we can separate that cost from that of TLS by reconfiguring our server without the latter:

# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="NO"
local_unbound_forwarders="1.1.1.1 1.0.0.1"
EOF
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.205328
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
Original /var/unbound/unbound.conf saved as /var/backups/unbound.conf.20181021.205328
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound start
Starting local_unbound.
Waiting for nameserver to start... good
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.080 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total

So does TLS add nearly half a second to every cache miss? Not quite, fortunately—in our previous tests, our first query was not only a cache miss but also the first query after a restart or a cache flush, resulting in a complete load and validation of the entire path from the name we queried to the root. The difference between a first and second cache miss is quite noticeable:

# time host www.freebsd.org >x 
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.546 total
# time host www.freebsd.org >x
host www.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total
# time host repo.freebsd.org >x
host repo.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.168 total
# time host repo.freebsd.org >x
host repo.freebsd.org > x 0.00s user 0.00s system 0% cpu 0.004 total

Revisiting our configuration

Remember when I said that you shouldn’t run the sample configuration in production, and that I’d get back to it later? This is later.

The problem with our first configuration is that while it encrypts our DNS traffic, it does not verify the identity of the server. Our ISP could be routing all traffic to 1.1.1.1 to its own servers, logging it, and selling the information to the highest bidder. We need to tell Unbound to validate the server certificate, but there’s a catch: Unbound only knows the IP addresses of its forwarders, not their names. We have to provide it with names that will match the x509 certificates used by the servers we want to use. Let’s double-check the certificate:

# :| openssl s_client -connect 1.1.1.1:853 |& openssl x509 -noout -text |& grep DNS
DNS:*.cloudflare-dns.com, IP Address:1.1.1.1, IP Address:1.0.0.1, DNS:cloudflare-dns.com, IP Address:2606:4700:4700:0:0:0:0:1111, IP Address:2606:4700:4700:0:0:0:0:1001

This matches Cloudflare’s documentation, so let’s update our configuration:

# cat >/etc/rc.conf.d/local_unbound <<EOF
local_unbound_enable="YES"
local_unbound_tls="YES"
local_unbound_forwarders="1.1.1.1@853#cloudflare-dns.com 1.0.0.1@853#cloudflare-dns.com"
EOF
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.212519
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
/var/unbound/unbound.conf not modified
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound restart
Stopping local_unbound.
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
www.freebsd.org is an alias for wfe0.nyi.freebsd.org.
wfe0.nyi.freebsd.org has address 96.47.72.84
wfe0.nyi.freebsd.org has IPv6 address 2610:1c1:1:606c::50:15
wfe0.nyi.freebsd.org mail is handled by 0 .

How can we confirm that Unbound actually validates the certificate? Well, we can run Unbound in debug mode (/usr/sbin/unbound -dd -vvv) and read the debugging output… or we can confirm that it fails when given a name that does not match the certificate:

# perl -p -i -e 's/cloudflare/cloudfire/g' /etc/rc.conf.d/local_unbound
# service local_unbound setup
Performing initial setup.
destination:
Original /var/unbound/forward.conf saved as /var/backups/forward.conf.20181021.215808
/var/unbound/lan-zones.conf not modified
/var/unbound/control.conf not modified
/var/unbound/unbound.conf not modified
/etc/resolvconf.conf not modified
/etc/resolv.conf not modified
# service local_unbound restart
Stopping local_unbound.
Waiting for PIDS: 33977.
Starting local_unbound.
Waiting for nameserver to start... good
# host www.freebsd.org
Host www.freebsd.org not found: 2(SERVFAIL)

But is this really a failure to validate the certificate? Actually, no. When provided with a server name, Unbound will pass it to the server during the TLS handshake, and the server will reject the handshake if that name does not match any of its certificates. To truly verify that Unbound validates the server certificate, we have to confirm that it fails when it cannot do so. For instance, we can remove the root certificate used to sign the DNS server’s certificate from the test system’s trust store. Note that we cannot simply remove the trust store entirely, as Unbound will refuse to start if the trust store is missing or empty.

While we’re talking about trust stores, I should point out that you currently must have ca_root_nss installed for DNS over TLS to work. However, 12.0-RELEASE will ship with a pre-installed copy.

Conclusion

We’ve seen how to set up Unbound—specifically, the local_unbound service in FreeBSD 12.0—to use DNS over TLS instead of plain UDP or TCP, using Cloudflare’s public DNS service as an example. We’ve looked at the performance impact, and at how to ensure (and verify) that Unbound validates the server certificate to prevent man-in-the-middle attacks.

The question that remains is whether it is all worth it. There is undeniably a performance hit, though this may improve with TLS 1.3. More importantly, there are currently very few DNS-over-TLS providers—only one, really, since Quad9 filter their responses—and you have to weigh the advantage of encrypting your DNS traffic against the disadvantage of sending it all to a single organization. I can’t answer that question for you, but I can tell you that the parameters are evolving quickly, and if your answer is negative today, it may not remain so for long. More providers will appear. Performance will improve with TLS 1.3 and QUIC. Within a year or two, running DNS over TLS may very well become the rule rather than the experimental exception.

by Dag-Erling Smørgrav at October 22, 2018 09:36 AM

July 10, 2018

Nicolai Langfeldt

Email is so 1995!

The other day I was made aware that Friprogsenteret (the Norwegian open source competence centre) is struggling a bit; they explain why in "Farvel epost" ("Goodbye email").

That it should be easier to follow up enquiries on twitter/linkedin/facebook seems odd, to put it mildly. That it should make it easier for them to ignore, or say no to, enquiries they ought to ignore or say no to also seems odd. I find it hard to believe it will make the overview of enquiries, and of what has been answered, any easier to keep track of. The status-update sphere is a social space, not really a space for handling cases and enquiries. I simply assume that those who use twitter/facebook/... seriously for this kind of thing make sure to pull the enquiries into their case and enquiry tracking system, so they can see what they have considered and dealt with.

Oh well, I am curious what they will have to do for this to succeed - for other values of "succeed" than "I don't keep up with twitter" >:-)

by nicolai (noreply@blogger.com) at July 10, 2018 05:54 AM

April 14, 2018

NUUG Foundation

Perl Toolchain Summit 2018

NUUG Foundation is supporting the Perl Toolchain Summit 2018. The conference for the developers behind Perl is held in Oslo, April 19-22, 2018.

April 14, 2018 02:35 PM

October 23, 2017

Espen Braastad

ZFS NAS using CentOS 7 from tmpfs

Following up on the CentOS 7 root filesystem on tmpfs post, here comes a guide on how to run a ZFS enabled CentOS 7 NAS server with the operating system running from tmpfs.

Hardware

Preparing the build environment

The disk image is built on macOS using Packer and VirtualBox. VirtualBox is installed using the appropriate platform package downloaded from their website, and Packer is installed using brew:

$ brew install packer

Building the disk image

Three files are needed in order to build the disk image: a Packer template file, an Anaconda kickstart file and a shell script that is used to configure the disk image after installation. The following files can be used as examples:

Create some directories:

$ mkdir ~work/centos-7-zfs/
$ mkdir ~work/centos-7-zfs/http/
$ mkdir ~work/centos-7-zfs/scripts/

Copy the files to these directories:

$ cp template.json ~work/centos-7-zfs/
$ cp ks.cfg ~work/centos-7-zfs/http/
$ cp provision.sh ~work/centos-7-zfs/scripts/

Modify each of the files to fit your environment.

Start the build process using Packer:

$ cd ~work/centos-7-zfs/
$ packer build template.json

This will download the CentOS 7 ISO file, start an HTTP server to serve the kickstart file and start a virtual machine using Virtualbox:

Packer installer screenshot

The virtual machine will boot into Anaconda and run through the installation process as specified in the kickstart file:

Anaconda installer screenshot

When the installation process is complete, the disk image will be available in the output-virtualbox-iso folder with the vmdk extension.

Packer done screenshot

The disk image is now ready to be put in initramfs.

Putting the disk image in initramfs

This section is quite similar to the previous blog post CentOS 7 root filesystem on tmpfs but with minor differences. For simplicity reasons it is executed on a host running CentOS 7.

Create the build directories:

$ mkdir /work
$ mkdir /work/newroot
$ mkdir /work/result

Export the files from the disk image to one of the directories we created earlier:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -a packer-virtualbox-iso-1508790384-disk001.vmdk -i copy-out / /work/newroot/

Modify /etc/fstab:

$ cat > /work/newroot/etc/fstab << EOF
tmpfs       /         tmpfs    defaults,noatime 0 0
none        /dev      devtmpfs defaults         0 0
devpts      /dev/pts  devpts   gid=5,mode=620   0 0
tmpfs       /dev/shm  tmpfs    defaults         0 0
proc        /proc     proc     defaults         0 0
sysfs       /sys      sysfs    defaults         0 0
EOF

Disable selinux:

echo "SELINUX=disabled" > /work/newroot/etc/selinux/config

Disable clearing the screen on login failure to make it possible to read any error messages:

mkdir /work/newroot/etc/systemd/system/getty@.service.d
cat > /work/newroot/etc/systemd/system/getty@.service.d/noclear.conf << EOF
[Service]
TTYVTDisallocate=no
EOF

Now jump to the Initramfs and Result sections in the CentOS 7 root filesystem on tmpfs and follow those steps until the end when the result is a vmlinuz and initramfs file.

ZFS configuration

The first time the NAS server boots on the disk image, the ZFS storage pool and volumes will have to be configured. Refer to the ZFS documentation for information on how to do this, and use the following commands only as guidelines.

Create the storage pool:

$ sudo zpool create data mirror sda sdb mirror sdc sdd

Create the volumes:

$ sudo zfs create data/documents
$ sudo zfs create data/games
$ sudo zfs create data/movies
$ sudo zfs create data/music
$ sudo zfs create data/pictures
$ sudo zfs create data/upload

Share some volumes using NFS:

zfs set sharenfs=on data/documents
zfs set sharenfs=on data/games
zfs set sharenfs=on data/music
zfs set sharenfs=on data/pictures

Print the storage pool status:

$ sudo zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 20h22m with 0 errors on Sun Oct  1 21:04:14 2017
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    sdd     ONLINE       0     0     0
	    sdc     ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    sda     ONLINE       0     0     0
	    sdb     ONLINE       0     0     0

errors: No known data errors

October 23, 2017 11:20 PM

February 13, 2017

Mimes brønn

A freedom-of-information well full of knowledge

Mimes brønn is a web service that helps you request access to records held by the Norwegian public administration, in line with the Freedom of Information Act (offentleglova) and the Environmental Information Act (miljøinformasjonsloven). The service has a publicly available archive of all answers received to access requests, so that public bodies can avoid answering the same requests over and over again. You find the service at

https://www.mimesbronn.no/

According to old Norse mythology, the well of knowledge is guarded by Mimir and lies beneath one of the roots of the world tree Yggdrasil. Drinking the water of Mimir's well gave such valuable knowledge and wisdom that the young god Odin was willing to pledge an eye, and become one-eyed, for permission to drink from it.

The site is maintained by the NUUG association and is particularly well suited for politically interested people, organisations and journalists. The service is based on its British sister service WhatDoTheyKnow.com, which has already provided access that resulted in documentaries and countless press stories. According to mySociety, a few years ago around 20 % of the access requests to central government went through WhatDoTheyKnow. We in NUUG hope NUUG's service Mimes brønn can be just as useful for the inhabitants of Norway.

Over the weekend the service was updated with a lot of new functionality. The new version works better on small screens, and now shows delivery status for the requests, so that the sender can more easily check that the recipient's email system has confirmed receipt of the access request. The service was set up by volunteers in the NUUG association on their own time and was launched in the summer of 2015. Since then, 121 users have sent in more than 280 requests about everything from wedding rental of the Opera house and negotiations about the use of Norway's top-level DNS domain .bv, to the registration of housing benefit applications, and the site is a little treasure chest of interesting and useful information. NUUG has engaged lawyers who can assist with appeals against denied access or deficient case handling.

– «NUUG's Mimes brønn was invaluable when we succeeded in ensuring that the DNS top-level domain .bv remains in Norwegian hands,» says Håkon Wium Lie.

The service documents widely diverging practices in the handling of access requests, both in response time and in the content of the answers. The vast majority are handled quickly and correctly, but in several cases access has been granted to documents that the responsible agency later wanted to withdraw, and access has been granted where the redaction was done in a way that does not actually hide the information that was supposed to be redacted.

– «The Freedom of Information Act is a cornerstone of our democracy. It does not care who asks for access, or why. The Mimes brønn project is a materialisation of this principle, where anyone can request access and appeal refusals, and where the documentation is made public. This makes Mimes brønn one of the most exciting transparency projects I have seen in recent times,» says the man who got the Tax Administration's register of company ownership opened up, Vegard Venli.

We in the NUUG association hope Mimes brønn can be a useful tool for keeping our democracy healthy.

by Mimes Brønn at February 13, 2017 02:07 PM

January 06, 2017

Espen Braastad

CentOS 7 root filesystem on tmpfs

Several years ago I wrote a series of posts on how to run EL6 with its root filesystem on tmpfs. This post is a continuation of that series, and explains step by step how to run CentOS 7 with its root filesystem in memory. It should apply to RHEL, Ubuntu, Debian and other Linux distributions as well. The post is a bit terse to focus on the concept, and several of the steps have room for improvement.

The following is a screen recording from a host running CentOS 7 in tmpfs:

[screen recording]

Build environment

A build host is needed to prepare the image to boot from. The build host should run CentOS 7 x86_64, and have the following packages installed:

yum install libvirt libguestfs-tools guestfish

Make sure the libvirt daemon is running:

systemctl start libvirtd

Create some directories that will be used later; feel free to relocate these somewhere else:

mkdir -p /work/initramfs/bin
mkdir -p /work/newroot
mkdir -p /work/result

Disk image

For simplicity reasons we’ll fetch our rootfs from a pre-built disk image, but it is possible to build a custom disk image using virt-manager. I expect that most people would like to create their own disk image from scratch, but this is outside the scope of this post.

Use virt-builder to download a pre-built CentOS 7.3 disk image and set the root password:

virt-builder centos-7.3 -o /work/disk.img --root-password password:changeme

Export the files from the disk image to one of the directories we created earlier:

guestfish --ro -a /work/disk.img -i copy-out / /work/newroot/

Clear fstab since it contains mount entries that no longer apply:

echo > /work/newroot/etc/fstab

SELinux will complain about incorrect disk label at boot, so let’s just disable it right away. Production environments should have SELinux enabled.

echo "SELINUX=disabled" > /work/newroot/etc/selinux/config

Disable clearing the screen on login failure to make it possible to read any error messages:

mkdir /work/newroot/etc/systemd/system/getty@.service.d
cat > /work/newroot/etc/systemd/system/getty@.service.d/noclear.conf << EOF
[Service]
TTYVTDisallocate=no
EOF

Initramfs

We’ll create our custom initramfs from scratch. The boot procedure will be, simply put:

  1. Fetch kernel and a custom initramfs.
  2. Execute kernel.
  3. Mount the initramfs as the temporary root filesystem (for the kernel).
  4. Execute /init (in the initramfs).
  5. Create a tmpfs mount point.
  6. Extract our CentOS 7 root filesystem to the tmpfs mount point.
  7. Execute switch_root to boot on the CentOS 7 root filesystem.

The initramfs will be based on BusyBox. Download a pre-built binary or compile it from source, and put the binary in the initramfs/bin directory. In this post I'll just download a pre-built binary:

wget -O /work/initramfs/bin/busybox https://www.busybox.net/downloads/binaries/1.26.1-defconfig-multiarch/busybox-x86_64

Make sure that busybox has the execute bit set:

chmod +x /work/initramfs/bin/busybox

Create the file /work/initramfs/init with the following contents:

#!/bin/busybox sh

# Dump to sh if something fails
error() {
	echo "Jumping into the shell..."
	setsid cttyhack sh
}

# Populate /bin with binaries from busybox
/bin/busybox --install /bin

mkdir -p /proc
mount -t proc proc /proc

mkdir -p /sys
mount -t sysfs sysfs /sys

mkdir -p /sys/dev
mkdir -p /var/run
mkdir -p /dev

mkdir -p /dev/pts
mount -t devpts devpts /dev/pts

# Populate /dev
echo /bin/mdev > /proc/sys/kernel/hotplug
mdev -s

mkdir -p /newroot
mount -t tmpfs -o size=1500m tmpfs /newroot || error

echo "Extracting rootfs... "
xz -d -c -f rootfs.tar.xz | tar -x -f - -C /newroot || error

mount --move /sys /newroot/sys
mount --move /proc /newroot/proc
mount --move /dev /newroot/dev

exec switch_root /newroot /sbin/init || error

Make sure it is executable:

chmod +x /work/initramfs/init

Create the root filesystem archive using tar. The following command also uses xz compression to reduce the final size of the archive (from approximately 1 GB to 270 MB):

cd /work/newroot
tar cJf /work/initramfs/rootfs.tar.xz .

Create initramfs.gz using:

cd /work/initramfs
find . -print0 | cpio --null -ov --format=newc | gzip -9 > /work/result/initramfs.gz

Copy the kernel directly from the root filesystem using:

cp /work/newroot/boot/vmlinuz-*x86_64 /work/result/vmlinuz

Result

The /work/result directory now contains two files with file sizes similar to the following:

ls -lh /work/result/
total 277M
-rw-r--r-- 1 root root 272M Jan  6 23:42 initramfs.gz
-rwxr-xr-x 1 root root 5.2M Jan  6 23:42 vmlinuz

These files can be loaded directly in GRUB from disk, or using iPXE over HTTP using a script similar to:

#!ipxe
kernel http://example.com/vmlinuz
initrd http://example.com/initramfs.gz
boot

January 06, 2017 08:34 PM

July 15, 2016

Mimes brønn

Who has been drinking from Mimes brønn?

Mimes brønn has now been up for about a year. We therefore thought it could be interesting to put together some brief statistics on how the service has been used.

At the beginning of July 2016, Mimes brønn had 71 registered users who had sent out 120 access requests, of which 62 (52%) were successful, 19 (16%) partially successful, 14 (12%) refused, 10 (8%) got the answer that the body did not hold the information, and 12 requests (10%; 6 from 2016, 6 from 2015) were still unanswered. A few (3) of the requests could not be categorised. We thus see that about two thirds of the requests were successful, fully or partially. That is good!

The time it takes before the body first sends a reply varies a lot, from the same day (some requests sent to Utlendingsnemnda, Statens vegvesen, Økokrim, Mediatilsynet, Datatilsynet, Brønnøysundregistrene), up to 6 months (Ballangen municipality) or longer (Stortinget, the Ministry of Petroleum and Energy, the Ministry of Justice and Public Security, UDI – the Directorate of Immigration, and SSB have received access requests that are still unanswered). The average time here was a couple of weeks (except for the 12 cases where no reply has arrived). It follows from section 29, first paragraph, of the Freedom of Information Act that requests for access to the administration's documents shall be answered «without undue delay», which according to the Parliamentary Ombudsman in most cases should be interpreted as «the same day, or at least within 1-3 working days». So there is room for improvement here.

The right to appeal (offentleglova § 32) was used in 20 of the access requests. In most (15; 75%) of those cases, the appeal led to the request being successful. The average time to get an answer to the appeal was a month (except for 2 cases, appeals sent to Statens vegvesen and Ruter AS, where no answer has arrived). Appealing is well worth it, and completely free! The Parliamentary Ombudsman has stated that 2-3 weeks is beyond acceptable processing time for appeals.

The most requests had been sent to the Ministry of Foreign Affairs (9), closely followed by Fredrikstad municipality and Brønnøysundregistrene. In total, requests were sent to 60 public authorities, of which 27 received two or more. There are more than 3700 authorities in the Mimes brønn database. Most of them have thus yet to receive an access request through the service.

When we look at what kind of information people have asked for, we see a broad range of interests: everything from the municipality's parking spaces, travel expense claims exceeding the state rates for accommodation, correspondence about asylum reception centres and negotiations about the top-level domain .bv, to documents about Myanmar.

The authorities do all kinds of things. Some of it is done badly, some of it they do well. The more we find out about how the authorities work, the better our chances of suggesting improvements to what works badly… and of applauding what works well. If there is something you want access to, just click on https://www.mimesbronn.no/ and you are on your way 🙂

by Mimes Brønn at July 15, 2016 03:56 PM

June 01, 2016

Kevin Brubeck Unhammer

Machine translation vs. NTNU examiner

The Twitter user @IngeborgSteine recently got quite a bit of attention when she tweeted a picture of the Nynorsk version of her economics exam at NTNU:

This was my economics exam in "Nynorsk". #nynorsk #noregsmållag #kvaialledagar https://t.co/RjCKSU2Fyg
Ingeborg Steine (@IngeborgSteine) May 30, 2016

Creative innovations like *kvisleis and all the dialect forms and archaisms would be unlikely to appear in a machine-translated version, so I wondered: how much better/worse would it have been if the examiner had simply used Apertium instead? Ingeborg Steine was helpful enough to post the Bokmål version, so let's give it a try 🙂

NTNU-nob-nno.jpeg

No kvisleis and free of tær and fyr, but it is not perfect either: certain words are missing from the dictionaries and thus get the wrong inflection, teller is interpreted as a noun, ein anna maskin has the wrong inflection on the first word (a rule was missing there), and at is in one place interpreted as an adverb (which leads to the curious fragment det verta at anteke tilvarande). In addition, the language is detected as Tatar by the web page, so maybe it was a bit heavy Norwegian? 🙂 But these errors are not particularly hard to fix – the development version of Apertium now gives:

NTNU-nob-nno-svn.jpeg

There are still a couple of small things that could be fixed, but it is already better than most of the exams I was handed at UiO …

by unhammer at June 01, 2016 09:45 AM

October 18, 2015

Anders Nordby

Fighting spam with SpamAssassin, procmail and greylisting

On my private server we use a number of measures to stop and prevent spam from arriving in the users' inboxes:

- postgrey (greylisting) to delay arrival (hopefully block lists will be up to date in time to stop unwanted mail; also, some senders do not retry)
- SpamAssassin to block mails by scoring different aspects of the emails. Newer versions of it have URIBL (domain based, for links in the emails) in addition to the traditional RBL (IP based) block lists, which works better. I also created my own URIBL block list which you can use, dbl.fupp.net.
- Procmail. For users on my server, I recommend this procmail rule:

:0
* ^X-Spam-Status: Yes
.crapbox/

It will sort emails that have a score indicating they are spam into the mailbox "crapbox".
- blocking unwanted and dangerous attachments, particularly for Windows users.

by Anders (noreply@blogger.com) at October 18, 2015 01:09 PM

April 23, 2015

Kevin Brubeck Unhammer

Translation by compound splitting

In the previous post in this series I briefly went through various methods for generating translation candidates for bilingual dictionaries; in this post I will go a bit deeper into candidate generation by translating the individual parts of compound words. As mentioned, we already have a dictionary between Bokmål and North Sámi, which we want to extend to Bokmål–Lule Sámi and Bokmål–South Sámi. The dictionary was developed to translate typical «ministry language», so it is full of long compound words. And in Sámi we can put words together in roughly the same way as in Norwegian (in addition to a pile of other ways, but we will cheerfully skip those for now). We should be able to exploit this, so that if we know what «klage» (complaint) is in Lule Sámi, and we know what «frist» (deadline) is, then we have at least one sensible hypothesis for what «klagefrist» (complaint deadline) could be in Lule Sámi 🙂

Compound splitting is great when you are translating dictionaries. Incorrectly split compounds are great when you want a little smile.
«Ananássasuorma» jali «ananássa riŋŋgu»? Ij le buorre diehtet.

So we can use the few translations we already have between Bokmål and Lule/South Sámi to create more translations, by translating parts of words and then putting the parts back together. We also have a couple of translations lying around between North Sámi and Lule/South Sámi, so we can use the same method there (and exploit the fact that we have a Bokmål–North Sámi dictionary to close the circle back to Bokmål).

Coverage and precision

Unfortunately (in this context) we often have several translations of each word; in the existing Bokmål–Lule Sámi dictionaries we are looking at (largely based on Anders Kintel's dictionary), "klage" can be, among other things, gujdalvis, gujddim, luodjom or kritihkka, while "frist" can be ájggemierre, giehtadaláduvvat, mierreduvvam or ájggemærráj. If we allow every left-hand part to combine with every right-hand part, we get 16 possible candidates for this one word! Probably no more than one or two of them are usable (and maybe not even that). On average this method gives us about twice as many candidates as source words. So we need ways to cut down on bad candidates.
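
The generation step itself is just a cartesian product over the part translations. A toy Python sketch of the idea, using only the example words above (an illustration, not the project's actual scripts):

from itertools import product

# toy seed dictionary: Bokmål part -> possible Lule Sámi translations
seed = {
    "klage": ["gujdalvis", "gujddim", "luodjom", "kritihkka"],
    "frist": ["ájggemierre", "giehtadaláduvvat", "mierreduvvam", "ájggemærráj"],
}

def candidates(first, second):
    """All ways of translating first+second part by part."""
    return [a + b for a, b in product(seed.get(first, []), seed.get(second, []))]

print(len(candidates("klage", "frist")))  # 16 candidates for "klagefrist"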

The complementary challenge is getting good enough coverage. Sometimes we have no translation of the parts of a word, even though we have translations of words containing those same parts. That sentence needs an example 🙂 We would like a candidate for the word "øyekatarr" (eye catarrh) in Lule Sámi, i.e. the compound "øye+katarr". We may have a translation of "øye" in our material, but nothing for "katarr". On the other hand, the material says that "blærekatarr" (bladder catarrh) is gådtjåráhkkovuolssje. So to extend the coverage, we can additionally split our source material into all pairs of compound parts; if we know that these words can be analysed as "blære+katarr" and gådtjåráhkko+vuolssje, then it would seem that "blære" is gådtjåráhkko and "katarr" is vuolssje (and Giellatekno fortunately has good morphological analysers that split such words at the right place). This gives a good extension of the material: in fact we get candidates for nearly twice as many of the words we want candidates for, if we extend the source material this way. But it has a big downside too: we get more than twice as many Lule/South Sámi candidates per Bokmål word (on average around four candidates per source word).
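
A rough sketch of that extension step, again only as illustration: here the toy dictionaries stand in for Giellatekno's morphological analysers, which do the real work of finding the right split point.

# toy "analyses": word -> (first part, second part), as a real analyser would return
analyses_nb  = {"blærekatarr": ("blære", "katarr")}
analyses_smj = {"gådtjåráhkkovuolssje": ("gådtjåráhkko", "vuolssje")}

def part_pairs(pairs, analyses_src, analyses_trg):
    """Turn full-word translation pairs into part-level translation pairs."""
    out = []
    for src, trg in pairs:
        if src in analyses_src and trg in analyses_trg:
            (s1, s2), (t1, t2) = analyses_src[src], analyses_trg[trg]
            out.extend([(s1, t1), (s2, t2)])
    return out

existing = [("blærekatarr", "gådtjåráhkkovuolssje")]
print(part_pairs(existing, analyses_nb, analyses_smj))
# [('blære', 'gådtjåráhkko'), ('katarr', 'vuolssje')]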

Filtering and ranking

We want to narrow the candidates down to the ones most likely to be good. The best test is to check whether the candidate occurs in a corpus, preferably in the same aligned sentence pair (such a candidate is usually a good one). If not, we can also check whether the candidate and the source word have similar frequencies, or whether the candidate has any frequency at all.

The compound-splitting translation suggested tsavtshvierhtie for "virkemiddel" (policy instrument), and they even occurred together in an aligned sentence pair:
<s xml:lang="sma" id="2060"/>Daesnie FoU akte vihkeles tsavtshvierhtie .
<s xml:lang="nob" id="2060"/>Her er FoU er et viktig virkemiddel .

– so that is most likely a good word pair.

Unfortunately we have so little text for Lule/South Sámi that we quickly run out of candidates with any frequency at all. For South Sámi, for instance, we only have candidates with corpus hits for around 10 % of the words we generate candidates for.
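
The aligned-sentence test can be sketched like this; a toy illustration, where the corpus is just a list of (source sentence, target sentence) string pairs rather than the project's real XML format:

def in_parallel_corpus(src_word, candidate, corpus):
    """True if the candidate occurs in the target side of a sentence pair
    whose source side contains the source word."""
    return any(src_word in src.split() and candidate in trg.split()
               for src, trg in corpus)

corpus = [("Her er FoU er et viktig virkemiddel .",
           "Daesnie FoU akte vihkeles tsavtshvierhtie .")]
print(in_parallel_corpus("virkemiddel", "tsavtshvierhtie", corpus))  # True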

Another test, which works for all words, is to check whether our morphological analysers can analyse the candidate; if they cannot (and it has no corpus hits either), it is usually wrong. But this only removes around a quarter of the candidates; with our split-up dictionary (where we also include pairs of word parts) we still have around three candidates per source word on average.

(One test I tried but rejected was filtering based on similar word length. It seems logical that long words translate to long words and short to short, but there are many good exceptions. Besides, it removes far too few bad candidates to seem worthwhile.)

Our parallel corpus material is far too small, but when we generate candidates for dictionaries, it is not parallel sentences we are trying to predict, but parallel words and dictionary pairs. And then our training material is really our existing dictionaries … So I looked at which compound parts were actually used in our earlier translations, which pairs of parts appeared often in earlier translations, and which parts rarely or never did. For example, our split-up Bokmål–Lule Sámi dictionary has these pairs:

Here we see that "løyve" (permit) can be either loahpádus or doajmmaloahpe; should "taxiløyve" then be táksiloahpádus or táksidoajmmaloahpe? Based on this material we should probably go for the former: even though doajmmaloahpe is listed, only loahpádus actually appears in compound words.

We can then try generating candidates for all the Bokmål words in our material, both the ones we actually want candidates for and the ones we already have translations for. Then we go through the generated candidates for the words we already have translations for, and count the pairs of word parts that generated such words. Say we have generated the candidates barggo+loahpádus and barggo+doajmmaloahpe for "arbeids+løyve" (work permit); when we then go through the existing translations and find that "arbeidsløyve" was in the dictionary with the translation barggoloahpádus, we increase the frequency of the pair "løyve"–loahpádus by one, while "løyve"–doajmmaloahpe stays at zero.

For now I have only filtered out the candidates where the pair for either the first or the second element had zero frequency. According to a bit of manual evaluation by a linguist, it is almost only bad words that get thrown out, so that filter seems to work well. On the other hand, only around 10 % of the candidates are removed if we only throw out the ones with zero frequency, so the next step is to use the frequencies to produce a full ranking.
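
Put together, the counting and the zero-frequency filter might look roughly like this; a sketch of the idea, not the actual scripts used in the project:

from collections import Counter

# candidate splits generated for a word we already have a translation for
generated = {
    "arbeidsløyve": [
        (("arbeids", "barggo"), ("løyve", "loahpádus")),
        (("arbeids", "barggo"), ("løyve", "doajmmaloahpe")),
    ],
}
known = {"arbeidsløyve": "barggoloahpádus"}

def count_part_pairs(known, generated):
    """Count (source part, target part) pairs whose concatenation
    reproduces a translation we already trust."""
    counts = Counter()
    for src_word, candidates in generated.items():
        trg_word = known.get(src_word)
        for (s1, t1), (s2, t2) in candidates:
            if trg_word and t1 + t2 == trg_word:
                counts[(s1, t1)] += 1
                counts[(s2, t2)] += 1
    return counts

counts = count_part_pairs(known, generated)
print(counts[("løyve", "loahpádus")], counts[("løyve", "doajmmaloahpe")])  # 1 0

def keep(split, counts):
    """The zero-frequency filter: drop a candidate if either part pair is unseen."""
    first, second = split
    return counts[first] > 0 and counts[second] > 0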

If every word could be split into exactly two parts, it might be enough to count pairs of parts and single parts to estimate probabilities, i.e. f(s,t)/f(s). But sometimes a word can be split in several ways; for example we can view "sommersiidastyre" (summer siida board) as "sommer+siidastyre" or "sommersiida+styre" (I have chosen to stick to two-way splits, to avoid too many alternative candidates). If the translation is giessesijddastivrra, with the analyses giesse+sijddastivrra or giessesijdda+stivrra, then we have no immediate reason to prefer one over the other (well, we have length in this case, but that does not hold in all such examples, and we can have pairs of analyses that are 2–3 or 3–2). Then we also cannot say which pair of word parts (s,t) to increment when we see "sommersiidastyre"–giessesijddastivrra in the training material. But if we additionally see "styre"–stivrra somewhere else, we suddenly have grounds for a decision. Methods like Expectation Maximization can combine related frequencies in this way to arrive at good estimates, but I have not yet got around to implementing this.
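
A toy sketch of what one pass of such fractional counting could look like: when a word pair has several possible split analyses, each analysis gets a fractional count instead of a full one, weighted by how well its part pairs are already attested. This only illustrates the idea, it is not an implementation of the above:

from collections import defaultdict

def fractional_counts(training, prior=None):
    """training: one entry per known word pair, each entry a list of its
    alternative split-pair analyses, e.g.
    [ [ (("sommer", "giesse"), ("siidastyre", "sijddastivrra")),
        (("sommersiida", "giessesijdda"), ("styre", "stivrra")) ] ]."""
    counts = defaultdict(float)
    for alternatives in training:
        # weight each alternative by how well its part pairs are already
        # attested (uniform when we know nothing yet)
        weights = [1.0 + sum(prior.get(p, 0.0) for p in alt) if prior else 1.0
                   for alt in alternatives]
        total = sum(weights)
        for alt, weight in zip(alternatives, weights):
            for pair in alt:
                counts[pair] += weight / total
    return counts

Running it once with prior=None splits the count evenly between the analyses; feeding the result back in as the prior and iterating is the EM-style refinement hinted at above.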

by unhammer at April 23, 2015 06:11 PM

January 06, 2015

thefastestwaytobreakamachine

NSA-proof SSH

One of the biggest takeaways from 31C3 and the most recent Snowden-leaked NSA documents is that a lot of SSH stuff is … broken.

I’m not surprised, but then again I never am when it comes to this paranoia stuff. However, I do run a ton of SSH in production and know a lot of people that do. Are we all fucked? Well, almost, but not really.

Unfortunately most of what Stribika writes about the “Secure Secure Shell” doesn’t work for old production versions of SSH. The cliff notes for us real-world people, who will realistically be running SSH 5.9p1 for years, are hidden in the bettercrypto.org repo.

Edit your /etc/ssh/sshd_config:


Ciphers aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512,hmac-sha2-256,hmac-ripemd160
KexAlgorithms diffie-hellman-group-exchange-sha256

Basically, the nice and forward-secure aes-*-gcm and chacha20-poly1305 ciphers, the curve25519-sha256 key exchange algorithm and the Encrypt-then-MAC message authentication modes are not available to those of us stuck in the early 2000s. That’s right, the provably NSA-proof stuff is simply not supported. Upgrading at this point makes sense.

Still, we can harden SSH, so go into /etc/ssh/moduli and delete all the moduli whose 5th column is < 2048, and disable ECDSA host keys:

cd /etc/ssh
mkdir -p broken
mv moduli ssh_host_dsa_key* ssh_host_ecdsa_key* ssh_host_key* broken
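# keep only the Diffie-Hellman groups whose size field (5th column) is above 2048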
awk '{ if ($5 > 2048){ print } }' broken/moduli > moduli
# create broken links to force SSH not to regenerate broken keys
ln -s ssh_host_ecdsa_key ssh_host_ecdsa_key
ln -s ssh_host_dsa_key ssh_host_dsa_key
ln -s ssh_host_key ssh_host_key

Your clients, which hopefully have more recent versions of SSH, could have the following settings in /etc/ssh/ssh_config or .ssh/config:

Host all-old-servers

    Ciphers aes256-gcm@openssh.com,aes128-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-ripemd160-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-ripemd160
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256

Note: Sadly, the -ctr ciphers do not provide forward secrecy and hmac-ripemd160 isn’t the strongest MAC. But if you disable these, there are plenty of places you won’t be able to connect to. Upgrade your servers to get rid of these weaker algorithms!

Handily, I have made a little script to do all this and more, which you can find in my Gone distribution.

There, done.


Updated Jan 6th to highlight the problems of not upgrading SSH.
Updated Jan 22nd to note CTR mode isn’t any worse.
Go learn about COMSEC if you didn’t get trolled by the title.

by kacper at January 06, 2015 04:33 PM

December 08, 2014

thefastestwaytobreakamachine

sound sound

Intermission..

Recently I have been doing some video editing.. less editing than tweaking my system, though.
If you want your JACK output to talk to Kdenlive, a most excellent video editing suite,
and output audio nicely without choppiness and popping (which I promise you is not nice),
you’ll want to pipe it through PulseAudio, because the ALSA-to-JACK route doesn’t play well with Phonon, at least not on this convoluted setup.

Remember, to get that setup to work, ALSA pipes to JACK via the pcm.jack { type jack .. } plugin definition, and you remove the ALSA-to-PulseAudio stupidity at /usr/share/alsa/alsa.conf.d/50-pulseaudio.conf

So, once that’s in place, it still won’t play, even though PulseAudio has found your JACK, because your clients are defaulting to some ALSA device… this is when you edit /etc/pulse/client.conf and set default-sink = jack_out.
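
Concretely, the pieces involved look roughly like this. A sketch only: it assumes the ALSA JACK PCM plugin from alsa-plugins is installed, and the sink and port names may differ on other setups.

# ~/.asoundrc (or /etc/asound.conf): route ALSA clients into JACK
pcm.jack {
    type jack
    playback_ports {
        0 system:playback_1
        1 system:playback_2
    }
    capture_ports {
        0 system:capture_1
        1 system:capture_2
    }
}

# /etc/pulse/client.conf: make PulseAudio clients use the JACK sink by default
default-sink = jack_out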

by kacper at December 08, 2014 12:18 AM

February 24, 2013

Bjørn Venn

Chromebook; a real cloud computer – but will it work in the clouds?

Fancy one? It is not on sale in Norway yet, but you can buy it on Amazon. Read here how I bought mine on Amazon (scroll a bit down the page). With Norwegian VAT, delivered to the Rimi shop 100 metres from where I live, it came to 1,850 kroner. It is absolutely worth it :)

by Bjorn Venn at February 24, 2013 07:34 PM

February 22, 2013

Bjørn Venn

Who can get me one of these before Easter?

Chromebook pixel

Google's new Chromebook, the Chromebook Pixel. For now only on sale in the US and UK via Google Play and Best Buy.

The world is unfair :)

by Bjorn Venn at February 22, 2013 12:44 PM

October 31, 2011

Anders Nordby

Tailing the wtmp log on 64-bit Linux with Perl?

I like to make things happen event-based, and to that end I made a script to rsync content after it has been uploaded with FTP. I tail the wtmp log with Perl, and start the sync when the user is, or has been, logged out (short idle timeout). Tailing wtmp on FreeBSD was something I found a working example of on the net a long time ago:
$typedef = 'A8 A16 A16 L';
$sizeof = length pack($typedef, () );
while ( read(WTMP, $buffer, $sizeof) == $sizeof ) {
    ($line, $user, $host, $time) = unpack($typedef, $buffer);
    # Do whatever you want with these values here
}
So FreeBSD only uses the values line (ut_line), user (ut_name), host (ut_host) and time (ut_time), cf. utmp.h. Linux (x64, who cares about 32-bit?), on the other hand, stores quite a bit more in the wtmp log, and after some googling, trial and error, and peeking in bits/utmp.h I arrived at:
$typedef = "s x2 i A32 A4 A32 A256 s2 l i2 i4 A20"; $sizeof = length pack($typedef, () ); while ( read(WTMP, $buffer, $sizeof) == $sizeof ) { ($type, $pid, $line, $id, $user, $host, $term, $exit, $session, $sec, $usec, $addr, $unused) = unpack($typedef, $buffer); # Gjør hva du vil med disse verdiene her }
Which just works, great. I can then see users logging in and out in real time, and take action based on that.

by Anders (noreply@blogger.com) at October 31, 2011 07:37 PM

A complete feed is available in any of your favourite syndication formats linked by the buttons below.

[RSS 1.0 Feed] [RSS 2.0 Feed] [Atom Feed] [FOAF Subscriptions] [OPML Subscriptions]

Subscriptions