Wednesday, September 9, 2009

Linux|"Argument list too long"

"Argument list too long": Beyond Arguments and Limitations

May 9th, 2002 by Alessandre S. Naro

Four approaches to getting around argument length limitations on the command line.

At some point during your career as a Linux user, you may have come across the following error:

[user@localhost directory]$ mv * ../directory2
bash: /bin/mv: Argument list too long

The "Argument list too long" error, which occurs any time a user feeds too many arguments to a single command, leaves the user to fend for themselves, since all regular system commands (ls *, cp *, rm *, etc.) are subject to the same limitation. This article identifies four different workaround solutions to this problem, each method using a different degree of complexity to solve a different potential problem. The solutions are presented below in order of simplicity, following the logical principle of Occam's Razor: if you have two equally likely solutions to a problem, pick the simplest.
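The limit being hit here is the kernel's ARG_MAX, the maximum combined size of the argument list and environment accepted by execve(). As a quick sanity check (a sketch; the value varies by kernel and configuration), you can inspect it from Python, or equivalently run getconf ARG_MAX at the shell:

```python
import os

# ARG_MAX is the combined byte limit for argv plus the environment that
# execve() will accept; exceeding it yields E2BIG, which the shell
# reports as "Argument list too long".
arg_max = os.sysconf('SC_ARG_MAX')
print("ARG_MAX on this system: %d bytes" % arg_max)
```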

Method #1: Manually split the command line arguments into smaller bunches.

Example 1

[user@localhost directory]$ mv [a-l]* ../directory2
[user@localhost directory]$ mv [m-z]* ../directory2

This method is the most basic of the four: it simply involves resubmitting the original command with fewer arguments, in the hope that this will solve the problem. Although this method may work as a quick fix, it is far from being the ideal solution. It works best if you have a list of files whose names are evenly distributed across the alphabet. This allows you to establish consistent divisions, making the chore slightly easier to complete. However, this method is a poor choice for handling very large quantities of files, since it involves resubmitting many commands and a good deal of guesswork.

Method #2: Use the find command.

Example 2

[user@localhost directory]$ find $directory -type f -name '*' -exec mv {} $directory2/. \; 

Method #2 involves filtering the list of files through the find command, instructing it to properly handle each file based on a specified set of command-line parameters. Due to the built-in flexibility of the find command, this workaround is easy to use, successful and quite popular. It allows you to selectively work with subsets of files based on their name patterns, date stamps, permissions and even inode numbers. In addition, and perhaps most importantly, you can complete the entire task with a single command.

The main drawback to this method is the length of time required to complete the process. Unlike Method #1, where groups of files get processed as a unit, this procedure actually inspects the individual properties of each file before performing the designated operation. The overhead involved can be quite significant, and moving lots of files individually may take a long time.

Method #3: Create a function. *

Example 3a

function large_mv ()
{
    while read line1; do
        mv directory/$line1 ../directory2
    done
}
ls -1 directory/ | large_mv

Although writing a shell function does involve a certain level of complexity, I find that this method allows for a greater degree of flexibility and control than either Method #1 or #2. The short function given in Example 3a simply mimics the functionality of the find command given in Example 2: it deals with each file individually, processing them one by one. However, by writing a function you also gain the ability to perform an unlimited number of actions per file while still using a single command:

Example 3b

function larger_mv ()
{
    while read line1; do
        md5sum directory/$line1 >> ~/md5sums
        ls -l directory/$line1 >> ~/backup_list
        mv directory/$line1 ../directory2
    done
}
ls -1 directory/ | larger_mv

Example 3b demonstrates how easily you can get an md5sum and a backup listing of each file before moving it.

Unfortunately, since this method also requires that each file be dealt with individually, it will involve a delay similar to that of Method #2. From experience I have found that Method #2 is a little faster than the function given in Example 3a, so Method #3 should be used only in cases where the extra functionality is required.

Method #4: Recompile the Linux kernel. **

This last method requires a word of caution, as it is by far the most aggressive solution to the problem. It is presented here for the sake of thoroughness, since it is a valid method of getting around the problem. However, please be advised that due to the advanced nature of the solution, only experienced Linux users should attempt this hack. In addition, make sure to thoroughly test the final result in your environment before implementing it permanently.

One of the advantages of using an open-source kernel is that you are able to examine exactly what it is configured to do and modify its parameters to suit the individual needs of your system. Method #4 involves manually increasing the number of pages that are allocated within the kernel for command-line arguments. If you look at the include/linux/binfmts.h file, you will find the following near the top:

/*
 * MAX_ARG_PAGES defines the number of pages allocated for arguments
 * and envelope for the new program. 32 should suffice, this gives
 * a maximum env+arg of 128kB w/4KB pages!
 */
#define MAX_ARG_PAGES 32

In order to increase the amount of memory dedicated to command-line arguments, simply give MAX_ARG_PAGES a higher value. Once this edit is saved, recompile, install and reboot into the new kernel as you normally would.

On my own test system I managed to solve all my problems by raising this value to 64. After extensive testing, I have not experienced a single problem since the switch. This is entirely expected, since even with MAX_ARG_PAGES set to 64, the longest possible command line would only occupy 256KB of system memory, which is not very much by today's hardware standards.
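The arithmetic behind those figures follows directly from the binfmts.h comment, assuming 4KB pages:

```python
PAGE_SIZE_KB = 4  # typical x86 page size, per the binfmts.h comment

# env+arg space = MAX_ARG_PAGES * page size
for max_arg_pages in (32, 64):
    print(max_arg_pages, "pages ->", max_arg_pages * PAGE_SIZE_KB, "KB")
# 32 pages -> 128 KB (the default); 64 pages -> 256 KB
```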

The advantages of Method #4 are clear. You are now able to simply run the command as you would normally, and it completes successfully. The disadvantages are equally clear. If you raise the amount of memory available to the command line beyond the amount of available system memory, you can create a denial-of-service condition on your own system and cause it to crash. On multiuser systems in particular, even a small increase can have a significant impact, because every user is then allocated the additional memory. Therefore, always test extensively in your own environment, as this is the safest way to determine whether Method #4 is a viable option for you.

Conclusion

While writing this article, I came across many explanations for the "Argument list too long" error. Since the error message starts with "bash:", many people placed the blame on the bash shell. Similarly, seeing the application name included in the error caused a few people to blame the application itself. Instead, as I hope to have conclusively demonstrated in Method #4, the kernel itself is to "blame" for the limitation. In spite of the enthusiastic endorsement given by the original binfmts.h author, many of us have since found that 128KB of dedicated memory for the command line is simply not enough. Hopefully, by using one of the methods above, we can all forget about this one and get back to work.

Notes:

* All functions were written using the bash shell.

** The material presented in Method #4 was gathered from a discussion on the linux-kernel mailing list in March 2000. See the "Argument List too Long" thread in the linux-kernel archives for the full discussion.

Tuesday, August 25, 2009

Java| Thread Dump

My Load Test » Java Thread Dump

Java Thread Dump

A Java thread dump is a way of finding out what every thread in the JVM is doing at a particular point in time. This is especially useful if your Java application sometimes seems to hang when running under load, as an analysis of the dump will show where the threads are stuck.

You can generate a thread dump under Unix/Linux by running kill -QUIT <pid>, and under Windows by hitting Ctrl + Break.

A great example of where this would be useful is the well-known Dining Philosophers deadlocking problem. Taking example code from Concurrency: State Models & Java Programs, we can cause a deadlock situation and then create a thread dump.

Thursday, August 13, 2009

Django|SerializedDataField

Custom Fields in Django | David Cramer's Blog

I was helping someone today in the Django IRC channel when a question came up about storing a denormalized data set in a single field. Typically I do such things either by serializing the data or by separating the values with a token (a comma, for example).

Django has a built-in CommaSeparatedIntegerField, but most of the time I'm storing strings, as I already have the integers available elsewhere. I began to answer the question by giving an example that used serialization plus custom properties, until I realized it would be much easier to just write this as a Field subclass.

So I quickly did, and replaced a few lines of repetitive code with two new field classes in our source:

Update: There were some issues with my understanding of how the metaclass was working. I've corrected the code and it should function properly now.

SerializedDataField

This field is typically used to store raw data, such as a dictionary, or a list of items, or could even be used for more complex objects.

from django.db import models

try:
    import cPickle as pickle
except ImportError:
    import pickle

import base64

class SerializedDataField(models.TextField):
    """Because Django for some reason feels its needed to repeatedly call
    to_python even after it's been converted this does not support strings."""
    __metaclass__ = models.SubfieldBase

    def to_python(self, value):
        if value is None:
            return
        if not isinstance(value, basestring):
            return value
        return pickle.loads(base64.b64decode(value))

    def get_db_prep_save(self, value):
        if value is None:
            return
        return base64.b64encode(pickle.dumps(value))
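The encode/decode pipeline the field uses is just pickle wrapped in base64. A minimal round-trip of that pipeline outside Django (shown in Python 3 syntax, unlike the Python 2 field above) looks like this:

```python
import base64
import pickle

def to_db(value):
    # mirrors get_db_prep_save: pickle, then base64-encode so the
    # binary pickle can live safely in a text column
    return base64.b64encode(pickle.dumps(value))

def from_db(stored):
    # mirrors to_python: base64-decode, then unpickle
    return pickle.loads(base64.b64decode(stored))

data = {'tags': ['a', 'b'], 'count': 3}
assert from_db(to_db(data)) == data
```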

SeparatedValuesField

An alternative to the CommaSeparatedIntegerField, it allows you to store any separated values. You can also optionally specify a token parameter.

from django.db import models

class SeparatedValuesField(models.TextField):
    __metaclass__ = models.SubfieldBase

    def __init__(self, *args, **kwargs):
        self.token = kwargs.pop('token', ',')
        super(SeparatedValuesField, self).__init__(*args, **kwargs)

    def to_python(self, value):
        if not value:
            return
        if isinstance(value, list):
            return value
        return value.split(self.token)

    def get_db_prep_value(self, value):
        if not value:
            return
        assert(isinstance(value, list) or isinstance(value, tuple))
        return self.token.join([unicode(s) for s in value])

    def value_to_string(self, obj):
        value = self._get_val_from_obj(obj)
        return self.get_db_prep_value(value)
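Under the hood this field is just a join on save and a split on load. The same transformation, sketched outside Django:

```python
def to_db(values, token=','):
    # mirrors get_db_prep_value: store a list as one delimited string
    return token.join(str(v) for v in values)

def from_db(stored, token=','):
    # mirrors to_python: split the stored string back into a list
    return stored.split(token) if stored else None

assert from_db(to_db(['red', 'green', 'blue'])) == ['red', 'green', 'blue']
```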

Friday, August 7, 2009

Django|Tips to keep your Django/mod_python memory usage down

Tips to keep your Django/mod_python memory usage down - WebFaction

Tips to keep your Django/mod_python memory usage down

Updated Jan 28 at 04:44 CDT (first posted May 30 at 09:57 CDT) by Remi in Django, Memory, Tips  - 16 comment(s)

Most people manage to run their Django site on mod_python within the memory limits of their "Shared 1" or "Shared 2" plans but a few people are struggling to stay within the limits.

So here are a few tips that you can use to try and keep your memory usage down when using Django on mod_python:

  • Make sure that you set DEBUG to False in settings.py: if it isn't, set it to False and restart apache. Amongst other things, DEBUG mode stores all SQL queries in memory so your memory usage will quickly increase if you don't turn it off.
  • Use "ServerLimit" in your apache config: by default apache will spawn lots of processes, which will use lots of memory. You can use the "ServerLimit" directive to limit these processes. A value of 3 or 4 is usually enough for most sites if your static data is not served by your Django instance (see below).
  • Check that no big objects are being loaded in memory: for instance, check that your code isn't loading hundreds or thousands of database records in memory all at once. Also, if your application lets people download or upload big files, check that these big files are not being loaded in memory all at once.
  • Serve your static data from our main server: this is general advice for all django sites: make sure that your static data (images, stylesheets, ...) is served directly by our main apache server. This will save your Django app from having to serve all these little extra requests. Details on how to do that can be found here and here.
  • Use "MaxRequestsPerChild" in your apache config: sometimes there are slow memory leaks that you can't do anything about (they can be in the tools that you use, for instance). If this is the case, you can use "MaxRequestsPerChild" to tell apache to serve only a certain number of requests before killing the process and starting a fresh one. Reasonable values are usually between 100 and 1000. Another, more extreme/uglier version of this technique is to set up a cronjob to run "stop/start" once in a while.
  • Find out and understand how much memory you're using: to find out what your processes are and how much memory they're using, you can run the "ps -u <username> -o pid,rss,command" command, like this:
    [testweb14@web14 bin]$ ps -u testweb14 -o pid,rss,command
      PID   RSS COMMAND
    23111  1404 -bash
    27988  3848 /home/testweb14/webapps/django/apache2/bin/httpd -f /home/testweb14/webapps/django/apache2/conf/httpd.c
    27989 10312 /home/testweb14/webapps/django/apache2/bin/httpd -f /home/testweb14/webapps/django/apache2/conf/httpd.
    27990  9804 /home/testweb14/webapps/django/apache2/bin/httpd -f /home/testweb14/webapps/django/apache2/conf/httpd.c
    28078   760 ps -u testweb14 -o pid,rss,command
    [testweb14@web14 bin]$
    As you can see we have three "httpd" processes running that use respectively 3848KB, 10312KB and 9804KB of memory (there are various ways to interpret the memory used by a process on Linux and we have chosen to use the "Resident Set Size" (RSS) of your processes).

    The first one is the apache "supervisor" and the other two are the "workers" (in this example, "ServerLimit" is set to 2). The memory used by the supervisor usually doesn't change too much, but the memory used by the workers can increase greatly if you have bad memory leaks in your application.

    So the total memory used by our Apache/django instance in this example is 3848KB + 10312KB + 9804KB = 23MB.
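That bookkeeping is easy to automate by summing the RSS column for the httpd processes. A small sketch using the sample ps output above (paths shortened for brevity):

```python
ps_output = """\
  PID   RSS COMMAND
23111  1404 -bash
27988  3848 /home/testweb14/webapps/django/apache2/bin/httpd
27989 10312 /home/testweb14/webapps/django/apache2/bin/httpd
27990  9804 /home/testweb14/webapps/django/apache2/bin/httpd
28078   760 ps -u testweb14 -o pid,rss,command"""

# sum the RSS column (in KB) for the apache supervisor and workers
total_kb = sum(int(line.split()[1])
               for line in ps_output.splitlines()
               if 'httpd' in line)
print(total_kb, "KB, roughly", total_kb // 1024, "MB")
```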

Wednesday, August 5, 2009

Django|Speed up with NginX, Memcached, and django-compress

How to Speed up Your Django Sites with NginX, Memcached, and django-compress | Code Spatter

How to Speed up Your Django Sites with NginX, Memcached, and django-compress

Posted on April 23rd, 2009 by Greg Allard in Django, Programming, Server Administration | View commentsComments

A lot of these steps will speed up any kind of application, not just django projects, but there are a few django specific things. Everything has been tested on IvyLees which is running in a Debian/Ubuntu environment.

These three simple steps will speed up your server and allow it to handle more traffic.

Reducing the Number of HTTP Requests

Yahoo has developed a Firefox extension called YSlow. It analyzes all of the traffic from a website and gives a score in a few categories where improvements can be made.

It recommends combining all of your css files into one file, and all of your js files into one file, or as few as possible. There is a pluggable, open source django application available to help with that task. After setting up django-compress, a website will have css and js files that are minified (excess white space and characters are removed to reduce file size). The application will also give the files version numbers so that they can be cached by the web browser and won't need to be downloaded again until a change is made and a new version of the file is created. How to set up the server to send a far-future expiration is shown below in the lightweight server section.

Setting up Memcached

Django makes it really simple to set up caching backends and memcached is easy to install.

sudo aptitude install memcached python-setuptools

We will need setuptools so that we can do the following command.

sudo easy_install python-memcached

Once that is done you can start the memcached server by doing the following:

sudo memcached -d -u www-data -p 11211 -m 64

-d will start it in daemon mode, -u is the user for it to run as, -p is the port, and -m is the maximum number of megabytes of memory to use.

Now open up the settings.py file for your project and add the following line:

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

Find the MIDDLEWARE_CLASSES section and add this to the beginning of the list:

    'django.middleware.cache.UpdateCacheMiddleware',

and this to the end of the list:

    'django.middleware.cache.FetchFromCacheMiddleware',
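Order matters here: UpdateCacheMiddleware must come first and FetchFromCacheMiddleware last. The resulting setting might look like this (the CommonMiddleware entry is just a stand-in for whatever middleware you already have):

```python
# settings.py (sketch): site-wide cache middleware wrapping the rest
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # first: stores responses
    'django.middleware.common.CommonMiddleware',         # ...your existing middleware...
    'django.middleware.cache.FetchFromCacheMiddleware',  # last: serves from cache
)
```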

For more about caching with django see the django docs on caching. You can reload the server now to try it out.

sudo /etc/init.d/apache2 reload

To make sure that memcached is set up correctly you can telnet into it and get some statistics.

telnet localhost 11211

Once you are in, type stats and it will show some information (press Ctrl+] and then Ctrl+D to exit). If there are too many zeroes, it either isn't working or you haven't visited your site since the caching was set up. See the memcached site for more information.

Don't Use Apache for Static Files

Apache has some overhead involved that makes it good for serving php, python, or ruby applications, but you do not need that for static files like your images, style sheets, and javascript. There are a few options for lightweight servers that you can put in front of apache to handle the static files. Lighttpd (lighty) and nginx (engine x) are two good options. Adding this layer in front of your application also acts as an application firewall, so there is a security bonus on top of the speed bonus.

There is this guide to install a django setup with nginx and apache from scratch. If you followed my guide to set up your server or already have apache set up for your application, then there are a few steps to get nginx handling your static files.

sudo aptitude install nginx

Edit the config file for your site (sudo nano /etc/apache2/sites-available/default) and change the port from 80 to 8080 and change the ip address (might be *) to 127.0.0.1. The lines will look like the following

NameVirtualHost 127.0.0.1:8080
<VirtualHost 127.0.0.1:8080>

Also edit the ports.conf file (sudo nano /etc/apache2/ports.conf) so that it will listen on 8080.

Listen 8080

Don't restart the server yet; you want to configure nginx first. Edit the default nginx config file (sudo nano /etc/nginx/sites-available/default) and find where it says

location / {
    root   /var/www/nginx-default;
    index  index.html index.htm;
}

and replace it with

location / {
    proxy_pass http://192.168.0.180:8080;
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    client_max_body_size 10m;
    client_body_buffer_size 128k;
    proxy_connect_timeout 90;
    proxy_send_timeout 90;
    proxy_read_timeout 90;
    proxy_buffer_size 4k;
    proxy_buffers 4 32k;
    proxy_busy_buffers_size 64k;
    proxy_temp_file_write_size 64k;
}

location /files/ {
    root /var/www/myproject/;
    expires max;
}

/files/ is where I've stored all of my static files and /var/www/myproject/ is where my project lives and it contains the files directory.

Set static files to expire far in the future

expires max; will tell your users' browsers to cache the files from that directory for a long time. Only use that if you are sure those files won't change. You can use expires 24h; if you aren't sure.

Configure gzip

Edit the nginx configuration to use gzip on all of your static files (sudo nano /etc/nginx/nginx.conf). Where it says gzip on; make sure it looks like the following:

gzip  on;
gzip_comp_level 2;
gzip_proxied any;
gzip_types  text/plain text/html text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;

The servers should be ready to be restarted.

sudo /etc/init.d/apache2 reload
sudo /etc/init.d/nginx reload

If you are having any problems I suggest reading through this guide and seeing if you have something set up differently.

Speedy Django Sites

Those three steps should speed up your server and allow for more simultaneous visitors. There is a lot more that can be done, but getting these three easy things out of the way first is a good start.

Django|compression and other best practices

Speed up Django with far-future expires, compression and other best practices — Greg Brown

Speed up Django with far-future expires, compression and other best practices

As a web developer with a shoddy rural internet connection, I'm always interested in speeding up my sites. One technique for doing this is far-future expires — i.e. telling the browser to cache media requests forever, then changing the uri when the media changes. In this article, I outline how to implement this and several other techniques in django.
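The "changing the uri" half of that technique is usually automated by embedding a version, such as a short content hash, in the filename. A sketch of the idea (the helper name here is mine, not from any particular library):

```python
import hashlib

def versioned_name(filename, content):
    """Return e.g. 'style.<8 hex chars>.css' so the uri changes whenever
    the file's content does, making a far-future expires header safe."""
    digest = hashlib.md5(content).hexdigest()[:8]
    stem, ext = filename.rsplit('.', 1)
    return '%s.%s.%s' % (stem, digest, ext)

name = versioned_name('style.css', b'body { color: red }')
print(name)
```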

Goals

  1. Reduce http requests for css and js files to a bare minimum.
  2. Add far-future-expires headers to all static content
  3. Gzip all css and js content
  4. Reduce css/js filesize by minification

The django-compress App

First up, I installed the django-compress app. I was about to build my own solution when I realised this one did exactly what I needed — gotta love the django community. Configuring is straightforward — the project wiki has articles on installation, configuration, and usage.

I copied the compress/ directory into my django/library/ folder (which is on the python path) and added "compress" to my INSTALLED_APPS.

Apache Configuration

Once I had django-compress up and running, I had achieved goal #1. To achieve #2 and #3 I needed to configure apache to send the right headers along with each request. To do this, I put the following directive in my httpd.conf file:

<DirectoryMatch /path-to-django-projects/([^/]+)/media>

    Order allow,deny
    Allow from all

    # Insert mod_deflate filter
    SetOutputFilter DEFLATE
    # Netscape 4.x has some problems...
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    # Netscape 4.06-4.08 have some more problems
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    # MSIE masquerades as Netscape, but it is fine
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
    # Don't compress images
    SetEnvIfNoCase Request_URI \
        \.(?:gif|jpe?g|png)$ no-gzip dont-vary
    # Make sure proxies don't deliver the wrong content
    Header append Vary User-Agent env=!dont-vary

    # MOD EXPIRES SETUP
    ExpiresActive on
    ExpiresByType text/javascript "access plus 10 years"
    ExpiresByType application/x-javascript "access plus 10 years"
    ExpiresByType text/css "access plus 10 years"
    ExpiresByType image/png "access plus 10 years"
    ExpiresByType image/x-png "access plus 10 years"
    ExpiresByType image/gif "access plus 10 years"
    ExpiresByType image/jpeg "access plus 10 years"
    ExpiresByType image/pjpeg "access plus 10 years"
    ExpiresByType application/x-flash-swf "access plus 10 years"
    ExpiresByType application/x-shockwave-flash "access plus 10 years"

    # No etags as we're using far-future expires
    FileETag none

</DirectoryMatch>

Notes

  • <DirectoryMatch /path-to-django-projects/([^/]+)/media> is equivalent to writing <Directory /path-to-django-projects/site-name/media> for each site.
  • mod_deflate configuration directives from the Apache site.
  • Note that I'm sending far-future-expires headers for images and flash too — at this stage, that means I have to manually change the filenames whenever I change the content.

This means that for all my django sites' media directories:

  • static content (except images) is gzipped via mod_deflate
  • everything gets a header telling the browser to cache it for 10 years

Note you will need mod_deflate and mod_expires enabled in your apache config - if you have apache 2.2 it should just be a matter of copying the relevant files from apache2/mods-available/ to /apache2/mods-enabled/.

Minification

Step #4 was the trickiest of the lot, and many would argue that it's not really worth the trouble. Depending on how verbosely you comment your js and css, it may or may not be worthwhile for you — personally, I just thought I may as well go the whole hog. In the end, I probably only saved a few percent worth of bandwidth for my small content sites, but it'll be more significant with js-heavy web-apps.

For js minification, django-compress comes with jsmin built in. I've found this to be ideal for the job, and it is enabled by default.

For CSS, django-compress comes with CSSTidy — a CSS parser and optimiser — built in, in the form of csstidy_python. (You can also use a csstidy binary if you have one installed.) Personally, I find CSSTidy messes with my css, and more significantly, messes with that of my css framework of choice, 960.gs. I was after something that simply stripped whitespace, newlines and comments, without parsing the code. After scouring the web, I came across Slimmer — a lightweight python app that did exactly what I needed. After installing it, I added the following file to the django-compress app's filters directory.

# compress/filters/slimmer_css/__init__.py

import slimmer
from compress.filter_base import FilterBase

class SlimmerCSSFilter(FilterBase):
    def filter_css(self, css):
        return slimmer.css_slimmer(css)
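To illustrate the kind of non-parsing minification Slimmer performs, here is a toy regex version of the idea (my own approximation, not Slimmer's actual code):

```python
import re

def strip_css(css):
    # remove /* ... */ comments, then collapse runs of whitespace
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)
    css = re.sub(r'\s+', ' ', css)
    # drop spaces around punctuation that CSS doesn't need
    css = re.sub(r'\s*([{};:,])\s*', r'\1', css)
    return css.strip()

print(strip_css("/* reset */\nbody {\n  margin: 0;\n}"))
# -> body{margin:0;}
```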

Then it was simply a matter of adding the following line to my settings.py file, as per the django-compress documentation:

COMPRESS_CSS_FILTERS = ('compress.filters.slimmer_css.SlimmerCSSFilter',) 

So my complete django-compress configuration in settings.py was as follows:

# compress app settings
COMPRESS_CSS = {
    'all': {
        'source_filenames': (
            'css/lib/reset.css',
            'css/lib/text.css',
            'css/lib/960.css',
            'css/style.css',
        ),
        'output_filename': 'compress/c-?.css',
        'extra_context': {
            'media': 'screen,projection',
        },
    },
    # other CSS groups go here
}
COMPRESS_JS = {
    'all': {
        'source_filenames': ('js/lib/jquery.js', 'js/behaviour.js',),
        'output_filename': 'compress/j-?.js',
    },
}

COMPRESS = True
COMPRESS_VERSION = True
COMPRESS_CSS_FILTERS = ('compress.filters.slimmer_css.SlimmerCSSFilter',)

Other best practices

I keep all my css within the <head> tags, and js at the bottom of the page — this is because the browser needs to download all the css before it can start rendering the page, but doesn't need the js. It doesn't actually speed up the site, but it gives the impression of loading faster, and the user is unlikely to click on anything before the js has loaded anyway.

For a definitive guide, see Yahoo's performance rules. I also recommend Yahoo's YSlow, and if you are one of the 3 remaining web developers without it, Firebug.

Monday, July 27, 2009

Sphinx|In-Depth django-sphinx Tutorial

In-Depth django-sphinx Tutorial | David Cramer's Blog

Again, I still suck at documentation, and my "tutorials" aren't in-depth enough. So hopefully this covers all of the questions regarding using the django-sphinx module.

The first thing you're going to need to do is install the Sphinx search software. You can get it from http://www.sphinxsearch.com/, or probably even via ports or aptitude.

Configure Sphinx

Once you have successfully installed Sphinx you need to configure it. Follow the directions on their website for the basic configuration, but most importantly, you need to configure a search index which can relate to one of your models.

Here is an example of an index from Curse's File model, which lets you search via name, description, and tags on a file. Please note that "base" is a base source definition we created which has a few defaults we use, but this is unrelated to your source definition.

source files_file_en : base
{
    sql_query       = \
        SELECT files_file.id, files_file.name, files_data.description, files_file.tags as tag \
        FROM files_file JOIN files_data \
        ON files_file.id = files_data.file_id \
        AND files_data.lang = 'en' \
        AND files_file.visible = 1 \
        GROUP BY files_file.id
    sql_query_info  = SELECT * FROM files_file WHERE id=$id
}

Now that you have your source defined, you need to build an index which uses this source. I do recommend placing all of your sphinx information somewhere else, maybe /var/sphinx/data.

index files_file_en
{
    source          = files_file_en
    path            = /var/data/files_file_en
    docinfo         = extern
    morphology      = none
    stopwords       =
    min_word_len    = 2
    charset_type    = sbcs
    min_prefix_len  = 0
    min_infix_len   = 0
}

Configure Django

Now that you've configured your search index, you need to set up the configuration for Django. The first step is to install the django-sphinx wrapper: download the zip archive, or check out the source from http://code.google.com/p/django-sphinx/.

Once you have your files on the local computer or server, you can simply do sudo python setup.py install to install the library.

After installation you need to edit a few settings in settings.py, which, again, being that I suck at documentation, isn't posted on the website.

The two settings you need to add are these:

SPHINX_SERVER = 'localhost'
SPHINX_PORT = 3312

Setup Your Model

Now you are fully able to utilize Sphinx within Django. The next step is to actually attach your search index to a model. To do this, you will need to import djangosphinx and then attach the manager to a model. See the example below:

from django.db import models
import djangosphinx

class File(models.Model):
    name = models.CharField()
    tags = models.CharField()  # We actually store tags for efficiency in tag,tag,tag format here

    objects = models.Manager()
    search  = djangosphinx.SphinxSearch(index="files_file_en")

The index argument is optional, and there are several other parameters you can pass, but you'll have to look in the code (or pydoc if I did it right, but probably not).

Once we've defined the search manager on our model, we can access it via Model.manager_name and pass it many things like we could with a normal object manager in Django. The typical usage is Model.search.query('my fulltext query') which would then query sphinx, grab a list of IDs, and then do a Model.objects.filter(pk__in=[list of ids]) and return this result set.

Search Methods

There are a few additional methods which you can use on your search queryset besides the default query method. order_by, filter, count, and exclude to name a few. These don't *quite* work the same as Django's as they're used directly within the search wrapper. So here's a brief rundown of these:

  • query
    This is your basic full-text search query. It works exactly the same as passing your query to the full-text engine. Its search type will be based on the search mode, which, by default, is SPH_MATCH_EXTENDED.
  • filter/exclude
    The filter and exclude methods hold the same idea as the normal queryset methods, except that they are applied directly in Sphinx. This means you can only filter on attribute fields that are present in your search index.
  • order_by
    The order_by method also passes its parameters to Sphinx, with one exception. There are four reserved keywords: @id, @weight, @rank, and @relevance. These are detailed in the Sphinx documentation.
  • select_related
    This method is directly passed onto the Django queryset and holds no value to Sphinx.
  • index_on
    Allows you to specify which index(es) you are querying for. To query for multiple indexes you need to include a "content_type" name in your fields.

Sphinx works with MySQL and PostgreSQL; just remember to run configure with the --with-pgsql option.

The search command-line tool doesn't seem to like Postgres, though.

Saturday, July 25, 2009

Django|snippets: MediaWiki Markup

Django snippets: MediaWiki Markup

MediaWiki-style markup: parse(text) returns safe HTML from wiki markup code, based off of MediaWiki.

Thursday, July 23, 2009

SSH| User Howto

SSH User Howto - OIWiki

SSH User Howto

From OIWiki


About SSH

What is Secure Shell (SSH)

See the Wikipedia SSH page if you need a full rundown. Essentially it is a protocol for connecting to and executing commands on a remote system, using a secure encrypted tunnel.

What is SFTP

SFTP is a file transfer protocol that uses SSH to authenticate and encrypt its traffic. It is essentially a sub-service of the SSH server.

Why not use FTP or RSH etc

Both FTP and RSH use no encryption and pass passwords over the network in plain text. This makes it possible for the passwords to be captured in a number of ways, which is obviously bad for the users and the system's security. Therefore, whenever possible, SSH/SFTP should be used for file transfers and remote connections; use of FTP or RSH in legacy applications requires Infosec approval.

What are SSH keys?

One of the ways SSH improves security is by identifying a user through a "key". The benefit of a key is that at no time does a password or even the key itself traverse the network; a challenge-response mechanism is used to validate that the incoming user is who they say they are.

This works because there is a "private" and a "public" key. The client proves who it is by exchanging values with the server using the public keys: it uses the private key to produce a response which validates that it holds the private key associated with the public key. When the server validates a correct response, it allows access.

By using keys, there is no need to provide passwords, which allows non-interactive or passwordless interactive logins.

SSH Versions

In UNIX, there are two main flavours of SSH in common usage: OpenSSH and Secure Shell (commercial SSH). Both are compatible, but each differs slightly in syntax, file formats and configuration. The simple way to tell, once you are on a system, is to use the "ssh -V" command to identify which version you are using.

The below sections outline the basic usage of each from a user perspective, as well as how to work between the two versions when the need arises.

OpenSSH

Identification

OpenSSH is the more common version and "ssh -V" will have either OpenSSH or OpenSSL in the output:

$ ssh -V
Sun_SSH_1.1, SSH protocols 1.5/2.0, OpenSSL 0x0090704f

$ ssh -V
OpenSSH_3.6.1p2, SSH protocols 1.5/2.0, OpenSSL 0x0090701f

Configuration

OpenSSH stores its configuration files under the ".ssh" directory of the user's home directory.

By default, it will identify a user using the keyfiles "~/.ssh/id_dsa" and "~/.ssh/id_rsa"

It will validate an incoming user by matching public keys stored in the "~/.ssh/authorized_keys" file.

Creating Keys

To create a key using OpenSSH, use the ssh-keygen command. The command below creates a key using DSA encryption of 1024-bit strength, with no passphrase (-N ""), saved to the file id_dsa:

$ cd ~/.ssh
$ ssh-keygen -t dsa -b 1024 -N "" -f id_dsa

You will see two files created: id_dsa and id_dsa.pub. The id_dsa file will now be used by the ssh command to attempt to authenticate you to other servers.

If you have both RSA and DSA keys created, it will try them both.

Allowing Access

To allow a remote user to login to your account using SSH, you simply need to append their public key to your ~/.ssh/authorized_keys file. For example:

$ cat bobskey.pub >> ~/.ssh/authorized_keys 

Be sure the public key is in the OpenSSH format however. If it is in the SecureSSH format, use the ssh-keygen command to convert it:

$ ssh-keygen -i -f secsshkey.pub > opensshkey.pub 

You can then append the converted key to the authorized_keys file.

Secure SSH

Identification

Secure SSH is a commercially produced SSH implementation. Its version output will not reference OpenSSL and generally names a vendor:

$ ssh -V
ssh2: SSH Secure Shell 2.4.0 on alphaev56-dec-osf4.0e

$ ssh -V
ssh: SSH Tectia Server 4.0.5 on powerpc-ibm-aix5.1.0.0

Configuration

Secure SSH stores its configuration under the ".ssh2" directory of a user's home directory.

By default it identifies a user using the key files listed in the "~/.ssh2/identification" file.

It will validate an incoming user by matching public key files listed in the "~/.ssh2/authorization" file.

Creating Keys

To create a key using Secure SSH, again use the ssh-keygen command. The command below creates a key using DSA encryption of 1024-bit strength with no passphrase (-P):

$ ssh-keygen -t dsa -b 1024 -P
Generating 1024-bit dsa key pair
   5 oOOo.oOo.oOo
Key generated.
1024-bit dsa, root@o9030004, Thu May 31 2007 13:12:37
Private key saved to //.ssh2/id_dsa_1024_a
Public key saved to //.ssh2/id_dsa_1024_a.pub

To use this key to authenticate to remote servers, append the filename to the identification file as such:

$ echo "Key id_dsa_1024_a" >> ~/.ssh2/identification 

You can append multiple lines to this file and the SSH client will attempt them in order.

Allowing Access

To allow remote access, you need to copy the public key file (i.e., id_dsa_1024_a.pub) to the remote system and place it under the user's .ssh2 directory. You then need to list the key file in the ~/.ssh2/authorization file as such:

$ echo "Key id_dsa_1024_a.pub" >> ~/.ssh2/authorization 

Generally it is helpful to identify the server and user that the key is from by the filename, for example "cdun1410-ipg_as.pub", so you know which file is which.

If you are copying the public key from an OpenSSH system, then you need to first convert it to the SECSSH format by using the ssh-keygen command on the OpenSSH system:

openssh$ ssh-keygen -e -f id_dsa.pub > securessh.pub 

You can then copy the resulting key and place on the Secure SSH system.

Working between Secure and OpenSSH

Both versions are equally secure; it just happens that the commercial version is made by a company called SSH Communications Security. Both are also compatible and able to exchange keys provided that, as above, you convert the key files for use on each system as needed.

The simple way to tell if you have a SecureSSH or OpenSSH key file is by viewing it.

A SecureSSH key file will have a "BEGIN SSH2" and "END SSH2" line surrounding the actual key text.

An OpenSSH private key will have "BEGIN <keytype> PRIVATE KEY" around the key, and the public keys begin with either "ssh-dss" or "ssh-rsa" followed by the key text on a single line.
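For a quick check from the shell, the first line of the file tells you which format you have. A minimal sketch (the file names and key text here are made up for illustration):

```shell
#!/bin/sh
# Create two sample public-key files (contents abbreviated/fake) to show
# the difference between the SecureSSH and OpenSSH formats.
cat > secssh.pub <<'EOF'
---- BEGIN SSH2 PUBLIC KEY ----
AAAAB3NzaC1kc3MAAACBAP...
---- END SSH2 PUBLIC KEY ----
EOF
printf 'ssh-dss AAAAB3NzaC1kc3MAAACBAP... user@host\n' > openssh.pub

# A SecureSSH key is wrapped in BEGIN/END SSH2 lines...
if head -1 secssh.pub | grep -q 'BEGIN SSH2'; then
    echo "secssh.pub is a SecureSSH-format key"
fi
# ...while an OpenSSH public key starts with the key type on one line.
if head -1 openssh.pub | grep -Eq '^ssh-(dss|rsa)'; then
    echo "openssh.pub is an OpenSSH-format key"
fi
```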

Common Issues

As a first step, try using ssh with the "-v" flag to get more verbose details on what the SSH client is attempting to do. Pay attention to what key files it attempts to use, and the responses from the remote server.

User Accounts

Even though you may authenticate to a server correctly, SSH is still at the mercy of the user account on the remote system. If the account is locked, expired or otherwise inaccessible, it will appear as if the SSH connection is simply disconnecting.

If you can log in using a password, then the account is OK. Most likely you have an issue with your configuration or key files, as described below.

Key location

As a first step, make sure you are not using the wrong configuration directory. OpenSSH will not look at a ~/.ssh2 directory, and Commercial SSH won't look at a ~/.ssh directory.

Key Formats

Ensure that the key files have been converted as appropriate on the server system. See the above sections for details on conversion.

File Permissions

One of the more common gotchas with SSH is that it is militantly pedantic about file permissions. If the file permissions are not secure enough, SSH will ignore the key completely. This applies to the user's configuration directory (~/.ssh or ~/.ssh2) as well as the key files. If the user's home directory or the .ssh directory is writable by anyone other than the user, SSH will ignore it and all its contents completely. This applies to both the SSH client and the SSH server.

Here is what your permissions should be:

  • user home directory - 0755
  • ssh directory - 0755
  • private key files - 0400
  • public key files - 0644
  • other config files - 0644
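The permissions above can be set in one pass with chmod. A sketch of the OpenSSH case, run here against a scratch directory so it is safe to test (on a real system you would target your actual home directory and key file names):

```shell
#!/bin/sh
# Tighten permissions so ssh/sshd will not ignore the keys.
HOME_DIR=$(mktemp -d)
mkdir -p "$HOME_DIR/.ssh"
touch "$HOME_DIR/.ssh/id_dsa" "$HOME_DIR/.ssh/id_dsa.pub" \
      "$HOME_DIR/.ssh/authorized_keys"

chmod 0755 "$HOME_DIR"                       # home dir: no group/other write
chmod 0755 "$HOME_DIR/.ssh"                  # ssh config directory
chmod 0400 "$HOME_DIR/.ssh/id_dsa"           # private key: owner-read only
chmod 0644 "$HOME_DIR/.ssh/id_dsa.pub"       # public key
chmod 0644 "$HOME_DIR/.ssh/authorized_keys"  # other config files
```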

As a first step, these permissions should be validated and set on both the client and the server to ensure that the SSH command is not ignoring your key files.

Troubleshooting

  1. Ensure you can log in to the remote system interactively with a password - if you cannot, you have account issues and should raise a Clarify case to the administrators of the system you are connecting to.
  2. Verify you have created the correct private and public keys on the client system (i.e., the system you are initiating the connection from).
  3. Verify the permissions of those files are correct.
  4. Use ssh -v and verify that the ssh client is attempting to use the key files.
  5. On the remote system, verify the public keys are installed, converted to the correct format, in their correct locations, and have the correct permissions.
  6. If you still cannot connect, try from another system that you know works or has worked, to ensure that there is not some other change on the server system preventing your connection.
  7. If you get to here, raise a Clarify case to your systems administration group for investigation.

Git| common commands

Git - Fast Version Control System

Commands

Here is a list of the most common commands you're likely to use on a day-to-day basis.

Local Commands

git config Get and set repository or global options
git init Create an empty git repository or reinitialize an existing one
git add Add file contents to the index
git status Show the working tree status
git commit Record changes to the repository
git log Show commit history
git show Show information on any object
git tag Create, list, delete or verify tags
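
The local commands above fit together like this; a minimal end-to-end run in a throwaway repository (the identity and file name are made up for the example):

```shell
#!/bin/sh
# Initialize a repository, stage a file, commit, and inspect the history.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"     # commit identity (hypothetical)
git config user.email "user@example.com"
echo "hello" > README
git add README                          # stage the new file
git status --short                      # shows: A  README
git commit -q -m "Add README"           # record the change
git log --oneline                       # one line per commit
git tag v0.1                            # tag the current commit
```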

Remotey Commands

git clone Clone a repository into a new directory
git remote Manage set of tracked repositories
git pull Fetch from and merge with another repository or a local branch
git fetch Download objects and refs from another repository
git push Update remote refs along with associated objects

Branchy Commands

git checkout Checkout a branch or paths to the working tree
git branch List, create, or delete branches
git merge Join two or more development histories together
git rebase Forward-port local commits to the updated upstream head

Patchy Commands

git diff Show changes between commits, commit and working tree, etc
git apply Apply a patch on a git index file and a working tree
git format-patch Prepare patches for e-mail submission
git am Apply a series of patches from a mailbox
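
The patchy commands fit together like this: export a commit with format-patch, then replay it elsewhere with am. A sketch using two throwaway repositories (names and identity are made up):

```shell
#!/bin/sh
# Make a repo with two commits, export the second as an e-mail patch,
# then re-apply it to a clone with git am.
work=$(mktemp -d)
cd "$work"
git init -q a && cd a
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file && git add file && git commit -q -m "first"
echo two >> file && git commit -q -am "second"
git format-patch -1 HEAD -o "$work/patches"   # writes 0001-second.patch

cd "$work"
git clone -q a b && cd b
git config user.name "Example User"
git config user.email "user@example.com"
git reset -q --hard HEAD~1                    # drop "second" locally
git am "$work/patches/0001-second.patch"      # re-apply it from the patch
git log --oneline                             # "second" is back on top
```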