Question problem with IDE and Linux SMP 

Forum: PCMCIA ATA/IDE Device Issues
Date: 2000, Apr 24
From: Peter A. Castro doctor

About 6 months ago I added a second processor to my system and rebuilt a Linux 2.2.14 kernel with SMP. I also rebuilt PCMCIA-CS 3.1.14 under this config. Since then, I've been experiencing hangs when I do large or continuous file copies from either a SanDisk PC Card II (ATA Flash) or a SanDisk CF in a SanDisk CF Adapter. Both appear to the system as ATA/IDE drives. At the moment I'm only reading data from these cards. I haven't tried writing to them for fear that they will be trashed. I've loaded a large file (test1.dat) onto the card from a friend's machine for testing. This file is about 20 MB. I also have a bunch of smaller files (~512 KB) whose sizes total more than 20 MB. If I attempt to copy anything that takes more than 3-4 seconds or totals more than 8 MB in size, I get into a condition where the device no longer responds; the console keeps saying that some operation timed out (see logs below) because the resource was busy. I also see in the process list what looks like an attempt by the system to umount the card! At this point, I must reboot to clear the condition, and because of the resource problem a clean shutdown isn't possible.

If I take the same kernel config and rebuild without SMP (and rebuild PCMCIA-CS under the non-SMP config) things work normally. Large copies complete without any problems.

Has anyone else experienced this? Any ideas as to what's going on? Below are logs of a copy of the large file. I suspect a race condition having to do with SMP, but where I couldn't really guess. I don't really want to have to go back to a single-processor config. Any help would be greatly appreciated!

The PCMCIA card reader is an Antec DataChute PCMCIA ISA reader/writer. It's based on the VIA VT83C469 chipset. As my config shows below, I use the i82365 module. I've tried changing every config parameter available for the i82365, in various combinations, without any real success.

My /etc/pcmcia.conf is pretty simple:
 ------------------------------------------------------------
 PCMCIA=yes
 PCIC=i82365
 PCIC_OPTS="irq_list=5"
 CARDMGR_OPTS="-v"
 ------------------------------------------------------------

Here's the syslog of what happened.  This card has always
reported a status=0x51 and error=0x04 upon insertion, but has
never given me any problems (under a UP kernel).  Any idea
what these errors represent?
 ------------------------------------------------------------
 Linux PCMCIA Card Services 3.1.14
   kernel build: 2.2.14 #3 SMP Sun Feb 27 19:54:47 PST 2000
   options:  [pci] [cardbus] [pnp]
 PnP: PNP BIOS installation structure at 0xc00f7430
 PnP: PNP BIOS version 1.0, entry at f0000:6d84, dseg at f0000
 Intel PCIC probe:
   VIA VT83C469 rev 00 ISA-to-PCMCIA at port 0x3e0 ofs 0x00
     host opts [0]: [ring]
     host opts [1]: [ring]
     ISA irqs (default) = 5 polling interval = 1000 ms
 cs: IO port probe 0x1000-0x17ff: clean.
 cs: IO port probe 0x0100-0x04ff: excluding 0x140-0x15f
 cs: memory probe 0x0d0000-0x0dffff: excluding 0xd0000-0xd3fff
 hde: SunDisk SDP3B-220, ATA DISK drive
 ide2 at 0x100-0x107,0x10e on irq 5
 hde: SunDisk SDP3B-220, 210MB w/1kB Cache, CHS=840/16/32
  hde: hde1
 ide_cs: hde: Vcc = 5.0, Vpp = 0.0
 VFS: Disk change detected on device ide2(33,0)
  hde: hde1
 hde: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
 hde: drive_cmd: error=0x04 { DriveStatusError }
 VFS: Disk change detected on device ide2(33,1)
  hde: hde1
 VFS: Disk change detected on device ide2(33,1)
  hde: hde1
 hde: irq timeout: status=0xff { Busy }
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 end_request: I/O error, dev 21:01 (hde), sector 68276
 hde: drive not ready for command
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 end_request: I/O error, dev 21:01 (hde), sector 68277
 hde: drive not ready for command
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ide2: reset timed-out, status=0xff
 hde: status timeout: status=0xff { Busy }
 end_request: I/O error, dev 21:01 (hde), sector 68278
 hde: drive not ready for command
 hde: status timeout: status=0xff { Busy }
 hde: drive not ready for command
 ------------------------------------------------------------
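As a reading aid for the status values in this log: each is a bit mask from the drive's status register. The sketch below decodes one using the bit names the IDE driver prints. decode_status is a hypothetical helper written for this note, not part of the kernel or pcmcia-cs. In the error register, 0x04 is the "command aborted" bit, which the driver labels DriveStatusError. A status of 0xff, with every bit set, is also what a read tends to return when nothing is driving the bus; that would fit a socket that has been shut down, though the log alone does not prove it.

```shell
#!/bin/sh
# decode_status: hypothetical helper that maps an IDE status byte to the
# bit names shown in the kernel messages above.
decode_status() {
    s=$(( $1 ))                      # accepts hex such as 0x51
    if [ $(( $s & 0x80 )) -ne 0 ]; then
        # When Busy is set, the remaining bits are not meaningful,
        # so only Busy is reported.
        echo "Busy"
        return 0
    fi
    out=""
    for pair in 0x40:DriveReady 0x20:DeviceFault 0x10:SeekComplete \
                0x08:DataRequest 0x04:CorrectedError 0x02:Index 0x01:Error; do
        mask=${pair%%:*}             # text before the colon
        name=${pair#*:}              # text after the colon
        if [ $(( $s & $mask )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"                  # drop the leading space
}

decode_status 0x51    # prints: DriveReady SeekComplete Error
decode_status 0xff    # prints: Busy
```

So 0x51 is 0x40 + 0x10 + 0x01, which matches the "{ DriveReady SeekComplete Error }" text on the drive_cmd line.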

Part of a process list at the time of the first Busy message:
 ------------------------------------------------------------
  433  ?  S    0:01 /sbin/cardmgr -v
  468   1 D    0:00 cp /pcmcia/test1.dat .
  469  ?  S    0:00 sh -c ./ide stop hde 2>&1
  470  ?  S    0:00 sh ./ide stop hde
  481  ?  D    0:00 umount -v /dev/hde1
 ------------------------------------------------------------
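In the STAT column above, "D" means uninterruptible sleep, usually a process waiting on I/O. Both the cp and the umount are in that state, which is consistent with the clean shutdown failing. A small filter, assuming ps output laid out as PID TTY STAT TIME COMMAND like the listing above (blocked_procs is a made-up name for illustration), picks out such processes:

```shell
#!/bin/sh
# blocked_procs: hypothetical filter that reads ps-style lines on stdin
# (PID TTY STAT TIME COMMAND...) and prints "PID: command" for every
# process whose state starts with D (uninterruptible sleep).
blocked_procs() {
    awk '$3 ~ /^D/ {
        pid = $1
        $1 = $2 = $3 = $4 = ""       # blank the first four columns
        sub(/^ +/, "")               # trim the gap they leave behind
        print pid ": " $0
    }'
}
```

It could be run as "ps ax | blocked_procs" while the copy is hung; on the listing above it would report PIDs 468 and 481.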

I have not heard this one before

Re: Question problem with IDE and Linux SMP (Peter A. Castro)
Date: 2000, Apr 24
From: David Hinds <dhinds@pcmcia.sourceforge.org>

This does not sound like a common problem.  You should also check your
system log messages, which will include the kernel messages as well as
messages from cardmgr.

The errors at insertion time are not a problem.  They pop up for some
kernel versions, when the IDE driver tries to issue a "door lock"
command that the card doesn't recognize.  The driver won't try this
again if it fails the first time.

PCMCIA under SMP is not a particularly well tested configuration, and
I can't test it myself, so I can't really rule out the possibility
that the code contains SMP bugs.  Looking at the eject detection code,
I don't really see how it could be sensitive to whether it is running
on an SMP kernel or not.  But I also don't see how the hardware could
mis-report an eject in this situation.  Do you think all your hardware
should be solid?  (no overclocking, power supply has plenty of
capacity, no other strange symptoms while running SMP?)

You can effectively disable the test for an ejected card by setting
PCIC_OPTS="poll_interval=2000000000".  You would then need to issue
"cardctl insert" and "cardctl eject" commands by hand.  If this is an
effective work-around, that would at least narrow down the location of
the problem.

-- Dave

Have done extensive testing, still have problems.

Re: I have not heard this one before (David Hinds)
Date: 2000, Apr 26
From: Peter A. Castro doctor

Hi Dave!
  Thanks for responding!

>This does not sound like a common problem.  You should also
>check your system log messages, which will include the kernel
>messages as well as messages from cardmgr.

Actually, the messages I included in my original posting *are* from the system log (I removed the leading timestamp as it didn't really provide any useful extra information). What you see is what you get.

>The errors at insertion time are not a problem.  They pop up
>for some kernel versions, when the IDE driver tries to issue
>a "door lock" command that the card doesn't recognize.  The
>driver won't try this again if it fails the first time.

I figured as much and have been ignoring those messages.

>PCMCIA under SMP is not a particularly well tested
>configuration, and I can't test it myself, so I can't really
>rule out the possibility that the code contains SMP bugs.

Is there anything I can do to enable debug output to help troubleshoot the problem? Debugging modules isn't my speciality :-( Or is there a place in cardmgr where I can place a breakpoint?

>Looking at the eject detection code, I don't really see how
>it could be sensitive to whether it is running on an SMP
>kernel or not.  But I also don't see how the hardware could

Since the amount of data/time between hangs varies somewhat, I'd guess there's a race condition that is SMP dependent. Again, if I rebuild the kernel+pcmcia-cs for non-SMP, everything works as expected. Exact same hardware config and software.

>mis-report an eject in this situation.  Do you think all your
>hardware should be solid?  (no overclocking, power supply
>has plenty of capacity, no other strange symptoms while
>running SMP?)

I've scrutinized the hardware pretty thoroughly. The load for all the drives and peripherals I have is well within the tolerance of the power supply (it's a 400W PS). No other software/hardware problems, and I drive this machine pretty hard sometimes (but not while using pcmcia cards ;-).

The card reader/writer is an ISA card, but I have several other ISA cards plugged in and they all work perfectly.

>You can effectively disable the test for an ejected card by
>setting PCIC_OPTS="poll_interval=2000000000".  You would then
>need to issue "cardctl insert" and "cardctl eject" commands
>by hand.  If this is an effective work-around, that would at
>least narrow down the location of the problem.

I'll try this, but I'm not sure whether an "eject" is being erroneously generated, or if the interrupt is being missed somehow and an error condition is being raised as an "eject". The timing is too close to tell from the logs. And once the "Busy" occurs, I'm very limited as to what I can do to probe for information. Again, if you've got any ideas on getting more diagnostics out of this, I will try to generate them. Any suggestions on things to try? What extra information about my system can I provide to help identify possible problem areas?

Thanks!

Re: Have done extensive testing, still have problems.

Re: Have done extensive testing, still have problems. (Peter A. Castro)
Date: 2000, Apr 26
From: David Hinds <dhinds@pcmcia.sourceforge.org>

Just try the poll_interval thing first.  I'm not really sure what to
suggest regarding debugging the IDE driver; that's out of my area of
expertise.  Once the card is configured, the kernel IDE driver is "on
its own" until an eject event is processed; the PCMCIA modules are not
active during normal card operation.

-- Dave

Ok That works!

Re: Re: Have done extensive testing, still have problems. (David Hinds)
Date: 2000, May 02
From: Peter A. Castro doctor

Yep, setting poll_interval=2000000000 and manually doing the insert and eject works. So, that will work for now. But what's the real fix? Is there any way I can help you troubleshoot this?
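
For anyone following along, the workaround might look roughly like this in the config file from the original post. Keeping irq_list=5 next to poll_interval is an assumption here, as is the space-separated form of the two options; the thread does not show the final file.

```shell
# /etc/pcmcia.conf, as in the original post, with the suggested
# poll_interval added. Keeping irq_list=5 alongside it is an assumption.
PCMCIA=yes
PCIC=i82365
PCIC_OPTS="irq_list=5 poll_interval=2000000000"
CARDMGR_OPTS="-v"

# With the eject test effectively disabled, card events are signalled
# by hand, using the commands named earlier in the thread:
#   cardctl insert    # after plugging the card in
#   cardctl eject     # before pulling the card out
```

The trade-off is that the system no longer notices a card being pulled on its own, so forgetting the manual eject before removing a mounted card is presumably risky.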
